decomp.semantics.predpatt.utils¶

Utility functions for PredPatt including linearization, visualization, and Universal Dependencies schema handling.

Utility functions for PredPatt processing and visualization.

This module provides utility functions for linearizing PredPatt structures into flat representations, visualizing dependency trees, and formatting output for display.

Functions¶

linearize: Convert PredPatt structures to linearized string format.
linearize_pprint: Pretty-print linearized PredPatt structures.
construct_pred_from_flat: Reconstruct predicate from linearized format.
linear_to_string: Convert linearized structure to string representation.

Classes¶

LinearizedPPOpts: Options for controlling linearization output format.

class LinearizedPPOpts[source]¶

Bases: object

Options for linearization of PredPatt structures.

Parameters:

recursive (bool, optional) – Whether to recursively linearize embedded predicates (default: True).
distinguish_header (bool, optional) – Whether to distinguish predicate/argument heads with special suffix (default: True).
only_head (bool, optional) – Whether to include only head tokens instead of full phrases (default: False).

__init__(recursive=True, distinguish_header=True, only_head=False)[source]¶

construct_pred_from_flat(tokens)[source]¶

Construct predicates from flat token list.

Parameters: tokens (list[str]) – List of tokens to parse.
Returns: List of constructed predicates.
Return type: list[Predicate]

linear_to_string(tokens)[source]¶

Convert linearized tokens back to plain text.

Parameters: tokens (list[str]) – List of linearized tokens.
Returns: List of plain text tokens.
Return type: list[str]

linearize(pp, opt=None, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)[source]¶

Convert PredPatt output to linearized form.

The PredPatt output is linearized as follows:

1. Add a label to each token to indicate whether it is a predicate token or an argument token:
- argument_token:a
- predicate_token:p
2. Build the dependency tree among the heads of predicates.
3. Print the PredPatt output depth-first. At each layer, items are sorted by position. The items are:
- argument_token
- predicate_token
- a predicate that depends on a token in this layer
4. Enclose the output of each layer in a pair of parentheses:
- Special parentheses “(:a predpatt_output ):a” are used for predicates that are dependents of a clausal predicate.
- Normal parentheses “( predpatt_output )” are used for predicates that are noun dependents.

Parameters:

pp (PredPatt) – The PredPatt instance to linearize.
opt (LinearizedPPOpts, optional) – Linearization options (default: LinearizedPPOpts()).
ud (type[DependencyRelationsBase], optional) – Universal Dependencies relations class (default: dep_v1).

Returns:

Linearized representation of the PredPatt structure.

Return type:

str
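The labelling, depth-first ordering, and bracketing described for linearize can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `Tok` and `Pred` classes, the `clausal` flag, and the `flatten` helper are hypothetical stand-ins, not the library's data structures, and the real output format has more detail (for example, the options in LinearizedPPOpts).

```python
from dataclasses import dataclass, field


@dataclass
class Tok:
    position: int
    text: str


@dataclass
class Pred:
    position: int                                          # position of the predicate head
    tokens: list[Tok]                                      # predicate tokens
    args: list[list[Tok]] = field(default_factory=list)    # each argument is a token list
    children: list["Pred"] = field(default_factory=list)   # dependent predicates
    clausal: bool = True  # assumed flag: is this a dependent of a clausal predicate?


def flatten(pred: Pred) -> str:
    """Render one layer: label tokens, sort by position, recurse, then enclose."""
    items = []  # (position, rendered text)
    for tok in pred.tokens:
        items.append((tok.position, f"{tok.text}:p"))
    for arg in pred.args:
        for tok in arg:
            items.append((tok.position, f"{tok.text}:a"))
    for child in pred.children:
        items.append((child.position, flatten(child)))  # depth-first
    body = " ".join(text for _, text in sorted(items, key=lambda it: it[0]))
    return f"(:a {body} ):a" if pred.clausal else f"( {body} )"


# "Chris said Pat left": "left" is a predicate that depends on "said".
left = Pred(3, [Tok(3, "left")], args=[[Tok(2, "Pat")]])
said = Pred(1, [Tok(1, "said")], args=[[Tok(0, "Chris")]], children=[left], clausal=False)
print(flatten(said))  # ( Chris:a said:p (:a Pat:a left:p ):a )
```

The sketch keeps a single sorted list per layer so that tokens and embedded predicates interleave by position, which is the property the description emphasizes.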

linearize_pprint(s)¶

Pretty print linearized string with readable brackets.

Parameters: s (str) – Linearized string to pretty print.
Returns: Pretty printed string with brackets.
Return type: str

Submodules¶

decomp.semantics.predpatt.utils.linearization¶

Linearization utilities for PredPatt.

This module provides functions to convert PredPatt structures into a linearized form that represents the predicate-argument relationships in a flat string format. The linearization preserves hierarchical structure using special markers and can be used for serialization, comparison, or display purposes.

class HasChildren[source]¶

Bases: HasPosition

Protocol for objects that have a list of children.

children: list[Predicate]¶

class LinearizedPPOpts[source]¶

Bases: object

Options for linearization of PredPatt structures.

Parameters:

recursive (bool, optional) – Whether to recursively linearize embedded predicates (default: True).
distinguish_header (bool, optional) – Whether to distinguish predicate/argument heads with special suffix (default: True).
only_head (bool, optional) – Whether to include only head tokens instead of full phrases (default: False).

__init__(recursive=True, distinguish_header=True, only_head=False)[source]¶

sort_by_position(x)[source]¶

Sort items by their position attribute.

Parameters: x (list[Any]) – List of items with position attribute.
Returns: Sorted list by position.
Return type: list[Any]
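As a rough illustration of sorting by a position attribute, the following sketch uses `SimpleNamespace` objects as stand-ins for tokens or predicates. The helper name `sort_by_position_sketch` is hypothetical, and returning a new list (rather than sorting in place) is an assumption.

```python
from types import SimpleNamespace


def sort_by_position_sketch(items):
    """Return a new list ordered by each item's `position` attribute."""
    return sorted(items, key=lambda item: item.position)


items = [
    SimpleNamespace(position=2, text="dogs"),
    SimpleNamespace(position=0, text="Chris"),
    SimpleNamespace(position=1, text="loves"),
]
print([item.text for item in sort_by_position_sketch(items)])  # ['Chris', 'loves', 'dogs']
```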

is_dep_of_pred(t, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)[source]¶

Check if token is a dependent of a predicate.

Parameters:

t (Token) – Token to check.
ud (type[DependencyRelationsBase], optional) – Universal Dependencies relations class (default: dep_v1).

Returns:

True if token is predicate dependent, None otherwise.

Return type:

bool | None

important_pred_tokens(p, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)[source]¶

Get important tokens from a predicate (root and negation).

Parameters:

p (Predicate) – The predicate to extract tokens from.
ud (type[DependencyRelationsBase], optional) – Universal Dependencies relations class (default: dep_v1).

Returns:

List of important tokens sorted by position.

Return type:

list[Token]
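One plausible reading of "root and negation" is sketched below: keep the predicate root plus any token attached to it by the negation relation, then sort by position. The `Tok` class, its field names, and `important_tokens` are hypothetical; the library's Token and Predicate classes may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tok:
    position: int
    text: str
    gov_rel: Optional[str] = None       # relation to the governor (assumed field)
    gov_position: Optional[int] = None  # position of the governor (assumed field)


def important_tokens(root, tokens, neg_rel="neg"):
    """Keep the root and its direct negation dependents, ordered by position."""
    keep = [root]
    for tok in tokens:
        if tok.gov_position == root.position and tok.gov_rel == neg_rel:
            keep.append(tok)
    return sorted(keep, key=lambda t: t.position)


# "did not leave": the predicate root is "leave".
did = Tok(1, "did", "aux", 3)
neg = Tok(2, "not", "neg", 3)
root = Tok(3, "leave")
print([t.text for t in important_tokens(root, [did, neg, root])])  # ['not', 'leave']
```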

likely_to_be_pred(pred, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)[source]¶

Check if a predicate is likely to be a true predicate.

Parameters:

pred (Predicate) – The predicate to check.
ud (type[DependencyRelationsBase], optional) – Universal Dependencies relations class (default: dep_v1).

Returns:

True if likely to be predicate, None otherwise.

Return type:

bool | None

build_pred_dep(pp)[source]¶

Build dependencies between predicates.

Parameters: pp (PredPatt) – The PredPatt instance containing predicates.
Returns: List of root predicates sorted by position.
Return type: list[Predicate]

get_prediates(pp, only_head=False)[source]¶

Get predicates as formatted strings.

Parameters:

pp (PredPatt) – The PredPatt instance.
only_head (bool, optional) – Whether to return only head tokens (default: False).

Returns:

List of formatted predicate strings.

Return type:

list[str]

linearize(pp, opt=None, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)[source]¶

Convert PredPatt output to linearized form.

The PredPatt output is linearized as follows:

1. Add a label to each token to indicate whether it is a predicate token or an argument token:
- argument_token:a
- predicate_token:p
2. Build the dependency tree among the heads of predicates.
3. Print the PredPatt output depth-first. At each layer, items are sorted by position. The items are:
- argument_token
- predicate_token
- a predicate that depends on a token in this layer
4. Enclose the output of each layer in a pair of parentheses:
- Special parentheses “(:a predpatt_output ):a” are used for predicates that are dependents of a clausal predicate.
- Normal parentheses “( predpatt_output )” are used for predicates that are noun dependents.

Parameters:

pp (PredPatt) – The PredPatt instance to linearize.
opt (LinearizedPPOpts, optional) – Linearization options (default: LinearizedPPOpts()).
ud (type[DependencyRelationsBase], optional) – Universal Dependencies relations class (default: dep_v1).

Returns:

Linearized representation of the PredPatt structure.

Return type:

str

flatten_and_enclose_pred(pred, opt, ud)[source]¶

Flatten and enclose a predicate with appropriate markers.

Parameters:

pred (Predicate) – The predicate to flatten.
opt (LinearizedPPOpts) – Linearization options.
ud (type[DependencyRelationsBase]) – Universal Dependencies relations class.

Returns:

Flattened and enclosed predicate string.

Return type:

str

flatten_pred(pred, opt, ud)[source]¶

Flatten a predicate into a string representation.

Parameters:

pred (Predicate) – The predicate to flatten.
opt (LinearizedPPOpts) – Linearization options.
ud (type[DependencyRelationsBase]) – Universal Dependencies relations class.

Returns:

Flattened string and whether the predicate is a dependent of another predicate.

Return type:

tuple[str, bool | None]

phrase_and_enclose_arg(arg, opt)[source]¶

Format and enclose an argument with markers.

Parameters:

arg (Argument) – The argument to format.
opt (LinearizedPPOpts) – Linearization options.

Returns:

Formatted and enclosed argument string.

Return type:

str

collect_embebdded_tokens(tokens_iter, start_token)[source]¶

Collect tokens within embedded structure markers.

Parameters:

tokens_iter (iterator) – Iterator over (index, token) pairs.
start_token (str) – The starting token marker.

Returns:

List of embedded tokens.

Return type:

list[str]

linear_to_string(tokens)[source]¶

Convert linearized tokens back to plain text.

Parameters: tokens (list[str]) – List of linearized tokens.
Returns: List of plain text tokens.
Return type: list[str]
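To make the idea of recovering plain text concrete, here is a minimal sketch that drops bracket markers and strips the `:a` / `:p` labels. The marker set and the helper name `strip_linearization` are assumptions based on the format described for linearize, not the library's actual constants or behavior.

```python
# Assumed markers, taken from the bracket forms described for linearize().
BRACKETS = {"(", ")", "(:a", "):a"}
LABELS = (":a", ":p")


def strip_linearization(tokens: list[str]) -> list[str]:
    """Drop bracket markers and remove ':a' / ':p' labels from tokens."""
    plain = []
    for tok in tokens:
        if tok in BRACKETS:
            continue  # structural marker, carries no surface text
        for label in LABELS:
            if tok.endswith(label):
                tok = tok[: -len(label)]
                break
        plain.append(tok)
    return plain


tokens = ["(:a", "Chris:a", "):a", "loves:p", "(:a", "silly:a", "dogs:a", "):a"]
print(strip_linearization(tokens))  # ['Chris', 'loves', 'silly', 'dogs']
```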

get_something(something_idx, tokens_iter)[source]¶

Get SOMETHING argument from token iterator.

Parameters:

something_idx (int) – Index of SOMETHING token.
tokens_iter (iterator) – Iterator over (index, token) pairs.

Returns:

The SOMETHING argument.

Return type:

Argument

is_argument_finished(t, current_argument)[source]¶

Check if argument construction is finished.

Parameters:

t (str) – Current token.
current_argument (Argument) – Argument being constructed.

Returns:

True if argument is finished.

Return type:

bool

construct_arg_from_flat(tokens_iter)[source]¶

Construct an argument from flat token iterator.

Parameters: tokens_iter (iterator) – Iterator over (index, token) pairs.
Returns: Constructed argument.
Return type: Argument

construct_pred_from_flat(tokens)[source]¶

Construct predicates from flat token list.

Parameters: tokens (list[str]) – List of tokens to parse.
Returns: List of constructed predicates.
Return type: list[Predicate]

check_recoverability(tokens)[source]¶

Check if linearized tokens can be recovered to predicates.

Parameters: tokens (list[str]) – List of tokens to check.
Returns: Whether tokens are recoverable and the token list.
Return type: tuple[bool, list[str]]

pprint_preds(preds)[source]¶

Pretty print list of predicates.

Parameters: preds (list[Predicate]) – List of predicates to format.
Returns: List of formatted predicate strings.
Return type: list[str]

argument_names(args)[source]¶

Give arguments alpha-numeric names.

Examples

>>> names = argument_names(range(100))
>>> [names[i] for i in range(0,100,26)]
['?a', '?a1', '?a2', '?a3']
>>> [names[i] for i in range(1,100,26)]
['?b', '?b1', '?b2', '?b3']

Parameters: args (list[Any]) – List of arguments to name.
Returns: Mapping from argument to its name.
Return type: dict[Any, str]
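The naming scheme implied by the doctest (a letter that cycles every 26 arguments, followed by a cycle count once the alphabet is exhausted) can be reproduced with a short sketch. The function name `argument_names_sketch` is hypothetical; it is written only to match the example output above and may not mirror the library's implementation.

```python
def argument_names_sketch(args):
    """Map each argument to '?a', '?b', ..., '?z', '?a1', '?b1', and so on."""
    names = {}
    for i, arg in enumerate(args):
        suffix = i // 26 if i >= 26 else ""  # no number for the first 26 names
        names[arg] = f"?{chr(ord('a') + i % 26)}{suffix}"
    return names


names = argument_names_sketch(range(100))
print([names[i] for i in range(0, 100, 26)])  # ['?a', '?a1', '?a2', '?a3']
print([names[i] for i in range(1, 100, 26)])  # ['?b', '?b1', '?b2', '?b3']
```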

format_pred(pred, indent='\\t')[source]¶

Format a predicate for display.

Parameters:

pred (Predicate) – The predicate to format.
indent (str, optional) – Indentation string (default: “\t”, a tab character).

Returns:

Formatted predicate string.

Return type:

str

pprint(s)[source]¶

Pretty print linearized string with readable brackets.

Parameters: s (str) – Linearized string to pretty print.
Returns: Pretty printed string with brackets.
Return type: str

test(data)[source]¶

Test linearization functionality.

Parameters: data (str) – Path to test data file.
Return type: None

decomp.semantics.predpatt.utils.visualization¶

Visualization and output formatting utilities for PredPatt.

This module provides functions for pretty-printing PredPatt extractions, including support for colored output, rule tracking, and various output formats.

Functions¶

no_color: Pass-through function for plain text output without colors.
argument_names: Generate unique names for predicate arguments.
format_predicate: Format a predicate with argument placeholders.
format_predicate_instance: Format a complete predicate-argument structure.
pprint: Pretty-print all extracted predicates from PredPatt.
pprint_ud_parse: Pretty-print dependency parse in tabular format.

Notes

This module supports both colored (via termcolor) and plain text output. Colored output is optional and degrades gracefully if termcolor is not installed.
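The graceful degradation described in the note is commonly implemented with an optional import and a pass-through fallback. The sketch below shows that general pattern; the `no_color` signature and the `highlight` helper are assumptions for illustration and may not match the module's own definitions.

```python
def no_color(text, *args, **kwargs):
    """Return the text unchanged, ignoring any color arguments."""
    return text


try:
    from termcolor import colored  # optional third-party dependency
except ImportError:
    colored = no_color  # fall back to plain text when termcolor is missing


def highlight(text, color="red", use_color=True):
    """Color the text when requested and possible, otherwise return it as-is."""
    paint = colored if use_color else no_color
    return paint(text, color)


print(highlight("loves", use_color=False))  # loves
```

Because the fallback accepts the same positional arguments as `termcolor.colored`, calling code does not need to check whether the package is installed.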

decomp.semantics.predpatt.utils.ud_schema¶

Universal Dependencies schema definitions for PredPatt.

This module provides POS tags and dependency relation definitions for both UD v1.0 and v2.0, supporting version-specific processing in the PredPatt semantic extraction system.

The dependency relation classes define core syntactic relations (subject, object, modifiers) and relation sets used by PredPatt for pattern matching during predicate-argument extraction.

Classes¶

POSTag: Universal Dependencies part-of-speech tags.
DependencyRelationsBase: Abstract base class for dependency relations.
DependencyRelationsV1: UD v1.0 dependency relation definitions.
DependencyRelationsV2: UD v2.0 dependency relation definitions.

Functions¶

get_dependency_relations: Helper to get relations for a specific version.

Constants¶

postag: Alias for POSTag class.
dep_v1: Alias for DependencyRelationsV1 class.
dep_v2: Alias for DependencyRelationsV2 class.

class POSTag[source]¶

Bases: object

Universal Dependencies part-of-speech tags.

Reference: http://universaldependencies.org/u/pos/index.html

ADJ: ClassVar[str] = 'ADJ'¶

ADV: ClassVar[str] = 'ADV'¶

INTJ: ClassVar[str] = 'INTJ'¶

NOUN: ClassVar[str] = 'NOUN'¶

PROPN: ClassVar[str] = 'PROPN'¶

VERB: ClassVar[str] = 'VERB'¶

ADP: ClassVar[str] = 'ADP'¶

AUX: ClassVar[str] = 'AUX'¶

CCONJ: ClassVar[str] = 'CCONJ'¶

DET: ClassVar[str] = 'DET'¶

NUM: ClassVar[str] = 'NUM'¶

PART: ClassVar[str] = 'PART'¶

PRON: ClassVar[str] = 'PRON'¶

SCONJ: ClassVar[str] = 'SCONJ'¶

PUNCT: ClassVar[str] = 'PUNCT'¶

SYM: ClassVar[str] = 'SYM'¶

X: ClassVar[str] = 'X'¶

class DependencyRelationsBase[source]¶

Bases: ABC

Base class for Universal Dependencies relation definitions.

VERSION: ClassVar[str]¶

abstract property nsubj: str¶: Nominal subject relation.

abstract property nsubjpass: str¶: Passive nominal subject relation.

abstract property dobj: str¶: Direct object relation.

abstract property auxpass: str¶: Passive auxiliary relation.

abstract property subj: set[str]¶: All subject relations.

abstract property obj: set[str]¶: All object relations.
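The combination of abstract properties on the base class and plain class attributes on the versioned subclasses can be sketched with Python's `abc` module. The class names below are hypothetical stand-ins with only two relations each; note that Python enforces abstractness only when a class is instantiated.

```python
from abc import ABC, abstractmethod


class RelationsBase(ABC):
    """Hypothetical stand-in for DependencyRelationsBase."""

    @property
    @abstractmethod
    def nsubj(self) -> str:
        """Nominal subject relation."""

    @property
    @abstractmethod
    def dobj(self) -> str:
        """Direct object relation."""


class RelationsV1(RelationsBase):
    nsubj = "nsubj"  # a class attribute satisfies the abstract property
    dobj = "dobj"


class RelationsV2(RelationsBase):
    nsubj = "nsubj"
    dobj = "obj"


class Incomplete(RelationsBase):
    nsubj = "nsubj"  # dobj is missing, so the class stays abstract


print(RelationsV2.dobj)  # obj
try:
    Incomplete()
except TypeError as err:
    print("cannot instantiate:", type(err).__name__)
```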

class DependencyRelationsV1[source]¶

Bases: DependencyRelationsBase

Universal Dependencies v1.0 relation definitions.

VERSION: ClassVar[str] = '1.0'¶

nsubj: ClassVar[str] = 'nsubj'¶

nsubjpass: ClassVar[str] = 'nsubjpass'¶

csubj: ClassVar[str] = 'csubj'¶

csubjpass: ClassVar[str] = 'csubjpass'¶

dobj: ClassVar[str] = 'dobj'¶

iobj: ClassVar[str] = 'iobj'¶

cop: ClassVar[str] = 'cop'¶

aux: ClassVar[str] = 'aux'¶

auxpass: ClassVar[str] = 'auxpass'¶

neg: ClassVar[str] = 'neg'¶

amod: ClassVar[str] = 'amod'¶

advmod: ClassVar[str] = 'advmod'¶

nmod: ClassVar[str] = 'nmod'¶

nmod_poss: ClassVar[str] = 'nmod:poss'¶

nmod_tmod: ClassVar[str] = 'nmod:tmod'¶

nmod_npmod: ClassVar[str] = 'nmod:npmod'¶

obl: ClassVar[str] = 'nmod'¶

obl_npmod: ClassVar[str] = 'nmod:npmod'¶

appos: ClassVar[str] = 'appos'¶

cc: ClassVar[str] = 'cc'¶

conj: ClassVar[str] = 'conj'¶

cc_preconj: ClassVar[str] = 'cc:preconj'¶

mark: ClassVar[str] = 'mark'¶

case: ClassVar[str] = 'case'¶

mwe: ClassVar[str] = 'fixed'¶

parataxis: ClassVar[str] = 'parataxis'¶

punct: ClassVar[str] = 'punct'¶

ccomp: ClassVar[str] = 'ccomp'¶

xcomp: ClassVar[str] = 'xcomp'¶

advcl: ClassVar[str] = 'advcl'¶

acl: ClassVar[str] = 'acl'¶

aclrelcl: ClassVar[str] = 'acl:relcl'¶

dep: ClassVar[str] = 'dep'¶

SUBJ: ClassVar[set[str]] = {'csubj', 'csubjpass', 'nsubj', 'nsubjpass'}¶

OBJ: ClassVar[set[str]] = {'dobj', 'iobj'}¶

NMODS: ClassVar[set[str]] = {'nmod', 'nmod:npmod', 'nmod:tmod'}¶

ADJ_LIKE_MODS: ClassVar[set[str]] = {'acl', 'acl:relcl', 'amod', 'appos'}¶

ARG_LIKE: ClassVar[set[str]] = {'csubj', 'csubjpass', 'dobj', 'iobj', 'nmod', 'nmod:npmod', 'nmod:tmod', 'nsubj'}¶

TRIVIALS: ClassVar[set[str]] = {'cc', 'mark', 'punct'}¶

PRED_DEPS_TO_DROP: ClassVar[set[str]] = {'acl', 'acl:relcl', 'advcl', 'appos', 'ccomp', 'csubj', 'dep', 'nmod:tmod', 'parataxis'}¶

SPECIAL_ARG_DEPS_TO_DROP: ClassVar[set[str]] = {'advcl', 'aux', 'auxpass', 'ccomp', 'cop', 'csubj', 'csubjpass', 'dobj', 'fixed', 'iobj', 'mark', 'neg', 'nsubj', 'parataxis'}¶

HARD_TO_FIND_ARGS: ClassVar[set[str]] = {'acl', 'acl:relcl', 'advcl', 'amod', 'conj', 'dep'}¶

property subj: set[str]¶: All subject relations.

property obj: set[str]¶: All object relations.

class DependencyRelationsV2[source]¶

Bases: DependencyRelationsBase

Universal Dependencies v2.0 relation definitions.

VERSION: ClassVar[str] = '2.0'¶

nsubj: ClassVar[str] = 'nsubj'¶

nsubjpass: ClassVar[str] = 'nsubj:pass'¶

csubj: ClassVar[str] = 'csubj'¶

csubjpass: ClassVar[str] = 'csubj:pass'¶

dobj: ClassVar[str] = 'obj'¶

iobj: ClassVar[str] = 'iobj'¶

aux: ClassVar[str] = 'aux'¶

auxpass: ClassVar[str] = 'aux:pass'¶

neg: ClassVar[str] = 'neg'¶

cop: ClassVar[str] = 'cop'¶

amod: ClassVar[str] = 'amod'¶

advmod: ClassVar[str] = 'advmod'¶

nmod: ClassVar[str] = 'nmod'¶

nmod_poss: ClassVar[str] = 'nmod:poss'¶

nmod_tmod: ClassVar[str] = 'nmod:tmod'¶

nmod_npmod: ClassVar[str] = 'nmod:npmod'¶

obl: ClassVar[str] = 'obl'¶

obl_npmod: ClassVar[str] = 'obl:npmod'¶

appos: ClassVar[str] = 'appos'¶

cc: ClassVar[str] = 'cc'¶

conj: ClassVar[str] = 'conj'¶

cc_preconj: ClassVar[str] = 'cc:preconj'¶

mark: ClassVar[str] = 'mark'¶

case: ClassVar[str] = 'case'¶

mwe: ClassVar[str] = 'fixed'¶

parataxis: ClassVar[str] = 'parataxis'¶

punct: ClassVar[str] = 'punct'¶

ccomp: ClassVar[str] = 'ccomp'¶

xcomp: ClassVar[str] = 'xcomp'¶

advcl: ClassVar[str] = 'advcl'¶

acl: ClassVar[str] = 'acl'¶

aclrelcl: ClassVar[str] = 'acl:relcl'¶

dep: ClassVar[str] = 'dep'¶

SUBJ: ClassVar[set[str]] = {'csubj', 'csubj:pass', 'nsubj', 'nsubj:pass'}¶

OBJ: ClassVar[set[str]] = {'iobj', 'obj'}¶

NMODS: ClassVar[set[str]] = {'nmod', 'nmod:npmod', 'nmod:tmod', 'obl'}¶

ADJ_LIKE_MODS: ClassVar[set[str]] = {'acl', 'acl:relcl', 'amod', 'appos'}¶

ARG_LIKE: ClassVar[set[str]] = {'csubj', 'csubj:pass', 'iobj', 'nmod', 'nmod:npmod', 'nmod:tmod', 'nsubj', 'obj', 'obl'}¶

TRIVIALS: ClassVar[set[str]] = {'cc', 'mark', 'punct'}¶

PRED_DEPS_TO_DROP: ClassVar[set[str]] = {'acl', 'acl:relcl', 'advcl', 'appos', 'ccomp', 'csubj', 'dep', 'nmod:tmod', 'parataxis'}¶

SPECIAL_ARG_DEPS_TO_DROP: ClassVar[set[str]] = {'advcl', 'aux', 'aux:pass', 'ccomp', 'cop', 'csubj', 'csubj:pass', 'fixed', 'iobj', 'mark', 'neg', 'nsubj', 'obj', 'parataxis'}¶

HARD_TO_FIND_ARGS: ClassVar[set[str]] = {'acl', 'acl:relcl', 'advcl', 'amod', 'conj', 'dep'}¶

property subj: set[str]¶: All subject relations.

property obj: set[str]¶: All object relations.
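Because the two versions name several relations differently (for example `nsubjpass` is 'nsubjpass' in v1 and 'nsubj:pass' in v2, and `dobj` is 'dobj' versus 'obj'), code that takes the relations class as a parameter can stay version-independent. The sketch below uses cut-down stand-in classes whose values are copied from the listings above; `RelV1`, `RelV2`, and `describe` are hypothetical names.

```python
class RelV1:
    """Stand-in holding a few v1 values from the listing above."""
    nsubjpass = "nsubjpass"
    dobj = "dobj"
    SUBJ = {"csubj", "csubjpass", "nsubj", "nsubjpass"}


class RelV2:
    """Stand-in holding the corresponding v2 values."""
    nsubjpass = "nsubj:pass"
    dobj = "obj"
    SUBJ = {"csubj", "csubj:pass", "nsubj", "nsubj:pass"}


def describe(label, ud):
    """Classify a relation label using whichever relations class is passed in."""
    if label == ud.nsubjpass:
        return "passive subject"
    if label in ud.SUBJ:
        return "subject"
    if label == ud.dobj:
        return "direct object"
    return "other"


print(describe("nsubj:pass", RelV2))  # passive subject
print(describe("nsubj:pass", RelV1))  # other
print(describe("obj", RelV2))         # direct object
print(describe("dobj", RelV1))        # direct object
```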

postag¶: alias of POSTag

dep_v1¶: alias of DependencyRelationsV1

dep_v2¶: alias of DependencyRelationsV2

get_dependency_relations(version='2.0')[source]¶

Get dependency relations for a specific UD version.

Parameters: version (str, optional) – The UD version (“1.0” or “2.0”); defaults to “2.0”.
Returns: The dependency relations class for the specified version.
Return type: type[DependencyRelationsBase]
Raises: ValueError – If an unsupported version is specified.
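A version lookup of this kind is often written as a dictionary dispatch that raises ValueError for unknown keys. The following is a minimal sketch of that pattern with stub classes; `V1Stub`, `V2Stub`, and `get_relations_sketch` are hypothetical, and the library's own implementation and error message may differ.

```python
class V1Stub:
    VERSION = "1.0"


class V2Stub:
    VERSION = "2.0"


# Map each supported version string to its relations class.
_BY_VERSION = {cls.VERSION: cls for cls in (V1Stub, V2Stub)}


def get_relations_sketch(version="2.0"):
    """Return the relations class for a version, or raise ValueError."""
    try:
        return _BY_VERSION[version]
    except KeyError:
        raise ValueError(f"Unsupported UD version: {version!r}") from None


print(get_relations_sketch().VERSION)       # 2.0
print(get_relations_sketch("1.0").VERSION)  # 1.0
```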