decomp.semantics.predpatt.utils¶
Utility functions for PredPatt including linearization, visualization, and Universal Dependencies schema handling.
Utility functions for PredPatt processing and visualization.
This module provides utility functions for linearizing PredPatt structures into flat representations, visualizing dependency trees, and formatting output for display.
Functions¶
- linearize
Convert PredPatt structures to linearized string format.
- linearize_pprint
Pretty-print linearized PredPatt structures.
- construct_pred_from_flat
Reconstruct predicate from linearized format.
- linear_to_string
Convert linearized structure to string representation.
Classes¶
- LinearizedPPOpts
Options for controlling linearization output format.
- class LinearizedPPOpts[source]¶
Bases: object
Options for linearization of PredPatt structures.
- Parameters:
recursive (bool, optional) – Whether to recursively linearize embedded predicates (default: True).
distinguish_header (bool, optional) – Whether to distinguish predicate/argument heads with special suffix (default: True).
only_head (bool, optional) – Whether to include only head tokens instead of full phrases (default: False).
- linearize(pp, opt=None, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)[source]¶
Convert PredPatt output to linearized form.
Here we define the way to represent the predpatt output in a linearized form:
Add a label to each token to indicate that it is a predicate or argument token:
argument_token:a
predicate_token:p
Build the dependency tree among the heads of predicates.
Print the predpatt output in a depth-first manner. At each layer, items are sorted by position. A layer can contain the following items:
argument_token
predicate_token
predicate that depends on token in this layer
The output of each layer is enclosed by a pair of parentheses:
Special parentheses “(:a predpatt_output ):a” are used for predicates that are dependents of a clausal predicate.
Normal parentheses “( predpatt_output )” are used for predicates that are noun dependents.
- Parameters:
pp (PredPatt) – The PredPatt instance to linearize.
opt (LinearizedPPOpts, optional) – Linearization options (default: LinearizedPPOpts()).
ud (module, optional) – Universal Dependencies module (default: dep_v1).
- Returns:
Linearized representation of the PredPatt structure.
- Return type:
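The labeling and bracketing scheme described above can be illustrated with a small, self-contained toy. This is only a sketch under stated assumptions: `Node` and `linearize_sketch` are hypothetical stand-ins, not part of decomp, and the exact markers and spacing emitted by the real `linearize` may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a predicate and its embedded predicates."""

    # Each token is (position, text, label), where label is "a" or "p".
    tokens: list
    # Embedded predicates that depend on a token in this layer.
    children: list = field(default_factory=list)
    # True when this predicate is a dependent of a clausal predicate.
    clausal: bool = False

    @property
    def position(self):
        # A layer is ordered by the position of its first token.
        return min(pos for pos, _, _ in self.tokens)


def linearize_sketch(node):
    """Depth-first linearization with items sorted by position."""
    open_, close = ("(:a", "):a") if node.clausal else ("(", ")")
    items = [(pos, f"{text}:{label}") for pos, text, label in node.tokens]
    items += [(child.position, linearize_sketch(child)) for child in node.children]
    items.sort(key=lambda item: item[0])
    return " ".join([open_, *(text for _, text in items), close])


embedded = Node(tokens=[(2, "he", "a"), (3, "left", "p")], clausal=True)
root = Node(tokens=[(0, "Chris", "a"), (1, "said", "p")], children=[embedded])
result = linearize_sketch(root)
print(result)  # ( Chris:a said:p (:a he:a left:p ):a )
```

The toy keeps only the three ideas from the description: a role suffix on each token, position-sorted items per layer, and a different bracket pair for predicates that depend on a clausal predicate.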
- linearize_pprint(s)¶
Pretty print linearized string with readable brackets.
Submodules¶
decomp.semantics.predpatt.utils.linearization¶
Linearization utilities for PredPatt.
This module provides functions to convert PredPatt structures into a linearized form that represents the predicate-argument relationships in a flat string format. The linearization preserves hierarchical structure using special markers and can be used for serialization, comparison, or display purposes.
- class HasChildren[source]¶
Bases: HasPosition
Protocol for objects that can have a children list.
- class LinearizedPPOpts[source]¶
Bases: object
Options for linearization of PredPatt structures.
- Parameters:
recursive (bool, optional) – Whether to recursively linearize embedded predicates (default: True).
distinguish_header (bool, optional) – Whether to distinguish predicate/argument heads with special suffix (default: True).
only_head (bool, optional) – Whether to include only head tokens instead of full phrases (default: False).
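As a rough illustration of the documented defaults, the options can be pictured as a plain record with three boolean fields. The names `OptsSketch` and `resolve_opts` below are hypothetical; the real `LinearizedPPOpts` constructor may be defined differently, and the fallback only mirrors the documented behavior that `opt=None` means default options.

```python
from dataclasses import dataclass


@dataclass
class OptsSketch:
    """Hypothetical mirror of the documented option defaults."""

    recursive: bool = True           # linearize embedded predicates too
    distinguish_header: bool = True  # mark predicate/argument heads
    only_head: bool = False          # keep full phrases, not just heads


def resolve_opts(opt=None):
    """Fall back to default options when none are supplied."""
    return opt if opt is not None else OptsSketch()


print(resolve_opts())
print(resolve_opts(OptsSketch(only_head=True)))
```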
- is_dep_of_pred(t, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)[source]¶
Check if token is a dependent of a predicate.
- important_pred_tokens(p, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)[source]¶
Get important tokens from a predicate (root and negation).
- likely_to_be_pred(pred, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)[source]¶
Check if a predicate is likely to be a true predicate.
- linearize(pp, opt=None, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)[source]¶
Convert PredPatt output to linearized form.
Here we define the way to represent the predpatt output in a linearized form:
Add a label to each token to indicate that it is a predicate or argument token:
argument_token:a
predicate_token:p
Build the dependency tree among the heads of predicates.
Print the predpatt output in a depth-first manner. At each layer, items are sorted by position. A layer can contain the following items:
argument_token
predicate_token
predicate that depends on token in this layer
The output of each layer is enclosed by a pair of parentheses:
Special parentheses “(:a predpatt_output ):a” are used for predicates that are dependents of a clausal predicate.
Normal parentheses “( predpatt_output )” are used for predicates that are noun dependents.
- Parameters:
pp (PredPatt) – The PredPatt instance to linearize.
opt (LinearizedPPOpts, optional) – Linearization options (default: LinearizedPPOpts()).
ud (module, optional) – Universal Dependencies module (default: dep_v1).
- Returns:
Linearized representation of the PredPatt structure.
- Return type:
- flatten_and_enclose_pred(pred, opt, ud)[source]¶
Flatten and enclose a predicate with appropriate markers.
- Parameters:
pred (Predicate) – The predicate to flatten.
opt (LinearizedPPOpts) – Linearization options.
ud (module) – Universal Dependencies module.
- Returns:
Flattened and enclosed predicate string.
- Return type:
- flatten_pred(pred, opt, ud)[source]¶
Flatten a predicate into a string representation.
- Parameters:
pred (Predicate) – The predicate to flatten.
opt (LinearizedPPOpts) – Linearization options.
ud (module) – Universal Dependencies module.
- Returns:
Flattened string and a flag indicating whether it is a dependent of a predicate.
- Return type:
- phrase_and_enclose_arg(arg, opt)[source]¶
Format and enclose an argument with markers.
- Parameters:
arg (Argument) – The argument to format.
opt (LinearizedPPOpts) – Linearization options.
- Returns:
Formatted and enclosed argument string.
- Return type:
- collect_embebdded_tokens(tokens_iter, start_token)[source]¶
Collect tokens within embedded structure markers.
- construct_arg_from_flat(tokens_iter)[source]¶
Construct an argument from flat token iterator.
- Parameters:
tokens_iter (iterator) – Iterator over (index, token) pairs.
- Returns:
Constructed argument.
- Return type:
- argument_names(args)[source]¶
Give arguments alpha-numeric names.
Examples
>>> names = argument_names(range(100))
>>> [names[i] for i in range(0,100,26)]
['?a', '?a1', '?a2', '?a3']
>>> [names[i] for i in range(1,100,26)]
['?b', '?b1', '?b2', '?b3']
decomp.semantics.predpatt.utils.visualization¶
Visualization and output formatting utilities for PredPatt.
This module provides functions for pretty-printing PredPatt extractions, including support for colored output, rule tracking, and various output formats.
Functions¶
- no_color
Pass-through function for plain text output without colors.
- argument_names
Generate unique names for predicate arguments.
- format_predicate
Format a predicate with argument placeholders.
- format_predicate_instance
Format a complete predicate-argument structure.
- pprint
Pretty-print all extracted predicates from PredPatt.
- pprint_ud_parse
Pretty-print dependency parse in tabular format.
Notes
This module supports both colored (via termcolor) and plain text output. Colored output is optional and degrades gracefully if termcolor is not installed.
See also
decomp.semantics.predpatt.extraction.engine
Main extraction engine
decomp.semantics.predpatt.core
Core classes for predicates and arguments
- colored(text, color=None, on_color=None, attrs=None)[source]¶
Wrap termcolor.colored with consistent signature.
- Return type:
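The graceful degradation mentioned in the Notes can be sketched with an optional import. This is an assumption about the general pattern, not the module's actual code: `plain` and `colored_sketch` are hypothetical names, and only `termcolor.colored` is a real library call.

```python
try:
    # termcolor is optional; fall back to plain text when it is missing.
    from termcolor import colored as _termcolor_colored
except ImportError:
    _termcolor_colored = None


def plain(text, *args, **kwargs):
    """Pass-through formatter: ignore styling arguments, return text as-is."""
    return text


def colored_sketch(text, color=None, on_color=None, attrs=None):
    """Color text when termcolor is available, otherwise return it unchanged."""
    if _termcolor_colored is None:
        return text
    return _termcolor_colored(text, color, on_color, attrs)


print(plain("?a is a predicate", "red"))
print(colored_sketch("?a is a predicate", "red"))
```

Because both functions accept the same leading arguments, either one can be passed wherever a formatter callback such as the `c` parameter is expected.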
- argument_names(args)[source]¶
Give arguments alpha-numeric names.
Arguments are named with lowercase letters; a numeric suffix is appended once there are more than 26 arguments.
- Parameters:
- Returns:
Mapping from arguments to their names (e.g., ?a, ?b, ?c, ?a1, ?b1, etc.)
- Return type:
Examples
>>> names = argument_names(list(range(100)))
>>> [names[i] for i in range(0, 100, 26)]
['?a', '?a1', '?a2', '?a3']
>>> [names[i] for i in range(1, 100, 26)]
['?b', '?b1', '?b2', '?b3']
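The naming scheme shown in the doctest can be reproduced with a few lines of arithmetic. The function below is a minimal sketch that matches the documented output; `argument_names_sketch` is a hypothetical name and the library's own implementation may be written differently.

```python
import string


def argument_names_sketch(args):
    """Map each argument to '?a', '?b', ..., '?z', '?a1', '?b1', ..."""
    names = {}
    for index, arg in enumerate(args):
        letter = string.ascii_lowercase[index % 26]
        cycle = index // 26  # 0 for the first 26 arguments, then 1, 2, ...
        names[arg] = f"?{letter}{cycle if cycle else ''}"
    return names


names = argument_names_sketch(list(range(100)))
print([names[i] for i in range(0, 100, 26)])  # ['?a', '?a1', '?a2', '?a3']
print([names[i] for i in range(1, 100, 26)])  # ['?b', '?b1', '?b2', '?b3']
```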
- format_predicate(predicate, name, c=<function no_color>)[source]¶
Format a predicate with its arguments interpolated.
- format_predicate_instance(predicate, track_rule=False, c=<function no_color>, indent='\\t')[source]¶
Format a single predicate instance with its arguments.
- Parameters:
- Returns:
Formatted predicate instance with arguments listed below
- Return type:
- pprint(predpatt, color=False, track_rule=False)[source]¶
Pretty-print extracted predicate-argument tuples.
decomp.semantics.predpatt.utils.ud_schema¶
Universal Dependencies schema definitions for PredPatt.
This module provides POS tags and dependency relation definitions for both UD v1.0 and v2.0, supporting version-specific processing in the PredPatt semantic extraction system.
The dependency relation classes define core syntactic relations (subject, object, modifiers) and relation sets used by PredPatt for pattern matching during predicate-argument extraction.
Classes¶
- POSTag
Universal Dependencies part-of-speech tags.
- DependencyRelationsBase
Abstract base class for dependency relations.
- DependencyRelationsV1
UD v1.0 dependency relation definitions.
- DependencyRelationsV2
UD v2.0 dependency relation definitions.
Functions¶
- get_dependency_relations
Helper to get relations for a specific version.
Constants¶
- postag
Alias for POSTag class.
- dep_v1
Alias for the DependencyRelationsV1 class.
- dep_v2
Alias for the DependencyRelationsV2 class.
- class POSTag[source]¶
Bases: object
Universal Dependencies part-of-speech tags.
Reference: http://universaldependencies.org/u/pos/index.html
- class DependencyRelationsBase[source]¶
Bases: ABC
Base class for Universal Dependencies relation definitions.
- class DependencyRelationsV1[source]¶
Bases: DependencyRelationsBase
Universal Dependencies v1.0 relation definitions.
- ARG_LIKE: ClassVar[set[str]] = {'csubj', 'csubjpass', 'dobj', 'iobj', 'nmod', 'nmod:npmod', 'nmod:tmod', 'nsubj'}¶
- PRED_DEPS_TO_DROP: ClassVar[set[str]] = {'acl', 'acl:relcl', 'advcl', 'appos', 'ccomp', 'csubj', 'dep', 'nmod:tmod', 'parataxis'}¶
- class DependencyRelationsV2[source]¶
Bases: DependencyRelationsBase
Universal Dependencies v2.0 relation definitions.
- ARG_LIKE: ClassVar[set[str]] = {'csubj', 'csubj:pass', 'iobj', 'nmod', 'nmod:npmod', 'nmod:tmod', 'nsubj', 'obj', 'obl'}¶
- PRED_DEPS_TO_DROP: ClassVar[set[str]] = {'acl', 'acl:relcl', 'advcl', 'appos', 'ccomp', 'csubj', 'dep', 'nmod:tmod', 'parataxis'}¶
- dep_v1¶
alias of
DependencyRelationsV1
- dep_v2¶
alias of
DependencyRelationsV2
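To see how the two schema versions differ, the ARG_LIKE sets listed on this page can be compared directly. The snippet copies the values shown above into plain sets (the variable names are illustrative) rather than importing the library, so it runs standalone.

```python
# Values copied from the ARG_LIKE attributes documented on this page.
ARG_LIKE_V1 = {"csubj", "csubjpass", "dobj", "iobj", "nmod",
               "nmod:npmod", "nmod:tmod", "nsubj"}
ARG_LIKE_V2 = {"csubj", "csubj:pass", "iobj", "nmod", "nmod:npmod",
               "nmod:tmod", "nsubj", "obj", "obl"}

only_v1 = sorted(ARG_LIKE_V1 - ARG_LIKE_V2)
only_v2 = sorted(ARG_LIKE_V2 - ARG_LIKE_V1)
shared = sorted(ARG_LIKE_V1 & ARG_LIKE_V2)

print(only_v1)  # ['csubjpass', 'dobj']
print(only_v2)  # ['csubj:pass', 'obj', 'obl']
print(shared)   # ['csubj', 'iobj', 'nmod', 'nmod:npmod', 'nmod:tmod', 'nsubj']
```

The PRED_DEPS_TO_DROP sets, by contrast, are identical in both versions as listed here.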
- get_dependency_relations(version='2.0')[source]¶
Get dependency relations for a specific UD version.
- Parameters:
version (str, optional) – The UD version, either “1.0” or “2.0” (default: “2.0”).
- Returns:
The dependency relations class for the specified version
- Return type:
- Raises:
ValueError – If an unsupported version is specified
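The documented behavior (return a class per version, raise ValueError otherwise) amounts to a small lookup. The following is a sketch with hypothetical stand-in classes, not the library's source; the error message wording in particular is an assumption.

```python
class RelationsV1Sketch:
    """Hypothetical stand-in for DependencyRelationsV1."""
    VERSION = "1.0"


class RelationsV2Sketch:
    """Hypothetical stand-in for DependencyRelationsV2."""
    VERSION = "2.0"


_BY_VERSION = {"1.0": RelationsV1Sketch, "2.0": RelationsV2Sketch}


def get_relations_sketch(version="2.0"):
    """Return the relations class for a UD version, or raise ValueError."""
    try:
        return _BY_VERSION[version]
    except KeyError:
        raise ValueError(f"Unsupported UD version: {version!r}") from None


print(get_relations_sketch().VERSION)       # 2.0
print(get_relations_sketch("1.0").VERSION)  # 1.0
```

Note that the function returns the class itself rather than an instance, which matches the description of dep_v1 and dep_v2 as aliases of the classes.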