decomp.semantics.predpatt.utils

Utility functions for PredPatt including linearization, visualization, and Universal Dependencies schema handling.

Utility functions for PredPatt processing and visualization.

This module provides utility functions for linearizing PredPatt structures into flat representations, visualizing dependency trees, and formatting output for display.

Functions

linearize

Convert PredPatt structures to linearized string format.

linearize_pprint

Pretty-print linearized PredPatt structures.

construct_pred_from_flat

Reconstruct predicate from linearized format.

linear_to_string

Convert linearized structure to string representation.

Classes

LinearizedPPOpts

Options for controlling linearization output format.

class LinearizedPPOpts

Bases: object

Options for linearization of PredPatt structures.

Parameters:
  • recursive (bool, optional) – Whether to recursively linearize embedded predicates (default: True).

  • distinguish_header (bool, optional) – Whether to distinguish predicate/argument heads with special suffix (default: True).

  • only_head (bool, optional) – Whether to include only head tokens instead of full phrases (default: False).

__init__(recursive=True, distinguish_header=True, only_head=False)

construct_pred_from_flat(tokens)

Construct predicates from flat token list.

Parameters:

tokens (list[str]) – List of tokens to parse.

Returns:

List of constructed predicates.

Return type:

list[Predicate]

linear_to_string(tokens)

Convert linearized tokens back to plain text.

Parameters:

tokens (list[str]) – List of linearized tokens.

Returns:

List of plain text tokens.

Return type:

list[str]

linearize(pp, opt=None, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)

Convert PredPatt output to linearized form.

Here we define the way to represent the predpatt output in a linearized form:

  1. Add a label to each token to indicate that it is a predicate or argument token:

    • argument_token:a

    • predicate_token:p

  2. Build the dependency tree among the heads of predicates.

  3. Print the predpatt output in a depth-first manner. At each layer, items are sorted by position. Each layer contains the following items:

    • argument_token

    • predicate_token

    • predicate that depends on token in this layer

  4. The output of each layer is enclosed by a pair of parentheses:

    • Special parentheses “(:a predpatt_output ):a” are used for predicates that are dependents of a clausal predicate.

    • Normal parentheses “( predpatt_output )” are used for predicates that are noun dependents.

Parameters:
  • pp (PredPatt) – The PredPatt instance to linearize.

  • opt (LinearizedPPOpts, optional) – Linearization options (default: LinearizedPPOpts()).

  • ud (type[DependencyRelationsBase], optional) – Universal Dependencies relation definitions (default: dep_v1, i.e. the DependencyRelationsV1 class).

Returns:

Linearized representation of the PredPatt structure.

Return type:

str
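The four steps above can be illustrated with a small, self-contained sketch. It does not use the PredPatt classes: the ToyArg/ToyPred dataclasses and the linearize_toy function are hypothetical stand-ins, the predicate tree of step 2 is built by hand, and it assumes distinguish_header=False (no special head suffix).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyArg:
    tokens: list[tuple[int, str]]  # (position, text) pairs


@dataclass
class ToyPred:
    position: int                  # position of the predicate's root token
    tokens: list[tuple[int, str]]  # predicate tokens as (position, text)
    args: list[ToyArg] = field(default_factory=list)
    # Embedded predicates paired with a flag: True if the child is a
    # dependent of a clausal predicate, False if it depends on a noun.
    children: list[tuple[ToyPred, bool]] = field(default_factory=list)


def linearize_toy(pred: ToyPred, clausal: bool = False) -> str:
    items: list[tuple[int, str]] = []
    # Step 1: label predicate tokens with :p and argument tokens with :a.
    items += [(pos, f"{text}:p") for pos, text in pred.tokens]
    for arg in pred.args:
        items += [(pos, f"{text}:a") for pos, text in arg.tokens]
    # Step 3: recurse depth-first into embedded predicates.
    for child, is_clausal in pred.children:
        items.append((child.position, linearize_toy(child, is_clausal)))
    # Items within one layer are ordered by token position.
    body = " ".join(text for _, text in sorted(items, key=lambda item: item[0]))
    # Step 4: enclose the layer in special or normal parentheses.
    return f"(:a {body} ):a" if clausal else f"( {body} )"


# "John said Mary left": "left" heads a clausal dependent of "said".
left = ToyPred(3, [(3, "left")], [ToyArg([(2, "Mary")])])
said = ToyPred(1, [(1, "said")], [ToyArg([(0, "John")])], [(left, True)])
print(linearize_toy(said))  # ( John:a said:p (:a Mary:a left:p ):a )
```

Treating the embedded predicate as a single item keyed by its root position is what lets the depth-first output interleave nested layers with the surrounding tokens in sentence order.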

linearize_pprint(s)

Pretty-print a linearized string with readable brackets.

Parameters:

s (str) – Linearized string to pretty print.

Returns:

Pretty printed string with brackets.

Return type:

str
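The docs do not spell out which brackets count as "readable". The sketch below assumes one plausible reading, in which the special argument markers (:a and ):a are rewritten as square brackets while normal parentheses are kept. The name pprint_sketch is hypothetical and this is not necessarily what linearize_pprint does.

```python
def pprint_sketch(s: str) -> str:
    # Assumption: only the clausal-dependent markers are rewritten;
    # the normal "(" and ")" markers are left untouched.
    return s.replace("(:a", "[").replace("):a", "]")


print(pprint_sketch("( John:a said:p (:a Mary:a left:p ):a )"))
# ( John:a said:p [ Mary:a left:p ] )
```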

Submodules

decomp.semantics.predpatt.utils.linearization

Linearization utilities for PredPatt.

This module provides functions to convert PredPatt structures into a linearized form that represents the predicate-argument relationships in a flat string format. The linearization preserves hierarchical structure using special markers and can be used for serialization, comparison, or display purposes.

class HasChildren

Bases: HasPosition

Protocol for objects that have a list of child predicates.

children: list[Predicate]

class LinearizedPPOpts

Bases: object

Options for linearization of PredPatt structures.

Parameters:
  • recursive (bool, optional) – Whether to recursively linearize embedded predicates (default: True).

  • distinguish_header (bool, optional) – Whether to distinguish predicate/argument heads with special suffix (default: True).

  • only_head (bool, optional) – Whether to include only head tokens instead of full phrases (default: False).

__init__(recursive=True, distinguish_header=True, only_head=False)

sort_by_position(x)

Sort items by their position attribute.

Parameters:

x (list[Any]) – List of items with position attribute.

Returns:

Sorted list by position.

Return type:

list[Any]

is_dep_of_pred(t, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)

Check if token is a dependent of a predicate.

Parameters:
  • t (Token) – Token to check.

  • ud (type[DependencyRelationsBase], optional) – Universal Dependencies relation definitions (default: dep_v1, i.e. the DependencyRelationsV1 class).

Returns:

True if the token is a dependent of a predicate; None otherwise.

Return type:

bool | None

important_pred_tokens(p, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)

Get important tokens from a predicate (root and negation).

Parameters:
  • p (Predicate) – The predicate to extract tokens from.

  • ud (type[DependencyRelationsBase], optional) – Universal Dependencies relation definitions (default: dep_v1, i.e. the DependencyRelationsV1 class).

Returns:

List of important tokens sorted by position.

Return type:

list[Token]

likely_to_be_pred(pred, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)

Check if a predicate is likely to be a true predicate.

Parameters:
  • pred (Predicate) – The predicate to check.

  • ud (type[DependencyRelationsBase], optional) – Universal Dependencies relation definitions (default: dep_v1, i.e. the DependencyRelationsV1 class).

Returns:

True if the predicate is likely to be a true predicate; None otherwise.

Return type:

bool | None

build_pred_dep(pp)

Build dependencies between predicates.

Parameters:

pp (PredPatt) – The PredPatt instance containing predicates.

Returns:

List of root predicates sorted by position.

Return type:

list[Predicate]
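Step 2 of linearize (building the tree among predicate heads) is the job of build_pred_dep. As a rough sketch, assuming each head token knows its syntactic governor, a predicate can be attached to the nearest predicate head above it, and predicates with no such ancestor become roots. The function build_pred_tree_sketch and its dictionary inputs are hypothetical; the module's function operates on Predicate objects.

```python
from __future__ import annotations


def build_pred_tree_sketch(
    pred_heads: list[int], governor: dict[int, int | None]
) -> tuple[list[int], dict[int, list[int]]]:
    """Return (root heads, children per head), all keyed by token position."""
    heads = set(pred_heads)
    children: dict[int, list[int]] = {h: [] for h in heads}
    roots: list[int] = []
    for head in sorted(heads):  # visit heads in position order
        gov = governor.get(head)
        # Climb the dependency tree until another predicate head is found.
        while gov is not None and gov not in heads:
            gov = governor.get(gov)
        if gov is None:
            roots.append(head)
        else:
            children[gov].append(head)
    return roots, children


# "John said Mary left": token 1 is the syntactic root; "left" (3) depends on "said" (1).
governor = {0: 1, 1: None, 2: 3, 3: 1}
print(build_pred_tree_sketch([1, 3], governor))  # ([1], {1: [3], 3: []})
```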

get_prediates(pp, only_head=False)

Get predicates as formatted strings.

Parameters:
  • pp (PredPatt) – The PredPatt instance.

  • only_head (bool, optional) – Whether to return only head tokens (default: False).

Returns:

List of formatted predicate strings.

Return type:

list[str]

linearize(pp, opt=None, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)

Convert PredPatt output to linearized form.

Here we define the way to represent the predpatt output in a linearized form:

  1. Add a label to each token to indicate that it is a predicate or argument token:

    • argument_token:a

    • predicate_token:p

  2. Build the dependency tree among the heads of predicates.

  3. Print the predpatt output in a depth-first manner. At each layer, items are sorted by position. Each layer contains the following items:

    • argument_token

    • predicate_token

    • predicate that depends on token in this layer

  4. The output of each layer is enclosed by a pair of parentheses:

    • Special parentheses “(:a predpatt_output ):a” are used for predicates that are dependents of a clausal predicate.

    • Normal parentheses “( predpatt_output )” are used for predicates that are noun dependents.

Parameters:
  • pp (PredPatt) – The PredPatt instance to linearize.

  • opt (LinearizedPPOpts, optional) – Linearization options (default: LinearizedPPOpts()).

  • ud (type[DependencyRelationsBase], optional) – Universal Dependencies relation definitions (default: dep_v1, i.e. the DependencyRelationsV1 class).

Returns:

Linearized representation of the PredPatt structure.

Return type:

str

flatten_and_enclose_pred(pred, opt, ud)

Flatten and enclose a predicate with appropriate markers.

Parameters:
  • pred (Predicate) – The predicate to flatten.

  • opt (LinearizedPPOpts) – Linearization options.

  • ud (type[DependencyRelationsBase]) – Universal Dependencies relation definitions.

Returns:

Flattened and enclosed predicate string.

Return type:

str

flatten_pred(pred, opt, ud)

Flatten a predicate into a string representation.

Parameters:
  • pred (Predicate) – The predicate to flatten.

  • opt (LinearizedPPOpts) – Linearization options.

  • ud (type[DependencyRelationsBase]) – Universal Dependencies relation definitions.

Returns:

The flattened string, and whether the predicate is a dependent of a predicate.

Return type:

tuple[str, bool | None]

phrase_and_enclose_arg(arg, opt)

Format and enclose an argument with markers.

Parameters:
  • arg (Argument) – The argument to format.

  • opt (LinearizedPPOpts) – Linearization options.

Returns:

Formatted and enclosed argument string.

Return type:

str

collect_embebdded_tokens(tokens_iter, start_token)

Collect tokens within embedded structure markers.

Parameters:
  • tokens_iter (iterator) – Iterator over (index, token) pairs.

  • start_token (str) – The starting token marker.

Returns:

List of embedded tokens.

Return type:

list[str]
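collect_embebdded_tokens (the identifier is spelled this way in the module) gathers the tokens between an opening marker and its matching close. A minimal sketch, assuming the opening marker has already been consumed from the iterator and that "(" / ")" and "(:a" / "):a" are the only marker pairs; nesting is tracked with a depth counter. The name collect_embedded_tokens_sketch is hypothetical.

```python
PAIRS = {"(": ")", "(:a": "):a"}  # assumed opening -> closing markers


def collect_embedded_tokens_sketch(tokens_iter, start_token):
    """Consume (index, token) pairs up to the marker that closes start_token."""
    expected_close = PAIRS[start_token]
    depth = 0
    embedded = []
    for _, tok in tokens_iter:
        if tok in PAIRS:  # a nested opening marker
            depth += 1
        elif tok in PAIRS.values():  # some closing marker
            if depth == 0:
                if tok != expected_close:
                    raise ValueError(f"expected {expected_close!r}, got {tok!r}")
                return embedded  # the closing marker itself is not collected
            depth -= 1
        embedded.append(tok)
    raise ValueError("unterminated embedded structure")


tokens = "( John:a said:p (:a Mary:a left:p ):a )".split()
it = enumerate(tokens)
next(it)  # consume the outer "("
print(collect_embedded_tokens_sketch(it, "("))
# ['John:a', 'said:p', '(:a', 'Mary:a', 'left:p', '):a']
```

Because the iterator is shared, the caller can keep reading from the same position right after the closing marker.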

linear_to_string(tokens)

Convert linearized tokens back to plain text.

Parameters:

tokens (list[str]) – List of linearized tokens.

Returns:

List of plain text tokens.

Return type:

list[str]
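A sketch of what converting back to plain text involves, assuming tokens carry only the :a or :p labels described for linearize (no head suffix) and that enclosure markers are simply dropped. The names MARKERS and linear_to_string_sketch are hypothetical.

```python
MARKERS = {"(", ")", "(:a", "):a"}  # assumed structure markers


def linear_to_string_sketch(tokens: list[str]) -> list[str]:
    plain = []
    for tok in tokens:
        if tok in MARKERS:
            continue  # structure markers carry no surface text
        # Strip a trailing ":a" / ":p" label; rpartition keeps colons inside the word.
        text, _, label = tok.rpartition(":")
        plain.append(text if label in {"a", "p"} and text else tok)
    return plain


print(linear_to_string_sketch("( John:a said:p (:a Mary:a left:p ):a )".split()))
# ['John', 'said', 'Mary', 'left']
```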

get_something(something_idx, tokens_iter)

Get the SOMETHING argument from the token iterator.

Parameters:
  • something_idx (int) – Index of SOMETHING token.

  • tokens_iter (iterator) – Iterator over (index, token) pairs.

Returns:

The SOMETHING argument.

Return type:

Argument

is_argument_finished(t, current_argument)

Check if argument construction is finished.

Parameters:
  • t (str) – Current token.

  • current_argument (Argument) – Argument being constructed.

Returns:

True if argument is finished.

Return type:

bool

construct_arg_from_flat(tokens_iter)

Construct an argument from flat token iterator.

Parameters:

tokens_iter (iterator) – Iterator over (index, token) pairs.

Returns:

Constructed argument.

Return type:

Argument

construct_pred_from_flat(tokens)

Construct predicates from flat token list.

Parameters:

tokens (list[str]) – List of tokens to parse.

Returns:

List of constructed predicates.

Return type:

list[Predicate]

check_recoverability(tokens)

Check whether linearized tokens can be recovered as predicates.

Parameters:

tokens (list[str]) – List of tokens to check.

Returns:

Whether tokens are recoverable and the token list.

Return type:

tuple[bool, list[str]]

pprint_preds(preds)

Pretty print list of predicates.

Parameters:

preds (list[Predicate]) – List of predicates to format.

Returns:

List of formatted predicate strings.

Return type:

list[str]

argument_names(args)

Give arguments alphanumeric names.

Examples

>>> names = argument_names(range(100))
>>> [names[i] for i in range(0,100,26)]
['?a', '?a1', '?a2', '?a3']
>>> [names[i] for i in range(1,100,26)]
['?b', '?b1', '?b2', '?b3']

Parameters:

args (list[Any]) – List of arguments to name.

Returns:

Mapping from argument to its name.

Return type:

dict[Any, str]

format_pred(pred, indent='\t')

Format a predicate for display.

Parameters:
  • pred (Predicate) – The predicate to format.

  • indent (str, optional) – Indentation string (default: '\t', a tab character).

Returns:

Formatted predicate string.

Return type:

str

pprint(s)

Pretty-print a linearized string with readable brackets.

Parameters:

s (str) – Linearized string to pretty print.

Returns:

Pretty printed string with brackets.

Return type:

str

test(data)

Test linearization functionality.

Parameters:

data (str) – Path to test data file.

Return type:

None

decomp.semantics.predpatt.utils.visualization

Visualization and output formatting utilities for PredPatt.

This module provides functions for pretty-printing PredPatt extractions, including support for colored output, rule tracking, and various output formats.

Functions

no_color

Pass-through function for plain text output without colors.

argument_names

Generate unique names for predicate arguments.

format_predicate

Format a predicate with argument placeholders.

format_predicate_instance

Format a complete predicate-argument structure.

pprint

Pretty-print all extracted predicates from PredPatt.

pprint_ud_parse

Pretty-print dependency parse in tabular format.

Notes

This module supports both colored (via termcolor) and plain text output. Colored output is optional and degrades gracefully if termcolor is not installed.
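One common way to get this graceful degradation is to guard the optional import and fall back to returning the text unchanged. This is a sketch of the pattern, not necessarily how the module implements it: termcolor.colored(text, color, on_color, attrs) is the real termcolor API, while colored_or_plain and no_color_sketch are hypothetical names.

```python
try:
    from termcolor import colored as _termcolor_colored
except ImportError:  # termcolor is an optional dependency
    _termcolor_colored = None


def colored_or_plain(text, color=None, on_color=None, attrs=None):
    """Color text with termcolor when it is installed, else return it unchanged."""
    if _termcolor_colored is None:
        return text
    return _termcolor_colored(text, color, on_color, attrs)


def no_color_sketch(x, _):
    """Pass-through color function: the color argument is ignored."""
    return x


print(no_color_sketch("said", "green"))  # said
```

Giving the plain-text function the same (text, color) shape as the colored one lets formatting code accept either as a `c` callback without branching.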

See also

decomp.semantics.predpatt.extraction.engine

Main extraction engine

decomp.semantics.predpatt.core

Core classes for predicates and arguments

colored(text, color=None, on_color=None, attrs=None)

Wrap termcolor.colored with a consistent signature.

Return type:

str

no_color(x, _)

No-color function for plain text output.

Return type:

str

argument_names(args)

Give arguments alpha-numeric names.

Arguments are named using lowercase letters with optional numeric suffixes when there are more than 26 arguments.

Parameters:

args (list[Argument]) – List of arguments to name

Returns:

Mapping from arguments to their names (e.g., ?a, ?b, ?c, ?a1, ?b1, etc.)

Return type:

dict[Argument, str]

Examples

>>> names = argument_names(list(range(100)))
>>> [names[i] for i in range(0, 100, 26)]
['?a', '?a1', '?a2', '?a3']
>>> [names[i] for i in range(1, 100, 26)]
['?b', '?b1', '?b2', '?b3']
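The naming scheme can be reproduced in a few lines: the letter cycles from a to z and the numeric suffix counts how many times the alphabet has wrapped. This is a sketch consistent with the documented examples, under the hypothetical name argument_names_sketch.

```python
from string import ascii_lowercase


def argument_names_sketch(args):
    names = {}
    for i, arg in enumerate(args):
        letter = ascii_lowercase[i % 26]
        wraps = i // 26  # 0 for the first 26 arguments, so no suffix
        names[arg] = f"?{letter}{wraps if wraps else ''}"
    return names


names = argument_names_sketch(list(range(100)))
print([names[i] for i in range(0, 100, 26)])  # ['?a', '?a1', '?a2', '?a3']
```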

format_predicate(predicate, name, c=<function no_color>)

Format a predicate with its arguments interpolated.

Parameters:
  • predicate (Predicate) – The predicate to format

  • name (dict[Argument, str]) – Mapping from arguments to their names

  • c (Callable[[str, str], str], optional) – Color function for special predicate types

Returns:

Formatted predicate string with argument placeholders

Return type:

str

format_predicate_instance(predicate, track_rule=False, c=<function no_color>, indent='\t')

Format a single predicate instance with its arguments.

Parameters:
  • predicate (Predicate) – The predicate instance to format

  • track_rule (bool, optional) – Whether to include rule tracking information

  • c (Callable[[str, str], str], optional) – Color function for output

  • indent (str, optional) – Indentation string for formatting

Returns:

Formatted predicate instance with arguments listed below

Return type:

str
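To illustrate how the two formatting helpers fit together: format_predicate interpolates argument names into the predicate's tokens, and format_predicate_instance adds one indented line per argument. The ToyArg type, the (position, text) token pairs, the sketch function names, and the exact layout below are assumptions for illustration, not the module's guaranteed output format.

```python
from dataclasses import dataclass
from string import ascii_lowercase


@dataclass(frozen=True)  # frozen makes instances hashable, usable as dict keys
class ToyArg:
    position: int  # position of the argument's head token
    phrase: str    # surface text of the whole argument


def format_predicate_sketch(pred_tokens, args, names):
    """Interpolate argument names into the predicate's (position, text) tokens."""
    items = list(pred_tokens) + [(arg.position, names[arg]) for arg in args]
    return " ".join(text for _, text in sorted(items))


def format_predicate_instance_sketch(pred_tokens, args, indent="\t"):
    """Template on one line, then one line per argument (assumes at most 26 args)."""
    names = {arg: "?" + ascii_lowercase[i] for i, arg in enumerate(args)}
    lines = [indent + format_predicate_sketch(pred_tokens, args, names)]
    lines += [indent * 2 + f"{names[arg]}: {arg.phrase}" for arg in args]
    return "\n".join(lines)


args = [ToyArg(0, "John"), ToyArg(3, "Mary left")]
print(format_predicate_instance_sketch([(1, "said")], args))
```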

pprint(predpatt, color=False, track_rule=False)

Pretty-print extracted predicate-argument tuples.

Parameters:
  • predpatt (PredPatt) – The PredPatt instance containing extracted predicates

  • color (bool, optional) – Whether to use colored output

  • track_rule (bool, optional) – Whether to include rule tracking information

Returns:

Formatted string representation of all predicates

Return type:

str

pprint_ud_parse(parse, color=False, k=1)

Pretty-print the list of dependencies from a UDParse instance.

Parameters:
  • parse (UDParse) – The dependency parse to visualize

  • color (bool, optional) – Whether to use colored output

  • k (int, optional) – Number of columns for output

Returns:

Formatted dependency relations in tabular format

Return type:

str
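The k parameter controls how many dependency relations share a line. A standard-library sketch of such a k-column layout, assuming each relation has already been formatted as a string such as "nsubj(said, John)"; the helper name to_columns and the cell format are hypothetical.

```python
def to_columns(cells: list[str], k: int = 1, sep: str = "  ") -> str:
    """Lay out cells row-major in k left-aligned columns of equal width."""
    if k < 1:
        raise ValueError("k must be at least 1")
    width = max((len(c) for c in cells), default=0)
    rows = [cells[i : i + k] for i in range(0, len(cells), k)]
    return "\n".join(sep.join(c.ljust(width) for c in row).rstrip() for row in rows)


rels = ["nsubj(said, John)", "root(ROOT, said)", "nsubj(left, Mary)", "ccomp(said, left)"]
print(to_columns(rels, k=2))
```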

decomp.semantics.predpatt.utils.ud_schema

Universal Dependencies schema definitions for PredPatt.

This module provides POS tags and dependency relation definitions for both UD v1.0 and v2.0, supporting version-specific processing in the PredPatt semantic extraction system.

The dependency relation classes define core syntactic relations (subject, object, modifiers) and relation sets used by PredPatt for pattern matching during predicate-argument extraction.

Classes

POSTag

Universal Dependencies part-of-speech tags.

DependencyRelationsBase

Abstract base class for dependency relations.

DependencyRelationsV1

UD v1.0 dependency relation definitions.

DependencyRelationsV2

UD v2.0 dependency relation definitions.

Functions

get_dependency_relations

Helper to get relations for a specific version.

Constants

postag

Alias for POSTag class.

dep_v1

Alias of the DependencyRelationsV1 class.

dep_v2

Alias of the DependencyRelationsV2 class.

class POSTag

Bases: object

Universal Dependencies part-of-speech tags.

Reference: http://universaldependencies.org/u/pos/index.html

ADJ: ClassVar[str] = 'ADJ'
ADV: ClassVar[str] = 'ADV'
INTJ: ClassVar[str] = 'INTJ'
NOUN: ClassVar[str] = 'NOUN'
PROPN: ClassVar[str] = 'PROPN'
VERB: ClassVar[str] = 'VERB'
ADP: ClassVar[str] = 'ADP'
AUX: ClassVar[str] = 'AUX'
CCONJ: ClassVar[str] = 'CCONJ'
DET: ClassVar[str] = 'DET'
NUM: ClassVar[str] = 'NUM'
PART: ClassVar[str] = 'PART'
PRON: ClassVar[str] = 'PRON'
SCONJ: ClassVar[str] = 'SCONJ'
PUNCT: ClassVar[str] = 'PUNCT'
SYM: ClassVar[str] = 'SYM'
X: ClassVar[str] = 'X'

class DependencyRelationsBase

Bases: ABC

Base class for Universal Dependencies relation definitions.

VERSION: ClassVar[str]
abstract property nsubj: str

Nominal subject relation.

abstract property nsubjpass: str

Passive nominal subject relation.

abstract property dobj: str

Direct object relation.

abstract property auxpass: str

Passive auxiliary relation.

abstract property subj: set[str]

All subject relations.

abstract property obj: set[str]

All object relations.

class DependencyRelationsV1

Bases: DependencyRelationsBase

Universal Dependencies v1.0 relation definitions.

VERSION: ClassVar[str] = '1.0'
nsubj: ClassVar[str] = 'nsubj'
nsubjpass: ClassVar[str] = 'nsubjpass'
csubj: ClassVar[str] = 'csubj'
csubjpass: ClassVar[str] = 'csubjpass'
dobj: ClassVar[str] = 'dobj'
iobj: ClassVar[str] = 'iobj'
cop: ClassVar[str] = 'cop'
aux: ClassVar[str] = 'aux'
auxpass: ClassVar[str] = 'auxpass'
neg: ClassVar[str] = 'neg'
amod: ClassVar[str] = 'amod'
advmod: ClassVar[str] = 'advmod'
nmod: ClassVar[str] = 'nmod'
nmod_poss: ClassVar[str] = 'nmod:poss'
nmod_tmod: ClassVar[str] = 'nmod:tmod'
nmod_npmod: ClassVar[str] = 'nmod:npmod'
obl: ClassVar[str] = 'nmod'
obl_npmod: ClassVar[str] = 'nmod:npmod'
appos: ClassVar[str] = 'appos'
cc: ClassVar[str] = 'cc'
conj: ClassVar[str] = 'conj'
cc_preconj: ClassVar[str] = 'cc:preconj'
mark: ClassVar[str] = 'mark'
case: ClassVar[str] = 'case'
mwe: ClassVar[str] = 'fixed'
parataxis: ClassVar[str] = 'parataxis'
punct: ClassVar[str] = 'punct'
ccomp: ClassVar[str] = 'ccomp'
xcomp: ClassVar[str] = 'xcomp'
advcl: ClassVar[str] = 'advcl'
acl: ClassVar[str] = 'acl'
aclrelcl: ClassVar[str] = 'acl:relcl'
dep: ClassVar[str] = 'dep'
SUBJ: ClassVar[set[str]] = {'csubj', 'csubjpass', 'nsubj', 'nsubjpass'}
OBJ: ClassVar[set[str]] = {'dobj', 'iobj'}
NMODS: ClassVar[set[str]] = {'nmod', 'nmod:npmod', 'nmod:tmod'}
ADJ_LIKE_MODS: ClassVar[set[str]] = {'acl', 'acl:relcl', 'amod', 'appos'}
ARG_LIKE: ClassVar[set[str]] = {'csubj', 'csubjpass', 'dobj', 'iobj', 'nmod', 'nmod:npmod', 'nmod:tmod', 'nsubj'}
TRIVIALS: ClassVar[set[str]] = {'cc', 'mark', 'punct'}
PRED_DEPS_TO_DROP: ClassVar[set[str]] = {'acl', 'acl:relcl', 'advcl', 'appos', 'ccomp', 'csubj', 'dep', 'nmod:tmod', 'parataxis'}
SPECIAL_ARG_DEPS_TO_DROP: ClassVar[set[str]] = {'advcl', 'aux', 'auxpass', 'ccomp', 'cop', 'csubj', 'csubjpass', 'dobj', 'fixed', 'iobj', 'mark', 'neg', 'nsubj', 'parataxis'}
HARD_TO_FIND_ARGS: ClassVar[set[str]] = {'acl', 'acl:relcl', 'advcl', 'amod', 'conj', 'dep'}
property subj: set[str]

All subject relations.

property obj: set[str]

All object relations.

class DependencyRelationsV2

Bases: DependencyRelationsBase

Universal Dependencies v2.0 relation definitions.

VERSION: ClassVar[str] = '2.0'
nsubj: ClassVar[str] = 'nsubj'
nsubjpass: ClassVar[str] = 'nsubj:pass'
csubj: ClassVar[str] = 'csubj'
csubjpass: ClassVar[str] = 'csubj:pass'
dobj: ClassVar[str] = 'obj'
iobj: ClassVar[str] = 'iobj'
aux: ClassVar[str] = 'aux'
auxpass: ClassVar[str] = 'aux:pass'
neg: ClassVar[str] = 'neg'
cop: ClassVar[str] = 'cop'
amod: ClassVar[str] = 'amod'
advmod: ClassVar[str] = 'advmod'
nmod: ClassVar[str] = 'nmod'
nmod_poss: ClassVar[str] = 'nmod:poss'
nmod_tmod: ClassVar[str] = 'nmod:tmod'
nmod_npmod: ClassVar[str] = 'nmod:npmod'
obl: ClassVar[str] = 'obl'
obl_npmod: ClassVar[str] = 'obl:npmod'
appos: ClassVar[str] = 'appos'
cc: ClassVar[str] = 'cc'
conj: ClassVar[str] = 'conj'
cc_preconj: ClassVar[str] = 'cc:preconj'
mark: ClassVar[str] = 'mark'
case: ClassVar[str] = 'case'
mwe: ClassVar[str] = 'fixed'
parataxis: ClassVar[str] = 'parataxis'
punct: ClassVar[str] = 'punct'
ccomp: ClassVar[str] = 'ccomp'
xcomp: ClassVar[str] = 'xcomp'
advcl: ClassVar[str] = 'advcl'
acl: ClassVar[str] = 'acl'
aclrelcl: ClassVar[str] = 'acl:relcl'
dep: ClassVar[str] = 'dep'
SUBJ: ClassVar[set[str]] = {'csubj', 'csubj:pass', 'nsubj', 'nsubj:pass'}
OBJ: ClassVar[set[str]] = {'iobj', 'obj'}
NMODS: ClassVar[set[str]] = {'nmod', 'nmod:npmod', 'nmod:tmod', 'obl'}
ADJ_LIKE_MODS: ClassVar[set[str]] = {'acl', 'acl:relcl', 'amod', 'appos'}
ARG_LIKE: ClassVar[set[str]] = {'csubj', 'csubj:pass', 'iobj', 'nmod', 'nmod:npmod', 'nmod:tmod', 'nsubj', 'obj', 'obl'}
TRIVIALS: ClassVar[set[str]] = {'cc', 'mark', 'punct'}
PRED_DEPS_TO_DROP: ClassVar[set[str]] = {'acl', 'acl:relcl', 'advcl', 'appos', 'ccomp', 'csubj', 'dep', 'nmod:tmod', 'parataxis'}
SPECIAL_ARG_DEPS_TO_DROP: ClassVar[set[str]] = {'advcl', 'aux', 'aux:pass', 'ccomp', 'cop', 'csubj', 'csubj:pass', 'fixed', 'iobj', 'mark', 'neg', 'nsubj', 'obj', 'parataxis'}
HARD_TO_FIND_ARGS: ClassVar[set[str]] = {'acl', 'acl:relcl', 'advcl', 'amod', 'conj', 'dep'}
property subj: set[str]

All subject relations.

property obj: set[str]

All object relations.
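Because both classes expose the same attribute names, code can stay version-agnostic by reading relation labels through the class rather than hard-coding strings. The sketch below copies a few values from the two listings above into stand-in classes (RelsV1, RelsV2, and is_subject are hypothetical) to show the idea.

```python
class RelsV1:  # values copied from DependencyRelationsV1
    dobj = "dobj"
    nsubjpass = "nsubjpass"
    SUBJ = {"csubj", "csubjpass", "nsubj", "nsubjpass"}


class RelsV2:  # values copied from DependencyRelationsV2
    dobj = "obj"
    nsubjpass = "nsubj:pass"
    SUBJ = {"csubj", "csubj:pass", "nsubj", "nsubj:pass"}


def is_subject(rel: str, ud) -> bool:
    # Membership is checked against the version's own label set.
    return rel in ud.SUBJ


for ud in (RelsV1, RelsV2):
    print(ud.__name__, ud.dobj, is_subject(ud.nsubjpass, ud))
# RelsV1 dobj True
# RelsV2 obj True
```

Hard-coding "nsubjpass" instead would silently miss every passive subject in a UD v2 parse, which is the kind of bug the shared attribute names avoid.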

postag

alias of POSTag

dep_v1

alias of DependencyRelationsV1

dep_v2

alias of DependencyRelationsV2

get_dependency_relations(version='2.0')

Get dependency relations for a specific UD version.

Parameters:

version (str, optional) – The UD version (“1.0” or “2.0”), by default “2.0”

Returns:

The dependency relations class for the specified version

Return type:

type[DependencyRelationsBase]

Raises:

ValueError – If an unsupported version is specified