decomp.semantics.predpatt.core

Core PredPatt classes, implemented in modern Python.

This module contains the core data structures that PredPatt uses to represent tokens, predicates, and arguments in dependency parses.

class Argument[source]

Bases: object

Represents an argument of a predicate.

Arguments are extracted from dependency parse trees and represent the participants in predicate-argument structures.

Parameters:
  • root (Token) – The root token of the argument.

  • ud (module, optional) – The Universal Dependencies module to use (default: dep_v1).

  • rules (list, optional) – List of rules that led to this argument’s extraction.

root

The root token of the argument.

Type:

Token

rules

List of extraction rules applied.

Type:

list

position

Position of the root token (copied from root.position).

Type:

int

ud

The UD version module being used.

Type:

module

tokens

List of tokens forming the argument phrase.

Type:

list[Token]

share

Whether this is a shared/borrowed argument (default: False).

Type:

bool

__init__(root, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>, rules=None, share=False)[source]

Initialize an Argument.

Parameters:
  • root (Token) – The root token of the argument.

  • ud (module, optional) – The Universal Dependencies module to use.

  • rules (list, optional) – List of rules that led to this argument’s extraction. WARNING: Default is mutable list - modifying one argument’s rules may affect others if default is used. This behavior is intentional to match the original PredPatt implementation.

__repr__()[source]

Return string representation.

Returns:

String in format ‘Argument(root)’.

Return type:

str

coords()[source]

Get coordinated arguments including this one.

Expands coordinated structures by finding conjunct dependents of the root token. Does not expand ccomp or csubj arguments.

Returns:

List of arguments including self and any conjuncts, sorted by position.

Return type:

list[Argument]

copy()[source]

Create a copy of this argument.

Creates a new Argument with the same root and copied lists for rules and tokens. The share flag is not copied.

Returns:

A new argument with copied rules and tokens lists.

Return type:

Argument

is_reference()[source]

Check if this is a reference (shared) argument.

Returns:

True if share attribute is True.

Return type:

bool

isclausal()[source]

Check if this is a clausal argument.

Clausal arguments are those with governor relations indicating embedded clauses: ccomp, csubj, csubjpass, or xcomp.

Returns:

True if the argument root has a clausal governor relation.

Return type:

bool
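As a rough illustration of the check described above, the sketch below tests a governor relation against the four clausal relations. The relation names are written as plain strings, and `is_clausal_relation` is a hypothetical helper; the real method presumably reads the relation constants from the argument's `ud` schema rather than from literals.

```python
# Relation names come from the description above. Using plain strings is an
# assumption of this sketch; the library defines them in its UD schema.
CLAUSAL_RELATIONS = frozenset({"ccomp", "csubj", "csubjpass", "xcomp"})


def is_clausal_relation(gov_rel):
    """Return True if gov_rel names one of the embedded-clause relations."""
    return gov_rel in CLAUSAL_RELATIONS


print(is_clausal_relation("ccomp"))  # True
print(is_clausal_relation("nsubj"))  # False
print(is_clausal_relation(None))     # False (no governor relation set)
```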

phrase()[source]

Get the argument phrase.

Joins the text of all tokens in the argument with spaces. The tokens are joined in the order they appear in the tokens list, which may be sorted by position during phrase extraction.

Returns:

Space-joined text of all tokens in the argument.

Return type:

str

reference()[source]

Create a reference (shared) copy of this argument.

Creates a new Argument marked as shared (share=True) with the same tokens list (not copied). Used for borrowed arguments.

Returns:

A new argument with share=True and shared tokens list.

Return type:

Argument
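The difference between copy() and reference() comes down to list aliasing. The sketch below uses a hypothetical stand-in class, `ArgumentSketch`, with plain strings in place of Token objects. How reference() treats the rules list is not stated above, so passing it through unchanged is an assumption of this sketch.

```python
class ArgumentSketch:
    """Hypothetical stand-in that mirrors the documented copy/reference semantics."""

    def __init__(self, root, rules, share=False):
        self.root = root
        self.rules = rules
        self.tokens = []
        self.share = share

    def copy(self):
        # New object with new lists; the share flag is not carried over.
        new = ArgumentSketch(self.root, list(self.rules))
        new.tokens = list(self.tokens)
        return new

    def reference(self):
        # New object marked as shared; tokens is the very same list object.
        # Passing rules through unchanged is an assumption of this sketch.
        new = ArgumentSketch(self.root, self.rules, share=True)
        new.tokens = self.tokens
        return new


original = ArgumentSketch(root="dog", rules=["g1"])
original.tokens.extend(["the", "dog"])

copied = original.copy()
borrowed = original.reference()

# Mutating the original's tokens is visible through the reference only.
original.tokens.append("barked")
print(copied.tokens)                 # ['the', 'dog']
print(borrowed.tokens)               # ['the', 'dog', 'barked']
print(copied.share, borrowed.share)  # False True
```

A borrowed argument therefore stays in sync with the argument it was taken from, while a copy can be edited freely.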

class PredPattOpts[source]

Bases: object

Configuration options for PredPatt extraction.

Controls various aspects of predicate-argument extraction including simplification, resolution of special constructions, and formatting.

Parameters:
  • simple (bool, optional) – Extract simple predicates (exclude aux and advmod). Default: False.

  • cut (bool, optional) – Cut: treat xcomp as independent predicate. Default: False.

  • resolve_relcl (bool, optional) – Resolve relative clause modifiers. Default: False.

  • resolve_appos (bool, optional) – Resolve appositives. Default: False.

  • resolve_amod (bool, optional) – Resolve adjectival modifiers. Default: False.

  • resolve_conj (bool, optional) – Resolve conjunctions. Default: False.

  • resolve_poss (bool, optional) – Resolve possessives. Default: False.

  • borrow_arg_for_relcl (bool, optional) – Borrow arguments for relative clauses. Default: True.

  • big_args (bool, optional) – Use big argument extraction (include all subtree tokens). Default: False.

  • strip (bool, optional) – Strip leading/trailing punctuation from phrases. Default: True.

  • ud (str, optional) – Universal Dependencies version (“1.0” or “2.0”). Default: “1.0”.

simple

Extract simple predicates (exclude aux and advmod).

Type:

bool

cut

Cut: treat xcomp as independent predicate.

Type:

bool

resolve_relcl

Resolve relative clause modifiers.

Type:

bool

resolve_appos

Resolve appositives.

Type:

bool

resolve_amod

Resolve adjectival modifiers.

Type:

bool

resolve_conj

Resolve conjunctions.

Type:

bool

resolve_poss

Resolve possessives.

Type:

bool

borrow_arg_for_relcl

Borrow arguments for relative clauses.

Type:

bool

big_args

Use big argument extraction.

Type:

bool

strip

Strip leading/trailing punctuation.

Type:

bool

ud

Universal Dependencies version string.

Type:

str

__init__(simple=False, cut=False, resolve_relcl=False, resolve_appos=False, resolve_amod=False, resolve_conj=False, resolve_poss=False, borrow_arg_for_relcl=True, big_args=False, strip=True, ud='1.0')[source]

Initialize PredPattOpts with configuration values.

Parameters are assigned in the exact same order as the original to ensure identical behavior and initialization.
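To make the defaults easier to see at a glance, the following sketch mirrors the documented signature with a dataclass. `PredPattOptsSketch` is a hypothetical stand-in written for illustration; it does not reproduce anything the real class may do beyond storing these values.

```python
from dataclasses import asdict, dataclass


@dataclass
class PredPattOptsSketch:
    """Hypothetical stand-in with the documented fields and defaults."""

    simple: bool = False
    cut: bool = False
    resolve_relcl: bool = False
    resolve_appos: bool = False
    resolve_amod: bool = False
    resolve_conj: bool = False
    resolve_poss: bool = False
    borrow_arg_for_relcl: bool = True
    big_args: bool = False
    strip: bool = True
    ud: str = "1.0"


# Turn on relative-clause and conjunction resolution for a UD 2.0 parse.
opts = PredPattOptsSketch(resolve_relcl=True, resolve_conj=True, ud="2.0")

# List the boolean switches that are currently on.
enabled = sorted(name for name, value in asdict(opts).items() if value is True)
print(enabled)
# ['borrow_arg_for_relcl', 'resolve_conj', 'resolve_relcl', 'strip']
```

Going by the signature above, the equivalent call on the real class would be `PredPattOpts(resolve_relcl=True, resolve_conj=True, ud="2.0")`.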

class Predicate[source]

Bases: object

Represents a predicate extracted from a dependency parse.

A predicate consists of a root token and potentially multiple tokens that form the predicate phrase, along with its arguments.

Parameters:
  • root (Token) – The root token of the predicate.

  • ud (module, optional) – The Universal Dependencies module to use (default: dep_v1).

  • rules (list, optional) – List of rules that led to this predicate’s extraction.

  • type_ (PredicateType, optional) – Type of predicate (PredicateType.NORMAL, POSS, APPOS, or AMOD). Default: PredicateType.NORMAL.

root

The root token of the predicate.

Type:

Token

rules

List of extraction rules applied.

Type:

list

position

Position of the root token.

Type:

int

ud

The UD version module being used.

Type:

module

arguments

List of arguments for this predicate.

Type:

list[Argument]

type

Type of predicate.

Type:

PredicateType

tokens

List of tokens forming the predicate phrase.

Type:

list[Token]

__init__(root, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>, rules=None, type_=PredicateType.NORMAL)[source]

Initialize a Predicate.

__repr__()[source]

Return string representation.

Return type:

str

copy()[source]

Copy only the complex predicate itself. The arguments are not copied; the original and the copy share the same argument objects.

Returns:

A new predicate with shared argument references and copied tokens.

Return type:

Predicate

format(track_rule=False, c=<function no_color>, indent='\t')[source]

Format predicate with arguments for display.

Parameters:
  • track_rule (bool, optional) – Whether to include rule tracking information.

  • c (callable, optional) – Color function for formatting.

  • indent (str, optional) – Indentation string to use.

Returns:

Formatted predicate with arguments.

Return type:

str

has_borrowed_arg()[source]

Check if any argument is borrowed (shared).

Returns:

True if any argument has share=True and has rules.

Return type:

bool

has_obj()[source]

Check if predicate has an object argument.

Returns:

True if any argument is an object.

Return type:

bool

has_subj()[source]

Check if predicate has a subject argument.

Returns:

True if any argument is a subject.

Return type:

bool

has_token(token)[source]

Check if predicate contains a token at given position.

Parameters:

token (Token) – Token to check (only position is compared).

Returns:

True if any token in predicate has same position.

Return type:

bool
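Because only the position is compared, two distinct token objects match whenever they sit at the same index. A minimal sketch of that test follows; `has_token_at` is a hypothetical helper and `SimpleNamespace` objects stand in for Token instances.

```python
from types import SimpleNamespace


def has_token_at(tokens, token):
    """Position-only membership test, as described for has_token."""
    return any(t.position == token.position for t in tokens)


predicate_tokens = [
    SimpleNamespace(position=2, text="gave"),
    SimpleNamespace(position=3, text="up"),
]

# A different object with different text still matches on position alone.
probe = SimpleNamespace(position=3, text="UP")
print(has_token_at(predicate_tokens, probe))                        # True
print(has_token_at(predicate_tokens, SimpleNamespace(position=7)))  # False
```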

identifier()[source]

Generate unique identifier for this predicate.

Returns:

Identifier in format ‘pred.{type}.{position}.{arg_positions}’.

Return type:

str

is_broken()[source]

Check if predicate is malformed.

Returns:

True if broken, None if valid.

Return type:

bool | None

obj()[source]

Get the object argument if present.

Returns:

The first object argument, or None if no object.

Return type:

Argument | None

phrase()[source]

Get the predicate phrase with argument placeholders.

Returns:

The formatted predicate phrase.

Return type:

str

share_subj(other)[source]

Check if two predicates share the same subject.

Parameters:

other (Predicate) – The other predicate to compare with.

Returns:

True if both have subjects at same position, None if either lacks a subject.

Return type:

bool | None

subj()[source]

Get the subject argument if present.

Returns:

The first subject argument, or None if no subject.

Return type:

Argument | None

class PredicateType[source]

Bases: str, Enum

Enumeration of predicate types in PredPatt.

Inherits from str to maintain backward compatibility with string comparisons.

__new__(value)

NORMAL = 'normal'

POSS = 'poss'

APPOS = 'appos'

AMOD = 'amod'

class Token[source]

Bases: object

Represents a single token in a dependency parse.

position

The position of the token in the sentence (0-based).

Type:

int

text

The text content of the token.

Type:

str

tag

The part-of-speech tag of the token.

Type:

str

dependents

List of dependent edges where this token is the governor. Initially set to None.

Type:

list[DepTriple] | None

gov

The governing token (parent) in the dependency tree. Initially set to None.

Type:

Token | None

gov_rel

The dependency relation to the governing token. Initially set to None.

Type:

str | None

ud

The Universal Dependencies module (dep_v1 or dep_v2) that defines relation types and constants.

Type:

UDSchema

__init__(position, text, tag, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)[source]

Initialize a Token.

Parameters:
  • position (int) – The position of the token in the sentence (0-based).

  • text (str) – The text content of the token.

  • tag (str) – The part-of-speech tag of the token.

  • ud (UDSchema, optional) – The Universal Dependencies module, by default dep_v1.

__repr__()[source]

Return string representation of the token.

Returns:

String in format ‘text/position’.

Return type:

str
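The sketch below shows the documented attribute layout and the ‘text/position’ representation on a hypothetical stand-in class, `TokenSketch`. The `ud` parameter is left out for brevity, and the part-of-speech tags are example values rather than anything the library prescribes.

```python
class TokenSketch:
    """Hypothetical stand-in for the documented Token attributes."""

    def __init__(self, position, text, tag):
        self.position = position
        self.text = text
        self.tag = tag
        # The dependency links start out as None, as documented.
        self.dependents = None
        self.gov = None
        self.gov_rel = None

    def __repr__(self):
        return f"{self.text}/{self.position}"


words = [("Dogs", "NOUN"), ("bark", "VERB"), (".", "PUNCT")]
tokens = [TokenSketch(i, text, tag) for i, (text, tag) in enumerate(words)]
print(tokens)  # [Dogs/0, bark/1, ./2]
```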

argument_like()[source]

Check if this token looks like the root of an argument.

Returns:

True if the token’s gov_rel is in ARG_LIKE relations.

Return type:

bool

hard_to_find_arguments()[source]

Check if this is potentially the root of a predicate with hard-to-find arguments.

This function is only called when one of the token’s dependents is an easy predicate. It checks whether this token is potentially the root of an easy predicate that will have an argument.

Returns:

True if this could be a predicate root with hard-to-find arguments.

Return type:

bool

property isword: bool

Check if the token is not punctuation.

Returns:

True if the token is not punctuation, False otherwise.

Return type:

bool

argument_names(args)[source]

Give arguments alpha-numeric names.

Parameters:

args (list[T]) – List of arguments to name.

Returns:

Mapping from argument to its name (e.g., ‘?a’, ‘?b’, etc.).

Return type:

dict[T, str]

Examples

>>> names = argument_names(range(100))
>>> [names[i] for i in range(0,100,26)]
['?a', '?a1', '?a2', '?a3']
>>> [names[i] for i in range(1,100,26)]
['?b', '?b1', '?b2', '?b3']

no_color(x, _)[source]

Identity function for when color is disabled.

Return type:

str

sort_by_position(x)[source]

Sort items by their position attribute.

Return type:

list[T], where T is a TypeVar bound to HasPosition

Submodules

decomp.semantics.predpatt.core.token

Token representation for dependency parsing in PredPatt.

This module defines the core Token class that represents individual tokens (words) in a dependency parse tree. Tokens store linguistic information including text, part-of-speech tags, and dependency relations.

Classes

Token

Represents a single token with its linguistic properties and dependency relations. Used as the basic unit in dependency parsing for predicate-argument extraction.

class Token[source]

Bases: object

Represents a single token in a dependency parse.

position

The position of the token in the sentence (0-based).

Type:

int

text

The text content of the token.

Type:

str

tag

The part-of-speech tag of the token.

Type:

str

dependents

List of dependent edges where this token is the governor. Initially set to None.

Type:

list[DepTriple] | None

gov

The governing token (parent) in the dependency tree. Initially set to None.

Type:

Token | None

gov_rel

The dependency relation to the governing token. Initially set to None.

Type:

str | None

ud

The Universal Dependencies module (dep_v1 or dep_v2) that defines relation types and constants.

Type:

UDSchema

__init__(position, text, tag, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)[source]

Initialize a Token.

Parameters:
  • position (int) – The position of the token in the sentence (0-based).

  • text (str) – The text content of the token.

  • tag (str) – The part-of-speech tag of the token.

  • ud (UDSchema, optional) – The Universal Dependencies module, by default dep_v1.

__repr__()[source]

Return string representation of the token.

Returns:

String in format ‘text/position’.

Return type:

str

property isword: bool

Check if the token is not punctuation.

Returns:

True if the token is not punctuation, False otherwise.

Return type:

bool

argument_like()[source]

Check if this token looks like the root of an argument.

Returns:

True if the token’s gov_rel is in ARG_LIKE relations.

Return type:

bool

hard_to_find_arguments()[source]

Check if this is potentially the root of a predicate with hard-to-find arguments.

This function is only called when one of the token’s dependents is an easy predicate. It checks whether this token is potentially the root of an easy predicate that will have an argument.

Returns:

True if this could be a predicate root with hard-to-find arguments.

Return type:

bool

decomp.semantics.predpatt.core.predicate

Predicate representation for semantic role labeling in PredPatt.

This module defines the core predicate structures used in the PredPatt system for extracting and representing predicates from dependency parses. It handles various predicate types including verbal, possessive, appositional, and adjectival predicates.

Classes

Predicate

Main class representing a predicate with its root token, arguments, and predicate type. Supports different predicate types (normal, possessive, appositive, adjectival).

PredicateType

Enumeration defining the four types of predicates that PredPatt can extract: NORMAL, POSS, APPOS, and AMOD.

Functions

argument_names

Utility function to generate alphabetic names for arguments (?a, ?b, etc.) for display and debugging purposes.

sort_by_position

Helper function to sort items by their position attribute, used for ordering tokens and arguments.

no_color

Identity function that returns text unchanged (used when color is disabled).
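The three helpers summarized above are small enough to sketch in full. The versions below are written to be consistent with the documented behavior, including the doctest shown for argument_names; they are not copied from the module. `HasPosition` is assumed here to be a protocol with a single `position` attribute.

```python
from typing import Protocol, TypeVar


class HasPosition(Protocol):
    # Assumed shape of the bound used by sort_by_position.
    position: int


T = TypeVar("T", bound=HasPosition)


def argument_names(args):
    """Map each item to '?a'..'?z', then '?a1'..'?z1', and so on."""
    names = {}
    for i, arg in enumerate(args):
        suffix = str(i // 26) if i >= 26 else ""
        names[arg] = "?" + chr(ord("a") + i % 26) + suffix
    return names


def sort_by_position(items: list[T]) -> list[T]:
    """Return a new list ordered by each item's position attribute."""
    return sorted(items, key=lambda item: item.position)


def no_color(x, _):
    """Return x unchanged; the second argument is ignored."""
    return x


names = argument_names(range(100))
print([names[i] for i in range(0, 100, 26)])  # ['?a', '?a1', '?a2', '?a3']
print(no_color("dog", "red"))                 # dog
```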

class PredicateType[source]

Bases: str, Enum

Enumeration of predicate types in PredPatt.

Inherits from str to maintain backward compatibility with string comparisons.
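What the str base class buys in practice can be seen with a stand-in enum that has the same bases and values. `PredicateTypeSketch` is a hypothetical name used only for this illustration.

```python
from enum import Enum


class PredicateTypeSketch(str, Enum):
    """Stand-in with the same bases and member values as documented."""

    NORMAL = "normal"
    POSS = "poss"
    APPOS = "appos"
    AMOD = "amod"


kind = PredicateTypeSketch.POSS

# Members compare equal to their plain string values, so older code that
# checks against string literals keeps working.
print(kind == "poss")             # True
print(kind in ("poss", "appos"))  # True

# Lookup by value returns the singleton member.
print(PredicateTypeSketch("amod") is PredicateTypeSketch.AMOD)  # True
print(kind.value)                 # poss
```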

NORMAL = 'normal'

POSS = 'poss'

APPOS = 'appos'

AMOD = 'amod'

__new__(value)

argument_names(args)[source]

Give arguments alpha-numeric names.

Parameters:

args (list[T]) – List of arguments to name.

Returns:

Mapping from argument to its name (e.g., ‘?a’, ‘?b’, etc.).

Return type:

dict[T, str]

Examples

>>> names = argument_names(range(100))
>>> [names[i] for i in range(0,100,26)]
['?a', '?a1', '?a2', '?a3']
>>> [names[i] for i in range(1,100,26)]
['?b', '?b1', '?b2', '?b3']

sort_by_position(x)[source]

Sort items by their position attribute.

Return type:

list[T], where T is a TypeVar bound to HasPosition

no_color(x, _)[source]

Identity function for when color is disabled.

Return type:

str

class Predicate[source]

Bases: object

Represents a predicate extracted from a dependency parse.

A predicate consists of a root token and potentially multiple tokens that form the predicate phrase, along with its arguments.

Parameters:
  • root (Token) – The root token of the predicate.

  • ud (module, optional) – The Universal Dependencies module to use (default: dep_v1).

  • rules (list, optional) – List of rules that led to this predicate’s extraction.

  • type_ (PredicateType, optional) – Type of predicate (PredicateType.NORMAL, POSS, APPOS, or AMOD). Default: PredicateType.NORMAL.

root

The root token of the predicate.

Type:

Token

rules

List of extraction rules applied.

Type:

list

position

Position of the root token.

Type:

int

ud

The UD version module being used.

Type:

module

arguments

List of arguments for this predicate.

Type:

list[Argument]

type

Type of predicate.

Type:

PredicateType

tokens

List of tokens forming the predicate phrase.

Type:

list[Token]

__init__(root, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>, rules=None, type_=PredicateType.NORMAL)[source]

Initialize a Predicate.

__repr__()[source]

Return string representation.

Return type:

str

copy()[source]

Copy only the complex predicate itself. The arguments are not copied; the original and the copy share the same argument objects.

Returns:

A new predicate with shared argument references and copied tokens.

Return type:

Predicate

identifier()[source]

Generate unique identifier for this predicate.

Returns:

Identifier in format ‘pred.{type}.{position}.{arg_positions}’.

Return type:

str
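A sketch of how an identifier in this format could be assembled is shown below. The exact rendering of `{arg_positions}` is not specified above, so joining the argument positions with dots is an assumption; `predicate_identifier` is a hypothetical helper and `SimpleNamespace` objects stand in for arguments.

```python
from types import SimpleNamespace


def predicate_identifier(pred_type, position, arguments):
    """Build an identifier of the form pred.{type}.{position}.{arg_positions}."""
    # Assumption: argument positions are joined with '.' in list order.
    arg_positions = ".".join(str(arg.position) for arg in arguments)
    return f"pred.{pred_type}.{position}.{arg_positions}"


args = [SimpleNamespace(position=0), SimpleNamespace(position=3)]
print(predicate_identifier("normal", 1, args))  # pred.normal.1.0.3
```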

has_token(token)[source]

Check if predicate contains a token at given position.

Parameters:

token (Token) – Token to check (only position is compared).

Returns:

True if any token in predicate has same position.

Return type:

bool

has_subj()[source]

Check if predicate has a subject argument.

Returns:

True if any argument is a subject.

Return type:

bool

has_obj()[source]

Check if predicate has an object argument.

Returns:

True if any argument is an object.

Return type:

bool

subj()[source]

Get the subject argument if present.

Returns:

The first subject argument, or None if no subject.

Return type:

Argument | None

obj()[source]

Get the object argument if present.

Returns:

The first object argument, or None if no object.

Return type:

Argument | None

share_subj(other)[source]

Check if two predicates share the same subject.

Parameters:

other (Predicate) – The other predicate to compare with.

Returns:

True if both have subjects at same position, None if either lacks a subject.

Return type:

bool | None
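Since the result can be None as well as a boolean, callers that need to tell "no subject" apart from "different subjects" should test with `is None`. The sketch below illustrates that return contract on a hypothetical helper that takes the two subjects directly; returning False for subjects at different positions is an assumption based on the documented return type.

```python
from types import SimpleNamespace


def share_subj_sketch(subj_a, subj_b):
    """Return None if either subject is missing, else compare positions."""
    if subj_a is None or subj_b is None:
        return None
    return subj_a.position == subj_b.position


first = SimpleNamespace(position=0)
also_first = SimpleNamespace(position=0)
later = SimpleNamespace(position=4)

print(share_subj_sketch(first, also_first))  # True
print(share_subj_sketch(first, later))       # False
print(share_subj_sketch(first, None))        # None

# A plain truthiness check treats the last two cases alike.
result = share_subj_sketch(first, None)
print("no subject" if result is None else "compared")  # no subject
```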

has_borrowed_arg()[source]

Check if any argument is borrowed (shared).

Returns:

True if any argument has share=True and has rules.

Return type:

bool

phrase()[source]

Get the predicate phrase with argument placeholders.

Returns:

The formatted predicate phrase.

Return type:

str

is_broken()[source]

Check if predicate is malformed.

Returns:

True if broken, None if valid.

Return type:

bool | None

format(track_rule=False, c=<function no_color>, indent='\t')[source]

Format predicate with arguments for display.

Parameters:
  • track_rule (bool, optional) – Whether to include rule tracking information.

  • c (callable, optional) – Color function for formatting.

  • indent (str, optional) – Indentation string to use.

Returns:

Formatted predicate with arguments.

Return type:

str

decomp.semantics.predpatt.core.argument

Argument representation for predicate-argument structures.

This module provides the Argument class, which represents arguments extracted from dependency parse trees in the PredPatt semantic extraction system. Arguments are the participants in predicate-argument structures, such as subjects, objects, and other dependents of predicates.

Arguments can be simple (single tokens) or complex (multi-token phrases), and support operations like copying, creating references (for shared arguments), and expanding coordinated structures.

Classes

Argument

The main class representing predicate arguments.

Functions

sort_by_position

Utility function for sorting items by position.

sort_by_position(x)[source]

Sort items by their position attribute.

Return type:

list[T], where T is a TypeVar bound to HasPosition

class Argument[source]

Bases: object

Represents an argument of a predicate.

Arguments are extracted from dependency parse trees and represent the participants in predicate-argument structures.

Parameters:
  • root (Token) – The root token of the argument.

  • ud (module, optional) – The Universal Dependencies module to use (default: dep_v1).

  • rules (list, optional) – List of rules that led to this argument’s extraction.

root

The root token of the argument.

Type:

Token

rules

List of extraction rules applied.

Type:

list

position

Position of the root token (copied from root.position).

Type:

int

ud

The UD version module being used.

Type:

module

tokens

List of tokens forming the argument phrase.

Type:

list[Token]

share

Whether this is a shared/borrowed argument (default: False).

Type:

bool

__init__(root, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>, rules=None, share=False)[source]

Initialize an Argument.

Parameters:
  • root (Token) – The root token of the argument.

  • ud (module, optional) – The Universal Dependencies module to use.

  • rules (list, optional) – List of rules that led to this argument’s extraction. WARNING: Default is mutable list - modifying one argument’s rules may affect others if default is used. This behavior is intentional to match the original PredPatt implementation.

__repr__()[source]

Return string representation.

Returns:

String in format ‘Argument(root)’.

Return type:

str

copy()[source]

Create a copy of this argument.

Creates a new Argument with the same root and copied lists for rules and tokens. The share flag is not copied.

Returns:

A new argument with copied rules and tokens lists.

Return type:

Argument

reference()[source]

Create a reference (shared) copy of this argument.

Creates a new Argument marked as shared (share=True) with the same tokens list (not copied). Used for borrowed arguments.

Returns:

A new argument with share=True and shared tokens list.

Return type:

Argument

is_reference()[source]

Check if this is a reference (shared) argument.

Returns:

True if share attribute is True.

Return type:

bool

isclausal()[source]

Check if this is a clausal argument.

Clausal arguments are those with governor relations indicating embedded clauses: ccomp, csubj, csubjpass, or xcomp.

Returns:

True if the argument root has a clausal governor relation.

Return type:

bool

phrase()[source]

Get the argument phrase.

Joins the text of all tokens in the argument with spaces. The tokens are joined in the order they appear in the tokens list, which may be sorted by position during phrase extraction.

Returns:

Space-joined text of all tokens in the argument.

Return type:

str
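Because the join follows list order rather than sentence order, the result depends on whether the tokens list has been sorted. A minimal sketch, using a hypothetical `join_phrase` helper and `SimpleNamespace` objects in place of Token instances:

```python
from types import SimpleNamespace


def join_phrase(tokens):
    """Join token texts with single spaces, in list order."""
    return " ".join(tok.text for tok in tokens)


tokens = [
    SimpleNamespace(position=2, text="dog"),
    SimpleNamespace(position=0, text="the"),
    SimpleNamespace(position=1, text="big"),
]

print(join_phrase(tokens))                                    # dog the big
print(join_phrase(sorted(tokens, key=lambda t: t.position)))  # the big dog
```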

coords()[source]

Get coordinated arguments including this one.

Expands coordinated structures by finding conjunct dependents of the root token. Does not expand ccomp or csubj arguments.

Returns:

List of arguments including self and any conjuncts, sorted by position.

Return type:

list[Argument]

decomp.semantics.predpatt.core.options

Options configuration for PredPatt extraction.

This module contains the PredPattOpts class which configures the behavior of predicate-argument extraction in the PredPatt system.

class PredPattOpts[source]

Bases: object

Configuration options for PredPatt extraction.

Controls various aspects of predicate-argument extraction including simplification, resolution of special constructions, and formatting.

Parameters:
  • simple (bool, optional) – Extract simple predicates (exclude aux and advmod). Default: False.

  • cut (bool, optional) – Cut: treat xcomp as independent predicate. Default: False.

  • resolve_relcl (bool, optional) – Resolve relative clause modifiers. Default: False.

  • resolve_appos (bool, optional) – Resolve appositives. Default: False.

  • resolve_amod (bool, optional) – Resolve adjectival modifiers. Default: False.

  • resolve_conj (bool, optional) – Resolve conjunctions. Default: False.

  • resolve_poss (bool, optional) – Resolve possessives. Default: False.

  • borrow_arg_for_relcl (bool, optional) – Borrow arguments for relative clauses. Default: True.

  • big_args (bool, optional) – Use big argument extraction (include all subtree tokens). Default: False.

  • strip (bool, optional) – Strip leading/trailing punctuation from phrases. Default: True.

  • ud (str, optional) – Universal Dependencies version (“1.0” or “2.0”). Default: “1.0”.

simple

Extract simple predicates (exclude aux and advmod).

Type:

bool

cut

Cut: treat xcomp as independent predicate.

Type:

bool

resolve_relcl

Resolve relative clause modifiers.

Type:

bool

resolve_appos

Resolve appositives.

Type:

bool

resolve_amod

Resolve adjectival modifiers.

Type:

bool

resolve_conj

Resolve conjunctions.

Type:

bool

resolve_poss

Resolve possessives.

Type:

bool

borrow_arg_for_relcl

Borrow arguments for relative clauses.

Type:

bool

big_args

Use big argument extraction.

Type:

bool

strip

Strip leading/trailing punctuation.

Type:

bool

ud

Universal Dependencies version string.

Type:

str

__init__(simple=False, cut=False, resolve_relcl=False, resolve_appos=False, resolve_amod=False, resolve_conj=False, resolve_poss=False, borrow_arg_for_relcl=True, big_args=False, strip=True, ud='1.0')[source]

Initialize PredPattOpts with configuration values.

Parameters are assigned in the exact same order as the original to ensure identical behavior and initialization.