decomp.semantics.predpatt.core¶
Core PredPatt classes with modern Python implementation.
This module contains the core data structures used by PredPatt for representing tokens, predicates, and arguments in dependency parses.
- class Argument[source]¶
Bases:
object
Represents an argument of a predicate.
Arguments are extracted from dependency parse trees and represent the participants in predicate-argument structures.
- Parameters:
- ud¶
The UD version module being used.
- Type:
module
- share¶
Whether this is a shared/borrowed argument (default: False).
- Type:
bool
- __init__(root, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>, rules=None, share=False)[source]¶
Initialize an Argument.
- Parameters:
root (Token) – The root token of the argument.
ud (module, optional) – The Universal Dependencies module to use.
rules (list, optional) – List of rules that led to this argument’s extraction. WARNING: Default is mutable list - modifying one argument’s rules may affect others if default is used. This behavior is intentional to match the original PredPatt implementation.
- __repr__()[source]¶
Return string representation.
- Returns:
String in format ‘Argument(root)’.
- Return type:
str
- coords()[source]¶
Get coordinated arguments including this one.
Expands coordinated structures by finding conjunct dependents of the root token. Does not expand ccomp or csubj arguments.
- copy()[source]¶
Create a copy of this argument.
Creates a new Argument with the same root and copied lists for rules and tokens. The share flag is not copied.
- Returns:
A new argument with copied rules and tokens lists.
- Return type:
Argument
- is_reference()[source]¶
Check if this is a reference (shared) argument.
- Returns:
True if share attribute is True.
- Return type:
bool
- isclausal()[source]¶
Check if this is a clausal argument.
Clausal arguments are those with governor relations indicating embedded clauses: ccomp, csubj, csubjpass, or xcomp.
- Returns:
True if the argument root has a clausal governor relation.
- Return type:
bool
- class PredPattOpts[source]¶
Bases:
object
Configuration options for PredPatt extraction.
Controls various aspects of predicate-argument extraction including simplification, resolution of special constructions, and formatting.
- Parameters:
simple (bool, optional) – Extract simple predicates (exclude aux and advmod). Default: False.
cut (bool, optional) – Cut: treat xcomp as independent predicate. Default: False.
resolve_relcl (bool, optional) – Resolve relative clause modifiers. Default: False.
resolve_appos (bool, optional) – Resolve appositives. Default: False.
resolve_amod (bool, optional) – Resolve adjectival modifiers. Default: False.
resolve_conj (bool, optional) – Resolve conjunctions. Default: False.
resolve_poss (bool, optional) – Resolve possessives. Default: False.
borrow_arg_for_relcl (bool, optional) – Borrow arguments for relative clauses. Default: True.
big_args (bool, optional) – Use big argument extraction (include all subtree tokens). Default: False.
strip (bool, optional) – Strip leading/trailing punctuation from phrases. Default: True.
ud (str, optional) – Universal Dependencies version (“1.0” or “2.0”). Default: “1.0”.
- __init__(simple=False, cut=False, resolve_relcl=False, resolve_appos=False, resolve_amod=False, resolve_conj=False, resolve_poss=False, borrow_arg_for_relcl=True, big_args=False, strip=True, ud='1.0')[source]¶
Initialize PredPattOpts with configuration values.
Parameters are assigned in the same order as in the original PredPatt implementation so that initialization behaves identically.
- class Predicate[source]¶
Bases:
object
Represents a predicate extracted from a dependency parse.
A predicate consists of a root token and potentially multiple tokens that form the predicate phrase, along with its arguments.
- Parameters:
root (Token) – The root token of the predicate.
ud (module, optional) – The Universal Dependencies module to use (default: dep_v1).
rules (list, optional) – List of rules that led to this predicate’s extraction.
type_ (PredicateType, optional) – Type of predicate (PredicateType.NORMAL, POSS, APPOS, or AMOD). Default: PredicateType.NORMAL.
- ud¶
The UD version module being used.
- Type:
module
- type¶
Type of predicate.
- Type:
PredicateType
- __init__(root, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>, rules=None, type_=PredicateType.NORMAL)[source]¶
Initialize a Predicate.
- copy()[source]¶
Copy only the predicate itself, including its tokens. The arguments are not copied; the original and the copy share the same argument objects.
- Returns:
A new predicate with shared argument references and copied tokens.
- Return type:
Predicate
- format(track_rule=False, c=<function no_color>, indent='\t')[source]¶
Format predicate with arguments for display.
- has_borrowed_arg()[source]¶
Check if any argument is borrowed (shared).
- Returns:
True if any argument has share=True and has rules.
- Return type:
bool
- has_obj()[source]¶
Check if predicate has an object argument.
- Returns:
True if any argument is an object.
- Return type:
bool
- has_subj()[source]¶
Check if predicate has a subject argument.
- Returns:
True if any argument is a subject.
- Return type:
bool
- identifier()[source]¶
Generate unique identifier for this predicate.
- Returns:
Identifier in format ‘pred.{type}.{position}.{arg_positions}’.
- Return type:
str
- is_broken()[source]¶
Check if predicate is malformed.
- Returns:
True if broken, None if valid.
- Return type:
bool | None
- obj()[source]¶
Get the object argument if present.
- Returns:
The first object argument, or None if no object.
- Return type:
Argument | None
- phrase()[source]¶
Get the predicate phrase with argument placeholders.
- Returns:
The formatted predicate phrase.
- Return type:
str
- share_subj(other)[source]¶
Check if two predicates share the same subject.
- class PredicateType[source]¶
Enumeration of predicate types in PredPatt.
Inherits from str to maintain backward compatibility with string comparisons.
- __new__(value)¶
- NORMAL = 'normal'¶
- POSS = 'poss'¶
- APPOS = 'appos'¶
- AMOD = 'amod'¶
- class Token[source]¶
Bases:
object
Represents a single token in a dependency parse.
- dependents¶
List of dependent edges where this token is the governor. Initially set to None.
- ud¶
The Universal Dependencies module (dep_v1 or dep_v2) that defines relation types and constants.
- Type:
UDSchema
- __init__(position, text, tag, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)[source]¶
Initialize a Token.
- __repr__()[source]¶
Return string representation of the token.
- Returns:
String in format ‘text/position’.
- Return type:
str
- argument_like()[source]¶
Check if this token looks like the root of an argument.
- Returns:
True if the token’s gov_rel is in ARG_LIKE relations.
- Return type:
bool
- hard_to_find_arguments()[source]¶
Check if this is potentially the root of a predicate with hard-to-find arguments.
This function is only called when one of the token’s dependents is an easy predicate. It checks whether this token is potentially the root of an easy predicate that will have an argument.
- Returns:
True if this could be a predicate root with hard-to-find arguments.
- Return type:
bool
- argument_names(args)[source]¶
Give arguments alpha-numeric names.
- Parameters:
args (list[T]) – List of arguments to name.
- Returns:
Mapping from argument to its name (e.g., ‘?a’, ‘?b’, etc.).
- Return type:
Examples
>>> names = argument_names(range(100))
>>> [names[i] for i in range(0,100,26)]
['?a', '?a1', '?a2', '?a3']
>>> [names[i] for i in range(1,100,26)]
['?b', '?b1', '?b2', '?b3']
- sort_by_position(x)[source]¶
Sort items by their position attribute.
- Return type:
list[TypeVar(T, bound=HasPosition)]
Submodules¶
decomp.semantics.predpatt.core.token¶
Token representation for dependency parsing in PredPatt.
This module defines the core Token class that represents individual tokens (words) in a dependency parse tree. Tokens store linguistic information including text, part-of-speech tags, and dependency relations.
Classes¶
- Token
Represents a single token with its linguistic properties and dependency relations. Used as the basic unit in dependency parsing for predicate-argument extraction.
- class Token[source]
Bases:
object
Represents a single token in a dependency parse.
- position
The position of the token in the sentence (0-based).
- Type:
int
- text
The text content of the token.
- Type:
str
- tag
The part-of-speech tag of the token.
- Type:
str
- dependents
List of dependent edges where this token is the governor. Initially set to None.
- gov
The governing token (parent) in the dependency tree. Initially set to None.
- Type:
Token | None
- gov_rel
The dependency relation to the governing token. Initially set to None.
- Type:
str | None
- ud
The Universal Dependencies module (dep_v1 or dep_v2) that defines relation types and constants.
- Type:
UDSchema
- __init__(position, text, tag, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>)[source]
Initialize a Token.
- __repr__()[source]
Return string representation of the token.
- Returns:
String in format ‘text/position’.
- Return type:
str
- property isword: bool
Check if the token is not punctuation.
- Returns:
True if the token is not punctuation, False otherwise.
- Return type:
bool
- argument_like()[source]
Check if this token looks like the root of an argument.
- Returns:
True if the token’s gov_rel is in ARG_LIKE relations.
- Return type:
bool
- hard_to_find_arguments()[source]
Check if this is potentially the root of a predicate with hard-to-find arguments.
This function is only called when one of the token’s dependents is an easy predicate. It checks whether this token is potentially the root of an easy predicate that will have an argument.
- Returns:
True if this could be a predicate root with hard-to-find arguments.
- Return type:
bool
decomp.semantics.predpatt.core.predicate¶
Predicate representation for semantic role labeling in PredPatt.
This module defines the core predicate structures used in the PredPatt system for extracting and representing predicates from dependency parses. It handles various predicate types including verbal, possessive, appositional, and adjectival predicates.
Classes¶
- Predicate
Main class representing a predicate with its root token, arguments, and predicate type. Supports different predicate types (normal, possessive, appositive, adjectival).
- PredicateType
Enumeration defining the four types of predicates that PredPatt can extract: NORMAL, POSS, APPOS, and AMOD.
Functions¶
- argument_names
Utility function to generate alphabetic names for arguments (?a, ?b, etc.) for display and debugging purposes.
- sort_by_position
Helper function to sort items by their position attribute, used for ordering tokens and arguments.
- no_color
Identity function that returns text unchanged (used when color is disabled).
- class PredicateType[source]
Enumeration of predicate types in PredPatt.
Inherits from str to maintain backward compatibility with string comparisons.
- NORMAL = 'normal'
- POSS = 'poss'
- APPOS = 'appos'
- AMOD = 'amod'
- __new__(value)
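As a rough illustration of why inheriting from str keeps string comparisons working, the sketch below defines a stand-in enum with the four documented values. The name PredicateTypeSketch is hypothetical and exists only for this example; the real class is the one documented above, and its full list of base classes is not shown on this page.

```python
from enum import Enum


class PredicateTypeSketch(str, Enum):
    """Hypothetical stand-in carrying the four documented values."""

    NORMAL = "normal"
    POSS = "poss"
    APPOS = "appos"
    AMOD = "amod"


# Members are also str instances, so comparing against a plain string
# succeeds. Code written against string-typed predicate types keeps working.
matches_string = PredicateTypeSketch.POSS == "poss"

# Lookup by value returns the corresponding member.
looked_up = PredicateTypeSketch("amod")
```

With a plain Enum (no str mixin) the comparison with "poss" would evaluate to False, which is the incompatibility the str base is meant to avoid.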
- argument_names(args)[source]
Give arguments alpha-numeric names.
- Parameters:
args (list[T]) – List of arguments to name.
- Returns:
Mapping from argument to its name (e.g., ‘?a’, ‘?b’, etc.).
- Return type:
Examples
>>> names = argument_names(range(100))
>>> [names[i] for i in range(0,100,26)]
['?a', '?a1', '?a2', '?a3']
>>> [names[i] for i in range(1,100,26)]
['?b', '?b1', '?b2', '?b3']
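The naming scheme implied by the doctest (letters a to z, then the same letters with a numeric suffix) can be sketched as follows. This is a reconstruction from the documented outputs only, not the library's source; argument_names_sketch is a hypothetical name.

```python
from collections.abc import Hashable, Sequence


def argument_names_sketch(args: Sequence[Hashable]) -> dict[Hashable, str]:
    """Map each item to '?a'..'?z', then '?a1'..'?z1', and so on.

    Assumed scheme, inferred from the documented doctest outputs.
    """
    names: dict[Hashable, str] = {}
    for index, arg in enumerate(args):
        # cycle counts completed passes through the alphabet,
        # letter is the offset within the current pass.
        cycle, letter = divmod(index, 26)
        suffix = str(cycle) if cycle > 0 else ""
        names[arg] = "?" + chr(ord("a") + letter) + suffix
    return names


names = argument_names_sketch(range(100))
first_column = [names[i] for i in range(0, 100, 26)]
second_column = [names[i] for i in range(1, 100, 26)]
```

The two lists reproduce the values shown in the doctest, which is the only behavior this sketch is checked against.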
- sort_by_position(x)[source]
Sort items by their position attribute.
- Return type:
list[TypeVar(T, bound=HasPosition)]
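A minimal sketch of what a helper with this return type might look like, assuming it orders items by an integer position attribute and returns a new list. HasPosition appears in the documented return type; the Protocol definition, the Word class, and the function name sort_by_position_sketch are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Protocol, TypeVar


class HasPosition(Protocol):
    """Anything exposing an integer position (assumed shape)."""

    position: int


T = TypeVar("T", bound=HasPosition)


def sort_by_position_sketch(items: list[T]) -> list[T]:
    """Return the items ordered by ascending position."""
    return sorted(items, key=lambda item: item.position)


@dataclass
class Word:
    """Hypothetical item type used only for the demonstration."""

    text: str
    position: int


words = [Word("barked", 2), Word("The", 0), Word("dog", 1)]
ordered = [word.text for word in sort_by_position_sketch(words)]
```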
- class Predicate[source]
Bases:
object
Represents a predicate extracted from a dependency parse.
A predicate consists of a root token and potentially multiple tokens that form the predicate phrase, along with its arguments.
- Parameters:
root (Token) – The root token of the predicate.
ud (module, optional) – The Universal Dependencies module to use (default: dep_v1).
rules (list, optional) – List of rules that led to this predicate’s extraction.
type_ (PredicateType, optional) – Type of predicate (PredicateType.NORMAL, POSS, APPOS, or AMOD). Default: PredicateType.NORMAL.
- root
The root token of the predicate.
- Type:
Token
- rules
List of extraction rules applied.
- Type:
list
- position
Position of the root token.
- Type:
int
- ud
The UD version module being used.
- Type:
module
- type
Type of predicate.
- Type:
PredicateType
- __init__(root, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>, rules=None, type_=PredicateType.NORMAL)[source]
Initialize a Predicate.
- copy()[source]
Copy only the predicate itself, including its tokens. The arguments are not copied; the original and the copy share the same argument objects.
- Returns:
A new predicate with shared argument references and copied tokens.
- Return type:
Predicate
- identifier()[source]
Generate unique identifier for this predicate.
- Returns:
Identifier in format ‘pred.{type}.{position}.{arg_positions}’.
- Return type:
str
- has_token(token)[source]
Check if predicate contains a token at given position.
- has_subj()[source]
Check if predicate has a subject argument.
- Returns:
True if any argument is a subject.
- Return type:
bool
- has_obj()[source]
Check if predicate has an object argument.
- Returns:
True if any argument is an object.
- Return type:
bool
- subj()[source]
Get the subject argument if present.
- Returns:
The first subject argument, or None if no subject.
- Return type:
Argument | None
- obj()[source]
Get the object argument if present.
- Returns:
The first object argument, or None if no object.
- Return type:
Argument | None
- share_subj(other)[source]
Check if two predicates share the same subject.
- has_borrowed_arg()[source]
Check if any argument is borrowed (shared).
- Returns:
True if any argument has share=True and has rules.
- Return type:
bool
- phrase()[source]
Get the predicate phrase with argument placeholders.
- Returns:
The formatted predicate phrase.
- Return type:
str
- is_broken()[source]
Check if predicate is malformed.
- Returns:
True if broken, None if valid.
- Return type:
bool | None
- format(track_rule=False, c=<function no_color>, indent='\t')[source]
Format predicate with arguments for display.
decomp.semantics.predpatt.core.argument¶
Argument representation for predicate-argument structures.
This module provides the Argument class, which represents arguments extracted from dependency parse trees in the PredPatt semantic extraction system. Arguments are the participants in predicate-argument structures, such as subjects, objects, and other dependents of predicates.
Arguments can be simple (single tokens) or complex (multi-token phrases), and support operations like copying, creating references (for shared arguments), and expanding coordinated structures.
Classes¶
- Argument
The main class representing predicate arguments.
Functions¶
- sort_by_position
Utility function for sorting items by position.
- sort_by_position(x)[source]
Sort items by their position attribute.
- Return type:
list[TypeVar(T, bound=HasPosition)]
- class Argument[source]
Bases:
object
Represents an argument of a predicate.
Arguments are extracted from dependency parse trees and represent the participants in predicate-argument structures.
- Parameters:
- root
The root token of the argument.
- Type:
Token
- rules
List of extraction rules applied.
- Type:
list
- position
Position of the root token (copied from root.position).
- Type:
int
- ud
The UD version module being used.
- Type:
module
- share
Whether this is a shared/borrowed argument (default: False).
- Type:
bool
- __init__(root, ud=<class 'decomp.semantics.predpatt.utils.ud_schema.DependencyRelationsV1'>, rules=None, share=False)[source]
Initialize an Argument.
- Parameters:
root (Token) – The root token of the argument.
ud (module, optional) – The Universal Dependencies module to use.
rules (list, optional) – List of rules that led to this argument’s extraction. WARNING: Default is mutable list - modifying one argument’s rules may affect others if default is used. This behavior is intentional to match the original PredPatt implementation.
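The warning refers to a general Python behavior: a list used as a default parameter value is created once and reused across calls. The sketch below shows that behavior with two hypothetical helper functions rather than with Argument itself. The documented signature shows rules=None, so whether sharing actually occurs in this class depends on how the constructor handles None, which this page does not show.

```python
def rules_with_shared_default(rules=[]):
    """Return the rules list; the default is one object reused by every call."""
    return rules


def rules_with_fresh_default(rules=None):
    """Return the rules list; a new empty list is made when none is given."""
    return [] if rules is None else rules


# Shared default: both names point at the same list object.
first = rules_with_shared_default()
second = rules_with_shared_default()
first.append("rule_a")  # the change is visible through `second` too

# Fresh default: each call gets its own list.
third = rules_with_fresh_default()
fourth = rules_with_fresh_default()
third.append("rule_a")  # `fourth` is unaffected
```

Passing an explicit list for rules sidesteps the question in either case, since the caller then controls which list object is used.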
- __repr__()[source]
Return string representation.
- Returns:
String in format ‘Argument(root)’.
- Return type:
str
- copy()[source]
Create a copy of this argument.
Creates a new Argument with the same root and copied lists for rules and tokens. The share flag is not copied.
- Returns:
A new argument with copied rules and tokens lists.
- Return type:
Argument
- reference()[source]
Create a reference (shared) copy of this argument.
Creates a new Argument marked as shared (share=True) with the same tokens list (not copied). Used for borrowed arguments.
- Returns:
A new argument with share=True and shared tokens list.
- Return type:
Argument
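The difference between copy() and reference() comes down to which lists are duplicated and how the share flag is set. The stand-in class below mirrors only what the descriptions above state; ArgumentSketch is hypothetical, it omits the ud parameter, and passing rules through unchanged in reference() is an assumption of this sketch.

```python
class ArgumentSketch:
    """Hypothetical stand-in mirroring the documented copy semantics."""

    def __init__(self, root, rules=None, share=False):
        self.root = root
        self.rules = [] if rules is None else rules
        self.share = share
        self.tokens = []

    def copy(self):
        # New object with the same root and independent rules/tokens lists.
        # The share flag is not carried over and falls back to False.
        duplicate = ArgumentSketch(self.root, list(self.rules))
        duplicate.tokens = list(self.tokens)
        return duplicate

    def reference(self):
        # New object flagged as shared that reuses the same tokens list.
        # Reusing the rules list as-is is an assumption of this sketch.
        shared = ArgumentSketch(self.root, self.rules, share=True)
        shared.tokens = self.tokens
        return shared


original = ArgumentSketch(root="dog", share=True)
original.tokens.append("dog")

copied = original.copy()
borrowed = original.reference()

# Later changes to the original's tokens reach the reference, not the copy.
original.tokens.append("the")
```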
- is_reference()[source]
Check if this is a reference (shared) argument.
- Returns:
True if share attribute is True.
- Return type:
bool
- isclausal()[source]
Check if this is a clausal argument.
Clausal arguments are those with governor relations indicating embedded clauses: ccomp, csubj, csubjpass, or xcomp.
- Returns:
True if the argument root has a clausal governor relation.
- Return type:
bool
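The clausal check described above amounts to a membership test on the root token's governor relation. The sketch below uses the relation labels exactly as spelled in the description. The real method presumably compares against constants from the ud module, which is an assumption, and is_clausal_sketch is a hypothetical name.

```python
# Relation labels copied from the description of isclausal().
CLAUSAL_RELATIONS = frozenset({"ccomp", "csubj", "csubjpass", "xcomp"})


def is_clausal_sketch(gov_rel):
    """Return True if the governor relation marks an embedded clause."""
    return gov_rel in CLAUSAL_RELATIONS


clausal = is_clausal_sketch("xcomp")
not_clausal = is_clausal_sketch("nsubj")
unattached = is_clausal_sketch(None)  # gov_rel is initially None
```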
- phrase()[source]
Get the argument phrase.
Joins the text of all tokens in the argument with spaces. The tokens are joined in the order they appear in the tokens list, which may be sorted by position during phrase extraction.
- Returns:
Space-joined text of all tokens in the argument.
- Return type:
str
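Because the phrase is built from the tokens list in its current order, the result depends on whether that list has been sorted by position. A minimal sketch, assuming each token exposes text and position attributes as documented for Token; TokenSketch and phrase_sketch are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TokenSketch:
    """Hypothetical stand-in with the two attributes the example needs."""

    position: int
    text: str


def phrase_sketch(tokens):
    """Join token texts with single spaces, in list order."""
    return " ".join(token.text for token in tokens)


tokens = [TokenSketch(3, "dog"), TokenSketch(1, "the"), TokenSketch(2, "big")]

# List order is used as-is, so an unsorted list yields a scrambled phrase.
unsorted_phrase = phrase_sketch(tokens)

# Sorting by position first restores sentence order.
sorted_phrase = phrase_sketch(sorted(tokens, key=lambda t: t.position))
```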
decomp.semantics.predpatt.core.options¶
Options configuration for PredPatt extraction.
This module contains the PredPattOpts class which configures the behavior of predicate-argument extraction in the PredPatt system.
- class PredPattOpts[source]
Bases:
object
Configuration options for PredPatt extraction.
Controls various aspects of predicate-argument extraction including simplification, resolution of special constructions, and formatting.
- Parameters:
simple (bool, optional) – Extract simple predicates (exclude aux and advmod). Default: False.
cut (bool, optional) – Cut: treat xcomp as independent predicate. Default: False.
resolve_relcl (bool, optional) – Resolve relative clause modifiers. Default: False.
resolve_appos (bool, optional) – Resolve appositives. Default: False.
resolve_amod (bool, optional) – Resolve adjectival modifiers. Default: False.
resolve_conj (bool, optional) – Resolve conjunctions. Default: False.
resolve_poss (bool, optional) – Resolve possessives. Default: False.
borrow_arg_for_relcl (bool, optional) – Borrow arguments for relative clauses. Default: True.
big_args (bool, optional) – Use big argument extraction (include all subtree tokens). Default: False.
strip (bool, optional) – Strip leading/trailing punctuation from phrases. Default: True.
ud (str, optional) – Universal Dependencies version (“1.0” or “2.0”). Default: “1.0”.
- simple
Extract simple predicates (exclude aux and advmod).
- Type:
bool
- cut
Cut: treat xcomp as independent predicate.
- Type:
bool
- resolve_relcl
Resolve relative clause modifiers.
- Type:
bool
- resolve_appos
Resolve appositives.
- Type:
bool
- resolve_amod
Resolve adjectival modifiers.
- Type:
bool
- resolve_conj
Resolve conjunctions.
- Type:
bool
- resolve_poss
Resolve possessives.
- Type:
bool
- borrow_arg_for_relcl
Borrow arguments for relative clauses.
- Type:
bool
- big_args
Use big argument extraction.
- Type:
bool
- strip
Strip leading/trailing punctuation.
- Type:
bool
- ud
Universal Dependencies version string.
- Type:
str
- __init__(simple=False, cut=False, resolve_relcl=False, resolve_appos=False, resolve_amod=False, resolve_conj=False, resolve_poss=False, borrow_arg_for_relcl=True, big_args=False, strip=True, ud='1.0')[source]
Initialize PredPattOpts with configuration values.
Parameters are assigned in the same order as in the original PredPatt implementation so that initialization behaves identically.
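A short usage sketch for building an options object from the documented defaults plus a few overrides. The defaults dictionary is copied from the __init__ signature above, and the import path is taken from this page's module name. The choice of overrides is arbitrary, and the import is guarded so the snippet still runs where the library is not installed.

```python
# Defaults as listed in the documented __init__ signature.
documented_defaults = {
    "simple": False,
    "cut": False,
    "resolve_relcl": False,
    "resolve_appos": False,
    "resolve_amod": False,
    "resolve_conj": False,
    "resolve_poss": False,
    "borrow_arg_for_relcl": True,
    "big_args": False,
    "strip": True,
    "ud": "1.0",
}

# Example overrides, chosen only for illustration.
overrides = {"resolve_relcl": True, "resolve_conj": True, "ud": "2.0"}
settings = {**documented_defaults, **overrides}

try:
    from decomp.semantics.predpatt.core import PredPattOpts
except ImportError:
    opts = None  # library unavailable; only the settings dict is built
else:
    opts = PredPattOpts(**settings)
```

Passing every option by keyword keeps the call readable and independent of parameter order.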