decomp.semantics.predpatt.extraction

Core extraction engine for PredPatt semantic structures.

This module contains the main extraction engine that orchestrates the application of linguistic rules to extract predicate-argument structures from Universal Dependencies parses.

Classes

PredPattEngine

Main engine for extracting predicates and arguments from dependency parses.

See also

decomp.semantics.predpatt.rules

Linguistic rules used by the engine

decomp.semantics.predpatt.filters

Filters for refining extractions

class PredPattEngine[source]

Bases: object

Main extraction engine for PredPatt predicate-argument structures.

This class orchestrates the complete extraction pipeline for identifying predicates and their arguments from Universal Dependencies parses. It follows the same processing order as the original PredPatt implementation and reproduces its behavior.

Parameters:
  • parse (UDParse) – The Universal Dependencies parse to extract from.

  • opts (PredPattOpts, optional) – Configuration options for extraction. If None, uses default options.

options

Configuration options controlling extraction behavior.

Type:

PredPattOpts

ud

Universal Dependencies schema (dep_v1 or dep_v2), selected according to the options.

Type:

object

tokens

List of Token objects from the parse.

Type:

list[Token]

edges

List of dependency triples from the parse.

Type:

list[DepTriple]

instances

Final list of predicate instances after all processing.

Type:

list[Predicate]

events

List of predicate events before coordination expansion.

Type:

list[Predicate] | None

event_dict

Mapping from root tokens to their predicate objects.

Type:

dict[Token, Predicate] | None

__init__(parse, opts=None)[source]

Initialize PredPattEngine with parse and options.

Sets up the extraction engine with configuration and prepares the parse for processing. Automatically triggers the complete extraction pipeline.

Parameters:
  • parse (UDParse) – The Universal Dependencies parse to extract from.

  • opts (PredPattOpts, optional) – Configuration options for extraction. If None, uses default options.

argument_extract(predicate)[source]

Extract argument root tokens for a given predicate.

Applies argument identification rules in the exact same order as the original implementation. This includes core arguments (g1), nominal modifiers (h1, h2), clausal arguments (k), and special predicate type arguments (i, j, w1, w2).

Parameters:

predicate (Predicate) – The predicate to extract arguments for.

Returns:

List of argument objects for this predicate.

Return type:

list[Argument]

expand_coord(predicate)[source]

Expand coordinated arguments.

Creates a separate predicate instance for each combination of coordinated arguments (the Cartesian product of the argument slots). For example, “A and B eat C and D” yields four instances: (A, C), (A, D), (B, C), (B, D).

Parameters:

predicate (Predicate) – The predicate to expand coordinated arguments for.

Returns:

List of predicate instances with expanded argument combinations.

Return type:

list[Predicate]
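
The Cartesian-product behavior described above can be illustrated with a minimal sketch. It works on plain strings rather than Predicate and Argument objects, and the helper name expand_coordinated is hypothetical; it is not part of the library.

```python
from itertools import product


def expand_coordinated(argument_slots):
    """Return one tuple per combination, taking one conjunct from each slot.

    Hypothetical helper for illustration only; the real method works on
    Predicate and Argument objects.
    """
    return list(product(*argument_slots))


# "A and B eat C and D": the subject slot holds A and B, the object slot C and D.
combinations = expand_coordinated([["A", "B"], ["C", "D"]])
print(combinations)  # [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
```

The number of instances is the product of the slot sizes, so two slots with two conjuncts each give four instances.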

extract()[source]

Execute the complete predicate-argument extraction pipeline.

Orchestrates all phases of extraction in the exact order specified in the PREDPATT_EXTRACTION_PIPELINE.md documentation:

  1. Predicate root identification

  2. Event dictionary creation

  3. Argument root extraction

  4. Argument resolution

  5. Argument sorting

  6. Phrase extraction

  7. Argument simplification (optional)

  8. Conjunction resolution

  9. Coordination expansion

  10. Relative clause cleanup

  11. Final cleanup

This method modifies the engine state and populates the instances attribute with the final extraction results.

Return type:

None
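
The contract described above (construction triggers extraction, extract() returns None, and the results land in the instances attribute) can be sketched with a toy stand-in. The class EngineSketch and its two toy rules are assumptions made for illustration; they do not reproduce the eleven phases or the real linguistic rules.

```python
class EngineSketch:
    """Hypothetical stand-in for illustration; not the real PredPattEngine."""

    def __init__(self, tagged_words):
        self.tagged_words = list(tagged_words)
        self.instances = []   # populated by extract()
        self.extract()        # construction triggers the pipeline

    def extract(self) -> None:
        # Toy phase 1: every verb is treated as a predicate root.
        roots = [word for word, tag in self.tagged_words if tag == "VERB"]
        # Toy phase 2: every noun is treated as an argument of every predicate.
        nouns = [word for word, tag in self.tagged_words if tag == "NOUN"]
        # Final phase: store the results on the instance; nothing is returned.
        self.instances = [(root, nouns) for root in roots]


engine = EngineSketch([("Dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")])
print(engine.instances)  # [('chase', ['Dogs', 'cats'])]
```

Because the results are stored as state rather than returned, callers read engine.instances after construction instead of using the return value of extract().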

classmethod from_constituency(parse_string, cacheable=True, opts=None)[source]

Create PredPattEngine from a constituency parse string.

Warning

This method is not yet implemented. Automatic parsing is a planned future feature. Currently, you must use pre-parsed UD data with the standard constructor or load_conllu().

Once implemented, this method is intended to convert a constituency parse to Universal Dependencies automatically (English only).

Parameters:
  • parse_string (str) – The constituency parse string to convert.

  • cacheable (bool, optional) – Whether to use cached parser instance. Default: True.

  • opts (PredPattOpts, optional) – Configuration options for extraction.

Returns:

Engine instance with extraction results from converted parse.

Return type:

PredPattEngine

Raises:

NotImplementedError – Always raised as this feature is not yet implemented.

classmethod from_sentence(sentence, cacheable=True, opts=None)[source]

Create PredPattEngine from a sentence string.

Warning

This method is not yet implemented. Automatic parsing is a planned future feature. Currently, you must use pre-parsed UD data with the standard constructor or load_conllu().

Once implemented, this method is intended to parse the sentence and convert it to Universal Dependencies automatically (English only).

Parameters:
  • sentence (str) – The sentence string to parse and extract from.

  • cacheable (bool, optional) – Whether to use cached parser instance. Default: True.

  • opts (PredPattOpts, optional) – Configuration options for extraction.

Returns:

Engine instance with extraction results from parsed sentence.

Return type:

PredPattEngine

Raises:

NotImplementedError – Always raised as this feature is not yet implemented.
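
Since both alternative constructors currently raise NotImplementedError, calling code may want to guard against it and fall back to pre-parsed data. The sketch below uses a hypothetical stub class (EngineStub) and a hypothetical helper (build_engine) to show the pattern; it does not import or call the real library.

```python
class EngineStub:
    """Hypothetical stand-in that mirrors the documented behavior."""

    def __init__(self, parse):
        self.parse = parse

    @classmethod
    def from_sentence(cls, sentence, cacheable=True, opts=None):
        # Mirrors the documentation: the method always raises for now.
        raise NotImplementedError("automatic parsing is not yet implemented")


def build_engine(sentence, preparsed):
    """Try the sentence constructor, then fall back to pre-parsed data."""
    try:
        return EngineStub.from_sentence(sentence)
    except NotImplementedError:
        return EngineStub(preparsed)


engine = build_engine("Dogs chase cats.", {"tokens": ["Dogs", "chase", "cats", "."]})
print(engine.parse["tokens"])
```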

identify_predicate_roots()[source]

Identify predicate roots.

Identifies predicate root tokens by applying predicate identification rules in the exact same order as the original implementation. This includes special predicate types (APPOS, POSS, AMOD) and conjunction expansion.

Returns:

List of predicate objects sorted by position.

Return type:

list[Predicate]

parents(predicate)[source]

Iterate over the chain of parents (governing predicates).

Yields predicates that govern the given predicate by following the chain of governor tokens.

Parameters:

predicate (Predicate) – The predicate to start from.

Yields:

Predicate – Each governing predicate in the chain.

Return type:

Iterator[Predicate]
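
One way to read "following the chain of governor tokens" is sketched below. It assumes that each token has a gov attribute pointing at its governor (None at the root) and that predicates are looked up by root token in a mapping like event_dict; the Tok class, the function name, and the use of strings in place of Predicate objects are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Tok:
    """Hypothetical minimal token: text plus a link to its governor."""
    text: str
    gov: Optional["Tok"] = None


def governing_predicates(root: Tok, event_dict: dict) -> Iterator[str]:
    """Walk up the governor chain, yielding a predicate for each token that has one."""
    current = root.gov
    while current is not None:
        if current in event_dict:
            yield event_dict[current]
        current = current.gov


# "She said he wants to leave": leave <- wants <- said
said = Tok("said")
wants = Tok("wants", gov=said)
leave = Tok("leave", gov=wants)
event_dict = {said: "pred:said", wants: "pred:wants", leave: "pred:leave"}

print(list(governing_predicates(leave, event_dict)))  # ['pred:wants', 'pred:said']
```

Tokens in the chain that are not predicate roots are skipped, so the iterator yields only governing predicates, nearest first.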

pprint(color=False, track_rule=False)[source]

Pretty-print extracted predicate-argument tuples.

Parameters:
  • color (bool, optional) – Whether to use colored output (default: False).

  • track_rule (bool, optional) – Whether to include rule tracking information (default: False).

Returns:

Pretty-printed string representation of predicates and arguments.

Return type:

str

qualified_conjoined_predicate(gov, dep)[source]

Check if the conjunction (dep) of a predicate (gov) is another predicate.

Parameters:
  • gov (Token) – The governing token (existing predicate).

  • dep (Token) – The dependent token (potential conjoined predicate).

Returns:

True if the dependent qualifies as a conjoined predicate.

Return type:

bool

static subtree(s, follow=<function PredPattEngine.<lambda>>)[source]

Breadth-first iterator over nodes in a dependency tree.

Parameters:
  • s (Token) – Token from which the traversal starts.

  • follow (callable, optional) – Function that takes an edge and returns True if the edge should be followed. By default, all edges are followed.

Yields:

Token – Each token in the dependency subtree in breadth-first order.

Return type:

Iterator[Token]
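
A breadth-first traversal with a follow predicate, as described above, could look like the following sketch. The Node and Edge classes and the function bfs_subtree are hypothetical stand-ins, and the sketch assumes each node stores its outgoing edges in a dependents list; the library's own traversal may differ in detail.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Edge:
    """Hypothetical edge: a relation label and the dependent node."""
    rel: str
    dep: "Node"


@dataclass
class Node:
    """Hypothetical node: text plus outgoing edges to its dependents."""
    text: str
    dependents: list = field(default_factory=list)


def bfs_subtree(start, follow=lambda edge: True):
    """Yield nodes breadth-first, descending only along edges accepted by follow."""
    queue = deque([start])
    while queue:
        node = queue.popleft()
        yield node
        for edge in node.dependents:
            if follow(edge):
                queue.append(edge.dep)


# eat -> A (nsubj), eat -> C (dobj), A -> B (conj)
b = Node("B")
a = Node("A", [Edge("conj", b)])
c = Node("C")
eat = Node("eat", [Edge("nsubj", a), Edge("dobj", c)])

print([n.text for n in bfs_subtree(eat)])  # ['eat', 'A', 'C', 'B']
print([n.text for n in bfs_subtree(eat, follow=lambda e: e.rel != "conj")])  # ['eat', 'A', 'C']
```

Passing a follow function that rejects conj edges keeps the traversal from descending into conjuncts, which is the kind of pruning the parameter makes possible.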

Submodules

decomp.semantics.predpatt.extraction.engine

Main extraction engine for PredPatt predicate-argument extraction.

This module contains the PredPattEngine class which is responsible for orchestrating the entire predicate-argument extraction pipeline from Universal Dependencies parses. The engine coordinates all phases of extraction from predicate identification through argument resolution and coordination expansion.

Classes

PredPattEngine

Main extraction engine coordinating the complete predicate-argument pipeline.

Functions

gov_looks_like_predicate

Check if a governor token appears to be a predicate based on its dependents.

sort_by_position

Sort objects by their position attribute.

convert_parse

Convert dependency parse from integer indices to Token objects.

See also

decomp.semantics.predpatt.core

Core classes for predicates and arguments

decomp.semantics.predpatt.rules

Linguistic rules for extraction

decomp.semantics.predpatt.parsing

Parse handling and conversion

gov_looks_like_predicate(e, ud)[source]

Check if e.gov looks like a predicate because it has potential arguments.

Parameters:
  • e (DepTriple) – The dependency edge to check.

  • ud (object) – Universal Dependencies schema object.

Returns:

True if the governor looks like a predicate based on its arguments.

Return type:

bool
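
The idea of judging a governor by the relation on one of its outgoing edges can be sketched as a set-membership test. The relation set below is illustrative only and is an assumption, as are the Edge type and the function name; the real function consults the schema object ud and may apply further conditions.

```python
from typing import NamedTuple

# Illustrative set of argument-like relations (an assumption, not the library's list).
ARGUMENT_RELATIONS = frozenset({"nsubj", "dobj", "iobj", "ccomp", "xcomp"})


class Edge(NamedTuple):
    """Hypothetical dependency edge using plain strings."""
    rel: str
    gov: str
    dep: str


def looks_like_predicate(edge, argument_relations=ARGUMENT_RELATIONS):
    """Return True if the edge attaches an argument-like dependent to its governor."""
    return edge.rel in argument_relations


print(looks_like_predicate(Edge("nsubj", "chase", "Dogs")))  # True
print(looks_like_predicate(Edge("det", "dogs", "the")))      # False
```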

sort_by_position(x)[source]

Sort objects by their position attribute.

Parameters:

x (list) – List of objects with position attributes.

Returns:

Sorted list ordered by position.

Return type:

list
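
A minimal sketch of sorting by a position attribute follows. The Item class and the function name are hypothetical; any object that exposes position would work the same way.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Item:
    """Hypothetical object with a position attribute."""
    text: str
    position: int


def sort_items_by_position(items):
    """Return a new list ordered by each item's position attribute."""
    return sorted(items, key=attrgetter("position"))


items = [Item("cats", 2), Item("Dogs", 0), Item("chase", 1)]
print([item.text for item in sort_items_by_position(items)])  # ['Dogs', 'chase', 'cats']
```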

convert_parse(parse, ud)[source]

Convert a dependency parse over integer indices into a dependency parse over Token objects.

Parameters:
  • parse (UDParse) – The parse to convert with integer-based dependencies.

  • ud (object) – Universal Dependencies schema object (dep_v1 or dep_v2).

Returns:

Parse converted to use Token objects with full dependency structure.

Return type:

UDParse
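
The conversion from integer indices to linked objects can be sketched as follows. The IntTriple and Tok types, the function link_tokens, and the convention that a governor index of -1 marks the root are assumptions made for this illustration; the real function takes a UDParse and a schema object and returns a UDParse.

```python
from dataclasses import dataclass, field
from typing import NamedTuple, Optional


class IntTriple(NamedTuple):
    """Hypothetical integer-indexed edge; gov is -1 for the root (assumption)."""
    rel: str
    gov: int
    dep: int


@dataclass(eq=False)
class Tok:
    """Hypothetical token object carrying its governor and dependents."""
    position: int
    text: str
    gov: Optional["Tok"] = None
    gov_rel: str = "root"
    dependents: list = field(default_factory=list)


def link_tokens(words, triples):
    """Build one Tok per word, then replace integer indices with object links."""
    tokens = [Tok(i, word) for i, word in enumerate(words)]
    for triple in triples:
        dep = tokens[triple.dep]
        dep.gov_rel = triple.rel
        if triple.gov >= 0:
            dep.gov = tokens[triple.gov]
            tokens[triple.gov].dependents.append((triple.rel, dep))
    return tokens


tokens = link_tokens(
    ["Dogs", "chase", "cats"],
    [IntTriple("nsubj", 1, 0), IntTriple("root", -1, 1), IntTriple("dobj", 1, 2)],
)
print(tokens[0].gov.text)                          # chase
print([d.text for _, d in tokens[1].dependents])   # ['Dogs', 'cats']
```

After the conversion, traversal code can follow gov and dependents directly instead of looking up indices in the original parse.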
