decomp.semantics.predpatt.extraction
Core extraction engine for PredPatt semantic structures.
This module contains the main extraction engine that orchestrates the application of linguistic rules to extract predicate-argument structures from Universal Dependencies parses.
Classes
- PredPattEngine
Main engine for extracting predicates and arguments from dependency parses.
See also
decomp.semantics.predpatt.rules: Linguistic rules used by the engine
decomp.semantics.predpatt.filters: Filters for refining extractions
- class PredPattEngine
Bases: object
Main extraction engine for PredPatt predicate-argument structures.
This class orchestrates the complete extraction pipeline for identifying predicates and their arguments from Universal Dependencies parses. It follows the exact same processing order and behavior as the original PredPatt implementation.
- Parameters:
parse (UDParse) – The Universal Dependencies parse to extract from.
opts (PredPattOpts, optional) – Configuration options for extraction. If None, uses default options.
- options
Configuration options controlling extraction behavior.
- event_dict
Mapping from root tokens to their predicate objects.
- __init__(parse, opts=None)
Initialize PredPattEngine with parse and options.
Sets up the extraction engine with configuration and prepares the parse for processing. Automatically triggers the complete extraction pipeline.
- Parameters:
parse (UDParse) – The Universal Dependencies parse to extract from.
opts (PredPattOpts, optional) – Configuration options for extraction. If None, uses default options.
- argument_extract(predicate)
Extract argument root tokens for a given predicate.
Applies argument identification rules in the exact same order as the original implementation. This includes core arguments (g1), nominal modifiers (h1, h2), clausal arguments (k), and special predicate type arguments (i, j, w1, w2).
- expand_coord(predicate)
Expand coordinated arguments.
Creates separate predicate instances for each combination of coordinated arguments (Cartesian product). For example: “A and B eat C and D” → 4 instances: (A,C), (A,D), (B,C), (B,D)
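The Cartesian product described above can be illustrated with a minimal sketch. The function name expand_coordinated and the list-of-lists input are hypothetical and are not part of the PredPatt API; the real method operates on predicate and argument objects rather than strings.

```python
from itertools import product


def expand_coordinated(argument_slots):
    """Return one tuple of arguments per combination of conjuncts.

    argument_slots is a hypothetical input: a list in which each element
    lists the conjuncts that fill one argument position.
    """
    return list(product(*argument_slots))


# "A and B eat C and D": two subject conjuncts, two object conjuncts.
instances = expand_coordinated([["A", "B"], ["C", "D"]])
print(instances)  # [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
```

The number of instances is the product of the conjunct counts per position, so heavily coordinated sentences can produce many instances.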
- extract()
Execute the complete predicate-argument extraction pipeline.
Orchestrates all phases of extraction in the exact order specified in the PREDPATT_EXTRACTION_PIPELINE.md documentation:
1. Predicate root identification
2. Event dictionary creation
3. Argument root extraction
4. Argument resolution
5. Argument sorting
6. Phrase extraction
7. Argument simplification (optional)
8. Conjunction resolution
9. Coordination expansion
10. Relative clause cleanup
11. Final cleanup
This method modifies the engine state and populates the instances attribute with the final extraction results.
- classmethod from_constituency(parse_string, cacheable=True, opts=None)
Create PredPattEngine from a constituency parse string.
Warning
This method is not yet implemented. Automatic parsing is a planned future feature. Currently, you must use pre-parsed UD data with the standard constructor or load_conllu().
Once implemented, this method is intended to convert a constituency parse to Universal Dependencies automatically (English only).
- Parameters:
parse_string (str) – The constituency parse string to convert.
cacheable (bool, optional) – Whether to use cached parser instance. Default: True.
opts (PredPattOpts, optional) – Configuration options for extraction.
- Returns:
Engine instance with extraction results from converted parse.
- Return type:
PredPattEngine
- Raises:
NotImplementedError – Always raised as this feature is not yet implemented.
- classmethod from_sentence(sentence, cacheable=True, opts=None)
Create PredPattEngine from a sentence string.
Warning
This method is not yet implemented. Automatic parsing is a planned future feature. Currently, you must use pre-parsed UD data with the standard constructor or load_conllu().
Once implemented, this method is intended to parse a sentence and convert the result to Universal Dependencies automatically (English only).
- Parameters:
sentence (str) – The sentence string to parse and extract from.
cacheable (bool, optional) – Whether to use cached parser instance. Default: True.
opts (PredPattOpts, optional) – Configuration options for extraction.
- Returns:
Engine instance with extraction results from parsed sentence.
- Return type:
PredPattEngine
- Raises:
NotImplementedError – Always raised as this feature is not yet implemented.
- identify_predicate_roots()
Predicate root identification.
Identifies predicate root tokens by applying predicate identification rules in the exact same order as the original implementation. This includes special predicate types (APPOS, POSS, AMOD) and conjunction expansion.
- parents(predicate)
Iterate over the chain of parents (governing predicates).
Yields predicates that govern the given predicate by following the chain of governor tokens.
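As a rough illustration of following the governor chain, the sketch below walks upward from a predicate's root token and yields every ancestor that is registered as a predicate. The Tok class, the standalone parents function, and the string stand-ins for predicate objects are assumptions made for this example; they are not the library's own types.

```python
class Tok:
    """Hypothetical minimal token that only knows its governor."""

    def __init__(self, text, gov=None):
        self.text = text
        self.gov = gov


def parents(root, event_dict):
    """Yield the predicates governing root, nearest first."""
    current = root.gov
    while current is not None:
        # Only governors registered in event_dict count as predicates.
        if current in event_dict:
            yield event_dict[current]
        current = current.gov


said = Tok("said")
plan = Tok("plan", gov=said)    # intermediate token that is not a predicate
leave = Tok("leave", gov=plan)

# Strings stand in for predicate objects in this sketch.
event_dict = {said: "pred:said", leave: "pred:leave"}
governing = list(parents(leave, event_dict))
print(governing)  # ['pred:said']
```

Tokens that are not keys of the mapping are skipped rather than ending the walk, so a predicate can be found several governors above the starting token.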
- qualified_conjoined_predicate(gov, dep)
Check if the conjunction (dep) of a predicate (gov) is another predicate.
- static subtree(s, follow=<function PredPattEngine.<lambda>>)
Breadth-first iterator over nodes in a dependency tree.
- Parameters:
s (Token) – Initial state token to start traversal from.
follow (callable, optional) – Function that takes an edge and returns True if we should follow the edge. Default follows all edges.
- Yields:
Token – Each token in the dependency subtree in breadth-first order.
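A breadth-first traversal with an edge filter of this kind might look like the following sketch. The Node and Edge classes are hypothetical stand-ins for the library's token and dependency-edge types, and the code is a generic illustration rather than the engine's own implementation.

```python
from collections import deque


class Edge:
    """Hypothetical dependency edge from a governor to a dependent."""

    def __init__(self, rel, gov, dep):
        self.rel = rel
        self.gov = gov
        self.dep = dep


class Node:
    """Hypothetical token holding its outgoing dependency edges."""

    def __init__(self, text):
        self.text = text
        self.dependents = []

    def attach(self, rel, child):
        self.dependents.append(Edge(rel, self, child))
        return child


def subtree(start, follow=lambda edge: True):
    """Yield start and its descendants in breadth-first order."""
    queue = deque([start])
    while queue:
        node = queue.popleft()
        yield node
        # Enqueue only the dependents whose edge passes the filter.
        for edge in node.dependents:
            if follow(edge):
                queue.append(edge.dep)


ate = Node("ate")
ate.attach("nsubj", Node("Sam"))
apples = ate.attach("obj", Node("apples"))
apples.attach("amod", Node("red"))

everything = [n.text for n in subtree(ate)]
without_amod = [n.text for n in subtree(ate, follow=lambda e: e.rel != "amod")]
print(everything)    # ['ate', 'Sam', 'apples', 'red']
print(without_amod)  # ['ate', 'Sam', 'apples']
```

Rejecting an edge prunes the entire subtree below it, which is how a caller can keep a traversal from crossing into, for example, modifier material.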
Submodules
decomp.semantics.predpatt.extraction.engine
Main extraction engine for PredPatt predicate-argument extraction.
This module contains the PredPattEngine class which is responsible for orchestrating the entire predicate-argument extraction pipeline from Universal Dependencies parses. The engine coordinates all phases of extraction from predicate identification through argument resolution and coordination expansion.
Classes
- PredPattEngine
Main extraction engine coordinating the complete predicate-argument pipeline.
Functions
- gov_looks_like_predicate
Check if a governor token appears to be a predicate based on its dependents.
- sort_by_position
Sort objects by their position attribute.
- convert_parse
Convert dependency parse from integer indices to Token objects.
See also
decomp.semantics.predpatt.core: Core classes for predicates and arguments
decomp.semantics.predpatt.rules: Linguistic rules for extraction
decomp.semantics.predpatt.parsing: Parse handling and conversion
- gov_looks_like_predicate(e, ud)
Check if e.gov looks like a predicate because it has potential arguments.
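The idea of judging a governor by its dependents can be sketched as a test on the edge's relation label. The relation set below is an illustrative assumption built from standard Universal Dependencies labels; the rules the library actually applies are not listed in this documentation and may differ.

```python
from collections import namedtuple

# Hypothetical edge type: governor, relation label, dependent.
Edge = namedtuple("Edge", ["gov", "rel", "dep"])

# Assumption for illustration only: relations that usually introduce an
# argument, so a governor with such a dependent behaves like a predicate.
ARGUMENT_LIKE_RELATIONS = frozenset(
    {"nsubj", "csubj", "obj", "iobj", "ccomp", "xcomp"}
)


def looks_like_predicate(edge, argument_relations=ARGUMENT_LIKE_RELATIONS):
    """Return True if the edge suggests that edge.gov takes arguments."""
    return edge.rel in argument_relations


subject_edge = Edge(gov="ate", rel="nsubj", dep="Sam")
modifier_edge = Edge(gov="apples", rel="amod", dep="red")
print(looks_like_predicate(subject_edge))   # True
print(looks_like_predicate(modifier_edge))  # False
```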
- convert_parse(parse, ud)
Convert a dependency parse over integer indices into a dependency parse over Token objects.
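The conversion can be pictured with a small sketch that turns index-based triples into linked objects. The Tok class, the convert function, the (governor index, relation, dependent index) triple layout, and the use of -1 for the root are all assumptions made for this example, not the library's data model.

```python
class Tok:
    """Hypothetical token object linked to its governor and dependents."""

    def __init__(self, position, text):
        self.position = position
        self.text = text
        self.gov = None
        self.gov_rel = None
        self.dependents = []


def convert(words, triples):
    """Build linked tokens from (gov_index, rel, dep_index) triples."""
    tokens = [Tok(i, word) for i, word in enumerate(words)]
    for gov_index, rel, dep_index in triples:
        dep = tokens[dep_index]
        dep.gov_rel = rel
        # A negative governor index marks the root, which has no governor.
        if gov_index >= 0:
            dep.gov = tokens[gov_index]
            dep.gov.dependents.append((rel, dep))
    return tokens


tokens = convert(
    ["Sam", "ate", "apples"],
    [(1, "nsubj", 0), (-1, "root", 1), (1, "obj", 2)],
)
print(tokens[0].gov.text)                         # ate
print([d.text for _, d in tokens[1].dependents])  # ['Sam', 'apples']
```

After conversion, traversal code can follow object references directly instead of looking up indices in the original parse.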
- class PredPattEngine
Bases: object
Main extraction engine for PredPatt predicate-argument structures.
The parameters, attributes, and methods of this class are identical to those documented for PredPattEngine under decomp.semantics.predpatt.extraction above.