decomp.semantics.predpatt
PredPatt module for extracting predicate-argument structures from Universal Dependencies parses.
This module provides functionality for identifying verbal predicates and their arguments through linguistic rules applied to dependency parse trees. The extracted semantic structures can be integrated with the Universal Decompositional Semantics (UDS) framework for further annotation.
Overview
The PredPatt system consists of several key components:
- Core data structures (core) for representing tokens, predicates, and arguments
- Parsing utilities (parsing) for loading and processing Universal Dependencies parses
- Extraction engine (extraction) that orchestrates the rule application process
- Linguistic rules (rules) for identifying predicates and their arguments
- Filtering system (filters) for refining extractions based on linguistic criteria
- Integration utilities (corpus, graph) for working with UDS corpora
- Support utilities (utils) for visualization and debugging
Usage Example
from decomp.semantics.predpatt import PredPatt, load_conllu

# Load dependency parses; load_conllu yields (sentence_id, parse) tuples
sentences = list(load_conllu('example.conllu'))
sentence_id, parse = sentences[0]

# Extract predicates and arguments
pp = PredPatt(parse)

# Access extracted predicates
for predicate in pp.predicates:
    print(f"Predicate: {predicate}")
    for arg in predicate.arguments:
        print(f"  Argument: {arg}")

from decomp.semantics.predpatt import PredPatt, PredPattOpts, load_conllu

# Configure extraction options
opts = PredPattOpts(
    resolve_relcl=True,  # Resolve relative clauses
    resolve_conj=True,   # Resolve conjunctions
    cut=True,            # Apply cutting rules
    simple=False,        # Include all predicates
)

# Load and process; each item is a (sentence_id, parse) tuple
sentences = list(load_conllu('example.conllu'))
sentence_id, parse = sentences[0]
pp = PredPatt(parse, opts=opts)

from decomp.semantics.predpatt import PredPattCorpus
from decomp.semantics.uds import UDSCorpus

# Load UDS corpus
uds = UDSCorpus()

# Create PredPatt corpus
predpatt_corpus = PredPattCorpus.from_ud(
    uds.syntax_graphs()
)

# Access predicate-argument structures
for graph_id, predpatt in predpatt_corpus:
    for pred in predpatt.predicates:
        print(f"{pred.root.text}: {[arg.phrase() for arg in pred.arguments]}")
Note
Automatic parsing functionality (from_sentence, from_constituency) is a planned
future feature. Currently, you must provide pre-parsed Universal Dependencies
data using load_conllu() or similar methods. To prepare for future parsing
features, install with: pip install decomp[parsing]
Classes
- Argument
Represents an argument of a predicate with its token span.
- Predicate
Represents a predicate with its arguments and type.
- Token
Represents a single token in a dependency parse.
- PredPattOpts
Configuration options for controlling extraction behavior.
- PredPatt
Main extraction engine (alias for PredPattEngine).
- PredPattCorpus
Container for collections of PredPatt extractions.
- PredPattGraphBuilder
Converts PredPatt extractions to NetworkX graphs.
Functions
- load_conllu
Load dependency parses from CoNLL-U format files.
- load_comm
Load dependency parses from Concrete communications.
Constants
- DEFAULT_PREDPATT_OPTIONS
Default configuration with relative clause resolution enabled.
- PredPatt
alias of PredPattEngine
- class PredPattCorpus
Bases: Corpus[tuple[PredPattEngine, DiGraph], DiGraph]
Container for managing collections of PredPatt semantic graphs.
This class extends the base Corpus class to handle PredPatt extractions paired with their dependency graphs. It provides methods for loading corpora from CoNLL format and converting them to NetworkX graphs with semantic annotations.
- classmethod from_conll(corpus, name='ewt', options=None)
Load a CoNLL-U dependency corpus and extract predicate-argument structures.
Parses Universal Dependencies format data and applies PredPatt extraction rules to identify predicates and their arguments. Each sentence in the corpus is processed to create a semantic graph.
- Parameters:
corpus (str | TextIO) – Path to a .conllu file, raw CoNLL-U formatted string, or open file handle
name (str, optional) – Corpus name used as prefix for graph identifiers. Default is ‘ewt’
options (PredPattOpts | None, optional) – Configuration options for PredPatt extraction. If None, uses default options with relative clause resolution and argument borrowing enabled
- Returns:
Corpus containing PredPatt extractions and their graphs
- Return type:
PredPattCorpus
- Raises:
ValueError – If PredPatt cannot parse the provided CoNLL-U data, likely due to incompatible Universal Dependencies version
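As a minimal sketch of the call described above, the snippet below passes a raw CoNLL-U string to from_conll and handles the documented ValueError. The two-token sample sentence, the corpus name 'demo', and the helper load_predpatt_corpus are illustrative assumptions and not part of the library. The import is deferred into the helper so the snippet can be loaded without decomp installed.

```python
# Hypothetical two-token sentence in CoNLL-U format (10 tab-separated columns).
SAMPLE_CONLLU = (
    "1\tKim\tKim\tPROPN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_\n"
)


def load_predpatt_corpus(conllu_text, name="demo"):
    """Return a PredPattCorpus, or None if the data cannot be parsed.

    Illustrative helper (not part of decomp); it only wraps the documented
    from_conll(corpus, name=..., options=None) classmethod.
    """
    # Deferred import: decomp is only needed when the helper is called.
    from decomp.semantics.predpatt import PredPattCorpus

    try:
        # options is left as None, so the default extraction options apply.
        return PredPattCorpus.from_conll(conllu_text, name=name)
    except ValueError as err:
        # Documented failure mode, e.g. an incompatible UD version.
        print(f"PredPatt could not parse the input: {err}")
        return None
```

Calling load_predpatt_corpus(SAMPLE_CONLLU) should return a corpus whose graph identifiers are prefixed with the given name, as described for the name parameter.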
- class PredPattGraphBuilder
Bases: object
Constructs NetworkX graphs from PredPatt extractions.
This class provides class methods for converting PredPatt’s predicate and argument objects into a unified graph representation that includes both syntactic dependencies and semantic relations.
- classmethod from_predpatt(predpatt, depgraph, graphid='')
Build a unified graph from PredPatt extraction and dependency parse.
Creates a NetworkX graph that contains:
- All syntax nodes and edges from the original dependency parse
- Semantic predicate and argument nodes extracted by PredPatt
- Interface edges linking semantic nodes to their syntactic heads
- Semantic edges connecting predicates to their arguments
- Parameters:
predpatt (PredPatt) – The PredPatt extraction containing identified predicates and arguments
depgraph (DiGraph) – The source dependency graph with syntactic relations
graphid (str, optional) – Identifier prefix for all nodes in the graph. Default is empty string
- Returns:
NetworkX graph with nodes in three domains:
- syntax: original dependency parse nodes
- semantics: predicate and argument nodes
- interface: edges linking syntax and semantics
- Return type:
DiGraph
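The three-domain layout described above can be illustrated with a dependency-free sketch. It uses plain dictionaries in place of a NetworkX DiGraph, and the node identifier scheme (graphid followed by syntax-N, semantics-pred-N, or semantics-arg-N), the attribute names, and the function build_unified_graph are all assumptions made for illustration. They are not taken from the library's implementation.

```python
def build_unified_graph(tokens, predicates, graphid=""):
    """Return (nodes, edges) dicts for a toy syntax + semantics graph.

    tokens: list of (form, head_position, deprel); positions are 1-based
        and a head position of 0 marks the root.
    predicates: dict mapping a predicate's head position to the head
        positions of its arguments.
    """
    nodes, edges = {}, {}

    # Syntax domain: one node per token, one edge per dependency relation.
    for pos, (form, head, deprel) in enumerate(tokens, start=1):
        nodes[f"{graphid}syntax-{pos}"] = {"domain": "syntax", "form": form}
        if head > 0:
            edge = (f"{graphid}syntax-{head}", f"{graphid}syntax-{pos}")
            edges[edge] = {"domain": "syntax", "deprel": deprel}

    for pred_pos, arg_positions in predicates.items():
        # Semantics domain: a predicate node linked to its syntactic head.
        pred = f"{graphid}semantics-pred-{pred_pos}"
        nodes[pred] = {"domain": "semantics", "type": "predicate"}
        edges[(pred, f"{graphid}syntax-{pred_pos}")] = {
            "domain": "interface", "type": "head"}

        for arg_pos in arg_positions:
            arg = f"{graphid}semantics-arg-{arg_pos}"
            nodes[arg] = {"domain": "semantics", "type": "argument"}
            # Interface edge: argument node to its syntactic head.
            edges[(arg, f"{graphid}syntax-{arg_pos}")] = {
                "domain": "interface", "type": "head"}
            # Semantic edge: predicate to argument.
            edges[(pred, arg)] = {"domain": "semantics", "type": "dependency"}

    return nodes, edges


nodes, edges = build_unified_graph(
    tokens=[("Kim", 2, "nsubj"), ("sleeps", 0, "root")],
    predicates={2: [1]},
    graphid="demo-1-",
)
```

Tagging every node and edge with a domain attribute keeps the syntactic and semantic layers separable after they are merged into one graph, which is the design the method description implies.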
- load_comm(filename, tool='ud converted ptb trees using pyStanfordDependencies')
Load a Concrete communication file with required pyStanfordDependencies output.
Warning
This function is part of a planned parsing feature that is not yet fully supported. It requires the concrete package (available via pip install decomp[parsing]). Full parsing functionality with modern UD parsers will be added in a future release.
- Parameters:
- Yields:
tuple[str, UDParse] – Tuples of (section_label, parse) for each sentence.
- Raises:
ImportError – If the concrete package is not installed.
- load_conllu(filename_or_content)
Load CoNLL-U style files (e.g., the Universal Dependencies treebank).
- Parameters:
filename_or_content (str) – Either a path to a CoNLL-U file or the content string itself.
- Yields:
tuple[str, UDParse] – Tuples of (sentence_id, parse) for each sentence in the file.
Notes
- Sentence IDs default to “sent_N” where N starts at 1
- Lines starting with “# sent_id” override the sentence ID
- Other comment lines (starting with #) are used as ID if no sent_id found
- Multi-token lines (with ‘-’ in first column) are skipped
- Expects 10 tab-separated columns per data line
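The parsing rules in these notes can be sketched as a small standalone reader. This is an illustration of the rules only, not the library's implementation: iter_conllu_sentences and SAMPLE are hypothetical names, rows are returned as plain column lists rather than UDParse objects, and letting a later non-sent_id comment overwrite an earlier one is an assumption.

```python
import re
from collections.abc import Iterator


def iter_conllu_sentences(content: str) -> Iterator[tuple[str, list[list[str]]]]:
    """Yield (sentence_id, rows) pairs from CoNLL-U text (illustrative only)."""
    count = 0
    # Sentences are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = [line for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        count += 1
        sent_id = f"sent_{count}"  # default ID, numbering starts at 1
        has_sent_id = False
        rows = []
        for line in lines:
            if line.startswith("#"):
                if line.startswith("# sent_id"):
                    # An explicit sent_id always overrides the current ID.
                    sent_id = line.split("=", 1)[-1].strip()
                    has_sent_id = True
                elif not has_sent_id:
                    # Other comments serve as the ID only without a sent_id.
                    sent_id = line[1:].strip()
                continue
            cols = line.split("\t")
            if "-" in cols[0]:
                continue  # skip multi-token range lines such as "1-2"
            if len(cols) != 10:
                raise ValueError(f"expected 10 columns, got {len(cols)}")
            rows.append(cols)
        yield sent_id, rows


# Hypothetical input: one sentence with a sent_id and a multi-token line,
# followed by one sentence with no comments at all.
SAMPLE = (
    "# sent_id = demo-1\n"
    "# text = Don't go\n"
    "1-2\tDon't\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "1\tDo\tdo\tAUX\t_\t_\t3\taux\t_\t_\n"
    "2\tn't\tnot\tPART\t_\t_\t3\tadvmod\t_\t_\n"
    "3\tgo\tgo\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "1\tStop\tstop\tVERB\t_\t_\t0\troot\t_\t_\n"
)
```

With this input, list(iter_conllu_sentences(SAMPLE)) gives two sentences: the first keeps the ID demo-1 with three token rows (the 1-2 range line is skipped), and the second falls back to the default ID sent_2.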
Submodules
- decomp.semantics.predpatt.core
- decomp.semantics.predpatt.extraction
- decomp.semantics.predpatt.parsing
- decomp.semantics.predpatt.rules
- Classes
- Functions
A1, A2, G1, H1, H2, N1, N2, N3, N4, N5, N6, P1, P2, W1, W2, ArgPhraseRule, ArgResolveRelcl, ArgumentResolution, ArgumentRootRule, B, BorrowObj, BorrowSubj, C, CleanArgToken, ConjunctionResolution, CutBorrowObj, CutBorrowOther, CutBorrowSubj, D, DropAppos, DropCc, DropConj, DropUnknown, E, EmbeddedAdvcl, EmbeddedCcomp, EmbeddedUnknown, EnRelclDummyArgFilterArg, EnRelclDummyArgFilterPred, EnglishSpecific, F, I, J, K, L, LanguageSpecific, M, MoveCaseTokenToPred, PredConjBorrowAuxNeg, PredConjBorrowTokensXcomp, PredConjRule, PredPhraseRule, PredResolveRelcl, PredicateHas, PredicateRootRule, Q, R, Rule, ShareArgument, SimplifyRule, SpecialArgDropDirectDep, U, V, a1, a2, arg_resolve_relcl, b, borrow_obj, borrow_subj, c, clean_arg_token, cut_borrow_obj, cut_borrow_other, cut_borrow_subj, d, drop_appos, drop_cc, drop_conj, drop_unknown, e, embedded_advcl, embedded_ccomp, embedded_unknown, en_relcl_dummy_arg_filter, f, g1, gov_looks_like_predicate(), h1, h2, i, j, k, l, l_rule, m, move_case_token_to_pred, n1, n2, n3, n4, n5, n6, p1, p2, pred_conj_borrow_aux_neg, pred_conj_borrow_tokens_xcomp, pred_resolve_relcl, predicate_has, q, r, share_argument, special_arg_drop_direct_dep, u, v, w1, w2
- Submodules
- decomp.semantics.predpatt.rules.base
- decomp.semantics.predpatt.rules.predicate_rules
- decomp.semantics.predpatt.rules.argument_rules
- decomp.semantics.predpatt.rules.helpers
- decomp.semantics.predpatt.filters
- Functions
activate(), apply_filters(), filter_events_NUCL(), filter_events_SPRL(), filter_events_nucl(), filter_events_sprl(), hasSubj(), has_direct_arc(), has_subj(), isGoodAncestor(), isGoodDescendants(), isNotCopula(), isNotHave(), isNotInterrogative(), isNotPronoun(), isPredVerb(), isSbjOrObj(), is_good_ancestor(), is_good_descendants(), is_not_copula(), is_not_have(), is_not_interrogative(), is_not_pronoun(), is_pred_verb(), is_sbj_or_obj()
- Submodules
- decomp.semantics.predpatt.filters.predicate_filters
- decomp.semantics.predpatt.filters.argument_filters
- decomp.semantics.predpatt.corpus
- decomp.semantics.predpatt.graph
- decomp.semantics.predpatt.utils
- decomp.semantics.predpatt.typing