decomp.semantics.predpatt.corpus

Container classes for collections of PredPatt extractions and integration with UDS corpora.

Corpus management for PredPatt semantic extractions.

This module provides functionality for loading and managing collections of PredPatt semantic graphs from CoNLL-U format dependency corpora.

Classes

PredPattCorpus

Container class extending the base Corpus for managing PredPatt semantic extractions paired with their dependency graphs.

class PredPattCorpus

Bases: Corpus[tuple[PredPattEngine, DiGraph], DiGraph]

Container for managing collections of PredPatt semantic graphs.

This class extends the base Corpus class to handle PredPatt extractions paired with their dependency graphs. It provides methods for loading corpora from CoNLL-U format and converting them to NetworkX graphs with semantic annotations.

classmethod from_conll(corpus, name='ewt', options=None)

Load a CoNLL-U dependency corpus and extract predicate-argument structures.

Parses Universal Dependencies format data and applies PredPatt extraction rules to identify predicates and their arguments. Each sentence in the corpus is processed to create a semantic graph.

Parameters:
  • corpus (str | TextIO) – Path to a .conllu file, raw CoNLL-U formatted string, or open file handle

  • name (str, optional) – Corpus name used as prefix for graph identifiers. Default is 'ewt'

  • options (PredPattOpts | None, optional) – Configuration options for PredPatt extraction. If None, uses default options with relative clause resolution and argument borrowing enabled

Returns:

Corpus containing PredPatt extractions and their graphs

Return type:

PredPattCorpus

Raises:

ValueError – If PredPatt cannot parse the provided CoNLL-U data, most likely because it follows an incompatible Universal Dependencies version
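
As a rough illustration of the call described above, the following sketch builds a tiny CoNLL-U string and passes it to from_conll. The two-token sentence, the corpus name 'demo', and the helper load_predpatt are hypothetical and exist only for this example. The import path is taken from the module name at the top of this page, and the import is guarded so the snippet still runs where decomp is not installed.

```python
# Hypothetical two-token sentence; each token line has the 10 CoNLL-U columns:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
ROWS = [
    ("1", "Chris", "Chris", "PROPN", "NNP", "_", "2", "nsubj", "_", "_"),
    ("2", "sleeps", "sleep", "VERB", "VBZ", "_", "0", "root", "_", "_"),
]

# Columns are tab-separated; a blank line terminates the sentence.
CONLLU = "\n".join("\t".join(row) for row in ROWS) + "\n\n"


def load_predpatt(conllu, name="demo"):
    """Hypothetical helper: return a PredPattCorpus, or None on failure."""
    try:
        # Import path assumed from the module name documented on this page.
        from decomp.semantics.predpatt.corpus import PredPattCorpus
    except ImportError:
        return None  # decomp is not installed in this environment
    try:
        # options is left as None, so the documented defaults apply.
        return PredPattCorpus.from_conll(conllu, name=name)
    except ValueError:
        # Documented failure mode: PredPatt could not parse the data.
        return None


if __name__ == "__main__":
    corpus = load_predpatt(CONLLU)
    if corpus is None:
        print("corpus could not be loaded")
    else:
        print(type(corpus).__name__)
```

Passing a path to a .conllu file or an open file handle in place of the raw string should work the same way, according to the parameter description above.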