decomp.semantics.predpatt.parsing¶
Dependency parsing structures and loaders for PredPatt.
This module provides data structures and functions for working with Universal Dependencies parses that serve as input to the PredPatt semantic extraction system.
Classes¶
- DepTriple
Named tuple representing a single dependency relation.
- UDParse
Container for dependency parse trees with tokens and relations.
Functions¶
- load_conllu
Load dependency parses from CoNLL-U format files.
- class DepTriple[source]¶
Bases:
DepTriple
Dependency triple representing a single dependency relation.
A named tuple with three fields representing a dependency edge in the parse tree.
Notes
The __repr__ format shows the relation with dependent first: rel(dep,gov). This ordering (dep before gov) is preserved for compatibility.
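The rel(dep,gov) ordering described in the note can be illustrated with a small stand-in class. This is a minimal sketch, not the library's definition: the class name DepTripleSketch is hypothetical, and the field names rel, gov, and dep are assumptions, since the text above says only that there are three fields.

```python
from collections import namedtuple


class DepTripleSketch(namedtuple("DepTripleSketch", ["rel", "gov", "dep"])):
    """Illustrative stand-in; the field names are assumed, not documented."""

    __slots__ = ()

    def __repr__(self):
        # The dependent is printed before the governor: rel(dep,gov).
        return f"{self.rel}({self.dep},{self.gov})"


# Token 0 is the nsubj dependent of token 1 (indices chosen for illustration).
edge = DepTripleSketch(rel="nsubj", gov=1, dep=0)
print(repr(edge))  # nsubj(0,1)
```

Note that the constructor takes the governor before the dependent while the printed form reverses them, which is the asymmetry the note warns about.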
- class UDParse[source]¶
Bases:
object
Universal Dependencies parse representation.
Container for a dependency parse including tokens, POS tags, and dependency relations.
- Parameters:
- ud¶
The UD module (always set to dep_v1 regardless of parameter).
- Type:
module
- __init__(tokens, tags, triples, ud=None)[source]¶
Initialize UDParse with tokens, tags, and dependency triples.
- latex()[source]¶
Generate LaTeX code for dependency diagram.
Creates LaTeX code using tikz-dependency package for visualization.
- Returns:
UTF-8 encoded LaTeX document.
- Return type:
- toimage()[source]¶
Convert parse diagram to PNG image.
Creates a PNG image of the dependency parse diagram.
- Returns:
Path to the generated PNG file, or None if generation fails.
- Return type:
str | None
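To give a sense of what a tikz-dependency document for a parse might look like, the sketch below builds one from a token list and a list of relations and returns it as UTF-8 bytes, as latex() is described as doing. It is an illustration only and does not reproduce the output of UDParse.latex(). The function name tikz_dependency_document is hypothetical, and the input convention (plain (rel, gov, dep) tuples with 0-based indices and gov == -1 for the root) is an assumption made for the example.

```python
def tikz_dependency_document(tokens, triples):
    """Build a standalone tikz-dependency document as UTF-8 bytes.

    triples holds (rel, gov, dep) tuples with 0-based token indices;
    gov == -1 marks the root. Both conventions are assumptions of this sketch.
    """
    specials = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_"}

    def escape(token):
        # Escape a few characters that LaTeX would otherwise interpret.
        return "".join(specials.get(ch, ch) for ch in token)

    lines = [
        r"\documentclass{standalone}",
        r"\usepackage{tikz-dependency}",
        r"\begin{document}",
        r"\begin{dependency}",
        r"\begin{deptext}",
        # deptext cells are separated by \& and the row ends with \\.
        r" \& ".join(escape(t) for t in tokens) + r" \\",
        r"\end{deptext}",
    ]
    for rel, gov, dep in triples:
        if gov < 0:
            # tikz-dependency numbers words from 1.
            lines.append(r"\deproot{%d}{%s}" % (dep + 1, escape(rel)))
        else:
            # \depedge takes the head first, then the dependent.
            lines.append(r"\depedge{%d}{%d}{%s}" % (gov + 1, dep + 1, escape(rel)))
    lines += [r"\end{dependency}", r"\end{document}"]
    return "\n".join(lines).encode("utf-8")


doc = tikz_dependency_document(
    ["I", "saw", "her"],
    [("nsubj", 1, 0), ("root", -1, 1), ("obj", 1, 2)],
)
print(doc.decode("utf-8"))
```

Because the result is bytes, it needs to be decoded before printing or written to a file opened in binary mode.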
- load_conllu(filename_or_content)[source]¶
Load CoNLL-U style files (e.g., the Universal Dependencies treebank).
- Parameters:
filename_or_content (str) – Either a path to a CoNLL-U file or the content string itself.
- Yields:
tuple[str, UDParse] – Tuples of (sentence_id, parse) for each sentence in the file.
- Return type:
Notes
- Sentence IDs default to “sent_N”, where N starts at 1.
- Lines starting with “# sent_id” override the sentence ID.
- Other comment lines (starting with #) are used as the ID if no sent_id is found.
- Multi-token lines (with ‘-’ in the first column) are skipped.
- Each data line is expected to have 10 tab-separated columns.
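The rules in these notes can be sketched as a small standalone reader. This is a simplified illustration under stated assumptions, not the code behind load_conllu: the helper name iter_conllu_sentences is hypothetical, it accepts only a content string (not a file path), it yields raw column lists rather than UDParse objects, and when several non-sent_id comments appear it keeps the last one.

```python
def iter_conllu_sentences(content):
    """Yield (sentence_id, rows) pairs from a CoNLL-U content string.

    Hypothetical helper that mirrors the documented notes only.
    """
    blocks = [b for b in content.strip().split("\n\n") if b.strip()]
    for n, block in enumerate(blocks, start=1):
        sent_id = f"sent_{n}"  # default ID, counting from 1
        has_sent_id = False
        rows = []
        for line in block.split("\n"):
            if not line.strip():
                continue
            if line.startswith("#"):
                if line.startswith("# sent_id"):
                    # An explicit sent_id always wins.
                    sent_id = line[len("# sent_id"):].lstrip(" =").strip()
                    has_sent_id = True
                elif not has_sent_id:
                    # Fall back to any other comment text as the ID.
                    sent_id = line[1:].strip()
                continue
            cols = line.split("\t")
            if "-" in cols[0]:
                continue  # skip multi-token lines such as "1-2"
            if len(cols) != 10:
                raise ValueError(f"expected 10 columns, got {len(cols)}")
            rows.append(cols)
        yield sent_id, rows


sample = (
    "# sent_id = demo-1\n"
    "1\tShe\tshe\tPRON\tPRP\t_\t2\tnsubj\t_\t_\n"
    "2\tleft\tleave\tVERB\tVBD\t_\t0\troot\t_\t_\n"
    "\n"
    "1-2\tdon't\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "1\tdo\tdo\tAUX\tVBP\t_\t0\troot\t_\t_\n"
    "2\tn't\tnot\tPART\tRB\t_\t1\tadvmod\t_\t_\n"
)
results = list(iter_conllu_sentences(sample))
for sent_id, rows in results:
    print(sent_id, [cols[1] for cols in rows])
```

The first sentence takes its ID from the sent_id comment, while the second has no comments and falls back to the positional default; its "1-2" multi-token line is dropped before the column check.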
Submodules¶
decomp.semantics.predpatt.parsing.udparse¶
Universal Dependencies parse representation and visualization.
This module provides data structures for representing and visualizing Universal Dependencies (UD) parse trees. It includes classes for storing dependency relations and methods for pretty-printing and visualizing parse structures.
The UDParse class supports various output formats including pretty-printed text, LaTeX diagrams, and PDF visualization.
Classes¶
- DepTriple
Named tuple representing a single dependency relation.
- UDParse
Container for complete dependency parse with tokens and relations.
- class DepTriple[source]¶
Bases:
DepTriple
Dependency triple representing a single dependency relation.
A named tuple with three fields representing a dependency edge in the parse tree.
Notes
The __repr__ format shows the relation with dependent first: rel(dep,gov). This ordering (dep before gov) is preserved for compatibility.
- class UDParse[source]¶
Bases:
object
Universal Dependencies parse representation.
Container for a dependency parse including tokens, POS tags, and dependency relations.
- Parameters:
- ud¶
The UD module (always set to dep_v1 regardless of parameter).
- Type:
module
- __init__(tokens, tags, triples, ud=None)[source]¶
Initialize UDParse with tokens, tags, and dependency triples.
- latex()[source]¶
Generate LaTeX code for dependency diagram.
Creates LaTeX code using tikz-dependency package for visualization.
- Returns:
UTF-8 encoded LaTeX document.
- Return type:
decomp.semantics.predpatt.parsing.loader¶
Load different sources of data.
This module provides functions to load dependency parses from various formats, particularly focusing on CoNLL-U format files.
- load_comm(filename, tool='ud converted ptb trees using pyStanfordDependencies')[source]¶
Load a concrete communication file with required pyStanfordDependencies output.
Warning
This function is part of a planned parsing feature that is not yet fully supported. It requires the concrete package (available via pip install decomp[parsing]). Full parsing functionality with modern UD parsers will be added in a future release.
- Parameters:
- Yields:
tuple[str, UDParse] – Tuples of (section_label, parse) for each sentence.
- Raises:
ImportError – If the concrete package is not installed.
- Return type:
- load_conllu(filename_or_content)[source]¶
Load CoNLL-U style files (e.g., the Universal Dependencies treebank).
- Parameters:
filename_or_content (str) – Either a path to a CoNLL-U file or the content string itself.
- Yields:
tuple[str, UDParse] – Tuples of (sentence_id, parse) for each sentence in the file.
- Return type:
Notes
- Sentence IDs default to “sent_N”, where N starts at 1.
- Lines starting with “# sent_id” override the sentence ID.
- Other comment lines (starting with #) are used as the ID if no sent_id is found.
- Multi-token lines (with ‘-’ in the first column) are skipped.
- Each data line is expected to have 10 tab-separated columns.
- get_tags(tokenization, tagging_type='POS')[source]¶
Extract tags of a specific type from a tokenization.
Note
This function requires the concrete package to be installed.