decomp.semantics.predpatt.parsing

Data structures and functions for working with Universal Dependencies parses that serve as input to the PredPatt semantic extraction system.

Dependency parsing structures and loaders for PredPatt.

Classes

DepTriple

Named tuple representing a single dependency relation.

UDParse

Container for dependency parse trees with tokens and relations.

Functions

load_conllu

Load dependency parses from CoNLL-U format files.

class DepTriple

Bases: DepTriple (the generated named-tuple base class)

Dependency triple representing a single dependency relation.

A named tuple with three fields representing a dependency edge in the parse tree.

rel

The dependency relation type (e.g., ‘nsubj’, ‘dobj’).

Type:

str

gov

The governor (head) of the dependency. Can be a token index or a Token object.

Type:

int | Token

dep

The dependent of the dependency. Can be a token index or a Token object.

Type:

int | Token

Notes

The __repr__ format shows the relation with dependent first: rel(dep,gov). This ordering (dep before gov) is preserved for compatibility.

__repr__()

Return string representation in format rel(dep,gov).

Note that dependent comes before governor in the output.

Returns:

String representation like ‘nsubj(0,2)’.

Return type:

str
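Because the repr lists the dependent before the governor, it is easy to misread. The following is a minimal stand-in that mirrors the documented fields and __repr__ behavior. It is an illustrative sketch, not the class shipped in decomp, and the field order rel, gov, dep is an assumption based on the attribute listing above.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the documented DepTriple fields;
# the real class is defined in decomp.semantics.predpatt.parsing.udparse.
_DepTripleBase = namedtuple("DepTriple", ["rel", "gov", "dep"])


class DepTriple(_DepTripleBase):
    """A dependency edge whose repr lists the dependent before the governor."""

    __slots__ = ()  # keep instances as lightweight as a plain namedtuple

    def __repr__(self) -> str:
        return f"{self.rel}({self.dep},{self.gov})"


edge = DepTriple(rel="nsubj", gov=2, dep=0)
print(repr(edge))  # nsubj(0,2)
```

Using keyword arguments when constructing edges sidesteps any confusion between the tuple field order and the repr order.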

class UDParse

Bases: object

Universal Dependencies parse representation.

Container for a dependency parse including tokens, POS tags, and dependency relations.

Parameters:
  • tokens (list[str | Token]) – List of tokens (strings or Token objects) in the sentence.

  • tags (list[str]) – List of POS tags corresponding to tokens.

  • triples (list[DepTriple]) – List of dependency relations in the parse.

  • ud (module, optional) – Universal Dependencies module (ignored; dep_v1 is always used).

ud

The UD module (always set to dep_v1, regardless of the ud argument).

Type:

module

tokens

List of tokens in the sentence.

Type:

list

tags

List of POS tags.

Type:

list[str]

triples

List of dependency relations.

Type:

list[DepTriple]

governor

Maps dependent index/token to its governing DepTriple.

Type:

dict

dependents

Maps governor index/token to list of dependent DepTriples.

Type:

defaultdict[list]
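The governor and dependents attributes are lookup tables over the same triples: one incoming edge per dependent, any number of outgoing edges per governor. A minimal sketch of how such indices can be derived follows. The helper build_indices is hypothetical, and the convention of 0-based token indices with the root attached to governor -1 is an assumption made for the example.

```python
from collections import defaultdict, namedtuple

# Stand-in edge type with the documented fields (assumed order: rel, gov, dep).
DepTriple = namedtuple("DepTriple", ["rel", "gov", "dep"])


def build_indices(triples):
    """Hypothetical helper: derive governor/dependents maps from the edges."""
    # In a tree, each dependent has exactly one incoming edge.
    governor = {t.dep: t for t in triples}
    # A governor can have any number of outgoing edges.
    dependents = defaultdict(list)
    for t in triples:
        dependents[t.gov].append(t)
    return governor, dependents


# "Chris loves cats" with 0-based indices; the root hangs off -1 (assumed).
triples = [
    DepTriple("nsubj", 1, 0),
    DepTriple("root", -1, 1),
    DepTriple("dobj", 1, 2),
]
governor, dependents = build_indices(triples)
print(governor[0].rel)                 # nsubj
print([t.dep for t in dependents[1]])  # [0, 2]
```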

__init__(tokens, tags, triples, ud=None)

Initialize UDParse with tokens, tags, and dependency triples.

Parameters:
  • tokens (list[str | Token]) – List of tokens (strings or Token objects).

  • tags (list[str]) – List of POS tags.

  • triples (list[DepTriple]) – List of dependency relations.

  • ud (module, optional) – UD module (ignored; dep_v1 is always used).

latex()

Generate LaTeX code for dependency diagram.

Creates LaTeX code that uses the tikz-dependency package to draw the diagram.

Returns:

UTF-8 encoded LaTeX document.

Return type:

bytes
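To show what a tikz-dependency diagram of a parse looks like, here is a minimal sketch that emits one as UTF-8 bytes. It is not the output of the real latex() method: the standalone preamble, the extra POS row, the helper name to_tikz_dependency, and the convention that a governor of -1 marks the root are all assumptions. tikz-dependency numbers words from 1, so 0-based indices are shifted by one, and tokens are not escaped.

```python
from collections import namedtuple

DepTriple = namedtuple("DepTriple", ["rel", "gov", "dep"])


def to_tikz_dependency(tokens, tags, triples) -> bytes:
    """Hypothetical sketch: build a tikz-dependency document as UTF-8 bytes.

    Assumes 0-based token indices with the root's governor set to -1.
    Tokens are not escaped, so LaTeX special characters (&, %, $) would break it.
    """
    lines = [
        r"\documentclass{standalone}",  # preamble is an assumption
        r"\usepackage{tikz-dependency}",
        r"\begin{document}",
        r"\begin{dependency}",
        r"\begin{deptext}",
        " \\& ".join(tokens) + r" \\",  # word row; columns separated by \&
        " \\& ".join(tags) + r" \\",    # POS row
        r"\end{deptext}",
    ]
    for t in triples:
        if t.gov < 0:
            lines.append(rf"\deproot{{{t.dep + 1}}}{{{t.rel}}}")
        else:
            # \depedge{head}{dependent}{label}, with 1-based word positions.
            lines.append(rf"\depedge{{{t.gov + 1}}}{{{t.dep + 1}}}{{{t.rel}}}")
    lines += [r"\end{dependency}", r"\end{document}"]
    return "\n".join(lines).encode("utf-8")


doc = to_tikz_dependency(
    ["Chris", "loves", "cats"],
    ["PROPN", "VERB", "NOUN"],
    [DepTriple("nsubj", 1, 0), DepTriple("root", -1, 1), DepTriple("dobj", 1, 2)],
)
print(doc.decode("utf-8"))
```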

pprint(color=False, k=1)

Return a pretty-printed listing of the dependencies.

Parameters:
  • color (bool, optional) – Whether to use colored output (default: False).

  • k (int, optional) – Number of columns to use (default: 1).

Returns:

Formatted string representation of dependencies.

Return type:

str
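The k parameter spreads the listing over several columns. The sketch below shows one plausible way to do that layout with the standard library only. The helper format_columns, its entry format rel(dep_token/dep, gov_token/gov), the ROOT label for governor -1, and the ordering by dependent index are all assumptions; the exact formatting of the real method may differ, and color output is omitted.

```python
from collections import namedtuple

DepTriple = namedtuple("DepTriple", ["rel", "gov", "dep"])


def format_columns(tokens, triples, k=1):
    """Hypothetical sketch of a k-column dependency listing (k >= 1, no color)."""
    names = list(tokens) + ["ROOT"]  # index -1 then resolves to "ROOT"
    entries = [
        f"{t.rel}({names[t.dep]}/{t.dep}, {names[t.gov]}/{t.gov})"
        for t in sorted(triples, key=lambda t: t.dep)
    ]
    # Deal entries round-robin so each row reads left to right in dep order.
    cols = [entries[i::k] for i in range(k)]
    width = max((len(e) for e in entries), default=0)
    rows = []
    for r in range(len(cols[0])):
        cells = [col[r] if r < len(col) else "" for col in cols]
        rows.append("  ".join(cell.ljust(width) for cell in cells).rstrip())
    return "\n".join(rows)


triples = [DepTriple("nsubj", 1, 0), DepTriple("root", -1, 1), DepTriple("dobj", 1, 2)]
print(format_columns(["Chris", "loves", "cats"], triples, k=2))
```

Dealing entries round-robin keeps the first column the longest, so padding only ever has to fill cells at the end of the last row.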

toimage()

Convert parse diagram to PNG image.

Creates a PNG image of the dependency parse diagram.

Returns:

Path to the generated PNG file, or None if generation fails.

Return type:

str | None

view(do_open=True)

Open a dependency parse diagram of the sentence.

Requires pdflatex to be on the PATH and Daniele Pighin’s tikz-dependency.sty to be in the current directory.

Parameters:

do_open (bool, optional) – Whether to open the PDF file (default: True).

Returns:

Path to the generated PDF file, or None if generation fails.

Return type:

str | None
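Since view() and toimage() return None when generation fails, it can help to check the two documented requirements up front. The following is a minimal preflight sketch; the helper missing_view_requirements is hypothetical and only checks for the presence of pdflatex and the .sty file, not whether compilation will succeed.

```python
import os
import shutil


def missing_view_requirements(directory="."):
    """Hypothetical preflight check for the requirements listed for view().

    Returns a list of human-readable problems; an empty list means both
    requirements appear to be satisfied.
    """
    problems = []
    if shutil.which("pdflatex") is None:
        problems.append("pdflatex not found on PATH")
    if not os.path.isfile(os.path.join(directory, "tikz-dependency.sty")):
        problems.append("tikz-dependency.sty not found in " + os.path.abspath(directory))
    return problems


for problem in missing_view_requirements():
    print("view() will likely fail:", problem)
```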

load_conllu(filename_or_content)

Load CoNLL-U style files (e.g., the Universal Dependencies treebank).

Parameters:

filename_or_content (str) – Either a path to a CoNLL-U file or the content string itself.

Yields:

tuple[str, UDParse] – Tuples of (sentence_id, parse) for each sentence in the file.

Return type:

Iterator[tuple[str, UDParse]]

Notes

  • Sentence IDs default to “sent_N” where N starts at 1

  • Lines starting with “# sent_id” override the sentence ID

  • Other comment lines (starting with #) are used as the ID if no sent_id line is found

  • Multi-token lines (with ‘-’ in the first column, e.g. 1-2) are skipped

  • Expects 10 tab-separated columns per data line
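The notes above describe the reading rules. A minimal sketch of a reader that follows them is given below. To stay self-contained it yields plain (sentence_id, tokens, tags, triples) tuples instead of UDParse objects, and the helper name iter_conllu is hypothetical. Several details are assumptions rather than documented behavior: tags are taken from the UPOS column, indices are 0-based with the root's governor set to -1, the "# sent_id = X" form is stripped of its "=", and empty nodes (decimal IDs such as 8.1) are not handled.

```python
import os
from collections import namedtuple

# Stand-in edge type with the documented fields (assumed order: rel, gov, dep).
DepTriple = namedtuple("DepTriple", ["rel", "gov", "dep"])


def iter_conllu(filename_or_content):
    """Hypothetical sketch: yield (sentence_id, tokens, tags, triples) per sentence."""
    if os.path.isfile(filename_or_content):
        with open(filename_or_content, encoding="utf-8") as f:
            content = f.read()
    else:
        content = filename_or_content  # treat the argument as CoNLL-U text
    sent_num = 1
    for block in content.strip().split("\n\n"):
        block = block.strip()
        if not block:
            continue
        sent_id = f"sent_{sent_num}"  # default ID, numbered from 1
        has_sent_id = False
        tokens, tags, triples = [], [], []
        for line in block.split("\n"):
            if line.startswith("#"):
                if line.startswith("# sent_id"):
                    # Assumes the "# sent_id = X" form; strips the "=" separator.
                    sent_id = line[len("# sent_id"):].lstrip(" =").strip()
                    has_sent_id = True
                elif not has_sent_id:
                    sent_id = line[1:].strip()  # fallback: any other comment
                continue
            cols = line.split("\t")
            if "-" in cols[0]:
                continue  # multi-token range line such as "1-2"
            if len(cols) != 10:
                raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
            tokens.append(cols[1])  # FORM
            tags.append(cols[3])    # UPOS (assumed tag column)
            # HEAD is 1-based with 0 meaning root, so the root's governor becomes -1.
            triples.append(DepTriple(cols[7], int(cols[6]) - 1, len(tokens) - 1))
        yield sent_id, tokens, tags, triples
        sent_num += 1


sample = (
    "# sent_id = s1\n"
    "# text = Chris loves cats\n"
    "1\tChris\tChris\tPROPN\tNNP\t_\t2\tnsubj\t_\t_\n"
    "2\tloves\tlove\tVERB\tVBZ\t_\t0\troot\t_\t_\n"
    "3\tcats\tcat\tNOUN\tNNS\t_\t2\tobj\t_\t_\n"
    "\n"
    "1\tHi\thi\tINTJ\tUH\t_\t0\troot\t_\t_\n"
)
for sid, toks, _, edges in iter_conllu(sample):
    print(sid, toks, edges)
```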

Submodules

decomp.semantics.predpatt.parsing.udparse

Universal Dependencies parse representation and visualization.

This module provides data structures for representing and visualizing Universal Dependencies (UD) parse trees. It includes classes for storing dependency relations and methods for pretty-printing and visualizing parse structures.

The UDParse class supports several output formats: pretty-printed text, LaTeX diagrams, PDF visualization, and PNG images.

Classes

DepTriple

Named tuple representing a single dependency relation.

UDParse

Container for a complete dependency parse with tokens and relations.

class DepTriple

Bases: DepTriple (the generated named-tuple base class)

Dependency triple representing a single dependency relation.

A named tuple with three fields representing a dependency edge in the parse tree.

rel

The dependency relation type (e.g., ‘nsubj’, ‘dobj’).

Type:

str

gov

The governor (head) of the dependency. Can be a token index or a Token object.

Type:

int | Token

dep

The dependent of the dependency. Can be a token index or a Token object.

Type:

int | Token

Notes

The __repr__ format shows the relation with dependent first: rel(dep,gov). This ordering (dep before gov) is preserved for compatibility.

__repr__()

Return string representation in format rel(dep,gov).

Note that dependent comes before governor in the output.

Returns:

String representation like ‘nsubj(0,2)’.

Return type:

str

class UDParse

Bases: object

Universal Dependencies parse representation.

Container for a dependency parse including tokens, POS tags, and dependency relations.

Parameters:
  • tokens (list[str | Token]) – List of tokens (strings or Token objects) in the sentence.

  • tags (list[str]) – List of POS tags corresponding to tokens.

  • triples (list[DepTriple]) – List of dependency relations in the parse.

  • ud (module, optional) – Universal Dependencies module (ignored; dep_v1 is always used).

ud

The UD module (always set to dep_v1, regardless of the ud argument).

Type:

module

tokens

List of tokens in the sentence.

Type:

list

tags

List of POS tags.

Type:

list[str]

triples

List of dependency relations.

Type:

list[DepTriple]

governor

Maps dependent index/token to its governing DepTriple.

Type:

dict

dependents

Maps governor index/token to list of dependent DepTriples.

Type:

defaultdict[list]

__init__(tokens, tags, triples, ud=None)

Initialize UDParse with tokens, tags, and dependency triples.

Parameters:
  • tokens (list[str | Token]) – List of tokens (strings or Token objects).

  • tags (list[str]) – List of POS tags.

  • triples (list[DepTriple]) – List of dependency relations.

  • ud (module, optional) – UD module (ignored; dep_v1 is always used).

pprint(color=False, k=1)

Return a pretty-printed listing of the dependencies.

Parameters:
  • color (bool, optional) – Whether to use colored output (default: False).

  • k (int, optional) – Number of columns to use (default: 1).

Returns:

Formatted string representation of dependencies.

Return type:

str

latex()

Generate LaTeX code for dependency diagram.

Creates LaTeX code that uses the tikz-dependency package to draw the diagram.

Returns:

UTF-8 encoded LaTeX document.

Return type:

bytes

view(do_open=True)

Open a dependency parse diagram of the sentence.

Requires pdflatex to be on the PATH and Daniele Pighin’s tikz-dependency.sty to be in the current directory.

Parameters:

do_open (bool, optional) – Whether to open the PDF file (default: True).

Returns:

Path to the generated PDF file, or None if generation fails.

Return type:

str | None

toimage()

Convert parse diagram to PNG image.

Creates a PNG image of the dependency parse diagram.

Returns:

Path to the generated PNG file, or None if generation fails.

Return type:

str | None

decomp.semantics.predpatt.parsing.loader

Loaders for different sources of dependency-parse data.

This module provides functions to load dependency parses from various formats, particularly focusing on CoNLL-U format files.

load_comm(filename, tool='ud converted ptb trees using pyStanfordDependencies')

Load a Concrete Communication file that contains the required pyStanfordDependencies output.

Warning

This function is part of a planned parsing feature that is not yet fully supported. It requires the concrete package (available via pip install decomp[parsing]). Full parsing functionality with modern UD parsers will be added in a future release.

Parameters:
  • filename (str) – Path to the Concrete Communication file.

  • tool (str, optional) – The tool name to look for in the dependency parse metadata.

Yields:

tuple[str, UDParse] – Tuples of (section_label, parse) for each sentence.

Raises:

ImportError – If the concrete package is not installed.

Return type:

Iterator[tuple[str, UDParse]]

load_conllu(filename_or_content)

Load CoNLL-U style files (e.g., the Universal Dependencies treebank).

Parameters:

filename_or_content (str) – Either a path to a CoNLL-U file or the content string itself.

Yields:

tuple[str, UDParse] – Tuples of (sentence_id, parse) for each sentence in the file.

Return type:

Iterator[tuple[str, UDParse]]

Notes

  • Sentence IDs default to “sent_N” where N starts at 1

  • Lines starting with “# sent_id” override the sentence ID

  • Other comment lines (starting with #) are used as the ID if no sent_id line is found

  • Multi-token lines (with ‘-’ in the first column, e.g. 1-2) are skipped

  • Expects 10 tab-separated columns per data line

get_tags(tokenization, tagging_type='POS')

Extract tags of a specific type from a tokenization.

Note

This function requires the concrete package to be installed.

Parameters:
  • tokenization (Tokenization) – A Concrete tokenization object.

  • tagging_type (str, optional) – The type of tagging to extract (default: ‘POS’).

Returns:

List of tags in token order.

Return type:

list[str]
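The idea behind get_tags can be sketched without installing concrete: pick the tagging whose type matches, then order its tags by token index. This is a hypothetical re-creation, not the library code. It assumes the Concrete Tokenization layout (a tokenTaggingList whose entries carry taggingType and a taggedTokenList of items with tokenIndex and tag), uses SimpleNamespace objects as stand-ins for Concrete objects, and returning an empty list when no tagging matches is also an assumption.

```python
from types import SimpleNamespace


def get_tags_sketch(tokenization, tagging_type="POS"):
    """Hypothetical re-creation: collect tags of one type in token order.

    Assumes the Concrete Tokenization layout: tokenTaggingList entries with
    taggingType and taggedTokenList items carrying tokenIndex and tag.
    """
    for tagging in tokenization.tokenTaggingList or []:
        if tagging.taggingType == tagging_type:
            by_index = {tt.tokenIndex: tt.tag for tt in tagging.taggedTokenList}
            return [by_index[i] for i in sorted(by_index)]
    return []  # assumption: no matching tagging yields an empty list


# SimpleNamespace objects stand in for Concrete thrift objects here.
tt = SimpleNamespace
tokenization = SimpleNamespace(tokenTaggingList=[
    SimpleNamespace(taggingType="NER", taggedTokenList=[tt(tokenIndex=0, tag="PERSON")]),
    SimpleNamespace(taggingType="POS", taggedTokenList=[
        tt(tokenIndex=1, tag="VBZ"), tt(tokenIndex=0, tag="NNP"), tt(tokenIndex=2, tag="NNS"),
    ]),
])
print(get_tags_sketch(tokenization))
```

Sorting by tokenIndex guards against tagged tokens being stored out of order.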

get_udparse(sent, tool)

Create a UDParse from a sentence extracted from a Communication.

Note

This function requires the concrete package to be installed.

Parameters:
  • sent (Sentence) – A Concrete Sentence object.

  • tool (str) – The tool name to look for in dependency parse metadata.

Returns:

The parsed representation of the sentence.

Return type:

UDParse