decomp.semantics.uds

Universal Decompositional Semantics (UDS) representation framework.

This module provides classes for working with Universal Decompositional Semantics (UDS) datasets. UDS is a semantic annotation framework that captures a range of semantic properties of natural-language text through real-valued annotations on predicate-argument structures.

The module is organized hierarchically:

  • Annotations (annotation): Provides classes for handling UDS property annotations in both raw (multi-annotator) and normalized (aggregated) formats.

  • Graphs (graph): Implements graph representations at sentence and document levels, integrating syntactic dependency structures with semantic annotations.

  • Documents (document): Represents complete documents containing multiple sentences with their associated graphs and metadata.

  • Corpus (corpus): Manages collections of UDS documents and provides functionality for loading, querying, and serializing UDS datasets.

Classes

NormalizedUDSAnnotation

Annotations with aggregated values and confidence scores from multiple annotators.

RawUDSAnnotation

Annotations preserving individual annotator responses before aggregation.

UDSSentenceGraph

Graph representation of a single sentence with syntax and semantics layers.

UDSDocumentGraph

Graph connecting multiple sentence graphs within a document.

UDSDocument

Container for sentence graphs and document-level annotations.

UDSCorpus

Collection of UDS documents with support for various data formats and queries.

Notes

The UDS framework builds upon the PredPatt system for extracting predicate-argument structures and extends it with rich semantic annotations. All graph representations use NetworkX for the underlying graph structure and support SPARQL queries via RDF conversion.

class NormalizedUDSAnnotation

Bases: UDSAnnotation

A normalized Universal Decompositional Semantics annotation.

Properties in a NormalizedUDSAnnotation may have only a single str, int, or float value and a single str, int, or float confidence.

Parameters:
  • metadata (UDSAnnotationMetadata) – The metadata for the annotations.

  • data (dict[str, dict[str, dict[str, dict[str, TypeAliasType]]]]) – A mapping from graph identifiers to node/edge identifiers to property subspaces to properties to a value and a confidence. Edge identifiers must be represented as NODEID1%%NODEID2, and node identifiers must not contain %%.

__init__(metadata, data)
classmethod from_json(jsonfile)

Load a dataset of normalized annotations from a JSON file.

For node annotations, the format of the JSON passed to this class method must be:

{GRAPHID_1: {NODEID_1_1: DATA,
             ...},
 GRAPHID_2: {NODEID_2_1: DATA,
             ...},
 ...
}

Edge annotations should be of the form:

{GRAPHID_1: {NODEID_1_1%%NODEID_1_2: DATA,
             ...},
 GRAPHID_2: {NODEID_2_1%%NODEID_2_2: DATA,
             ...},
 ...
}

Graph and node identifiers must match the graph and node identifiers of the predpatt graphs to which the annotations will be added.

DATA in the above is assumed to have the following structure:

{SUBSPACE_1: {PROP_1_1: {'value': VALUE,
                        'confidence': VALUE},
             ...},
 SUBSPACE_2: {PROP_2_1: {'value': VALUE,
                         'confidence': VALUE},
             ...},
}

VALUE in the above is assumed to be unstructured.

Return type:

NormalizedUDSAnnotation
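As a rough illustration of the nesting and identifier conventions above, the following sketch builds a small dict in the documented normalized layout and checks it using only the Python standard library. The graph, node, subspace, and property names are made up for illustration, and split_item_id and check_normalized are hypothetical helpers, not part of decomp; the real loader may impose requirements this sketch does not model (for example, the metadata the constructor takes).

```python
# Hypothetical helpers illustrating the documented normalized layout:
# graph ID -> node/edge ID -> subspace -> property -> {"value", "confidence"}.
# They are not part of decomp.

def split_item_id(item_id):
    """Return (node,) for a node ID or (source, target) for an edge ID."""
    parts = tuple(item_id.split("%%"))
    # Edge IDs are NODEID1%%NODEID2 and node IDs may not contain %%,
    # so anything other than one or two non-empty parts is malformed.
    if len(parts) > 2 or not all(parts):
        raise ValueError(f"malformed node/edge identifier: {item_id!r}")
    return parts


def check_normalized(data):
    """Raise ValueError if `data` violates the documented normalized layout."""
    for graph_id, items in data.items():
        for item_id, subspaces in items.items():
            split_item_id(item_id)
            for subspace, props in subspaces.items():
                for prop, entry in props.items():
                    if not {"value", "confidence"}.issubset(entry):
                        raise ValueError(
                            f"{graph_id}/{item_id}/{subspace}/{prop}: "
                            "expected 'value' and 'confidence' keys"
                        )


# Illustrative data: one node annotation and one edge annotation.
example = {
    "ewt-dev-1": {
        "ewt-dev-1-semantics-pred-7": {
            "factuality": {"factual": {"value": 1.2, "confidence": 0.9}},
        },
        "ewt-dev-1-semantics-pred-7%%ewt-dev-1-semantics-arg-3": {
            "protoroles": {"volition": {"value": 0.5, "confidence": 0.7}},
        },
    },
}

check_normalized(example)  # passes silently
```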

class RawUDSAnnotation

Bases: UDSAnnotation

A raw Universal Decompositional Semantics dataset.

Unlike decomp.semantics.uds.NormalizedUDSAnnotation, objects of this class may have multiple annotations for a particular attribute. Each annotation is associated with an annotator ID, and different annotators may have annotated different numbers of items.

Parameters:
  • metadata (UDSAnnotationMetadata) – The metadata for the annotations.

  • data – A mapping from graph identifiers to node/edge identifiers to property subspaces to properties to values and confidences for each annotator. Edge identifiers must be represented as NODEID1%%NODEID2, and node identifiers must not contain %%.

__init__(metadata, data)
annotators(subspace=None, prop=None)

Get annotator IDs for a subspace and property.

If neither subspace nor property is specified, all annotator IDs are returned. If only the subspace is specified, all annotator IDs for that subspace are returned.

Parameters:
  • subspace (str | None, optional) – The subspace to filter by

  • prop (str | None, optional) – The property to filter by

Returns:

Set of annotator IDs, or None if no annotators are found

Return type:

set[str] | None
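The filtering rules above can be sketched over a plain dict in the raw layout documented for from_json. This is a hypothetical stand-alone function, not the library's implementation, and the subspace, property, and annotator names are illustrative. It also handles a property given without a subspace by matching that property name in every subspace, a case the description above leaves unspecified.

```python
# Hypothetical re-implementation of the annotator lookup over plain dicts in
# the raw layout (per-annotator "value" and "confidence" dicts); not decomp's code.

def annotators(data, subspace=None, prop=None):
    """Return the annotator IDs matching subspace/prop, or None if there are none."""
    found = set()
    for items in data.values():
        for subspaces in items.values():
            for sub, props in subspaces.items():
                if subspace is not None and sub != subspace:
                    continue
                for name, entry in props.items():
                    if prop is not None and name != prop:
                        continue
                    # Annotator IDs are the keys of the per-annotator value dict.
                    found.update(entry["value"])
    return found or None


raw = {
    "g1": {
        "g1-pred-1": {
            "factuality": {
                "factual": {"value": {"a1": 1.0, "a2": 0.0},
                            "confidence": {"a1": 4, "a2": 2}},
            },
            "time": {
                "duration": {"value": {"a3": 5}, "confidence": {"a3": 1}},
            },
        },
    },
}

print(sorted(annotators(raw)))                # ['a1', 'a2', 'a3']
print(sorted(annotators(raw, "factuality")))  # ['a1', 'a2']
print(annotators(raw, "genericity"))          # None
```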

classmethod from_json(jsonfile)

Load a dataset for raw annotations from a JSON file.

For node annotations, the format of the JSON passed to this class method must be:

{GRAPHID_1: {NODEID_1_1: DATA,
             ...},
 GRAPHID_2: {NODEID_2_1: DATA,
             ...},
 ...
}

Edge annotations should be of the form:

{GRAPHID_1: {NODEID_1_1%%NODEID_1_2: DATA,
             ...},
 GRAPHID_2: {NODEID_2_1%%NODEID_2_2: DATA,
             ...},
 ...
}

Graph and node identifiers must match the graph and node identifiers of the predpatt graphs to which the annotations will be added.

DATA in the above is assumed to have the following structure:

{SUBSPACE_1: {PROP_1_1: {'value': {
                            ANNOTATOR1: VALUE1,
                            ANNOTATOR2: VALUE2,
                            ...
                                  },
                         'confidence': {
                            ANNOTATOR1: CONF1,
                            ANNOTATOR2: CONF2,
                            ...
                                       }
                        },
              PROP_1_2: {'value': {
                            ANNOTATOR1: VALUE1,
                            ANNOTATOR2: VALUE2,
                            ...
                                  },
                         'confidence': {
                            ANNOTATOR1: CONF1,
                            ANNOTATOR2: CONF2,
                            ...
                                       }
                        },
              ...},
 SUBSPACE_2: {PROP_2_1: {'value': {
                            ANNOTATOR3: VALUE1,
                            ANNOTATOR4: VALUE2,
                            ...
                                  },
                         'confidence': {
                            ANNOTATOR3: CONF1,
                            ANNOTATOR4: CONF2,
                            ...
                                       }
                        },
             ...},
...}

VALUEi and CONFi are assumed to be unstructured.

Return type:

RawUDSAnnotation
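To make the relationship between the raw and normalized layouts concrete, the sketch below collapses raw per-annotator entries into single value/confidence pairs: numeric values are averaged, non-numeric values are resolved by majority vote, and confidences are averaged. This aggregation rule is an assumption chosen for illustration; it is not necessarily how the normalized UDS annotations were produced, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def aggregate_entry(entry):
    """Collapse {'value': {ann: v}, 'confidence': {ann: c}} into one pair."""
    values = list(entry["value"].values())
    confidences = list(entry["confidence"].values())
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    # Illustrative rule: mean for numeric values, most common label otherwise.
    value = mean(values) if numeric else Counter(values).most_common(1)[0][0]
    return {"value": value, "confidence": mean(confidences)}


def raw_to_normalized(raw):
    """Apply aggregate_entry at every graph/item/subspace/property."""
    return {
        graph_id: {
            item_id: {
                subspace: {prop: aggregate_entry(e) for prop, e in props.items()}
                for subspace, props in subspaces.items()
            }
            for item_id, subspaces in items.items()
        }
        for graph_id, items in raw.items()
    }


raw = {
    "g1": {
        "g1-pred-1": {
            "factuality": {
                "factual": {"value": {"a1": 1.0, "a2": 0.0},
                            "confidence": {"a1": 4, "a2": 2}},
            },
        },
    },
}

print(raw_to_normalized(raw))
```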

items(annotation_type=None, annotator_id=None)

Dictionary-like items generator for attributes.

This method behaves exactly like UDSAnnotation.items, except that, if an annotator ID is passed, it generates only items annotated by the specified annotator.

Parameters:
  • annotation_type (str | None, default: None) – Whether to return node annotations, edge annotations, or both (default)

  • annotator_id (str | None, default: None) – The annotator whose annotations will be returned by the generator (defaults to all annotators)

Raises:

ValueError – If both annotation_type and annotator_id are passed and the specified annotator has no annotations of that type

Return type:

TypeAliasType
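The annotator filter described above can be sketched as a generator over the raw layout. Everything here is an assumption for illustration: the function name, the (graph_id, item_id, subspaces) tuples it yields, and the use of the strings "node" and "edge" for annotation_type; decomp's own method may yield a different structure. As documented, it raises ValueError when both filters are given and the annotator has nothing of the requested type; because this is a generator, the error surfaces on first iteration.

```python
def items_by_annotator(data, annotation_type=None, annotator_id=None):
    """Yield (graph_id, item_id, subspaces), optionally filtered.

    annotation_type: "node", "edge", or None for both (assumed values).
    annotator_id: keep only this annotator's value and confidence.
    """
    results = []
    for graph_id, graph_items in data.items():
        for item_id, subspaces in graph_items.items():
            # Edge identifiers are NODEID1%%NODEID2; everything else is a node.
            kind = "edge" if "%%" in item_id else "node"
            if annotation_type is not None and kind != annotation_type:
                continue
            if annotator_id is None:
                results.append((graph_id, item_id, subspaces))
                continue
            kept = {}
            for subspace, props in subspaces.items():
                for prop, entry in props.items():
                    if annotator_id in entry["value"]:
                        kept.setdefault(subspace, {})[prop] = {
                            "value": entry["value"][annotator_id],
                            "confidence": entry["confidence"][annotator_id],
                        }
            if kept:
                results.append((graph_id, item_id, kept))
    if annotation_type is not None and annotator_id is not None and not results:
        raise ValueError(
            f"annotator {annotator_id!r} has no {annotation_type} annotations"
        )
    yield from results


raw = {
    "g1": {
        "n1": {
            "factuality": {
                "factual": {"value": {"a1": 1.0, "a2": 0.0},
                            "confidence": {"a1": 4, "a2": 2}},
            },
        },
        "n1%%n2": {
            "protoroles": {
                "volition": {"value": {"a2": 3}, "confidence": {"a2": 1}},
            },
        },
    },
}

print(list(items_by_annotator(raw, "node", "a1")))
# [('g1', 'n1', {'factuality': {'factual': {'value': 1.0, 'confidence': 4}}})]
```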

class UDSCorpus

Bases: PredPattCorpus

A collection of Universal Decompositional Semantics graphs.

Parameters:
  • sentences (PredPattCorpus | dict[str, UDSSentenceGraph] | None, default: None) – the predpatt sentence graphs to associate the annotations with

  • documents (dict[str, UDSDocument] | None, default: None) – the documents associated with the predpatt sentence graphs

  • sentence_annotations (list[UDSAnnotation] | None, default: None) – additional annotations to associate with predpatt nodes on sentence-level graphs; in most cases, no such annotations will be passed, since the standard UDS annotations are automatically loaded

  • document_annotations (list[UDSAnnotation] | None, default: None) – additional annotations to associate with predpatt nodes on document-level graphs

  • version (str, default: '2.0') – the version of UDS datasets to use

  • split (str | None, default: None) – the split to load: “train”, “dev”, or “test”

  • annotation_format (str, default: 'normalized') – which annotation type to load (“raw” or “normalized”)

ANN_DIR = '/home/docs/checkouts/readthedocs.org/user_builds/decomp/checkouts/stable/decomp/data/'
CACHE_DIR = '/home/docs/checkouts/readthedocs.org/user_builds/decomp/checkouts/stable/decomp/data/'
UD_URL = 'https://github.com/UniversalDependencies/UD_English-EWT/archive/r1.2.zip'
__init__(sentences=None, documents=None, sentence_annotations=None, document_annotations=None, version='2.0', split=None, annotation_format='normalized')
add_annotation(sentence_annotation=None, document_annotation=None)

Add annotations to UDS sentence and document graphs.

Parameters:
  • sentence_annotation (list[UDSAnnotation] | None, default: None) – the annotations to add to the sentence graphs in the corpus

  • document_annotation (list[UDSAnnotation] | None, default: None) – the annotations to add to the document graphs in the corpus

Return type:

None

add_corpus_metadata(metadata)

Add metadata to the corpus.

Parameters:

metadata (UDSCorpusMetadata) – Metadata to merge with existing corpus metadata

Return type:

None

add_document_annotation(annotation)

Add annotations to UDS documents.

Parameters:

annotation (UDSAnnotation) – the annotations to add to the documents in the corpus

Return type:

None

add_sentence_annotation(annotation)

Add annotations to UDS sentence graphs.

Parameters:

annotation (UDSAnnotation) – the annotations to add to the graphs in the corpus

Return type:

None

property document_edge_subspaces: set[str]

The UDS document edge subspaces in the corpus.

Returns:

Set of subspace names for document edges

Return type:

set[str]

property document_node_subspaces: set[str]

The UDS document node subspaces in the corpus.

Returns:

Set of subspace names for document nodes

Return type:

set[str]

Raises:

NotImplementedError – This property is not yet implemented

document_properties(subspace=None)

Return the properties in a document subspace.

Parameters:

subspace (str | None, optional) – Subspace to query, or None for all properties

Returns:

Property names in the subspace

Return type:

set[str]

Raises:

NotImplementedError – This method is not yet implemented

document_property_metadata(subspace, prop)

Return the metadata for a property in a document subspace.

Parameters:
  • subspace (str) – The subspace the property is in

  • prop (str) – The property in the subspace

Return type:

UDSPropertyMetadata

property document_subspaces: set[str]

All UDS document subspaces (node and edge) in the corpus.

Returns:

Union of document node and edge subspaces

Return type:

set[str]

property documentids: list[str]

The document IDs in the corpus.

Returns:

List of all document identifiers

Return type:

list[str]

property documents: dict[str, UDSDocument]

The documents in the corpus.

Returns:

Mapping from document IDs to UDSDocument objects

Return type:

dict[str, UDSDocument]

classmethod from_conll_and_annotations(corpus, sentence_annotations=[], document_annotations=[], annotation_format='normalized', version='2.0', name='ewt')

Load UDS graph corpus from CoNLL (dependencies) and JSON (annotations).

This method should only be used if the UDS corpus is being (re)built. Otherwise, load the corpus from the JSON shipped with this package, using UDSCorpus.__init__ or UDSCorpus.from_json.

Parameters:
  • corpus (TypeAliasType) – (path to) Universal Dependencies corpus in conllu format

  • sentence_annotations (Sequence[TypeAliasType], default: []) – a list of paths to JSON files or open JSON files containing sentence-level annotations

  • document_annotations (Sequence[TypeAliasType], default: []) – a list of paths to JSON files or open JSON files containing document-level annotations

  • annotation_format (str, default: 'normalized') – whether the annotations are “raw” or “normalized”

  • version (str, default: '2.0') – the version of UDS datasets to use

  • name (str, default: 'ewt') – corpus name to prepend to graph IDs

Return type:

UDSCorpus

classmethod from_json(sentences_jsonfile, documents_jsonfile)

Load annotated UDS graph corpus (including annotations) from JSON.

This is the suggested method for loading the UDS corpus.

Parameters:
  • sentences_jsonfile (TypeAliasType) – file containing Universal Decompositional Semantics corpus sentence-level graphs in JSON format

  • documents_jsonfile (TypeAliasType) – file containing Universal Decompositional Semantics corpus document-level graphs in JSON format

Return type:

UDSCorpus

property metadata: UDSCorpusMetadata

The corpus metadata.

Returns:

Metadata for sentence and document annotations

Return type:

UDSCorpusMetadata

property ndocuments: int

The number of documents in the corpus.

Returns:

Total document count

Return type:

int

query(query, query_type=None, cache_query=True, cache_rdf=True)

Query all graphs in the corpus using SPARQL 1.1.

Parameters:
  • query (str | Query) – a SPARQL 1.1 query

  • query_type (str | None, default: None) – whether this is a ‘node’ query or an ‘edge’ query. If None (the default), a Result object is returned. The main reason to use this option is to format the output of a custom query automatically, since Result objects require additional postprocessing.

  • cache_query (bool, default: True) – whether to cache the query. This should usually be True; it should generally be False only when querying particular nodes or edges, e.g. with precompiled queries.

  • cache_rdf (bool, default: True) – whether to keep the RDF constructed for querying against. Setting this to False deletes the RDF, which slows down future queries but saves a lot of memory.

Return type:

dict[str, Result | dict[str, TypeAliasType] | dict[TypeAliasType, TypeAliasType]]

sample_documents(k)

Sample k documents without replacement.

Parameters:

k (int) – the number of documents to sample

Return type:

dict[str, UDSDocument]
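Sampling without replacement can be sketched with random.sample from the standard library over a plain dict of documents. The dict values below stand in for UDSDocument objects, and the seed parameter is a hypothetical addition for reproducibility that the documented method does not take.

```python
import random


def sample_documents(documents, k, seed=None):
    """Return k distinct documents chosen uniformly without replacement.

    `seed` is a hypothetical extra parameter for reproducibility.
    """
    rng = random.Random(seed)
    # Sorting makes the result reproducible for a given seed; random.sample
    # raises ValueError if k exceeds the number of documents.
    chosen = rng.sample(sorted(documents), k)
    return {doc_id: documents[doc_id] for doc_id in chosen}


docs = {f"doc-{i}": f"<document {i}>" for i in range(10)}
print(sample_documents(docs, 3, seed=0))
```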

property sentence_edge_subspaces: set[str]

The UDS sentence edge subspaces in the corpus.

Returns:

Set of subspace names for sentence edges

Return type:

set[str]

Raises:

NotImplementedError – This property is not yet implemented

property sentence_node_subspaces: set[str]

The UDS sentence node subspaces in the corpus.

Returns:

Set of subspace names for sentence nodes

Return type:

set[str]

Raises:

NotImplementedError – This property is not yet implemented

sentence_properties(subspace=None)

Return the properties in a sentence subspace.

Parameters:

subspace (str | None, optional) – Subspace to query, or None for all properties

Returns:

Property names in the subspace

Return type:

set[str]

Raises:

NotImplementedError – This method is not yet implemented

sentence_property_metadata(subspace, prop)

Return the metadata for a property in a sentence subspace.

Parameters:
  • subspace (str) – The subspace the property is in

  • prop (str) – The property in the subspace

Return type:

UDSPropertyMetadata

property sentence_subspaces: set[str]

All UDS sentence subspaces (node and edge) in the corpus.

Returns:

Union of sentence node and edge subspaces

Return type:

set[str]

to_json(sentences_outfile=None, documents_outfile=None)

Serialize the corpus to JSON.

Parameters:
  • sentences_outfile (TypeAliasType | None, default: None) – file to serialize sentence-level graphs to

  • documents_outfile (TypeAliasType | None, default: None) – file to serialize document-level graphs to

Return type:

str | None

class UDSDocument

Bases: object

A Universal Decompositional Semantics document.

Parameters:
  • sentence_graphs (TypeAliasType) – the UDSSentenceGraphs associated with each sentence in the document

  • sentence_ids (TypeAliasType) – the UD sentence IDs for each graph

  • name (str) – the name of the document (i.e. the UD document ID)

  • genre (str) – the genre of the document (e.g. weblog)

  • timestamp (str | None, default: None) – the timestamp of the UD document on which this UDSDocument is based

  • doc_graph (UDSDocumentGraph | None, default: None) – the document-level graph (a UDSDocumentGraph wrapping a NetworkX DiGraph). If not provided, it is initialized from sentence_graphs, without edges

__init__(sentence_graphs, sentence_ids, name, genre, timestamp=None, doc_graph=None)
add_annotation(node_attrs, edge_attrs)

Add annotations to the document-level graph.

Delegates to the document graph’s add_annotation method, passing along the sentence IDs for validation.

Parameters:
  • node_attrs (dict[str, NodeAttributes]) – Node annotations keyed by node ID

  • edge_attrs (dict[EdgeKey, EdgeAttributes]) – Edge annotations keyed by (source, target) tuples

Return type:

None

add_sentence_graphs(sentence_graphs, sentence_ids)

Add sentence graphs to the document.

Creates document-level nodes for each semantics node in the sentence graphs and updates the sentence graph metadata with document information.

Parameters:
  • sentence_graphs (SentenceGraphDict) – Dictionary mapping graph names to UDSSentenceGraph objects

  • sentence_ids (SentenceIDDict) – Dictionary mapping graph names to UD sentence identifiers

Return type:

None

classmethod from_dict(document, sentence_graphs, sentence_ids, name='UDS')

Construct a UDSDocument from a dictionary.

Since only the document graphs are serialized, the sentence graphs must also be provided to this method call in order to properly associate them with their documents.

Parameters:
  • document (dict[str, dict]) – a dictionary constructed by networkx.adjacency_data, containing the graph for the document

  • sentence_graphs (dict[str, UDSSentenceGraph]) – a dictionary containing (possibly a superset of) the sentence-level graphs for the sentences in the document

  • sentence_ids (dict[str, str]) – a dictionary containing (possibly a superset of) the UD sentence IDs for each graph

  • name (str, default: 'UDS') – identifier to prepend to node IDs

Return type:

UDSDocument

semantics_node(document_node)

Get the semantics node corresponding to a document node.

Document nodes maintain references to their corresponding semantics nodes through the ‘semantics’ attribute, which contains the graph name and node ID.

Parameters:

document_node (str) – The document domain node ID

Returns:

Single-item dict mapping node ID to its attributes

Return type:

dict[str, BasicNodeAttrs]

Raises:
  • TypeError – If the semantics attribute is not a dictionary

  • KeyError – If required keys are missing from semantics dict
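The lookup described above can be sketched with plain dicts standing in for the document graph's node attributes and for the sentence graphs. The key names "graph" and "node" inside the 'semantics' attribute are an assumption (this page only says it holds the graph name and node ID), and the function signature is hypothetical. The sketch raises TypeError and KeyError under the documented conditions.

```python
def semantics_node(document_nodes, sentence_graphs, document_node):
    """Resolve a document node to {semantics_node_id: attributes}.

    Assumes the 'semantics' attribute looks like {"graph": ..., "node": ...}.
    """
    semantics = document_nodes[document_node].get("semantics")
    if not isinstance(semantics, dict):
        raise TypeError(f"'semantics' of {document_node!r} is not a dict")
    # Missing "graph" or "node" keys raise KeyError, as documented.
    graph_name = semantics["graph"]
    node_id = semantics["node"]
    return {node_id: sentence_graphs[graph_name][node_id]}


document_nodes = {
    "doc1-semantics-pred-2": {
        "semantics": {"graph": "s1", "node": "s1-semantics-pred-2"},
    },
}
sentence_graphs = {
    "s1": {"s1-semantics-pred-2": {"domain": "semantics", "type": "predicate"}},
}

print(semantics_node(document_nodes, sentence_graphs, "doc1-semantics-pred-2"))
# {'s1-semantics-pred-2': {'domain': 'semantics', 'type': 'predicate'}}
```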

property text: str

The full document text reconstructed from sentences.

Concatenates the text of all sentence graphs, in sorted order, separated by single spaces.

Returns:

The complete document text

Return type:

str
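The concatenation can be sketched as follows, assuming that "sorted order" means sorting the sentence graph names as plain strings; the function name and the mapping from graph names to sentence texts are hypothetical. If the names are sorted this way, as in the sketch, names with unpadded numeric suffixes will not come out in numeric order.

```python
def document_text(sentence_texts):
    """Join sentence texts in sorted key order, separated by single spaces."""
    return " ".join(sentence_texts[name] for name in sorted(sentence_texts))


texts = {"ewt-dev-2": "It rained.", "ewt-dev-1": "We stayed in."}
print(document_text(texts))  # We stayed in. It rained.

# Plain string sorting puts "ewt-dev-10" before "ewt-dev-2".
print(sorted(["ewt-dev-2", "ewt-dev-10"]))  # ['ewt-dev-10', 'ewt-dev-2']
```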

to_dict()

Convert the document graph to a dictionary.

Returns:

NetworkX adjacency data format for the document graph

Return type:

NetworkXGraphData

class UDSDocumentGraph

Bases: UDSGraph

A Universal Decompositional Semantics document-level graph.

Parameters:
  • graph (DiGraph) – the NetworkX DiGraph from which the document-level graph is to be constructed

  • name (str) – the name of the graph

__init__(graph, name)
add_annotation(node_attrs, edge_attrs, sentence_ids)

Add node and/or edge annotations to the graph.

Parameters:
  • node_attrs (dict[str, TypeAliasType]) – the node annotations to be added

  • edge_attrs (dict[TypeAliasType, TypeAliasType]) – the edge annotations to be added

  • sentence_ids (dict[str, str]) – the IDs of all sentences in the document

Return type:

None

class UDSSentenceGraph

Bases: UDSGraph

A Universal Decompositional Semantics sentence-level graph.

Parameters:
  • graph (DiGraph) – the NetworkX DiGraph from which the sentence-level graph is to be constructed

  • name (str) – the name of the graph

  • sentence_id (str | None, default: None) – the UD identifier for the sentence associated with this graph

  • document_id (str | None, default: None) – the UD identifier for the document associated with this graph

QUERIES: ClassVar[dict[str, Query]] = {}
__init__(graph, name, sentence_id=None, document_id=None)
add_annotation(node_attrs, edge_attrs, add_heads=True, add_subargs=False, add_subpreds=False, add_orphans=False)

Add node and/or edge annotations to the graph.

Parameters:
  • node_attrs (dict[str, TypeAliasType]) – the node annotations to be added

  • edge_attrs (dict[TypeAliasType, TypeAliasType]) – the edge annotations to be added

  • add_heads (bool, default: True)

  • add_subargs (bool, default: False)

  • add_subpreds (bool, default: False)

  • add_orphans (bool, default: False)

Return type:

None

argument_edges(nodeid=None)

Return edges between predicates and their arguments.

Parameters:

nodeid (str | None, default: None) – The node that must be incident on an edge

Return type:

dict[TypeAliasType, TypeAliasType]

argument_head_edges(nodeid=None)

Return edges between nodes and their semantic heads.

Parameters:

nodeid (str | None, default: None) – The node that must be incident on an edge

Return type:

dict[TypeAliasType, TypeAliasType]

property argument_nodes: dict[str, NodeAttributes]

All argument nodes in the semantics domain.

Returns:

Mapping of node IDs to attributes for arguments

Return type:

dict[str, NodeAttributes]

head(nodeid, attrs=None)

Get the head corresponding to a semantics node.

Parameters:
  • nodeid (str) – the node identifier for a semantics node

  • attrs (list[str] | None, default: None) – a list of syntax node attributes to return

Return type:

tuple[int, list[TypeAliasType]]

Returns:

a pair of the head position and the requested attributes

instance_edges(nodeid=None)

Return edges between syntax nodes and semantics nodes.

Parameters:

nodeid (str | None, default: None) – The node that must be incident on an edge

Return type:

dict[TypeAliasType, TypeAliasType]

maxima(nodeids=None)

Find nodes not dominated by any other nodes in the set.

Parameters:

nodeids (list[str] | None, optional) – Nodes to consider. If None, uses all nodes.

Returns:

Node IDs that have no incoming edges from other nodes in the set

Return type:

list[str]

minima(nodeids=None)

Find nodes not dominating any other nodes in the set.

Parameters:

nodeids (list[str] | None, optional) – Nodes to consider. If None, uses all nodes.

Returns:

Node IDs that have no outgoing edges to other nodes in the set

Return type:

list[str]
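Following the Returns descriptions for maxima and minima, which speak only of direct incoming and outgoing edges within the node set, both can be sketched over an explicit node list and edge list. These are hypothetical stand-alone functions: unlike the documented methods, they take the edges explicitly and require a node list instead of defaulting to all nodes. Self-loops are ignored, since only edges from or to other nodes count.

```python
def maxima(nodes, edges):
    """Nodes in `nodes` with no incoming edge from another node in `nodes`."""
    node_set = set(nodes)
    dominated = {t for s, t in edges if s in node_set and t in node_set and s != t}
    return [n for n in nodes if n not in dominated]


def minima(nodes, edges):
    """Nodes in `nodes` with no outgoing edge to another node in `nodes`."""
    node_set = set(nodes)
    dominating = {s for s, t in edges if s in node_set and t in node_set and s != t}
    return [n for n in nodes if n not in dominating]


edges = [("pred", "arg1"), ("pred", "arg2"), ("arg1", "arg1-head")]
nodes = ["pred", "arg1", "arg2", "arg1-head"]
print(maxima(nodes, edges))             # ['pred']
print(minima(nodes, edges))             # ['arg2', 'arg1-head']
print(maxima(["arg1", "arg2"], edges))  # ['arg1', 'arg2']
```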

property predicate_nodes: dict[str, NodeAttributes]

All predicate nodes in the semantics domain.

Returns:

Mapping of node IDs to attributes for predicates

Return type:

dict[str, NodeAttributes]
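Selecting predicate or argument nodes (see also argument_nodes above) amounts to filtering semantics-domain nodes by type. The sketch below assumes node attribute keys "domain" and "type" with values such as "semantics", "predicate", and "argument"; these names are assumptions, not given on this page, and nodes_of_type is a hypothetical helper.

```python
def nodes_of_type(nodes, node_type):
    """Select semantics-domain nodes whose 'type' equals node_type."""
    return {
        node_id: attrs
        for node_id, attrs in nodes.items()
        if attrs.get("domain") == "semantics" and attrs.get("type") == node_type
    }


nodes = {
    "s1-root-0": {"domain": "root", "type": "root"},
    "s1-syntax-1": {"domain": "syntax", "type": "token", "form": "Mary"},
    "s1-semantics-pred-2": {"domain": "semantics", "type": "predicate"},
    "s1-semantics-arg-1": {"domain": "semantics", "type": "argument"},
}

print(list(nodes_of_type(nodes, "predicate")))  # ['s1-semantics-pred-2']
print(list(nodes_of_type(nodes, "argument")))   # ['s1-semantics-arg-1']
```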

query(query, query_type=None, cache_query=True, cache_rdf=True)

Query graph using SPARQL 1.1.

Parameters:
  • query (str | Query) – a SPARQL 1.1 query

  • query_type (str | None, default: None) – whether this is a ‘node’ query or an ‘edge’ query. If None (the default), a Result object is returned. The main reason to use this option is to format the output of a custom query automatically, since Result objects require additional postprocessing.

  • cache_query (bool, default: True) – whether to cache the query. Set this to False when querying particular nodes or edges with precompiled queries.

  • cache_rdf (bool, default: True) – whether to keep the RDF constructed for querying against. Setting this to False deletes the RDF, which slows down future queries but saves a lot of memory.

Return type:

Result | dict[str, TypeAliasType] | dict[TypeAliasType, TypeAliasType]

property rdf: Graph

The graph converted to RDF format.

Returns:

RDFLib graph representation

Return type:

Graph

Raises:

AttributeError – If RDFConverter is not available

property rootid: NodeID

The ID of the graph’s root node.

Returns:

The root node identifier

Return type:

NodeID

Raises:

ValueError – If the graph has no root or multiple roots
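One way to meet the documented contract can be sketched as follows, assuming the root is defined as the only node without incoming edges; this page does not say how the library actually identifies the root, and the function signature and node names are hypothetical.

```python
def rootid(nodes, edges):
    """Return the unique node with no incoming edges, else raise ValueError."""
    has_parent = {target for _, target in edges}
    roots = [n for n in nodes if n not in has_parent]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}: {roots}")
    return roots[0]


nodes = ["s1-root-0", "s1-syntax-1", "s1-semantics-pred-1"]
edges = [("s1-root-0", "s1-syntax-1"), ("s1-root-0", "s1-semantics-pred-1")]
print(rootid(nodes, edges))  # s1-root-0
```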

semantics_edges(nodeid=None, edgetype=None)

Return edges between semantics nodes.

Parameters:
  • nodeid (str | None, default: None) – The node that must be incident on an edge

  • edgetype (str | None, default: None) – The type of edge (“dependency” or “head”)

Return type:

dict[TypeAliasType, TypeAliasType]

property semantics_nodes: dict[str, NodeAttributes]

All semantics domain nodes.

Returns:

Mapping of node IDs to attributes for semantics nodes

Return type:

dict[str, NodeAttributes]

property semantics_subgraph: DiGraph

Subgraph containing only semantics nodes.

Returns:

NetworkX subgraph with semantics nodes

Return type:

DiGraph

property sentence: str

The sentence text reconstructed from syntax nodes.

Returns:

The sentence text with tokens in surface order

Return type:

str
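Reconstructing the sentence from syntax nodes can be sketched as a sort by token position followed by a space-join. The attribute names "position" and "form" are assumptions, not given on this page, and sentence is a hypothetical stand-alone function. Naive space-joining puts a space before punctuation tokens, as the output shows; whether the library does the same is not stated here.

```python
def sentence(syntax_nodes):
    """Join token forms in order of their 'position' attribute."""
    tokens = sorted(syntax_nodes.values(), key=lambda attrs: attrs["position"])
    return " ".join(attrs["form"] for attrs in tokens)


syntax_nodes = {
    "s1-syntax-3": {"position": 3, "form": "."},
    "s1-syntax-1": {"position": 1, "form": "Mary"},
    "s1-syntax-2": {"position": 2, "form": "left"},
}
print(sentence(syntax_nodes))  # Mary left .
```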

span(nodeid, attrs=None)

Get the span corresponding to a semantics node.

Parameters:
  • nodeid (str) – the node identifier for a semantics node

  • attrs (list[str] | None, default: None) – a list of syntax node attributes to return

Return type:

dict[int, list[TypeAliasType]]

Returns:

a mapping from positions in the span to the requested attributes at those positions

syntax_edges(nodeid=None)

Return edges between syntax nodes.

Parameters:

nodeid (str | None, default: None) – The node that must be incident on an edge

Return type:

dict[TypeAliasType, TypeAliasType]

property syntax_nodes: dict[str, NodeAttributes]

All syntax domain token nodes.

Returns:

Mapping of node IDs to attributes for syntax tokens

Return type:

dict[str, NodeAttributes]

property syntax_subgraph: DiGraph

Subgraph containing only syntax nodes.

Returns:

NetworkX subgraph with syntax nodes

Return type:

DiGraph