decomp.semantics.uds¶

Universal Decompositional Semantics (UDS) representation framework.

This module provides a comprehensive framework for working with Universal Decompositional Semantics (UDS) datasets. UDS is a semantic annotation framework that captures diverse semantic properties of natural language texts through real-valued annotations on predicate-argument structures.

The module is organized hierarchically:

Annotations (annotation): Provides classes for handling UDS property annotations in both raw (multi-annotator) and normalized (aggregated) formats.
Graphs (graph): Implements graph representations at sentence and document levels, integrating syntactic dependency structures with semantic annotations.
Documents (document): Represents complete documents containing multiple sentences with their associated graphs and metadata.
Corpus (corpus): Manages collections of UDS documents and provides functionality for loading, querying, and serializing UDS datasets.

Classes¶

NormalizedUDSAnnotation: Annotations with aggregated values and confidence scores from multiple annotators.
RawUDSAnnotation: Annotations preserving individual annotator responses before aggregation.
UDSSentenceGraph: Graph representation of a single sentence with syntax and semantics layers.
UDSDocumentGraph: Graph connecting multiple sentence graphs within a document.
UDSDocument: Container for sentence graphs and document-level annotations.
UDSCorpus: Collection of UDS documents with support for various data formats and queries.

Notes

The UDS framework builds upon the PredPatt system for extracting predicate-argument structures and extends it with rich semantic annotations. All graph representations use NetworkX for the underlying graph structure and support SPARQL queries via RDF conversion.

class NormalizedUDSAnnotation[source]¶

Bases: UDSAnnotation

A normalized Universal Decompositional Semantics annotation.

Properties in a NormalizedUDSAnnotation may have only a single str, int, or float value and a single str, int, or float confidence.

Parameters:

metadata (UDSAnnotationMetadata) – The metadata for the annotations.
data (dict[str, dict[str, dict[str, dict[str, TypeAliasType]]]]) – A mapping from graph identifiers to node/edge identifiers to property subspaces to property to value and confidence. Edge identifiers must be represented as NODEID1%%NODEID2, and node identifiers must not contain %%.

__init__(metadata, data)[source]¶

classmethod from_json(jsonfile)[source]¶

Load a dataset of normalized annotations from a JSON file.

For node annotations, the format of the JSON passed to this class method must be:

{GRAPHID_1: {NODEID_1_1: DATA,
             ...},
 GRAPHID_2: {NODEID_2_1: DATA,
             ...},
 ...
}

Edge annotations should be of the form:

{GRAPHID_1: {NODEID_1_1%%NODEID_1_2: DATA,
             ...},
 GRAPHID_2: {NODEID_2_1%%NODEID_2_2: DATA,
             ...},
 ...
}

Graph and node identifiers must match the graph and node identifiers of the predpatt graphs to which the annotations will be added.

DATA in the above is assumed to have the following structure:

{SUBSPACE_1: {PROP_1_1: {'value': VALUE,
                        'confidence': VALUE},
             ...},
 SUBSPACE_2: {PROP_2_1: {'value': VALUE,
                         'confidence': VALUE},
             ...},
}

VALUE in the above is assumed to be unstructured.
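As a rough illustration of the layout described above, the following sketch builds a normalized annotation mapping, checks its nesting, and serializes it with the standard library. The helper `validate_normalized` and all identifiers, subspace names, property names, and scores are assumptions chosen for the example; decomp's own validation may differ.

```python
import json


def validate_normalized(data: dict) -> None:
    """Check the GRAPHID -> NODEID -> SUBSPACE -> PROP -> value/confidence nesting."""
    for graph_id, items in data.items():
        for item_id, subspaces in items.items():
            for subspace, props in subspaces.items():
                for prop, entry in props.items():
                    missing = {"value", "confidence"} - set(entry)
                    if missing:
                        raise ValueError(
                            f"{graph_id}/{item_id}/{subspace}/{prop} "
                            f"is missing {sorted(missing)}"
                        )


# Hypothetical graph ID, node ID, subspace, property, and scores.
annotations = {
    "ewt-dev-1": {
        "ewt-dev-1-semantics-pred-2": {
            "factuality": {"factual": {"value": 1.2, "confidence": 0.9}}
        }
    }
}

validate_normalized(annotations)
serialized = json.dumps(annotations, indent=2)
```

A file containing `serialized` would have the shape that `from_json` is documented to expect for node annotations.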

Return type:: NormalizedUDSAnnotation


class RawUDSAnnotation[source]¶

Bases: UDSAnnotation

A raw Universal Decompositional Semantics dataset.

Unlike decomp.semantics.uds.NormalizedUDSAnnotation, objects of this class may have multiple annotations for a particular attribute. Each annotation is associated with an annotator ID, and different annotators may have annotated different numbers of items.

Parameters:

metadata (UDSAnnotationMetadata) – The metadata for the annotations.
data – A mapping from graph identifiers to node/edge identifiers to property subspaces to properties to the value and confidence given by each annotator. Edge identifiers must be represented as NODEID1%%NODEID2, and node identifiers must not contain %%.

__init__(metadata, data)[source]¶

annotators(subspace=None, prop=None)[source]¶

Get annotator IDs for a subspace and property.

If neither subspace nor property is specified, all annotator IDs are returned. If only the subspace is specified, all annotator IDs for the subspace are returned.

Parameters:

subspace (str | None, optional) – The subspace to filter by
prop (str | None, optional) – The property to filter by

Returns:

Set of annotator IDs or None if no annotators found

Return type:

set[str] | None

classmethod from_json(jsonfile)[source]¶

Load a dataset for raw annotations from a JSON file.

For node annotations, the format of the JSON passed to this class method must be:

{GRAPHID_1: {NODEID_1_1: DATA,
             ...},
 GRAPHID_2: {NODEID_2_1: DATA,
             ...},
 ...
}

Edge annotations should be of the form:

{GRAPHID_1: {NODEID_1_1%%NODEID_1_2: DATA,
             ...},
 GRAPHID_2: {NODEID_2_1%%NODEID_2_2: DATA,
             ...},
 ...
}

Graph and node identifiers must match the graph and node identifiers of the predpatt graphs to which the annotations will be added.

DATA in the above is assumed to have the following structure:

{SUBSPACE_1: {PROP_1_1: {'value': {
                            ANNOTATOR1: VALUE1,
                            ANNOTATOR2: VALUE2,
                            ...
                                  },
                         'confidence': {
                            ANNOTATOR1: CONF1,
                            ANNOTATOR2: CONF2,
                            ...
                                       }
                        },
              PROP_1_2: {'value': {
                            ANNOTATOR1: VALUE1,
                            ANNOTATOR2: VALUE2,
                            ...
                                  },
                         'confidence': {
                            ANNOTATOR1: CONF1,
                            ANNOTATOR2: CONF2,
                            ...
                                       }
                        },
              ...},
 SUBSPACE_2: {PROP_2_1: {'value': {
                            ANNOTATOR3: VALUE1,
                            ANNOTATOR4: VALUE2,
                            ...
                                  },
                         'confidence': {
                            ANNOTATOR3: CONF1,
                            ANNOTATOR4: CONF2,
                            ...
                                       }
                        },
             ...},
...}

VALUEi and CONFi are assumed to be unstructured.

Return type: RawUDSAnnotation

items(annotation_type=None, annotator_id=None)[source]¶

Dictionary-like items generator for attributes.

This method behaves exactly like UDSAnnotation.items, except that, if an annotator ID is passed, it generates only items annotated by the specified annotator.

Parameters:

annotation_type (str | None, default: None) – Whether to return node annotations, edge annotations, or both (default)
annotator_id (str | None, default: None) – The annotator whose annotations will be returned by the generator (defaults to all annotators)

Raises:

ValueError – If both annotation_type and annotator_id are passed and the specified annotator provides no annotations of the requested type

Return type:

TypeAliasType
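To make the per-annotator view concrete, the sketch below restricts a raw-annotation mapping to one annotator. It is not the library's implementation: the generator name `items_for_annotator`, the shape of the yielded pairs, and the sample data are assumptions for illustration.

```python
from collections.abc import Iterator


def items_for_annotator(data: dict, annotator_id: str) -> Iterator[tuple]:
    """Yield ((graph_id, item_id), annotation) pairs for a single annotator."""
    for graph_id, items in data.items():
        for item_id, subspaces in items.items():
            kept: dict = {}
            for subspace, props in subspaces.items():
                for prop, entry in props.items():
                    if annotator_id in entry["value"]:
                        kept.setdefault(subspace, {})[prop] = {
                            "value": entry["value"][annotator_id],
                            "confidence": entry["confidence"].get(annotator_id),
                        }
            if kept:  # skip items this annotator never annotated
                yield (graph_id, item_id), kept


# Hypothetical raw annotations for two nodes.
raw = {
    "ewt-dev-1": {
        "ewt-dev-1-semantics-pred-2": {
            "factuality": {
                "factual": {
                    "value": {"annotator-1": 1, "annotator-2": 0},
                    "confidence": {"annotator-1": 4, "annotator-2": 3},
                }
            }
        },
        "ewt-dev-1-semantics-arg-3": {
            "genericity": {
                "arg-kind": {
                    "value": {"annotator-2": 1},
                    "confidence": {"annotator-2": 2},
                }
            }
        },
    }
}

first = list(items_for_annotator(raw, "annotator-1"))
```

Each yielded annotation has the single-valued shape used by normalized annotations, since only one annotator's response remains per property.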

class UDSCorpus[source]¶

Bases: PredPattCorpus

A collection of Universal Decompositional Semantics graphs.

Parameters:

sentences (PredPattCorpus | dict[str, UDSSentenceGraph] | None, default: None) – the predpatt sentence graphs to associate the annotations with
documents (dict[str, UDSDocument] | None, default: None) – the documents associated with the predpatt sentence graphs
sentence_annotations (list[UDSAnnotation] | None, default: None) – additional annotations to associate with predpatt nodes on sentence-level graphs; in most cases, no such annotations will be passed, since the standard UDS annotations are automatically loaded
document_annotations (list[UDSAnnotation] | None, default: None) – additional annotations to associate with predpatt nodes on document-level graphs
version (str, default: '2.0') – the version of UDS datasets to use
split (str | None, default: None) – the split to load: “train”, “dev”, or “test”
annotation_format (str, default: 'normalized') – which annotation type to load (“raw” or “normalized”)

ANN_DIR = '/home/docs/checkouts/readthedocs.org/user_builds/decomp/checkouts/stable/decomp/data/'¶

CACHE_DIR = '/home/docs/checkouts/readthedocs.org/user_builds/decomp/checkouts/stable/decomp/data/'¶

UD_URL = 'https://github.com/UniversalDependencies/UD_English-EWT/archive/r1.2.zip'¶

__init__(sentences=None, documents=None, sentence_annotations=None, document_annotations=None, version='2.0', split=None, annotation_format='normalized')[source]¶
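A typical use of the constructor might look like the sketch below. It uses only the constructor parameters and properties documented on this page; the wrapper name `summarize_split` is hypothetical, and calling it assumes that decomp and its data are available locally.

```python
def summarize_split(split: str = "dev") -> dict[str, int]:
    """Load one split with the documented constructor and report its size."""
    # Imported lazily so the function can be defined without decomp installed;
    # calling it requires the package and the UDS data it loads.
    from decomp.semantics.uds import UDSCorpus

    corpus = UDSCorpus(split=split, annotation_format="normalized")
    return {
        "documents": corpus.ndocuments,
        "document_ids": len(corpus.documentids),
    }
```

Loading a single split rather than the whole corpus keeps the example small; passing `split=None` is the documented default.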

add_annotation(sentence_annotation=None, document_annotation=None)[source]¶

Add annotations to UDS sentence and document graphs.

Parameters:

sentence_annotation (list[UDSAnnotation] | None, default: None) – the annotations to add to the sentence graphs in the corpus
document_annotation (list[UDSAnnotation] | None, default: None) – the annotations to add to the document graphs in the corpus

Return type:

None

add_corpus_metadata(metadata)[source]¶

Add metadata to the corpus.

Parameters: metadata (UDSCorpusMetadata) – Metadata to merge with existing corpus metadata
Return type: None

add_document_annotation(annotation)[source]¶

Add annotations to UDS documents.

Parameters: annotation (UDSAnnotation) – the annotations to add to the documents in the corpus
Return type: None

add_sentence_annotation(annotation)[source]¶

Add annotations to UDS sentence graphs.

Parameters: annotation (UDSAnnotation) – the annotations to add to the graphs in the corpus
Return type: None

property document_edge_subspaces: set[str]¶

The UDS document edge subspaces in the corpus.

Returns: Set of subspace names for document edges
Return type: set[str]

property document_node_subspaces: set[str]¶

The UDS document node subspaces in the corpus.

Returns: Set of subspace names for document nodes
Return type: set[str]
Raises: NotImplementedError – This property is not yet implemented

document_properties(subspace=None)[source]¶

Return the properties in a document subspace.

Parameters: subspace (str | None, optional) – Subspace to query, or None for all properties
Returns: Property names in the subspace
Return type: set[str]
Raises: NotImplementedError – This method is not yet implemented

document_property_metadata(subspace, prop)[source]¶

Return the metadata for a property in a document subspace.

Parameters:

subspace (str) – The subspace the property is in
prop (str) – The property in the subspace

Return type:

UDSPropertyMetadata

property document_subspaces: set[str]¶

All UDS document subspaces (node and edge) in the corpus.

Returns: Union of document node and edge subspaces
Return type: set[str]

property documentids: list[str]¶

The document IDs in the corpus.

Returns: List of all document identifiers
Return type: list[str]

property documents: dict[str, UDSDocument]¶

The documents in the corpus.

Returns: Mapping from document IDs to UDSDocument objects
Return type: dict[str, UDSDocument]

classmethod from_conll_and_annotations(corpus, sentence_annotations=[], document_annotations=[], annotation_format='normalized', version='2.0', name='ewt')[source]¶

Load UDS graph corpus from CoNLL (dependencies) and JSON (annotations).

This method should only be used if the UDS corpus is being (re)built. Otherwise, loading the corpus from the JSON shipped with this package using UDSCorpus.__init__ or UDSCorpus.from_json is suggested.

Parameters:

corpus (TypeAliasType) – (path to) Universal Dependencies corpus in conllu format
sentence_annotations (Sequence[TypeAliasType], default: []) – a list of paths to JSON files or open JSON files containing sentence-level annotations
document_annotations (Sequence[TypeAliasType], default: []) – a list of paths to JSON files or open JSON files containing document-level annotations
annotation_format (str, default: 'normalized') – Whether the annotation is raw or normalized
version (str, default: '2.0') – the version of UDS datasets to use
name (str, default: 'ewt') – corpus name to be prepended to graph IDs

Return type:

UDSCorpus

classmethod from_json(sentences_jsonfile, documents_jsonfile)[source]¶

Load annotated UDS graph corpus (including annotations) from JSON.

This is the suggested method for loading the UDS corpus.

Parameters:

sentences_jsonfile (TypeAliasType) – file containing Universal Decompositional Semantics corpus sentence-level graphs in JSON format
documents_jsonfile (TypeAliasType) – file containing Universal Decompositional Semantics corpus document-level graphs in JSON format

Return type:

UDSCorpus

property metadata: UDSCorpusMetadata¶

The corpus metadata.

Returns: Metadata for sentence and document annotations
Return type: UDSCorpusMetadata

property ndocuments: int¶

The number of documents in the corpus.

Returns: Total document count
Return type: int

query(query, query_type=None, cache_query=True, cache_rdf=True)[source]¶

Query all graphs in the corpus using SPARQL 1.1.

Parameters:

query (str | Query) – a SPARQL 1.1 query
query_type (str | None, default: None) – whether this is a ‘node’ query or ‘edge’ query. If set to None (default), a Result object will be returned. The main reason to use this option is to automatically format the output of a custom query, since Result objects require additional postprocessing.
cache_query (bool, default: True) – whether to cache the query. This should usually be set to True. It should generally only be False when querying particular nodes or edges, e.g. as in precompiled queries.
cache_rdf (bool, default: True) – whether to keep the RDF constructed for querying against. Setting this to False deletes the RDF after the query, which slows down future queries but saves a lot of memory

Return type:

dict[str, Result | dict[str, TypeAliasType] | dict[TypeAliasType, TypeAliasType]]

sample_documents(k)[source]¶

Sample k documents without replacement.

Parameters: k (int) – the number of documents to sample
Return type: dict[str, UDSDocument]
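Sampling without replacement from a mapping of documents can be sketched with the standard library. This is an illustration of the technique, not necessarily how decomp implements it; the stand-in string values take the place of UDSDocument objects, and the document IDs are hypothetical.

```python
import random


def sample_documents(documents: dict[str, object], k: int) -> dict[str, object]:
    """Return k distinct entries of a document mapping, chosen at random."""
    # random.sample needs a sequence and raises ValueError if k > len(documents).
    chosen_ids = random.sample(sorted(documents), k)
    return {doc_id: documents[doc_id] for doc_id in chosen_ids}


# Stand-in values; in decomp these would be UDSDocument objects.
docs = {"doc-a": "A", "doc-b": "B", "doc-c": "C"}
sample = sample_documents(docs, 2)
```

The result keeps the same ID-to-document mapping shape as the `documents` property, restricted to the sampled IDs.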

property sentence_edge_subspaces: set[str]¶

The UDS sentence edge subspaces in the corpus.

Returns: Set of subspace names for sentence edges
Return type: set[str]
Raises: NotImplementedError – This property is not yet implemented

property sentence_node_subspaces: set[str]¶

The UDS sentence node subspaces in the corpus.

Returns: Set of subspace names for sentence nodes
Return type: set[str]
Raises: NotImplementedError – This property is not yet implemented

sentence_properties(subspace=None)[source]¶

Return the properties in a sentence subspace.

Parameters: subspace (str | None, optional) – Subspace to query, or None for all properties
Returns: Property names in the subspace
Return type: set[str]
Raises: NotImplementedError – This method is not yet implemented

sentence_property_metadata(subspace, prop)[source]¶

Return the metadata for a property in a sentence subspace.

Parameters:

subspace (str) – The subspace the property is in
prop (str) – The property in the subspace

Return type:

UDSPropertyMetadata

property sentence_subspaces: set[str]¶

All UDS sentence subspaces (node and edge) in the corpus.

Returns: Union of sentence node and edge subspaces
Return type: set[str]

to_json(sentences_outfile=None, documents_outfile=None)[source]¶

Serialize corpus to JSON.

Parameters:

sentences_outfile (TypeAliasType | None, default: None) – file to serialize sentence-level graphs to
documents_outfile (TypeAliasType | None, default: None) – file to serialize document-level graphs to

Return type:

str | None

class UDSDocument[source]¶

Bases: object

A Universal Decompositional Semantics document.

Parameters:

sentence_graphs (TypeAliasType) – the UDSSentenceGraphs associated with each sentence in the document
sentence_ids (TypeAliasType) – the UD sentence IDs for each graph
name (str) – the name of the document (i.e. the UD document ID)
genre (str) – the genre of the document (e.g. weblog)
timestamp (str | None, default: None) – the timestamp of the UD document on which this UDSDocument is based
doc_graph (UDSDocumentGraph | None, default: None) – the NetworkX DiGraph for the document. If not provided, this will be initialized without edges from sentence_graphs

__init__(sentence_graphs, sentence_ids, name, genre, timestamp=None, doc_graph=None)[source]¶

add_annotation(node_attrs, edge_attrs)[source]¶

Add annotations to the document-level graph.

Delegates to the document graph’s add_annotation method, passing along the sentence IDs for validation.

Parameters:

node_attrs (dict[str, NodeAttributes]) – Node annotations keyed by node ID
edge_attrs (dict[EdgeKey, EdgeAttributes]) – Edge annotations keyed by (source, target) tuples

Return type:

None

add_sentence_graphs(sentence_graphs, sentence_ids)[source]¶

Add sentence graphs to the document.

Creates document-level nodes for each semantics node in the sentence graphs and updates the sentence graph metadata with document information.

Parameters:

sentence_graphs (SentenceGraphDict) – Dictionary mapping graph names to UDSSentenceGraph objects
sentence_ids (SentenceIDDict) – Dictionary mapping graph names to UD sentence identifiers

Return type:

None

classmethod from_dict(document, sentence_graphs, sentence_ids, name='UDS')[source]¶

Construct a UDSDocument from a dictionary.

Since only the document graphs are serialized, the sentence graphs must also be provided to this method call in order to properly associate them with their documents.

Parameters:

document (dict[str, dict]) – a dictionary constructed by networkx.adjacency_data, containing the graph for the document
sentence_graphs (dict[str, UDSSentenceGraph]) – a dictionary containing (possibly a superset of) the sentence-level graphs for the sentences in the document
sentence_ids (dict[str, str]) – a dictionary containing (possibly a superset of) the UD sentence IDs for each graph
name (str, default: 'UDS') – identifier to prepend to node IDs

Return type:

UDSDocument

semantics_node(document_node)[source]¶

Get the semantics node corresponding to a document node.

Document nodes maintain references to their corresponding semantics nodes through the ‘semantics’ attribute, which contains the graph name and node ID.

Parameters:

document_node (str) – The document domain node ID

Returns:

Single-item dict mapping node ID to its attributes

Return type:

dict[str, BasicNodeAttrs]

Raises:

TypeError – If the semantics attribute is not a dictionary
KeyError – If required keys are missing from semantics dict

property text: str¶

The full document text reconstructed from sentences.

Concatenates the text from all sentence graphs in sorted order with space separation.

Returns: The complete document text
Return type: str

to_dict()[source]¶

Convert the document graph to a dictionary.

Returns: NetworkX adjacency data format for the document graph
Return type: NetworkXGraphData

class UDSDocumentGraph[source]¶

Bases: UDSGraph

A Universal Decompositional Semantics document-level graph.

Parameters:

graph (DiGraph) – the NetworkX DiGraph from which the document-level graph is to be constructed
name (str) – the name of the graph

__init__(graph, name)[source]¶

add_annotation(node_attrs, edge_attrs, sentence_ids)[source]¶

Add node and/or edge annotations to the graph.

Parameters:

node_attrs (dict[str, TypeAliasType]) – the node annotations to be added
edge_attrs (dict[TypeAliasType, TypeAliasType]) – the edge annotations to be added
sentence_ids (dict[str, str]) – the IDs of all sentences in the document

Return type:

None

class UDSSentenceGraph[source]¶

Bases: UDSGraph

A Universal Decompositional Semantics sentence-level graph.

Parameters:

graph (DiGraph) – the NetworkX DiGraph from which the sentence-level graph is to be constructed
name (str) – the name of the graph
sentence_id (str | None, default: None) – the UD identifier for the sentence associated with this graph
document_id (str | None, default: None) – the UD identifier for the document associated with this graph

QUERIES: ClassVar[dict[str, Query]] = {}¶

__init__(graph, name, sentence_id=None, document_id=None)[source]¶

add_annotation(node_attrs, edge_attrs, add_heads=True, add_subargs=False, add_subpreds=False, add_orphans=False)[source]¶

Add node and/or edge annotations to the graph.

Parameters:

node_attrs (dict[str, TypeAliasType]) – the node annotations to be added
edge_attrs (dict[TypeAliasType, TypeAliasType]) – the edge annotations to be added
add_heads (bool, default: True)
add_subargs (bool, default: False)
add_subpreds (bool, default: False)
add_orphans (bool, default: False)

Return type:

None

argument_edges(nodeid=None)[source]¶

Return edges between predicates and their arguments.

Parameters: nodeid (str | None, default: None) – The node that must be incident on an edge
Return type: dict[TypeAliasType, TypeAliasType]

argument_head_edges(nodeid=None)[source]¶

Return edges between nodes and their semantic heads.

Parameters: nodeid (str | None, default: None) – The node that must be incident on an edge
Return type: dict[TypeAliasType, TypeAliasType]

property argument_nodes: dict[str, NodeAttributes]¶

All argument nodes in the semantics domain.

Returns: Mapping of node IDs to attributes for arguments
Return type: dict[str, NodeAttributes]

head(nodeid, attrs=None)[source]¶

Get the head corresponding to a semantics node.

Parameters:

nodeid (str) – the node identifier for a semantics node
attrs (list[str] | None, default: None) – a list of syntax node attributes to return

Return type:

tuple[int, list[TypeAliasType]]

Returns:

a pairing of the head position and the requested attributes

instance_edges(nodeid=None)[source]¶

Return edges between syntax nodes and semantics nodes.

Parameters: nodeid (str | None, default: None) – The node that must be incident on an edge
Return type: dict[TypeAliasType, TypeAliasType]

maxima(nodeids=None)[source]¶

Find nodes not dominated by any other nodes in the set.

Parameters: nodeids (list[str] | None, optional) – Nodes to consider. If None, uses all nodes.
Returns: Node IDs that have no incoming edges from other nodes in the set
Return type: list[str]

minima(nodeids=None)[source]¶

Find nodes not dominating any other nodes in the set.

Parameters: nodeids (list[str] | None, optional) – Nodes to consider. If None, uses all nodes.
Returns: Node IDs that have no outgoing edges to other nodes in the set
Return type: list[str]
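The definitions of maxima and minima given above can be sketched over a plain edge set. This is a minimal illustration that does not use NetworkX and is not the library's implementation; the standalone functions and the node names are assumptions.

```python
Edge = tuple[str, str]  # (source, target)


def maxima(edges: set[Edge], nodeids: list[str]) -> list[str]:
    """Nodes in `nodeids` with no incoming edge from another node in `nodeids`."""
    members = set(nodeids)
    dominated = {tgt for src, tgt in edges
                 if src in members and tgt in members and src != tgt}
    return [n for n in nodeids if n not in dominated]


def minima(edges: set[Edge], nodeids: list[str]) -> list[str]:
    """Nodes in `nodeids` with no outgoing edge to another node in `nodeids`."""
    members = set(nodeids)
    dominating = {src for src, tgt in edges
                  if src in members and tgt in members and src != tgt}
    return [n for n in nodeids if n not in dominating]


# Hypothetical graph: root -> pred -> {arg1, arg2}
edges = {("root", "pred"), ("pred", "arg1"), ("pred", "arg2")}
subset = ["pred", "arg1", "arg2"]

top = maxima(edges, subset)
bottom = minima(edges, subset)
```

Edges whose other endpoint lies outside the node set are ignored, so `pred` counts as a maximum of `subset` even though `root` points to it.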

property predicate_nodes: dict[str, NodeAttributes]¶

All predicate nodes in the semantics domain.

Returns: Mapping of node IDs to attributes for predicates
Return type: dict[str, NodeAttributes]

query(query, query_type=None, cache_query=True, cache_rdf=True)[source]¶

Query graph using SPARQL 1.1.

Parameters:

query (str | Query) – a SPARQL 1.1 query
query_type (str | None, default: None) – whether this is a ‘node’ query or ‘edge’ query. If set to None (default), a Result object will be returned. The main reason to use this option is to automatically format the output of a custom query, since Result objects require additional postprocessing.
cache_query (bool, default: True) – whether to cache the query. This should generally only be False when querying particular nodes or edges using precompiled queries
cache_rdf (bool, default: True) – whether to keep the RDF constructed for querying against. Setting this to False deletes the RDF after the query, which slows down future queries but saves a lot of memory

Return type:

Result | dict[str, TypeAliasType] | dict[TypeAliasType, TypeAliasType]

property rdf: Graph¶

The graph converted to RDF format.

Returns: RDFLib graph representation
Return type: Graph
Raises: AttributeError – If RDFConverter is not available

property rootid: NodeID¶

The ID of the graph’s root node.

Returns: The root node identifier
Return type: NodeID
Raises: ValueError – If the graph has no root or multiple roots

semantics_edges(nodeid=None, edgetype=None)[source]¶

Return edges between semantics nodes.

Parameters:

nodeid (str | None, default: None) – The node that must be incident on an edge
edgetype (str | None, default: None) – The type of edge (“dependency” or “head”)

Return type:

dict[TypeAliasType, TypeAliasType]

property semantics_nodes: dict[str, NodeAttributes]¶

All semantics domain nodes.

Returns: Mapping of node IDs to attributes for semantics nodes
Return type: dict[str, NodeAttributes]

property semantics_subgraph: DiGraph¶

Subgraph containing only semantics nodes.

Returns: NetworkX subgraph with semantics nodes
Return type: DiGraph

property sentence: str¶

The sentence text reconstructed from syntax nodes.

Returns: The sentence text with tokens in surface order
Return type: str

span(nodeid, attrs=None)[source]¶

Get the span corresponding to a semantics node.

Parameters:

nodeid (str) – the node identifier for a semantics node
attrs (list[str] | None, default: None) – a list of syntax node attributes to return

Return type:

dict[int, list[TypeAliasType]]

Returns:

a mapping from positions in the span to the requested attributes in those positions

syntax_edges(nodeid=None)[source]¶

Return edges between syntax nodes.

Parameters: nodeid (str | None, default: None) – The node that must be incident on an edge
Return type: dict[TypeAliasType, TypeAliasType]

property syntax_nodes: dict[str, NodeAttributes]¶

All syntax domain token nodes.

Returns: Mapping of node IDs to attributes for syntax tokens
Return type: dict[str, NodeAttributes]

property syntax_subgraph: DiGraph¶

Subgraph containing only syntax nodes.

Returns: NetworkX subgraph with syntax nodes
Return type: DiGraph