decomp.semantics.uds
Universal Decompositional Semantics (UDS) representation framework.
This module provides a comprehensive framework for working with Universal Decompositional Semantics (UDS) datasets. UDS is a semantic annotation framework that captures diverse semantic properties of natural language texts through real-valued annotations on predicate-argument structures.
The module is organized hierarchically:
- Annotations (annotation): Provides classes for handling UDS property annotations in both raw (multi-annotator) and normalized (aggregated) formats.
- Graphs (graph): Implements graph representations at sentence and document levels, integrating syntactic dependency structures with semantic annotations.
- Documents (document): Represents complete documents containing multiple sentences with their associated graphs and metadata.
- Corpus (corpus): Manages collections of UDS documents and provides functionality for loading, querying, and serializing UDS datasets.
Classes
- NormalizedUDSAnnotation
Annotations with aggregated values and confidence scores from multiple annotators.
- RawUDSAnnotation
Annotations preserving individual annotator responses before aggregation.
- UDSSentenceGraph
Graph representation of a single sentence with syntax and semantics layers.
- UDSDocumentGraph
Graph connecting multiple sentence graphs within a document.
- UDSDocument
Container for sentence graphs and document-level annotations.
- UDSCorpus
Collection of UDS documents with support for various data formats and queries.
Notes
The UDS framework builds upon the PredPatt system for extracting predicate-argument structures and extends it with rich semantic annotations. All graph representations use NetworkX for the underlying graph structure and support SPARQL queries via RDF conversion.
- class NormalizedUDSAnnotation
Bases: UDSAnnotation
A normalized Universal Decompositional Semantics annotation.
Properties in a NormalizedUDSAnnotation may have only a single str, int, or float value and a single str, int, or float confidence.
- Parameters:
metadata (UDSAnnotationMetadata) – The metadata for the annotations.
data (dict[str, dict[str, dict[str, dict[str, TypeAliasType]]]]) – A mapping from graph identifiers to node/edge identifiers to property subspaces to properties to value and confidence. Edge identifiers must be represented as NODEID1%%NODEID2, and node identifiers must not contain %%.
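The identifier convention above (edges written as NODEID1%%NODEID2, nodes never containing %%) can be illustrated with a small standalone sketch. The helper names parse_identifier and is_edge are hypothetical and are not part of decomp; only the %% separator comes from the description above.

```python
EDGE_SEPARATOR = "%%"  # the separator named in the parameter description


def parse_identifier(identifier: str) -> tuple[str, ...]:
    """Return (node_id,) for a node or (first_id, second_id) for an edge.

    Hypothetical helper for illustration only; not a decomp function.
    """
    parts = identifier.split(EDGE_SEPARATOR)
    if len(parts) == 1:
        # No separator present, so this is a node identifier.
        return (identifier,)
    if len(parts) == 2 and all(parts):
        # Exactly one separator with a non-empty ID on each side.
        return (parts[0], parts[1])
    raise ValueError(f"malformed identifier: {identifier!r}")


def is_edge(identifier: str) -> bool:
    """True when the identifier names an edge rather than a node."""
    return len(parse_identifier(identifier)) == 2
```

Because node identifiers may not contain %%, a single split is enough to tell the two kinds of key apart.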
- classmethod from_json(jsonfile)
Load a dataset of normalized annotations from a JSON file.
For node annotations, the format of the JSON passed to this class method must be:
{GRAPHID_1: {NODEID_1_1: DATA, ...}, GRAPHID_2: {NODEID_2_1: DATA, ...}, ... }
Edge annotations should be of the form:
{GRAPHID_1: {NODEID_1_1%%NODEID_1_2: DATA, ...}, GRAPHID_2: {NODEID_2_1%%NODEID_2_2: DATA, ...}, ... }
Graph and node identifiers must match the graph and node identifiers of the predpatt graphs to which the annotations will be added.
DATA in the above is assumed to have the following structure:
{SUBSPACE_1: {PROP_1_1: {'value': VALUE, 'confidence': VALUE}, ...}, SUBSPACE_2: {PROP_2_1: {'value': VALUE, 'confidence': VALUE}, ...}, }
VALUE in the above is assumed to be unstructured.
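As a concrete illustration of the layout described above, the following sketch builds one node annotation and one edge annotation as plain Python dictionaries, checks that each property carries exactly a 'value' and a 'confidence', and round-trips the node annotations through a JSON file. All graph, node, subspace, and property names are hypothetical, and check_normalized is an illustrative helper rather than a decomp function.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical identifiers and property names, chosen only for illustration.
node_annotations = {
    "graph-1": {
        "graph-1-node-1": {
            "subspace_a": {"prop_x": {"value": 0.75, "confidence": 1.0}},
        },
    },
}
edge_annotations = {
    "graph-1": {
        "graph-1-node-1%%graph-1-node-2": {
            "subspace_b": {"prop_y": {"value": -1.2, "confidence": 0.5}},
        },
    },
}


def check_normalized(data: dict) -> None:
    """Raise ValueError unless every property has exactly 'value' and 'confidence'."""
    for graph_id, items in data.items():
        for item_id, subspaces in items.items():
            for subspace, props in subspaces.items():
                for prop, entry in props.items():
                    if set(entry) != {"value", "confidence"}:
                        raise ValueError(f"{graph_id}/{item_id}/{subspace}/{prop}")


check_normalized(node_annotations)
check_normalized(edge_annotations)

# Write the node annotations to a JSON file and read them back unchanged.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "annotations.json"
    path.write_text(json.dumps(node_annotations))
    reloaded = json.loads(path.read_text())
```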
- Return type:
- class RawUDSAnnotation
Bases: UDSAnnotation
A raw Universal Decompositional Semantics dataset.
Unlike decomp.semantics.uds.NormalizedUDSAnnotation, objects of this class may have multiple annotations for a particular attribute. Each annotation is associated with an annotator ID, and different annotators may have annotated different numbers of items.
- Parameters:
annotation – A mapping from graph identifiers to node/edge identifiers to property subspaces to property to value and confidence for each annotator. Edge identifiers must be represented as NODEID1%%NODEID2, and node identifiers must not contain %%.
- annotators(subspace=None, prop=None)
Get annotator IDs for a subspace and property.
If neither subspace nor property is specified, all annotator IDs are returned. If only the subspace is specified, all annotator IDs for the subspace are returned.
- classmethod from_json(jsonfile)
Load a dataset for raw annotations from a JSON file.
For node annotations, the format of the JSON passed to this class method must be:
{GRAPHID_1: {NODEID_1_1: DATA, ...}, GRAPHID_2: {NODEID_2_1: DATA, ...}, ... }
Edge annotations should be of the form:
{GRAPHID_1: {NODEID_1_1%%NODEID_1_2: DATA, ...}, GRAPHID_2: {NODEID_2_1%%NODEID_2_2: DATA, ...}, ... }
Graph and node identifiers must match the graph and node identifiers of the predpatt graphs to which the annotations will be added.
DATA in the above is assumed to have the following structure:
{SUBSPACE_1: {PROP_1_1: {'value': { ANNOTATOR1: VALUE1, ANNOTATOR2: VALUE2, ... }, 'confidence': { ANNOTATOR1: CONF1, ANNOTATOR2: CONF2, ... } }, PROP_1_2: {'value': { ANNOTATOR1: VALUE1, ANNOTATOR2: VALUE2, ... }, 'confidence': { ANNOTATOR1: CONF1, ANNOTATOR2: CONF2, ... } }, ...}, SUBSPACE_2: {PROP_2_1: {'value': { ANNOTATOR3: VALUE1, ANNOTATOR4: VALUE2, ... }, 'confidence': { ANNOTATOR3: CONF1, ANNOTATOR4: CONF2, ... } }, ...}, ...}
VALUEi and CONFi are assumed to be unstructured.
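To make the raw DATA layout above concrete, here is a sketch of the DATA for a single node, together with one way annotator IDs could be collected from it, optionally restricted to a subspace and property. The annotator IDs, subspace names, and property names are hypothetical, and annotator_ids is an illustrative helper, not the library's implementation of annotators.

```python
# Hypothetical DATA for one node: each property maps annotator IDs to
# values and, in parallel, to confidences.
raw_data = {
    "subspace_a": {
        "prop_x": {
            "value": {"annotator-1": 1, "annotator-2": 0},
            "confidence": {"annotator-1": 4, "annotator-2": 2},
        },
    },
    "subspace_b": {
        "prop_y": {
            "value": {"annotator-3": 3},
            "confidence": {"annotator-3": 1},
        },
    },
}


def annotator_ids(data: dict, subspace=None, prop=None) -> set:
    """Collect annotator IDs, optionally restricted to a subspace and property."""
    found = set()
    for subspace_name, props in data.items():
        if subspace is not None and subspace_name != subspace:
            continue
        for prop_name, entry in props.items():
            if prop is not None and prop_name != prop:
                continue
            # The keys of the 'value' mapping are the annotator IDs.
            found.update(entry["value"])
    return found
```

Note that different properties may be annotated by disjoint sets of annotators, which is why the helper takes the union over whatever properties match.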
- Return type:
- items(annotation_type=None, annotator_id=None)
Dictionary-like items generator for attributes.
This method behaves exactly like UDSAnnotation.items, except that, if an annotator ID is passed, it generates only items annotated by the specified annotator.
- Parameters:
- Raises:
ValueError – If both annotation_type and annotator_id are passed and the relevant annotator gives no annotations of the relevant type
- Return type:
TypeAliasType
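The idea of restricting items to a single annotator can be sketched over plain dictionaries. This is a minimal illustration under the assumption that raw annotations follow the {GRAPHID: {ITEMID: DATA}} layout documented for from_json; items_for_annotator and all identifiers in the sample data are hypothetical, and the sketch does not reproduce the library's handling of annotation_type or its ValueError.

```python
from typing import Iterator

# Hypothetical raw annotations for two nodes in one graph.
raw_annotations = {
    "graph-1": {
        "node-1": {
            "subspace_a": {
                "prop_x": {
                    "value": {"annotator-1": 1, "annotator-2": 0},
                    "confidence": {"annotator-1": 4, "annotator-2": 2},
                },
            },
        },
        "node-2": {
            "subspace_a": {
                "prop_x": {
                    "value": {"annotator-2": 1},
                    "confidence": {"annotator-2": 3},
                },
            },
        },
    },
}


def items_for_annotator(data: dict, annotator_id: str) -> Iterator[tuple]:
    """Yield ((graph_id, item_id), annotations) for items this annotator labeled."""
    for graph_id, items in data.items():
        for item_id, subspaces in items.items():
            kept = {}
            for subspace, props in subspaces.items():
                for prop, entry in props.items():
                    if annotator_id in entry["value"]:
                        kept.setdefault(subspace, {})[prop] = {
                            "value": entry["value"][annotator_id],
                            "confidence": entry["confidence"][annotator_id],
                        }
            if kept:  # skip items the annotator never touched
                yield (graph_id, item_id), kept
```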
- class UDSCorpus
Bases: PredPattCorpus
A collection of Universal Decompositional Semantics graphs.
- Parameters:
sentences (PredPattCorpus | dict[str, UDSSentenceGraph] | None, default: None) – the predpatt sentence graphs to associate the annotations with
documents (dict[str, UDSDocument] | None, default: None) – the documents associated with the predpatt sentence graphs
sentence_annotations (list[UDSAnnotation] | None, default: None) – additional annotations to associate with predpatt nodes on sentence-level graphs; in most cases, no such annotations will be passed, since the standard UDS annotations are automatically loaded
document_annotations (list[UDSAnnotation] | None, default: None) – additional annotations to associate with predpatt nodes on document-level graphs
version (str, default: '2.0') – the version of UDS datasets to use
split (str | None, default: None) – the split to load: “train”, “dev”, or “test”
annotation_format (str, default: 'normalized') – which annotation type to load (“raw” or “normalized”)
- ANN_DIR = '/home/docs/checkouts/readthedocs.org/user_builds/decomp/checkouts/stable/decomp/data/'
- CACHE_DIR = '/home/docs/checkouts/readthedocs.org/user_builds/decomp/checkouts/stable/decomp/data/'
- UD_URL = 'https://github.com/UniversalDependencies/UD_English-EWT/archive/r1.2.zip'
- __init__(sentences=None, documents=None, sentence_annotations=None, document_annotations=None, version='2.0', split=None, annotation_format='normalized')
- add_annotation(sentence_annotation=None, document_annotation=None)
Add annotations to UDS sentence and document graphs.
- Parameters:
sentence_annotation (list[UDSAnnotation] | None, default: None) – the annotations to add to the sentence graphs in the corpus
document_annotation (list[UDSAnnotation] | None, default: None) – the annotations to add to the document graphs in the corpus
- Return type:
- add_corpus_metadata(metadata)
Add metadata to the corpus.
- Parameters:
metadata (UDSCorpusMetadata) – Metadata to merge with existing corpus metadata
- Return type:
- add_document_annotation(annotation)
Add annotations to UDS documents.
- Parameters:
annotation (UDSAnnotation) – the annotations to add to the documents in the corpus
- Return type:
- add_sentence_annotation(annotation)
Add annotations to UDS sentence graphs.
- Parameters:
annotation (UDSAnnotation) – the annotations to add to the graphs in the corpus
- Return type:
- property document_node_subspaces: set[str]
The UDS document node subspaces in the corpus.
- Returns:
Set of subspace names for document nodes
- Return type:
set[str]
- Raises:
NotImplementedError – This property is not yet implemented
- document_properties(subspace=None)
Return the properties in a document subspace.
- Parameters:
subspace (str | None, optional) – Subspace to query, or None for all properties
- Returns:
Property names in the subspace
- Return type:
- Raises:
NotImplementedError – This method is not yet implemented
- document_property_metadata(subspace, prop)
Return the metadata for a property in a document subspace.
- Parameters:
- Return type:
- property documents: dict[str, UDSDocument]
The documents in the corpus.
- Returns:
Mapping from document IDs to UDSDocument objects
- Return type:
dict[str, UDSDocument]
- classmethod from_conll_and_annotations(corpus, sentence_annotations=[], document_annotations=[], annotation_format='normalized', version='2.0', name='ewt')
Load UDS graph corpus from CoNLL (dependencies) and JSON (annotations).
This method should only be used if the UDS corpus is being (re)built. Otherwise, loading the corpus from the JSON shipped with this package using UDSCorpus.__init__ or UDSCorpus.from_json is suggested.
- Parameters:
corpus (TypeAliasType) – (path to) Universal Dependencies corpus in CoNLL-U format
sentence_annotations (Sequence[TypeAliasType], default: []) – a list of paths to JSON files or open JSON files containing sentence-level annotations
document_annotations (Sequence[TypeAliasType], default: []) – a list of paths to JSON files or open JSON files containing document-level annotations
annotation_format (str, default: 'normalized') – whether the annotation is raw or normalized
version (str, default: '2.0') – the version of UDS datasets to use
name (str, default: 'ewt') – corpus name to be prepended to graph ids
- Return type:
- classmethod from_json(sentences_jsonfile, documents_jsonfile)
Load annotated UDS graph corpus (including annotations) from JSON.
This is the suggested method for loading the UDS corpus.
- Parameters:
sentences_jsonfile (TypeAliasType) – file containing Universal Decompositional Semantics corpus sentence-level graphs in JSON format
documents_jsonfile (TypeAliasType) – file containing Universal Decompositional Semantics corpus document-level graphs in JSON format
- Return type:
- property metadata: UDSCorpusMetadata
The corpus metadata.
- Returns:
Metadata for sentence and document annotations
- Return type:
UDSCorpusMetadata
- property ndocuments: int
The number of documents in the corpus.
- Returns:
Total document count
- Return type:
int
- query(query, query_type=None, cache_query=True, cache_rdf=True)
Query all graphs in the corpus using SPARQL 1.1.
- Parameters:
query_type (str | None, default: None) – whether this is a ‘node’ query or ‘edge’ query. If set to None (default), a Results object will be returned. The main reason to use this option is to automatically format the output of a custom query, since Results objects require additional postprocessing.
cache_query (bool, default: True) – whether to cache the query. This should usually be set to True. It should generally only be False when querying particular nodes or edges, e.g. as in precompiled queries.
cache_rdf (bool, default: True) – whether to keep the RDF constructed for querying against. Setting this to False slows down future queries but saves a lot of memory.
- Return type:
dict[str,Result|dict[str,TypeAliasType] |dict[TypeAliasType,TypeAliasType]]
- sample_documents(k)
Sample k documents without replacement.
- Parameters:
k (int) – the number of documents to sample
- Return type:
- property sentence_edge_subspaces: set[str]
The UDS sentence edge subspaces in the corpus.
- Returns:
Set of subspace names for sentence edges
- Return type:
set[str]
- Raises:
NotImplementedError – This property is not yet implemented
- property sentence_node_subspaces: set[str]
The UDS sentence node subspaces in the corpus.
- Returns:
Set of subspace names for sentence nodes
- Return type:
set[str]
- Raises:
NotImplementedError – This property is not yet implemented
- sentence_properties(subspace=None)
Return the properties in a sentence subspace.
- Parameters:
subspace (str | None, optional) – Subspace to query, or None for all properties
- Returns:
Property names in the subspace
- Return type:
- Raises:
NotImplementedError – This method is not yet implemented
- sentence_property_metadata(subspace, prop)
Return the metadata for a property in a sentence subspace.
- Parameters:
- Return type:
- class UDSDocument
Bases: object
A Universal Decompositional Semantics document.
- Parameters:
sentence_graphs (TypeAliasType) – the UDSSentenceGraphs associated with each sentence in the document
sentence_ids (TypeAliasType) – the UD sentence IDs for each graph
name (str) – the name of the document (i.e. the UD document ID)
genre (str) – the genre of the document (e.g. weblog)
timestamp (str | None, default: None) – the timestamp of the UD document on which this UDSDocument is based
doc_graph (UDSDocumentGraph | None, default: None) – the NetworkX DiGraph for the document. If not provided, this will be initialized without edges from sentence_graphs
- add_annotation(node_attrs, edge_attrs)
Add annotations to the document-level graph.
Delegates to the document graph’s add_annotation method, passing along the sentence IDs for validation.
- add_sentence_graphs(sentence_graphs, sentence_ids)
Add sentence graphs to the document.
Creates document-level nodes for each semantics node in the sentence graphs and updates the sentence graph metadata with document information.
- Parameters:
sentence_graphs (SentenceGraphDict) – Dictionary mapping graph names to UDSSentenceGraph objects
sentence_ids (SentenceIDDict) – Dictionary mapping graph names to UD sentence identifiers
- Return type:
- classmethod from_dict(document, sentence_graphs, sentence_ids, name='UDS')
Construct a UDSDocument from a dictionary.
Since only the document graphs are serialized, the sentence graphs must also be provided to this method call in order to properly associate them with their documents.
- Parameters:
document (dict[str, dict]) – a dictionary constructed by networkx.adjacency_data, containing the graph for the document
sentence_graphs (dict[str, UDSSentenceGraph]) – a dictionary containing (possibly a superset of) the sentence-level graphs for the sentences in the document
sentence_ids (dict[str, str]) – a dictionary containing (possibly a superset of) the UD sentence IDs for each graph
name (str, default: 'UDS') – identifier to prepend to node ids
- Return type:
- semantics_node(document_node)
Get the semantics node corresponding to a document node.
Document nodes maintain references to their corresponding semantics nodes through the ‘semantics’ attribute, which contains the graph name and node ID.
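The lookup described above can be sketched with plain dictionaries standing in for the document graph and the sentence graphs. This is only an illustration: the key names 'graph' and 'node' inside the 'semantics' attribute, the sample identifiers, and the returned shape are all assumptions, and resolve_semantics_node is a hypothetical helper rather than the library method.

```python
# Hypothetical stand-in for the document graph's node attributes.
# ASSUMPTION: the 'semantics' attribute stores its reference under the
# keys 'graph' (sentence graph name) and 'node' (semantics node ID).
document_nodes = {
    "doc-1-node-1": {"semantics": {"graph": "sent-1", "node": "sent-1-pred-3"}},
}

# Hypothetical stand-in for the semantics nodes of each sentence graph.
sentence_semantics_nodes = {
    "sent-1": {"sent-1-pred-3": {"type": "predicate"}},
}


def resolve_semantics_node(document_node: str) -> dict:
    """Follow a document node's 'semantics' reference to its sentence-level node."""
    ref = document_nodes[document_node]["semantics"]
    attrs = sentence_semantics_nodes[ref["graph"]][ref["node"]]
    return {ref["node"]: attrs}
```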
- class UDSDocumentGraph
Bases: UDSGraph
A Universal Decompositional Semantics document-level graph.
- Parameters:
- class UDSSentenceGraph
Bases: UDSGraph
A Universal Decompositional Semantics sentence-level graph.
- Parameters:
graph (DiGraph) – the NetworkX DiGraph from which the sentence-level graph is to be constructed
name (str) – the name of the graph
sentence_id (str | None, default: None) – the UD identifier for the sentence associated with this graph
document_id (str | None, default: None) – the UD identifier for the document associated with this graph
- add_annotation(node_attrs, edge_attrs, add_heads=True, add_subargs=False, add_subpreds=False, add_orphans=False)
Add node and/or edge annotations to the graph.
- query(query, query_type=None, cache_query=True, cache_rdf=True)
Query graph using SPARQL 1.1.
- Parameters:
query_type (str | None, default: None) – whether this is a ‘node’ query or ‘edge’ query. If set to None (default), a Results object will be returned. The main reason to use this option is to automatically format the output of a custom query, since Results objects require additional postprocessing.
cache_query (bool, default: True) – whether to cache the query; set to False when querying particular nodes or edges using precompiled queries
cache_rdf (bool, default: True) – whether to keep the RDF constructed for querying against. Setting this to False slows down future queries but saves a lot of memory.
- Return type:
Result|dict[str,TypeAliasType] |dict[TypeAliasType,TypeAliasType]
- property rdf: Graph
The graph converted to RDF format.
- Returns:
RDFLib graph representation
- Return type:
Graph
- Raises:
AttributeError – If RDFConverter is not available
- property rootid: NodeID
The ID of the graph’s root node.
- Returns:
The root node identifier
- Return type:
NodeID
- Raises:
ValueError – If the graph has no root or multiple roots
- property semantics_subgraph: DiGraph
Subgraph containing only semantics nodes.
- Returns:
NetworkX subgraph with semantics nodes
- Return type:
DiGraph
- property sentence: str
The sentence text reconstructed from syntax nodes.
- Returns:
The sentence text with tokens in surface order
- Return type:
str
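Reconstructing a sentence from syntax nodes in surface order can be sketched as a sort followed by a join. The attribute names 'position' and 'form', the node identifiers, and the use of a single space as the token separator are assumptions made for this illustration; reconstruct_sentence is a hypothetical helper, not the library property.

```python
# Hypothetical syntax nodes, deliberately listed out of surface order.
# ASSUMPTION: each node stores its token under 'form' and its 1-based
# surface index under 'position'.
syntax_nodes = {
    "s1-syntax-2": {"position": 2, "form": "barks"},
    "s1-syntax-1": {"position": 1, "form": "Rex"},
    "s1-syntax-3": {"position": 3, "form": "."},
}


def reconstruct_sentence(nodes: dict) -> str:
    """Join token forms in ascending order of their surface position."""
    ordered = sorted(nodes.values(), key=lambda attrs: attrs["position"])
    return " ".join(attrs["form"] for attrs in ordered)


print(reconstruct_sentence(syntax_nodes))  # Rex barks .
```

Sorting on position rather than on node ID keeps the result independent of how the identifiers happen to be spelled.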
- decomp.semantics.uds.corpus
- decomp.semantics.uds.document
- decomp.semantics.uds.graph
- decomp.semantics.uds.annotation
- decomp.semantics.uds.metadata