decomp.semantics.uds.annotation
Module for representing UDS property annotations with support for raw and normalized formats.
This module provides classes for handling Universal Decompositional Semantics (UDS) annotations in both raw (multi-annotator) and normalized (single-value) formats.
The main classes are:
UDSAnnotation: Abstract base class for all UDS annotations
NormalizedUDSAnnotation: Annotations with single normalized values and confidence scores
RawUDSAnnotation: Annotations preserving individual annotator responses
The module also provides:
Type aliases for various annotation data structures (e.g., NodeAttributes, EdgeAttributes)
Helper functions for working with nested defaultdicts
Methods for loading annotations from JSON files and converting between formats
See also
decomp.semantics.uds.metadata: Metadata classes for UDS annotations
decomp.semantics.uds.graph: Graph structures for UDS annotations
- class UDSAnnotation
Bases: ABC
A Universal Decompositional Semantics annotation.
This is an abstract base class. See its RawUDSAnnotation and NormalizedUDSAnnotation subclasses.
The __init__ method for this class is abstract to ensure that it cannot be initialized directly, even though it is used by the subclasses and has a valid default implementation. The from_json class method is abstract to force the subclass to define more specific constraints on its JSON inputs.
- Parameters:
metadata (UDSAnnotationMetadata) – The metadata for the annotations.
data (dict[str, dict[str, TypeAliasType | TypeAliasType]]) – A mapping from graph identifiers to node/edge identifiers to property subspaces to properties to annotations. Edge identifiers must be represented as NODEID1%%NODEID2, and node identifiers must not contain %%.
- CACHE: ClassVar[dict[str, UDSAnnotation]] = {}
- abstractmethod classmethod from_json(jsonfile)
Load Universal Decompositional Semantics dataset from JSON.
For node annotations, the format of the JSON passed to this class method must be:
{GRAPHID_1: {NODEID_1_1: DATA, ...}, GRAPHID_2: {NODEID_2_1: DATA, ...}, ... }
Edge annotations should be of the form:
{GRAPHID_1: {NODEID_1_1%%NODEID_1_2: DATA, ...}, GRAPHID_2: {NODEID_2_1%%NODEID_2_2: DATA, ...}, ... }
Graph and node identifiers must match the graph and node identifiers of the predpatt graphs to which the annotations will be added. The subclass determines the form of DATA in the above.
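The identifier convention above can be illustrated with plain dictionaries. The following is a minimal sketch, not the library's loader: the payload, the identifiers, and the helper `split_nodes_and_edges` are all hypothetical, and it assumes a single mapping may hold both node and edge identifiers.

```python
import json

# Hypothetical payload in the documented layout; all identifiers are made up.
payload = json.loads("""
{
  "graph-1": {
    "graph-1-node-1": {"note": "DATA goes here"},
    "graph-1-node-1%%graph-1-node-2": {"note": "DATA goes here"}
  }
}
""")


def split_nodes_and_edges(mapping):
    """Separate node entries from edge entries by looking for '%%'."""
    nodes, edges = {}, {}
    for graph_id, entries in mapping.items():
        nodes[graph_id] = {}
        edges[graph_id] = {}
        for identifier, data in entries.items():
            if "%%" in identifier:
                # Edge identifiers are written NODEID1%%NODEID2.
                edges[graph_id][tuple(identifier.split("%%"))] = data
            else:
                nodes[graph_id][identifier] = data
    return nodes, edges


nodes, edges = split_nodes_and_edges(payload)
print(sorted(nodes["graph-1"]))
print(sorted(edges["graph-1"]))
```

Keying edges by a tuple of node identifiers mirrors the `tuple[str, ...]` keys in the documented type of `edge_attributes`.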
- items(annotation_type=None)
Dictionary-like items generator for attributes.
If annotation_type is specified as “node” or “edge”, this generator yields a graph identifier and its node or edge attributes (respectively); otherwise, this generator yields a graph identifier and a tuple of its node and edge attributes.
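The three behaviours described above can be sketched with a standalone generator. This is an illustration over plain dictionaries rather than the method itself; `items_sketch` and the sample data are hypothetical, and the handling of graphs that appear in only one of the two mappings is an assumption.

```python
def items_sketch(node_attrs, edge_attrs, annotation_type=None):
    """Yield (graph_id, attributes) pairs in the three documented shapes."""
    # Assumption: visit every graph ID found in either mapping, in first-seen order.
    graph_ids = dict.fromkeys([*node_attrs, *edge_attrs])
    for graph_id in graph_ids:
        node_part = node_attrs.get(graph_id, {})
        edge_part = edge_attrs.get(graph_id, {})
        if annotation_type == "node":
            yield graph_id, node_part
        elif annotation_type == "edge":
            yield graph_id, edge_part
        else:
            # No type given: yield a tuple of node and edge attributes.
            yield graph_id, (node_part, edge_part)


# Made-up sample data.
node_attrs = {"graph-1": {"n1": {}}}
edge_attrs = {"graph-1": {("n1", "n2"): {}}}

for graph_id, (node_part, edge_part) in items_sketch(node_attrs, edge_attrs):
    print(graph_id, len(node_part), len(edge_part))
```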
- Return type:
TypeAliasType
- property node_attributes: dict[str, dict[str, NormalizedData | RawData]]
All node attributes by graph ID.
- property edge_attributes: dict[str, dict[tuple[str, ...], NormalizedData | RawData]]
All edge attributes by graph ID.
- property metadata: UDSAnnotationMetadata
The metadata for all annotations.
- Returns:
Metadata including subspaces, properties, and datatypes
- Return type:
UDSAnnotationMetadata
- property node_subspaces: set[UDSSubspace]
Set of subspaces used in node annotations.
- Returns:
Subspace names excluding structural attributes
- Return type:
set[UDSSubspace]
- property edge_subspaces: set[UDSSubspace]
Set of subspaces used in edge annotations.
- Returns:
Subspace names for edges
- Return type:
set[UDSSubspace]
- property subspaces: set[UDSSubspace]
Set of all subspaces (node and edge).
- Returns:
Union of node and edge subspaces
- Return type:
set[UDSSubspace]
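The relationship between the three subspace properties can be sketched as a set union over nested dictionaries. The helper `subspaces_of` and the subspace names are hypothetical, and the sketch does not model the exclusion of structural attributes mentioned for node subspaces.

```python
def subspaces_of(attrs):
    """Collect subspace names from {graph_id: {item_id: {subspace: ...}}}."""
    return {
        subspace
        for items in attrs.values()
        for data in items.values()
        for subspace in data
    }


# Made-up sample data with placeholder subspace names.
node_attrs = {"graph-1": {"n1": {"subspace_a": {}, "subspace_b": {}}}}
edge_attrs = {"graph-1": {("n1", "n2"): {"subspace_c": {}}}}

node_subspaces = subspaces_of(node_attrs)
edge_subspaces = subspaces_of(edge_attrs)
all_subspaces = node_subspaces | edge_subspaces  # union of node and edge subspaces
print(sorted(all_subspaces))
```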
- class NormalizedUDSAnnotation
Bases: UDSAnnotation
A normalized Universal Decompositional Semantics annotation.
Properties in a NormalizedUDSAnnotation may have only a single str, int, or float value and a single str, int, or float confidence.
- Parameters:
metadata (UDSAnnotationMetadata) – The metadata for the annotations.
data (dict[str, dict[str, dict[str, dict[str, TypeAliasType]]]]) – A mapping from graph identifiers to node/edge identifiers to property subspaces to property to value and confidence. Edge identifiers must be represented as NODEID1%%NODEID2, and node identifiers must not contain %%.
- classmethod from_json(jsonfile)
Load a dataset of normalized annotations from a JSON file.
For node annotations, the format of the JSON passed to this class method must be:
{GRAPHID_1: {NODEID_1_1: DATA, ...}, GRAPHID_2: {NODEID_2_1: DATA, ...}, ... }
Edge annotations should be of the form:
{GRAPHID_1: {NODEID_1_1%%NODEID_1_2: DATA, ...}, GRAPHID_2: {NODEID_2_1%%NODEID_2_2: DATA, ...}, ... }
Graph and node identifiers must match the graph and node identifiers of the predpatt graphs to which the annotations will be added.
DATA in the above is assumed to have the following structure:
{SUBSPACE_1: {PROP_1_1: {'value': VALUE, 'confidence': VALUE}, ...}, SUBSPACE_2: {PROP_2_1: {'value': VALUE, 'confidence': VALUE}, ...}, }
VALUE in the above is assumed to be unstructured.
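As a concrete illustration of the normalized DATA layout, the sketch below builds one made-up entry and checks it against the shape described above. The checker `looks_normalized` and the subspace and property names are hypothetical and are not part of the library. The scalar check follows the class description (a single str, int, or float for both value and confidence).

```python
# Made-up DATA entry for one node; names are placeholders.
normalized_data = {
    "subspace_a": {
        "prop_x": {"value": 1.2, "confidence": 0.9},
        "prop_y": {"value": "yes", "confidence": 1},
    }
}


def looks_normalized(data):
    """Return True if every property holds one scalar value and one scalar confidence."""
    scalar = (str, int, float)
    for props in data.values():
        for entry in props.values():
            if set(entry) != {"value", "confidence"}:
                return False
            if not all(isinstance(entry[key], scalar) for key in entry):
                return False
    return True


print(looks_normalized(normalized_data))
```

A raw-style entry, where `value` maps annotator IDs to responses, fails this check because its value is a dictionary rather than a scalar.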
- Return type:
NormalizedUDSAnnotation
- class RawUDSAnnotation
Bases: UDSAnnotation
A raw Universal Decompositional Semantics dataset.
Unlike decomp.semantics.uds.NormalizedUDSAnnotation, objects of this class may have multiple annotations for a particular attribute. Each annotation is associated with an annotator ID, and different annotators may have annotated different numbers of items.
- Parameters:
annotation – A mapping from graph identifiers to node/edge identifiers to property subspaces to property to value and confidence for each annotator. Edge identifiers must be represented as NODEID1%%NODEID2, and node identifiers must not contain %%.
- classmethod from_json(jsonfile)
Load a dataset for raw annotations from a JSON file.
For node annotations, the format of the JSON passed to this class method must be:
{GRAPHID_1: {NODEID_1_1: DATA, ...}, GRAPHID_2: {NODEID_2_1: DATA, ...}, ... }
Edge annotations should be of the form:
{GRAPHID_1: {NODEID_1_1%%NODEID_1_2: DATA, ...}, GRAPHID_2: {NODEID_2_1%%NODEID_2_2: DATA, ...}, ... }
Graph and node identifiers must match the graph and node identifiers of the predpatt graphs to which the annotations will be added.
DATA in the above is assumed to have the following structure:
{SUBSPACE_1: {PROP_1_1: {'value': { ANNOTATOR1: VALUE1, ANNOTATOR2: VALUE2, ... }, 'confidence': { ANNOTATOR1: CONF1, ANNOTATOR2: CONF2, ... } }, PROP_1_2: {'value': { ANNOTATOR1: VALUE1, ANNOTATOR2: VALUE2, ... }, 'confidence': { ANNOTATOR1: CONF1, ANNOTATOR2: CONF2, ... } }, ...}, SUBSPACE_2: {PROP_2_1: {'value': { ANNOTATOR3: VALUE1, ANNOTATOR4: VALUE2, ... }, 'confidence': { ANNOTATOR3: CONF1, ANNOTATOR4: CONF2, ... } }, ...}, ...}
VALUEi and CONFi are assumed to be unstructured.
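To make the raw DATA layout easier to read, the sketch below builds one made-up entry and pairs each annotator's value with their confidence. The helper `responses` and all names are hypothetical. It assumes an annotator may be missing from the confidence mapping, in which case the confidence is reported as None.

```python
# Made-up raw DATA entry for one node; names are placeholders.
raw_data = {
    "subspace_a": {
        "prop_x": {
            "value": {"annotator-1": 1, "annotator-2": 0},
            "confidence": {"annotator-1": 4, "annotator-2": 2},
        }
    }
}


def responses(data, subspace, prop):
    """Map each annotator ID to a (value, confidence) pair for one property."""
    entry = data[subspace][prop]
    return {
        annotator: (value, entry["confidence"].get(annotator))
        for annotator, value in entry["value"].items()
    }


print(responses(raw_data, "subspace_a", "prop_x"))
```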
- Return type:
RawUDSAnnotation
- annotators(subspace=None, prop=None)
Get annotator IDs for a subspace and property.
If neither subspace nor property is specified, all annotator IDs are returned. If only the subspace is specified, all annotator IDs for the subspace are returned.
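The filtering described above can be sketched over a single raw DATA entry. The function `annotators_sketch` and the sample names are hypothetical. The real method presumably looks across all graphs, and the behaviour when only a property is given is an assumption here (the sketch matches that property in any subspace).

```python
# Made-up raw DATA entry; names are placeholders.
raw_data = {
    "subspace_a": {
        "prop_x": {"value": {"annotator-1": 1}, "confidence": {"annotator-1": 4}},
        "prop_y": {"value": {"annotator-2": 0}, "confidence": {"annotator-2": 2}},
    },
    "subspace_b": {
        "prop_z": {"value": {"annotator-3": 1}, "confidence": {"annotator-3": 3}},
    },
}


def annotators_sketch(data, subspace=None, prop=None):
    """Collect annotator IDs, optionally restricted by subspace and property."""
    found = set()
    for subspace_name, props in data.items():
        if subspace is not None and subspace_name != subspace:
            continue
        for prop_name, entry in props.items():
            if prop is not None and prop_name != prop:
                continue
            found.update(entry["value"])
    return found


print(sorted(annotators_sketch(raw_data)))
print(sorted(annotators_sketch(raw_data, subspace="subspace_a")))
print(sorted(annotators_sketch(raw_data, subspace="subspace_a", prop="prop_x")))
```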
- items(annotation_type=None, annotator_id=None)
Dictionary-like items generator for attributes.
This method behaves exactly like UDSAnnotation.items, except that, if an annotator ID is passed, it generates only items annotated by the specified annotator.
- Parameters:
- Raises:
ValueError – If both annotation_type and annotator_id are passed and the relevant annotator gives no annotations of the relevant type.
- Return type:
TypeAliasType
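The annotator filter and the ValueError condition can be sketched for the node case as follows. The function `node_items_for_annotator` is hypothetical and works on plain dictionaries. The shape of what it returns (each property reduced to that annotator's single value and confidence) is an assumption, not something the documentation above confirms.

```python
def node_items_for_annotator(node_attrs, annotator_id):
    """Return (graph_id, node attributes) pairs restricted to one annotator."""
    results = []
    for graph_id, nodes in node_attrs.items():
        kept_nodes = {}
        for node_id, data in nodes.items():
            kept_data = {}
            for subspace, props in data.items():
                # Keep only the properties this annotator responded to.
                kept_props = {
                    prop: {
                        "value": entry["value"][annotator_id],
                        "confidence": entry["confidence"].get(annotator_id),
                    }
                    for prop, entry in props.items()
                    if annotator_id in entry["value"]
                }
                if kept_props:
                    kept_data[subspace] = kept_props
            if kept_data:
                kept_nodes[node_id] = kept_data
        if kept_nodes:
            results.append((graph_id, kept_nodes))
    if not results:
        # Mirrors the documented error: no annotations of this type.
        raise ValueError(f"{annotator_id} has no node annotations")
    return results


# Made-up raw node attributes.
raw_nodes = {
    "graph-1": {
        "n1": {
            "subspace_a": {
                "prop_x": {
                    "value": {"annotator-1": 1, "annotator-2": 0},
                    "confidence": {"annotator-1": 4, "annotator-2": 2},
                }
            }
        }
    }
}

print(node_items_for_annotator(raw_nodes, "annotator-2"))
```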