decomp.semantics.uds.annotation

Module for representing UDS property annotations with support for raw and normalized formats.

This module provides classes for handling Universal Decompositional Semantics (UDS) annotations in both raw (multi-annotator) and normalized (single-value) formats.

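The main classes are:

  • UDSAnnotation: abstract base class for UDS annotations

  • NormalizedUDSAnnotation: annotations with a single value and confidence per property

  • RawUDSAnnotation: annotations with a value and confidence per annotator
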
The module also provides:

  • Type aliases for various annotation data structures (e.g., NodeAttributes, EdgeAttributes)

  • Helper functions for working with nested defaultdicts

  • Methods for loading annotations from JSON files and converting between formats

See also

decomp.semantics.uds.metadata

Metadata classes for UDS annotations

decomp.semantics.uds.graph

Graph structures for UDS annotations

class UDSAnnotation[source]

Bases: ABC

A Universal Decompositional Semantics annotation.

This is an abstract base class. See its RawUDSAnnotation and NormalizedUDSAnnotation subclasses.

The __init__ method for this class is abstract to ensure that it cannot be initialized directly, even though it is used by the subclasses and has a valid default implementation. The from_json class method is abstract to force the subclass to define more specific constraints on its JSON inputs.

Parameters:
  • metadata (UDSAnnotationMetadata) – The metadata for the annotations.

  • data (dict[str, dict[str, TypeAliasType | TypeAliasType]]) – A mapping from graph identifiers to node/edge identifiers to property subspaces to properties to annotations. Edge identifiers must be represented as NODEID1%%NODEID2, and node identifiers must not contain %%.

CACHE: ClassVar[dict[str, UDSAnnotation]] = {}
abstractmethod __init__(metadata, data)[source]
__getitem__(graphid)[source]

Get node and edge attributes for a graph.

Parameters:

graphid (str) – The graph identifier.

Returns:

Tuple of (node_attributes, edge_attributes) for the graph.

Return type:

tuple[dict[str, NormalizedData | RawData], dict[tuple[str, …], NormalizedData | RawData]]

Raises:

KeyError – If graphid is not found

abstractmethod classmethod from_json(jsonfile)[source]

Load Universal Decompositional Semantics dataset from JSON.

For node annotations, the format of the JSON passed to this class method must be:

{GRAPHID_1: {NODEID_1_1: DATA,
             ...},
 GRAPHID_2: {NODEID_2_1: DATA,
             ...},
 ...
}

Edge annotations should be of the form:

{GRAPHID_1: {NODEID_1_1%%NODEID_1_2: DATA,
             ...},
 GRAPHID_2: {NODEID_2_1%%NODEID_2_2: DATA,
             ...},
 ...
}

Graph and node identifiers must match the graph and node identifiers of the PredPatt graphs to which the annotations will be added. The subclass determines the form of DATA in the above.

Parameters:

jsonfile (str | TextIO) – (path to) file containing annotations as JSON

Return type:

UDSAnnotation
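
As a rough illustration of the two key formats above, the following sketch separates node-keyed entries from edge-keyed entries by checking for the %% separator. It uses only the standard library. The function name split_annotations and the identifiers in the sample JSON are hypothetical, and this is not the library's own loader.

```python
import json

SEP = "%%"  # separator between the two node IDs in an edge identifier


def split_annotations(data):
    """Split {graphid: {itemid: DATA}} into node and edge mappings.

    Hypothetical helper for illustration only.
    """
    node_attrs, edge_attrs = {}, {}
    for graphid, items in data.items():
        for itemid, payload in items.items():
            if SEP in itemid:
                # Edge identifiers become tuples of node identifiers.
                edge_key = tuple(itemid.split(SEP))
                edge_attrs.setdefault(graphid, {})[edge_key] = payload
            else:
                node_attrs.setdefault(graphid, {})[itemid] = payload
    return node_attrs, edge_attrs


# Made-up identifiers; DATA is left empty because its form depends on the subclass.
loaded = json.loads(
    '{"graph-1": {"graph-1-node-1": {},'
    ' "graph-1-node-1%%graph-1-node-2": {}}}'
)
node_attrs, edge_attrs = split_annotations(loaded)
```

Keeping edge keys as tuples mirrors the tuple[str, …] keys shown in the edge attribute types on this page.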

items(annotation_type=None)[source]

Dictionary-like items generator for attributes.

If annotation_type is specified as “node” or “edge”, this generator yields a graph identifier and its node or edge attributes (respectively); otherwise, this generator yields a graph identifier and a tuple of its node and edge attributes.

Return type:

TypeAliasType
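
The three modes described for items can be sketched with a plain generator over two dictionaries. This is a stand-in under stated assumptions rather than the class's actual code: iter_items is a hypothetical name, graph IDs are sorted only to make the order predictable, and raising ValueError for an unrecognized annotation_type is an assumption.

```python
def iter_items(node_attrs, edge_attrs, annotation_type=None):
    """Yield (graphid, attributes) pairs in the style described above.

    Hypothetical stand-in; node_attrs and edge_attrs map graph IDs to
    per-node and per-edge annotation dictionaries.
    """
    if annotation_type == "node":
        yield from node_attrs.items()
    elif annotation_type == "edge":
        yield from edge_attrs.items()
    elif annotation_type is None:
        # Sorting is an assumption made for a deterministic order.
        for graphid in sorted(set(node_attrs) | set(edge_attrs)):
            yield graphid, (
                node_attrs.get(graphid, {}),
                edge_attrs.get(graphid, {}),
            )
    else:
        # Assumed behavior for unrecognized values.
        raise ValueError('annotation_type must be None, "node", or "edge"')


nodes = {"g1": {"n1": {}}}
edges = {"g1": {("n1", "n2"): {}}, "g2": {("n3", "n4"): {}}}

node_items = list(iter_items(nodes, edges, "node"))
all_items = list(iter_items(nodes, edges))
```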

property node_attributes: dict[str, dict[str, NormalizedData | RawData]]

All node attributes by graph ID.

Returns:

Mapping from graph ID to node ID to annotation data

Return type:

dict[str, dict[str, NormalizedData | RawData]]

property edge_attributes: dict[str, dict[tuple[str, ...], NormalizedData | RawData]]

All edge attributes by graph ID.

Returns:

Mapping from graph ID to edge tuple to annotation data

Return type:

dict[str, dict[tuple[str, …], NormalizedData | RawData]]

property graphids: set[str]

Set of all graph identifiers with annotations.

Returns:

Graph IDs that have node or edge annotations

Return type:

set[str]

property node_graphids: set[str]

Set of graph identifiers with node annotations.

Returns:

Graph IDs that have node annotations

Return type:

set[str]

property edge_graphids: set[str]

Set of graph identifiers with edge annotations.

Returns:

Graph IDs that have edge annotations

Return type:

set[str]

property metadata: UDSAnnotationMetadata

The metadata for all annotations.

Returns:

Metadata including subspaces, properties, and datatypes

Return type:

UDSAnnotationMetadata

property node_subspaces: set[UDSSubspace]

Set of subspaces used in node annotations.

Returns:

Subspace names excluding structural attributes

Return type:

set[UDSSubspace]

property edge_subspaces: set[UDSSubspace]

Set of subspaces used in edge annotations.

Returns:

Subspace names for edges

Return type:

set[UDSSubspace]

property subspaces: set[UDSSubspace]

Set of all subspaces (node and edge).

Returns:

Union of node and edge subspaces

Return type:

set[UDSSubspace]

properties(subspace=None)[source]

Get properties for a subspace.

Parameters:

subspace (str | None, optional) – Subspace to get properties for. If None, returns all properties.

Returns:

Property names in the subspace

Return type:

set[str]

property_metadata(subspace, prop)[source]

Get metadata for a specific property.

Parameters:
  • subspace (str) – The subspace containing the property

  • prop (str) – The property name

Returns:

Metadata including datatypes and annotators

Return type:

UDSPropertyMetadata

Raises:

KeyError – If subspace or property is not found

class NormalizedUDSAnnotation[source]

Bases: UDSAnnotation

A normalized Universal Decompositional Semantics annotation.

Properties in a NormalizedUDSAnnotation may have only a single str, int, or float value and a single str, int, or float confidence.

Parameters:
  • metadata (UDSAnnotationMetadata) – The metadata for the annotations.

  • data (dict[str, dict[str, dict[str, dict[str, TypeAliasType]]]]) – A mapping from graph identifiers to node/edge identifiers to property subspaces to property to value and confidence. Edge identifiers must be represented as NODEID1%%NODEID2, and node identifiers must not contain %%.

__init__(metadata, data)[source]
classmethod from_json(jsonfile)[source]

Load a dataset of normalized annotations from a JSON file.

For node annotations, the format of the JSON passed to this class method must be:

{GRAPHID_1: {NODEID_1_1: DATA,
             ...},
 GRAPHID_2: {NODEID_2_1: DATA,
             ...},
 ...
}

Edge annotations should be of the form:

{GRAPHID_1: {NODEID_1_1%%NODEID_1_2: DATA,
             ...},
 GRAPHID_2: {NODEID_2_1%%NODEID_2_2: DATA,
             ...},
 ...
}

Graph and node identifiers must match the graph and node identifiers of the PredPatt graphs to which the annotations will be added.

DATA in the above is assumed to have the following structure:

{SUBSPACE_1: {PROP_1_1: {'value': VALUE,
                         'confidence': VALUE},
              ...},
 SUBSPACE_2: {PROP_2_1: {'value': VALUE,
                         'confidence': VALUE},
              ...},
}

VALUE in the above is assumed to be unstructured.

Return type:

NormalizedUDSAnnotation
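
To make the normalized DATA shape concrete, here is a small structural check, assuming that every property carries exactly a 'value' and a 'confidence' key holding a str, int, or float. The name is_normalized and the sample subspace and property names are hypothetical, and the library's own validation may differ.

```python
SCALAR_TYPES = (str, int, float)


def is_normalized(data):
    """Return True if DATA looks like {SUBSPACE: {PROP: {'value', 'confidence'}}}.

    Hypothetical check for illustration; not the library's validator.
    """
    for props in data.values():
        for annotation in props.values():
            if set(annotation) != {"value", "confidence"}:
                return False
            if not isinstance(annotation["value"], SCALAR_TYPES):
                return False
            if not isinstance(annotation["confidence"], SCALAR_TYPES):
                return False
    return True


# Placeholder names in the style of the template above.
good = {"SUBSPACE_1": {"PROP_1_1": {"value": 1.2, "confidence": 1.0}}}
# A per-annotator mapping is the raw shape, so it fails the normalized check.
bad = {"SUBSPACE_1": {"PROP_1_1": {"value": {"ann-1": 1}, "confidence": 1.0}}}

good_ok = is_normalized(good)
bad_ok = is_normalized(bad)
```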

class RawUDSAnnotation[source]

Bases: UDSAnnotation

A raw Universal Decompositional Semantics dataset.

Unlike decomp.semantics.uds.NormalizedUDSAnnotation, objects of this class may have multiple annotations for a particular attribute. Each annotation is associated with an annotator ID, and different annotators may have annotated different numbers of items.

Parameters:
  • metadata (UDSAnnotationMetadata) – The metadata for the annotations.

  • data – A mapping from graph identifiers to node/edge identifiers to property subspaces to properties to value and confidence for each annotator. Edge identifiers must be represented as NODEID1%%NODEID2, and node identifiers must not contain %%.

__init__(metadata, data)[source]
classmethod from_json(jsonfile)[source]

Load a dataset of raw annotations from a JSON file.

For node annotations, the format of the JSON passed to this class method must be:

{GRAPHID_1: {NODEID_1_1: DATA,
             ...},
 GRAPHID_2: {NODEID_2_1: DATA,
             ...},
 ...
}

Edge annotations should be of the form:

{GRAPHID_1: {NODEID_1_1%%NODEID_1_2: DATA,
             ...},
 GRAPHID_2: {NODEID_2_1%%NODEID_2_2: DATA,
             ...},
 ...
}

Graph and node identifiers must match the graph and node identifiers of the PredPatt graphs to which the annotations will be added.

DATA in the above is assumed to have the following structure:

{SUBSPACE_1: {PROP_1_1: {'value': {
                            ANNOTATOR1: VALUE1,
                            ANNOTATOR2: VALUE2,
                            ...
                                  },
                         'confidence': {
                            ANNOTATOR1: CONF1,
                            ANNOTATOR2: CONF2,
                            ...
                                       }
                        },
              PROP_1_2: {'value': {
                            ANNOTATOR1: VALUE1,
                            ANNOTATOR2: VALUE2,
                            ...
                                  },
                         'confidence': {
                            ANNOTATOR1: CONF1,
                            ANNOTATOR2: CONF2,
                            ...
                                       }
                        },
              ...},
 SUBSPACE_2: {PROP_2_1: {'value': {
                            ANNOTATOR3: VALUE1,
                            ANNOTATOR4: VALUE2,
                            ...
                                  },
                         'confidence': {
                            ANNOTATOR3: CONF1,
                            ANNOTATOR4: CONF2,
                            ...
                                       }
                        },
              ...},
 ...}

VALUEi and CONFi are assumed to be unstructured.

Return type:

RawUDSAnnotation

annotators(subspace=None, prop=None)[source]

Get annotator IDs for a subspace and property.

If neither subspace nor property is specified, all annotator IDs are returned. If only the subspace is specified, all annotator IDs for the subspace are returned.

Parameters:
  • subspace (str | None, optional) – The subspace to filter by

  • prop (str | None, optional) – The property to filter by

Returns:

Set of annotator IDs, or None if no annotators are found

Return type:

set[str] | None
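
The filtering rules above can be illustrated by scanning raw data directly. This is only a sketch: collect_annotators is a hypothetical function, the sample identifiers are made up, and the library may well obtain annotator IDs from its metadata rather than from the data as done here.

```python
def collect_annotators(data, subspace=None, prop=None):
    """Collect annotator IDs from raw data, optionally filtered.

    data: {graphid: {itemid: {subspace: {prop: {'value': {...},
                                                 'confidence': {...}}}}}}
    Hypothetical helper; returns None when nothing matches.
    """
    found = set()
    for items in data.values():
        for subspaces in items.values():
            for sub_name, props in subspaces.items():
                if subspace is not None and sub_name != subspace:
                    continue
                for prop_name, annotation in props.items():
                    if prop is not None and prop_name != prop:
                        continue
                    # Keys of the 'value' mapping are annotator IDs.
                    found.update(annotation["value"])
    return found or None


raw_data = {
    "graph-1": {
        "node-1": {
            "SUBSPACE_1": {
                "PROP_1_1": {
                    "value": {"ann-1": 1, "ann-2": 0},
                    "confidence": {"ann-1": 4, "ann-2": 2},
                }
            },
            "SUBSPACE_2": {
                "PROP_2_1": {
                    "value": {"ann-3": 1},
                    "confidence": {"ann-3": 2},
                }
            },
        }
    }
}

everyone = collect_annotators(raw_data)
subspace_two = collect_annotators(raw_data, subspace="SUBSPACE_2")
nobody = collect_annotators(raw_data, subspace="SUBSPACE_1", prop="PROP_9")
```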

items(annotation_type=None, annotator_id=None)[source]

Dictionary-like items generator for attributes.

This method behaves exactly like UDSAnnotation.items, except that, if an annotator ID is passed, it generates only items annotated by the specified annotator.

Parameters:
  • annotation_type (str | None, default: None) – Whether to return node annotations, edge annotations, or both (default)

  • annotator_id (str | None, default: None) – The annotator whose annotations will be returned by the generator (defaults to all annotators)

Raises:

ValueError – If both annotation_type and annotator_id are passed and the relevant annotator gives no annotations of the relevant type

Return type:

TypeAliasType
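
A minimal sketch of per-annotator filtering, assuming the raw DATA layout documented for RawUDSAnnotation.from_json. The function filter_by_annotator and the sample identifiers are hypothetical. Raising ValueError when nothing is retained loosely mirrors the documented error but simplifies its exact conditions, and the shape of what the real generator yields may differ.

```python
def filter_by_annotator(items, annotator_id):
    """Keep only the annotations that one annotator provided.

    items: {itemid: {subspace: {prop: {'value': {...}, 'confidence': {...}}}}}
    Hypothetical helper; returns the same nesting with the per-annotator
    mappings replaced by that annotator's single value and confidence.
    """
    kept = {}
    for itemid, subspaces in items.items():
        for subspace, props in subspaces.items():
            for prop, annotation in props.items():
                if annotator_id not in annotation["value"]:
                    continue
                kept.setdefault(itemid, {}).setdefault(subspace, {})[prop] = {
                    "value": annotation["value"][annotator_id],
                    # A missing confidence becomes None (an assumption).
                    "confidence": annotation["confidence"].get(annotator_id),
                }
    if not kept:
        # Simplified stand-in for the documented ValueError.
        raise ValueError(f"no annotations by annotator {annotator_id!r}")
    return kept


raw_items = {
    "node-1": {"SUBSPACE_1": {"PROP_1_1": {
        "value": {"ann-1": 1, "ann-2": 0},
        "confidence": {"ann-1": 4, "ann-2": 2},
    }}},
    "node-2": {"SUBSPACE_1": {"PROP_1_1": {
        "value": {"ann-2": 1},
        "confidence": {"ann-2": 3},
    }}},
}

only_ann1 = filter_by_annotator(raw_items, "ann-1")
```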