decomp.semantics.uds.corpus

Module for representing UDS corpora.

class decomp.semantics.uds.corpus.UDSCorpus(sentences=None, documents=None, sentence_annotations=[], document_annotations=[], version='2.0', split=None, annotation_format='normalized')

A collection of Universal Decompositional Semantics graphs

Parameters
  • sentences (Optional[PredPattCorpus]) – the predpatt sentence graphs to associate the annotations with

  • documents (Optional[Dict[str, UDSDocument]]) – the documents associated with the predpatt sentence graphs

  • sentence_annotations (List[UDSAnnotation]) – additional annotations to associate with predpatt nodes on sentence-level graphs; in most cases, no such annotations will be passed, since the standard UDS annotations are automatically loaded

  • document_annotations (List[UDSAnnotation]) – additional annotations to associate with predpatt nodes on document-level graphs

  • version (str) – the version of UDS datasets to use

  • split (Optional[str]) – the split to load: “train”, “dev”, or “test”

  • annotation_format (str) – which annotation type to load (“raw” or “normalized”)
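
As a minimal sketch of how the constructor arguments above fit together, the helper below checks the documented values for split and annotation_format before building a corpus. The names corpus_kwargs and load_corpus are hypothetical and not part of the package. The import is deferred so that the validation step can be used without the package installed, and it is assumed here that constructing the corpus loads the standard annotations as described above.

```python
VALID_SPLITS = {"train", "dev", "test"}
VALID_FORMATS = {"raw", "normalized"}


def corpus_kwargs(split=None, annotation_format="normalized", version="2.0"):
    """Check the documented option values and return constructor keywords.

    Hypothetical helper: it only mirrors the parameter list above.
    """
    if split is not None and split not in VALID_SPLITS:
        raise ValueError(f"split must be one of {sorted(VALID_SPLITS)} or None")
    if annotation_format not in VALID_FORMATS:
        raise ValueError(f"annotation_format must be one of {sorted(VALID_FORMATS)}")
    return {
        "split": split,
        "annotation_format": annotation_format,
        "version": version,
    }


def load_corpus(**options):
    """Build a UDSCorpus from validated options (requires the decomp package)."""
    # Deferred import: the module path is the one documented in this section.
    from decomp.semantics.uds.corpus import UDSCorpus

    return UDSCorpus(**corpus_kwargs(**options))
```

With the package installed, load_corpus(split="dev") would request only the development split.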

add_annotation(sentence_annotation, document_annotation)

Add annotations to UDS sentence and document graphs

Parameters
  • sentence_annotation (UDSAnnotation) – the annotations to add to the sentence graphs in the corpus

  • document_annotation (UDSAnnotation) – the annotations to add to the document graphs in the corpus

Return type

None

add_document_annotation(annotation)

Add annotations to UDS documents

Parameters

annotation (UDSAnnotation) – the annotations to add to the documents in the corpus

Return type

None

add_sentence_annotation(annotation)

Add annotations to UDS sentence graphs

Parameters

annotation (UDSAnnotation) – the annotations to add to the graphs in the corpus

Return type

None

property document_edge_subspaces: Set[str]

The UDS document edge subspaces in the corpus

Return type

Set[str]

property document_node_subspaces: Set[str]

The UDS document node subspaces in the corpus

Return type

Set[str]

document_properties(subspace=None)

The properties in a document subspace

Return type

Set[str]

document_property_metadata(subspace, prop)

The metadata for a property in a document subspace

Parameters
  • subspace (str) – The subspace the property is in

  • prop (str) – The property in the subspace

Return type

UDSPropertyMetadata

property document_subspaces: Set[str]

The UDS document subspaces in the corpus

Return type

Set[str]

property documentids

The document ID for each document in the corpus

property documents: Dict[str, UDSDocument]

The documents in the corpus

Return type

Dict[str, UDSDocument]

classmethod from_conll(corpus, sentence_annotations=[], document_annotations=[], annotation_format='normalized', version='2.0', name='ewt')

Load UDS graph corpus from CoNLL (dependencies) and JSON (annotations)

This method should only be used if the UDS corpus is being (re)built. Otherwise, loading the corpus from the JSON shipped with this package using UDSCorpus.__init__ or UDSCorpus.from_json is suggested.

Parameters
  • corpus (Union[str, TextIO]) – (path to) Universal Dependencies corpus in conllu format

  • sentence_annotations (List[Union[str, TextIO]]) – a list of paths to JSON files or open JSON files containing sentence-level annotations

  • document_annotations (List[Union[str, TextIO]]) – a list of paths to JSON files or open JSON files containing document-level annotations

  • annotation_format (str) – Whether the annotation is raw or normalized

  • version (str) – the version of UDS datasets to use

  • name (str) – corpus name to be prepended to graph IDs

Return type

UDSCorpus

classmethod from_json(sentences_jsonfile, documents_jsonfile)

Load an annotated UDS graph corpus from JSON

This is the suggested method for loading the UDS corpus.

Parameters
  • sentences_jsonfile (Union[str, TextIO]) – file containing Universal Decompositional Semantics corpus sentence-level graphs in JSON format

  • documents_jsonfile (Union[str, TextIO]) – file containing Universal Decompositional Semantics corpus document-level graphs in JSON format

Return type

UDSCorpus

property ndocuments

The number of documents in the corpus

query(query, query_type=None, cache_query=True, cache_rdf=True)

Query all graphs in the corpus using SPARQL 1.1

Parameters
  • query (Union[str, Query]) – a SPARQL 1.1 query

  • query_type (Optional[str]) – whether this is a ‘node’ query or an ‘edge’ query. If set to None (the default), a Result object is returned. The main reason to set this option is to have the output of a custom query formatted automatically, since Result objects require additional postprocessing.

  • cache_query (bool) – whether to cache the query. This should usually be set to True. It should generally only be False when querying particular nodes or edges, e.g. as in precompiled queries.

  • cache_rdf (bool) – whether to cache the RDF constructed for querying against. Setting this to False will slow down future queries but saves a lot of memory

Return type

Union[Result, Dict[str, Dict[str, Any]]]
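
A hedged sketch of a node query follows. The triple patterns in the query string (domain, semantics, type, predicate) are assumptions about how node attributes appear in the RDF and should be checked against the graphs actually loaded. The function find_predicates is a hypothetical wrapper. It relies only on the query signature documented above.

```python
# Assumed attribute names: adjust to the attributes present in your graphs.
NODE_QUERY = """
SELECT ?node
WHERE { ?node <domain> <semantics> ;
              <type> <predicate> . }
"""


def find_predicates(corpus):
    """Run NODE_QUERY against every graph in a corpus-like object.

    query_type="node" asks for formatted output instead of a raw Result;
    cache_rdf=False trades slower repeat queries for lower memory use.
    """
    return corpus.query(NODE_QUERY, query_type="node", cache_rdf=False)
```

Leaving query_type as None would instead return the unformatted Result object, which needs further postprocessing.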

sample_documents(k)

Sample k documents without replacement

Parameters

k (int) – the number of documents to sample

Return type

Dict[str, UDSDocument]
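
To illustrate what sampling without replacement means for a mapping of document IDs to documents, here is a small stand-alone sketch. It is not the package's implementation: sample_without_replacement and its seed argument are hypothetical, and a plain dict stands in for the corpus documents.

```python
import random


def sample_without_replacement(documents, k, seed=None):
    """Return k distinct entries from a dict of documents.

    Raises ValueError if k exceeds the number of documents, because no
    document may be drawn twice.
    """
    rng = random.Random(seed)
    # Sort the IDs so that a fixed seed gives a reproducible sample.
    chosen = rng.sample(sorted(documents), k)
    return {docid: documents[docid] for docid in chosen}
```

Because each ID can be drawn at most once, the result always has exactly k entries.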

property sentence_edge_subspaces: Set[str]

The UDS sentence edge subspaces in the corpus

Return type

Set[str]

property sentence_node_subspaces: Set[str]

The UDS sentence node subspaces in the corpus

Return type

Set[str]

sentence_properties(subspace=None)

The properties in a sentence subspace

Return type

Set[str]

sentence_property_metadata(subspace, prop)

The metadata for a property in a sentence subspace

Parameters
  • subspace (str) – The subspace the property is in

  • prop (str) – The property in the subspace

Return type

UDSPropertyMetadata

property sentence_subspaces: Set[str]

The UDS sentence subspaces in the corpus

Return type

Set[str]
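
The subspace and property accessors above can be combined to list what a corpus contains. The sketch below uses only sentence_subspaces and sentence_properties as documented here. The function property_index is a hypothetical helper, and it assumes that sentence_properties accepts each name in sentence_subspaces.

```python
def property_index(corpus):
    """Map each sentence subspace to a sorted list of its properties."""
    return {
        subspace: sorted(corpus.sentence_properties(subspace))
        for subspace in sorted(corpus.sentence_subspaces)
    }
```

Each (subspace, property) pair in the result could then be passed to sentence_property_metadata to retrieve its UDSPropertyMetadata.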

to_json(sentences_outfile=None, documents_outfile=None)

Serialize the corpus to JSON

Parameters
  • sentences_outfile (Union[str, TextIO, None]) – file to serialize sentence-level graphs to

  • documents_outfile (Union[str, TextIO, None]) – file to serialize document-level graphs to

Return type

Optional[str]
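
As a rough sketch of how to_json pairs with from_json, the hypothetical helper below writes both graph levels to files and then rebuilds a corpus from them. It assumes that string arguments are treated as file paths by both methods, which the Union[str, TextIO] annotations suggest but do not state.

```python
def save_and_reload(corpus, sentences_path, documents_path):
    """Write a corpus to two JSON files, then load a new corpus from them."""
    corpus.to_json(str(sentences_path), str(documents_path))
    # from_json is a classmethod, so it is called on the corpus's own class
    # (UDSCorpus in real use).
    return type(corpus).from_json(str(sentences_path), str(documents_path))
```

The Optional[str] return type suggests that to_json may return the serialized text when no output file is given. That behaviour is not spelled out here, so the sketch always passes both files.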