decomp.semantics.uds.corpus
Module for representing UDS corpora.
- class decomp.semantics.uds.corpus.UDSCorpus(sentences=None, documents=None, sentence_annotations=[], document_annotations=[], version='2.0', split=None, annotation_format='normalized')
A collection of Universal Decompositional Semantics graphs
- Parameters
sentences (Optional[PredPattCorpus]) – the predpatt sentence graphs to associate the annotations with
documents (Optional[Dict[str, UDSDocument]]) – the documents associated with the predpatt sentence graphs
sentence_annotations (List[UDSAnnotation]) – additional annotations to associate with predpatt nodes on sentence-level graphs; in most cases, no such annotations will be passed, since the standard UDS annotations are automatically loaded
document_annotations (List[UDSAnnotation]) – additional annotations to associate with predpatt nodes on document-level graphs
version (str) – the version of UDS datasets to use
split (Optional[str]) – the split to load: “train”, “dev”, or “test”
annotation_format (str) – which annotation type to load (“raw” or “normalized”)
- add_annotation(sentence_annotation, document_annotation)
Add annotations to UDS sentence and document graphs
- Parameters
sentence_annotation (UDSAnnotation) – the annotations to add to the sentence graphs in the corpus
document_annotation (UDSAnnotation) – the annotations to add to the document graphs in the corpus
- Return type
None
- add_document_annotation(annotation)
Add annotations to UDS documents
- Parameters
annotation (UDSAnnotation) – the annotations to add to the documents in the corpus
- Return type
None
- add_sentence_annotation(annotation)
Add annotations to UDS sentence graphs
- Parameters
annotation (UDSAnnotation) – the annotations to add to the sentence graphs in the corpus
- Return type
None
- property document_edge_subspaces: Set[str]
The UDS document edge subspaces in the corpus
- Return type
Set[str]
- property document_node_subspaces: Set[str]
The UDS document node subspaces in the corpus
- Return type
Set[str]
- document_properties(subspace=None)
The properties in a document subspace
- Return type
Set[str]
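The subspace=None default is not explained above. One plausible reading is sketched below, assuming that None means "collect the properties of every subspace". The nested-dict metadata layout, the properties helper, and the placeholder subspace and property names are all hypothetical stand-ins rather than UDSCorpus internals.

```python
from typing import Dict, Optional, Set

# Hypothetical layout: subspace name -> property name -> metadata.
# Plain dicts stand in for the richer metadata objects UDS actually uses.
Metadata = Dict[str, Dict[str, dict]]


def properties(metadata: Metadata, subspace: Optional[str] = None) -> Set[str]:
    """Return the properties in one subspace, or in all subspaces if None.

    Treating None as "every subspace" is an assumption of this sketch.
    """
    if subspace is None:
        # Union of property names across every subspace.
        return {prop for props in metadata.values() for prop in props}
    # A named subspace is a single lookup; unknown names raise KeyError.
    return set(metadata[subspace])


# Placeholder subspace and property names, for illustration only.
toy: Metadata = {
    "subspace_a": {"prop_x": {}},
    "subspace_b": {"prop_y": {}, "prop_z": {}},
}

print(sorted(properties(toy, "subspace_b")))
print(sorted(properties(toy)))
```

With a subspace name the lookup is one dictionary access; with None it falls back to a set union over all subspaces.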
- document_property_metadata(subspace, prop)
The metadata for a property in a document subspace
- Parameters
subspace (str) – The subspace the property is in
prop (str) – The property in the subspace
- Return type
- property document_subspaces: Set[str]
The UDS document subspaces in the corpus
- Return type
Set[str]
- property documentids
The document ID for each document in the corpus
- property documents: Dict[str, UDSDocument]
The documents in the corpus
- Return type
Dict[str, UDSDocument]
- classmethod from_conll(corpus, sentence_annotations=[], document_annotations=[], annotation_format='normalized', version='2.0', name='ewt')
Load UDS graph corpus from CoNLL (dependencies) and JSON (annotations)
This method should only be used if the UDS corpus is being (re)built. Otherwise, load the corpus from the JSON shipped with this package, using UDSCorpus.__init__ or UDSCorpus.from_json.
- Parameters
corpus (Union[str, TextIO]) – (path to) Universal Dependencies corpus in CoNLL-U format
sentence_annotations (List[Union[str, TextIO]]) – a list of paths to JSON files or open JSON files containing sentence-level annotations
document_annotations (List[Union[str, TextIO]]) – a list of paths to JSON files or open JSON files containing document-level annotations
annotation_format (str) – whether the annotation is raw or normalized
version (str) – the version of UDS datasets to use
name (str) – corpus name to be prepended to graph IDs
- Return type
UDSCorpus
- classmethod from_json(sentences_jsonfile, documents_jsonfile)
Load annotated UDS graph corpus (including annotations) from JSON
This is the suggested method for loading the UDS corpus.
- Parameters
sentences_jsonfile (Union[str, TextIO]) – file containing Universal Decompositional Semantics corpus sentence-level graphs in JSON format
documents_jsonfile (Union[str, TextIO]) – file containing Universal Decompositional Semantics corpus document-level graphs in JSON format
- Return type
UDSCorpus
- property ndocuments
The number of documents in the corpus
- query(query, query_type=None, cache_query=True, cache_rdf=True)
Query all graphs in the corpus using SPARQL 1.1
- Parameters
query (Union[str, Query]) – a SPARQL 1.1 query
query_type (Optional[str]) – whether this is a ‘node’ query or an ‘edge’ query. If set to None (default), a Result object will be returned. The main reason to use this option is to format the output of a custom query automatically, since Result objects require additional postprocessing.
cache_query (bool) – whether to cache the query. This should usually be set to True; it should generally be False only when querying particular nodes or edges (e.g., in precompiled queries).
cache_rdf (bool) – whether to cache the RDF constructed for querying against. Setting this to False deletes the RDF after querying, which slows down future queries but saves a lot of memory.
- Return type
Union[Result, Dict[str, Dict[str, Any]]]
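Setting query_type turns raw results into the Dict[str, Dict[str, Any]] shape listed under Return type. As a rough illustration of that shape only, the sketch below nests flat rows into an ID-to-{property: value} mapping. The row format, the group_rows helper, and the IDs are hypothetical; the method's actual postprocessing is not described in this documentation.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Tuple

# Hypothetical flat result row: (graph element ID, property name, value).
Row = Tuple[str, str, Any]


def group_rows(rows: Iterable[Row]) -> Dict[str, Dict[str, Any]]:
    """Nest flat rows into {element_id: {property: value}}.

    Mimics only the documented return shape; not the library's own code.
    """
    grouped: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for element_id, prop, value in rows:
        grouped[element_id][prop] = value
    # Return a plain dict so lookups of missing IDs raise KeyError
    # instead of silently creating empty entries.
    return dict(grouped)


rows = [
    ("node-1", "prop_a", 1.0),
    ("node-1", "prop_b", -0.5),
    ("node-2", "prop_a", 0.25),
]
print(group_rows(rows))
```

A nested mapping like this can be indexed directly by ID, which is why formatted output is easier to consume than a raw result object that still needs postprocessing.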
- sample_documents(k)
Sample k documents without replacement
- Parameters
k (int) – the number of documents to sample
- Return type
Dict[str, UDSDocument]
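Sampling without replacement means no document can be drawn twice, so k should not exceed the number of documents. The idea can be sketched with the standard library's random.sample over a plain ID-to-document dict. The standalone function and its seed parameter are hypothetical additions for reproducibility, and strings stand in for UDSDocument objects.

```python
import random
from typing import Dict, Optional, TypeVar

D = TypeVar("D")  # stand-in for UDSDocument


def sample_documents(documents: Dict[str, D], k: int,
                     seed: Optional[int] = None) -> Dict[str, D]:
    """Draw k distinct documents from an ID -> document mapping."""
    rng = random.Random(seed)  # hypothetical seed, for reproducible draws
    # Sorting the IDs fixes the population order, so a given seed always
    # yields the same sample. random.sample never repeats an element and
    # raises ValueError if k exceeds the number of documents.
    chosen = rng.sample(sorted(documents), k)
    return {doc_id: documents[doc_id] for doc_id in chosen}


docs = {f"doc-{i}": f"document {i}" for i in range(10)}
print(sample_documents(docs, 3, seed=0))
```

Returning a dict keyed by document ID mirrors the Dict[str, UDSDocument] return type, so a sample can be used anywhere the full documents mapping is expected.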
- property sentence_edge_subspaces: Set[str]
The UDS sentence edge subspaces in the corpus
- Return type
Set[str]
- property sentence_node_subspaces: Set[str]
The UDS sentence node subspaces in the corpus
- Return type
Set[str]
- sentence_properties(subspace=None)
The properties in a sentence subspace
- Return type
Set[str]
- sentence_property_metadata(subspace, prop)
The metadata for a property in a sentence subspace
- Parameters
subspace (str) – The subspace the property is in
prop (str) – The property in the subspace
- Return type
- property sentence_subspaces: Set[str]
The UDS sentence subspaces in the corpus
- Return type
Set[str]
- to_json(sentences_outfile=None, documents_outfile=None)
Serialize corpus to JSON
- Parameters
sentences_outfile (Union[str, TextIO, None]) – file to serialize sentence-level graphs to
documents_outfile (Union[str, TextIO, None]) – file to serialize document-level graphs to
- Return type
Optional[str]
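from_json, from_conll, and to_json all accept either a path (str) or an open text stream (TextIO), and the Optional[str] return type of to_json suggests that omitting the output file yields the JSON as a string. A minimal sketch of that path-or-stream pattern with the standard json module follows. dump_json and load_json are hypothetical helpers, the None-returns-a-string behavior is an assumption, and the real methods additionally keep sentence-level and document-level graphs in separate files.

```python
import io
import json
from typing import Any, Optional, TextIO, Union


def dump_json(data: Any, outfile: Union[str, TextIO, None] = None) -> Optional[str]:
    """Serialize data to a path or open stream, or return it as a string."""
    if outfile is None:
        # Assumed behavior: no target means "hand the JSON back as a string".
        return json.dumps(data)
    if isinstance(outfile, str):
        # A str is treated as a filesystem path.
        with open(outfile, "w") as f:
            json.dump(data, f)
    else:
        # Anything else is assumed to be an already-open text stream.
        json.dump(data, outfile)
    return None


def load_json(infile: Union[str, TextIO]) -> Any:
    """Read JSON from a path or an open text stream."""
    if isinstance(infile, str):
        with open(infile) as f:
            return json.load(f)
    return json.load(infile)


# Round trip through an in-memory stream instead of a file on disk.
buffer = io.StringIO()
dump_json({"graph-1": {"nodes": ["n1", "n2"]}}, buffer)
buffer.seek(0)
print(load_json(buffer))
```

Treating every str as a path matches the parameter descriptions, which describe these arguments as files; JSON text already held in memory can be wrapped in io.StringIO first.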