Serializing the UDS datasetΒΆ

The canonical serialization format for the Universal Decompositional Semantics (UDS) dataset is JSON. Sentence- and document-level graphs are serialized separately. For example, if you wanted to serialize the entire UDS dataset to the files uds-sentence.json (for sentences) and uds-document.json (for documents), you would use:

from decomp import uds

uds.to_json("uds-sentence.json", "uds-document.json")

The particular format is based directly on the adjacency_data method implemented in NetworkX

For the sentence-level graphs only, in addition to this JSON format, any serialization format supported by RDFLib can also be used by accessing the rdf attribute of each UDSSentenceGraph object. This attribute exposes an rdflib.graph.Graph object, which implements a serialize method. By default, this method outputs rdf/xml. The format parameter can also be set to 'n3', 'turtle', 'nt', 'pretty-xml', 'trix', 'trig', or 'nquads'; and additional formats, such as JSON-LD, can be supported by installing plugins for RDFLib.

Before considering serialization to such a format, be aware that only the JSON format mentioned above can be read by the toolkit. Additionally, note that if your aim is to query the graphs in the corpus, this can be done using the query instance method in UDSSentenceGraph. See Querying UDS Graphs for details.