The arxiv plugin

The arxiv plugin provides support for arXiv/arXMLiv/ar5iv documents.

Metadata generation

arXiv metadata

Creates a large RDF file with metadata about arXiv documents. Currently, the metadata mostly links documents to corpora and arXiv categories. More metadata (e.g. publication dates) may be added in the future.

The data is extracted from the arXiv Dataset on Kaggle.

python3 -m spotterbase.plugins.arxiv.arxiv_metadata_rdf_gen \
    --arxiv-raw-metadata=path/to/arxiv-metadata-oai-snapshot.json.zip
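
To give a rough idea of what such a conversion involves, here is a minimal sketch. It is not the plugin's actual implementation. It assumes the Kaggle snapshot is a zip archive containing a single JSON Lines file whose records have an "id" field and a space-separated "categories" field. The URI prefixes and the predicate, as well as the function names, are hypothetical placeholders and not the vocabulary SpotterBase uses.

```python
import io
import json
import zipfile
from typing import Iterable, Iterator, TextIO

# Hypothetical URIs for illustration only; SpotterBase's real vocabulary may differ.
DOC_BASE = "https://arxiv.org/abs/"
CAT_BASE = "https://example.org/arxiv-category/"
IN_CATEGORY = "https://example.org/vocab#inCategory"


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line (JSON Lines)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def record_to_triples(record: dict) -> Iterator[str]:
    """Yield one N-Triples line per category of the document."""
    doc = f"<{DOC_BASE}{record['id']}>"
    # Assumption: "categories" is a space-separated string such as "hep-ph math.CO".
    for category in record.get("categories", "").split():
        yield f"{doc} <{IN_CATEGORY}> <{CAT_BASE}{category}> ."


def convert_snapshot(zip_path, out: TextIO) -> None:
    """Stream the first file of the zip archive and write triples to `out`."""
    with zipfile.ZipFile(zip_path) as archive:
        name = archive.namelist()[0]  # assumption: the archive holds a single file
        with archive.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8")
            for record in iter_records(text):
                for triple in record_to_triples(record):
                    out.write(triple + "\n")
```

Processing the snapshot line by line keeps memory use low, which matters because the input covers a very large number of documents.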

arXMLiv metadata

Requires the arXMLiv dataset to be downloaded.

python3 -m spotterbase.plugins.arxiv.arxmliv_metadata_rdf_gen \
    --arxmliv-release=2020 \
    --arxmliv-2020-path=path/to/arxmliv-2020