
Semantic Scholar Open Research Corpus

Semantic Scholar's records for research papers published in all fields, provided as an easy-to-use JSON archive.

If you are interested in one-off, request-based data, please see our RESTful API.

Download full, sampled, and archived versions of the corpus.

Example Paper Record

{
  "id": "4cd223df721b722b1c40689caa52932a41fcc223",
  "title": "Knowledge-rich, computer-assisted composition of Chinese couplets",
  "paperAbstract": "Recent research effort in poem composition has focused on the use of automatic language generation...",
  "entities": [
    "Conformance testing",
    "Natural language generation",
    "Natural language processing",
    "Parallel computing",
    "Stochastic grammar",
    "Web application"
  ],
  "fieldsOfStudy": [
    "Computer Science"
  ],
  "s2Url": "https://semanticscholar.org/paper/4cd223df721b722b1c40689caa52932a41fcc223",
  "s2PdfUrl": "",
  "pdfUrls": [
    "https://doi.org/10.1093/llc/fqu052"
  ],
  "authors": [
    {
      "name": "John Lee",
      "ids": [
        "3362353"
      ]
    },
    "..."
  ],
  "inCitations": [
    "c789e333fdbb963883a0b5c96c648bf36b8cd242"
  ],
  "outCitations": [
    "abe213ed63c426a089bdf4329597137751dbb3a0",
    "..."
  ],
  "year": 2016,
  "venue": "DSH",
  "journalName": "DSH",
  "journalVolume": "31",
  "journalPages": "152-163",
  "sources": [
    "DBLP"
  ],
  "doi": "10.1093/llc/fqu052",
  "doiUrl": "https://doi.org/10.1093/llc/fqu052",
  "pmid": ""
}
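
A record like the one above can be loaded with any JSON parser. The following is a minimal sketch using Python's standard `json` module on a trimmed copy of the example. The helper name `empty_to_none` is hypothetical, and reading an empty string (as in `s2PdfUrl` and `pmid` above) as "not available" is an assumption drawn from the example, not a documented rule.

```python
import json

# Trimmed copy of the example record above (elided and unused fields omitted).
RAW = """
{
  "id": "4cd223df721b722b1c40689caa52932a41fcc223",
  "title": "Knowledge-rich, computer-assisted composition of Chinese couplets",
  "s2PdfUrl": "",
  "authors": [{"name": "John Lee", "ids": ["3362353"]}],
  "year": 2016,
  "doi": "10.1093/llc/fqu052",
  "pmid": ""
}
"""


def empty_to_none(value):
    """Map an empty string to None (assumption: "" means 'not available')."""
    return None if value == "" else value


record = json.loads(RAW)

# Each author entry carries a name and a list of S2 author IDs.
author_names = [author["name"] for author in record["authors"]]

pdf_url = empty_to_none(record["s2PdfUrl"])
pmid = empty_to_none(record["pmid"])

print(record["title"], record["year"], author_names, pdf_url, pmid)
```

Normalizing empty strings up front keeps later checks such as `if pdf_url is None` explicit, rather than relying on the truthiness of a string.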

Attribute Definitions

id  string

S2-generated research paper ID.

title  string

Research paper title.

paperAbstract  string

Extracted abstract of the paper.

entities  list

Extracted list of relevant entities or topics.

s2Url  string

URL to S2 research paper details page.

s2PdfUrl  string

URL to PDF on S2 if available.

pdfUrls  list

URLs related to this PDF scraped from the web.

authors  list

List of authors, each with a name and a list of S2-generated author IDs.

inCitations  list

List of S2 paper IDs that cite this paper.

outCitations  list

List of S2 paper IDs that this paper cites.

year  int

Year in which this paper was published, as an integer.

venue  string

Extracted publication venue for this paper.

journalName  string

Name of the journal that published this paper.

journalVolume  string

The volume of the journal where this paper was published.

journalPages  string

The pages of the journal where this paper was published.

sources  list

Identifies papers sourced from DBLP or Medline.

doi  string

Digital Object Identifier registered at doi.org.

doiUrl  string

DOI link for registered objects.

pmid  string

Unique identifier used by PubMed.

fieldsOfStudy  list

Zero or more fields of study this paper addresses.
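
The attribute definitions above can be written down as a typed record, and the `inCitations` and `outCitations` fields can be turned into citation edges. The sketch below makes several assumptions that this page does not state: that corpus files hold one JSON record per line, that any field may be absent (hence `total=False`), and that the names `PaperRecord`, `read_records`, and `citation_edges` are hypothetical helpers. The paper IDs in the demo (`p0`, `p1`, `p2`) are made up for illustration.

```python
import io
import json
from typing import Iterable, Iterator, List, Tuple, TypedDict


class Author(TypedDict):
    name: str
    ids: List[str]


class PaperRecord(TypedDict, total=False):
    # total=False is an assumption: fields may be missing from a record.
    id: str
    title: str
    paperAbstract: str
    entities: List[str]
    fieldsOfStudy: List[str]
    s2Url: str
    s2PdfUrl: str
    pdfUrls: List[str]
    authors: List[Author]
    inCitations: List[str]
    outCitations: List[str]
    year: int
    venue: str
    journalName: str
    journalVolume: str
    journalPages: str
    sources: List[str]
    doi: str
    doiUrl: str
    pmid: str


def read_records(lines: Iterable[str]) -> Iterator[PaperRecord]:
    """Yield one record per non-blank line (assumed JSON-lines layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def citation_edges(record: PaperRecord) -> List[Tuple[str, str]]:
    """Return (citing_id, cited_id) pairs implied by a single record."""
    paper_id = record["id"]
    edges = [(src, paper_id) for src in record.get("inCitations", [])]
    edges += [(paper_id, dst) for dst in record.get("outCitations", [])]
    return edges


# Stand-in for an opened corpus file; the IDs are illustrative only.
demo_file = io.StringIO(
    '{"id": "p1", "inCitations": ["p0"], "outCitations": ["p2"], "year": 2016}\n'
    "\n"
    '{"id": "p2", "inCitations": ["p1"], "outCitations": []}\n'
)

records = list(read_records(demo_file))
# The same edge can appear in two records, so deduplicate with a set.
unique_edges = sorted({edge for rec in records for edge in citation_edges(rec)})
print(unique_edges)
```

Because a citation is recorded on both the citing and the cited paper, collecting edges into a set avoids counting the same link twice.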


License

Semantic Scholar Open Research Corpus is licensed under ODC-BY.

When using the Semantic Scholar Open Research Corpus (“S2 ORC”) in a product or service, or including data in a redistribution, please cite the following paper:

Waleed Ammar et al. 2018. Construction of the Literature Graph in Semantic Scholar. NAACL
https://www.semanticscholar.org/paper/09e3cf5704bcb16e6657f6ceed70e93373a54618

This site is provided by The Allen Institute for Artificial Intelligence (“AI2”) as a service to the research community. The site is covered by AI2 Terms of Use and Privacy Policy. AI2 does not claim ownership of any materials on this site unless specifically identified. AI2 does not exercise editorial control over the contents of this site. AI2 respects the intellectual property rights of others. If you believe your copyright or trademark is being infringed by something on this site, please follow the "DMCA Notice" process set out in the Terms of Use.

BibTeX format:

    @inproceedings{ammar:18,
      title={Construction of the Literature Graph in Semantic Scholar},
      author={Waleed Ammar and Dirk Groeneveld and Chandra Bhagavatula and Iz Beltagy and Miles Crawford and Doug Downey
        and Jason Dunkelberger and Ahmed Elgohary and Sergey Feldman and Vu Ha and Rodney Kinney
        and Sebastian Kohlmeier and Kyle Lo and Tyler Murray and Hsu-Han Ooi and Matthew Peters and Joanna Power
        and Sam Skjonsberg and Lucy Lu Wang and Chris Wilhelm and Zheng Yuan and Madeleine van Zuylen and Oren Etzioni},
      booktitle={NAACL},
      year={2018},
      url={https://www.semanticscholar.org/paper/09e3cf5704bcb16e6657f6ceed70e93373a54618}
    }