Semantic Scholar Open Research Corpus

Over 45 million published research papers in Computer Science, Neuroscience, and Biomedical fields provided as an easy-to-use JSON archive.

If you are interested in one-off, request-based data, please see our RESTful API.

Download full, sampled, and archived versions of the corpus.
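
Once an archive is downloaded, its records can be streamed instead of loaded into memory all at once. The sketch below assumes each downloaded file is gzip-compressed and holds one JSON record per line. That layout, the function name `iter_papers`, and the file name in the usage comment are assumptions not stated on this page, so adjust them to match the files you actually receive.

```python
import gzip
import json
from typing import Iterator


def iter_papers(path: str) -> Iterator[dict]:
    """Yield one paper record (a dict) per non-empty line of a gzipped file.

    Assumption: the file is gzip-compressed JSON Lines (one record per line).
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Hypothetical usage; "corpus-sample.gz" is a placeholder file name.
# for paper in iter_papers("corpus-sample.gz"):
#     print(paper["id"], paper["title"])
```

Because the function is a generator, only one record is held in memory at a time, which matters for a corpus of this size.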

Example Paper Record

{
  "id": "4cd223df721b722b1c40689caa52932a41fcc223",
  "title": "Knowledge-rich, computer-assisted composition of Chinese couplets",
  "paperAbstract": "Recent research effort in poem composition has focused on the use of automatic language generation...",
  "entities": [
    "Conformance testing",
    "Natural language generation",
    "Natural language processing",
    "Parallel computing",
    "Stochastic grammar",
    "Web application"
  ],
  "s2Url": "https://semanticscholar.org/paper/4cd223df721b722b1c40689caa52932a41fcc223",
  "s2PdfUrl": "",
  "pdfUrls": [ ... ],
  "authors": [
    {
      "name": "John Lee",
      "ids": [ ... ]
    }
  ],
  "inCitations": [ ... ],
  "outCitations": [ ... ],
  "year": 2016,
  "venue": "DSH",
  "journalName": "DSH",
  "journalVolume": "31",
  "journalPages": "152-163",
  "sources": [ ... ],
  "doi": "10.1093/llc/fqu052",
  "doiUrl": "https://doi.org/10.1093/llc/fqu052",
  "pmid": ""
}

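In the example record above, `s2Url` and `doiUrl` look like fixed prefixes joined to `id` and `doi`. The sketch below rebuilds both links under that assumption. The pattern is inferred from this single record and is not documented on this page, the helper names `s2_url` and `doi_url` are hypothetical, and returning an empty string for a missing DOI is a guess modeled on the empty `pmid` and `s2PdfUrl` values.

```python
def s2_url(paper_id: str) -> str:
    """Build the S2 details-page link from a paper ID (inferred pattern)."""
    return f"https://semanticscholar.org/paper/{paper_id}"


def doi_url(doi: str) -> str:
    """Build the doi.org link; return "" when the record has no DOI (assumed)."""
    return f"https://doi.org/{doi}" if doi else ""


# Values copied from the example record.
record = {
    "id": "4cd223df721b722b1c40689caa52932a41fcc223",
    "doi": "10.1093/llc/fqu052",
    "s2Url": "https://semanticscholar.org/paper/4cd223df721b722b1c40689caa52932a41fcc223",
    "doiUrl": "https://doi.org/10.1093/llc/fqu052",
}

# Both stored URLs match the rebuilt ones for this record.
assert s2_url(record["id"]) == record["s2Url"]
assert doi_url(record["doi"]) == record["doiUrl"]
```

Prefer the stored `s2Url` and `doiUrl` values when they are present; the helpers are only useful as a consistency check or a fallback.
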
Attribute Definitions

id  string

S2 generated research paper ID.

title  string

Research paper title.

paperAbstract  string

Extracted abstract of the paper.

entities  list

Extracted list of relevant entities or topics.

s2Url  string

URL to S2 research paper details page.

s2PdfUrl  string

URL to PDF on S2 if available.

pdfUrls  list

URLs related to this PDF scraped from the web.

authors  list

List of authors with an S2 generated author ID and name.

inCitations  list

List of S2 paper IDs which cited this paper.

outCitations  list

List of S2 paper IDs which this paper cited.

year  int

Year this paper was published, as an integer.

venue  string

Extracted publication venue for this paper.

journalName  string

Name of the journal that published this paper.

journalVolume  string

The volume of the journal where this paper was published.

journalPages  string

The pages of the journal where this paper was published.

sources  list

Identifies papers sourced from DBLP or Medline.

doi  string

Digital Object Identifier registered at doi.org.

doiUrl  string

DOI link for registered objects.

pmid  string

Unique identifier used by PubMed.
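
The attribute types listed above can be turned into a lightweight sanity check for records loaded from the archive. This is a minimal sketch, not an official validator: the names `EXPECTED_TYPES` and `find_type_mismatches` are hypothetical, and the check assumes every attribute is present in every record, which this page does not guarantee.

```python
from typing import Dict, List

# Attribute name -> Python type, transcribed from the definitions above.
EXPECTED_TYPES: Dict[str, type] = {
    "id": str,
    "title": str,
    "paperAbstract": str,
    "entities": list,
    "s2Url": str,
    "s2PdfUrl": str,
    "pdfUrls": list,
    "authors": list,
    "inCitations": list,
    "outCitations": list,
    "year": int,
    "venue": str,
    "journalName": str,
    "journalVolume": str,
    "journalPages": str,
    "sources": list,
    "doi": str,
    "doiUrl": str,
    "pmid": str,
}


def find_type_mismatches(record: dict) -> List[str]:
    """Return a description of each missing or wrongly typed attribute."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in record:
            problems.append(f"{name}: missing")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems
```

The function reports problems instead of raising, so a caller can decide whether to skip, log, or repair a record. Note that `journalVolume` and `journalPages` are strings, so a value such as "31" should not be compared as a number without converting it first.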

Citing this corpus

We request that any published research that makes use of this data cites the following paper:
Waleed Ammar et al. 2018. Construction of the Literature Graph in Semantic Scholar. NAACL. https://www.semanticscholar.org/paper/09e3cf5704bcb16e6657f6ceed70e93373a54618

BibTeX format:

          @inproceedings{ammar:18,
            title={Construction of the Literature Graph in Semantic Scholar},
            author={Waleed Ammar and Dirk Groeneveld and Chandra Bhagavatula and Iz Beltagy and Miles Crawford and Doug Downey
              and Jason Dunkelberger and Ahmed Elgohary and Sergey Feldman and Vu Ha and Rodney Kinney
              and Sebastian Kohlmeier and Kyle Lo and Tyler Murray and Hsu-Han Ooi and Matthew Peters and Joanna Power
              and Sam Skjonsberg and Lucy Lu Wang and Chris Wilhelm and Zheng Yuan and Madeleine van Zuylen and Oren Etzioni},
            booktitle={NAACL},
            year={2018},
            url={https://www.semanticscholar.org/paper/09e3cf5704bcb16e6657f6ceed70e93373a54618}
          }