The papers are provided as JSON objects, one per line. Archives from 2018 onward are partitioned into batches and shared as a collection of gzipped files. Pre-2018 archives are available as a single file.
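Because each line holds one JSON object, an archive can be streamed record by record without loading it into memory. The following is a minimal sketch, assuming a gzipped, UTF-8 encoded archive file; the function name `iter_papers` is hypothetical and not part of the corpus tooling.

```python
import gzip
import json
from typing import Any, Dict, Iterator


def iter_papers(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed paper record per non-empty line of a gzipped archive.

    Assumption: the file is gzip-compressed, UTF-8 text with one JSON
    object per line, as described above.
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines defensively
                yield json.loads(line)
```

Calling `iter_papers` with the path to one downloaded archive file returns a generator, so records are parsed only as they are consumed.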
A license agreement is provided. By downloading this data you acknowledge that you have read and agreed to all the terms in this license.
To download the newer, partitioned archives, you have two choices:
The preferred method is to use the AWS CLI to download directly from S3:
aws s3 cp --recursive s3://ai2-s2-research-public/open-corpus/ destinationPath
Alternatively, you can download the manifest over HTTP and use it to download all archive files over HTTP as well. For example, using wget:
wget https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/manifest.txt
wget -B https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/ -i manifest.txt
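The manifest-based approach amounts to resolving each manifest entry against the base URL. The sketch below illustrates that step only, assuming the manifest lists one relative file name per line (as the use of `wget -B ... -i manifest.txt` suggests); the function name `archive_urls` is hypothetical.

```python
from typing import List
from urllib.parse import urljoin

# Base URL taken from the wget command above.
BASE_URL = "https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/"


def archive_urls(manifest_text: str, base_url: str = BASE_URL) -> List[str]:
    """Resolve each non-empty manifest line against the base URL.

    Assumption: every manifest line is a file name relative to base_url.
    """
    return [
        urljoin(base_url, name.strip())
        for name in manifest_text.splitlines()
        if name.strip()
    ]
```

The resulting list can be passed to any HTTP client to fetch the archives one at a time.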