arXiv Bulk Data

Full-text and source bulk distribution channels: an Amazon S3 Requester-Pays bucket containing every arXiv PDF and source archive, plus a periodically refreshed Kaggle dataset of the complete metadata corpus.
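The S3 channel can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not arXiv's own tooling. It assumes the bucket is named `arxiv`. It also assumes that a manifest such as `src/arXiv_src_manifest.xml` lists each tar chunk as a `<file>` element with `<filename>`, `<size>`, `<md5sum>` and `<yymm>` children. Check the Amazon S3 Bulk Buckets page for the current layout. `ManifestEntry`, `parse_manifest` and `fetch_object` are hypothetical helper names. The download step uses boto3's `get_object` with `RequestPayer="requester"`. Requester-Pays buckets require that argument, and it means transfer costs are billed to your own AWS account.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ManifestEntry:
    filename: str  # S3 key of one tar chunk (assumed layout)
    size: int      # chunk size in bytes
    md5sum: str    # checksum of the tar file
    yymm: str      # year+month the chunk covers


def parse_manifest(xml_text: str) -> list[ManifestEntry]:
    """Parse a bulk-data manifest.

    Assumes each chunk is a <file> element with <filename>, <size>,
    <md5sum> and <yymm> children; adjust if the real manifest differs.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for f in root.iter("file"):
        entries.append(
            ManifestEntry(
                filename=f.findtext("filename", default=""),
                size=int(f.findtext("size", default="0")),
                md5sum=f.findtext("md5sum", default=""),
                yymm=f.findtext("yymm", default=""),
            )
        )
    return entries


def fetch_object(key: str, bucket: str = "arxiv") -> bytes:
    """Download one object from the Requester-Pays bucket (bucket name assumed).

    Needs boto3 and AWS credentials; RequestPayer="requester" acknowledges
    that the caller's account pays for the transfer.
    """
    import boto3  # imported lazily so parse_manifest works without boto3

    s3 = boto3.client("s3")
    resp = s3.get_object(Bucket=bucket, Key=key, RequestPayer="requester")
    return resp["Body"].read()
```

For example, `parse_manifest(fetch_object("src/arXiv_src_manifest.xml").decode())` would list the source chunks. Summing `e.size` over the result then estimates the transfer (and therefore the cost) before any tar file is fetched.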

API entry from apis.yml

name: arXiv Bulk Data
description: 'Full-text and source bulk distribution channels: an Amazon S3 Requester-Pays bucket containing
  every arXiv PDF and source archive, plus a periodically refreshed Kaggle dataset of the complete metadata
  corpus.'
humanURL: https://info.arxiv.org/help/bulk_data.html
baseURL: https://info.arxiv.org/help/bulk_data_s3.html
tags:
- Bulk Data
- Open Data
properties:
- type: Documentation
  url: https://info.arxiv.org/help/bulk_data.html
- type: Resources
  url: https://info.arxiv.org/help/bulk_data_s3.html
  title: Amazon S3 Bulk Buckets
- type: Resources
  url: https://www.kaggle.com/datasets/Cornell-University/arxiv
  title: Kaggle arXiv Dataset
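The Kaggle metadata dataset can be filtered with only the Python standard library. This is a minimal sketch under stated assumptions. It assumes the dataset ships as a JSON Lines file with one record per line. The file name `arxiv-metadata-oai-snapshot.json` is taken as an assumption from the Kaggle listing. The sketch also assumes each record carries an `id` and a space-separated `categories` string. `iter_records`, `filter_by_category` and `load_category` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one metadata record per non-blank line (JSON Lines, assumed format)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def filter_by_category(records: Iterable[dict], category: str) -> Iterator[dict]:
    """Keep records whose space-separated 'categories' field lists `category` exactly."""
    for rec in records:
        if category in rec.get("categories", "").split():
            yield rec


def load_category(path: str, category: str) -> list[dict]:
    """Read the snapshot line by line; only matching records are held in memory."""
    with open(path, encoding="utf-8") as fh:
        return list(filter_by_category(iter_records(fh), category))
```

Usage might look like `load_category("arxiv-metadata-oai-snapshot.json", "cs.CL")`. Matching on exact category tokens rather than substrings keeps a query for `cs` from picking up every `cs.*` record.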