Scraping the New York Times API
If you’re looking for a New York Times dataset, such as historical news articles on a topic or event, you can build your own with the official New York Times Developer API. It lets you pull NYT data straight from the source without resorting to unauthorized web scraping or violating the NYT Terms of Service.
We’ll walk you through the basics of using our service with the NYT API to download its data as Excel and CSV files for your own research and analysis.
1. Get an API Key
Before you can use the NYT API, you’ll need to register for a free developer account; their Getting Started Guide walks you through it. Once you have your API key (or access code), you’re ready to use the API.
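It’s good practice to keep the key out of your scripts. The sketch below reads it from an environment variable; the variable name `NYT_API_KEY` and the helper `get_api_key` are our own hypothetical choices, not something the NYT requires.

```python
import os


def get_api_key(env_var: str = "NYT_API_KEY") -> str:
    """Read the NYT API key from an environment variable (name is our own choice)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an unauthenticated request.
        raise RuntimeError(f"Set the {env_var} environment variable to your NYT API key")
    return key
```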
2. Run an Article Search
What’s great about the NYT online archive API is that, in addition to returning headlines and article previews going back more than 100 years, it also tags the entities mentioned in each article (people, places, organizations, subjects, and so on). That lets you see which companies, events, people, and places are related to each other over decades, which could be useful for financial analysis or for building a model to predict future events.
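To explore those entity relationships yourself, the sketch below groups an article’s tags by type and counts how often two entities of the same type appear on the same article. It assumes each article document carries a `keywords` list whose items have `name` (such as `persons`, `glocations`, `organizations`, or `subject`) and `value` fields, as in the v2 Article Search response at the time of writing; the helper names are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from itertools import combinations


def entities_by_type(doc: dict) -> dict[str, list[str]]:
    """Group one article's tagged keywords by type, e.g. persons or organizations."""
    groups: dict[str, list[str]] = defaultdict(list)
    for kw in doc.get("keywords", []):
        groups[kw.get("name", "unknown")].append(kw.get("value", ""))
    return dict(groups)


def cooccurrence(docs: list[dict], kind: str) -> Counter:
    """Count how often each pair of entities of one kind is tagged on the same article."""
    pairs: Counter = Counter()
    for doc in docs:
        # Sort and de-duplicate so ("A", "B") and ("B", "A") count as the same pair.
        names = sorted(set(entities_by_type(doc).get(kind, [])))
        pairs.update(combinations(names, 2))
    return pairs
```

Calling `cooccurrence(docs, "organizations").most_common(10)` would then list the organization pairs that appear together most often in your results.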
The most common use case we help customers with is scraping data from the Article Search API, which searches articles by free text and publication date. You can use the API endpoint on this page to run a search with our service; just provide your NYT API key and query details.
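If you’d rather call the Article Search endpoint directly, here is a minimal Python sketch using only the standard library. It assumes the v2 endpoint `https://api.nytimes.com/svc/search/v2/articlesearch.json` with the `q`, `begin_date`/`end_date` (YYYYMMDD), `page`, and `api-key` query parameters, and a JSON body whose articles sit under `response.docs`; check the current documentation before relying on it. `build_search_url` and `search_articles` are hypothetical helpers.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Assumed v2 Article Search endpoint; check the current NYT docs.
SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(api_key: str, query: str, begin_date: str | None = None,
                     end_date: str | None = None, page: int = 0) -> str:
    """Build an Article Search request URL. Dates are YYYYMMDD strings."""
    params = {"q": query, "page": page, "api-key": api_key}
    if begin_date:
        params["begin_date"] = begin_date
    if end_date:
        params["end_date"] = end_date
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"


def search_articles(api_key: str, query: str, **kwargs) -> list[dict]:
    """Fetch one page of results and return its article documents."""
    url = build_search_url(api_key, query, **kwargs)
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"status": "OK", "response": {"docs": [...], ...}}
    return payload["response"]["docs"]
```

For example, `search_articles(key, "climate change", begin_date="20200101", end_date="20201231")` would request the first page of 2020 articles matching that phrase.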
3. Download NYT Data
Our service will parse the scraped data from the NYT API and automatically convert it into downloadable CSV & Excel files you can start using right away. Below is a sample of what the data looks like for the first page of results:
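If you want to do the conversion yourself instead, a sketch like the one below flattens each article into one row and writes CSV with Python’s `csv` module. The column selection is our own, and the nested `headline.main` and `byline.original` fields are assumed from the v2 response shape at the time of writing.

```python
from __future__ import annotations

import csv
import io

# Columns we chose to keep; the API returns many more fields per article.
FIELDS = ["pub_date", "headline", "section_name", "byline", "web_url", "abstract"]


def doc_to_row(doc: dict) -> dict:
    """Flatten one article document into a single CSV row."""
    return {
        "pub_date": doc.get("pub_date", ""),
        "headline": (doc.get("headline") or {}).get("main", ""),
        "section_name": doc.get("section_name", ""),
        "byline": (doc.get("byline") or {}).get("original", ""),
        "web_url": doc.get("web_url", ""),
        "abstract": doc.get("abstract", ""),
    }


def docs_to_csv(docs: list[dict]) -> str:
    """Return CSV text with a header row followed by one row per article."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for doc in docs:
        writer.writerow(doc_to_row(doc))
    return buf.getvalue()
```

To save the result for Excel, write the string to a file opened with `newline=""` and `encoding="utf-8-sig"`; the byte-order mark helps Excel display non-ASCII characters correctly.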
Scraping 1,000s of Articles
The first page of results is a good start, but you’ll likely want many more articles, whether to run free-text analysis on them or to train a large language model. Our New York Times Article Search Pagination Workflow automatically queries every page of results and combines the responses, so you get the full dataset in a single CSV file.
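The pagination idea itself is simple to sketch: request page 0, then page 1, and so on until a page comes back short, pausing between calls. The function below takes any single-page fetcher you supply, so it can be tested without network access. The 10-results-per-page size, the 100-page cap, and the 12-second delay are assumptions based on the NYT developer documentation at the time of writing; confirm the current limits for your key.

```python
from __future__ import annotations

import time
from typing import Callable


def fetch_all_pages(fetch_page: Callable[[int], list[dict]], page_size: int = 10,
                    max_pages: int = 100, delay_s: float = 12.0) -> list[dict]:
    """Call fetch_page(0), fetch_page(1), ... and concatenate the results."""
    all_docs: list[dict] = []
    for page in range(max_pages):
        if page > 0:
            time.sleep(delay_s)  # stay under the per-minute rate limit
        docs = fetch_page(page)
        all_docs.extend(docs)
        if len(docs) < page_size:
            break  # a short or empty page means there are no more results
    return all_docs
```

Passing the fetcher in as a function keeps the paging logic separate from the HTTP details, so you can reuse it for any query.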
The workflow also accepts a list of keywords to search for. If you have a list of people, places, events, or topics you want to collect NYT articles about, enter them into the workflow and it will automatically fetch and combine the results for you.
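Searching a keyword list comes down to one search per keyword followed by de-duplication, since the same article can match several terms. This sketch assumes each article has a unique `_id` (falling back to `web_url`) and records every keyword that matched it in a hypothetical `matched_keywords` column; `search` stands for whatever single-keyword search function you use.

```python
from __future__ import annotations

from typing import Callable


def search_many(keywords: list[str], search: Callable[[str], list[dict]]) -> list[dict]:
    """Run one search per keyword, merge the results, and drop duplicate articles."""
    seen: dict[str, dict] = {}
    for kw in keywords:
        for doc in search(kw):
            # Identify articles by _id, then web_url; fall back to the full record.
            key = doc.get("_id") or doc.get("web_url") or repr(doc)
            if key in seen:
                seen[key]["matched_keywords"].append(kw)
            else:
                seen[key] = {**doc, "matched_keywords": [kw]}
    return list(seen.values())
```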
Other API Endpoints
If you need New York Times data beyond basic article search, see the Archive API documentation and the Most Popular API, which lets you query the articles shared most often, for example by email (which wasn’t around 100 years ago). These can give you a sense of what people are currently paying attention to. There are also other endpoints you can explore, such as book reviews and movie reviews.
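For reference, the sketch below builds request URLs for those two APIs. It assumes the Archive API path `/svc/archive/v1/{year}/{month}.json`, which returns every article for one month, and the Most Popular path `/svc/mostpopular/v2/{kind}/{period}.json`, where `kind` is `emailed`, `shared`, or `viewed` and `period` is 1, 7, or 30 days. Verify these against the current docs; the helper names are hypothetical.

```python
import urllib.parse

BASE_URL = "https://api.nytimes.com/svc"  # assumed common prefix for NYT APIs


def archive_url(api_key: str, year: int, month: int) -> str:
    """Archive API: all articles published in one month (month is 1-12)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    query = urllib.parse.urlencode({"api-key": api_key})
    return f"{BASE_URL}/archive/v1/{year}/{month}.json?{query}"


def most_popular_url(api_key: str, kind: str = "emailed", period: int = 7) -> str:
    """Most Popular API: most emailed, shared, or viewed articles over 1, 7, or 30 days."""
    if kind not in {"emailed", "shared", "viewed"}:
        raise ValueError("kind must be 'emailed', 'shared', or 'viewed'")
    if period not in {1, 7, 30}:
        raise ValueError("period must be 1, 7, or 30")
    query = urllib.parse.urlencode({"api-key": api_key})
    return f"{BASE_URL}/mostpopular/v2/{kind}/{period}.json?{query}"
```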