How to Scrape Twitter Data
Scraping social media sites like Twitter doesn’t need to be complicated. In fact, Twitter wants you to scrape their data using the Official Twitter API, which allows you to scrape 10,000 Tweets per month on their basic tier, with higher levels available on their enterprise plans!
Don’t Use Web Scrapers
Web scraping tools and unofficial scraping platforms that (falsely) promise to extract data from any website violate Twitter’s Terms of Service, because these data scrapers (poorly) attempt to pull data from Twitter’s website instead of its API. Since Twitter prohibits this, such scraping is not only unreliable, but also exposes any third party that assists you in violating Twitter’s Terms to legal liability - and lawsuits are already being filed against these scraping tools.
Do you really want to depend on a Twitter web scraping service that’s likely to get sued in the future? You may even become part of the lawsuit during the discovery process!
Furthermore, any Twitter scraper tool that runs on your own computer jeopardizes your IP address and its reputation, which can get you blocked & banned from Twitter and from other large sites that share reputation information.
You may have also tried coding your own “screen scraper,” perhaps with a Python module, but you can be assured that these approaches will all break eventually. See Twitter Scrapers Are All Broken. What Should We Do?
Twitter Scraping via API
If you’ve never used an API (application programming interface) before - or found the older Twitter API V1 too confusing - this page will help alleviate those concerns and offer user-friendly options for scraping publicly available Twitter data the right way, using the new and improved Twitter API V2.
The Twitter API underwent a major overhaul with the launch of V2 in late 2021, which made it much easier to use and dropped the approval process that made the older V1 version difficult to work with. The quotas are also very generous now, allowing for massive data scraping, which is great for data science projects.
Getting Your Twitter API Key
The first step to using the Twitter API is to obtain your Twitter API key from the Twitter Developer Portal. We’ve written an article detailing how to get your Twitter API Key in 5 Minutes with a full video tutorial!
Academic & Commercial Access to Even More Data
If you’re a student or affiliated with a university, you can apply for the Twitter API Academic Research Product Track and scrape up to 10,000,000 Tweets per month, as well as access the Twitter Historical Archive and scrape Tweets going all the way back to 2006.
If you need this access but are not affiliated with a university, you can Join the Elevated+ Waitlist, which is a better match for companies needing historical or large amounts of Twitter data. There’s also the older Twitter API v1.1 Premium Search Endpoint if you absolutely need to scrape historical Twitter data today and can’t wait. We support downloading historical data from this endpoint via our Twitter API v1.1 Full Archive Search Workflow.
Twitter Scraper for Search Results
As a first step, we suggest following the Twitter API Getting Started Guide, specifically Step 3, where they cover how to scrape search results - essentially building a Tweet scraper:
curl --request GET 'https://api.twitter.com/2/tweets/search/recent?query=from:twitterdev' --header 'Authorization: Bearer $BEARER_TOKEN'
where you need to replace $BEARER_TOKEN with the Twitter API Bearer Token we mentioned earlier. Executing this command queries the Recent Tweets Search Endpoint and returns data matching the query - in this case from:twitterdev, i.e. Tweets posted by the @TwitterDev account within the past 7 days.
The response format will look like this, showing only the ID and text of each Tweet by default:
{
  "data": [
    {
      "id": "1373001119480344583",
      "text": "Looking to get started with the Twitter API but new to APIs in general? @jessicagarson will walk you through everything you need to know in APIs 101 session. She'll use examples using our v2 endpoints, Tuesday, March 23rd at 1 pm EST.\n\nJoin us on Twitch\nhttps://t.co/GrtBOXyHmB"
    },
    ...
  ],
  "meta": {
    "newest_id": "1373001119480344583",
    "oldest_id": "1364275610764201984",
    "result_count": 6
  }
}
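If you’d rather make this first request from Python instead of the command line, here’s a minimal sketch using the requests library (storing your token in a BEARER_TOKEN environment variable is our assumption, not a Twitter requirement):

import os

import requests

# Query the Recent Tweets Search Endpoint, as in the curl example above.
response = requests.get(
    "https://api.twitter.com/2/tweets/search/recent",
    params={"query": "from:twitterdev"},
    headers={"Authorization": f"Bearer {os.environ['BEARER_TOKEN']}"},
)
response.raise_for_status()  # fail loudly on auth or quota errors

# Print the ID and text of each Tweet returned.
for tweet in response.json().get("data", []):
    print(tweet["id"], tweet["text"])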
Getting More Twitter Data Back
One of the best features of the new V2 Twitter API is that it lets you tell the API exactly what data you want back. For example, if you want to get the usernames, bios and follower counts of everyone who posted with a hashtag in the past 7 days, you can now easily do this in one API call!
You simply need to specify the fields & expansions query parameters for the Twitter API Search Endpoint.
In this example, we first change the query to a hashtag, e.g. #beer, and set expansions to author_id (telling the API to return more data for the author_id field of each Tweet, effectively giving you an account scraper). We also want to include the description (or public bio) and public_metrics (for follower count) of each user, so we supply those in the user.fields parameter.
Our query will now look like this:
curl --request GET 'https://api.twitter.com/2/tweets/search/recent?query=%23beer&expansions=author_id&user.fields=description%2Cpublic_metrics' --header 'Authorization: Bearer $BEARER_TOKEN'
And the response will now look like this:
{
  "data": [
    {
      "author_id": "1633426388",
      "id": "1564987610530988033",
      "text": "RT @bmurphypointman: #travel #bitcoin #reddit #blog #twitter #facebook #instagram #blogger #socialmedia #tiktok #vlog #deal #gift #deals #g\u2026"
    },
    ...
  ],
  "includes": {
    "users": [
      {
        "name": "Chr\u20acri",
        "public_metrics": {
          "followers_count": 3395,
          "following_count": 420,
          "tweet_count": 207054,
          "listed_count": 514
        },
        "username": "mOQIl",
        "id": "1633426388",
        "description": "Just a girl who loves travel \u2764\ufe0f ice cream fanatic forever \u2764\ufe0f \u2764\ufe0f \u2764\ufe0f"
      },
      ...
    ]
  },
  "meta": {
    "newest_id": "1564987610530988033",
    "oldest_id": "1564985619885039616",
    "result_count": 10,
    "next_token": "b26v89c19zqg8o3fpz8ll44gzg9q2o07qus7r86ljwx31"
  }
}
While our data list still looks the same (remember, we want to focus on the users here, not the Tweets), you’ll notice a new list returned under includes.users with the user details of all Twitter users who posted with #beer recently, including their user IDs, bios and follower counts!
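To use this structure in code, you typically build a lookup table from includes.users and join each Tweet to its author via author_id. Here’s a minimal sketch, again assuming the requests library and a BEARER_TOKEN environment variable as in the earlier example:

import os

import requests

response = requests.get(
    "https://api.twitter.com/2/tweets/search/recent",
    params={
        "query": "#beer",  # requests URL-encodes the # for you
        "expansions": "author_id",
        "user.fields": "description,public_metrics",
    },
    headers={"Authorization": f"Bearer {os.environ['BEARER_TOKEN']}"},
)
response.raise_for_status()
payload = response.json()

# Build an id -> user lookup from includes.users, then join to each Tweet.
users = {u["id"]: u for u in payload.get("includes", {}).get("users", [])}
for tweet in payload.get("data", []):
    author = users.get(tweet["author_id"], {})
    followers = author.get("public_metrics", {}).get("followers_count")
    print(author.get("username"), followers, author.get("description"))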
We can also apply this method to the Tweets themselves: if we want to see when they were made and their engagement metrics, we simply add tweet.fields=created_at,public_metrics to our request. You can even use this as a Twitter media scraper: if you ask for attached media like images & videos, the API will return links to these assets that you can download, as sketched below.
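In the request sketches above, these extras are just additional entries in the params dictionary. The media-related names below (the attachments.media_keys expansion and the media.fields values) reflect the v2 documentation as we understand it, so treat the exact set as an assumption to verify against the current API reference:

params = {
    "query": "#beer",
    "expansions": "author_id,attachments.media_keys",
    "user.fields": "description,public_metrics",
    "tweet.fields": "created_at,public_metrics",   # timestamps & engagement
    "media.fields": "url,preview_image_url,type",  # links to images & videos
}

Attached media should then come back in a separate includes.media list, alongside the includes.users list we saw above.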
Scraping Limits
Basic access to the Twitter API V2 (no approval needed) will allow you to scrape up to 10,000 Tweets per month. There is currently no limit on the number of users you can export (e.g. when using the Twitter Followers Scraper), though Twitter may change this - so check your usage often. You will also need to be mindful of Twitter API Rate Limits, which vary for each endpoint. For example, the Twitter Recent Search Endpoint is limited to 450 requests per 15-minute window, per the “Rate limit” section of its documentation page.
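That works out to one request every two seconds, which you can respect while paging through results with the next_token value from meta. Here’s a minimal sketch, with the same requests library and BEARER_TOKEN environment variable assumptions as the earlier examples:

import os
import time

import requests

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"
HEADERS = {"Authorization": f"Bearer {os.environ['BEARER_TOKEN']}"}

def search_all(query, max_pages=10):
    # Yield Tweets across multiple pages of recent search results.
    params = {"query": query, "max_results": 100}  # 100 is the per-page max
    for _ in range(max_pages):
        response = requests.get(SEARCH_URL, params=params, headers=HEADERS)
        response.raise_for_status()
        payload = response.json()
        yield from payload.get("data", [])
        next_token = payload.get("meta", {}).get("next_token")
        if not next_token:
            break  # no more pages available
        params["next_token"] = next_token
        time.sleep(2)  # 450 requests / 15 min = one request per 2 seconds

for tweet in search_all("#beer"):
    print(tweet["id"])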
Python Twitter API Scrapers
While the above examples are fun ways to get started, in practice you probably won’t be manually crafting curl commands and then copy-pasting the JSON responses into something useful! If you’re set on scraping Twitter data yourself (and are willing to write your own code), you should consider a Python Twitter scraper.
We recommend using the Tweepy Python Module, as it queries the official Twitter API and will not suddenly break like other Python modules that attempt to scrape Twitter’s website (e.g. snscrape and twitterscraper). Tweepy should help you download bulk CSV files from Twitter’s API with minimal coding and Python knowledge.
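For instance, here’s a minimal sketch of a Tweepy-based CSV export (assuming Tweepy v4+, the same BEARER_TOKEN environment variable as above, and an arbitrary 500-Tweet cap and output filename):

import csv
import os

import tweepy

client = tweepy.Client(bearer_token=os.environ["BEARER_TOKEN"])

with open("beer_tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "created_at", "text"])
    # Paginator follows next_token for you; flatten() yields Tweet objects.
    for tweet in tweepy.Paginator(
        client.search_recent_tweets,
        query="#beer",
        tweet_fields=["created_at"],
        max_results=100,
    ).flatten(limit=500):
        writer.writerow([tweet.id, tweet.created_at, tweet.text])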
There’s also TwitterAPI, another Python module on GitHub, but it does not appear to have a CSV export option.
No-Code Twitter API Scraping Service
If you’d rather not deal with maintaining code or reinventing the wheel, our service will scrape data directly from Twitter’s API on your behalf, delivering bulk CSV files (up to millions of results) without you needing to install anything or run any code.
Basic Plan
With our basic plan, you’ll be able to scrape individual API endpoints one at a time and get back however many results each endpoint returns per call. For example, you can follow our example above with our Twitter Search API Scraper and get back up to 100 results at a time, downloaded as CSV files:
You can also use other endpoints like the Twitter Followers Scraper for exporting Twitter follower lists, though you will be limited to downloading 1,000 Twitter accounts at a time per CSV file. You can also download Twitter following lists using the Twitter Following Scraper.
Plus Plan
Our plus plan will perform pagination for you (combining multiple pages of results) and allow you to combine multiple queries together, aggregating all results into a single CSV file for any Twitter API endpoint.
This will allow you to scrape millions of Tweets & Twitter profiles without worrying about infrastructure or coding, as our service is 100% cloud-based and can act as your Twitter profile scraper. Hence, we can run jobs that take days or weeks (e.g. scraping 100M+ followers) effortlessly on our system while you focus on how you’re going to use this data effectively.
Need More Twitter API Functionality?
Our platform is 100% customizable! If you need to add or change some parameters for any endpoint, simply clone the endpoint and make your changes (which will only be visible to you). You can also tweak your own workflows for bulk data collection and add or remove extractors to capture different types of data returned automatically. Simply reach out to support if you need any help with this!