How to Scrape YouTube Search Results via Data API
There’s so much content on YouTube it can be overwhelming… which is why YouTube has developed a very powerful search feature, making it the #2 search engine in the world (second only to Google).
You may be wondering how you can analyze and use this raw data (e.g. tracking your own videos’ search rankings or spotting trending videos in a topic that interests you) to ultimately understand what YouTube users are interested in.
Tracking search results is also very useful, as the YouTube API we’ll discuss shortly gives you search rankings that mimic YouTube’s search function in incognito mode. So if you’re wondering why the YouTube search results are not showing your video, you may want to consider using the API to see where it truly ranks when the results are not accessed through a web browser (or the YouTube app).
Don’t Use Web Scrapers
You may have seen some solutions that try to “screen scrape” the data from YouTube’s search results pages, despite this practice being a violation of YouTube’s Terms of Service, which can quickly get your YouTube account and IP address banned.
If you’ve tried these scrapers already and found them to be unreliable and returning bad or incomplete data, we’ll explain here how to use the Official YouTube Data API to properly scrape YouTube search results the correct way.
Collecting With Stevesie Data
We’ll discuss how to use the Official YouTube API’s Search Results Endpoint to scrape YouTube search results using the Stevesie Data Platform to relieve us of having to write any code.
Stevesie Data is a paid platform and the rest of this article assumes you have the basic or plus plan for running workflows. If you do not want to use a paid platform, then you can refer to the links above and directly access the YouTube Data API at your own time, expense and effort. Disclaimer: I, the author of this article, happen to own the Stevesie Data Platform.
Getting Started
To get started, we’ll use the Stevesie YouTube Search Integration and experiment with a few query combinations to understand exactly how the API works and what kind of data we can get back.
You’ll need a YouTube API Key to get started, which is free and easy to obtain. Please see how to get your YouTube API Key for help. Once you have your key, you’re good to go - just paste it into the “YouTube API Key” input when prompted:
Simple Text-Based Search
Let’s start with a simple feature: searching by keyword. Just enter a normal term under the “Query” input:
And you’ll get back up to 50 results for the first “page” of results. But unlike the YouTube website, here you can download these results in CSV or JSON format using the Stevesie Platform for your own use:
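If you’re curious what such a keyword search might look like as a raw request, here is a minimal Python sketch that builds a URL for the YouTube Data API’s search endpoint. The helper name build_search_url and the YOUR_API_KEY placeholder are assumptions for illustration, and the exact parameters the Stevesie integration sends may differ.

```python
from urllib.parse import urlencode

# Official search endpoint of the YouTube Data API v3.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(api_key, query, max_results=50):
    """Return a search URL for a plain keyword search (hypothetical helper)."""
    params = {
        "part": "snippet",
        "q": query,
        "maxResults": max_results,  # the API returns at most 50 items per page
        "key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder, not a working key.
url = build_search_url("YOUR_API_KEY", "craft beer")
print(url)
```

Fetching that URL with any HTTP client should return the first page of results as JSON, assuming the key is valid.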
You’ll notice that we get back an id.kind column - this is because just like the YouTube Website Search, the YouTube Search Results API returns co-mingled results of individual videos, channels and playlists.
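Since the results are co-mingled, one way to work with them after downloading the JSON is to group the identifiers by id.kind. The sketch below uses made-up sample items shaped like search results; the IDs are placeholders and the helper name group_ids_by_kind is hypothetical.

```python
from collections import defaultdict

# Hypothetical, trimmed-down items shaped like search results.
sample_items = [
    {"id": {"kind": "youtube#video", "videoId": "VIDEO_ID_1"}},
    {"id": {"kind": "youtube#channel", "channelId": "CHANNEL_ID_1"}},
    {"id": {"kind": "youtube#playlist", "playlistId": "PLAYLIST_ID_1"}},
    {"id": {"kind": "youtube#video", "videoId": "VIDEO_ID_2"}},
]

# Each kind of result stores its identifier under a different key.
ID_FIELD_BY_KIND = {
    "youtube#video": "videoId",
    "youtube#channel": "channelId",
    "youtube#playlist": "playlistId",
}


def group_ids_by_kind(items):
    """Group result identifiers by their id.kind value."""
    grouped = defaultdict(list)
    for item in items:
        resource_id = item["id"]
        kind = resource_id["kind"]
        field = ID_FIELD_BY_KIND.get(kind)
        # Skip kinds we do not recognize rather than guessing a field name.
        if field is not None and field in resource_id:
            grouped[kind].append(resource_id[field])
    return dict(grouped)


grouped = group_ids_by_kind(sample_items)
print(grouped)
```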
Finding Channels
We can follow the example above, but this time let’s make one change and set the “Type” input to channel:
This will now force the API to return only YouTube channels that match our search query, in this case “craft beer”:
Ordering By Popularity
If you’re looking for influential YouTube channels or videos, odds are you want to sort by some popularity metric. While the default sort order is by relevance, which takes creator popularity into account, you can get a little more specific in exactly how the results are ordered. E.g. if we’re searching for channels, we have the option to sort by videoCount to return the channels that have published the most videos for our search query:
Now we’ll see that our #1 result has changed and is a channel with 54K subscribers and a LOT of videos! While this won’t always return the most popular channels, sorting by number of videos is a good starting point, and you can always look up channel details later on and then filter out channels under a certain subscriber or total view count.
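As a rough sketch of how the “Type” and “Order” inputs map onto API parameters, the snippet below assembles the query parameters for a channel search sorted by videoCount. The search_params helper is hypothetical; the allowed values listed are the ones I understand the search endpoint to accept for type and order.

```python
# Values I understand the search endpoint to accept; treat as an assumption.
VALID_TYPES = {"channel", "playlist", "video"}
VALID_ORDERS = {"date", "rating", "relevance", "title", "videoCount", "viewCount"}


def search_params(query, result_type=None, order="relevance"):
    """Assemble search parameters, rejecting unknown type/order values."""
    if result_type is not None and result_type not in VALID_TYPES:
        raise ValueError(f"unknown type: {result_type!r}")
    if order not in VALID_ORDERS:
        raise ValueError(f"unknown order: {order!r}")
    params = {"part": "snippet", "q": query, "order": order, "maxResults": 50}
    if result_type is not None:
        # Leaving type out keeps the default mix of videos, channels, playlists.
        params["type"] = result_type
    return params


channel_params = search_params("craft beer", result_type="channel", order="videoCount")
print(channel_params)
```

Validating locally like this simply surfaces a typo before a request is spent on it.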
Boolean Text Searches
Let’s revisit the “Query” parameter, and we’ll notice that YouTube provides us with an interesting example of boating|sailing -fishing, which means to show content on boating OR sailing, but NOT fishing. In this context, the | (pipe symbol) means we’ll take either of the 2 keywords and the - tells us to exclude that keyword. When we provide this value and get the results back, you’ll see that we get non-fishing related results about sailing and even yachting:
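If you ever send a boolean query like this to the API yourself, the pipe needs to be URL-escaped as %7C. Here is a small sketch of composing the query from keyword lists and letting Python’s urlencode handle the escaping. The boolean_query helper is hypothetical and assumes single-word terms.

```python
from urllib.parse import urlencode


def boolean_query(any_of, none_of=()):
    """Join OR-terms with '|' and prefix excluded terms with '-'."""
    included = "|".join(any_of)
    excluded = " ".join(f"-{term}" for term in none_of)
    return f"{included} {excluded}".strip()


q = boolean_query(["boating", "sailing"], ["fishing"])
encoded = urlencode({"q": q})

print(q)        # boating|sailing -fishing
print(encoded)  # q=boating%7Csailing+-fishing
```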
Searching by Topics
While keywords are great for finding specific channels and content, they can sometimes be a little “too” specific if we’re instead trying to scrape broader results about a category or topic. In this case, we can remove our keywords and instead search by topic.
YouTube doesn’t have a full list of all topics it supports (unfortunately), so you’ll need to get them from existing channels. E.g. if you go to the channel details endpoint and look up a similar channel, you’ll see topicIds in the response:
So here we can see that the Topic ID /m/07c1v is code for the topic “Technology” - so we can paste this in under “Topic ID” on the search endpoint to get back all channels about technology, sorted by number of videos:
Now the channels in the search results are about technology and ordered by how many videos they’ve published:
This will also work for videos, just change the “Type” to video and change “Order” to viewCount, and now we’ll get the most popular videos about Technology:
When we examine these results, we can see that they indeed have a lot of views (several million), but they are brand new videos. There must be videos that have more views than this!
Unfortunately, this appears to be undocumented behavior of YouTube’s API, where it defers to only recent videos when using Topic ID. When this happens, you may just need to try other approaches. Let’s check the similar but different “Video Category ID” input next and see if that can help.
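To make the topic workflow in this section a little more concrete, here is a sketch that pulls topicIds out of a channel-details style response and turns one into search parameters. The sample response is made up and heavily trimmed (I am assuming the IDs sit under topicDetails.topicIds), and both helper names are hypothetical.

```python
# Hypothetical, trimmed channel-details response; real responses carry more fields.
sample_channel_response = {
    "items": [
        {"id": "CHANNEL_ID_1", "topicDetails": {"topicIds": ["/m/07c1v"]}},
        {"id": "CHANNEL_ID_2", "topicDetails": {"topicIds": ["/m/07c1v"]}},
        {"id": "CHANNEL_ID_3"},  # some channels may carry no topic details
    ]
}


def topic_ids(channel_response):
    """Collect distinct topic IDs, preserving first-seen order."""
    found = []
    for item in channel_response.get("items", []):
        for topic_id in item.get("topicDetails", {}).get("topicIds", []):
            if topic_id not in found:
                found.append(topic_id)
    return found


def topic_search_params(topic_id, result_type="channel", order="videoCount"):
    """Search by topic instead of keyword (no 'q' parameter is set)."""
    return {
        "part": "snippet",
        "topicId": topic_id,
        "type": result_type,
        "order": order,
        "maxResults": 50,
    }


ids = topic_ids(sample_channel_response)
params = topic_search_params(ids[0])
print(ids, params)
```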
Searching by Category
So let’s remove the Topic ID, since it doesn’t seem to work so well with videos, and go to the Video Categories Endpoint to get the “Technology” category. Just provide your API key and execute the endpoint to get the list back, then look for Technology:
Here we can see the ID of the category is 28, so we’ll provide that under “Video Category ID” (remember to remove the value for Topic ID):
Now our results include a SpaceX livestream and a video with over 6M views about the “Counterflow Centrifugation System”:
The top result is an ongoing live stream, so we can’t see the total unique view count, but we can assume it’s higher than 6M. This raises another question now - what if we only want to get results back for live streams?
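The category lookup above can also be done in code once you have the category list as JSON. The sketch below searches a trimmed sample of that list by title; the sample holds only three entries and the helper name find_category_ids is hypothetical. Note that the category appears under a longer title than just “Technology”.

```python
# Trimmed sample of a video categories list; a real response has more entries.
sample_categories_response = {
    "items": [
        {"id": "26", "snippet": {"title": "Howto & Style"}},
        {"id": "27", "snippet": {"title": "Education"}},
        {"id": "28", "snippet": {"title": "Science & Technology"}},
    ]
}


def find_category_ids(response, title_fragment):
    """Return {category_id: title} for titles containing the fragment."""
    fragment = title_fragment.lower()
    return {
        item["id"]: item["snippet"]["title"]
        for item in response.get("items", [])
        if fragment in item["snippet"]["title"].lower()
    }


matches = find_category_ids(sample_categories_response, "technology")
category_id = next(iter(matches))

# Category filtering applies to videos, so type is set to "video" here.
category_params = {
    "part": "snippet",
    "type": "video",
    "videoCategoryId": category_id,
    "order": "viewCount",
}
print(matches, category_params)
```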
Filtering Popular Live Streams
We can tell the YouTube API to only return live streams (either live now, or completed in the past) by specifying the “Event Type” input - here we’ll tell it to give us currently live streams about technology (as we still have that category ID applied):
And with our order set to viewCount, the YouTube API will return currently live streams in descending order of how many people are watching them RIGHT NOW (or at the time of your query). So you can use this approach to answer the question… what’s the most popular live stream in Technology (or whatever category interests you) RIGHT NOW:
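For reference, a live-stream search like the one in this section might be expressed as the parameters below. One detail worth sketching: as I understand the API, the event type and video category filters only apply when type is video, so the hypothetical check_video_only_filters helper guards against forgetting that.

```python
VALID_EVENT_TYPES = {"completed", "live", "upcoming"}
# Filters that, as far as I know, require type=video on the search endpoint.
VIDEO_ONLY_FILTERS = {"eventType", "videoCategoryId"}


def check_video_only_filters(params):
    """Raise if a video-only filter is used without type=video."""
    used = params.keys() & VIDEO_ONLY_FILTERS
    if used and params.get("type") != "video":
        raise ValueError(f"{sorted(used)} require type=video")


def live_stream_params(category_id, event_type="live"):
    """Parameters for live (or completed/upcoming) streams in a category."""
    if event_type not in VALID_EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    params = {
        "part": "snippet",
        "type": "video",
        "eventType": event_type,
        "videoCategoryId": str(category_id),
        "order": "viewCount",
        "maxResults": 50,
    }
    check_video_only_filters(params)
    return params


live_params = live_stream_params(28)
print(live_params)
```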
Historical Search Results
Now let’s remove the live option from our event type and go back in time. Let’s also remove our “Video Category ID” and instead search for bitcoin in the “Query” field, and we’ll also provide values for the published before and after fields:
With the “Order” still set to viewCount, we’ll now get the most viewed (to date) videos about bitcoin that were published in 2010:
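The published before and after fields expect timestamps in RFC 3339 format (e.g. 2010-01-01T00:00:00Z). Here is a small sketch of generating a one-year window in Python; the year_window helper is hypothetical, and I am assuming the window is meant in UTC.

```python
from datetime import datetime, timezone


def rfc3339(moment):
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def year_window(year):
    """Return (publishedAfter, publishedBefore) covering one calendar year in UTC."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return rfc3339(start), rfc3339(end)


published_after, published_before = year_window(2010)

historical_params = {
    "part": "snippet",
    "q": "bitcoin",
    "type": "video",
    "order": "viewCount",
    "publishedAfter": published_after,
    "publishedBefore": published_before,
}
print(historical_params)
```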
Search Related Videos
Let’s remove our time ranges and text query, and see all of the related videos to the top result with Video ID YmPg4V-YE0k by entering it into the “Related to Video ID” input:
And we’ll see the related videos are also a bit old (from 2011), even though we didn’t specify this in our API call:
Search Videos by Location
Let’s take a step back now and try to find videos about “craft beer”, but geo-tagged in New York (maybe I want to see some tours of breweries I can go to after I’m done writing this article). Let’s clear all of our inputs and just provide “Query” as craft beer, keep “Order” as viewCount, but this time we’ll set a location of New York coordinates (just Google search “New York coordinates” with whatever city you need; remember to add a “-” sign for S and W) and a radius of say, 10 miles:
Perfect, I can see I got back 15 results about craft beer breweries near my location in New York:
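The sign convention for coordinates is easy to get wrong, so here is a sketch that converts a hemisphere-labelled coordinate pair into the “latitude,longitude” string and a radius in miles. The coordinates shown are approximate values for New York City, and the helper names are hypothetical.

```python
def signed_coordinate(value, hemisphere):
    """Return a signed coordinate: negative for S and W, positive for N and E."""
    hemisphere = hemisphere.upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    return -abs(value) if hemisphere in {"S", "W"} else abs(value)


def location_params(query, lat, lat_hemisphere, lon, lon_hemisphere, radius_miles):
    """Build parameters for a geo-restricted video search."""
    latitude = signed_coordinate(lat, lat_hemisphere)
    longitude = signed_coordinate(lon, lon_hemisphere)
    return {
        "part": "snippet",
        "q": query,
        "type": "video",  # location filtering applies to videos, as I understand it
        "order": "viewCount",
        "location": f"{latitude},{longitude}",
        "locationRadius": f"{radius_miles}mi",  # units such as m, km, ft, mi
    }


# Approximate New York City coordinates: 40.7128 N, 74.0060 W.
nyc_params = location_params("craft beer", 40.7128, "N", 74.0060, "W", 10)
print(nyc_params["location"], nyc_params["locationRadius"])
```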
Search Videos by Channel
In the last example, we can see that the search results are from the same channel ID UCi9glr7G9SUzNIA8heLlTuA, so if we’re happy with these videos, we can search that channel for more videos. Simply enter the Channel ID when prompted and remove the location and query inputs so we just get all the videos for the channel (leave “Order” as viewCount):
We can now see all the videos this channel has published, ordered by view count:
Scraping Bulk YouTube Search Results
While these examples have been fun, they’ll only allow you to download up to 50 results per search. However, in practice you will want to download a lot more videos or channels from the YouTube search results for your project to find usable patterns or for prototyping. For this, you’ll want to use the YouTube Search - Pagination workflow, which will auto-paginate and combine results for you.
Simply import this workflow into your account and then enter in whatever combination of inputs that worked above on the individual search results endpoint. When entered into a workflow, we will simply paginate through the results for you (getting the next set of 50 and so on, up to whenever YouTube cuts us off which is typically at 500 results).
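Conceptually, this kind of pagination follows the nextPageToken that each page of results returns. The sketch below shows the general loop, not the workflow’s actual implementation. To keep it runnable offline, the page fetcher is passed in as a function, and fake_fetch_page is a made-up stand-in for a real HTTP call.

```python
def paginate(fetch_page, max_results=500):
    """Collect items page by page until there is no next page or the cap is hit."""
    items = []
    page_token = None
    while len(items) < max_results:
        response = fetch_page(page_token)
        items.extend(response.get("items", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            break  # no further pages available
    return items[:max_results]


def fake_fetch_page(page_token):
    """Hypothetical stand-in for a real request; returns canned pages."""
    pages = {
        None: {"items": ["a", "b"], "nextPageToken": "PAGE_2"},
        "PAGE_2": {"items": ["c", "d"], "nextPageToken": "PAGE_3"},
        "PAGE_3": {"items": ["e"]},  # last page: no nextPageToken
    }
    return pages[page_token]


all_items = paginate(fake_fetch_page)
capped_items = paginate(fake_fetch_page, max_results=3)
print(all_items, capped_items)
```

In a real client, fetch_page would send the same search parameters each time and add pageToken whenever page_token is not None.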
Combining Multiple Searches
If you need more than 500 search results, your best bet is to split up your searching into smaller queries and then let the workflow run the searches for those and it will combine the results together. E.g. let’s say we wanted to get videos not just on craft beer, but other forms of liquid entertainment, we would enter in one search term per line in the workflow prompt:
The workflow will then run a search (and full pagination) for each search term and combine the results together. This way we can get up to 2,500 search results back (not just 500). Another approach you can try is to enter in differing sort orders, e.g. if you enter in the following under the “Order” workflow input:
relevance
date
viewCount
Then the workflow will conduct 3 searches per search term, so 15 total searches (where each can return up to 500 results), so you can now get back up to 7,500 total results. You can apply this to other inputs as well, just be careful as each new multi-line input will result in multiplying the total number of searches performed!
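The multiplication is easy to check with a quick sketch. The five search terms below are made-up examples, and plan_searches and merge_unique are hypothetical helpers. De-duplicating the merged results is my own suggestion, since the same video can plausibly show up under several sort orders.

```python
from itertools import product

# Hypothetical search terms, one per line in the workflow prompt.
queries = ["craft beer", "wine", "whiskey", "cider", "mead"]
orders = ["relevance", "date", "viewCount"]
RESULTS_PER_SEARCH = 500  # typical cutoff mentioned above


def plan_searches(queries, orders):
    """Every combination of query and order becomes one paginated search."""
    return [{"q": q, "order": o} for q, o in product(queries, orders)]


def merge_unique(result_sets, key):
    """Combine result lists, keeping the first occurrence of each key."""
    seen = set()
    merged = []
    for results in result_sets:
        for item in results:
            identifier = key(item)
            if identifier not in seen:
                seen.add(identifier)
                merged.append(item)
    return merged


plan = plan_searches(queries, orders)
max_total = len(plan) * RESULTS_PER_SEARCH
print(len(plan), max_total)  # 15 7500

# Made-up result sets sharing one video ID, to show the de-duplication.
merged = merge_unique(
    [[{"videoId": "V1"}, {"videoId": "V2"}], [{"videoId": "V2"}, {"videoId": "V3"}]],
    key=lambda item: item["videoId"],
)
```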