YouTube Comments Scraping by Channel or Video via Data API

By steve


Scraping YouTube comments has become a popular pastime for researchers, marketers and product builders. However, many of the YouTube comment scrapers out there are simply “doing it wrong”: they resort to unofficial “screen scraping” instead of using the official YouTube Data API to properly extract YouTube comments & replies.

YouTube Data API Comment & Replies Scraping

While we don’t want to name any names, if the “bot” you’re using does not explicitly state that it collects data from the YouTube API, or does not ask you for a YouTube API key (which is free & easy to get), you cannot be sure that you’re collecting complete & accurate data, or that your “bot” won’t simply stop working once Google begins blocking its unauthorized activity.

Collecting With Stevesie Data

In this article we’ll discuss how to use the Official YouTube API’s Comment Threads Endpoint to scrape all of a video or channel’s YouTube comments & replies using the Stevesie Data Platform to relieve us of having to write any code.

Stevesie Data is a paid platform and the rest of this article assumes you have the plus plan for running workflows. If you do not want to use a paid platform, then you can refer to the links above and directly access the YouTube Data API at your own time, expense and effort. Disclaimer: I, the author of this article, happen to own the Stevesie Data Platform.

I - Scraping Comments & Replies by Video

Let’s start with a simple example and extract all the comments from this amazing video:

โณ

The URL of this video is https://www.youtube.com/watch?v=Mzj3_FjuDuI, so the Video ID is the value of the v parameter, Mzj3_FjuDuI. This is what we’ll pass to the YouTube API to get the comments & replies back.
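If you’re collecting many videos, pulling the ID out of each URL by hand gets tedious. Here is a minimal Python sketch using only the standard library; it assumes standard watch?v= URLs (plus youtu.be short links), and the function name extract_video_id is our own.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url: str) -> str:
    """Return the Video ID from a YouTube watch URL (the value of its v= query parameter)."""
    parsed = urlparse(url)
    # Short links like https://youtu.be/<id> carry the ID in the path instead of the query.
    if parsed.netloc.endswith("youtu.be"):
        return parsed.path.lstrip("/")
    values = parse_qs(parsed.query).get("v")
    if not values:
        raise ValueError(f"No video ID found in {url!r}")
    return values[0]


print(extract_video_id("https://www.youtube.com/watch?v=Mzj3_FjuDuI"))  # Mzj3_FjuDuI
```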

We’ll use the Stevesie Data YouTube Video Comments Integration to demonstrate what a single request and response look like. Just open the page and provide your Video ID and YouTube API key:

Enter YouTube Video ID & API Key

Hit execute to query the YouTube API and return the results as parsed collections, which you can immediately download as CSV files and import into Excel:

Comments left on the video
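For readers curious what a request like this looks like against the API directly, here is a hedged sketch of a call to the Data API’s commentThreads.list method. It assumes the integration maps onto the videoId, part=snippet,replies, maxResults=100 and order parameters; we haven’t confirmed exactly what the integration sends, and the helper names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://www.googleapis.com/youtube/v3"


def comment_threads_url(video_id: str, api_key: str, order: str = "time",
                        page_token: Optional[str] = None) -> str:
    """Build a commentThreads.list request URL for a single video."""
    if order not in ("time", "relevance"):
        raise ValueError("order must be 'time' or 'relevance'")
    params = {
        "part": "snippet,replies",  # "replies" embeds a few replies per thread
        "videoId": video_id,
        "maxResults": 100,          # largest page size the endpoint accepts
        "order": order,             # "time" (chronological) or "relevance"
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token  # nextPageToken from the previous response
    return f"{API_BASE}/commentThreads?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    """GET the URL and decode the JSON body (needs network access and a valid key)."""
    with urlopen(url) as resp:
        return json.load(resp)

# Usage (requires a real API key):
# data = fetch_json(comment_threads_url("Mzj3_FjuDuI", "YOUR_API_KEY"))
# print(len(data["items"]), data.get("nextPageToken"))
```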

You’ll also notice an items > replies > comments collection with the replies to all of the comments:

Comment replies (up to 5 replies per thread)

You can download both as CSV files and be on your way (the comment replies CSV file references each reply’s parent comment). Just be warned that this endpoint only gives us up to 5 replies per comment, in chronological order. If you absolutely need ALL the comment replies, see the section below.
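If you’d rather build the two CSV files yourself, the sketch below splits one commentThreads response into a comments table and a replies table, keeping each reply’s parentId as the link back to its root comment. The field names follow the Data API’s JSON; the column selection and function names are our own choices.

```python
import csv
import io
from typing import Dict, List, Tuple


def flatten_threads(response: Dict) -> Tuple[List[Dict], List[Dict]]:
    """Split one commentThreads response into top-level comment rows and reply rows."""
    comments, replies = [], []
    for item in response.get("items", []):
        top = item["snippet"]["topLevelComment"]
        comments.append({
            "id": top["id"],
            "videoId": item["snippet"].get("videoId", ""),
            "author": top["snippet"].get("authorDisplayName", ""),
            "text": top["snippet"].get("textDisplay", ""),
            "likeCount": top["snippet"].get("likeCount", 0),
            "totalReplyCount": item["snippet"].get("totalReplyCount", 0),
        })
        # Only the replies embedded in the thread are available here (a few per comment).
        for reply in item.get("replies", {}).get("comments", []):
            replies.append({
                "id": reply["id"],
                # Link back to the root comment; fall back to it if parentId is absent.
                "parentId": reply["snippet"].get("parentId", top["id"]),
                "author": reply["snippet"].get("authorDisplayName", ""),
                "text": reply["snippet"].get("textDisplay", ""),
            })
    return comments, replies


def to_csv(rows: List[Dict]) -> str:
    """Render rows as CSV text, using the first row's keys as the header."""
    buf = io.StringIO()
    if rows:
        writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return buf.getvalue()
```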

Where Are ALL the Replies?

To demonstrate how to scrape all comment replies, we’ll use a slightly more popular video than the one referenced above, namely this one by Mr. Beast: https://www.youtube.com/watch?v=cV2gBU6hKfY, with 145,000 comments at the time of writing.

We can try the endpoint above with Video ID cV2gBU6hKfY, but this time we’ll set “Order” to relevance so the results follow the same order the webpage uses, which lets us validate what we get back against the page. Relevance ordering is also useful for videos with A LOT of comments (more than we want to scrape), since it lets us focus on the most important ones:

Set Order to relevance for popular videos

Now the results will match the webpage’s most popular comments, e.g. the pinned comment by Mr. Beast with ID UgyMWI-CSDcwWQNw85x4AaABAg shows up first in our results:

Pinned comment with ID UgyMWI-CSDcwWQNw85x4AaABAg

Now when we download the comment replies (the items > replies > comments collection) as a CSV file, we’ll notice that we only get 5 replies per parentID: the parentID value changes every 5 rows:

Only 5 replies per comment
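A quick way to spot which threads were cut short is to compare how many replies you received per parent with the thread’s totalReplyCount (a field in each comment thread’s snippet). A minimal sketch, assuming rows shaped like the API fields (id and totalReplyCount for comments, parentId for replies); truncated_threads is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, List


def truncated_threads(comment_rows: List[Dict], reply_rows: List[Dict]) -> List[str]:
    """Return IDs of root comments that have more replies than the thread response embedded."""
    embedded = Counter(row["parentId"] for row in reply_rows)  # replies received per parent
    return [
        c["id"] for c in comment_rows
        # int() so counts read back from a CSV (as strings) compare correctly.
        if int(c["totalReplyCount"]) > embedded.get(c["id"], 0)
    ]
```

Any ID this returns is a candidate for the dedicated replies endpoint described next.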

And when we cross-reference the official YouTube page, we see that these 5 replies are the first 5 in chronological order, not the most upvoted (even though we set “Order” to relevance):

First replies to the top comment

Scraping ALL Comment Replies

To get all the replies for an individual comment, we need to use the Comment Replies Integration, which lets us enter an id from the previous step’s root comments (the items collection) and get back ALL the replies for that one comment, not just the first 5. Simply provide the Comment ID:

Paste in the root comment ID to get ALL replies

Now we’ll see the same data as before, but with 100 results this time instead of just 5:

All replies to a comment
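Fetching replies to a single comment corresponds to the Data API’s comments.list method with a parentId parameter. A hedged sketch of building that request, assuming the integration uses this method (we haven’t verified its exact parameters); comment_replies_url is a hypothetical name.

```python
from typing import Optional
from urllib.parse import urlencode


def comment_replies_url(parent_id: str, api_key: str,
                        page_token: Optional[str] = None) -> str:
    """Build a comments.list request URL returning replies to one top-level comment."""
    params = {
        "part": "snippet",
        "parentId": parent_id,  # ID of the root comment whose replies we want
        "maxResults": 100,      # page size cap for this endpoint
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token  # nextPageToken from the previous response
    return "https://www.googleapis.com/youtube/v3/comments?" + urlencode(params)


print(comment_replies_url("UgyMWI-CSDcwWQNw85x4AaABAg", "YOUR_API_KEY"))
```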

There are actually more than 100 replies here; these are just the first 100. To get the rest, follow the pagination instructions on the endpoint (look for nextPageToken in the response), or import the YouTube Comment Replies - Pagination workflow formula to page through all the replies automatically.
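The pagination pattern itself is simple: request a page, keep its items, and repeat with pageToken set to the previous response’s nextPageToken until none is returned. This sketch takes the page-fetching function as a parameter (a hypothetical fetch_page(token) that returns decoded JSON), so the same loop works for comments, replies or channels; the optional max_pages cap is our own addition for stopping early.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def paginate(fetch_page: Callable[[Optional[str]], Dict],
             max_pages: Optional[int] = None) -> Iterator[Any]:
    """Yield every item across pages, following nextPageToken until it disappears."""
    token: Optional[str] = None
    pages = 0
    while True:
        page = fetch_page(token)  # token is None for the first request
        yield from page.get("items", [])
        pages += 1
        token = page.get("nextPageToken")
        # Stop when the API stops handing out tokens, or when the optional cap is hit.
        if not token or (max_pages is not None and pages >= max_pages):
            break
```

In practice, fetch_page would perform the GET request and add pageToken to the query whenever token is not None.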

Scraping ALL Comments

You may have noticed that, just like the comment replies endpoint, the Video Comments Endpoint also returned only 100 results for the items (“Comments”) collection:

First 100 Comments

In order to get the next 100, we need to look for the nextPageToken in the initial response and then pass that on to the “Pagination Token” input to get the next 100 and so on. Or you can simply use the YouTube Video Comments & Replies - Pagination workflow, which will do this for you automatically and go through the full list of comments.

Remember to set “Order” to time (chronological order) if you want to get as many comments as possible (i.e. the full list). With a Mr. Beast video like this one (145K comments and counting), though, it may be more practical to set “Order” to relevance and just get the top comments. Otherwise the workflow may take a very long time to run, and you’ll spend much of that time scraping spammy comments.
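To get a feel for how long a full chronological scrape can take, divide the comment count by the page size: 145,000 / 100 = 1,450 requests. Treat that as an upper bound for the threads alone, since the count shown on the page may include replies. The quota constants below are assumptions based on Google’s published defaults at the time of writing (10,000 units per day, 1 unit per list call); check your own project’s quota page.

```python
import math

# Assumed values: Google's published defaults at the time of writing; verify for your project.
DAILY_QUOTA_UNITS = 10_000
UNITS_PER_LIST_CALL = 1


def pages_needed(total_items: int, per_page: int = 100) -> int:
    """Paginated requests needed to walk total_items when each response holds per_page."""
    return math.ceil(total_items / per_page)


def share_of_daily_quota(requests: int) -> float:
    """Fraction of one day's assumed quota that a number of list calls would use."""
    return requests * UNITS_PER_LIST_CALL / DAILY_QUOTA_UNITS


threads = pages_needed(145_000)  # upper bound for the Mr. Beast video
print(threads, share_of_daily_quota(threads))  # 1450 0.145
```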

II - Scraping Comments & Replies by Channel

Scraping comments by video is great if you have a single video of interest, but if you’re working with a more “niche” YouTube channel, it may be more helpful to scrape the comments from all of that channel’s videos at once.

To make this easy, the YouTube Channel Comments Integration works like the video comments endpoint in the previous section, but takes a YouTube Channel ID instead of a single Video ID. Just provide the Channel ID from the URL (e.g. the channel ID for this great channel https://www.youtube.com/channel/UCArmutk8nAbYQdaYzgqKOwA/ is UCArmutk8nAbYQdaYzgqKOwA):

Channel ID to Scrape Comments From
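If you have a list of channel URLs, the ID can be pulled from the path. A minimal sketch that only handles /channel/<ID> URLs like the one above; handle-style URLs (youtube.com/@name) don’t contain the ID and would need a separate lookup, which this sketch doesn’t attempt. extract_channel_id is our own name.

```python
from urllib.parse import urlparse


def extract_channel_id(url: str) -> str:
    """Return the channel ID from a /channel/<ID> URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "channel":
        return parts[1]
    # Handle (/@name) or custom URLs don't embed the ID, so we refuse rather than guess.
    raise ValueError(f"Not a /channel/<ID> URL: {url!r}")


print(extract_channel_id("https://www.youtube.com/channel/UCArmutk8nAbYQdaYzgqKOwA/"))
```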

Now execute the endpoint and take note of the videoID in the response: you’ll get comments from a variety of videos within the same channel:

Channel Comments from All Videos
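For reference, the Data API’s commentThreads.list method accepts an allThreadsRelatedToChannelId parameter that returns threads across a channel’s videos, and each thread’s snippet carries a videoId. We assume, without having confirmed it, that the Channel Comments Integration relies on this parameter; the sketch builds that request and tallies comments per video (helper names are hypothetical).

```python
from collections import Counter
from typing import Dict, Optional
from urllib.parse import urlencode


def channel_threads_url(channel_id: str, api_key: str,
                        page_token: Optional[str] = None) -> str:
    """Build a commentThreads.list request for every thread related to a channel."""
    params = {
        "part": "snippet,replies",
        "allThreadsRelatedToChannelId": channel_id,  # threads across the channel's videos
        "maxResults": 100,
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return "https://www.googleapis.com/youtube/v3/commentThreads?" + urlencode(params)


def comments_per_video(response: Dict) -> Counter:
    """Count top-level comments per videoId within one response page."""
    return Counter(item["snippet"].get("videoId", "") for item in response.get("items", []))
```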

And just like in the previous section, you’ll also see up to 5 replies for each comment in the items > replies > comments section. To get more than the first 5 replies per comment, see the above section on scraping ALL the replies for comments.

Scraping ALL Comments

And just like before, we only get 100 comments per “page” (or response from the API endpoint). In order to get all of the comments for our channel, we need to “paginate” and get the next 100 comments and so on. We can either do this manually on the endpoint (look for the nextPageToken output and pass that in as the “Pagination Token”), or we can use the YouTube Channel Comments - Pagination workflow to automatically paginate for us. Just provide the Channel ID (or list of IDs) you want to scrape the comments of:

Enter Channel IDs to Scrape Comments For
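Looping over several Channel IDs can be sketched as nested pagination: for each channel, keep requesting pages until nextPageToken is gone. Here fetch_page(channel_id, token) is a hypothetical stand-in for the actual HTTP call, injected so the logic can be exercised offline; this is not how the Stevesie workflow is necessarily implemented.

```python
from typing import Callable, Dict, Iterable, List, Optional

FetchPage = Callable[[str, Optional[str]], Dict]


def scrape_channels(channel_ids: Iterable[str], fetch_page: FetchPage) -> List[Dict]:
    """Walk every page for each channel ID and tag each thread with its channel."""
    rows: List[Dict] = []
    for channel_id in channel_ids:
        token: Optional[str] = None  # first page for this channel
        while True:
            page = fetch_page(channel_id, token)
            for item in page.get("items", []):
                rows.append({"channelId": channel_id, "thread": item})
            token = page.get("nextPageToken")
            if not token:  # no token means this was the channel's last page
                break
    return rows
```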

Run the workflow, and if your channel is somewhat small (say under 10K subscribers like mine), you’ll get back ALL the comments for ALL the videos in a few minutes:

All YouTube Channel Comments & Replies

This should be ALL of the root comments, but only the first 5 replies per comment. If you’re lucky enough to get more than 5 replies per comment, see below on how to ensure you scrape all the replies.

Scraping ALL Comment Replies

Once we have the output file YouTube_Channel_Comments.csv from the above workflow, we can copy the items.id column to get a list of all the top-level Comment IDs in the channel we scraped, so we can go through them and scrape all of their replies. Just select the column as shown below:

Select the Comment IDs to Get All Replies
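If you’d rather script this step than copy the column by hand, Python’s csv.DictReader can pull the items.id column straight from the exported file, and because DictReader consumes the header row, there’s no items.id entry to delete afterwards. A minimal sketch; read_comment_ids is our own name, and the dedupe step is an extra precaution of ours.

```python
import csv
import io
from typing import List


def read_comment_ids(csv_text: str, column: str = "items.id") -> List[str]:
    """Return non-empty, de-duplicated values from one CSV column (header row excluded)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    ids = [row[column] for row in reader if row.get(column)]
    return list(dict.fromkeys(ids))  # drop duplicates, keep first-seen order

# Usage:
# with open("YouTube_Channel_Comments.csv", newline="", encoding="utf-8") as f:
#     comment_ids = read_comment_ids(f.read())
```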

Now import the YouTube Comment Replies - Pagination workflow and paste all of the Comment IDs copied in the last step into the workflow input, one Comment ID per line (be sure to delete the items.id header entry):

Enter a List of Comment IDs

Now when you run the workflow, it will make a request for every individual comment ID, get back all of the replies and combine them together. Keep in mind this takes a lot of requests (at least one per comment ID, 947 in this case, plus any extra pages), so you may want to run it only for the top comments (those with the most replies) of the channel you’re interested in.
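To scrape only the top comments, you can rank root comments by reply count and estimate the request total first: one comments.list call per comment, plus one more for every additional page of 100 replies. A hedged sketch; it assumes your export has a reply-count column, and the column name totalReplyCount (borrowed from the API field) may differ in the actual CSV.

```python
import math
from typing import Dict, List


def top_by_replies(rows: List[Dict], n: int, count_key: str = "totalReplyCount") -> List[str]:
    """IDs of the n comments with the most replies, skipping comments with none."""
    with_replies = [r for r in rows if int(r[count_key]) > 0]
    with_replies.sort(key=lambda r: int(r[count_key]), reverse=True)
    return [r["id"] for r in with_replies[:n]]


def estimate_reply_requests(reply_counts: List[int], per_page: int = 100) -> int:
    """At least one call per comment, plus one per extra page of up to per_page replies."""
    return sum(max(1, math.ceil(c / per_page)) for c in reply_counts)
```

Skipping comments with zero replies is the cheapest saving, since each of those would still cost a request that returns nothing.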

Posted by steve on Nov. 16, 2021, 6:48 p.m.