Scrape YouTube Video Comments Legally via Official API
Let’s start with a simple example and extract all the comments from this amazing video:
The URL of this video is https://www.youtube.com/watch?v=Mzj3_FjuDuI
, so the Video ID can be taken from this URL as Mzj3_FjuDuI
, which is what we’ll need to pass to the YouTube API to get the comments & replies back.
We’ll use the Stevesie Data YouTube Video Comments Integration on this page to demonstrate what a single request and response looks like. Just open up the page and provide your Video ID and YouTube API key:
Hit execute to query the YouTube API and return the results as parsed collections, which you can immediately download as CSV files and import into Excel:
You’ll also notice an items > replies > comments
collection with the replies to all of the comments:
You can download both as CSV files and be on your way (the comment replies CSV file will have a reference to the parent comment in the file), just be warned that the YouTube API only gives us up to 5 replies per comment using this endpoint (in chronological order). If it’s absolutely necessary to scrape ALL the comment replies, please see the section below.
Where Are ALL the Replies?
To demonstrate how to scrape all comment replies, we’ll use a slightly more popular video than the one referenced above, e.g. one by Mr. Beast here: https://www.youtube.com/watch?v=cV2gBU6hKfY
with 145,000 comments at the time of writing.
We can try the endpoint above with Video ID cV2gBU6hKfY
, but this time we’ll set “Order” to relevance
so we can match how the webpage shows us the comments and validate what we’re getting back matches the webpage (this is also useful for videos with A LOT of comments, e.g. more comments than we want to scrape, so we can just focus on the most important ones):
Now the results will match the webpage’s most popular comments, e.g. the pinned comment by Mr. Beast with ID UgyMWI-CSDcwWQNw85x4AaABAg
shows up first in our results:
Now when we download the comment replies (the items > replies > comments
collection) as a CSV file, we’ll notice that we only get 5 replies per parentID
, which keeps switching every 5 rows:
And when we cross-reference the official YouTube page, we see that these 5 comment replies are the first 5 comment replies in chronological order, and not the most up-voted (even though we set sort to relevance
):
Scraping ALL Comment Replies
To get all the replies back for an individual comment, we need to use the Comment Replies Integration which will let us enter in an id
from the previous step’s root comments (or items
collection), and get back ALL the replies back for that one comment, not just the first 5. Simply provide the Comment ID:
Now we’ll see the same data as before, but with 100 results this time instead of just 5:
There are actually more than 100 replies here, these are just the first 100. You can refer to the pagination instructions on the endpoint (look for nextPageToken
in the response) or import the YouTube Comment Replies - Pagination workflow formula to automatically paginate through all the replies.
Scraping ALL Comments
You may have noticed that just like the comment replies endpoint, the Video Comments Endpoint also returned only 100 results for the items
or “Comments” collection:
In order to get the next 100, we need to look for the nextPageToken
in the initial response and then pass that on to the “Pagination Token” input to get the next 100 and so on. Or you can simply use the YouTube Video Comments & Replies - Pagination workflow, which will do this for you automatically and go through the full list of comments.
Remember to set the order to time
(chronolgical order) if you want to get as many comments as possible (e.g. the full list); although in this case with a Mr. Beast video (145K comments and counting), it may be more practical to set “Order” to relevance
(just get the top comments), otherwise the workflow may take a very long time to run and you’ll waste a lot of time scraping spammy comments if you insist on chronological order.