Scraping YouTube Data
YouTube offers an official Data API that lets you interact with YouTube programmatically, including searching for public content and retrieving public information about the channels and videos we’re interested in.
Getting Started
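To use the official API, you’ll need a Google account and will need to register to use the API: create a project in the Google Cloud Console, enable the YouTube Data API v3 for it, and generate an API key. Don’t worry, it’s pretty simple, and Google provides tutorials.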
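Once you have a key, each request is an HTTPS GET to an endpoint under https://www.googleapis.com/youtube/v3/, with the key passed in the key query parameter. Below is a minimal sketch using only Python's standard library. The helper name build_request_url and the YOUTUBE_API_KEY environment variable are our own hypothetical choices, not anything Google requires.

```python
import os
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3"


def build_request_url(resource: str, api_key: str, **params) -> str:
    """Return a GET URL for a YouTube Data API v3 resource such as "channels".

    build_request_url is a hypothetical helper, not part of any Google library.
    """
    query = {**params, "key": api_key}  # every request carries your API key
    return f"{API_BASE}/{resource}?{urlencode(query)}"


# Keep the key out of source code; YOUTUBE_API_KEY is a variable name we chose.
api_key = os.environ.get("YOUTUBE_API_KEY", "YOUR_API_KEY")
url = build_request_url("channels", api_key, part="snippet", forUsername="stevesiellc")
print(url)
```

To actually send the request, urllib.request.urlopen(url) returns a response whose body you can decode with json.load.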
Videos
Object Hierarchy
Each YouTube user can have multiple channels. Each channel has one or more playlists, and each playlist holds a collection of videos. You can walk down this hierarchy by calling the matching list endpoint with the right filter:
- Username -> Channel IDs - Use the forUsername filter.
- Channel ID -> Playlists - Use the channelId filter.
- Playlist -> Playlist Items - Use the playlistId filter. Note that each playlist item has a video ID that can be used to get comments and other information about the video.
- Video ID -> Comment Threads - Use the videoId filter.
- Comment Thread -> Comments - Use the parentId filter.
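The hierarchy above maps naturally onto a small lookup table. This sketch pairs each starting point with the list endpoint (channels, playlists, playlistItems, commentThreads, comments) and the filter parameter it needs. The function name next_request and the "snippet" default for part are our own illustrative choices.

```python
# (what you already have, list endpoint to call, filter parameter to set)
HIERARCHY = [
    ("username", "channels", "forUsername"),
    ("channel ID", "playlists", "channelId"),
    ("playlist ID", "playlistItems", "playlistId"),
    ("video ID", "commentThreads", "videoId"),
    ("comment thread ID", "comments", "parentId"),
]


def next_request(have: str, value: str, part: str = "snippet") -> tuple:
    """Return (endpoint, query params) for moving one level down the hierarchy.

    next_request is a hypothetical helper for illustration only.
    """
    for known, endpoint, filter_param in HIERARCHY:
        if known == have:
            return endpoint, {"part": part, filter_param: value}
    raise ValueError(f"unknown starting point: {have!r}")


endpoint, params = next_request("video ID", "VIDEO_ID")
print(endpoint, params)  # commentThreads {'part': 'snippet', 'videoId': 'VIDEO_ID'}
```

To send one of these, add your API key to the params and issue a GET request to https://www.googleapis.com/youtube/v3/ followed by the endpoint name.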
Response Data
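Most (if not all) of the endpoints have a part parameter that specifies which parts of each resource you want the YouTube API to return. It’s a comma-separated list, for example snippet,contentDetails. Every call consumes “quota units” (the credits) from your project’s daily allowance, and the cost of each method is listed in Google’s quota calculator, so check it before running large jobs and request only the parts you actually need.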
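Here is a small sketch of building the part value and of reading the response: each returned resource carries one top-level property per requested part, alongside fields such as kind and id. The helper names part_param and parts_in, and the sample resource's values, are hypothetical and made up for illustration.

```python
def part_param(*parts: str) -> str:
    """Join resource parts into the comma-separated value of the part parameter,
    dropping duplicates while keeping the order given."""
    return ",".join(dict.fromkeys(parts))


def parts_in(resource: dict, requested: str) -> dict:
    """Pick the requested parts out of one returned resource, skipping any that are absent."""
    return {p: resource[p] for p in requested.split(",") if p in resource}


part = part_param("snippet", "contentDetails", "snippet")
print(part)  # snippet,contentDetails

# Abbreviated, made-up channel resource showing one key per requested part.
sample = {
    "kind": "youtube#channel",
    "id": "UC_EXAMPLE",
    "snippet": {"title": "Example channel"},
    "contentDetails": {"relatedPlaylists": {"uploads": "UU_EXAMPLE"}},
}
print(sorted(parts_in(sample, part)))  # ['contentDetails', 'snippet']
```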
Channel Videos Example
You may have a specific target channel in mind you’d like to get videos for.
Get the Username
You’ll first need to get the username, which may differ from the name in the channel’s URL.
E.g. if I go to https://www.youtube.com/stevesiedata, the URL key is stevesiedata; however, this is not the username!
To get the username of the channel, click on something like the Videos tab and you’ll notice the URL change to something like https://www.youtube.com/user/StevesieLLC/featured, which reveals the username. In this case, the username is stevesiellc and NOT stevesiedata.
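If you are collecting many channels, the same check can be scripted. This sketch pulls the username out of a /user/ URL like the one above; the function name username_from_url is hypothetical, and lowercasing simply mirrors how the username is written in this example.

```python
from typing import Optional
from urllib.parse import urlparse


def username_from_url(url: str) -> Optional[str]:
    """Return the username from a youtube.com/user/<name>/... URL, or None."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "user":
        # Lowercased to match how the username is written in this example (stevesiellc).
        return segments[1].lower()
    return None  # a URL like youtube.com/stevesiedata carries no username


print(username_from_url("https://www.youtube.com/user/StevesieLLC/featured"))  # stevesiellc
```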
Get the Channel Info
Now, to get the channel info, we can use the User Channels integration, which looks up a channel from its username.
We can see in the response that the channel ID is UCArmutk8nAbYQdaYzgqKOwA.
We can also see the playlist ID for all of the channel’s uploads.
We’ll use this playlist ID, UUArmutk8nAbYQdaYzgqKOwA, to fetch all the videos for the channel.
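If you call the channels endpoint yourself (with part=contentDetails and forUsername=stevesiellc) instead of using the integration, the two IDs sit at items[0].id and items[0].contentDetails.relatedPlaylists.uploads. This sketch extracts them from an abbreviated response built from the IDs above; the function name channel_ids is hypothetical. The uploads ID here is the channel ID with UC swapped for UU, but that is an observed pattern rather than a documented guarantee, so reading it from the response is safer.

```python
def channel_ids(response: dict) -> tuple:
    """Return (channel ID, uploads playlist ID) from a channels list response
    requested with part=contentDetails."""
    items = response.get("items", [])
    if not items:
        raise LookupError("no channel matched the username")
    channel = items[0]
    uploads = channel["contentDetails"]["relatedPlaylists"]["uploads"]
    return channel["id"], uploads


# Abbreviated response shape, using the IDs from this example.
sample = {
    "items": [
        {
            "id": "UCArmutk8nAbYQdaYzgqKOwA",
            "contentDetails": {"relatedPlaylists": {"uploads": "UUArmutk8nAbYQdaYzgqKOwA"}},
        }
    ]
}
channel_id, uploads_id = channel_ids(sample)
print(channel_id, uploads_id)
```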
Get the Playlist Videos
Now that we know the playlist ID, we just need to use the Playlist Videos integration and enter the playlist ID.
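When calling the playlistItems endpoint directly, results arrive in pages of at most 50 items, and each page includes a nextPageToken until the last one. This sketch follows those tokens. To keep it testable offline, the actual HTTP call is injected as fetch_page, a hypothetical callable you would supply (for example one that adds your API key and sends the GET request). The page contents in the usage example are made up.

```python
from typing import Callable, Iterator


def iter_playlist_videos(playlist_id: str, fetch_page: Callable[[dict], dict]) -> Iterator[dict]:
    """Yield {videoId, title} for every item in a playlist, following nextPageToken.

    fetch_page is a caller-supplied (hypothetical) function that sends a playlistItems
    list request with the given query params and returns the decoded JSON.
    """
    params = {"part": "snippet,contentDetails", "playlistId": playlist_id, "maxResults": 50}
    while True:
        page = fetch_page(params)
        for item in page.get("items", []):
            yield {
                "videoId": item["contentDetails"]["videoId"],
                "title": item["snippet"]["title"],
            }
        token = page.get("nextPageToken")
        if not token:
            return  # last page reached
        params = {**params, "pageToken": token}


def make_item(video_id: str, title: str) -> dict:
    """Build a minimal, made-up playlist item for the offline example."""
    return {"contentDetails": {"videoId": video_id}, "snippet": {"title": title}}


# Two fake pages keyed by the pageToken that requests them (None = first page).
fake_pages = {
    None: {"items": [make_item("vid1", "First"), make_item("vid2", "Second")], "nextPageToken": "PAGE2"},
    "PAGE2": {"items": [make_item("vid3", "Third")]},
}
videos = list(iter_playlist_videos("UUArmutk8nAbYQdaYzgqKOwA", lambda p: fake_pages[p.get("pageToken")]))
print([v["videoId"] for v in videos])  # ['vid1', 'vid2', 'vid3']
```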
We can now download this response in CSV or JSON format for further analysis.
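If you fetch the data yourself rather than using the download button, Python's csv and json modules are enough to save it. This sketch assumes each row is a flat dict, such as one row per video with hypothetical videoId and title columns; the function name to_csv and the sample rows are made up.

```python
import csv
import io
import json


def to_csv(rows: list) -> str:
    """Serialize a list of flat dicts (all with the same keys) to CSV text."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)  # values containing commas are quoted automatically
    return buf.getvalue()


rows = [
    {"videoId": "vid1", "title": "First"},
    {"videoId": "vid2", "title": "Second, with comma"},
]
csv_text = to_csv(rows)
json_text = json.dumps(rows, indent=2)
print(csv_text)
```

To write straight to disk, pass a file opened with newline="" to csv.DictWriter instead of the StringIO buffer.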