Scrape Other YouTube Users’ Subscriptions & Find Patterns
If you have a YouTube channel, like Stevesie Data, you may be interested in understanding your subscribers a little more so you can make better content in the future. You could call this “data-driven” content generation, and we’ll go over how to do this using the YouTube Data API.
Get Your Subscriber List
You’ll first want to get the list of your own subscribers, which you can do with the YouTube Subscribers Formula. Import the formula and follow the directions, and you’ll get back a single CSV file containing your subscribers.
Your Subscribers’ Subscriptions
Once we have our own list of subscribers, we want to look at what other channels they subscribe to, so we can better understand what kind of content they’re into. All you’ll need is the list of items.subscriberSnippet.channelId values from the previous step’s CSV file. Take the values from this column and input them into the YouTube Subscriptions Formula’s source input collection (after importing it into your account):
The platform will de-duplicate the IDs for you, and you’ll see the total number of subscribers whose subscriptions you can get. In my case, I have IDs for 229 of my 642 total subscribers (the ones who made their subscription data public).
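If you’d like to check the de-duplicated count yourself before pasting the IDs in, a minimal sketch using only Python’s standard library could look like this. The column name comes from the CSV described above; the function name unique_subscriber_ids and the sample rows are made up for illustration.

```python
import csv
import io


def unique_subscriber_ids(csv_file, column='items.subscriberSnippet.channelId'):
    """Return the distinct, non-empty channel IDs in first-seen order."""
    seen = {}
    for row in csv.DictReader(csv_file):
        channel_id = (row.get(column) or '').strip()
        if channel_id:
            seen.setdefault(channel_id, None)  # dicts keep insertion order
    return list(seen)


# Hypothetical sample standing in for the exported subscribers CSV.
sample = io.StringIO(
    'items.subscriberSnippet.channelId,items.subscriberSnippet.title\n'
    'UC_aaa,Alice\n'
    'UC_bbb,Bob\n'
    'UC_aaa,Alice\n'
    ',Unknown\n'
)
ids = unique_subscriber_ids(sample)
print(len(ids), ids)
```

With the real export, you would pass an open file handle for your subscribers CSV instead of the in-memory sample.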
Per-User Subscription Pagination Limit
Since some users may have A LOT of subscriptions, you may want to address this warning and set a per-user pagination limit (or else this may run for a very long time):
Simply follow the link and, for now, use a value of 4 (meaning we’ll make up to 4 additional requests per user when getting their list of subscriptions, so we’ll get up to 5 x 50 = 250 subscriptions per user on our list).
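To make the arithmetic concrete, here is a rough sketch of how a per-user pagination limit caps the results. It assumes each response carries an items list and, when more pages exist, a nextPageToken (the shape YouTube Data API list responses use). The names fetch_subscriptions and fake_fetch_page are hypothetical, and this is not necessarily how the platform implements the limit internally.

```python
PAGE_SIZE = 50  # items per page, matching the 5 x 50 = 250 figure in the text


def fetch_subscriptions(fetch_page, extra_pages=4):
    """Collect items from the first page plus up to `extra_pages` more."""
    items = []
    token = None
    for _ in range(1 + extra_pages):
        page = fetch_page(token)
        items.extend(page['items'])
        token = page.get('nextPageToken')
        if token is None:  # no more pages for this user
            break
    return items


def fake_fetch_page(token, total=620):
    """Hypothetical stand-in for one API request; serves `total` fake items."""
    start = int(token or 0)
    end = min(start + PAGE_SIZE, total)
    page = {'items': list(range(start, end))}
    if end < total:
        page['nextPageToken'] = str(end)
    return page


# A user with 620 subscriptions is cut off at (1 + 4) * 50 items.
capped = fetch_subscriptions(fake_fetch_page)
# A user with 120 subscriptions finishes early, after three requests.
small = fetch_subscriptions(lambda t: fake_fetch_page(t, total=120))
print(len(capped), len(small))
```

A higher limit captures more of each heavy user’s subscriptions at the cost of more requests per user.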
Run the workflow and you’ll get back the other channels your subscribers subscribe to, so we can try to look for patterns and see what channels multiple subscribers of yours subscribe to.
Analyze Subscription Data
When the workflow is done, you’ll have a file named YouTube_Subscriptions_Items.csv, which you can save on your Desktop or somewhere else so we can access and analyze it. We’ll use Pandas running in a Jupyter Notebook. The first thing you’ll need to do is open the file and load it into a data frame:
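import pandas as pd

# Path to the CSV exported by the workflow; adjust it if you saved the file elsewhere.
SUBSCRIPTIONS_FILEPATH = '~/Desktop/YouTube_Subscriptions_Items.csv'

# Keep one row per (subscriber, subscribed-to channel) pair.
subscriptions = pd.read_csv(SUBSCRIPTIONS_FILEPATH).drop_duplicates([
    'input.channel_id',
    'items.snippet.resourceId.channelId',
])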
Once you have the subscriptions data in a dataframe, you can group by the channel name and then order by frequency count, so you can see which other channels your subscribers subscribe to (here I limited the output to the top 50):
subscriptions \
    .groupby('items.snippet.title') \
    .count()[['input.channel_id']] \
    .sort_values('input.channel_id', ascending=False)[:50]
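Raw counts are easier to interpret as a share of the subscribers you actually have data for. The sketch below uses a small made-up data frame with the same column names as the CSV and adds a hypothetical share column; it counts distinct subscribers per channel with nunique rather than count, which is a swapped-in variation on the grouping above. Note that grouping by title merges any channels that happen to share a name, so grouping by items.snippet.resourceId.channelId may be safer if that matters to you.

```python
import pandas as pd

# Hypothetical rows in the same shape as YouTube_Subscriptions_Items.csv.
subscriptions = pd.DataFrame({
    'input.channel_id': ['u1', 'u1', 'u2', 'u2', 'u3', 'u1'],
    'items.snippet.resourceId.channelId': ['cA', 'cB', 'cA', 'cC', 'cA', 'cA'],
    'items.snippet.title': ['Chan A', 'Chan B', 'Chan A', 'Chan C', 'Chan A', 'Chan A'],
}).drop_duplicates(['input.channel_id', 'items.snippet.resourceId.channelId'])

# Number of distinct subscribers we have subscription data for.
total_users = subscriptions['input.channel_id'].nunique()

summary = (
    subscriptions
    .groupby('items.snippet.title')['input.channel_id']
    .nunique()                      # distinct subscribers per channel
    .rename('subscribers')
    .sort_values(ascending=False)
    .to_frame()
)
summary['share'] = summary['subscribers'] / total_users
print(summary)
```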
Here are my results, showing the top 10 channels (with my own channel being first, no surprise):
Channel Name | My Subscribers Subscribed |
---|---|
Stevesie Data | 228 |
TEDx Talks | 53 |
freeCodeCamp.org | 47 |
TED | 39 |
Google Developers | 35 |
sentdex | 32 |
Traversy Media | 32 |
Computerphile | 32 |
Vsauce | 31 |
Siraj Raval | 31 |
Conclusions
From this data, I can see that my subscribers are heavily interested in TED talks and software development tutorials! I’m going to make sure I’m subscribed to all of these channels and follow their content closely, since my audience is collectively interested in what they are doing.