Learn the basics for general API interactions - what they are and how you can scrape data.
While API stands for "Application Programming Interface," it's a lot less intimidating than it sounds! An API is sort of "a website made for computers to read," compared to the standard websites we're all familiar with, which are made for "people to read."
APIs don't worry about making the content look pretty, colors, design, layout or anything else. They just provide the raw data so that other machines can consume this data and do what they want with it, potentially displaying it on a website.
For example, take a look at the Stevesie App Directory webpage - you'll see a nicely formatted directory of the apps we support. The "API" equivalent can be found at this URL: https://stevesie.com/cloud/api/v1/apps.
The API returns the same information as the webpage (meant for people to read), but in a structured JSON format that machines can easily read. That's really all you need to know about scraping data from APIs: they are simply URLs you can access with a web browser or similar client to retrieve structured data.
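To make the idea of "a URL that returns structured data" concrete, here is a minimal Python sketch using only the standard library. The URL is the one given above; the helper name `fetch_json` and its `opener` parameter are illustrative choices, and the sketch assumes nothing about which fields the JSON contains.

```python
import json
import urllib.request

# URL taken from the text above. The shape of the JSON it returns is not
# described here, so this sketch does not rely on any particular field.
APPS_URL = "https://stevesie.com/cloud/api/v1/apps"


def fetch_json(url, opener=urllib.request.urlopen, timeout=10):
    """Download a URL and parse the response body as JSON.

    `opener` is a hypothetical hook so the function can be exercised
    without a network connection; by default it is urllib's urlopen.
    """
    with opener(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body)
```

Calling `fetch_json(APPS_URL)` should give you ordinary Python lists and dictionaries to work with, provided the endpoint is reachable and still returns JSON.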
If the API provider grants you explicit permission to access it with your own software, then you can consider it an "official" API. You'll typically find out how to get this permission by searching for the provider's API documentation, such as the documentation from Twitter, Spotify, etc.
Many websites and mobile apps are powered by APIs behind the scenes. This lets a service maintain a single API containing the data, with multiple clients (websites, mobile apps, smart TV apps, etc.) that each display this data.
Because the provider has not granted outside software explicit permission to use them, we can call these "unofficial" APIs. They are typically meant to serve data to a client (such as a web browser) that will then transform the raw data into something presentable for humans to read. Because we can uncover the URLs and behaviors of these underlying APIs, we can use them directly to scrape structured data from many third-party services. This is an alternative to screen scraping, which is often brittle and tends to break whenever the presentation is updated.
Many newer websites now use APIs to power what's displayed in the browser, especially for dynamic features such as searching, showing results on a map, paginating, etc., so they can quickly serve back data without a page refresh.
For example, if you enter a search term in the search bar on the Stevesie App Directory, you'll see results populate immediately. These results come from an API intended to power the website: https://stevesie.com/cloud/api/v1/search?query=twitter.
So you can mimic this search functionality by accessing the API endpoint directly, instead of interacting with the web interface as a middle-man (or writing web scraping software to do so).
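As a small illustration of accessing the endpoint directly, the sketch below builds the same search URL for any term. The endpoint and the `query` parameter come from the example URL above; the function name `build_search_url` is just an illustrative choice.

```python
from urllib.parse import urlencode

# Endpoint and parameter name taken from the example URL in the text.
SEARCH_ENDPOINT = "https://stevesie.com/cloud/api/v1/search"


def build_search_url(term):
    """Return the search API URL for a term, with the term safely encoded."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'query': term})}"


print(build_search_url("twitter"))
# prints: https://stevesie.com/cloud/api/v1/search?query=twitter
```

Using `urlencode` rather than plain string concatenation means spaces and special characters in the search term are escaped correctly.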
You can uncover these "hidden" website APIs used to power these dynamic features using developer tools we'll cover in a later section.
Nearly all mobile apps use APIs to send and receive data between the app on your phone and the company's central servers. These APIs work similarly to the search example above, but instead of having the web browser convert the data into something presentable, the app on your phone is doing that magic instead.
We'll go over how you can discover these APIs in later sections, but the principle is the same: you use the API as a substitute for scraping, only here the data would otherwise be shown to you in an app rather than on a website.
While APIs are really just URLs out there on the public internet, many require you to send a "secret" code to the API so it knows where the request is coming from.
When using an official API, the authentication process is usually clearly documented and allows the API to know which user is accessing it. For example, if you wanted to use the YouTube API to get your subscriber list, you'd need to send the API a secret authentication code to prove to YouTube you are who you say you are.
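One common way official APIs accept such a secret code is as a bearer token in the `Authorization` HTTP header. The sketch below shows that general pattern only; the URL and token are placeholders, and the exact scheme any particular provider (including YouTube) expects should be taken from its own documentation.

```python
import urllib.request


def build_authenticated_request(url, token):
    """Build a request that carries a secret token in the Authorization header.

    Assumes the API uses the common "Bearer <token>" convention; other
    providers may expect an API key parameter or a different header.
    """
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
    )


# Hypothetical endpoint and token, for illustration only.
request = build_authenticated_request(
    "https://api.example.com/v1/subscribers", "YOUR_SECRET_TOKEN"
)
```

The request object can then be passed to `urllib.request.urlopen` to actually send it.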
For unofficial APIs, you may need to work and experiment a bit to figure out how the API is authenticating you. Typically you can look in the HTTP headers for cookies or similar values and just reproduce those on subsequent requests - we'll touch more on this later.
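Reproducing captured headers can be sketched as follows. The header names and values here are made-up placeholders; in practice you would copy whatever your browser actually sent, and which headers an API checks varies from service to service.

```python
import urllib.request


def build_replayed_request(url, captured_headers):
    """Build a request that re-sends headers captured from an earlier session."""
    request = urllib.request.Request(url)
    for name, value in captured_headers.items():
        request.add_header(name, value)
    return request


# Placeholder values standing in for headers copied from a real session.
captured = {
    "Cookie": "sessionid=abc123",
    "User-Agent": "Mozilla/5.0",
}
# Hypothetical endpoint, for illustration only.
request = build_replayed_request("https://example.com/api/items", captured)
```

If the API rejects the replayed request, the session may have expired or the server may be checking another header you haven't copied yet.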