Using Python to scrape data from Instagram

If you work in marketing or public relations, or as a content creator, you’ve certainly discovered how crucial it is to have a data-driven understanding of what is going on across the major social media platforms. Together with other long-standing platforms such as YouTube and Twitter, and rising stars like TikTok, Instagram is one of the most widely used social media platforms and a critical source of data for marketers.

However, extracting and processing large volumes of data from Instagram in a reliable and consistent way is not so simple! In practice, people often end up paying for very expensive APIs or failing to build something solid.

In this blog post, we show how to use EnsembleData’s APIs to extract data from Instagram in three lines of code, and how these simple APIs can support many use cases, such as brand monitoring or influencer analysis.

What are EnsembleData’s Scraping APIs?

EnsembleData offers a set of APIs to extract data from all the major social media platforms, such as TikTok, Instagram and YouTube. All you need to do is make an API call and they do all the hard work, scraping and retrieving the requested data for you. In other words, it is a direct and efficient way to extract data across different social media platforms at scale.

You can register for free on ED’s platform and start using their APIs at no cost. To use the APIs, you just need your personal access token. Once registered, you can find it in the column on the left, highlighted in yellow in the screenshot.

Remember that in order for the token to be active, you need to validate your email address!

Using the Python library to make a request

Once you have registered and obtained your token, you can start using the APIs!
One method is to send the requests directly to ED; the docs include examples (in Python) of how to perform the calls. Alternatively, you can use the Python library, which provides a simpler interface for sending the requests.

Let’s now take a look at a very simple example. Suppose you want to get the recent posts of a user, for example to monitor what kind of content is being posted and how it performs.

First, you need to download the library and move into its directory:

git clone https://github.com/EnsembleData/Instagram-Scraper
cd Instagram-Scraper

The main component is the class Instagram_I_ED, which lets you perform all the different calls from one interface. When creating an instance of it, we pass the authentication token (which we got in the previous step).

from instagram_interface import Instagram_I_ED
TOKEN = "INSERT YOUR TOKEN HERE"
ig = Instagram_I_ED(token_ED_API=TOKEN)
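
If you would rather not paste the token into your script, one option is to read it from an environment variable. The snippet below is only a minimal sketch: the variable name ED_TOKEN and the helper load_token are our own illustrative choices, not part of the ED library.

```python
import os


def load_token(env_var="ED_TOKEN"):
    """Read the access token from an environment variable.

    The variable name ED_TOKEN is a hypothetical choice for this example.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your EnsembleData token"
        )
    return token


# Possible usage with the class shown above:
# ig = Instagram_I_ED(token_ED_API=load_token())
```

This keeps the token out of your source code, which helps if you share the script or commit it to a repository.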

We are now ready to perform the call! We call the method, passing the username to search for (in this case cristiano) and any other required or optional parameters. Here these are the depth (which controls how many posts to retrieve) and the oldest timestamp (a Unix timestamp):

res, success = ig.get_user_posts_from_username(username="cristiano", depth=2, oldest_timestamp=1611308425)

As simple as that! We just got all the data available for the 20 most recent posts of the user “cristiano”.
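
Two small details of that call are worth a sketch: how to build the Unix timestamp, and what to do with the success value that comes back. The helpers to_unix and unwrap below are hypothetical names of ours, and we are assuming that success is a simple true/false flag, as the (res, success) return pair suggests.

```python
from datetime import datetime, timezone


def to_unix(year, month, day, hour=0, minute=0, second=0):
    """Convert a UTC date and time to a Unix timestamp in whole seconds."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(moment.timestamp())


def unwrap(result):
    """Return the payload of a (res, success) pair, or raise if the call failed.

    Assumption: success is truthy when the call worked and falsy otherwise.
    """
    res, success = result
    if not success:
        raise RuntimeError("EnsembleData call did not succeed")
    return res


oldest = to_unix(2021, 1, 22, 9, 40, 25)
print(oldest)  # 1611308425, the timestamp used in the example call

# Possible usage with the library call shown above:
# posts = unwrap(ig.get_user_posts_from_username(
#     username="cristiano", depth=2, oldest_timestamp=oldest))
```

Failing loudly when success is false avoids silently processing an empty or partial result further down the line.
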
Let’s say we also want some more detailed information about the user, for example whether the account is verified, a business account or a professional profile. We can get it very easily with the detailed-info endpoint:

# Through the Python library
res, success = ig.get_user_detailed_info(username="cristiano")

# Sending the request directly
import requests

root = "https://www.ensembledata.com/apis"
endpoint = "/instagram/user/detailed-info"
params = {
    "username": "cristiano",
    "token": TOKEN,
}

res = requests.get(root + endpoint, params=params)
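
When you send the request directly, you get back a raw HTTP response, so it is worth checking it before using the data. Here is a minimal sketch, assuming that the API answers with a JSON body and that any status code other than 200 means the call failed. The helper name parse_response is ours, and it works with any object that behaves like a requests response.

```python
def parse_response(res):
    """Return the decoded JSON body of a response, or raise on an HTTP error.

    res is expected to behave like requests.Response: it needs a
    status_code attribute, a text attribute and a json() method.
    Assumption: the endpoint returns JSON and uses 200 for success.
    """
    if res.status_code != 200:
        raise RuntimeError(
            f"Request failed with status {res.status_code}: {res.text[:200]}"
        )
    return res.json()


# Possible usage with the direct request shown above:
# data = parse_response(requests.get(root + endpoint, params=params, timeout=30))
```

Passing a timeout to requests.get, as in the usage comment, prevents the script from hanging indefinitely if the server does not answer.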

In case you want the compact code of this example, you can find it in the GitHub repository.

Also, if you want to extract data not just from Instagram but from all the major social media platforms, you can use the main ED library, which wraps all the different APIs together. The methods in the two repositories are the same, so you can use them interchangeably.

And that is it for today!

If you have any questions or are unsure about anything, contact us, or write to me at andrea[at]ensembledata.com
