Uncovering TikTok’s dynamics with a data-driven approach


TikTok has quickly become a cultural phenomenon, capturing the attention of millions around the world with its short-form video format.
In fact, TikTok is becoming increasingly important for companies looking to engage with younger individuals and build brand awareness, as well as content creators looking to expand their audience and reach. With its powerful algorithm and ever-growing user base, it provides a unique opportunity for businesses to showcase their creativity and connect with potential customers in a fun, authentic way.

If you’re not on TikTok yet, it’s time to jump on the bandwagon and start creating content that will resonate with the next generation of consumers.

But what topics work best for a specific audience? What drives more engagement? And which content type is optimal for growth?

Those are difficult questions! And while there are plenty of videos and blog posts out there offering good general tips, little work has been done to answer them in a more analytical and quantitative way.

This is why we at EnsembleData have partnered with leading researchers at UBC (University of British Columbia), led by Professor Gene Lee, to conduct quantitative research into user dynamics on TikTok, using our APIs to extract large amounts of data and feed it into large Machine Learning models.
This work resulted in a paper published at the Workshop on Information Technologies and Systems (WITS), a renowned academic conference. The full paper can be found here.

In more detail, we aimed to answer the following question:
How Does AI-Generated Voice Affect Video Content Creation?
As more of these features become available to creators online, it is important to understand how the adoption of AI-generated voice affects users’ routine effort and creative effort in online video creation.

These kinds of questions are impossible to answer without a large volume of data to study. As a first step in the research project, we extracted more than 270,000 videos from thousands of creators using our TikTok APIs. Here is the full procedure:

  1. Using the Search Keyword endpoint, fetch a large quantity of posts coming from different categories, such as beauty, food, technology, sports, etc.
  2. From each of these posts we also get the creator’s profile. From there we can get more info about the creator using the User Info endpoint, as well as their most recent posts through the User Posts endpoint.
  3. We then monitor their growth over time using the User Posts endpoint.
  4. Optionally, you can also fetch each video’s comments with the Post Comments endpoint.
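
The four steps above can be sketched in Python as follows. This is only an illustration, not the code used in the study: the `fetch` callable, the endpoint labels (`search_keyword`, `user_info`, `user_posts`, `post_comments`), and the record fields (`author`, `id`) are hypothetical placeholders standing in for real API calls and response shapes.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical signature: fetch(endpoint_label, params) -> list of dict records.
# In practice this would wrap the HTTP calls to the corresponding API endpoint.
Fetch = Callable[[str, Dict[str, str]], List[dict]]


def collect_dataset(fetch: Fetch, keywords: Iterable[str],
                    with_comments: bool = False) -> Dict[str, dict]:
    """Run steps 1, 2 and (optionally) 4; return one record per creator."""
    creators: Dict[str, dict] = {}
    for keyword in keywords:
        # Step 1: search posts for a category keyword (beauty, food, ...).
        for post in fetch("search_keyword", {"keyword": keyword}):
            username = post["author"]  # assumed field name
            if username in creators:
                continue  # creator already collected via another post
            # Step 2: profile details plus the creator's most recent posts.
            info = fetch("user_info", {"username": username})
            posts = fetch("user_posts", {"username": username})
            record = {"info": info[0] if info else {}, "posts": posts, "comments": {}}
            # Step 4 (optional): comments for each collected video.
            if with_comments:
                for p in posts:
                    record["comments"][p["id"]] = fetch(
                        "post_comments", {"post_id": p["id"]})
            creators[username] = record
    return creators


def refresh_posts(fetch: Fetch, creators: Dict[str, dict]) -> int:
    """Step 3: re-poll each creator's posts and append the ones not seen yet.

    Returns the number of newly added posts.
    """
    added = 0
    for username, record in creators.items():
        seen = {p["id"] for p in record["posts"]}
        for p in fetch("user_posts", {"username": username}):
            if p["id"] not in seen:
                record["posts"].append(p)
                seen.add(p["id"])
                added += 1
    return added
```

Calling `refresh_posts` on a schedule (for example once a day) is one simple way to follow a creator's output over time. Passing `fetch` in as a parameter keeps the sketch independent of any specific HTTP client.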

Through this process, we were able to quickly build a consistent, large-scale dataset for our subsequent data analysis.

The results are quite interesting and perhaps even counter-intuitive:
The use of AI-generated voice increases creators’ routine effort and creative effort in the short term. While it has a long-lasting effect on improving the efficiency of video creation, AI-generated voice cannot consistently motivate creators to include more information in videos, and might even be detrimental to their creative effort in the long term.

Here are some graphics showing the evolution of different metrics over time, after the adoption of AI-generated voice:

Time-varying Effect on Creator Routine Effort and Creative Effort. [1]

The polyline represents the magnitude of coefficients, and the grey area denotes the 95% confidence interval.

It is interesting to note that the adoption of AI-generated voice boosted the use of new hashtags only in the treatment week, when creators used 0.35 more new hashtags per video. The coefficient became insignificant afterward and even turned significantly negative five weeks later. It is worth noting that we excluded two hashtags, “#texttospeech” and “#tts”, when calculating avgNewHashtag to make sure the changes are not caused by topics related to AI voice itself.
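
To make the avgNewHashtag idea concrete, here is a minimal Python sketch. It rests on an assumption that is ours, not the paper's: a hashtag counts as "new" if the same creator has not used it in any earlier video. The function name, its parameters, and the normalization (lowercasing and dropping the leading `#`) are illustrative choices as well; the paper's exact definition may differ.

```python
from typing import Iterable, List

# Hashtags excluded so that AI-voice topics do not inflate the metric.
EXCLUDED = {"texttospeech", "tts"}


def _normalize(tag: str) -> str:
    """Lowercase a hashtag and drop its leading '#' (illustrative choice)."""
    return tag.lower().lstrip("#")


def avg_new_hashtags(videos: List[List[str]], history: Iterable[str] = ()) -> float:
    """Average number of new hashtags per video over a window (e.g. one week).

    videos:  hashtag lists for the window's videos, in chronological order.
    history: hashtags the creator used before the window started.
    Assumption: "new" means not used by this creator in any earlier video.
    """
    if not videos:
        return 0.0
    seen = {_normalize(t) for t in history}
    total = 0
    for tags in videos:
        current = {_normalize(t) for t in tags} - EXCLUDED
        total += len(current - seen)  # hashtags never seen before this video
        seen |= current
    return total / len(videos)
```

For example, with `history=["#travel"]` and two videos tagged `["#food", "#tts"]` and `["#food", "#vegan", "#TTS"]`, each video contributes exactly one new hashtag (`food`, then `vegan`), so the function returns `1.0`.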

Want to know more about this work and dive deeper? Read the paper published in WITS 2022 here.
Otherwise, read more about how you can extract data from social media to power up your analytics on the EnsembleData blog!