Uncovering TikTok's dynamics with a data-driven approach

20 February 2023
Andrea Ramazzina
Chief Scientist Officer

TikTok has quickly become a cultural phenomenon, capturing the attention of millions around the world with its short-form video format. It is also becoming increasingly important for companies looking to engage with younger audiences and build brand awareness, as well as for content creators looking to expand their audience and reach. With its powerful algorithm and ever-growing user base, it provides a unique opportunity for businesses to showcase their creativity and connect with potential customers in a fun, authentic way.

A research-driven approach to TikTok analysis

If you’re not on TikTok yet, it’s time to jump on the bandwagon and start creating content that will resonate with the next generation of consumers! But what topics work best for specific audiences? What drives more engagement? And which content type is optimal for growth?

Those are difficult questions! And while there are plenty of videos and blog posts out there offering good general tips, there is not much work being done to answer those questions in a more analytic and quantitative way.

This is why we at EnsembleData have partnered with leading researchers at UBC (University of British Columbia), led by Professor Gene Lee, to conduct quantitative research into user dynamics on TikTok, using our APIs to extract large amounts of data and feed it into large machine learning models. This work resulted in a paper published at the Workshop on Information Technologies and Systems (WITS), a renowned academic conference. The full paper can be found here.

AI-Generated Voice and Content Creation

More specifically, we aimed to answer the following question:
How Does AI-Generated Voice Affect Video Content Creation?
As more of these features become available to creators online, it is important to understand how the adoption of AI-generated voice affects creators’ routine effort and creative effort in online video creation.

These kinds of questions are impossible to answer without a large volume of data to study. As a first step for the research project, we extracted more than 270,000 videos from thousands of creators using our TikTok API. Here is the full procedure:

  1. Using the Search Keyword endpoint, fetch a large quantity of posts coming from different categories, such as beauty, food, technology, sports, etc.
  2. From each of these posts we also get the creator’s profile. From there we can get more info about the creator using the User Info endpoint, as well as their most recent posts through the User Posts endpoint.
  3. We then monitor their growth over time using the User Posts endpoint.
  4. Optionally, each video’s comments can also be retrieved with the Post Comments endpoint.
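
The first two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used in the study: the endpoint labels, the `fetch` callable, the parameter names, and the `author` field are all hypothetical placeholders, and the real paths and response shapes are defined in the EnsembleData API documentation.

```python
from typing import Any, Callable, Dict, Iterable

# Hypothetical labels for the endpoints named in the steps above.
# The real paths and parameters live in the EnsembleData API docs.
SEARCH_KEYWORD = "search_keyword"
USER_INFO = "user_info"
USER_POSTS = "user_posts"

# Assumption: `fetch(endpoint, params)` wraps the HTTP call and returns
# already-parsed JSON. It is injected so the sketch stays testable.
Fetch = Callable[[str, Dict[str, Any]], Any]


def collect_dataset(fetch: Fetch, keywords: Iterable[str]) -> Dict[str, Dict[str, Any]]:
    """Search posts per keyword (step 1), then enrich each creator once (step 2)."""
    creators: Dict[str, Dict[str, Any]] = {}
    for keyword in keywords:
        for post in fetch(SEARCH_KEYWORD, {"keyword": keyword}):
            username = post["author"]  # hypothetical field name
            if username in creators:
                continue  # creator already fetched via another post
            creators[username] = {
                "info": fetch(USER_INFO, {"username": username}),
                "posts": fetch(USER_POSTS, {"username": username}),
            }
    return creators
```

Step 3 would then amount to calling the User Posts endpoint again for the same creators at regular intervals and storing each snapshot with its timestamp.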

Through this process, we were able to rapidly fetch a consistent, large-scale dataset for our subsequent data analysis.

The results are quite interesting and perhaps even counter-intuitive: The use of AI-generated voice increases creators’ routine effort and creative effort in the short term. While it has a long-lasting effect on improving the efficiency of video creation, AI-generated voice cannot consistently motivate creators to include more information in videos, and might even be detrimental to their creative effort in the long term.

The figure below shows the evolution of different metrics over time after the adoption of AI-generated voice:

Time-varying Effect on Creator Routine Effort and Creative Effort. [1]

The polyline represents the magnitude of coefficients, and the grey area denotes the 95% confidence interval.

It is interesting to note that the adoption of AI-generated voice boosted the use of new hashtags only in the treatment week, when creators used 0.35 more new hashtags per video. The coefficient became insignificant afterward and even turned significantly negative five weeks later. It is worth noting that we excluded two hashtags, “#texttospeech” and “#tts”, when calculating avgNewHashtag to make sure the changes are not caused by topics related to AI voice itself.
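
To make the avgNewHashtag metric more concrete, here is a minimal Python sketch of how such a value could be computed for one creator over one period. It assumes that a hashtag counts as "new" when the creator has not used it in any earlier video; the exact definition in the paper may differ, and the function name and arguments are our own illustration.

```python
from typing import Iterable, List, Optional, Set

# Hashtags excluded so that AI-voice topics do not drive the metric.
EXCLUDED = {"texttospeech", "tts"}


def avg_new_hashtags(videos: List[List[str]],
                     seen: Optional[Iterable[str]] = None) -> float:
    """Average number of previously unused hashtags per video.

    `videos` holds one hashtag list per video, in chronological order.
    `seen` holds hashtags the creator used before this period.
    Assumption: "new" means "not used in any earlier video".
    """
    if not videos:
        return 0.0
    used: Set[str] = {t.lower().lstrip("#") for t in (seen or ())}
    new_count = 0
    for tags in videos:
        # Normalize case and the leading '#', then drop excluded tags.
        normalized = {t.lower().lstrip("#") for t in tags} - EXCLUDED
        new_count += len(normalized - used)
        used |= normalized
    return new_count / len(videos)
```

For example, `avg_new_hashtags([["#food", "#tts"], ["#food", "#Recipe"]])` counts "food" as new in the first video and "recipe" as new in the second, while "#tts" is ignored.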

Want to know more about this work and dive deeper? Read the paper published at WITS 2022 directly here.

Why EnsembleData

EnsembleData is a SaaS data intelligence provider that aims to make social media data scraping accessible to businesses by offering a range of APIs that can be easily integrated into a business’ existing data pipeline. The platform sets out to solve many of the challenges associated with social media data scraping, such as the difficulty of robustly collecting relevant data in real-time.

One of the key advantages of EnsembleData is its ability to provide access to a wide range of social media platforms and other online sources through a single API. This allows businesses to easily collect data from multiple sources and integrate it into their existing data pipeline, making it easier to analyse and make sense of the data.

And that is it for today! If you are interested in how to use large amounts of data to extract insights from social media, send me an email at [email protected]. I am always happy to discuss such topics!