Who Gets the Algorithm? The Bigger TikTok Danger

By Weifeng Zhong
Wednesday, May 3, 2023, 8:16 AM

The controversy surrounding TikTok, the popular social media platform owned by a Chinese company, has remained at an impasse in recent weeks. Just days after the Biden administration issued a divestiture-or-ban ultimatum to the company and Beijing firmly opposed a forced sale, TikTok CEO Shou Zi Chew testified before Congress to try to save the app’s U.S. operations.

The main concern the U.S. government has with TikTok is the potential access the Chinese government may have to U.S. user data—including journalists’ whereabouts—by having oversight of the app. Chew, trying to bypass the standoff, touted TikTok’s Project Texas, an ongoing effort to establish a data security entity, overseen by Oracle and governed by independent directors, to administer future access to Americans’ data and content moderation. But few lawmakers seemed convinced.

Continued Chinese government access to U.S. user data, or its continued control over content on the platform, would be a significant national security compromise with ramifications for privacy, intelligence, and the U.S. speech environment. But it’s crucial to understand that, while the TikTok battle was initially centered around data privacy and content manipulation, it shouldn’t stop there. TikTok’s secret sauce—a recommendation algorithm that gains understanding of the American user base as usage continues—poses the greater and longer-lasting national security threat by giving the Chinese Communist Party (CCP) access to its reservoir of knowledge and intelligence. Any plan that tackles only data privacy and content manipulation going forward will not be able to mitigate that threat.

The only way out of this impasse is for TikTok to cease to be Chinese owned, which would more effectively cut off the CCP’s access to user data, or cease to operate in the U.S., which would prevent Beijing from exploiting the app’s knowledge and intelligence on the American people.

Why TikTok’s Algorithm Is a National Security—Not Just Personal Privacy—Threat

Not even seven years old, TikTok is one of the most popular social media platforms in the world. The secret sauce for its meteoric success is the app’s recommendation engine—a machine learning algorithm able to repeatedly show customized videos that keep the user hooked. It’s like a personal chef who intimately understands you and keeps making what you want the most at every meal, including food that you don’t even know you like. To appreciate how hard it is to create such a successful recommendation engine, just think about the number of uninteresting songs or movies your streaming services pick for you.

As with every successful big-data algorithm, TikTok’s recommendation engine is so addictive because it collects and learns from an aggressively large amount of user information. According to a 2023 report by Internet 2.0, a cybersecurity company, TikTok has twice as many trackers inside its app as other social media platforms like Facebook and Instagram do, and it tops the other 22 apps analyzed in the number of privacy risks it exposes users to. (TikTok is followed by VKontakte, the controversial Russian social media app, on this metric.) These trackers, technically known as software development kits (SDKs), are snippets of source code developed in-house or by third parties that facilitate the functioning of the app. While all apps use SDKs and they can be legitimate tools, some of these trackers are more intrusive than others in peeping into private information about the user or the device that has downloaded the app, including, in TikTok’s case, monitoring user behavior like keystrokes and screen taps. TikTok confirmed the existence of those features on its app, although it claimed that it used the information only for debugging.

Observing human behavior can provide insights into how humans think. In a classic 1967 paper, British economist Sydney Afriat showed that observing an individual who satisfies a minimal definition of “being rational”—for example, if a shopper opts for apples over oranges and oranges over strawberries, they also opt for apples over strawberries—can allow researchers to represent that person’s preferences as a simple mathematical formula. Afriat’s finding laid the intellectual groundwork for what became known as demand forecasting.
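The apples-oranges-strawberries example can be made concrete with a short Python sketch. It is only an illustration of the simplified consistency test described above, not Afriat’s actual construction, which works with prices and expenditure data. The function name and the input format are assumptions made for this example. If the observed choices contain no cycle, the sketch assigns each item a number such that every chosen item scores higher than the item it was chosen over.

```python
from graphlib import TopologicalSorter, CycleError


def utility_from_choices(choices):
    """Turn observed (preferred, rejected) pairs into a numeric ranking.

    Returns a dict mapping each item to a utility number, or None when
    the choices are inconsistent (they contain a preference cycle).
    """
    # Map each item to the set of items it was chosen over. graphlib
    # treats that set as the item's predecessors, so less-preferred
    # items appear earlier in the resulting order.
    chosen_over = {}
    for preferred, rejected in choices:
        chosen_over.setdefault(preferred, set()).add(rejected)
        chosen_over.setdefault(rejected, set())
    try:
        order = list(TopologicalSorter(chosen_over).static_order())
    except CycleError:
        return None  # e.g. apples > oranges > strawberries > apples
    # Position in the order serves as the utility number.
    return {item: rank for rank, item in enumerate(order)}


observed = [("apples", "oranges"), ("oranges", "strawberries")]
utility = utility_from_choices(observed)
print(utility)
```

With the two observed choices above, apples receive the highest number and strawberries the lowest, which is the “simple mathematical formula” in miniature. Adding a third observation in which the shopper picks strawberries over apples makes the function return None, because no consistent ranking can explain all three choices.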

The TikTok app presents a demand-forecasting dilemma. Different from other social media apps like Facebook and Twitter, which display a scrollable feed of content from the user’s social network, TikTok presents one—and only one—video at a time from somewhere on the platform that the app thinks the user might like. The user needs to decide whether and for how long to watch that video before the next one—and the next decision point—appears. Many dating apps take a similar approach; a user swipes left or right before seeing another potential match. A user in this type of situation is essentially making a purchasing decision every few seconds by “buying” some content in exchange for their time—the time that could have been spent seeing other content or doing something else.

When TikTok observes enough of these “purchases”—and there are a lot of them because the videos tend to be short—the algorithm quickly acquires a lot of actionable information to back out the user’s preferences, just like a mathematical formula in Afriat’s world. The algorithm might figure out whether a user is a strict Catholic or a secular Jew, likes fancy cars or shiny shoes, or how motivated they are by one social issue versus another. This is what in economics is called revealed preferences—people’s desires and curiosities exposed by their actions—but on big-data steroids.
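To see how watch-time “purchases” could be turned into a preference profile, consider the minimal Python sketch below. It is a deliberately crude stand-in and not a description of TikTok’s actual algorithm, whose internals are not public. The topic labels, the input format, and the function name are all assumptions made for illustration. Each view is scored by the fraction of the video the user watched, and the scores are averaged per topic.

```python
from collections import defaultdict


def estimate_topic_affinity(views):
    """Average the watched fraction of each video, grouped by topic.

    views: list of (topics, seconds_watched, video_length_seconds),
    where topics is a set of hypothetical labels attached to the video.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for topics, watched, length in views:
        # The "price paid" for a video is the share of it the user watched.
        fraction = min(watched / length, 1.0)
        for topic in topics:
            totals[topic] += fraction
            counts[topic] += 1
    return {topic: totals[topic] / counts[topic] for topic in totals}


views = [
    ({"cars"}, 30, 30),           # watched to the end
    ({"cars", "shoes"}, 15, 30),  # watched half
    ({"shoes"}, 3, 30),           # skipped almost immediately
]
print(estimate_topic_affinity(views))
```

Even with three observations, the hypothetical user’s taste for cars over shoes starts to show. A real system would observe thousands of such decisions per user and use far richer models, which is what makes the resulting picture so detailed.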

Each characteristic in isolation may not be a big deal for any ordinary user, but together they paint a picture of a U.S. population that foreign adversaries could exploit. Both Russia and China are known to have tried to sow discord among Americans on hot-button issues ahead of elections. The better they understand the population, the more successful their election-meddling campaigns can be when Americans go to the polls.

That’s why tackling data privacy and content manipulation on the app alone would be missing the point. Even if the CCP were barred from accessing user data or manipulating content on the platform going forward, access to an already well-trained algorithm would still be useful in other venues of Beijing’s influence operations, such as mainstream news and other social media platforms.

Knowing that someone prefers a pill wrapped in cheese to one coated in sugar can help persuade them to swallow it. Similarly, given TikTok’s success, it’s not at all unreasonable to believe that propaganda operatives in Beijing could use the insights behind the algorithm to design messaging strategies that more effectively appeal to U.S. politicians and the American public—even in other news and social media venues. At a minimum, being able to feed users what they like allows the app to at times slip in what they may not agree with, steering their views away from, for example, supporting Ukraine or strengthening ties with Taiwan, or influencing how they vote in elections.

Stopping China’s Big Brother

To the extent that TikTok’s algorithm has gained intelligence on the American people, and given the indications that the CCP has a strong grip on Chinese companies like its parent ByteDance, the app has already likely done some damage to U.S. national security. The company’s engineering team that designed the algorithm is integrated with ByteDance in Beijing, and at least one TikTok executive has been found to report directly to ByteDance, bypassing TikTok’s CEO.

More importantly, ByteDance has already offered up its secret sauce to the CCP. Last August, China’s cyber regulators made a public announcement that scores of Chinese tech giants, including ByteDance, had shared with the Chinese government details about their supposedly secret algorithms—prized machine learning models that are already trained and have so far stood the test of market competition—to comply with a new law that governs recommendation engines. Feeding a government user data feeds it for a day. Feeding a government the ability to understand users feeds it for a lifetime. China’s Big Brother understands that all too well.

That presents a different problem. It’s one thing to stop the CCP from targeting particular users or censoring certain content on the platform; TikTok claimed that Project Texas’s plan to let Oracle host its user data and algorithm in the United States would achieve that. But it’s another to cut Beijing off from the intimate knowledge about the American people the algorithm has already acquired. ByteDance and presumably the Chinese authorities already have the algorithm. The CCP would be shut out of the algorithm only if it permanently deletes its copy of the secret sauce—a big “if”—or when the American people’s preferences evolve so much in the future that Beijing’s knowledge becomes out of date.

Even if either scenario is true, Beijing could still recover much of TikTok’s secret algorithm if it’s able to interact with it. For example, software tools like this often include application programming interfaces (APIs) that take queries from outside the system and return what the algorithm thinks are the answers, even though the algorithm may stay in the backend during the interaction. One way to hack the system is to exploit such an API. 

To get a sense of this type of capability, suppose TikTok has an API that takes any proposed video for any targeted user as the query and gives a personalized rating for the video as the output. Because TikTok’s algorithm seeks to custom-recommend videos users like the most, it could quite accurately foresee the user’s reactions to the proposed video. An API like this could also compare two proposed videos by how popular they are among the entire user base. If, hypothetically, strict Catholics who happen to like fancy cars are also interested in the Trump indictment lawsuits, a querier could also find that out by asking the API questions.

It turns out that this reverse-engineering through an API is not so daunting a task. In a first-of-its-kind study in 2016, a group of machine learning researchers at Cornell University, the Swiss institute EPFL, and the University of North Carolina “stole” complicated machine learning models hosted by tech companies like Amazon and BigML simply by having their own program pose questions to the companies’ APIs and analyzing the answers. It took their expertly designed program only a few minutes to crack two Amazon models using about 2,000 questions. Their strategy has since been replicated by multiple studies to duplicate complex models that can take millions of dollars to develop in the first place.
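A toy version of this kind of extraction fits in a few lines of Python. The sketch below is not the researchers’ code and has nothing to do with any real TikTok interface. It assumes a hypothetical API that hides a simple logistic model and returns a confidence score for any input, and every name in it is made up for illustration. Because the score is a known function of the hidden weights, one well-chosen question per feature, plus one baseline question, is enough to recover the model exactly.

```python
import math


def make_secret_api(weights, bias):
    """Build a hypothetical prediction API that hides a logistic model."""
    def api(x):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))  # only this score is exposed
    return api


def logit(p):
    """Invert the logistic function to recover the hidden linear score."""
    return math.log(p / (1.0 - p))


def extract_linear_model(api, n_features):
    """Recover weights and bias using n_features + 1 queries."""
    zero = [0.0] * n_features
    bias = logit(api(zero))  # the all-zero input reveals the bias
    weights = []
    for i in range(n_features):
        probe = zero.copy()
        probe[i] = 1.0  # switch on one feature at a time
        weights.append(logit(api(probe)) - bias)
    return weights, bias


# The attacker sees only `api`, never the numbers passed in here.
api = make_secret_api(weights=[0.8, -1.5, 0.3], bias=0.2)
stolen_weights, stolen_bias = extract_linear_model(api, n_features=3)
print(stolen_weights, stolen_bias)
```

Real recommendation models are vastly larger and do not hand out clean confidence scores for arbitrary inputs, so extracting them takes many more questions and yields an approximation instead of an exact copy. The principle is the same, though: the answers leak the model.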

Think of this process as a personal chef being willing to answer someone’s hypothetical, but telling, questions about their client’s palate. Suppose one asks the personal chef what their client will eat on a Friday, and the chef says “salmon.” If one asks about a few more Fridays and keeps getting fish as Easter approaches, one could infer that the client is probably Catholic. So even if the CCP or ByteDance no longer has the TikTok algorithm under Project Texas, they could still easily retrieve at least the contour of it using an API.

Here’s why divestiture can make a meaningful difference. TikTok’s defenders might argue that if Oracle has control over the algorithm and its APIs, it can prevent Chinese authorities from reverse-engineering the technology. Sure, but only in wishful thinking. Project Texas doesn’t change the fact that TikTok remains the owner of the app, that ByteDance remains the owner of TikTok, or that the CCP still commands and controls ByteDance. Oracle would be able to more credibly commit to stopping Beijing’s reach if it, instead of ByteDance, owned TikTok. But it doesn’t.

The Department of Defense and its contractors surely keep a tight lid on their prized artificial intelligence applications—from warfare systems to cyber threat monitoring—and know better, one would assume, than to set up an API that would allow Beijing’s prying eyes to “steal” the algorithms. And that’s because ownership matters. Those contractors abide by U.S. laws, and the Department of Defense answers to the U.S. president—not the CCP’s general secretary. TikTok is no defense contractor. To have such a strong incentive to firewall its algorithm from Beijing, it can’t have a Chinese parent company.

That’s why Project Texas, even if it’s fully implemented as advertised, will not eliminate questions about TikTok’s security threats to the United States. Certainty would require a divestiture, which would cut off the link between TikTok and ByteDance, or a ban, which would cut off the link between TikTok and the American people. But Beijing has already objected to ByteDance selling TikTok to a U.S. entity, which leaves a ban as the only feasible option. Anything short of that may help but shouldn’t be treated as an excuse to rest easy.

TikTok’s harm to U.S. national security was inflicted when its algorithm learned to “read” users’ minds. But the silver lining is that people change, and whatever knowledge the CCP has gained through TikTok could eventually become obsolete if the CCP’s access to the app is cut off.