Twitter was once a mainstay of academic research — a way to take the pulse of the internet. But as new owner Elon Musk has attempted to monetize the service, researchers are struggling to replace a once-crucial tool. Unless Twitter makes another about-face soon, it could close the chapter on an entire era of research.
“Research using social media data, it was mostly Twitter-ology,” says Gordon Pennycook, an associate professor of behavioral science at the University of Regina. “It was the primary source that people were using.”
Until Musk’s takeover, Twitter’s API — which allows third-party developers to gather data — was considered one of the best on the internet. It enabled studies into everything from how people respond to weather disasters to how to stop misinformation from spreading online. The problems they addressed are only getting worse, making this kind of research just as important as ever. But Twitter decided to end free access to its API in February and launched paid tiers in March. The company said it was “looking at new ways to continue serving” academia but nevertheless started unceremoniously cutting off access to third-party users who didn’t pay. While the cutoff caused problems for many different kinds of users, including public transit agencies and emergency responders, academics are among the groups hit the hardest.
Researchers who’ve relied on Twitter for years tell The Verge they’ve had to stop using it. It’s just too expensive to pay for access to its API, which has reportedly skyrocketed to $42,000 a month or more for an enterprise account. Scientists have lost a key vantage point into human behavior as a result. And while they’re scrambling to find new sources, there’s no clear alternative yet.
Twitter gave researchers a way to observe people’s real reactions instead of having to ask study participants how they think they might react in certain scenarios. That’s been crucial for Pennycook’s research into strategies to keep misinformation from spreading online, such as showing people content that asks them to think about accuracy before sharing a link.
Without being able to see what an individual actually tweets, researchers like Pennycook might be limited to asking someone in a survey what kind of content they would share on social media. “It’s basically hypothetical,” says Pennycook. “For tech companies who would actually be able to implement one of these interventions, they would not be impressed by that … We had to do experiments somewhere to show that it actually can work in the wild.”
In April, a group of academics, journalists, and other researchers called the Coalition for Independent Technology Research sent a letter to Twitter asking it to help them maintain access. The coalition surveyed researchers and found that Twitter’s new restrictions jeopardized more than 250 different projects. The restrictions would also signal the end of at least 76 “long-term efforts,” the letter says, including code packages and tools. Because enforcement of Twitter’s new policies was somewhat haphazard (some users were kicked off the platform before others), the coalition set up a mutual aid effort. Scientists scrambled to harvest as much data as they could before losing their own access keys, and others offered to help them collect that data or donated their own access to Twitter’s API to researchers who lost it.
Twitter’s most affordable API tier, at $100 a month, would only allow third parties to collect 10,000 tweets per month. That’s just 0.3 percent of what they previously had free access to in a single day, according to the letter. And even its “outrageously expensive” enterprise tier, the coalition argued, wasn’t enough to conduct some ambitious studies or maintain important tools.
One such tool is Botometer, a system that rates how likely it is that a Twitter account is a bot. While Musk has expressed skepticism of things like disinformation research, he’s actually used Botometer publicly — to estimate how many bots were on the platform during his attempt to get out of the deal he made to buy Twitter. Now, his move to charge for API access could bring on Botometer’s demise.
A notice on Botometer’s website says that the tool will probably stop working soon. “We are actively seeking solutions to keep this website alive and free for our users, which will involve training a new machine-learning model and working with Twitter’s new paid API plans,” it says. “Please note that even if it is feasible to build a new version of the Botometer website, it will have limited functionalities and quotas compared to the current version due to Twitter’s restricted API.”
The impending shutdown is a personal blow to Botometer co-creator Kai-Cheng Yang, a researcher studying misinformation and bots on social media who recently earned his PhD in informatics at Indiana University Bloomington. “My whole PhD, my whole career, is pretty much based on Twitter data right now. It’s likely that it’s no longer available for the future,” Yang tells The Verge. When asked how he might have to approach his work differently now, he says, “I’ve been asking myself that question constantly.”
Other researchers are similarly dismayed. “The platform went from one of the most transparent and accessible on the planet to truly bottom of the barrel,” says letter signatory Rebekah Tromble, director of the Institute for Data, Democracy, and Politics (IDDP) at George Washington University. Some of Tromble’s previous work, studying political conversations on Twitter, was actually funded by the company before it changed its API policies.
“Twitter’s API has been absolutely vital to the research that I’ve been doing for years now,” Tromble tells The Verge. And like Yang, she has to pivot in response to the platform’s new pricing schemes. “I’m simply not studying Twitter at the moment,” she says.
But there aren’t many other options for gathering bulk data from social media. While scraping data from a website without the use of an API is one option, it’s more tedious work and can be fraught with other risks. Twitter and other platforms have tried to curtail scraping, in part because it can be hard to discern whether it’s being done in the public interest or for malicious purposes like phishing.
Meanwhile, other social media giants have been even more restrictive than Twitter with API access, making it difficult to pivot to a different platform. And the restrictions seem to be getting tougher — last month, Reddit similarly announced that it would start to limit third-party access to its API.
“I just wonder if this is the beginning of companies now becoming less and less willing to have the API for data sharing,” says Hause Lin, a postdoctoral research fellow at MIT and the University of Regina who is developing ways to stop the spread of hate speech and misinformation online. “It seems like totally the landscape is changing, so we don’t know where it’s heading right now,” Lin tells The Verge.
There are signs that things could take an even sharper turn for the worse. Last week, inews reported that Twitter had told some researchers they would need to delete data they had already collected through its decahose, which provides a random sample of 10 percent of all the content on the platform, unless they pay for an enterprise account that can run upwards of $42,000 a month. The move amounts to “the big data equivalent of book burning,” one unnamed academic who received the notice reportedly told inews.
The Verge was unable to verify that news with Twitter, which now routinely responds to inquiries from reporters with a poop emoji. None of the researchers The Verge spoke to had received such a notice, and so far it seems to be limited to users who previously paid to use the decahose (just one use of Twitter’s API that previously would have been free or low-cost for academics).
Both Tromble and Yang have used the decahose for their work in the past. “Never before did Twitter ever come back to researchers and say that now the contract is over, you have to give up all the data,” Tromble says. “It’s a complete travesty. It will devastate a bunch of really important ongoing research projects.”
Other academics similarly tell The Verge that Twitter’s reported push to make researchers “expunge all Twitter data stored and cached in your systems” without an enterprise subscription would be devastating. It could prevent students from completing work they’ve invested years into if they’re forced to delete the data before publishing their findings. Even if they’ve already published their work, access to the raw data is what allows other researchers to test the strength of a study by replicating it.
“That’s really important for transparent science,” Yang says. “This is just a personal preference — I would probably go against Twitter’s policy and still share the data, make it available because I think science is more important in this case.”
Twitter was a great place for digital field experiments in part because it encouraged people from different backgrounds to meet in one place online. That’s different from Facebook or Mastodon, which tend to have more friction between social circles. This centralization sometimes fostered conflict — but to academics, it was valuable.
“If the research is not going to be as good, we won’t be able to know as much about the world as we did before,” Pennycook says. “And so maybe we’ll figure out a way to bridge that gap, but we haven’t figured it out yet.”