Last year’s news

It was tempting to skip wrapping up 2023, because at first glance large language models seemed so thoroughly dominant (and boring to revisit), but bringing the net.wars archive list up to date told a different story. To be fair, this is partly personal bias: from the beginning LLMs seemed fated to collapse under the weight of their own poisoning; AI Magazine predicted such an outcome as early as June.

LLMs did, however, seem to accelerate public consciousness of three long-running causes of concern: privacy and big data; corporate co-option of public and private resources; and antitrust enforcement. That acceleration may be LLMs’ more important long-term effect. In the short term, the justifiably bigger concern is their propensity to spread disinformation and misinformation in the coming year’s many significant elections.

Enforcement of data protection laws has been slowly ramping up in any case, and the fines just keep getting bigger, culminating in May’s €1.2 billion fine against Meta. Given that fines, no matter how large, seem insignificant compared to the big technology companies’ revenues, the more important trend is the accompanying constraints on how they do business. That May fine came with an order to stop sending EU citizens’ data to the US. Meta responded in October by announcing a subscription tier for European Facebook users: €160 a year will buy freedom from ads. Freedom from Facebook remains free.

But Facebook is almost 20 years old; it had years in which to grow without facing serious regulation. By contrast, ChatGPT, which OpenAI launched just over a year ago, has already faced investigation by the US Federal Trade Commission and been banned temporarily by the Italian data protection authority (it was reinstated a month later with conditions). It’s also facing more than a dozen lawsuits claiming copyright infringement; the most recent of these was filed just this week by the New York Times. OpenAI has headed off at least one such dispute by forming a licensing partnership with Axel Springer.

It all suggests a lessening tolerance for “ask forgiveness, not permission”. As another example, Clearview AI has spent most of the four years since Kashmir Hill alerted the world to its existence facing regulatory bans and fines, and public disquiet over the rampant spread of live facial recognition continues to grow. Add in the continuing degradation of exTwitter, the increasing number of friends who say they’re dropping out of social media generally, and the revival of US antitrust actions with the FTC’s suit against Amazon, and it feels like change is gathering.

It would be a logical time for change, for an odd reason: each of the last few decades, as seen through published books, has had a distinctive focus with respect to information technology. I discovered this recently when, for various reasons, I reorganized my hundreds of books on net.wars-type subjects dating back to the 1980s. How they’re ordered matters: I need to be able to find things quickly when I want them. In 1990, a friend’s suggestion of categorizing by topic seemed logical: copyright, privacy, security, online community, robots, digital rights, policy… The categories quickly broke down and cross-pollinated. In rebuilding the library, what to replace them with?

The exercise, which led to alphabetizing by author’s name within decade of publication, revealed that each of the last few decades has been distinctive enough that it’s remarkably easy to correctly identify a book’s decade without turning to the copyright page to check. The 1980s and 1990s were about exploration and explanation. Hype led us into the 2000s, which were quieter in publishing terms, though marked by bursts of business books that spanned the dot-com boom, bust, and renewal. The 2010s brought social media, content moderation, and big data, and a new set of technologies to hype, such as 3D printing and nanotechnology (about which we hear nothing now). The 2020s, it’s too soon to tell…but safe to say disinformation, AI, and robots are dominating these early years.

The 2020s books to date are trying to understand how to rein in the worst effects of Big Tech: online abuse, cryptocurrency fraud, disinformation, the loss of control as even physical devices turn into manufacturer-controlled subscription services, and, as predicted in 2018 by Christian Wolmar, the ongoing failure of autonomous vehicles to take over the world as projected just ten years ago.

While Teslas are not autonomous, the company’s Silicon Valley ethos has always made them seem more like information technology than cars. Bad idea, as Reuters reports; its investigation found a persistent pattern of mishaps such as part failures and wheels falling off – and an equally persistent pattern of the company blaming the customer, even when the car was brand new. If we don’t want shoddy goods and data invasion with everything to be our future, fighting back is essential. I hope that when we look back in 2032, that’s the story we’ll see.

The good news going into 2024 is, as the Center for the Study of the Public Domain at Duke University, the Public Domain Review, and Cory Doctorow write, the bumper crop of works entering the public domain: sound recordings (for the first time in 40 years), DH Lawrence’s Lady Chatterley’s Lover, Agatha Christie’s The Mystery of the Blue Train, Ben Hecht and Charles MacArthur’s play The Front Page, and the first appearance of Mickey Mouse. Happy new year.

Illustrations: Promotional still from the 1928 production of The Front Page, which enters the public domain on January 1, 2024 (via Wikimedia).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Own goals

There’s no point in saying I told you so when the people you’re saying it to got the result they intended.

At the Guardian, Peter Walker reports the Electoral Commission’s finding that at least 14,000 people were turned away from polling stations in May’s local elections because they didn’t have the right ID as required under the new voter ID law. The Commission thinks that’s a huge underestimate; 4% of people who didn’t vote said it was because of voter ID – which Walker suggests could mean 400,000 were deterred. Three-quarters of those lacked the right documents; the rest opposed the policy. The demographics of this will be studied more closely in a report due in September, but early indications are that the policy disproportionately deterred people with disabilities, people from certain ethnic groups, and people who are unemployed.

The fact that the Conservatives, who brought in this policy, lost big time in those elections doesn’t change its wrongness. But it did lead the MP Jacob Rees-Mogg (Con-North East Somerset) to admit that this was an attempt to gerrymander the vote that backfired because older voters, who are more likely to vote Conservative, also disproportionately don’t have the necessary ID.

***

One of the more obscure sub-industries is the business of supplying ad services to websites. One such little-known company is Criteo, which provides interactive banner ads that are generated based on the user’s browsing history and behavior using a technique known as “behavioral retargeting”. In 2018, Criteo was one of seven companies listed in a complaint Privacy International and noyb filed with three data protection authorities – the UK, Ireland, and France. In 2020, the French data protection authority, CNIL, launched an investigation.

This week, CNIL issued Criteo with a €40 million fine over failings in how it gathers user consent, a ruling noyb calls a major blow to Criteo’s business model.

It’s good to see the legal actions and fines beginning to reach down into adtech’s underbelly. It’s also worth noting that the CNIL was willing to fine a *French* company to this extent. It makes it harder for the US tech giants to claim that the fines they’re attracting are just anti-US protectionism.

***

Also this week, the US Federal Trade Commission announced it’s suing Amazon, claiming the company enrolled millions of US consumers into its Prime subscription service through deceptive design and sabotaged their efforts to cancel.

“Amazon used manipulative, coercive, or deceptive user-interface designs known as ‘dark patterns’ to trick consumers into enrolling in automatically-renewing Prime subscriptions,” the FTC writes.

I’m guessing this is one area where data protection laws have worked. In my UK-based ultra-brief Prime outings to watch the US Open tennis, canceling has taken at most two clicks. I don’t recognize the tortuous process Business Insider documented in 2022.

***

It has long been no secret that the secret behind AI is human labor. In 2019, Mary L. Gray and Siddharth Suri documented this in their book Ghost Work. Platform workers label images and other content, annotate text, and solve CAPTCHAs to help train AI models.

At MIT Technology Review, Rhiannon Williams reports that platform workers are using ChatGPT to speed up their work and earn more. A study (PDF) by a team of researchers from the Swiss Federal Institute of Technology found that between 33% and 46% of the 44 workers they asked to summarize 16 extracts from medical research papers used AI models to complete the task.

It’s hard not to feel a little gleeful that today’s “AI” is already eating itself via a closed feedback loop. It’s not good news for platform workers, though, because the most likely consequence will be increased monitoring to force them to show their work.

But this is yet another case in which computer people could have learned from their own history. In 2008, researchers at Google published a paper suggesting that Google search data could be used to spot flu outbreaks. Sick people searching for information about their symptoms could provide real-time warnings ten days earlier than the Centers for Disease Control could.

This actually worked, some of the time. However, as Kaiser Fung reported at Harvard Business Review in 2014, Google Flu Trends missed the swine flu pandemic as early as 2009; in 2012, researchers found it had overestimated the prevalence of flu for 100 of the previous 108 weeks. More data is not necessarily better, Fung concluded.

In 2013, as David Lazer and Ryan Kennedy reported for Wired in 2015 in discussing their investigation into the idea’s failure, GFT missed by 140% (a figure whose meaning they don’t explain). Lazer and Kennedy found that Google’s algorithm was vulnerable to poisoning by unrelated seasonal search terms and by terms that were correlated purely by chance, and that it failed to account for changing user behavior, as when Google introduced autosuggest and added health-related search terms. The “availability” cognitive bias also played a role: when flu is in the news, searches go up whether or not people are sick.
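The “correlated purely by chance” trap is easy to demonstrate. Here is a minimal, hypothetical Python sketch (nothing like Google’s actual model): given many candidate “search terms” and relatively few weeks of history, ordinary least squares will happily explain the training data with terms that are pure noise – and the fit evaporates on new weeks.

```python
# Toy demonstration (not Google's actual model) of chance correlation:
# 200 candidate "search terms" that are pure noise, 60 weeks of history.
# Least squares fits history almost perfectly and fails on new weeks.
import numpy as np

rng = np.random.default_rng(0)
weeks_train, weeks_test, n_terms = 60, 48, 200

# A seasonal flu curve plus noise; the "terms" have no relation to it.
flu_train = np.sin(np.linspace(0, 6, weeks_train)) + 0.1 * rng.standard_normal(weeks_train)
flu_test = np.sin(np.linspace(6, 10.8, weeks_test)) + 0.1 * rng.standard_normal(weeks_test)
X_train = rng.standard_normal((weeks_train, n_terms))
X_test = rng.standard_normal((weeks_test, n_terms))

# With more candidate terms than weeks, an exact in-sample fit exists.
coef, *_ = np.linalg.lstsq(X_train, flu_train, rcond=None)

def r2(y, yhat):
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"in-sample R^2:     {r2(flu_train, X_train @ coef):.2f}")  # ~1.00
print(f"out-of-sample R^2: {r2(flu_test, X_test @ coef):.2f}")    # ~0 or worse
```

GFT’s curated terms were of course less extreme than raw noise, but the statistical trap – many candidate predictors, limited history – is the one Lazer and Kennedy describe.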

While the parallels aren’t exact, large language modelers could have drawn the lesson that users can poison their models. ChatGPT’s arrival in widespread use will inevitably thin the proportion of online text that is human-written – and taint the well from which LLMs drink. Everyone imagines the next generation’s increased power. But it’s equally possible that the next generation will degrade as the percentage of AI-generated data in the training mix rises.
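As a deliberately crude illustration of that feedback loop, here is a toy sketch: fit a one-dimensional “model” (a Gaussian) to data, train each successive generation only on samples from the previous one, and watch the spread collapse. The clipping step is my stand-in assumption for the observed tendency of generative models to underrepresent the tails of their training data; it is not a description of any real LLM.

```python
# Toy "model collapse": each generation fits a Gaussian to the previous
# generation's output and loses tail information in the process.
import numpy as np

rng = np.random.default_rng(42)

# Generation 0: stand-in for human-written data.
data = rng.normal(0.0, 1.0, size=10_000)

for gen in range(7):
    mu, sigma = data.mean(), data.std()
    print(f"generation {gen}: std = {sigma:.3f}")  # shrinks ~20% per round
    # The next generation trains only on the current model's output;
    # clipping to the central 90% crudely approximates the loss of tails.
    samples = rng.normal(mu, sigma, size=10_000)
    lo, hi = np.percentile(samples, [5, 95])
    data = samples[(samples > lo) & (samples < hi)]
```

In this toy setup the spread shrinks by roughly a fifth per generation; the exact numbers are artifacts of the clipping choice, but the one-way loss of variety is the point.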

Illustrations: Drunk parrot seen in a Putney garden (by Simon Bisson).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon or Twitter.