Boxed up

If the actions of the owners of streaming services are creating the perfect conditions for the return of piracy, it’s equally true that the adtech industry’s decisions continue to encourage installing ad blockers as a matter of self-defense. This is, overall, a bad thing, since ads fund the free content most of us can’t afford to pay for outright.

This week, Google abruptly aborted a change it had been working on for four years: it is abandoning its plan to replace third-party cookies with the new technology it calls the Privacy Sandbox. From the sounds of it, Google will keep working on the Sandbox but will also retain third-party cookies. The privacy consequences of this are…muddy.

To recap: there are two kinds of cookies, which are small files websites place on your computer, distinguished by their source and use. Sites use first-party cookies to give their pages the equivalent of memory. They’re how the site remembers which items you’ve put in your cart, or that you’ve logged in to your account. These are the “essential cookies” that some consent banners mention, and without them you couldn’t use the web interactively.

Third-party cookies are trackers. Once a company deposits one of these on your computer, it can use it to follow along as you browse the web, collecting data about you and your habits the whole time. To capture the ickiness of this, Demos researcher Carl Miller has suggested renaming them slime trails. Third-party cookies are why the same ads seem to follow you around the web. They are also why people in the UK and Europe see so many cookie consent banners: EU law – the ePrivacy rules, reinforced by the General Data Protection Regulation’s consent standard – requires websites to obtain informed consent before dropping them on our machines. Ad blockers help here. They won’t stop you from seeing the banners, but they can save you the time you’d otherwise spend adjusting settings on the many sites that make it hard to say no.
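To make the two kinds concrete, here is a minimal sketch using Python’s standard http.cookies module, with invented cookie names: the headers a shop and an ad network might send. The SameSite attribute is what determines whether a browser will send a cookie along with cross-site requests – the mechanism tracking depends on.

```python
from http.cookies import SimpleCookie

# First-party cookie: the shop itself remembers your session/cart.
first_party = SimpleCookie()
first_party["session_id"] = "abc123"
first_party["session_id"]["samesite"] = "Lax"   # not sent on cross-site requests
first_party["session_id"]["httponly"] = True    # invisible to page scripts

# Third-party cookie: an ad network embedded on many sites sets an ID
# it can read back wherever its scripts load.
tracker = SimpleCookie()
tracker["ad_id"] = "user-98765"
tracker["ad_id"]["samesite"] = "None"  # sent cross-site: this enables tracking
tracker["ad_id"]["secure"] = True      # browsers require Secure with SameSite=None

print(first_party.output())
print(tracker.output())
```

Browsers that block third-party cookies essentially refuse to honor the second kind; the first kind keeps working, which is why your shopping cart survives.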

The big technology companies are well aware that people hate both ads and being tracked in order to serve ads. In 2020, Apple announced that its Safari web browser would block third-party cookies by default, continuing work it started in 2017. This was one of several privacy-protecting moves the company made; in 2021, it began requiring iPhone apps to offer users the opportunity to opt out of tracking for advertising purposes at installation. In 2022, Meta estimated Apple’s move would cost it $10 billion that year.

If the cookie seemed doomed at that point, it seemed even more so when Google announced it was working on new technology that would do away with third-party cookies in its dominant Chrome browser. Like Apple, however, Google proposed to give users greater control only over the privacy invasions of third parties without in any way disturbing Google’s own ability to track users. Privacy advocates quickly recognized this.

At Ars Technica, Ron Amadeo describes the Sandbox’s inner workings. Briefly, it derives a list of advertising topics from the websites users visit, and shares those with web pages when they ask. This is what you turn on when you say yes to Chrome’s “ad privacy feature”. Back when it was announced, EFF’s Bennett Cyphers was deeply unimpressed: instead of new tracking versus old tracking, he asked, why can’t we have *no* tracking? Just a few days ago, EFF followed up with the news that its Privacy Badger browser add-on now opts users out of the Privacy Sandbox (EFF has also published manual instructions).
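The mechanism can be sketched conceptually (illustrative Python, not Chrome’s actual code; the domain-to-topic table here is invented, and Chrome’s real taxonomy runs to several hundred topics classified on-device):

```python
from collections import Counter

# Invented mapping from visited domains to coarse advertising topics.
DOMAIN_TOPICS = {
    "example-running-shop.com": "Fitness",
    "example-recipes.com": "Cooking",
    "example-trailheads.com": "Fitness",
}

def weekly_topics(visited_domains, k=3):
    """Derive the user's top-k topics from a week of browsing history."""
    counts = Counter(
        DOMAIN_TOPICS[d] for d in visited_domains if d in DOMAIN_TOPICS
    )
    return [topic for topic, _ in counts.most_common(k)]

# A site asking the browser gets topics, not the browsing history itself.
history = ["example-running-shop.com", "example-trailheads.com",
           "example-recipes.com"]
print(weekly_topics(history))  # ['Fitness', 'Cooking']
```

The privacy claim is that sites see only the coarse topic labels; Cyphers’s objection is that the browser is still compiling an interest profile at all.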

Google intended to make this shift in stages, beginning the process of turning off third-party cookies in January 2024 and finishing the job in the second half of 2024. Now, with the day of completion rapidly approaching, the company has said it’s over – that is, it no longer plans to turn off third-party cookies. As Thomas Claburn writes at The Register, implementing the new technology still requires a lot of work from a lot of companies besides Google. The technology will remain in the browser – and users will “get” to choose which kind of tracking they prefer; Kevin Purdy reports at Ars Technica that the company is calling this a “new experience”.

At The Drum, Kendra Barnett reports that the UK’s Information Commissioner’s Office is unhappy about Google’s decision. Even though it had also identified possible vulnerabilities in the Sandbox’s design, the ICO had welcomed the plan to block third-party cookies.

I’d love to believe that Google’s announcement was helped along by the fact that the Sandbox is already the subject of legal action. Last month the privacy-protecting NGO noyb complained to the Austrian data protection authority, arguing that Sandbox tracking still requires user consent. Real consent, not obfuscated “ad privacy feature” stuff, as Richard Speed explains at The Register. But far more likely it’s money. At the Press Gazette, Jim Edwards reports that Sandbox could cost publishers 60% of their revenue “from programmatically sold ads”. Note, however, that the figure is courtesy of adtech company Criteo, likely a loser under Sandbox.

The question is what comes next. As Cyphers said, we deserve real choices: *whether* we are tracked, not just who gets to do it. Our lives should not be the leverage big technology companies use to enhance their already dominant position.

Illustrations: A sandbox (via Wikimedia)

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

The return of piracy

In Internet terms, it’s been at least a generation since the high-profile fights over piracy – that is, the early 2000s legal actions against unauthorized sites offering music, TV, and films, and the people who used them. Visits to the news site TorrentFreak this week feel like a revival.

The wildest story concerns Z-Library, for some years the largest shadow book collection. Somewhere someone must be busily planning a true crime podcast series. Z-Library was briefly offline in 2022, when the US Department of Justice seized many of its domains. Shortly afterwards there arrived Anna’s Archive, a search engine for shadow libraries – Z-Library and many others, and the journal article shadow repository Sci-Hub. Judging from a small sampling exercise, you can find most books that have been out for longer than a month. Newer books tend to be official ebooks stripped of digital rights management.

In November 2022, the Russian nationals Anton Napolsky and Valeriia Ermakova were arrested in Argentina, alleged to be Z-Library administrators. The US requested extradition, and an Argentinian judge agreed. They appealed to the Argentinian supreme court, asking to be classed as political refugees. This week a story in the local publication La Voz made its way north. As Ashley Belanger explains at Ars Technica, Napolsky and Ermakova decided not to wait for a judgment, escaped house arrest back in May, and vanished. The team running Z-Library say the pair are innocent of copyright infringement.

Also missing in court: Anna’s Archive’s administrators. As noted here in February, the library service company OCLC sued Anna’s Archive for exploiting a security hole in its website in order to scrape 2.2TB of its book metadata. This might have gone unnoticed, except that the admins published the news on their blog. OCLC claims the breach has cost it millions to remediate its systems.

This week saw an update to the case: OCLC has moved for summary judgment as Anna’s Archive’s operators have failed to turn up in court. At TorrentFreak, Ernesto van der Sar reports that OCLC is also demanding millions in damages and injunctive relief barring Anna’s from continuing to publish the scraped data, though it does not ask for the site to be blocked. (The bit demanding that Anna’s Archive pay the costs of remediating OCLC’s flawed systems is puzzling; do we make burglars who climb in through open windows pay for locksmiths?)

And then there is the case of the Internet Archive’s Open Library, which claims its scanned-in books are legal under the novel theory of controlled digital lending. When the Internet Archive responded to the covid crisis by removing those controls in 2020, four major publishers filed suit. In 2023, the US District Court for the Southern District of New York ruled against the Internet Archive, saying its library enables copyright infringement. Since then, the Archive has removed 500,000 books.

This is the moment when lessons from the past of music, TV, and video piracy could be useful. Critics always said that the only workable answer to piracy is legal, affordable services, and they were right, as shown by Pandora, Spotify, Netflix (which launched its streaming service in 2007), and so many others.

It’s been obvious for at least two years that things are now going backwards. Last December, in one of many such stories, the Discovery/Warner Brothers merger ended a licensing agreement with Sony, leading the latter to delete from PlayStation users’ libraries TV shows they had paid for in the belief that they would remain permanently available. The next generation is learning the lesson. Many friends over 40 say they can no longer play CDs or DVDs; teenaged friends favor physical media because they’ve already learned that digital services can’t be trusted.

Last September, we learned that Hollywood studios were deleting finished but unaired programs and parts of their back catalogues for tax reasons. Sometimes shows have abruptly disappeared mid-season. This week, Paramount removed decades of Comedy Central video clips; last month it axed the MTV news archives. This is *our* culture, even if it’s *their* copyright.

Meanwhile, the design of streaming services has stagnated. The complaints people had years ago about interfaces that make it hard to find the shows they want to see are the same ones they have now. Content moves unpredictably from one service to another. Every service is bringing in ads and raising prices. The benefits that siphoned users from broadcast and cable are vanishing.

As against this, consider pirate sites: they have the most comprehensive libraries; there are no ads; you can use the full-featured player of your choice; no one other than you can delete them; and they are free. Logically, piracy should be going back up, and at least one study suggests it is. If only they paid creators…

The lesson: publishers may live to regret attacking the Internet Archive rather than finding ways to work with it – after all, it sends representatives to court hearings and obeys rulings; if so ordered, it might even pay authors. In 20 years, no one’s managed to sink The Pirate Bay; there’ll be no killing the shadow libraries either, especially since my sampling finds that the Internet Archive’s uncorrected scans are often the worst copies to read. Why let the pirate sites be the ones to offer the best service?

Illustrations: The New York Public Library, built 1911 (via Wikimedia).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Hostages

If you grew up with the slow but predictable schedule of American elections, the abruptness with which a British prime minister can prorogue Parliament and hit the campaign trail is startling. Among the pieces of legislation that fell by the wayside this time is the Data Protection and Digital Information bill, which had reached the House of Lords for scrutiny. The bill had many problems: it proposed to give the Department for Work and Pensions the right to inspect the bank accounts and financial assets of anyone receiving government benefits, and it undermined aspects of the adequacy agreement that allows UK companies to exchange data with businesses in the EU.

Less famously, it also included the legislative underpinnings for a trust framework for digital verification. On Monday, at a UCL conference on crime science, Sandra Peaston, director of research and development at the fraud prevention organization Cifas, outlined how all this is intended to work and asked some pertinent questions. Among them: whether the new regulator will have enough teeth; whether the certification process is strong enough for (for example) mortgage lenders; and how we know how good the relevant algorithm is at identifying deepfakes.

Overall, I think we should be extremely grateful this bill wasn’t rushed through. Quite apart from the digital rights aspects, the framework for digital identity really needs to be right; there’s just too much risk in getting it wrong.

***

At Bloomberg, Mark Gurman reports that Apple’s arrangement with OpenAI to integrate ChatGPT into the iPhone, iPad, and Mac does not involve Apple paying any money. Instead, Gurman cites unidentified sources to the effect that “Apple believes pushing OpenAI’s brand and technology to hundreds of millions of its devices is of equal or greater value than monetary payments.”

We’ve come across this kind of claim before in arguments between telcos and Internet companies like Netflix or between cable companies and rights holders. The underlying question is who brings more value to the arrangement, or who owns the audience. I can’t help feeling suspicious that this will not end well for users. It generally doesn’t.

***

Microsoft is on a roll. First there was the Recall debacle. Now come accusations by a former employee that it ignored a reported security flaw in order to win a large government contract, as Renee Dudley and Doris Burke report at ProPublica. Result: the Russian SolarWinds cyberattack on numerous US government departments and agencies, including the National Nuclear Security Administration.

This sounds like a variant of Cory Doctorow’s enshittification at the enterprise level (see also: Boeing). They don’t have to be monopolies: these organizations’ evolving culture has let business managers override safety and security engineers. This is how Challenger blew up in 1986.

Boeing is too big and too lacking in competition to be allowed to fail entirely; it will have to find a way back. Microsoft has a lot of customer lock-in. Is it too big to fail?

***

I can’t help feeling a little sad at the news that Raspberry Pi has had an IPO. I see no reason why it shouldn’t be successful as a commercial enterprise, but its values will inevitably change over time. CEO Eben Upton swears they won’t, but he won’t be CEO forever, as even he admits. But: Raspberry Pi could become the “unicorn” Americans keep saying Europe doesn’t have.

***

At that same UCL event, I finally heard someone say something positive about AI – for a meaning of “AI” that *isn’t* chatbots. Sarah Lawson, the university’s chief information security officer, said that “AI and machine learning have really changed the game” when it comes to detecting email spam, which remains the biggest vector for attacks. Dealing with the 2% that evades the filters is still a big job, as it leaves 6,000 emails a week hitting people’s inboxes – but she’ll take it. We really need to be more specific when we say “AI” about what kind of system we mean; success at spam filtering has nothing to say about getting accurate information out of a large language model.
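Those figures imply a striking volume. If 6,000 weekly emails are the 2% that slip through, the arithmetic (working back from the numbers Lawson gave) runs:

```python
evaded = 6_000          # spam emails reaching inboxes each week (the 2%)
evasion_rate = 0.02

# Total weekly spam volume implied by the evasion rate.
total_spam = evaded / evasion_rate
caught = total_spam - evaded
print(f"{total_spam:,.0f} spam emails weekly, {caught:,.0f} filtered")
```

That's 300,000 spam emails a week aimed at one university, 294,000 of them silently removed – which is the sense in which machine learning "changed the game" for this narrow, well-defined task.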

***

Finally, I was highly amused this week when long-time security guy Nick Selby posted on Mastodon about a long-forgotten incident from 1999 in which I disparaged the sort of technology Apple announced this week that’s supposed to organize your life for you – tell you when it’s time to leave for things based on the traffic, juggle meetings and children’s violin recitals, that sort of thing. Selby felt I was ahead of my time because “it was stupid then and is stupid now because even if it works the cost is insane and the benefit really, really dodgy”.

One of the long-running divides in computing is between the folks who want computers to behave predictably and those who want computers to learn from our behavior what’s wanted and do that without intervention. Right now, the latter is in ascendance. Few of us seem to want the “AI features” being foisted on us. But only a small percentage of mainstream users turn off defaults (a friend was recently surprised to learn you can use the history menu to reopen a closed browser tab). So: soon those “AI features” will be everywhere, pointlessly and extravagantly consuming energy, water, and human patience. How you use information technology used to be a choice. Now, it feels like we’re hostages.

Illustrations: Raspberry Pi: the little computer that could (via Wikimedia).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

The apostrophe apocalypse

It was immediately tempting to view the absence of apostrophes on new street signs in a North Yorkshire town as a real-life example of computer systems crushing human culture. Then, near-simultaneously, Apple launched an ad (which it now regrets) showing just that process, raising the temptation even more. But no.

In fact, as Brandon Vigliarolo writes at The Register, not only is the removal of apostrophes in place names not new in the UK, but it also long precedes computers. The US Board on Geographic Names declared apostrophes unwanted as long ago as its founding year, 1890, apparently to avoid implying possession. This decision by the BGN, which has only made five exceptions in its history, was later embedded in the US’s Geographic Names Information System and British Standard 7666. When computers arrived to power databases, the practice carried on.

All that said, it’s my experience that the older British generation are more resentful of American-derived changes to their traditional language than they are of computer-driven alterations (one such neighbor complains about “sidewalk”). So campaigns to reinstate missing apostrophes seem likely to persist.

Blaming computers seemed like a coherent narrative, not least because new technology often disrupts social customs. Railways brought standardized time, and the desire to simplify things for computers led to the 2023 decision to eliminate leap seconds in 2035 (after 18 years of debate). Instead, the apostrophe apocalypse is a more ordinary story of central administrators preferencing their own convenience over local culture and custom (which may itself be contested). It still seems like people should be allowed to keep their street signs. I mean.

***

Of course language changes over time and usage. The character limits imposed by texting (and therefore exTwitter and other microblogging sites) brought us many abbreviations that are now commonplace in daily life, just as long before that the telegraph’s cost per word spawned its own compressed dialect. A new example popped up recently in Charles Arthur’s The Overspill.

Arthur highlighted an article at Level Up Coding/Medium by Fareed Khan that offered ways to distinguish between human-written and machine-generated text. It turns out that chatbots use distinctively different words than we do. Khan was able to generate a list of about 100 words that may indicate a chatbot has been at work, as well as a web app that can check a block of text or a file in one go. The word “delve” was at the top.
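A crude version of such a checker is easy to sketch. (The marker words below, other than “delve”, are invented stand-ins; Khan’s actual list runs to about 100 words, and any such signal is rough and easily fooled.)

```python
# Illustrative subset; "delve" is from Khan's list, the rest are stand-ins.
MARKER_WORDS = {"delve", "tapestry", "realm", "showcasing"}

def chatbot_score(text):
    """Fraction of marker words present in the text - a weak heuristic,
    not proof of machine authorship."""
    words = set(text.lower().split())
    return len(words & MARKER_WORDS) / len(MARKER_WORDS)

sample = "Let us delve into the rich tapestry of this realm"
print(chatbot_score(sample))  # 0.75
```

The weakness, as the next paragraphs suggest, is that the "distinctive" words are only distinctive of a particular population of writers.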

I had missed Khan’s source material, an earlier claim by YCombinator founder Paul Graham that “delve” used in an email pitch is a clear sign of ChatGPT-generated text. At the Guardian, Alex Hern suggests that an underlying cause may be the fact that much of the labeling necessary to train the large language models that power chatbots is carried out by badly paid people in the global South – including Africa, where “delve” is more commonly used than in Western countries.

At the Premium Times, Chiamaka Okafor argues that therefore identifying “delve” as a marker of “robotic text” penalizes African writers. “We are losing sight of an opportunity to rewrite the AI narratives that exclude people in the global majority,” she writes. A reminder: these chatbots are just math and statistics predicting the next word. They will always regress to the mean. And now they’ll penalize us for being different.

***

Just two years ago, researchers fretted that we were running out of “high-quality text” on which to train large language models. We’ve been seeing the results since, as sites hosting user-generated content strike deals with LLM owners, leading to contentious disputes between those owners and sites’ users, who feel betrayed and ripped off. Reddit began by charging for access to its API, then made a deal with Google to use its database of posts for training for an injection of cash that enabled it to go public. Yesterday, Reddit announced a similar deal with OpenAI – and the stock went up. In reality, these deals are asset-stripping a site that has consistently lost money for 18 years.

The latest site to sell its users’ content is the technical site Stack Overflow. Developers who offer mutual aid by answering each other’s questions are exactly the user base you would expect to be most offended by the news that the site’s owner, the investment group Prosus, which bought the site in 2021 for $1.8 billion, has made a deal giving OpenAI access to all its content. And so it proved: developers promptly began altering or removing their posts to protest the deal. Shortly thereafter, the site’s moderators began restoring those posts and suspending the users.

There’s no way this ends well; Internet history’s many such stories never have. The site’s original owners, who created the culture, are gone. The new ones don’t care what users *believe* their rights are if the terms and conditions grant an irrevocable license to everything they post. Inertia makes it hard to build a replacement; alienation thins out the old site. As someone posted to Twitter a few years ago, “On the Internet your home always leaves you.”

‘Twas ever thus. And so it will be until people stop taking the bait in the first place.

Illustrations: Apple’s canceled “crusher” ad.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Review: A History of Fake Things on the Internet

A History of Fake Things on the Internet
By Walter J. Scheirer
Stanford University Press
ISBN 2023017876

One of Agatha Christie’s richest sources of plots was the uncertainty of identity in England’s post-war social disruption. Before then, she tells us, anyone arriving to take up residence in a village brought a letter of introduction; afterwards, old-time residents had to take newcomers at their own valuation. Had she lived into the 21st century, the arriving Internet would have given her whole new levels of uncertainty to play with.

In his recent book A History of Fake Things on the Internet, University of Notre Dame professor Walter J. Scheirer describes creating and detecting online fakes as an ongoing arms race. Where many people project doomishly that we will soon lose the ability to distinguish fakery from reality, Scheirer is more optimistic. “We’ve had functional policies in the past; there is no good reason we can’t have them again,” he concludes, adding that to make this happen we need a better understanding of the media that support the fakes.

I have a lot of sympathy with this view; as I wrote recently, things that fool people when a medium is new are instantly recognizable as fake once they become experienced. We adapt. No one now would be fooled by the images that looked real in the early days of photography. Our perceptions become more sophisticated, and we learn to examine context. Early fakes often work simply because we don’t know yet that such fakes are possible. Once we do know, we exercise much greater caution before believing. Teens who’ve grown up applying filters to the photos and videos they upload to Instagram and TikTok see images very differently than those of us who grew up with TV and film.

Scheirer begins his story with the hacker counterculture that saw computers as a source of subversive opportunities. His own research into media forensics began with Photoshop. At the time, many, especially in the military, worried that nation-states would fake content in order to deceive and manipulate. What they found, in much greater volume, were memes and what Scheirer calls “participatory fakery” – that is, the cultural outpouring of fakes for entertainment and self-expression, most of it harmless. Further chapters consider cheat codes in games, the slow conversion of hackers into security practitioners, adversarial algorithms and media forensics, shock-content sites, and generative AI.

Through it all, Scheirer remains optimistic that the world we’re moving into “looks pretty good”. Yes, we are discovering hundreds of scientific papers with faked data, faked results, or faked images, but we also have new analysis tools to use to detect them and Retraction Watch to catalogue them. The same new tools that empower malicious people enable many more positive uses for storytelling, collaboration, and communication. Perhaps forgetting that the computer industry relentlessly ignores its own history, he writes that we should learn from the past and react to the present.

The mention of scientific papers raises an issue Scheirer seems not to worry about: waste. Every retracted paper represents lost resources – public funding, scientists’ time and effort, and the same multiplied into the future for anyone who attempts to build on that paper. Figuring out how to automate reliable detection of chatbot-generated text does nothing to lessen the vast energy, water, and human resources that go into building and maintaining all those data centers and training models (see also filtering spam). Like Scheirer, I’m largely optimistic about our ability to adapt to a more slippery virtual reality. But the amount of wasted resources is depressing and, given climate change, dangerous.

Deja news

At the first event organized by the University of West London group Women Into Cybersecurity, a questioner asked how the debates around the Internet have changed since I wrote the original 1997 book net.wars.

Not much, I said. Some chapters have dated, but the main topics are constants: censorship, freedom of speech, child safety, copyright, access to information, digital divide, privacy, hacking, cybersecurity, and always, always, *always* access to encryption. Around 2010, there was a major change when the technology platforms became big enough to protect their users and business models by opposing government intrusion. That year Google launched the first version of its annual transparency report, for example. More recently, there’s been another shift: these companies have engorged to the point where they need not care much about their users or fear regulatory fines – the stage Ed Zitron calls the rot economy and Cory Doctorow dubs enshittification.

This is the landscape against which we’re gearing up for (yet) another round of recursion. April 25 saw the passage of amendments to the UK’s Investigatory Powers Act (2016). These are particularly charmless, as they expand the circumstances under which law enforcement can demand access to Internet Connection Records, allow the government to require “exceptional lawful access” (read: backdoored encryption) and require technology companies to get permission before issuing security updates. As Mark Nottingham blogs, no one should have this much power. In any event, the amendments reanimate bulk data surveillance and backdoored encryption.

Also winding through Parliament is the Data Protection and Digital Information bill. The IPA amendments threaten national security by demanding the power to weaken protective measures; the data bill threatens to undermine the adequacy decision under which the UK’s data protection law is deemed to meet the requirements of the EU’s General Data Protection Regulation. Experts have already warned that the bill puts that adequacy at risk. If this government proceeds, as it gives every indication of doing, the next, presumably Labour, government may find itself awash in an economic catastrophe as British businesses become persona-non-data to their European counterparts.

The Open Rights Group warns that the data bill makes it easier for government, private companies, and political organizations to exploit our personal data while weakening subject access rights, accountability, and other safeguards. ORG is particularly concerned about the impact on elections, as the bill expands the range of actors who are allowed to process personal data revealing political opinions on a new “democratic engagement activities” basis.

If that weren’t enough, another amendment gives the Department for Work and Pensions the power to monitor all bank accounts that receive payments, including the state pension – to reduce overpayments and other types of fraud, of course – along with any bank account connected to those accounts, such as those of landlords, carers, parents, and partners. At Computer Weekly, Bill Goodwin suggests that the upshot could be to deter landlords from renting to anyone receiving state benefits or entitlements. The idea is that banks will use criteria we can’t access to flag up accounts for the DWP to inspect more closely, and across the mass of 20 million accounts there will be plenty of mistakes to go around. Safe prediction: there will be horror stories of people denied benefits without warning.

And in the EU… Techcrunch reports that the European Commission (always more surveillance-happy and less human rights-friendly than the European Parliament) is still pursuing its proposal to require messaging platforms to scan private communications for child sexual abuse material. Let’s do the math of truly large numbers: billions of messages, even a teeny-tiny percentage of inaccuracy, literally millions of false positives! On Thursday, a group of scientists and researchers sent an open letter pointing out exactly this. Automated detection technologies perform poorly, innocent images may occur in clusters (as when a parent sends photos to a doctor), and such a scheme requires weakening encryption; in any case, it would be better to focus on eliminating child abuse itself (taking CSAM along with it).
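The arithmetic is worth spelling out (illustrative numbers; real message volumes and error rates are both contested, and deployed classifiers are far less accurate than assumed here):

```python
messages_per_day = 10_000_000_000     # order of magnitude for large platforms
false_positive_rate = 0.001           # a generously low 0.1% error rate

false_positives_per_day = messages_per_day * false_positive_rate
print(f"{false_positives_per_day:,.0f} innocent messages flagged per day")
```

Even granting the scanner 99.9% accuracy, that is ten million innocent messages a day flagged for potential human review – the scale problem the open letter's signatories are pointing at.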

Finally, age verification, which has been pending in the UK since at least 2016, is becoming a worldwide obsession. At least eight US states and the EU have laws mandating age checks, and the Age Verification Providers Association is pushing to make the Internet “age-aware persistently”. Last month, the BSI convened a global summit to kick off the work of developing a worldwide standard. These moves are the latest push against online privacy; age checks will be applied to *everyone*, and while they could be designed to respect privacy and anonymity, the most likely outcome is that they won’t be. In 2022, the French data protection regulator, CNIL, found that current age verification methods are both intrusive and easily circumvented. In the US, Casey Newton is watching a Texas case about access to online pornography and age verification that threatens to challenge First Amendment precedent in the Supreme Court.

Because the debates are so familiar – the arguments rarely change – it’s easy to overlook how profoundly all this could change the Internet. An age-aware Internet where all web use is identified and encrypted messaging services have shut down rather than compromise their users and every action is suspicious until judged harmless…those are the stakes.

Illustrations: Angel sensibly smashes the ring that makes vampires impervious (in Angel, “In the Dark” (S01e03)).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

The toast bubble

To The Big Bang Theory (“The Russian Rocket Reaction”, S5e05):

Howard: Someone has to go up with the telescope as a payload specialist, and guess who that someone is!
Sheldon: Muhammed Li.
Howard: Who’s Muhammed Li?
Sheldon: Muhammed is the most common first name in the world, Li the most common surname, and as I didn’t know the answer I thought that gave me a mathematical edge.

Experts tell me that exchange doesn’t perfectly explain how generative AI works; it’s too simplistic. Generative AI – or a Sheldon made more nuanced by his writers – takes into account contextual information to calculate the probable next word. So it wouldn’t pick from all the first names and surnames in the world. It might, however, pick from the names of all the payload specialists or some other group it correlated, or confect one.
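The difference between Sheldon’s guess and a contextual one can be sketched in a few lines. This is a crude bigram sketch on a toy corpus of my own invention – nothing like how a real generative model is trained, but it shows why conditioning on context changes the answer:

```python
from collections import Counter, defaultdict

# Toy corpus, invented purely for illustration.
corpus = ("the payload specialist checked the payload "
          "the payload specialist launched the telescope").split()

# Sheldon's method: ignore context, pick the most common word overall.
unconditional = Counter(corpus).most_common(1)[0][0]

# A crude contextual model: count which word most often follows each word.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

contextual = bigrams["payload"].most_common(1)[0][0]

print(unconditional)  # prints "the" – the globally most frequent word
print(contextual)     # prints "specialist" – the likeliest word after "payload"
```

Sheldon answers “the”; the context-aware version answers “specialist”, because it only considers words that actually follow “payload” in its data.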

More than a year on, I still can’t find a use for anything as unreliable and inscrutable as generative “AI”. At Exponential View, Azeem Azhar has written about the “answer engine” Perplexity.ai. While it’s helpful that Perplexity provides references for its answers, it was producing misinformation by the third question I asked it, and offered no improvement when challenged. Wikipedia spent many years being accused of unreliability, too, but at least there you can read the talk page and understand how the editors arrived at the text they present.

On The Daily Show this week, Jon Stewart ranted about AI and interviewed FTC chair Lina Khan. Well-chosen video clips showed AI company heads’ true colors, telling the public AI is an assistant for humans while telling money people and each other that AI will enable greater productivity with fewer workers and help eliminate “the people tax”.

More interesting, however, was Khan’s note that the FTC is investigating the investments and partnerships in AI to understand if they’re giving current technology giants undue influence in the marketplace. If, in her example, all the competitors in a market outsource their pricing decisions to the same algorithm they may be guilty of price fixing even if they’re not actively colluding. And these markets are consolidating at an ever-earlier stage. Instagram and WhatsApp had millions of users by the time Facebook thought it prudent to buy them rather than let them become dangerous competitors. AI is pre-consolidating: the usual suspects have been buying up AI startups and models at pace.

“More profound than fire or electricity,” Google CEO Sundar Pichai tells a camera at one point, speaking about AI. The last time I heard this level of hyperbole it was about the Internet in the 1990s, shortly before the bust. A friend’s answer to this sort of thing has never varied: “I’d rather have indoor plumbing.”

***

Last week the Federal District Court in Manhattan sentenced FTX founder Sam Bankman-Fried to 25 years in prison for stealing $8 billion. In the end, you didn’t have to understand anything complicated about cryptocurrencies; it was just good old embezzlement.

And then the price of bitcoin went *up*. At the Guardian, Molly White explains that this is because cryptoevangelists are pushing the idea that the sector can reach its full potential, now that Bankman-Fried and other bad apples have been purged. But, as she says, nothing has really changed. No new use case has come along to make cryptocurrencies more useful, more valuable, or more trustworthy.

Both cryptocurrencies and generative AI are bubbles. The difference is that the AI bubble will likely leave behind it some technologies and knowledge that are genuinely useful; it will be like the Internet, which boomed and busted before settling in to change the world. Cryptocurrencies are more like the Dutch tulips. Unfortunately, in the meantime both these bubbles are consuming energy at an insane rate. How many wildfires is bitcoin worth?

**

I’ve seen a report suggesting that the last known professional words of the late Ross Anderson may have been, “Do they take us for fools?”

He was referring to the plans, debated in the House of Commons on March 25, to amend the Investigatory Powers Act to allow the government to pre-approve (or disapprove) new security features technology firms want to introduce. The government is of course saying it’s all perfectly innocent, intended to keep the country safe. But recent clashes in the decades-old conflict over strong encryption have seen the technology companies roll out features like end-to-end encryption (Meta) and decide not to implement others, like client-side scanning (Apple). The latest in a long line of UK governments that want access to encrypted text was hardly going to take that quietly. So here we are, debating this yet again. Yet the laws of mathematics still haven’t changed: there is no such thing as a security hole that only “good guys” can use.

***

Returning to AI, it appears that costs may lead Google to charge for access to its AI-enhanced search, as Alex Hern reports at the Guardian. Hern thinks this is good news for its AI-focused startup competitors, which already charge for top-tier tools and who are at risk of being undercut by Google. I think it’s good for users by making it easy to avoid the AI “enhancement”. Of course, DuckDuckGo already does this without all the tracking and monopoly mishegoss.

Illustrations: Jon Stewart uninspired by Mark Zuckerberg’s demonstration of AI making toast.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Anachronistics

“In my mind, computers and the Internet arrived at the same time,” my twenty-something companion said, delivering an entire mindset education in one sentence.

Just a minute or two earlier, she had asked in some surprise, “Did bulletin board systems predate the Internet?” Well, yes: BBSs were a software package running on a single back room computer with a modem users dialed into, whereas the Internet is this giant sprawling mess of millions of computers connected together…simple first, complex later.

Her confusion is understandable: from her perspective, computers and the Internet did arrive at the same time, since her first conscious encounters with them were simultaneous.

But still, speaking as someone who first programmed a (mainframe, punch-card) computer as a student in 1972, got her first personal computer in 1982, and got online by modem in 1991 and by broadband in 1999 – someone to whom the sequence of events is memorable: wow.

A 25-year-old today was born in 1999 (the year I got broadband). Her counterpart 15 years hence (born 2014, the year a smartphone replaced my personal digital assistant) may think smartphones and the Internet were simultaneous. And sometime around 2045 *her* counterpart born in 2020 (two years before ChatGPT was released) might think generative text and image systems were contemporaneous with the first computers.

I think this confusion must have something to do with the speed of change in a relatively narrow sector. I’m sure that even though they all entered my life simultaneously, by the time I was 25 I knew that radio preceded TV (because my parents grew up with radio), bicycles preceded cars, and that handwritten manuscripts predated printed books (because medieval manuscripts). But those transitions played out over multiple lifetimes, if not centuries, and all those memories were personal. Few of us reminisce about the mainframes of the 1960s because most of us didn’t have access to them.

And yet, misunderstanding the timeline of those earlier technologies probably mattered less than misunderstanding the sequence of events in information technology does now. Jumbling the arrival dates of the pieces of information technology means failing to understand dependencies. What currently passes for “AI” could not exist without the ability to train models on the giant piles of data that the Internet and the web made possible – and that took 20 years to build. Neural networks pioneer Geoff Hinton was developing the underlying ideas as long ago as the 1980s, but it took until the last decade for them to become workable, because that’s how long it took to build sufficiently powerful computers and to amass enough training data. How can you understand the ongoing battle between those who wish to protect privacy via data protection laws and those who want data to flow freely without hindrance if you do not understand what those masses of data are important for?

This isn’t the only such issue. A surprising number of people who should know better seem to believe that the solution to all our ills with social media is to destroy Section 230, apparently reasoning that if S230 allowed Big Tech to get big, it must be wrong. In reality, it also allows small sites to exist, and it is the legal framework that makes content moderation possible. Improve it by all means, but understand its true purpose first.

Reviewing movies and futurist projections such as Vannevar Bush’s 1945 essay As We May Think (PDF) and Alan Turing’s 1950 paper, Computing Machinery and Intelligence (PDF), doesn’t really help, because so many ideas arrive long before they’re feasible. The crew in the original 1966 Star Trek series (to say nothing of secret agent Maxwell Smart in 1965) were talking over wireless personal communicators. A decade earlier, Arthur C. Clarke (in The Nine Billion Names of God) and Isaac Asimov (in The Last Question) were putting computers – albeit analog ones – in their stories. Asimov in particular imagined a sequence that now looks prescient, beginning with something like a mainframe, moving on to microcomputers, and finishing up with a vast, fully interconnected network that can only be held in hyperspace. (OK, it took trillions of years, starting in 2061, but still…) Those writings undoubtedly inspired the technologists of the last 50 years when they decided what to invent.

This all led us to fakes: as the technology to create fake videos, images, and texts continues to improve, she wondered if we will ever be able to keep up. Just about every journalism site is asking some version of that question; they’re all awash in stories about new levels of fakery. My 25-year-old discussant believes the fakes will always be improving faster than our methods of detection – an arms race like computer security, to which I’ve compared problems of misinformation/disinformation before.

I’m more optimistic. I bet even a few years from now today’s versions of generative “AI” will look as primitive to us as the special effects in a 1963 episode of Dr Who or the magic lantern used to create the Knock apparitions do to generations raised on movies, TV, and computer-generated imagery. Humans are adaptable; we will find ways to identify what is authentic that aren’t obvious in the shock of the new. We might even go back to arguing in pubs.

Illustrations: Secret agent Maxwell Smart (Don Adams) talking on his shoe phone (via Wikimedia).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

The bridge

Seven months ago, Mastodon was fretting about Meta’s newly-launched Threads. The issue: Threads, which was built on top of Instagram’s user database, had said it would comply with the ActivityPub protocol, which allows Mastodon servers (“instances”) to federate with any other service that also uses that protocol. The threat that Threads would become interoperable – and that potentially millions of Threads users would swamp Mastodon, ignoring its existing social norms and culture – created an existential dilemma: to federate or not to federate?

Today, Threads’ integration is still just a plan.

Instead, the first disruptive arrival looks set to be Bluesky, created by a team backed by Twitter co-founder Jack Dorsey, with the connection facilitated by a third party. Bluesky wrote a new open source protocol, AT, so the proposal isn’t federation with Mastodon but a bridge, as Amanda Silberling reports at TechCrunch. According to Silberling’s numbers, year-old Bluesky stands at 4.8 million users to Mastodon’s 8.7 million. Anyone familiar with the history of AOL’s gateway to Usenet will tell you that’s big enough to disrupt existing social norms. The AOL exercise became known as Eternal September (because every September Usenet had to ingest a new generation of incoming university freshmen).

There are two key differences, however. First, a third of those Bluesky users are new to the system, having joined only last week, when the service opened fully to the public; they will bring challenges to the culture Bluesky has so far developed. Second, AOL’s gateway was unidirectional: AOLers could read and post to Usenet newsgroups, but Usenet posters could not read anything on AOL without paying for access. The Bluesky-Mastodon bridge is planned to be bidirectional, so anything posted publicly on one service would be accessible to both – or to outsiders using BridgyFed to connect via website feeds.

I haven’t spent a lot of time on Bluesky, but it’s clear it and Mastodon have different cultures. Friends who spend more time there say Bluesky has a “weirdness” they like and is less “scoldy” than Mastodon, where long-time users tended to school incoming ex-Twitter users in 2022 on their mistakes. That makes sense when you consider that Mastodon has had time since its 2016 founding to develop a culture that newcomers are joining, whereas Bluesky was a closed beta until last week, and its users to date have been the ones defining its culture for the future. The newcomers of the past week may have a very different experience.

Even if they don’t, there’s a fundamental economic difference that no technology can bridge: Mastodon is a non-profit cooperative endeavor, while Bluesky has venture capital funding, although the list of investors is not the usual suspects. Social media users have often been burned by corporate business decisions. It’s therefore easy to believe that the $8 million in seed funding will lead inevitably to user data exploitation, no matter what they say now about being determined to find a different and more sustainable business model based on selling ancillary services. Even if that strategy works, later owners or the dictates of shareholders may demand higher profits via a pivot to advertising, just as the Netflix and Amazon Prime streaming services are doing now.

Designing any software involves making rules for how it will operate and setting defaults. Here’s where the project hit trouble: should it be opt-out, so that users who don’t want their posts to be visible outside their home system have to specifically turn it off, or opt-in, so that users who want their posts published far and wide have to turn it on? BridgyFed’s creator, Ryan Barrett, chose opt-out. It was immediately divisive: privacy versus openness.

Silberling reports that Barrett has fashioned a solution, giving users warning pop-ups and a chance to decline if someone from another service tries to follow them, and is thinking more carefully about the risks to safety his bridge might bring.

That’s great, but the next guy may not be so willing to reconsider. As we’ve observed before, there is no way to restrict the use of open protocols without closing them and putting them under centralized control – which is the opposite of the federated, decentralized systems Mastodon and Bluesky were created to build.

In a federated system anything one person can open another can close. Individual admins will decide for their users how their instances will operate. Those who don’t like their choice will be told they can port their accounts to one whose policies they prefer. That’s true, but unsatisfying as an answer. As the “Fediverse” grows, it must accommodate millions of mainstream users for whom moving servers is too complicated.

The key point, however, is that the illusion of control Mastodon seemed to offer is being punctured. Usenet users could have warned them: from its creation in 1979, users believed their postings were readable for a few weeks before expiring and being expunged. Then, in 1995, Steve Madere created the Deja News archive from scattered collections. Overnight, those “ephemeral” postings became permanent and searchable – even more so after 2001, when Google bought the archive (see groups.google.com).

The upshot: privacy in public networks is only ever illusory. Assume you have no control over anything you post, no matter how cozy and personal the network seems. As we’ve said before, the privacy-in-public afforded by the physical world has no online counterpart.

Illustrations: A mastodon by Heinrich Harder (public domain, via Wikimedia).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Relativity

“Status: closed,” the website read. It gave the time as 10:30 p.m.

Except it wasn’t. It was 5:30 p.m., and the store was very much open. Instead of consulting the time zone the store – I mean, the store’s particular branch whose hours and address I had looked up – was in, the website was taking the time from my laptop. Which I hadn’t bothered to switch from Britain to the US east coast, because I can subtract five hours in my head, and why bother?
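The fix is not hard; it just requires the site to care whose clock matters. A minimal sketch using Python’s standard-library zoneinfo – the branch’s time zone and opening hours here are hypothetical stand-ins for whatever the real site stores per branch:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# Hypothetical per-branch data; a real site would store this per location.
store_tz = ZoneInfo("America/New_York")
opens, closes = 9, 21  # open 9:00 a.m. to 9:00 p.m. local time

def store_is_open(now_utc: datetime) -> bool:
    """Decide open/closed from the *store's* clock, not the visitor's."""
    local = now_utc.astimezone(store_tz)
    return opens <= local.hour < closes

# 22:30 UTC in winter is 5:30 p.m. in New York: the store is open,
# whatever a laptop still set to UK time thinks.
print(store_is_open(datetime(2024, 1, 1, 22, 30, tzinfo=timezone.utc)))
# prints "True"
```

The design choice is the whole point: convert the current moment into the branch’s time zone before comparing against its hours, rather than trusting whatever clock the visitor’s device happens to be set to.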

Years ago, I remember writing a rant (which I now cannot find) about the “myness” of modern computers: My Computer, My Documents. My account. And so on, like a demented two-year-old who needed to learn to share. The notion that the time on my laptop determined whether or not the store was open had something of the same feel: the computational universe I inhabit is designed to revolve around me, and any dispute with reality is someone else’s problem.

Modern social media have hardened this approach. I say “modern” because back in the days of bulletin board systems, information services, and Usenet, postings were stamped with the date and time they were sent, time zone included. Now, every post is labelled “2m” or “30s” or “1d”, so the actual date and time are hidden behind their relationship to “now”. It’s like those maps that rotate along with you so that wherever you’re pointed physically is at the top. I guess it works for some people, but I find it disorienting; instead of the map orienting itself to me, I want to orient myself to the map. That seems to me my proper (infinitesimal) place in the universe.

All of this leads up to the revival of software agents. This was a Big Idea in the late 1990s/early 2000s, when it was commonplace to think that the era of having to make appointments and book train tickets was almost over. Instead, software agents configured with your preferences would do the negotiating for you. Discussions of this sort of thing died away as the technology never arrived. Generative AI has brought this idea back, at least to some extent, particularly in the financial area, where smart contracts can be used to set rules and then run automatically. I think only people who never have to worry about being able to afford anything will like this. But they may be the only ones the “market” cares about.

Somewhere during the time when software agents were originally mooted, I happened to sit at a conference dinner with the University of Maryland human-computer interaction expert Ben Shneiderman. There are, he said, two distinct schools of thought in software. In one, software is meant to adapt to the human using it – think of predictive text and smartphones as an example. In the other, software is consistent, and while using it may be repetitive, you always know that x command or action will produce y result. If I remember correctly, both Shneiderman and I were of the “want consistency” school.

Philosophically, though, these twin approaches have something in common with seeing the universe as if the sun went around the earth as against the earth going around the sun. The first of those makes our planet and, by extension, us far more important in the universe than we really are. The second cuts us down to size. No surprise, then, if the techbros who build these things, like the Catholic church in Galileo’s day, prefer the former.

***

Politico has started the year by warning that the UK is seeking to expand its surveillance regime even further by amending the 2016 Investigatory Powers Act. Unnoticed in the run-up to Christmas, the industry body techUK sent a letter to “express our concerns”. The short version: the bill expands the definition of “telecommunications operator” to include non-UK providers when operating outside the UK; allows the Home Office to require companies to seek permission before making changes to a privately and uniquely specified list of services; and the government wants to whip it through Parliament as fast as possible.

No, no, Politico reports the Home Office told the House of Lords, it supports innovation and isn’t threatening encryption. These are minor technical changes. But: “public safety”. With the ink barely dry on the Online Safety Act, here we go again.

***

As data breaches go, the one recently reported by 23andMe is alarming. By using passwords exposed in previous breaches (“credential stuffing”) to break into 14,000 accounts, attackers gained access to 6.9 million account profiles. The reason is reminiscent of the Cambridge Analytica scandal, where access to a few hundred thousand Facebook accounts was leveraged to obtain the data of millions: people had turned on “DNA Relatives” to allow themselves to be found by those searching for genetic relatives. The company, which afterwards made two-factor authentication a requirement, is fending off dozens of lawsuits by blaming the users for reusing passwords. According to Gizmodo, the legal messiness is considerable, as the company recently changed its terms and conditions to make arbitration more difficult and litigation almost impossible.

There’s nothing good to say about a data breach like this or a company that handles such sensitive data with such disdain. But it’s yet one more reason why putting yourself at the center of the universe is bad hoodoo.

Illustrations: DNA strands (via Wikimedia).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.