The apostrophe apocalypse

It was immediately tempting to view the absence of apostrophes on new street signs in a North Yorkshire town as a real-life example of computer systems crushing human culture. Then, near-simultaneously, Apple launched an ad (which it now regrets) showing just that process, raising the temptation even more. But no.

In fact, as Brandon Vigliarolo writes at The Register, not only is the removal of apostrophes in place names not new in the UK, but it also long precedes computers. The US Board on Geographic Names declared apostrophes unwanted as long ago as its founding year, 1890, apparently to avoid implying possession. This decision by the BGN, which has only made five exceptions in its history, was later embedded in the US’s Geographic Names Information System and British Standard 7666. When computers arrived to power databases, the practice carried on.

All that said, it’s my experience that the older British generation are more resentful of American-derived changes to their traditional language than they are of computer-driven alterations (one such neighbor complains about “sidewalk”). So campaigns to reinstate missing apostrophes seem likely to persist.

Blaming computers seemed like a coherent narrative, not least because new technology often disrupts social customs. Railways brought standardized time, and the desire to simplify things for computers led to the 2023 decision to eliminate leap seconds in 2035 (after 18 years of debate). Instead, the apostrophe apocalypse is a more ordinary story of central administrators prioritizing their own convenience over local culture and custom (which may itself be contested). It still seems like people should be allowed to keep their street signs. I mean.

***

Of course language changes over time and with usage. The character limits imposed by texting (and therefore exTwitter and other microblogging sites) brought us many abbreviations that are now commonplace in daily life, just as long before that the telegraph’s cost per word spawned its own compressed dialect. A new example popped up recently in Charles Arthur’s The Overspill.

Arthur highlighted an article at Level Up Coding/Medium by Fareed Khan that offered ways to distinguish between human-written and machine-generated text. It turns out that chatbots use distinctively different words than we do. Khan was able to generate a list of about 100 words that may indicate a chatbot has been at work, as well as a web app that can check a block of text or a file in one go. The word “delve” was at the top.
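As a rough sketch of the idea (not Khan’s actual word list or app; the words below are illustrative assumptions), a checker of this kind mostly boils down to counting how often suspected marker words appear in a text:

```python
import re
from collections import Counter

# Hypothetical sample of "chatbot marker" words; Khan's real list runs to
# roughly 100 entries. These few are illustrative only.
MARKER_WORDS = {"delve", "tapestry", "multifaceted", "landscape", "underscore"}

def marker_report(text: str) -> Counter:
    """Count occurrences of suspected chatbot marker words in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w in MARKER_WORDS)

print(marker_report("Let us delve into the multifaceted landscape of this topic."))
# Counter({'delve': 1, 'multifaceted': 1, 'landscape': 1})
```

A count like this is a heuristic, not proof, which is exactly the caveat the rest of this story turns on.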

I had missed Khan’s source material, an earlier claim by Y Combinator founder Paul Graham that “delve” used in an email pitch is a clear sign of ChatGPT-generated text. At the Guardian, Alex Hern suggests that an underlying cause may be the fact that much of the labeling necessary to train the large language models that power chatbots is carried out by badly paid people in the global South – including Africa, where “delve” is more commonly used than in Western countries.

At the Premium Times, Chiamaka Okafor argues that therefore identifying “delve” as a marker of “robotic text” penalizes African writers. “We are losing sight of an opportunity to rewrite the AI narratives that exclude people in the global majority,” she writes. A reminder: these chatbots are just math and statistics predicting the next word. They will always regress to the mean. And now they’ll penalize us for being different.

***

Just two years ago, researchers fretted that we were running out of “high-quality text” on which to train large language models. We’ve been seeing the results ever since, as sites hosting user-generated content strike deals with LLM owners, leading to contentious disputes between those owners and the sites’ users, who feel betrayed and ripped off. Reddit began by charging for access to its API, then made a deal allowing Google to train on its database of posts in exchange for an injection of cash that enabled it to go public. Yesterday, Reddit announced a similar deal with OpenAI – and the stock went up. In reality, these deals are asset-stripping a site that has consistently lost money for 18 years.

The latest site to sell its users’ content is the technical site Stack Overflow. Developers who offer mutual aid by answering each other’s questions are exactly the user base you would expect to be most offended by the news that the site’s owner, the investment group Prosus, which bought the site in 2021 for $1.8 billion, has made a deal giving OpenAI access to all its content. And so it proved: developers promptly began altering or removing their posts to protest the deal. Shortly thereafter, the site’s moderators began restoring those posts and suspending the users.

There’s no way this ends well; Internet history’s many such stories never have. The site’s original owners, who created the culture, are gone. The new ones don’t care what users *believe* their rights are if the terms and conditions grant an irrevocable license to everything they post. Inertia makes it hard to build a replacement; alienation thins out the old site. As someone posted to Twitter a few years ago, “On the Internet your home always leaves you.”

‘Twas ever thus. And so it will be until people stop taking the bait in the first place.

Illustrations: Apple’s canceled “crusher” ad.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Review: More Than a Glitch

More Than a Glitch: Confronting Race, Gender, and Ability Bias in Tech
By Meredith Broussard
MIT Press
ISBN: 978-0-262-04765-4

At the beginning of the 1985 movie Brazil, a family’s life is ruined when a fly gets stuck in a typewriter key so that the wrong man is carted away to prison. It’s a visual play on “computer bug”, so named after a moth got trapped in a computer at Harvard.

In More Than a Glitch, NYU associate professor Meredith Broussard would call both the fly and the moth a “glitch”. In the movie, the error is catastrophic for Buttle-not-Tuttle and his family, but it’s a single, ephemeral mistake that can be prevented with insecticide and cross-checking. A “bug” is more complex and more significant: it’s “substantial”, “a more serious matter that makes software fail”. It “deserves attention”. It’s the difference between the lone rotten apple in a bushel full of good ones and a barrel that causes all the apples put into it to rot.

This distinction is Broussard’s prelude to her fundamental argument that the lack of fairness in computer systems is persistent, endemic, and structural. In the book, she examines numerous computer systems that are already out in the world causing trouble. After explaining the fundamentals of machine bias, she goes through a variety of sectors and applications to examine failures of fairness in each one. In education, proctoring software penalizes darker-skinned students by failing to identify them accurately, and algorithms used to estimate scores on tests canceled during the pandemic penalized exceptional students from unexpected backgrounds. In health, the long-practiced “race correction” that derives from slavery preferences white patients for everything from painkillers to kidney transplants – and gets embedded into new computer systems built to replicate existing practice. If computer developers don’t understand the way in which the world is prejudiced – and they don’t – how can the systems they create be more neutral than the precursors they replace? Broussard delves inside each system to show why, not just how, it doesn’t work as intended.

In other cases Broussard highlights, part of the problem is rigid inflexibility in back-end systems that need to exchange data. There’s little benefit in having 58 gender options if the underlying database only supports two choices. At a doctor’s office, Broussard is told she can only check one box for race; she prefers to check both “black” and “white” because in medical settings it may affect her treatment. The digital world remains only partially accessible. And, as Broussard discovered when she was diagnosed with breast cancer, even supposed AI successes like reading radiology films are overhyped. This section calls back to her 2018 book, Artificial Unintelligence, which did a good job of both explaining how machine learning and “AI” computer systems work and why a lot of the things the industry says work…really don’t (see also self-driving cars).

Broussard concludes by advocating for public interest technology and a rethink. New technology imitates the world it comes from; computers “predict the status quo”. Making change requires engineering technology so that it performs differently. It’s a tall order, and Broussard knows that. But wasn’t that the whole promise the technology founders made? That they could change the world to empower the rest of us?

Intents and purposes

One of the basic principles of data protection law is the requirement for consent for change of use. For example, giving a site a mobile number for two-factor authentication doesn’t entitle it to sell that number to a telemarketing company. Providing a home address to enable package delivery doesn’t also invite ads trying to manipulate my vote in an election. Governments, too, are subject to data protection law, but they have more scope than most to carve out – or simply take – exceptions for themselves.

And so to the UK’s Department for Work and Pensions, whose mission in life is supposed to be to provide people with the financial support the state has promised them, whether that’s welfare or state pensions – overall, about 23 million people. Schools Week reports that Jen Persson at Defend Digital Me has discovered that the DWP has a secret deal with the Department for Education granting it access to the National Pupil Database for the purpose of finding benefit fraud.

“Who knows their family’s personal confidential records are in the haystack used to find the fraudulent needle?” Persson asks.

Every part of this is a mess. First of all, it turns schools into hostile environments for those already at greatest risk. Second, as we saw as long ago as 2010, parents and children have little choice about the data schools collect and keep. The breadth and depth of this data has been expanding long enough to burn out the UK’s first campaigner on children’s privacy rights (Terri Dowty, with Action for Rights of Children), and keep the second (Persson) fully occupied for some years now.

Persson told Schools Week that more than 15 million of the people on the NPD have long since left school. That sounds right; the database was created in 2002, five years into Tony Blair’s database-loving Labour government. In the 2009 report Database State, written under the aegis of the Foundation for Information Policy Research, Ross Anderson, Terri Dowty, Philip Inglesant, William Heath, and Angela Sasse surveyed 46 government databases. They found that a quarter of them were “almost certainly illegal” under human rights or data protection law, and noted that Britain was increasingly centralizing all such data.

“The emphasis on data capture, form-filling, mechanical assessment and profiling damages professional responsibility and alienates the citizen from the state. Over two-thirds of the population no longer trust the government with their personal data,” they wrote then.

The report was published while the Labour government was still trying to implement the ID card enshrined in the 2006 Identity Cards Act. This, the latest in a long string of such proposals since ID cards were withdrawn after the end of World War II, was ultimately squelched when David Cameron’s coalition government took office in 2010. The act was repealed in 2011.

These bits of history are relevant for three reasons: 1) there is no reason to believe that the Labour government everyone expects will win office in the next nine months will be any less keen on dataveillance; 2) tackling benefit fraud was what they claimed they wanted the ID card for in 2006; 3) you really don’t need an ID *card* if you have biometrics and ubiquitous, permanent access online to a comprehensive government database. This was obvious even in 2006, and now we’re seeing it in action.

Dowty often warned that children were used as experimental subjects on which British governments sharpened the policies they intended to expand to the rest of the population. And so it is proving: the use of education data to look for benefit fraud is the opening act for the provision in the Data Protection and Digital Information bill empowering the DWP to demand account data from banks and other financial institutions, again to reduce benefit fraud.

The current government writes, “The new proposals would allow regular checks to be carried out on the bank accounts held by benefit claimants to spot increases in their savings which push them over the benefit eligibility threshold, or when people send [sic] more time overseas than the benefit rules allow for.” The Information Commissioner’s Office has called the measure disproportionate, and says it does not provide sufficient safeguards.

Big Brother Watch, which is campaigning against this proposal, argues that it reverses the fundamental principle of the presumption of innocence. All pervasive “monitoring” does that; you are continuously a suspect except at the specific points where you’ve been checked and found innocent.

In a commercial context, we’d call the coercion implicit in repurposing data given under compulsion a bait and switch. We’d also bear in mind the Guardian’s recent exposé: the DWP has been demanding back huge sums of money from carers who’ve made minor mistakes in reporting their income. As BBW also wrote, even a tiny false positive rate will give the DWP hundreds of thousands of innocent people to harass.

Thirty years ago, when I was first learning about the dangers of rampant data collection, it occurred to me that the only way you can ensure that data can’t be leaked, exploited, or used maliciously is not to collect it in the first place. This isn’t a choice anyone can make now. But there are alternatives that reverse the trend toward centralization that Anderson et al. identified in 2009.

Illustrations: Haystacks at a Moldovan village (via Wikimedia).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Review: A History of Fake Things on the Internet

A History of Fake Things on the Internet
By Walter J. Scheirer
Stanford University Press
ISBN 2023017876

One of Agatha Christie’s richest sources of plots was the uncertainty of identity in England’s post-war social disruption. Before then, she tells us, anyone arriving to take up residence in a village brought a letter of introduction; afterwards, old-time residents had to take newcomers at their own valuation. Had she lived into the 21st century, the arriving Internet would have given her whole new levels of uncertainty to play with.

In his recent book A History of Fake Things on the Internet, University of Notre Dame professor Walter J. Scheirer describes creating and detecting online fakes as an ongoing arms race. Where many people project doomishly that we will soon lose the ability to distinguish fakery from reality, Scheirer is more optimistic. “We’ve had functional policies in the past; there is no good reason we can’t have them again,” he concludes, adding that to make this happen we need a better understanding of the media that support the fakes.

I have a lot of sympathy with this view; as I wrote recently, things that fool people when a medium is new are instantly recognizable as fake once people gain experience with it. We adapt. No one now would be fooled by the images that looked real in the early days of photography. Our perceptions become more sophisticated, and we learn to examine context. Early fakes often work simply because we don’t know yet that such fakes are possible. Once we do know, we exercise much greater caution before believing. Teens who’ve grown up applying filters to the photos and videos they upload to Instagram and TikTok see images very differently than those of us who grew up with TV and film.

Scheirer begins his story with the hacker counterculture that saw computers as a source of subversive opportunities. His own research into media forensics began with Photoshop. At the time, many, especially in the military, worried that nation-states would fake content in order to deceive and manipulate. What they found, in much greater volume, were memes and what Scheirer calls “participatory fakery” – that is, the cultural outpouring of fakes for entertainment and self-expression, most of it harmless. Further chapters consider cheat codes in games, the slow conversion of hackers into security practitioners, adversarial algorithms and media forensics, shock-content sites, and generative AI.

Through it all, Scheirer remains optimistic that the world we’re moving into “looks pretty good”. Yes, we are discovering hundreds of scientific papers with faked data, faked results, or faked images, but we also have new analysis tools to detect them and Retraction Watch to catalogue them. The same new tools that empower malicious people enable many more positive uses for storytelling, collaboration, and communication. Perhaps forgetting that the computer industry relentlessly ignores its own history, he writes that we should learn from the past and react to the present.

The mention of scientific papers raises an issue Scheirer seems not to worry about: waste. Every retracted paper represents lost resources – public funding, scientists’ time and effort, and the same multiplied into the future for anyone who attempts to build on that paper. Figuring out how to automate reliable detection of chatbot-generated text does nothing to lessen the vast energy, water, and human resources that go into building and maintaining all those data centers and training those models (see also filtering spam). Like Scheirer, I’m largely optimistic about our ability to adapt to a more slippery virtual reality. But the amount of wasted resources is depressing and, given climate change, dangerous.

Deja news

At the first event organized by the University of West London group Women Into Cybersecurity, a questioner asked how the debates around the Internet have changed since I wrote the original 1997 book net.wars.

Not much, I said. Some chapters have dated, but the main topics are constants: censorship, freedom of speech, child safety, copyright, access to information, digital divide, privacy, hacking, cybersecurity, and always, always, *always* access to encryption. Around 2010, there was a major change when the technology platforms became big enough to protect their users and business models by opposing government intrusion. That year Google launched the first version of its annual transparency report, for example. More recently, there’s been another shift: these companies have engorged to the point where they need not care much about their users or fear regulatory fines – the stage Ed Zitron calls the rot economy and Cory Doctorow dubs enshittification.

This is the landscape against which we’re gearing up for (yet) another round of recursion. April 25 saw the passage of amendments to the UK’s Investigatory Powers Act (2016). These are particularly charmless, as they expand the circumstances under which law enforcement can demand access to Internet Connection Records, allow the government to require “exceptional lawful access” (read: backdoored encryption) and require technology companies to get permission before issuing security updates. As Mark Nottingham blogs, no one should have this much power. In any event, the amendments reanimate bulk data surveillance and backdoored encryption.

Also winding through Parliament is the Data Protection and Digital Information bill. The IPA amendments threaten national security by demanding the power to weaken protective measures; the data bill threatens to undermine the adequacy decision under which the UK’s data protection law is deemed to meet the requirements of the EU’s General Data Protection Regulation. Experts have already warned that adequacy is at risk. If this government proceeds, as it gives every indication of doing, the next, presumably Labour, government may find itself awash in an economic catastrophe as British businesses become persona-non-data to their European counterparts.

The Open Rights Group warns that the data bill makes it easier for government, private companies, and political organizations to exploit our personal data while weakening subject access rights, accountability, and other safeguards. ORG is particularly concerned about the impact on elections, as the bill expands the range of actors who are allowed to process personal data revealing political opinions on a new “democratic engagement activities” basis.

If that weren’t enough, another amendment also gives the Department for Work and Pensions the power to monitor all bank accounts that receive benefit payments, including the state pension – to reduce overpayments and other types of fraud, of course. And any bank account connected to those accounts, such as those of landlords, carers, parents, and partners. At Computer Weekly, Bill Goodwin suggests that the upshot could be to deter landlords from renting to anyone receiving state benefits or entitlements. The idea is that banks will use criteria we can’t access to flag up accounts for the DWP to inspect more closely, and over the mass of 20 million accounts there will be plenty of mistakes to go around. Safe prediction: there will be horror stories of people denied benefits without warning.

And in the EU… TechCrunch reports that the European Commission (always more surveillance-happy and less human rights-friendly than the European Parliament) is still pursuing its proposal to require messaging platforms to scan private communications for child sexual abuse material. Let’s do the math of truly large numbers: billions of messages, even a teeny-tiny percentage of inaccuracy, literally millions of false positives! On Thursday, a group of scientists and researchers sent an open letter pointing out exactly this. Automated detection technologies perform poorly, innocent images may occur in clusters (as when a parent sends photos to a doctor), and such a scheme requires weakening encryption; in any case, it would be better to focus on eliminating child abuse (taking CSAM along with it).
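To make that arithmetic concrete, here is a back-of-the-envelope sketch; the traffic volume and error rate below are illustrative assumptions, not the Commission’s figures:

```python
# Back-of-the-envelope figures: purely illustrative assumptions.
messages_per_day = 10_000_000_000   # order-of-magnitude guess at EU-wide messaging traffic
false_positive_rate = 0.001         # a generously "teeny-tiny" 0.1% error rate

false_alarms_per_day = messages_per_day * false_positive_rate
print(f"{false_alarms_per_day:,.0f} innocent messages flagged per day")
# -> 10,000,000 innocent messages flagged per day,
#    each one a private message opened up to further scrutiny.
```

Even if the real numbers are an order of magnitude smaller, the false positives still run into the millions every week.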

Finally, age verification, which has been pending in the UK since at least 2016, is becoming a worldwide obsession. At least eight US states and the EU have laws mandating age checks, and the Age Verification Providers Association is pushing to make the Internet “age-aware persistently”. Last month, the BSI convened a global summit to kick off the work of developing a worldwide standard. These moves are the latest push against online privacy; age checks will be applied to *everyone*, and while they could be designed to respect privacy and anonymity, the most likely outcome is that they won’t be. In 2022, the French data protection regulator, CNIL, found that current age verification methods are both intrusive and easily circumvented. In the US, Casey Newton is watching a Texas case about access to online pornography and age verification that threatens to challenge First Amendment precedent in the Supreme Court.

Because the debates are so familiar – the arguments rarely change – it’s easy to overlook how profoundly all this could change the Internet. An age-aware Internet where all web use is identified, where encrypted messaging services have shut down rather than compromise their users, and where every action is suspicious until judged harmless…those are the stakes.

Illustrations: Angel sensibly smashes the ring that makes vampires impervious (in Angel, “In the Dark”, S01e03).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.