Safe

That didn’t take long. Since last week’s fret about AI startups ignoring the robots.txt convention, Thomas Claburn has reported at The Register that Cloudflare has developed a scraping prevention tool that identifies and blocks “content extraction” bots attempting to crawl sites at scale.

It’s a stopgap, not a solution. As Cloudflare’s announcement makes clear, the company knows there will be pushback; given these companies’ lack of interest in following existing norms, blocking tools versus scraping bots is basically the latest arms race (previously on this plotline: spam). Also, obviously, the tool only works on sites that are Cloudflare customers. Although these include many of the web’s largest sites, there are hundreds of millions more that won’t, don’t, or can’t pay for its services. If we want to return control to site owners, we’re going to need a more permanent and accessible solution.

In his 1999 book Code and Other Laws of Cyberspace, Lawrence Lessig finds four forms of regulation: norms, law, markets, and architecture. Norms are failing. Markets will just mean prolonged arms races. We’re going to need law and architecture.

***

We appear to be reaching peak “AI” hype, defined by (as at the peak of app hype) the increasing absurdity of things venture capitalists seem willing to fund. I recall reading the comment that at the peak of app silliness a lot of startups were really just putting a technological gloss on services that young men would previously have had supplied by their mothers. The AI bubble seems even less productive of long-term value, calling things “AI” that are not at all novel, and proposing “AI” to patch problems that call for real change.

As an example of the first of those, my new washing machine has a setting called “AI patterns”. The manual explains: it reorders the preset programs on the machine’s dial so the ones you use most appear first. It’s not stupid (although I’ve turned it off anyway, along with the wifi and “smart” features I would rather not pay for), but let’s call it what it is: customizing a menu.

As an example of the second…at Gizmodo, Maxwell Zeff reports that SoftBank is claiming to have developed an “emotion canceling” AI that “alters angry voices into calm ones”. The use SoftBank envisages is to lessen the stress on call center employees by softening the voices of angry customers without changing their actual words. As people pointed out on Mastodon after the article was posted there, there are plenty of smarter ways to reduce those workers’ stress. Like giving them better employment conditions, or – and here’s a really radical thought – designing your services and products so your customers aren’t so frustrated and angry. What this software does is just falsify the sound. My guess is that if it has any effect at all, it will be to make customers even angrier and more frustrated. More anger in the world. Great.

***

Oh! Sarcasm, even if only slight! At the Guardian, Ned Carter Miles reports on “emotional AI” (can we say “oxymoron”?). Among his examples is a team at the University of Groningen that is teaching an AI to recognize sarcasm using scenes from US sitcoms such as Friends and The Big Bang Theory. Even absurd-sounding research can be a good thing. Still, I’m not sure how good a guide sitcoms are to identifying emotions in real-world contexts, even apart from the usual issues of algorithmic bias. After all, actors are given carefully crafted words and work harder to communicate their emotional content than ordinary people normally do.

***

Finally, again in the category of peak-AI-hype is this: at the New York Times Cade Metz is reporting that Ilya Sutskever, a co-founder and former chief scientist at OpenAI, has a new startup whose goal is to create a “safe superintelligence”.

Even if you, unlike me, believe that a “superintelligence” is an imminent possibility, what does “safe” mean, especially in an industry that still treats security and accessibility as add-ons? “Safe” is, like “secure”, meaningless without context and a threat model. Safe from what? Safe for what? To do what? Operated by whom? Owned by whom? With what motives? For how long? We create new intelligent humans all the time. Do we have any ability to ensure they’re “safe” technology? If an AGI is going to be smarter than a human, how can anyone possibly promise it will be, in the industry parlance, “aligned” with our goals? And for what value of “our”? Beware the people who want to build the Torment Nexus!

It’s nonsense. Safety can’t be programmed into a superintelligence any more than Isaac Asimov’s Laws of Robotics could be.

Sutskever’s own comments are equivocal. In a video clip at the Guardian, Sutskever confusingly says both that “AI will solve all our problems” and that it will make fake news, cyber attacks, and weapons much worse and “has the potential to create infinitely stable dictatorships”. Then he adds, “I feel that technology is a force of nature.” Which is exactly the opposite of what technology is…but it suits the industry to push the inevitability narrative that technological progress cannot be stopped.

Cue Douglas Adams: “This is obviously some strange use of the word ‘safe’ I wasn’t previously aware of.”

Illustrations: The Big Bang Theory’s Leonard (Johnny Galecki) teaching Sheldon (Jim Parsons) about sarcasm (Season 1, episode 2, “The Big Bran Hypothesis”).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Changing the faith

The governance of Britain and the governance of the Internet have this in common: the ultimate authority in both cases is to a large extent a “gentleman’s agreement”. For the same reason: both were devised by a relatively small, homogeneous group of people who trusted each other. In the case of Britain, inertia means that even without a written constitution the country goes on electing governments and passing laws as if it had one.

Most people have no reason to know that the Internet’s technical underpinnings are defined by a series of documents known as RFCs, for Request(s) for Comments. RFC 1 was published in April 1969; the most recent, RFC 9598, is dated just last month. While the Internet Engineering Task Force oversees RFCs’ development and administration, it has no power to force anyone to adopt them. Throughout, RFC standards have been created collaboratively by volunteers and adopted on merit.

A fair number of RFCs promote good “Internet citizenship”. There are, for example, email addresses (chiefly, webmaster and postmaster) that anyone running a website is supposed to maintain in order to make it easy for a third party to report problems. Today, probably millions of website owners don’t even know this expectation exists. For Internet curmudgeons over a certain age, however, seeing email to those addresses bounce is frustrating.

Still, many of these good-citizen practices persist. One such is the Robots Exclusion Protocol, updated in 2022 as RFC 9309, which defines a file, “robots.txt”, that website owners can put in place to tell automated web crawlers which parts of the site they may access and copy. This may have mattered less in recent years than it did in 1994, when it was devised. As David Pierce recounts at The Verge, at that time an explosion of new bots was beginning to crawl the web to build directories and indexes (no Google until 1998!). Many of those early websites were hosted on very small systems based in people’s homes or small businesses, and could be overwhelmed by unrestrained crawlers. Robots.txt, devised by a small group of administrators and developers, managed this problem.
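The protocol really is that modest: a plain-text file of User-agent and Disallow lines that a polite crawler is expected to consult before fetching anything. A minimal sketch of how that check works, using Python’s standard urllib.robotparser (the sample rules here are invented for illustration; GPTBot is used as an example crawler name):

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt: every crawler is barred from /private/,
# and one named bot is barred from the whole site.
sample = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(sample)

# A well-behaved crawler consults the rules before fetching a URL.
print(rp.can_fetch("SomeCrawler", "https://example.com/articles/"))  # True
print(rp.can_fetch("SomeCrawler", "https://example.com/private/x"))  # False
print(rp.can_fetch("GPTBot", "https://example.com/articles/"))       # False
```

The crucial point, of course, is that nothing in the protocol enforces any of this: a scraper that never calls the equivalent of `can_fetch` sees no obstacle at all.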

Even without a legal requirement to adopt it, early Internet companies largely saw being good Internet citizens as benefiting them. They, too, were small at the time, and needed good will to bring them the users and customers that have since made them into giants. It served everyone’s interests to comply.

Until more or less now. This week, Katie Paul is reporting at Reuters that “AI” companies are blowing up this arrangement by ignoring robots.txt and scraping whatever they want. This news follows reporting by Randall Lane at Forbes that Perplexity.ai is using its software to generate stories and podcasts using news sites’ work without credit. At Wired, Dhruv Mehrotra and Tim Marchman report a similar story: Perplexity is ignoring robots.txt and scraping areas of sites that owners want left alone. At 404 Media, Emmanuel Maiberg reports that Perplexity also has a dubious history of using fake accounts to scrape Twitter data.

Let’s not just pick on Perplexity; this is the latest in a growing trend. Previously, hiQ Labs scraped data from LinkedIn in order to build services to sell to employers; after years of litigation, the courts finally ruled that hiQ had breached LinkedIn’s terms and conditions. More controversially, in the last few years Clearview AI has responded to widespread criticism by claiming that any photograph published on the Internet is “public” and therefore within its rights to grab for its database and use to identify individuals online and offline. The result has been myriad legal actions under data protection law in the EU and UK, and, in the US, a sheaf of lawsuits. Last week, Kashmir Hill reported at the New York Times that because Clearview lacks the funds to settle a class action lawsuit it has offered a 23% stake to Americans whose faces are in its database.

As Pierce (The Verge) writes, robots.txt used to represent a fair-enough trade: website owners got search engine visibility in return for their data, and the owners of the crawlers got the data but in return sent traffic.

But AI startups ingesting data to build models don’t offer any benefit in return. Where search engines have traditionally helped people find things on other sites, the owners of AI chatbots want to keep the traffic for themselves; Perplexity bills itself as an “answer engine”. A second key difference: none of these businesses are small. As Vladan Joler pointed out last month at CPDP, “AI comes pre-monopolized.” Getting into this area requires billions in funding; by contrast, many early Internet businesses started with just a few hundred dollars.

This all feels like a watershed moment for the Internet. For most of its history, as Charles Arthur writes at The Overspill, every advance has exposed another area where the Internet operates on the basis of good faith. Typically, the result is some form of closure – spam, for example, led the operators of mail servers to close to all but authenticated users. It’s not clear to a non-technical person what stronger measure other than copyright law could replace the genteel agreement of robots.txt, but the alternative will likely be closing open access to large parts of the web – a loss to all of us.

Illustrations: Vladen Joler at CPDP 2024, showing his map of the extractive industries required to underpin “AI”.

Hostages

If you grew up with the slow but predictable schedule of American elections, the abruptness with which a British prime minister can prorogue Parliament and hit the campaign trail is startling. Among the pieces of legislation that fell by the wayside this time is the Data Protection and Digital Information bill, which had reached the House of Lords for scrutiny. The bill had many problems: it proposed to give the Department for Work and Pensions the right to inspect the bank accounts and financial assets of anyone receiving any government benefits, and it undermined aspects of the adequacy agreement that allows UK companies to exchange data with businesses in the EU.

Less famously, it also included the legislative underpinnings for a trust framework for digital verification. On Monday, at a UCL conference on crime science, Sandra Peaston, director of research and development at the fraud prevention organization Cifas, outlined how all this is intended to work and asked some pertinent questions. Among them: whether the new regulator will have enough teeth; whether the certification process is strong enough for (for example) mortgage lenders; and how we know how good the relevant algorithm is at identifying deepfakes.

Overall, I think we should be extremely grateful this bill wasn’t rushed through. Quite apart from the digital rights aspects, the framework for digital identity really needs to be right; there’s just too much risk in getting it wrong.

***

At Bloomberg, Mark Gurman reports that Apple’s arrangement with OpenAI to integrate ChatGPT into the iPhone, iPad, and Mac does not involve Apple paying any money. Instead, Gurman cites unidentified sources to the effect that “Apple believes pushing OpenAI’s brand and technology to hundreds of millions of its devices is of equal or greater value than monetary payments.”

We’ve come across this kind of claim before in arguments between telcos and Internet companies like Netflix or between cable companies and rights holders. The underlying question is who brings more value to the arrangement, or who owns the audience. I can’t help feeling suspicious that this will not end well for users. It generally doesn’t.

***

Microsoft is on a roll. First there was the Recall debacle. Now come accusations by a former employee that it ignored a reported security flaw in order to win a large government contract, as Renee Dudley and Doris Burke report at ProPublica. Result: the Russian SolarWinds cyberattack on numerous US government departments and agencies, including the National Nuclear Security Administration.

This sounds like a variant of Cory Doctorow’s enshittification at the enterprise level (see also: Boeing). They don’t have to be monopolies: these organizations’ evolving culture has let business managers override safety and security engineers. This is how Challenger blew up in 1986.

Boeing is too big and too lacking in competition to be allowed to fail entirely; it will have to find a way back. Microsoft has a lot of customer lock-in. Is it too big to fail?

***

I can’t help feeling a little sad at the news that Raspberry Pi has had an IPO. I see no reason why it shouldn’t be successful as a commercial enterprise, but its values will inevitably change over time. CEO Eben Upton swears they won’t, but he won’t be CEO forever, as even he admits. But: Raspberry Pi could become the “unicorn” Americans keep saying Europe doesn’t have.

***

At that same UCL event, I finally heard someone say something positive about AI – for a meaning of “AI” that *isn’t* chatbots. Sarah Lawson, the university’s chief information security officer, said that “AI and machine learning have really changed the game” when it comes to detecting email spam, which remains the biggest vector for attacks. Dealing with the 2% that evades the filters is still a big job, as it leaves 6,000 emails a week hitting people’s inboxes – but she’ll take it. We really need to be more specific when we say “AI” about what kind of system we mean; success at spam filtering has nothing to say about getting accurate information out of a large language model.

***

Finally, I was highly amused this week when long-time security guy Nick Selby posted on Mastodon about a long-forgotten incident from 1999 in which I disparaged the sort of technology Apple announced this week that’s supposed to organize your life for you – tell you when it’s time to leave for things based on the traffic, juggle meetings and children’s violin recitals, that sort of thing. Selby felt I was ahead of my time because “it was stupid then and is stupid now because even if it works the cost is insane and the benefit really, really dodgy”.

One of the long-running divides in computing is between the folks who want computers to behave predictably and those who want computers to learn from our behavior what’s wanted and do that without intervention. Right now, the latter is in ascendance. Few of us seem to want the “AI features” being foisted on us. But only a small percentage of mainstream users turn off defaults (a friend was recently surprised to learn you can use the history menu to reopen a closed browser tab). So: soon those “AI features” will be everywhere, pointlessly and extravagantly consuming energy, water, and human patience. How you use information technology used to be a choice. Now, it feels like we’re hostages.

Illustrations: Raspberry Pi: the little computer that could (via Wikimedia).

Soap dispensers and Skynet

In the TV series Breaking Bad, the weary ex-cop Mike Ehrmantraut tells meth chemist Walter White: “No more half measures.” The last time he took half measures, the woman he was trying to protect was brutally murdered.

Apparently people like to say there are no dead bodies in privacy (although this is easily countered with ex-CIA director General Michael Hayden’s comment, “We kill people based on metadata”). But, as Woody Hartzog told a Senate committee hearing in September 2023, summarizing work he did with Neil Richards and Ryan Durrie, half measures in AI/privacy legislation are still a bad thing.

A discussion at Privacy Law Scholars last week laid out the problems. Half measures don’t work. They don’t prevent societal harms. They don’t prevent AI from being deployed where it shouldn’t be. And they sap the political will to follow up with anything stronger.

In an article for The Brink, Hartzog said, “To bring AI within the rule of law, lawmakers must go beyond half measures to ensure that AI systems and the actors that deploy them are worthy of our trust.”

He goes on to list examples of half measures: transparency, committing to ethical principles, and mitigating bias. Transparency is good, but doesn’t automatically bring accountability. Ethical principles don’t change business models. And bias mitigation to make a technology nominally fairer may simultaneously make it more dangerous. Think facial recognition: debias the system and improve its accuracy for matching the faces of non-male, non-white people, and then it’s used to target those same people with surveillance.

Or, bias mitigation may have nothing to do with the actual problem, an underlying business model, as Arvind Narayanan, author of the forthcoming book AI Snake Oil, pointed out a few days later at an event convened by the Future of Privacy Forum. In his example, the Washington Post reported in 2019 on the case of an algorithm intended to help hospitals predict which patients will benefit from additional medical care. It turned out to favor white patients. But, Narayanan said, the system’s provider responded to the story by saying that the algorithm’s cost model accurately predicted the costs of additional health care – in other words, the algorithm did exactly what the hospital wanted it to do.

“I think hospitals should be forced to use a different model – but that’s not a technical question, it’s politics.”

Narayanan also called out auditing (another Hartzog half measure). You can, he said, audit a human resources system to expose patterns in which resumes it flags for interviews and which it drops. But no one ever commissions research modeled on the expensive randomized controlled trials common in medicine that follows up for five years to see whether the system actually picks good employees.

Adding confusion is the fact that “AI” isn’t a single thing. Instead, it’s what someone called a “suitcase term” – that is, a container for many different systems built for many different purposes by many different organizations with many different motives. It is absurd to conflate AGI – the artificial general intelligence of science fiction stories and scientists’ dreams that can surpass and kill us all – with pattern-recognizing software that depends on plundering human-created content and the labeling work of millions of low-paid workers.

To digress briefly, some of the AI in that suitcase is getting truly goofy. Yum Brands has announced that its restaurants, which include Taco Bell, Pizza Hut, and KFC, will be “AI-first”. Among Yum’s envisioned uses, the company tells Benj Edwards at Ars Technica, is the ability to ask an app what temperature to set the oven. I can’t help suspecting that the real eventual use will be data collection and discriminatory pricing. Stuff like this is why Ed Zitron writes postings like The Rot-Com Bubble, which hypothesizes that the reason Internet services are deteriorating is that technology companies have run out of genuinely innovative things to sell us.

That you cannot solve social problems with technology is a long-held truism, but it seems to be especially true of the messy middle of the AI spectrum, the use cases active now that rarely get the same attention as the far ends of that spectrum.

As Neil Richards put it at PLSC, “The way it’s presented now, it’s either existential risk or a soap dispenser that doesn’t work on brown hands when the real problem is the intermediate level of societal change via AI.”

The PLSC discussion included a list of the ways that regulations fail. Underfunded enforcement. Regulations that are pure theater. The wrong measures. The right goal, but weakly drafted legislation. Ambiguous regulation, or regulation based on principles that are too broad. Conflicting half measures – for example, requiring transparency while also adopting the principle that people should own their own data.

Like Cristina Caffarra a week earlier at CPDP, Hartzog, Richards, and Durrie favor finding remedies that focus on limiting abuses of power. Full measures include outright bans, the right to bring a private cause of action, imposing duties of “loyalty, care, and confidentiality”, and limiting exploitative data practices within these systems. Curbing abuses of power, as Hartzog says, is nothing new. The shiny new technology is a distraction.

Or, as Narayanan put it, “Broken AI is appealing to broken institutions.”

Illustrations: Mike (Jonathan Banks) telling Walt (Bryan Cranston) in Breaking Bad (S03e12) “no more half measures”.

Admiring the problem

In one sense, the EU’s barely dry AI Act and its other complex legislation – the Digital Markets Act, Digital Services Act, GDPR, and so on – are a triumph. Flawed they may be, but they represent a genuine attempt to protect citizens’ human rights against a technology that is being birthed with numerous trigger warnings. The AI-with-everything program at this year’s Computers, Privacy, and Data Protection conference reflected that sense of accomplishment – but also the frustration that comes with knowing that all legislation is flawed, all technology companies try to game the system, and gaps will widen.

CPDP has had these moments before: new legislation always comes with a large dollop of frustration over the opportunities that were missed and the knowledge that newer technologies are already rushing forwards. AI, and the AI Act, more or less swallowed this year’s conference as people considered what it says, how it will play internationally, and the necessary details of implementation and enforcement. Two years ago at this event, inadequate enforcement of GDPR was a big topic.

The most interesting future gaps that emerged this year: monopoly power, quantum sensing, and spatial computing.

For at least 20 years we’ve been hearing about quantum computing’s potential threat to public key encryption – that day of doom has been ten years away as long as I can remember, just as the Singularity is always 30 years away. In the panel on quantum sensing, Chris Hoofnagle argued that, as he and Simson Garfinkel recently wrote at Lawfare and in their new book, quantum cryptanalysis is overhyped as a threat (although there are many opportunities for quantum computing in chemistry and materials science). However, quantum sensing is here now, works (because qubits are fragile), and is cheap. There is plenty of privacy threat here to go around: quantum sensing will benefit entirely different classes of intelligence, particularly remote, undetectable surveillance.

Hoofnagle and Garfinkel are calling this MASINT, for machine and signature intelligence, and believe that it will become very difficult to hide things, even at a national level. In Hoofnagle’s example, a quantum sensor-equipped drone could fly over the homes of parolees to scan for guns.

Quantum sensing and spatial computing have this in common: they both enable unprecedented passive data collection. VR headsets, for example, collect all sorts of biomechanical data that can be mined more easily for personal information than people expect.

Barring change, all that data will be collected by today’s already-powerful entities.

The deeper level on which all this legislation fails particularly exercised Cristina Caffarra, co-founder of the Centre for Economic Policy Research. In the panel on AI and monopoly, she argued that all this legislation is basically nibbling around the edges, because none of it touches the real, fundamental problem: the power being amassed by the handful of companies that own the infrastructure.

“It’s economics 101. You can have as much downstream competition as you like but you will never disperse the power upstream.” The reports and other material generated by government agencies like the UK’s Competition and Markets Authority are, she says, just “admiring the problem”.

A day earlier, the Novi Sad professor Vladan Joler had already pointed out the fundamental problem: at the dawn of the Internet anyone could start with nothing and build something; what we’re calling “AI” requires billions in investment, so comes pre-monopolized. Many people dismiss Europe for not having its own homegrown Big Tech, but that overlooks open technologies: the Raspberry Pi, Linux, and the web itself, which all have European origins.

In 2010, the now-departing MP Robert Halfon (Con-Harlow) said at an event on reining in technology companies that only a company the size of Google – not even a government – could create Street View. Legend has it that open source geeks heard that as a challenge, and so we have OpenStreetMap. Caffarra’s fiery anger raises the question: at what point do the infrastructure providers become so entrenched that they could choke off an open source competitor at birth? Caffarra wants to build a digital public interest infrastructure using the gaps where Big Tech doesn’t yet have that control.

The Dutch Groenlinks MEP Kim van Sparrentak offered an explanation for why the AI Act doesn’t address market concentration: “They still dream of a European champion who will rule the world.” An analogy springs to mind: people who vote for tax cuts for billionaires because one day that might be *them*. Meanwhile, the UK’s Competition and Markets Authority finds nothing to investigate in Microsoft’s partnership with the French AI startup Mistral.

Van Sparrentak thinks one way out is through public procurement: adopt goals of privacy and sustainability, and support European companies. It makes sense; as the AI Now Institute’s Amba Kak noted, at the moment almost everything anyone does digitally has to go through the systems of at least one Big Tech company.

As Sebastiano Toffaletti, head of the secretariat of the European SME Alliance, put it, “Even if you had all the money in the world, these guys still have more data than you. If you can’t solve that, you won’t have anyone to challenge these companies.”

Illustrations: Vladen Joler shows Anatomy of an AI System, a map he devised with Kate Crawford of the human labor, data, and planetary resources that are extracted to make “AI”.

Microsoft can remember it for you wholesale

A new theory: somewhere in the Silicon Valley universe there’s a cadre of techies who have eidetic memories and they’re feeling them start to slip. Panic time.

That’s my best explanation for Microsoft’s latest wheeze, a new feature for its Copilot assistant that will take what’s variously called a “snapshot” or a “screenshot” of your computer (all three monitors?) every five seconds and store it for future reference. Microsoft hasn’t explained much about Recall’s inner technical workings, but according to the announcement, the data will be stored locally and will be searchable via semantic associations and some sort of “AI”. Microsoft also says the data will not be used to train AI models.

The general anger and dismay at this plan brings back, almost nostalgically, memories of the 1990s, when Microsoft was near-universally hated as the evil monopolist dominating computing. In 2008, when Google was ten years old, a BBC presenter asked me if I thought Google would ever be hated as much as Microsoft was (not then, no). In 2012, veteran journalist Charles Arthur published the book Digital Wars about how Microsoft had stagnated and lost its lead. And then suddenly, in the last few years, it’s back on top.

Possibilities occur that Microsoft doesn’t mention. For example: could software be embedded in Windows to draw inferences from the data Recall saves? And could those inferences be forwarded to the company or used to target you with ads? That seems like a far more efficient way to invade users’ privacy than copying the data itself, if that’s what the company ultimately wants to do.

Lots of things on our computers already retain a “memory” of what we’ve been doing. Operating systems generate logs to help debug problems. Word processors retain a changelog, which powers the ability to undo mistakes. Web browsers have user-configurable histories; email software has archives; media players retain playlists. All of those are useful – but part of that usefulness is that they are contextual, limited, and either easily terminated by closing the relevant application or relatively easily edited to remove items that shouldn’t be kept.

It’s hard for almost everyone who isn’t Microsoft to understand the point of keeping everything by default. It seems like a feature only developers could love. I certainly would like Windows to be better at searching for stored files or my (Firefox) browser to be better at reloading that article I was reading yesterday. I have even longed for a personal version of Vannevar Bush’s Memex. As part of that, I might welcome a feature that let me hit a button to record the last five useful minutes of a meeting, or save a social media post to a local archive. But the key to that sort of memory expansion is curation, not remembering everything promiscuously. For most people, selective forgetting is how we survive the torrents of irrelevance hurled at us every day.

What Recall sounds most like is the lifelog science fiction writer Charlie Stross imagined in 2007 might be our future. Plummeting storage costs and expanding capacity, he reasoned, would make it possible to store *everything* in your pocket. Even then, there were (a very few) people doing that sort of thing, most notably Steve Mann, a University of Toronto professor who started wearing devices to comprehensively capture his life as a 1990s graduate student. Over the years, Mann has shrunk his personal gadget array from a laptop and peripherals to glasses and pocket devices. Many more people capture their surroundings now – but they do it on their phones. If Apple or Google were proposing a Recall feature for iOS or Android, the idea would seem a lot less weird.

The real issue is that there are many people who would like to be able to know what someone *else* has been doing on their computer at all times. Helicopter parents. Schools and teachers under government compulsion (see for example Prevent (PDF)). Employers. Border guards. Corporate spies. The Department of Work and Pensions. Authoritarian governments. Law enforcement and security agencies. Criminals. Domestic abusers… So developing any feature like this must include considering how to protect it against these threats. This does not appear to have happened.

Many others have written about the privacy issues in all this – the UK’s Information Commissioner’s Office is already investigating. At The Register, Richard Speed does a particularly good job of looking at some of the fine details. On Mastodon, Kevin Beaumont says inspection of the Copilot+ software suggests that Recall stores the text it extracts from all those snapshots in an easily copiable SQLite database.
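To see why “an easily copiable SQLite database” is alarming, consider how little tooling it takes to mine one. This sketch uses a hypothetical schema – the real Recall layout isn’t documented here – and an in-memory database for illustration; the point is that SQLite files need no special software, so any process (or piece of malware) with file access can copy and query them wholesale.

```python
import sqlite3

# Hypothetical schema for illustration -- not the real Recall layout.
# An in-memory database is used here; Recall's store would be an on-disk file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE captures (ts TEXT, app TEXT, extracted_text TEXT)"
)
conn.execute(
    "INSERT INTO captures VALUES "
    "('2024-06-01T10:00', 'browser', 'your password reset link')"
)

# One trivial query surfaces everything ever displayed that matches a keyword.
rows = conn.execute(
    "SELECT ts, app FROM captures WHERE extracted_text LIKE '%password%'"
).fetchall()
print(rows)  # [('2024-06-01T10:00', 'browser')]
```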

But there’s still more. The kind of archive Recall appears to construct can teach an attacker how the target thinks: not just what passwords they choose but how they devise them. Those patterns can be highly valuable. Granted, few targets are worth that level of attention, but it happens, as Peter Davies, a technical director at Thales, has often warned.

Recall is not the only move – see also flawed-AI-with-everything – that suggests that the computer industry, like some politicians and governments, is badly losing touch with the public. Increasingly, what they want to do seems unrelated to what the rest of us want. If they think things like Recall are a good idea, they need to read more Philip K. Dick. And then don’t invent the Torment Nexus.

Illustrations: Arnold Schwarzenegger seeking better memories in the 1990 film Total Recall.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

The apostrophe apocalypse

It was immediately tempting to view the absence of apostrophes on new street signs in a North Yorkshire town as a real-life example of computer systems crushing human culture. Then, near-simultaneously, Apple launched an ad (which it now regrets) showing just that process, raising the temptation even more. But no.

In fact, as Brandon Vigliarolo writes at The Register, not only is the removal of apostrophes in place names not new in the UK, but it also long precedes computers. The US Board on Geographic Names declared apostrophes unwanted as long ago as its founding year, 1890, apparently to avoid implying possession. This decision by the BGN, which has only made five exceptions in its history, was later embedded in the US’s Geographic Names Information System and British Standard 7666. When computers arrived to power databases, the practice carried on.

All that said, it’s my experience that the older British generation are more resentful of American-derived changes to their traditional language than they are of computer-driven alterations (one such neighbor complains about “sidewalk”). So campaigns to reinstate missing apostrophes seem likely to persist.

Blaming computers seemed like a coherent narrative, not least because new technology often disrupts social customs. Railways brought standardized time, and the desire to simplify things for computers led to the 2023 decision to eliminate leap seconds in 2035 (after 18 years of debate). Instead, the apostrophe apocalypse is a more ordinary story of central administrators prioritizing their own convenience over local culture and custom (which may itself be contested). It still seems like people should be allowed to keep their street signs. I mean.

***

Of course language changes over time and usage. The character limits imposed by texting (and therefore exTwitter and other microblogging sites) brought us many abbreviations that are now commonplace in daily life, just as long before that the telegraph’s cost per word spawned its own compressed dialect. A new example popped up recently in Charles Arthur’s The Overspill.

Arthur highlighted an article at Level Up Coding/Medium by Fareed Khan that offered ways to distinguish between human-written and machine-generated text. It turns out that chatbots use distinctively different words than we do. Khan was able to generate a list of about 100 words that may indicate a chatbot has been at work, as well as a web app that can check a block of text or a file in one go. The word “delve” was at the top.
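Khan’s approach can be sketched in a few lines. The five-word list below is an illustrative sample of my own, not his actual ~100-word list, and as he and others note, a high score only suggests machine authorship; it proves nothing.

```python
import re

# Words reported to appear unusually often in chatbot output.
# This short list is illustrative only, not Khan's actual list.
MARKER_WORDS = {"delve", "tapestry", "multifaceted", "landscape", "underscore"}

def marker_score(text: str) -> float:
    """Fraction of the words in `text` that are on the marker list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w in MARKER_WORDS) / len(words)

sample = "Let us delve into the multifaceted landscape of innovation."
print(round(marker_score(sample), 2))  # a high score only *suggests* machine text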

I had missed Khan’s source material, an earlier claim by Y Combinator founder Paul Graham that “delve” used in an email pitch is a clear sign of ChatGPT-generated text. At the Guardian, Alex Hern suggests that an underlying cause may be the fact that much of the labeling necessary to train the large language models that power chatbots is carried out by badly paid people in the global South – including Africa, where “delve” is more commonly used than in Western countries.

At the Premium Times, Chiamaka Okafor argues that therefore identifying “delve” as a marker of “robotic text” penalizes African writers. “We are losing sight of an opportunity to rewrite the AI narratives that exclude people in the global majority,” she writes. A reminder: these chatbots are just math and statistics predicting the next word. They will always regress to the mean. And now they’ll penalize us for being different.

***

Just two years ago, researchers fretted that we were running out of “high-quality text” on which to train large language models. We’ve been seeing the results since, as sites hosting user-generated content strike deals with LLM owners, leading to contentious disputes between those owners and sites’ users, who feel betrayed and ripped off. Reddit began by charging for access to its API, then made a deal with Google to use its database of posts for training for an injection of cash that enabled it to go public. Yesterday, Reddit announced a similar deal with OpenAI – and the stock went up. In reality, these deals are asset-stripping a site that has consistently lost money for 18 years.

The latest site to sell its users’ content is the technical site Stack Overflow. Developers who offer mutual aid by answering each other’s questions are exactly the user base you would expect to be most offended by the news that the site’s owner, the investment group Prosus, which bought the site in 2021 for $1.8 billion, has made a deal giving OpenAI access to all its content. And so it proved: developers promptly began altering or removing their posts to protest the deal. Shortly thereafter, the site’s moderators began restoring those posts and suspending the users.

There’s no way this ends well; Internet history’s many such stories never have. The site’s original owners, who created the culture, are gone. The new ones don’t care what users *believe* their rights are if the terms and conditions grant an irrevocable license to everything they post. Inertia makes it hard to build a replacement; alienation thins out the old site. As someone posted to Twitter a few years ago, “On the Internet your home always leaves you.”

‘Twas ever thus. And so it will be until people stop taking the bait in the first place.

Illustrations: Apple’s canceled “crusher” ad.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

The toast bubble

To The Big Bang Theory (“The Russian Rocket Reaction”, S5e05):

Howard: Someone has to go up with the telescope as a payload specialist, and guess who that someone is!
Sheldon: Muhammed Li.
Howard: Who’s Muhammed Li?
Sheldon: Muhammed is the most common first name in the world, Li the most common surname, and as I didn’t know the answer I thought that gave me a mathematical edge.

Experts tell me that exchange doesn’t perfectly explain how generative AI works; it’s too simplistic. Generative AI – or a Sheldon made more nuanced by his writers – takes into account contextual information to calculate the probable next word. So it wouldn’t pick from all the first names and surnames in the world. It might, however, pick from the names of all the payload specialists or some other group it correlated, or confect one.
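The difference between Sheldon’s context-free guess and a context-aware prediction can be sketched with a toy bigram model. The corpus below is invented for illustration; real LLMs condition on thousands of tokens rather than one word, but the principle – pick the most probable continuation given context – is the same.

```python
from collections import Counter, defaultdict

# A toy bigram model. Unlike Sheldon's context-free "most common name" guess,
# even this minimal model conditions its prediction on the preceding word.
# The corpus is invented for illustration.
corpus = ("the payload specialist ran the telescope "
          "the payload specialist held the payload bay").split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict(prev_word: str) -> str:
    """Most probable next word given one word of context."""
    return bigrams[prev_word].most_common(1)[0][0]

print(predict("payload"))  # "specialist": seen twice after "payload", vs "bay" once
```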

More than a year on, I still can’t find a use for generative “AI”, which remains unreliable and inscrutable. At Exponential View, Azeem Azhar has written about the “answer engine” Perplexity.ai. While it’s helpful that Perplexity provides references for its answers, it was producing misinformation by the third question I asked it, and offered no improvement when challenged. Wikipedia spent many years being accused of unreliability, too, but at least there you can read the talk page and understand how the editors arrived at the text they present.

On The Daily Show this week, Jon Stewart ranted about AI and interviewed FTC chair Lina Khan. Well-chosen video clips showed AI company heads’ true colors, telling the public AI is an assistant for humans while telling money people and each other that AI will enable greater productivity with fewer workers and help eliminate “the people tax”.

More interesting, however, was Khan’s note that the FTC is investigating the investments and partnerships in AI to understand if they’re giving current technology giants undue influence in the marketplace. If, in her example, all the competitors in a market outsource their pricing decisions to the same algorithm they may be guilty of price fixing even if they’re not actively colluding. And these markets are consolidating at an ever-earlier stage. Snapchat and WhatsApp had millions of users by the time Facebook thought it prudent to buy them rather than let them become dangerous competitors. AI is pre-consolidating: the usual suspects have been buying up AI startups and models at pace.

“More profound than fire or electricity,” Google CEO Sundar Pichai tells a camera at one point, speaking about AI. The last time I heard this level of hyperbole it was about the Internet in the 1990s, shortly before the bust. A friend’s answer to this sort of thing has never varied: “I’d rather have indoor plumbing.”

***

Last week the Federal District Court in Manhattan sentenced FTX CEO Sam Bankman-Fried to 25 years in prison for stealing $8 billion. In the end, you didn’t have to understand anything complicated about cryptocurrencies; it was just good old embezzlement.

And then the price of bitcoin went *up*. At the Guardian, Molly White explains that this is because cryptoevangelists are pushing the idea that the sector can reach its full potential, now that Bankman-Fried and other bad apples have been purged. But, as she says, nothing has really changed. No new use case has come along to make cryptocurrencies more useful, more valuable, or more trustworthy.

Both cryptocurrencies and generative AI are bubbles. The difference is that the AI bubble will likely leave behind it some technologies and knowledge that are genuinely useful; it will be like the Internet, which boomed and busted before settling in to change the world. Cryptocurrencies are more like the Dutch tulips. Unfortunately, in the meantime both these bubbles are consuming energy at an insane rate. How many wildfires is bitcoin worth?

***

I’ve seen a report suggesting that the last known professional words of the late Ross Anderson may have been, “Do they take us for fools?”

He was referring to the plans, debated in the House of Commons on March 25, to amend the Investigatory Powers Act to allow the government to pre-approve (or disapprove) new security features technology firms want to introduce. The government is of course saying it’s all perfectly innocent, intended to keep the country safe. But recent clashes in the decades-old conflict over strong encryption have seen the technology companies roll out features like end-to-end encryption (Meta) and decide not to implement others, like client-side scanning (Apple). The latest in a long line of UK governments that want access to encrypted text was hardly going to take that quietly. So here we are, debating this yet again. Yet the laws of mathematics still haven’t changed: there is no such thing as a security hole that only “good guys” can use.

***

Returning to AI, it appears that costs may lead Google to charge for access to its AI-enhanced search, as Alex Hern reports at the Guardian. Hern thinks this is good news for its AI-focused startup competitors, which already charge for top-tier tools and who are at risk of being undercut by Google. I think it’s good for users by making it easy to avoid the AI “enhancement”. Of course, DuckDuckGo already does this without all the tracking and monopoly mishegoss.

Illustrations: Jon Stewart uninspired by Mark Zuckerberg’s demonstration of AI making toast.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Facts are scarified

The recent doctored Palace photo has done almost as much as the arrival of generative AI to raise fears that in future we will completely lose the ability to identify fakes. The royal photo was sloppily composited – no AI needed – for reasons unknown (though Private Eye has a suggestion). A lot of conspiracy theorizing could be avoided if the palace would release the untouched original(s), but as things are, the photograph is a perfect example of how to provide the fuel for spreading nonsense to 400 million people.

The most interesting thing about the incident was discovering the rules media apply to retouching photos. AP specified, for example, that it does not use altered or digitally manipulated images. It allows cropping and minor adjustments to color and tone where necessary, but bans more substantial changes, even retouching to remove red eye. As Holly Hunter’s character says, trying to uphold standards in the 1987 movie Broadcast News (written by James Brooks), “We are not here to stage the news.”

The desire to make a family photo as appealing as possible is understandable; the motives behind spraying the world with misinformation are less clear and more varied. For this reason, I’ve long argued here that combating misinformation and disinformation is similar to cybersecurity: both problems are complex, and the actors and agendas diverse. At last year’s Disinformation Summit in Cambridge, cybersecurity was, sadly, one of the missing communities.

Just a couple of weeks ago the BBC announced its adoption of C2PA, a standard for authenticating images developed by a group of technology and media companies including the BBC, the New York Times, Microsoft, and Adobe. The BBC says that many media organizations are beginning to adopt C2PA, and even Meta is considering it. Edits must be signed, and they create a chain of provenance all the way back to the original photo. In 2022, the BBC and the Royal Society co-hosted a workshop on digital provenance, following a Royal Society report, at which C2PA featured prominently.
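The chain-of-provenance idea can be sketched in miniature. This is emphatically not C2PA’s actual format – real manifests use CBOR and X.509 certificates, not a toy shared HMAC key – but it shows why altering any link in a signed edit chain is detectable.

```python
import hashlib
import hmac

# A drastically simplified sketch of chain-of-provenance: each edit record
# commits to a hash of the previous record and is signed. Real C2PA manifests
# use X.509 certificates and CBOR, not this toy shared key.
SIGNING_KEY = b"demo-key-not-a-real-certificate"

def add_record(chain: list, action: str, payload: bytes) -> None:
    """Append a signed record that chains to the previous one."""
    prev_hash = chain[-1]["hash"] if chain else "genesis"
    body = f"{prev_hash}|{action}|{hashlib.sha256(payload).hexdigest()}"
    chain.append({
        "body": body,
        "hash": hashlib.sha256(body.encode()).hexdigest(),
        "sig": hmac.new(SIGNING_KEY, body.encode(), "sha256").hexdigest(),
    })

def verify(chain: list) -> bool:
    """Every record must link to its predecessor and carry a valid signature."""
    prev_hash = "genesis"
    for rec in chain:
        expected_sig = hmac.new(SIGNING_KEY, rec["body"].encode(), "sha256").hexdigest()
        if not rec["body"].startswith(prev_hash + "|"):
            return False
        if not hmac.compare_digest(rec["sig"], expected_sig):
            return False
        prev_hash = hashlib.sha256(rec["body"].encode()).hexdigest()
    return True

chain = []
add_record(chain, "capture", b"raw-photo-bytes")
add_record(chain, "crop", b"cropped-photo-bytes")
print(verify(chain))   # True: intact chain

chain[1]["body"] = chain[1]["body"].replace("crop", "recrop")
print(verify(chain))   # False: any altered link breaks verification
```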

That’s potentially a valuable approach for publishing and broadcast, where the conduit to the public is controlled by one of a relatively small number of organizations. And you can see why those organizations would want it: they need, and in many cases are struggling to retain, public trust. It is, however, too complex a process for the hundreds of millions of people with smartphone cameras posting images to social media, and unworkable for citizen journalists capturing newsworthy events in real time. Ancillary issue: sophisticated phone cameras try so hard to normalize the shots we take that they falsify the image at source. In 2020, Californians attempting to capture the orange color of their smoke-filled sky were defeated by autocorrection that turned it grey. So, many images are *originally* false.

In a lengthy blog post, Neal Krawetz analyzes the difficulties with C2PA. He lists security flaws, but he also objects to its “appeal to authority” approach, which he dubs a “logical fallacy”. In the context of the Internet, it’s worse than that; we already know what happens when a tiny handful of commercial companies (in this case, chiefly Adobe) become the gatekeepers for billions of people.

All of this was why I was glad to hear about work in progress at a workshop last week, led by Mansoor Ahmed-Rengers, a PhD candidate studying system security: Human-Oriented Proof Standard (HOPrS). The basic idea is to build an “Internet-wide, decentralised, creator-centric and scalable standard that allows creators to prove the veracity of their content and allows viewers to verify this with a simple ‘tick’.” Co-sponsoring the workshop was Open Origins, a project to distinguish between synthetic and human-created content.

It’s no accident that HOPrS’ mission statement echoes the ethos of the original Internet; as security researcher Jon Crowcroft explains, it’s part of long-running work on redecentralization. Among HOPrS’ goals, Ahmed-Rengers listed: minimal centralization; the ability for anyone to prove their content; Internet-wide scalability; open decision making; minimal disruption to workflow; and easy interpretability of proof/provenance. The project isn’t trying to cover all bases – that’s impossible. Given the variety of motivations for fakery, there will have to be a large ecosystem of approaches. Rather, HOPrS is focusing specifically on the threat model of an adversary determined to sow disinformation, giving journalists and citizens the tools they need to understand what they’re seeing.

Fakes are as old as humanity. In a brief digression, we were reminded that the early days of photography were full of fakery: the Cottingley Fairies, the Loch Ness monster, many dozens of spirit photographs. The Cottingley Fairies, cardboard cutouts photographed by Elsie Wright, 16, and Frances Griffiths, 9, were accepted as genuine by Sherlock Holmes creator Sir Arthur Conan Doyle, famously a believer in spiritualism. To today’s eyes, trained on millions of photographs, they instantly read as fake. Or take Ireland’s Knock apparitions, flat, unmoving, and, philosophy professor David Berman explained in 1979, magic lantern projections. Our generation, who’ve grown up with movies and TV, would I think have instantly recognized that as fake, too. Which I believe tells us something: yes, we need tools, but we ourselves will get better at detecting fakery, as unlikely as it seems right now. The speed with which the royal photo was dissected showed how much we’ve learned just since generative AI became available.

Illustrations: The first of the Cottingley Fairies photographs (via Wikimedia).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Anachronistics

“In my mind, computers and the Internet arrived at the same time,” my twenty-something companion said, delivering an entire mindset education in one sentence.

Just a minute or two earlier, she had asked in some surprise, “Did bulletin board systems predate the Internet?” Well, yes: BBSs were a software package running on a single back room computer with a modem users dialed into, whereas the Internet is this giant sprawling mess of millions of computers connected together…simple first, complex later.

Her confusion is understandable: from her perspective, computers and the Internet did arrive at the same time, since her first conscious encounters with them were simultaneous.

But still, speaking as someone who first programmed a (mainframe, with punch cards) computer in 1972 as a student, who got her first personal computer in 1982, and got online in 1991 by modem and 1999 by broadband and to whom the sequence of events is memorable: wow.

A 25-year-old today was born in 1999 (the year I got broadband). Her counterpart 15 years hence (born 2014, the year a smartphone replaced my personal digital assistant) may think smartphones and the Internet were simultaneous. And sometime around 2045 *her* counterpart born in 2020 (two years before ChatGPT was released) might think generative text and image systems were contemporaneous with the first computers.

I think this confusion must have something to do with the speed of change in a relatively narrow sector. I’m sure that even though they all entered my life simultaneously, by the time I was 25 I knew that radio preceded TV (because my parents grew up with radio), bicycles preceded cars, and that handwritten manuscripts predated printed books (because medieval manuscripts). But those transitions played out over multiple lifetimes, if not centuries, and all those memories were personal. Few of us reminisce about the mainframes of the 1960s because most of us didn’t have access to them.

And yet, misremembering the timeline of those earlier technologies probably matters less than misunderstanding the sequence of events in information technology. Jumbling the arrival dates of the pieces of information technology means failing to understand dependencies. What currently passes for “AI” could not exist without being able to train models on the giant piles of data that the Internet and the web made possible, and that took 20 years to build. Neural networks pioneer Geoff Hinton developed the ideas behind today’s deep neural networks as long ago as the 1980s, but it took until the last decade for them to become workable, because it took that long to build sufficiently powerful computers and to amass enough training data. How do you understand the ongoing battle between those who wish to protect privacy via data protection laws and those who want data to flow freely without hindrance if you do not understand what those masses of data are important for?

This isn’t the only such issue. A surprising number of people who should know better seem to believe that the solution to all our ills with social media is to destroy Section 230, apparently believing that if S230 allowed Big Tech to get big, it must be wrong. The reality is that it also allows small sites to exist, and it is the legal framework that makes content moderation possible. Improve it by all means, but understand its true purpose first.

Reviewing movies and futurist projections such as Vannevar Bush’s 1945 essay As We May Think (PDF) and Alan Turing’s 1950 paper, Computing Machinery and Intelligence (PDF), doesn’t really help because so many ideas arrive long before they’re feasible. The crew in the original 1966 Star Trek series (to say nothing of secret agent Maxwell Smart in 1965) were talking over wireless personal communicators. A decade earlier, Arthur C. Clarke (in The Nine Billion Names of God) and Isaac Asimov (in The Last Question) were putting computers – albeit analog ones – in their stories. Asimov in particular imagined a sequence that now looks prescient, beginning with something like a mainframe, moving on to microcomputers, and finishing up with a vast fully interconnected network that can only be held in hyperspace. (OK, it took trillions of years, starting in 2061, but still…) Those writings undoubtedly inspired the technologists of the last 50 years when they decided what to invent.

This all led us to fakes: as the technology to create fake videos, images, and texts continues to improve, she wondered if we will ever be able to keep up. Just about every journalism site is asking some version of that question; they’re all awash in stories about new levels of fakery. My 25-year-old discussant believes the fakes will always be improving faster than our methods of detection – an arms race like computer security, to which I’ve compared problems of misinformation/disinformation before.

I’m more optimistic. I bet even a few years from now today’s versions of generative “AI” will look as primitive to us as the special effects in a 1963 episode of Dr Who or the magic lantern used to create the Knock apparitions do to generations raised on movies, TV, and computer-generated imagery. Humans are adaptable; we will find ways to identify what is authentic that aren’t obvious in the shock of the new. We might even go back to arguing in pubs.

Illustrations: Secret agent Maxwell Smart (Don Adams) talking on his shoe phone (via Wikimedia).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.