Digital distrust

On Tuesday, at the UK Internet Governance Forum, a questioner asked this: “Why should I trust any technology the government deploys?”

She had come armed with a personal but generalizable anecdote. Since renewing her passport in 2017, at every UK airport the electronic gates routinely send her for rechecking to the human-staffed desk, even though the same passport works perfectly well in electronic gates at airports in other countries. A New Scientist article by Adam Vaughan that I can’t locate eventually explained: the Home Office had deployed the system knowing it wouldn’t work for “people with my skin type”. That is, as you’ve probably already guessed, dark.

She directed her question to Katherine Yesilirmak, director of strategy in the Responsible Tech Adoption Unit, formerly the Centre for Data Ethics and Innovation, a subsidiary of the Department for Skills, Innovation, and Technology.

Yesirlimak did her best, mentioning the problem of bias in training data, the variability of end users, fairness, governmental responsibility for understanding the technology it procures (since it builds very little itself these days) and so on. She is clearly up to date, even referring to the latest study finding that AIs used by human resources consistently prefer résumés with white and male-presenting names over non-white and female-presenting names. But Yesirlimak didn’t really answer the questioner’s fundamental conundrum. Why *should* she trust government systems when they are knowingly commissioned with flaws that exclude her? Well, why?

Pause to remember that 20 years ago, Jim Wayman, a pioneer in biometric identification told me, “People never have what you think they’re going to have where you think they’re going to have it.” Biometrics systems must be built to accommodate outliers – and it’s hard. For more, see Wayman’s potted history of third-party testing of modern biometric systems in the US (PDF).

Yesirlimak, whose LinkedIn profile indicates she’s been at the unit for a little under three years, noted that the government builds very little of its own technology these days. However, her group is partnering with analogues in other countries and international bodies to build tools and standards that she believes will help.

This panel was nominally about AI governance, but the connection that needed to be made was from what the questioner was describing – technology that makes some people second-class citizens – to digital exclusion, siloed in a different panel. Most people describe the “digital divide” as a binary statistical matter: 1.7 million households are not online, and 40% of households don’t meet the digital living standard, per the Liberal Democrat peer Timothy Clement-Jones, who ruefully noted the “serious gap in knowledge in Parliament” regarding digital inclusion.

Clement-Jones, who is the co-chair of the All Party Parliamentary Group on Artificial Intelligence, cited the House of Lords Communications and Digital Committee’s January 2024 report. Another statistic came from Helen Milner: 23% of people with long-term illness or disabilities are digitally excluded.

The report cites the annual consumer digital index Lloyds Bank releases each year; the last one found that Internet use is dropping among the over-60s, and for the first time the percentage of people offline in the previous three months had increased, to 4%. Fifteen percent of those offline are under 50, and overall about 4.7 million people can’t connect to wifi. Ofcom’s 2023 report found that 7% of households (disproportionately poor and/or elderly) have no Internet access, 20% of them because of cost.

“We should make sure the government always provides an analog alternative, especially as we move to digital IDs” Clement-Jones said. In 2010, when Martha Lane Fox was campaigning to get the last 10% online, one could push back: why should they have to be? Today, parying parking meters requires an app and, as Royal Holloway professor Lizzie Coles-Kemp noted, smartphones aren’t enough for some services.

Milner finds that a third of those offline already find it difficult to engage with the NHS, creating “two-tier public services”. Clement-Jones added another example: people in temporary housing have to reapply weekly online – but there is no Internet provision in temporary housing.

Worse, however, is thinking technology will magically fix intractable problems. In Coles-Kemp’s example, if someone can’t do their prescribed rehabilitation exercises at home because they lack space, support, or confidence, no app will fix it. In her work on inclusive security technologies, she has long pushed for systems to be less hostile to users in the name of preventing fraud: “We need to do more work on the difference between scammers and people who are desperate to get something done.”

In addition, Milner said, tackling digital exclusion has to be widely embraced – by the Department of Work and Pensions, for example – not just handed off to DSIT. Much comes down to designers who are unlike the people on whom their systems will be imposed and whose direct customers are administrators. “The emphasis needs to shift to the creators of these technologies – policy makers, programmers. How do algorithms make decisions? What is the impact on others of liking a piece of content?”

Concern about the “digital divide” has been with us since the beginning of the Internet. It seems to have been gradually forgotten as online has become mainstream. It shouldn’t be: digital exclusion makes all the other kinds of exclusion worse and adds anger and frustration to an already morbidly divided society.

Illustrations: Martha Lane Fox in 2011 (via Wikimedia.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

This perfect day

To anyone remembering the excitement over DNA testing just a few years ago, this week’s news about 23andMe comes as a surprise. At CNN, Allison Morrow reports that all seven board members have resigned to protest CEO Anne Wojcicki’s plan to take the company private by buying up all the shares she doesn’t already own at 40 cents each (closing price yesterday was 0.3301. The board wanted her to find a buyer offering a better price.

In January, Rolfe Winkler reported at the Wall Street Journal ($) that 23andMe is likely to run out of cash by next year. Its market cap has dropped from $6 billion to under $200 million. He and Morrow catalogue the company’s problems: it’s never made a profit nor had a sustainable business model.

The reasons are fairly simple: few repeat customers. With DNA testing, as Winkler writes, “Customers only need to take the test once, and few test-takers get life-altering health results.” 23andMe’s mooted revolution in health care instead was a fad. Now, the company is pivoting to sell subscriptions to weight loss drugs.

This strikes me as an extraordinarily dangerous moment: the struggling company’s sole unique asset is a pile of more than 10 million DNA samples whose owners have agreed they can be used for research. Many were alarmed when, in December 2023, hackers broke into 1.7 million accounts and gained access to 6.9 million customer profiles<, though. The company said the hacked data did not include DNA records but did include family trees and other links. We don't think of 23andMe as a social network. But the same affordances that enabled Cambridge Analytica to leverage a relatively small number of user profiles to create a mass of data derived from a much larger number of their Friends worked on 23andMe. Given the way genetics works, this risk should have been obvious.

In 2004, the year of Facebook’s birth, the Australian privacy campaigner Roger Clarke warned in Very Black “Little Black Books” that social networks had no business model other than to abuse their users’ data. 23andMe’s terms and conditions promise to protect user privacy. But in a sale what happens to the data?

The same might be asked about the data that would accrue from Oracle CEO Larry Ellison‘s surveillance-embracing proposals this week. Inevitably, commentators invoked George Orwell’s 1984. At Business Insider, Kenneth Niemeyer was first to report: “[Ellison] said AI will usher in a new era of surveillance that he gleefully said will ensure ‘citizens will be on their best behavior.'”

The all-AI-surveillance all-the-time idea could only be embraced “gleefully” by someone who doesn’t believe it will affect him.

Niemeyer:

“Ellison said AI would be used in the future to constantly watch and analyze vast surveillance systems, like security cameras, police body cameras, doorbell cameras, and vehicle dashboard cameras.

“We’re going to have supervision,” Ellison said. “Every police officer is going to be supervised at all times, and if there’s a problem, AI will report that problem and report it to the appropriate person. Citizens will be on their best behavior because we are constantly recording and reporting everything that’s going on.”

Ellison is twenty-six years behind science fiction author David Brin, who proposed radical transparency in his 1998 non-fiction outing, The Transparent Society. But Brin saw reciprocity as an essential feature, believing it would protect privacy by making surveillance visible. Ellison is claiming that *inscrutable* surveillance will guarantee good behavior.

At 404 Media, Jason Koebler debunks Ellison point by point. Research and other evidence shows securing schools is unlikely to make them safer; body cameras don’t appear to improve police behavior; and all the technologies Ellison talks about have problems with accuracy and false positives. Indeed, the mayor of Chicago wants to end the city’s contract with ShotSpotter (now SoundThinking), saying it’s expensive and doesn’t cut crime; some research says it slows police 911 response. Worth noting Simon Spichak at Brain Facts, who finds with AI tools humans make worse decisions. So…not a good idea for police.

More disturbing is Koebler’s main point: most of the technology Ellison calls “future” is already here and failing to lower crime rates or solve its causes – while being very expensive. Ellison is already out of date.

The book Ellison’s fantasy evokes for me is the less-known This Perfect Day, by Ira Levin, written in 1970. The novel’s world is run by a massive computer (“Unicomp”) that decides all aspects of individuals’ lives: their job, spouse, how many children they can have. Enforcing all this are human counselors and permanent electronic bracelets individuals touch to ubiquitous scanners for permission.

Homogeneity rules: everyone is mixed race, there are only four boys’ and girls’ names, they eat “totalcakes”, drink cokes, wear identical clothing. For the rest, regularly administered drugs keep everyone healthy and docile. “Fight” is an abominable curse word. The controlled world over which Unicomp presides is therefore almost entirely benign: there is no war, crime, and little disease. It rains only at night.

Naturally, the novel’s hero rebels, joins a group of outcasts (“the Incurables”), and finds his way to the secret underground luxury bunker where a few “Programmers” help Unicomp’s inventor, Wei Li Chun, run the world to his specification. So to me, Ellison’s plan is all about installing himself as world ruler. Which, I mean, who could object except other billionaires?

Illustrations: The CCTV camera on George Orwell’s Portobello Road house.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Safe

That didn’t take long. Since last week’s fret about AI startups ignoring the robots.txt convention, Thomas Claburn has reported at The Register that Cloudflare has developed a scraping prevention tool that identifies and blocks “content extraction” bots attempting to crawl sites at scale.

It’s a stopgap, not a solution. As Cloudflare’s announcement makes clear, the company knows there will be pushback; given these companies’ lack of interest in following existing norms, blocking tools versus scraping bots is basically the latest arms race (previously on this plotline: spam). Also, obviously, the tool only works on sites that are Cloudflare customers. Although these include many of the web’s largest sites, there are hundreds of millions more that won’t, don’t, or can’t pay for its services. If we want to return control to site owners, we’re going to need a more permanent and accesible solution.

In his 1999 book Code and Other Laws of Cyberspace, Lawrence Lessig finds four forms of regulation: norms, law, markets, and architecture. Norms are failing. Markets will just mean prolonged arms races. We’re going to need law and architecture.

***

We appear to be reaching peak “AI” hype, defined by (as in the peak of app hype) the increasing absurdity of things venture capitalists seem willing to fund. I recall reading the comment that at the peak of app silliness a lot of startups were really just putting a technological gloss on services that young men will previously have had supplied by their mothers. The AI bubble seems to be even less productive of long-term value, calling things “AI” that are not at all novel, and proposing “AI” to patch problems that call for real change.

As an example of the first of those, my new washing machine has a setting called “AI patterns”. The manual explains: it reorders the preset programs on the machine’s dial so the ones you use most appear first. It’s not stupid (although I’ve turned it off anyway, along with the wifi and “smart” features I would rather not pay for), but let’s call it what it is: customizing a menu.

As an example of the second…at Gizmodo, Maxwell Zeff reports that Softbank is claiming to have developed an “emotion canceling” AI that “alters angry voices into calm ones”. The use Softbank envisages is to lessen the stress for call center employees by softening the voices of angry customers without changing their actual words. There are, as people pointed out on Mastodon after the article was posted there, a lot smarter alternatives to reducing those individuals’ stress. Like giving them better employment conditions, or – and here’s a really radical thought – designing your services and products so your customers aren’t so frustrated and angry. What this software does is just falsify the sound. My guess is that if there is a result it will be to make customers even more angry and frustrated. More anger in the world. Great.

***

Oh! Sarcasm, even if only slight! At the Guardian, Ned Carter Miles reports on “emotional AI” (can we say “oxymoron”?). Among his examples is a team at the University of Groningen that is teaching an AI to recognize sarcasm using scenes from US sitcoms such as Friends and The Big Bang Theory. Even absurd-sounding research can be a good thing. I’m still not sure how good a guide sitcoms are for identifying emotions in real-world context even apart from the usual issues of algorithmic bias. After all, actors are given carefully crafted words and work harder to communicate their emotional content than ordinary people normally do.

***

Finally, again in the category of peak-AI-hype is this: at the New York Times Cade Metz is reporting that Ilya Sutskever, a co-founder and former chief scientist at OpenAI, has a new startup whose goal is to create a “safe superintelligence”.

Even if you, unlike me, believe that a “superintelligence” is an imminent possibility, what does “safe” mean, especially in an industry that still treats security and accessibility as add-ons? “Safe” is, like “secure”, meaningless without context and a threat model. Safe from what? Safe for what? To do what? Operated by whom? Owned by whom? With what motives? For how long? We create new intelligent humans all the time. Do we have any ability to ensure they’re “safe” technology? If an AGI is going to be smarter than a human, how can anyone possibly promise it will be, in the industry parlance, “aligned” with our goals? And for what value of “our”? Beware the people who want to build the Torment Nexus!

It’s nonsense. Safety can’t be programmed into a superintelligence any more than Isaac Asimov’s Laws of Robotics.

Sutskever’s own comments are equivocal. In a video clip at the Guardian, Sutsekver confusingly says both that “AI will solve all our problems” and that it will make fake news, cyber attacks, and weapons much worse and “has the potential to create infinitely stable dictatorships”. Then he adds, “I feel that technology is a force of nature.” Which is exactly the opposite of what technology is…but it suits the industry to push the inevitability narrative that technological progress cannot be stopped.

Cue Douglas Adams: “This is obviously some strange use of the word ‘safe’ I wasn’t previously aware of.”

Illustrations: The Big Bang Theory‘s Leonard (Johnny Galecki) teaching Sheldon (Jim Parsons) about sarcasm (Season 1, episode 2, “The Big Bran Hypothesis”).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Changing the faith

The governance of Britain and the governance of the Internet have this in common: the ultimate authority in both cases is to a large extent a “gentleman’s agreement”. For the same reason: both were devised by a relatively small, homogeneous group of people who trusted each other. In the case of Britain, inertia means that even without a written constitution the country goes on electing governments and passing laws as if.

Most people have no reason to know that the Internet’s technical underpinnings are defined by a series of documents known as RFCs, for Requests(s) for Comments. RFC1 was defined in April 1969; the most recent, RFC9598, is dated just last month. While the Internet Engineering Task Force oversees RFCs’ development and administration, it has no power to force anyone to adopt them. Throughout, RFC standards have been created collaboratively by volunteers and adopted on merit.

A fair number of RFCs promote good “Internet citizenship”. There are, for example, email addresses (chiefly, webmaster and postmaster) that anyone running a website is supposed to maintain in order to make it easy for a third party to report problems. Today, probably millions of website owners don’t even know this expectation exists. For Internet curmudgeons over a certain age, however, seeing email to those addresses bounce is frustrating.

Still, many of these good-citizen practices persist. One such is the Robots Exclusion Protocol, updated in 2022 as RFC 9309, which defines a file, “robots.txt”, that website owners can put in place to tell automated web crawlers which parts of the site they may access and copy. This may have mattered less in recent years than it did in 1994, when it was devised. As David Pierce recounts at The Verge, at that time an explosion of new bots were beginning to crawl the web to build directories and indexes (no Google until 1998!). Many of those early websites were hosted on very small systems based in people’s homes or small businesses, and could be overwhelmed by unrestrained crawlers. Robots txt, devised by a small group of administrators and developers, managed this problem.

Even without a legal requirement to adopt it, early Internet companies largely saw being good Internet citizens as benefiting them. They, too, were small at the time, and needed good will to bring them the users and customers that have since made them into giants. It served everyone’s interests to comply.

Until more or less now. This week, Katie Paul is reporting at Reuters that “AI” companies are blowing up this arrangement by ignoring robots.txt and scraping whatever they want. This news follows reporting by Randall Lane at Forbes that Perplexity.ai is using its software to generate stories and podcasts using news sites’ work without credit. At Wired, Druv Mehrotra and Tim Marchman report a similar story: Perplexity is ignoring robots.txt and scraping areas of sites that owners want left alone. At 404 Media, Emmanuel Maiberg reports that Perplexity also has a dubious history of using fake accounts to scrape Twitter data.

Let’s not just pick on Perplexity; this is the latest in a growing trend. Previously, hiQ Labs tried scraping data from LinkedIn in order to build services to sell employers, the courts finally ruled in 2019 that hiQ violated LinkedIn’s terms and conditions. More controversially, in the last few years Clearview AI has been responding to widespread criticism by claiming that any photograph published on the Internet is “public” and therefore within its rights to grab for its database and use to identify individuals online and offline. The result has been myriad legal actions under data protection law in the EU and UK, and, in the US, a sheaf of lawsuits. Last week, Kashmir Hill reported at the New York Times, that because Clearview lacks the funds to settle a class action lawsuit it has offered a 23% stake to Americans whose faces are in its database.

As Pierce (The Verge) writes, robots.txt used to represent a fair-enough trade: website owners got search engine visibility in return for their data, and the owners of the crawlers got the data but in return sent traffic.

But AI startups ingesting data to build models don’t offer any benefit in return. Where search engines have traditionally helped people find things on other sites, the owners of AI chatbots want to keep the traffic for themselves. Perplexity bills itself as an “answer engine”. A second key difference is this: none of these businesses are small. As Vladen Joler pointed out last month at CPDP, “AI comes pre-monopolized.” Getting into this area requires billions in funding; by contrast many early Internet businesses started with just a few hundred dollars.

This all feels like a watershed moment for the Internet. For most of its history, as Charles Arthur writes at The Overspill, every advance has exposed another area where the Internet operates on the basis of good faith. Typically, the result is some form of closure – spam, for example, led the operators of mail servers to close to all but authenticated users. It’s not clear to a non-technical person what stronger measure other than copyright law could replace the genteel agreement of robots.txt, but the alternative will likely be closing open access to large parts of the web – a loss to all of us.

Illustrations: Vladen Joler at CPDP 2024, showing his map of the extractive industries required to underpin “AI”.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.