The data grab

It’s been a good week for those who like mocking flawed technology.

Numerous outlets have reported, for example, that “AI is getting dumber at math”. The source is a study by researchers at Stanford and the University of California, Berkeley, comparing GPT-3.5’s and GPT-4’s output in March and June 2023. Among other things, the researchers found that GPT-4’s success rate at identifying prime numbers dropped from 84% to 51%. In other words, in June 2023 GPT-4 did little better than chance at identifying prime numbers. That’s psychic-level performance.

The researchers blame “drift”, the problem that improving one part of a model can have unhelpful knock-on effects elsewhere in it. At Ars Technica, Benj Edwards is less sure, citing qualified critics who question the study’s methodology. It’s equally possible, he suggests, that as the novelty fades, people’s attempts to do real work are surfacing problems that were there all along. With no access to the algorithm itself and only limited knowledge of the training data, we can only conduct such studies by controlling the inputs and observing the outputs, much like diagnosing allergies by feeding a child a series of foods in turn and waiting to see which ones make them sick. Edwards advocates greater openness on the part of the companies, especially as software developers begin building products on top of their generative engines.
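
To make the method concrete, here is a minimal sketch in Python of that kind of black-box probe. The query_model() stand-in and the prompt wording are my inventions, not the researchers’ actual setup; the only things an outsider controls are what goes in and how the answers are scored.

    # Minimal black-box probe: send numbers to an opaque model, score the answers.
    # query_model() is a hypothetical stand-in for a chatbot API call.

    def is_prime(n: int) -> bool:
        """Ground truth, computed locally by trial division."""
        if n < 2:
            return False
        f = 2
        while f * f <= n:
            if n % f == 0:
                return False
            f += 1
        return True

    def query_model(prompt: str) -> str:
        """Placeholder for the opaque system under test (e.g. a chat API call)."""
        raise NotImplementedError

    def measure_accuracy(numbers: list[int]) -> float:
        """Ask about each number and compare the answer to the local ground truth."""
        correct = 0
        for n in numbers:
            answer = query_model(f"Is {n} a prime number? Answer yes or no.")
            said_prime = answer.strip().lower().startswith("yes")
            correct += (said_prime == is_prime(n))
        return correct / len(numbers)

    # e.g. measure_accuracy(list(range(10_000, 10_200))) in March, then again in June

Run the same probe at two different dates and you have, in essence, the study’s design; everything inside the model stays invisible.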

In unrelated news, the New Zealand discount supermarket chain Pak’nSave offered an “AI” meal planner that, set loose, promptly began turning out recipes for “poison bread sandwiches”, “Oreo vegetable stir-fry”, and “aromatic water mix” – the last of which was in fact a recipe for highly dangerous chlorine gas.

The reason is human-computer interaction: humans, told to provide a list of available ingredients, predictably became creative. As for the computer, anyone who’s read Janelle Shane’s 2019 book, You Look Like a Thing and I Love You, or her Twitter reports on AI-generated recipes could have predicted this outcome. Computers have no real-world experience against which to judge their output!

Meanwhile, the San Francisco Chronicle reports, Waymo and Cruise driverless taxis are making trouble at an accelerating rate. The cars have gotten stuck in low-hanging wires after thunderstorms, driven through caution tape, blocked emergency vehicles and emergency responders, and behaved erratically enough to endanger cyclists, pedestrians, and other vehicles. If they were driven by humans they’d have lost their licenses by now.

In an interesting side note that highlights the cars’ potential as a surveillance network, Axios reports that in a ten-day study in May, Waymo’s driverless cars found that human drivers in San Francisco speed 33% of the time. A similar exercise in Phoenix, Arizona observed human drivers speeding 47% of the time on roads with a 35mph speed limit. These statistics of course bolster the company’s main argument for adoption: improving road safety.

The study should – but probably won’t – be taken as a warning of the potential for the cars’ data collection to become embedded in both law enforcement and their owners’ business models. The frenzy surrounding ChatGPT-* is fueling an industry-wide data grab as everyone tries to beef up their products with “AI” (see also previous such exercises with “meta”, “nano”, and “e”), consequences to be determined.

Among the newly-discovered data grabbers is Intel, whose graphics processing unit (GPU) drivers are collecting telemetry data, including how you use your computer, the kinds of websites you visit, and other data points. You can opt out, assuming you a) realize what’s happening and b) are paying attention at the right moment during installation.

Google announced recently that it would scrape everything people post online to use as training data. Again, an opt-out can be had if you have the knowledge and access to use the nearly 30-year-old robots.txt protocol. In practical terms, I can configure my own site, pelicancrossing.net, to block Google’s data grabber, but I can’t stop it from scraping comments I leave on other people’s blogs, anything I post on social media sites, or anything that’s professionally published (though those sites may block Google themselves). This data repurposing feels like it ought to be illegal under data protection and copyright law.
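
For the record, the opt-out amounts to a couple of lines in a site’s robots.txt file, something like the sketch below. It assumes Google’s AI-training crawler honors a dedicated user-agent token (Google-Extended is the one Google documents); the protocol is purely advisory, so it only works if the crawler chooses to obey it.

    User-agent: Google-Extended
    Disallow: /

Blocking Googlebot itself would also work, at the cost of disappearing from search results.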

In Australia, Gizmodo reports, Google has asked the government to relax copyright law to facilitate AI training.

Soon after Google’s announcement, the law firm Clarkson filed a class action lawsuit against Google, a companion to its existing action against OpenAI. The suit accuses Google of “stealing” copyrighted works and personal data.

“Google does not own the Internet,” Clarkson wrote in its press release. Will you tell it, or shall I?

Whatever has been going on until now with data slurping in the interests of bombarding us with microtargeted ads is small stuff compared to the accelerating acquisition for the purpose of feeding AI models. Arguably, AI could be a public good in the long term as it improves, and therefore allowing these companies to access all available data for training is in the public interest. But if that’s true, then the *public* should own the models, not the companies. Why should we consent to the use of our data so they can sell it back to us and keep the proceeds for their shareholders?

It’s all yet another example of why we should pay attention to the harms that are clear and present, not the theoretical harm that someday AI will be general enough to pose an existential threat.

Illustrations: IBM Watson, Jeopardy champion.

Wendy M. Grossman is the 2013 winner of the Enigma Award and contributing editor for the Plutopia News Network podcast. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon.

Performing intelligence

“Oh, great,” I thought when news broke of the release of GPT-4. “Higher-quality deception.”

Most of the Internet disagreed; having gone mad only a few weeks ago over ChatGPT, everyone’s now agog over this latest model. It passed all these tests!

One exception was the journalist Paris Marx, who commented on Twitter: “It’s so funny to me that the AI people think it’s impressive when their programs pass a test after being trained on all the answers.”

Agreed. It’s also so funny to me that they call that “AI” and don’t like it when researchers like computational linguist Emily Bender call it a “stochastic parrot”. At Marx’s Tech Won’t Save Us podcast, Goldsmiths professor Dan McQuillan, author of Resisting AI: An Anti-fascist Approach to Artificial Intelligence, calls it a “bullshit engine” whose developers’ sole goal is plausibility – plausibility that, as Bender has said, allows us imaginative humans to think we detect a mind behind it, and the result is to risk devaluing humans.

Let’s walk back to an earlier type of system that has been widely deployed: benefits scoring systems. A couple of weeks ago, Lighthouse Reports and Wired magazine teamed up on an investigation of these systems, calling them “suspicion machines”.

Their work focuses on the welfare benefits system in use in Rotterdam between 2017 and 2021, which used 315 variables to risk-score benefits recipients according to the likelihood that their claims were fraudulent. In detailed, worked case analyses, they find systemic discrimination: you lose points for being female, for being female and having children (males aren’t asked about children), for being non-white, and for poor command of Dutch, which serves as a proxy for ethnicity (even though knowing Dutch is a requirement for welfare recipients). Other variables include missed meetings, age, and “lacks organizing skills”, just one of 54 variables based on caseworkers’ subjective assessments. Any comment a caseworker adds translates to a point added to the risk score, even if the comment is positive. The top-scoring 10% are flagged for further investigation.
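
To make the mechanics concrete, here is a schematic sketch in Python of how such a scheme operates: weighted variables, a point for every caseworker comment regardless of its content, and a flag for the top-scoring decile. The variable names and weights are invented for illustration; Rotterdam’s actual model was more elaborate.

    # Schematic sketch of a variable-based fraud risk score. The weights and
    # variable names here are illustrative inventions, not Rotterdam's.
    import statistics

    WEIGHTS = {                            # a hypothetical handful of the 315 variables
        "is_female": 1.0,
        "has_children": 0.8,               # only asked of women, per the reporting
        "missed_meetings": 1.5,
        "lacks_organizing_skills": 0.7,    # a caseworker's subjective judgment
    }

    def risk_score(recipient: dict, caseworker_comments: list[str]) -> float:
        """Weighted sum of variables, plus one point per caseworker comment."""
        score = sum(w * recipient.get(var, 0) for var, w in WEIGHTS.items())
        score += len(caseworker_comments)  # +1 per comment, even a positive one
        return score

    def flag_top_decile(scores: dict[str, float]) -> set[str]:
        """Flag the top-scoring 10% of recipients for fraud investigation."""
        cutoff = statistics.quantiles(scores.values(), n=10)[-1]  # ~90th percentile
        return {person for person, s in scores.items() if s >= cutoff}

Even this toy version shows the structural problem: every subjective note pushes the score up, and the fixed cutoff guarantees that a tenth of recipients will be investigated however weak the underlying signal is.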

This is the system that Accenture, the city’s technology partner on the early versions, said at its unveiling in 2018 was an “ethical solution” and promised “unbiased citizen outcomes”. Instead, Wired says, the algorithm “fails the city’s own test of fairness”.

The project’s point wasn’t to pick on Rotterdam; of the dozens of cities the team contacted, it just happened to be the only one willing to share the code behind the algorithm, along with the list of variables, prior evaluations, and the data scientists’ handbook. It even shared the mathematical model itself, though only after being threatened with court action under freedom of information laws.

The overall conclusion: the system was so inaccurate that, “according to some metrics”, it was little better than random sampling.

What strikes me, aside from the details of this design, is the initial choice to score benefits recipients for risk of fraud. Why not score them for risk of missing out on help they’re entitled to? The UK government’s figures on benefits fraud indicate that in 2021-2022 overpayment (including error as well as fraud) amounted to 4% of total expenditure and *underpayment* to 1.2%. Underpayment is a lot less, but it’s still substantial (£2.6 billion). Yes, I know, the point of the scoring system is to save money, but the point of the *benefits* system is to help people who need it. The suspicion was always there, but the technology has altered the balance.

This was the point the writer Ellen Ullman noted in her 1997 book Close to the Machine: the hard-edged nature of these systems and their ability to surveil people in new ways “infect” their owners with suspicion, even of people they’ve long trusted and even when the system itself was intended to be helpful. On a societal scale, these “suspicion machines” embed increased division in our infrastructure; in his book, McQuillan warns us to watch for “functionality that contributes to violent separations of ‘us and them’.”

Along those lines, it’s disturbing that OpenAI, the owner of ChatGPT and GPT-4 (and several other generative AI gewgaws), has now decided to keep secret the details of its large language models. That is, we have no sight into what data was used in training, what software and hardware methods were used, or how energy-intensive the models are. If there’s a machine loose in the world’s computer systems pretending to be human, shouldn’t we understand how it works? It would also help damp down our tendency to imagine we see a mind in there.

The company’s argument appears to be that because these models could become harmful, publishing how they work would simply hand bad actors the means to create that harm. In the cybersecurity field we call this “security by obscurity”, and there is a general consensus that it does not work as a protection.

In a lengthy article at New York magazine, Elizabeth Weil quotes Daniel Dennett’s assessment of these machines: “counterfeit people” that should be seen as the same sort of danger to our system as counterfeit money. Bender suggests that rather than trying to make fake people we should be focusing more on making tools to help people.

What ties this back to the large language models behind GPT is that in both cases it’s all about mining our shared cultural history, with all its flaws and misjudgments, in response to a prompt, and pretending the results have meaning and create new knowledge. And *that’s* what’s being embedded into the world’s infrastructure. Have we learned nothing from Clever Hans?

Illustrations: Clever Hans, performing in Leipzig in 1912 (by Karl Krall, via Wikimedia).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon or Twitter.