The data grab

It’s been a good week for those who like mocking flawed technology.

Numerous outlets have reported, for example, that “AI is getting dumber at math”. The source is a study conducted by researchers at Stanford and the University of California, Berkeley comparing GPT-3.5’s and GPT-4’s output in March and June 2023. The researchers found that, among other things, GPT-4’s success rate at identifying prime numbers dropped from 84% to 51%. In other words, in June 2023 GPT-4 did little better than chance at identifying prime numbers. That’s psychic level.

The researchers blame “drift”, the problem that improving one part of a model may have unhelpful knock-on effects in other parts of the model. At Ars Technica, Benj Edwards is less sure, citing qualified critics who question the study’s methodology. It’s equally possible, he suggests, that as the novelty fades, people’s attempts to do real work surface problems that were there all along. With no access to the algorithm itself and limited knowledge of the training data, we can only conduct such studies by controlling inputs and observing the outputs, much like diagnosing allergies by giving a child a series of foods in turn and waiting to see which ones make them sick. Edwards advocates greater openness on the part of the companies, especially as software developers begin building products on top of their generative engines.

Unrelated, the New Zealand discount supermarket chain Pak’nSave offered an “AI” meal planner that, set loose, promptly began turning out recipes for “poison bread sandwiches”, “Oreo vegetable stir-fry”, and “aromatic water mix” – which turned out to be a recipe for highly dangerous chlorine gas.

The reason is human-computer interaction: humans, told to provide a list of available ingredients, predictably became creative. As for the computer…anyone who’s read Janelle Shane’s 2019 book, You Look Like a Thing and I Love You, or her Twitter reports on AI-generated recipes could predict this outcome. Computers have no real-world experience against which to judge their output!

Meanwhile, the San Francisco Chronicle reports, Waymo and Cruise driverless taxis are making trouble at an accelerating rate. The cars have gotten stuck in low-hanging wires after thunderstorms, driven through caution tape, blocked emergency vehicles and emergency responders, and behaved erratically enough to endanger cyclists, pedestrians, and other vehicles. If they were driven by humans they’d have lost their licenses by now.

In an interesting side note that is a reminder of the cars’ potential as a surveillance network, Axios reports that in a ten-day study in May, Waymo’s driverless cars found that human drivers in San Francisco speed 33% of the time. A similar exercise in Phoenix, Arizona observed human drivers speeding 47% of the time on roads with a 35mph speed limit. These statistics of course bolster the company’s main argument for adoption: improving road safety.

The study should – but probably won’t – be taken as a warning of the potential for the cars’ data collection to become embedded in both law enforcement and their owners’ business models. The frenzy surrounding ChatGPT-* is fueling an industry-wide data grab as everyone tries to beef up their products with “AI” (see also previous such exercises with “meta”, “nano”, and “e”), consequences to be determined.

Among the newly-discovered data grabbers is Intel, whose graphics processing unit (GPU) drivers are collecting telemetry data, including how you use your computer, the kinds of websites you visit, and other data points. You can opt out, assuming you a) realize what’s happening and b) are paying attention at the right moment during installation.

Google announced recently that it would scrape everything people post online to use as training data. Again, an opt-out can be had if you have the knowledge and access to follow the 30-year-old robots.txt protocol. In practical terms, I can configure my own site, pelicancrossing.net, to block Google’s data grabber, but I can’t stop it from scraping comments I leave on other people’s blogs or anything I post on social media sites or that’s professionally published (though those sites may block Google themselves). This data repurposing feels like it ought to be illegal under data protection and copyright law.

In Australia, Gizmodo reports that the company has asked the Australian government to relax copyright laws to facilitate AI training.

Soon after Google’s announcement the law firm Clarkson filed a class action lawsuit against Google to join its action against OpenAI. The suit accuses Google of “stealing” copyrighted works and personal data.

“Google does not own the Internet,” Clarkson wrote in its press release. Will you tell it, or shall I?

Whatever has been going on until now with data slurping in the interests of bombarding us with microtargeted ads is small stuff compared to the accelerating acquisition for the purpose of feeding AI models. Arguably, AI could be a public good in the long term as it improves, and therefore allowing these companies to access all available data for training is in the public interest. But if that’s true, then the *public* should own the models, not the companies. Why should we consent to the use of our data so they can sell it back to us and keep the proceeds for their shareholders?

It’s all yet another example of why we should pay attention to the harms that are clear and present, not the theoretical harm that someday AI will be general enough to pose an existential threat.

Illustrations: IBM Watson, Jeopardy champion.

Wendy M. Grossman is the 2013 winner of the Enigma Award and contributing editor for the Plutopia News Network podcast. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon.

Book review: Beyond Measure

Beyond Measure: The Hidden History of Measurement
Author: James Vincent
Publisher: Faber and Faber
ISBN: 978-0-571-35421-4

In 2022, then-government minister Jacob Rees-Mogg proposed that Britain should return to imperial measurements – pounds, ounces, yay Brexit! This was a ship that had long since sailed; 40-something friends learned only the metric system at school. Even those old enough to remember imperial measures had little nostalgia for them.

As James Vincent explains in Beyond Measure: The Hidden History of Measurement, and as most of us assume instinctively, measuring physical objects began with comparisons to pieces of the human body: feet, hands, cubits (elbow to fingertip), fathoms (the span of outstretched arms). Other forms of measurement were functional, such as the Irish collop, the amount of land needed to graze one cow. Such imprecise measurements had their benefits, such as convenient availability and immediately understandable context-based value.

Quickly, though, the desire to trade led to the need for consistency, which in turn fed the emergence of centralized state power. The growth of science increased the pressure for more and more consistent and precise measurements – Vincent spends a chapter on the surprisingly difficult quest to pin down a number we now learn as children: the temperature at which water boils. Perversely, though, each new generation of more precise measurement reveals new errors that require even more precise measurement to correct.

The history of measurement is also the history of power. Surveying the land enabled governments to decide its ownership; the world-changing discovery of statistics brought a new understanding of social trends and, with it, further empowered governments, which could afford to amass the biggest avalanches of numbers.

Perhaps the quirkiest and most unexpected material is Vincent’s chapter on Standard Reference Materials. At the US National Institute of Standards and Technology, Vincent finds carefully studied jars of peanut butter and powdered radioactive human lung. These, it turns out, provide standards against which manufacturers can check their products.

Often, Vincent observes, changes in measurement systems accompany moments of social disruption. The metric system, for example, was born in France at the time of the revolution. Defining units of measurement in terms of official weights and measures made standards egalitarian rather than dependent on one man’s body parts. By 2018, when Vincent visits the official kilo weight and meter stick in Paris, however, even that seemed too elite. Today, both kilogram and meter are defined in terms of constants of nature – the meter, for example, is defined as the distance light travels in 1/299,792,458th of a second (itself now defined in terms of a transition frequency of the caesium-133 atom). These are units that anyone with appropriate equipment can derive at any time without needing to check them against a single stick in a vault. Still elite, but a much larger elite.

But still French, which may form part of Rees-Mogg’s objection to it. And, possibly, as Vincent finds some US Republicans have complained, *communist* because of its global adoption. Nonetheless, and despite anti-metric sentiments expressed even by futurists like Stewart Brand, the US is still more metric than most people think. The road system’s miles and retail stores’ pounds and ounces are mostly a veneer; underneath, industry and science have voted for global compatibility – and the federal government has, since 1893, defined feet and inches by metric units.