The data grab

It’s been a good week for those who like mocking flawed technology.

Numerous outlets have reported, for example, that “AI is getting dumber at math”. The source is a study conducted by researchers at Stanford and the University of California, Berkeley comparing GPT-3.5’s and GPT-4’s output in March and June 2023. The researchers found that, among other things, GPT-4’s success rate at identifying prime numbers dropped from 84% to 51%. In other words, in June 2023 GPT-4 did little better than chance at identifying prime numbers. That’s psychic level.

The researchers blame “drift”, the problem that improving one part of a model may have unhelpful knock-on effects in other parts of the model. At Ars Technica, Benj Edwards is less sure, citing qualified critics who question the study’s methodology. It’s equally possible, he suggests, that as the novelty fades, people’s attempts to do real work surface problems that were there all along. With no access to the algorithm itself and limited knowledge of the training data, we can only conduct such studies by controlling inputs and observing the outputs, much like diagnosing allergies by giving a child a series of foods in turn and waiting to see which ones make them sick. Edwards advocates greater openness on the part of the companies, especially as software developers begin building products on top of their generative engines.

Unrelated, the New Zealand discount supermarket chain Pak’nSave offered an “AI” meal planner that, set loose, promptly began turning out recipes for “poison bread sandwiches”, “Oreo vegetable stir-fry”, and “aromatic water mix” – which turned out to be a recipe for highly dangerous chlorine gas.

The reason is human-computer interaction: humans, told to provide a list of available ingredients, predictably became creative. As for the computer…anyone who’s read Janelle Shane’s 2019 book, You Look Like a Thing and I Love You, or her Twitter reports on AI-generated recipes could predict this outcome. Computers have no real-world experience against which to judge their output!

Meanwhile, the San Francisco Chronicle reports, Waymo and Cruise driverless taxis are making trouble at an accelerating rate. The cars have gotten stuck in low-hanging wires after thunderstorms, driven through caution tape, blocked emergency vehicles and emergency responders, and behaved erratically enough to endanger cyclists, pedestrians, and other vehicles. If they were driven by humans they’d have lost their licenses by now.

In an interesting side note that reminds us of the cars’ potential as a surveillance network, Axios reports that in a ten-day study in May Waymo’s driverless cars found that human drivers in San Francisco speed 33% of the time. A similar exercise in Phoenix, Arizona observed human drivers speeding 47% of the time on roads with a 35mph speed limit. These statistics of course bolster the company’s main argument for adoption: improving road safety.

The study should – but probably won’t – be taken as a warning of the potential for the cars’ data collection to become embedded in both law enforcement and their owners’ business models. The frenzy surrounding ChatGPT-* is fueling an industry-wide data grab as everyone tries to beef up their products with “AI” (see also previous such exercises with “meta”, “nano”, and “e”), consequences to be determined.

Among the newly-discovered data grabbers is Intel, whose graphics processing unit (GPU) drivers are collecting telemetry data, including how you use your computer, the kinds of websites you visit, and other data points. You can opt out, assuming you a) realize what’s happening and b) are paying attention at the right moment during installation.

Google announced recently that it would scrape everything people post online to use as training data. Again, an opt-out can be had if you have the knowledge and access to follow the 30-year-old robots.txt protocol. In practical terms, I can configure my own site, pelicancrossing.net, to block Google’s data grabber, but I can’t stop it from scraping comments I leave on other people’s blogs or anything I post on social media sites or that’s professionally published (though those sites may block Google themselves). This data repurposing feels like it ought to be illegal under data protection and copyright law.
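For anyone who hasn’t met robots.txt, here is a minimal sketch of the opt-out described above, using Python’s standard-library parser. The crawler name `ExampleAIBot` and the example.org address are made up for illustration; a real crawler publishes its own user-agent token, and a site owner would put the rules in a file at the root of the site rather than in a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. "ExampleAIBot" is an invented crawler name,
# not the token of any real crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if these robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The named crawler is shut out of the whole site; everyone else is let in.
blocked = can_fetch("ExampleAIBot", "https://example.org/2023/08/post.html")
allowed = can_fetch("SomeOtherBot", "https://example.org/2023/08/post.html")
```

Note what this does and doesn’t do: the file only asks, and it only covers the site it sits on. A crawler that ignores the request loses nothing, and nothing I put on my own site governs what gets scraped from anyone else’s.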

In Australia, Gizmodo reports that the company has asked the Australian government to relax copyright laws to facilitate AI training.

Soon after Google’s announcement the law firm Clarkson filed a class action lawsuit against Google to join its action against OpenAI. The suit accuses Google of “stealing” copyrighted works and personal data.

“Google does not own the Internet,” Clarkson wrote in its press release. Will you tell it, or shall I?

Whatever has been going on until now with data slurping in the interests of bombarding us with microtargeted ads is small stuff compared to the accelerating acquisition for the purpose of feeding AI models. Arguably, AI could be a public good in the long term as it improves, and therefore allowing these companies to access all available data for training is in the public interest. But if that’s true, then the *public* should own the models, not the companies. Why should we consent to the use of our data so they can sell it back to us and keep the proceeds for their shareholders?

It’s all yet another example of why we should pay attention to the harms that are clear and present, not the theoretical harm that someday AI will be general enough to pose an existential threat.

Illustrations: IBM Watson, Jeopardy champion.

Wendy M. Grossman is the 2013 winner of the Enigma Award and contributing editor for the Plutopia News Network podcast. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon.

Watson goes to Wimbledon

The launch of the Fediverse-compatible Meta app Threads seems to have slightly overshadowed the European Court of Justice’s ruling earlier in the week. This ruling deserves more attention: it undermines the basis of Meta’s targeted advertising. In noyb’s initial reaction, data protection legal bulldog Max Schrems suggests the judgment will make life difficult for not just Meta but other advertising companies.

As Alex Scroxton explains at Computer Weekly, the ruling rejects several different claims by Meta that all attempt to bypass the requirement enshrined in the General Data Protection Regulation that where there is no legal basis for data processing users must actively consent. Meta can’t get by with claiming that targeted advertising is a part of its service users expect, or that it’s technically necessary to provide its service.

More interesting is the fact that the original complaint was not filed by a data protection authority but by Germany’s antitrust body, which sees Meta’s our-way-or-get-lost approach to data gathering as abuse of its dominant position – and the CJEU has upheld this idea.

All this is presumably part of why Meta decided to roll out Threads in many countries but *not* the EU. In February, as a consequence of Brexit, Meta moved UK users to its US agreements. The UK’s data protection law is a clone of GDPR and will remain so until and unless the British Parliament changes it via the pending Data Protection and Digital Information bill. Still, it seems the move makes Meta ready to exploit such changes if they do occur.

Warning to people with longstanding Instagram accounts who want to try Threads: if your plan is to try and (maybe) delete, set up a new Instagram account for the purpose. Otherwise, you’ll be sad to discover that deleting your new Threads account means vaping your old Instagram account along with it. It’s the Hotel California method of Getting Big Fast.

***

Last week the Irish Council for Civil Liberties warned that a last-minute amendment to the Courts and Civil Law (Miscellaneous) bill will allow Ireland’s Data Protection Commissioner to mark any of its proceedings “confidential” and thereby bar third parties from publishing information about them. Effectively, it blocks criticism. This is a muzzle not only for the ICCL and other activists and journalists but for aforesaid bulldog Schrems, who has made a career of pushing the DPC to enforce the law it was created to enforce. He keeps winning in court, too, which I’m sure must be terribly annoying.

The Irish DPC is an essential resource for everyone in Europe because Ireland is the European home of so many of American Big Tech’s subsidiaries. So this amendment – which reportedly passed the Oireachtas (Ireland’s parliament) – is an alarming development.

***

Over the last few years Canadian law professor Michael Geist has had plenty of complaints about Canada’s Online News Act, aka C-18. Like the Australian legislation it emulates, C-18 requires intermediaries like Facebook and Google to negotiate and pay for licenses to link to Canadian news content. The bill became law on June 22.

Naturally, Meta and Google have warned that they will block links to Canadian news media from their services when the bill comes into force six months hence. They also intend to withdraw their ongoing programs to support the Canadian press. In response, the Canadian government has pulled its own advertising from Meta platforms Facebook and Instagram. Much hyperbolic silliness is taking place.

Pretty much everyone who is not the Canadian government thinks the bill is misconceived. Canadian publishers will lose traffic, not gain revenues, and no one will be happy. In Australia, the main beneficiary appears to be Rupert Murdoch, with whom Google signed a three-year agreement in 2021 and who is hardly the sort of independent local media some hoped would benefit. Unhappily, the state of California wants in on this game; its in-progress Journalism Preservation Act also seeks to require Big Tech to pay a “journalism usage fee”.

The result is to continue to undermine the open Internet, in which the link is fundamental to sharing information. If things aren’t being (pay)walled off, blocked for copyright/geography, or removed for corporate reasons – the latest announced casualty is the GIF hosting site Gfycat – they’re being withheld to avoid compliance requirements or withdrawn for tax reasons. None of us are better off for any of this.

***

Those with long memories will recall that in 2011 IBM’s giant computer, Watson, beat the top champions at the TV game show Jeopardy. IBM predicted a great future for Watson as a medical diagnostician.

By 2019, that projected future was failing. “Overpromised and underdelivered,” ran an IEEE Spectrum headline. IBM is still trying, and is hoping for success with cancer diagnosis.

Meanwhile, Watson has a new (marketing) role: analyzing the draw and providing audio and text commentary for back-court tennis matches at Wimbledon and for highlights clips. For each match, Watson also calculates the competitors’ chances of winning and the favorability of their draw. For a veteran tennis watcher, it’s unsatisfying, though: IBM offers only a black box score, and nothing to show how that number was reached. At least human commentators tell you – albeit at great, repetitive length – the basis of their reasoning.

Illustrations: IBM’s Watson, which beat two of Jeopardy‘s greatest champions in 2011.


Unclear and unpresent dangers

Monthly computer magazines used to fret that their news pages would be out of date by the time the new issue reached readers. This week in AI, a blog posting is out of date before you hit send.

This – Friday – morning, the Italian data protection authority, Il Garante, has ordered ChatGPT to stop processing the data of Italian users until it complies with the General Data Protection Regulation. Il Garante’s objections, per Apple’s translation, posted by Ian Brown: ChatGPT provides no legal basis for collecting and processing its massive store of the personal data used to train the model, and it fails to filter out users under 13.

This may be the best possible answer to the complaint I’d been writing below.

On Wednesday, the Future of Life Institute published an open letter calling for a six-month pause on developing systems more powerful than OpenAI’s current state of the art, GPT-4. Barring Elon Musk, Steve Wozniak, and Skype co-founder Jaan Tallinn, most of the signatories are unfamiliar names to most of us, though the companies and institutions they represent aren’t – Pinterest, the MIT Center for Artificial Intelligence, UC Santa Cruz, Ripple, ABN-Amro Bank. Almost immediately, there was a dispute over the validity of the signatures.

My first reaction was on the order of: huh? The signatories are largely people who are inventing this stuff. They don’t have to issue a call. They can just *stop*, work to constrain the negative impacts of the services they provide, and lead by example. Or isn’t that sufficiently performative?

A second reaction: what about all those AI ethics teams that Silicon Valley companies are disbanding? Just in the last few weeks, these teams have been axed or cut at Microsoft and Twitch; Twitter of course ditched such fripperies last November in Musk’s inaugural wave of cost-cutting. The letter does not call to reinstate these.

The problem, as familiar critics such as Emily Bender pointed out almost immediately, is that the threats the letter focuses on are distant not-even-thunder. As she went on to say in a Twitter thread, the artificial general intelligence of the Singularitarian’s rapture is nowhere in sight. By focusing on distant threats – longtermism – we ignore the real and present problems whose roots are being continuously more deeply embedded into the new-building infrastructure: exploited workers, culturally appropriated data, lack of transparency around the models and algorithms used to build these systems… basically, all the ways they impinge upon human rights.

This isn’t the first time such a letter has been written and circulated. In 2015, Stephen Hawking, Musk, and about 150 others similarly warned of the dangers of the rise of “superintelligences”. Just a year later, in 2016, ProPublica investigated the algorithm behind COMPAS, a risk-scoring criminal justice system in use in US courts in several states. Under Julia Angwin’s scrutiny, the algorithm failed at both accuracy and fairness; it was heavily racially biased. *That*, not some distant fantasy, was the real threat to society.

“Threat” is the key issue here. This is, at heart, a letter about a security issue, and solutions to security issues are – or should be – responses to threat models. What is *this* threat model, and what level of resources to counter it does it justify?

Today, I’m far more worried by the release onto public roads of Teslas running Full Self-Driving helmed by drivers with an inflated sense of the technology’s reliability than I am about all of human work being wiped away any time soon. This matters because, as Jessie Singer, author of There Are No Accidents, keeps reminding us, what we call “accidents” are the results of policy decisions. If we ignore the problems we are presently building in favor of fretting about a projected fantasy future, that, too, is a policy decision, and the collateral damage is not an accident. Can’t we do both? I imagine people saying. Yes. But only if we *do* both.

In a talk this week for a group at the French international research group AI Act. This effort began well before today’s generative tools exploded into public consciousness, and isn’t likely to conclude before 2024. It is, therefore, much more focused on the kinds of risks attached to public sector scandals like COMPAS and those documented in Cathy O’Neil’s 2017 book Weapons of Math Destruction, which laid bare the problems with algorithmic scoring with little to tether it to reality.

With or without a moratorium, what will “AI” look like in 2024? It has changed out of recognition just since the last draft text was published. Prediction from this biological supremacist: it still won’t be sentient.

All this said, as Edwards noted, even if the letter’s proposal is self-serving, a moratorium on development is not necessarily a bad idea. It’s just that if the risk is long-term and existential, what will six months do? If the real risk is the hidden continued centralization of data and power, then those six months could be genuinely destructive. So far, it seems like its major function is as a distraction. Resist.

Illustrations: IBM’s Watson, which beat two of Jeopardy‘s greatest champions in 2011. It has since failed to transform health care.