Small data

Shortly before this gets posted, Jon Crowcroft and I will have presented this year’s offering at Gikii, the weird little conference that crosses law, media, technology, and pop culture. This is what we will probably have said, as I understand it, with some added explanation for the slightly less technical audience I imagine will read this.

Two years ago, a team of four researchers – Timnit Gebru, Emily Bender, Margaret Mitchell (writing as Shmargaret Shmitchell), and Angelina McMillan-Major – wrote a now-famous paper called On the Dangers of Stochastic Parrots (PDF) calling into question the usefulness of the large language models (LLMs) that have caused so much ruckus this year. The “Stochastic Four” argued instead for small models built on carefully curated data: less prone to error, less exploitative of people’s data, less damaging to the planet. Gebru got fired over this paper; Google also fired Mitchell soon afterwards. Two years later, neural networks pioneer Geoff Hinton quit Google in order to voice similar concerns.

Despite the hype, LLMs have many problems. They are fundamentally an extractive technology and are resource-intensive. Building LLMs requires massive amounts of training data; so far, the companies have been unwilling to acknowledge their sources, perhaps because (as is happening already) they fear copyright suits.

More important from a technical standpoint is the issue of model collapse: models degrade when they begin to ingest synthetic AI-generated data instead of human input. We’ve seen this before with Google Flu Trends, which degraded rapidly as incoming new search data included many searches on flu-like symptoms that weren’t actually flu, and others that simply reflected the frequency of local news coverage. “Data pollution”, as LLM-generated data fills the web, will mean that the web becomes an increasingly useless source of training data for future generations of generative AI. Lots more noise, drowning out the signal (in the photo above, the signal would be the parrot).

Instead, if we follow the lead of the Stochastic Four, the more productive approach is small data – small, carefully curated datasets that train models to match specific goals. Far less resource-intensive, far fewer issues with copyright, appropriation, and extraction.

We know what the LLM future looks like in outline: big, centralized services, because no one else will be able to amass enough data. In that future, surveillance capitalism is an essential part of data gathering. Small language model (SLM) futures could look quite different: decentralized, with realigned incentives. At one point, we wanted to suggest that small data could bring the end of surveillance capitalism; that’s probably an overstatement. But small data could certainly create the ecosystem in which the case for mass data collection would be less compelling.

Jon and I imagined four primary alternative futures: federation, personalization, some combination of those two, and paradigm shift.

Precursors to a federated small data future already exist; these include customer service chatbots and predictive text assistants. In this future, we could imagine personalized LLM servers designed to serve specific needs.

An individualized future might look something like I suggested here in March: a model that fits in your pocket that is constantly updated with material of your own choosing. Such a device might be the closest yet to Vannevar Bush’s 1945 idea of the Memex (PDF), updated for the modern era by automating the dozens of secretary-curators he imagined doing the grunt work of labeling and selection. That future again has precursors in techniques for sharing the computation but not the data, a design we see proposed for health care, where the data is too sensitive to share unless there’s a significant public interest (as in pandemics or very rare illnesses), or in other data analysis designs intended to protect privacy.
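The "sharing the computation but not the data" design mentioned above is the core idea behind federated learning. Here is a minimal, illustrative sketch (the toy model, data, and averaging scheme are all my own assumptions, not anything from the proposals referenced): each site fits a model on its own records, and only the fitted parameters travel.

```python
# A toy sketch of "share the computation, not the data": each site trains
# locally on records it never discloses; only model parameters are pooled.
# The linear model, the data, and all constants here are illustrative.

def local_update(weights, data, lr=0.01, epochs=50):
    """One site's gradient-descent update for y = w*x, using only its own data."""
    w = weights
    for _ in range(epochs):
        grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(sites, rounds=10):
    """Average locally trained weights; raw records never leave their site."""
    w = 0.0
    for _ in range(rounds):
        w = sum(local_update(w, data) for data in sites) / len(sites)
    return w

# Three "hospitals", each privately holding samples of the same y ≈ 3x relationship
sites = [
    [(1, 3.1), (2, 5.9)],
    [(3, 9.2), (4, 11.8)],
    [(5, 15.1), (6, 17.9)],
]
w = federated_average(sites)
print(round(w, 1))  # converges near 3.0 without pooling any raw data
```

The privacy-relevant point is in the data flow: the coordinating party only ever sees weights, which is what makes the design attractive for health data too sensitive to centralize.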

In 2007, the science fiction writer Charles Stross suggested something like this, though he imagined it as a comprehensive life log, which he described as a “google for real life”. So this alternative future would look something like Stross’s $10 pocket life log, enhanced with statistics-based data analytics.

Imagining what a paradigm shift might look like is much harder. That’s the kind of thing science fiction writers do; it’s 16 years since Stross gave that life log talk. However, in his 2016 history of advertising, The Attention Merchants, Columbia professor Tim Wu argued that industrialization was the vector that made advertising and its grab for our attention part of commerce. A hundred and fifty-odd years later, the centralizing effects of industrialization are being challenged, starting with energy, via renewables and local power generation, and social media, via the fediverse. Might language models also play their part in bringing about a new, more collaborative and cooperative society?

It is, in other words, just possible that the hot new technology of 2023 is simply a dead end bringing little real change. It’s happened before. There have been, as Wu recounts, counter-moves and movements before, but they didn’t have the technological affordances of our era.

In the Q&A that followed, Miranda Mowbray pointed out that companies are trying to implement the individualized model, but that it’s impossible to do unless there are standardized data formats, and even then hard to do at scale.

Illustrations: Spot the parrot seen in a neighbor’s tree.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

The data grab

It’s been a good week for those who like mocking flawed technology.

Numerous outlets have reported, for example, that “AI is getting dumber at math”. The source is a study conducted by researchers at Stanford and the University of California Berkeley comparing GPT-3.5’s and GPT-4’s output in March and June 2023. The researchers found that, among other things, GPT-4’s success rate at identifying prime numbers dropped from 84% to 51%. In other words, by June 2023 GPT-4 was doing little better than a coin flip at identifying prime numbers.

The researchers blame “drift”, the problem that improving one part of a model may have unhelpful knock-on effects in other parts of the model. At Ars Technica, Benj Edwards is less sure, citing qualified critics who question the study’s methodology. It’s equally possible, he suggests, that as the novelty fades, people’s attempts to do real work surface problems that were there all along. With no access to the algorithm itself and limited knowledge of the training data, we can only conduct such studies by controlling inputs and observing the outputs, much like diagnosing allergies by giving a child a series of foods in turn and waiting to see which ones make them sick. Edwards advocates greater openness on the part of the companies, especially as software developers begin building products on top of their generative engines.
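That controlled-inputs-and-observed-outputs audit can be made concrete. The sketch below is my own illustration, not the Stanford/Berkeley methodology: `ask_model` stands in for whatever API is being queried, and the degenerate baseline shows why an accuracy figure only means something once you know the base rate of the test set.

```python
# Black-box evaluation: fix the inputs, score the outputs.
# `ask_model` is a placeholder for the opaque system under test.

def is_prime(n):
    """Ground truth by trial division (fine for small n)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def audit(ask_model, numbers):
    """Fraction of prime/composite judgments the model gets right."""
    correct = sum(ask_model(n) == is_prime(n) for n in numbers)
    return correct / len(numbers)

# A degenerate "model" that always answers "composite" scores the base rate:
numbers = list(range(2, 102))            # 2..101: 26 primes, 74 composites
accuracy = audit(lambda n: False, numbers)
print(accuracy)  # 0.74
```

On this test set a model that never says "prime" scores 74%, so a reported accuracy is meaningless without knowing how the test numbers were chosen, one of the methodological points the study's critics raised.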

Unrelated, the New Zealand discount supermarket chain Pak’nSave offered an “AI” meal planner that, set loose, promptly began turning out recipes for “poison bread sandwiches”, “Oreo vegetable stir-fry”, and “aromatic water mix” – which turned out to be a recipe for highly dangerous chlorine gas.

The reason is human-computer interaction: humans, told to provide a list of available ingredients, predictably became creative. As for the computer…anyone who’s read Janelle Shane’s 2019 book, You Look Like a Thing and I Love You, or her Twitter reports on AI-generated recipes could predict this outcome. Computers have no real-world experience against which to judge their output!

Meanwhile, the San Francisco Chronicle reports, Waymo and Cruise driverless taxis are making trouble at an accelerating rate. The cars have gotten stuck in low-hanging wires after thunderstorms, driven through caution tape, blocked emergency vehicles and emergency responders, and behaved erratically enough to endanger cyclists, pedestrians, and other vehicles. If they were driven by humans they’d have lost their licenses by now.

In an interesting side note that reminds us of the cars’ potential as a surveillance network, Axios reports that in a ten-day study in May, Waymo’s driverless cars found that human drivers in San Francisco speed 33% of the time. A similar exercise in Phoenix, Arizona observed human drivers speeding 47% of the time on roads with a 35mph speed limit. These statistics of course bolster the company’s main argument for adoption: improving road safety.

The study should – but probably won’t – be taken as a warning of the potential for the cars’ data collection to become embedded in both law enforcement and their owners’ business models. The frenzy surrounding ChatGPT-* is fueling an industry-wide data grab as everyone tries to beef up their products with “AI” (see also previous such exercises with “meta”, “nano”, and “e”), consequences to be determined.

Among the newly-discovered data grabbers is Intel, whose graphics processing unit (GPU) drivers are collecting telemetry data, including how you use your computer, the kinds of websites you visit, and other data points. You can opt out, assuming you a) realize what’s happening and b) are paying attention at the right moment during installation.

Google announced recently that it would scrape everything people post online to use as training data. Again, an opt-out can be had if you have the knowledge and access to follow the 30-year-old robots.txt protocol. In practical terms, I can configure my own site, pelicancrossing.net, to block Google’s data grabber, but I can’t stop it from scraping comments I leave on other people’s blogs or anything I post on social media sites or that’s professionally published (though those sites may block Google themselves). This data repurposing feels like it ought to be illegal under data protection and copyright law.
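For readers unfamiliar with it, the robots.txt protocol is just a plain text file at a site's root that asks crawlers, by user-agent name, to stay away; compliance is voluntary. The crawler token below is illustrative only, since each company documents its own:

```
# Example robots.txt at https://example.com/robots.txt
# "ExampleAIBot" is a hypothetical AI-training crawler token.
User-agent: ExampleAIBot
Disallow: /

# Everything else may still crawl normally.
User-agent: *
Allow: /
```

Note the limits: this only covers sites you control, and only crawlers that choose to honor it, which is exactly why opting out of scraping elsewhere on the web is impossible.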

In Australia, Gizmodo reports that Google has asked the Australian government to relax copyright laws to facilitate AI training.

Soon after Google’s announcement, the law firm Clarkson filed a class action lawsuit against Google to join its action against OpenAI. The suit accuses Google of “stealing” copyrighted works and personal data.

“Google does not own the Internet,” Clarkson wrote in its press release. Will you tell it, or shall I?

Whatever has been going on until now with data slurping in the interests of bombarding us with microtargeted ads is small stuff compared to the accelerating acquisition for the purpose of feeding AI models. Arguably, AI could be a public good in the long term as it improves, and therefore allowing these companies to access all available data for training is in the public interest. But if that’s true, then the *public* should own the models, not the companies. Why should we consent to the use of our data so they can sell it back to us and keep the proceeds for their shareholders?

It’s all yet another example of why we should pay attention to the harms that are clear and present, not the theoretical harm that someday AI will be general enough to pose an existential threat.

Illustrations: IBM Watson, Jeopardy champion.

Wendy M. Grossman is the 2013 winner of the Enigma Award and contributing editor for the Plutopia News Network podcast. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon.

Solidarity

Whatever you’re starting to binge-watch, slow down. It’s going to be a long wait for fresh content out of Hollywood.

Yesterday, the actors union, SAG-AFTRA, went out on strike alongside the members of the Writers Guild of America, who have been walking picket lines since May 2. Like the writers, actors have seen their livelihoods shrink as US TV shows’ seasons shorten, “reruns” that pay residuals fade into the past, and DVD royalties dry up, while royalties from streaming remain tiny by comparison. At the Hollywood and Levine podcast, the veteran screenwriter Ken Levine gives the background to the WGA’s action. But think of it this way: the writers and cast of The Big Bang Theory may be the last to share fairly in the enormous profits their work continues to generate.

The even bigger threat? AI that makes it possible to capture the actor’s likeness and then reuse it ad infinitum in new work. This, as Malia Mendez writes at the LA Times, is the big fear. In a world where Harrison Ford at 80 is making movies in which he’s aged down to look 40 and James Earl Jones has agreed to clone his voice for reuse after his death, it’s arguably a rational big fear.

We’ve had this date for a long time. In the late 1990s I saw a demonstration of “vactors” – virtual actors that were created by scanning a human actor moving in various ways and building a library of movements that thereafter could be rendered at will. At the time, the state of the art was not much advanced from the liquid metal man in Terminator 2. Rendering film-quality characters was very slow, but that was then and this is now, and how long before rendering moving humans can be done in high-def in real-time at action speed?

The studios are already pushing actors into allowing synthesized reuse. California law grants public figures, including actors, publicity rights that prevent the commercial use of their name and likeness without consent. However, Mendez reports that current contracts already require actors to waive those rights to grant the studios digital simulation or digital creation rights. The effects are worst in reality television, where the line is blurred between the individual as a character on a TV show and the individual in their off-screen life. She quotes lawyer Ryan Schmidt: “We’re at this Napster 2001 moment…”

That moment is even closer for voice actors. Last year, Actors Equity announced a campaign to protect voice actors from their synthesized counterparts. This week, one of those synthesizers is providing commentary – more like captions, really – for video clips like this one at Wimbledon. As I said last year, while synthesized voices will be good enough for many applications such as railway announcements, there are lots of situations that will continue to require real humans. Sports commentary is one; commentators aren’t just there to provide information, they’re *also* there to sell the game. Their human excitement at the proceedings is an important part of that.

So SAG-AFTRA, like the Writers Guild of America, is seeking limitations on how studios may use AI, payment for such uses, and rules on protecting against misuse. In another LA Times story, Anoushka Sakoui reports that the studios’ offer included requiring “a performer’s consent for the creation and use of digital replicas or for digital alterations of a performance”. Like publishers “offering” all-rights-in perpetuity contracts to journalists and authors since the 1990s, the studios are trying to ensure they have all the rights they could possibly want.

“You cannot change the business model as much as it has changed and not expect the contract to change, too,” SAG-AFTRA president Fran Drescher said yesterday in a speech that has been widely circulated.

It was already clear this is going to be a long strike that will damage tens of thousands of industry workers and the economy of California. Earlier this week, Dominic Patten reported at Deadline that the Association of Movie and Television Producers plans to delay resuming talks with the WGA until October. By then, Patten reports producers saying, writers will be losing their homes and be more amenable to accepting the AMPTP’s terms. The AMPTP officially denies this, saying it’s committed to reaching a deal. Nonetheless, there are no ongoing talks. As Ken Levine pointed out in a pair of blogposts written during the 2007 writers strike, management is always in control of timing.

But as Levine also says, in the “old days” a top studio mogul could simply say, “Let’s get this done” and everyone would get around the table and make a deal. The new presence of tech giants Netflix, Amazon, and Apple in the AMPTP membership makes this time different. At some point, the strike will be too expensive for legacy Hollywood studios. But for Apple, TV production is a way to sell services and hardware. For Amazon, it’s a perk that comes with subscribing to its Prime delivery service. Only Netflix needs a constant stream of new work – and it can commission it from creators across the globe. All three of them can wait. And the longer they drag this out, the more the traditional studios will lose money and weaken as competitors.

Legacy Hollywood doesn’t seem to realize it yet, but this strike is existential for them, too.

Illustrations: SAG-AFTRA president Fran Drescher, announcing the strike on Thursday.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon.

Watson goes to Wimbledon

The launch of the Fediverse-compatible Meta app Threads seems to have slightly overshadowed the European Court of Justice’s ruling earlier in the week, which deserves more attention: it undermines the basis of Meta’s targeted advertising. In noyb’s initial reaction, data protection legal bulldog Max Schrems suggests the judgment will make life difficult for not just Meta but other advertising companies.

As Alex Scroxton explains at Computer Weekly, the ruling rejects several different claims by Meta that all attempt to bypass the requirement enshrined in the General Data Protection Regulation that where there is no legal basis for data processing users must actively consent. Meta can’t get by with claiming that targeted advertising is a part of its service users expect, or that it’s technically necessary to provide its service.

More interesting is the fact that the original complaint was not filed by a data protection authority but by Germany’s antitrust body, which sees Meta’s our-way-or-get-lost approach to data gathering as abuse of its dominant position – and the CJEU has upheld this idea.

All this is presumably part of why Meta decided to roll out Threads in many countries but *not* the EU. In February, as a consequence of Brexit, Meta moved UK users to its US agreements. The UK’s data protection law is a clone of GDPR and will remain so until and unless the British Parliament changes it via the pending Data Protection and Digital Information bill. Still, it seems the move leaves Meta ready to exploit such changes if they do occur.

Warning to people with longstanding Instagram accounts who want to try Threads: if your plan is to try and (maybe) delete, set up a new Instagram account for the purpose. Otherwise, you’ll be sad to discover that deleting your new Threads account means vaporizing your old Instagram account along with it. It’s the Hotel California method of Getting Big Fast.

***

Last week the Irish Council for Civil Liberties warned that a last-minute amendment to the Courts and Civil Law (Miscellaneous) bill will allow Ireland’s Data Protection Commissioner to mark any of its proceedings “confidential” and thereby bar third parties from publishing information about them. Effectively, it blocks criticism. This is a muzzle not only for the ICCL and other activists and journalists but for aforesaid bulldog Schrems, who has made a career of pushing the DPC to enforce the law it was created to enforce. He keeps winning in court, too, which I’m sure must be terribly annoying.

The Irish DPC is an essential resource for everyone in Europe because Ireland is the European home of so many of American Big Tech’s subsidiaries. So this amendment – which reportedly passed the Oireachtas (Ireland’s parliament) – is an alarming development.

***

Over the last few years Canadian law professor Michael Geist has had plenty of complaints about Canada’s Online News Act, aka C-18. Like the Australian legislation it emulates, C-18 requires intermediaries like Facebook and Google to negotiate and pay for licenses to link to Canadian news content. The bill became law on June 22.

Naturally, Meta and Google have warned that they will block links to Canadian news media from their services when the bill comes into force six months hence. They also intend to withdraw their ongoing programs to support the Canadian press. In response, the Canadian government has pulled its own advertising from Meta platforms Facebook and Instagram. Much hyperbolic silliness is taking place.

Pretty much everyone who is not the Canadian government thinks the bill is misconceived. Canadian publishers will lose traffic, not gain revenues, and no one will be happy. In Australia, the main beneficiary appears to be Rupert Murdoch, with whom Google signed a three-year agreement in 2021 and who is hardly the sort of independent local media some hoped would benefit. Unhappily, the state of California wants in on this game; its in-progress Journalism Preservation Act also seeks to require Big Tech to pay a “journalism usage fee”.

The result is to continue to undermine the open Internet, in which the link is fundamental to sharing information. If things aren’t being (pay)walled off, blocked for copyright/geography, or removed for corporate reasons – the latest announced casualty is the GIF hosting site Gfycat – they’re being withheld to avoid compliance requirements or withdrawn for tax reasons. None of us are better off for any of this.

***

Those with long memories will recall that in 2011 IBM’s giant computer, Watson, beat the top champions at the TV game show Jeopardy. IBM predicted a great future for Watson as a medical diagnostician.

By 2019, that projected future was failing. “Overpromised and underdelivered,” ran an IEEE Spectrum headline. IBM is still trying, and is hoping for success with cancer diagnosis.

Meanwhile, Watson has a new (marketing) role: analyzing the draw and providing audio and text commentary for back-court tennis matches at Wimbledon and for highlights clips. For each match, Watson also calculates the competitors’ chances of winning and the favorability of their draw. For a veteran tennis watcher, it’s unsatisfying, though: IBM offers only a black box score, and nothing to show how that number was reached. At least human commentators tell you – albeit at great, repetitive length – the basis of their reasoning.

Illustrations: IBM’s Watson, which beat two of Jeopardy‘s greatest champions in 2011.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Twitter.

Own goals

There’s no point in saying I told you so when the people you’re saying it to got the result they intended.

At the Guardian, Peter Walker reports the Electoral Commission’s finding that at least 14,000 people were turned away from polling stations in May’s local elections because they didn’t have the right ID as required under the new voter ID law. The Commission thinks that’s a huge underestimate; 4% of people who didn’t vote said it was because of voter ID – which Walker suggests could mean 400,000 were deterred. Three-quarters of those lacked the right documents; the rest opposed the policy. The demographics of this will be studied more closely in a report due in September, but early indications are that the policy disproportionately deterred people with disabilities, people from certain ethnic groups, and people who are unemployed.

The fact that the Conservatives, who brought in this policy, lost big time in those elections doesn’t change its wrongness. But it did lead the MP Jacob Rees-Mogg (Con-North East Somerset) to admit that this was an attempt to gerrymander the vote that backfired because older voters, who are more likely to vote Conservative, also disproportionately don’t have the necessary ID.

***

One of the more obscure sub-industries is the business of supplying ad services to websites. One such little-known company is Criteo, which provides interactive banner ads that are generated based on the user’s browsing history and behavior using a technique known as “behavioral retargeting”. In 2018, Criteo was one of seven companies listed in a complaint Privacy International and noyb filed with three data protection authorities – the UK, Ireland, and France. In 2020, the French data protection authority, CNIL, launched an investigation.

This week, CNIL issued Criteo with a €40 million fine over failings in how it gathers user consent, a ruling noyb calls a major blow to Criteo’s business model.

It’s good to see the legal actions and fines beginning to reach down into adtech’s underbelly. It’s also worth noting that the CNIL was willing to fine a *French* company to this extent. It makes it harder for the US tech giants to claim that the fines they’re attracting are just anti-US protectionism.

***

Also this week, the US Federal Trade Commission announced it’s suing Amazon, claiming the company enrolled millions of US consumers into its Prime subscription service through deceptive design and sabotaged their efforts to cancel.

“Amazon used manipulative, coercive, or deceptive user-interface designs known as “dark patterns” to trick consumers into enrolling in automatically-renewing Prime subscriptions,” the FTC writes.

I’m guessing this is one area where data protection laws have worked. In my UK-based, ultra-brief Prime outings to watch the US Open tennis, canceling has taken at most two clicks. I don’t recognize the tortuous process Business Insider documented in 2022.

***

It has long been no secret that the secret behind AI is human labor. In 2019, Mary L. Gray and Siddharth Suri documented this in their book Ghost Work. Platform workers label images and other content, annotate text, and solve CAPTCHAs to help train AI models.

At MIT Technology Review, Rhiannon Williams reports that platform workers are using ChatGPT to speed up their work and earn more. A team of researchers from the Swiss Federal Institute of Technology found (PDF) that between 33% and 46% of the 44 workers they tested with a request to summarize 16 extracts from medical research papers used AI models to complete the task.

It’s hard not to feel a little gleeful that today’s “AI” is already eating itself via a closed feedback loop. It’s not good news for platform workers, though, because the most likely consequence will be increased monitoring to force them to show their work.

But this is yet another case in which computer people could have learned from their own history. In 2008, researchers at Google published a paper suggesting that Google search data could be used to spot flu outbreaks. Sick people searching for information about their symptoms could provide real-time warnings ten days earlier than the Centers for Disease Control could.

This actually worked, some of the time. However, as Kaiser Fung reported at Harvard Business Review in 2014, as early as 2009 Google Flu Trends missed the swine flu pandemic; in 2012, researchers found that it had overestimated the prevalence of flu for 100 out of the previous 108 weeks. More data is not necessarily better, Fung concluded.

In 2013, as David Lazer and Ryan Kennedy reported for Wired in 2015 in discussing their investigation into the failure of this idea, GFT missed by 140% (without explaining what that means). Lazer and Kennedy found that Google’s algorithm was vulnerable to poisoning by unrelated seasonal search terms and search terms that were correlated purely by chance, and that it failed to take into account changing user behavior, as when Google introduced autosuggest and added health-related search terms. The “availability” cognitive bias also played a role: when flu is in the news, searches go up whether or not people are sick.

While the parallels aren’t exact, large language modelers could have drawn the lesson that users can poison their models. ChatGPT’s arrival for widespread use will inevitably thin out the proportion of text that is human-written – and taint the well from which LLMs drink. Everyone imagines the next generation’s increased power. But it’s equally possible that the next generation will degrade as the percentage of AI-generated data rises.
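The degradation mechanism can be shown with a toy simulation. This is my own illustrative sketch, not a model of any real LLM: "training" is just fitting a mean and spread, and the 1.5-sigma cutoff stands in for generative models' bias toward typical outputs. Each generation trains on the previous generation's output, and the rare, extreme values — the tails — vanish.

```python
import random

random.seed(42)

def fit(samples):
    """'Training' here is just fitting a mean and standard deviation."""
    m = sum(samples) / len(samples)
    return m, (sum((x - m) ** 2 for x in samples) / len(samples)) ** 0.5

def sample(model, n):
    """Generate synthetic data. Models favor typical outputs, so rare,
    extreme values are under-produced; the 1.5-sigma cutoff mimics that."""
    m, sd = model
    out = []
    while len(out) < n:
        x = random.gauss(m, sd)
        if abs(x - m) <= 1.5 * sd:
            out.append(x)
    return out

human_data = [random.gauss(0, 1) for _ in range(5000)]
model = fit(human_data)
for generation in range(8):
    model = fit(sample(model, 5000))      # train only on the previous model's output
    print(generation, round(model[1], 3))  # the spread shrinks every generation
```

After a handful of generations the fitted spread has collapsed to a fraction of the original: the "web" this toy model sees has lost its outliers, which is the noise-drowns-signal problem in miniature.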

Illustrations: Drunk parrot seen in a Putney garden (by Simon Bisson).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon or Twitter.

Unclear and unpresent dangers

Monthly computer magazines used to fret that their news pages would be out of date by the time the new issue reached readers. This week in AI, a blog posting is out of date before you hit send.

This – Friday – morning, the Italian data protection authority, Il Garante, has ordered ChatGPT to stop processing the data of Italian users until it complies with the General Data Protection Regulation. Il Garante’s objections, per Apple’s translation as posted by Ian Brown, are that ChatGPT provides no legal basis for collecting and processing the massive store of personal data used to train the model, and that it fails to filter out users under 13.

This may be the best possible answer to the complaint I’d been writing below.

On Wednesday, the Future of Life Institute published an open letter calling for a six-month pause on developing systems more powerful than OpenAI’s current state of the art, GPT-4. Apart from Elon Musk, Steve Wozniak, and Skype co-founder Jaan Tallinn, most of the signatories are unfamiliar names to most of us, though the companies and institutions they represent aren’t – Pinterest, the MIT Center for Artificial Intelligence, UC Santa Cruz, Ripple, ABN-Amro Bank. Almost immediately, there was a dispute over the validity of the signatures.

My first reaction was on the order of: huh? The signatories are largely people who are inventing this stuff. They don’t have to issue a call. They can just *stop*, work to constrain the negative impacts of the services they provide, and lead by example. Or isn’t that sufficiently performative?

A second reaction: what about all those AI ethics teams that Silicon Valley companies are disbanding? Just in the last few weeks, these teams have been axed or cut at Microsoft and Twitch; Twitter of course ditched such fripperies last November in Musk’s inaugural wave of cost-cutting. The letter does not call to reinstate these.

The problem, as familiar critics such as Emily Bender pointed out almost immediately, is that the threats the letter focuses on are distant not-even-thunder. As she went on to say in a Twitter thread, the artificial general intelligence of the Singularitarians’ rapture is nowhere in sight. By focusing on distant threats – longtermism – we ignore the real and present problems whose roots are being embedded ever more deeply into the infrastructure being built now: exploited workers, culturally appropriated data, lack of transparency around the models and algorithms used to build these systems… basically, all the ways they impinge upon human rights.

This isn’t the first time such a letter has been written and circulated. In 2015, Stephen Hawking, Musk, and about 150 others similarly warned of the dangers of the rise of “superintelligences”. Just a year later, in 2016, ProPublica investigated the algorithm behind COMPAS, a risk-scoring criminal justice system in use in US courts in several states. Under Julia Angwin’s scrutiny, the algorithm failed at both accuracy and fairness; it was heavily racially biased. *That*, not some distant fantasy, was the real threat to society.

“Threat” is the key issue here. This is, at heart, a letter about a security issue, and solutions to security issues are – or should be – responses to threat models. What is *this* threat model, and what level of resources does countering it justify?

Today, I’m far more worried by the release onto public roads of Teslas running Full Self Drive, helmed by drivers with an inflated sense of the technology’s reliability, than I am about all of human work being wiped away any time soon. This matters because, as Jessie Singer, author of There Are No Accidents, keeps reminding us, what we call “accidents” are the results of policy decisions. If we ignore the problems we are presently building in favor of fretting about a projected fantasy future, that, too, is a policy decision, and the collateral damage is not an accident. Can’t we do both? I imagine people saying. Yes. But only if we *do* both.

In a talk this week for a French international research group, the legal scholar Lilian Edwards discussed the EU’s in-progress AI Act. This effort began well before today’s generative tools exploded into public consciousness, and isn’t likely to conclude before 2024. It is, therefore, much more focused on the kinds of risks attached to public sector scandals like COMPAS and those documented in Cathy O’Neil’s 2016 book Weapons of Math Destruction, which laid bare the problems with algorithmic scoring with little to tether it to reality.

With or without a moratorium, what will “AI” look like in 2024? It has changed beyond recognition just since the last draft text was published. Prediction from this biological supremacist: it still won’t be sentient.

All this said, as Edwards noted, even if the letter’s proposal is self-serving, a moratorium on development is not necessarily a bad idea. It’s just that if the risk is long-term and existential, what will six months do? If the real risk is the hidden continued centralization of data and power, then those six months could be genuinely destructive. So far, it seems like its major function is as a distraction. Resist.

Illustrations: IBM’s Watson, which beat two of Jeopardy’s greatest champions in 2011. It has since failed to transform health care.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon or Twitter.

Memex 2.0

As language models get cheaper, it’s dawned on me what kind of “AI” I’d like to have: a fully personalized chatbot that has been trained on my 30-plus years of output, plus all the material I’ve read, watched, listened to, and taken notes on all these years. A clone of my brain, basically, with more complete and accurate memory, updated alongside my own. Then I could discuss with it: what’s interesting to write about for this week’s net.wars?

I was thinking of what’s happened with voice synthesis. In 2011, it took the Scottish company Cereproc months to build a text-to-speech synthesizer from recordings of Roger Ebert’s voice. Today, voice synthesizers are all over the place – not personalized like Ebert’s, but able to read a set text plausibly enough to scare voice actors.

I was also thinking of the Stochastic Parrots paper, whose second anniversary was celebrated last week by authors Emily Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell. An important part of the paper advocates for smaller, better-curated language models: more is not always better. I can’t find a stream for the event, but here’s the reading list collected during the proceedings. There’s lots I’d rather eliminate from my personal assistant. Eliminating unwanted options upfront has long been a widespread Internet failure, from shopping sites (“never show me pet items”) to news sites (“never show me fashion trends”). But that sort of selective display is more difficult and expensive than including everything and offering only inclusion filters.

A computational linguistics expert tells me that we’re an unknown amount of time away from my dream of the wg-bot. Probably, if such a thing becomes possible, it will be based on someone’s large language model and fine-tuned with my stuff. I’m not sure I entirely like this idea; it means the model will be trained on stuff I haven’t chosen or vetted and whose source material is unknown, unless we get a grip on forcing disclosure or the proposed BLOOM academic open source language model takes over the world.

I want to say that one advantage to training a chatbot on your own output is you don’t have to worry so much about copyright. However, the reality is that most working writers have sold all rights to most of their work to large publishers, which means that such a system is a new version of digital cholera. In my own case, by the time I’d been in this business for 15 years, more than half of the publications I’d written for were defunct. I was lucky enough to retain at least non-exclusive rights to my most interesting work, but after so many closures and sales I couldn’t begin to guess – or even know how to find out – who owns the rights to the rest of it. The question is moot in any case: unless I choose to put those group reviews of Lotus 1-2-3 books back online, probably no one else will, and if I do no one will care.

On Mastodon, Edwards raised the specter of the upcoming new! improved! version of the copyright wars launched by the arrival of the Internet: “The real generative AI copyright wars aren’t going to be these tiny skirmishes over artists and Stability AI. Its going to be a war that puts filesharing 2.0 and the link tax rolled into one in the shade.” Edwards is referring to this case, in which artists are demanding billions from the company behind the Stable Diffusion engine.

Edwards went on to cite a Wall Street Journal piece that discusses publishers’ alarmed response to what they perceive as new threats to their business. First: that the large piles of data used to train generative “AI” models are appropriated without compensation. This is the steroid-fueled analogue to the link tax, under which search engines in Australia pay newspapers (primarily the Murdoch press) for including them in news search results. A similar proposal is pending in Canada.

The second is that users, satisfied with the answers they receive from these souped-up search services, will no longer bother to visit the sources – especially since few, most notably Google, seem inclined to offer citations to back up any of the things they say.

The third is outright plagiarism, without credit, in the chatbots’ output – which is already happening.

The fourth point of contention is whether the results of generative AI should themselves be subject to copyright. So far, the consensus appears to be no, when it comes to artwork. But some publishers who have begun using generative chatbots to create “content” will no doubt claim copyright in the results. It might make more sense to copyright the *prompt*. (And some bright corporate non-soul may yet try.)

At Walled Culture, Glyn Moody discovers that the EU has unexpectedly done something right by requiring positive opt-in to copyright protection against text and data mining. I’d like to see this as a ray of hope for avoiding the worst copyright conflicts, but given the transatlantic rhetoric around privacy laws and data flows, it seems much more likely to incite another trade conflict.

It now dawns on me that the system I outlined in the first paragraph is in fact Vannevar Bush’s Memex. Not the web, which was never sufficiently curated, but this, primed full of personal intellectual history. The “AI” represents those thousands of curating secretaries he thought the future would hold. As if.

Illustrations: Stable Diffusion rendering of “stochastic parrots”, as prompted by Jon Crowcroft.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon or Twitter.

Performing intelligence

“Oh, great,” I thought when news broke of the release of GPT-4. “Higher-quality deception.”

Most of the Internet disagreed; having gone mad only a few weeks ago over ChatGPT, everyone’s now agog over this latest model. It passed all these tests!

One exception was the journalist Paris Marx, who commented on Twitter: “It’s so funny to me that the AI people think it’s impressive when their programs pass a test after being trained on all the answers.”

Agreed. It’s also so funny to me that they call that “AI” and don’t like it when researchers like computational linguist Emily Bender call it a “stochastic parrot”. On Marx’s Tech Won’t Save Us podcast, Goldsmiths professor Dan McQuillan, author of Resisting AI: An Anti-fascist Approach to Artificial Intelligence, calls it a “bullshit engine” whose developers’ sole goal is plausibility – plausibility that, as Bender has said, allows us imaginative humans to think we detect a mind behind it, and the result is to risk devaluing humans.

Let’s walk back to an earlier type of system that has been widely deployed: benefits scoring systems. A couple of weeks ago, Lighthouse Reports and Wired magazine teamed up on an investigation of these systems, calling them “suspicion machines”.

Their work focuses on the welfare benefits system in use in Rotterdam between 2017 and 2021, which used 315 variables to risk-score benefits recipients according to the likelihood that their claims were fraudulent. In detailed, worked case analyses, they find systemic discrimination: you lose points for being female, for being female and having children (males weren’t asked about children), for being non-white, and for not knowing Dutch well (knowing Dutch is a requirement for welfare recipients) – in effect, for ethnicity. Other variables include missing meetings, age, and “lacks organizing skills”, which was just one of 54 variables based on caseworkers’ subjective assessments. Any comment a caseworker adds translates to 1 added to the risk score, even if the comment is positive. The top-scoring 10% are flagged for further investigation.
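The scoring logic described above can be sketched in a few lines. To be clear, this is a toy illustration, not the actual Rotterdam model – the real system used 315 variables and a trained classifier, and the variable names and weights here are invented. The one rule taken directly from the reporting is that every caseworker comment adds 1 to the score, positive or not.

```python
def risk_score(recipient: dict, weights: dict) -> float:
    """Hypothetical risk score: a weighted sum of recipient attributes,
    plus 1 for every caseworker comment, regardless of its sentiment."""
    score = sum(weights.get(name, 0.0) * value
                for name, value in recipient["features"].items())
    score += len(recipient["comments"])  # any comment counts against you
    return score

def flag_top_decile(recipients: list, weights: dict) -> list:
    """Flag the top-scoring 10% of recipients for fraud investigation."""
    ranked = sorted(recipients, key=lambda r: risk_score(r, weights),
                    reverse=True)
    return ranked[:max(1, len(ranked) // 10)]
```

Note what the comment rule does: a recipient with a glowing caseworker note scores *higher* than an identical recipient with no note at all, which is exactly the design choice the investigation criticizes.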

This is the system that Accenture, the city’s technology partner on the early versions, said at its unveiling in 2018 was an “ethical solution” and promised “unbiased citizen outcomes”. Instead, Wired says, the algorithm “fails the city’s own test of fairness”.

The project’s point wasn’t to pick on Rotterdam; of the dozens of cities they contacted, it just happened to be the only one willing to share the code behind the algorithm, along with the list of variables, prior evaluations, and the data scientists’ handbook. It even – after being threatened with court action under freedom of information laws – shared the mathematical model itself.

The overall conclusion: the system was so inaccurate it was little better than random sampling “according to some metrics”.

What strikes me, aside from the details of this design, is the initial choice of scoring benefits recipients for risk of fraud. Why not score them for risk of missing out on help they’re entitled to? The UK government’s figures on benefits fraud indicate that in 2021-2022 overpayment (including error as well as fraud) amounted to 4% of total expenditure, and *underpayment* to 1.2%. Underpayment is a lot less, but it’s still substantial (£2.6 billion). Yes, I know, the point of the scoring system is to save money, but the point of the *benefits* system is to help people who need it. The suspicion was always there, but the technology has altered the balance.
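A quick back-of-the-envelope check of those figures (assuming, as the phrasing suggests, that both percentages are shares of the same total expenditure):

```python
# Stated: underpayment = £2.6 billion = 1.2% of total benefits expenditure.
underpayment = 2.6e9
total = underpayment / 0.012   # implied total expenditure
overpayment = 0.04 * total     # overpayment (fraud plus error) at the stated 4%
print(f"total ≈ £{total / 1e9:.0f}bn, overpayment ≈ £{overpayment / 1e9:.1f}bn")
# → total ≈ £217bn, overpayment ≈ £8.7bn
```

Both numbers are in the billions; only one of them gets a suspicion machine.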

This was the point the writer Ellen Ullman noted in her 1997 book Close to the Machine: the hard-edged nature of these systems, and their ability to surveil people in new ways, “infect” their owners with suspicion even of people they’ve long trusted, and even when the system itself was intended to be helpful. On a societal scale, these “suspicion machines” embed increased division in our infrastructure; in his book, McQuillan warns us to watch for “functionality that contributes to violent separations of ‘us and them’.”

Along those lines, it’s disturbing that OpenAI, the owner of ChatGPT and GPT-4 (and several other generative AI gewgaws), has now decided to keep secret the details of its large language models. That is, we have no sight of what data was used in training, what software and hardware methods were used, or how energy-intensive it is. If there’s a machine loose in the world’s computer systems pretending to be human, shouldn’t we understand how it works? It would also help damp down our imagining that we see a mind in there.

The company’s argument appears to be that because these models could become harmful, it’s bad to publish how they work, because then bad actors will use them to create harm. In the cybersecurity field we call this “security by obscurity”, and there is general consensus that it does not work as a protection.

In a lengthy article at New York magazine, Elizabeth Weil quotes Daniel Dennett’s assessment of these machines: “counterfeit people” that should be seen as the same sort of danger to our system as counterfeit money. Bender suggests that rather than trying to make fake people, we should be focusing on making tools to help people.

What ties these systems to the large language models behind GPT is that in both cases it’s all about mining our shared cultural history, with all its flaws and misjudgments, in response to a prompt, and pretending the results have meaning and create new knowledge. And *that’s* what’s being embedded into the world’s infrastructure. Have we learned nothing from Clever Hans?

Illustrations: Clever Hans, performing in Leipzig in 1912 (by Karl Krall, via Wikimedia).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon or Twitter.

Esquivalience

The science fiction author Charles Stross had a moment of excitement on Mastodon this week: WRITER CHALLENGE!

Stross challenged writers to use the word “esquivalience” in their work. The basic idea: turn this Pinocchio word into a “real” word.

Esquivalience is the linguistic equivalent of a man-made lake. The creator, editor Christine Lindberg, invented it for the 2001 edition of the New American Oxford Dictionary and defined it as “the willful avoidance of one’s official responsibilities; the shirking of duties”. It was a trap to catch anyone republishing the dictionary rather than developing their own (a job I have actually done). This is a common tactic for protecting large compilations where it’s hard to prove copying – fake streets are added to maps, for example, and the people who rent out mailing lists add ringers whose use will alert them if the list is used outside the bounds of the contractual agreement.
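The canary trick described above is simple enough to sketch: plant a few invented entries in the compilation you publish or rent out, then check any suspect copy for them. The function names here are my own invention, purely for illustration.

```python
def plant_canaries(entries: list[str], canaries: list[str]) -> list[str]:
    """Return the dataset with a few fictitious 'trap' entries mixed in."""
    return sorted(entries + canaries)

def looks_copied(suspect: list[str], canaries: list[str]) -> bool:
    """If any trap entry shows up in a third party's compilation, it was
    almost certainly copied rather than independently compiled."""
    return any(c in set(suspect) for c in canaries)
```

An independently compiled dictionary has no reason to contain “esquivalience”; one that lifted NOAD’s entries wholesale does.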

There is, however, something peculiarly distasteful about fake entries in supposedly authoritative dictionaries, even though I agree with Lindberg that “esquivalience” is a pretty useful addition to the language. It’s perfect – perhaps in the obvious adjectival form “esquivalient” – for numerous contemporary politicians, though here be dragons: “willful” risks libel actions.

Probably most writers have wanted to make up words, and many have, from playwright and drama critic George S. Kaufman, often credited with coining, among other things, “underwhelmed”, to Anthony Burgess, who invented an entire futurist street language for A Clockwork Orange. Some have gone so far as to create enough words to publish dictionaries – such as the humorist Gelett Burgess, whose Burgess Unabridged (free ebook!) compiles “words you’ve always needed”. From that collection, I have always been particularly fond of Burgess’s “wox”, defined as “a state of placid enjoyment; sluggish satisfaction”. It seems particularly apt in the hours immediately following Thanksgiving dinner.

In these cases, though, the context lets you know the language is made up. The dictionary is supposed to be authoritative, admitting words only after they are well-established. The presence of fake words feels damaging in a way that a fake place on a map doesn’t. It’s comparatively easy to check whether a place exists by going there, but at some point down the echoing corridors of time *every* word was used for the first time. Pinpointing exactly when is hard unless someone ‘fesses up. I don’t like the idea that my dictionary is lying to me. Better if NOAD had planted two fake words and had them recursively point at each other for their definitions.

I had been avoiding the ChatGPT hoopla, but it seemed plausible to ask it: “Is ‘esquivalience’ a real word?” Its response started well enough: “‘Esquivalience’ is not recognized as a standard word in the English language. It is a made-up word…” And then cuckoo land arrived: “…that was created by a writer named Adam Jacot de Boinod for his book ‘The Meaning of Tingo’.” Pause to research. The book in question was written in 2006. The word “esquivalience” does not, from a quick text search, appear in it. Huh? I went on to suggest Christine Lindberg’s name to ChatGPT, and after a digression attributing the word to the singer-songwriter Christine Lavin, it appeared to find references to Lindberg’s “claim” in its corpus of data. But, it continued to warn in every response, “it is still not recognized as a standard word in the English language”. It’s a bot. It’s not being stern. It doesn’t know what it’s saying. Getting it to agree on Christine Lindberg as the original source isn’t winning the argument. It’s just giving it a different prompt.

I ask if it has ever encountered the word “wox”. “As an AI language model, I have certainly come across the word ‘wox’.” A human reads lightly insulted pride into that. Resist. It’s a bot. It has no pride. The bot went on to speculate on possible origins (“it may be a neologism…”). I ask if it’s heard of Gelett Burgess. Oh, yes, followed by a short biography. Then, when told Burgess invented “wox”: “Gelett Burgess did indeed invent the word…”, and it goes on to cite the correct book – but then continues that Burgess defined it as “to make fun of, to poke fun at”, which is absolutely not what Burgess says, and I know this because I have the original 1914 book right here, and the definition I cited above is right there on p. 112. The bot does “apologize” every time you point out a mistake, though.

This isn’t much of a sample, but based on it, I find ChatGPT quite alarming as an extraordinarily efficient way of undermining factual knowledge. The responses sound authoritative, but every point must be fact-checked. It could not be worse-suited for today’s world, where everyone wants fast answers. Coupled with search, it turns the algorithms that give us answers into even more obscure and less trustworthy black boxes. Wikipedia has many flaws, but its single biggest strength is its sourcing and curation; how every page has been changed and shaped over the years is open for inspection.

So when ChatGPT went on to say that Gelett Burgess is widely credited with coining the term “blurb”, Wikipedia is where I turned. Wikipedia agrees (asked, ChatGPT cites the Oxford English Dictionary). Burgess FTW.

Illustrations: Gelett Burgess’s 1914 Burgess Unabridged, a dictionary of made-up words.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon or Twitter.

Inappt

Recently, it took a flatwoven wool rug more than two weeks to travel from Luton, Bedfordshire, to southwest London. The rug’s source – an Etsy seller – and I sent dozens of messages back and forth. It would be there tomorrow. Oh, no, the courier now says Wednesday. Um, Friday. Er, next week. I can send you a different rug, if you want to choose one. No.

In the end, the rug arrived into my life. I don’t dare decide it’s the wrong color.

I would dismiss this as a one-off aberration, except that a few weeks ago the intended recipient of a parcel sent at the beginning of November casually mentioned they had never received it. Upon chasing, the courier company replied: “Despite an extensive investigation, we have not been able to locate your parcel.”

I would dismiss those as a two-off aberration except that late last year the post office tracking on yet another item went on showing it stuck in some unidentifiable depot somewhere for two weeks. Eventually, I applied brain and logic and went down to the nearest delivery office and there it was, waiting for me to pay the customs fee specified on the card I never received. It was only a few days away from being sent back.

And I would dismiss those as a three-off aberration except that two weeks ago I was notified to expect a package from a company whose name I didn’t recognize between 7pm and 9pm. I therefore felt perfectly safe to go into the room furthest from the front door, the kitchen, and wash some dishes at 5:30. Nope. They delivered at 5:48, I didn’t hear them, and I had a hard time figuring out whom to contact to persuade them to redeliver.

The point about all this is not to yell at random couriers to get off my lawn but to note that at least this part of the app-based economy has stopped delivering the results it promised. Less than ten years since these companies set out to disrupt delivery services by providing lower prices, accurate information, on-time deliveries, and constant tracking, we’re back to waiting at home for unspecified numbers of hours wondering if they’re going to show and struggling to trace lost packages. Only this time, there’s no customer service, working conditions and pay are much worse for drivers and delivery folk, and the closure of many local outlets has left us all far more dependent on them.

***

Also falling over this week, as widely reported (because: journalists), was Twitter, which for a time on Wednesday barred posting new tweets except via the kind of scheduling software the site is otherwise limiting. Many of us have been expecting outages ever since November, when Charlie Warzel at The Atlantic and Chris Stokel-Walker at MIT Technology Review interviewed Twitter engineers past and present. All of them warned that the many staff cuts and shrinking budgets have left the service undersupplied with people who can keep the site running, and that outages of increasing impact should be expected.

Nonetheless, the “Apocalypse, Now!” reporting that ensued was about as sensible as the reporting earlier in the week that the Fediverse was failing to keep the Tweeters who flooded there beginning in November. In response, Mike Masnick noted at TechDirt (https://www.techdirt.com/2023/02/08/lazy-reporters-claiming-fediverse-is-slumping-despite-massive-increase-in-usage/) how silly this was. Because: 1) there’s a lot more to the Fediverse than just Mastodon, which is all these reporters looked at; 2) even then, Mastodon had lost a little from its peak but was still vastly more active than before November; 3) it’s hard for people to change their habits, and they will revert to what’s familiar if they don’t see a reason why they can’t; and 4) it’s still early days. So, meh.

However, Zeynep Tufekci reminds us that Twitter’s outage is entertainment only for the privileged; for those trying to coordinate rescue and aid efforts in Turkey, Twitter is an essential tool.

***

While we’re sniping at the failings of current journalism, it appears that yet another technology has been overhyped: DoNotPay, “the world’s first robot lawyer”, the bot written by a British university student that has supposedly been helping folks successfully contest traffic tickets. Masnick (again) and Kathryn Tewson have been covering the story for TechDirt. Tewson, a paralegal, has taken advantage of the fact that cities publish their parking ticket data in order to study DoNotPay’s claims in detail.

TechDirt almost ran a skeptical article about the service in 2017. Suffice to say that now Masnick concludes, “I wish that DoNotPay actually could do much of what it claims to do. It sounds like it could be a really useful service…”

***

The pile-up of this sort of thing – apps that disrupt and then degrade service, technology that’s overhyped (see also self-driving cars), flat-out fraud (see cryptocurrencies), breathless media reporting of nothing much – is probably why I have been unable to raise any excitement over the wow-du-jour, ChatGPT. It seems obvious that of course it can’t read, and can’t understand anything it’s typing, and that sober assessment of what it might be good for is some way off. In the New Yorker, Ted Chiang puts it in its place: think of it as a blurred JPEG. Sounds about right.

Illustrations: Drunk parrot (taken by Simon Bisson).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Stories about the border wars between cyberspace and real life are posted occasionally during the week at the net.wars Pinboard – or follow on Mastodon or Twitter.