Beware the duck

Once upon a time, “convergence” was a buzzword. That was back in the days when audio was on stereo systems, television was on a TV, and “communications” happened on phones that weren’t computers. The word has disappeared back into its former usage pattern, but it could easily be revived to describe what’s happening to content as humans dive into using generative tools.

Said another way. Roughly this time last year, the annual technology/law/pop culture conference Gikii was awash in (generative) AI. That bubble is deflating, but in the experiments that nonetheless continue a new topic more worthy of attention is emerging: artificial content. It’s striking because what happens at this gathering, which mines all types of popular culture for cues for serious ideas, is often a good guide to what’s coming next in futurelaw.

That no one dared guess which of Zachary Cooper‘s pair of near-identicalaudio clips was AI-generated, and which human-performed was only a starting point. One had more static? Cooper’s main point: “If you can’t tell which clip is real, then you can’t decide which one gets copyright.” Right, because only human creations are eligible (although fake bands can still scam Spotify).

Cooper’s brief, wild tour of the “generative music underground” included using AI tools to create songs whose content is at odds with their genre, whole generated albums built by a human producer making thousands of tiny choices, and the new genre “gencore”, which exploits the qualities of generated sound (Cher and Autotune on steroids). Voice cloning, instrument cloning, audio production plugins, “throw in a bass and some drums”….

Ultimately, Cooper said, “The use of generative AI reveals nothing about the creative relationship to work; it destabilizes the international market by having different authorship thresholds; and there’s no means of auditing any of it.” Instead of uselessly trying to enforce different rights predicated on the use or non-use of a specific set of technologies, he said, we should tackle directly the challenges new modes of production pose to copyright. Precursor: the battles over sampling.

Soon afterwards, Michael Veale was showing us Civitai, an Idaho-based site offering open source generative AI tools, including fine-tuned models. “Civitai exists to democratize AI media creation,” the site explains. “Everything has a valid legal purpose,” Veale said, but the way capabilities can be retrained and chained together to create composites makes it hard to tell which tools, if any, should be taken down, even for creators (see also the puzzlement as Redditors try to work this out). Even environmental regulation can’t help, as one attendee suggested: unlike large language models, these smaller, fine-tuned models (as Jon Crowcroft and I surmised last year would be the future) are efficient; they can run on a phone.

Even without adding artificial content there is always an inherent conflict when digital meets an analog spectrum. This is why, Andy Phippen said, the threshold of 18 for buying alcohol and cigarettes turns into a real threshold of 25 at retail checkouts. Both software and humans fail at determining over-or-under-18, and retailers fear liability. Online age verification as promoted in the Online Safety Act will not work.

If these blurred lines strain the limits of current legal approaches, others expose gaps in the law. Andrea Matwyshyn, for example, has been studying parallels I’ve also noticed between early 20th century company towns and today’s tech behemoths’ anti-union, surveillance-happy working practices. As a result, she believes that regulatory authorities need to start considering closely the impact of data aggregation when companies merge and look for company town-like dynamics”.

Andelka Phillips parodied the overreach of app contracts by imagining the EULA attached to “ThoughtReader app”. A sample clause: “ThoughtReader may turn on its service at any time. By accepting this agreement, you are deemed to accept all monitoring of your thoughts.” Well, OK, then. (I also had a go at this here, 19 years ago.)

Emily Roach toured the history of fan fiction and the law to end up at Archive of Our Own, a “fan-created, fan-run, nonprofit, noncommercial archive for transformative fanworks, like fanfiction, fanart, fan videos, and podfic”, the idea being to ensure that the work fans pour their hearts into has a permanent home where it can’t be arbitrarily deleted by corporate owners. The rules are strict: not so much as a “buy me a coffee” tip link that could lead to a court-acceptable claim of commercial use.

History, the science fiction writer Charles Stross has said, is the science fiction writer’s secret weapon. Also at Gikii: Miranda Mowbray unearthed the 18th century “Digesting Duck” automaton built by Jacques de Vauconson. It was a marvel that appeared to ingest grain and defecate waste and that in its day inspired much speculation about the boundaries between real and mechanical life. Like the amazing ancient Greek automata before it, it was, of course, a purely mechanical fake – it stored the grain in a small compartment and released pellets from a different compartment – but today’s humans confused into thinking that sentences mean sentience could relate.

Illustrations: One onlooker’s rendering of his (incorrect) idea of the interior of Jacques de Vaucanson’s Digesting Duck (via Wikimedia).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Soap dispensers and Skynet

In the TV series Breaking Bad, the weary ex-cop Mike Ehrmantraut tells meth chemist Walter White : “No more half measures.” The last time he took half measures, the woman he was trying to protect was brutally murdered.

Apparently people like to say there are no dead bodies in privacy (although this is easily countered with ex-CIA director General Michael Hayden’s comment, “We kill people based on metadata”). But, as Woody Hartzog told a Senate committee hearing in September 2023, summarizing work he did with Neil Richards and Ryan Durrie, half measures in AI/privacy legislation are still a bad thing.

A discussion at Privacy Law Scholars last week laid out the problems. Half measures don’t work. They don’t prevent societal harms. They don’t prevent AI from being deployed where it shouldn’t be. And they sap the political will to follow up with anything stronger.

In an article for The Brink, Hartzog said, “To bring AI within the rule of law, lawmakers must go beyond half measures to ensure that AI systems and the actors that deploy them are worthy of our trust,”

He goes on to list examples of half measures: transparency, committing to ethical principles, and mitigating bias. Transparency is good, but doesn’t automatically bring accountability. Ethical principles don’t change business models. And bias mitigation to make a technology nominally fairer may simultaneously make it more dangerous. Think facial recognition: debias the system and improve its accuracy for matching the faces of non-male, non-white people, and then it’s used to target those same people with surveillance.

Or, bias mitigation may have nothing to do with the actual problem, an underlying business model, as Arvind Narayanan, author of the forthcoming book AI Snake Oil, pointed out a few days later at an event convened by the Future of Privacy Forum. In his example, the Washington Post reported in 2019 on the case of an algorithm intended to help hospitals predict which patients will benefit from additional medical care. It turned out to favor white patients. But, Narayanan said, the system’s provider responded to the story by saying that the algorithm’s cost model accurately predicted the costs of additional health care – in other words, the algorithm did exactly what the hospital wanted it to do.

“I think hospitals should be forced to use a different model – but that’s not a technical question, it’s politics.”.

Narayanan also called out auditing (another Hartzog half measure). You can, he said, audit a human resources system to expose patterns in which resumes it flags for interviews and which it drops. But no one ever commissions research modeled on the expensive random controlled testing common in medicine that follows up for five years to see if the system actually picks good employees.

Adding confusion is the fact that “AI” isn’t a single thing. Instead, it’s what someone called a “suitcase term” – that is, a container for many different systems built for many different purposes by many different organizations with many different motives. It is absurd to conflate AGI – the artificial general intelligence of science fiction stories and scientists’ dreams that can surpass and kill us all – with pattern-recognizing software that depends on plundering human-created content and the labeling work of millions of low-paid workers

To digress briefly, some of the AI in that suitcase is getting truly goofy. Yum Brands has announced that its restaurants, which include Taco Bell, Pizza Hut, and KFC, will be “AI-first”. Among Yum’s envisioned uses, the company tells Benj Edwards at Ars Technica, are being able to ask an app what temperature to set the oven. I can’t help suspecting that the real eventual use will be data collection and discriminatory pricing. Stuff like this is why Ed Zitron writes postings like The Rot-Com Bubble, which hypothesizes that the reason Internet services are deteriorating is that technology companies have run out of genuinely innovative things to sell us.

That you cannot solve social problems with technology is a long-held truism, but it seems to be especially true of the messy middle of the AI spectrum, the use cases active now that rarely get the same attention as the far ends of that spectrum.

As Neil Richards put it at PLSC, “The way it’s presented now, it’s either existential risk or a soap dispenser that doesn’t work on brown hands when the real problem is the intermediate level of societal change via AI.”

The PLSC discussion included a list of the ways that regulations fail. Underfunded enforcement. Regulations that are pure theater. The wrong measures. The right goal, but weakly drafted legislation. Make the regulation ambiguous, or base it on principles that are too broad. Choose conflicting half-measures – for example, require transparency but add the principle that people should own their own data.

Like Cristina Caffarra a week earlier at CPDP, Hartzog, Richards, and Durrie favor finding remedies that focus on limiting abuses of power. Full measures include outright bans, the right to bring a private cause of action, imposing duties of “loyalty, care, and confidentiality”, and limiting exploitative data practices within these systems. Curbing abuses of power, as he says, is nothing new. The shiny new technology is a distraction.

Or, as Narayanan put it, “Broken AI is appealing to broken institutions.”

Illustrations: Mike (Jonathan Banks) telling Walt (Bryan Cranston) in Breaking Bad (S03e12) “no more half measures”.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Admiring the problem

In one sense, the EU’s barely dry AI Act and the other complex legislation – the Digital Markets Act, Digital Services Act, GDPR, and so on -= is a triumph. Flawed it may be, but it’s a genuine attempt to protect citizens’ human rights against a technology that is being birthed with numerous trigger warnings. The AI-with-everything program at this year’s Computers, Privacy, and Data Protection, reflected that sense of accomplishment – but also the frustration that comes with knowing that all legislation is flawed, all technology companies try to game the system, and gaps will widen.

CPDP has had these moments before: new legislation always comes with a large dollop of frustration over the opportunities that were missed and the knowledge that newer technologies are already rushing forwards. AI, and the AI Act, more or less swallowed this year’s conference as people considered what it says, how it will play internationally, and the necessary details of implementation and enforcement. Two years at this event, inadequate enforcement of GDPR was a big topic.

The most interesting future gaps that emerged this year: monopoly power, quantum sensing, and spatial computing.

For at least 20 years we’ve been hearing about quantum computing’s potential threat to public key encryption – that day of doom has been ten years away as long as I can remember, just as the Singularity is always 30 years away. In the panel on quantum sensing, Chris Hoofnagle argued that, as he and Simson Garfinkel recently wrote at Lawfare and in their new book, quantum cryptanalysis is overhyped as a threat (although there are many opportunities for quantum computing in chemistry and materials science). However, quantum sensing is here now, works (because qubits are fragile), and is cheap. There is plenty of privacy threat here to go around: quantum sensing will benefit entirely different classes of intelligence, particularly remote, undetectable surveillance.

Hoofnagle and Garfinkel are calling this MASINT, for machine and signature intelligence, and believe that it will become very difficult to hide things, even at a national level. In Hoofnagle’s example, a quantum sensor-equipped drone could fly over the homes of parolees to scan for guns.

Quantum sensing and spatial computing have this in common: they both enable unprecedented passive data collection. VR headsets, for example, collect all sorts of biomechanical data that can be mined more easily for personal information than people expect.

Barring change, all that data will be collected by today’s already-powerful entities.

The deeper level on which all this legislation fails particularly exercised Cristina Caffarra, the co-founder of the Centre for Economic Policy Research in the panel on AI and monopoly, saying that all this legislation is basically nibbling around the edges because they do not touch the real, fundamental problem of the power being amassed by the handful of companies who own the infrastructure.

“It’s economics 101. You can have as much downstream competition as you like but you will never disperse the power upstream.” The reports and other material generated by government agencies like the UK’s Competition and Markets Authority are, she says, just “admiring the problem”.

A day earlier, the Novi Sad professor Vladen Joler had already pointed out the fundamental problem: at the dawn of the Internet anyone could start with nothing and build something; what we’re calling “AI” requires billions in investment, so comes pre-monopolized. Many people dismiss Europe for not having its own homegrown Big Tech, but that overlooks open technologies: the Raspberry Pi, Linux, and the web itself, which all have European origins.

In 2010, the now-departing MP Robert Halfon (Con-Harlow) said at an event on reining in technology companies that only a company the size of Google – not even a government – could create Street View. Legend has it that open source geeks heard that as a challenge, and so we have OpenStreetMap. Caffarra’s fiery anger raises the question: at what point do the infrastructure providers become so entrenched that they could choke off an open source competitor at birth? Caffarra wants to build a digital public interest infrastructure using the gaps where Big Tech doesn’t yet have that control.

The Dutch Groenlinks MEP Kim van Sparrentak offered an explanation for why the AI Act doesn’t address market concentration: “They still dream of a European champion who will rule the world.” An analogy springs to mind: people who vote for tax cuts for billionaires because one day that might be *them*. Meanwhile, the UK’s Competition and Markets Authority finds nothing to investigate in Microsoft’s partnership with the French AI startup Mistral.

Van Sparrentak thinks one way out is through public procurement; adopt goals of privacy and sustainability, and support European companies. It makes sense; as the AI Now Institute’s Amba Kak, noted, at the moment almost everything anyone does digitally has to go through the systems of at least one Big Tech company.

As Sebastiano Toffaletti, head of the secretariat of the European SME Alliance, put it, “Even if you had all the money in the world, these guys still have more data than you. If you don’t and can’t solve it, you won’t have anyone to challenge these companies.”

Illustrations: Vladen Joler shows Anatomy of an AI System, a map he devised with Kate Crawford of the human labor, data, and planetary resources that are extracted to make “AI”.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

The second greatest show on earth

There is this to be said for seeing your second total eclipse of the sun: if the first one went well, you can be more relaxed about what you get to see. In 2017, sitting in Centennial Park in Nashville, we saw everything. So in Dallas in 2024, I could tell myself, “It will be interesting even if we can’t see the sun.”

As it happened, we had cloud with lots of breaks. The cloud obscured such phenomena as Bailey’s Beads and the diamond ring – but the play of light on the broken clouds as the sun popped back out was amazing all by itself. The corona-surrounded sun playing peek-a-boo with us was stunningly beautiful. And all too soon it was over. It seemed shorter than 2017, even though totality was nearly twice as long – 3:52 compared to about two minutes.

One thing definitely missing from Nashville was a phenomenon that’s less often discussed: the 360-degree sunset all around the horizon. Sitting in Dallas surrounded by buildings, the horizon was not visible as it was in that Nashville park.

On Sunday, April 7, it seemed like half the country was moving into position for today in a process that involved placing a bet on the local weather. I had friends scattered in Vermont, Montreal, and several locations in upstate New York. Our intermittent cloud compared favorably with at least one of the New York locations. Daytime darkness and watching and listening to animals’ reactions is still interesting…but it remains frustrating to know that the Big Show is going on without you.

The hundreds of photos on show hide the real thrill of seeing totality: the sense of connection to humanity past, present, and future, and across the animal kingdom. The strangers around you become part of your life, however briefly. The inexorable movements of earth, sun, and moon put us all in our place.

Small data

Shortly before this gets posted, Jon Crowcroft and I will have presented this year’s offering at Gikii, the weird little conference that crosses law, media, technology, and pop culture. This is what we will possibly may have said, as I understand it, with some added explanation for the slightly less technical audience I imagine will read this.

Two years ago, a team of four researchers – Timnit Gebru, Emily Bender, Margaret Mitchell (writing as Shmargaret Shmitchell), and Angelina McMillan-Major – wrote a now-famous paper called On the Dangers of Stochastic Parrots (PDF) calling into question the usefulness of the large language models (LLMs) that have caused so much ruckus this year. The “Stochastic Four” argued instead of small models built on carefully curated data: less prone to error, less exploitive of people’s data, less damaging to the planet. Gebru got fired over this paper; Google also fired Mitchell soon afterwards. Two years later, neural networks pioneer Geoff Hinton quit Google in order to voice similar concerns.

Despite the hype, LLMs have many problems. They are fundamentally an extractive technology and are resource-intensive. Building LLMs requires massive amounts of training data; so far, the companies have been unwilling to acknowledge their sources, perhaps because (as is happening already) they fear copyright suits.

More important from a technical standpoint, is the issue of model collapse; that is, models degrade when they begin to ingest synthetic AI-generated data instead of human input. We’ve seen this before with Google Flu Trends, which degraded rapidly as incoming new search data included many searches on flu-like symptoms that weren’t actually flu, and others that simply reflected the frequency of local news coverage. “Data pollution” as LLM-generated data fills the web, will mean that the web will be an increasingly useless source of training data for future generations of generative AI. Lots more noise, drowning out the signal (in the photo above, the signal would be the parrot).

Instead, if we follow the lead of the Stochastic Four, the more productive approach is small data – small, carefully curated datasets that train models to match specific goals. Far less resource-intensive, far fewer issues with copyright, appropriation, and extraction.

We know what the LLM future looks like in outline: big, centralized services, because no one else will be able to amass enough data. In that future, surveillance capitalism is an essential part of data gathering. SLM futures could look quite different: decentralized, with realigned incentives. At one point, we wanted to suggest that small data could bring the end of surveillance capitalism; that’s probably an overstatement. But small data could certainly create the ecosystem in which the case for mass data collection would be less compelling.

Jon and I imagined four primary alternative futures: federation, personalization, some combination of those two, and paradigm shift.

Precursors to a federated small data future already exist; these include customer service chatbots, predictive text assistants. In this future, we could imagine personalized LLM servers designed to serve specific needs.

An individualized future might look something like I suggested here in March: a model that fits in your pocket that is constantly updated with material of your own choosing. Such a device might be the closest yet to Vannevar Bush’s 1945 idea of the Memex (PDF), updated for the modern era by automating the dozens of secretary-curators he imagined doing the grunt work of labeling and selection. That future again has precursors in techniques for sharing the computation but not the data, a design we see proposed for health care, where the data is too sensitive to share unless there’s a significant public interest (as in pandemics or very rare illnesses), or in other data analysis designs intended to protect privacy.

In 2007, the science fiction writer Charles Stross suggested something like this, though he imagined it as a comprehensive life log, which he described as a “google for real life”. So this alternative future would look something like Stross’s pocket $10 life log with enhanced statistics-based data analytics.

Imagining what a paradigm shift might look like is much harder. That’s the kind of thing science fiction writers do; it’s 16 years since Stross gave that life log talk. However, in his 2018 history of advertising, The Attention Merchants, Columbia professor Tim Wu argued that industrialization was the vector that made advertising and its grab for our attention part of commerce. A hundred and fifty-odd years later, the centralizing effects of industrialization are being challenged starting with energy via renewables and local power generation and social media via the fediverse. Might language models also play their part in bringing a new, more collaborative and cooperative society?

It is, in other words, just possible that the hot new technology of 2023 is simply a dead end bringing little real change. It’s happened before. There have been, as Wu recounts, counter-moves and movements before, but they didn’t have the technological affordances of our era.

In the Q&A that followed, Miranda Mowbray pointed out that companies are trying to implement the individualized model, but that it’s impossible to do unless there are standardized data formats, and even then hard to do at scale.

Illustrations: Spot the parrot seen in a neighbor’s tree.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Wendy M. GrossmanPosted on Categories AI, Events, New tech, old knowledgeTags 3 Comments on Small data

The safe place

For a long time, fear that technical decisions – new domain names ($)(, cooption of open standards or software, laws mandating data localization – would splinter the Internet. “Balkanize” was heard a lot.

A panel at the UK Internet Governance Forum a couple of weeks ago focused on this exact topic, and was mostly self-congratulatory. Which is when it occurred to me that the Internet may not *be* fragmented, but it *feels* fragmented. Almost every day I encounter some site I can’t reach: email goes into someone’s spam folder, the site or its content is off-limits because it’s been geofenced to conform with copyright or data protection laws, or the site mysteriously doesn’t load, with no explanation. The most likely explanation for the latter is censorship built into the Internet feed by the ISP or the establishment whose connection I’m using, but they don’t actually *say* that.

The ongoing attrition at Twitter is exacerbating this feeling, as the users I’ve followed for years continue to migrate elsewhere. At the moment, it takes accounts on several other services to keep track of everyone: definite fragmentation.

Here in the UK, this sense of fragmentation may be about to get a lot worse, as the long-heralded Online Safety bill – written and expanded until it’s become a “Frankenstein bill”, as Mark Scott and Annabelle Dickson report at Politico – hurtles toward passage. This week saw fruitless debates on amendments in the House of Lords, and it will presumably be back in the Commons shortly thereafter, where it could be passed into law by this fall.

A number of companies have warned that the bill, particularly if it passes with its provisions undermining end-to-end encryption intact, will drive them out of the country. I’m not sure British politicians are taking them seriously; so often such threats are idle. But in this case, I think they’re real, not least because post-Brexit Britain carries so much less global and commercial weight, a reality some politicians are in denial about. WhatsApp, Signal, and Apple have all said openly that they will not compromise the privacy of their masses of users elsewhere to suit the UK. Wikipedia has warned that including it in the requirement to age-verify its users will force it to withdraw rather than violate its principles about collecting as little information about users as possible. The irony is that the UK government itself runs on WhatsApp.

Wikipedia, Ian McRae, the director of market intelligence for prospective online safety regulator Ofcom, showed in a presentation at UKIGF, would be just one of the estimated 150,000 sites within the scope of the bill. Ofcom is ramping up to deal with the workload, an effort the agency expects to cost £169 million between now and 2025.

In a legal opinion commissioned by the Open Rights Group, barristers at Matrix Chambers find that clause 9(2) of the bill is unlawful. This, as Thomas Macaulay explains at The Next Web, is the clause that requires platforms to proactively remove illegal or “harmful” user-generated content. In fact: prior restraint. As ORG goes on to say, there is no requirement to tell users why their content has been blocked.

Until now, the impact of most badly-formulated British legislative proposals has been sort of abstract. Data retention, for example: you know that pervasive mass surveillance is a bad thing, but most of us don’t really expect to feel the impact personally. This is different. Some of my non-UK friends will only use Signal to communicate, and I doubt a day goes by that I don’t look something up on Wikipedia. I could use a VPN for that, but if the only way to use Signal is to have a non-UK phone? I can feel those losses already.

And if people think they dislike those ubiquitous cookie banners and consent clickthroughs, wait until they have to age-verify all over the place. Worst case: this bill will be an act of self-harm that one day will be as inexplicable to future generations as Brexit.

The UK is not the only one pursuing this path. Age verification in particular is catching on. The US states of Virginia, Mississippi, Louisiana, Arkansas, Texas, Montana, and Utah have all passed legislation requiring it; Pornhub now blocks users in Mississippi and Virginia. The likelihood is that many more countries will try to copy some or all of its provisions, just as Australia’s law requiring the big social media platforms to negotiate with news publishers is spawning copies in Canada and California.

This is where the real threat of the “splinternet” lies. Think of requiring 150,000 websites to implement age verification and proactively police content. Many of those sites, as the law firm Mischon de Reya writes may not even be based in the UK.

This means that any site located outside the UK – and perhaps even some that are based here – will be asking, “Is it worth it?” For a lot of them, it won’t be. Which means that however much the Internet retains its integrity, the British user experience will be the Internet as a sea of holes.

Illustrations: Drunk parrot in a Putney garden (by Simon Bisson; used by permission).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon.