Cognitive dissonance

The annual State of the Net, in Washington, DC, always attracts politically diverse viewpoints. This year was especially divided.

Three elements stood out: the divergence between the only remaining member of the Privacy and Civil Liberties Oversight Board (PCLOB) and a recently fired colleague; a contentious panel on content moderation; and the “yay, American innovation!” approach to regulation.

As noted previously, on January 29 the days-old Trump administration fired PCLOB members Travis LeBlanc, Ed Felten, and chair Sharon Bradford Franklin; the remaining seat was already empty.

Not to worry, remaining member Beth Williams said. “We are open for business. Our work conducting important independent oversight of the intelligence community has not ended just because we’re currently sub-quorum.” Flying solo, she can greenlight publication, direct work, and review new procedures and policies; she can’t start new projects. A review is ongoing of the EU-US Privacy Framework under Executive Order 14086 (2022). Williams seemed more interested in restricting government censorship and abuse of financial data in the name of combating domestic terrorism.

Soon afterwards, LeBlanc, whose firing has him considering “legal options”, told Brian Fung that the outcome of next year’s reauthorization of Section 702, which covers foreign surveillance programs, keeps him awake at night. Earlier, Williams noted that she and Richard E. DiZinno, who left in 2023, wrote a “minority report” recommending “major” structural change within the FBI to prevent weaponization of S702.

LeBlanc is also concerned that agencies at the border are coordinating with the FBI to surveil US persons as well as migrants. More broadly, he said, gutting the PCLOB costs it independence, expertise, trustworthiness, and credibility and limits public options for redress. He thinks the EU-US data privacy framework could indeed be at risk.

A friend called the panel on content moderation “surreal” in its divisions. Yael Eisenstat and Joel Thayer tried valiantly to disentangle questions of accountability and transparency from free speech. To little avail: Jacob Mchangama and Ari Cohn kept tangling them back up again.

This largely reflects Congressional debates. As in the UK, there is bipartisan concern about child safety – see also the proposed Kids Online Safety Act – but Republicans also separately push hard on “free speech”, claiming that conservative voices are being disproportionately silenced. Meanwhile, organizations that study online speech patterns and could perhaps establish whether that’s true are being attacked and silenced.

Eisenstat tried to draw boundaries between speech and companies’ actions. She can still find on Facebook the same Telegram ads containing illegal child sexual abuse material that she found when Telegram CEO Pavel Durov was arrested. Despite violating the terms and conditions, they bring Meta profits. “How is that a free speech debate as opposed to a company responsibility debate?”

Thayer seconded her: “What speech interests do these companies have other than to collect data and keep you on their platforms?”

By contrast, Mchangama complained that overblocking – that is, restricting legal speech – is seen across EU countries. “The better solution is to empower users.” Cohn also disliked the UK and European push to hold platforms responsible for fulfilling their own terms and conditions. “When you get to whether platforms are living up to their content moderation standards, that puts the government and courts in the position of having to second-guess platforms’ editorial decisions.”

But Cohn was talking legal content; Eisenstat was talking illegal activity: “We’re talking about distribution mechanisms.” In the end, she said, “We are a democracy, and part of that is having the right to understand how companies affect our health and lives.” Instead, these debates persist because we lack factual knowledge of what goes on inside. If we can’t figure out accountability for these platforms, “This will be the only industry above the law while becoming the richest companies in the world.”

Twenty-five years after data protection became a fundamental right in Europe, the DC crowd still seem to see it as a regulation in search of a deal. Representative Kat Cammack (R-FL), who described herself as the “designated IT person” on the energy and commerce committee, was particularly excited that policy surrounding emerging technologies could be industry-driven, because “Congress is *old*!” and DC is designed to move slowly. “There will always be concerns about data and privacy, but we can navigate that. We can’t deter innovation and expect to flourish.”

Others also expressed enthusiasm for “the great opportunities in front of our country” and compared the EU’s Digital Markets Act to a toll plaza congesting I-95. Samir Jain, on the AI governance panel, suggested the EU may be “reconsidering its approach”. US senator Marsha Blackburn (R-TN) highlighted China’s threat to US cybersecurity without noting the US’s own goal, CALEA.

On that same AI panel, Olivia Zhu, the Assistant Director for AI Policy for the White House Office of Science and Technology Policy, seemed more realistic: “Companies operate globally, and have to do so under the EU AI Act. The reality is they are racing to comply with [it]. Disengaging from that risks a cacophony of regulations worldwide.”

Shortly before, Johnny Ryan, a Senior Fellow at the Irish Council for Civil Liberties, posted: “EU Commission has dumped the AI Liability Directive. Presumably for ‘innovation’. But China, which has the toughest AI law in the world, is out innovating everyone.”

Illustrations: Kat Cammack (R-FL) at State of the Net 2025.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon or Bluesky.

The AI moment

“Why are we still talking about digital transformation?” The speaker was convening a session at last weekend’s UK Govcamp, an event organized by and for civil servants with an interest in digital stuff.

“Because we’ve failed?” someone suggested. These folks are usually *optimists*.

Govcamp is a long-running tradition that began as a guerrilla effort in 2008. At the time, civil servants wanting to harness new technology in the service of government were so thin on the ground they never met until one of them, Jeremy Gould, convened the first Govcamp. These are people who are willing to give up a Saturday in order to do better at their jobs working for us. All hail.

It’s hard to remember now, nearly 15 years on, the excitement in 2010 when David Cameron’s incoming government created the Government Digital Service and embedded it into the Cabinet Office. William Heath immediately ended the Ideal Government blog he’d begun writing in 2004 to press insistently for better use of digital technologies in government. The government had now hired all the people he could have wanted it to, he said, and therefore, “its job is done”.

Some good things followed: tilting government procurement to open the way for smaller British companies, consolidating government publishing, other things less visible but still important. Some data became open. This all has improved processes like applying for concessionary travel passes and other government documents, and made government publishing vastly more usable. The improvement isn’t universal: my application last year to renew my UK driver’s license was sent back because my signature strayed outside the box provided for it.

That’s just one way the business of government doesn’t feel that different. The whole process of developing legislation – green and white papers, public consultations, debates, and amendments – marches on much as it ever has, though with somewhat wider access because the documents are online. Thoughts about how to make it more participatory were the subject of a teacamp in 2013. Eleven years on, civil society is still reading and responding to government consultations in the time-honored way, and policy is still made by the few for the many.

At Govcamp, the conversation spread between the realities of their working lives and the difficulties systems posed for users – that is, the rest of us. “We haven’t removed those little frictions,” one said, evoking the old speed comparisons between Amazon (delivers tomorrow or even today) and the UK government (delivers in weeks, if not months).

“People know what good looks like,” someone else said, echoing that frustration. That’s 2010-style optimism, from when Amazon product search yielded useful results, search engines weren’t spattered with AI slime and blanketed with ads, today’s algorithms were not yet born, and customer service still had a heartbeat. Here in 2025, we’re all coming up against rampant enshittification, with the result that the next cohort of incoming young civil servants *won’t* know any more what “good” looks like. There will be a whole new layer of necessary education.

Other comments: it’s evolution, not transformation; resistance to change and the requirement to ask permission are embedded throughout the culture; usability is still a problem; trying to change top-down only works in a large organization if it sets up an internal start-up and allows it to cannibalize the existing business; not enough technologists in most departments; the public sector doesn’t have the private sector option of deciding what to ignore; every new government has a new set of priorities. And: the public sector has no competition to push change.

One suggestion was that technological change happens in bursts – punctuated equilibrium. That sort of fits with the history of changing technological trends: computing, the Internet, the web, smartphones, the cloud. Today, that’s “AI”, which prime minister Keir Starmer announced this week he will mainline into the UK’s veins “for everything from spotting potholes to freeing up teachers to teach”.

The person who suggested “punctuated equilibrium” added: “Now is a new moment of change because of AI. It’s a new ‘GDS moment’.” This is plausible in the sense that new paradigms sometimes do bring profound change. Smartphones changed life for homeless people. On the other hand, many don’t do much. Think audio: that was going to be a game-changer, and yet after years of loss-making audio assistants, most of us are still typing.

So is AI one of those opportunities? Many brought up generative AI’s vast consumption of energy and water and rampant inaccuracy. Starmer, like Rishi Sunak before him, seems to think AI can make Britain the envy of other major governments.

Complex systems – such as digital governance – don’t easily change the flow of information or, therefore, the flow of power. It can take longer than most civil servants’ careers. Organizations like Mydex, which seeks to up-end today’s systems to put users in control, have been at work for years now. Mydex chair Alan Mitchell is optimistic that the government’s upcoming digital identity framework is a breakthrough. We’ll see.

One attendee captured this: “It doesn’t feel like the question has changed from more efficient bureaucracy to things that change lives.” Said another in response, “The technology is the easy bit.”

Illustrations: Sir Humphrey Appleby (Nigel Hawthorne), Bernard Woolley (Derek Fowlds), and Jim Hacker (Paul Eddington) arguing over cultural change in Yes, Minister.


Beware the duck

Once upon a time, “convergence” was a buzzword. That was back in the days when audio was on stereo systems, television was on a TV, and “communications” happened on phones that weren’t computers. The word has disappeared back into its former usage pattern, but it could easily be revived to describe what’s happening to content as humans dive into using generative tools.

Said another way. Roughly this time last year, the annual technology/law/pop culture conference Gikii was awash in (generative) AI. That bubble is deflating, but in the experiments that nonetheless continue a new topic more worthy of attention is emerging: artificial content. It’s striking because what happens at this gathering, which mines all types of popular culture for cues for serious ideas, is often a good guide to what’s coming next in futurelaw.

That no one dared guess which of Zachary Cooper’s pair of near-identical audio clips was AI-generated and which human-performed was only a starting point. One had more static? Cooper’s main point: “If you can’t tell which clip is real, then you can’t decide which one gets copyright.” Right, because only human creations are eligible (although fake bands can still scam Spotify).

Cooper’s brief, wild tour of the “generative music underground” included using AI tools to create songs whose content is at odds with their genre, whole generated albums built by a human producer making thousands of tiny choices, and the new genre “gencore”, which exploits the qualities of generated sound (Cher and Autotune on steroids). Voice cloning, instrument cloning, audio production plugins, “throw in a bass and some drums”….

Ultimately, Cooper said, “The use of generative AI reveals nothing about the creative relationship to work; it destabilizes the international market by having different authorship thresholds; and there’s no means of auditing any of it.” Instead of uselessly trying to enforce different rights predicated on the use or non-use of a specific set of technologies, he said, we should tackle directly the challenges new modes of production pose to copyright. Precursor: the battles over sampling.

Soon afterwards, Michael Veale was showing us Civitai, an Idaho-based site offering open source generative AI tools, including fine-tuned models. “Civitai exists to democratize AI media creation,” the site explains. “Everything has a valid legal purpose,” Veale said, but the way capabilities can be retrained and chained together to create composites makes it hard to tell which tools, if any, should be taken down, even for creators (see also the puzzlement as Redditors try to work this out). Even environmental regulation can’t help, as one attendee suggested: unlike large language models, these smaller, fine-tuned models (as Jon Crowcroft and I surmised last year would be the future) are efficient; they can run on a phone.

Even without adding artificial content there is always an inherent conflict when digital meets an analog spectrum. This is why, Andy Phippen said, the threshold of 18 for buying alcohol and cigarettes turns into a real threshold of 25 at retail checkouts. Both software and humans fail at determining over-or-under-18, and retailers fear liability. Online age verification as promoted in the Online Safety Act will not work.

If these blurred lines strain the limits of current legal approaches, others expose gaps in the law. Andrea Matwyshyn, for example, has been studying parallels I’ve also noticed between early 20th century company towns and today’s tech behemoths’ anti-union, surveillance-happy working practices. As a result, she believes that regulatory authorities need to start considering closely the impact of data aggregation when companies merge and look for company town-like dynamics.

Andelka Phillips parodied the overreach of app contracts by imagining the EULA attached to “ThoughtReader app”. A sample clause: “ThoughtReader may turn on its service at any time. By accepting this agreement, you are deemed to accept all monitoring of your thoughts.” Well, OK, then. (I also had a go at this here, 19 years ago.)

Emily Roach toured the history of fan fiction and the law to end up at Archive of Our Own, a “fan-created, fan-run, nonprofit, noncommercial archive for transformative fanworks, like fanfiction, fanart, fan videos, and podfic”, the idea being to ensure that the work fans pour their hearts into has a permanent home where it can’t be arbitrarily deleted by corporate owners. The rules are strict: not so much as a “buy me a coffee” tip link that could lead to a court-acceptable claim of commercial use.

History, the science fiction writer Charles Stross has said, is the science fiction writer’s secret weapon. Also at Gikii: Miranda Mowbray unearthed the 18th century “Digesting Duck” automaton built by Jacques de Vaucanson. It was a marvel that appeared to ingest grain and defecate waste and that in its day inspired much speculation about the boundaries between real and mechanical life. Like the amazing ancient Greek automata before it, it was, of course, a purely mechanical fake – it stored the grain in a small compartment and released pellets from a different compartment – but today’s humans confused into thinking that sentences mean sentience could relate.

Illustrations: One onlooker’s rendering of his (incorrect) idea of the interior of Jacques de Vaucanson’s Digesting Duck (via Wikimedia).


Soap dispensers and Skynet

In the TV series Breaking Bad, the weary ex-cop Mike Ehrmantraut tells meth chemist Walter White: “No more half measures.” The last time he took half measures, the woman he was trying to protect was brutally murdered.

Apparently people like to say there are no dead bodies in privacy (although this is easily countered with ex-CIA director General Michael Hayden’s comment, “We kill people based on metadata”). But, as Woody Hartzog told a Senate committee hearing in September 2023, summarizing work he did with Neil Richards and Ryan Durrie, half measures in AI/privacy legislation are still a bad thing.

A discussion at Privacy Law Scholars last week laid out the problems. Half measures don’t work. They don’t prevent societal harms. They don’t prevent AI from being deployed where it shouldn’t be. And they sap the political will to follow up with anything stronger.

In an article for The Brink, Hartzog said, “To bring AI within the rule of law, lawmakers must go beyond half measures to ensure that AI systems and the actors that deploy them are worthy of our trust.”

He goes on to list examples of half measures: transparency, committing to ethical principles, and mitigating bias. Transparency is good, but doesn’t automatically bring accountability. Ethical principles don’t change business models. And bias mitigation to make a technology nominally fairer may simultaneously make it more dangerous. Think facial recognition: debias the system and improve its accuracy for matching the faces of non-male, non-white people, and then it’s used to target those same people with surveillance.

Or, bias mitigation may have nothing to do with the actual problem, an underlying business model, as Arvind Narayanan, author of the forthcoming book AI Snake Oil, pointed out a few days later at an event convened by the Future of Privacy Forum. In his example, the Washington Post reported in 2019 on the case of an algorithm intended to help hospitals predict which patients will benefit from additional medical care. It turned out to favor white patients. But, Narayanan said, the system’s provider responded to the story by saying that the algorithm’s cost model accurately predicted the costs of additional health care – in other words, the algorithm did exactly what the hospital wanted it to do.

“I think hospitals should be forced to use a different model – but that’s not a technical question, it’s politics.”

Narayanan also called out auditing (another Hartzog half measure). You can, he said, audit a human resources system to expose patterns in which resumes it flags for interviews and which it drops. But no one ever commissions research, modeled on the expensive randomized controlled trials common in medicine, that follows up for five years to see if the system actually picks good employees.

Adding confusion is the fact that “AI” isn’t a single thing. Instead, it’s what someone called a “suitcase term” – that is, a container for many different systems built for many different purposes by many different organizations with many different motives. It is absurd to conflate AGI – the artificial general intelligence of science fiction stories and scientists’ dreams that can surpass and kill us all – with pattern-recognizing software that depends on plundering human-created content and the labeling work of millions of low-paid workers.

To digress briefly, some of the AI in that suitcase is getting truly goofy. Yum Brands has announced that its restaurants, which include Taco Bell, Pizza Hut, and KFC, will be “AI-first”. Among Yum’s envisioned uses, the company tells Benj Edwards at Ars Technica, are being able to ask an app what temperature to set the oven. I can’t help suspecting that the real eventual use will be data collection and discriminatory pricing. Stuff like this is why Ed Zitron writes postings like The Rot-Com Bubble, which hypothesizes that the reason Internet services are deteriorating is that technology companies have run out of genuinely innovative things to sell us.

That you cannot solve social problems with technology is a long-held truism, but it seems to be especially true of the messy middle of the AI spectrum, the use cases active now that rarely get the same attention as the far ends of that spectrum.

As Neil Richards put it at PLSC, “The way it’s presented now, it’s either existential risk or a soap dispenser that doesn’t work on brown hands when the real problem is the intermediate level of societal change via AI.”

The PLSC discussion included a list of the ways that regulations fail. Underfunded enforcement. Regulations that are pure theater. The wrong measures. The right goal, but weakly drafted legislation. Make the regulation ambiguous, or base it on principles that are too broad. Choose conflicting half-measures – for example, require transparency but add the principle that people should own their own data.

Like Cristina Caffarra a week earlier at CPDP, Hartzog, Richards, and Durrie favor finding remedies that focus on limiting abuses of power. Full measures include outright bans, the right to bring a private cause of action, imposing duties of “loyalty, care, and confidentiality”, and limiting exploitative data practices within these systems. Curbing abuses of power, as he says, is nothing new. The shiny new technology is a distraction.

Or, as Narayanan put it, “Broken AI is appealing to broken institutions.”

Illustrations: Mike (Jonathan Banks) telling Walt (Bryan Cranston) in Breaking Bad (S03e12) “no more half measures”.


Admiring the problem

In one sense, the EU’s barely dry AI Act and the other complex legislation – the Digital Markets Act, Digital Services Act, GDPR, and so on – is a triumph. Flawed it may be, but it’s a genuine attempt to protect citizens’ human rights against a technology that is being birthed with numerous trigger warnings. The AI-with-everything program at this year’s Computers, Privacy, and Data Protection reflected that sense of accomplishment – but also the frustration that comes with knowing that all legislation is flawed, all technology companies try to game the system, and gaps will widen.

CPDP has had these moments before: new legislation always comes with a large dollop of frustration over the opportunities that were missed and the knowledge that newer technologies are already rushing forwards. AI, and the AI Act, more or less swallowed this year’s conference as people considered what it says, how it will play internationally, and the necessary details of implementation and enforcement. Two years ago at this event, inadequate enforcement of GDPR was a big topic.

The most interesting future gaps that emerged this year: monopoly power, quantum sensing, and spatial computing.

For at least 20 years we’ve been hearing about quantum computing’s potential threat to public key encryption – that day of doom has been ten years away as long as I can remember, just as the Singularity is always 30 years away. In the panel on quantum sensing, Chris Hoofnagle argued that, as he and Simson Garfinkel recently wrote at Lawfare and in their new book, quantum cryptanalysis is overhyped as a threat (although there are many opportunities for quantum computing in chemistry and materials science). However, quantum sensing is here now, works (because qubits are fragile), and is cheap. There is plenty of privacy threat here to go around: quantum sensing will benefit entirely different classes of intelligence, particularly remote, undetectable surveillance.

Hoofnagle and Garfinkel are calling this MASINT, for measurement and signature intelligence, and believe that it will become very difficult to hide things, even at a national level. In Hoofnagle’s example, a quantum sensor-equipped drone could fly over the homes of parolees to scan for guns.

Quantum sensing and spatial computing have this in common: they both enable unprecedented passive data collection. VR headsets, for example, collect all sorts of biomechanical data that can be mined more easily for personal information than people expect.

Barring change, all that data will be collected by today’s already-powerful entities.

The deeper level on which all this legislation fails particularly exercised Cristina Caffarra, the co-founder of the Centre for Economic Policy Research, in the panel on AI and monopoly. All this legislation, she said, is basically nibbling around the edges because it does not touch the real, fundamental problem: the power being amassed by the handful of companies that own the infrastructure.

“It’s economics 101. You can have as much downstream competition as you like but you will never disperse the power upstream.” The reports and other material generated by government agencies like the UK’s Competition and Markets Authority are, she says, just “admiring the problem”.

A day earlier, the Novi Sad professor Vladan Joler had already pointed out the fundamental problem: at the dawn of the Internet anyone could start with nothing and build something; what we’re calling “AI” requires billions in investment, so comes pre-monopolized. Many people dismiss Europe for not having its own homegrown Big Tech, but that overlooks open technologies: the Raspberry Pi, Linux, and the web itself, which all have European origins.

In 2010, the now-departing MP Robert Halfon (Con-Harlow) said at an event on reining in technology companies that only a company the size of Google – not even a government – could create Street View. Legend has it that open source geeks heard that as a challenge, and so we have OpenStreetMap. Caffarra’s fiery anger raises the question: at what point do the infrastructure providers become so entrenched that they could choke off an open source competitor at birth? Caffarra wants to build a digital public interest infrastructure using the gaps where Big Tech doesn’t yet have that control.

The Dutch Groenlinks MEP Kim van Sparrentak offered an explanation for why the AI Act doesn’t address market concentration: “They still dream of a European champion who will rule the world.” An analogy springs to mind: people who vote for tax cuts for billionaires because one day that might be *them*. Meanwhile, the UK’s Competition and Markets Authority finds nothing to investigate in Microsoft’s partnership with the French AI startup Mistral.

Van Sparrentak thinks one way out is through public procurement: adopt goals of privacy and sustainability, and support European companies. It makes sense; as the AI Now Institute’s Amba Kak noted, at the moment almost everything anyone does digitally has to go through the systems of at least one Big Tech company.

As Sebastiano Toffaletti, head of the secretariat of the European SME Alliance, put it, “Even if you had all the money in the world, these guys still have more data than you. If you don’t and can’t solve it, you won’t have anyone to challenge these companies.”

Illustrations: Vladan Joler shows Anatomy of an AI System, a map he devised with Kate Crawford of the human labor, data, and planetary resources that are extracted to make “AI”.


The second greatest show on earth

There is this to be said for seeing your second total eclipse of the sun: if the first one went well, you can be more relaxed about what you get to see. In 2017, sitting in Centennial Park in Nashville, we saw everything. So in Dallas in 2024, I could tell myself, “It will be interesting even if we can’t see the sun.”

As it happened, we had cloud with lots of breaks. The cloud obscured such phenomena as Baily’s beads and the diamond ring – but the play of light on the broken clouds as the sun popped back out was amazing all by itself. The corona-surrounded sun playing peek-a-boo with us was stunningly beautiful. And all too soon it was over. It seemed shorter than 2017, even though totality was nearly twice as long – 3:52 compared to about two minutes.

One thing definitely missing compared with Nashville was a phenomenon that’s less often discussed: the 360-degree sunset all around the horizon. Sitting in Dallas surrounded by buildings, we could not see the horizon as we could in that Nashville park.

On Sunday, April 7, it seemed like half the country was moving into position for today in a process that involved placing a bet on the local weather. I had friends scattered in Vermont, Montreal, and several locations in upstate New York. Our intermittent cloud compared favorably with at least one of the New York locations. Daytime darkness and watching and listening to animals’ reactions is still interesting…but it remains frustrating to know that the Big Show is going on without you.

The hundreds of photos on show hide the real thrill of seeing totality: the sense of connection to humanity past, present, and future, and across the animal kingdom. The strangers around you become part of your life, however briefly. The inexorable movements of earth, sun, and moon put us all in our place.

Small data

Shortly before this gets posted, Jon Crowcroft and I will have presented this year’s offering at Gikii, the weird little conference that crosses law, media, technology, and pop culture. This is what we will possibly may have said, as I understand it, with some added explanation for the slightly less technical audience I imagine will read this.

Two years ago, a team of four researchers – Timnit Gebru, Emily Bender, Margaret Mitchell (writing as Shmargaret Shmitchell), and Angelina McMillan-Major – wrote a now-famous paper called On the Dangers of Stochastic Parrots (PDF) calling into question the usefulness of the large language models (LLMs) that have caused so much ruckus this year. The “Stochastic Four” argued instead for small models built on carefully curated data: less prone to error, less exploitive of people’s data, less damaging to the planet. Gebru got fired over this paper; Google also fired Mitchell soon afterwards. Two years later, neural networks pioneer Geoff Hinton quit Google in order to voice similar concerns.

Despite the hype, LLMs have many problems. They are fundamentally an extractive technology and are resource-intensive. Building LLMs requires massive amounts of training data; so far, the companies have been unwilling to acknowledge their sources, perhaps because (as is happening already) they fear copyright suits.

More important, from a technical standpoint, is the issue of model collapse; that is, models degrade when they begin to ingest synthetic AI-generated data instead of human input. We’ve seen this before with Google Flu Trends, which degraded rapidly as incoming new search data included many searches on flu-like symptoms that weren’t actually flu, and others that simply reflected the frequency of local news coverage. “Data pollution”, as LLM-generated data fills the web, will mean that the web becomes an increasingly useless source of training data for future generations of generative AI. Lots more noise, drowning out the signal (in the photo above, the signal would be the parrot).
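For the more technical reader, here is a toy sketch of the model collapse idea. It is an illustration under loud assumptions, not a description of how any real LLM is trained: the "model" is just a bell curve fitted to a handful of numbers, and each generation is trained only on samples produced by the previous generation's model. The function name, sample size, and number of generations are all made up for the example.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Refit a Gaussian to its own synthetic output, generation after generation.

    Hypothetical toy model: returns the fitted spread (standard deviation)
    at each generation, starting from the original "human" data spread.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the assumed "human" distribution
    spreads = [sigma]
    for _ in range(generations):
        # Each new generation sees only data generated by the previous model.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        spreads.append(sigma)
    return spreads


spreads = simulate_collapse()
print(f"spread at generation 0: {spreads[0]:.3f}")
print(f"spread at generation {len(spreads) - 1}: {spreads[-1]:.3g}")
```

In this toy the fitted spread tends to shrink with each round, so later generations reproduce less and less of the variety in the original data. Real model collapse is messier, but the direction of travel is the same.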

Instead, if we follow the lead of the Stochastic Four, the more productive approach is small data – small, carefully curated datasets that train models to match specific goals. Far less resource-intensive, far fewer issues with copyright, appropriation, and extraction.

We know what the LLM future looks like in outline: big, centralized services, because no one else will be able to amass enough data. In that future, surveillance capitalism is an essential part of data gathering. Small language model (SLM) futures could look quite different: decentralized, with realigned incentives. At one point, we wanted to suggest that small data could bring the end of surveillance capitalism; that’s probably an overstatement. But small data could certainly create the ecosystem in which the case for mass data collection would be less compelling.

Jon and I imagined four primary alternative futures: federation, personalization, some combination of those two, and paradigm shift.

Precursors to a federated small data future already exist; these include customer service chatbots and predictive text assistants. In this future, we could imagine personalized LLM servers designed to serve specific needs.

An individualized future might look something like I suggested here in March: a model that fits in your pocket that is constantly updated with material of your own choosing. Such a device might be the closest yet to Vannevar Bush’s 1945 idea of the Memex (PDF), updated for the modern era by automating the dozens of secretary-curators he imagined doing the grunt work of labeling and selection. That future again has precursors in techniques for sharing the computation but not the data, a design we see proposed for health care, where the data is too sensitive to share unless there’s a significant public interest (as in pandemics or very rare illnesses), or in other data analysis designs intended to protect privacy.

In 2007, the science fiction writer Charles Stross suggested something like this, though he imagined it as a comprehensive life log, which he described as a “google for real life”. So this alternative future would look something like Stross’s pocket $10 life log with enhanced statistics-based data analytics.
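The "sharing the computation but not the data" design mentioned above can be sketched in a few lines. This is a minimal illustration with invented names and numbers (the clinics and their readings are hypothetical), and it leaves out everything a real deployment would need, such as encryption or safeguards against re-identification from small groups. The point is only that each site hands over a summary, never its raw records.

```python
def local_summary(records):
    """Runs at each site: reduce private records to a (sum, count) pair."""
    return sum(records), len(records)


def combine(summaries):
    """Runs at the coordinator: it only ever sees the per-site summaries."""
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count


# Hypothetical private readings held separately by three clinics.
sites = {
    "clinic_a": [120, 135, 128],
    "clinic_b": [142, 138],
    "clinic_c": [119, 125, 131, 127],
}

summaries = [local_summary(records) for records in sites.values()]
federated_mean = combine(summaries)
print(f"mean across all sites: {federated_mean:.2f}")
```

The coordinator arrives at the same average it would get by pooling every record in one place, without any record leaving its clinic.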

Imagining what a paradigm shift might look like is much harder. That’s the kind of thing science fiction writers do; it’s 16 years since Stross gave that life log talk. However, in his 2016 history of advertising, The Attention Merchants, Columbia professor Tim Wu argued that industrialization was the vector that made advertising and its grab for our attention part of commerce. A hundred and fifty-odd years later, the centralizing effects of industrialization are being challenged, starting with energy, via renewables and local power generation, and social media, via the fediverse. Might language models also play their part in bringing about a new, more collaborative and cooperative society?

It is, in other words, just possible that the hot new technology of 2023 is simply a dead end bringing little real change. It’s happened before. There have been, as Wu recounts, counter-moves and movements before, but they didn’t have the technological affordances of our era.

In the Q&A that followed, Miranda Mowbray pointed out that companies are trying to implement the individualized model, but that it’s impossible to do unless there are standardized data formats, and even then hard to do at scale.

Illustrations: Spot the parrot seen in a neighbor’s tree.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast.

The safe place

For a long time, the fear was that technical decisions – new domain names ($), cooption of open standards or software, laws mandating data localization – would splinter the Internet. “Balkanize” was heard a lot.

A panel at the UK Internet Governance Forum a couple of weeks ago focused on this exact topic, and was mostly self-congratulatory. Which is when it occurred to me that the Internet may not *be* fragmented, but it *feels* fragmented. Almost every day I encounter some site I can’t reach: email goes into someone’s spam folder, the site or its content is off-limits because it’s been geofenced to conform with copyright or data protection laws, or the site mysteriously doesn’t load, with no explanation. The most likely explanation for the latter is censorship built into the Internet feed by the ISP or the establishment whose connection I’m using, but they don’t actually *say* that.

The ongoing attrition at Twitter is exacerbating this feeling, as the users I’ve followed for years continue to migrate elsewhere. At the moment, it takes accounts on several other services to keep track of everyone: definite fragmentation.

Here in the UK, this sense of fragmentation may be about to get a lot worse, as the long-heralded Online Safety bill – written and expanded until it’s become a “Frankenstein bill”, as Mark Scott and Annabelle Dickson report at Politico – hurtles toward passage. This week saw fruitless debates on amendments in the House of Lords, and it will presumably be back in the Commons shortly thereafter, where it could be passed into law by this fall.

A number of companies have warned that the bill, particularly if it passes with its provisions undermining end-to-end encryption intact, will drive them out of the country. I’m not sure British politicians are taking them seriously; so often such threats are idle. But in this case, I think they’re real, not least because post-Brexit Britain carries so much less global and commercial weight, a reality some politicians are in denial about. WhatsApp, Signal, and Apple have all said openly that they will not compromise the privacy of their masses of users elsewhere to suit the UK. Wikipedia has warned that including it in the requirement to age-verify its users will force it to withdraw rather than violate its principles about collecting as little information about users as possible. The irony is that the UK government itself runs on WhatsApp.

As Ian McRae, the director of market intelligence for prospective online safety regulator Ofcom, showed in a presentation at UKIGF, Wikipedia would be just one of the estimated 150,000 sites within the scope of the bill. Ofcom is ramping up to deal with the workload, an effort the agency expects to cost £169 million between now and 2025.

In a legal opinion commissioned by the Open Rights Group, barristers at Matrix Chambers find that clause 9(2) of the bill is unlawful. This, as Thomas Macaulay explains at The Next Web, is the clause that requires platforms to proactively remove illegal or “harmful” user-generated content. In fact: prior restraint. As ORG goes on to say, there is no requirement to tell users why their content has been blocked.

Until now, the impact of most badly-formulated British legislative proposals has been sort of abstract. Data retention, for example: you know that pervasive mass surveillance is a bad thing, but most of us don’t really expect to feel the impact personally. This is different. Some of my non-UK friends will only use Signal to communicate, and I doubt a day goes by that I don’t look something up on Wikipedia. I could use a VPN for that, but if the only way to use Signal is to have a non-UK phone? I can feel those losses already.

And if people think they dislike those ubiquitous cookie banners and consent clickthroughs, wait until they have to age-verify all over the place. Worst case: this bill will be an act of self-harm that one day will be as inexplicable to future generations as Brexit.

The UK is not the only country pursuing this path. Age verification in particular is catching on. The US states of Virginia, Mississippi, Louisiana, Arkansas, Texas, Montana, and Utah have all passed legislation requiring it; Pornhub now blocks users in Mississippi and Virginia. The likelihood is that many more countries will try to copy some or all of the bill’s provisions, just as Australia’s law requiring the big social media platforms to negotiate with news publishers is spawning copies in Canada and California.

This is where the real threat of the “splinternet” lies. Think of requiring 150,000 websites to implement age verification and proactively police content. Many of those sites, as the law firm Mishcon de Reya writes, may not even be based in the UK.

This means that any site located outside the UK – and perhaps even some that are based here – will be asking, “Is it worth it?” For a lot of them, it won’t be. Which means that however much the Internet retains its integrity, the British user experience will be the Internet as a sea of holes.

Illustrations: Drunk parrot in a Putney garden (by Simon Bisson; used by permission).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon.