Small data

Shortly before this gets posted, Jon Crowcroft and I will have presented this year’s offering at Gikii, the weird little conference that crosses law, media, technology, and pop culture. This is, as I understand it, roughly what we will have said, with some added explanation for the slightly less technical audience I imagine will read this.

Two years ago, a team of four researchers – Timnit Gebru, Emily Bender, Margaret Mitchell (writing as Shmargaret Shmitchell), and Angelina McMillan-Major – wrote a now-famous paper called On the Dangers of Stochastic Parrots (PDF) calling into question the usefulness of the large language models (LLMs) that have caused so much ruckus this year. The “Stochastic Four” argued instead for small models built on carefully curated data: less prone to error, less exploitative of people’s data, less damaging to the planet. Gebru got fired over this paper; Google also fired Mitchell soon afterwards. Two years later, neural networks pioneer Geoff Hinton quit Google in order to voice similar concerns.

Despite the hype, LLMs have many problems. They are fundamentally an extractive technology and are resource-intensive. Building LLMs requires massive amounts of training data; so far, the companies have been unwilling to acknowledge their sources, perhaps because (as is happening already) they fear copyright suits.

More important from a technical standpoint is the issue of model collapse: models degrade when they begin to ingest synthetic AI-generated data instead of human input. We’ve seen something similar before with Google Flu Trends, which degraded rapidly as incoming new search data included many searches on flu-like symptoms that weren’t actually flu, and others that simply reflected the frequency of local news coverage. As LLM-generated data fills the web, this “data pollution” will make the web an increasingly useless source of training data for future generations of generative AI: lots more noise, drowning out the signal (in the photo above, the signal would be the parrot).
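Model collapse can be illustrated with a deliberately crude toy, which is not how LLMs are actually trained. Here the “model” is nothing but a table of word frequencies, and each generation is trained only on text sampled from the previous generation’s model. Because sampling can never produce a word the previous model didn’t contain, the vocabulary can only shrink, and the rare words in the long tail are the first to go. The corpus, sizes, and seed are invented for illustration.

```python
import random
from collections import Counter
from typing import List


def train(corpus: List[str]) -> Counter:
    """A toy 'model': nothing more than the word frequencies it was trained on."""
    return Counter(corpus)


def generate(model: Counter, n: int, rng: random.Random) -> List[str]:
    """Sample n words in proportion to the frequencies the model learned."""
    words = list(model)
    weights = [model[w] for w in words]
    return rng.choices(words, weights=weights, k=n)


def simulate(corpus: List[str], generations: int, n: int, seed: int = 0) -> List[int]:
    """Retrain each generation only on the previous generation's output.

    Returns the number of distinct words still present at each generation.
    """
    rng = random.Random(seed)
    vocab_sizes = [len(set(corpus))]
    for _ in range(generations):
        corpus = generate(train(corpus), n, rng)
        vocab_sizes.append(len(set(corpus)))
    return vocab_sizes


# Hypothetical "human" corpus with a long tail: a few common words, many rare ones
human = [f"w{i}" for i in range(200) for _ in range(max(1, 1000 // (i + 1)))]
sizes = simulate(human, generations=20, n=2000)
print(sizes)
```

Rerunning with different seeds changes which words vanish and when, but never lets a lost word come back.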

Instead, if we follow the lead of the Stochastic Four, the more productive approach is small data – small, carefully curated datasets that train models to match specific goals. Far less resource-intensive, far fewer issues with copyright, appropriation, and extraction.

We know what the LLM future looks like in outline: big, centralized services, because no one else will be able to amass enough data. In that future, surveillance capitalism is an essential part of data gathering. Small language model (SLM) futures could look quite different: decentralized, with realigned incentives. At one point, we wanted to suggest that small data could bring the end of surveillance capitalism; that’s probably an overstatement. But small data could certainly create the ecosystem in which the case for mass data collection would be less compelling.

Jon and I imagined four primary alternative futures: federation, personalization, some combination of those two, and paradigm shift.

Precursors to a federated small data future already exist; these include customer service chatbots and predictive text assistants. In this future, we could imagine personalized LLM servers designed to serve specific needs.

An individualized future might look something like I suggested here in March: a model that fits in your pocket that is constantly updated with material of your own choosing. Such a device might be the closest yet to Vannevar Bush’s 1945 idea of the Memex (PDF), updated for the modern era by automating the dozens of secretary-curators he imagined doing the grunt work of labeling and selection. That future again has precursors in techniques for sharing the computation but not the data, a design we see proposed for health care, where the data is too sensitive to share unless there’s a significant public interest (as in pandemics or very rare illnesses), or in other data analysis designs intended to protect privacy.

In 2007, the science fiction writer Charles Stross suggested something like this, though he imagined it as a comprehensive life log, which he described as a “google for real life”. So this alternative future would look something like Stross’s pocket $10 life log with enhanced statistics-based data analytics.
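One family of techniques for sharing the computation but not the data is federated learning. As a minimal sketch of one such method, federated averaging, each site below trains a one-parameter model on records that never leave it, and a coordinator averages only the resulting parameters, weighted by how much data each site holds. The sites, data, learning rate, and round counts are hypothetical; real systems usually add protections such as secure aggregation or differential privacy, because shared parameters can still leak information about the underlying records.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that stay at their site


def local_update(w: float, data: Dataset, lr: float = 0.01, steps: int = 5) -> float:
    """Run a few gradient-descent steps on y ≈ w * x using only local data."""
    n = len(data)
    for _ in range(steps):
        # Gradient of the mean squared error (1/n) * sum((w*x - y)^2) w.r.t. w
        grad = (2 / n) * sum(x * (w * x - y) for x, y in data)
        w -= lr * grad
    return w


def federated_average(sites: List[Dataset], rounds: int = 50, w: float = 0.0) -> float:
    """Coordinator loop: only model parameters are exchanged, never raw records."""
    total = sum(len(d) for d in sites)
    for _ in range(rounds):
        local_ws = [local_update(w, d) for d in sites]  # computed at each site
        # Weight each site's parameter by how much data it trained on
        w = sum(len(d) / total * lw for d, lw in zip(sites, local_ws))
    return w


# Hypothetical sites (say, three hospitals) whose data all follow y = 2x
site_a = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
site_b = [(4.0, 8.0), (5.0, 10.0)]
site_c = [(0.5, 1.0), (1.5, 3.0)]

w = federated_average([site_a, site_b, site_c])
print(round(w, 4))  # 2.0
```

The coordinator ends up with the same slope it would have learned from the pooled data, without ever seeing a single record.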

Imagining what a paradigm shift might look like is much harder. That’s the kind of thing science fiction writers do; it’s 16 years since Stross gave that life log talk. However, in his 2016 history of advertising, The Attention Merchants, Columbia professor Tim Wu argued that industrialization was the vector that made advertising and its grab for our attention part of commerce. A hundred and fifty-odd years later, the centralizing effects of industrialization are being challenged: in energy, by renewables and local power generation, and in social media, by the fediverse. Might language models also play their part in bringing a new, more collaborative and cooperative society?

It is, in other words, just possible that the hot new technology of 2023 is simply a dead end bringing little real change. It’s happened before. There have been, as Wu recounts, counter-moves and movements before, but they didn’t have the technological affordances of our era.

In the Q&A that followed, Miranda Mowbray pointed out that companies are trying to implement the individualized model, but that it’s impossible to do unless there are standardized data formats, and even then hard to do at scale.

Illustrations: Spot the parrot seen in a neighbor’s tree.

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. She is a contributing editor for the Plutopia News Network podcast. Follow on Mastodon.

Power cuts

In the latest example of corporate destruction, the Guardian reports on the disturbing trend in which streaming services like Disney and Warner Bros Discovery are deleting finished, even popular, shows for financial reasons. It’s like Douglas Adams’ rock star Hotblack Desiato spending a year dead for tax reasons.

Given that consumers’ budgets are stretched so thin that many are reevaluating the streaming services they’re paying for, you would think this would be the worst possible time to delete popular entertainments. Instead, the industry seems to be possessed by a death wish in which it’s making its offerings *less* attractive. Even worse, the promise they appeared to offer to showrunners was creative freedom and broad and permanent access to their work. The news that Disney+ is even canceling finished shows (Nautilus) shortly before their scheduled release in order to pay less *tax* should send a chill through every creator’s spine. No one wants to spend years of their life – for almost *any* amount of money – making things that wind up in the corporate equivalent of the warehouse at the end of Raiders of the Lost Ark.

It’s time, as the Masked Scheduler suggested recently on Mastodon, for the emergence of modern equivalents of creator-founded studios United Artists and Desilu.

***

Many of us were skeptical about Meta’s Oversight Board; it was easy to predict that Facebook would use it to avoid dealing with the PR fallout from controversial cases, but never relinquish control. And so it is proving.

This week, Meta overruled the Board’s recommendation of a six-month suspension of the Facebook account belonging to former Cambodian prime minister Hun Sen. At issue was a video of one of Hun Sen’s speeches, which everyone agreed incited violence against his opposition. Meta has kept the video up on the grounds of “newsworthiness”; Meta also declined to follow the Board’s recommendation to clarify its rules for public figures in “contexts in which citizens are under continuing threat of retaliatory violence from their governments”.

In the Platformer newsletter Casey Newton argues that the Board’s deliberative process is too slow to matter – it took eight months to decide this case, too late to save the election at stake or deter the political violence that has followed. Newton also concludes from the list of decisions that the Board is only “nibbling round the edges” of Meta’s policies.

A company with shareholders, a business model, and a king is never going to let an independent group make decisions that will profoundly shape its future. From Kate Klonick’s examination, we know the Board members are serious people prepared to think deeply about content moderation and its discontents. But they were always in a losing position. Now, even they must know that.

***

It should go without saying that anything that requires an Internet connection should be designed for connection failures, especially when the connected devices are required to operate the physical world. The downside was made clear by a 2017 incident in which lost signal meant a Tesla-owning venture capitalist couldn’t restart his car, and by another in 2021, when a bunch of Tesla owners found their phone app couldn’t unlock their car doors. Tesla’s solution both times was to tell car owners to make sure they always had their physical car keys. Which, fine, but then why have an app at all?

Last week, Bambu 3D printers began printing unexpectedly when they got disconnected from the cloud. The software managing the queue of printer jobs lost the ability to monitor them, causing some to be restarted multiple times. Given the heat and extruded material 3D printers generate, this is dangerous for both themselves and their surroundings.

At TechRadar, Bambu’s PR acknowledges this: “It is difficult to have a cloud service 100% reliable all the time, but we should at least have designed the system more carefully to avoid such embarrassing consequences.” As TechRadar notes, if only embarrassment were the worst risk.
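One standard defensive pattern against this kind of failure is to make job delivery idempotent. In the minimal sketch below, which assumes each job carries a unique ID and is not a description of Bambu’s actual software, the printer itself records which jobs it has already started, so a cloud service that loses track and replays its queue after reconnecting cannot trigger a reprint.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Printer:
    """Device-side job handling that tolerates a flaky cloud connection (hypothetical)."""

    # Jobs this device has seen, keyed by a job ID assigned by the cloud
    state: Dict[str, str] = field(default_factory=dict)
    # Order in which jobs were actually started on the hardware
    started: List[str] = field(default_factory=list)

    def receive(self, job_id: str) -> bool:
        """Start a job only if this device has never seen its ID before.

        After a reconnect, the cloud may resend jobs it lost track of;
        the device, not the cloud, is the authority on what already ran.
        """
        if job_id in self.state:
            return False  # duplicate delivery: ignore it rather than reprint
        self.state[job_id] = "printing"
        self.started.append(job_id)
        return True

    def finish(self, job_id: str) -> None:
        self.state[job_id] = "done"


printer = Printer()
printer.receive("job-1")
printer.finish("job-1")

# The connection drops; on reconnecting, the cloud replays its whole queue
for job in ["job-1", "job-2"]:
    printer.receive(job)

print(printer.started)  # ['job-1', 'job-2']
```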

So, new rule: before installation, test every new “smart” device by blocking its Internet connection to see how it behaves. Of course, companies should do this themselves, but as we’ve seen, you can’t rely on that either.

***

Finally, in “be careful what you legislate for”, Canada is discovering the downside of C-18, which became law in June and requires the biggest platforms to pay for the Canadian news content they host. Google and Meta warned all along that they would stop hosting Canadian news rather than pay for it. Experts like law professor Michael Geist predicted that the bill would merely serve to dramatically cut traffic to news sites.

On August 1, Meta began adding blocks for news links on Facebook and Instagram. A coalition of Canadian news outlets quickly asked the Competition Bureau to mount an inquiry into Meta’s actions. At Techdirt, Mike Masnick notes the irony: first legacy media said Meta’s linking to news was anticompetitive; now they say not linking is anticompetitive.

However, there are worse consequences. Prime minister Justin Trudeau complains that Meta’s news block is endangering Canadians, who can’t access or share local up-to-date information about the ongoing wildfires.

In a sensible world, people wouldn’t rely on Facebook for their news, politicians would write legislation with greater understanding, and companies like Meta would wield their power responsibly. In *this* world, we have a perfect storm.

Illustrations: XKCD’s Dependency.


Guarding the peace

Police are increasingly attempting to prevent crime by using social media targeting tools to shape public behavior, says a new report from the Scottish Institute for Policing Research (PDF) written by a team of academic researchers led by Ben Collier at the University of Edinburgh. There is no formal regulation of these efforts, and the report found many examples of what it genteelly calls “unethical practice”.

On the one hand, “behavioral change marketing” seems an undeniably clever use of new technological tools. If bad actors can use targeted ads to scam, foment division, and incite violence, why shouldn’t police use them to encourage the opposite? The tools don’t care whether you’re a Russian hacker targeting 70-plus white pensioners with anti-immigrant rhetoric or a charity trying to reach vulnerable people to offer help. Using them is a logical extension of the drive toward preventing, rather than solving, crime. Governments have long used PR techniques to influence the public, from benign health PSAs on broadcast media to Theresa May’s notorious, widely criticized, and unsuccessful 2013 campaign of van ads telling illegal immigrants to go home.

On the other hand, it sounds creepy as hell. Combining police power with poorly-evidenced assumptions about crime and behavior and risk and the manipulation and data gathering of surveillance capitalism…yikes.

The idea of influence policing derives at least in part from Cass R. Sunstein’s and Richard H. Thaler’s 2008 book Nudge. The “nudge theory” it promoted argued that the use of careful design (“choice architecture”) could push people into making more desirable decisions.

The basic contention seems unarguable; using design to push people toward decisions they might not make by themselves is the basis of many large platforms’ design decisions. Dark patterns are all about that.

Sunstein and Thaler published their theory at the post-financial crisis moment when governments were looking to reduce costs. As early as 2010, the UK’s Cabinet Office set up the Behavioural Insights Team to improve public compliance with government policies. The “Nudge Unit” has been copied widely across the world.

By 2013, it was being criticized for forcing job seekers to fill out a scientifically invalid psychometric test. In 2021, Observer columnist Sonia Sodha called its record “mixed”, deploring the expansion of nudge theory into complex, intractable social problems. In 2022, new research cast doubt on whether nudges have much effect on personal behavior at all.

The SIPR report cites the Government Communications Service, the outgrowth of decades of government work to gain public compliance with policy. The GCS itself notes its incorporation of marketing science and other approaches common in the commercial sector. Its 7,000 staff work in departments across government.

This has all grown up alongside the increasing adoption of digital marketing practices across the UK’s public sector, including the tax authorities (HMRC), the Department for Work and Pensions, and especially, the Home Office – and alongside the rise of sophisticated targeting tools for online advertising.

The report notes: “Police are able to develop ‘patchwork profiles’ built up of multiple categories provided by ad platforms and detailed location-based categories using the platform targeting categories to reach extremely specific groups.”
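The narrowing effect of a “patchwork profile” is just set intersection: each added category can only shrink the audience. The toy sketch below uses an invented population and invented membership rules standing in for targeting categories such as language, recent travel, and location; it is not how any particular ad platform works, only an illustration of why stacking a few broad categories can single out very small groups.

```python
from typing import Callable, Dict, Set

# Hypothetical platform audience: user IDs 0..9999. The membership rules are
# arbitrary stand-ins for real targeting categories.
USERS = range(10_000)
CATEGORIES: Dict[str, Callable[[int], bool]] = {
    "speaks_language_x": lambda u: u % 10 == 0,
    "recent_travel_via_city_y": lambda u: u % 7 == 0,
    "currently_near_location_z": lambda u: u % 3 == 0,
}


def audience(*names: str) -> Set[int]:
    """Users matching ALL of the named categories: the intersection, or 'patchwork'."""
    return {u for u in USERS if all(CATEGORIES[n](u) for n in names)}


# Add one category at a time and watch the audience shrink
for k in range(1, len(CATEGORIES) + 1):
    names = list(CATEGORIES)[:k]
    print(names, len(audience(*names)))
```

With these made-up rules the audience drops from 1,000 to 143 to 48 as categories are layered on.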

The report’s authors used the Meta Ad Library to study the ads, the audiences and profiles police targeted, and the cost. London’s Metropolitan Police, which a recent scathing report found endemically racist and misogynist, was an early adopter and is the heaviest user of digitally targeted ads on Meta among the forces studied.

Many of the campaigns these organizations run sound mostly harmless. Campaigns intended to curb domestic violence, for example, may aim at encouraging bystanders to come forward with concerns. Others focus on counter-radicalisation and security themes or, increasingly, preventing online harms and violence against women and girls.

As a particular example of the potential for abuse, the report calls out the Home Office Migrants on the Move campaign, a collaboration with a “migration behavior change” agency called Seefar. This targeted people in France seeking asylum in the UK and attempted to frighten them out of trying to cross the Channel in small boats. The targeting was highly specific, with many ads aimed at as few as 100 to 1,000 people, chosen for their language and recent travel in or through Brussels and Calais.

The report’s authors raise concerns: the harm implicit in frightening already-extremely vulnerable people, the potential for damaging their trust in authorities to help them, and the privacy implications of targeting such specific groups. In the report’s example, Arabic speakers in Brussels might see the Home Office ads but their French neighbors would not – and those Arabic speakers would be unlikely to be seeking asylum. The Home Office’s digital equivalent of May’s van ads, therefore, would be seen only by a selection of microtargeted individuals.

The report concludes: “We argue that this campaign is a central example of the potential for abuse of these methods, and the need for regulation.”

The report makes a number of recommendations including improved transparency, formalized regulation and oversight, better monitoring, and public engagement in designing campaigns. One key issue is coming up with better ways of evaluating the results. Surprise, surprise: counting clicks, which is what digital advertising largely sells as a metric, is not a useful way to measure social change.

All of these arguments make sense. Improving transparency in particular seems crucial, as does working with the relevant communities. Deterring crime doesn’t require tricks and secrecy; it requires collaboration and openness.

Illustrations: Theresa May’s notorious van ad telling illegal immigrants to go home.


Review: Should You Believe Wikipedia?

Should You Believe Wikipedia? Online Communities and the Construction of Knowledge
By Amy S. Bruckman
Publisher: Cambridge
Print publication year: 2022
ISBN: 978-1-108780-704

Every Internet era has had its new-thing obsession. For a time in the mid-1990s, it was “community”. Every business, some industry thinkers insisted, would need to build a community of customers, suppliers, and partners. Many tried, and the next decade saw the proliferation of blogs, web boards, and, soon, multi-player online games. We learned that every such venture of any size attracts abuse that requires human moderators to solve. We learned that community does not scale. Then came Facebook and other modern social media, fueled by mobile phones, and the business model became data collection to support advertising.

Back at the beginning, Amy S. Bruckman, now a professor at Georgia Tech but then a student at MIT, set up the education-oriented MOOSE Crossing, in which children could collaborate on building objects as a way of learning to code. For 20 years, she has taught a course on designing communities. In Should You Believe Wikipedia?, Bruckman distills the lessons she’s learned over all that time, combining years of practical online experience with readable theoretical analysis based on sociology, psychology, and epistemology. Whether or not to trust Wikipedia is just one chapter in her study of online communities and the issues they pose.

Like pubs, cafes, and town squares, online communities are third places – that is, neutral ground where people can meet on equal terms. Many popular blogs, which tend to be personal or promotional, are clearly not neutral in this sense; nor is the X formerly known as Twitter. Third places also need to be enclosed but inviting, visible from surrounding areas, and offering affordances for activity. In that sense, two of the most successful online communities are Wikipedia and OpenStreetMap, both of which pursue a common enterprise that contributors can feel is of global value. Facebook is home to probably hundreds of thousands of communities – families, activists, support groups, and so on – but is itself too big, too diffuse, and too lacking in shared purpose to be a community. Bruckman also cites open source software projects and citizen science as examples of productive communities.

Bruckman’s book has arrived at a moment that we may someday see as a watershed. Numerous factors – Elon Musk’s takeover and remaking of Twitter, debates about regulation and antitrust, increased privacy awareness – are making many people reevaluate what they want from online social spaces. It is a moment when new experiments might thrive.

Something like that is needed, Bruckman concludes: people are not being well served by the free market’s profit motives and current business models. She would like to see more of the Internet populated by non-profits, but elides the key hard question: what are the sustainable models for supporting such endeavors? Mozilla, one of the open source software-building communities she praises, is sustained by payments from Google, making it still vulnerable to the dictates of shareholders, albeit at one remove. It remains an open question if the Fediverse, currently chiefly represented by Mastodon, can grow and prosper in the long term under its present structure of volunteer administrators running their own servers and relying on users’ donations to pay expenses. Other established commercial community hosts, such as Reddit, where Bruckman is a moderator, have long failed to find financial sustainability.

Bruckman never quite answers the question in the title. It reflects the skepticism at Wikipedia’s founding that an encyclopedia edited by anyone who wanted to participate could be any good. As she explains, however, the fact that every page has its Talk page that details disputes and exposes prior versions provides transparency the search engines don’t offer. It may not be clear if we *should* believe Wikipedia, whose quality varies depending on the subject, but she does make clear why we *can* when we do.

Five seconds

Careful observers posted to Hacker News this week – and the Washington Post reported – that the X formerly known as Twitter (XFKAT?) appeared to be deliberately introducing a delay in loading links to sites the owner is known to dislike or views as competitors. These would be things like the New York Times and selected other news organizations, and rival social media and publishing services like Facebook, Instagram, Bluesky, and Substack.

The 4.8 seconds users clocked doesn’t sound like much until you remember, as the Post does, that a 2016 Google study found that 53% of mobile users will abandon a website that takes longer than three seconds to load. Not sure whether desktop users are more or less patient, but it’s generally agreed that delay is the enemy.

The mechanism by which XFKAT was able to do this is its built-in link shortener, t.co, through which it routes all the links users post. You can see this for yourself if you right-click on a posted link and copy the results. You can only find the original link by letting the t.co links resolve and copying the real link out of the browser address bar after the page has loaded.
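To make the mechanism concrete, here is a minimal sketch of a hypothetical redirecting shortener; it is not t.co’s code. Every click passes through the intermediary before reaching its destination, so a per-destination policy table is all it would take to slow some sites and not others. The 4.8-second figure is the delay users reported; the host name, the policy table, and the injectable `sleep` function (which lets the example run instantly) are assumptions for illustration.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class Redirector:
    """A hypothetical link shortener sitting between a click and its destination."""

    def __init__(self, delays: Dict[str, float],
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.links: Dict[str, str] = {}
        self.delays = delays  # hypothetical per-host delay policy, in seconds
        self.sleep = sleep

    def shorten(self, code: str, url: str) -> str:
        self.links[code] = url
        return code

    def resolve(self, code: str) -> str:
        url = self.links[code]
        host = urlsplit(url).hostname or ""
        # Every click passes through here, so the intermediary can quietly
        # slow down some destinations and not others.
        self.sleep(self.delays.get(host, 0.0))
        return url


# Record requested waits instead of actually sleeping
waits = []
r = Redirector({"www.nytimes.com": 4.8}, sleep=waits.append)
r.shorten("a1", "https://www.nytimes.com/some-article")
r.shorten("b2", "https://example.org/post")
r.resolve("a1")
r.resolve("b2")
print(waits)  # [4.8, 0.0]
```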

Whether or not the company was deliberately delaying these connections, the fact is that it *can* – as can Meta’s platforms and many others. This in itself is a problem; essentially it’s a failure of network neutrality. This is the principle that a telecoms company should treat all traffic equally, and it is the basis of the egalitarian nature of the Internet. Regulatory insistence on network neutrality is why you can run a voice over Internet Protocol connection over broadband supplied by a telco or telco-owned ISP even though the services are competitors. Social media platforms are not subject to these rules, but the delaying links story suggests maybe they should be once they reach a certain size.

Link shorteners have faded into the landscape these days, but they were controversial for years after the first such service – TinyURL – was launched in 2002 (per Wikipedia). Critics cited several main issues: privacy, persistence, and obscurity. The latter refers to users’ inability to know where their clicks are taking them; I feel strongly about this myself. The privacy issue is that the link shorteners-in-the-middle are in a position to collect traffic data and exploit it (bad actors could also divert links from their intended destination). The ability to collect that data and chart “impact” is, of course, one reason shorteners were widely adopted by media sites of all types. The persistence issue is that intermediating links in this way creates one or more central points of failure. When the link shortener’s server goes down for any reason – failed Internet connection, technical fault, bankrupt owner company – the URL the shortener encodes becomes unreachable, even if the page itself is available as normal. You can’t go directly to the page, or even locate a cached copy at the Internet Archive, without the original URL.
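The three criticisms map onto very little code. In this hypothetical minimal shortener (not any real service’s implementation), short codes are just a counter written in base 62, so they reveal nothing about the destination (obscurity); every resolution passes through the service and can be logged (privacy); and when the service is down, the code alone leads nowhere (persistence).

```python
import string
from typing import Dict, List, Tuple

ALPHABET = string.digits + string.ascii_letters  # 62 characters


def encode(n: int) -> str:
    """Turn a counter value into a short base-62 code."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))


class Shortener:
    def __init__(self) -> None:
        self.links: Dict[str, str] = {}
        self.clicks: List[Tuple[str, str]] = []  # (who, destination): the privacy issue
        self.online = True

    def shorten(self, url: str) -> str:
        code = encode(len(self.links))  # opaque: the obscurity issue
        self.links[code] = url
        return code

    def resolve(self, code: str, who: str) -> str:
        if not self.online:
            # The persistence issue: the code cannot be turned back into
            # the destination without the service's own lookup table.
            raise ConnectionError("shortener unavailable")
        url = self.links[code]
        self.clicks.append((who, url))
        return url


long_url = "https://example.org/a/very/long/path?with=parameters"
s = Shortener()
code = s.shorten(long_url)
print(s.resolve(code, who="reader-1"))
s.online = False
try:
    s.resolve(code, who="reader-2")
except ConnectionError as e:
    print("unreachable:", e)
```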

Nonetheless, shortened links are still widely used, for the same reasons they were invented. Many URLs are very long and complicated. In print publications, they are visually overwhelming and unwieldy to copy into a web address bar; they are near-impossible to proofread in footnotes and citations. They’re even worse to read out on broadcast media. Shortened links solve all that. No longer germane is the 140-character limit Twitter had in its early years; because the URL counted toward that maximum, short was crucial. Since then, the limit has doubled to 280 characters, and every link now counts as a fixed 23 characters, however long the underlying URL is.

If you do online research of any kind you have probably long since internalized the routine of loading the linked content and saving the actual URL rather than the shortened version. This turns out to be one of the benefits of moving to Mastodon: the link you get is the link you see.

So to network neutrality. Logically, its equivalent for social media services ought to include the principle that users can post whatever content or links they choose (law and regulation permitting), whether that’s reposted TikTok videos, a list of my IDs on other systems, or a link to a blog advocating that all social media companies be forced to become public utilities. Most have in fact operated that way until now, infected just enough with the early Internet ethos of openness. Changing that unwritten social contract is very bad news, even though no one believed XFKAT’s CEO when he insisted he was a champion of free speech and called the site, now his, the “town square”.

If that’s what we want social media platforms to be, someone’s going to have to force them, especially if they begin shrinking and their owners start to feel the chill wind of an existential threat. You could even – though no one is, to the best of my knowledge – make the argument that swapping in a site-created shortened URL is a violation of the spirit of data protection legislation. After all, no one posts links on a social media site with the view that their tastes in content should be collected, analyzed, and used to target ads. Librarians have long been stalwarts in resisting pressure to disclose what their patrons read and access. In the move online in general, and to corporate social media in particular, we have utterly lost sight of the principle of the right to our own thoughts.

Illustrations: The New York City public library in 2006.


The data grab

It’s been a good week for those who like mocking flawed technology.

Numerous outlets have reported, for example, that “AI is getting dumber at math”. The source is a study conducted by researchers at Stanford and the University of California Berkeley comparing GPT-3.5’s and GPT-4’s output in March and June 2023. The researchers found that, among other things, GPT-4’s success rate at identifying prime numbers dropped from 84% to 51%. In other words, in June 2023 GPT-4 did little better than chance at identifying prime numbers. That’s psychic level.

The researchers blame “drift”, the problem that improving one part of a model may have unhelpful knock-on effects in other parts of the model. At Ars Technica, Benj Edwards is less sure, citing qualified critics who question the study’s methodology. It’s equally possible, he suggests, that as the novelty fades, people’s attempts to do real work surface problems that were there all along. With no access to the algorithm itself and limited knowledge of the training data, we can only conduct such studies by controlling inputs and observing the outputs, much like diagnosing allergies by giving a child a series of foods in turn and waiting to see which ones make them sick. Edwards advocates greater openness on the part of the companies, especially as software developers begin building products on top of their generative engines.
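Black-box testing of this kind can be sketched in a few lines. In this minimal example, a hypothetical `always_yes` function stands in for querying a model whose internals we can’t see; ground truth comes from exact trial division, and accuracy is the fraction of answers that match. The example also shows why the mix of test inputs matters: the same useless strategy scores 50% on a balanced set and 100% on a set containing only primes.

```python
from typing import Callable, Iterable


def is_prime(n: int) -> bool:
    """Ground truth by trial division: slow for huge n, but exact."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def accuracy(answer: Callable[[int], bool], numbers: Iterable[int]) -> float:
    """Black-box test: control the inputs, compare the outputs to ground truth."""
    numbers = list(numbers)
    correct = sum(answer(n) == is_prime(n) for n in numbers)
    return correct / len(numbers)


def always_yes(n: int) -> bool:
    """Hypothetical stand-in for a model query that answers 'prime' every time."""
    return True


balanced = [2, 3, 5, 7, 4, 6, 8, 9]
only_primes = [2, 3, 5, 7]
print(accuracy(always_yes, balanced))     # 0.5
print(accuracy(always_yes, only_primes))  # 1.0
```

This is not a description of how the Stanford/Berkeley study was run; it only shows that a single accuracy figure means little without knowing what was in the test set.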

In unrelated news, the New Zealand discount supermarket chain Pak’nSave offered an “AI” meal planner that, set loose, promptly began turning out recipes for “poison bread sandwiches”, “Oreo vegetable stir-fry”, and “aromatic water mix” – which turned out to be a recipe for highly dangerous chlorine gas.

The reason is human-computer interaction: humans, told to provide a list of available ingredients, predictably became creative. As for the computer…anyone who’s read Janelle Shane’s 2019 book, You Look Like a Thing and I Love You, or her Twitter reports on AI-generated recipes could predict this outcome. Computers have no real-world experience against which to judge their output!

Meanwhile, the San Francisco Chronicle reports, Waymo and Cruise driverless taxis are making trouble at an accelerating rate. The cars have gotten stuck in low-hanging wires after thunderstorms, driven through caution tape, blocked emergency vehicles and emergency responders, and behaved erratically enough to endanger cyclists, pedestrians, and other vehicles. If they were driven by humans they’d have lost their licenses by now.

In an interesting side note that is a reminder of the cars’ potential as a surveillance network, Axios reports that in a ten-day study in May, Waymo’s driverless cars found that human drivers in San Francisco speed 33% of the time. A similar exercise in Phoenix, Arizona, observed human drivers speeding 47% of the time on roads with a 35mph speed limit. These statistics of course bolster the company’s main argument for adoption: improving road safety.

The study should – but probably won’t – be taken as a warning of the potential for the cars’ data collection to become embedded in both law enforcement and their owners’ business models. The frenzy surrounding ChatGPT-* is fueling an industry-wide data grab as everyone tries to beef up their products with “AI” (see also previous such exercises with “meta”, “nano”, and “e”), consequences to be determined.

Among the newly-discovered data grabbers is Intel, whose graphics processing unit (GPU) drivers are collecting telemetry data, including how you use your computer, the kinds of websites you visit, and other data points. You can opt out, assuming you a) realize what’s happening and b) are paying attention at the right moment during installation.

Google announced recently that it would scrape everything people post online to use as training data. Again, an opt-out can be had if you have the knowledge and access to follow the 30-year-old robots.txt protocol. In practical terms, I can configure my own site, pelicancrossing.net, to block Google’s data grabber, but I can’t stop it from scraping comments I leave on other people’s blogs or anything I post on social media sites or that’s professionally published (though those sites may block Google themselves). This data repurposing feels like it ought to be illegal under data protection and copyright law.
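An opt-out of this kind is a few lines of plain text. The sketch below checks one with Python’s standard-library `urllib.robotparser`. It assumes the opt-out uses `Google-Extended`, the robots.txt token Google documents for controlling whether site content is used for its AI models, while leaving ordinary search crawling by `Googlebot` alone; which token any given crawler honors should be checked against its current documentation. Keep in mind that robots.txt is a request, not an enforcement mechanism, and, as noted above, it covers only your own site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt one token out entirely, allow everyone else.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse text directly; no network access

page = "https://pelicancrossing.net/"
for agent in ["Google-Extended", "Googlebot"]:
    print(agent, parser.can_fetch(agent, page))
# Google-Extended False
# Googlebot True
```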

Gizmodo reports that in Australia the company has asked the government to relax copyright laws to facilitate AI training.

Soon after Google’s announcement, the law firm Clarkson filed a class action lawsuit against Google to accompany its action against OpenAI. The suit accuses Google of “stealing” copyrighted works and personal data.

“Google does not own the Internet,” Clarkson wrote in its press release. Will you tell it, or shall I?

Whatever has been going on until now with data slurping in the interests of bombarding us with microtargeted ads is small stuff compared to the accelerating acquisition for the purpose of feeding AI models. Arguably, AI could be a public good in the long term as it improves, and therefore allowing these companies to access all available data for training is in the public interest. But if that’s true, then the *public* should own the models, not the companies. Why should we consent to the use of our data so they can sell it back to us and keep the proceeds for their shareholders?

It’s all yet another example of why we should pay attention to the harms that are clear and present, not the theoretical harm that someday AI will be general enough to pose an existential threat.

Illustrations: IBM Watson, Jeopardy champion.

Wendy M. Grossman is the 2013 winner of the Enigma Award and contributing editor for the Plutopia News Network podcast. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Mastodon.

Watching YouTube

One reason it’s so difficult to figure out what to do about misinformation, malinformation, and disinformation online is that it’s hard to pinpoint how online interaction translates into action in the real world. The worst content on social media has often come from traditional media or been posted by an elected politician.

At least, that’s how it seems to text-based people like me. This characteristic, along with the quick-hit compression of 140 (later 280) characters, was the (minority) appeal of Twitter. It’s also why legacy media pay so little attention to what’s going on in game worlds, struggle with TikTok, and underestimate the enormous influence of YouTube. The notable exception is the prolific Chris Stokel-Walker, who’s written books about both YouTube and TikTok.

Stokel-Walker has said he decided to write YouTubers because the media generally only notices YouTube when there’s a scandal. Touring those scandals occupies much of filmmaker Alex Winter’s now-showing biography of the service, The YouTube Effect.

The film begins by interviewing co-founder Steve Chen, who giggles a little uncomfortably as he admits that he and co-founders Chad Hurley and Jawed Karim thought it could be a video version of Hot or Not. In 2006, Google bought the year-old site for $1.65 billion in Google stock, to derision from financial commentators certain it had overpaid.

Winter’s selection of clips from early YouTube recalls early movies, which pulled people into theaters with little girls having a pillow fight. Winter moves on through pioneering stars like Smosh and K-Pop, 2010’s Arab Spring, the arrival of advertising and monetization, the rise of alt-right channels, Gamergate, the 2016 US presidential election, the Christchurch shooting, the horrors lurking in YouTube Kids, George Floyd, the multimillion-dollar phenomenon of Ryan Kaji, January 6, the 2020 Congressional hearings. Somewhere in the middle is the arrival of the Algorithm that eliminated spontaneous discovery in favor of guided user experience, and a brief explanation of the role of Section 230 of the Communications Decency Act in protecting platforms from liability for third-party content.

These stories are told by still images and video clips interlaced with interviews with talking heads like Caleb Cain, who was led into right-wing extremism and found his way back out; Andy Parker, father of Alison Parker, footage of whose murder he has been unable to get expunged; successful YouTuber (“ContraPoints”) Natalie Wynn; technology writer and video game developer Brianna Wu; Jillian C. York, author of Silicon Values; litigator Carrie Goldberg, who works to remediate online harms one lawsuit at a time; Anthony Padilla, co-founder of Smosh; and YouTube then-CEO Susan Wojcicki.

Not included among the interviewees: political commentators (though we see short clips of Alex Jones) or free speech fundamentalists. In addition, Winter sticks to user-generated content, ignoring the large percentage of YouTube’s library that consists of copies of professional media, much of it otherwise unavailable. Countries outside the US are mentioned only by York, who studies censorship around the world. Also missing is anyone from Google who could explain how YouTube fits into its overall business model.

The movie concludes by asking commentators to recommend changes. Parker wants families of murder victims to be awarded co-copyright and therefore standing to get footage of victims’ deaths removed. Hany Farid, a UC Berkeley professor who studies deepfakes, thinks it’s essential to change the business model from paying with data and engagement to paying with money – that is, subscriptions. Goldberg is afraid we will all become captives of Big Tech. A speaker whose name is illegible in my notes mentions antitrust law. Cain notes that there’s nothing humans have built that we can’t destroy. Wojcicki says only that technology offers “a tremendous opportunity to do good in the long-term”. York notes the dual-use nature of these technologies; their effects are both good and bad, so what you change “depends what you’re looking for”.

Cain gets the last word. “What are we speeding towards?” he asks, as the movie’s accelerating crescendo of images and clips stops on a baby’s face.

Unlike predecessors Coded Bias (2021) and The Great Hack (2019), The YouTube Effect is unclear about what it intends us to understand about YouTube’s impact on the world beyond the sheer size of audience a creator can assemble via the platform. The array of scandals, all of them familiar from mainstream headlines, makes a persuasive case that YouTube deserves Facebook- and Twitter-level scrutiny. What’s missing, however, is causality. In fact, the film is wrongly titled: there is no one YouTube effect. York had it right: “fixing” YouTube requires deciding what you’re trying to change. My own inclination is to force change to the business model. The algorithm distorts our interactions, but it’s driven by the business model.

Perhaps this was predictable. Seven years on, we still struggle to pinpoint exactly how social media affected the 2016 US presidential election or the UK’s EU referendum vote. Letting it ride is dangerous, but so is government regulation. Numerous governments are leaning toward the latter.

Even the experts assembled at last week’s Cambridge Disinformation Summit reached no consensus. Some saw disinformation as an existential threat; others argued that disinformation has always been with us and humanity finds a way to live through it. It wouldn’t be reasonable to expect one filmmaker to solve a conundrum that is vexing so many. And yet it’s still disappointing not to have found greater clarity.

Illustrations: YouTube CEO (2014-2023) Susan Wojcicki (via The YouTube Effect).


Review: Making a Metaverse That Matters

Making a Metaverse That Matters: From Snow Crash and Second Life to A Virtual World Worth Fighting For
By Wagner James Au
Publisher: Wiley
ISBN: 978-1-394-15581-1

A couple of years ago, when “the metaverse” was the hype-of-the-month, I kept wondering why people didn’t just join 20-year-old Second Life, or a game world. Even then the idea wasn’t new: the first graphical virtual world, Habitat, launched in 1988. And even *that* was preceded by text-based MUDs that, despite their limitations, afforded their users the chance to explore a virtual world and experiment with personal identity.

I never really took to Second Life. The initial steps – download the software, install it, choose a user name and password, and then an avatar – aren’t difficult. The trouble begins after that: what do I do now? Fly to an island, and then…what?

I *did*, once, have a commission to interview a technology company executive, who dressed his avatar in a suit and tie to give a lecture in a virtual auditorium and then joined me in the emptied auditorium to talk, having changed into jeans, T-shirt, and baseball cap.

In his new book, Making a Metaverse That Matters, the freelance journalist Wagner James Au argues that this sort of image consciousness derives from allowing humanoid avatars; they lead us to bring the constraints of our human societies into the virtual world, where instead we could free ourselves. Humanoid form leads people to observe the personal space common in their culture, apply existing prejudices, and so on. Au favors blocking markers such as gender and skin color that are the subject of prejudice offline. I’m not convinced this will make much difference; even on text-based systems with numbers instead of names, disguising your real-life physical characteristics takes work.

Au spent Second Life’s heyday as its embedded reporter; his news and cultural reports eventually became his 2008 book, The Making of Second Life: Notes from a New World. Part of his new book reassesses that work and reports regrets. He wishes he had been a stronger critic back then instead of being swayed by his own love for the service. Second Life’s biggest mistake, he thinks, was persistently refusing to call itself a game or add game features. The result was a dedicated user base that stubbornly failed to grow beyond about 600,000 as most people joined and reacted the way I did: what now? But some of those 600,000 benefited handsomely, as Au documents: some remade their lives, and a few continue to operate million-dollar businesses built inside the service.

Au returns repeatedly to Snow Crash author Neal Stephenson’s original conception of the metaverse, a single pervasive platform. The metaverse of Au’s dreams has community as its core value, is accessible to all, is a game (because non-game virtual worlds have generally failed), and is collaborative for creators. In other words, pretty much the opposite of anything Meta is likely to build.

The music and the myth

“Do you know anything about the festival, the culture, and movements of the times?” a teenaged friend asked as part of researching Woodstock for an essay.

The reality is that until a few years ago Woodstock existed in my head only as the movie.

At the time, I was 15. I knew it was happening; I recall hearing on my parents’ car radio that the festival, 100 miles north, was being declared a disaster area.

“Can we go?” I remember inexplicably asking. My parents were immediately dismissive. Smart: we’d just have spent hours pointlessly stuck in traffic.

And that was it, until 1971 or thereabouts, when I saw the movie as a college student. And *that* was it until 2009, when the late, great film critic Roger Ebert picked it for Ebertfest to celebrate the 40th anniversary rerelease, with director Michael Wadleigh present to explain the manual labor required to carve the movie out of 120 miles of footage and invent its split screens and other effects, using razor blades and tape. Today, it would all be done on computers in a fraction of the time.

My Ebertfest account reminds me that faced with the studio’s intention to cut out half his movie despite his contractual “final cut”, Wadleigh stole the film (see also Blake Edwards’ S.O.B. (1981)). He then got his agent to convince the studio that he would set fire to it and himself if they didn’t release it at the full length he intended. Thus was born America’s most successful documentary.

As you might expect, the artist I most remember is Joan Baez, who, surveying the 400,000-person throng while being told she’d close the night’s show, says, “Maybe there’ll be a few more people here by then. I don’t like a puny, little gathering like this.” Later, she brings the house down with just her voice on “Swing Low, Sweet Chariot” (timecode 0:48).

Soon after that Ebertfest I discovered I knew people who’d gone. The 40th anniversary landed it on the front page of the New York Times. When a friend’s high school-aged kids noticed it, he casually dropped the bomb: “I was there.” Yes, kids, your father was cool, once.

“Was it anything like the movie?” I asked.

“It was more boring. The movie was highlights. You have to remember, it rained for three days and there was nothing to eat.”

In 2018, another friend and I went to see an exhibit of art from Burning Man in Washington, DC.

The art was awesome, but my friend began fretting about the impact on the desert lands where it’s held (also, Fern’s departure point in the movie Nomadland). I explained that the crew spend a post-event month meticulously restoring the desert to pristine condition.

She seemed relieved. “I was at Woodstock.” Decades on, she remained conscious and ashamed of the damage to the surrounding area and the harm done to the local farmers. Most people, she thought, had forgotten that.

It is, however, documented in the movie. Wadleigh and his 16-camera team (which included a young Martin Scorsese) gave over 40% of the finished movie to interviews with organizers, audience members, and local residents. Over the days, the locals’ attitudes noticeably shift from welcoming to frustrated as the festival bursts its banks, gives up trying to charge admission, and runs out of supplies.

My friend was right, though; most people remember just the music and the myth. Few notable musicians missed it: Bob Dylan, despite living nearby at the time; Joni Mitchell, who later wrote a song about it; The Beatles; The Rolling Stones. Not in the movie were musicians booked to appear on substages, as I learned when one of my favorite folksingers, the late Ed Trickett, explained in interviews in 2016 and 2017 that he was meant to accompany Rosalie Sorrels on a rained-out sub-stage intended to feature folk music. His experience was certainly different from that of those starving in the mud: he was helicoptered in from the performer hotel.

But, as Arlo Guthrie said at the time, that’s not what I came here to talk about.

It is even clearer in retrospect how much Woodstock was shaped by protest against the Vietnam war. The organizers’ repeatedly expressed pride that 400,000 mostly young people could assemble for three days of “peace and music” is a direct oppositional response to the violence elsewhere. The hair lengths some local residents comment on were as much about visually rejecting the military as rebelling against the clean-cut, nicely-clothed corporate workers their middle class parents intended them to be. This is the serious underpinning that made Woodstock more than a music festival, and gave extra juice to that giant audience shouting “FUCK!” as Country Joe McDonald introduced the explicitly anti-war Fixin’-to-Die Rag. Ebertfest’s 2009 theaterful joined in lustily.

By 1995, when Ebert rereviewed the movie for its 25th anniversary (he revisited it yet again on its 35th anniversary in 2005) the movie’s creation of the festival’s mythic status had become clear. Without the movie, as Ebert said, the festival itself would be a mostly-forgotten “rock concert that produced some recordings”. No publicly celebrated 40th anniversary, and if my friend’s kids ever did hear their father had been there, they would have said, “So lame”.

The only way I can imagine a modern event of similar impact would be if it took place in Russia and the audience was filled with people opposing the war in Ukraine. They would need a characteristic I’m not sure exists in the world any more: a real belief that ending all war was possible.

Illustrations: Aerial shot of the festival from Woodstock.


Book review: Beyond Measure

Beyond Measure: The Hidden History of Measurement
Author: James Vincent
Publisher: Faber and Faber
ISBN: 978-0-571-35421-4

In 2022, then-government minister Jacob Rees-Mogg proposed that Britain should return to imperial measurements – pounds, ounces, yay Brexit! This was a ship that had long since sailed; 40-something friends learned only the metric system at school. Even those old enough to remember imperial measures had little nostalgia for them.

As James Vincent explains in Beyond Measure: The Hidden History of Measurement, and as most of us assume instinctively, measuring physical objects began with comparisons to pieces of the human body: feet, hands, cubits (elbow to fingertip), fathoms (the span of outstretched arms). Other forms of measurement were functional, such as the Irish collop, the amount of land needed to graze one cow. Such imprecise measurements had their benefits, such as convenient availability and immediately understandable context-based value.

Quickly, though, the desire to trade led to the need for consistency, which in turn fed the emergence of centralized state power. The growth of science increased the pressure for more and more consistent and precise measurements – Vincent spends a chapter on the surprisingly difficult quest to pin down a number we now learn as children: the temperature at which water boils. Perversely, though, each new generation of more precise measurement reveals new errors that require even more precise measurement to correct.

The history of measurement is also the history of power. Surveying the land enabled governments to decide its ownership; the world-changing discovery of statistics brought an understanding of social trends and, with it, greater power for governments, which could afford to amass the biggest avalanches of numbers.

Perhaps the quirkiest and most unexpected material is Vincent’s chapter on Standard Reference Materials. At the US National Institute of Standards and Technology (NIST), Vincent finds carefully studied jars of peanut butter and powdered radioactive human lung. These, it turns out, provide standards against which manufacturers can check their products.

Often, Vincent observes, changes in measurement systems accompany moments of social disruption. The metric system, for example, was born in France at the time of the revolution. Defining units of measurement in terms of official weights and measures made standards egalitarian rather than dependent on one man’s body parts. By 2018, however, when Vincent visits the official kilogram weight and meter stick in Paris, even that had come to seem too elite. Today, both kilogram and meter are defined in terms of constants of nature. The meter, for example, is defined as the distance light travels in a vacuum in 1/299,792,458th of a second (the second itself is now defined by the frequency of the radiation emitted by a specific transition in caesium-133 atoms). These are units that anyone with appropriate equipment can derive at any time without needing to check them against a single stick in a vault. Still elite, but a much larger elite.
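Here is a worked sketch of how that chain of definitions fits together. It uses the SI's fixed values for the speed of light, c = 299,792,458 m/s, and the caesium-133 hyperfine transition frequency, Δν_Cs = 9,192,631,770 Hz. The result shows the meter resting on two constants of nature rather than on a physical object:

```latex
\begin{aligned}
1\,\mathrm{s} &= \frac{9\,192\,631\,770}{\Delta\nu_{\mathrm{Cs}}} \\[4pt]
1\,\mathrm{m} &= c \times \frac{1\,\mathrm{s}}{299\,792\,458}
              = \frac{9\,192\,631\,770}{299\,792\,458}\,\frac{c}{\Delta\nu_{\mathrm{Cs}}}
              \approx 30.663\,\frac{c}{\Delta\nu_{\mathrm{Cs}}}
\end{aligned}
```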

But still French, which may form part of Rees-Mogg’s objection to it. And, possibly, as Vincent finds some US Republicans have complained, *communist* because of its global adoption. Nonetheless, and despite anti-metric sentiments expressed even by futurists like Stewart Brand, the US is still more metric than most people think. The road system’s miles and retail stores’ pounds and ounces are mostly a veneer; underneath, industry and science have voted for global compatibility – and the federal government has, since 1893, defined feet and inches by metric units.
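The 1893 arrangement, the Mendenhall Order, fixed the US yard as exactly 3600/3937 of a meter, which made feet and inches exact fractions of the metric unit. The 1959 international yard and pound agreement later set the yard at exactly 0.9144 m. The short sketch below uses exact fractions to compare the two definitions. The constant and function names are illustrative only:

```python
from fractions import Fraction

# Mendenhall Order (1893): the US yard is exactly 3600/3937 meter.
YARD_1893_M = Fraction(3600, 3937)
FOOT_1893_M = YARD_1893_M / 3    # 1200/3937 m
INCH_1893_M = FOOT_1893_M / 12   # 100/3937 m

# International yard and pound agreement (1959): 1 yard = 0.9144 m exactly.
YARD_1959_M = Fraction(9144, 10000)
FOOT_1959_M = YARD_1959_M / 3    # 0.3048 m
INCH_1959_M = FOOT_1959_M / 12   # 0.0254 m


def feet_to_meters(feet: int, foot_in_meters: Fraction) -> Fraction:
    """Convert a length in feet to meters exactly, for a given foot definition."""
    return feet * foot_in_meters


if __name__ == "__main__":
    print(float(INCH_1893_M), float(INCH_1959_M))
    # Over a mile (5,280 feet) the two definitions differ by about 3.2 mm.
    gap = feet_to_meters(5280, FOOT_1893_M) - feet_to_meters(5280, FOOT_1959_M)
    print(f"{float(gap) * 1000:.2f} mm")
```

Either way the foot is a metric quantity in disguise; only the fraction changed.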