Ex libris

So as previously discussed here three years ago and two years ago, on March 24 the US District Court for the Southern District of New York found that the Internet Archive’s controlled digital lending violates copyright law. Half of my social media feed on this subject filled immediately with people warning that publishers want to kill libraries and that this judgment is a dangerous step toward limiting access to information; the other half is going, “They’re stealing from authors. Copyright!” Both of these things can be true. And incomplete.

To recap: in 2006 the Internet Archive set up the Open Library to offer access to digitized books under “controlled digital lending”. The system allows each book to be “out” on “loan” to only one person at a time, with waiting lists for popular titles. In a white paper, lawyers David R. Hansen and Kyle K. Courtney call this “format shifting” and say that because the system replicates library lending it is fair use. Also germane: the Archive points to a 2007 California decision that it is in fact a library. Other countries may beg to differ.
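The mechanics of controlled digital lending are simple to picture. A minimal sketch, in Python (the class and method names here are mine for illustration, not anything the Open Library actually uses):

```python
from collections import deque

class ControlledDigitalLending:
    """Toy model of one-copy-per-title lending with waiting lists."""

    def __init__(self):
        self.on_loan = {}   # title -> borrower currently holding the copy
        self.waitlist = {}  # title -> queue of borrowers waiting their turn

    def borrow(self, title, borrower):
        # Only one copy may be "out" at a time; everyone else queues.
        if title not in self.on_loan:
            self.on_loan[title] = borrower
            return "lent"
        self.waitlist.setdefault(title, deque()).append(borrower)
        return "waitlisted"

    def return_book(self, title):
        # On return, the copy passes to the next person in line, if any.
        queue = self.waitlist.get(title)
        if queue:
            self.on_loan[title] = queue.popleft()
        else:
            del self.on_loan[title]

lib = ControlledDigitalLending()
lib.borrow("Moby-Dick", "alice")   # "lent"
lib.borrow("Moby-Dick", "bob")     # "waitlisted"
lib.return_book("Moby-Dick")       # copy passes to bob
```

The National Emergency Library, described below, in effect deleted the one-copy check and the waiting list entirely: any borrower, any title, any time.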

When public libraries closed at the beginning of the covid-19 pandemic, the Internet Archive announced the National Emergency Library, which suspended the one-copy-at-a-time rule and scrubbed the waiting lists so anyone could borrow any book at any time. The resulting publicity was the first time many people had heard of the Open Library, although authors had already complained. Hachette Book Group, Penguin Random House, HarperCollins, and Wiley filed suit. Shortly afterwards, the Archive shut down the National Emergency Library. The Open Library continues, and the Archive will appeal the judge’s ruling.

On the they’re-killing-the-libraries side: Mike Masnick and Fight for the Future. At Walled Culture, Glyn Moody argues that sharing ebooks helps sell paid copies. Many authors agree with the publishers that their living is at risk; a group of exceptions, including Neil Gaiman, Naomi Klein, and Cory Doctorow, has published an open letter defending the Archive.

At Vice, Claire Woodstock lays out some of the economics of library ebook licenses, which eat up budgets but leave libraries vulnerable and empty-shelved when a service is withdrawn. She also notes that the Internet Archive digitizes physical copies it buys or receives as donations, and does not pay for ebook licenses.

Brief digression back to 1996, when Pamela Samuelson warned in Wired of the coming copyright battles. Many of the key points in her piece have since been enshrined in law, such as the ban on circumventing copy protection; others, such as requiring Internet Service Providers to prevent users from uploading copyrighted material, remain in play today. Number three on her copyright maximalists’ wish list: eliminating first-sale rights for digitally transmitted documents. First sale is the doctrine that enables libraries to lend books.

It is therefore entirely plausible that commercial publishers believe every library loan is a missed sale. Outside the US, many countries have a public lending right that pays authors royalties on loans for exactly that reason. The Internet Archive doesn’t pay those, either.

It surely isn’t facing the headwinds public libraries are. In the UK, years of austerity have shrunk library budgets and therefore their numbers and opening hours. In the US, libraries are fighting against book bans; in Missouri, the Republican-controlled legislature voted to defund the state’s libraries entirely, apparently in retaliation.

At her blog, librarian and consultant Karen Coyle, who has thought for decades about the future of libraries, takes three postings to consider the case. First, she offers a backgrounder, agreeing that the Archive’s losing on appeal could bring consequences for other libraries’ digital lending. In the second, she teases out the differences between academic/research libraries and public libraries and between research and reading. While journals and research materials are generally available in electronic format, centuries of books are not, and scanned books (like those the Archive offers) are a poor reading experience compared to modern publisher-created ebooks. These distinctions are crucial to her third posting, which traces the origins of controlled digital lending.

As initially conceived by Michelle M. Wu in a 2011 paper for Law Library Journal, controlled digital lending was a suggestion that law libraries could, either singly or in groups, buy a hard copy for their holdings and then circulate a digitized copy, similar to an Inter-Library Loan. Law libraries serve limited communities, and their comparatively modest holdings have a known but limited market.

By contrast, the Archive gives global access to millions of books it has scanned. In court, it argued that the availability of popular commercial books on its site has not harmed publishers’ revenues. The judge disagreed: the “alleged benefits” of access could not outweigh the market harm to the four publishers who brought the suit. This view entirely devalues the societal role libraries play, and Coyle, like many others, is dismayed that the judge saw the case purely in terms of its effect on the commercial market.

The question I’m left with is this: is the Open Library a library or a disruptor? If these were businesses, it would obviously be the latter: it avoids many of the costs of local competitors, and asks forgiveness not permission. As things are, it seems to be both: it’s a library for users, but a disruptor to some publishers, some authors, and potentially the world’s libraries. The judge’s ruling captures none of this nuance.

Illustrations: 19th century rendering of the Great Library of Alexandria (via Wikimedia).

Wendy M. Grossman is the 2013 winner of the Enigma Award. Her Web site has an extensive archive of her books, articles, and music, and an archive of earlier columns in this series. Follow on Twitter.

Memex 2.0

As language models get cheaper, it’s dawned on me what kind of “AI” I’d like to have: a fully personalized chatbot trained on my 30-plus years of output plus all the material I’ve read, watched, listened to, and taken notes on over all those years. A clone of my brain, basically, with more complete and accurate memory, updated alongside my own. Then I could discuss with it: what’s interesting to write about for this week’s net.wars?

I was thinking of what’s happened with voice synthesis. In 2011, it took the Scottish company Cereproc months to build a text-to-speech synthesizer from recordings of Roger Ebert’s voice. Today, voice synthesizers are all over the place – not personalized like Ebert’s, but able to read a set text plausibly enough to scare voice actors.

I was also thinking of the Stochastic Parrots paper, whose first anniversary was celebrated last week by authors Emily Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell. An important part of the paper advocates for smaller, better-curated language models: more is not always better. I can’t find a stream for the event, but here’s the reading list collected during the proceedings. There’s lots I’d rather eliminate from my personal assistant. Eliminating unwanted options upfront has long been a widespread Internet failure, from shopping sites (“never show me pet items”) to news sites (“never show me fashion trends”). But that sort of selective display is more difficult and expensive than including everything and offering only inclusion filters.

A computational linguistics expert tells me that we’re an unknown amount of time away from my dream of the wg-bot. Probably, if such a thing becomes possible, it will be based on someone else’s large language model and fine-tuned with my stuff. I’m not sure I entirely like this idea; it means the model will be trained on material I haven’t chosen or vetted and whose source is unknown, unless we get a grip on forcing disclosure or the proposed BLOOM academic open source language model takes over the world.

I want to say that one advantage to training a chatbot on your own output is you don’t have to worry so much about copyright. However, the reality is that most working writers have sold all rights to most of their work to large publishers, which means that such a system is a new version of digital cholera. In my own case, by the time I’d been in this business for 15 years, more than half of the publications I’d written for were defunct. I was lucky enough to retain at least non-exclusive rights to my most interesting work, but after so many closures and sales I couldn’t begin to guess – or even know how to find out – who owns the rights to the rest of it. The question is moot in any case: unless I choose to put those group reviews of Lotus 1-2-3 books back online, probably no one else will, and if I do no one will care.

On Mastodon, Edwards raised the specter of the upcoming new! improved! version of the copyright wars launched by the arrival of the Internet: “The real generative AI copyright wars aren’t going to be these tiny skirmishes over artists and Stability AI. It’s going to be a war that puts filesharing 2.0 and the link tax rolled into one in the shade.” Edwards is referring to this case, in which artists are demanding billions from the company behind the Stable Diffusion engine.

Edwards went on to cite a Wall Street Journal piece that discusses publishers’ alarmed response to what they perceive as new threats to their business. First: that the large piles of data used to train generative “AI” models are appropriated without compensation. This is the steroid-fueled analogue to the link tax, under which search engines in Australia pay newspapers (primarily the Murdoch press) for including them in news search results. A similar proposal is pending in Canada.

The second is that users, satisfied with the answers they receive from these souped-up search services, will no longer bother to visit the sources – especially since few of them, most notably Google, seem inclined to offer citations to back up any of the things they say.

The third is outright plagiarism, without credit, in the chatbots’ output, which is already happening.

The fourth point of contention is whether the results of generative AI should themselves be subject to copyright. So far, the consensus appears to be no, at least when it comes to artwork. But some publishers who have begun using generative chatbots to create “content” will no doubt claim copyright in the results. It might make more sense to copyright the *prompt*. (And some bright corporate non-soul may yet try.)

At Walled Culture, Glyn Moody discovers that the EU has unexpectedly done something right by requiring positive opt-in to copyright protection against text and data mining. I’d like to see this as a ray of hope for avoiding the worst copyright conflicts, but given the transatlantic rhetoric around privacy laws and data flows, it seems much more likely to incite another trade conflict.

It now dawns on me that the system I outlined in the first paragraph is in fact Vannevar Bush’s Memex. Not the web, which was never sufficiently curated, but this, primed full of personal intellectual history. The “AI” represents those thousands of curating secretaries he thought the future would hold. As if.

Illustrations: Stable Diffusion rendering of “stochastic parrots”, as prompted by Jon Crowcroft.
