Resident of the world, traveling the road of life

Pluralistic: Gemini is better than search because Google enshittified search (29 Jun 2026)



Today's links

  • Gemini is better than search because Google enshittified search: We're All Trying To Find The Guy Who Did This.
  • Hey look at this: Delights to delectate.
  • Object permanence: Microsoft antitrust overturned; Scammer carves C64; RIP Jim Baen; GOP rep to constituent's child: "drop dead" (literally); CCTVs jacked for botnet; Olympic profitability lie; Human factors in health infosec; Exfiltration via computer fans; Congress's summer schedule: 9 working days; Antitrust is political antigrav; Ted Chiang's 72 Letters; Microsoft antitrust appeal; Vinge on privacy; Breaking open the web; Bernie on Brexit; "The Perdition Score"; Intuit v Child Tax Credit.
  • Upcoming appearances: London, Edinburgh, Sydney, Melbourne, Brighton, London, South Bend.
  • Recent appearances: Where I've been.
  • Latest books: You keep readin' em, I'll keep writin' 'em.
  • Upcoming books: Like I said, I'll keep writin' 'em.
  • Colophon: All the rest.



The original Google homepage, loaded in the first Netscape browser. It is viewed under a giant magnifying glass. Inside the magnifying glass, we see a killer robot (with the head of the Android droid), choking a man to death.

Gemini is better than search because Google enshittified search (permalink)

Write a critical AI book, and you become everyone's confessor for their AI sins. People in my life keep telling me about their guilty AI pleasures, in search of an explanation, absolution or condemnation:

https://us.macmillan.com/books/9780374621568/thereversecentaursguidetolifeafterai/

Their most common confession: "I only ever use Google's AI-generated search summaries these days. I no longer click those blue links beneath it, not even to verify the summary." People know that the summaries are full of "hallucinations" (that is, "defects" or "errors") but the summaries are right often enough that many people have come to rely on them, to the exclusion of actual websites, made by actual people, on the actual internet.

Everyone knows this isn't good. The reason there's a web for Google's Gemini AI to summarize is that Google – the thrice-convicted monopoly search company with a 90% market share – directs people to websites, and when you visit a website, you generate revenue for the site, which pays for its maintenance. Most commonly, you generate an "ad impression," but you might also buy a subscription, or generate an "affiliate fee" by purchasing a recommended product.

When Google strips all this away by harvesting an "answer" and displaying it at the top of the page, the bargain between Google and the open web breaks down. Google is extracting 100% of the value from the websites it summarizes, and giving nothing back in return.

This is a marked reversal from Google's founding ethos. In the old days, Google measured its success by how little time you spent on its site. The ideal Google outcome was for you to visit its page (or even better, just a search-box in your browser), type a few words, and get "ten blue links" back, the top one of which was the correct link to locate the information or resource you were seeking. The point of Google was to serve as a conduit, a trusted intermediary that neutrally adjudicated the relevance of every web page for every web user from moment to moment.

Everyone dunks on Google for its high-minded motto, "Don't be evil," but over the years, the company's mission was far more important: "Organize the world's information and make it universally accessible and useful." That was the pole star that googlers followed for the first couple decades of the company's history…until, that is, the company saturated its market and its growth stalled out.

That was when Google started to panic over its plateauing search revenue, this being an inescapable consequence of 90%+ market-share. The ensuing power struggle pitted googlers who were committed to technical excellence against the company's most ardent enshittifiers, who pointed out that by making search worse, they could increase revenues. After all, if you need to search two or three times to get the answers to your questions, that means the company can show you two or three times as many ads:

https://pluralistic.net/2024/04/24/naming-names/#prabhakar-raghavan

Where once Google measured its success by how quickly it could send you away from its site and out into the open internet, today's Google is a sticky-trap full of ways to keep you inside its walled garden.

A decade ago, tech had three major approaches:

I. Google's: let you do anything you want, but spy on you while you do it;

II. Apple's: strictly control what you can do, but leave you alone to do it in private; and

III. Facebook's: control everything you do, spy on you from asshole to appetite.

Today, tech is undergoing a form of carcinization, in which every company is turning into a Facebook-crab: maximally surveillant and maximally controlling.

Apple has added surveillance to its walled garden:

https://pluralistic.net/2022/11/14/luxury-surveillance/#liar-liar

While Google has turned its free-range, internet-wide surveillance system into a walled garden that tries to keep you away from the open internet as much as possible.

Now, in Google's defense, the "open internet" kind of sucks these days. Any piece of useful information you seek out on the open internet is liable to be buried under half a dozen pop-ups, pop-unders, and dickovers:

https://daringfireball.net/2026/05/what_is_a_dickover

Even after you clear these away, the actual information you're seeking is further buried in word-salads that anticipated insipid AI prose by half a decade. Think of all those omelet recipes that appear beneath 2,500 words of cod-Proustian remembrances of "the first time I ate an egg."

The major advantage of AI search summaries is in shielding you from all this nonsense. But where did all that nonsense come from in the first place?

It turns out that this is largely Google's fault.

Google and Facebook monopolized the display advertising market, entering into an illegal, collusive arrangement to rig the bidding so that advertisers paid more and publishers received less:

https://en.wikipedia.org/wiki/Jedi_Blue

The Google/Meta duopoly sucks up 51% of display advertising revenue – more than triple the historic take for advertising intermediaries (buyers, brokers, agencies, etc). As ad revenues for web publishers cratered, the "ad load" on web pages went up. This set up a vicious cycle: increasing the number of ads decreases the number of readers, driving publishers to increase the ad-load even more to make up for the losses.

The major brake on this is ad-blocking. In a world with ad-blockers in it, publishers contemplating an increase in ad-load have to confront the possibility that they will induce ad-overload in their readers, who will install a blocker that stops them from seeing any ads:

https://www.eff.org/deeplinks/2019/07/adblocking-how-about-nah

Google has been looking to kill ad-blocking for a decade, and now they're on the verge of making it happen in Chrome, the dominant web browser they use to reinforce their search monopoly:

https://protonprivacy.substack.com/p/google-is-finally-killing-ublock

Google long ago did away with ad-blocking on mobile devices (reverse engineering an app is a felony, which means an app is just a web-page skinned with the right kind of IP to make it a crime to protect your privacy while you use it). Part of Google's argument for killing ad-blocking for the web is that this puts the web on an even footing with apps – which is a very weird way to describe a race to the absolute bottom:

https://pluralistic.net/2026/06/12/compelled-speech/#quishing

To top it all off, this decade has seen Google make a series of changes to its search prioritization that favored low-value shovelware sites over carefully researched, reliable alternatives. Search for product reviews and you're apt to get a "site reputation abuse" result from a once-reliable outlet like Forbes filled with useless and even dangerous reviews, which are ranked far above independently maintained, rigorous competitors:

https://pluralistic.net/2024/05/03/keyword-swarming/#site-reputation-abuse

This has only gotten worse with AI search, which preferentially draws from spam sites to produce decontextualized, highly confident recommendations for substandard, overpriced junk, at the expense of recommendations for good products:

https://pluralistic.net/2025/07/15/inhuman-gigapede/#coprophagic-ai

It's not like Google doesn't have the ability to sort the good from the bad. Kagi.com is a $10/month paid search engine whose results are vastly superior to Google's. But Kagi doesn't have its own search index: instead, they rent access to Google's index, but apply their own (much smaller and less resourced) team's algorithm to rank the results for your queries. In other words, Google could deliver good search results, they just choose not to:

https://pluralistic.net/2024/04/04/teach-me-how-to-shruggie/#kagi

Gresham's Law holds that "bad money drives out good." It refers to a counterfeit coin crisis in Tudor England, where people preferentially spent counterfeit money in order to make it someone else's problem; meanwhile, everyone hoarded their good coins. Soon, virtually all the money in circulation was bogus.

By downranking quality material in favor of low-effort spam, Google set up a web-wide version of Gresham's Law, where bad webpages drive out good ones. Since so many of those webpages contain product recommendations, they're greshaming the world of real products as well, so the bad is driving out the good there, too.

This is the problem that Gemini search summaries solve: in its role as the web's most important gatekeeper, Google remade the web as an ad-festooned cesspit of garbage text and cynical shovelware sites. Now Google proposes to wipe out the publishers whose content they stripmined by breaking the web's bargain: that search engines are symbiotic with publishers. Google has turned fully parasitic, sucking the last drops of juice out of the open web before discarding its husk.


Hey look at this (permalink)



A shelf of leatherbound history books with a gilt-stamped series title, 'The World's Famous Events.'

Object permanence (permalink)

#25yrsago Appeals court strikes down Microsoft antitrust ruling https://www.nytimes.com/2001/06/28/business/us-appeals-court-overturns-microsoft-antitrust-ruling.html

#25yrsago Ted Chiang's 72 Letters https://web.archive.org/web/20010720192340/http://www.tor.com/72ltrs.html

#25yrsago Concept handheld devices https://web.archive.org/web/20010620115437/https://www.infosync.no/en/news/n/419.asp

#25yrsago Analyzing Microsoft's successful antitrust appeal https://web.archive.org/web/20010703085656/https://www.salon.com/tech/feature/2001/06/28/appeals_reaction/index.html

#20yrsago Bengali science fiction of the 1880s https://www.lehigh.edu/~amsp/2006/05/early-bengali-science-fiction.html

#20yrsago Vernor Vinge on computers, freedom and privacy https://www.theguardian.com/technology/2006/jun/29/guardianweeklytechnologysection5

#20yrsago Scammer convinced to carve replica Commodore 64 https://www.419eater.com/html/john_boko.php

#20yrsago Jim Baen, sf publisher, has passed away https://web.archive.org/web/20060703024337/http://david-drake.com/baen.html

#15yrsago YouTube listens to fraudulent NyanCat takedown notice, drags heels on put-back from creator https://web.archive.org/web/20110628132607/http://www.prguitarman.com/index.php?id=369

#15yrsago Wyoming’s corporation mills manufacture privileged artificial “people” to order https://www.reuters.com/article/2011/06/28/us-usa-shell-companies-idUSTRE75R20Z20110628/

#15yrsago Publishing in the Internet era: connecting audiences and works https://www.theguardian.com/technology/2011/jun/30/publishers-internet-changing-role?utm_source=twitterfeed&utm_medium=twitter

#15yrsago Why writers should have their own domains https://whatever.scalzi.com/2011/06/29/mastering-ones-own-domain-an-no-this-is-not-a-seinfeld-reference/

#15yrsago Copyright troll’s biggest fan commits terminal irony https://www.eff.org/deeplinks/2011/06/righthaven-cheerleader-wanted-irony-police

#10yrsago Mississippi state rep tells distraught mom to buy kid’s lifesaving meds ‘with money she earns’ https://www.sunherald.com/news/local/counties/jackson-county/article86416087.html

#10yrsago Always-on CCTVs with no effective security harnessed into massive, unstoppable botnet https://arstechnica.com/information-technology/2016/06/large-botnet-of-cctv-devices-knock-the-snot-out-of-jewelry-website/

#10yrsago Gun-waving cop who attacked black teenaged girl in her bathing suit faces no charges https://web.archive.org/web/20160624103549/http://dfw.cbslocal.com/2016/06/23/grand-jury-no-bills-former-mckinney-pool-party-cop/

#10yrsago The Olympics are profitable for every host city (that lies about the numbers) https://timharford.com/2016/06/how-do-you-make-the-olympics-pay-fudge-the-figures/

#10yrsago Healthcare workers prioritize helping people over information security (disaster ensues) https://www.cs.dartmouth.edu/~sws/pubs/ksbk15-draft.pdf

#10yrsago Fansmitter: malware that exfiltrates data from airgapped computers by varying the sound of their fans https://www.youtube.com/watch?v=3GCHCVpndaM

#10yrsago Labour’s knives come out for Corbyn, but he’s guaranteed a spot on the ballot https://www.politico.eu/article/inside-account-of-labour-mps-attacks-on-jeremy-corbyn-shadow-cabinet-resignations-brexit/

#10yrsago Hope Larson’s “Compass South”: swashbuckling YA graphic novel https://memex.craphound.com/2016/06/28/hope-larsons-compass-south-swashbuckling-ya-graphic-novel/

#10yrsago How to Break Open the Web: a report on the first Decentralized Web Summit https://www.fastcompany.com/3061357/the-web-decentralized-distributed-open

#10yrsago Californians will get to vote on legal recreational weed https://web.archive.org/web/20160629130245/http://abcnews.go.com/US/wireStory/voters-decide-legalize-recreational-marijuana-40206739

#10yrsago Bernie Sanders on Brexit: urgent lessons for the Democrats https://www.nytimes.com/2016/06/29/opinion/campaign-stops/bernie-sanders-democrats-need-to-wake-up.html

#10yrsago Electoral fraud: Trump sends fundraiser emails to foreign politicians https://www.cnet.com/culture/trump-spams-foreign-politicians-with-fundraising-emails/#ftag=CAD590a51e

#10yrsago The Perdition Score: Sandman Slim vs the One Percent https://memex.craphound.com/2016/06/29/the-perdition-score-sandman-slim-vs-the-one-percent/

#5yrsago Intuit sabotages the Child Tax Credit https://pluralistic.net/2021/06/29/three-times-is-enemy-action/#ctc

#5yrsago SCOTUS to wrongfully accused terrorists: "drop dead" https://pluralistic.net/2021/06/29/three-times-is-enemy-action/#transunion

#5yrsago Lazy Congress only schedules 9 days' work this summer https://pluralistic.net/2021/06/28/dubious-quant-residue/#back-to-work-you

#1yrago Antitrust defies politics' law of gravity https://pluralistic.net/2025/06/28/mamdani/#trustbusting


Upcoming appearances (permalink)

A photo of me onstage, giving a speech, pounding the podium.



A screenshot of me at my desk, doing a livecast.

Recent appearances (permalink)



A grid of my books with Will Stahle covers.

Latest books (permalink)



A cardboard book box with the Macmillan logo.

Upcoming books (permalink)

  • "The Post-American Internet," a geopolitical sequel of sorts to Enshittification, Farrar, Straus and Giroux, 2027

  • "Unauthorized Bread": a middle-grades graphic novel adapted from my novella about refugees, toasters and DRM, FirstSecond, April 20, 2027

  • "Enshittification, Why Everything Suddenly Got Worse and What to Do About It" (the graphic novel), FirstSecond, 2027

  • "The Memex Method," Farrar, Straus and Giroux, 2027



Colophon (permalink)

Today's top sources:

Currently writing: "The Post-American Internet," a sequel to "Enshittification," about the better world the rest of us get to have now that Trump has torched America. Fourth draft completed. Submitted to editor.

  • A Little Brother short story about DIY insulin PLANNING

This work – excluding any serialized fiction – is licensed under a Creative Commons Attribution 4.0 license. That means you can use it any way you like, including commercially, provided that you attribute it to me, Cory Doctorow, and include a link to pluralistic.net.

https://creativecommons.org/licenses/by/4.0/

Quotations and images are not included in this license; they are included either under a limitation or exception to copyright, or on the basis of a separate license. Please exercise caution.


How to get Pluralistic:

Blog (no ads, tracking, or data-collection):

Pluralistic.net

Newsletter (no ads, tracking, or data-collection):

https://pluralistic.net/plura-list

Mastodon (no ads, tracking, or data-collection):

https://mamot.fr/@pluralistic

Bluesky (no ads, possible tracking and data-collection):

https://bsky.app/profile/doctorow.pluralistic.net

Medium (no ads, paywalled):

https://doctorow.medium.com/

Tumblr (mass-scale, unrestricted, third-party surveillance and advertising):

https://mostlysignssomeportents.tumblr.com/tagged/pluralistic

"When life gives you SARS, you make sarsaparilla" -Joey "Accordion Guy" DeVilla

READ CAREFULLY: By reading this, you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

ISSN: 3066-764X

Read the whole story
mkalus
8 hours ago
reply
iPhone: 49.287476,-123.142136
Share this story
Delete

Making neon signs by hand


Most illuminated signs today are made with flexible LED tubes, which are considerably cheaper than neon tubes. But there are still a few people who make neon signs by hand. One of them is this gentleman, who according to Factory Monster is Korea's last neon master.


(Direct link, via The Awesomer)


Behind the Track “Loser” by Beck


A little bit of music history, in which Beck talks about his breakthrough single “Loser”.

The enigmatic artist looks back on his beginnings in the underground scene and describes how these formative experiences led to significant breakthroughs in his career and his creative work.


(Direct link)


SoftBank’s untitled AI goose game — eggs do not lay eggs


Masayoshi Son is the founder and CEO of SoftBank, the giant Japanese private equity fund that’s the main backer and cheerleader of OpenAI.

SoftBank was also the main backer of WeWork, which was almost as stupid as OpenAI. WeWork’s 2019 stock market offering failed spectacularly and SoftBank’s investment went down 90%. WeWork still exists in 2026, but it’s a lot smaller.

As well as the spectacular messes, SoftBank has a pile of quietly successful portfolio companies — like ARM, which designs the chips in all the phones, and makes a bundle. These pay for the messes.

Son has a long track record of wild slides for investor events. SoftBank’s annual general meeting was on Wednesday. The internet saw the slides from this thing and went wild for the goose that lays the golden eggs. [SoftBank, PDF, archive]

There are a pile of slides about artificial super-intelligence. And Physical Artificial Superintelligence, which means robots. Son talks up ARM Holdings a lot; SoftBank is very proud of ARM.

On page 45 of the PDF, the goose shows up.

Sixteen years ago, SoftBank had only three golden eggs. That’s a market cap of 3 trillion yen. The “market only saw the eggs.” It ignored the goose.

But “eggs do not lay eggs.” It’s the goose! Value us!

There’s a golden egg factory inside the goose! If you look at the slide on page 56 of the PDF, you’ll see the goose is a robot with an “internal mechanism.” Also, the goose is labeled “ASI” — that’s “artificial superintelligence”. The golden eggs are “ASI” too.

Son has long been fond of his golden goose. SoftBank’s 2014 earnings slides start with “SoftBank equals Goose”. Later on, Son includes the whole story of “The Goose that Laid the Golden Eggs”. [SoftBank, 2014, PDF, archive]

Wednesday’s presentation for 2026 mostly hammers on Son’s love of AI. Son said: [Reuters]

I think it’s blasphemy against AI if you say it’s a bubble.

SoftBank insiders said last month they were getting worried how deep in OpenAI SoftBank is. [Bloomberg, archive]

News just came out that OpenAI might put off its IPO until 2027 — and SoftBank’s share price dropped 14%. [MarketWatch]

Son is most annoyed that the market cap of SoftBank stock is half its claimed asset value. “The goose was not valued.” But the market isn’t just valuing the goose at zero — it’s giving it a huge penalty.

Of course, SoftBank is not the goose that lays the golden eggs. ARM Holdings is the goose. SoftBank’s profitable companies are the geese. They’re what produces the golden eggs.

SoftBank is the goose’s greedy owner. SoftBank takes the money from its successful companies like ARM and sets the money on fire at WeWork and now OpenAI.

Son is making out he is the AI-powered golden goose. Praise him! Do not question him!

But that’s not what’s happening here. SoftBank is not a golden goose, it’s not a factory, and it’s not a superintelligence.

Also, “The Goose that Laid the Golden Eggs” is a cautionary fable about greed being bad.


Office workers are spending way too much on AI too


The AI push runs on setting venture capital money on fire and charging chatbot users way less than the bot costs to serve.

The idea is that the customers will get hooked on the business value. Then the AI vendors can gouge them. And the vendors can try not to haemorrhage quite so much cash.

So Microsoft, Anthropic, and OpenAI have all been moving their software customers off monthly subscriptions to token-based billing. Tokens have the added enterprise pricing advantage that the cost of anything is completely obscure.

But the customers are not so hooked they can just swallow their GitHub Copilot bill multiplying by a hundred.

One source told Axios that one of their clients had spent $500 million in a month on Claude Code. We don’t know who it was — but the best guesses are Amazon or Uber, both of which are known to have run up stupendous bills. [Axios]

It’s not just the coders — it’s the ordinary office workers! They’re being told to AI up everything and give it that veneer of slop. That’s professional now.

Walmart said in early June that it was rationing its internal AI tool “Code Puppy”, which did office workslop as well as code. Code Puppy used to be unlimited. Now it’s not. [Bloomberg]

Accenture has been having a rollercoaster ride. In February, Accenture was telling staff that getting a promotion would depend on hitting the chatbot. And if you didn’t, you were fired.

This was really an excuse for layoffs — Accenture’s consulting business is badly down and its stock price has cratered over the past year.

But Accenture employees responded to the incentive — and now the bill’s coming due. 404 Media got a leak of a meeting at Accenture: [404, archive]

“We’re seeing from some of the data internally at least that it’s actually not our engineers that are driving the token consumption. It’s a lot of the non-engineers that are doing some of those behaviors […] you were talking about.”

… Stuart Henderson … jokes he hopes Kwak didn’t just convert a PDF into images and then into markdown files. “I’m learning that’s one of the big token chewers,” Henderson says. “Turning PDFs into markdown: is that right?”

If you order people to use a chatbot that doesn’t do anything useful, they’ll just point it at any old trash. Then the bill hits.

The customer backlash is bad enough that Sam Altman at OpenAI is openly talking about price cuts. [WSJ, archive]

But OpenAI can’t afford price cuts. They need revenue numbers to make their planned IPO look plausible. And Altman doesn’t have the sort of Elon magic that SpaceX ran their IPO on.

Elon Musk isn’t convincing the markets either. The SpaceX stock price shot up — then back down a few days later. It’s been steady since, just below the offering price.

The AI scam cannot possibly pay for itself from sales. AI turns out not to be critical for real business work.

The use case for AI is doing things that should not be done. And there’s only so much market for that.


How to Use AI to Help Find Civilian Harm


Between February 2022 and September 2025, Bellingcat staff and volunteers collected, geolocated, and shared more than 2,500 incidents of civilian harm following Russia’s full-scale invasion of Ukraine. 

As part of this effort, Bellingcat tested a new machine learning model intended to rank Telegram social media posts on their likelihood of containing incidents of civilian harm. 

This novel methodology dramatically reduced the search and selection time required, freeing researchers to focus on verifying incidents of civilian harm – not just searching for them. 

This piece documents our methodology, ethical considerations and lessons learned in the hope that others researching similar topics can benefit from our work. 

Open source research into civilian harm is still a relatively new field and it presents many challenges – one of the biggest is organising and sorting through the huge volume of user generated content being produced to find what is relevant. 

Machine learning, a form of artificial intelligence that uses algorithms to identify patterns from large amounts of data and make predictions, can make this task more efficient.

With ongoing conflicts involving large amounts of civilian harm in Sudan and much of the Middle East, this guide aims to offer those covering these conflicts an example of how machine learning can be used to help find and sort incidents. You can also access the Code Notebook for our model here.

We defined “civilian harm” not just as civilian deaths or injuries resulting from armed conflict, but also the broader and delayed effects on civilians from mental trauma, loss of livelihood, displacement, destruction of infrastructure and more. This definition was informed by the Protection of Civilians book on civilian harm.

Initial Telegram Dataset 

Each Telegram post containing civilian harm which had already been manually verified by researchers was used to build an initial dataset of confirmed cases of civilian harm, which data scientists call positive instances. We collected a total of 5,848 unique URLs for these Telegram posts. For our manual collection we reviewed posts on relevant Telegram channels, working through oldest to newest posts each day. If a given post made it to our geolocated incidents list, the researcher who flagged it had also looked at the posts that appeared before and after it on Telegram and had not flagged those. We therefore selected the 10 posts surrounding each verified civilian harm post as our additional dataset of posts that did not contain civilian harm. After excluding any deleted or duplicate posts, we ended up with 48,545 non-civilian harm posts, our negative instances.
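The selection of negative instances described above can be sketched in a few lines. This is a minimal illustration, not the code from the notebook: the function name, the toy post ids, and the choice to take five posts on each side of a verified post are all assumptions (the text only says "the 10 posts surrounding").

```python
def surrounding_negatives(channel_post_ids, positive_ids, window=10):
    """Collect unflagged posts that sit next to verified civilian harm posts.

    Assumption: `window` is split evenly, half before and half after each
    positive post. Posts that are themselves positive are never returned,
    and each negative is returned only once.
    """
    positives = set(positive_ids)
    half = window // 2
    negatives, seen = [], set()
    for idx, post_id in enumerate(channel_post_ids):
        if post_id not in positives:
            continue
        start = max(0, idx - half)
        end = min(len(channel_post_ids), idx + half + 1)
        for neighbour in channel_post_ids[start:end]:
            if neighbour not in positives and neighbour not in seen:
                seen.add(neighbour)
                negatives.append(neighbour)
    return negatives


# Toy channel: 30 consecutive post ids, two of which were verified.
posts = list(range(100, 130))
negs = surrounding_negatives(posts, positive_ids=[110, 112])
print(negs)
```

Because the windows around nearby positives overlap, the number of negatives per positive ends up below 10, which is consistent with the ratio reported above (48,545 negatives for 5,848 positives).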

The choice to overrepresent negative instances aims at better reflecting the real world and increasing data available for model training. 

We enriched each URL with metadata from the Telegram API, such as the time of publication, reactions or textual content. As some of these posts had been deleted, we completed the missing data points with previously preserved versions from our Auto Archiver database, only available for the positive instances.

Feature Engineering

Training a machine learning model requires numerical data, as these models compute a prediction score based on mathematical operations.

We built these by converting raw data from our initial dataset, such as keywords signalling potential civilian harm, into numerical scores (or “features”) that the model could interpret, with the aim of increasing the model’s ability to identify patterns. This process, known as feature engineering, can significantly improve model results because it allows data scientists to supply explicit contextual knowledge.

A full list of features we used to train the model can be found in the code notebook accompanying this piece. Many features were directly inspired by researchers’ input from their experiences manually screening cases of civilian harm by sorting through a set number of Telegram channels and inspecting each post individually.

Several of the features used were built directly from the metadata contained in each Telegram post, including media_type and day_of_week, as well as binary features: forwarded, edited and reply_to.

Other features included engagement information: views, forwards, total_reactions, and even individual features for most used emojis including the reaction_crying_face to count 😭 emoji.
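As a rough sketch of how metadata and engagement features like these might be derived, the snippet below turns one post into a flat dictionary of numbers. The feature names come from the text; the layout of the input `post` dictionary (keys such as `forwarded_from` and `edit_date`) is hypothetical and is not the Telegram API's actual schema. The categorical media_type feature is left out because it would need a separate encoding step.

```python
from datetime import datetime, timezone


def metadata_features(post):
    """Build numeric features from one post (hypothetical input layout)."""
    reactions = post.get("reactions", {})
    return {
        "day_of_week": post["date"].weekday(),  # Monday is 0
        "forwarded": int(post.get("forwarded_from") is not None),
        "edited": int(post.get("edit_date") is not None),
        "reply_to": int(post.get("reply_to") is not None),
        "views": post.get("views", 0),
        "forwards": post.get("forwards", 0),
        "total_reactions": sum(reactions.values()),
        "reaction_crying_face": reactions.get("😭", 0),
    }


# Invented example post, for illustration only.
example_post = {
    "date": datetime(2022, 2, 24, tzinfo=timezone.utc),
    "forwarded_from": "some_channel",
    "views": 1200,
    "forwards": 30,
    "reactions": {"😭": 5, "👍": 2},
}
feats = metadata_features(example_post)
print(feats)
```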

Converting Text to Numbers 

To embed the experience from the manual collection process, researchers put together a list of keywords in both Ukrainian and Russian that, to them, signalled posts likely to show civilian harm. For instance, “Шахед” and “КАБ” translate to “Shahed” and “Guided aerial bomb” respectively. We created a numerical feature to count their frequency. 
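A keyword frequency feature of this kind could look roughly like the following. It uses only the two example keywords given above, and it matches whole lowercased tokens; that matching rule is an assumption, and a real pipeline would likely need to handle inflected forms as well.

```python
import re
from collections import Counter

# Only the two examples from the text, lowercased. The researchers' full
# keyword list is not reproduced here.
KEYWORDS = {"шахед", "каб"}


def keyword_count(text, keywords=KEYWORDS):
    """Count how many tokens in `text` exactly match one of the keywords."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    return sum(counts[k] for k in keywords)


example = "Шахед над містом, ще один шахед і КАБ"
print(keyword_count(example))
```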

In addition, we included several generic English-language keywords which meaningfully signalled potential civilian harm, such as “injured”, “school affected” and “hospital affected” that were only used for generating semantic similarity scores. 

A semantic similarity score is a calculation used to determine the proximity in meaning between different words and phrases. To get the semantic similarity between the post text and each of our keywords, we represented each in a list of numbers via a Sentence Transformer model, which converts words into numerical representations called vectors that a computer can understand. 

We then calculated the level of similarity between each vector using cosine similarity, one of the most popular methods for measuring similarity between two pieces of text.

Due to how embeddings work, this calculation results in a figure on a scale from -1 (no semantic proximity) to 1 (same meaning). For example, the words “hurt” and “injured” would have a high similarity score, while “residential” and “injured” would have a negative score as the words are not semantically similar. 
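For readers who want to see the calculation itself, here is a minimal cosine similarity in plain Python. The short vectors are toy values standing in for real sentence embeddings, which have hundreds of dimensions; the function name is our own.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, not real embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Geometrically, vectors pointing the same way score close to 1, unrelated (orthogonal) vectors score 0, and vectors pointing in opposite directions score -1. The length of the vectors does not affect the result, only their direction.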

Finally, to enable the model to identify the relevance of each post to civilian harm in Ukraine, we used a multilingual text transformer from the BERT family of language models to represent the entire post’s text as a vector of 768 numerical values. This model can efficiently represent text from many languages in a way that captures meaning: the same sentence in different languages will generate similar embeddings, and trained machine learning models can detect patterns in the embeddings. 

It is important to note that for this initial prototype of a civilian harm detection model, we did not include any features derived from media content such as photos and videos, although that would be a logical next step in attempting to improve model performance.

Selecting, Training and Evaluating Models

With 54,393 rows of 893 numerical features each, we selected four machine learning algorithms to train our predictive models. 

We chose Logistic Regression as a baseline algorithm due to its simplicity. We also selected three other “best in class” models: Random Forest, XGBoost, and LightGBM. These choices centred on the interpretability of the models and their ability to work on tabular data of this size. For example, we avoided neural networks due to a lack of interpretability and because those models work best with a larger dataset.

To genuinely assess the performance of the trained models, we split our dataset into three parts:  

  • A training set – the data the models were trained on (60 percent of the full dataset’s rows)
  • A validation set – used for an intermediary evaluation when tuning model parameters (20 percent of all rows)
  • A test set – hidden for the final performance assessment, so the models were evaluated on unseen data (remaining 20 percent of rows)

We used a stratified split to divide the dataset instead of a random split. This method ensured the proportion of positive instances (i.e. confirmed cases of civilian harm) remained consistent across all three sets at about 11 percent.
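The 60/20/20 stratified split described above can be illustrated with a short, dependency-free sketch. It is an assumption-laden stand-in rather than the code used in the project: the function name `stratified_split`, the fixed seed and the synthetic labels are all hypothetical, and a library routine would normally do this job.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split row indices into train/validation/test, keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, validation, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class before slicing
        n_train = round(len(indices) * fractions[0])
        n_validation = round(len(indices) * fractions[1])
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_validation]
        test += indices[n_train + n_validation:]  # remainder goes to the test set
    return train, validation, test

# Synthetic labels with 11 percent positives, mirroring the proportion quoted above.
labels = [1] * 110 + [0] * 890
train, validation, test = stratified_split(labels)
for name, part in (("train", train), ("validation", validation), ("test", test)):
    positives = sum(labels[i] for i in part)
    print(name, len(part), positives / len(part))
```

Splitting each class separately is what keeps the share of positive rows the same in all three sets; a plain random split would only match it approximately.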

To measure the performance of machine learning models, we ran the test set through them and measured the number of correct and incorrect predictions. Models output a likelihood between 0 and 1 that each Telegram post contains civilian harm, and we tried to find a cut-off threshold that strikes a good balance between flagging almost every post (0.1) and flagging very few (0.9).

There are two main evaluation metrics to gauge a model’s predictive power. Recall measures what fraction of positive instances (i.e. known civilian harm posts) were correctly flagged as such. Precision measures the fraction of posts flagged as civilian harm that are indeed civilian harm posts.

Walber, CC BY-SA 4.0, via Wikimedia Commons.
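To make the relationship between the cut-off threshold, precision and recall concrete, here is a small sketch on invented scores. The function name `precision_recall` and the six example posts are hypothetical and are not taken from the project's data.

```python
def precision_recall(y_true, scores, threshold):
    """Compute precision and recall when posts scoring >= threshold are flagged."""
    flagged = [score >= threshold for score in scores]
    true_pos = sum(1 for y, f in zip(y_true, flagged) if y == 1 and f)
    false_pos = sum(1 for y, f in zip(y_true, flagged) if y == 0 and f)
    false_neg = sum(1 for y, f in zip(y_true, flagged) if y == 1 and not f)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall

# Invented example: 1 marks a confirmed civilian harm post, 0 marks any other post.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.3, 0.1]

print(precision_recall(y_true, scores, 0.1))  # flags everything: precision 0.5, recall 1.0
print(precision_recall(y_true, scores, 0.5))  # precision 2/3, recall 2/3
print(precision_recall(y_true, scores, 0.9))  # flags one post: precision 1.0, recall 1/3
```

Lowering the threshold catches more of the real cases at the cost of more false alarms, which is the trade-off the threshold search has to settle.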

During the training phase, we tuned the models to maximise average precision (PR-AUC), a metric that summarises precision across all recall levels. While this method also accounts for precision, it prioritises recall, which is preferable for this use case as it steers model selection to reduce the number of civilian harm posts that are skipped. 
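Average precision can be computed by ranking posts from highest to lowest score and averaging the precision observed at the rank of each true positive. The sketch below does exactly that on invented data; it assumes no two posts share the same score (ties need extra care), and the function name `average_precision` is hypothetical.

```python
def average_precision(y_true, scores):
    """Average the precision at the rank of every positive, highest score first.

    Assumes untied scores; this is a sketch, not a library-grade implementation.
    """
    n_positive = sum(y_true)
    if n_positive == 0:
        raise ValueError("average precision needs at least one positive instance")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` posts
    return total / n_positive

y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.3, 0.1]
print(average_precision(y_true, scores))  # (1/1 + 2/3 + 3/4) / 3, about 0.806
```

A model that ranks every real case above every other post scores 1.0. As a general property of the metric, scores assigned at random tend towards the share of positives in the data, which would be roughly 0.11 for a dataset with the proportions described above; that figure is an inference here, not a number reported in the article.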

The following table sorts models from best to worst PR-AUC against a baseline of a coin-flip predictor. ROC-AUC and F1 are two other evaluation metrics included as sanity checks. Simply put, ROC-AUC measures the probability of correctly ranking two instances, one negative and one positive; F1 balances precision and recall equally and is reported at its best cut-off threshold value.
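Both sanity-check metrics can be written directly from their plain-language definitions. The sketch below is illustrative only: the function names `roc_auc` and `best_f1` are hypothetical, the data is invented, and the pairwise loop is chosen for readability rather than speed.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def best_f1(y_true, scores):
    """Try every observed score as a cut-off; return (best F1, threshold achieving it)."""
    best_score, best_threshold = 0.0, None
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
        fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_score:
            best_score, best_threshold = f1, threshold
    return best_score, best_threshold

y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.3, 0.1]
print(roc_auc(y_true, scores))  # 7 of 9 positive/negative pairs ranked correctly
print(best_f1(y_true, scores))  # best F1 is 6/7, reached at threshold 0.3
```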

Model test scores comparison: XGBoost stands out in every relevant metric evaluated.

From these results, we selected XGBoost as our final model as it had the best scores when compared across all metrics.

Interpreting the Model

Because these models are interpretable, we can understand which features are the most useful when predicting whether a post includes civilian harm. The above table shows the top 10 features that most strongly influence the XGBoost model’s decisions:

  • semantic_keywords_similarity: the semantic proximity between the post text and manually selected keywords “casualties”, “damage” and “civilian harm”
  • bert: values from the BERT text embedding; the model was able to discern meaning from the text with the same strength as some of the other features in this list, and there are three cases of this in the top 10
  • reaction_crying_face: reactions with crying face emojis on the post
  • group_of_messages: whether a post contains multiple media files
  • keywords_in_text: the number of custom Ukrainian or Russian keywords in the post

These results generally tally with what you might expect when selecting Telegram posts for instances of civilian harm, including that posts that generate a lot of emotional engagement and posts using keywords about civilian harm were among those most likely to contain content related to this topic. Not all models had the same top features as XGBoost. In fact, for the Random Forest model the most important feature was the number of crying face emojis present in a post, a soft pattern highlighted by researchers when this methodology was first imagined.

LLM Results and Comparison

Retroactively, we decided to run a sample of the same test dataset through different large language models (LLMs) to gauge their ability to make these same predictions. 

We aimed to include an LLM-generated score as an extra feature for our trained models, which would be captured as relevant if it correlated with the correct predictions. 

To start, we selected two local models, the 1B and 4B variants of Gemma 3 from Google DeepMind, and two cloud-hosted models, Gemini 2.5 Flash and Gemini 3.5 Flash. With this selection, we hoped to compare results across a wide range of expected model performance.

We generated a 400-row stratified sample (preserving the same proportion of real civilian harm instances) from the test dataset used for the custom models. For each of the four LLMs, we ran two tests: one where only the Telegram post message was sent, and another including both the message and the engineered features (excluding the text embeddings, as the model had direct access to the text). In the prompt for each model, we asked for a score between 0 and 1. We then evaluated the results as we did for the custom models.
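The two test conditions (message only, and message plus engineered features) could be set up along the following lines. Everything in this sketch is hypothetical: the article does not publish its prompt wording, so the text in `build_prompt` is invented, and `parse_score` is just one cautious way to turn a free-text reply into a number. The call to the LLM itself is deliberately left out.

```python
import re

def build_prompt(message, features=None):
    """Assemble a scoring prompt. The wording is invented for illustration."""
    lines = [
        "Rate how likely this Telegram post is to report civilian harm.",
        "Answer with a single number between 0 and 1.",
        "",
        "Post: " + message,
    ]
    if features:
        # Second test condition: append the engineered features (no embeddings).
        lines.append("")
        lines.append("Engineered features:")
        for name, value in sorted(features.items()):
            lines.append(f"- {name}: {value}")
    return "\n".join(lines)

def parse_score(response):
    """Return the first number in the reply, or None if missing or outside [0, 1]."""
    match = re.search(r"[-+]?\d*\.?\d+", response)
    if match is None:
        return None
    value = float(match.group())
    return value if 0.0 <= value <= 1.0 else None

print(build_prompt("Example post text", {"keywords_in_text": 2}))
print(parse_score("Score: 0.4"))  # 0.4
print(parse_score("no idea"))     # None
```

Rejecting out-of-range or missing values, rather than clamping them, keeps malformed replies visible so they can be counted or retried instead of silently skewing the evaluation.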

The above table shows that LLMs can indeed extract value from the engineered features. All four LLMs surpassed the baseline Logistic Regression model in our tests, yet none of them performed better than the other custom-trained models, and XGBoost remained the one with the highest PR-AUC. 

Still, Gemini 2.5 Flash performed better than its newer version 3.5 and even achieved a slightly higher best F1 score than any other model. While this is a good result, for the flagging of civilian harm posts, the PR-AUC remains the crucial metric, as it captures the model’s ability to identify infrequent instances of civilian harm while minimising false positives.

Ethical Considerations

Introducing an instrument of automated decision-making into a process of detecting civilian harm raises inherent ethical questions. These include automation bias, or how humans tend to blindly place faith in machine-generated recommendations, and algorithmic bias, or how the results of these models echo the same patterns present in the training data, including under- or over-representation of types of civilian harm.

The decision to test an automated methodology for this particular project came from the fact that there were limited resources for both steps in the process – the detection of potential civilian harm and its actual verification. Historically, we built an enormous backlog of unverified incidents because a lot of time had to be spent on monitoring the most recent events so that potential evidence would be captured and preserved as soon as possible. 

The automation of this process also reduced researchers’ exposure to a significant amount of distressing and traumatic visual and text content.

For this project, we tried to mitigate the ethical challenges with a number of strategies, including randomly flagging posts not captured by any model, monitoring which features models relied on to make decisions, and carrying out historical comparisons of patterns in the data.
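The first of these strategies, randomly surfacing posts that no model flagged, might look like the sketch below. It is a guess at the general idea rather than the project's implementation: the function name `audit_sample`, the single score per post and the threshold value are all assumptions.

```python
import random

def audit_sample(scores, threshold, k, seed=None):
    """Pick up to k posts at random from those scoring below the flagging threshold."""
    rng = random.Random(seed)
    unflagged = [index for index, score in enumerate(scores) if score < threshold]
    return sorted(rng.sample(unflagged, min(k, len(unflagged))))

# Invented scores: posts 1, 2 and 4 fall below the 0.5 cut-off and can be audited.
scores = [0.9, 0.1, 0.2, 0.8, 0.05]
print(audit_sample(scores, threshold=0.5, k=2, seed=1))
```

Sending a few of these posts to researchers gives a running check on what the model is missing, which helps counter both automation bias and blind spots inherited from the training data.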

Additionally, as stated above, for this initial prototype of a civilian harm detection model we did not include any features derived from the media content itself. Including the media from the posts would be a logical next step in attempting to improve model performance, but using AI to review actual media comes with additional ethical challenges such as model bias.

Because of the opaque ownership of many LLM companies and the generative nature of their models, the use of LLMs for an extra feature presented additional ethical challenges, including privacy and safety concerns given the sensitive nature of the data. Our model did not rely on LLMs, though we retroactively ran a sample through them.

How the Model Fits into the Bigger Picture 

After selecting this model, we created a user interface where researchers could view a list of Telegram posts sorted from most to least likely to contain indications of civilian harm. The user interface was designed for quick triage and integration, where a positive confirmation from researchers would instantly send the post to the Auto Archiver (Bellingcat’s tool for preserving digital content) and then transfer it to ATLOS (our internal collaborative verification platform). Bellingcat staff and volunteers could then manually verify incidents. Researcher input was constantly stored so that this data could be used to improve the model in the future. 

Preliminary feedback indicated that the AI model was useful. Not only were we able to reduce the time and harm involved in scouring dozens of war reporting Telegram channels, but researchers also reported that the stream of new posts being added to the verification backlog was capturing real and diverse cases of civilian harm.

Despite the focus on civilian harm and Telegram (highly popular in Ukraine and Russia), this pipeline is generic and can be adapted to other conflict monitoring tasks. How easily this can be done does depend on how open the social media platform is and whether it is possible to scrape posts from it. Apart from that, it is easy to incorporate new features and data, and cheap to automatically retrain, test and deploy models as the system receives more human input.  

Looking forward, sorting through overwhelming amounts of data in a conflict will continue to be challenging. Hopefully, this methodology can help newsrooms, conflict monitoring organisations, and others find the balance between ethical considerations and resources in order to carry out open source investigations on civilian harm and human rights violations. 



The post How to Use AI to Help Find Civilian Harm appeared first on bellingcat.
