Most illuminated signs today are made with flexible LED tubes, which are considerably cheaper than neon tubes. But there are still a few people who make neon signs by hand. One of them is this gentleman, who according to Factory Monster is Korea's last neon master.
A little bit of music history, in which Beck talks about his breakthrough single “Loser”.
The enigmatic artist looks back on his beginnings in the underground scene and describes how these formative experiences led to major breakthroughs in his career and his creative work.
Masayoshi Son is the founder and CEO of SoftBank, the giant Japanese private equity fund that’s the main backer and cheerleader of OpenAI.
SoftBank was also the main backer of WeWork, which was almost as stupid as OpenAI. WeWork’s 2019 stock market offering failed spectacularly and SoftBank’s investment went down 90%. WeWork still exists in 2026, but it’s a lot smaller.
As well as the spectacular messes, SoftBank has a pile of quietly successful portfolio companies — like ARM, which designs the chips in all the phones, and makes a bundle. These pay for the messes.
Son has a long track record of wild slides for investor events. SoftBank’s annual general meeting was on Wednesday. The internet saw the slides from this thing and went wild for the goose that lays the golden eggs. [SoftBank, PDF, archive]
There is a pile of slides about artificial superintelligence. And Physical Artificial Superintelligence, which means robots. Son talks up ARM Holdings a lot; SoftBank is very proud of ARM.
On page 45 of the PDF, the goose shows up.
Sixteen years ago, SoftBank had only three golden eggs. That’s a market cap of 3 trillion yen. The “market only saw the eggs.” It ignored the goose.
But “eggs do not lay eggs.” It’s the goose! Value us!
There’s a golden egg factory inside the goose! If you look at the slide on page 56 of the PDF, you’ll see the goose is a robot with an “internal mechanism.” Also, the goose is labeled “ASI” — that’s “artificial superintelligence”. The golden eggs are “ASI” too.
Son has long been fond of his golden goose. SoftBank’s 2014 earnings slides start with “SoftBank equals Goose”. Later on, Son includes the whole story of “The Goose that Laid the Golden Eggs”. [SoftBank, 2014, PDF, archive]
Wednesday’s presentation for 2026 mostly hammers on Son’s love of AI. Son said: [Reuters]
I think it’s blasphemy against AI if you say it’s a bubble.
SoftBank insiders said last month they were getting worried about how deep into OpenAI SoftBank is. [Bloomberg, archive]
News just came out that OpenAI might put off its IPO until 2027 — and SoftBank’s share price dropped 14%. [MarketWatch]
Son is most annoyed that the market cap of SoftBank stock is half its claimed asset value. “The goose was not valued.” But the market isn’t just valuing the goose at zero — it’s giving it a huge penalty.
Of course, SoftBank is not the goose that lays the golden eggs. ARM Holdings is the goose. SoftBank’s profitable companies are the geese. They’re what produces the golden eggs.
SoftBank is the goose’s greedy owner. SoftBank takes the money from its successful companies like ARM and sets the money on fire at WeWork and now OpenAI.
Son is making out he is the AI-powered golden goose. Praise him! Do not question him!
But that’s not what’s happening here. SoftBank is not a golden goose, it’s not a factory, and it’s not a superintelligence.
Also, “The Goose that Laid the Golden Eggs” is a cautionary fable about greed being bad.
The AI push runs on setting venture capital money on fire and charging chatbot users way less than the bot costs to serve.
The idea is that the customers will get hooked on the business value. Then the AI vendors can gouge them. And the vendors can try not to haemorrhage quite so much cash.
So Microsoft, Anthropic, and OpenAI have all been moving their software customers off monthly subscriptions to token-based billing. Tokens have the added enterprise pricing advantage that the cost of anything is completely obscure.
One source told Axios that one of their clients had spent $500 million in a month on Claude Code. We don’t know who it was — but the best guesses are Amazon or Uber, both of which are known to have run up stupendous bills. [Axios]
It’s not just the coders — it’s the ordinary office workers! They’re being told to AI up everything and give it that veneer of slop. That’s professional now.
Walmart said in early June that it was rationing its internal AI tool “Code Puppy”, which did office workslop as well as code. Code Puppy used to be unlimited. Now it’s not. [Bloomberg]
This was really an excuse for layoffs — Accenture’s consulting business is badly down and its stock price has cratered over the past year.
But Accenture employees responded to the incentive — and now the bill’s coming due. 404 Media got a leak of a meeting at Accenture: [404, archive]
“We’re seeing from some of the data internally at least that it’s actually not our engineers that are driving the token consumption. It’s a lot of the non-engineers that are doing some of those behaviors […] you were talking about.”
… Stuart Henderson … jokes he hopes Kwak didn’t just convert a PDF into images and then into markdown files. “I’m learning that’s one of the big token chewers,” Henderson says. “Turning PDFs into markdown: is that right?”
If you order people to use a chatbot that doesn’t do anything useful, they’ll just point it at any old trash. Then the bill hits.
The customer backlash is bad enough that Sam Altman at OpenAI is openly talking about price cuts. [WSJ, archive]
But OpenAI can’t afford price cuts. They need revenue numbers to make their planned IPO look plausible. And Altman doesn’t have the sort of Elon magic that SpaceX ran their IPO on.
Elon Musk isn’t convincing the markets either. The SpaceX stock price shot up — then back down a few days later. It’s been steady since, just below the offering price.
The AI scam cannot possibly pay for itself from sales. AI turns out not to be critical for real business work.
The use case for AI is doing things that should not be done. And there’s only so much market for that.
Between February 2022 and September 2025, Bellingcat staff and volunteers collected, geolocated, and shared more than 2,500 incidents of civilian harm following Russia’s full-scale invasion of Ukraine.
As part of this effort, Bellingcat tested a new machine learning model intended to rank Telegram social media posts on their likelihood of containing incidents of civilian harm.
This novel methodology dramatically reduced the search and selection time required, freeing researchers to focus on verifying incidents of civilian harm – not just searching for them.
This piece documents our methodology, ethical considerations and lessons learned in the hope that others researching similar topics can benefit from our work.
Open source research into civilian harm is still a relatively new field and it presents many challenges. One of the biggest is organising and sorting through the huge volume of user-generated content being produced to find what is relevant.
Machine learning, a form of artificial intelligence that uses algorithms to identify patterns from large amounts of data and make predictions, can make this task more efficient.
With ongoing conflicts involving large amounts of civilian harm occurring in Sudan, and much of the Middle East, this guide aims to offer those covering these conflicts an example of how machine learning can be used to help find and sort incidents. You can also access the Code Notebook for our model here.
We defined “civilian harm” not just as civilian deaths or injuries resulting from armed conflict, but also the broader and delayed effects on civilians from mental trauma, loss of livelihood, displacement, destruction of infrastructure and more. This definition was informed by the Protection of Civilians book on civilian harm.
Initial Telegram Dataset
Each Telegram post containing civilian harm which had already been manually verified by researchers was used to build an initial dataset of confirmed cases of civilian harm, which data scientists call positive instances. We collected a total of 5,848 unique URLs for these Telegram posts. For our manual collection we reviewed posts on relevant Telegram channels, working through oldest to newest posts each day. Assuming that a given post made it to our geolocated incidents list, it meant the researcher who flagged it also looked at the posts that appeared before and after it on Telegram and did not flag those ones, so we selected the 10 posts surrounding the verified civilian harm post as our additional dataset of posts that did not contain civilian harm. After excluding any deleted or duplicate posts, we ended up with 48,545 non-civilian harm posts, our negative instances.
The choice to overrepresent negative instances aims at better reflecting the real world and increasing data available for model training.
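The neighbour-based selection of negative instances described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook. It assumes public post URLs of the form https://t.me/<channel>/<id> with sequential IDs, and it assumes "the 10 posts surrounding" means five before and five after. The channel name and both function names are hypothetical.

```python
import re


def surrounding_post_urls(url: str, before: int = 5, after: int = 5) -> list[str]:
    """Return URLs of the posts just before and after a Telegram post URL."""
    match = re.fullmatch(r"(https://t\.me/[^/]+)/(\d+)", url)
    if match is None:
        raise ValueError(f"not a public Telegram post URL: {url}")
    base, post_id = match.group(1), int(match.group(2))
    neighbours = range(post_id - before, post_id + after + 1)
    # Skip the verified post itself and any non-positive IDs.
    return [f"{base}/{n}" for n in neighbours if n > 0 and n != post_id]


def build_negatives(positive_urls: list[str]) -> list[str]:
    """Collect neighbouring posts, dropping duplicates and known positives."""
    positives = set(positive_urls)
    seen: set[str] = set()
    negatives: list[str] = []
    for url in positive_urls:
        for candidate in surrounding_post_urls(url):
            if candidate not in positives and candidate not in seen:
                seen.add(candidate)
                negatives.append(candidate)
    return negatives


# Hypothetical example: two verified posts close together in one channel.
example_positives = [
    "https://t.me/example_channel/100",
    "https://t.me/example_channel/102",
]
example_negatives = build_negatives(example_positives)
```

Deleted posts would still need to be filtered out afterwards, since a URL built this way is not guaranteed to resolve to an existing post.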
We enriched each URL with metadata from the Telegram API, such as the time of publication, reactions or textual content. As some of these posts had been deleted, we completed the missing data points with previously preserved versions from our Auto Archiver database, only available for the positive instances.
Feature Engineering
Training a machine learning model requires numerical data, as these models compute a prediction score based on mathematical operations.
We built this numerical data by converting raw data from our initial dataset, such as keywords signalling potential civilian harm, into numerical scores (or “features”) that the model could interpret, with the aim of increasing the model’s ability to identify patterns. This process, known as feature engineering, can significantly improve model results because it allows data scientists to supply explicit contextual knowledge.
A full list of features we used to train the model can be found in the code notebook accompanying this piece. Many features were directly inspired by researchers’ input from their experiences manually screening cases of civilian harm by sorting through a set number of Telegram channels and inspecting each post individually.
Several of the features used were built directly from the metadata contained in each Telegram post, including media_type and day_of_week, as well as the binary features forwarded, edited and reply_to.
Other features captured engagement information: views, forwards, total_reactions, and even individual features for the most used emojis, such as reaction_crying_face, which counts crying face reactions.
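As a rough illustration of how such metadata could become numbers, the sketch below maps one post to a handful of the features named above. The output keys follow the feature names in the text, but the input dictionary layout (published_at, forwarded_from, edit_date, reply_to_id, reactions) is a hypothetical stand-in, not the actual Telegram API response or the notebook's schema.

```python
from datetime import datetime, timezone


def metadata_features(post: dict) -> dict:
    """Turn one post's metadata (hypothetical layout) into numeric features."""
    reactions = post.get("reactions", {})
    return {
        "day_of_week": post["published_at"].weekday(),  # 0 = Monday
        # Binary flags: 1 if the property is present, 0 otherwise.
        "forwarded": int(post.get("forwarded_from") is not None),
        "edited": int(post.get("edit_date") is not None),
        "reply_to": int(post.get("reply_to_id") is not None),
        # Engagement counts.
        "views": post.get("views", 0),
        "forwards": post.get("forwards", 0),
        "total_reactions": sum(reactions.values()),
        "reaction_crying_face": reactions.get("😢", 0),
    }


# Made-up post for illustration only.
sample_post = {
    "published_at": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    "forwarded_from": None,
    "edit_date": datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc),
    "reply_to_id": None,
    "views": 1200,
    "forwards": 30,
    "reactions": {"😢": 14, "👍": 3},
}
sample_features = metadata_features(sample_post)
```

A categorical field such as media_type would additionally need an encoding step (for example one column per category) before a model could use it.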
Converting Text to Numbers
To embed the experience from the manual collection process, researchers put together a list of keywords in both Ukrainian and Russian that, to them, signalled posts likely to show civilian harm. For instance, “Шахед” and “КАБ” translate to “Shahed” and “guided aerial bomb” respectively. We created a numerical feature to count their frequency.
In addition, we included several generic English-language keywords that meaningfully signalled potential civilian harm, such as “injured”, “school affected” and “hospital affected”. These were used only for generating semantic similarity scores.
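A keyword-frequency feature of this kind might look like the sketch below. It only uses the two keywords quoted above; the full list lives in the notebook. Matching whole lower-cased tokens is an assumption made here for simplicity, and it will miss inflected forms of a keyword.

```python
import re

# Only the two keywords named in the text; the real list is longer.
KEYWORDS = {"шахед", "каб"}


def keywords_in_text(text: str, keywords: set[str] = KEYWORDS) -> int:
    """Count how many tokens in the post exactly match a keyword."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(1 for token in tokens if token in keywords)


example_count = keywords_in_text("Шахед атакував місто, ще один шахед збито. КАБ.")
```

Matching whole tokens avoids counting "каб" inside an unrelated longer word, at the cost of ignoring grammatical endings. Stemming or substring matching would trade those errors the other way.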
A semantic similarity score is a calculation used to determine the proximity in meaning between different words and phrases. To get the semantic similarity between the post text and each of our keywords, we represented each in a list of numbers via a Sentence Transformer model, which converts words into numerical representations called vectors that a computer can understand.
We then calculated the level of similarity between each vector using cosine similarity, one of the most popular methods for measuring similarity between two pieces of text.
Due to how embeddings work, this calculation results in a figure on a scale from -1 (vectors pointing in opposite directions, the least similar) to 1 (same meaning), with values near 0 indicating little semantic proximity. For example, the words “hurt” and “injured” would have a high similarity score, while “residential” and “injured” would have a low or even negative score as the words are not semantically similar.
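Cosine similarity itself is a short calculation: the dot product of two vectors divided by the product of their lengths. The sketch below uses tiny made-up vectors rather than real Sentence Transformer embeddings, purely to show the ends of the scale.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty or zero vectors
    return dot / (norm_a * norm_b)


# Toy two-dimensional vectors, not real embeddings.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
```

Because only the angle matters, a vector and a scaled copy of it score 1 regardless of their lengths.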
Finally, to enable the model to identify the relevance of each post to civilian harm in Ukraine, we used a multilingual text transformer from the BERT family of language models to represent the entire post’s text as a vector of 768 numerical values. This model can efficiently represent text from many languages in a way that captures meaning: the same sentence in different languages will generate similar embeddings, and trained machine learning models can detect patterns in the embeddings.
It is important to note that for this initial prototype of a civilian harm detection model, we did not include any features derived from media content such as photos and videos, although that would be a logical next step in attempting to improve model performance.
Selecting, Training and Evaluating Models
With 54,393 rows of 893 numerical features each, we selected four machine learning algorithms to train our predictive models.
We chose Logistic Regression as a baseline algorithm due to its simplicity. We also selected three other “best in class” models, Random Forest, XGBoost, and LightGBM. These choices centred on the interpretability of the models and their ability to work on tabular data of this size. For example, we avoided neural networks due to a lack of interpretability and because those models work best with a larger dataset.
To genuinely assess the performance of the trained models, we split our dataset into three parts:
A training set – the data the models were trained on (60 percent of the full dataset’s rows)
A validation set – used for an intermediary evaluation when tuning model parameters (20 percent of all rows)
A test set – hidden for the final performance assessment, so the models were evaluated on unseen data (remaining 20 percent of rows)
We used a stratified split to divide the dataset instead of a random split. This method ensured the proportion of positive instances (i.e. confirmed cases of civilian harm) remained consistent across all three sets at about 11 percent.
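To illustrate what a stratified 60/20/20 split does, here is a minimal standard-library sketch. It splits each class separately so every set keeps the same share of positives. The function name, the seed and the toy label counts are assumptions for illustration. The notebook may well use a library routine instead.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return index lists (train, validation, test), split per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, validation, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # whatever remains
    return train, validation, test


# Toy labels with 11 percent positives, mirroring the proportion in the text.
toy_labels = [1] * 110 + [0] * 890
train_idx, val_idx, test_idx = stratified_split(toy_labels)


def positive_rate(indices):
    return sum(toy_labels[i] for i in indices) / len(indices)
```

With scikit-learn, the same effect is usually achieved by calling train_test_split twice with the stratify argument set to the labels.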
To measure the performance of machine learning models, we ran them through the test set and measured the number of correct and incorrect predictions. Models output a likelihood between 0 and 1 that each Telegram post contains civilian harm, and we tried to find a cut-off threshold that strikes a good balance between flagging almost every post (0.1) and flagging very few (0.9).
There are two main evaluation metrics to gauge a model’s predictive power. Recall measures what fraction of positive instances (i.e. known civilian harm posts) were correctly flagged as such. Precision measures the fraction of posts flagged as civilian harm that are indeed civilian harm posts.
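The trade-off between the two metrics becomes concrete with a small worked example. The scores and labels below are invented for illustration, and the function name is hypothetical.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when flagging every post scoring >= threshold."""
    flagged = [score >= threshold for score in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up model scores and true labels (1 = civilian harm post).
toy_scores = [0.95, 0.80, 0.60, 0.40, 0.20, 0.05]
toy_labels = [1, 0, 1, 0, 1, 0]

low_cutoff = precision_recall(toy_scores, toy_labels, 0.1)   # flags almost everything
mid_cutoff = precision_recall(toy_scores, toy_labels, 0.5)
high_cutoff = precision_recall(toy_scores, toy_labels, 0.9)  # flags very little
```

On this toy data the low cut-off catches every positive but 40 percent of what it flags is wrong, while the high cut-off is always right but misses two of the three positives.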
During the training phase, we tuned the models to maximise average precision (PR-AUC), a metric that summarises precision across all recall levels. While this method also accounts for precision, it prioritises recall, which is preferable for this use case as it steers model selection to reduce the number of civilian harm posts that are skipped.
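One common way to compute average precision is to rank posts by score and average the precision observed at the rank of each true positive. The sketch below does exactly that on invented data. It assumes no tied scores. Library implementations handle ties more carefully and may differ slightly.

```python
def average_precision(scores, labels):
    """Mean of the precision values measured at each positive's rank."""
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("average precision needs at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    running_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            running_sum += hits / rank  # precision at this positive's rank
    return running_sum / total_positives


# Made-up scores: the positives sit at ranks 1, 3 and 5.
toy_scores = [0.95, 0.80, 0.60, 0.40, 0.20, 0.05]
toy_labels = [1, 0, 1, 0, 1, 0]
toy_ap = average_precision(toy_scores, toy_labels)  # (1/1 + 2/3 + 3/5) / 3
```

Every positive contributes one term, so a positive buried deep in the ranking drags the score down. This is why the metric rewards models that surface rare positives early.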
The following table sorts models from best to worst PR-AUC against a baseline of a coin-flip predictor. ROC-AUC and F1 are two other evaluation metrics included as sanity checks. Simply put, ROC-AUC measures the probability of correctly ranking a pair of instances, one negative and one positive; F1 balances precision and recall equally and is reported at its best cut-off threshold.
Model test score comparison: XGBoost stands out in every relevant metric evaluated.
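Both sanity-check metrics can be written out directly from their definitions. The sketch below, on invented data, computes ROC-AUC as the share of positive/negative pairs ranked correctly and searches the observed scores for the cut-off with the highest F1. Function names are hypothetical.

```python
def roc_auc(scores, labels):
    """Share of (positive, negative) pairs where the positive scores higher."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0  # ties count as half
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def best_f1(scores, labels):
    """Try each observed score as cut-off; return (best F1, threshold)."""
    best = (0.0, None)
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[0]:
            best = (f1, threshold)
    return best


toy_scores = [0.95, 0.80, 0.60, 0.40, 0.20, 0.05]
toy_labels = [1, 0, 1, 0, 1, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
toy_f1, toy_threshold = best_f1(toy_scores, toy_labels)
```

A random ranking has an expected ROC-AUC of 0.5, which is what makes a coin-flip predictor a natural baseline for comparison.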
From these results, we selected XGBoost as our final model as it had the best scores when compared across all metrics.
Interpreting the Model
Because these models are interpretable, we can understand which features are the most useful when predicting whether a post includes civilian harm. The above table shows the top 10 features that most strongly influence the XGBoost model’s decisions:
semantic_keywords_similarity: the semantic proximity between the post text and manually selected keywords “casualties”, “damage” and “civilian harm”
bert: the model was able to discern meaning from the text with the same strength as some of the other features in this list – there are three cases of this in the top 10
reaction_crying_face: reactions with crying face emojis on the post
group_of_messages: whether a post contains multiple media files
keywords_in_text: the number of custom Ukrainian or Russian keywords in the post
These results generally tally with what you might expect when selecting Telegram posts for instances of civilian harm, including that posts that generate a lot of emotional engagement and posts using keywords about civilian harm were among those most likely to contain content related to this topic. Not all models had the same top features as XGBoost. In fact, for the Random Forest model the most important feature was the number of crying face emojis present in a post, a soft pattern highlighted by researchers when this methodology was first imagined.
LLM Results and Comparison
Retroactively, we decided to run a sample of the same test dataset through different large language models (LLMs) to gauge their ability to make these same predictions.
We aimed to include an LLM-generated score as an extra feature for our trained models, which would be captured as relevant if it correlated with the correct predictions.
To start, we selected two local models, the 1B and 4B variants of Gemma 3 from Google DeepMind, and two cloud-hosted models, Gemini 2.5 Flash and Gemini 3.5 Flash. With this selection, we hoped to compare results across a wide range of expected model performance.
We generated a 400-row stratified sample (preserving the same proportion of real civilian harm instances) from the test dataset used for the custom models. For each of the four LLMs, we ran two tests: one where only the Telegram post message was sent, and another including both the message and the engineered features (excluding the text embeddings, as the model had direct access to the text). In the prompt for each model, we asked for a score between 0 and 1. We then evaluated the results as we did for the custom models.
The above table shows that LLMs can indeed extract value from the engineered features. All four LLMs surpassed the baseline Logistic Regression model in our tests, yet none of them performed better than the other custom-trained models, and XGBoost remained the one with the highest PR-AUC.
Still, Gemini 2.5 Flash performed better than its newer version 3.5 and even achieved a slightly higher best F1 score than any other model. While this is a good result, for the flagging of civilian harm posts, the PR-AUC remains the crucial metric, as it captures the model’s ability to identify infrequent instances of civilian harm while minimising false positives.
Ethical Considerations
Introducing an instrument of automated decision-making into a process of detecting civilian harm brings inherent ethical questions. These include automation bias, or how humans tend to blindly place faith in machine-generated recommendations, and algorithmic bias, or how the results of these models echo the same patterns present in the training data, including under- or over-representation of types of civilian harm.
The decision to test an automated methodology for this particular project came from the fact that there were limited resources for both steps in the process – the detection of potential civilian harm and its actual verification. Historically, we built an enormous backlog of unverified incidents because a lot of time had to be spent on monitoring the most recent events so that potential evidence would be captured and preserved as soon as possible.
The automation of this process also reduced researchers’ exposure to a significant amount of distressing visual and text content.
For this project, we tried to ameliorate the ethical challenges with a number of strategies including randomly flagging posts not captured by any model, monitoring which features models relied on to make decisions, and by doing historical comparisons of patterns in data.
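The first of these strategies, randomly surfacing posts the model did not flag, could be sketched as below. This is a hypothetical illustration: the threshold, the audit size, the function name and the (post_id, score) input shape are all assumptions, not details taken from the project.

```python
import random


def triage_queue(scored_posts, threshold=0.5, audit_size=2, seed=None):
    """Split (post_id, score) pairs into a ranked flagged list and a random audit sample."""
    rng = random.Random(seed)
    # Posts the model flags, most likely first.
    flagged = sorted(
        (post for post in scored_posts if post[1] >= threshold),
        key=lambda post: post[1],
        reverse=True,
    )
    # A random handful of unflagged posts, so humans also see what the model skipped.
    unflagged = [post for post in scored_posts if post[1] < threshold]
    audit = rng.sample(unflagged, min(audit_size, len(unflagged)))
    return flagged, audit


# Made-up post IDs and scores.
toy_posts = [("a", 0.91), ("b", 0.12), ("c", 0.67), ("d", 0.03), ("e", 0.44)]
toy_flagged, toy_audit = triage_queue(toy_posts, threshold=0.5, audit_size=2, seed=1)
```

Reviewing the audit sample gives researchers a way to notice civilian harm posts the model systematically misses, which a purely score-sorted queue would never show them.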
Additionally, as stated above, this initial prototype of a civilian harm detection model did not include any features derived from the media content itself. Including the media from the posts would be a logical next step in attempting to improve model performance, but using AI to review actual media comes with additional ethical challenges such as model bias.
Because of the opaque ownership of many LLM companies and the generative nature of their models, the use of LLMs for an extra feature presented additional ethical challenges, including privacy and safety concerns given the sensitive nature of the data. Our model did not rely on LLMs, though we retroactively ran a sample through several of them.
How the Model Fits into the Bigger Picture
After selecting this model, we created a user interface where researchers could view a list of Telegram posts sorted from most to least likely to contain indications of civilian harm. The user interface was designed for quick triage and integration, where a positive confirmation from researchers would instantly send the post to the Auto Archiver (Bellingcat’s tool for preserving digital content) and then transfer it to ATLOS (our internal collaborative verification platform). Bellingcat staff and volunteers could then manually verify incidents. Researcher input was constantly stored so that this data could be used to improve the model in the future.
Preliminary feedback indicated that the AI model was useful. Not only were we able to reduce the time and harm involved in scouring dozens of war reporting Telegram channels, researchers also reported that the stream of new posts being added to the verification backlog was capturing real and diverse cases of civilian harm.
Despite the focus on civilian harm and Telegram (highly popular in Ukraine and Russia), this pipeline is generic and can be adapted to other conflict monitoring tasks. How easily this can be done does depend on how open the social media platform is and whether it is possible to scrape posts from it. Apart from that, it is easy to incorporate new features and data, and cheap to automatically retrain, test and deploy models as the system receives more human input.
Looking forward, sorting through overwhelming amounts of data in a conflict will continue to be challenging. Hopefully, this methodology can help newsrooms, conflict monitoring organisations, and others find the balance between ethical considerations and resources in order to carry out open source investigations on civilian harm and human rights violations.
In February, police in Claremore, Oklahoma arrested farmer Darren Blanchard for speaking a little too long during a community meeting about data centers. The city charged Blanchard with criminal trespass, a crime with a $200 penalty, but he’s vowed to fight the charge. He recently shared video of the bodycam footage for the first time with 404 Media and answered our questions about the moment cops arrested him for going over his time at a February 17 community meeting of the Claremore City Council.
The plan in February was for the City Council to listen to the concerns citizens had about a planned data center called Project Mustang. The residents of Claremore don’t want the data center and largely feel like the construction project was approved without their input. City officials signed non-disclosure agreements on behalf of the project’s developers and haven’t been forthcoming with details about its construction.