AI · Linguistics · Machine Learning · 2026

The Punctuation That Betrays the Machine

I have spent time studying how AI writes, and no detail is more revealing than a single character: the em dash. What it is doing inside a language model is far stranger and more fascinating than most people realize.

By Ayushman Mishra · AI Behavior & Language Deep Dive
Sources: arXiv, NPR, ACM, Nick Potkalitsky, Sean Goedecke
2x em dash usage increase in academic text, 2021 to 2025
6.97 em dashes per 1,000 words in GPT-4.1 output
10x more em dashes in GPT-4o vs. GPT-3.5
3.23 human baseline mean per 1,000 words

I have been reading AI-generated text for long enough now to have developed a reflex I cannot switch off. A certain rhythm in a sentence, a particular way a subordinate clause attaches to a main one, a punctuation mark that appears too often in places where a comma or a period would do perfectly well. The character is the em dash, and once you learn to notice it, you begin to see it everywhere: in chatbot responses, in AI-assisted emails, in research summaries, in customer service messages, in LinkedIn posts that sound slightly too polished. It has become, for many readers, the single most reliable fingerprint of machine-generated prose. But the question I want to answer here is not merely whether AI overuses the em dash. The more interesting question is: why? And the answer reaches deep into how these models are built, trained, and, in a strange sense, taught to perform intelligence.

I want to be careful about one thing before I go further. The em dash is not wrong. It is not a mark of bad writing. Emily Dickinson used it obsessively, so much so that her editors spent decades arguing about whether to standardize her dashes into conventional punctuation. The New Yorker has always been fond of it. Long-form journalism, literary essays, and academic prose have employed it for over a century. The character itself is innocent. What I am investigating is not the em dash's guilt but the mechanism by which AI systems developed such a disproportionate, compulsive attachment to it, and what that attachment reveals about the nature of large language models at a level far below the words they produce.

"The em dash phenomenon is not a quirk. It is a diagnostic signature of how a model was fine-tuned, and it varies from zero to near-invariant under explicit prohibition across different providers."

arXiv: The Last Fingerprint, 2026
Part One

A Brief History of a Character Nobody Had a Key For

I think the history of the em dash matters here, because it shapes which texts contain it and therefore which texts ended up in AI training data. The em dash takes its name from traditional typography, where it was defined as the width of a capital "M" in a given typeface. In early typesetting, it was used to create interruptions or long pauses, bridging sentiments and saving physical space on a page. It was common, deliberate, and considered a mark of crafted, edited prose.

Then the typewriter arrived, and the em dash quietly retreated. Standard typewriter keyboards had no em dash key. Writers who wanted one had to type two hyphens in sequence, and over time, the double hyphen became the accepted substitute in typed documents. The em dash became the province of print, of professionally typeset books and magazines, of publishing houses with compositors who knew their character sets. It remained common in formal print throughout the twentieth century, but the average person typing on a keyboard stopped thinking about it.

Word processors changed this again. Microsoft Word and Apple's text editors introduced autocorrect features that would automatically convert a double hyphen into a proper em dash, making it suddenly accessible to anyone typing on a computer without requiring any special knowledge. Then came the internet, which brought an explosion of writing in every style, from formal to casual, and the em dash spread across a new generation of journalists, bloggers, and essayists who discovered it as a versatile, expressive tool. By the time AI training data was being assembled from the web, the em dash had settled into a particular niche: it was the punctuation mark of high-quality, carefully composed digital writing.

Part Two

How Training Data Creates Invisible Biases

I have spent time thinking about what it means for a language model to learn to write, and the most important thing to understand is that these models do not learn rules. They learn statistical patterns from enormous corpora of text. A model does not know that an em dash is used to set off a parenthetical clause or to signal an abrupt interruption. It knows, in a very deep probabilistic sense, that in certain syntactic contexts, in certain registers of writing, a particular token tends to follow certain other tokens. The em dash is not a rule the model has memorized. It is a pattern the model has absorbed, weighted by how frequently that pattern appeared in the text it was trained on.
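To make "absorbing a pattern" concrete, here is a minimal Python sketch of its simplest form: estimating how often a mark follows each context in a corpus. The three-line corpus and the whitespace tokenization are invented for illustration; real pretraining operates over subword tokens and vastly larger data, but the principle is the same: no rule is stored, only conditional frequency.

```python
from collections import Counter, defaultdict

def punctuation_bias(corpus, mark="—"):
    """Estimate P(next token is `mark` | previous token) by raw frequency.

    A toy stand-in for pretraining: no rule is stored, only how often
    the mark followed each context in the data."""
    following = defaultdict(Counter)
    for text in corpus:
        tokens = text.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            following[prev][nxt] += 1
    return {prev: counts[mark] / sum(counts.values())
            for prev, counts in following.items()}

# Invented miniature "corpus", skewed toward dash-heavy edited prose.
corpus = [
    "the model learns patterns — not rules",
    "the model learns patterns — weighted by frequency",
    "the model learns patterns from data",
]
bias = punctuation_bias(corpus)
# bias["patterns"] is 2/3: the dash follows "patterns" in two of three contexts.
```

Scale this up by orders of magnitude and skew the corpus toward dash-heavy edited prose, and the "preference" emerges without anyone ever encoding it.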

This matters enormously when you consider what kinds of text AI labs use to train their models. The earliest large language models were trained primarily on publicly available internet data, which included writing from across the full quality spectrum. Then, as the capabilities of these models became clear, AI labs began actively curating their training data toward higher-quality sources: digitized books, academic papers, longform journalism, professional documentation, carefully edited websites. Research by Sean Goedecke, whose analysis I have found particularly persuasive, points to a decisive shift: AI labs began scanning large numbers of print books, particularly from the nineteenth and early twentieth centuries, where em dash usage was not merely common but stylistically dominant.

A study on punctuation frequency in English text found that em dash usage peaked around 1860 and then gradually declined through the twentieth century as typewriters normalized the hyphen. If modern AI models were trained on a heavy diet of Victorian and Edwardian literature, classic American prose, and the full digitized catalogue of print publishing, they would absorb em dash patterns from an era when the character was at the height of its literary prestige. The model would learn, from Melville and Dickens and Wharton and their contemporaries, that a sophisticated writer uses the em dash as naturally as breathing.

A Notable Generational Signal

GPT-3.5 Did Not Do This

One of the most revealing data points I encountered in my research is a simple historical observation: GPT-3.5, released in late 2022, did not overuse em dashes. GPT-4o, released in 2024, used approximately ten times as many em dashes as its predecessor. GPT-4.1 was even more pronounced. The pattern appears across Anthropic's and Google's models as well. Something changed between these generations, and the most plausible explanation is not an algorithmic shift but a data one. As AI labs recognized that high-quality training data produced better models, they began incorporating more curated, formally edited text, including digitized print books, into their training corpora. The em dash came along as an unintended passenger.

Part Three

The RLHF Amplification: How Human Feedback Made It Worse

I want to explain a mechanism that I think is underappreciated in popular discussions of this topic, because it is the part that transforms a mild statistical tendency in training data into the compulsive, difficult-to-suppress pattern we see in deployed models today. That mechanism is Reinforcement Learning from Human Feedback, or RLHF, and understanding it is essential to understanding why the em dash problem is so persistent.

After a language model is pretrained on vast text corpora, it undergoes a second phase of training in which human annotators are shown pairs of model outputs and asked to indicate which response they prefer. Those preferences are used to train a separate reward model, which then provides a signal to update the original language model through reinforcement learning. The goal is to align the model's output with what humans find helpful, clear, and well-organized. The problem, and it is a structural one, is that the human annotators in this process are disproportionately technical workers: developers, researchers, and educated professionals who are comfortable with structured, formally edited writing.
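The reward-model step of that pipeline is commonly formulated as a Bradley-Terry pairwise loss: the negative log-sigmoid of the reward margin between the preferred and rejected responses. The sketch below uses invented scores; the point is only that gradient descent on this loss raises the reward of whatever register the annotators systematically preferred.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry pairwise loss for fitting a reward model:
    -log sigmoid(r_chosen - r_rejected). Lower when the model
    already ranks the annotator-preferred response higher."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical scores for one annotator comparison in which the
# polished, dash-heavy reply was marked as preferred.
loss_agrees = preference_loss(reward_chosen=2.0, reward_rejected=0.5)
loss_disagrees = preference_loss(reward_chosen=0.5, reward_rejected=2.0)
# loss_agrees < loss_disagrees: training lowers the loss by rewarding
# whatever register the annotators favored, punctuation included.
```

Nothing in the loss mentions punctuation. If dash-heavy prose correlates with "preferred" in the annotation data, the reward model learns that correlation along with everything else.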

Research published in arXiv in early 2026 makes this mechanism explicit. Human evaluators in the RLHF pipeline tend to rate clear, well-organized prose more highly. Em-dash-heavy prose reads as precise, articulate, and structurally aware to readers familiar with high-quality edited text. The RLHF process therefore selects for outputs that these evaluators prefer, systematically rewarding the em-dash-heavy register even when the em dashes themselves add nothing to clarity. The em dash is not being used because it is appropriate. It is being used because it signals a style of writing that trained human evaluators associate with quality, and the model has learned, through thousands of reward iterations, that this signal correlates with higher scores.

The local optimization trap

There is a second mechanism operating here that I find particularly interesting from a technical standpoint. Language models generate text token by token, evaluating at each step which token is most likely to follow the current context. They do not generate a full paragraph and then review it for stylistic consistency. Every sentence is constructed locally, in the moment, without any global awareness of what has come before in the same response. A human writer who has used an em dash twice in the last paragraph will often notice the pattern and consciously vary her punctuation in the next one. A language model does not do this. It does not "hear" that it has used five dashes in one page and decide to reach for a semicolon instead. Each token decision is made fresh, and if the training distribution says that em dashes are associated with quality at this point in a sentence, the model will reach for one again, regardless of how many it has already used.

This is what researchers call local optimization without global editorial judgment, and it produces a specific signature: not just em dash usage, but rhythmically repeated em dash usage, at a frequency no human writer with editorial instincts would ever sustain across a long document.
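A toy decoder makes the locality concrete. The next-token table below is invented, but it captures the structural point: nothing in the generation loop remembers how many dashes have already been emitted, so a locally favored mark simply repeats.

```python
# Invented next-token table: after the token "pause", a dash is always
# the single most likely continuation in this toy distribution.
NEXT = {
    "pause": {"—": 0.5, ",": 0.3, ".": 0.2},
    "—": {"pause": 1.0},
    ",": {"pause": 1.0},
    ".": {"pause": 1.0},
}

def generate_greedy(start, steps):
    """Greedy decoding: each step consults only the distribution for the
    previous token. Nothing here counts how many dashes were already
    emitted, so the locally favored mark recurs without limit."""
    out = [start]
    for _ in range(steps):
        dist = NEXT[out[-1]]
        out.append(max(dist, key=dist.get))
    return out

tokens = generate_greedy("pause", 6)
dash_count = tokens.count("—")  # 3 dashes in 6 generated tokens
```

A human editor operates on the whole page; this loop operates on one step at a time, which is exactly the difference the researchers describe.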

Part Four

The Versatility Trap: Why the Em Dash Is Uniquely Vulnerable to Overuse

I have thought about why specifically the em dash became the dominant tell of AI writing rather than some other punctuation mark, and I believe the answer lies in a particular quality of the character itself: it is the most grammatically promiscuous punctuation mark in the English language. An em dash can substitute for a comma, a colon, a semicolon, parentheses, or even a period in the right context. It can introduce a clause, interrupt one, set off an appositive, or signal a dramatic pause. Because it is technically valid in so many syntactic positions, a model that has learned to associate it with quality writing can deploy it almost anywhere without generating a grammatical error.

Compare this to a semicolon, which has relatively strict rules about connecting independent clauses of equal weight, or to parentheses, which impose structural requirements on the sentence around them. The em dash has no such constraints. It is the grammatical equivalent of a master key: it opens almost any door. For a model doing local token prediction without global style awareness, a character that is both high-prestige and low-risk is an ideal default. The em dash is never technically wrong, it signals quality to the evaluators who trained the model, and it requires no structural analysis of the surrounding sentence to deploy safely. Of course the model reaches for it constantly.

The Feedback Loop Nobody Planned

When AI Output Becomes Training Data

I want to raise a concern that I do not see discussed often enough, and it is about the future trajectory of this pattern rather than its present state. As AI writing tools become embedded in everyday writing workflows, a significant and growing proportion of the text published online is being produced with AI assistance. That text is absorbed into the training corpora of future models. Research tracking scientific abstracts found that em dash usage more than doubled between 2021 and 2025, precisely the period when AI writing tools became mainstream. Writers who use AI assistance absorb its dash-heavy style, then produce writing that becomes training data for future models, potentially amplifying the tendency further. The pattern that began as a statistical artifact of curated historical training data is now entering a feedback loop with human writing behavior itself. The em dash problem is not a fixed phenomenon. Unless AI labs actively intervene to suppress it, the pattern may entrench itself across successive model generations in ways that become progressively harder to trace back to an origin.

Part Five

The Suppression Problem: Why You Cannot Simply Tell a Model to Stop

I have experimented with this myself, and the research confirms what I found: instructing a language model to avoid em dashes is surprisingly ineffective, particularly in the models most prone to the behavior. Research published in early 2026 found that GPT-4.1 produced em dashes at a rate of approximately 6.97 per 1,000 words in longform output even under explicit instruction to suppress them. The em dash frequency varies within the human range for some models, but for others, the behavior resists override not because the model is ignoring the instruction, but because the preference is baked into the neural weights themselves at a level below explicit instruction following.
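The per-1,000-words unit these studies report is straightforward to compute. Here is a minimal sketch, with an invented sample and a deliberately naive whitespace word count that a real study would refine:

```python
def em_dash_rate(text):
    """Em dashes per 1,000 words, the unit the cited studies report.
    Naive whitespace word count; dash tokens are counted as words too,
    which a real measurement would handle more carefully."""
    words = len(text.split())
    return 0.0 if words == 0 else text.count("—") / words * 1000.0

# Invented sample: 2 dashes among 40 whitespace tokens.
sample = " ".join(["word"] * 38 + ["—", "—"])
rate = em_dash_rate(sample)  # 50.0 per 1,000 words
```

Against a human baseline mean around 3.23 per 1,000 words, rates like GPT-4.1's 6.97 are easy to detect with exactly this kind of counting.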

This is a technically important point. RLHF training adjusts the weights of a neural network through thousands of iterative updates. By the time a model is deployed, the em dash preference is not a rule stored somewhere that can be toggled off. It is distributed across billions of parameters as a diffuse statistical tendency. Explicit prompting operates at the level of attention and token probability, and it can partially suppress a behavior, but suppressing a deeply trained stylistic preference requires either retraining the model with different reward signals or applying inference-time constraints that monitor output as it is generated and intervene when the pattern recurs. Sam Altman publicly acknowledged that em dash frequency in ChatGPT output had been adjusted in response to user preference, confirming that fine-tuning procedures can target specific punctuation-level features, but also that this requires active intervention from the lab rather than a simple user instruction.
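One plausible shape for the inference-time constraints described above is a decoding-time penalty: subtract an amount from the dash's score that grows with the running dash count, then pick as usual. The function name, logits, and penalty value below are illustrative assumptions, not any lab's actual implementation; the point is that the weights stay untouched and only the decoding step changes.

```python
def penalized_pick(logits, dash_count, penalty=1.5, dash="—"):
    """Hypothetical decoding-time intervention: subtract a penalty from
    the dash's score that grows with each dash already emitted, then
    take the highest-scoring token. The model's weights are untouched."""
    adjusted = dict(logits)
    if dash in adjusted:
        adjusted[dash] -= penalty * dash_count
    return max(adjusted, key=adjusted.get)

# Invented logits in which the dash starts out as the top choice.
logits = {"—": 2.0, ",": 1.2, ";": 0.8}
first = penalized_pick(logits, dash_count=0)  # "—" still wins on raw score
later = penalized_pick(logits, dash_count=2)  # ",": 2.0 - 3.0 falls below 1.2
```

This is the structural difference between prompting and intervention: a prompt nudges token probabilities, while a mechanism like this enforces a global budget the model itself never tracks.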

"The em dash is never an error to be avoided from the model's perspective. It is often the most natural choice given its training distribution."

Nick Potkalitsky, Why AI Cannot Stop Using Em Dashes, 2025
Part Six

The Backlash, the Backlash to the Backlash, and What Both Get Wrong

I want to spend a moment on the cultural fallout from all of this, because it has produced some genuinely strange outcomes. When the pattern first attracted widespread attention, the advice that spread across social media was blunt: remove all em dashes from your writing before submitting it anywhere, because readers will assume you used AI. Reddit users asked, in earnest, whether they should stop using em dashes to avoid being mistaken for a chatbot. A fashion and lifestyle podcast for Gen Z brought the phrase "the ChatGPT hyphen" into mainstream conversation, and a wave of think pieces followed.

The backlash to that backlash came from writers, grammar professionals, and linguists who were rightfully indignant. The em dash is not AI's punctuation. It is one of the most expressive marks in written English, and writers from Dickinson to Joan Didion have made it central to their voice. Writing instructor Susan Lovett, interviewed by NPR, put it plainly: the solution is not to avoid the em dash, but to reclaim its correct use. Using it sparingly, for genuine rhetorical effect, at moments where no other mark would serve as well, is still excellent writing. Stripping it from human prose out of anxiety about AI detection is a capitulation to a misunderstanding.

But I think both sides of that debate miss the most interesting point. The em dash did not become a tell of AI writing because AI chose it as a signature. It became a tell because a specific chain of events, curated training data heavy with Victorian prose, human feedback from evaluators who associate formal punctuation with quality, local token optimization without global style awareness, made it statistically inevitable. The em dash is an artifact of a training process, not a design decision. And that means the real question it raises is not about punctuation at all. It is about what else might be embedded in these models as invisible stylistic cargo, patterns that feel like intelligent writing choices but are actually inherited compulsions from the data that built them.

Conclusion

What a Punctuation Mark Teaches Us About Machine Intelligence

I have been thinking about why the em dash story feels significant beyond its immediate practical implications, and I keep arriving at the same answer. It is a concrete, legible example of something that is otherwise very hard to see: the way AI models carry invisible cargo from the data and feedback processes that shaped them. The em dash is not a bug. Nobody at any AI lab sat down and decided that large language models should overuse this particular character. It emerged from a pipeline, from decisions about training data curation, from the preferences of human annotators, from the structural properties of token-level prediction, from the grammatical versatility of the character itself. No single decision caused it. Every decision contributed to it.

I find this genuinely humbling as someone who thinks about these systems. We spend a great deal of energy discussing what AI models say: whether they are accurate, whether they are biased, whether they produce harmful content. We spend much less energy discussing how they write, and whether the stylistic patterns they have absorbed from their training might themselves carry assumptions and preferences that were never deliberately chosen. The em dash is harmless. But the mechanism that produced it is not exclusive to punctuation. The same pipeline of curated data, human feedback from particular demographics, and local optimization without global review produces tendencies in vocabulary, argumentation structure, rhetorical framing, and conceptual emphasis that are far harder to detect and far more consequential than a misplaced dash.

I have been in enough discussions about AI writing to know that most people, when they think about AI style, think about it as something superficial, something to be polished away by prompting or by human editing. What the em dash story suggests is something different. It suggests that the stylistic character of a language model is not separable from its training history, that what a model sounds like reflects, in specific and traceable ways, what it was built from and what it was rewarded for. The em dash is not a cosmetic artifact. It is a record of how these systems were made, pressed into every sentence they produce, and visible to anyone who knows where to look.