Professor Mustafa Ocal MS '21 Ph.D. '22 researches how to detect AI-generated writing. Photo by Annette Gonzalez.

Why is AI obsessed with the em dash?

June 15, 2026 at 3:17pm

Just a few years ago, the em dash was an ordinary form of punctuation used by writers to emphasize, interrupt and set aside ideas. (It gets its name from typesetting, historically being the exact width of the capital letter “M.”)

Now, AI models like ChatGPT and Claude use it as a common connector, standing in the place of commas and semicolons to string together phrases of all kinds.

Grammatically, it's fine. Aesthetically, it's nice. But audiences are becoming wary of its association with artificial intelligence, so much so that in May, social media users swarmed Nike with accusations of using AI when the company deployed the dash in a post celebrating tennis star Jannik Sinner.

Jannik Sinner can do it all. 6 consecutive titles, a career Golden Masters, and a new record set on home soil. This isn't just history — it's his story in the making. pic.twitter.com/ccPdjJbbjL
— Nike (@Nike) May 17, 2026

With all this hostility, why does AI continue to use the em dash prolifically?

FIU News sat down with AI researcher Mustafa Ocal to better understand why AI loves the em dash in 2026.

Ocal is an assistant teaching professor at the university's Knight Foundation School of Computing and Information Sciences. With a master's degree and Ph.D. in computer science from FIU, he is an expert in detecting language that was generated by AI, with a focus on screening resumes for AI-written text.

Why does AI use so many em dashes?

The core insight is this: AI doesn't think. It predicts.

Let me show you something. Go to iMessage on your phone and type: "Tomorrow I will go to the..." and look at the words suggested below. They are "gym," "store" and "hospital."

I have the same three words when I go through this exercise. A lot of my students do, too, when we do this in class.

This is called auto-prediction. The iMessage system looks at its dataset and asks, "How many sentences have started with 'Tomorrow I will go to the...'?"

Let's say that out of 100 sentences, 50% end with "gym," 30% with "store" and 15% with "hospital." The remaining 5% of sentences end with something else. So, iMessage gave you the three most probable options. This is essentially how AI writes, just on a much greater scale.
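As a rough illustration of that counting idea, here is a minimal Python sketch. It uses the made-up figures from Ocal's example; the `endings` table and the `top_suggestions` helper are hypothetical names chosen for this sketch, not part of any real keyboard or AI system.

```python
from collections import Counter

# Hypothetical tally of how 100 sentences beginning
# "Tomorrow I will go to the..." ended (figures from the example above).
endings = Counter({"gym": 50, "store": 30, "hospital": 15, "something else": 5})


def top_suggestions(counts, k=3):
    """Return the k most frequent endings with their share of all sentences."""
    total = sum(counts.values())
    return [(word, n / total) for word, n in counts.most_common(k)]


suggestions = top_suggestions(endings)
for word, share in suggestions:
    print(f"{word}: {share:.0%}")
```

Real language models estimate these probabilities with a neural network rather than a lookup table, but the output has the same shape: a ranked list of likely next pieces of text.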

So how does that lead to the proliferation of em dashes?

AI models are trying to give you the highest quality responses, and they think phrases with em dashes in them are high quality.

During the training of a model, engineers assign scores to different sources of data. That way, the AI can learn what kind of data it should value most.

For example, a celebrated novelist's work might get graded a 10 out of 10. A college student's blog post could get a 3, while a professor's writing gets an 8.

We see the em dash in AI writing today because the great authors and journalists whose work receives 10 out of 10s use em dashes constantly. It's their style.

That's why actual writers are frustrated right now. They're saying, "I've always used em dashes. AI copied me, not the other way around." And they're right.

But great writers use em dashes sparingly, as a dramatic break or for emphasis. Why does AI go overboard?

Remember, AI doesn't think. It's not writing the way humans do. Instead, it's predicting which nugget of text should go next, one at a time. We call these words and pieces of text "tokens."

As AI writes something token-by-token, it references its training data. Many of the highest-quality tokens from this training data contain the em dash. And so, piece by piece, AI creates a paragraph full of em dashes.

Aren't models trained to respond with some variety though?

Of course. Models follow a process called sampling, where they pick their next words by choosing from a set of likely options. ChatGPT won't give 50 people in the same room the same answer to a question as long as it's not a one-word answer or something.

But sampling doesn't stop AI from using em dashes. As AI responds token-by-token, it chooses randomly from those high-scoring options. Regardless of which path the model takes, it keeps ending up at em dashes because so many of the 10-out-of-10 options have them. There are just too many em dashes in the pool of high-quality tokens for them to be avoided.
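The effect can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the candidate tokens, their probabilities, and the `sample_token` helper are invented, and the sketch uses probability-weighted sampling, which is one common way of choosing among likely options rather than a confirmed description of any particular model.

```python
import random

EM_DASH = "\u2014"  # the em dash character

# Hypothetical probabilities for the token that follows a clause.
# The numbers are invented; they only show a pool where the em dash is common.
next_token_probs = {EM_DASH: 0.4, ",": 0.3, ".": 0.2, ";": 0.1}


def sample_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_token(next_token_probs, rng) for _ in range(1000)]
dash_share = draws.count(EM_DASH) / len(draws)
print(f"em dash chosen in {dash_share:.0%} of {len(draws)} draws")
```

Individual draws differ from one another, which is the variety sampling provides, yet the em dash still turns up in a large share of them because it holds so much of the probability.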

Have the companies tried to tone it down?

They've started to. OpenAI recently made its models better at obeying 'no em dash' instructions. But the default behavior hasn't changed much because the underlying training data hasn't.

When you forbid the em dash, you're pushing the model away from the phrasings it considers most probable to go next, the most polished ones. You don't lock the model out of all its best material, but the writing can come out a bit flatter.

Consider this, too. This is just my theory: In today's age, might the em dash be a kind of watermark for the AI companies? When you and I see it, we wonder if what we're reading was written by AI. What I wonder is: Why wouldn't they want it known that their products are popular?

If someone wants to use AI for writing, but doesn't want the "AI-isms," what's the simplest fix?

You could tell it to write like a human. I see a lot of people doing that.

What else screams "AI writing" to you?

When someone writes "delve into." I have never heard a person say that they were going to "delve into something" in normal conversation. But it's in the high-quality source material, so AI uses it constantly.

The models lean on a whole family of sophisticated-sounding words and openers: "in the realm of," "a myriad of," "nuanced," "intricate," "robust." These appear constantly in the high-scoring source material, so the models reproduce them at rates that almost no human writer would.

Final take on the em dash: Do you like it or not?

I don't use it. But I have a colleague who does. He tells me, "I'm not going to stop my writing style just because AI exists."

What I hate is that there's now a very thin line between a good writer and the AI. And this em dash thing is one of the things ruining that line.