Inclusive job ads: Why LLMs fall short (and what really works)

If you’re in TA or DEI, you’ve probably been in this conversation: “Can’t we just run our job ads through ChatGPT or Copilot to make them inclusive?”

It’s a fair question. LLMs are embedded in so many workflows now, and they’re right there. But here’s what we’ve learned from actually testing this approach – and why it matters more than ever.

What happens when you ask an LLM to “make this inclusive”

We’ve analyzed hundreds of job ads – both original postings and LLM-rewritten versions. The pattern is consistent and concerning.

One representative example: a job ad for a Talent Acquisition Specialist scored 36 on an inclusion analysis (strongly non-inclusive). We asked an LLM to rewrite it to be “inclusive toward everyone and inclusive across demographic groups.”

The result? Score improved to 45. Still strongly non-inclusive.

Across our broader analysis, we saw similar patterns repeated: marginal improvements at best, inconsistent results between runs, and scores that rarely moved into the “inclusive” range even after multiple rewrites. Some ads actually scored worse after LLM intervention – not because the prompt was bad, but because the model introduced new forms of bias while removing others.

And that’s the first problem.

The reliability issue no one talks about

LLMs are non-deterministic by design: run the same prompt twice and you’ll get different outputs. That nine-point improvement in the rewrite above? Could be progress. Could be randomness. There’s no way to know.

For a TA team building scalable processes, this is a dealbreaker. You can’t build quality assurance on “maybe it’ll be better this time.”
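To see where the randomness comes from: at any temperature above zero, a model samples from a probability distribution over next tokens, so identical prompts can diverge. A toy sketch of that mechanic (the vocabulary and weights are invented for illustration, not taken from any real model):

```python
import random

# Toy next-token sampler. With temperature > 0 the same prompt can yield
# different continuations; greedy decoding (temperature = 0) is repeatable.
# The vocabulary and weights below are invented for illustration.
def sample_next_word(weights, temperature, rng):
    if temperature == 0:
        return max(weights, key=weights.get)  # greedy: always the top word
    adjusted = {w: p ** (1 / temperature) for w, p in weights.items()}
    r = rng.random() * sum(adjusted.values())
    cumulative = 0.0
    for word, p in adjusted.items():
        cumulative += p
        if r <= cumulative:
            return word
    return word  # numeric edge case: fall back to the last word

weights = {"driven": 0.5, "passionate": 0.3, "collaborative": 0.2}

greedy = {sample_next_word(weights, 0, random.Random(i)) for i in range(10)}
sampled = {sample_next_word(weights, 1.0, random.Random(i)) for i in range(10)}

print(greedy)   # one word every time
print(sampled)  # multiple different words across runs
```

Production chat interfaces typically run at a nonzero temperature, which is why the same “make this inclusive” prompt can score 45 on one run and differently on the next.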

The bias shuffle: removing one problem, creating another

Here’s where it gets interesting (and frustrating). The rewrites did catch some biased language. But they also introduced new bias – specifically, a flood of communal, emotion-forward language:

  • “We’d love to hear from you”
  • “Feel free to reach out”
  • “We care deeply about…”
  • Multiple instances of “love,” “feel,” “passionate”

If you’ve worked in inclusive language, you know where this goes. Overcorrecting agentic language with communal language doesn’t create inclusion – it just shifts which candidates you’re excluding. Research on gendered language in job ads is clear on this: balance matters, but so do context and role requirements.

The practical risk? If you’re only reviewing the output at surface level, you might miss that you’ve traded one bias pattern for another.
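One way to catch the shuffle is to count both tone categories, not just the one you prompted away. A minimal sketch, using a tiny invented word list (real tools map far larger, research-derived lexicons):

```python
import re

# Tiny illustrative lexicons -- real tools map hundreds of terms from
# published research; these few words are placeholders, not a standard.
AGENTIC = {"driven", "competitive", "dominant", "ambitious", "assertive"}
COMMUNAL = {"love", "feel", "passionate", "care", "support", "together"}

def tone_counts(text):
    """Count agentic vs. communal terms to spot overcorrection."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "agentic": sum(w in AGENTIC for w in words),
        "communal": sum(w in COMMUNAL for w in words),
    }

before = "We want a driven, competitive, ambitious self-starter."
after = "We'd love to hear from you! We care deeply and feel passionate."

print(tone_counts(before))  # skewed agentic
print(tone_counts(after))   # skewed communal -- the bias shuffled, not fixed
```

A rewrite that swings the counts from one column to the other hasn’t been fixed; it’s been traded.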

The stuff you didn’t ask for

LLMs also took creative liberties:

  • Restructured sections (e.g., renamed the “Requirements” heading without being asked)
  • Added emojis throughout (🚀✨👥)
  • Inserted entire new paragraphs about company culture
  • Changed tone and formality level

Some of this might align with your employer brand. Most of it probably doesn’t. And none of it was requested.

On emojis specifically: they’re a brand choice, not an inclusion strategy. We haven’t seen research suggesting emojis make job ads more inclusive – but we have seen them create tone mismatches that undermine professionalism signals for certain roles.

Why LLMs structurally can’t solve this

This isn’t about the technology being “bad.” It’s about what it’s designed to do:

  1. LLMs predict probability, not quality
    They generate what’s statistically most likely based on training data. The “most likely” job ad language is… often biased. Because that’s what exists in volume online.
  2. No framework for language psychology
    LLMs don’t know the research on how specific word choices affect perception across demographic groups. They can sound confident while being completely inconsistent about inclusion principles.
  3. Inclusion as performance, not practice
    Watch for this: LLMs love adding generic DEI statements. “We’re committed to diversity and inclusion.” “All backgrounds welcome.”

But research shows these can actually backfire when they’re vague: an empty, generic statement can make the job ad appear less committed to inclusion than an ad with no statement at all (Heath, Carlsson & Ägerström, 2023, “What adds to job ads? The impact of equality and diversity information on organizational attraction in minority and majority ethnic groups”).

Inclusive language isn’t about declaring inclusion in a closing paragraph. It’s demonstrated through every expectation, requirement, and signal in the ad itself.

What a research-based approach does differently

A dedicated inclusion tool built on language psychology research operates on different principles:

Consistency: Same input → same output. No randomness, no crossed fingers.

Transparency and control: You see what needs to change and why – with research backing. Not just a rewritten version with invisible edits.

Additive learning: Explanations help your team improve over time. You’re not just fixing one ad; you’re building capability.

Quality as responsibility: Detection of exclusive language isn’t treated as a probabilistic guess. It’s grounded in mapped, structured research on how language affects hiring outcomes.
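To make the contrast with probabilistic rewriting concrete, here’s a sketch of what a deterministic, suggestion-based check looks like. The terms and rationales below are invented examples for illustration, not any actual tool’s ruleset:

```python
# Sketch of a deterministic, suggestion-based check: every flag carries a
# rationale, and the same ad always produces the same flags. The rules
# below are invented examples, not a published ruleset.
RULES = [
    ("rockstar", "Superlative job titles skew masculine-coded; "
                 "prefer the actual role name."),
    ("ninja", "Informal warrior metaphors narrow the applicant pool; "
              "name the skill instead."),
    ("aggressive", "Agentic intensifier; describe the measurable "
                   "expectation instead."),
]

def suggest(ad_text):
    """Return (term, rationale) pairs for every rule the ad triggers."""
    text = ad_text.lower()
    return [(term, why) for term, why in RULES if term in text]

ad = "We need an aggressive sales rockstar."
first = suggest(ad)
second = suggest(ad)
assert first == second  # same input, same output -- every time

for term, why in first:
    print(f"- '{term}': {why}")
```

The point isn’t the ten lines of Python; it’s the properties. Each flag is explainable, the output is auditable, and a second run never contradicts the first.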

What this means for your workflow

Some practical implications:

  • For TA teams: If you’re using LLMs in your job ad workflow, plan for human review specifically on inclusion. Don’t assume “inclusive prompt” = inclusive output.
  • For DEI practitioners: When evaluating tools, ask about the underlying model. Is it research-based or probability-based? Can it explain its suggestions? Is output consistent?
  • For both: Be especially careful with “rewrite” functions. They obscure what changed and why. Suggestion-based tools give you more control and learning opportunity.

To sum up

LLMs are powerful tools. They’re useful for many writing tasks. But inclusive job ads require something they’re not built for: consistent application of research-based principles about language psychology across demographic groups.

The most probable language is often biased language.
Inclusive language is intentional language.

That difference matters – especially as transparency requirements increase and candidates pay closer attention to the signals we send before they ever hit “apply.”
