chore: add no-emdash/endash rule to agent instructions and CI lint by mafredri · Pull Request #24375 · coder/coder · GitHub
chore: add no-emdash/endash rule to agent instructions and CI lint #24375

Merged
mafredri merged 5 commits into main from mafredri/no-emdash-rule on Apr 21, 2026
@mafredri (Member) commented Apr 15, 2026

Add a lint check that prevents introduction of Unicode emdash (U+2014) and endash (U+2013) characters. These are almost exclusively introduced by AI agents and conflict with the project writing style.

What changed

  • scripts/check_emdash.sh: New lint script using grep -E with bash byte escapes to detect emdash/endash. Checks only added lines in the diff by default (existing violations are not flagged). Pass --all to scan the entire repo.
  • scripts/check_emdash_test.sh: Smoke test for the diff-parsing mode. Covers detection in added lines, clean-line exclusion, and clean-diff pass-through.
  • Makefile: Wired as lint/emdash in both lint: and lint-light: targets. Runs in CI and pre-commit automatically.
  • AGENTS.md: Added "No Emdash or Endash" rule in Code Style section with fix guidance (use commas, semicolons, periods, or restructure).
  • site/AGENTS.md: Same rule added to frontend guidelines. Fixed one existing emdash.
  • .claude/docs/DOCS_STYLE_GUIDE.md: Added Punctuation subsection with the same rule for docs authors.

Existing violations (~1170 lines across the repo) are tracked for cleanup in #24526.

Implementation plan and decision log

Decisions

  1. Scope: prevent new only. Fixing the existing ~1170 lines is tracked in #24526 ("chore: clean up existing emdash/endash violations (~1170 occurrences)"). (Human decision)
  2. Ban both emdash and endash. Humans rarely type endash; it is an AI tell regardless of typographic correctness. (Human ratified)
  3. Ban applies to all code files. Source, comments, string literals, docs. (Agent-recommended)
  4. Lint message teaches the fix. This prevents swapping to --, which cannot be reliably detected. (Human decision)
  5. ASCII -- covered by instructions only, not linted. Too many legitimate uses (SQL, shell, CLI). (Human ratified)
  6. Lint checks added lines only. Avoids forcing cleanup of unrelated pre-existing violations in touched files. Enables immediate CI enforcement. (Agent-recommended, human ratified)

Design notes

  • Uses grep -E with bash $'\xE2\x80\x94' byte escapes to build the pattern at runtime. No PCRE dependency; works on both GNU and BSD grep.
  • Diff parsing extracts only + lines from git diff -U0 for precise line-level checking.
  • Fetches the base ref in shallow CI clones (fetch-depth: 1) before diffing.
  • CLAUDE.md, .cursorrules, and site/CLAUDE.md are symlinks and update automatically.
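
The first two design notes can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/check_emdash.sh: the names `emdash`, `endash`, `dash_pattern`, and `added_lines_with_dash` are hypothetical, and the filter assumes input in the shape produced by `git diff -U0`.

```shell
#!/usr/bin/env bash
# Sketch only: variable and function names are hypothetical, not taken from
# the real script.

# Build the UTF-8 byte sequences at runtime so this source stays pure ASCII.
emdash=$'\xE2\x80\x94' # U+2014
endash=$'\xE2\x80\x93' # U+2013
dash_pattern="${emdash}|${endash}"

# Read a unified diff on stdin and print the added lines that contain either
# dash. "+++ b/<path>" lines are file headers, not content, so they are dropped.
# The trailing "|| true" keeps a clean diff from producing a failing status.
added_lines_with_dash() {
  grep -E '^\+' | grep -v -E '^\+\+\+ b/' | grep -E -- "$dash_pattern" || true
}
```

A caller could pipe `git diff -U0 <base>` into `added_lines_with_dash` and fail the lint when the captured output is non-empty. Removed lines are ignored, which is what lets pre-existing violations pass.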

🤖 This PR was created with the help of Coder Agents, and will be reviewed by a human. 🏂🏻

Add a lint check that prevents introduction of Unicode emdash (U+2014)
and endash (U+2013) characters. These are almost exclusively introduced
by AI agents and conflict with the project writing style.

The lint script (scripts/check_emdash.sh) checks only added lines in
the current diff by default, so existing violations do not block CI.
Pass --all to scan the entire repo for auditing.

Agent instructions in AGENTS.md, site/AGENTS.md, and the docs style
guide now explicitly ban emdash, endash, and " -- " as punctuation,
with guidance to use commas, semicolons, or periods instead.
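
For illustration, a full scan in the spirit of `--all` might look like the sketch below. It is an assumption-laden example rather than the real script: `scan_files` is a hypothetical helper, and feeding it from `git ls-files -z` is a guess at how the file list could be produced.

```shell
#!/usr/bin/env bash
# Sketch only: scan_files is a hypothetical helper, not the real script.

emdash=$'\xE2\x80\x94' # U+2014
endash=$'\xE2\x80\x93' # U+2013
dash_pattern="${emdash}|${endash}"

# Read NUL-delimited file paths on stdin and print "path:line:content" for
# every line that contains an emdash or endash.
#   -I skips binary files, -H always prints the file name, -n adds line numbers.
# xargs exits non-zero when any grep batch finds no match, so the exit status
# is discarded here and callers decide based on the captured output instead.
scan_files() {
  xargs -0 grep -I -H -n -E -- "$dash_pattern" 2>/dev/null || true
}

# Example wiring (assumes a git checkout; git ls-files is an assumption):
#   violations=$(git ls-files -z | scan_files)
#   if [ -n "$violations" ]; then
#     printf '%s\n' "$violations"
#     exit 1
#   fi
```

Deciding on captured output rather than the pipeline's exit status sidesteps the case where one xargs batch matches and another does not.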
mafredri

This comment was marked as resolved.

…lint

Address review findings:

- F1: Fetch base ref when not available in shallow clones (CI
  uses fetch-depth: 1). Fail explicitly instead of falling back to
  full scan.
- F2: Capture grep output instead of checking exit code to avoid
  xargs batching issues. Add -I to skip binary files.
- F3: Add grep -P capability check at script start.
- F4: Anchor hunk header regex to avoid capturing line numbers from
  trailing function context.
- F5: Tighten +++ filter to match "b/" prefix specifically.
- F6: Unify rule wording across all four locations (AGENTS.md,
  site/AGENTS.md, DOCS_STYLE_GUIDE.md, lint error message).
- F9: Remove restating comments, rename loop variable to diff_line.
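
As a rough illustration of F4 and F5, a diff parser with an anchored hunk-header regex and a `+++ b/` file filter could look like this. It is a sketch under assumptions: `report_added_dashes`, `hunk_re`, and `file_re` are hypothetical names, and the input is assumed to come from `git diff -U0` (no context lines).

```shell
#!/usr/bin/env bash
# Sketch only: function and variable names are hypothetical.

emdash=$'\xE2\x80\x94' # U+2014
endash=$'\xE2\x80\x93' # U+2013

# Anchored at the leading "@@" and ending at the closing " @@", so digits in
# the trailing function context are never captured as line numbers.
hunk_re='^@@ -[0-9]+(,[0-9]+)? \+([0-9]+)(,[0-9]+)? @@'
# Only "+++ b/<path>" counts as a file header.
file_re='^\+\+\+ b/(.*)$'

# Read "git diff -U0" output on stdin and print "file:line: content" for each
# added line that contains an emdash or endash.
report_added_dashes() {
  local diff_line file="" line_no=0
  while IFS= read -r diff_line; do
    if [[ $diff_line =~ $file_re ]]; then
      file=${BASH_REMATCH[1]}
    elif [[ $diff_line =~ $hunk_re ]]; then
      # Group 2 is the starting line number in the new file.
      line_no=${BASH_REMATCH[2]}
    elif [[ $diff_line == +* ]]; then
      if [[ $diff_line == *"$emdash"* || $diff_line == *"$endash"* ]]; then
        printf '%s:%s: %s\n' "$file" "$line_no" "${diff_line:1}"
      fi
      # Only added lines advance the new-file line counter under -U0.
      line_no=$((line_no + 1))
    fi
  done
}
```

One known limitation of this sketch: an added line whose own content begins with `++ b/` would be mistaken for a file header.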
…lint

Drop grep -P (PCRE) dependency. CI runners lack PCRE support.
Use bash $'\xE2\x80\x94' byte escapes to build the pattern
at runtime, then match with grep -E. This avoids both the PCRE
dependency and having literal emdash/endash in the script source
(which would trigger the check when the script itself is in the diff).
mafredri

This comment was marked as resolved.

mafredri

This comment was marked as resolved.

…lint

Add smoke test for the diff-parsing mode in check_emdash.sh.
Covers: emdash/endash detection in added lines, clean-line
exclusion, clean-diff pass-through. Addresses AMREM-10.

@johnstcn (Member) left a comment

I'm fine with this if the presence of these characters caused you emotional pain.

Comment thread site/AGENTS.md
Member

Suggested change
- message strings. String matching is brittle; messages change, get
+ message strings. String matching is brittle: messages change, get

Comment thread scripts/check_emdash_test.sh Outdated
Member

Do we need a test for this?

Member Author

Nah, I think we can remove it. My review bots demanded it but meh. 😄

@mafredri mafredri requested a review from david-fraley April 20, 2026 11:15
@mafredri mafredri marked this pull request as ready for review April 20, 2026 11:37
…lint

Remove check_emdash_test.sh. The diff parser is temporary
infrastructure (removed after #24526 cleanup) and does not
warrant a dedicated test file.
@mafredri mafredri merged commit 623e72d into main Apr 21, 2026
29 of 30 checks passed
@mafredri mafredri deleted the mafredri/no-emdash-rule branch April 21, 2026 10:55
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 21, 2026