chore: add no-emdash/endash rule to agent instructions and CI lint #24375
Merged
Add a lint check that prevents introduction of Unicode emdash (U+2014) and endash (U+2013) characters. These are almost exclusively introduced by AI agents and conflict with the project writing style. The lint script (scripts/check_emdash.sh) checks only added lines in the current diff by default, so existing violations do not block CI. Pass --all to scan the entire repo for auditing. Agent instructions in AGENTS.md, site/AGENTS.md, and the docs style guide now explicitly ban emdash, endash, and " -- " as punctuation, with guidance to use commas, semicolons, or periods instead.
…lint Address review findings:
- F1: Fetch base ref when not available in shallow clones (CI uses fetch-depth: 1). Fail explicitly instead of falling back to full scan.
- F2: Capture grep output instead of checking exit code to avoid xargs batching issues. Add -I to skip binary files.
- F3: Add grep -P capability check at script start.
- F4: Anchor hunk header regex to avoid capturing line numbers from trailing function context.
- F5: Tighten +++ filter to match "b/" prefix specifically.
- F6: Unify rule wording across all four locations (AGENTS.md, site/AGENTS.md, DOCS_STYLE_GUIDE.md, lint error message).
- F9: Remove restating comments, rename loop variable to diff_line.
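As a rough illustration of F2, here is a minimal sketch of a full scan that decides pass/fail from captured grep output rather than from the exit code. With `xargs`, grep can run in several batches, and a batch with no matches makes `xargs` exit non-zero even when everything is fine, so the exit code alone is ambiguous. The helper names `scan_files` and `check_tree` are hypothetical, and listing files with `find` is an assumption made to keep the example self-contained; the real script's file listing is not shown here.

```shell
#!/usr/bin/env bash
# Pattern built from UTF-8 byte escapes (U+2014 and U+2013).
pattern=$'\xE2\x80\x94|\xE2\x80\x93'

# scan_files (hypothetical): reads NUL-separated paths on stdin and prints
# file:line:text for every matching line.
scan_files() {
  # -I skips binary files, -n adds line numbers, -H always prints the file
  # name, even when a batch holds a single file. The exit status of xargs
  # is deliberately ignored; only the output is trusted.
  xargs -0 grep -InHE "$pattern" || true
}

# check_tree (hypothetical): fails if any file under the directory matches.
check_tree() {
  local dir=$1 matches
  matches=$(find "$dir" -type f -print0 | scan_files)
  if [[ -n $matches ]]; then
    printf '%s\n' "$matches"
    return 1
  fi
  return 0
}
```

Calling `check_tree .` would print any offending lines and return a non-zero status only when at least one match was actually found.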
…lint Drop grep -P (PCRE) dependency. CI runners lack PCRE support. Use bash $'\xE2\x80\x94' byte escapes to build the pattern at runtime, then match with grep -E. This avoids both the PCRE dependency and having literal emdash/endash in the script source (which would trigger the check when the script itself is in the diff).
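The byte-escape approach described in this commit might look roughly like the sketch below. The variable names and the `has_dash` helper are hypothetical; only the use of bash `$'\xE2\x80\x94'` escapes with `grep -E` comes from the commit message.

```shell
#!/usr/bin/env bash
# Build the UTF-8 byte sequences at runtime so that this file never
# contains a literal emdash or endash.
emdash=$'\xE2\x80\x94'   # U+2014
endash=$'\xE2\x80\x93'   # U+2013
pattern="${emdash}|${endash}"

# has_dash (hypothetical): succeeds if stdin contains either character.
has_dash() {
  grep -qE "$pattern"
}
```

Because the pattern is a plain alternation of two byte sequences, it needs nothing beyond extended regular expressions, which is why PCRE (`grep -P`) is not required.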
…lint Add smoke test for the diff-parsing mode in check_emdash.sh. Covers: emdash/endash detection in added lines, clean-line exclusion, clean-diff pass-through. Addresses AMREM-10.
johnstcn (Member) approved these changes on Apr 20, 2026 and left a comment:
I'm fine with this if the presence of these characters caused you emotional pain.
Suggested change:
- message strings. String matching is brittle; messages change, get
+ message strings. String matching is brittle: messages change, get
Author (Member) replied:
Nah, I think we can remove it. My review bots demanded it but meh. 😄
…lint Remove check_emdash_test.sh. The diff parser is temporary infrastructure (removed after #24526 cleanup) and does not warrant a dedicated test file.

Add a lint check that prevents introduction of Unicode emdash (U+2014) and endash (U+2013) characters. These are almost exclusively introduced by AI agents and conflict with the project writing style.
What changed
- `scripts/check_emdash.sh`: New lint script using `grep -E` with bash byte escapes to detect emdash/endash. Checks only added lines in the diff by default (existing violations are not flagged). Pass `--all` to scan the entire repo.
- `scripts/check_emdash_test.sh`: Smoke test for the diff-parsing mode. Covers detection in added lines, clean-line exclusion, and clean-diff pass-through.
- `Makefile`: Wired as `lint/emdash` in both `lint:` and `lint-light:` targets. Runs in CI and pre-commit automatically.
- `AGENTS.md`: Added "No Emdash or Endash" rule in Code Style section with fix guidance (use commas, semicolons, periods, or restructure).
- `site/AGENTS.md`: Same rule added to frontend guidelines. Fixed one existing emdash.
- `.claude/docs/DOCS_STYLE_GUIDE.md`: Added Punctuation subsection with the same rule for docs authors.

Existing violations (~1170 lines across the repo) tracked for cleanup in #24526.
Implementation plan and decision log
Decisions
- `--` which cannot be reliably detected. (Human decision)
- `--` covered by instructions only, not linted. Too many legitimate uses (SQL, shell, CLI). (Human ratified)

Design notes
- `grep -E` with bash `$'\xE2\x80\x94'` byte escapes to build the pattern at runtime. No PCRE dependency; works on both GNU and BSD grep.
- `+` lines from `git diff -U0` for precise line-level checking.
- `CLAUDE.md`, `.cursorrules`, and `site/CLAUDE.md` are symlinks and update automatically.
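To make the diff-based design concrete, here is a minimal sketch of checking only the `+` lines of `git diff -U0` output. It is not the actual `scripts/check_emdash.sh`: the function name `check_added_lines` and the output format are assumptions. The anchored hunk-header regex and the `+++ b/` prefix match follow the review fixes described in the commits.

```shell
#!/usr/bin/env bash
pattern=$'\xE2\x80\x94|\xE2\x80\x93'

# check_added_lines (hypothetical): reads `git diff -U0` output on stdin and
# prints file:line:text for each added line that contains a dash.
check_added_lines() {
  local diff_line file="" line_no=0 status=0
  # Anchored at the start so digits in the trailing function context after
  # the second "@@" are never captured as a line number.
  local hunk_re='^@@ -[0-9]+(,[0-9]+)? \+([0-9]+)(,[0-9]+)? @@'
  while IFS= read -r diff_line; do
    if [[ $diff_line == 'diff --git '* ]]; then
      file=""
    elif [[ $diff_line == '+++ b/'* ]]; then
      file=${diff_line#'+++ b/'}
    elif [[ $diff_line =~ $hunk_re ]]; then
      line_no=${BASH_REMATCH[2]}
    elif [[ $diff_line == '+'* && -n $file ]]; then
      # With -U0 there are no context lines, so every "+" line inside a
      # hunk advances the new-file line counter by exactly one.
      if grep -qE "$pattern" <<<"${diff_line:1}"; then
        printf '%s:%d:%s\n' "$file" "$line_no" "${diff_line:1}"
        status=1
      fi
      line_no=$((line_no + 1))
    fi
  done
  return "$status"
}
```

One possible invocation, assuming the base ref is available locally, is `git diff -U0 origin/main...HEAD | check_added_lines`; the choice of `origin/main` is illustrative only. Removed lines are ignored, which is what lets existing violations pass.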