Do not allow for a word to start or end with punctuation symbols by yarikoptic · Pull Request #3588 · codespell-project/codespell · GitHub

Do not allow for a word to start or end with punctuation symbols #3588

Open
yarikoptic wants to merge 3 commits into codespell-project:main from yarikoptic:bf-wordsplit

Conversation

@yarikoptic yarikoptic commented Nov 22, 2024

Contributor

The use case that inspired me to look into this

And then I found the issue about this

Although maybe I am missing the use cases/problems that @DimitriPapadopoulos and @mdeweerd discussed back then.

Edits:

  • I had to partially go back and change things so that there are two alternative word captures, one of them for quoted words.
  • Allow only trailing, but not leading, quotes when a word was not in quotes to start with.
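The two edits above can be illustrated with a minimal sketch. The pattern below is a hypothetical one written for this illustration, not the regex from this PR, and the names `WORD` and `split_words` are assumptions. It has one alternative for a word wrapped in matching single quotes and one for a bare word that may keep a trailing apostrophe but never a leading one.

```python
import re

# Hypothetical pattern for illustration only; NOT the regex used in this PR.
# Alternative 1 ("quoted"): a word wrapped in matching single quotes;
#   only the part between the quotes is captured.
# Alternative 2 ("bare"): a word that may contain inner apostrophes and may
#   end with one trailing apostrophe, but never starts with one.
WORD = re.compile(
    r"(?<![A-Za-z'])'(?P<quoted>[A-Za-z]+(?:'[A-Za-z]+)*)'(?![A-Za-z'])"
    r"|(?P<bare>[A-Za-z]+(?:'[A-Za-z]+)*'?)"
)


def split_words(text):
    """Return the words found in text, with surrounding quotes dropped."""
    return [m.group("quoted") or m.group("bare") for m in WORD.finditer(text)]


print(split_words("stay 'were' here"))
print(split_words("var = stay as you were'"))
```

With this sketch a fully quoted `'were'` yields `were`, while an unquoted `were'` keeps its trailing apostrophe, so it could still be matched against a dictionary entry such as `were'->we're`.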

After I pushed, I realized that there is a use case we do not cover: the ``LaTeX'' way of quoting. So we were missing those, and we keep missing them. Do you think I should add a regex for them too?
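One possible way to handle the LaTeX quoting mentioned above is to blank out the delimiters before splitting into words. This is only a sketch of that idea, not something this PR implements; the helper name `blank_latex_quotes` is hypothetical.

```python
import re

# Hypothetical helper, not part of codespell: replace the LaTeX quote
# delimiters (two backticks ... two apostrophes) with spaces so that the
# quoted text is split like any other text. Using two spaces for each
# two-character delimiter keeps the line length, and therefore any
# column offsets, unchanged.
LATEX_QUOTED = re.compile(r"``(.*?)''")


def blank_latex_quotes(line):
    return LATEX_QUOTED.sub(lambda m: "  " + m.group(1) + "  ", line)


sample = "the ``LaTeX'' way to quote"
print(blank_latex_quotes(sample).split())
```

The non-greedy `.*?` stops at the first pair of apostrophes, so an inner apostrophe as in ``` ``don't'' ``` stays part of the word.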

Some gory details on me discovering were' and other "typos" in the dictionaries

ok -- tests fail due to the typo:

codespell_lib/data/dictionary_code.txt:were'->we're

and apparently it is not the only one like that:

codespell_lib/data/dictionary_code.txt:were'->we're
codespell_lib/data/dictionary.txt:aircrafts'->aircraft's
codespell_lib/data/dictionary.txt:arent'->aren't
codespell_lib/data/dictionary.txt:cant'->can't
codespell_lib/data/dictionary.txt:cnat'->can't
codespell_lib/data/dictionary.txt:couldnt'->couldn't
codespell_lib/data/dictionary.txt:didnt'->didn't
codespell_lib/data/dictionary.txt:doesent'->doesn't
codespell_lib/data/dictionary.txt:doesn'->doesn't
codespell_lib/data/dictionary.txt:doesnt'->doesn't
codespell_lib/data/dictionary.txt:dont'->don't
codespell_lib/data/dictionary.txt:dosent'->doesn't
codespell_lib/data/dictionary.txt:hasnt'->hasn't
codespell_lib/data/dictionary.txt:havent'->haven't
codespell_lib/data/dictionary.txt:isnt'->isn't
codespell_lib/data/dictionary.txt:packges'->packages'
codespell_lib/data/dictionary.txt:shouldnt'->shouldn't
codespell_lib/data/dictionary.txt:thats'->that's
codespell_lib/data/dictionary.txt:wasnt'->wasn't
codespell_lib/data/dictionary.txt:wouldnt'->wouldn't

but for some of those IMHO it makes no sense to list ' if the correction also ends with ', which AFAIK is not a part of the word, i.e. I think the following should simply be removed (replaced with ones without '):

codespell_lib/data/dictionary.txt:gaus'->Gauss'
codespell_lib/data/dictionary.txt:guas'->Gauss'
codespell_lib/data/dictionary.txt:guass'->Gauss'
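The rewrite suggested for these three entries can be sketched as follows, assuming each entry is a single `typo->correction` pair as shown above. The function name is hypothetical and this is not code from the PR.

```python
def strip_shared_trailing_apostrophe(entry):
    """If both the typo and the correction end with an apostrophe,
    drop it from both sides; otherwise return the entry unchanged.

    Assumes a single 'typo->correction' pair per entry.
    """
    typo, sep, correction = entry.partition("->")
    if sep and typo.endswith("'") and correction.endswith("'"):
        return f"{typo[:-1]}->{correction[:-1]}"
    return entry


for entry in ["gaus'->Gauss'", "guas'->Gauss'", "were'->we're"]:
    print(strip_shared_trailing_apostrophe(entry))
```

Under this rule `packges'->packages'` from the longer list would be rewritten as well, while entries such as `were'->we're` are left alone because the correction does not end with an apostrophe.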

First I wondered whether that case is worth fixing: since were is a legit word, the matching ' could simply have been forgotten somewhere long before, e.g. in

var = stay as you were'

which would be a programming language gotcha, not a typo.

FWIW were' was added originally in

In the leftover cases it boils down to

"' is a part of the word, and thus could be present in the typo in an alternative location"

(I would still argue for excluding were').
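The dictionary entries discussed above can be found with a small scan like the one below. It is a sketch that assumes plain `typo->correction` lines as in the listing; the function name is hypothetical and nothing here is taken from the codespell code base.

```python
def entries_with_trailing_apostrophe(lines):
    """Return (typo, correction) pairs whose typo ends with an apostrophe.

    Assumes each line has the form 'typo->correction'; lines without
    '->' are skipped.
    """
    found = []
    for line in lines:
        typo, sep, correction = line.strip().partition("->")
        if sep and typo.endswith("'"):
            found.append((typo, correction))
    return found


sample = ["were'->we're", "abandonned->abandoned", "doesn'->doesn't"]
for typo, correction in entries_with_trailing_apostrophe(sample):
    print(f"{typo}->{correction}")
```

To run it against a real dictionary file, one could pass the open file object as `lines`, since iterating over a file yields its lines.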

@yarikoptic yarikoptic changed the title from "Do not allow for a word to start with punctuation symbols" to "Do not allow for a word to start or end with punctuation symbols" Nov 22, 2024
yarikoptic added a commit to yarikoptic/python-sdk that referenced this pull request Nov 22, 2024
codespell from codespell-project/codespell#3588

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w ./tests/unit/test_schema_invalids.py",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

Lance-Drane pushed a commit to INTERSECT-SDK/python-sdk that referenced this pull request Nov 25, 2024
codespell from codespell-project/codespell#3588

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w ./tests/unit/test_schema_invalids.py",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

Development

Successfully merging this pull request may close these issues.

Detection of string delimiters

2 participants