Update C++ grammar by mikomikotaishi · Pull Request #7352 · github-linguist/linguist · GitHub
Skip to content

Update C++ grammar#7352

Merged
lildude merged 6 commits into
github-linguist:mainfrom
mikomikotaishi:main
Apr 23, 2025
Merged

Update C++ grammar#7352
lildude merged 6 commits into
github-linguist:mainfrom
mikomikotaishi:main

Conversation

@mikomikotaishi

@mikomikotaishi mikomikotaishi commented Apr 21, 2025

Copy link
Copy Markdown
Contributor

This pull request changes the grammar used for C++ to a fork of the same repository with updated keywords, at least until the tree-sitter-cpp grammar can be used.

Description

Checklist:

  • I am adding a new extension to a language.

    • The new extension is used in hundreds of repositories on GitHub.com
    • I have included a real-world usage sample for all extensions added in this PR:
      • Sample source(s):
        • [URL to each sample source, if applicable]
      • Sample license(s):
    • I have included a change to the heuristics to distinguish my language from others using the same extension.
  • I am adding a new language.

    • The extension of the new language is used in hundreds of repositories on GitHub.com.
    • I have included a real-world usage sample for all extensions added in this PR:
      • Sample source(s):
        • [URL to each sample source, if applicable]
      • Sample license(s):
    • I have included a syntax highlighting grammar: [URL to grammar repo]
    • I have added a color
      • Hex value: #RRGGBB
      • Rationale:
    • I have updated the heuristics to distinguish my language from others using the same extension.
  • I am fixing a misclassified language

    • I have included a new sample for the misclassified language:
      • Sample source(s):
        • [URL to each sample source, if applicable]
      • Sample license(s):
    • I have included a change to the heuristics to distinguish my language from others using the same extension.
  • I am changing the source of a syntax highlighting grammar

  • I am updating a grammar submodule

  • I am adding new or changing current functionality

    • I have added or updated the tests for the new or changed functionality.
  • I am changing the color associated with a language

    • I have obtained agreement from the wider language community on this color change.
      • [URL to public discussion]
      • [Optional: URL to official branding guidelines for the language]

@mikomikotaishi mikomikotaishi requested a review from a team as a code owner April 21, 2025 22:19

@lildude lildude left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't accept tree-sitter grammars, only TextMate compatible grammars as stated in the CONTRIBUTING.md file. The tree-sitter grammars you see referenced are pulled in directly by the syntax highlighting engine and are managed independently by the team that manages the engine. We update that README whenever that team adds a new grammar.

I don't know that team's plans but as C++ is very popular, it's likely to be switching to C++ some time.

@mikomikotaishi

Copy link
Copy Markdown
Contributor Author

@lildude

lildude commented Apr 22, 2025

Copy link
Copy Markdown
Member

Why exactly can we not switch to the tree-sitter-cpp grammar for C++ when other languages use tree-sitter?

Because those other languages aren't getting their grammars from Linguist. They're being pulled in directly by the syntax highlighting engine.

That's why we have this line right at the top of the vendor/README.md:

**Note:** grammars marked with 🐌 are not updated when Linguist is so upstream fixes may take longer to appear on GitHub.

Essentially GitHub has two syntax highlighting engines: the old engine that uses TextMate-compatible grammars and the new engine that uses tree-sitter grammars.

Linguist provides the grammars for the former (in the tarball attached to every release). The latter pulls the tree-sitter grammars in directly.

There is a plan to slowly phase out the old TextMate-compatible engine, in favour of the new tree-sitter engine, and at this point the team involved may extend the ability to add tree-sitter grammars via Linguist or some other means, but we're no where near that point right now as it turns out tree-sitter grammars aren't all created equal and some have severe performance issues.

So adding a tree-sitter grammar to Linguist will have no effect and the script/add-grammar script should have reported various problems and prevented the addition.

@mikomikotaishi

mikomikotaishi commented Apr 22, 2025

Copy link
Copy Markdown
Contributor Author

Why exactly can we not switch to the tree-sitter-cpp grammar for C++ when other languages use tree-sitter?

Because those other languages aren't getting their grammars from Linguist. They're being pulled in directly by the syntax highlighting engine.

That's why we have this line right at the top of the vendor/README.md:

**Note:** grammars marked with 🐌 are not updated when Linguist is so upstream fixes may take longer to appear on GitHub.

Essentially GitHub has two syntax highlighting engines: the old engine that uses TextMate-compatible grammars and the new engine that uses tree-sitter grammars.

Linguist provides the grammars for the former (in the tarball attached to every release). The latter pulls the tree-sitter grammars in directly.

There is a plan to slowly phase out the old TextMate-compatible engine, in favour of the new tree-sitter engine, and at this point the team involved may extend the ability to add tree-sitter grammars via Linguist or some other means, but we're no where near that point right now as it turns out tree-sitter grammars aren't all created equal and some have severe performance issues.

So adding a tree-sitter grammar to Linguist will have no effect and the script/add-grammar script should have reported various problems and prevented the addition.

How can we get the tree-sitter-cpp grammar into the GitHub syntax highlighter then like the other languages that use it (i.e. C, C#, etc.)?

@lildude

lildude commented Apr 22, 2025

Copy link
Copy Markdown
Member

How can we get the tree-sitter-cpp grammar into the GitHub syntax highlighter then like the other languages that use it (i.e. C, C#, etc.)?

It's up to the team that maintains the engine. You would need to raise a support ticket with GitHub support to get this onto the team's radar.

@mikomikotaishi mikomikotaishi requested a review from lildude April 22, 2025 15:53
@mikomikotaishi

Copy link
Copy Markdown
Contributor Author

I have gone ahead and went back to using my fork, removing the file with the obscenely long regular expression (which I don't think had any useful purpose).

@mikomikotaishi

Copy link
Copy Markdown
Contributor Author

Hi, may I ask what needs to be changed before the pull request can be merged?

@lildude

lildude commented Apr 22, 2025

Copy link
Copy Markdown
Member

It's stated right at the top of the failure output:

  1) Failure:
TestGrammars#test_readme_file_is_in_sync [test/test_grammars.rb:42]:
Grammar list is out-of-date. Run `script/list-grammars`.

This should have been run when you ran script/add-grammar:

script/list-grammars

The second failure is because the submodules aren't sorted. This too should have been done with the script/add-grammar script:

script/sort-submodules

@mikomikotaishi

Copy link
Copy Markdown
Contributor Author

Thanks, I'll fix these

@mikomikotaishi

Copy link
Copy Markdown
Contributor Author

I have made the necessary changes, can the tests be run now?

@mikomikotaishi

mikomikotaishi commented Apr 23, 2025

Copy link
Copy Markdown
Contributor Author

@lildude lildude left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

Important

The changes in this PR will not appear on GitHub until the next release has been made and deployed. See here for more details.

@lildude lildude added this pull request to the merge queue Apr 23, 2025
Merged via the queue into github-linguist:main with commit 53a3e81 Apr 23, 2025
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Aug 9, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants