Remove some lookaheads by DecimalTurn · Pull Request #7242 · github-linguist/linguist · GitHub
Skip to content

Remove some lookaheads#7242

Merged
lildude merged 8 commits into
github-linguist:mainfrom
DecimalTurn:lookahead
Mar 5, 2025
Merged

Remove some lookaheads#7242
lildude merged 8 commits into
github-linguist:mainfrom
DecimalTurn:lookahead

Conversation

@DecimalTurn

@DecimalTurn DecimalTurn commented Feb 17, 2025

Copy link
Copy Markdown
Contributor

Description

Removing lookaheads in those regexes will help with portability across librairies using Re2-style regexes and should help ensure that our regexes run linearly.

Note that to demonstrate that this PR doesn't break ActionScript's heuristics, #7241
should be merged before changing this PR from a draft to a real PR.

@lildude lildude mentioned this pull request Feb 19, 2025
6 tasks
@DecimalTurn

Copy link
Copy Markdown
Contributor Author

@DecimalTurn DecimalTurn marked this pull request as ready for review February 24, 2025 04:26
@DecimalTurn DecimalTurn requested a review from a team as a code owner February 24, 2025 04:26
@DecimalTurn

Copy link
Copy Markdown
Contributor Author

@Alhadis, I think I'll let you handle the Vim Help File case, I'm not ready to deal with with nested lookaheads at the moment 😅.

@Alhadis

Alhadis commented Feb 24, 2025

Copy link
Copy Markdown
Collaborator

@DecimalTurn The heuristic used by Vim Help Text was lifted from Linguist's Modeline strategy, which presents a much more readable and self-documenting version of the regex:

# NOTE: When changing this regex, be sure to keep the Vim Help heuristic updated too (#5347)
VIM_MODELINE = %r[
(?-m)
# Start of modeline (syntax documented in E520)
(?:
# `vi:`, `vim:` or `Vim:`
(?:^|[ \t]) (?:vi|Vi(?=m))
# Check if specific Vim version(s) are requested (won't work in vi/ex)
(?:
# Versioned modeline. `vim<700:` targets Vim versions older than 7.0
m
[<=>]? # If comparison operator is omitted, *only* this version is targeted
[0-9]+ # Version argument = (MINOR_VERSION_NUMBER * 100) + MINOR_VERSION_NUMBER
|
# Unversioned modeline. `vim:` targets any version of Vim.
m
)?
|
# `ex:`, which requires leading whitespace to avoid matching stuff like "lex:"
[ \t] ex
)
# If the option-list begins with `set ` or `se `, it indicates an alternative
# modeline syntax partly-compatible with older versions of Vi. Here, the colon
# serves as a terminator for an option sequence, delimited by whitespace.
(?=
# So we have to ensure the modeline ends with a colon
: (?=[ \t]* set? [ \t] [^\r\n:]+ :) |
# Otherwise, it isn't valid syntax and should be ignored
: (?![ \t]* set? [ \t])
)
# Possible (unrelated) `option=value` pairs to skip past
(?:
# Option separator, either
(?:
# 1. A colon (possibly surrounded by whitespace)
[ \t]* : [ \t]* # vim: noai : ft=sh:noexpandtab
|
# 2. At least one (horizontal) whitespace character
[ \t] # vim: noai ft=sh noexpandtab
)
# Option's name. All recognised Vim options have an alphanumeric form.
\w*
# Possible value. Not every option takes an argument.
(?:
# Whitespace between name and value is allowed: `vim: ft =sh`
[ \t]*=
# Option's value. Might be blank; `vim: ft= ` means "use no filetype".
(?:
[^\\\s] # Beware of escaped characters: titlestring=\ ft=sh
| # will be read by Vim as { titlestring: " ft=sh" }.
\\.
)*
)?
)*
# The actual filetype declaration
[ \t:] (?:filetype|ft|syntax) [ \t]*=
# Language's name
(\w+)
# Ensure it's followed by a legal separator (including EOL)
(?=$|\s|:)
]x

Any changes to Linguist::Strategy::Modeline::VIM_MODELINE will logically need to be reflected in the heuristic, and I strongly advise against making the regex even less readable by replacing nested lookaheads with… whatever works with re2.

In other words, you'll probably have to make an exception upstream for this particular heuristic. 🤷‍♂️

@DecimalTurn

DecimalTurn commented Feb 24, 2025

Copy link
Copy Markdown
Contributor Author

@lildude lildude added this pull request to the merge queue Mar 5, 2025
Merged via the queue into github-linguist:main with commit f0830ab Mar 5, 2025
@DecimalTurn DecimalTurn deleted the lookahead branch March 5, 2025 14:04
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jul 2, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants