Fix the behavior of Lexer.get_column by maxbrunsfeld · Pull Request #978 · tree-sitter/tree-sitter · GitHub
Skip to content

Fix the behavior of Lexer.get_column#978

Merged
maxbrunsfeld merged 2 commits into
masterfrom
fix-get-column-at-eof
Mar 12, 2021
Merged

Fix the behavior of Lexer.get_column#978
maxbrunsfeld merged 2 commits into
masterfrom
fix-get-column-at-eof

Conversation

@maxbrunsfeld

@maxbrunsfeld maxbrunsfeld commented Mar 11, 2021

Copy link
Copy Markdown
Contributor

Fixes #516
Fixes #589
Closes #640
Fixes #144

This PR finally makes the Lexer.get_column (needed for languages like Haskell, Elm) API work properly.

  • Adds unit test coverage for get_column, using a test fixture language with Haskell-like layout rules.
  • Accounts for tokens' position-dependence when editing a tree, so that incremental parsing is fully correct with respect to these "layout" tokens. This requires tracking which subtrees depend on their start column:
    • any external token where the scanner called get_column()
    • recursively, any parent node that has a child node on its first line that depends on its own start column
  • ⚠️ Changes the meaning of get_column so that it returns a byte count, not a character count ⚠️

Regarding this last item - We originally returned a character count from get_column because it seemed more technically correct, since GHC counted the unicode characters (as opposed to the bytes) in its implementation of layout. But this adds so much complexity (and perf cost) to our code that I really don't think it's worth it. There could be some slightly incorrect layout-parsing if somebody uses a mixture of different non-ascii whitespace characters for their layout, but this seems like a very uncommon situation.

/cc @razzeee @tek @bglgwyng @banacorn

@maxbrunsfeld maxbrunsfeld force-pushed the fix-get-column-at-eof branch from e77b303 to e29d371 Compare March 11, 2021 20:11
@maxbrunsfeld maxbrunsfeld changed the title Fix the behavior of Lexer.get_column when at EOF Fix the behavior of Lexer.get_column Mar 11, 2021
@maxbrunsfeld maxbrunsfeld mentioned this pull request Mar 11, 2021
31 tasks
@razzeee

razzeee commented Mar 12, 2021

Copy link
Copy Markdown
Contributor

@maxbrunsfeld

Copy link
Copy Markdown
Contributor Author

I ran all of the tree-sitter-elm tests, so I think we're good. Just let me know if you see any problems in your language server after upgrading.

@maxbrunsfeld maxbrunsfeld merged commit d366356 into master Mar 12, 2021
@maxbrunsfeld maxbrunsfeld deleted the fix-get-column-at-eof branch March 12, 2021 00:42
@razzeee

razzeee commented Mar 12, 2021

Copy link
Copy Markdown
Contributor

Did you remove the eof check for that? https://github.com/elm-tooling/tree-sitter-elm/blob/main/src/scanner.cc#L262

Is my understanding right, that that's obsolete now?

@maxbrunsfeld

Copy link
Copy Markdown
Contributor Author

Yeah, tests pass without that.

@maxbrunsfeld

Copy link
Copy Markdown
Contributor Author

Actually, without that check, one of the example files in elm-ui has a parse error. So maybe you want the eof() check anyway?

@razzeee

razzeee commented Mar 12, 2021

Copy link
Copy Markdown
Contributor

Will have to check, I just realized, that I never added the examples (elm-tooling/elm-language-server#527 (comment)) to our test suite. I probably thought, it's fixed now and we will never move that code again 😆

@razzeee

razzeee commented Mar 12, 2021

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants