Allow precedences to be specified using strings and a partial ordering relation by maxbrunsfeld · Pull Request #939 · tree-sitter/tree-sitter · GitHub
Skip to content

Allow precedences to be specified using strings and a partial ordering relation#939

Merged
maxbrunsfeld merged 3 commits into
masterfrom
partial-order-precedence
Feb 25, 2021
Merged

Allow precedences to be specified using strings and a partial ordering relation#939
maxbrunsfeld merged 3 commits into
masterfrom
partial-order-precedence

Conversation

@maxbrunsfeld

@maxbrunsfeld maxbrunsfeld commented Feb 25, 2021

Copy link
Copy Markdown
Contributor

Background

For some grammars, it is very awkward to specify all of their rules' precedence as a total ordering, using integers. The problem is that we might need to statically (via precedence) resolve conflicts between rules e, f, and g, and also between rules t, u, and v, but want to avoid resolving any conflicts between, say, e and u, or g and t.

This bug in the TypeScript grammar is an example of a case where it's really hard to do this using the integer precedence system.

What we need is a way of specifying precedences as a partial ordering. There's some discussion about this model in this blog post: Operator Precedence: We Can Do Better.

Solution

This PR makes two backward-compatible changes to the grammar DSL:

  1. Grammars have a new optional property called precedences, that specifies a set of precedence names, and a set of ordering relations between those names. Each ordering relation is structured as an array of names, in descending order of their strength.
  2. Within grammar rules, the prec family of functions can now take a precedence name instead of an integer. When constructing the parser, these names will be compared according to the grammar's precedences lists. For some pairs of names, there will be no ordering relation that includes both names. These two names will be treated as equal in strength, for conflict-resolution purposes.

Example

We might use these features in the TypeScript grammar, doing something like this:

grammar({
  // ...
  precedences: () => [
    // Within expressions,
    // * dot binds tighter than comparison
    // * comparison binds tighter than logical-and
    // * logical-and binds tighter than logical-or
    [
      'member',
      'compare',
      'and',
      'or',
    ],
    // Within types:
    // * dot binds tighter than intersection
    // * intersection binds tighter than union
    [
      'type_member',
      'type_intersection',
      'type_union',
    ]
  ],

  rules: {
    // ...
    
    // This binds more tightly than other expression rules, but has *no* static resolution
    // for any conflicts with *type*-related rules.
    member_expression: $ => prec('member', seq(
      $._primary_expression,
      '.',
      $.identifier
    )),
    
    // This binds more tightly than other type rules, but has *no* static resolution
    // for any conflicts with *expression*-related rules.
    nested_type_identifier: $ => prec('type_member', seq(
      choice(
        $._type_identifier,
        $.generic_type,
        $.nested_type_identifier,
      ),
      '.',
      $._type_identifier
    )),
  }
});

Details

Grammars using integer precedence will not be affected. It's also valid to use a mixture of integer precedences and named precedences: those two groups will not be compared to each other.

Right now, the strings are not used in comparisons, but they
are passed through the grammar processing pipeline, and are
available to the parse table construction algorithm.

This also cleans up a confusing aspect of the parse table
construction, in which precedences and associativities were
temporarily stored in the parse table data structure itself.
@maxbrunsfeld maxbrunsfeld merged commit e49a56e into master Feb 25, 2021
@maxbrunsfeld maxbrunsfeld deleted the partial-order-precedence branch February 25, 2021 21:16
@ubolonton

Copy link
Copy Markdown
Contributor

@maxbrunsfeld

Copy link
Copy Markdown
Contributor Author

That's great to hear!

@razzeee

razzeee commented Feb 27, 2021

Copy link
Copy Markdown
Contributor

This seems to be missing from the dsl types?

/// <reference types="tree-sitter-cli/dsl" />
// @ts-check

Causes this to error

@ahlinc

ahlinc commented Feb 27, 2021

Copy link
Copy Markdown
Contributor
/// <reference types="tree-sitter-cli/dsl" />
// @ts-check

@razzeee could you tell where do you usually put such lines? Is it globals.d.ts file or somewhere else?

@razzeee

razzeee commented Feb 27, 2021

Copy link
Copy Markdown
Contributor

Top of your grammar file. I just have one which is grammar.js

https://www.github.com/elm-tooling/tree-sitter-elm/tree/master/grammar.js

@maxbrunsfeld

Copy link
Copy Markdown
Contributor Author

Thanks for the heads up @razzeee.

@maxbrunsfeld

Copy link
Copy Markdown
Contributor Author

seems to be missing from the dsl types?

Fixed on master.

@alemuller

Copy link
Copy Markdown
Contributor

It would be nice if commenting a partial ordering relation on precedence showed a conflict example, like it happens when commenting a tuple of rules on conflicts. I find the "Unresolved conflict" message useful to remember the conflicts.

I've been using the following strategy to workaround the "Undeclared precedence" error:

//['foo', 'bar']
['foo']
['bar']

Maybe the "Unresolved conflict" could be thrown after "Undeclared precedence" error?

By the way, I appreciate the implementation of this elegant solution for the precedence problem.

@maxbrunsfeld

Copy link
Copy Markdown
Contributor Author

Ok, an update on this - In Tree-sitter v0.19.1, I've added a feature make the partial orderings easier to specify. You can control the precedence of an entire grammar rule by including a reference to the rule (using the $.my_rule syntax) directly in the precedences list.

For example, in JavaScript, to control the precedence of await expression and arrow functions relative to binary operators, you can simply do this:

precedences: $ => [
  [
    'binary_times',
    'binary_plus',
    // ...
    $.await_expression,
    $.arrow_function
  ],
],

I'll write up more documentation on this in the main docs site soon.

@woolsweater

Copy link
Copy Markdown

This is a great change; looking forward to using it. I had already been trying to do something like this (very informally) myself 😄 :

const PRECEDENCE = {
    //...
    // Prefer `type_identifier` after the colon in '{ ( identifier : identifier'
    type_over_primary: 1,
    // Prefer `parameter_list` over `void_type` in `func f(_: () -> Foo)`
    param_over_void: 1,
    // Prefer expression _patterns_ over bare expressions
    pattern_over_expr: 1,
    // ...
}

@mingodad

Copy link
Copy Markdown

Also need to update the file ´cli/src/generate/grammar-schema.json´ to include the precedences key.

@kammoh

kammoh commented Oct 14, 2021

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants