Add configurable expression depth limit during AST building by dai-chen · Pull Request #5602 · opensearch-project/sql · GitHub
Skip to content

Add configurable expression depth limit during AST building#5602

Merged
dai-chen merged 1 commit into
opensearch-project:mainfrom
dai-chen:feature/parse-tree-depth-guard
Jul 1, 2026
Merged

Add configurable expression depth limit during AST building#5602
dai-chen merged 1 commit into
opensearch-project:mainfrom
dai-chen:feature/parse-tree-depth-guard

Conversation

@dai-chen

@dai-chen dai-chen commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

Description

Adds a configurable safeguard on expression nesting depth during SQL/PPL parsing to improve robustness for very large or deeply nested queries. Introduces the dynamic setting plugins.query.max_expression_depth (default 1000; 0 to disable). Tested SQL/PPL both Lucene and composite index with default -Xss 1m as well as all ClickBench queries.

TODO: Statement/command-level nesting — SQL subqueries (IN/EXISTS/scalar/FROM-derived, and JOINs onto subqueries) and PPL subsearches (in [...], scalar/exists [...], and subsearch-bearing commands like join/lookup/append/appendcol/multisearch) — recurses at the plan level and requires a separate guard.

Examples

With plugins.query.max_expression_depth (default 1000), a query whose expression nesting exceeds the limit is rejected with 400 Bad Request instead of triggering a StackOverflowError / node crash.

=== SQL ===
```bash
curl -s -XPOST localhost:9200/_plugins/_sql -H 'Content-Type: application/json' -d '{
  "query": "SELECT * FROM my_index WHERE n = 0 OR n = 1 OR ... OR n = 1023 LIMIT 1"
}'

{
  "error": {
    "reason": "Invalid SQL query",
    "details": "Expression nesting depth exceeds the maximum allowed [1000]; simplify the query or reduce the number of chained conditions.",
    "type": "IllegalArgumentException"
  },
  "status": 400
}

=== PPL ===
curl -s -XPOST localhost:9200/_plugins/_ppl -H 'Content-Type: application/json' -d '{
  "query": "source=my_index | where n = 0 OR n = 1 OR ... OR n = 1023 | head 1"
}'

{
  "error": {
    "reason": "Invalid Query",
    "details": "Expression nesting depth exceeds the maximum allowed [1000]; simplify the query or reduce the number of chained conditions.",
    "type": "IllegalArgumentException"
  },
  "status": 400
}

References

Bounding query/expression nesting to prevent parser or planner stack overflow is a well-established practice. Comparable safeguards in other engines:

Engine Setting / limit Default Bounds Reference
SQLite SQLITE_MAX_EXPR_DEPTH (runtime SQLITE_LIMIT_EXPR_DEPTH) 1000 Expression-tree nesting depth (enforced in sqlite3ExprCheckHeight()) https://www.sqlite.org/limits.html
PostgreSQL max_stack_depth ~2 MB Physical stack depth for all recursion — parser, expression evaluation, nested subqueries, function calls (check_stack_depth()) https://www.postgresql.org/docs/current/runtime-config-resource.html
MySQL / MariaDB thread_stack (+ internal check_stack_overrun()); max_sp_recursion_depth ~1 MB; 0 Per-thread stack for parser/expression recursion; stored-procedure recursion depth https://dev.mysql.com/doc/refman/en/server-system-variables.html#sysvar_thread_stack
OpenSearch (core DSL) index.query.max_nested_depth 20 Nesting depth of compound/bool queries (query-structure depth) https://docs.opensearch.org/latest/install-and-configure/configuring-opensearch/index-settings/
OpenSearch (core DSL) indices.query.bool.max_clause_count 1024 Total clause count in a boolean query (width, not depth) https://docs.opensearch.org/latest/install-and-configure/configuring-opensearch/index-settings/
Apache Calcite RexUtil.toCnf(maxCnfNodeCount, …)PlanTooComplexError (caller-set) Node-count guard during CNF/DNF expansion to prevent combinatorial/stack blowup https://calcite.apache.org/javadocAggregate/org/apache/calcite/plan/PlanTooComplexError.html

Related Issues

Part of #5246

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen dai-chen self-assigned this Jul 1, 2026
@dai-chen dai-chen added the enhancement New feature or request label Jul 1, 2026
@dai-chen dai-chen changed the title Bound SQL/PPL expression nesting depth with a configurable limit Add configurable expression depth limit for SQL/PPL parsing Jul 1, 2026
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

(Review updated until commit 7d16ae4)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The ExtendedAstBuilder constructor now requires a Settings parameter, but the ExtendedAstExpressionBuilder constructor is called with only guard without passing settings. If ExtendedAstExpressionBuilder needs access to settings (e.g., for nested expression builders), this could cause issues. The parent AstExpressionBuilder receives guard but may not have access to settings if needed later.

return new ExtendedAstExpressionBuilder(guard);

@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Latest suggestions up to 7d16ae4

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix off-by-one depth check

The depth check occurs before incrementing, which means the actual depth can reach
maxDepth + 1 before rejection. Move the depth increment before the check to ensure
the limit is enforced at the correct boundary and prevent off-by-one errors.

common/src/main/java/org/opensearch/sql/common/antlr/AstBuildGuard.java [50-64]

 public <T> T enforce(Supplier<T> visit) {
-  if (maxDepth > 0 && depth >= maxDepth) {
-    throw new IllegalArgumentException(
-        StringUtils.format(
-            "Expression nesting depth exceeds the maximum allowed [%d]; simplify the query or"
-                + " reduce the number of chained conditions.",
-            maxDepth));
-  }
   depth++;
   try {
+    if (maxDepth > 0 && depth > maxDepth) {
+      throw new IllegalArgumentException(
+          StringUtils.format(
+              "Expression nesting depth exceeds the maximum allowed [%d]; simplify the query or"
+                  + " reduce the number of chained conditions.",
+              maxDepth));
+    }
     return visit.get();
   } finally {
     depth--;
   }
 }
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies an off-by-one error in the depth checking logic. The current implementation checks depth >= maxDepth before incrementing, allowing the depth to reach maxDepth + 1 before rejection. Moving the increment before the check and changing to depth > maxDepth ensures the limit is enforced correctly. This is a significant correctness issue that could allow expressions one level deeper than intended.

Medium

Previous suggestions

Suggestions up to commit 82fa28e
CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid double-counting expression depth

Wrapping both visit and visitChildren with guard.enforce causes double-counting of
depth for each node traversal, since visit internally calls visitChildren. This will
exhaust the depth limit twice as fast as intended. Remove the guard from one of
these methods to count depth accurately.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java [111-119]

 @Override
 public UnresolvedExpression visit(ParseTree tree) {
   return guard.enforce(() -> super.visit(tree));
 }
 
 @Override
 public UnresolvedExpression visitChildren(RuleNode node) {
-  return guard.enforce(() -> super.visitChildren(node));
+  return super.visitChildren(node);
 }
Suggestion importance[1-10]: 9

__

Why: This is a critical bug. Wrapping both visit and visitChildren with guard.enforce causes the depth to be incremented twice per node, exhausting the limit at half the intended depth. This directly undermines the purpose of the depth guard and could reject valid queries.

High
Suggestions up to commit 31772f2
CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent double-counting of expression depth

Wrapping both visit() and visitChildren() with the guard causes double-counting of
depth for each node traversal, as visit() internally calls visitChildren(). This
results in the depth limit being reached at half the configured value. Remove the
guard from one of these methods to ensure accurate depth tracking.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java [111-119]

-@Override
-public UnresolvedExpression visit(ParseTree tree) {
-  return guard.enforce(() -> super.visit(tree));
-}
-
 @Override
 public UnresolvedExpression visitChildren(RuleNode node) {
   return guard.enforce(() -> super.visitChildren(node));
 }
Suggestion importance[1-10]: 9

__

Why: This identifies a critical bug where both visit() and visitChildren() wrap the guard, causing depth to be counted twice per node. This would cause the limit to be reached at half the configured value, making the feature unreliable. The same pattern appears in sql/src/main/java/org/opensearch/sql/sql/parser/AstExpressionBuilder.java.

High
General
Reset depth counter between queries

The depth counter is not reset between independent top-level visits, which could
cause false rejections if the same guard instance is reused across multiple queries.
Consider resetting depth to 0 at the start of each top-level parse operation or
document that guard instances must not be shared.

common/src/main/java/org/opensearch/sql/common/antlr/AstBuildGuard.java [50-64]

 public <T> T enforce(Supplier<T> visit) {
   if (maxDepth > 0 && depth >= maxDepth) {
     throw new IllegalArgumentException(
         StringUtils.format(
             "Expression nesting depth exceeds the maximum allowed [%d]; simplify the query or"
                 + " reduce the number of chained conditions.",
             maxDepth));
   }
   depth++;
   try {
     return visit.get();
   } finally {
     depth--;
+    // Ensure depth is reset after top-level visits
+    if (depth < 0) depth = 0;
   }
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion misunderstands the design. The depth counter correctly decrements in the finally block, and the class documentation states instances must not be shared across threads. The proposed check if (depth < 0) depth = 0; is unnecessary since depth should never go negative with proper usage.

Low

@dai-chen dai-chen changed the title Add configurable expression depth limit for SQL/PPL parsing Add configurable expression depth limit during AST building Jul 1, 2026
@dai-chen dai-chen force-pushed the feature/parse-tree-depth-guard branch from 31772f2 to a7c0fe2 Compare July 1, 2026 16:56
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a7c0fe2

@dai-chen dai-chen force-pushed the feature/parse-tree-depth-guard branch from a7c0fe2 to 82fa28e Compare July 1, 2026 20:19
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 82fa28e

@dai-chen dai-chen added SQL PPL Piped processing language labels Jul 1, 2026
Introduce plugins.query.max_expression_depth (default 1000; 0 to disable) to bound expression nesting depth during AST building, improving robustness for very large or deeply nested SQL/PPL queries.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen force-pushed the feature/parse-tree-depth-guard branch from 82fa28e to 7d16ae4 Compare July 1, 2026 21:18
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@dai-chen dai-chen merged commit a71179e into opensearch-project:main Jul 1, 2026
44 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request PPL Piped processing language SQL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants