Add auto-extract mode for `spath` command by dai-chen · Pull Request #5140 · opensearch-project/sql · GitHub
Skip to content

Add auto-extract mode for spath command#5140

Merged
penghuo merged 7 commits into
opensearch-project:mainfrom
dai-chen:add-auto-extract-mode-support-in-spath
Feb 13, 2026
Merged

Add auto-extract mode for spath command#5140
penghuo merged 7 commits into
opensearch-project:mainfrom
dai-chen:add-auto-extract-mode-support-in-spath

Conversation

@dai-chen

@dai-chen dai-chen commented Feb 12, 2026

Copy link
Copy Markdown
Collaborator

Description

As a follow-up after #5139, this PR implements the auto-extract mode for the spath command per the proposal in #4307. When path parameter is omitted, spath rewrites to json_extract_all(input) function and returns a map<string, string> with flattened keys instead of expanding dynamic columns.

Minor differences from proposal:

  • Malformed JSON returns partial results

Related Issues

Resolves #4307 (partially, ergonomic map access syntax are follow-ups)

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>
…serve issue

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen self-assigned this Feb 12, 2026
@dai-chen dai-chen added enhancement New feature or request PPL Piped processing language labels Feb 12, 2026
@coderabbitai

coderabbitai Bot commented Feb 12, 2026

Copy link
Copy Markdown
Contributor

Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@docs/user/ppl/cmd/spath.md`:
- Around line 41-43: There are two adjacent blockquotes ("Auto-extract mode
processes the entire input field..." and "Invalid or malformed JSON returns
partial results...") separated by a blank line which triggers MD028; remove the
blank line and either merge the two notes into a single blockquote or place
regular text between them so they are no longer consecutive blockquotes—update
the lines containing those two blockquote texts to be joined into one blockquote
or separated by non-blockquote content.

In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java`:
- Around line 40-48: Add three new test documents to the auto-extract fixtures
for CalcitePPLSpathCommandIT by creating additional Request objects (similar to
autoExtractDoc) and performing them via client().performRequest: (1) a document
where the target auto-extract field(s) are explicitly null to exercise null
handling, (2) a document where those fields contain empty JSON object "{}" and
an empty JSON array "[]" to cover boundary/empty-input behavior, and (3) a
document with a deliberately malformed JSON string (e.g., truncated or missing
quotes/braces) to trigger error paths; each should use the same PUT to the
/test_spath_auto/_doc/{id}?refresh=true pattern with unique ids and matching
JSON entity strings so tests validate null, empty, and invalid inputs for the
auto-extract mode.
- Around line 59-63: The testSpathAutoExtract test only asserts schema via
verifySchema(schema("doc","struct")) which is too weak; update
testSpathAutoExtract to also assert actual extracted values and multi-document
behavior by parsing the JSONObject returned from executeQuery("source=test_spath
| spath input=doc") and adding row-level assertions that specific keys/values
exist for each document (e.g., expected map entries inside the "doc" struct),
and/or use existing row-assert helper methods to assert the expected number of
rows and exact field values; keep the initial verifySchema call but add concrete
assertions that validate the map extraction across multiple documents.

Comment thread docs/user/ppl/cmd/spath.md Outdated
penghuo
penghuo previously approved these changes Feb 13, 2026
Signed-off-by: Chen Dai <daichen@amazon.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java`:
- Around line 102-111: Add a boundary test that verifies empty-array handling
for the spath array suffix case: create a new test (or extend
testSpathAutoExtractArraySuffix) that runs executeQuery with a document whose
array field is empty (e.g., "tags": []) and assert schema via
verifySchema("result","struct") and data via verifyDataRows expecting the
extracted key "tags{}" to stringify as "[]"; reference the existing test method
testSpathAutoExtractArraySuffix, the helpers executeQuery, verifySchema and
verifyDataRows, and ensure the expected JSONObject uses "tags{}":"[]".

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen requested a review from penghuo February 13, 2026 17:50

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/user/ppl/cmd/spath.md`:
- Around line 161-163: The example table in spath.md has an inconsistent
representation for single-element arrays: update the second example row so the
"tags{}" output is "[python]" (i.e., change 'tags{}': 'python' to 'tags{}':
'[python]') to match the bracketed array stringification used in the other rows
and documented rules; locate the example table in spath.md and adjust that cell
text accordingly.

Comment thread docs/user/ppl/cmd/spath.md

@Swiddis Swiddis left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach! lgtm with comments

Comment thread docs/user/ppl/cmd/spath.md

@Swiddis Swiddis Feb 13, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: Is this bracket syntax easily supported by downstream commands that want to extract these fields, or will there need to be odd escaping?

That might only be a concern for the mapping functions in a future PR, but I would still carefully review if this is the extraction syntax we want for arrays. At a glance this isn't intuitive to me but after squinting at the docs for a bit I kinda got it?

e.g. It's not obvious to me what will happen if I try "items": [[1, 2], [3, 4]], do I get two items{}{} keys? Does one overwrite the other? Are they merged into one array? (From code review I know which one it is, but not as a user given only the docs & this example)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we don't have good workaround until we move to Map<String,Any> or schemaless. Currently it just preserves all the values from "conflicting" key. Let me verify your example.

@dai-chen dai-chen Feb 13, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result is items{}{} as expected. I can add more cases in doctest or this UT if that's your suggestion. Thanks!

@penghuo penghuo merged commit be44a8e into opensearch-project:main Feb 13, 2026
50 of 51 checks passed
qianheng-aws added a commit that referenced this pull request Feb 24, 2026
* Struct return array value instead of string

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Support filter in GraphLookup

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add experimental tag in doc

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Increment version to 3.6.0-SNAPSHOT (#5115)

Signed-off-by: opensearch-ci-bot <opensearch-infra@amazon.com>
Co-authored-by: opensearch-ci-bot <opensearch-infra@amazon.com>

* Fix fallback error handling to show original Calcite error (#5133)

* Fix fallback error handling to show original Calcite error

When Calcite falls back to V2 and V2 also fails, now correctly returns the
original Calcite error instead of V2's generic "only supported when calcite
enabled" message, improving error clarity for users. Fixes #5060.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* coderabbit: preserve exception types

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Refactoring for coderabbit: consistent handling of calcite and legacy calcite errors

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Update explain error handling to match non-explain

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Minor tweaks: commenting & eval order

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Add a yaml test

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

---------

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Co-authored-by: Claude <noreply@anthropic.com>

* Add auto-extract mode for `spath` command (#5140)

* Add auto extraction mode in spath command

Signed-off-by: Chen Dai <daichen@amazon.com>

* Change json_extract_all to return map<string,string> and fix null perserve issue

Signed-off-by: Chen Dai <daichen@amazon.com>

* Refactor all unit test and integration tests

Signed-off-by: Chen Dai <daichen@amazon.com>

* Refactor json_extract_all and fix stringify issue

Signed-off-by: Chen Dai <daichen@amazon.com>

* Fix broken IT and doctest

Signed-off-by: Chen Dai <daichen@amazon.com>

* Address PR comments

Signed-off-by: Chen Dai <daichen@amazon.com>

* Mark auto extract mode as experimental

Signed-off-by: Chen Dai <daichen@amazon.com>

---------

Signed-off-by: Chen Dai <daichen@amazon.com>

* Add nomv command  (#5130)

* Add nomv command + parser/AST/calcite wiring + tests + docs

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionVisitor.java
#	integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java
#	ppl/src/main/antlr/OpenSearchPPLParser.g4
#	ppl/src/test/java/org/opensearch/sql/ppl/parser/FieldResolutionVisitorTest.java

* Address coderrabbit suggestions.

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* Correct the no_push_down yaml expected result

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* Fix Windows CI: Override verifyPPLToSparkSQL to preserve ARRAY_JOIN '\n' delimiter

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* Fix Windows CI: Override verifyPPLToSparkSQL to preserve ARRAY_JOIN '\n' delimiter

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* Fix Windows CI: Override verifyPPLToSparkSQL to preserve ARRAY_JOIN '\n' delimiter

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* Address code review comments

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

---------

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Co-authored-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* Fix #5114: preserve head/TopK semantics for sort-expression pushdown (#5135)

* Fix bug 5114

Signed-off-by: Peng Huo <penghuo@gmail.com>

* Update

Signed-off-by: Peng Huo <penghuo@gmail.com>

* Address comments

Signed-off-by: Peng Huo <penghuo@gmail.com>

* Fix spotless import ordering for EnumerableLimitSort

Signed-off-by: Peng Huo <penghuo@gmail.com>

---------

Signed-off-by: Peng Huo <penghuo@gmail.com>

* Fix path navigation on map columns for `spath` command (#5149)

* Fix path navigation bug in qualified name resolver and add spath tests

Signed-off-by: Chen Dai <daichen@amazon.com>

* Add spath UT and IT with other commands

Signed-off-by: Chen Dai <daichen@amazon.com>

* Move aliasing logic from resolver to project item expansion

Signed-off-by: Chen Dai <daichen@amazon.com>

* Update javadoc and readme

Signed-off-by: Chen Dai <daichen@amazon.com>

---------

Signed-off-by: Chen Dai <daichen@amazon.com>

* Make sql plugin aware of FIPS build param (-Pcrypto.standard=FIPS-140-3) (#5155)

* Make sql plugin aware of FIPS build param (-Pcrypto.standard=FIPS-140-3)

Signed-off-by: Craig Perkins <cwperx@amazon.com>

* Update build script to include FIPS-140-3 option

Added FIPS-140-3 standard option to Gradle commands.

Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>

---------

Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
Co-authored-by: Peter Zhu <zhujiaxi@amazon.com>

* [Maintenance] Fix bc-fips jar hell by marking dependency as compileOnly (#5158)

* [maintenance] Fix bc-fips jar hell by marking dependency as compileOnly

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* add fix for tests

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

---------

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* [Feature] PPL Command: MvExpand  (#5144)

* MvExpand new PR - After resolving the merge issues

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
#	core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
#	core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
#	docs/category.json
#	docs/user/ppl/index.md
#	integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
#	integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
#	integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java
#	integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java
#	ppl/src/main/antlr/OpenSearchPPLParser.g4
#	ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
#	ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
#	ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java

* Address coderrabbit comments

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* Address coderrabbit comments

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* merge main to my branch

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* Address corerabbit suggestions

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* Address corerabbit suggestions

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* ci: trigger

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* ci: trigger

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* Address corerabbit suggestions

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* ci: trigger

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* Address comments.

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* Change the exception from illegalarg to SyntaxCheckException

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* change the exception message for consistency

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* ci: rerun

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

---------

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Co-authored-by: Srikanth Padakanti <srikanth_padakanti@apple.com>

* bump antlr version to 4.13.2 (#5159)

* Address comments

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: opensearch-ci-bot <opensearch-infra@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: Craig Perkins <cwperx@amazon.com>
Signed-off-by: Peter Zhu <zhujiaxi@amazon.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Co-authored-by: opensearch-trigger-bot[bot] <98922864+opensearch-trigger-bot[bot]@users.noreply.github.com>
Co-authored-by: opensearch-ci-bot <opensearch-infra@amazon.com>
Co-authored-by: Simeon Widdis <sawiddis@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Chen Dai <daichen@amazon.com>
Co-authored-by: Srikanth Padakanti <srikanth29.9@gmail.com>
Co-authored-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Co-authored-by: Peng Huo <penghuo@gmail.com>
Co-authored-by: Craig Perkins <cwperx@amazon.com>
Co-authored-by: Peter Zhu <zhujiaxi@amazon.com>
Co-authored-by: Jialiang Liang <jiallian@amazon.com>
Co-authored-by: Eric Wei <menwe@amazon.com>
@github-project-automation github-project-automation Bot moved this from Todo to Done in PPL 2026 Roadmap Mar 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request PPL Piped processing language

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

(Calcite PPL) spath command dynamic columns support

4 participants