Add auto-extract mode for spath command#5140
Conversation
Signed-off-by: Chen Dai <daichen@amazon.com>
…serve issue Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@docs/user/ppl/cmd/spath.md`:
- Around line 41-43: There are two adjacent blockquotes ("Auto-extract mode
processes the entire input field..." and "Invalid or malformed JSON returns
partial results...") separated by a blank line which triggers MD028; remove the
blank line and either merge the two notes into a single blockquote or place
regular text between them so they are no longer consecutive blockquotes—update
the lines containing those two blockquote texts to be joined into one blockquote
or separated by non-blockquote content.
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java`:
- Around line 40-48: Add three new test documents to the auto-extract fixtures
for CalcitePPLSpathCommandIT by creating additional Request objects (similar to
autoExtractDoc) and performing them via client().performRequest: (1) a document
where the target auto-extract field(s) are explicitly null to exercise null
handling, (2) a document where those fields contain empty JSON object "{}" and
an empty JSON array "[]" to cover boundary/empty-input behavior, and (3) a
document with a deliberately malformed JSON string (e.g., truncated or missing
quotes/braces) to trigger error paths; each should use the same PUT to the
/test_spath_auto/_doc/{id}?refresh=true pattern with unique ids and matching
JSON entity strings so tests validate null, empty, and invalid inputs for the
auto-extract mode.
- Around line 59-63: The testSpathAutoExtract test only asserts schema via
verifySchema(schema("doc","struct")) which is too weak; update
testSpathAutoExtract to also assert actual extracted values and multi-document
behavior by parsing the JSONObject returned from executeQuery("source=test_spath
| spath input=doc") and adding row-level assertions that specific keys/values
exist for each document (e.g., expected map entries inside the "doc" struct),
and/or use existing row-assert helper methods to assert the expected number of
rows and exact field values; keep the initial verifySchema call but add concrete
assertions that validate the map extraction across multiple documents.
Signed-off-by: Chen Dai <daichen@amazon.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java`:
- Around line 102-111: Add a boundary test that verifies empty-array handling
for the spath array suffix case: create a new test (or extend
testSpathAutoExtractArraySuffix) that runs executeQuery with a document whose
array field is empty (e.g., "tags": []) and assert schema via
verifySchema("result","struct") and data via verifyDataRows expecting the
extracted key "tags{}" to stringify as "[]"; reference the existing test method
testSpathAutoExtractArraySuffix, the helpers executeQuery, verifySchema and
verifyDataRows, and ensure the expected JSONObject uses "tags{}":"[]".
Signed-off-by: Chen Dai <daichen@amazon.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/user/ppl/cmd/spath.md`:
- Around line 161-163: The example table in spath.md has an inconsistent
representation for single-element arrays: update the second example row so the
"tags{}" output is "[python]" (i.e., change 'tags{}': 'python' to 'tags{}':
'[python]') to match the bracketed array stringification used in the other rows
and documented rules; locate the example table in spath.md and adjust that cell
text accordingly.
Swiddis
left a comment
There was a problem hiding this comment.
I like this approach! lgtm with comments
There was a problem hiding this comment.
thought: Is this bracket syntax easily supported by downstream commands that want to extract these fields, or will there need to be odd escaping?
That might only be a concern for the mapping functions in a future PR, but I would still carefully review if this is the extraction syntax we want for arrays. At a glance this isn't intuitive to me but after squinting at the docs for a bit I kinda got it?
e.g. It's not obvious to me what will happen if I try "items": [[1, 2], [3, 4]], do I get two items{}{} keys? Does one overwrite the other? Are they merged into one array? (From code review I know which one it is, but not as a user given only the docs & this example)
There was a problem hiding this comment.
Yes, I think we don't have good workaround until we move to Map<String,Any> or schemaless. Currently it just preserves all the values from "conflicting" key. Let me verify your example.
There was a problem hiding this comment.
The result is items{}{} as expected. I can add more cases in doctest or this UT if that's your suggestion. Thanks!
* Struct return array value instead of string Signed-off-by: Heng Qian <qianheng@amazon.com> * Support filter in GraphLookup Signed-off-by: Heng Qian <qianheng@amazon.com> * Fix IT Signed-off-by: Heng Qian <qianheng@amazon.com> * Add experimental tag in doc Signed-off-by: Heng Qian <qianheng@amazon.com> * Increment version to 3.6.0-SNAPSHOT (#5115) Signed-off-by: opensearch-ci-bot <opensearch-infra@amazon.com> Co-authored-by: opensearch-ci-bot <opensearch-infra@amazon.com> * Fix fallback error handling to show original Calcite error (#5133) * Fix fallback error handling to show original Calcite error When Calcite falls back to V2 and V2 also fails, now correctly returns the original Calcite error instead of V2's generic "only supported when calcite enabled" message, improving error clarity for users. Fixes #5060. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * coderabbit: preserve exception types Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Refactoring for coderabbit: consistent handling of calcite and legacy calcite errors Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Update explain error handling to match non-explain Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Minor tweaks: commenting & eval order Signed-off-by: Simeon Widdis <sawiddis@amazon.com> * Add a yaml test Signed-off-by: Simeon Widdis <sawiddis@amazon.com> --------- Signed-off-by: Simeon Widdis <sawiddis@amazon.com> Co-authored-by: Claude <noreply@anthropic.com> * Add auto-extract mode for `spath` command (#5140) * Add auto extraction mode in spath command Signed-off-by: Chen Dai <daichen@amazon.com> * Change json_extract_all to return map<string,string> and fix null perserve issue Signed-off-by: Chen Dai <daichen@amazon.com> * Refactor all unit test and integration tests Signed-off-by: Chen Dai <daichen@amazon.com> * Refactor json_extract_all and fix stringify issue Signed-off-by: Chen Dai <daichen@amazon.com> * Fix broken IT and doctest Signed-off-by: Chen Dai <daichen@amazon.com> * Address PR comments Signed-off-by: Chen Dai <daichen@amazon.com> * Mark auto extract mode as experimental Signed-off-by: Chen Dai <daichen@amazon.com> --------- Signed-off-by: Chen Dai <daichen@amazon.com> * Add nomv command (#5130) * Add nomv command + parser/AST/calcite wiring + tests + docs Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> # Conflicts: # core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionVisitor.java # integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java # ppl/src/main/antlr/OpenSearchPPLParser.g4 # ppl/src/test/java/org/opensearch/sql/ppl/parser/FieldResolutionVisitorTest.java * Address coderrabbit suggestions. Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Correct the no_push_down yaml expected result Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Fix Windows CI: Override verifyPPLToSparkSQL to preserve ARRAY_JOIN '\n' delimiter Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Fix Windows CI: Override verifyPPLToSparkSQL to preserve ARRAY_JOIN '\n' delimiter Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Fix Windows CI: Override verifyPPLToSparkSQL to preserve ARRAY_JOIN '\n' delimiter Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address code review comments Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> --------- Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> Co-authored-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Fix #5114: preserve head/TopK semantics for sort-expression pushdown (#5135) * Fix bug 5114 Signed-off-by: Peng Huo <penghuo@gmail.com> * Update Signed-off-by: Peng Huo <penghuo@gmail.com> * Address comments Signed-off-by: Peng Huo <penghuo@gmail.com> * Fix spotless import ordering for EnumerableLimitSort Signed-off-by: Peng Huo <penghuo@gmail.com> --------- Signed-off-by: Peng Huo <penghuo@gmail.com> * Fix path navigation on map columns for `spath` command (#5149) * Fix path navigation bug in qualified name resolver and add spath tests Signed-off-by: Chen Dai <daichen@amazon.com> * Add spath UT and IT with other commands Signed-off-by: Chen Dai <daichen@amazon.com> * Move aliasing logic from resolver to project item expansion Signed-off-by: Chen Dai <daichen@amazon.com> * Update javadoc and readme Signed-off-by: Chen Dai <daichen@amazon.com> --------- Signed-off-by: Chen Dai <daichen@amazon.com> * Make sql plugin aware of FIPS build param (-Pcrypto.standard=FIPS-140-3) (#5155) * Make sql plugin aware of FIPS build param (-Pcrypto.standard=FIPS-140-3) Signed-off-by: Craig Perkins <cwperx@amazon.com> * Update build script to include FIPS-140-3 option Added FIPS-140-3 standard option to Gradle commands. Signed-off-by: Peter Zhu <zhujiaxi@amazon.com> --------- Signed-off-by: Craig Perkins <cwperx@amazon.com> Signed-off-by: Peter Zhu <zhujiaxi@amazon.com> Co-authored-by: Peter Zhu <zhujiaxi@amazon.com> * [Maintenance] Fix bc-fips jar hell by marking dependency as compileOnly (#5158) * [maintenance] Fix bc-fips jar hell by marking dependency as compileOnly Signed-off-by: Jialiang Liang <jiallian@amazon.com> * add fix for tests Signed-off-by: Jialiang Liang <jiallian@amazon.com> --------- Signed-off-by: Jialiang Liang <jiallian@amazon.com> * [Feature] PPL Command: MvExpand (#5144) * MvExpand new PR - After resolving the merge issues Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> # Conflicts: # core/src/main/java/org/opensearch/sql/analysis/Analyzer.java # core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java # core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java # docs/category.json # docs/user/ppl/index.md # integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java # integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java # integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java # integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java # ppl/src/main/antlr/OpenSearchPPLParser.g4 # ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java # ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java # ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java * Address coderrabbit comments Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address coderrabbit comments Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * merge main to my branch Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address corerabbit suggestions Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address corerabbit suggestions Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * ci: trigger Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * ci: trigger Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address corerabbit suggestions Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * ci: trigger Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Address comments. Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * Change the exception from illegalarg to SyntaxCheckException Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * change the exception message for consistency Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * ci: rerun Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> --------- Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> Co-authored-by: Srikanth Padakanti <srikanth_padakanti@apple.com> * bump antlr version to 4.13.2 (#5159) * Address comments Signed-off-by: Heng Qian <qianheng@amazon.com> --------- Signed-off-by: Heng Qian <qianheng@amazon.com> Signed-off-by: opensearch-ci-bot <opensearch-infra@amazon.com> Signed-off-by: Simeon Widdis <sawiddis@amazon.com> Signed-off-by: Chen Dai <daichen@amazon.com> Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com> Signed-off-by: Peng Huo <penghuo@gmail.com> Signed-off-by: Craig Perkins <cwperx@amazon.com> Signed-off-by: Peter Zhu <zhujiaxi@amazon.com> Signed-off-by: Jialiang Liang <jiallian@amazon.com> Co-authored-by: opensearch-trigger-bot[bot] <98922864+opensearch-trigger-bot[bot]@users.noreply.github.com> Co-authored-by: opensearch-ci-bot <opensearch-infra@amazon.com> Co-authored-by: Simeon Widdis <sawiddis@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Chen Dai <daichen@amazon.com> Co-authored-by: Srikanth Padakanti <srikanth29.9@gmail.com> Co-authored-by: Srikanth Padakanti <srikanth_padakanti@apple.com> Co-authored-by: Peng Huo <penghuo@gmail.com> Co-authored-by: Craig Perkins <cwperx@amazon.com> Co-authored-by: Peter Zhu <zhujiaxi@amazon.com> Co-authored-by: Jialiang Liang <jiallian@amazon.com> Co-authored-by: Eric Wei <menwe@amazon.com>

Description
As a follow-up after #5139, this PR implements the auto-extract mode for the
spathcommand per the proposal in #4307. Whenpathparameter is omitted,spathrewrites tojson_extract_all(input)function and returns amap<string, string>with flattened keys instead of expanding dynamic columns.Minor differences from proposal:
Related Issues
Resolves #4307 (partially, ergonomic map access syntax are follow-ups)
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.