[RFC] Support `multisearch` command for PPL · Issue #4348 · opensearch-project/sql · GitHub
Skip to content

[RFC] Support multisearch command for PPL #4348

Description

@ahkcs

Problem Statement

Users need a first-class way to execute multiple independent subsearches and view the combined results in strict time order. Achieving this today requires manual composition (e.g., append + explicit sort) and careful command selection, which is error-prone, harder to optimize, and easy to
misuse (e.g., mixing non-streaming operations that stall pipelines).

Current State

  • PPL provides append, which concatenates results sequentially in the order subsearches are written.
  • append command can only support one single subsearch
  • There is no built-in interleaving by timestamp; users must add a sort as a separate step.

Short-Term Goal

Current multisearch behavior:

  • Uses UNION ALL + ORDER BY @timestamp DESC approach for result combination
  • Supports all PPL commands in subsearches (no artificial streaming restrictions)
  • Provides timestamp-based interleaving when @timestamp field is available
  • Falls back to sequential concatenation when timestamp field is missing

Long-Term Goals

  • Provide a first-class generating command that:
    • Runs two or more subsearches and globally interleaves results by timestamp (DESC).
    • Implements concurrent subsearch execution similar to SPL's streaming architecture for improved performance.
    • Provides real-time result merging capabilities for live data scenarios.

Current Implementation

The multisearch command is now available with the following syntax:

source=<index> | multisearch [ <subsearch> ] [ <subsearch> ] ... [ <subsearch> ]

Key semantics:

  • Position: Can be used after a source command (not necessarily first in pipeline).
  • Cardinality: Requires ≥ 2 subsearches.
  • Command support: All PPL commands are now supported in subsearches.
  • Result combination: UNION ALL of all subsearch outputs, followed by ORDER BY @timestamp DESC.
  • Time field: Uses @timestamp by default for ordering results.

Examples

  -- Age group analysis with stats (now supported)
  source=accounts | multisearch
      [source=accounts | where age < 30 | eval age_group = 'young']
      [source=accounts | where age >= 30 | eval age_group = 'adult']
  | stats count by age_group
  -- Time interleaving across two indices
  source=logs | multisearch
      [source=service_logs | where category IN ('A', 'B')]
      [source=metrics_logs | where category IN ('E', 'F')]
  | head 5
  -- Complex aggregations in subsearches (now supported)
  source=data | multisearch
      [source=sales | stats sum(revenue) by region | eval type = 'sales']
      [source=costs | stats sum(expense) by region | eval type = 'costs']
  | sort region

Next Steps for Long-Term Implementation

  • Concurrent Execution Architecture: Investigate implementing SPL-style concurrent subsearch execution using Calcite's parallel query capabilities.
  • Real-time Merging: Explore streaming result merging for live data scenarios where results arrive continuously.
  • Performance Optimization: Benchmark current UNION ALL + SORT approach vs potential concurrent execution benefits.

Metadata

Metadata

Assignees

Labels

PPLPiped processing languageRFCRequest For Commentscalcitecalcite migration releatedv3.4.0

Type

No type

Fields

No fields configured for issues without a type.

Projects

Status
New
Status
Done
Status
Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions