fix(opencode): honor --since/--until in SQLite loader and JSON dump scan by justi · Pull Request #1188 · ccusage/ccusage · GitHub
Skip to content

fix(opencode): honor --since/--until in SQLite loader and JSON dump scan#1188

Open
justi wants to merge 1 commit into
ccusage:mainfrom
justi:fix/opencode-since-until-adapter
Open

fix(opencode): honor --since/--until in SQLite loader and JSON dump scan#1188
justi wants to merge 1 commit into
ccusage:mainfrom
justi:fix/opencode-since-until-adapter

Conversation

@justi

@justi justi commented May 29, 2026

Copy link
Copy Markdown

Summary

ccusage opencode (and the aggregate ccusage) reads every row from the OpenCode SQLite message table on every invocation, regardless of --since / --until, and then re-reads every JSON file under storage/message/*.json. On long-lived OpenCode installs (especially those migrated from the pre-SQLite layout) the loader can hang for minutes or appear to lock up.

SharedArgs.since / SharedArgs.until were already plumbed into the adapter (ccusage-cli/src/types.rs:34-35), but the loader did not use them.

Reproduction (local data, before this PR)

On main @ 9d90e1b, opencode storage with:

  • opencode.db: 33.6 GB, 392,340 rows in message
  • storage/message/: 118,619 JSON files (legacy pre-SQLite dump, ~2.0 GB)
$ ccusage opencode --since 2026-05-04 --until 2026-05-10
# never returns; the loader scans 392k DB rows + parses 118k JSON files

Raw SQL with WHERE time_created BETWEEN ? AND ? against the same DB returns in <50 ms.

Fix

rust/crates/ccusage/src/adapter/opencode/loader.rs:

  1. SQLite query uses the existing index. The message table already has CREATE INDEX message_session_time_created_id_idx ON message (session_id, time_created, id). When shared.since / shared.until are set, the prepared statement now adds WHERE time_created >= ?1 AND time_created < ?2 (or open-ended variants).

  2. storage/message/*.json loop skips by mtime. Files outside the inferred range are skipped via fs::metadata(...).modified() before any read/parse.

  3. Slack. since/until are inflated by -1 / +2 days before being applied, so the existing string-based summary filter (summary.rs:265-271) remains authoritative; the SQL/mtime checks only short-circuit work that the summary would have discarded anyway. This absorbs timezone offsets and any FAT32-style 2-second mtime rounding.

Performance on local data

Variant Runtime (--since 2026-05-04 --until 2026-05-10)
main @ 9d90e1b did not return
this PR, SQL WHERE only 186.68 s
this PR, SQL WHERE + mtime-skip 9.57 s

Loaded tokens are identical across variants and match a hand-written sqlite3 query against the same DB: 414,644 input / 4,646 output.

Cross-OS compatibility

Runtime: std::fs::metadata().modified() works on macOS APFS/HFS+, Linux ext4/btrfs/xfs, and Windows NTFS. FAT32's 2-second mtime granularity is covered by the slack.

Tests: new tests use the filetime crate (FileTime::from_unix_time) which maps to utimensat (Linux), setattrlist (macOS), and SetFileTime (Windows). Same code path runs on all three CI targets.

Tests added

4 new tests in adapter::opencode::loader::tests:

  • since_filter_drops_db_rows_older_than_lower_bound
  • until_filter_drops_db_rows_at_or_after_upper_bound
  • since_filter_skips_json_files_with_older_mtime
  • no_since_until_keeps_all_json_files_regardless_of_mtime

All 285 workspace tests pass; cargo clippy --workspace --all-targets -- -D warnings is clean; cargo fmt --all -- --check is clean.

Why this is a fresh narrow PR

The same user-visible gap was reported in #801, requested in #867, and previously fixed against the now-removed TypeScript @ccusage/opencode package in #960 (closed with: "If a gap remains, it needs a fresh narrow PR against main"). This PR is that narrow PR against the current Rust adapter. No CLI surface, docs, or configuration schema changes.


Summary by cubic

Make ccusage opencode honor --since/--until for both SQLite and JSON, avoiding full scans and cutting runtime on large installs from minutes to seconds.

  • Bug Fixes
    • SQLite: push down indexed WHERE time_created bounds (open-ended variants) with bound params; fall back to an unfiltered scan if the column is missing, then apply a per-row date check.
    • JSON: extract time.created from raw content and skip out-of-range files before full parse; works with pretty-printed files and fails open to a full parse if extraction fails.
    • Safety: widen bounds by -1/+2 days so the summary-time filter stays authoritative and tolerates TZ drift.
    • Tests: add cases for SQL bounds, fallback path, and JSON content filtering.

Written for commit 24b7abe. Summary will update on new commits.

Review in cubic

Summary by CodeRabbit

  • New Features

    • Added optional date-range filtering for data loading operations.
    • Implemented SQL-level query optimization for filtered database access.
  • Improvements

    • Enhanced timestamp extraction from JSON data with fallback support.
    • Improved schema compatibility handling with automatic fallback scanning.
  • Tests

    • Extended test coverage for date-range filtering across multiple scenarios.

@github-actions

Copy link
Copy Markdown
Contributor

This PR was auto-closed. Only contributors approved with lgtm can open PRs. Open an issue first.

Maintainers review auto-closed issues and reopen worthwhile ones. Issues that do not meet the quality bar in CONTRIBUTING.md may not be reopened or receive a reply.

If a maintainer replies lgtmi, your future issues will stay open. If a maintainer replies lgtm, your future issues and PRs will stay open.

See CONTRIBUTING.md.

@github-actions github-actions Bot closed this May 29, 2026
@coderabbitai

coderabbitai Bot commented May 29, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 05e43c95-440b-4202-bbf9-e3f73a709a75

📥 Commits

Reviewing files that changed from the base of the PR and between a98e374 and 24b7abe.

📒 Files selected for processing (1)
  • rust/crates/ccusage/src/adapter/opencode/loader.rs

📝 Walkthrough

Walkthrough

Adds date-range filtering to the OpenCode loader in loader.rs. For SQLite loads, a slack-widened SQL WHERE time_created clause is injected when since/until are set, with a fallback to an unfiltered scan on prepare failure. For JSON file loads, time.created is extracted from raw bytes before full deserialization. New helpers handle YYYYMMDD parsing, slack-bound computation, and authoritative range evaluation. Tests cover all paths.

Changes

OpenCode Loader Date-Range Filtering

Layer / File(s) Summary
Filtering helpers and constants
rust/crates/ccusage/src/adapter/opencode/loader.rs
Imports extended with date/time utilities; adds MS_PER_DAY, parse_yyyymmdd_to_millis, since_until_ms_with_slack, extract_message_timestamp, and timestamp_within_range as foundational primitives.
Conditional SQL pushdown in load_entries_from_database
rust/crates/ccusage/src/adapter/opencode/loader.rs
Replaces unconditional SELECT with a WHERE time_created slack-windowed query when since/until are present; falls back to unfiltered scan on prepare failure; adds per-row payload timestamp check in the iteration loop.
JSON file pre-parse fast path in read_message_file
rust/crates/ccusage/src/adapter/opencode/loader.rs
Extracts time.created from raw JSON bytes before full deserialization and returns None early for out-of-range messages; extraction failures fall through to the full parse.
Tests: DB filtering, fallback, and JSON extraction
rust/crates/ccusage/src/adapter/opencode/loader.rs
Adds create_db_message_with_time helper and new test cases for DB row exclusion by since/until, prepare-failure fallback with in-loop enforcement, JSON exclusion/inclusion by date bounds, and raw timestamp extraction from minified and pretty-printed JSON.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

  • ccusage/ccusage#1332: Directly modifies read_message_file and surrounding JSON load flow in the same loader.rs file, making its Option-returning behavior and skip logic intertwined with this PR's timestamp-based early return.

Poem

🐇 Hippity-hop through timestamps I go,
Checking each message's time.created glow.
SQL WHERE clauses pushed down with slack,
Out-of-range JSON? Tossed right on back!
With millis and YYYYMMDD strings aligned,
Only in-range entries shall be returned, refined. 🗓️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 56.52% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately captures the main change: fixing the OpenCode adapter to respect --since/--until flags in both SQLite loader and JSON file scanning.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@ryoppippi

Copy link
Copy Markdown
Member

Thanks for the careful writeup — the motivation is solid and the SQL path is well reasoned. One correctness concern and a couple of robustness/nit items before this is safe to merge.

Context

The authoritative date filter is filter_loaded_entries_by_date (adapter/claude/mod.rs:121), which runs after load and filters on each entry's logical date (time.created converted to TZ). So the new loader filtering is purely a load-time optimization, and the only correctness requirement is: the loader must never drop an entry that filter_loaded_entries_by_date would keep. Dropping extra rows just makes it slower; dropping kept rows silently under-counts.

✅ SQL WHERE time_created — looks correct

The -1 day / +2 day slack absorbs TZ offset (max ±14h < 1 day) and any small drift between the time_created column and JSON data.time.created, so it's always wider than the logical-date filter. No risk of dropping kept rows, and it rides the existing index. 👍

⚠️ JSON mtime skip on the --until side — silent under-count risk

File mtime is not guaranteed to match the message's logical time. cp -r, rsync without -a, backup restore, and cloud sync all reset mtime to "now". That's exactly the "legacy pre-SQLite dump migrated to a new machine" scenario this PR targets.

  • --since (skip when modified_ms < min): if mtime is reset newer than real, the file is not skipped → no loss. Reasonably safe.
  • --until (skip when modified_ms >= max): if a legacy file's mtime is reset to "now" but its logical date is in the past, an --until <past date> query will skip the file even though its logical date is in range → silent under-count.

"The summary filter is authoritative" does not protect this path, because the skip happens before load, so dropped rows never reach the authoritative filter. The new tests set mtime explicitly and so don't exercise the mtime-vs-logical-time divergence.

Suggested options:

  • Drop the --until mtime skip (the --since skip alone should capture most of the 186s → 9.57s win; legacy data is old and rarely hits the upper bound anyway), or
  • Make the mtime skip explicitly opt-in/heuristic and document that --until may under-count on environments where files were copied without preserving timestamps.

⚠️ DB schema without time_created — silent empty DB results

If an older/variant OpenCode schema lacks time_created, connection.prepare(sql) fails and the else branch returns Vec::new(). That means DB entries silently disappear only when --since/--until is set, while the no-filter path still works — an asymmetric regression. Consider falling back to the unfiltered query on prepare failure.

Nits

  • The JSON loop calls fs::metadata(&file) twice per file (once for since, once for until). With 118k files that's a lot of extra stats — fold into a single metadata call.
  • The two mtime blocks are near-duplicates; a small helper would read better.
  • filetime as a dev-dependency only is fine.

Net: SQL path is good to go; the --until mtime skip and the prepare-failure fallback are the two things I'd want addressed first.

@ryoppippi

Copy link
Copy Markdown
Member

@justi heads up — #1220 (now closed) tackled the same issue with a different JSON-side strategy that I think is worth pulling into this PR.

Instead of skipping JSON files by file mtime, it extracts the real time.created millis straight from the raw JSON string and runs the exact same date check the authoritative filter_loaded_entries_by_date uses (format_date_tz → compare against since/until). That sidesteps the mtime concern I raised above entirely: it's exact, needs no -1/+2 day slack, and if extraction fails it falls through to a full parse (fails open), so it can never drop a row the authoritative filter would keep.

Your SQLite WHERE time_created push-down is the better half — it's the part that actually uses the index and gives the big speedup, which #1220 lacked. So the ideal is the combination:

That keeps the speedup while removing the silent under-count risk. Closing #1220 so this stays the single canonical PR — would you be up for folding that JSON approach in here? Happy to point you at the exact extract_message_timestamp / timestamp_within_range helpers from that PR.

@ryoppippi ryoppippi closed this Jun 15, 2026
@justi

justi commented Jun 17, 2026

Copy link
Copy Markdown
Author

Following up on your suggestion to fold #1220's JSON strategy into this PR.

I've combined both halves as you outlined:

  • SQLite: kept the indexed WHERE time_created push-down with the -1d/+2d slack (the real index speedup).
  • JSON files: replaced the mtime skip with feat(opencode): pre-filter messages by --since/--until in loader #1220's content-based time.created extraction, running the same format_date_tz check as the authoritative filter_loaded_entries_by_date and failing open to a full parse. This removes the mtime under-count risk you raised.

I also addressed the prepare-failure fallback: a schema without time_created now falls back to an unfiltered scan (the in-loop date check still applies) instead of returning no rows.

Verified on a real 33 GB / 395k-row opencode.db: the time_created column equals the payload time.created in all 395,289 rows (max delta 0 ms), so the push-down can't drop a row the authoritative filter would keep. A one-month --since/--until window returns correct totals; the full unit suite, clippy, and fmt are clean.

Would you be willing to reopen this (or drop an lgtm) so I can push the combined version?

@ryoppippi ryoppippi reopened this Jun 21, 2026
@ryoppippi

Copy link
Copy Markdown
Member

@justi plz! go ahead

Push date bounds into the OpenCode loader instead of reading every DB
row and JSON file on each invocation, which hangs on large installs.

- SQLite: indexed `WHERE time_created` push-down with -1d/+2d slack,
  falling back to an unfiltered scan when the schema lacks the column
  (the in-loop date check still applies, so no row is silently dropped).
- JSON files: extract `time.created` from the raw payload and apply the
  same `format_date_tz` check as the authoritative
  `filter_loaded_entries_by_date`, failing open to a full parse so no
  in-range row is ever dropped.

Combines the indexed SQL push-down from ccusage#1188 with the content-based
JSON extraction from ccusage#1220, per maintainer guidance.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@justi justi force-pushed the fix/opencode-since-until-adapter branch from e36d426 to 24b7abe Compare June 22, 2026 11:06
@justi

justi commented Jun 25, 2026

Copy link
Copy Markdown
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants