fix(codex): dedupe archived session files by yashau · Pull Request #1176 · ccusage/ccusage · GitHub

fix(codex): dedupe archived session files #1176

Merged: ryoppippi merged 1 commit into ccusage:main from yashau:codex/include-archived-sessions on Jun 8, 2026
Conversation

@yashau

@yashau yashau commented May 28, 2026

Contributor

## Summary

- Rebased #1176 on top of #1230
- Keep active sessions/ JSONL files when archived_sessions/ has the same relative path in the same CODEX_HOME
- Keep separate CODEX_HOME roots independent and use PathBuf keys for file dedupe
- Document active-session precedence

## Testing

- direnv exec . cargo fmt --manifest-path rust/Cargo.toml -p ccusage
- direnv exec . cargo test --manifest-path rust/Cargo.toml -p ccusage adapter::codex
- direnv exec . cargo clippy --manifest-path rust/Cargo.toml -p ccusage --all-targets -- -D warnings
- direnv exec . cargo test --manifest-path rust/Cargo.toml -p ccusage
- direnv exec . pnpm run format
- direnv exec . cargo run --manifest-path rust/Cargo.toml -q -p ccusage --bin ccusage -- codex daily --json --offline

@yashau

yashau commented May 28, 2026

Contributor Author

@github-actions

Contributor

This PR was auto-closed. Only contributors approved with lgtm can open PRs. Open an issue first.

Maintainers review auto-closed issues and reopen worthwhile ones. Issues that do not meet the quality bar in CONTRIBUTING.md may not be reopened or receive a reply.

If a maintainer replies lgtmi, your future issues will stay open. If a maintainer replies lgtm, your future issues and PRs will stay open.

See CONTRIBUTING.md.

@coderabbitai

coderabbitai Bot commented May 28, 2026

Walkthrough

Extends Codex discovery and loading to support multiple CODEX_HOME roots and both sessions/ and archived_sessions/, collecting usage files across sources and deduplicating by relative session path so active sessions/ entries take precedence.

Changes

Codex Multi-Directory Session Loading

| Layer | File(s) | Summary |
| --- | --- | --- |
| Documentation updates | docs/guide/codex/index.md, rust/crates/ccusage/src/adapter/codex/README.md | User guide and adapter README now document comma-separated CODEX_HOME roots, discovery of sessions/ and archived_sessions/, direct JSONL root handling, and deduplication precedence favoring sessions/ over archived_sessions/. |
| Path discovery and file collection | rust/crates/ccusage/src/adapter/codex/paths.rs | Adds codex_usage_sources and types CodexUsageSource/CodexUsageFileGroup; detects sessions/ and archived_sessions/ per home with fallback to the parent home when neither exists; implements collect_codex_usage_files and collect_deduped_codex_usage_files to produce sorted and deduped per-source file groups. Unit tests validate discovery, dedupe keys (including non-UTF8), and cross-home behaviors. |
| Event loading from multiple sources | rust/crates/ccusage/src/adapter/codex/loader.rs | Adds load_codex_events_from_sources used by load_codex_events_inner; when multiple sources exist it iterates deduped per-source file groups, reads session files (serial or parallel per single_thread), appends group events, and runs a global dedupe_codex_events. Tests assert active-vs-archived duplicate handling. |
| Group aggregation across sources | rust/crates/ccusage/src/adapter/codex/aggregate.rs | load_groups now uses codex_usage_sources() and delegates to load_groups_from_directory for single-source fast path or load_groups_from_sources for multi-source aggregation; shared dedupe shards are used to avoid double-counting across sources. Tests verify per-day token totals when duplicate relative session files exist. |
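The per-home discovery described in the "Path discovery and file collection" row can be sketched roughly as follows. This is an illustration only, not the code in paths.rs: the `UsageSource` struct and the `usage_sources_for_home` function are hypothetical names, and the real `CodexUsageSource` type may carry different fields.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, simplified stand-in for the PR's `CodexUsageSource`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UsageSource {
    /// Directory that will be scanned for JSONL files.
    pub root: PathBuf,
    /// Home this source belongs to (assumed here to be the dedupe scope).
    pub home: PathBuf,
}

/// Return the usage sources for one home, with the active directory first.
pub fn usage_sources_for_home(home: &Path) -> Vec<UsageSource> {
    let mut sources = Vec::new();
    // Order matters: `sessions` is listed before `archived_sessions`.
    for name in ["sessions", "archived_sessions"] {
        let candidate = home.join(name);
        if candidate.is_dir() {
            sources.push(UsageSource {
                root: candidate,
                home: home.to_path_buf(),
            });
        }
    }
    if sources.is_empty() {
        // Neither subdirectory exists: treat the home itself as a JSONL root.
        sources.push(UsageSource {
            root: home.to_path_buf(),
            home: home.to_path_buf(),
        });
    }
    sources
}
```

Calling `usage_sources_for_home` on a home that contains both subdirectories yields two sources with `sessions` first, which is what lets a later first-seen dedupe prefer the active copy.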

Sequence Diagram(s)

sequenceDiagram
  participant codex_usage_sources as codex_usage_sources()
  participant collect_groups as collect_deduped_codex_usage_files
  participant loader as load_codex_events_from_sources
  participant aggregator as load_groups_from_sources
  participant fs as Filesystem

  codex_usage_sources->>fs: discover sessions/ and archived_sessions/ under homes
  codex_usage_sources->>collect_groups: build CodexUsageSource list
  collect_groups->>fs: collect and sort files per source, compute dedupe keys
  collect_groups->>loader: emit per-source deduped file groups
  loader->>fs: read session files (serial/parallel)
  loader->>loader: append group events and dedupe globally
  loader->>aggregator: provide per-source aggregated groups (shared dedupe state)
  aggregator->>aggregator: merge groups using shared dedupe shards
  aggregator->>caller: return deduplicated aggregated groups

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

enhancement

Suggested reviewers

  • pullfrog

"I hopped through sessions, old and new,
Roots joined, duplicates trimmed to few;
Active files lead, archives step aside,
One tidy stream of JSONL pride. 🐇✨"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.04% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(codex): dedupe archived session files' accurately summarizes the main change: implementing deduplication logic for archived session files to prevent double-counting when both active and archived sessions exist. |


@github-actions github-actions Bot closed this May 28, 2026
@cubic-dev-ai

cubic-dev-ai Bot commented May 28, 2026

Contributor

@coderabbitai @cubic-dev-ai please review this PR.

@yashau I have started the AI code review. It will take a few minutes to complete.

@coderabbitai

coderabbitai Bot commented May 28, 2026


@yashau Sure! I'll kick off a full review of this PR right away.

✅ Actions performed

Full review triggered.

@yashau

yashau commented May 28, 2026

Contributor Author

Opened the requested contribution issue first: #1177. This closed PR can serve as the ready implementation reference if maintainers approve/reopen.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

2 issues found across 5 files

Comment thread rust/crates/ccusage/src/adapter/codex/paths.rs
Comment thread rust/crates/ccusage/src/adapter/codex/paths.rs Outdated
@ryoppippi ryoppippi reopened this May 29, 2026
@ryoppippi

Member

@yashau looks reasonable. thank you for this pr.
i'll take a look in a couple of days!

@ryoppippi

Member

@pullfrog review it. if we need to fix it just fix it

@pullfrog pullfrog Bot left a comment

Contributor

✅ No new issues found.

Reviewed changes — Adds archived Codex session support so ccusage codex reads from both sessions/ and archived_sessions/, with file-level deduplication by relative JSONL path (active sessions/ wins) and shared file discovery across loader and aggregate paths.

  • Discover sessions/ and archived_sessions/: codex_usage_paths_from_homes now detects both directories; direct JSONL directory fallback is unchanged.
  • Cross-directory file deduplication: collect_deduped_codex_usage_files deduplicates by relative session path; the first-seen (active) copy takes precedence.
  • Shared file collection: load_codex_events_from_directories and load_groups_from_directories use the same helpers so --json, table output, and session reports all see the same file set.
  • Simplified aggregate: removed load_groups_from_directory_with_dedupe; shared dedupe logic inlined into load_groups_from_directories.
  • Documentation: guide and adapter README updated for archived sessions and active-session precedence.

Pullfrog  | View workflow run | Using DeepSeek Pro (free via Pullfrog for OSS) | 𝕏

@pkg-pr-new

pkg-pr-new Bot commented Jun 6, 2026

ccusage

npx https://pkg.pr.new/ccusage@1176

@ccusage/ccusage-darwin-arm64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-arm64@1176

@ccusage/ccusage-darwin-x64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-x64@1176

@ccusage/ccusage-linux-arm64

npx https://pkg.pr.new/@ccusage/ccusage-linux-arm64@1176

@ccusage/ccusage-linux-x64

npx https://pkg.pr.new/@ccusage/ccusage-linux-x64@1176

@ccusage/ccusage-win32-arm64

npx https://pkg.pr.new/@ccusage/ccusage-win32-arm64@1176

@ccusage/ccusage-win32-x64

npx https://pkg.pr.new/@ccusage/ccusage-win32-x64@1176

commit: 3327310

@ryoppippi

Member

@pullfrog check cubic's response

@pullfrog

pullfrog Bot commented Jun 6, 2026

Contributor

Verified cubic's PR summary against the actual code diff (5 files, commit d29832d). All four claims are accurate:

  • sessions/ + archived_sessions/ discovery: codex_usage_paths_from_homes in paths.rs now checks for both subdirectories and pushes whichever exist.
  • Deduplication with sessions/ priority: collect_deduped_codex_usage_files deduplicates by relative JSONL path. Since sessions/ is pushed first by codex_usage_paths_from_homes, active copies always win in the dedup.
  • Direct JSONL dirs unchanged: When neither sessions/ nor archived_sessions/ exists as a subdirectory, the path falls through to the original direct JSONL path handling.
  • Shared file discovery: Both loader.rs (load_codex_events_from_directories) and aggregate.rs (load_groups_from_directories) use the same collect_deduped_codex_usage_files / collect_codex_usage_files helpers.
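The "pushed first, so it wins" behavior in the list above is a first-seen-wins dedupe keyed on the path relative to each source root. A minimal sketch of that idea, assuming a hypothetical `dedupe_first_seen` helper rather than the PR's actual `collect_deduped_codex_usage_files` signature:

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Keep the first file seen for each relative path.
///
/// `sources` is ordered by priority: each entry is a source root together
/// with the files found under it. Earlier sources win over later ones.
pub fn dedupe_first_seen(sources: &[(PathBuf, Vec<PathBuf>)]) -> Vec<PathBuf> {
    let mut seen: HashSet<PathBuf> = HashSet::new();
    let mut kept = Vec::new();
    for (root, files) in sources {
        for file in files {
            // Key on the path relative to the source root; a file that is
            // not under the root falls back to its full path as the key.
            let key = file
                .strip_prefix(root)
                .unwrap_or(file.as_path())
                .to_path_buf();
            // `insert` returns false when the key was already present.
            if seen.insert(key) {
                kept.push(file.clone());
            }
        }
    }
    kept
}
```

With `sessions/` listed before `archived_sessions/`, a file such as `2026/05/a.jsonl` present in both is kept only from `sessions/`, while files that exist only in the archive are still included.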

Notably, cubic's summary is more accurate than CodeRabbit's, which incorrectly claims "CODEX_HOME accepts multiple comma-separated roots" as a new feature — that was pre-existing. Cubic restricts itself to only what this PR actually introduces.

Task list (4/4 completed)
  • Read issue comments to understand what 'cubic's response' refers to
  • Checkout the PR branch and review the actual code changes
  • Compare cubic's summary against the actual changes
  • Report findings via pullfrog_report_progress

@ryoppippi

Member

@pullfrog rebase main

@pullfrog

pullfrog Bot commented Jun 6, 2026

Contributor

Rebase of codex/include-archived-sessions onto main succeeded locally — no conflicts, all prepush hooks passed (clippy, oxlint, treefmt, gitleaks, cargo test, vitest). However, the push was rejected because this is a fork PR (yashau/ccusage) and the GitHub App pushing to the fork doesn't have workflows permission to modify workflow files (the merge brought in changes to .github/workflows/approve-contributor.yaml from main). The branch cannot be updated remotely.

Task list (5/5 completed)
  • Setup: checkout PR branch, get PR info, fetch main
  • Attempt merge of main into branch
  • Resolve conflicts if present
  • Commit merge resolution, verify, and push
  • Report progress

Rebase the remaining ccusage#1176 behavior on top of the archived_sessions support that landed in ccusage#1230.

Collect Codex JSONL files through scoped usage sources so an active sessions/ file wins over an archived_sessions/ file with the same relative path within the same CODEX_HOME. Keep different CODEX_HOME roots independent, and use PathBuf keys instead of lossy string conversion for relative paths.
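Why PathBuf keys matter compared with lossy string conversion can be shown with two file names that differ only in an invalid UTF-8 byte. The sketch below is Unix-only (it uses `std::os::unix::ffi::OsStringExt`), and the helper names are hypothetical, not taken from the PR.

```rust
use std::collections::HashSet;
use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt;
use std::path::PathBuf;

/// Build two distinct file names that are not valid UTF-8 (Unix only).
pub fn non_utf8_names() -> (PathBuf, PathBuf) {
    // The names differ only in one invalid byte: 0xFF versus 0xFE.
    let a = PathBuf::from(OsString::from_vec(b"a\xFF.jsonl".to_vec()));
    let b = PathBuf::from(OsString::from_vec(b"a\xFE.jsonl".to_vec()));
    (a, b)
}

/// Count distinct keys when paths are converted with `to_string_lossy`.
pub fn count_lossy_keys(paths: &[PathBuf]) -> usize {
    paths
        .iter()
        .map(|p| p.to_string_lossy().into_owned())
        .collect::<HashSet<String>>()
        .len()
}

/// Count distinct keys when the `PathBuf` itself is the key.
pub fn count_pathbuf_keys(paths: &[PathBuf]) -> usize {
    paths.iter().cloned().collect::<HashSet<PathBuf>>().len()
}
```

Both invalid bytes become U+FFFD under `to_string_lossy`, so the two names collapse into one string key and one file would be dropped as a false duplicate; keyed by `PathBuf`, they stay distinct.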
@ryoppippi ryoppippi force-pushed the codex/include-archived-sessions branch from d29832d to 3327310 June 8, 2026 10:31
@ryoppippi

Member

@coderabbitai @cubic-dev-ai please review the rebased update at 3327310.

This keeps #1176 scoped to the remaining value after #1230: file-level dedupe between sessions/ and archived_sessions/ within the same CODEX_HOME. It also addresses the prior cubic findings by scoping the dedupe key per CODEX_HOME and using PathBuf relative keys instead of lossy string conversion.

@cubic-dev-ai

cubic-dev-ai Bot commented Jun 8, 2026

Contributor

@coderabbitai @cubic-dev-ai please review the rebased update at 3327310.

This keeps #1176 scoped to the remaining value after #1230: file-level dedupe between sessions/ and archived_sessions/ within the same CODEX_HOME. It also addresses the prior cubic findings by scoping the dedupe key per CODEX_HOME and using PathBuf relative keys instead of lossy string conversion.

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

No issues found across 5 files

@pullfrog pullfrog Bot left a comment

Contributor

✅ No new issues found.

Reviewed changes — Adds file-level deduplication between active and archived Codex sessions so the active sessions/ copy wins over archived_sessions/ for the same relative JSONL path within a CODEX_HOME, with different homes kept independent.

  • CodexUsageSource and CodexUsageFileGroup types — scoped file discovery with dedupe_scope per home so deduplication is home-scoped.
  • collect_deduped_codex_usage_files — deduplicates JSONL files by (dedupe_scope, relative_path) using PathBuf keys, handling non-UTF8 paths without lossy string conversion.
  • Shared dedup across loader and aggregate — both load_codex_events_from_sources and load_groups_from_sources use the same collection logic for consistent results.
  • Tests — file-level dedup tests in paths, event-level in loader, and aggregate-level in aggregate, covering active-wins, cross-home independence, and non-UTF8 paths.
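The home-scoped key from the list above, (dedupe_scope, relative_path), can be sketched as below. The `UsageFile` struct and both helpers are hypothetical illustrations; the field names mirror the review text, but the PR's real types may differ.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Hypothetical record for one discovered JSONL file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UsageFile {
    /// Scope for deduplication; assumed here to be the CODEX_HOME root.
    pub dedupe_scope: PathBuf,
    /// Path relative to the source directory (sessions/ or archived_sessions/).
    pub relative_path: PathBuf,
    /// Full path used to actually read the file.
    pub full_path: PathBuf,
}

/// Convenience constructor used for illustration only.
pub fn usage_file(home: &str, source_dir: &str, relative: &str) -> UsageFile {
    let home = PathBuf::from(home);
    UsageFile {
        full_path: home.join(source_dir).join(relative),
        dedupe_scope: home,
        relative_path: PathBuf::from(relative),
    }
}

/// Drop later files whose (scope, relative path) pair was already seen.
pub fn dedupe_scoped(files: Vec<UsageFile>) -> Vec<UsageFile> {
    let mut seen: HashSet<(PathBuf, PathBuf)> = HashSet::new();
    files
        .into_iter()
        .filter(|f| seen.insert((f.dedupe_scope.clone(), f.relative_path.clone())))
        .collect()
}
```

Because the scope is part of the key, the same relative path under two different homes is kept twice, while an archived copy inside the same home as an earlier active copy is dropped.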

@ryoppippi ryoppippi changed the title from "feat(codex): include archived sessions" to "fix(codex): dedupe archived session files" Jun 8, 2026
@ryoppippi ryoppippi merged commit 3c2e5ae into ccusage:main Jun 8, 2026
25 checks passed
2 participants