fix(codex): dedupe archived session files #1176
This PR was auto-closed. Only contributors approved with Maintainers review auto-closed issues and reopen worthwhile ones. Issues that do not meet the quality bar in CONTRIBUTING.md may not be reopened or receive a reply. If a maintainer replies See CONTRIBUTING.md. |
📝 Walkthrough
Extends Codex discovery and loading to support multiple CODEX_HOME roots and both
Changes: Codex Multi-Directory Session Loading
Sequence Diagram(s)
sequenceDiagram
participant codex_usage_sources as codex_usage_sources()
participant collect_groups as collect_deduped_codex_usage_files
participant loader as load_codex_events_from_sources
participant aggregator as load_groups_from_sources
participant fs as Filesystem
codex_usage_sources->>fs: discover sessions/ and archived_sessions/ under homes
codex_usage_sources->>collect_groups: build CodexUsageSource list
collect_groups->>fs: collect and sort files per source, compute dedupe keys
collect_groups->>loader: emit per-source deduped file groups
loader->>fs: read session files (serial/parallel)
loader->>loader: append group events and dedupe globally
loader->>aggregator: provide per-source aggregated groups (shared dedupe state)
aggregator->>aggregator: merge groups using shared dedupe shards
aggregator->>caller: return deduplicated aggregated groups
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
@yashau I have started the AI code review. It will take a few minutes to complete. |
|
✅ Actions performed: Full review triggered. |
|
Opened the requested contribution issue first: #1177. This closed PR can serve as the ready implementation reference if maintainers approve/reopen. |
2 issues found across 5 files
|
@yashau looks reasonable. thank you for this pr. |
|
@pullfrog review it. if we need to fix it just fix it |
✅ No new issues found.
Reviewed changes — Adds archived Codex session support so ccusage codex reads from both sessions/ and archived_sessions/, with file-level deduplication by relative JSONL path (active sessions/ wins) and shared file discovery across loader and aggregate paths.
- Discover `sessions/` and `archived_sessions/`: `codex_usage_paths_from_homes` now detects both directories; direct JSONL directory fallback is unchanged.
- Cross-directory file deduplication: `collect_deduped_codex_usage_files` deduplicates by relative session path; the first-seen (active) copy takes precedence.
- Shared file collection: `load_codex_events_from_directories` and `load_groups_from_directories` use the same helpers so `--json`, table output, and session reports all see the same file set.
- Simplified aggregate: removed `load_groups_from_directory_with_dedupe`; shared dedupe logic inlined into `load_groups_from_directories`.
- Documentation: guide and adapter README updated for archived sessions and active-session precedence.
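For readers following the review, the first-seen-wins rule described above can be sketched roughly as below. This is only an illustration under assumptions, not the PR's code: `dedupe_by_relative_path` and its input shape are hypothetical, and the real `collect_deduped_codex_usage_files` may discover and sort files differently.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Hypothetical helper (not from the PR): keep one file per relative path.
/// `roots` is ordered by precedence, e.g. `sessions/` before
/// `archived_sessions/`, and each root is paired with the files found under it.
pub fn dedupe_by_relative_path(roots: &[(PathBuf, Vec<PathBuf>)]) -> Vec<PathBuf> {
    let mut seen: HashSet<PathBuf> = HashSet::new();
    let mut kept = Vec::new();
    for (root, files) in roots {
        for file in files {
            // Key on the path relative to the root; fall back to the full
            // path if the file is unexpectedly outside the root.
            let relative = file
                .strip_prefix(root)
                .unwrap_or(file.as_path())
                .to_path_buf();
            // `insert` returns false when the key was already present,
            // so the first-seen (active) copy is the one that is kept.
            if seen.insert(relative) {
                kept.push(file.clone());
            }
        }
    }
    kept
}
```

Passing the active `sessions/` root first means an archived copy with the same relative path is skipped, while files that exist only under `archived_sessions/` are still kept.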
DeepSeek Pro (free via Pullfrog for OSS) | 𝕏
ccusage
@ccusage/ccusage-darwin-arm64
@ccusage/ccusage-darwin-x64
@ccusage/ccusage-linux-arm64
@ccusage/ccusage-linux-x64
@ccusage/ccusage-win32-arm64
@ccusage/ccusage-win32-x64
commit: |
|
@pullfrog check cubic's response |
|
Verified cubic's PR summary against the actual code diff (5 files, commit
Notably, cubic's summary is more accurate than CodeRabbit's, which incorrectly claims "CODEX_HOME accepts multiple comma-separated roots" as a new feature — that was pre-existing. Cubic restricts itself to only what this PR actually introduces. Task list (4/4 completed)
|
|
@pullfrog rebase main |
|
Rebase of Task list (5/5 completed)
|
Rebase the remaining ccusage#1176 behavior on top of the archived_sessions support that landed in ccusage#1230. Collect Codex JSONL files through scoped usage sources so an active sessions/ file wins over an archived_sessions/ file with the same relative path within the same CODEX_HOME. Keep different CODEX_HOME roots independent, and use PathBuf keys instead of lossy string conversion for relative paths.
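A minimal sketch of the home-scoped rule in this commit message, assuming a simplified source type. `UsageSource`, its `scope` field, and `collect_scoped` are hypothetical stand-ins for illustration; the PR's actual `CodexUsageSource` may carry different fields.

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Hypothetical stand-in for one discovered session directory.
pub struct UsageSource {
    /// Identifies the CODEX_HOME this directory belongs to (assumed to be an index).
    pub scope: usize,
    /// The `sessions/` or `archived_sessions/` directory itself.
    pub root: PathBuf,
    /// JSONL files found under `root`.
    pub files: Vec<PathBuf>,
}

/// Keeps the first file seen for each (scope, relative path) pair.
/// Sources are assumed to be ordered so `sessions/` precedes `archived_sessions/`.
pub fn collect_scoped(sources: &[UsageSource]) -> Vec<PathBuf> {
    let mut seen: HashSet<(usize, PathBuf)> = HashSet::new();
    let mut kept = Vec::new();
    for source in sources {
        for file in &source.files {
            // Files outside the source root are skipped in this sketch.
            if let Ok(relative) = file.strip_prefix(&source.root) {
                // The scope is part of the key, so equal relative paths in
                // different homes never shadow each other.
                if seen.insert((source.scope, relative.to_path_buf())) {
                    kept.push(file.clone());
                }
            }
        }
    }
    kept
}
```

Including the scope in the key is what keeps two CODEX_HOME roots independent: the same relative path under two homes yields two distinct keys, so both active copies survive.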
d29832d to 3327310
|
@coderabbitai @cubic-dev-ai please review the rebased update at 3327310.

This keeps #1176 scoped to the remaining value after #1230: file-level dedupe between sessions/ and archived_sessions/ within the same CODEX_HOME. It also addresses the prior cubic findings by scoping the dedupe key per CODEX_HOME and using PathBuf relative keys instead of lossy string conversion. |
@ryoppippi I have started the AI code review. It will take a few minutes to complete. |
✅ No new issues found.
Reviewed changes — Adds file-level deduplication between active and archived Codex sessions so the active sessions/ copy wins over archived_sessions/ for the same relative JSONL path within a CODEX_HOME, with different homes kept independent.
- `CodexUsageSource` and `CodexUsageFileGroup` types: scoped file discovery with `dedupe_scope` per home so deduplication is home-scoped.
- `collect_deduped_codex_usage_files`: deduplicates JSONL files by `(dedupe_scope, relative_path)` using `PathBuf` keys, handling non-UTF8 paths without lossy string conversion.
- Shared dedup across loader and aggregate: both `load_codex_events_from_sources` and `load_groups_from_sources` use the same collection logic for consistent results.
- Tests: file-level dedup tests in `paths`, event-level in `loader`, and aggregate-level in `aggregate`, covering active-wins, cross-home independence, and non-UTF8 paths.
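To illustrate why `PathBuf` keys matter for non-UTF8 paths, here is a small Unix-only sketch. The helper names `path_from_bytes` and `keys_collide` are hypothetical and do not appear in the PR. It shows that two distinct file names containing different invalid bytes become the same string under lossy conversion, while their `PathBuf` values stay distinct.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: builds an OsStr from raw bytes
use std::path::PathBuf;

/// Builds a path from raw bytes, which may be invalid UTF-8.
pub fn path_from_bytes(bytes: &[u8]) -> PathBuf {
    PathBuf::from(OsStr::from_bytes(bytes))
}

/// Returns `(lossy_keys_equal, pathbuf_keys_equal)` for two raw file names.
pub fn keys_collide(a: &[u8], b: &[u8]) -> (bool, bool) {
    let (pa, pb) = (path_from_bytes(a), path_from_bytes(b));
    // `to_string_lossy` replaces each invalid sequence with U+FFFD, so
    // different invalid bytes can map to the same string key.
    let lossy_equal = pa.to_string_lossy() == pb.to_string_lossy();
    // `PathBuf` equality compares the underlying OS strings by component.
    let path_equal = pa == pb;
    (lossy_equal, path_equal)
}
```

With string keys, `a\xff.jsonl` and `a\xfe.jsonl` would be treated as one file and one of them would be dropped by deduplication; with `PathBuf` keys they remain separate entries.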
## Summary

- Rebased #1176 on top of #1230
- Keep active sessions/ JSONL files when archived_sessions/ has the same relative path in the same CODEX_HOME
- Keep separate CODEX_HOME roots independent and use PathBuf keys for file dedupe
- Document active-session precedence

## Testing

- `direnv exec . cargo fmt --manifest-path rust/Cargo.toml -p ccusage`
- `direnv exec . cargo test --manifest-path rust/Cargo.toml -p ccusage adapter::codex`
- `direnv exec . cargo clippy --manifest-path rust/Cargo.toml -p ccusage --all-targets -- -D warnings`
- `direnv exec . cargo test --manifest-path rust/Cargo.toml -p ccusage`
- `direnv exec . pnpm run format`
- `direnv exec . cargo run --manifest-path rust/Cargo.toml -q -p ccusage --bin ccusage -- codex daily --json --offline`