Tags: num42/codeqa-action

fix(comments): sticky sentinel on agent-actions PR comment part (#73)

The agent-actions part (kind: refactoring-tasks) was rendered without a sticky-comment sentinel, so run.sh never found an existing comment and posted a new one on every workflow run, spamming PRs with duplicates. render_parts/2 now appends the positional sentinel to any part that does not already carry it, covering all views (:metrics, :actions, :both).
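
To illustrate the idea behind the fix, here is a minimal Python sketch of appending a positional sentinel to every comment part and looking an existing comment up by it. This is not the project's render_parts/2 or run.sh; the sentinel format and the function names `render_parts` and `find_existing` are assumptions made for illustration.

```python
# Hypothetical marker format; the real sentinel text is not given in the commit.
SENTINEL_TEMPLATE = "<!-- sticky-comment:{index} -->"


def render_parts(parts):
    """Return comment bodies, each carrying its positional sentinel exactly once."""
    rendered = []
    for index, body in enumerate(parts):
        sentinel = SENTINEL_TEMPLATE.format(index=index)
        # Append only when the part does not already carry the sentinel.
        if sentinel not in body:
            body = f"{body}\n{sentinel}"
        rendered.append(body)
    return rendered


def find_existing(comments, index):
    """Return the id of the comment holding the sentinel for `index`, or None."""
    sentinel = SENTINEL_TEMPLATE.format(index=index)
    for comment_id, body in comments.items():
        if sentinel in body:
            return comment_id
    return None
```

With the sentinel present on every part, a later run can update the matching comment instead of posting a new one.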

perf(block-impact): byte cap + node-level parallelism (42.5s→15.4s) (#67)

* perf(block-impact): skip LOO for oversized files + multi-path args

LOO is O(file_bytes) per node; a few large/generated files (lockfiles, bundled assets) dominate runtime. Add a byte cap (default 32KB, CLI --max-loo-file-bytes, YAML max_loo_file_bytes, nil disables). Files over the cap get no refactoring nodes but still flow into the codebase aggregate and grade. Telemetry-measured: assets sample 42.5s -> 25.8s.

The cap was being filtered out in Analyzer.do_analyze_codebase's Keyword.take before reaching BlockImpactAnalyzer; added it to the take list and a regression test at the analyze_codebase layer.

Also lands multi-path args: `health-report <path> [subpath ...]` restricts the walk to given subpaths (e.g. `. lib test` skips priv/assets/config) while git context stays anchored at <path>.

* perf(block-impact): parallelize per-node LOO across files in one pool

Per-node leave-one-out previously parallelized only over files (Task.async_stream), running a file's nodes serially. A few large files (hundreds of sub-nodes each) then grind single-threaded while other cores idle.

Split the work into three phases: prepare (tokenize/parse/index, per file), a single shared pool over every node of every file, and reconstruct (rebuild each tree, per file). Now the hundreds of nodes of one large file compete with all other nodes for the same worker pool.

The node tree is flattened into indexed work units and rebuilt from the results keyed by a globally-unique index path, prefixed with the file path. A file-local index collides across files, since every file has a top-level node at index 0, which let parallel completion order pick a different file's result for concurrency >= 2. Guard added: node results are bit-identical between workers: 1 and workers: 8.

Telemetry-measured on the assets sample: 25.8s -> 15.4s (and 42.5s -> 15.4s combined with the byte cap).

* perf(block-impact): bound LOO memory via Flow + slim per-node units

At ~3000 nodes the node pool drove peak memory to ~54GB and a hard slowdown cliff (240 files: 103s). Two causes, both fixed:

1. Each work unit carried its file's full node_ctx (content, tokens, cosines). Dispatching a unit to a worker copies the message, so the node_ctx was copied once PER NODE. Units now carry only the file key; the node_ctx lives in a per-file map captured once per Flow stage, giving O(stages) copies instead of O(nodes). Dropped the unused root_tokens from node_ctx while here.
2. The incremental aggregate and project languages were rebuilt per file over all file_results, which is O(files^2). Built once now and shared.

Phase B switched from Task.async_stream to Flow.from_enumerable with max_demand, bounding in-flight units per stage (backpressure).

Telemetry-measured at 240 files: 54GB -> 10.8GB peak, 103s -> 17s (6.1x). Output stays bit-identical (parallel == serial guard).

* perf(block-impact): stream work units into Flow instead of materializing

Phase B flat-mapped all prep units into one list before handing it to Flow, materializing the entire unit set (~3000 maps at 240 files) up front. Switched to Stream.flat_map so units are pulled lazily under Flow's max_demand backpressure.

Telemetry-measured at 240 files: peak memory 10.8GB -> 7.1GB. Output stays bit-identical (parallel == serial guard).
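
The index-path collision described in this commit can be sketched in Python. This is an illustration only: the project runs on Elixir with Flow, whereas this sketch uses a thread pool as a stand-in, and `flatten`, `score`, and `analyze` are hypothetical names. The point it shows is that prefixing each node's index path with its file path keeps result keys unique, so completion order cannot change the outcome.

```python
from concurrent.futures import ThreadPoolExecutor


def flatten(file_path, nodes, prefix=()):
    """Yield (key, node) work units; the key is (file_path, index_path)."""
    for i, node in enumerate(nodes):
        path = prefix + (i,)
        # Without file_path in the key, every file's first node would share key (0,).
        yield (file_path, path), node
        yield from flatten(file_path, node.get("children", []), path)


def score(unit):
    """Stand-in for the per-node leave-one-out computation."""
    key, node = unit
    return key, len(node["source"])


def analyze(files, workers):
    """Run every node of every file through one shared pool, keyed globally."""
    units = (u for path, nodes in files.items() for u in flatten(path, nodes))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(score, units))
```

Comparing `analyze(files, 1)` with `analyze(files, 8)` mirrors the serial-versus-parallel guard mentioned above.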

fix: JS/Phoenix false-positives in health-report (no_dead_code, boolean ?-suffix) (#66)

* feat(metrics): add UnreachableCode file metric

Detects statements unreachable because they follow a terminal statement (return/raise/throw/break/continue) within the same indentation scope. Distinguishes genuine dead code from idiomatic early-return guards: a guard's trailing code sits at a shallower indent (outside the block) and is not flagged, while siblings at the same-or-deeper indent after a terminal are. Lines ending with net-open brackets are treated as multi-line expression continuations, not block-level terminals. Line- and indent-based, language-agnostic across brace/keyword-delimited languages.

Gives the no_dead_code_after_return behavior the structural signal cosine similarity on aggregate metrics cannot capture.

* fix(function-design): disable boolean ?-suffix check for JavaScript

JS has no `?`-suffix predicate convention: `isActive()`/`hasFoo()` is the idiom, not `active?()`. Removed the JS sample pair so apply-languages drops javascript from the behavior's _languages allowlist (now elixir/python/ruby), and corrected the doc string. Also weaves the new unreachable_code metric scalars into this category (side effect of apply-scalars).

* fix(code-smells): cut no_dead_code_after_return JS false-positives

JS early-return guards were flagged as dead code: as a pure cosine classifier the behavior could not tell a guard from genuine unreachable code (near-identical aggregate profiles). Two changes:

- 10 good/bad JS sample pairs showcasing guard patterns (DOM hooks, listeners, utils, async) as positive samples, so apply-scalars learns the new unreachable_code metric weight (mean_unreachable_after_terminal_ratio -1.97, the strongest negative scalar in the behavior).
- _excludes_languages now also blocks json/xml: data files have no returns and were nonsensically flagged.

Recalibration (apply-scalars) re-weaves the unreachable_code scalars across all behaviors; the touched YAMLs reflect that, not behavior changes.

Verified against position-db: no_dead_code JS false-positives 4 -> 1, force_graph.js (multi-line return) and package.json (JSON) eliminated.

Refs #65
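
The line- and indent-based rule described for the UnreachableCode metric can be sketched as follows. This is a simplified Python illustration written from the commit text, not the project's implementation; the function names and the exact bracket handling are assumptions.

```python
import re

# Terminal statements named in the commit message.
TERMINAL = re.compile(r"^(return|raise|throw|break|continue)\b")
OPENERS, CLOSERS = "([{", ")]}"


def net_open(line):
    """Count of opening brackets minus closing brackets on one line."""
    return sum(line.count(c) for c in OPENERS) - sum(line.count(c) for c in CLOSERS)


def unreachable_lines(source):
    """Return 1-based numbers of lines that follow a terminal in the same scope."""
    flagged = []
    terminal_indent = None
    for number, raw in enumerate(source.splitlines(), start=1):
        stripped = raw.strip()
        if not stripped:
            continue
        indent = len(raw) - len(raw.lstrip())
        if terminal_indent is not None:
            if indent >= terminal_indent:
                # Same-or-deeper sibling after a terminal: unreachable.
                flagged.append(number)
                continue
            # Shallower indent: the terminal's block has ended.
            terminal_indent = None
        # A terminal ending with net-open brackets starts a multi-line
        # expression and is not treated as a block-level terminal.
        if TERMINAL.match(stripped) and net_open(stripped) <= 0:
            terminal_indent = indent
    return flagged
```

An early-return guard leaves its trailing code at a shallower indent, so nothing is flagged; a statement directly after `return` at the same indent is.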

perf(health-report): skip codebase metrics in base snapshot (#63)

After #61 removed leave-one-out from the base snapshot, an observed v1.8 run on a 714-file repo still spent ~6min there. Telemetry pinned it: the base snapshot was running run_codebase_metrics, dominated by near_duplicate_blocks_codebase, which is O(files^2)-ish (17.9s for 373 files alone). Like the LOO nodes, that output is never read: codebase metrics sit beside the aggregate, and the base snapshot feeds only Delta.compute, which reads ['codebase']['aggregate'].

Add a skip_codebase_metrics opt and set it for the base snapshot. The aggregate (all the delta needs) is still built from every file.

Proven identical: analyze_codebase(compute_nodes: false) with and without the skip yields a byte-equal ['codebase']['aggregate'] on position-db/lib; only the unread near_duplicate/similarity keys drop.

Measured locally against the real position-db (714 base files): the base snapshot's analyze phase dropped from ~6min to ~24s. 921 tests green.

perf(block-impact): byte-exact block slice enables subtractive LOO (#62)

The leave-one-out path rebuilt the file-minus-block from normalized structural tokens, which collapse inter-token whitespace, round indentation to 2-space units, and map non-ASCII to spaces. So the reconstructed string diverged from the source, and any subtractive metric (baseline minus block) computed against an original-file baseline diverged too. This was the latent separator_counts bug (0/50 matches), which had been worked around by removing its analyze_loo/2.

Fix the root cause: slice_without_original/2 cuts the block out of the original bytes using the first/last token's line+col, so block_content is the verbatim source span and reconstructed is the original file minus that span. Cuts fall on token boundaries, so counts subtract exactly.

- analyze_loo/2 contract takes a shared block FileContext (built once per node) instead of a raw string, so subtractive metrics extract identifiers/tokens via the same pipeline as the baseline.
- separator_counts and vowel_density go subtractive again, now correct: ~23ms and ~7ms per node drop to ~0.1ms. Verified by the subtractive_loo goldfile guard against real nested sample blocks.
- The big LOO costs (punctuation densities, halstead n1/n2, ngram, line-based) stay on the re-analyze fallback: they are set-based or context-dependent at block boundaries and cannot subtract exactly.

Measured on a 1/10 subset of a large repo via --telemetry: analyze_file per node 196ms -> 180ms; report output byte-identical except timestamp.
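
Why a byte-exact cut makes counting metrics subtract exactly can be shown with a small Python sketch. The helper names, the separator set, and the position convention (1-based line, 0-based byte column, end position just past the last token) are assumptions for illustration, not the project's slice_without_original/2.

```python
# Hypothetical separator set; the project's actual set is not listed in the commit.
SEPARATORS = (b",", b";", b":")


def byte_offset(source: bytes, line: int, col: int) -> int:
    """Convert a 1-based line and 0-based byte column to an absolute offset."""
    lines = source.split(b"\n")
    return sum(len(item) + 1 for item in lines[: line - 1]) + col


def slice_without_block(source: bytes, start, end):
    """Return (block, rest): the verbatim span and the source minus that span."""
    lo = byte_offset(source, *start)
    hi = byte_offset(source, *end)
    return source[lo:hi], source[:lo] + source[hi:]


def separator_count(data: bytes) -> int:
    """Count separator bytes in the given data."""
    return sum(data.count(sep) for sep in SEPARATORS)


source = b"a = 1;\nfoo(b, c);\nd = 2;\n"
block, rest = slice_without_block(source, (2, 0), (2, 10))
# Baseline minus block equals the count on the reconstructed file.
subtractive = separator_count(source) - separator_count(block)
```

Because `block` and `rest` partition the original bytes, the subtraction gives the same number as re-counting `rest`, without re-analyzing the file.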

perf(health-report): skip leave-one-out in base snapshot (#61)

The base snapshot feeds only the metric-changes delta, which reads codebase aggregates (Delta.compute) and never per-node block impact. It was running the full LOO over the entire base tree and throwing the nodes away, which was the dominant cost on large-repo PR runs.

Build the snapshot with compute_nodes: false (and drop node_paths). Aggregates are byte-identical between the compute_nodes branches, so the report output is unchanged; only the wasted work is gone.

Verified against a 1/10 subset of a large repo via --telemetry: nodes processed 744 -> 372 (base tree no longer re-analyzed for LOO), report output byte-identical except timestamp.

fix(action): scope health-report to the PR diff via base-ref (#60)

run.sh only passed base-ref to `compare`, so health-report always ran the block-impact leave-one-out over the whole codebase, which takes minutes on a large repo. Now health-report resolves a base ref too (explicit input, else the PR base branch) and passes --base-ref when one is available, scoping blocks to the diff. Standalone runs without a PR context fall through and analyze everything, unchanged.

Extracted the base-ref resolution shared by both commands into resolve_base_ref(). base-ref stays required for compare, optional for health-report.
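
The resolution order described above (explicit input, else the PR base branch, else nothing) can be sketched in Python. The real resolve_base_ref() lives in run.sh; this version, the `build_args` helper, and the argument layout are assumptions made for illustration. Only the `--base-ref` flag and the `compare`/`health-report` command names come from the commit text.

```python
def resolve_base_ref(explicit, pr_base):
    """Prefer the explicit input, then the PR base branch; None if neither is set."""
    for candidate in (explicit, pr_base):
        if candidate and candidate.strip():
            return candidate.strip()
    return None


def build_args(command, path, base_ref):
    """Assemble a hypothetical argument list for one command."""
    args = [command, path]
    if base_ref:
        args += ["--base-ref", base_ref]
    elif command == "compare":
        # base-ref stays required for compare.
        raise ValueError("compare requires a base ref")
    # health-report without a base ref falls through and analyzes everything.
    return args
```

In a GitHub Actions pull-request run, the `pr_base` value would typically come from the `GITHUB_BASE_REF` environment variable.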
