fix: JS/Phoenix false-positives in health-report (no_dead_code, boolean ?-suffix) by aspala · Pull Request #66 · num42/codeqa-action · GitHub

fix: JS/Phoenix false-positives in health-report (no_dead_code, boolean ?-suffix)#66

Merged: aspala merged 4 commits into main from fix/js-false-positives on Jun 10, 2026

@aspala aspala commented Jun 10, 2026

Member

What & why

Against Phoenix repos with JS assets, health-report produced a number of false positives in its recommended actions (discovered during the run against position-db, see #65). Three causes, three fixes:

1. no_dead_code_after_return flagged idiomatic JS early-return guards as dead code

As a pure cosine classifier, the behavior could not distinguish `if (!x) return;` (a guard) from genuinely unreachable code: the aggregate metric profiles are nearly identical. Sample tuning alone turned out to be empirically asymptotic (cosine -0.41 → -0.34 over 3 → 10 sample pairs, threshold never reached).

Fix: a new UnreachableCode file metric (feat commit) supplies the structural signal that cosine lacks. It measures whether code follows a terminal statement (return/raise/throw/break/continue) at the same or a deeper indent. Guards → 0, dead code → >0. Multi-line return (…) expressions are correctly recognized as continuations via bracket balance, not as dead code. After apply-scalars, mean_unreachable_after_terminal_ratio received the strongest negative weight (-1.97) in the behavior.

Additionally: 10 good/bad JS sample pairs (guard patterns) as a training base, and _excludes_languages now also blocks json/xml (data files have no returns).
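
The indent and bracket-balance rule described in this section can be sketched roughly as follows. This is a Python illustration written for this description, not the project's UnreachableCode implementation: the function names, the regex, and the choice of non-blank lines as the denominator are all assumptions.

```python
import re

# Terminal keywords as listed in the PR text.
TERMINAL = re.compile(r"^(return|raise|throw|break|continue)\b")


def indent_of(line: str) -> int:
    """Width of the leading whitespace (a tab counts as one column here)."""
    return len(line) - len(line.lstrip(" \t"))


def net_open_brackets(line: str) -> int:
    """Opening minus closing brackets. Brackets in strings/comments are NOT excluded."""
    return sum(line.count(c) for c in "([{") - sum(line.count(c) for c in ")]}")


def unreachable_after_terminal_ratio(source: str) -> float:
    """Share of non-blank lines that follow a terminal at the same or deeper indent.

    Assumption: the denominator is the number of non-blank lines.
    """
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    unreachable = 0
    terminal_indent = None  # indent of the last block-level terminal still in scope
    pending = 0             # unclosed brackets of a multi-line terminal expression
    for line in lines:
        stripped = line.strip()
        if pending > 0:
            # Still inside e.g. `return (` ... `);`: a continuation, not dead code.
            pending += net_open_brackets(stripped)
            continue
        indent = indent_of(line)
        if terminal_indent is not None:
            if indent >= terminal_indent:
                unreachable += 1  # sibling after the terminal: unreachable
                continue
            terminal_indent = None  # shallower indent: the block was left
        if TERMINAL.match(stripped):
            terminal_indent = indent
            pending = max(net_open_brackets(stripped), 0)
    return unreachable / len(lines)
```

A one-line guard such as `if (!x) return;` never starts with a terminal keyword, so this sketch does not treat it as a block-level terminal. A guard with a braced body is followed only by lines at a shallower indent. Both cases yield 0, while a statement placed directly after `return x;` at the same indent yields a value above 0.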

2. boolean_function_has_question_mark fired on JS

The ?-suffix is an Elixir/Ruby convention; JS uses isActive()/hasFoo(). Removed JavaScript from the _languages allowlist (the JS samples were deleted, so apply-languages picks that up automatically) and corrected the doc.
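
How an allowlist and a blocklist might gate a behavior per language can be sketched as below. The `Behavior` class and `applies_to` method are hypothetical names for illustration. Only the field names mirror the `_languages` / `_excludes_languages` keys mentioned in this PR, and treating an empty allowlist as "all languages" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    """Hypothetical stand-in for a behavior definition (not the codeqa schema)."""
    name: str
    languages: frozenset = frozenset()           # allowlist; empty = all (assumption)
    excludes_languages: frozenset = frozenset()  # blocklist, checked first

    def applies_to(self, language: str) -> bool:
        lang = language.lower()
        if lang in self.excludes_languages:
            return False
        return not self.languages or lang in self.languages


# Allowlist after this PR, per the commit message: elixir/python/ruby.
question_mark = Behavior(
    name="boolean_function_has_question_mark",
    languages=frozenset({"elixir", "python", "ruby"}),
)

# Only the two newly added blocklist entries are shown; the PR says "also",
# so the real list presumably contains more.
dead_code = Behavior(
    name="no_dead_code_after_return",
    excludes_languages=frozenset({"json", "xml"}),
)
```

With these definitions, `question_mark.applies_to("javascript")` is false while `dead_code.applies_to("javascript")` is still true, which matches the intent described above: the ?-suffix check no longer runs on JS, and the dead-code check no longer runs on data files.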

3. Recalibration

apply-scalars weaves the new metric into all behaviors and retrains the weights over the sample base. The touched YAMLs (consistency, error_handling, scope_and_assignment, type_and_value, variable_naming) reflect that: they are not deliberate behavior changes but the unavoidable consequence of a new global metric.

Verification (against position-db, full repo)

| | orig | now |
|---|---|---|
| no_dead_code JS FPs | 4 | 1 |
| boolean_function_has_question_mark on JS | 1 | 0 |
| force_graph.js (multi-line return) | | 0 |
| package.json (JSON) | | 0 |
  • ✅ 928 tests, 0 failures
  • mix credo --strict, no issues
  • ✅ New metric with 7 unit tests (guard/dead code/multi-line/edge cases)

Known limitation

One no_dead_code FP remains: sticky_offset.js. The new metric correctly returns 0 for this file, yet the behavior still narrowly fires because the additive cosine model lets many weak other metrics outvote the strong -1.97 signal. This is a model limitation, not a metric bug. The remaining FPs (sticky_offset cosine, registry_mass.ex boolean-?, name_contains_and) are tracked as follow-ups under #65.
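
One possible reading of this limitation, sketched in Python: in a cosine (dot-product) model, a metric whose value is 0 contributes nothing to the score, no matter how large its weight is, so the remaining metrics decide the sign. Apart from the -1.97 weight, all numbers below are invented for illustration. They are not taken from sticky_offset.js or from the trained behavior, and the real feature scaling is not shown in this PR.

```python
import math


def cosine(a, b):
    """Plain cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# First entry: weight of the unreachable-code scalar (-1.97, from the PR).
# Remaining entries: ten HYPOTHETICAL weak weights.
weights = [-1.97] + [-0.3] * 10

# HYPOTHETICAL file profile: unreachable ratio is 0 (a clean guard file),
# while the ten other metrics all have some nonzero value.
file_profile = [0.0] + [1.0] * 10

# Per-metric contributions to the dot product.
contributions = [w * x for w, x in zip(weights, file_profile)]
score = cosine(weights, file_profile)
```

Here `contributions[0]` is zero, so the strongest weight cannot pull the score up, and `score` ends up negative purely because of the ten weak terms.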

Refs #65

aspala added 3 commits June 10, 2026 18:11
Detects statements unreachable because they follow a terminal statement
(return/raise/throw/break/continue) within the same indentation scope.

Distinguishes genuine dead code from idiomatic early-return guards: a
guard's trailing code sits at a shallower indent (outside the block) and
is not flagged, while siblings at the same-or-deeper indent after a
terminal are. Lines ending with net-open brackets are treated as
multi-line expression continuations, not block-level terminals.

Line- and indent-based, language-agnostic across brace/keyword-delimited
languages. Gives the no_dead_code_after_return behavior the structural
signal cosine similarity on aggregate metrics cannot capture.

JS has no `?`-suffix predicate convention — `isActive()`/`hasFoo()` is
the idiom, not `active?()`. Removed the JS sample pair so apply-languages
drops javascript from the behavior's _languages allowlist (now
elixir/python/ruby), and corrected the doc string.

Also weaves the new unreachable_code metric scalars into this category
(side effect of apply-scalars).

JS early-return guards were flagged as dead code: as a pure cosine
classifier the behavior could not tell a guard from genuine unreachable
code (near-identical aggregate profiles). Two changes:

- 10 good/bad JS sample pairs showcasing guard patterns (DOM hooks,
  listeners, utils, async) as positive samples, so apply-scalars learns
  the new unreachable_code metric weight (mean_unreachable_after_terminal_ratio
  -1.97, the strongest negative scalar in the behavior).
- _excludes_languages now also blocks json/xml — data files have no
  returns and were nonsensically flagged.

Recalibration (apply-scalars) re-weaves the unreachable_code scalars
across all behaviors; the touched YAMLs reflect that, not behavior
changes. Verified against position-db: no_dead_code JS false-positives
4 -> 1, force_graph.js (multi-line return) and package.json (JSON)
eliminated.

Refs #65

github-actions Bot commented Jun 10, 2026

Contributor

Score: C+ → C+ | Δ -1 pts | 10 blocks flagged across 0 files | 0 modified, 0 added

🟠 Code Health: C+ (63/100)

193 files · codeqa-action · 2026-06-10

Combined metric scores use cosine similarity: +1 = metric profile perfectly matches healthy pattern for this behavior, 0 = no signal, −1 = anti-pattern detected. Mapped to 0–100 using breakpoints (approx: ≥0.5→A, ≥0.2→B, ≥0.0→C, ≥−0.3→D, <−0.3→F); actual letter grades use the full 15-step scale.
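
The approximate breakpoints in the legend above can be written down as a small lookup. This Python sketch covers only the coarse five-letter approximation. The 15-step scale used for the actual grades (with values such as D- or E+) is not specified here and is not reproduced. `BREAKPOINTS` and `coarse_grade` are illustrative names.

```python
# (lower bound on cosine, coarse letter), checked from best to worst.
BREAKPOINTS = [(0.5, "A"), (0.2, "B"), (0.0, "C"), (-0.3, "D")]


def coarse_grade(cosine: float) -> str:
    """Map a cosine similarity in [-1, 1] to the legend's approximate letter."""
    for lower_bound, letter in BREAKPOINTS:
        if cosine >= lower_bound:
            return letter
    return "F"  # anything below -0.3
```

For example, `coarse_grade(0.26)` gives "B" and `coarse_grade(-0.9)` gives "F".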

Metric Changes

| Category | Base | Head | Δ |
|---|---|---|---|
| Readability | 88.55 | 97.84 | +9.28 |
| Complexity | 30.50 | 41.46 | +10.95 |
| Duplication | 0.57 | 0.59 | +0.02 |
| Structure | 6.22 | 9.16 | +2.94 |
[Chart: "Code Health Scores", bar chart of score (0 to 100) per category; the same values are listed below]
Readability       ███████████████████░   94  🟢 A
Complexity        ██████░░░░░░░░░░░░░░   30  🔴 D-
Structure         ██████████████████░░   88  🟢 A-
Duplication       ██████████░░░░░░░░░░   48  🟠 C-
Naming            ███████████████████░   96  🟢 A
Magic Numbers     ████████████████████  100  🟢 A
Combined Metrics  █████████████░░░░░░░   65  🔴 D

github-actions Bot commented Jun 10, 2026

Contributor
🔍 Top Likely Issues (cosine similarity)

Most negative cosine = file's metric profile best matches this anti-pattern.

| Behavior | Cosine | Score |
|---|---|---|
| dependencies.low_coupling | -0.56 | -12.56 |
| file_structure.single_responsibility | -0.52 | -12.34 |
| file_structure.line_count_under_300 | -0.44 | -9.44 |
| code_smells.no_dead_code_after_return | -0.41 | -22.74 |
| scope_and_assignment.shadowed_by_inner_scope | -0.34 | -4.94 |
| file_structure.line_length_under_120 | -0.30 | -8.32 |
| variable_naming.loop_var_is_single_letter | -0.23 | 3.53 |
| type_and_value.no_implicit_null_initial | -0.21 | -14.36 |
| variable_naming.name_contains_and | -0.21 | -36.24 |
| variable_naming.name_contains_type_suffix | -0.20 | -1.55 |
🟢 Readability — A (94/100)

Codebase averages: flesch_adapted=97.84, fog_adapted=4.82, avg_tokens_per_line=9.54, avg_line_length=35.75

Metric Value Score
readability.flesch_adapted 97.84 100
readability.fog_adapted 4.82 100
readability.avg_tokens_per_line 9.54 72
readability.avg_line_length 35.75 100
🔴 Complexity — D- (30/100)

Codebase averages: difficulty=41.46, effort=237270.59, volume=4072.01, estimated_bugs=1.36

Metric Value Score
halstead.difficulty 41.46 41
halstead.effort 237270.59 0
halstead.volume 4072.01 46
halstead.estimated_bugs 1.36 46
🟢 Structure — A- (88/100)

Codebase averages: branching_density=0.14, mean_depth=3.86, avg_function_lines=8.31, max_depth=9.22, max_function_lines=19.97, variance=6.84, avg_param_count=1.16, max_param_count=2.06

Metric Value Score
branching.branching_density 0.14 76
indentation.mean_depth 3.86 88
function_metrics.avg_function_lines 8.31 89
indentation.max_depth 9.22 87
function_metrics.max_function_lines 19.97 100
indentation.variance 6.84 100
function_metrics.avg_param_count 1.16 100
function_metrics.max_param_count 2.06 100
🟠 Duplication — C- (48/100)

Codebase averages: redundancy=0.59, bigram_repetition_rate=0.54, trigram_repetition_rate=0.37

Metric Value Score
compression.redundancy 0.59 58
ngram.bigram_repetition_rate 0.54 38
ngram.trigram_repetition_rate 0.37 40
🟢 Naming — A (96/100)

Codebase averages: entropy=0.89, mean=6.64, variance=18.82, avg_sub_words_per_id=1.17

Metric Value Score
casing_entropy.entropy 0.89 100
identifier_length_variance.mean 6.64 100
identifier_length_variance.variance 18.82 85
readability.avg_sub_words_per_id 1.17 100
🟢 Magic Numbers — A (100/100)

Codebase averages: density=0.00

Metric Value Score
magic_number_density.density 0.00 100
🔴 Combined Metrics — D (65/100)
| Category | Score | Grade |
|---|---|---|
| Code Smells | 25 | 🔴 D- |
| Consistency | 81 | 🟡 B+ |
| Dependencies | 19 | 🔴 E+ |
| Documentation | 83 | 🟡 B+ |
| Error Handling | 91 | 🟢 A- |
| File Structure | 48 | 🟠 C- |
| Function Design | 81 | 🟡 B+ |
| Naming Conventions | 90 | 🟢 A- |
| Scope And Assignment | 28 | 🔴 D- |
| Testing | 83 | 🟡 B+ |
| Type And Value | 89 | 🟢 A- |
| Variable Naming | 74 | 🟡 B |
🔴 Code Smells — D- (25/100)

Cosine similarity scores for 1 behaviors.

Behavior Cosine Score Grade
no_dead_code_after_return -0.41 25 D-
🟡 Consistency — B+ (81/100)

Cosine similarity scores for 1 behaviors.

Behavior Cosine Score Grade
consistent_function_style 0.36 81 B+
🔴 Dependencies — E+ (19/100)

Cosine similarity scores for 1 behaviors.

Behavior Cosine Score Grade
low_coupling -0.56 19 E+
🟡 Documentation — B+ (83/100)

Cosine similarity scores for 3 behaviors.

Behavior Cosine Score Grade
file_has_module_docstring 0.30 77 B
function_has_docstring 0.45 86 A-
docstring_is_nonempty 0.45 87 A-
🟢 Error Handling — A- (91/100)

Cosine similarity scores for 3 behaviors.

Behavior Cosine Score Grade
error_message_is_descriptive 0.45 87 A-
does_not_swallow_errors 0.60 92 A-
returns_typed_error 0.69 94 A
🟠 File Structure — C- (48/100)

Cosine similarity scores for 5 behaviors.

Behavior Cosine Score Grade
single_responsibility -0.52 21 E+
line_count_under_300 -0.44 24 E+
line_length_under_120 -0.30 30 D-
has_consistent_indentation 0.26 74 B
no_magic_numbers 0.57 91 A-
🟡 Function Design — B+ (81/100)

Cosine similarity scores for 3 behaviors.

Behavior Cosine Score Grade
is_less_than_20_lines 0.33 79 B+
no_magic_numbers 0.38 82 B+
has_verb_in_name 0.40 83 B+
🟢 Naming Conventions — A- (90/100)

Cosine similarity scores for 1 behaviors.

Behavior Cosine Score Grade
function_name_is_not_single_word 0.50 90 A-
🔴 Scope And Assignment — D- (28/100)

Cosine similarity scores for 1 behaviors.

Behavior Cosine Score Grade
shadowed_by_inner_scope -0.34 28 D-
🟡 Testing — B+ (83/100)

Cosine similarity scores for 2 behaviors.

Behavior Cosine Score Grade
test_single_concept 0.27 74 B
test_name_describes_behavior 0.53 91 A-
🟢 Type And Value — A- (89/100)

Cosine similarity scores for 1 behaviors.

Behavior Cosine Score Grade
hardcoded_url_or_path 0.49 89 A-
🟡 Variable Naming — B (74/100)

Cosine similarity scores for 1 behaviors.

Behavior Cosine Score Grade
name_is_generic 0.26 74 B

@github-actions

Contributor

kind: refactoring-tasks
path: /home/runner/work/codeqa-action/codeqa-action
timestamp: 2026-06-10T16:14:55.433355Z
overall_grade: C+
overall_score: 63
task_count: 0
critical: 0
high: 0
instructions: >-
  Address the tasks below in order of severity (critical first).
  After each fix, run the project's test suite and confirm it passes
  before moving on.

No critical or high-severity blocks need attention. ✅

@aspala aspala merged commit 3f024ce into main Jun 10, 2026
8 checks passed
@aspala aspala deleted the fix/js-false-positives branch June 10, 2026 19:50