Fix and speed up check_submodules.sh by Algunenano · Pull Request #105053 · ClickHouse/ClickHouse · GitHub

Fix and speed up check_submodules.sh #105053

Merged
Algunenano merged 9 commits into
ClickHouse:master from
Algunenano:fix-check-submodules-recursive
May 20, 2026

@Algunenano Algunenano commented May 15, 2026

Member

The recursive-submodule check in `ci/jobs/scripts/check_style/check_submodules.sh` was unreachable: a preceding `[[ url != ... ]] && echo` line inside a `cmd | while` block caused the pipeline to return 1 on the last iteration when all URLs were valid, and `set -e` aborted the script before reaching the recursive check. The recursive check itself also only echoed without exiting, so it would not have failed CI even if reached.

The script now uses `while ... done < <(cmd)` and explicit `if ...; then echo; exit 1; fi`, so the first violation is reported and the script exits 1. As a side effect it is also ~27× faster (3.2s → 0.12s), because 131 per-submodule `git submodule status -q "$path"` calls are replaced with a single bulk `git submodule status -q`.
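
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual script: the `https://example.com/` prefix, the messages, and the variable names are illustrative assumptions. Each variant runs in its own `bash -c` so that `set -e` behaves as it would in a standalone script.

```shell
#!/usr/bin/env bash
# Old shape: `[[ ... ]] && echo` is the last command of a piped while loop.
# For a valid URL the test is false, so the loop, and with it the whole
# pipeline, returns 1 and `set -e` stops the script.
old_script='set -e
printf "%s\n" "$1" | while read -r url; do
    [[ "$url" != "https://example.com/"* ]] && echo "bad URL: $url"
done
echo "reached recursive check"'

# New shape: explicit `if` fed by process substitution. A false `if` returns 0,
# and a violation is reported and exits 1 in the main shell.
new_script='set -e
while read -r url; do
    if [[ "$url" != "https://example.com/"* ]]; then
        echo "bad URL: $url"
        exit 1
    fi
done < <(printf "%s\n" "$1")
echo "reached recursive check"'

old_status=0; new_status=0; bad_status=0
old_out=$(bash -c "$old_script" _ "https://example.com/a/b") || old_status=$?
new_out=$(bash -c "$new_script" _ "https://example.com/a/b") || new_status=$?
bad_out=$(bash -c "$new_script" _ "git://example.org/x")     || bad_status=$?

echo "old: status=$old_status output='$old_out'"
echo "new: status=$new_status output='$new_out'"
echo "bad: status=$bad_status output='$bad_out'"
```

With a valid URL, the old shape stops before the final `echo`, while the new shape reaches it. With an invalid URL, the new shape prints the violation and exits 1.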

Depends on #105052 — that PR removes the nested submodules currently sitting in contrib/silk/contrib. Without it, this check would fail on master.

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Version info

  • Merged into: 26.5.1.901

The recursive-submodule check was unreachable: a preceding `[[ url != ... ]] && echo`
inside a `cmd | while` block caused the pipeline to exit 1 on the last iteration
when all URLs were valid, and `set -e` aborted the script before reaching the
recursive check. The recursive check itself also only echoed without exiting.

Switch each loop to `while ... done < <(cmd)` and replace the `&&` shorthand
with `if ...; then echo; exit 1; fi` so the first violation is reported and
exits 1, as intended.

Replace per-submodule `git submodule status -q "$path"` (131 forks, ~3.2s) with
a single bulk `git submodule status -q` call (~0.1s). Merge the
directory-existence loop with the recursive-submodule check so we iterate the
submodule paths once instead of twice.
@clickhouse-gh

clickhouse-gh Bot commented May 15, 2026

Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-ci label May 15, 2026
@thevar1able thevar1able self-assigned this May 15, 2026

@Ergus Ergus left a comment

Member

I like this change. It fixes two issues by simplifying and optimizing... heart for you @Algunenano

@Algunenano

Member Author

Welp, this isn't working because it's not triggering and failing the Style check

The Style check job did not opt into the submodule cache, so submodule
working trees were never populated. With no `contrib/<x>/.gitmodules`
present, the recursive-submodule check silently passed even when a
submodule pulled in nested submodules (as `contrib/silk` did).

Two changes:

- Opt `style_check` into `needs_submodules=True` so the cache is restored
  before the job runs.
- Fail the script up front if any registered submodule has no `.git`
  gitlink file. Using `[ -e "$path/.git" ]` is ~250x faster than parsing
  `git submodule status` and is the exact signal we need.
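
The up-front gitlink check from the second bullet could look roughly like the sketch below. The helper names `submodule_paths` and `find_uninitialized` are hypothetical, and listing paths via `git config --file .gitmodules --get-regexp` is an assumption about how the registered submodules are enumerated. Unlike the real script, this sketch reports every missing submodule before failing.

```shell
#!/usr/bin/env bash
# List the registered submodule paths from a .gitmodules file
# (defaults to ./.gitmodules).
submodule_paths() {
    git config --file "${1:-.gitmodules}" --get-regexp '^submodule\..*\.path$' \
        | cut -d' ' -f2-
}

# Print every registered submodule whose directory has no `.git` entry and
# return 1 if at least one was found.
find_uninitialized() {
    local missing=0 path
    while read -r path; do
        if [ ! -e "$path/.git" ]; then
            echo "Submodule $path is not initialized"
            missing=1
        fi
    done < <(submodule_paths "$@")
    return "$missing"
}
```

Called from the repository root as `find_uninitialized || exit 1`, this only stats one file per submodule and forks no per-submodule git process.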
@thevar1able

This comment was marked as resolved.

Comment thread ci/jobs/scripts/check_style/check_submodules.sh Outdated
The submodule cache restores .git/modules but does not check out the
working trees, so check_submodules.sh would fail with "Submodule X is
not initialized". Run `git submodule update --init` first.

Reading .gitmodules from each submodule's bare repo at
.git/modules/<path> avoids the 93s of working-tree checkout that
`git submodule update --init` did on every Style check run. The
restored submodule cache already has each bare repo's HEAD pointing
at the pinned commit, so `git show HEAD:.gitmodules` is enough.

If a submodule isn't initialized, the script still fails fast with a
clear message.
Comment thread ci/jobs/scripts/check_style/check_submodules.sh Outdated
Reading .gitmodules from the bare repo's HEAD trusted that HEAD always
matches the superproject's gitlink, which can drift if the bare repo
gets touched out-of-band. Resolve every submodule's pinned SHA from the
superproject up front with a single `git ls-tree HEAD`, then read
.gitmodules at that exact SHA from each bare repo.

Addresses review feedback on PR ClickHouse#105053.
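
A sketch of the approach in this commit message, under stated assumptions: the helper names are hypothetical, `-r` is added so that `git ls-tree` descends into directories such as `contrib/` (the message only says `git ls-tree HEAD`), and each module directory under `.git/modules/` is assumed to be named after the submodule path.

```shell
#!/usr/bin/env bash
# Print "<sha><TAB><path>" for every gitlink (mode 160000) pinned by the
# superproject's HEAD. One git invocation covers all submodules.
pinned_submodules() {
    git ls-tree -r HEAD | awk -F '\t' '{
        split($1, meta, " ")                  # meta: mode, type, sha
        if (meta[1] == "160000") print meta[3] "\t" $2
    }'
}

# Print the .gitmodules file of submodule path $1 at pinned commit $2, read
# from the bare repository under .git/modules/ (no working tree needed).
gitmodules_at_pin() {
    git --git-dir=".git/modules/$1" show "$2:.gitmodules"
}
```

Because the SHA comes from the superproject's tree rather than from the bare repo's `HEAD`, a drifted `HEAD` in `.git/modules/<path>` does not affect which `.gitmodules` is inspected.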
Comment thread ci/jobs/scripts/check_style/check_submodules.sh
Algunenano and others added 2 commits May 18, 2026 09:51
`git show $sha:.gitmodules | grep -q ...` returns grep's exit code, so a
missing pinned commit (incomplete/corrupted cache) silently looked like
"no nested submodules". Add an explicit `git cat-file -e` check so a
missing commit fails loudly.

Addresses review feedback on PR ClickHouse#105053.
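
The masking effect can be shown without a repository. In the sketch below, `commit_exists` and `show_gitmodules` are hypothetical stand-ins for `git cat-file -e "$sha"` and `git show "$sha:.gitmodules"`, hard-wired to simulate a pinned commit that is missing from the cache. The return codes 1 and 2 are illustrative choices.

```shell
#!/usr/bin/env bash
# Stand-ins simulating a missing pinned commit (assumptions for illustration).
commit_exists()   { return 1; }                              # like `git cat-file -e`
show_gitmodules() { echo "fatal: bad object" >&2; return 128; }  # like `git show`

# Unchecked: the pipeline's status is grep's, so a failed producer looks
# exactly like "no nested submodules found" (status 1).
unchecked_status=0
show_gitmodules 2>/dev/null | grep -q 'submodule' || unchecked_status=$?

# Checked: verify the commit exists before looking inside it.
check_nested() {
    if ! commit_exists; then
        echo "Pinned commit is missing from the submodule cache" >&2
        return 2
    fi
    if show_gitmodules 2>/dev/null | grep -q 'submodule'; then
        echo "Nested submodules found"
        return 1
    fi
}
checked_status=0
check_nested 2>/dev/null || checked_status=$?

echo "unchecked=$unchecked_status checked=$checked_status"
```

The unchecked pipeline cannot tell a corrupted cache from a clean submodule, while the explicit existence check gives the missing commit its own failure path.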
Comment thread ci/defs/job_configs.py Outdated
The check verifies `.gitmodules` (no recursive submodules, valid URLs,
name == path). It requires submodules to be initialized, which made the
style check ~20s slower because it had to download the submodule cache.
Move it to the arm_tidy build, which already initializes submodules.

Addresses ClickHouse#105053 (comment)
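
For orientation, two of the `.gitmodules` checks named in this commit message (valid URLs, name == path) can be sketched in a single pass over the file. This is an illustration only: the allowed prefix `https://example.com/`, the function name `check_gitmodules`, and the message wording are assumptions, not taken from the script.

```shell
#!/usr/bin/env bash
# Check one .gitmodules file (defaults to ./.gitmodules): every URL must start
# with the allowed prefix and every submodule name must equal its path.
# Prints the first violation and returns 1.
check_gitmodules() {
    local file=${1:-.gitmodules} key value name
    while read -r key value; do
        name=${key#submodule.}
        case $key in
            *.url)
                if [[ "$value" != "https://example.com/"* ]]; then
                    echo "Submodule '${name%.url}' has disallowed URL '$value'"
                    return 1
                fi ;;
            *.path)
                if [ "${name%.path}" != "$value" ]; then
                    echo "Submodule name '${name%.path}' differs from its path '$value'"
                    return 1
                fi ;;
        esac
    done < <(git config --file "$file" --get-regexp '^submodule\..*\.(url|path)$')
}
```

These two checks only read the top-level `.gitmodules`, so on their own they need no initialized submodules. The recursive-submodule check is the part that does.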
@Algunenano Algunenano requested a review from maxknv May 20, 2026 12:38
@maxknv

maxknv commented May 20, 2026

Member

Thanks! The check itself is fast, less than half a second, but it's 50s for the submodules checkout. Good save for style check!

@Algunenano Algunenano enabled auto-merge May 20, 2026 16:56
@Algunenano Algunenano added this pull request to the merge queue May 20, 2026
Merged via the queue into ClickHouse:master with commit 48b7d9f May 20, 2026
162 of 166 checks passed
@Algunenano Algunenano deleted the fix-check-submodules-recursive branch May 20, 2026 17:11
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 20, 2026
DavidHe-2008 pushed a commit to DavidHe-2008/ClickHouse that referenced this pull request Jun 1, 2026
DavidHe-2008 pushed a commit to DavidHe-2008/ClickHouse that referenced this pull request Jun 1, 2026
…ules-recursive

Fix and speed up check_submodules.sh

Labels

pr-ci, pr-synced-to-cloud (The PR is synced to the cloud repo)

5 participants