Test case
submodule_foreach_bare.patch.txt
You can see in the backtrace that, because the wd check here
fails, mods is still NULL -- but submodules_from_index and submodules_from_head need it to be non-NULL.
We could just set it to an empty config in a bare repo, which would stop the crash, but it wouldn't actually enable data about those modules to be correctly loaded, since e.g. submodules_from_index seems to depend on the .gitmodules in the working directory (that is, it has no provision to load one from the ODB).
The root cause of this is that the submodule API has a few separate concepts mixed together. Here's a classification:
Operations on in-memory object, or operations that only affect the submodule's dir in .git/modules:
git_submodule_free
git_submodule_owner
git_submodule_repo_init (when used correctly, that is, with use_gitlink=true)
git_submodule_set_fetch_recurse_submodules
git_submodule_set_ignore
git_submodule_set_update
git_submodule_update_init_options
git_submodule_url
git_submodule_branch
git_submodule_fetch_recurse_submodules
git_submodule_ignore
git_submodule_name
git_submodule_path
git_submodule_update_strategy
The second set of these depend on various configuration, some of which will come from the config and some of which originate from a .gitmodules file, probably the one in the wd -- I haven't looked carefully.
Operations that depend on or affect the working directory:
git_submodule_add_setup
git_submodule_init
git_submodule_open
git_submodule_sync
git_submodule_update
git_submodule_wd_id
Operations that depend on or affect the index:
git_submodule_add_finalize
git_submodule_index_id
Operations that depend on or affect the HEAD:
git_submodule_head_id
Operations that depend on or affect the wd and index:
git_submodule_add_to_index
Operations that depend on or affect the wd, index, and HEAD:
git_submodule_foreach
git_submodule_location
git_submodule_lookup
git_submodule_reload
git_submodule_status
Further, there is no way to access information about a submodule's state as-of a prior commit (at least that I can see in the git_submodule_xxx api).
Perhaps the right way to handle submodules is to model submodules using a few kinds of object:
- a "submodule configuration set", representing the contents of a .gitmodules file, possibly one in the index or in a commit, and optionally as modified by local config.
- a "submodule configuration", an entry in such a set for a single submodule.
- a "submodule repository", representing a repo in .gitmodules (or, I guess, in the working tree, for legacy repos). It's just a normal git_repository, so not really a separate object type. This repository's HEAD can be read to see if there are new commits in the repo (since HEAD or the index).
- a "submodule status", for information about a working tree's checkout of a submodule: whether it is initialized, and if so, whether it has untracked content or modified files. This would be a transient thing created on request.
Under this approach, git_submodule_foreach might need three versions: one for a tree, one for the index, and one for the working directory. Or, perhaps, it would not exist at all, since as far as I can tell from a Google search, pretty much nobody actually uses it. We do call the underlying, buggy, all_submodules function in #4016. But under the above theory, #4016 would not need to exist; instead, users would simply manage submodule config sets (the cache) themselves.
Test case
submodule_foreach_bare.patch.txt
You can see in the backtrace that, because the wd check here
libgit2/src/submodule.c
Line 459 in 89c332e
We could just set it to an empty config in a bare repo, which would stop the crash, but it wouldn't actually enable data about those modules to be correctly loaded, since e.g. submodules_from_index seems to depend on the .gitmodules in the working directory (that is, it has no provision to load one from the ODB).
The root cause of this is that the submodule API has a few separate concepts mixed together. Here's a classification:
Operations on in-memory object, or operations that only affect the submodule's dir in .git/modules:
git_submodule_free
git_submodule_owner
git_submodule_repo_init (when used correctly, that is, with use_gitlink=true)
git_submodule_set_fetch_recurse_submodules
git_submodule_set_ignore
git_submodule_set_update
git_submodule_update_init_options
git_submodule_url
git_submodule_branch
git_submodule_fetch_recurse_submodules
git_submodule_ignore
git_submodule_name
git_submodule_path
git_submodule_update_strategy
The second set of these depend on various configuration, some of which will come from the config and some of which originate from a .gitmodules file, probably the one in the wd -- I haven't looked carefully.
Operations that depend on or affect the working directory:
git_submodule_add_setup
git_submodule_init
git_submodule_open
git_submodule_sync
git_submodule_update
git_submodule_wd_id
Operations that depend on or affect the index:
git_submodule_add_finalize
git_submodule_index_id
Operations that depend on or affect the HEAD:
git_submodule_head_id
Operations that depend on or affect the wd and index:
git_submodule_add_to_index
Operations that depend on or affect the wd, index, and HEAD:
git_submodule_foreach
git_submodule_location
git_submodule_lookup
git_submodule_reload
git_submodule_status
Further, there is no way to access information about a submodule's state as-of a prior commit (at least that I can see in the git_submodule_xxx api).
Perhaps the right way to handle submodules is to model submodules using a few kinds of object:
Under this approach, git_submodule_foreach might need three versions: one for a tree, one for the index, and one for the working directory. Or, perhaps, it would not exist at all, since as far as I can tell from a Google search, pretty much nobody actually uses it. We do call the underlying, buggy, all_submodules function in #4016. But under the above theory, #4016 would not need to exist; instead, users would simply manage submodule config sets (the cache) themselves.