cmd/containerd-shim-runhcs-v1: serialize confidential container bring-up #2804
Open
EmanuelOprea wants to merge 1 commit into
Starting the containers of a multi-container container group in a single
confidential WCOW UVM concurrently reliably crashes the guest: the guest
resets and the GCS bridge drops ("bridge closed: use of closed network
connection"), and all containers end up Exited.
Root cause is concurrent container bring-up into one confidential UVM.
createContainer runs on its own goroutine per container, so two containers'
hcsoci.CreateContainer calls overlap. CreateContainer performs the container
bring-up: block-CIM mount, scratch SCSI attach, CombineLayers/hive-merge and
the guest container create. The host-side device hot-adds (uvm.modify ->
hcsSystem.Modify) go straight to the VM worker and do not travel over the GCS
bridge, so nothing serializes them; overlapping mount/device operations into
the confidential guest put it into a bad state and it resets.
Serialize the bring-up with a per-UVM lock held across CreateContainer, taken
only for confidential UVMs (HasConfidentialPolicy). This makes the container
starts effectively one-at-a-time into a given confidential UVM. createContainer
is shared with LCOW, hence the generic name and the confidential-only guard.
Validated on a confidential WCOW UVM (VBS): concurrent multi-container groups
that previously crashed every time (2x nanoserver, 2x mount-host,
1x mount-host + 2x nanoserver) now come up cleanly and repeatably; removing the
lock reproduces the crash on the same image.
Signed-off-by: Emanuel Oprea <2664342+EmanuelOprea@users.noreply.github.com>

Problem
Starting the containers of a multi-container container group concurrently in a single confidential WCOW UVM reliably crashes the guest: the guest resets, the GCS bridge drops ("bridge closed: use of closed network connection"), and all containers end up Exited.

Root cause
Container bring-up is concurrent. createContainer runs on its own goroutine per container, so two containers' hcsoci.CreateContainer calls overlap. CreateContainer performs the bring-up: block-CIM mount, scratch SCSI attach, CombineLayers/hive-merge and the guest container create.

The host-side device hot-adds (uvm.modify -> hcsSystem.Modify) go straight to the VM worker and do not travel over the GCS bridge, so nothing serializes them. Overlapping mount/device operations into the confidential guest put it into a bad state and it resets.

Fix
Serialize the bring-up with a per-UVM lock held across CreateContainer, taken only for confidential UVMs (HasConfidentialPolicy). This makes container starts effectively one-at-a-time into a given confidential UVM. createContainer is shared with LCOW, hence the generic name and the confidential-only guard.

Testing
Validated on a confidential WCOW UVM (VBS). Concurrent multi-container groups that previously crashed every time now come up cleanly and repeatably:
- 2x nanoserver
- 2x mount-host
- 1x mount-host + 2x nanoserver

Removing the lock reproduces the crash on the same image. LCOW is unaffected (guard + non-confidential path unchanged).