uki: introduce support for a .efifw section by ani-sinha · Pull Request #35091 · systemd/systemd · GitHub

Merged
poettering merged 1 commit into systemd:main from ani-sinha:fw-support on Jan 31, 2025

Conversation

@ani-sinha

@ani-sinha ani-sinha commented Nov 8, 2024

Contributor
UKIs can be used to bundle UEFI firmware images that can be measured and
used in a confidential computing environment. There can be more than one
bundled firmware blob, each one for a specific platform. Firmware images
can also themselves be containers, such as IGVM files, that in turn bundle the
actual firmware blob. This change is specifically for UEFI firmware, not
IGVM container files.

This change adds support for a .efifw section in a UKI that can be
used for firmware blobs/images. There can be multiple such sections, and each
section can contain a single firmware image.

The matching .hwids entry for a specific platform can be used to select the
most appropriate firmware blob.

The ukify tool has also been changed to support adding a firmware image
to a UKI.

Since firmware gets measured automatically, we do not need to measure it
separately as a part of the UKI.
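
As a rough illustration of the layout described above, the following Python sketch walks the section table of a PE image and collects the contents of every section named `.efifw`. It is only a sketch under stated assumptions, not the code added by this PR: the helper names `list_pe_sections` and `efifw_blobs` are hypothetical, the payload length is assumed to be the smaller of the section's virtual size and raw size, and no `.hwids`-based selection is attempted.

```python
import struct


def list_pe_sections(data: bytes) -> list[tuple[str, int, int]]:
    """Return (name, file_offset, size) for each section header of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # The DOS header field e_lfanew (offset 0x3C) points at the "PE\0\0" signature.
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    coff_off = pe_off + 4
    # COFF header: Machine (2 bytes), NumberOfSections (2 bytes), ...
    _machine, n_sections = struct.unpack_from("<HH", data, coff_off)
    # ... SizeOfOptionalHeader sits 16 bytes into the 20-byte COFF header.
    (opt_size,) = struct.unpack_from("<H", data, coff_off + 16)
    table_off = coff_off + 20 + opt_size

    sections = []
    for i in range(n_sections):
        hdr = table_off + 40 * i  # each section header is 40 bytes long
        name = data[hdr:hdr + 8].rstrip(b"\0").decode("ascii", "replace")
        vsize, _vaddr, raw_size, raw_off = struct.unpack_from("<IIII", data, hdr + 8)
        # Assumption: the payload is the unpadded part of the raw data.
        sections.append((name, raw_off, min(vsize, raw_size)))
    return sections


def efifw_blobs(data: bytes) -> list[bytes]:
    """Collect the payload of every .efifw section, in section-table order."""
    return [
        data[off:off + size]
        for name, off, size in list_pe_sections(data)
        if name == ".efifw"
    ]
```

Because several `.efifw` sections may exist, the sketch returns a list rather than a single blob; choosing among them is left to whatever consumes the result.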

@github-actions github-actions Bot added documentation util-lib uki please-review PR is ready for (re-)review by a maintainer labels Nov 8, 2024
@poettering

Member

@ani-sinha

Contributor Author

including this in the UKI is one thing, but how is this actually supposed to be used? i mean, the firmware usually comes first, and it then loads the UKI. So how would this be used here, it seems to reverse the roles?

Can you elaborate?

I am glad you asked. This change is the first of a set of changes to UKIs to support the idea we described in our talk at the KVM Forum this year:
https://pretalx.com/kvm-forum-2024/talk/HJSKRQ/

Recording of the presentation is here: https://youtu.be/VCMBxU6tAto?feature=shared
Expanded slides on the firmware update mechanism are here: https://people.redhat.com/~anisinha/BYOF-mechanism.pdf

The basic idea is this: in a cloud deployment, the end users (tenants) need to place implicit trust in the firmware supplied by the cloud provider (AWS, Azure, GCP etc). In a confidential environment this is problematic, and it causes other issues: for example, when the cloud provider changes the firmware image, the measurements change, which breaks tenants. IGVM does not solve all cases, since some cloud providers do not have access to the tenant's file system in some cases, and providing a mechanism that is asynchronous to the tenant's guest image requires additional storage on the provider's side (plus IGVM support in the hypervisor, which is non-trivial given the IGVM specs).

So we propose an idea where tenants bring their own firmware with known measurements using a UKI (which can also embed the kernel, initrd and command line all together). The VM initially boots with the cloud provider's firmware. The measurements calculated during this initial boot will not match, because the VM is not running the firmware image known to the tenant. So the UKI loads the launch digests (firmware, kernel, initrd etc) into guest memory and uses our simpler hypervisor interface to request a VM reset with these launch digests. The hypervisor resets the VM, moving the launch digests to the target VM memory (if required or requested by the guest), and in this second boot the VM runs the launch digests with known measurements and does not use the cloud provider's firmware image. Attestation will pass and secrets will be unlocked within the VM.

I intend to send a proposal to FOSDEM 2025 to describe this idea more from systemd/UKI perspective.

@ani-sinha

Contributor Author
2024-11-08T09:53:43.4924083Z [   83.498469] TEST-70-TPM2.sh[663]: + echo 'Subtest /usr/lib/systemd/tests/testdata/units/TEST-70-TPM2.cryptenroll.sh failed'
[FAILED] Failed to start TEST-86-MULTI-PROFILE-UKI.service - TEST-86-MULTI-PROFILE-UKI.

I see failures like this, but I am not sure where the detailed logs can be seen or what exactly failed.

@daandemeyer

Collaborator

@ani-sinha Probably need to update systemd-pcrlock as well

@poettering

Member

Hmm, so what further patches will be coming? i.e. how will this section be actually consumed?

i.e. is sd-stub going to apply the firmware and reboot, or which part of the codebase is going to do that?

(I am not sure pcrlock needs to know about this thing btw, it kinda already locks to the firmware via PCR 4, and at the point pcrlock sees this the firmware should already be in effect. I mean, this is a PE section that userspace will not consume nor see. It's like the bootsplash from .splash in that regard, which is purely an sd-stub thing, and userspace really doesn't have to care anymore.)

@ani-sinha

Contributor Author

Hmm, so what further patches will be coming? i.e. how will this section be actually consumed?

The basic changes that would be required are:

  • Load the firmware and other launch digests (kernel, initrd etc) into guest memory.
  • For QEMU, add ability to detect availability of fw_cfg device. Code already exists in edk2 so we intend to go along the same lines.
  • For QEMU, use fw_cfg hypervisor provided files (we will implement that in QEMU) to pass to the hypervisor address ranges of the launch digests and some other details (like where to copy the fw blob to so that upon cpu reset, the vm starts from the correct firmware address).

i.e. is sd-stub going to apply the firmware and reboot, or which part of the codebase is going to do that?

We will need to add the code; it's not there yet. It will take some time to add the functionality incrementally.
A hacky way to do this is implemented in this patch by Harald, written for KVM Forum demo purposes to show the idea:
agraf@539c798

@haraldh

haraldh commented Nov 9, 2024

Member

Yeah, sd-stub is talking to the hypervisor via the qemu-fwcfg interface, which will then reset the VM and start the new FW, which boots directly into the kernel+initramfs+cmdline that is still in memory.

@ani-sinha

Contributor Author

The failure logs that I got from the failed tests show:

[   83.048463] systemd-cryptenroll[2096]: New TPM2 token enrolled as key slot 1.
[   83.053655] TEST-70-TPM2.sh[684]: + systemd-cryptenroll --unlock-tpm2-device=auto --recovery-key /tmp/systemd-cryptenroll-X8A.image
[   83.061109] systemd-cryptenroll[2129]: Automatically discovered security TPM2 token unlocks volume.
[   83.496074] systemd-cryptenroll[2129]: Failed to unseal secret using TPM2: No such device or address
[   83.496171] systemd-cryptenroll[2129]: Unlocking via TPM2 device failed: No such device or address
[   83.498241] TEST-70-TPM2.sh[663]: + echo 'Subtest /usr/lib/systemd/tests/testdata/units/TEST-70-TPM2.cryptenroll.sh failed'
[   83.498241] TEST-70-TPM2.sh[663]: Subtest /usr/lib/systemd/tests/testdata/units/TEST-70-TPM2.cryptenroll.sh failed
[   83.498241] TEST-70-TPM2.sh[663]: + return 1


[   17.686494] systemd-cryptenroll[657]: New TPM2 token enrolled as key slot 1.
[   17.688457] TEST-86-MULTI-PROFILE-UKI.sh[646]: + rm -f /root/encrypted.secret
[   17.690151] TEST-86-MULTI-PROFILE-UKI.sh[646]: + systemd-cryptsetup attach multiprof /root/encrypted.raw - tpm2-device=auto,headless=1
[   18.153892] systemd-cryptsetup[659]: Failed to unseal secret using TPM2: No such device or address
[   18.154254] systemd-cryptsetup[659]: Set cipher aes, mode xts-plain64, key size 512 bits for device /root/encrypted.raw.
[   18.608580] systemd-cryptsetup[659]: Failed to unseal secret using TPM2: No such device or address
[   18.608769] systemd-cryptsetup[659]: No TPM2 metadata matching the current system state found in LUKS2 header, falling back to traditional u>
[   18.610036] systemd-cryptsetup[659]: Password querying disabled via 'headless' option.
[   18.612149] systemd[1]: TEST-86-MULTI-PROFILE-UKI.service: Main process exited, code=exited, status=1/FAILURE

Is this related to my changes?

@bluca

bluca commented Nov 9, 2024

Member

Yes, it's something in this PR

@bluca bluca added ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR and removed please-review PR is ready for (re-)review by a maintainer labels Nov 9, 2024
@ani-sinha

Contributor Author

Yes, it's something in this PR

Any help on where to investigate or what to change would be appreciated.

@haraldh

haraldh commented Nov 11, 2024

Member

maybe the changes to src/boot/measure.c have to be included, too.

https://github.com/agraf/systemd/blob/539c7987fee3cb585d044d33277cc9837c98fc0d/src/boot/measure.c#L193

@github-actions github-actions Bot added please-review PR is ready for (re-)review by a maintainer and removed ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR labels Nov 11, 2024
@ani-sinha

Contributor Author

Is this also related to my change?

58s fatal: unable to access 'https://salsa.debian.org/systemd-team/systemd.git/': CONNECT tunnel failed, response 403
 73s Cloning into 'systemd'...
 73s fatal: unable to access 'https://salsa.debian.org/systemd-team/systemd.git/': CONNECT tunnel failed, response 403
 73s autopkgtest [08:59:42]: ERROR: erroneous package: rules extract failed with exit code 128

@bluca

bluca commented Nov 11, 2024

Member

No, datacenter issues, you can ignore those

@poettering

Member

adding such a concept sounds ok, but I'd like to see it spec'ed out in more detail.

  1. there should be an accompanying patch to the uki spec (https://github.com/uapi-group/specifications/blob/main/specs/unified_kernel_image.md)
  2. i really would like this to be something reasonably generic, i.e. that instead of sd-stub carrying qemu-specific code we talk to some generic uefi protocol which does the right thing.
  3. this PR makes no attempt whatsoever to actually use the data and leaves open entirely what specifically the format of the data is. We should clarify this. Are these uefi update capsules? (i'd love that, because so generic)
  4. in order to trigger the update sd-stub would have to detect if the current firmware matches the included firmware already. how is that supposed to take place? how would we determine some version id from the included blob, and how would we determine the version id of the firmware we are currently booted with? will we need some metainfo embedded into the UKI for that?

@keszybz keszybz left a comment

Member

Looks reasonable. We need the other parts for this to be viable, but this part here looks OK.

Comment thread src/ukify/ukify.py Outdated
@ani-sinha

Contributor Author

adding such a concept sounds ok, but I'd like to see it spec'ed out in more detail.

  1. there should be an accompanying patch to the uki spec (https://github.com/uapi-group/specifications/blob/main/specs/unified_kernel_image.md)

OK, let me send the PR against this repo.

  2. i really would like this to be something reasonably generic, i.e. that instead of sd-stub carrying qemu-specific code we talk to some generic uefi protocol which does the right thing.

We need to design this.

  3. this PR makes no attempt whatsoever to actually use the data and leaves open entirely what specifically the format of the data is. We should clarify this.

Yes, this is intentional. The format is completely determined by the guest and would be opaque as far as the UKI is concerned.
The job of the UKI would be to put this into guest memory and invoke the hypervisor API.

Are these uefi update capsules? (i'd love that, because so generic)

  4. in order to trigger the update sd-stub would have to detect if the current firmware matches the included firmware already. how is that supposed to take place? how would we determine some version id from the included blob, and how would we determine the version id of the firmware we are currently booted with? will we need some metainfo embedded into the UKI for that?

We intended to use DMI for it. There is already detect_vm_dmi_vendor() etc that can be reused?

cc: @haraldh @agraf

@ani-sinha

Contributor Author

4. will we need some metainfo embedded into the UKi for that?

Maybe this is something that would be worth having in the future.

@ani-sinha

Contributor Author

There is already detect_vm_dmi_vendor() etc that can be reused?

Grr! That is userland stuff. But smbios.c and chid.c should have something that we could use.

@ani-sinha

Contributor Author

@kraxel

  1. will we need some metainfo embedded into the UKi for that?

Maybe this is something that would be worth having in the future.

@kraxel Please feel free to comment.

@ani-sinha

ani-sinha commented Jan 25, 2025

Contributor Author

It would be nice to have some sort of unit test to exercise the pe.c routines. I do not know enough yet to write one. It seems there isn't such a test already. Maybe @anonymix007 can help?

@anonymix007

Contributor

There's already a CHID matching test, I guess a "PE lookup" one would look very similar. It doesn't look like having unit tests is worth it though. I'll try to draft something later.

@anonymix007

Contributor

@ani-sinha: https://github.com/anonymix007/systemd/commits/pe-test/
You may pick these 4 commits to your fw-support branch. Tests are passing (at least locally) without the last commit, that is, the .dtbauto section look-up tests. Looks like you'll have to find what's wrong and fix it.

@ani-sinha

Contributor Author

https://github.com/anonymix007/systemd/commits/pe-test/

@anonymix007 locally mkosi is failing as it's not able to find dtc with your third commit. I added it in mkosi.conf but I still get:

Checking if "linker supports LTO with -nostdlib" : links: YES 
Program dtc found: NO

src/boot/meson.build:463:14: ERROR: Program 'dtc' not found or not executable

@ani-sinha

Contributor Author

$ git diff
diff --git a/mkosi.conf b/mkosi.conf
index 559901dfff..145164ab34 100644
--- a/mkosi.conf
+++ b/mkosi.conf
@@ -92,6 +92,7 @@ Packages=
         diffutils
         dnsmasq
         dosfstools
+        dtc
         e2fsprogs
         findutils
         gdb
diff --git a/mkosi.conf.d/05-tools/mkosi.conf b/mkosi.conf.d/05-tools/mkosi.conf
index 6656cee287..e606e3a4f1 100644
--- a/mkosi.conf.d/05-tools/mkosi.conf
+++ b/mkosi.conf.d/05-tools/mkosi.conf
@@ -2,6 +2,7 @@
 
 [Build]
 ToolsTreePackages=
+        dtc
         gcc
         gdb
         gperf
fc40-anivm:systemd-ani anisinha$ rpm -q dtc
dtc-1.7.0-7.fc40.aarch64

@ani-sinha

ani-sinha commented Jan 27, 2025

Contributor Author

I added it here and I am able to get past it

diff --git a/mkosi.images/build/mkosi.conf b/mkosi.images/build/mkosi.conf
index 8a67c76ee5..6744499870 100644
--- a/mkosi.images/build/mkosi.conf
+++ b/mkosi.images/build/mkosi.conf
@@ -3,6 +3,7 @@
 [Content]
 Packages=
         clang
+        dtc
         lld
         llvm

Program dtc found: YES (/usr/bin/dtc)

Now I am seeing this:

+ /usr/bin/meson compile -C /work/build -j 4
INFO: autodetecting backend as ninja
INFO: calculating backend command to run: /usr/bin/ninja -C /work/build -j 4
ninja: Entering directory `/work/build'
ninja: error: '/work/build/src/boot/linuxaa64.efi.stub', needed by 'src/boot/pe.efi', missing and no known rule to make it
error: Bad exit status from /var/tmp/rpm-tmp.eLQSA3 (%build)

RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.eLQSA3 (%build)

I also see this:

src/boot/meson.build:486: WARNING: Source item '/work/build/src/boot/linuxaa64.efi.stub' cannot be converted to File object, because it is a generated file. This will become a hard error in meson 2.0

Anyway, I think your patches need more work. Why don't you merge them without the efifw stuff; then I can add efifw-specific tests.

@ani-sinha ani-sinha mentioned this pull request Jan 27, 2025
@ani-sinha

Contributor Author

I have removed the efifw stuff, rebased the changes on top of main, built it locally and pushed it here: #36188
Let's see how it fares in CI.

Comment thread src/boot/efifirmware.c Outdated
Comment thread src/boot/efifirmware.c Outdated
Comment thread src/boot/efifirmware.c Outdated
Comment thread src/boot/efifirmware.c Outdated
Comment thread src/boot/efifirmware.c Outdated
Comment thread src/boot/efifirmware.h Outdated
@poettering

Member

lgtm, i guess. (still find naming pointer variables "offset" a bit weird...)

@ani-sinha

Contributor Author

lgtm, i guess. (still find naming pointer variables "offset" a bit weird...)

Oh I thought you had objections against using off as opposed to offset. Removed it in latest changes.

p-b-o pushed a commit to p-b-o/qemu-ci that referenced this pull request Jan 29, 2025
VM firmware update is a mechanism where the virtual machines can use their
preferred and trusted firmware image in their execution environment without
having to depend on an untrusted party to provide the firmware bundle. This is
particularly useful for confidential virtual machines that are deployed in the
cloud where the tenant and the cloud provider are two different entities. In
this scenario, virtual machines can bring their own trusted firmware image
bundled as a part of their filesystem (using UKIs for example[1]) and then use
this hypervisor interface to update to their trusted firmware image. This also
allows the guests to have consistent measurements of the firmware image.

This change introduces support for the fw-cfg based hypervisor interface
and the corresponding device. The change also includes the
specification document for this interface. The interface is made generic
enough so that guests are free to use their own ABI to pass required
information between initial and trusted execution contexts (where they are
running their own trusted firmware image) without the hypervisor getting
involved in between. For pc machines, it implements support for
copying the firmware image from the guest source physical address specified
by the guest, where the guest loaded the next stage firmware.

Currently, this device is only supported for pc machines. Hence, the device
is not initialized for other machine types. Trying to initialize it
for arm for example will lead to failure:

$ ./qemu-system-arm -device vmfwupdate -machine virt
qemu-system-arm: -device vmfwupdate: This machine does not support vmfwupdate device

Functional tests and qtests have been added to test basic device operations and fw-cfg
files.

[1] See systemd pull requests systemd/systemd#35091
and systemd/systemd#35281 for some discussions on
how we can bundle a firmware image within a UKI.

CC: Alex Graf <graf@amazon.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gerd Hoffman <kraxel@redhat.com>
CC: Igor Mammedov <imammedo@redhat.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
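
The spec text included in the patch (docs/specs/vmfwupdate.rst) describes `vmfwupdate/cap` and `vmfwupdate/bios-size` as little-endian `u64` values, defines `VMFWUPDATE_CAP_BIOS_RESIZE` as bit 0, and notes that x86 BIOS regions start at 4GiB minus bios-size. The following Python sketch shows how a guest-side tool might interpret the raw bytes of those files. It is an illustration only: the function names are hypothetical, and actually reading or writing the fw-cfg files is assumed to happen elsewhere.

```python
import struct

# Capability bit as listed in the vmfwupdate spec text.
VMFWUPDATE_CAP_BIOS_RESIZE = 0x0000000000000001
FOUR_GIB = 4 << 30


def decode_u64_le(raw: bytes) -> int:
    """Decode the 8-byte little-endian value stored in a u64 fw-cfg file."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    return struct.unpack("<Q", raw)[0]


def can_resize_bios(cap_raw: bytes) -> bool:
    """True if vmfwupdate/cap advertises VMFWUPDATE_CAP_BIOS_RESIZE."""
    return bool(decode_u64_le(cap_raw) & VMFWUPDATE_CAP_BIOS_RESIZE)


def x86_bios_region_start(bios_size_raw: bytes) -> int:
    """Guest physical start of the x86 BIOS region: 4GiB - bios-size."""
    return FOUR_GIB - decode_u64_le(bios_size_raw)


def resize_succeeded(requested_size: int, readback_raw: bytes) -> bool:
    """The spec asks guests to read bios-size back after writing and compare."""
    return decode_u64_le(readback_raw) == requested_size
```

For a 4 MiB BIOS region, `x86_bios_region_start` yields 0xFFC00000, which is the region ending exactly at the 4GiB boundary.
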
p-b-o pushed a commit to p-b-o/qemu-ci that referenced this pull request Jan 29, 2025
https://lore.kernel.org/qemu-devel/20250129063153.3967220-1-anisinha@redhat.com

---

From: Ani Sinha <anisinha@redhat.com>
To: Ani Sinha <anisinha@redhat.com>, Alex Graf <graf@amazon.com>,
 Paolo Bonzini <pbonzini@redhat.com>, Eduardo Habkost <eduardo@habkost.net>,
 Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
 Philippe Mathieu-Daudé <philmd@linaro.org>,
 Yanan Wang <wangyanan55@huawei.com>, Zhao Liu <zhao1.liu@intel.com>,
 Richard Henderson <richard.henderson@linaro.org>,
 "Michael S. Tsirkin" <mst@redhat.com>, Fabiano Rosas <farosas@suse.de>,
 Laurent Vivier <lvivier@redhat.com>
Cc: Gerd Hoffman <kraxel@redhat.com>, Igor Mammedov <imammedo@redhat.com>,
 Vitaly Kuznetsov <vkuznets@redhat.com>, qemu-devel@nongnu.org
Subject: [PATCH v5] hw/misc/vmfwupdate: Introduce hypervisor fw-cfg interface
 support
Date: Wed, 29 Jan 2025 12:01:47 +0530
Message-ID: <20250129063153.3967220-1-anisinha@redhat.com>

VM firmware update is a mechanism where the virtual machines can use their
preferred and trusted firmware image in their execution environment without
having to depend on an untrusted party to provide the firmware bundle. This is
particularly useful for confidential virtual machines that are deployed in the
cloud where the tenant and the cloud provider are two different entities. In
this scenario, virtual machines can bring their own trusted firmware image
bundled as a part of their filesystem (using UKIs for example[1]) and then use
this hypervisor interface to update to their trusted firmware image. This also
allows the guests to have consistent measurements of the firmware image.

This change introduces support for the fw-cfg based hypervisor interface
and the corresponding device. The change also includes the
specification document for this interface. The interface is made generic
enough so that guests are free to use their own ABI to pass required
information between initial and trusted execution contexts (where they are
running their own trusted firmware image) without the hypervisor getting
involved in between. For pc machines, it implements support for
copying the firmware image from the guest source physical address specified
by the guest, where the guest loaded the next stage firmware.

Currently, this device is only supported for pc machines. Hence, the device
is not initialized for other machine types. Trying to initialize it
for arm for example will lead to failure:

$ ./qemu-system-arm -device vmfwupdate -machine virt
qemu-system-arm: -device vmfwupdate: This machine does not support vmfwupdate device

Functional tests and qtests have been added to test basic device operations and fw-cfg
files.

[1] See systemd pull requests systemd/systemd#35091
and systemd/systemd#35281 for some discussions on
how we can bundle a firmware image within a UKI.

CC: Alex Graf <graf@amazon.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gerd Hoffman <kraxel@redhat.com>
CC: Igor Mammedov <imammedo@redhat.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Ani Sinha <anisinha@redhat.com>
---
 MAINTAINERS                         |  11 ++
 docs/specs/index.rst                |   1 +
 docs/specs/vmfwupdate.rst           | 119 ++++++++++++++++
 hw/core/machine.c                   |   2 +
 hw/i386/pc.c                        |  55 ++++++++
 hw/misc/meson.build                 |   2 +
 hw/misc/vmfwupdate.c                | 212 ++++++++++++++++++++++++++++
 include/hw/misc/vmfwupdate.h        | 105 ++++++++++++++
 tests/functional/meson.build        |   2 +
 tests/functional/test_vmfwupdate.py |  82 +++++++++++
 tests/qtest/meson.build             |   1 +
 tests/qtest/vmfwupdate-test.c       |  67 +++++++++
 12 files changed, 659 insertions(+)
 create mode 100644 docs/specs/vmfwupdate.rst
 create mode 100644 hw/misc/vmfwupdate.c
 create mode 100644 include/hw/misc/vmfwupdate.h
 create mode 100644 tests/functional/test_vmfwupdate.py
 create mode 100644 tests/qtest/vmfwupdate-test.c

changelogs:
v5: Incorporated Alex's input; added a qtest and a functional test; full
guest reset support for x86. More testing is required, but that needs
DMA based fw-cfg file write capability, which has been added in the
patchset https://patchwork.ozlabs.org/project/qemu-devel/list/?series=441003
Cover letter: https://patchwork.ozlabs.org/project/qemu-devel/cover/20250120043847.954881-1-anisinha@redhat.com/ .
I am requesting inputs and suggestions for more comprehensive testing of
this patch.

CI pipeline is green - so no regressions.
https://gitlab.com/anisinha/qemu/-/pipelines/1646036807

v4: remove delay in functional test. Not needed now.
v3: inputs from Gerd and Phil taken into account. One basic functional
test added. Spec doc updated as per Gerd's suggestions.
v2: do not allow changing bios region if advertized size is 0 (non-pc
platforms).

diff --git a/MAINTAINERS b/MAINTAINERS
index 7be3d8f..370bd4d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2559,6 +2559,17 @@ F: include/hw/acpi/vmgenid.h
 F: docs/specs/vmgenid.rst
 F: tests/qtest/vmgenid-test.c

+VM Firmware Update
+M: Ani Sinha <anisinha@redhat.com>
+M: Alex Graf <graf@amazon.com>
+M: Paolo Bonzini <pbonzini@redhat.com>
+S: Maintained
+F: hw/misc/vmfwupdate.c
+F: include/hw/misc/vmfwupdate.h
+F: docs/specs/vmfwupdate.rst
+F: tests/qtest/vmfwupdate-test.c
+F: tests/functional/test_vmfwupdate.py
+
 LED
 M: Philippe Mathieu-Daudé <philmd@linaro.org>
 S: Maintained
diff --git a/docs/specs/index.rst b/docs/specs/index.rst
index d7675ce..8d78b64 100644
--- a/docs/specs/index.rst
+++ b/docs/specs/index.rst
@@ -34,6 +34,7 @@ guest hardware that is specific to QEMU.
    virt-ctlr
    vmcoreinfo
    vmgenid
+   vmfwupdate
    rapl-msr
    rocker
    riscv-iommu
diff --git a/docs/specs/vmfwupdate.rst b/docs/specs/vmfwupdate.rst
new file mode 100644
index 0000000000..fbe8f18
--- /dev/null
+++ b/docs/specs/vmfwupdate.rst
@@ -0,0 +1,119 @@
+VMFWUPDATE INTERFACE SPECIFICATION
+##################################
+
+Introduction
+************
+
+``Vmfwupdate`` is an extension to ``fw-cfg`` that allows guests to replace early boot
+code in their virtual machine. Through a combination of vmfwupdate and
+hypervisor stack knowledge, guests can deterministically replace the launch
+payload for guests. This is useful for environments like SEV-SNP where the
+launch payload becomes the launch digest. Guests can use vmfwupdate to provide
+a measured, full guest payload (BIOS image, kernel, initramfs, kernel
+command line) to the virtual machine which enables them to easily reason about
+integrity of the resulting system.
+For more information, please see the `KVM Forum 2024 presentation <KVMFORUM_>`__
+about this work from the authors [1]_.
+
+
+.. _KVMFORUM: https://www.youtube.com/watch?v=VCMBxU6tAto
+
+Base Requirements
+*****************
+
+#. **fw-cfg**:
+     The target system must provide a ``fw-cfg`` interface. For x86 based
+     environments, this ``fw-cfg`` interface must be accessible through PIO ports
+     0x510 and 0x511. The ``fw-cfg`` interface does not need to be announced as part
+     of system device tables such as DSDT. The ``fw-cfg`` interface must support the
+     DMA interface. It may only support the DMA interface for write operations.
+
+#. **BIOS region**:
+     The hypervisor must provide a BIOS region which may be
+     statically sized. Through vmfwupdate, the guest is able to atomically replace
+     its contents. The BIOS region must be mapped as read-write memory. In a
+     SEV-SNP environment, the BIOS region must be mapped as private memory at
+     launch time.
+
+Fw-cfg Files
+************
+
+Guests drive vmfwupdate through special ``fw-cfg`` files that control its flow
+followed by a standard system reset operation. When vmfwupdate is available,
+it provides the following ``fw-cfg`` files:
+
+* ``vmfwupdate/cap`` (``u64``) - Read-only Little Endian encoded bitmap of additional
+  capabilities the interface supports. List of available capabilities:
+
+     ``VMFWUPDATE_CAP_BIOS_RESIZE        0x0000000000000001``
+
+* ``vmfwupdate/bios-size`` (``u64``) - Little Endian encoded size of the BIOS region.
+  Read-only by default. Optionally Read-write if ``vmfwupdate/cap`` contains
+  ``VMFWUPDATE_CAP_BIOS_RESIZE``. On write, the BIOS region may resize. Guests are
+  required to read the value after writing and compare it with the requested size
+  to determine whether the resize was successful. Note, x86 BIOS regions always
+  start at 4GiB - bios-size.
+
+* ``vmfwupdate/opaque`` (``4096 bytes``) - A 4 KiB buffer that survives the BIOS replacement
+  flow. Can be used by the guest to propagate guest physical addresses of payloads
+  to its BIOS stage. It’s recommended to make the new BIOS clear this file on boot
+  if it exists. Contents of this file are under the control of the hypervisor. In an
+  environment that considers the hypervisor outside of its trust boundary, guests
+  are advised to validate its contents before consumption.
+
+* ``vmfwupdate/disable`` (``u8``) - Indicates whether the interface is disabled.
+  Returns 0 for enabled, 1 for disabled. Writing any value disables it. Writing is
+  only allowed if the value is 0. When the interface is disabled, the replace file
+  is ignored on reset. This value resets to 0 on system reset.
+
+* ``vmfwupdate/bios-addr`` (``u64``) - A 64bit Little Endian encoded guest physical address
+  at the beginning of the replacement BIOS region. The provided payload must reside
+  in shared memory. 0 on system reset.
+
+
+Triggering the Firmware Update
+******************************
+
+To initiate the firmware update process, the guest issues a standard system reset
+operation through any of the means implemented by the machine model.
+
+On reset, the hypervisor evaluates whether ``vmfwupdate/disable`` is ``1``. If it is, it ignores
+any other vmfwupdate values and performs a standard system reset.
+
+If ``vmfwupdate/disable`` is ``0``, the hypervisor checks if ``vmfwupdate/bios-addr`` is ``0``.
+If it is, it performs a standard system reset.
+
+If ``vmfwupdate/bios-addr`` is non-zero, the hypervisor replaces the contents of the system’s
+BIOS region with the guest physically contiguous payload of ``vmfwupdate/bios-size`` bytes at the
+guest physical address ``vmfwupdate/bios-addr``.
+
+The firmware update mechanism works both for confidential and non-confidential
+guests. In confidential guests, as a part of the reset operation, all existing
+guest shared memory (shared with the hypervisor) as well as the ``vmfwupdate/opaque`` file
+are preserved. The reset causes recreation of the VM context which triggers a fresh
+measurement of the replaced BIOS region and reset CPU state [2]_ .
+For non-confidential guests, there is no concept of guest private memory and all the existing
+guest memory is preserved (this is the default behavior today - QEMU does not reset/clear
+guest memory upon reset).
+
+In both confidential and non-confidential cases, CPU and device state are reset to
+the default hypervisor specific reset states. In confidential environments, the guest
+always resumes operation in the highest privileged mode available to it (VMPL0 in SEV-SNP).
+
+Closing Remarks
+***************
+The handover protocol (format of the ``vmfwupdate/opaque`` file etc.) will be implemented by
+the firmware loader and firmware image, both provided by the guest.  The hypervisor does
+not need to know these details, so they are not included in this specification.
+
+
+
+Footnotes:
+^^^^^^^^^^
+.. [1] Original author of the specification: *Alex Graf <graf@amazon.com>*,
+       converted to re-structured-text (rst format) and slightly edited
+       by *Ani Sinha <anisinha@redhat.com>*.
+.. [2] Currently SEV-SNP guests do not support reset. Upon reset, the instance is
+       terminated and a new instance must be created with new VM confidential context.
+       Work is being done currently to support resetting SEV-SNP guests with a new
+       confidential/SEV context after reset.
diff --git a/hw/core/machine.c b/hw/core/machine.c
index c23b399..0eaf8aa 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -34,6 +34,7 @@
 #include "hw/virtio/virtio-pci.h"
 #include "hw/virtio/virtio-net.h"
 #include "hw/virtio/virtio-iommu.h"
+#include "hw/misc/vmfwupdate.h"
 #include "audio/audio.h"

 GlobalProperty hw_compat_9_2[] = {
@@ -252,6 +253,7 @@ GlobalProperty hw_compat_2_8[] = {
     { "virtio-pci", "x-pcie-pm-init", "off" },
     { "cirrus-vga", "vgamem_mb", "8" },
     { "isa-cirrus-vga", "vgamem_mb", "8" },
+    {TYPE_VMFWUPDATE, "disable", "1"},
 };
 const size_t hw_compat_2_8_len = G_N_ELEMENTS(hw_compat_2_8);

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index b46975c..5ae7d56 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -53,6 +53,7 @@
 #include "hw/usb.h"
 #include "hw/i386/intel_iommu.h"
 #include "hw/net/ne2000-isa.h"
+#include "hw/misc/vmfwupdate.h"
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/virtio/virtio-md-pci.h"
 #include "hw/i386/kvm/xen_overlay.h"
@@ -1719,10 +1720,64 @@ static void pc_machine_initfn(Object *obj)
     qemu_add_machine_init_done_notifier(&pcms->machine_done);
 }

+static void handle_vmfwupd_reset(MachineState *machine,
+                                 ResetType type, VMFwUpdateState *vmfw)
+{
+    X86MachineState *x86ms = X86_MACHINE(machine);
+    void *biosmem = memory_region_get_ram_ptr(&x86ms->bios);
+    uint64_t bios_size = memory_region_size(&x86ms->bios);
+
+    if (type != RESET_TYPE_COLD) {
+        return;
+    }
+
+    if (vmfw->disable) {
+        return;
+    }
+
+    if (!vmfw->fw_blob.bios_paddr) {
+        return;
+    }
+
+    if (!vmfw->fw_blob.bios_size) {
+        return;
+    }
+
+    g_assert(!(vmfw->fw_blob.bios_size % 65536));
+    g_assert(vmfw->fw_blob.bios_size <= vmfw->plat_bios_size);
+
+    /*
+     * bios memory region initialization will need to be performed here
+     * if bios_size < vmfw->plat_bios_size. We may need to call
+     * memory_region_init_ram() or memory_region_init_ram_guest_memfd()
+     * to initialize a new bios memory region.
+     */
+
+    /*
+     * Read new BIOS from guest RAM into the BIOS region.
+     */
+    cpu_physical_memory_read(vmfw->fw_blob.bios_paddr,
+                             biosmem + bios_size - vmfw->fw_blob.bios_size,
+                             vmfw->fw_blob.bios_size);
+    x86_firmware_configure(0x100000000ULL - vmfw->fw_blob.bios_size,
+                           biosmem, vmfw->fw_blob.bios_size);
+}
+
 static void pc_machine_reset(MachineState *machine, ResetType type)
 {
     CPUState *cs;
     X86CPU *cpu;
+    VMFwUpdateState *vmfw = vmfwupdate_find();
+
+    /*
+     * When vmfwupdate device is present, handle reset actions for
+     * this firmware update device. The reset operations are
+     * defined in the device specification document. See
+     * docs/specs/vmfwupdate.rst.
+     */
+    if (vmfw) {
+        handle_vmfwupd_reset(machine, type, vmfw);
+    }

     qemu_devices_reset(type);

diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 55f4935..e806bf4 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -150,6 +150,8 @@ specific_ss.add(when: 'CONFIG_MAC_VIA', if_true: files('mac_via.c'))
 specific_ss.add(when: 'CONFIG_MIPS_CPS', if_true: files('mips_cmgcr.c', 'mips_cpc.c'))
 specific_ss.add(when: 'CONFIG_MIPS_ITU', if_true: files('mips_itu.c'))

+specific_ss.add(when: 'CONFIG_FW_CFG_DMA', if_true: files('vmfwupdate.c'))
+
 system_ss.add(when: 'CONFIG_SBSA_REF', if_true: files('sbsa_ec.c'))

 # HPPA devices
diff --git a/hw/misc/vmfwupdate.c b/hw/misc/vmfwupdate.c
new file mode 100644
index 0000000000..93474ff
--- /dev/null
+++ b/hw/misc/vmfwupdate.c
@@ -0,0 +1,212 @@
+/*
+ * Guest driven VM boot component update device
+ * For details and specification, please look at docs/specs/vmfwupdate.rst.
+ *
+ * Copyright (C) 2025 Red Hat, Inc.
+ *
+ * Authors: Ani Sinha <anisinha@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "system/reset.h"
+#include "hw/nvram/fw_cfg.h"
+#include "hw/i386/pc.h"
+#include "hw/qdev-properties.h"
+#include "hw/misc/vmfwupdate.h"
+#include "qemu/error-report.h"
+
+/*
+ * the following is the list of machines currently
+ * supporting this device.
+ * If a new machine is added in this list, the
+ * corresponding vm/machine reset operations must also
+ * be implemented. Please see pc_machine_reset() ->
+ * handle_vmfwupd_reset() as an example. The reset
+ * implementation must adhere to the device spec.
+ */
+static const char *supported_machines[] = {
+    TYPE_X86_MACHINE,
+    NULL,
+};
+
+static const char *vmfwupdate_supported(void)
+{
+    MachineState *ms = MACHINE(qdev_get_machine());
+    const char **machine = supported_machines;
+    while (*machine) {
+        if (object_dynamic_cast(OBJECT(ms), *machine)) {
+            return *machine;
+        }
+        machine++;
+    }
+    return NULL;
+}
+
+static uint64_t get_bios_size(void)
+{
+    Object *m_obj = qdev_get_machine();
+    MachineState *ms = MACHINE(m_obj);
+    X86MachineState *x86ms;
+
+    if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
+        x86ms = X86_MACHINE(ms);
+        /*
+         * for pc machines, return the current size of the bios memory region.
+         */
+        return memory_region_size(&x86ms->bios);
+    } else {
+        /*
+         * for other machine types and platforms, return 0 for now.
+         * non-pc machines are currently not supported anyway.
+         */
+        return 0;
+    }
+}
+
+static void fw_blob_write(void *dev, off_t offset, size_t len)
+{
+    VMFwUpdateState *s = VMFWUPDATE(dev);
+
+    /* for non-pc platform, we do not allow changing bios_size yet */
+    if (!s->plat_bios_size) {
+        return;
+    }
+
+    /*
+     * in order to change the bios size, appropriate capability
+     * must be enabled
+     */
+    if (s->fw_blob.bios_size &&
+        !(s->capability & VMFWUPDATE_CAP_BIOS_RESIZE)) {
+        warn_report("vmfwupdate: VMFWUPDATE_CAP_BIOS_RESIZE not enabled");
+        return;
+    }
+
+    /*
+     * For now, we do not let the guest resize the bios size to a value
+     * larger than the size of the memory region that holds the current image.
+     * If the size is larger, we may have to reinitialize the bios
+     * memory region. For pc, see x86_bios_rom_init().
+     */
+    if (s->fw_blob.bios_size > get_bios_size()) {
+        warn_report("vmfwupdate: bios size cannot be larger than %" PRIu64,
+                    get_bios_size());
+        return;
+    }
+
+    s->plat_bios_size = s->fw_blob.bios_size;
+
+    return;
+}
+
+static void vmfwupdate_realize(DeviceState *dev, Error **errp)
+{
+    VMFwUpdateState *s = VMFWUPDATE(dev);
+    FWCfgState *fw_cfg = fw_cfg_find();
+
+    /* multiple devices are not supported */
+    if (!vmfwupdate_find()) {
+        error_setg(errp, "at most one %s device is permitted",
+                   TYPE_VMFWUPDATE);
+        return;
+    }
+
+    /* if current machine is not supported, do not initialize */
+    if (!vmfwupdate_supported()) {
+        error_setg(errp, "This machine does not support vmfwupdate device");
+        return;
+    }
+
+    /* fw_cfg with DMA support is necessary to support this device */
+    if (!fw_cfg || !fw_cfg_dma_enabled(fw_cfg)) {
+        error_setg(errp, "%s device requires fw_cfg",
+                   TYPE_VMFWUPDATE);
+        return;
+    }
+
+    /*
+     * If the device is disabled on purpose, do not initialize.
+     * Old machines like pc-i440fx-2.8 do not have enough fw-cfg slots
+     * and hence this device is disabled for those machines.
+     */
+    if (s->disable) {
+        info_report("vmfwupdate device is disabled on the command-line");
+        return;
+    }
+
+    memset(&s->fw_blob, 0, sizeof(s->fw_blob));
+    memset(&s->opaque_blobs, 0, sizeof(s->opaque_blobs));
+
+    fw_cfg_add_file_callback(fw_cfg, FILE_VMFWUPDATE_OBLOB,
+                             NULL, NULL, s,
+                             &s->opaque_blobs,
+                             sizeof(s->opaque_blobs),
+                             false);
+
+    fw_cfg_add_file_callback(fw_cfg, FILE_VMFWUPDATE_FWBLOB,
+                             NULL, fw_blob_write, s,
+                             &s->fw_blob,
+                             sizeof(s->fw_blob),
+                             false);
+
+    /*
+     * Add global capability fw_cfg file. This will be used by the guest to
+     * check capability of the hypervisor.
+     * We do not allow the guest to change bios size for now.
+     */
+    s->capability = cpu_to_le64(CAP_VMFWUPD_MASK | VMFWUPDATE_CAP_EDKROM);
+
+    fw_cfg_add_file(fw_cfg, FILE_VMFWUPDATE_CAP,
+                    &s->capability, sizeof(s->capability));
+
+    s->plat_bios_size = get_bios_size(); /* for non-pc, this is 0 */
+    /* size of bios region for the platform - read only by the guest */
+    fw_cfg_add_file(fw_cfg, FILE_VMFWUPDATE_BIOS_SIZE,
+                    &s->plat_bios_size, sizeof(s->plat_bios_size));
+    /*
+     * add fw cfg control file to disable the hypervisor interface.
+     */
+    fw_cfg_add_file_callback(fw_cfg, FILE_VMFWUPDATE_CONTROL,
+                             NULL, NULL, s,
+                             &s->disable,
+                             sizeof(s->disable),
+                             false);
+    /*
+     * This device needs to register a global reset because it is
+     * not plugged to a bus (which, as its QOM parent, would reset it).
+     */
+    qemu_register_resettable(OBJECT(s));
+}
+
+static const Property vmfwupdate_properties[] = {
+    DEFINE_PROP_UINT8("disable", VMFwUpdateState, disable, 0),
+};
+
+static void vmfwupdate_device_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    /* we are not interested in migration - so no need to populate dc->vmsd */
+    dc->desc = "VM firmware update device";
+    dc->realize = vmfwupdate_realize;
+    dc->hotpluggable = false;
+    device_class_set_props(dc, vmfwupdate_properties);
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static const TypeInfo vmfwupdate_device_types[] = {
+    {
+        .name          = TYPE_VMFWUPDATE,
+        .parent        = TYPE_DEVICE,
+        .instance_size = sizeof(VMFwUpdateState),
+        .class_init    = vmfwupdate_device_class_init,
+    },
+};
+
+DEFINE_TYPES(vmfwupdate_device_types)
diff --git a/include/hw/misc/vmfwupdate.h b/include/hw/misc/vmfwupdate.h
new file mode 100644
index 0000000000..adddb4c
--- /dev/null
+++ b/include/hw/misc/vmfwupdate.h
@@ -0,0 +1,105 @@
+/*
+ * Guest driven VM boot component update device
+ * For details and specification, please look at docs/specs/vmfwupdate.rst.
+ *
+ * Copyright (C) 2025 Red Hat, Inc.
+ *
+ * Authors: Ani Sinha <anisinha@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+#ifndef VMFWUPDATE_H
+#define VMFWUPDATE_H
+
+#include "hw/qdev-core.h"
+#include "qom/object.h"
+#include "qemu/units.h"
+
+#define TYPE_VMFWUPDATE "vmfwupdate"
+
+#define VMFWUPDCAPMSK  0xffff /* least significant 16 capability bits */
+
+#define VMFWUPDATE_CAP_EDKROM 0x08 /* bit 4 represents support for EDKROM */
+#define VMFWUPDATE_CAP_BIOS_RESIZE 0x04 /* guests may resize bios region */
+#define CAP_VMFWUPD_MASK 0x80
+
+#define VMFWUPDATE_OPAQUE_SIZE (4 * KiB) /* PAGE_SIZE */
+
+/* fw_cfg file definitions */
+#define FILE_VMFWUPDATE_OBLOB "etc/vmfwupdate/opaque-blob"
+#define FILE_VMFWUPDATE_FWBLOB "etc/vmfwupdate/fw-blob"
+#define FILE_VMFWUPDATE_CAP "etc/vmfwupdate/cap"
+#define FILE_VMFWUPDATE_BIOS_SIZE "etc/vmfwupdate/bios-size"
+#define FILE_VMFWUPDATE_CONTROL "etc/vmfwupdate/disable"
+
+/*
+ * Address and length of the guest provided firmware blob.
+ * The blob itself is passed using the guest shared memory to QEMU.
+ * This is then copied to the guest private memory in the secure vm
+ * by the hypervisor.
+ */
+typedef struct {
+    uint64_t bios_size; /*
+                         * this is used by the guest to update plat_bios_size
+                         * when VMFWUPDATE_CAP_BIOS_RESIZE is set.
+                         */
+    uint64_t bios_paddr; /*
+                          * starting gpa where the blob is in shared guest
+                          * memory. Cleared upon system reset.
+                          */
+} VMFwUpdateFwBlob;
+
+typedef struct VMFwUpdateState {
+    DeviceState parent_obj;
+
+    /*
+     * capabilities - 64 bits.
+     * Little endian format.
+     */
+    uint64_t capability;
+
+    /*
+     * size of the bios region - architecture dependent.
+     * Read-only by the guest unless VMFWUPDATE_CAP_BIOS_RESIZE
+     * capability is set.
+     */
+    uint64_t plat_bios_size;
+
+    /*
+     * disable - disables the interface when non-zero value is written to it.
+     * Writing 0 to this file enables the interface.
+     */
+    uint8_t disable;
+
+    /*
+     * The first stage boot uses this opaque blob to convey to the next stage
+     * where the next stage components are loaded. The exact structure and
+     * number of entries are unknown to the hypervisor and the hypervisor
+     * does not touch this memory or perform any validation. The contents of
+     * this memory survive a vm reset.
+     * The contents of this memory need to be validated by the guest and
+     * must be ABI compatible between the first and second boot stages of
+     * the guest.
+     */
+    unsigned char opaque_blobs[VMFWUPDATE_OPAQUE_SIZE];
+
+    /*
+     * firmware blob addresses and sizes. These are moved to guest
+     * private memory.
+     */
+    VMFwUpdateFwBlob fw_blob;
+} VMFwUpdateState;
+
+OBJECT_DECLARE_SIMPLE_TYPE(VMFwUpdateState, VMFWUPDATE);
+
+/* returns NULL unless there is exactly one device */
+static inline VMFwUpdateState *vmfwupdate_find(void)
+{
+    Object *o = object_resolve_path_type("", TYPE_VMFWUPDATE, NULL);
+
+    return o ? VMFWUPDATE(o) : NULL;
+}
+
+#endif
diff --git a/tests/functional/meson.build b/tests/functional/meson.build
index b7719ab..31a3e46 100644
--- a/tests/functional/meson.build
+++ b/tests/functional/meson.build
@@ -72,6 +72,7 @@ tests_aarch64_system_thorough = [
   'aarch64_virt',
   'aarch64_xlnx_versal',
   'multiprocess',
+  'vmfwupdate',
 ]

 tests_alpha_system_thorough = [
@@ -235,6 +236,7 @@ tests_x86_64_system_quick = [
   'virtio_version',
   'x86_cpu_model_versions',
   'vnc',
+  'vmfwupdate',
 ]

 tests_x86_64_system_thorough = [
diff --git a/tests/functional/test_vmfwupdate.py b/tests/functional/test_vmfwupdate.py
new file mode 100644
index 0000000000..13eefd3
--- /dev/null
+++ b/tests/functional/test_vmfwupdate.py
@@ -0,0 +1,82 @@
+#!/usr/bin/env python3
+#
+# Check for vmfwupdate device.
+#
+# Copyright (c) 2025 Red Hat, Inc.
+#
+# Author:
+#  Ani Sinha <anisinha@redhat.com>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+from qemu_test import QemuSystemTest
+import time
+
+class VmFwUpdateDeviceCheck(QemuSystemTest):
+    DELAY_BOOT_SEQUENCE = 1
+
+    def test_vmfwupdate_pass(self):
+        """
+        Basic test to make sure vmfwupdate device can be instantiated.
+        """
+        if self.arch != 'x86_64':
+            return
+
+        self.vm.add_args('-device', 'vmfwupdate,id=fwupd1')
+        self.vm.set_qmp_monitor(enabled=False)
+        self.vm.launch()
+        time.sleep(self.DELAY_BOOT_SEQUENCE)
+        self.vm.shutdown()
+        self.assertEqual(self.vm.exitcode(), 0, "QEMU exit code should be 0")
+
+    def test_vmfwupdate_disabled(self):
+        """
+        Ensure a vmfwupdate device created with disable=1 reports itself as disabled.
+        """
+        if self.arch != 'x86_64':
+            return
+
+        self.vm.add_args('-device', 'vmfwupdate,id=fwupd,disable=1')
+        self.vm.set_qmp_monitor(enabled=False)
+        self.vm.launch()
+        time.sleep(self.DELAY_BOOT_SEQUENCE)
+        self.vm.shutdown()
+        self.assertRegex(self.vm.get_log(),
+                         r'vmfwupdate device is disabled on the command-line')
+        self.assertEqual(self.vm.exitcode(), 0, "QEMU exit code should be 0")
+
+    def test_multiple_device_fail(self):
+        """
+        Only one vmfwupdate device can be instantiated. Ensure failure if
+        user tries to create more than one device.
+        """
+        if self.arch != 'x86_64':
+            return
+
+        self.vm.add_args('-device', 'vmfwupdate,id=fw1',
+                         '-device', 'vmfwupdate,id=fw2')
+        self.vm.set_qmp_monitor(enabled=False)
+        self.vm.launch()
+        self.vm.wait()
+        self.assertEqual(self.vm.exitcode(), 1, "QEMU exit code should be 1")
+        self.assertRegex(self.vm.get_log(),
+                         r'at most one vmfwupdate device is permitted')
+
+    def test_aarch64_fail(self):
+        """
+        Currently the device is only supported for pc platforms.
+        """
+        if self.arch != 'aarch64':
+            return
+
+        self.vm.add_args('-machine', 'virt', '-device',
+                         'vmfwupdate,id=fwupd1')
+        self.vm.set_qmp_monitor(enabled=False)
+        self.vm.launch()
+        self.vm.wait()
+        self.assertEqual(self.vm.exitcode(), 1, "QEMU exit code should be 1")
+        self.assertRegex(self.vm.get_log(),
+                         r'This machine does not support vmfwupdate device')
+
+if __name__ == '__main__':
+    QemuSystemTest.main()
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 94b28e5..afe52f5 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -57,6 +57,7 @@ qtests_i386 = \
   (config_all_devices.has_key('CONFIG_AHCI_ICH9') ? ['tco-test'] : []) +                    \
   (config_all_devices.has_key('CONFIG_FDC_ISA') ? ['fdc-test'] : []) +                      \
   (config_all_devices.has_key('CONFIG_I440FX') ? ['fw_cfg-test'] : []) +                    \
+  (config_all_devices.has_key('CONFIG_Q35') ? ['vmfwupdate-test'] : []) +                   \
   (config_all_devices.has_key('CONFIG_I440FX') ? ['i440fx-test'] : []) +                    \
   (config_all_devices.has_key('CONFIG_I440FX') ? ['ide-test'] : []) +                       \
   (config_all_devices.has_key('CONFIG_I440FX') ? ['numa-test'] : []) +                      \
diff --git a/tests/qtest/vmfwupdate-test.c b/tests/qtest/vmfwupdate-test.c
new file mode 100644
index 0000000000..fc1a91b
--- /dev/null
+++ b/tests/qtest/vmfwupdate-test.c
@@ -0,0 +1,67 @@
+/*
+ * vmfwupdate device fwcfg test.
+ *
+ * Copyright (c) 2025 Red Hat, Inc.
+ *
+ * Author:
+ *   Ani Sinha <anisinha@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+
+#include "libqtest.h"
+#include "standard-headers/linux/qemu_fw_cfg.h"
+#include "libqos/fw_cfg.h"
+#include "qemu/bswap.h"
+#include "hw/misc/vmfwupdate.h"
+
+static void test_vmfwupdate_capability(void)
+{
+    QFWCFG *fw_cfg;
+    QTestState *s;
+    uint64_t capability = 0;
+    size_t filesize;
+
+    s = qtest_init("-device vmfwupdate");
+    fw_cfg = pc_fw_cfg_init(s);
+
+    filesize = qfw_cfg_get_file(fw_cfg, FILE_VMFWUPDATE_CAP,
+                                &capability, sizeof(capability));
+    g_assert_cmpint(filesize, ==, sizeof(capability));
+    capability = le64_to_cpu(capability);
+    g_assert_cmpint(capability, ==, CAP_VMFWUPD_MASK | VMFWUPDATE_CAP_EDKROM);
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
+}
+
+static void test_vmfwupdate_bios_size(void)
+{
+    QFWCFG *fw_cfg;
+    QTestState *s;
+    uint64_t bios_size = 0;
+    size_t filesize;
+
+    s = qtest_init("-device vmfwupdate");
+    fw_cfg = pc_fw_cfg_init(s);
+
+    filesize = qfw_cfg_get_file(fw_cfg, FILE_VMFWUPDATE_BIOS_SIZE,
+                                &bios_size, sizeof(bios_size));
+    g_assert_cmpint(filesize, ==, sizeof(bios_size));
+    bios_size = le64_to_cpu(bios_size);
+    fprintf(stderr, "bios_size: %" PRIu64 "\n", bios_size);
+    g_assert_cmpint(bios_size, !=, 0);
+    pc_fw_cfg_uninit(fw_cfg);
+    qtest_quit(s);
+}
+
+int main(int argc, char **argv)
+{
+    g_test_init(&argc, &argv, NULL);
+
+    qtest_add_func("vmfwupdate/cap", test_vmfwupdate_capability);
+    qtest_add_func("vmfwupdate/bios_size", test_vmfwupdate_bios_size);
+
+    return g_test_run();
+}
--
2.45.2
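
For readers skimming the patch, the reset-time decision in `handle_vmfwupd_reset()` and in the spec's "Triggering the Firmware Update" section can be modeled roughly as below. This is only a sketch in Python, not QEMU code: `plan_reset` and its parameter names are made up for illustration, the cold-reset check is omitted, and where the patch uses `g_assert()` the sketch raises `ValueError` instead.

```python
GIB = 1 << 30
ALIGN = 64 * 1024  # the patch asserts the blob size is a multiple of 65536


def plan_reset(disable, bios_addr, blob_size, region_size):
    """Model of the reset decision (hypothetical helper, not a QEMU API).

    Returns None when a standard system reset should happen, otherwise a
    tuple (source_gpa, offset_in_region, guest_base) describing the copy.
    """
    if disable:
        return None  # interface disabled: ignore all other values
    if bios_addr == 0 or blob_size == 0:
        return None  # nothing staged by the guest
    if blob_size % ALIGN or blob_size > region_size:
        raise ValueError("invalid firmware blob size")
    # The new image is placed at the top of the BIOS region, which on x86
    # ends at 4 GiB, so the image starts at 4 GiB - blob_size.
    offset_in_region = region_size - blob_size
    guest_base = 4 * GIB - blob_size
    return bios_addr, offset_in_region, guest_base


# A 2 MiB image staged at guest physical address 0x100000, 4 MiB BIOS region.
plan = plan_reset(0, 0x100000, 2 << 20, 4 << 20)
```

With these inputs the image lands 2 MiB into the region and is mapped at guest physical address `0xFFE00000`.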

@poettering

Member

A pointer is not an offset, that's what I am saying. An offset is a small value you add to a pointer, and the result is then a pointer again. But it's not a pointer in itself. hence I find it really weird to name a pointer xyz_off or xyz_offset, because that makes no sense.

seems good to merge if you change the name of those variables.
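
To make the naming point concrete, here is a small sketch of the distinction. It uses Python with plain integers standing in for addresses, and all names (`resolve`, `string_at`, `section_base`, `name_offset`) are invented for illustration, not taken from the systemd sources.

```python
def resolve(base, offset, size):
    """Turn a section-relative offset into an absolute address."""
    if not 0 <= offset < size:
        raise ValueError("offset outside section")
    return base + offset  # address + offset yields an address again


def string_at(section, offset):
    """Read the NUL-terminated ASCII string starting at `offset`."""
    end = section.index(b"\0", offset)
    return section[offset:end].decode("ascii")


section_base = 0x7F000000  # hypothetical load address of a section
name_offset = 0x40         # small value relative to the section start
name_addr = resolve(section_base, name_offset, 0x1000)

blob = b"\0" * 4 + b"fw-a\0"
name = string_at(blob, 4)
```

The value stored in `name_offset` only makes sense relative to a base, while `name_addr` can be used on its own, which is why the resolved value gets a different name.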

@ani-sinha

ani-sinha commented Jan 31, 2025

Contributor Author

UKIs can be used to bundle uefi firmwares that can be measured and
used on a confidential computing environment. There can be more than one
firmware blob bundle, each one for a specific platform. Also firmware images
can themselves be containers like IGVM files that can in turn bundle the
actual firmware blob. This change is specifically for uefi firmwares, not
IGVM container files.

This change adds support to introduce a .efifw section in UKI that can be
used for firmware blobs/images. There can be multiple such sections and each
section can contain a single firmware image.

The matching .hwids entry for a specific platform can be used to select the
most appropriate firmware blob.

ukify tool has been also changed to support addition of a firmware image
in UKI.

Since firmware gets measured automatically, we do not need to measure it
separately as a part of the UKI.
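
The selection idea described above can be illustrated with a toy model. This is a hedged sketch only: it does not reproduce the real `.hwids` binary format or systemd's matching code, and `select_firmware`, the ID strings, and the dictionary layout are all hypothetical.

```python
from __future__ import annotations


def select_firmware(platform_ids: set[str],
                    hwid_entries: list[tuple[str, str]],
                    efifw_sections: dict[str, bytes]) -> bytes | None:
    """Pick the firmware blob whose hardware ID matches this platform.

    hwid_entries maps a hardware ID to a firmware ID (both hypothetical
    strings here); efifw_sections maps a firmware ID to the blob carried
    in one .efifw section. The first matching entry wins.
    """
    for hwid, fwid in hwid_entries:
        if hwid in platform_ids and fwid in efifw_sections:
            return efifw_sections[fwid]
    return None  # no entry matches: no bundled firmware applies


entries = [("chid-a", "fw-a"), ("chid-b", "fw-b")]
sections = {"fw-a": b"A", "fw-b": b"B"}
chosen = select_firmware({"chid-b"}, entries, sections)
```

Each `.efifw` section holds a single image, so the lookup returns at most one blob per platform.
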
@poettering poettering merged commit 83bf58f into systemd:main Jan 31, 2025
@github-actions github-actions Bot removed the please-review PR is ready for (re-)review by a maintainer label Jan 31, 2025
yuwata added a commit that referenced this pull request Feb 6, 2025