
[RFC PATCH 00/14] Firmware Support for Fast Live Migration for AMD SEV


Tobin Feldman-Fitzthum
 

This is a demonstration of fast migration for encrypted virtual machines
using a Migration Handler that lives in OVMF. This demo uses AMD SEV,
but the ideas may generalize to other confidential computing platforms.
With AMD SEV, guest memory is encrypted and the hypervisor cannot access
or move it. This makes migration tricky. In this demo, we show how the
HV can ask a Migration Handler (MH) in the firmware for an encrypted
page. The MH encrypts the page with a transport key prior to releasing
it to the HV. The target machine also runs an MH that decrypts the page
once it is passed in by the target HV. These patches are not ready for
production, but they are a full end-to-end solution that facilitates a
fast live migration between two SEV VMs.
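In outline, the source/target round trip described above looks like the following (a rough, self-contained sketch only: XOR stands in for the real transport cipher, and MhExportPage/MhImportPage are illustrative names, not the routines in this series):

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096

/* Placeholder transport cipher: an XOR keystream. A real MH would use an
 * authenticated cipher keyed with the negotiated transport key. */
static void
MhTransformPage (uint8_t *Page, const uint8_t *Key, size_t KeyLen)
{
  for (size_t i = 0; i < PAGE_SIZE; i++) {
    Page[i] ^= Key[i % KeyLen];
  }
}

/* Source-side MH: copy the guest page (readable only inside the guest's
 * encryption boundary) and release a transport-encrypted copy to the HV. */
static void
MhExportPage (const uint8_t *GuestPage, uint8_t *SharedBuf,
              const uint8_t *Key, size_t KeyLen)
{
  for (size_t i = 0; i < PAGE_SIZE; i++) {
    SharedBuf[i] = GuestPage[i];
  }
  MhTransformPage (SharedBuf, Key, KeyLen);
}

/* Target-side MH: decrypt the transport copy back into guest memory. */
static void
MhImportPage (uint8_t *GuestPage, const uint8_t *SharedBuf,
              const uint8_t *Key, size_t KeyLen)
{
  for (size_t i = 0; i < PAGE_SIZE; i++) {
    GuestPage[i] = SharedBuf[i];
  }
  MhTransformPage (GuestPage, Key, KeyLen);
}
```

The point is only the symmetry: the hypervisor never sees plaintext, and the two MHs share the transport key out of band.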

Corresponding patches for QEMU have been posted by my colleague Dov Murik
on qemu-devel. Our approach needs little kernel support, requiring only
one hypercall that the guest can use to mark a page as encrypted or
shared. This series includes updated patches from Ashish Kalra and
Brijesh Singh that allow OVMF to use this hypercall.
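The guest-side bookkeeping behind that hypercall can be pictured roughly like this (a hedged sketch: the function and bitmap below are illustrative stand-ins for the real hypercall path and the host-side page-encryption bitmap, and a real implementation would issue a VMMCALL rather than touch the bitmap directly):

```c
#include <stdbool.h>
#include <stdint.h>

#define GUEST_PAGES 1024

/* Illustrative stand-in for the host-side page-encryption bitmap the
 * hypercall maintains: bit set => page is encrypted (private). */
static uint8_t mPageEncBitmap[GUEST_PAGES / 8];

/* Sketch of the guest->host interface: mark a run of guest frame numbers
 * as encrypted or shared. */
static void
MarkPageRange (uint64_t Gfn, uint64_t NumPages, bool Encrypted)
{
  for (uint64_t i = 0; i < NumPages; i++) {
    uint64_t p = Gfn + i;
    if (Encrypted) {
      mPageEncBitmap[p / 8] |= (uint8_t)(1u << (p % 8));
    } else {
      mPageEncBitmap[p / 8] &= (uint8_t)~(1u << (p % 8));
    }
  }
}

static bool
PageIsEncrypted (uint64_t Gfn)
{
  return (mPageEncBitmap[Gfn / 8] >> (Gfn % 8)) & 1u;
}
```

During migration the HV consults this state to decide which pages must go through the MH and which can be copied directly.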

The MH runs continuously in the guest, waiting for communication from
the HV. The HV starts an additional vCPU for the MH but does not expose
it to the guest OS via ACPI. We use the MpService to start the MH. The
MpService is only available at boot time, and processes that are started by
it are usually cleaned up on ExitBootServices. Since we need the MH to
run continuously, we had to make some modifications. Ideally a feature
could be added to the MpService to allow for the starting of
long-running processes. Besides migration, this could support other
background processes that need to operate within the encryption
boundary. For now, we have included a handful of patches that modify the
MpService to allow the MH to keep running after ExitBootServices. These
are temporary.
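The MH's main loop amounts to polling a shared mailbox for commands. The following is a sketch only: MH_CMD_RESET mirrors the MH_RESET command this series uses, but the other command names and the mailbox layout are invented for illustration.

```c
#include <stdint.h>

/* Invented mailbox layout for illustration; the series defines its own. */
enum {
  MH_CMD_NONE    = 0,
  MH_CMD_SAVE    = 1,  /* export an encrypted copy of a page */
  MH_CMD_RESTORE = 2,  /* import a page on the target */
  MH_CMD_RESET   = 3,  /* migration finished */
};

typedef struct {
  volatile uint32_t Cmd;  /* written by the HV, acknowledged by the MH */
  volatile uint64_t Gpa;  /* page the command applies to */
} MH_MAILBOX;

/* One polling step: returns 0 to keep running, 1 on MH_CMD_RESET.
 * The real MH spins on this, with CpuPause() between checks. */
static int
MhHandleOne (MH_MAILBOX *Mbox, unsigned *SavedCount)
{
  uint32_t Cmd = Mbox->Cmd;

  switch (Cmd) {
  case MH_CMD_NONE:
    return 0;             /* nothing pending; would CpuPause() here */
  case MH_CMD_SAVE:
    (*SavedCount)++;      /* would transport-encrypt the page at Gpa */
    break;
  case MH_CMD_RESTORE:
    break;                /* would decrypt into the page at Gpa */
  case MH_CMD_RESET:
    Mbox->Cmd = MH_CMD_NONE;
    return 1;
  }
  Mbox->Cmd = MH_CMD_NONE; /* acknowledge the command */
  return 0;
}
```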

Ashish Kalra (2):
OvmfPkg/PlatformPei: Mark SEC GHCB page in the page encrpytion bitmap.
OvmfPkg/PlatformDxe: Add support for SEV live migration.

Brijesh Singh (1):
OvmfPkg/BaseMemEncryptLib: Support to issue unencrypted hypercall

Dov Murik (1):
OvmfPkg/AmdSev: Build page table for migration handler

Tobin Feldman-Fitzthum (10):
OvmfPkg/AmdSev: Base for Confidential Migration Handler
OvmfPkg/PlatfomPei: Set Confidential Migration PCD
OvmfPkg/AmdSev: Setup Migration Handler Mailbox
OvmfPkg/AmdSev: MH support for mailbox protocol
UefiCpuPkg/MpInitLib: temp removal of MpLib cleanup
UefiCpuPkg/MpInitLib: Allocate MP buffer as runtime memory
UefiCpuPkg/CpuExceptionHandlerLib: Exception handling as runtime
memory
OvmfPkg/AmdSev: Don't overwrite mailbox or pagetables
OvmfPkg/AmdSev: Don't overwrite MH stack
OvmfPkg/AmdSev: MH page encryption POC

OvmfPkg/OvmfPkg.dec | 11 +
OvmfPkg/AmdSev/AmdSevX64.dsc | 2 +
OvmfPkg/AmdSev/AmdSevX64.fdf | 13 +-
.../ConfidentialMigrationDxe.inf | 45 +++
.../ConfidentialMigrationPei.inf | 35 ++
.../DxeMemEncryptSevLib.inf | 1 +
.../PeiMemEncryptSevLib.inf | 1 +
OvmfPkg/PlatformDxe/Platform.inf | 2 +
OvmfPkg/PlatformPei/PlatformPei.inf | 2 +
UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf | 2 +
UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf | 2 +
OvmfPkg/AmdSev/ConfidentialMigration/MpLib.h | 235 +++++++++++++
.../ConfidentialMigration/VirtualMemory.h | 177 ++++++++++
OvmfPkg/Include/Guid/MemEncryptLib.h | 16 +
OvmfPkg/PlatformDxe/PlatformConfig.h | 5 +
.../ConfidentialMigrationDxe.c | 325 ++++++++++++++++++
.../ConfidentialMigrationPei.c | 25 ++
.../X64/PeiDxeVirtualMemory.c | 18 +
OvmfPkg/PlatformDxe/AmdSev.c | 99 ++++++
OvmfPkg/PlatformDxe/Platform.c | 6 +
OvmfPkg/PlatformPei/AmdSev.c | 10 +
OvmfPkg/PlatformPei/Platform.c | 10 +
.../CpuExceptionHandlerLib/DxeException.c | 8 +-
UefiCpuPkg/Library/MpInitLib/DxeMpLib.c | 21 +-
UefiCpuPkg/Library/MpInitLib/MpLib.c | 7 +-
25 files changed, 1061 insertions(+), 17 deletions(-)
create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationDxe.inf
create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationPei.inf
create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/MpLib.h
create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/VirtualMemory.h
create mode 100644 OvmfPkg/Include/Guid/MemEncryptLib.h
create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationDxe.c
create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationPei.c
create mode 100644 OvmfPkg/PlatformDxe/AmdSev.c

--
2.20.1


Laszlo Ersek
 

Hi Tobin,

On 03/02/21 21:48, Tobin Feldman-Fitzthum wrote:
This is a demonstration of fast migration for encrypted virtual machines
using a Migration Handler that lives in OVMF. This demo uses AMD SEV,
but the ideas may generalize to other confidential computing platforms.
With AMD SEV, guest memory is encrypted and the hypervisor cannot access
or move it. This makes migration tricky. In this demo, we show how the
HV can ask a Migration Handler (MH) in the firmware for an encrypted
page. The MH encrypts the page with a transport key prior to releasing
it to the HV. The target machine also runs an MH that decrypts the page
once it is passed in by the target HV. These patches are not ready for
production, but they are a full end-to-end solution that facilitates a
fast live migration between two SEV VMs.

Corresponding patches for QEMU have been posted by my colleague Dov Murik
on qemu-devel. Our approach needs little kernel support, requiring only
one hypercall that the guest can use to mark a page as encrypted or
shared. This series includes updated patches from Ashish Kalra and
Brijesh Singh that allow OVMF to use this hypercall.

The MH runs continuously in the guest, waiting for communication from
the HV. The HV starts an additional vCPU for the MH but does not expose
it to the guest OS via ACPI. We use the MpService to start the MH. The
MpService is only available at boot time, and processes that are started by
it are usually cleaned up on ExitBootServices. Since we need the MH to
run continuously, we had to make some modifications. Ideally a feature
could be added to the MpService to allow for the starting of
long-running processes. Besides migration, this could support other
background processes that need to operate within the encryption
boundary. For now, we have included a handful of patches that modify the
MpService to allow the MH to keep running after ExitBootServices. These
are temporary.
I plan to do a lightweight review for this series. (My understanding is
that it's an RFC and not actually being proposed for merging.)

Regarding the MH's availability at runtime -- does that necessarily
require the isolation of an AP? Because in the current approach,
allowing the MP Services to survive into OS runtime (in some form or
another) seems critical, and I don't think it's going to fly.

I agree that the UefiCpuPkg patches have been well separated from the
rest of the series, but I'm somewhat doubtful the "firmware-initiated
background process" idea will be accepted. Have you investigated
exposing a new "runtime service" (a function pointer) via the UEFI
Configuration table, and calling that (perhaps periodically?) from the
guest kernel? It would be a form of polling I guess. Or maybe, poll the
mailbox directly in the kernel, and call the new firmware runtime
service when there's an actual command to process.
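The shape being suggested here — a function pointer published to the OS, invoked only when the kernel sees work pending — might look something like the following. This is a pure sketch: the table layout, names, and calling convention are all invented, and a real version would be installed via the UEFI Configuration Table with a vendor GUID.

```c
#include <stdint.h>

/* Invented: a vendor table the firmware would publish via the UEFI
 * Configuration Table, carrying one callable entry point. */
typedef struct {
  uint32_t Revision;
  /* Called by the guest kernel when the mailbox holds a command;
   * returns 0 on success. */
  int (*ProcessMigrationCmd) (void *Mailbox);
} MIGRATION_RUNTIME_TABLE;

/* Firmware-side implementation the table would point at. */
static int
ProcessMigrationCmdImpl (void *Mailbox)
{
  volatile uint32_t *Cmd = Mailbox;

  if (*Cmd == 0) {
    return 0;           /* nothing pending */
  }
  /* ...handle the command as the MH loop otherwise would... */
  *Cmd = 0;
  return 0;
}

static MIGRATION_RUNTIME_TABLE mMigrationTable = {
  1, ProcessMigrationCmdImpl
};

/* Kernel side: poll the mailbox, call into firmware only on real work. */
static int
KernelPollOnce (MIGRATION_RUNTIME_TABLE *Tbl, volatile uint32_t *Cmd)
{
  if (*Cmd != 0) {
    return Tbl->ProcessMigrationCmd ((void *)Cmd);
  }
  return 0;
}
```

This trades the dedicated busy-looping vCPU for kernel cooperation, which is exactly the tension discussed in this thread.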

(You do spell out "little kernel support", and I'm not sure if that's a
technical benefit, or a political / community benefit.)

I'm quite uncomfortable with an attempt to hide a CPU from the OS via
ACPI. The OS has other ways to learn (for example, a boot loader could
use the MP services itself, stash the information, and hand it to the OS
kernel -- this would minimally allow for detecting an inconsistency in
the OS). What about "all-but-self" IPIs too -- the kernel might think
all the processors it's poking like that were under its control.

Also, as far as I can tell from patch #7, the AP seems to be
busy-looping (with a CpuPause() added in), for the entire lifetime of
the OS. Do I understand right? If so -- is it a temporary trait as well?

Sorry if my questions are "premature", in the sense that I could get my
own answers as well if I actually read the patches in detail -- however,
I wouldn't like to do that at once, because then I'll be distracted by
many style issues and other "trivial" stuff. Examples for the latter:

- patch#1 calls SetMemoryEncDecHypercall3(), but there is no such
function in edk2, so minimally it's a patch ordering bug in the series,

- in patch#1, there's minimally one whitespace error (no whitespace
right after "EFI_SIZE_TO_PAGES")

- in patch#1, the alphabetical ordering in the [LibraryClasses] section,
and in the matching #include directives, gets broken,

- I'd prefer if the "SevLiveMigrationEnabled" UEFI variable were set in
ConfidentialMigrationDxe, rather than PlatformDxe (patch #3), or at
least another AMD SEV related DXE driver (OvmfPkg/AmdSevDxe etc).

- any particular reason for making the UEFI variable non-volatile? I
don't think it should survive any particular boot of the guest.

- Why do we need a variable in the first place?

etc etc

Thanks!
Laszlo







Tobin Feldman-Fitzthum
 

Hi Tobin,

On 03/02/21 21:48, Tobin Feldman-Fitzthum wrote:
This is a demonstration of fast migration for encrypted virtual machines
using a Migration Handler that lives in OVMF. This demo uses AMD SEV,
but the ideas may generalize to other confidential computing platforms.
With AMD SEV, guest memory is encrypted and the hypervisor cannot access
or move it. This makes migration tricky. In this demo, we show how the
HV can ask a Migration Handler (MH) in the firmware for an encrypted
page. The MH encrypts the page with a transport key prior to releasing
it to the HV. The target machine also runs an MH that decrypts the page
once it is passed in by the target HV. These patches are not ready for
production, but they are a full end-to-end solution that facilitates a
fast live migration between two SEV VMs.

Corresponding patches for QEMU have been posted by my colleague Dov Murik
on qemu-devel. Our approach needs little kernel support, requiring only
one hypercall that the guest can use to mark a page as encrypted or
shared. This series includes updated patches from Ashish Kalra and
Brijesh Singh that allow OVMF to use this hypercall.

The MH runs continuously in the guest, waiting for communication from
the HV. The HV starts an additional vCPU for the MH but does not expose
it to the guest OS via ACPI. We use the MpService to start the MH. The
MpService is only available at boot time, and processes that are started by
it are usually cleaned up on ExitBootServices. Since we need the MH to
run continuously, we had to make some modifications. Ideally a feature
could be added to the MpService to allow for the starting of
long-running processes. Besides migration, this could support other
background processes that need to operate within the encryption
boundary. For now, we have included a handful of patches that modify the
MpService to allow the MH to keep running after ExitBootServices. These
are temporary.
I plan to do a lightweight review for this series. (My understanding is
that it's an RFC and not actually being proposed for merging.)

Regarding the MH's availability at runtime -- does that necessarily
require the isolation of an AP? Because in the current approach,
allowing the MP Services to survive into OS runtime (in some form or
another) seems critical, and I don't think it's going to fly.

I agree that the UefiCpuPkg patches have been well separated from the
rest of the series, but I'm somewhat doubtful the "firmware-initiated
background process" idea will be accepted. Have you investigated
exposing a new "runtime service" (a function pointer) via the UEFI
Configuration table, and calling that (perhaps periodically?) from the
guest kernel? It would be a form of polling I guess. Or maybe, poll the
mailbox directly in the kernel, and call the new firmware runtime
service when there's an actual command to process.
Continuous runtime availability for the MH is almost certainly the most controversial part of this proposal, which is why I put it in the cover letter and why it's good to discuss.
(You do spell out "little kernel support", and I'm not sure if that's a
technical benefit, or a political / community benefit.)
As you allude to, minimal kernel support is really one of the main things that shapes our approach. This is partly a political and practical benefit, but there are also technical benefits. Having the MH in firmware likely leads to higher availability: it can be accessed when the OS is unreachable, perhaps during boot or when the OS is hung. There are also potential portability advantages, although we do currently require support for one hypercall. The cost of implementing this hypercall is low.

Generally speaking, our task is to find a home for functionality that was traditionally provided by the hypervisor, but that now needs to live inside the trust domain even though it isn't really part of the guest. A meta-goal of this project is to figure out the best way to do this.


I'm quite uncomfortable with an attempt to hide a CPU from the OS via
ACPI. The OS has other ways to learn (for example, a boot loader could
use the MP services itself, stash the information, and hand it to the OS
kernel -- this would minimally allow for detecting an inconsistency in
the OS). What about "all-but-self" IPIs too -- the kernel might think
all the processors it's poking like that were under its control.
This might be the second most controversial piece. Here's a question: if we could successfully hide the MH vCPU from the OS, would it still make you uncomfortable? In other words, is the worry that there might be some inconsistency or more generally that there is something hidden from the OS? One thing to think about is that the guest owner should generally be aware that there is a migration handler running. The way I see it, a guest owner of an SEV VM would need to opt-in to migration and should then expect that there is an MH running even if they aren't able to see it. Of course we need to be certain that the MH isn't going to break the OS.

Also, as far as I can tell from patch #7, the AP seems to be
busy-looping (with a CpuPause() added in), for the entire lifetime of
the OS. Do I understand right? If so -- is it a temporary trait as well?
In our approach the MH continuously checks for commands from the hypervisor. There are potentially ways to optimize this, such as having the hypervisor de-schedule the MH vCPU while not migrating. You could potentially shut down the MH on the target after receiving the MH_RESET command (when the migration finishes), but what if you want to migrate that VM somewhere else?


Sorry if my questions are "premature", in the sense that I could get my
own answers as well if I actually read the patches in detail -- however,
I wouldn't like to do that at once, because then I'll be distracted by
many style issues and other "trivial" stuff. Examples for the latter:
Not premature at all. I think you hit the nail on the head with everything you raised.

-Tobin




Yao, Jiewen
 

Hi Tobin
Thanks for your patch.
You may know that Intel is working on the same live migration feature for TDX.

Please give me some time (about 1 work week) to digest and evaluate the patch and impact.
Then I will provide feedback.

Thank you
Yao Jiewen

-----Original Message-----
From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Tobin
Feldman-Fitzthum
Sent: Wednesday, March 3, 2021 4:48 AM
To: devel@edk2.groups.io
Cc: Dov Murik <dovmurik@linux.vnet.ibm.com>; Tobin Feldman-Fitzthum
<tobin@ibm.com>; Tobin Feldman-Fitzthum <tobin@linux.ibm.com>; James
Bottomley <jejb@linux.ibm.com>; Hubertus Franke <frankeh@us.ibm.com>;
Brijesh Singh <brijesh.singh@amd.com>; Ashish Kalra <ashish.kalra@amd.com>;
Jon Grimm <jon.grimm@amd.com>; Tom Lendacky
<thomas.lendacky@amd.com>
Subject: [edk2-devel] [RFC PATCH 00/14] Firmware Support for Fast Live
Migration for AMD SEV






Paolo Bonzini
 

Hi Tobin,

as mentioned in the reply to the QEMU patches posted by Tobin, I think the firmware helper approach is very good, but there are some disadvantages in the idea of auxiliary vCPUs. These are especially true in the VMM, where it's much nicer to have a separate VM that goes through a specialized run loop; however, even in the firmware level there are some complications (as you pointed out) in letting MpService workers run after ExitBootServices.

My idea would be that the firmware would start the VM as usual using the same launch data; then, the firmware would detect it was running as a migration helper VM during the SEC or PEI phases (for example via the GHCB or some other unencrypted communication area), and divert execution to the migration helper instead of proceeding to the next boot phase. This would be somewhat similar in spirit to how edk2 performs S3 resume, if my memory serves correctly.

What do you think?

Thanks,

Paolo


Laszlo Ersek
 

On 03/03/21 19:25, Tobin Feldman-Fitzthum wrote:
Laszlo wrote:
I'm quite uncomfortable with an attempt to hide a CPU from the OS via
ACPI. The OS has other ways to learn (for example, a boot loader could
use the MP services itself, stash the information, and hand it to the OS
kernel -- this would minimally allow for detecting an inconsistency in
the OS). What about "all-but-self" IPIs too -- the kernel might think
all the processors it's poking like that were under its control.
This might be the second most controversial piece. Here's a question: if
we could successfully hide the MH vCPU from the OS, would it still make
you uncomfortable? In other words, is the worry that there might be some
inconsistency or more generally that there is something hidden from the
OS?
(1) My personal concern is the consistency aspect. In *some* parts of
the firmware, we'd rely on the hidden CPU to behave as a "logical
execution unit" (because we want it to run the MH), but in other parts
of the firmware, we'd expect it to be hidden. (Consider what
EFI_MP_SERVICES_PROTOCOL.StartupAllAPs() should do while the MH is
running!) And then the CPU should be hidden from the OS completely, even
if the OS doesn't rely on ACPI, but massages LAPIC stuff that is
architecturally specified.

In other words, we'd have to treat this processor as a "service
processor", outside of the "normal" (?) processor domain -- basically
what the PSP is right now. I don't have the slightest idea how physical
firmware deals with service processors in general. I'm really scared of
the many possible corner cases (CPU hot(un)plug, NUMA proximity, ...)

(2) I expect kernel developers to have concerns about a firmware-level
"background job" at OS runtime. SMM does something similar (periodic or
otherwise hardware-initiated async SMIs etc), and kernel developers
already dislike those (latency spikes, messing with hardware state...).


One thing to think about is that the guest owner should generally be
aware that there is a migration handler running. The way I see it, a
guest owner of an SEV VM would need to opt-in to migration and should
then expect that there is an MH running even if they aren't able to see
it. Of course we need to be certain that the MH isn't going to break the
OS.
I didn't think of the guest owner, but the developers that work on
(possibly unrelated parts of) the guest kernel.



Also, as far as I can tell from patch #7, the AP seems to be
busy-looping (with a CpuPause() added in), for the entire lifetime of
the OS. Do I understand right? If so -- is it a temporary trait as well?
In our approach the MH continuously checks for commands from the
hypervisor. There are potentially ways to optimize this, such as having
the hypervisor de-schedule the MH vCPU while not migrating. You could
potentially shut down the MH on the target after receiving the
MH_RESET command (when the migration finishes), but what if you want to
migrate that VM somewhere else?
I have no idea.

In the current world, de-scheduling a particular VCPU for extended
periods of time is a bad idea (stolen time goes up, ticks get lost, ...)
So I guess this would depend on how well you could "hide" the service
processor from the guest kernel.


I'd really like if we could rely on an established "service processor"
methodology, in the guest. Physical platform vendors have used service
processors for ages, the firmwares on those platforms (on the main
boards) do manage the service processors, and the service processors are
hidden from the OS too (beyond specified access methods, if any).

My understanding (or assumption) is that such a service processor is
primarily a separate entity (you cannot talk to them "unintentionally",
for example with an All-But-Self IPI), and that it's reachable only with
specific access methods. I think the AMD PSP itself might follow this
approach (AIUI it's an aarch64 CPU on an otherwise Intel/AMD arch platform).

I'd like us to benefit from a crystallized "service processor"
abstraction, if possible. I apologize that I'm this vague -- I've never
seen such firmware code that deals with a service processor, I just
assume it exists.

Thanks
Laszlo




Sorry if my questions are "premature", in the sense that I could get my
own answers as well if I actually read the patches in detail -- however,
I wouldn't like to do that at once, because then I'll be distracted by
many style issues and other "trivial" stuff. Examples for the latter:
Not premature at all. I think you hit the nail on the head with
everything you raised.

-Tobin


- patch#1 calls SetMemoryEncDecHypercall3(), but there is no such
function in edk2, so minimally it's a patch ordering bug in the series,

- in patch#1, there's minimally one whitespace error (no whitespace
right after "EFI_SIZE_TO_PAGES")

- in patch#1, the alphabetical ordering in the [LibraryClasses] section,
and in the matching #include directives, gets broken,

- I'd prefer if the "SevLiveMigrationEnabled" UEFI variable were set in
ConfidentialMigrationDxe, rather than PlatformDxe (patch #3), or at
least another AMD SEV related DXE driver (OvmfPkg/AmdSevDxe etc).

- any particular reason for making the UEFI variable non-volatile? I
don't think it should survive any particular boot of the guest.

- Why do we need a variable in the first place?

etc etc

Thanks!
Laszlo




Ashish Kalra (2):
   OvmfPkg/PlatformPei: Mark SEC GHCB page in the page encryption bitmap.
   OvmfPkg/PlatformDxe: Add support for SEV live migration.

Brijesh Singh (1):
   OvmfPkg/BaseMemEncryptLib: Support to issue unencrypted hypercall

Dov Murik (1):
   OvmfPkg/AmdSev: Build page table for migration handler

Tobin Feldman-Fitzthum (10):
   OvmfPkg/AmdSev: Base for Confidential Migration Handler
   OvmfPkg/PlatformPei: Set Confidential Migration PCD
   OvmfPkg/AmdSev: Setup Migration Handler Mailbox
   OvmfPkg/AmdSev: MH support for mailbox protocol
   UefiCpuPkg/MpInitLib: temp removal of MpLib cleanup
   UefiCpuPkg/MpInitLib: Allocate MP buffer as runtime memory
   UefiCpuPkg/CpuExceptionHandlerLib: Exception handling as runtime memory
   OvmfPkg/AmdSev: Don't overwrite mailbox or pagetables
   OvmfPkg/AmdSev: Don't overwrite MH stack
   OvmfPkg/AmdSev: MH page encryption POC

  OvmfPkg/OvmfPkg.dec                           |  11 +
  OvmfPkg/AmdSev/AmdSevX64.dsc                  |   2 +
  OvmfPkg/AmdSev/AmdSevX64.fdf                  |  13 +-
  .../ConfidentialMigrationDxe.inf              |  45 +++
  .../ConfidentialMigrationPei.inf              |  35 ++
  .../DxeMemEncryptSevLib.inf                   |   1 +
  .../PeiMemEncryptSevLib.inf                   |   1 +
  OvmfPkg/PlatformDxe/Platform.inf              |   2 +
  OvmfPkg/PlatformPei/PlatformPei.inf           |   2 +
  UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf |   2 +
  UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf |   2 +
  OvmfPkg/AmdSev/ConfidentialMigration/MpLib.h  | 235 +++++++++++++
  .../ConfidentialMigration/VirtualMemory.h     | 177 ++++++++++
  OvmfPkg/Include/Guid/MemEncryptLib.h          |  16 +
  OvmfPkg/PlatformDxe/PlatformConfig.h          |   5 +
  .../ConfidentialMigrationDxe.c                | 325 ++++++++++++++++++
  .../ConfidentialMigrationPei.c                |  25 ++
  .../X64/PeiDxeVirtualMemory.c                 |  18 +
  OvmfPkg/PlatformDxe/AmdSev.c                  |  99 ++++++
  OvmfPkg/PlatformDxe/Platform.c                |   6 +
  OvmfPkg/PlatformPei/AmdSev.c                  |  10 +
  OvmfPkg/PlatformPei/Platform.c                |  10 +
  .../CpuExceptionHandlerLib/DxeException.c     |   8 +-
  UefiCpuPkg/Library/MpInitLib/DxeMpLib.c       |  21 +-
  UefiCpuPkg/Library/MpInitLib/MpLib.c          |   7 +-
  25 files changed, 1061 insertions(+), 17 deletions(-)
  create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationDxe.inf
  create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationPei.inf
  create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/MpLib.h
  create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/VirtualMemory.h
  create mode 100644 OvmfPkg/Include/Guid/MemEncryptLib.h
  create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationDxe.c
  create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationPei.c
  create mode 100644 OvmfPkg/PlatformDxe/AmdSev.c


Laszlo Ersek
 

On 03/04/21 10:21, Paolo Bonzini wrote:
Hi Tobin,

as mentioned in the reply to the QEMU patches posted by Tobin, I
think the firmware helper approach is very good, but there are some
disadvantages in the idea of auxiliary vCPUs. These are especially
true in the VMM, where it's much nicer to have a separate VM that
goes through a specialized run loop; however, even in the firmware
level there are some complications (as you pointed out) in letting
MpService workers run after ExitBootServices.

My idea would be that the firmware would start the VM as usual using
the same launch data; then, the firmware would detect it was running
as a migration helper VM during the SEC or PEI phases (for example
via the GHCB or some other unencrypted communication area), and
divert execution to the migration helper instead of proceeding to the
next boot phase. This would be somewhat similar in spirit to how edk2
performs S3 resume, if my memory serves correctly.
Very cool. You'd basically warm-reboot the virtual machine into a new
boot mode (cf. BOOT_WITH_FULL_CONFIGURATION vs. BOOT_ON_S3_RESUME in
OvmfPkg/PlatformPei).

To me that's much more attractive than a "background job".

The S3 parallel is great. What I'm missing is:

- Is it possible to warm-reboot an SEV VM? (I vaguely recall that it's
not possible for SEV-ES at least.) Because, that's how we'd transfer
control to the early parts of the firmware again, IIUC your idea, while
preserving the memory contents.

- Who would initiate this process? S3 suspend is guest-initiated. (Not
that we couldn't use the guest agent, if needed.)

(In case the idea is really about a separate VM, and not about rebooting
the already running VM, then I don't understand -- how would a separate
VM access the guest RAM that needs to be migrated?)

NB in the X64 PEI phase of OVMF, only the first 4GB of RAM is mapped, so
the migration handler would have to build its own page table under this
approach too.

Thanks!
Laszlo


Laszlo Ersek
 

On 03/04/21 21:45, Laszlo Ersek wrote:
On 03/04/21 10:21, Paolo Bonzini wrote:
Hi Tobin,

as mentioned in the reply to the QEMU patches posted by Tobin, I
think the firmware helper approach is very good, but there are some
disadvantages in the idea of auxiliary vCPUs. These are especially
true in the VMM, where it's much nicer to have a separate VM that
goes through a specialized run loop; however, even in the firmware
level there are some complications (as you pointed out) in letting
MpService workers run after ExitBootServices.

My idea would be that the firmware would start the VM as usual using
the same launch data; then, the firmware would detect it was running
as a migration helper VM during the SEC or PEI phases (for example
via the GHCB or some other unencrypted communication area), and
divert execution to the migration helper instead of proceeding to the
next boot phase. This would be somewhat similar in spirit to how edk2
performs S3 resume, if my memory serves correctly.
Very cool. You'd basically warm-reboot the virtual machine into a new
boot mode (cf. BOOT_WITH_FULL_CONFIGURATION vs. BOOT_ON_S3_RESUME in
OvmfPkg/PlatformPei).

To me that's much more attractive than a "background job".

The S3 parallel is great. What I'm missing is:

- Is it possible to warm-reboot an SEV VM? (I vaguely recall that it's
not possible for SEV-ES at least.) Because, that's how we'd transfer
control to the early parts of the firmware again, IIUC your idea, while
preserving the memory contents.

- Who would initiate this process? S3 suspend is guest-initiated. (Not
that we couldn't use the guest agent, if needed.)

(In case the idea is really about a separate VM, and not about rebooting
the already running VM, then I don't understand -- how would a separate
VM access the guest RAM that needs to be migrated?)
Sorry -- I've just caught up with the QEMU thread. Your message there:

https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01220.html

says:

Patches were posted recently to the KVM mailing list to create
secondary VMs sharing the encryption context (ASID) with a primary VM

I did think of VMs sharing memory, but the goal of SEV seemed to be to
prevent exactly that, so I didn't think that was possible. I stand
corrected, and yes, this way I understand -- and welcome -- a completely
separate VM snooping the migration subject VM's memory.

My question would be then whether the migration helper VM would run on
its own memory, and just read out the other VM's memory -- or the MH VM
would run somewhere inside the original VM's memory (which sounds a lot
riskier). But your message explains that too:

The main advantage would be that the migration VM would not have to
share the address space with the primary VM

This sounds ideal; it should allow for a completely independent firmware
platform -- we wouldn't even have to call it "OVMF", and it might not
even have to contain the DXE Core and later-phase components. (Of course
if it's more convenient to keep the stuff in OVMF, that works too.)

(For some unsolicited personal information, now I feel less bad about
this idea never occurring to me -- I never knew about the KVM patch set
that would enable encryption context sharing. (TBH I thought that was
prevented, by design, in the SEV hardware...))


A workflow request to Tobin and Dov -- when posting closely interfacing
QEMU and edk2 series, it's best to cross-post both series to both lists,
and to CC everybody on everything. Feel free to use subject prefixes
like [qemu PATCH] and [edk2 PATCH] for clarity. It's been difficult for
me to follow both discussions (it doesn't help that I've been CC'd on
neither).

Thanks!
Laszlo


NB in the X64 PEI phase of OVMF, only the first 4GB of RAM is mapped, so
the migration handler would have to build its own page table under this
approach too.

Thanks!
Laszlo


Paolo Bonzini
 

On 04/03/21 21:45, Laszlo Ersek wrote:
On 03/04/21 10:21, Paolo Bonzini wrote:
Hi Tobin,

as mentioned in the reply to the QEMU patches posted by Tobin, I
think the firmware helper approach is very good, but there are some
disadvantages in the idea of auxiliary vCPUs. These are especially
true in the VMM, where it's much nicer to have a separate VM that
goes through a specialized run loop; however, even in the firmware
level there are some complications (as you pointed out) in letting
MpService workers run after ExitBootServices.

My idea would be that the firmware would start the VM as usual using
the same launch data; then, the firmware would detect it was running
as a migration helper VM during the SEC or PEI phases (for example
via the GHCB or some other unencrypted communication area), and
divert execution to the migration helper instead of proceeding to the
next boot phase. This would be somewhat similar in spirit to how edk2
performs S3 resume, if my memory serves correctly.
Very cool. You'd basically warm-reboot the virtual machine into a new
boot mode (cf. BOOT_WITH_FULL_CONFIGURATION vs. BOOT_ON_S3_RESUME in
OvmfPkg/PlatformPei).
To me that's much more attractive than a "background job".
The S3 parallel is great. What I'm missing is:
- Is it possible to warm-reboot an SEV VM? (I vaguely recall that it's
not possible for SEV-ES at least.) Because, that's how we'd transfer
control to the early parts of the firmware again, IIUC your idea, while
preserving the memory contents.
It's not exactly a warm reboot. It's two VMs booted at the same time, with exactly the same contents as far as encrypted RAM goes, but different unencrypted RAM. The difference makes one VM boot regularly and the other end up in the migration helper. The migration helper can be entirely contained in PEI, or it can even be its own OS, stored as a flat binary in the firmware. Whatever is easier.

The divergence would happen much earlier than S3 though. It would have to happen before the APs are brought up, for example, and essentially before the first fw_cfg access if (as is likely) the migration helper VM does not have fw_cfg at all. That's why I brought up the possibility of diverging as soon as SEC.

- Who would initiate this process? S3 suspend is guest-initiated. (Not
that we couldn't use the guest agent, if needed.)
(In case the idea is really about a separate VM, and not about rebooting
the already running VM, then I don't understand -- how would a separate
VM access the guest RAM that needs to be migrated?)
Answering the other message:

(For some unsolicited personal information, now I feel less bad about
this idea never occurring to me -- I never knew about the KVM patch set
that would enable encryption context sharing. (TBH I thought that was
prevented, by design, in the SEV hardware...))
As far as the SEV hardware is concerned, a "VM" is defined by the ASID.

The VM would be separate at the KVM level, but it would share the ASID (and thus the guest RAM) with the primary VM. So as far as the SEV hardware and the processor are concerned, the separate VM would be just one more VMCB that runs with that ASID. Only KVM knows that they are backed by different file descriptors etc.

In fact, another advantage is that it would be much easier to scale the migration helper to multiple vCPUs. This is probably also a case for diverging much earlier than PEI, because a multi-processor migration helper running in PEI or DXE would require ACPI tables and a lot of infrastructure that is probably undesirable.

Paolo


Ashish Kalra
 

On Wed, Mar 03, 2021 at 01:25:40PM -0500, Tobin Feldman-Fitzthum wrote:

Hi Tobin,

On 03/02/21 21:48, Tobin Feldman-Fitzthum wrote:
This is a demonstration of fast migration for encrypted virtual machines
using a Migration Handler that lives in OVMF. This demo uses AMD SEV,
but the ideas may generalize to other confidential computing platforms.
With AMD SEV, guest memory is encrypted and the hypervisor cannot access
or move it. This makes migration tricky. In this demo, we show how the
HV can ask a Migration Handler (MH) in the firmware for an encrypted
page. The MH encrypts the page with a transport key prior to releasing
it to the HV. The target machine also runs an MH that decrypts the page
once it is passed in by the target HV. These patches are not ready for
production, but they are a full end-to-end solution that facilitates a
fast live migration between two SEV VMs.

Corresponding patches for QEMU have been posted by my colleague Dov Murik
on qemu-devel. Our approach needs little kernel support, requiring only
one hypercall that the guest can use to mark a page as encrypted or
shared. This series includes updated patches from Ashish Kalra and
Brijesh Singh that allow OVMF to use this hypercall.

The MH runs continuously in the guest, waiting for communication from
the HV. The HV starts an additional vCPU for the MH but does not expose
it to the guest OS via ACPI. We use the MpService to start the MH. The
MpService is only available at runtime and processes that are started by
it are usually cleaned up on ExitBootServices. Since we need the MH to
run continuously, we had to make some modifications. Ideally a feature
could be added to the MpService to allow for the starting of
long-running processes. Besides migration, this could support other
background processes that need to operate within the encryption
boundary. For now, we have included a handful of patches that modify the
MpService to allow the MH to keep running after ExitBootServices. These
are temporary.
I plan to do a lightweight review for this series. (My understanding is
that it's an RFC and not actually being proposed for merging.)

Regarding the MH's availability at runtime -- does that necessarily
require the isolation of an AP? Because in the current approach,
allowing the MP Services to survive into OS runtime (in some form or
another) seems critical, and I don't think it's going to fly.

I agree that the UefiCpuPkg patches have been well separated from the
rest of the series, but I'm somewhat doubtful the "firmware-initiated
background process" idea will be accepted. Have you investigated
exposing a new "runtime service" (a function pointer) via the UEFI
Configuration table, and calling that (perhaps periodically?) from the
guest kernel? It would be a form of polling I guess. Or maybe, poll the
mailbox directly in the kernel, and call the new firmware runtime
service when there's an actual command to process.
Continuous runtime availability for the MH is almost certainly the most
controversial part of this proposal, which is why I put it in the cover
letter and why it's good to discuss.
(You do spell out "little kernel support", and I'm not sure if that's a
technical benefit, or a political / community benefit.)
As you allude to, minimal kernel support is really one of the main things
that shapes our approach. This is partly a political and practical benefit,
but there are also technical benefits. Having the MH in firmware likely
leads to higher availability. It can be accessed when the OS is unreachable,
perhaps during boot or when the OS is hung. There are also potential
portability advantages although we do currently require support for one
hypercall. The cost of implementing this hypercall is low.

Generally speaking, our task is to find a home for functionality that was
traditionally provided by the hypervisor but that needs to live inside the
trust domain, even though it isn't really part of the guest. A meta-goal of this
project is to figure out the best way to do this.


I'm quite uncomfortable with an attempt to hide a CPU from the OS via
ACPI. The OS has other ways to learn (for example, a boot loader could
use the MP services itself, stash the information, and hand it to the OS
kernel -- this would minimally allow for detecting an inconsistency in
the OS). What about "all-but-self" IPIs too -- the kernel might think
all the processors it's poking like that were under its control.
This might be the second most controversial piece. Here's a question: if we
could successfully hide the MH vCPU from the OS, would it still make you
uncomfortable? In other words, is the worry that there might be some
inconsistency or more generally that there is something hidden from the OS?
One thing to think about is that the guest owner should generally be aware
that there is a migration handler running. The way I see it, a guest owner
of an SEV VM would need to opt-in to migration and should then expect that
there is an MH running even if they aren't able to see it. Of course we need
to be certain that the MH isn't going to break the OS.

Also, as far as I can tell from patch #7, the AP seems to be
busy-looping (with a CpuPause() added in), for the entire lifetime of
the OS. Do I understand right? If so -- is it a temporary trait as well?
In our approach the MH continuously checks for commands from the hypervisor.
There are potentially ways to optimize this, such as having the hypervisor
de-schedule the MH vCPU while not migrating. You could potentially shut
down the MH on the target after receiving the MH_RESET command (when the
migration finishes), but what if you want to migrate that VM somewhere else?
I think another approach can be considered here: why not implement the MH
vCPU(s) as hot-plugged vCPU(s)? Basically, hot-plug a new vCPU when migration
starts and hot-unplug it when migration completes. Then we won't need a vCPU
running forever (and potentially consuming cycles) busy-looping with
CpuPause().

Thanks,
Ashish


Ashish Kalra
 

On Fri, Mar 05, 2021 at 10:44:23AM +0000, Ashish Kalra wrote:
On Wed, Mar 03, 2021 at 01:25:40PM -0500, Tobin Feldman-Fitzthum wrote:

Hi Tobin,

On 03/02/21 21:48, Tobin Feldman-Fitzthum wrote:
This is a demonstration of fast migration for encrypted virtual machines
using a Migration Handler that lives in OVMF. This demo uses AMD SEV,
but the ideas may generalize to other confidential computing platforms.
With AMD SEV, guest memory is encrypted and the hypervisor cannot access
or move it. This makes migration tricky. In this demo, we show how the
HV can ask a Migration Handler (MH) in the firmware for an encrypted
page. The MH encrypts the page with a transport key prior to releasing
it to the HV. The target machine also runs an MH that decrypts the page
once it is passed in by the target HV. These patches are not ready for
production, but they are a full end-to-end solution that facilitates a
fast live migration between two SEV VMs.

Corresponding patches for QEMU have been posted by my colleague Dov Murik
on qemu-devel. Our approach needs little kernel support, requiring only
one hypercall that the guest can use to mark a page as encrypted or
shared. This series includes updated patches from Ashish Kalra and
Brijesh Singh that allow OVMF to use this hypercall.

The MH runs continuously in the guest, waiting for communication from
the HV. The HV starts an additional vCPU for the MH but does not expose
it to the guest OS via ACPI. We use the MpService to start the MH. The
MpService is only available at runtime and processes that are started by
it are usually cleaned up on ExitBootServices. Since we need the MH to
run continuously, we had to make some modifications. Ideally a feature
could be added to the MpService to allow for the starting of
long-running processes. Besides migration, this could support other
background processes that need to operate within the encryption
boundary. For now, we have included a handful of patches that modify the
MpService to allow the MH to keep running after ExitBootServices. These
are temporary.
I plan to do a lightweight review for this series. (My understanding is
that it's an RFC and not actually being proposed for merging.)

Regarding the MH's availability at runtime -- does that necessarily
require the isolation of an AP? Because in the current approach,
allowing the MP Services to survive into OS runtime (in some form or
another) seems critical, and I don't think it's going to fly.

I agree that the UefiCpuPkg patches have been well separated from the
rest of the series, but I'm somewhat doubtful the "firmware-initiated
background process" idea will be accepted. Have you investigated
exposing a new "runtime service" (a function pointer) via the UEFI
Configuration table, and calling that (perhaps periodically?) from the
guest kernel? It would be a form of polling I guess. Or maybe, poll the
mailbox directly in the kernel, and call the new firmware runtime
service when there's an actual command to process.
Continuous runtime availability for the MH is almost certainly the most
controversial part of this proposal, which is why I put it in the cover
letter and why it's good to discuss.
(You do spell out "little kernel support", and I'm not sure if that's a
technical benefit, or a political / community benefit.)
As you allude to, minimal kernel support is really one of the main things
that shapes our approach. This is partly a political and practical benefit,
but there are also technical benefits. Having the MH in firmware likely
leads to higher availability. It can be accessed when the OS is unreachable,
perhaps during boot or when the OS is hung. There are also potential
portability advantages although we do currently require support for one
hypercall. The cost of implementing this hypercall is low.

Generally speaking, our task is to find a home for functionality that was
traditionally provided by the hypervisor but that needs to live inside the
trust domain, even though it isn't really part of the guest. A meta-goal of this
project is to figure out the best way to do this.


I'm quite uncomfortable with an attempt to hide a CPU from the OS via
ACPI. The OS has other ways to learn (for example, a boot loader could
use the MP services itself, stash the information, and hand it to the OS
kernel -- this would minimally allow for detecting an inconsistency in
the OS). What about "all-but-self" IPIs too -- the kernel might think
all the processors it's poking like that were under its control.
This might be the second most controversial piece. Here's a question: if we
could successfully hide the MH vCPU from the OS, would it still make you
uncomfortable? In other words, is the worry that there might be some
inconsistency or more generally that there is something hidden from the OS?
One thing to think about is that the guest owner should generally be aware
that there is a migration handler running. The way I see it, a guest owner
of an SEV VM would need to opt-in to migration and should then expect that
there is an MH running even if they aren't able to see it. Of course we need
to be certain that the MH isn't going to break the OS.

Also, as far as I can tell from patch #7, the AP seems to be
busy-looping (with a CpuPause() added in), for the entire lifetime of
the OS. Do I understand right? If so -- is it a temporary trait as well?
In our approach the MH continuously checks for commands from the hypervisor.
There are potentially ways to optimize this, such as having the hypervisor
de-schedule the MH vCPU while not migrating. You could potentially shut
down the MH on the target after receiving the MH_RESET command (when the
migration finishes), but what if you want to migrate that VM somewhere else?
I think another approach can be considered here: why not implement the MH
vCPU(s) as hot-plugged vCPU(s)? Basically, hot-plug a new vCPU when migration
starts and hot-unplug it when migration completes. Then we won't need a vCPU
running forever (and potentially consuming cycles) busy-looping with
CpuPause().
After internal discussions, we realized that this approach will not work,
as vCPU hotplug is not possible for SEV-ES and SEV-SNP: the VMSA has to be
encrypted as part of the LAUNCH command, so we can't create/add a new vCPU
after LAUNCH has completed.

Thanks,
Ashish


Tobin Feldman-Fitzthum
 

On Fri, Mar 05, 2021 at 10:44:23AM +0000, Ashish Kalra wrote:
On Wed, Mar 03, 2021 at 01:25:40PM -0500, Tobin Feldman-Fitzthum wrote:
Hi Tobin,

On 03/02/21 21:48, Tobin Feldman-Fitzthum wrote:
This is a demonstration of fast migration for encrypted virtual machines
using a Migration Handler that lives in OVMF. This demo uses AMD SEV,
but the ideas may generalize to other confidential computing platforms.
With AMD SEV, guest memory is encrypted and the hypervisor cannot access
or move it. This makes migration tricky. In this demo, we show how the
HV can ask a Migration Handler (MH) in the firmware for an encrypted
page. The MH encrypts the page with a transport key prior to releasing
it to the HV. The target machine also runs an MH that decrypts the page
once it is passed in by the target HV. These patches are not ready for
production, but they are a full end-to-end solution that facilitates a
fast live migration between two SEV VMs.

Corresponding patches for QEMU have been posted by my colleague Dov Murik
on qemu-devel. Our approach needs little kernel support, requiring only
one hypercall that the guest can use to mark a page as encrypted or
shared. This series includes updated patches from Ashish Kalra and
Brijesh Singh that allow OVMF to use this hypercall.

The MH runs continuously in the guest, waiting for communication from
the HV. The HV starts an additional vCPU for the MH but does not expose
it to the guest OS via ACPI. We use the MpService to start the MH. The
MpService is only available at runtime and processes that are started by
it are usually cleaned up on ExitBootServices. Since we need the MH to
run continuously, we had to make some modifications. Ideally a feature
could be added to the MpService to allow for the starting of
long-running processes. Besides migration, this could support other
background processes that need to operate within the encryption
boundary. For now, we have included a handful of patches that modify the
MpService to allow the MH to keep running after ExitBootServices. These
are temporary.
I plan to do a lightweight review for this series. (My understanding is
that it's an RFC and not actually being proposed for merging.)

Regarding the MH's availability at runtime -- does that necessarily
require the isolation of an AP? Because in the current approach,
allowing the MP Services to survive into OS runtime (in some form or
another) seems critical, and I don't think it's going to fly.

I agree that the UefiCpuPkg patches have been well separated from the
rest of the series, but I'm somewhat doubtful the "firmware-initiated
background process" idea will be accepted. Have you investigated
exposing a new "runtime service" (a function pointer) via the UEFI
Configuration table, and calling that (perhaps periodically?) from the
guest kernel? It would be a form of polling I guess. Or maybe, poll the
mailbox directly in the kernel, and call the new firmware runtime
service when there's an actual command to process.
Continuous runtime availability for the MH is almost certainly the most
controversial part of this proposal, which is why I put it in the cover
letter and why it's good to discuss.
(You do spell out "little kernel support", and I'm not sure if that's a
technical benefit, or a political / community benefit.)
As you allude to, minimal kernel support is really one of the main things
that shapes our approach. This is partly a political and practical benefit,
but there are also technical benefits. Having the MH in firmware likely
leads to higher availability. It can be accessed when the OS is unreachable,
perhaps during boot or when the OS is hung. There are also potential
portability advantages although we do currently require support for one
hypercall. The cost of implementing this hypercall is low.

Generally speaking, our task is to find a home for functionality that was
traditionally provided by the hypervisor but that needs to live inside the
trust domain, even though it isn't really part of the guest. A meta-goal of this
project is to figure out the best way to do this.

I'm quite uncomfortable with an attempt to hide a CPU from the OS via
ACPI. The OS has other ways to learn (for example, a boot loader could
use the MP services itself, stash the information, and hand it to the OS
kernel -- this would minimally allow for detecting an inconsistency in
the OS). What about "all-but-self" IPIs too -- the kernel might think
all the processors it's poking like that were under its control.
This might be the second most controversial piece. Here's a question: if we
could successfully hide the MH vCPU from the OS, would it still make you
uncomfortable? In other words, is the worry that there might be some
inconsistency or more generally that there is something hidden from the OS?
One thing to think about is that the guest owner should generally be aware
that there is a migration handler running. The way I see it, a guest owner
of an SEV VM would need to opt-in to migration and should then expect that
there is an MH running even if they aren't able to see it. Of course we need
to be certain that the MH isn't going to break the OS.

Also, as far as I can tell from patch #7, the AP seems to be
busy-looping (with a CpuPause() added in), for the entire lifetime of
the OS. Do I understand right? If so -- is it a temporary trait as well?
In our approach the MH continuously checks for commands from the hypervisor.
There are potentially ways to optimize this, such as having the hypervisor
de-schedule the MH vCPU while not migrating. You could potentially shut
down the MH on the target after receiving the MH_RESET command (when the
migration finishes), but what if you want to migrate that VM somewhere else?
I think another approach can be considered here: why not implement the MH
vCPU(s) as hot-plugged vCPU(s)? Basically, hot-plug a new vCPU when migration
starts and hot-unplug it when migration completes. Then we won't need a vCPU
running forever (and potentially consuming cycles) busy-looping with
CpuPause().
After internal discussions, we realized that this approach will not work,
as vCPU hotplug is not possible for SEV-ES and SEV-SNP: the VMSA has to be
encrypted as part of the LAUNCH command, so we can't create/add a new vCPU
after LAUNCH has completed.

Thanks,
Ashish
Hm yeah we talked about hotplug a bit. It was never clear how it would square with OVMF.

-Tobin


Yao, Jiewen
 

Hi
We discussed the patch internally. We do see PROs and CONs with this approach.
The advantage is that it is very simple. In-VM migration can save a lot of effort on security context restore.
On the other hand, we do not feel comfortable reserving a dedicated CPU to achieve that, similar to the feedback from the community.

Hot-plug is not a solution for Intel TDX either; it is unsupported for now.

I like the idea of diverging the migration boot mode from the normal boot mode in the SEC phase.
We must handle this migration boot mode very carefully, to avoid touching system memory.
Intel TDX Virtual Firmware skips the PEI phase entirely. If we choose this approach, SEC-based migration is our preference.

Besides this patch, we would like to understand the full picture.
1) How is the key passed from the source VM to the destination?
I saw you mention: "Key sharing is out of scope for this part of the RFC."
"This will probably be implemented via inject-launch-secret in the future"

Does that mean the two PSPs will sync with each other and negotiate the key after the Migration Agent (MA) checks the policy?

2) How is attestation supported?
I read the whitepaper https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf.
It seems SEV and SEV-ES only support attestation during launch, so I don't believe this migration feature will impact the attestation report. Am I right?
SEV-SNP supports more flexible attestation; does it include any information about the newly migrated content?

-----Original Message-----
From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao, Jiewen
Sent: Thursday, March 4, 2021 9:49 AM
To: devel@edk2.groups.io; tobin@linux.ibm.com
Cc: Dov Murik <dovmurik@linux.vnet.ibm.com>; Tobin Feldman-Fitzthum
<tobin@ibm.com>; James Bottomley <jejb@linux.ibm.com>; Hubertus Franke
<frankeh@us.ibm.com>; Brijesh Singh <brijesh.singh@amd.com>; Ashish Kalra
<ashish.kalra@amd.com>; Jon Grimm <jon.grimm@amd.com>; Tom Lendacky
<thomas.lendacky@amd.com>; Yao, Jiewen <jiewen.yao@intel.com>
Subject: Re: [edk2-devel] [RFC PATCH 00/14] Firmware Support for Fast Live
Migration for AMD SEV

Hi Tobin
Thanks for your patch.
You may know that Intel is working on TDX for the same live migration feature.

Please give me some time (about 1 work week) to digest and evaluate the patch
and impact.
Then I will provide feedback.

Thank you
Yao Jiewen

-----Original Message-----
From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Tobin
Feldman-Fitzthum
Sent: Wednesday, March 3, 2021 4:48 AM
To: devel@edk2.groups.io
Cc: Dov Murik <dovmurik@linux.vnet.ibm.com>; Tobin Feldman-Fitzthum
<tobin@ibm.com>; Tobin Feldman-Fitzthum <tobin@linux.ibm.com>; James
Bottomley <jejb@linux.ibm.com>; Hubertus Franke <frankeh@us.ibm.com>;
Brijesh Singh <brijesh.singh@amd.com>; Ashish Kalra
<ashish.kalra@amd.com>;
Jon Grimm <jon.grimm@amd.com>; Tom Lendacky
<thomas.lendacky@amd.com>
Subject: [edk2-devel] [RFC PATCH 00/14] Firmware Support for Fast Live
Migration for AMD SEV

This is a demonstration of fast migration for encrypted virtual machines
using a Migration Handler that lives in OVMF. This demo uses AMD SEV,
but the ideas may generalize to other confidential computing platforms.
With AMD SEV, guest memory is encrypted and the hypervisor cannot access
or move it. This makes migration tricky. In this demo, we show how the
HV can ask a Migration Handler (MH) in the firmware for an encrypted
page. The MH encrypts the page with a transport key prior to releasing
it to the HV. The target machine also runs an MH that decrypts the page
once it is passed in by the target HV. These patches are not ready for
production, but they are a full end-to-end solution that facilitates
fast live migration between two SEV VMs.

Corresponding patches for QEMU have been posted by my colleague Dov Murik
on qemu-devel. Our approach needs little kernel support, requiring only
one hypercall that the guest can use to mark a page as encrypted or
shared. This series includes updated patches from Ashish Kalra and
Brijesh Singh that allow OVMF to use this hypercall.

The MH runs continuously in the guest, waiting for communication from
the HV. The HV starts an additional vCPU for the MH but does not expose
it to the guest OS via ACPI. We use the MpService to start the MH. The
MpService is only available at boot time, and processes that are started
by it are usually cleaned up on ExitBootServices. Since we need the MH to
run continuously, we had to make some modifications. Ideally a feature
could be added to the MpService to allow for the starting of
long-running processes. Besides migration, this could support other
background processes that need to operate within the encryption
boundary. For now, we have included a handful of patches that modify the
MpService to allow the MH to keep running after ExitBootServices. These
are temporary.

Ashish Kalra (2):
OvmfPkg/PlatformPei: Mark SEC GHCB page in the page encrpytion bitmap.
OvmfPkg/PlatformDxe: Add support for SEV live migration.

Brijesh Singh (1):
OvmfPkg/BaseMemEncryptLib: Support to issue unencrypted hypercall

Dov Murik (1):
OvmfPkg/AmdSev: Build page table for migration handler

Tobin Feldman-Fitzthum (10):
OvmfPkg/AmdSev: Base for Confidential Migration Handler
OvmfPkg/PlatfomPei: Set Confidential Migration PCD
OvmfPkg/AmdSev: Setup Migration Handler Mailbox
OvmfPkg/AmdSev: MH support for mailbox protocol
UefiCpuPkg/MpInitLib: temp removal of MpLib cleanup
UefiCpuPkg/MpInitLib: Allocate MP buffer as runtime memory
UefiCpuPkg/CpuExceptionHandlerLib: Exception handling as runtime
memory
OvmfPkg/AmdSev: Don't overwrite mailbox or pagetables
OvmfPkg/AmdSev: Don't overwrite MH stack
OvmfPkg/AmdSev: MH page encryption POC

OvmfPkg/OvmfPkg.dec | 11 +
OvmfPkg/AmdSev/AmdSevX64.dsc | 2 +
OvmfPkg/AmdSev/AmdSevX64.fdf | 13 +-
.../ConfidentialMigrationDxe.inf | 45 +++
.../ConfidentialMigrationPei.inf | 35 ++
.../DxeMemEncryptSevLib.inf | 1 +
.../PeiMemEncryptSevLib.inf | 1 +
OvmfPkg/PlatformDxe/Platform.inf | 2 +
OvmfPkg/PlatformPei/PlatformPei.inf | 2 +
UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf | 2 +
UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf | 2 +
OvmfPkg/AmdSev/ConfidentialMigration/MpLib.h | 235 +++++++++++++
.../ConfidentialMigration/VirtualMemory.h | 177 ++++++++++
OvmfPkg/Include/Guid/MemEncryptLib.h | 16 +
OvmfPkg/PlatformDxe/PlatformConfig.h | 5 +
.../ConfidentialMigrationDxe.c | 325 ++++++++++++++++++
.../ConfidentialMigrationPei.c | 25 ++
.../X64/PeiDxeVirtualMemory.c | 18 +
OvmfPkg/PlatformDxe/AmdSev.c | 99 ++++++
OvmfPkg/PlatformDxe/Platform.c | 6 +
OvmfPkg/PlatformPei/AmdSev.c | 10 +
OvmfPkg/PlatformPei/Platform.c | 10 +
.../CpuExceptionHandlerLib/DxeException.c | 8 +-
UefiCpuPkg/Library/MpInitLib/DxeMpLib.c | 21 +-
UefiCpuPkg/Library/MpInitLib/MpLib.c | 7 +-
25 files changed, 1061 insertions(+), 17 deletions(-)
create mode 100644
OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationDxe.inf
create mode 100644
OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationPei.inf
create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/MpLib.h
create mode 100644
OvmfPkg/AmdSev/ConfidentialMigration/VirtualMemory.h
create mode 100644 OvmfPkg/Include/Guid/MemEncryptLib.h
create mode 100644
OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationDxe.c
create mode 100644
OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationPei.c
create mode 100644 OvmfPkg/PlatformDxe/AmdSev.c

--
2.20.1








Tobin Feldman-Fitzthum
 

On 3/12/21 9:32 PM, Yao, Jiewen wrote:


Besides this patch, we would like to understand the full picture.
1) How is the key passed from the source VM to the destination?
I saw you mention: "Key sharing is out of scope for this part of the RFC."
"This will probably be implemented via inject-launch-secret in the future"

Does that mean the two PSPs will sync with each other and negotiate the key after the Migration Agent (MA) checks the policy?
The source and destination migration handlers will need to share a key. If we only relied on the PSP for migration, we could use the existing secure channel between the PSP and the guest owner to transfer the pages. Unfortunately the throughput of this approach is far too low. Thus, we have some migration handler running on a guest vCPU with a transport key shared between the source and the target.

The main mechanism for getting a key to the migration handler is inject-launch-secret. Here the guest owner can provide a secret to the PSP via a secure channel and the PSP will inject it at some guest physical address. You use inject-launch-secret after the launch measurement of the guest has been generated to inject the secret conditionally. One approach would be to inject the transport key directly in the source and the target. This is pretty simple, but might have a few drawbacks. The injection has to happen at boot, meaning that the source machine would have to be provisioned with a transport key before a migration happens and that all migrations from that machine would have to use the same transport key. One way around this would be to inject asymmetric keys and use them to derive the transport key.

Another approach entirely is to use the PSP to migrate just a few pages, which might include a secret set by the source MH that the target MH could use to decrypt incoming pages. Using the PSP to migrate pages requires some extra kernel support.

For the RFC, we just assume that there is some shared key. We have talked some about the various options internally.

2) How is attestation supported?
I read the whitepaper https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf.
It seems SEV and SEV-ES only support attestation during launch, so I don't believe this migration feature will impact the attestation report. Am I right?
SEV-SNP supports more flexible attestation; does it include any information about the newly migrated content?
Brijesh already addressed most of this. In our approach the MH is baked into the firmware, which can be attested prior to injecting the key. In other words there aren't any additional steps to attest the MH and it does not change the functionality of any existing attestation mechanisms.

-Tobin









Singh, Brijesh <brijesh.singh@...>
 


Hi Yao,

In the current proposal the accelerated migration does not involve the PSP. I will let Tobin and Dov comment on how things work in the current prototype.

If the PSP were involved in the migration, the flow would look like this:

- During guest creation, two things happen (both the source and destination VMs go through this step):
a) create a random VM encryption key (VEK) -- the key is used for encrypting the guest pages.
b) the guest owner supplies a session blob to the PSP. The session blob contains the transport encryption key (TEK). The TEK is used to encrypt all confidential information exchanged between the PSP and external entities such as a guest owner or another PSP.

During the migration:
i) the source VMM asks the PSP for a page that can be migrated.
ii) the source PSP encrypts the guest page using the TEK.
iii) the source VMM writes the encrypted page on the wire.
iv) the destination VMM calls the PSP to put the received encrypted page into guest memory.
v) the destination PSP decrypts the received page using the TEK, then encrypts it using the VEK before copying it into guest memory.

As you can see in the flow, the PSPs never share the keys. The TEK is wrapped in the session blob provided to the PSP at launch.

You are correct that SEV/SEV-ES do not support querying the attestation report after guest boot. All attestation needs to be done at guest creation time.

With SEV-SNP, a guest OS/BIOS can call the PSP to get the attestation report. SEV-SNP provides a method in which the guest owner can provide an IMI (Initial Migration Agent) through the launch process. The IMI is measured separately and stored in the IMD (Initial Migration Digest). When the source VMM is ready to migrate, it uses a PSP command (VM_EXPORT) to export the data from source to destination. The exported data contains information about the IMD, etc. The destination VMM uses the PSP command (ABSORB) to import the incoming data. During the absorb process the destination PSP checks the IMD to ensure that the same IMI was used on the source end. I have cut a few details short in this email; see the SEV-SNP spec (section 4.11, Migration) for more.

Thanks
Brijesh








Yao, Jiewen
 

Thank you very much Tobin and Brijesh.

Yes, I agree that there are multiple ways to pass the transport key from source to destination.
I will wait for your final solution.

-----Original Message-----
From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Tobin
Feldman-Fitzthum
Sent: Wednesday, March 17, 2021 1:47 AM
To: Yao, Jiewen <jiewen.yao@intel.com>; devel@edk2.groups.io
Cc: Dov Murik <dovmurik@linux.vnet.ibm.com>; Tobin Feldman-Fitzthum
<tobin@ibm.com>; James Bottomley <jejb@linux.ibm.com>; Hubertus Franke
<frankeh@us.ibm.com>; Brijesh Singh <brijesh.singh@amd.com>; Ashish Kalra
<ashish.kalra@amd.com>; Jon Grimm <jon.grimm@amd.com>; Tom Lendacky
<thomas.lendacky@amd.com>
Subject: Re: [edk2-devel] [RFC PATCH 00/14] Firmware Support for Fast Live
Migration for AMD SEV

On 3/12/21 9:32 PM, Yao, Jiewen wrote:

Hi
We discuss the patch internally. We do see PROs and CONs with this approach.
The advantage is that it is very simple. In-VM migration can save lots of effort
on security context restore.
On the other hand, we feel not so comfortable to reserve a dedicate CPU to
achieve that. Similar to the feedback in the community.

Using Hot-Plug is not a solution for Intel TDX as well. It is unsupported now.

I like the idea to diverge the migration boot mode v.s. normal boot mode in
SEC phase.
We must be very carefully handle this migration boot mode, to avoid any
touching on system memory.
Intel TDX Virtual Firmware skips the PEI phase directly. If we choose this
approach, SEC-based migration is our preference.

Besides this patch, we would like to understand a full picture.
1) How the key is passed from source VM to destination?
I saw you mentions: "Key sharing is out of scope for this part of the RFC."
"This will probably be implemented via inject-launch-secret in the future"

Does that mean two PSP will sync with each other and negotiate the key, after
the Migration Agent (MA) checks the policy?

The source and destination migration handlers will need to share a key.
If we only relied on the PSP for migration, we could use the existing
secure channel between the PSP and the guest owner to transfer the
pages. Unfortunately the throughput of this approach is far too low.
Thus, we have some migration handler running on a guest vCPU with a
transport key shared between the source and the target.

The main mechanism for getting a key to the migration handler is
inject-launch-secret. Here the guest owner can provide a secret to the
PSP via a secure channel and the PSP will inject it at some guest
physical address. You use inject-launch-secret after the launch
measurement of the guest has been generated to inject the secret
conditionally. One approach would be to inject the transport key
directly in the source and the target. This is pretty simple, but might
have a few drawbacks. The injection has to happen at boot, meaning that
the source machine would have to be provisioned with a transport key
before a migration happens and that all migrations from that machine
would have to use the same transport key. One way around this would be
to inject asymmetric keys and use them to derive the transport key.

Another approach entirely is to use the PSP to migrate just a few pages,
which might include a secret set by the source MH that the target MH
could use to decrypt incoming pages. Using the PSP to migrate pages
requires some extra kernel support.

For the RFC, we just assume that there is some shared key. We have
talked some about the various options internally.

2) How the attestation is supported?
I read the whitepaper https://www.amd.com/system/files/TechDocs/SEV-
SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf.
It seems SEV and SEV-ES only support attestation during launch, I don't believe
this migration feature will impact the attestation report. Am I right?
SEV-SNP supports more flexible attestation, does it include any information
about the new migrated content?

Brijesh already addressed most of this. In our approach the MH is baked
into the firmware, which can be attested prior to injecting the key. In
other words there aren't any additional steps to attest the MH and it
does not change the functionality of any existing attestation mechanisms.

-Tobin


-----Original Message-----
From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Yao,
Jiewen
Sent: Thursday, March 4, 2021 9:49 AM
To: devel@edk2.groups.io; tobin@linux.ibm.com
Cc: Dov Murik <dovmurik@linux.vnet.ibm.com>; Tobin Feldman-Fitzthum
<tobin@ibm.com>; James Bottomley <jejb@linux.ibm.com>; Hubertus Franke
<frankeh@us.ibm.com>; Brijesh Singh <brijesh.singh@amd.com>; Ashish
Kalra
<ashish.kalra@amd.com>; Jon Grimm <jon.grimm@amd.com>; Tom
Lendacky
<thomas.lendacky@amd.com>; Yao, Jiewen <jiewen.yao@intel.com>
Subject: Re: [edk2-devel] [RFC PATCH 00/14] Firmware Support for Fast Live
Migration for AMD SEV

Hi Tobin
Thanks for your patch.
You may know that Intel is working on TDX for the same live migration feature.

Please give me some time (about 1 work week) to digest and evaluate the
patch and its impact. Then I will provide feedback.

Thank you
Yao Jiewen

-----Original Message-----
From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Tobin
Feldman-Fitzthum
Sent: Wednesday, March 3, 2021 4:48 AM
To: devel@edk2.groups.io
Cc: Dov Murik <dovmurik@linux.vnet.ibm.com>; Tobin Feldman-Fitzthum
<tobin@ibm.com>; Tobin Feldman-Fitzthum <tobin@linux.ibm.com>; James
Bottomley <jejb@linux.ibm.com>; Hubertus Franke <frankeh@us.ibm.com>;
Brijesh Singh <brijesh.singh@amd.com>; Ashish Kalra
<ashish.kalra@amd.com>;
Jon Grimm <jon.grimm@amd.com>; Tom Lendacky
<thomas.lendacky@amd.com>
Subject: [edk2-devel] [RFC PATCH 00/14] Firmware Support for Fast Live
Migration for AMD SEV

This is a demonstration of fast migration for encrypted virtual machines
using a Migration Handler that lives in OVMF. This demo uses AMD SEV,
but the ideas may generalize to other confidential computing platforms.
With AMD SEV, guest memory is encrypted and the hypervisor cannot
access
or move it. This makes migration tricky. In this demo, we show how the
HV can ask a Migration Handler (MH) in the firmware for an encrypted
page. The MH encrypts the page with a transport key prior to releasing
it to the HV. The target machine also runs an MH that decrypts the page
once it is passed in by the target HV. These patches are not ready for
production, but they are a full end-to-end solution that facilitates a
fast live migration between two SEV VMs.

Corresponding patches for QEMU have been posted by my colleague Dov
Murik
on qemu-devel. Our approach needs little kernel support, requiring only
one hypercall that the guest can use to mark a page as encrypted or
shared. This series includes updated patches from Ashish Kalra and
Brijesh Singh that allow OVMF to use this hypercall.

The MH runs continuously in the guest, waiting for communication from
the HV. The HV starts an additional vCPU for the MH but does not expose
it to the guest OS via ACPI. We use the MpService to start the MH. The
MpService is only available at runtime and processes that are started by
it are usually cleaned up on ExitBootServices. Since we need the MH to
run continuously, we had to make some modifications. Ideally a feature
could be added to the MpService to allow for the starting of
long-running processes. Besides migration, this could support other
background processes that need to operate within the encryption
boundary. For now, we have included a handful of patches that modify the
MpService to allow the MH to keep running after ExitBootServices. These
are temporary.

Ashish Kalra (2):
OvmfPkg/PlatformPei: Mark SEC GHCB page in the page encrpytion bitmap.
OvmfPkg/PlatformDxe: Add support for SEV live migration.

Brijesh Singh (1):
OvmfPkg/BaseMemEncryptLib: Support to issue unencrypted hypercall

Dov Murik (1):
OvmfPkg/AmdSev: Build page table for migration handler

Tobin Feldman-Fitzthum (10):
OvmfPkg/AmdSev: Base for Confidential Migration Handler
OvmfPkg/PlatfomPei: Set Confidential Migration PCD
OvmfPkg/AmdSev: Setup Migration Handler Mailbox
OvmfPkg/AmdSev: MH support for mailbox protocol
UefiCpuPkg/MpInitLib: temp removal of MpLib cleanup
UefiCpuPkg/MpInitLib: Allocate MP buffer as runtime memory
UefiCpuPkg/CpuExceptionHandlerLib: Exception handling as runtime
memory
OvmfPkg/AmdSev: Don't overwrite mailbox or pagetables
OvmfPkg/AmdSev: Don't overwrite MH stack
OvmfPkg/AmdSev: MH page encryption POC

OvmfPkg/OvmfPkg.dec | 11 +
OvmfPkg/AmdSev/AmdSevX64.dsc | 2 +
OvmfPkg/AmdSev/AmdSevX64.fdf | 13 +-
.../ConfidentialMigrationDxe.inf | 45 +++
.../ConfidentialMigrationPei.inf | 35 ++
.../DxeMemEncryptSevLib.inf | 1 +
.../PeiMemEncryptSevLib.inf | 1 +
OvmfPkg/PlatformDxe/Platform.inf | 2 +
OvmfPkg/PlatformPei/PlatformPei.inf | 2 +
UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf | 2 +
UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf | 2 +
OvmfPkg/AmdSev/ConfidentialMigration/MpLib.h | 235 +++++++++++++
.../ConfidentialMigration/VirtualMemory.h | 177 ++++++++++
OvmfPkg/Include/Guid/MemEncryptLib.h | 16 +
OvmfPkg/PlatformDxe/PlatformConfig.h | 5 +
.../ConfidentialMigrationDxe.c | 325 ++++++++++++++++++
.../ConfidentialMigrationPei.c | 25 ++
.../X64/PeiDxeVirtualMemory.c | 18 +
OvmfPkg/PlatformDxe/AmdSev.c | 99 ++++++
OvmfPkg/PlatformDxe/Platform.c | 6 +
OvmfPkg/PlatformPei/AmdSev.c | 10 +
OvmfPkg/PlatformPei/Platform.c | 10 +
.../CpuExceptionHandlerLib/DxeException.c | 8 +-
UefiCpuPkg/Library/MpInitLib/DxeMpLib.c | 21 +-
UefiCpuPkg/Library/MpInitLib/MpLib.c | 7 +-
25 files changed, 1061 insertions(+), 17 deletions(-)
create mode 100644
OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationDxe.inf
create mode 100644
OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationPei.inf
create mode 100644 OvmfPkg/AmdSev/ConfidentialMigration/MpLib.h
create mode 100644
OvmfPkg/AmdSev/ConfidentialMigration/VirtualMemory.h
create mode 100644 OvmfPkg/Include/Guid/MemEncryptLib.h
create mode 100644
OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationDxe.c
create mode 100644
OvmfPkg/AmdSev/ConfidentialMigration/ConfidentialMigrationPei.c
create mode 100644 OvmfPkg/PlatformDxe/AmdSev.c

--
2.20.1