Topics

[RFC PATCH 00/19] Add AMD Secure Nested Paging (SEV-SNP) support


Brijesh Singh
 

BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275

SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based memory protections. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks like data
replay, memory re-mapping and more in order to create an isolated memory
encryption environment.

This series provides the basic building blocks to support booting the SEV-SNP
VMs, it does not cover all the security enhancement introduced by the SEV-SNP
such as interrupt protection.

Many of the integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). Adding a new page to SEV-SNP
VM requires a 2-step process. First, the hypervisor assigns a page to the
guest using the new RMPUPDATE instruction. This transitions the page to
guest-invalid. Second, the guest validates the page using the new PVALIDATE
instruction. The SEV-SNP VMs can use the new "Page State Change Request NAE"
defined in the GHCB specification to ask hypervisor to add or remove page
from the RMP table.

Each page assigned to the SEV-SNP VM can either be validated or unvalidated,
as indicated by the Validated flag in the page's RMP entry. There are two
approaches that can be taken for the page validation: Pre-validation and
Lazy Validation.

Under pre-validation, the pages are validated prior to first use. And under
lazy validation, pages are validated when first accessed. An access to a
unvalidated page results in a #VC exception, at which time the exception
handler may validate the page. Lazy validation requires careful tracking of
the validated pages to avoid validating the same GPA more than once. The
recently introduced "Unaccepted" memory type can be used to communicate the
unvalidated memory ranges to the Guest OS.

At this time we only support the pre-validation. OVMF detects all the available
system RAM in the PEI phase. When SEV-SNP is enabled, the memory is validated
before it is made available to the EDK2 core.

This series does not implements the following SEV-SNP features yet:

* CPUID filtering
* AP bring up using the new SEV-SNP NAE
* Lazy validation
* Interrupt security

The series is based on commit:
e542e05d4f UefiCpuPkg/SmmCpuFeaturesLib: Abstract PcdCpuMaxLogicalProcessorNumber

Additional resources
---------------------
SEV-SNP whitepaper
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf

APM 2: https://www.amd.com/system/files/TechDocs/24593.pdf (section 15.36)

The complete source is available at
https://github.com/AMDESE/ovmf/tree/sev-snp-rfc-1

GHCB spec v2:
The draft specification is posted on AMD-SEV-SNP mailing list:
https://lists.suse.com/mailman/private/amd-sev-snp/

Copy of the spec is also available at
https://github.com/AMDESE/AMDSEV/blob/sev-snp-devel/docs/56421-Guest_Hypervisor_Communication_Block_Standardization.pdf

GHCB spec v1:
SEV-SNP firmware specification:
https://developer.amd.com/sev/

Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Laszlo Ersek <lersek@redhat.com>

Brijesh Singh (19):
OvmfPkg: Reserve the Secrets and Cpuid page for the SEV-SNP guest
OvmfPkg: validate the data pages used in the SEC phase
MdePkg: Expand the SEV MSR to include the SNP definition
OvmfPkg/MemEncryptSevLib: add MemEncryptSevSnpEnabled()
MdePkg: Define the GHCB GPA structure
UefiCpuPkg/MpLib: add support to register GHCB GPA when SEV-SNP is
enabled
OvmfPkg: Add a library to support registering GHCB GPA
OvmfPkg: register GHCB gpa for the SEV-SNP guest
MdePkg: Add AsmPvalidate() support
OvmfPkg: Define the Page State Change VMGEXIT structures
OvmfPkg/ResetVector: Invalidate the GHCB page
OvmfPkg/MemEncryptSevLib: Add support to validate system RAM
OvmfPkg/SecMain: Validate the data/code pages used for the PEI phase
OvmfPkg/MemEncryptSevLib: Add support to validate RAM in PEI phase
OvmfPkg/PlatformPei: Validate the system RAM when SNP is active
OvmfPkg/MemEncryptSevLib: Add support to validate > 4GB memory in PEI
phase
OvmfPkg/VmgExitLib: Allow PMBASE register access in Dxe phase
OvmfPkg/MemEncryptSevLib: Validate the memory during set or clear enc
attribute
OvmfPkg/MemEncryptSevLib: Skip page state change for non RAM region

MdePkg/Include/Library/BaseLib.h | 37 +++
MdePkg/Include/Register/Amd/Fam17Msr.h | 31 ++-
MdePkg/Include/Register/Amd/Ghcb.h | 39 ++-
MdePkg/Library/BaseLib/BaseLib.inf | 1 +
MdePkg/Library/BaseLib/X64/Pvalidate.nasm | 43 +++
OvmfPkg/Include/Library/GhcbRegisterLib.h | 27 ++
OvmfPkg/Include/Library/MemEncryptSevLib.h | 30 +++
.../DxeMemEncryptSevLib.inf | 7 +
.../DxeMemEncryptSevLibInternal.c | 27 ++
.../Ia32/SnpPageStateChange.c | 17 ++
.../PeiMemEncryptSevLib.inf | 9 +
.../PeiMemEncryptSevLibInternal.c | 47 ++++
.../SecMemEncryptSevLib.inf | 4 +
.../SecMemEncryptSevLibInternal.c | 39 +++
.../BaseMemEncryptSevLib/SnpPageStateChange.h | 37 +++
.../X64/PeiDxeSnpSetPageState.c | 63 +++++
.../X64/PeiDxeVirtualMemory.c | 151 ++++++++++-
.../X64/PeiSnpSystemRamValidate.c | 129 +++++++++
.../X64/SecSnpSystemRamValidate.c | 23 ++
.../X64/SnpPageStateChangeInternal.c | 254 ++++++++++++++++++
.../X64/SnpPageStateTrack.c | 119 ++++++++
.../X64/SnpPageStateTrack.h | 36 +++
.../X64/SnpSetPageState.h | 27 ++
.../BaseMemEncryptSevLib/X64/VirtualMemory.h | 19 ++
.../Library/GhcbRegisterLib/GhcbRegisterLib.c | 97 +++++++
.../GhcbRegisterLib/GhcbRegisterLib.inf | 33 +++
OvmfPkg/Library/VmgExitLib/SecVmgExitLib.inf | 4 +
OvmfPkg/Library/VmgExitLib/VmgExitLib.inf | 7 +
OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c | 45 ++++
OvmfPkg/OvmfPkg.dec | 12 +
OvmfPkg/OvmfPkgX64.dsc | 1 +
OvmfPkg/OvmfPkgX64.fdf | 33 ++-
OvmfPkg/PlatformPei/AmdSev.c | 52 ++++
OvmfPkg/PlatformPei/PlatformPei.inf | 2 +
OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm | 24 ++
OvmfPkg/ResetVector/Ia32/PageTables64.asm | 106 ++++++++
OvmfPkg/ResetVector/ResetVector.inf | 5 +
OvmfPkg/ResetVector/ResetVector.nasmb | 4 +
OvmfPkg/Sec/SecMain.c | 102 +++++++
OvmfPkg/Sec/SecMain.inf | 2 +
UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf | 1 +
UefiCpuPkg/Library/MpInitLib/MpEqu.inc | 1 +
UefiCpuPkg/Library/MpInitLib/MpLib.c | 2 +
UefiCpuPkg/Library/MpInitLib/MpLib.h | 2 +
UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf | 1 +
UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 51 ++++
UefiCpuPkg/UefiCpuPkg.dec | 6 +
47 files changed, 1790 insertions(+), 19 deletions(-)
create mode 100644 MdePkg/Library/BaseLib/X64/Pvalidate.nasm
create mode 100644 OvmfPkg/Include/Library/GhcbRegisterLib.h
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/Ia32/SnpPageStateChange.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/SnpPageStateChange.h
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiDxeSnpSetPageState.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiSnpSystemRamValidate.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SecSnpSystemRamValidate.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateChangeInternal.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.h
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpSetPageState.h
create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.c
create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.inf

--
2.17.1


Laszlo Ersek
 

On 03/24/21 16:31, Brijesh Singh wrote:
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275

SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based memory protections. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks like data
replay, memory re-mapping and more in order to create an isolated memory
encryption environment.

This series provides the basic building blocks to support booting the SEV-SNP
VMs, it does not cover all the security enhancement introduced by the SEV-SNP
such as interrupt protection.
Thanks, Brijesh. I'm adding the series to my review queue. Due to some
PTO coming up, I'll probably start reviewing this work only in April.
Other reviewers, please feel free to have at it.

Cheers
Laszlo


Many of the integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). Adding a new page to SEV-SNP
VM requires a 2-step process. First, the hypervisor assigns a page to the
guest using the new RMPUPDATE instruction. This transitions the page to
guest-invalid. Second, the guest validates the page using the new PVALIDATE
instruction. The SEV-SNP VMs can use the new "Page State Change Request NAE"
defined in the GHCB specification to ask hypervisor to add or remove page
from the RMP table.

Each page assigned to the SEV-SNP VM can either be validated or unvalidated,
as indicated by the Validated flag in the page's RMP entry. There are two
approaches that can be taken for the page validation: Pre-validation and
Lazy Validation.

Under pre-validation, the pages are validated prior to first use. And under
lazy validation, pages are validated when first accessed. An access to a
unvalidated page results in a #VC exception, at which time the exception
handler may validate the page. Lazy validation requires careful tracking of
the validated pages to avoid validating the same GPA more than once. The
recently introduced "Unaccepted" memory type can be used to communicate the
unvalidated memory ranges to the Guest OS.

At this time we only support the pre-validation. OVMF detects all the available
system RAM in the PEI phase. When SEV-SNP is enabled, the memory is validated
before it is made available to the EDK2 core.

This series does not implements the following SEV-SNP features yet:

* CPUID filtering
* AP bring up using the new SEV-SNP NAE
* Lazy validation
* Interrupt security

The series is based on commit:
e542e05d4f UefiCpuPkg/SmmCpuFeaturesLib: Abstract PcdCpuMaxLogicalProcessorNumber

Additional resources
---------------------
SEV-SNP whitepaper
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf

APM 2: https://www.amd.com/system/files/TechDocs/24593.pdf (section 15.36)

The complete source is available at
https://github.com/AMDESE/ovmf/tree/sev-snp-rfc-1

GHCB spec v2:
The draft specification is posted on AMD-SEV-SNP mailing list:
https://lists.suse.com/mailman/private/amd-sev-snp/

Copy of the spec is also available at
https://github.com/AMDESE/AMDSEV/blob/sev-snp-devel/docs/56421-Guest_Hypervisor_Communication_Block_Standardization.pdf

GHCB spec v1:
SEV-SNP firmware specification:
https://developer.amd.com/sev/

Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Laszlo Ersek <lersek@redhat.com>

Brijesh Singh (19):
OvmfPkg: Reserve the Secrets and Cpuid page for the SEV-SNP guest
OvmfPkg: validate the data pages used in the SEC phase
MdePkg: Expand the SEV MSR to include the SNP definition
OvmfPkg/MemEncryptSevLib: add MemEncryptSevSnpEnabled()
MdePkg: Define the GHCB GPA structure
UefiCpuPkg/MpLib: add support to register GHCB GPA when SEV-SNP is
enabled
OvmfPkg: Add a library to support registering GHCB GPA
OvmfPkg: register GHCB gpa for the SEV-SNP guest
MdePkg: Add AsmPvalidate() support
OvmfPkg: Define the Page State Change VMGEXIT structures
OvmfPkg/ResetVector: Invalidate the GHCB page
OvmfPkg/MemEncryptSevLib: Add support to validate system RAM
OvmfPkg/SecMain: Validate the data/code pages used for the PEI phase
OvmfPkg/MemEncryptSevLib: Add support to validate RAM in PEI phase
OvmfPkg/PlatformPei: Validate the system RAM when SNP is active
OvmfPkg/MemEncryptSevLib: Add support to validate > 4GB memory in PEI
phase
OvmfPkg/VmgExitLib: Allow PMBASE register access in Dxe phase
OvmfPkg/MemEncryptSevLib: Validate the memory during set or clear enc
attribute
OvmfPkg/MemEncryptSevLib: Skip page state change for non RAM region

MdePkg/Include/Library/BaseLib.h | 37 +++
MdePkg/Include/Register/Amd/Fam17Msr.h | 31 ++-
MdePkg/Include/Register/Amd/Ghcb.h | 39 ++-
MdePkg/Library/BaseLib/BaseLib.inf | 1 +
MdePkg/Library/BaseLib/X64/Pvalidate.nasm | 43 +++
OvmfPkg/Include/Library/GhcbRegisterLib.h | 27 ++
OvmfPkg/Include/Library/MemEncryptSevLib.h | 30 +++
.../DxeMemEncryptSevLib.inf | 7 +
.../DxeMemEncryptSevLibInternal.c | 27 ++
.../Ia32/SnpPageStateChange.c | 17 ++
.../PeiMemEncryptSevLib.inf | 9 +
.../PeiMemEncryptSevLibInternal.c | 47 ++++
.../SecMemEncryptSevLib.inf | 4 +
.../SecMemEncryptSevLibInternal.c | 39 +++
.../BaseMemEncryptSevLib/SnpPageStateChange.h | 37 +++
.../X64/PeiDxeSnpSetPageState.c | 63 +++++
.../X64/PeiDxeVirtualMemory.c | 151 ++++++++++-
.../X64/PeiSnpSystemRamValidate.c | 129 +++++++++
.../X64/SecSnpSystemRamValidate.c | 23 ++
.../X64/SnpPageStateChangeInternal.c | 254 ++++++++++++++++++
.../X64/SnpPageStateTrack.c | 119 ++++++++
.../X64/SnpPageStateTrack.h | 36 +++
.../X64/SnpSetPageState.h | 27 ++
.../BaseMemEncryptSevLib/X64/VirtualMemory.h | 19 ++
.../Library/GhcbRegisterLib/GhcbRegisterLib.c | 97 +++++++
.../GhcbRegisterLib/GhcbRegisterLib.inf | 33 +++
OvmfPkg/Library/VmgExitLib/SecVmgExitLib.inf | 4 +
OvmfPkg/Library/VmgExitLib/VmgExitLib.inf | 7 +
OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c | 45 ++++
OvmfPkg/OvmfPkg.dec | 12 +
OvmfPkg/OvmfPkgX64.dsc | 1 +
OvmfPkg/OvmfPkgX64.fdf | 33 ++-
OvmfPkg/PlatformPei/AmdSev.c | 52 ++++
OvmfPkg/PlatformPei/PlatformPei.inf | 2 +
OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm | 24 ++
OvmfPkg/ResetVector/Ia32/PageTables64.asm | 106 ++++++++
OvmfPkg/ResetVector/ResetVector.inf | 5 +
OvmfPkg/ResetVector/ResetVector.nasmb | 4 +
OvmfPkg/Sec/SecMain.c | 102 +++++++
OvmfPkg/Sec/SecMain.inf | 2 +
UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf | 1 +
UefiCpuPkg/Library/MpInitLib/MpEqu.inc | 1 +
UefiCpuPkg/Library/MpInitLib/MpLib.c | 2 +
UefiCpuPkg/Library/MpInitLib/MpLib.h | 2 +
UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf | 1 +
UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 51 ++++
UefiCpuPkg/UefiCpuPkg.dec | 6 +
47 files changed, 1790 insertions(+), 19 deletions(-)
create mode 100644 MdePkg/Library/BaseLib/X64/Pvalidate.nasm
create mode 100644 OvmfPkg/Include/Library/GhcbRegisterLib.h
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/Ia32/SnpPageStateChange.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/SnpPageStateChange.h
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiDxeSnpSetPageState.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiSnpSystemRamValidate.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SecSnpSystemRamValidate.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateChangeInternal.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.h
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpSetPageState.h
create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.c
create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.inf


Laszlo Ersek
 

Hi Brijesh,

On 03/24/21 16:31, Brijesh Singh wrote:
BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3275

SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based memory protections. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks like data
replay, memory re-mapping and more in order to create an isolated memory
encryption environment.

This series provides the basic building blocks to support booting the SEV-SNP
VMs, it does not cover all the security enhancement introduced by the SEV-SNP
such as interrupt protection.

Many of the integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). Adding a new page to SEV-SNP
VM requires a 2-step process. First, the hypervisor assigns a page to the
guest using the new RMPUPDATE instruction. This transitions the page to
guest-invalid. Second, the guest validates the page using the new PVALIDATE
instruction. The SEV-SNP VMs can use the new "Page State Change Request NAE"
defined in the GHCB specification to ask hypervisor to add or remove page
from the RMP table.

Each page assigned to the SEV-SNP VM can either be validated or unvalidated,
as indicated by the Validated flag in the page's RMP entry. There are two
approaches that can be taken for the page validation: Pre-validation and
Lazy Validation.

Under pre-validation, the pages are validated prior to first use. And under
lazy validation, pages are validated when first accessed. An access to a
unvalidated page results in a #VC exception, at which time the exception
handler may validate the page. Lazy validation requires careful tracking of
the validated pages to avoid validating the same GPA more than once. The
recently introduced "Unaccepted" memory type can be used to communicate the
unvalidated memory ranges to the Guest OS.

At this time we only support the pre-validation. OVMF detects all the available
system RAM in the PEI phase. When SEV-SNP is enabled, the memory is validated
before it is made available to the EDK2 core.
Can you describe this in a bit more detail, before I look at the
individual patches? Specifically, what existing logic in the PEI phase
was taken, and extended, and how?

If there is a particular patch whose commit message is closely related
to my question, can you point it out? Patch#15 perhaps? (Doesn't seem
like a big patch; for some reason I'd expect something more complex, but
perhaps that's only because it builds upon the many earlier patches.)

Thanks,
Laszlo


This series does not implements the following SEV-SNP features yet:

* CPUID filtering
* AP bring up using the new SEV-SNP NAE
* Lazy validation
* Interrupt security

The series is based on commit:
e542e05d4f UefiCpuPkg/SmmCpuFeaturesLib: Abstract PcdCpuMaxLogicalProcessorNumber

Additional resources
---------------------
SEV-SNP whitepaper
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf

APM 2: https://www.amd.com/system/files/TechDocs/24593.pdf (section 15.36)

The complete source is available at
https://github.com/AMDESE/ovmf/tree/sev-snp-rfc-1

GHCB spec v2:
The draft specification is posted on AMD-SEV-SNP mailing list:
https://lists.suse.com/mailman/private/amd-sev-snp/

Copy of the spec is also available at
https://github.com/AMDESE/AMDSEV/blob/sev-snp-devel/docs/56421-Guest_Hypervisor_Communication_Block_Standardization.pdf

GHCB spec v1:
SEV-SNP firmware specification:
https://developer.amd.com/sev/

Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Laszlo Ersek <lersek@redhat.com>

Brijesh Singh (19):
OvmfPkg: Reserve the Secrets and Cpuid page for the SEV-SNP guest
OvmfPkg: validate the data pages used in the SEC phase
MdePkg: Expand the SEV MSR to include the SNP definition
OvmfPkg/MemEncryptSevLib: add MemEncryptSevSnpEnabled()
MdePkg: Define the GHCB GPA structure
UefiCpuPkg/MpLib: add support to register GHCB GPA when SEV-SNP is
enabled
OvmfPkg: Add a library to support registering GHCB GPA
OvmfPkg: register GHCB gpa for the SEV-SNP guest
MdePkg: Add AsmPvalidate() support
OvmfPkg: Define the Page State Change VMGEXIT structures
OvmfPkg/ResetVector: Invalidate the GHCB page
OvmfPkg/MemEncryptSevLib: Add support to validate system RAM
OvmfPkg/SecMain: Validate the data/code pages used for the PEI phase
OvmfPkg/MemEncryptSevLib: Add support to validate RAM in PEI phase
OvmfPkg/PlatformPei: Validate the system RAM when SNP is active
OvmfPkg/MemEncryptSevLib: Add support to validate > 4GB memory in PEI
phase
OvmfPkg/VmgExitLib: Allow PMBASE register access in Dxe phase
OvmfPkg/MemEncryptSevLib: Validate the memory during set or clear enc
attribute
OvmfPkg/MemEncryptSevLib: Skip page state change for non RAM region

MdePkg/Include/Library/BaseLib.h | 37 +++
MdePkg/Include/Register/Amd/Fam17Msr.h | 31 ++-
MdePkg/Include/Register/Amd/Ghcb.h | 39 ++-
MdePkg/Library/BaseLib/BaseLib.inf | 1 +
MdePkg/Library/BaseLib/X64/Pvalidate.nasm | 43 +++
OvmfPkg/Include/Library/GhcbRegisterLib.h | 27 ++
OvmfPkg/Include/Library/MemEncryptSevLib.h | 30 +++
.../DxeMemEncryptSevLib.inf | 7 +
.../DxeMemEncryptSevLibInternal.c | 27 ++
.../Ia32/SnpPageStateChange.c | 17 ++
.../PeiMemEncryptSevLib.inf | 9 +
.../PeiMemEncryptSevLibInternal.c | 47 ++++
.../SecMemEncryptSevLib.inf | 4 +
.../SecMemEncryptSevLibInternal.c | 39 +++
.../BaseMemEncryptSevLib/SnpPageStateChange.h | 37 +++
.../X64/PeiDxeSnpSetPageState.c | 63 +++++
.../X64/PeiDxeVirtualMemory.c | 151 ++++++++++-
.../X64/PeiSnpSystemRamValidate.c | 129 +++++++++
.../X64/SecSnpSystemRamValidate.c | 23 ++
.../X64/SnpPageStateChangeInternal.c | 254 ++++++++++++++++++
.../X64/SnpPageStateTrack.c | 119 ++++++++
.../X64/SnpPageStateTrack.h | 36 +++
.../X64/SnpSetPageState.h | 27 ++
.../BaseMemEncryptSevLib/X64/VirtualMemory.h | 19 ++
.../Library/GhcbRegisterLib/GhcbRegisterLib.c | 97 +++++++
.../GhcbRegisterLib/GhcbRegisterLib.inf | 33 +++
OvmfPkg/Library/VmgExitLib/SecVmgExitLib.inf | 4 +
OvmfPkg/Library/VmgExitLib/VmgExitLib.inf | 7 +
OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c | 45 ++++
OvmfPkg/OvmfPkg.dec | 12 +
OvmfPkg/OvmfPkgX64.dsc | 1 +
OvmfPkg/OvmfPkgX64.fdf | 33 ++-
OvmfPkg/PlatformPei/AmdSev.c | 52 ++++
OvmfPkg/PlatformPei/PlatformPei.inf | 2 +
OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm | 24 ++
OvmfPkg/ResetVector/Ia32/PageTables64.asm | 106 ++++++++
OvmfPkg/ResetVector/ResetVector.inf | 5 +
OvmfPkg/ResetVector/ResetVector.nasmb | 4 +
OvmfPkg/Sec/SecMain.c | 102 +++++++
OvmfPkg/Sec/SecMain.inf | 2 +
UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf | 1 +
UefiCpuPkg/Library/MpInitLib/MpEqu.inc | 1 +
UefiCpuPkg/Library/MpInitLib/MpLib.c | 2 +
UefiCpuPkg/Library/MpInitLib/MpLib.h | 2 +
UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf | 1 +
UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 51 ++++
UefiCpuPkg/UefiCpuPkg.dec | 6 +
47 files changed, 1790 insertions(+), 19 deletions(-)
create mode 100644 MdePkg/Library/BaseLib/X64/Pvalidate.nasm
create mode 100644 OvmfPkg/Include/Library/GhcbRegisterLib.h
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/Ia32/SnpPageStateChange.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/SnpPageStateChange.h
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiDxeSnpSetPageState.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiSnpSystemRamValidate.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SecSnpSystemRamValidate.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateChangeInternal.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.h
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpSetPageState.h
create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.c
create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.inf


Brijesh Singh
 

Hi Laszlo,

On 4/8/21 4:58 AM, Laszlo Ersek wrote:
Hi Brijesh,

On 03/24/21 16:31, Brijesh Singh wrote:
BZ: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D3275&;data=04%7C01%7Cbrijesh.singh%40amd.com%7Cb8ee3abf81aa4e5b1e6008d8fa74dbbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637534728122143364%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=G2AQ%2FCks3%2BbczHJXMwqlqpWpoBJmb0pmxb1VNLw6t%2BA%3D&amp;reserved=0

SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based memory protections. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks like data
replay, memory re-mapping and more in order to create an isolated memory
encryption environment.

This series provides the basic building blocks to support booting the SEV-SNP
VMs, it does not cover all the security enhancement introduced by the SEV-SNP
such as interrupt protection.

Many of the integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). Adding a new page to SEV-SNP
VM requires a 2-step process. First, the hypervisor assigns a page to the
guest using the new RMPUPDATE instruction. This transitions the page to
guest-invalid. Second, the guest validates the page using the new PVALIDATE
instruction. The SEV-SNP VMs can use the new "Page State Change Request NAE"
defined in the GHCB specification to ask hypervisor to add or remove page
from the RMP table.

Each page assigned to the SEV-SNP VM can either be validated or unvalidated,
as indicated by the Validated flag in the page's RMP entry. There are two
approaches that can be taken for the page validation: Pre-validation and
Lazy Validation.

Under pre-validation, the pages are validated prior to first use. And under
lazy validation, pages are validated when first accessed. An access to a
unvalidated page results in a #VC exception, at which time the exception
handler may validate the page. Lazy validation requires careful tracking of
the validated pages to avoid validating the same GPA more than once. The
recently introduced "Unaccepted" memory type can be used to communicate the
unvalidated memory ranges to the Guest OS.

At this time we only support the pre-validation. OVMF detects all the available
system RAM in the PEI phase. When SEV-SNP is enabled, the memory is validated
before it is made available to the EDK2 core.
Can you describe this in a bit more detail, before I look at the
individual patches? Specifically, what existing logic in the PEI phase
was taken, and extended, and how?
One of the key requirement is that the guest private pages much be
validated before the access. If guest tries to access the pages before
the validation then it will result in #VC (page-not-validated)
exception. To avoid the #VC, we propose the validating the memory before
the access. We will incrementally add the support to lazy validate (i.e
validate on access).

Let me try explaining a bit, the page validation process consist of two
steps:

1. Add the pages in the RMP table -- must be done by the hypervisor
using the RMPUPDATE instruction. The guest can use VMGEXIT NAEs to ask
hypervisor to add or remove pages from the RMP table.

2. Guest issue the PVALIDATE instruction -- this sets the validate bit
in the RMP table.

Similar to SEV, the OVMF_CODE.fd is encrypted through the SNP firmware
before the launch. The SNP firmware also validates the memory page after
encrypting. This allows us to boot the initial entry code without guest
going through the validation process.

The OVMF reset vector uses few data pages (e.g page table, early Sec
stack). Access to these data pages will result in #VC. There are two
approaches we can take to validate these data pages:

1. Ask SNP firmware to pre-validate it -- SNP firmware provides an
special command that can be used to pre-validate the pages without
affecting the measurement.

2. Enhance the reset vector code to validate the pages.

For now I choose #1.

The pre-validation performed by the SNP firmware is sufficient to boot
through the SEC phase. The SEC phase later decompress the Fv to a new
memory location. Now we need the OVMF to take over the validation
procedure.  The series extends the MemEncryptSevLib to add a new helper
MemEncryptSevSnpValidateRam(). The helper is used to validate the system
RAM. See patch #12. SEC phase calls the MemEncryptSevSnpValidateRam() to
validate the output buffer used for the decompression. This was
sufficient to boot into the PEI phase, see patch #13. The PEI detects
all the available system RAM. After the memory detection is completed
the PlatformPei calls the AmdSevSnpInitialize(). The initialization
routine iterate through the HOB and calls the
MemEncryptSevSnpValidateRam() to validate all the system RAM. Is it
possible the more system ram can be detected after the PlatformPei is
completed ?

One of the important thing is we should *never* validate the pages
twice. The MemEncryptSevSnpValidateRam() uses a interval search tree to
keep the record of what has been validated. Before validating the range,
it lookup in its tree and if it finds that range is already validated
then do nothing. If it detects an overlap then it will validate only non
overlapping regions -- see patch #14.

The patch #18 extend the MemEncrypt{Set,Clear}PageEncMask() to call the
SNP page state change during the C-bit toggle.

Please let me know if you have any questions. We can hash out the design
here before you taking a closure look at the code.


If there is a particular patch whose commit message is closely related
to my question, can you point it out? Patch#15 perhaps? (Doesn't seem
like a big patch; for some reason I'd expect something more complex, but
perhaps that's only because it builds upon the many earlier patches.)

Thanks,
Laszlo

This series does not implements the following SEV-SNP features yet:

* CPUID filtering
* AP bring up using the new SEV-SNP NAE
* Lazy validation
* Interrupt security

The series is based on commit:
e542e05d4f UefiCpuPkg/SmmCpuFeaturesLib: Abstract PcdCpuMaxLogicalProcessorNumber

Additional resources
---------------------
SEV-SNP whitepaper
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.amd.com%2Fsystem%2Ffiles%2FTechDocs%2FSEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf&;data=04%7C01%7Cbrijesh.singh%40amd.com%7Cb8ee3abf81aa4e5b1e6008d8fa74dbbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637534728122143364%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=uUqlVHhWQ6geGDaNHwxGMpoSpIamB%2F1vHH69h%2FEGUro%3D&amp;reserved=0

APM 2: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.amd.com%2Fsystem%2Ffiles%2FTechDocs%2F24593.pdf&;data=04%7C01%7Cbrijesh.singh%40amd.com%7Cb8ee3abf81aa4e5b1e6008d8fa74dbbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637534728122143364%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=R2kU42jvCDZat8kGZ5gDDz2nFXIawHfXdRW1aovhNK8%3D&amp;reserved=0 (section 15.36)

The complete source is available at
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAMDESE%2Fovmf%2Ftree%2Fsev-snp-rfc-1&;data=04%7C01%7Cbrijesh.singh%40amd.com%7Cb8ee3abf81aa4e5b1e6008d8fa74dbbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637534728122143364%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=yreFg2hr%2F82WYEjxqCmb7pXUdtrRCJRYPrPHgfrWjM8%3D&amp;reserved=0

GHCB spec v2:
The draft specification is posted on AMD-SEV-SNP mailing list:
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.suse.com%2Fmailman%2Fprivate%2Famd-sev-snp%2F&;data=04%7C01%7Cbrijesh.singh%40amd.com%7Cb8ee3abf81aa4e5b1e6008d8fa74dbbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637534728122143364%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=PFV2mA7T%2Fbl2zP5j52kNdT%2FavDMRLWDEDqz6JGusEFg%3D&amp;reserved=0

Copy of the spec is also available at
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FAMDESE%2FAMDSEV%2Fblob%2Fsev-snp-devel%2Fdocs%2F56421-Guest_Hypervisor_Communication_Block_Standardization.pdf&;data=04%7C01%7Cbrijesh.singh%40amd.com%7Cb8ee3abf81aa4e5b1e6008d8fa74dbbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637534728122143364%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=U3oKRe5m0NxI0xqv1gBoyh%2BEEX1LVeCWR42rvPh6XZ8%3D&amp;reserved=0

GHCB spec v1:
SEV-SNP firmware specification:
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdeveloper.amd.com%2Fsev%2F&;data=04%7C01%7Cbrijesh.singh%40amd.com%7Cb8ee3abf81aa4e5b1e6008d8fa74dbbc%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637534728122143364%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=G%2BttkORsTJchJ1Fy1iNlD%2B%2BqiQZuwI8md5vhJjEb%2Fn4%3D&amp;reserved=0

Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Min Xu <min.m.xu@intel.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Laszlo Ersek <lersek@redhat.com>

Brijesh Singh (19):
OvmfPkg: Reserve the Secrets and Cpuid page for the SEV-SNP guest
OvmfPkg: validate the data pages used in the SEC phase
MdePkg: Expand the SEV MSR to include the SNP definition
OvmfPkg/MemEncryptSevLib: add MemEncryptSevSnpEnabled()
MdePkg: Define the GHCB GPA structure
UefiCpuPkg/MpLib: add support to register GHCB GPA when SEV-SNP is
enabled
OvmfPkg: Add a library to support registering GHCB GPA
OvmfPkg: register GHCB gpa for the SEV-SNP guest
MdePkg: Add AsmPvalidate() support
OvmfPkg: Define the Page State Change VMGEXIT structures
OvmfPkg/ResetVector: Invalidate the GHCB page
OvmfPkg/MemEncryptSevLib: Add support to validate system RAM
OvmfPkg/SecMain: Validate the data/code pages used for the PEI phase
OvmfPkg/MemEncryptSevLib: Add support to validate RAM in PEI phase
OvmfPkg/PlatformPei: Validate the system RAM when SNP is active
OvmfPkg/MemEncryptSevLib: Add support to validate > 4GB memory in PEI
phase
OvmfPkg/VmgExitLib: Allow PMBASE register access in Dxe phase
OvmfPkg/MemEncryptSevLib: Validate the memory during set or clear enc
attribute
OvmfPkg/MemEncryptSevLib: Skip page state change for non RAM region

MdePkg/Include/Library/BaseLib.h | 37 +++
MdePkg/Include/Register/Amd/Fam17Msr.h | 31 ++-
MdePkg/Include/Register/Amd/Ghcb.h | 39 ++-
MdePkg/Library/BaseLib/BaseLib.inf | 1 +
MdePkg/Library/BaseLib/X64/Pvalidate.nasm | 43 +++
OvmfPkg/Include/Library/GhcbRegisterLib.h | 27 ++
OvmfPkg/Include/Library/MemEncryptSevLib.h | 30 +++
.../DxeMemEncryptSevLib.inf | 7 +
.../DxeMemEncryptSevLibInternal.c | 27 ++
.../Ia32/SnpPageStateChange.c | 17 ++
.../PeiMemEncryptSevLib.inf | 9 +
.../PeiMemEncryptSevLibInternal.c | 47 ++++
.../SecMemEncryptSevLib.inf | 4 +
.../SecMemEncryptSevLibInternal.c | 39 +++
.../BaseMemEncryptSevLib/SnpPageStateChange.h | 37 +++
.../X64/PeiDxeSnpSetPageState.c | 63 +++++
.../X64/PeiDxeVirtualMemory.c | 151 ++++++++++-
.../X64/PeiSnpSystemRamValidate.c | 129 +++++++++
.../X64/SecSnpSystemRamValidate.c | 23 ++
.../X64/SnpPageStateChangeInternal.c | 254 ++++++++++++++++++
.../X64/SnpPageStateTrack.c | 119 ++++++++
.../X64/SnpPageStateTrack.h | 36 +++
.../X64/SnpSetPageState.h | 27 ++
.../BaseMemEncryptSevLib/X64/VirtualMemory.h | 19 ++
.../Library/GhcbRegisterLib/GhcbRegisterLib.c | 97 +++++++
.../GhcbRegisterLib/GhcbRegisterLib.inf | 33 +++
OvmfPkg/Library/VmgExitLib/SecVmgExitLib.inf | 4 +
OvmfPkg/Library/VmgExitLib/VmgExitLib.inf | 7 +
OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c | 45 ++++
OvmfPkg/OvmfPkg.dec | 12 +
OvmfPkg/OvmfPkgX64.dsc | 1 +
OvmfPkg/OvmfPkgX64.fdf | 33 ++-
OvmfPkg/PlatformPei/AmdSev.c | 52 ++++
OvmfPkg/PlatformPei/PlatformPei.inf | 2 +
OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm | 24 ++
OvmfPkg/ResetVector/Ia32/PageTables64.asm | 106 ++++++++
OvmfPkg/ResetVector/ResetVector.inf | 5 +
OvmfPkg/ResetVector/ResetVector.nasmb | 4 +
OvmfPkg/Sec/SecMain.c | 102 +++++++
OvmfPkg/Sec/SecMain.inf | 2 +
UefiCpuPkg/Library/MpInitLib/DxeMpInitLib.inf | 1 +
UefiCpuPkg/Library/MpInitLib/MpEqu.inc | 1 +
UefiCpuPkg/Library/MpInitLib/MpLib.c | 2 +
UefiCpuPkg/Library/MpInitLib/MpLib.h | 2 +
UefiCpuPkg/Library/MpInitLib/PeiMpInitLib.inf | 1 +
UefiCpuPkg/Library/MpInitLib/X64/MpFuncs.nasm | 51 ++++
UefiCpuPkg/UefiCpuPkg.dec | 6 +
47 files changed, 1790 insertions(+), 19 deletions(-)
create mode 100644 MdePkg/Library/BaseLib/X64/Pvalidate.nasm
create mode 100644 OvmfPkg/Include/Library/GhcbRegisterLib.h
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/Ia32/SnpPageStateChange.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/SnpPageStateChange.h
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiDxeSnpSetPageState.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiSnpSystemRamValidate.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SecSnpSystemRamValidate.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateChangeInternal.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.c
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpPageStateTrack.h
create mode 100644 OvmfPkg/Library/BaseMemEncryptSevLib/X64/SnpSetPageState.h
create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.c
create mode 100644 OvmfPkg/Library/GhcbRegisterLib/GhcbRegisterLib.inf


Laszlo Ersek
 

On 04/08/21 13:59, Brijesh Singh wrote:
On 4/8/21 4:58 AM, Laszlo Ersek wrote:
On 03/24/21 16:31, Brijesh Singh wrote:
At this time we only support the pre-validation. OVMF detects all the available
system RAM in the PEI phase. When SEV-SNP is enabled, the memory is validated
before it is made available to the EDK2 core.
Can you describe this in a bit more detail, before I look at the
individual patches? Specifically, what existing logic in the PEI phase
was taken, and extended, and how?
One of the key requirement is that the guest private pages much be
validated before the access. If guest tries to access the pages before
the validation then it will result in #VC (page-not-validated)
exception. To avoid the #VC, we propose the validating the memory before
the access. We will incrementally add the support to lazy validate (i.e
validate on access).
What is the primary threat (in simple terms please) that validation is
supposed to prevent?

And, against that particular threat, does pre-validation offer some
protection, or will that only come with lazy validation?


Let me try explaining a bit, the page validation process consist of two
steps:

1. Add the pages in the RMP table -- must be done by the hypervisor
using the RMPUPDATE instruction. The guest can use VMGEXIT NAEs to ask
hypervisor to add or remove pages from the RMP table.

2. Guest issue the PVALIDATE instruction -- this sets the validate bit
in the RMP table.

Similar to SEV, the OVMF_CODE.fd is encrypted through the SNP firmware
before the launch. The SNP firmware also validates the memory page after
encrypting. This allows us to boot the initial entry code without guest
going through the validation process.

The OVMF reset vector uses few data pages (e.g page table, early Sec
stack). Access to these data pages will result in #VC. There are two
approaches we can take to validate these data pages:

1. Ask SNP firmware to pre-validate it -- SNP firmware provides an
special command that can be used to pre-validate the pages without
affecting the measurement.
This means the two pflash chips, right?


2. Enhance the reset vector code to validate the pages.

For now I choose #1.

The pre-validation performed by the SNP firmware is sufficient to boot
through the SEC phase. The SEC phase later decompress the Fv to a new
memory location. Now we need the OVMF to take over the validation
procedure.  The series extends the MemEncryptSevLib to add a new helper
MemEncryptSevSnpValidateRam(). The helper is used to validate the system
RAM. See patch #12. SEC phase calls the MemEncryptSevSnpValidateRam() to
validate the output buffer used for the decompression. This was
sufficient to boot into the PEI phase, see patch #13.
Two questions here:

- Is ACPI S3 in scope for now?

- We need more areas made accessible in SEC than just the decompression
buffer; for example the temporary SEC/PEI heap and stack, and (IIRC)
various special SEV-ES areas laid out via MEMFD. Where are all of those
handled / enumerated?

The PEI detects
all the available system RAM. After the memory detection is completed
the PlatformPei calls the AmdSevSnpInitialize(). The initialization
routine iterate through the HOB and calls the
MemEncryptSevSnpValidateRam() to validate all the system RAM. Is it
possible the more system ram can be detected after the PlatformPei is
completed ?
That would cause problems even without SEV-SNP (i.e., with plain SEV),
so I'm not worried about it.


One of the important thing is we should *never* validate the pages
twice.
What are the symptoms / consequences of:

- the guest accessing an unvalidated page (I understand it causes a #VC,
but what is the direct result of that, when this series is applied?),

- doubly-validating a page?

The first question is relevant because we should crash as soon as we
touch a page we forgot to validate (we shouldn't let any corruption or
similar spread out until we finally realize there's a problem).

The second question is relevant for security I guess. What attacks
become possible, and/or what symptoms are produced, if we
doubly-validate a page?

Furthermore, IIRC, we have separate #VC handlers for the different
firmware phases; do they behave consistently / identicall when a
#VC(page-not-validated) occurs, when this patch set is applied?

My first question is basically asking whether we can *exclusively* rely
on #VC(page-not-validated) faults to tell us that we missed validating a
particular page. If we can do that, then the job is a bit easier,
because from the GPA, we can more or less also derive *when and where*
we should pre-validate the page (at least until validation is done
completely lazily).

The MemEncryptSevSnpValidateRam() uses a interval search tree to
keep the record of what has been validated. Before validating the range,
it lookup in its tree and if it finds that range is already validated
then do nothing. If it detects an overlap then it will validate only non
overlapping regions -- see patch #14.
What data structure is used for implementing the interval tree?

I'm not necessarily looking for a data structure with "nice" asymptotic
time complexity. With pre-validation especially, I think simplicity
(ease of review) is more important for the data structure than
performance. If it's not an actual "tree", we shouldn't call it a
"tree". (An "interval tree" is usually an extension of a Red-Black Tree,
and that's not the simplest data structure in existence; although edk2
does offer an rbtree library.)

Furthermore, what you describe above is called idempotency. No matter
how many times we attempt to validate a range, it may (or may not even)
cause an actual change in the first action only. Is this property
(=idempotency) an inherent requirement of the technology, or is it a
simplification of the implementation? Put differently: if you called
CpuDeadLoop() in the validation function any time an overlapping
validation request were received, would that hugely complicate the call
sites?

I'm kind of "obsessing" about idempotency because you say we must
*never* doubly-validate a page, so the *difference* between:
- explicitly crashing on such an attempt,
- and silently ignoring such an attempt,
may be meaningful.

It's kind of the difference between "oops this is a call site *bug*, but
we patched it up", vs. "this is expected of call sites, we should just
handle it internally".

The patch #18 extend the MemEncrypt{Set,Clear}PageEncMask() to call the
SNP page state change during the C-bit toggle.
- When exactly do we invalidate?

- Does the time and place of invalidation depend on whether we perform
pre-validation, or lazy validation?

- Is page invalidation idempotent too?

- What is the consequence (security impact) if we forget invalidation?

- There are four page states I can imagine:
- encrypted for host access, valid for guest access
- encrypted for host access, invalid for guest access
- decrypted for host access, valid for guest access
- decrypted for host access, invalid for guest access

Does a state exist, from these four, that's never used? (Potentially
caught by the hardware?)

Do the patches highlight / explain the validity transitions, in
comments? My understanding is that the C-bit toggle and the guest access
valid/invalid toggle are separate actions, so "middle" states do exist,
but perhaps only temporarily.

I'm curious how it works, for example, with variousvirtio transfers (bus
master read, bus master write, bus master common buffer). In the
IoMmuDxe driver minimally, we access memory both through PTEs with the
C-bit set and through PTEs with the C-bit clear, meaning that
"encrypted, valid", and "decrypted, valid" are both required states. But
that seems to conflict with the notion that "C-bit toggle" be directly
coupled with a "validity toggle".

Put differently, I'm happy that modifying IoMmuDxe appears unnecessary,
but then that tells me I'm missing something about the state transitions
between the above *four* states.


Please let me know if you have any questions. We can hash out the design
here before you taking a closure look at the code.
Sorry that I've been (and am being) slow to start reviewing this series.

Thanks,
Laszlo


Brijesh Singh
 

On 4/9/21 7:24 AM, Laszlo Ersek wrote:
On 04/08/21 13:59, Brijesh Singh wrote:
On 4/8/21 4:58 AM, Laszlo Ersek wrote:
On 03/24/21 16:31, Brijesh Singh wrote:
At this time we only support the pre-validation. OVMF detects all the available
system RAM in the PEI phase. When SEV-SNP is enabled, the memory is validated
before it is made available to the EDK2 core.
Can you describe this in a bit more detail, before I look at the
individual patches? Specifically, what existing logic in the PEI phase
was taken, and extended, and how?
One of the key requirement is that the guest private pages much be
validated before the access. If guest tries to access the pages before
the validation then it will result in #VC (page-not-validated)
exception. To avoid the #VC, we propose the validating the memory before
the access. We will incrementally add the support to lazy validate (i.e
validate on access).
What is the primary threat (in simple terms please) that validation is
supposed to prevent?

To protect against the memory re-mapping attack the guest pages must be
validated. The idea is that every guest page can map only to a single
physical memory page at one time.


And, against that particular threat, does pre-validation offer some
protection, or will that only come with lazy validation?
For the hardware it does not matter how the memory was validated -- lazy
vs prevalidate. Both approaches use the PVALIDATE instruction to
validate the page.

In the case of pre-validation, the memory is validated before the
access. Whereas in the lazy validation, the access will cause a fault
and fault handler should validate the page and retry the access. Its
similar to a page fault handler, OS can populate the page table or back
the pages on demand.

The only downside of pre-validation is, we will take a hit on the boot
time. The GHCB spec provides method by which we can batch multiple
requests at once to minimize the context switches.



Let me try explaining a bit, the page validation process consist of two
steps:

1. Add the pages in the RMP table -- must be done by the hypervisor
using the RMPUPDATE instruction. The guest can use VMGEXIT NAEs to ask
hypervisor to add or remove pages from the RMP table.

2. Guest issue the PVALIDATE instruction -- this sets the validate bit
in the RMP table.

Similar to SEV, the OVMF_CODE.fd is encrypted through the SNP firmware
before the launch. The SNP firmware also validates the memory page after
encrypting. This allows us to boot the initial entry code without guest
going through the validation process.

The OVMF reset vector uses few data pages (e.g page table, early Sec
stack). Access to these data pages will result in #VC. There are two
approaches we can take to validate these data pages:

1. Ask SNP firmware to pre-validate it -- SNP firmware provides an
special command that can be used to pre-validate the pages without
affecting the measurement.
This means the two pflash chips, right?

This does not need to be two pflash chips. The SNP firmware command does
not know anything about the ROM or pflash chip. The command accepts the
system physical address that need to be validated by the firmware. In
patch #2, OVMF provides a range of data pages that need to be validated
by the SNP firmware before booting the guest.


2. Enhance the reset vector code to validate the pages.

For now I choose #1.

The pre-validation performed by the SNP firmware is sufficient to boot
through the SEC phase. The SEC phase later decompress the Fv to a new
memory location. Now we need the OVMF to take over the validation
procedure.  The series extends the MemEncryptSevLib to add a new helper
MemEncryptSevSnpValidateRam(). The helper is used to validate the system
RAM. See patch #12. SEC phase calls the MemEncryptSevSnpValidateRam() to
validate the output buffer used for the decompression. This was
sufficient to boot into the PEI phase, see patch #13.
Two questions here:

- Is ACPI S3 in scope for now?

Its not in the scope yet. I have not looked at it.



- We need more areas made accessible in SEC than just the decompression
buffer; for example the temporary SEC/PEI heap and stack, and (IIRC)
various special SEV-ES areas laid out via MEMFD. Where are all of those
handled / enumerated?

Sorry, I simplified my response by saying just decompression. You are
right that its more than the decompression buffer. In my current patch,
the SEC phase validates all the memory up to
PcdOvmfDecompressionScratchEnd (which includes more than just output
buffer). See patch #13.


The PEI detects
all the available system RAM. After the memory detection is completed
the PlatformPei calls the AmdSevSnpInitialize(). The initialization
routine iterate through the HOB and calls the
MemEncryptSevSnpValidateRam() to validate all the system RAM. Is it
possible the more system ram can be detected after the PlatformPei is
completed ?
That would cause problems even without SEV-SNP (i.e., with plain SEV),
so I'm not worried about it.

One of the important thing is we should *never* validate the pages
twice.
What are the symptoms / consequences of:

- the guest accessing an unvalidated page (I understand it causes a #VC,
but what is the direct result of that, when this series is applied?),

After the series it applied, an access to an unvalidated page will
result in #VC. The series does not extend the VC handler to handle the
page-not-validated error code. The current VC handler
[InternalVmgExitHandleVc()] will inject a #GP to terminate the guest if
#VC is raised for unvalidated page.



- doubly-validating a page?

The first question is relevant because we should crash as soon as we
touch a page we forgot to validate (we shouldn't let any corruption or
similar spread out until we finally realize there's a problem).

The second question is relevant for security I guess. What attacks
become possible, and/or what symptoms are produced, if we
doubly-validate a page?
Assuming that guest validates its memory, this guarantees the injective
mapping between GPAs and SPAs.

As I noted that the page validation process consist of two steps #1
RMPUPDATE and 2#PVALIDATE. The RMPUPDATE is a hypervisor instruction and
PVALIDATE is a guest instruction. A malicious hypervisor can remap gpa
to different physical address in RMP table using the RMPUPDATE just
before the second PVALIDATE instruction is executed. The guest need to
ensure that it does not leave the security vulnerability that can be
exploited by a malicious hypervisor. The SNP white paper recommends that
a guest OS should keep track of what is has validate so that it can
avoid getting into these situation.



Furthermore, IIRC, we have separate #VC handlers for the different
firmware phases; do they behave consistently / identicall when a
#VC(page-not-validated) occurs, when this patch set is applied?

My first question is basically asking whether we can *exclusively* rely
on #VC(page-not-validated) faults to tell us that we missed validating a
particular page. If we can do that, then the job is a bit easier,
because from the GPA, we can more or less also derive *when and where*
we should pre-validate the page (at least until validation is done
completely lazily).
Yes, we should be able to rely #VC to tell us that we missed validating
the page. In my this series so far I have not seen a #VC due to
page-not-validated because we have carefully validated the pages before
they are accessed. I am thinking that #VC(page-not-validated) should be
treated as an error in the first phase. Incrementally we will add the
lazy validation, in which the #VC(page-not-validated) will provide a
hint as to what has not been validated.



The MemEncryptSevSnpValidateRam() uses a interval search tree to
keep the record of what has been validated. Before validating the range,
it lookup in its tree and if it finds that range is already validated
then do nothing. If it detects an overlap then it will validate only non
overlapping regions -- see patch #14.
What data structure is used for implementing the interval tree?

I'm not necessarily looking for a data structure with "nice" asymptotic
time complexity. With pre-validation especially, I think simplicity
(ease of review) is more important for the data structure than
performance. If it's not an actual "tree", we shouldn't call it a
"tree". (An "interval tree" is usually an extension of a Red-Black Tree,
and that's not the simplest data structure in existence; although edk2
does offer an rbtree library.)

I am using binary search tree. Currently, we don't have many nodes to
track. Most of the guest memory is private and are validated before we
exit from the PEI phase.

Furthermore, what you describe above is called idempotency. No matter
how many times we attempt to validate a range, it may (or may not even)
cause an actual change in the first action only. Is this property
(=idempotency) an inherent requirement of the technology, or is it a
simplification of the implementation? Put differently: if you called
CpuDeadLoop() in the validation function any time an overlapping
validation request were received, would that hugely complicate the call
sites?
From the technology point of view you can validate the same page again
and again. Hardware does not force any restriction. To protect against
time-based remap attack a guest also need to ensure that it does not
blindly validates the pages otherwise guest is taking risk. Avoiding a
double validation is strong recommendation. In current implementation,
the validation burden is not put on the caller. Caller calls standard
APIs to toggle the encryption attribute. Under the hood, if we see that
page is already validated then no need to issue the PVALIDATE. If the
page is getting changed from C=1 -> 0 then unvalidate it.


I'm kind of "obsessing" about idempotency because you say we must
*never* doubly-validate a page, so the *difference* between:
- explicitly crashing on such an attempt,
- and silently ignoring such an attempt,
may be meaningful.

I said we should never doubly validate to avoid creating a vulnerability
that can later be exploited by the malicious hypervisor. Currently we
lookup address in our tree to ensure that we don't do a double
validation. The PVALIDATE instruction also can be used to determine if
we are attempting to validate an already validated page. In current
implementation after the PVALIDATE completion, I check the rflags.CF to
ensure that its not a double validation condition. If its double
validation then abort the guest. Its bug in the guest.

It's kind of the difference between "oops this is a call site *bug*, but
we patched it up", vs. "this is expected of call sites, we should just
handle it internally".
Our attempt should be to patch to avoid the double validation. I would
still check the status of PVALIDATE, if instruction detects it as a
double validation then we have a bug in our tracking and should abort
the guest.


The patch #18 extend the MemEncrypt{Set,Clear}PageEncMask() to call the
SNP page state change during the C-bit toggle.
- When exactly do we invalidate?
Basically before the page is transition from C=1 -> 0 in the page table.


- Does the time and place of invalidation depend on whether we perform
pre-validation, or lazy validation?
It does not matter. Both type of validation update the Validated bit in
the RMP table. The important thing is that invalidation can happen only
for a already validated private page. If the page is shared then
invalidation will result in #GP.

- Is page invalidation idempotent too?
We should treat this as idempotent too. As I said, tacking the page
state is strongly recommended for the security reason. The SNP white
paper has a section dedicated for the validation and may able to clarify
things which is not covered in my response.

- What is the consequence (security impact) if we forget invalidation?
I can't think of a security implication of it but if we forget to
invalidate then it can lead issues in the tracking data structure when
it comes to validation next time. e.g a page was mapped as C=1 and we
changed it to C=0 without invalidating it. Now if the same page is being
changed from C=0 -> 1 then tracking data structure may think that page
is already validated and will skip the validation process and thus
access will result in #VC (page-not-validated).

- There are four page states I can imagine:
- encrypted for host access, valid for guest access
- encrypted for host access, invalid for guest access
- decrypted for host access, valid for guest access
- decrypted for host access, invalid for guest access

Does a state exist, from these four, that's never used? (Potentially
caught by the hardware?)
I would not mix the page state with the host access. Basically there is
two page state exist

- Guest-Valid

- Guest-Invalid

By default all the guest memory is in Guest-Invalid state on boot. The
PVALIDATE instruction is used to transition from Valid->Invalid. Guest
can use the PVALIDATE to change the state from Valid to Invalid.

A guest private page (i.e page mapped C=1) must be in Guest Valid state.
If hardware see that a private access page is not in the Guest-Valid
state then it will trigger #VC.



Do the patches highlight / explain the validity transitions, in
comments? My understanding is that the C-bit toggle and the guest access
valid/invalid toggle are separate actions, so "middle" states do exist,
but perhaps only temporarily.
I have tried to document the steps in the patch description and I can
add more comments as required. When SEV-SNP is active then we need to
invalidate the page before toggling the C-bit from C=1 -> 0 and
similarly after the page is toggled from C=0 -> 1 we must validate it.


I'm curious how it works, for example, with variousvirtio transfers (bus
master read, bus master write, bus master common buffer). In the
IoMmuDxe driver minimally, we access memory both through PTEs with the
C-bit set and through PTEs with the C-bit clear, meaning that
"encrypted, valid", and "decrypted, valid" are both required states. But
that seems to conflict with the notion that "C-bit toggle" be directly
coupled with a "validity toggle".
That should not be an issue. Each page have entry in the RMP table. So,
you can have one page as guest invalid and another page as a guest valid.

Put differently, I'm happy that modifying IoMmuDxe appears unnecessary,
but then that tells me I'm missing something about the state transitions
between the above *four* states.

Only thing which I would propose changing in IoMmuDxe driver is to use a
pre-allocated buffer and avoid toggling the C-bit on every read/write.
In past the C-bit toggle was all contained inside the guest. But when
SNP is enabled, each C-bit toggle causes two additional steps #1
RMPUPDATE and #2 PVALIDATE. The RMPUPDATE is requested using the VMEXIT
so it will cause a VM context switch. I would prefer to avoid it.

My current patch set does not have this optimization yet.



Please let me know if you have any questions. We can hash out the design
here before you taking a closure look at the code.
Sorry that I've been (and am being) slow to start reviewing this series.

No worries, take your time. Thank you for all your feedback.

-Brijesh


Thanks,
Laszlo


Laszlo Ersek
 

On 04/10/21 00:43, Brijesh Singh wrote:

On 4/9/21 7:24 AM, Laszlo Ersek wrote:
On 04/08/21 13:59, Brijesh Singh wrote:
On 4/8/21 4:58 AM, Laszlo Ersek wrote:
On 03/24/21 16:31, Brijesh Singh wrote:
At this time we only support the pre-validation. OVMF detects all
the available system RAM in the PEI phase. When SEV-SNP is
enabled, the memory is validated before it is made available to
the EDK2 core.
Can you describe this in a bit more detail, before I look at the
individual patches? Specifically, what existing logic in the PEI
phase was taken, and extended, and how?
One of the key requirement is that the guest private pages much be
validated before the access. If guest tries to access the pages
before the validation then it will result in #VC
(page-not-validated) exception. To avoid the #VC, we propose the
validating the memory before the access. We will incrementally add
the support to lazy validate (i.e validate on access).
What is the primary threat (in simple terms please) that validation
is supposed to prevent?

To protect against the memory re-mapping attack the guest pages must
be validated. The idea is that every guest page can map only to a
single physical memory page at one time.
I don't understand. How can a given guest page frame map to multiple
host page frames?

Do you mean that the situation to prevent is when multiple guest page
frames (GPAs) map to the same host page frame, as set up by the
hypervisor?


And, against that particular threat, does pre-validation offer some
protection, or will that only come with lazy validation?
For the hardware it does not matter how the memory was validated --
lazy vs prevalidate. Both approaches use the PVALIDATE instruction to
validate the page.

In the case of pre-validation, the memory is validated before the
access. Whereas in the lazy validation, the access will cause a fault
and fault handler should validate the page and retry the access. Its
similar to a page fault handler, OS can populate the page table or
back the pages on demand.

The only downside of pre-validation is, we will take a hit on the boot
time. The GHCB spec provides method by which we can batch multiple
requests at once to minimize the context switches.
If pre-validation is simpler, and the only drawback is a one-time hit
during boot, then wouldn't it be better to stick with pre-validation
forever? I expect that would be a lot simpler for (a) the #VC handlers,
(b) the tracking of the "already validated" information.


Similar to SEV, the OVMF_CODE.fd is encrypted through the SNP
firmware before the launch. The SNP firmware also validates the
memory page after encrypting. This allows us to boot the initial
entry code without guest going through the validation process.

The OVMF reset vector uses few data pages (e.g page table, early Sec
stack). Access to these data pages will result in #VC. There are two
approaches we can take to validate these data pages:

1. Ask SNP firmware to pre-validate it -- SNP firmware provides an
special command that can be used to pre-validate the pages without
affecting the measurement.
This means the two pflash chips, right?

This does not need to be two pflash chips. The SNP firmware command
does not know anything about the ROM or pflash chip. The command
accepts the system physical address that need to be validated by the
firmware. In patch #2, OVMF provides a range of data pages that need
to be validated by the SNP firmware before booting the guest.
I guess my question related to the guest code that executes from pflash,
and/or accesses data from pflash, before the guest itself could ask for
any validation. Such as, the reset vector (which runs from pflash).
Then, some non-volatile UEFI variable data that resides in the other
pflash chip, and is accessed (read-only) during the PEI phase (the
memory type information variable namely). I understood your comments as
QEMU pre-validating those areas before guest launch. Is that correct?

Anyway, I guess at this point I should just start reviewing the series.

Some more comments below.


- We need more areas made accessible in SEC than just the
decompression buffer; for example the temporary SEC/PEI heap and
stack, and (IIRC) various special SEV-ES areas laid out via MEMFD.
Where are all of those handled / enumerated?

Sorry, I simplified my response by saying just decompression. You are
right that its more than the decompression buffer. In my current
patch, the SEC phase validates all the memory up to
PcdOvmfDecompressionScratchEnd (which includes more than just output
buffer). See patch #13.
That sounds good (will check the patch out of course).


One of the important thing is we should *never* validate the pages
twice.
What are the symptoms / consequences of:

- the guest accessing an unvalidated page (I understand it causes a
#VC, but what is the direct result of that, when this series is
applied?),

After the series it applied, an access to an unvalidated page will
result in #VC. The series does not extend the VC handler to handle the
page-not-validated error code. The current VC handler
[InternalVmgExitHandleVc()] will inject a #GP to terminate the guest
if #VC is raised for unvalidated page.
OK.


- doubly-validating a page?

The first question is relevant because we should crash as soon as we
touch a page we forgot to validate (we shouldn't let any corruption
or similar spread out until we finally realize there's a problem).

The second question is relevant for security I guess. What attacks
become possible, and/or what symptoms are produced, if we
doubly-validate a page?
Assuming that guest validates its memory, this guarantees the
injective mapping between GPAs and SPAs.

As I noted that the page validation process consist of two steps #1
RMPUPDATE and 2#PVALIDATE. The RMPUPDATE is a hypervisor instruction
and PVALIDATE is a guest instruction. A malicious hypervisor can remap
gpa to different physical address in RMP table using the RMPUPDATE
just before the second PVALIDATE instruction is executed. The guest
need to ensure that it does not leave the security vulnerability that
can be exploited by a malicious hypervisor. The SNP white paper
recommends that a guest OS should keep track of what is has validate
so that it can avoid getting into these situation.
Hm, OK. I think this answers my top-most question.


Furthermore, IIRC, we have separate #VC handlers for the different
firmware phases; do they behave consistently / identicall when a
#VC(page-not-validated) occurs, when this patch set is applied?

My first question is basically asking whether we can *exclusively*
rely on #VC(page-not-validated) faults to tell us that we missed
validating a particular page. If we can do that, then the job is a
bit easier, because from the GPA, we can more or less also derive
*when and where* we should pre-validate the page (at least until
validation is done completely lazily).
Yes, we should be able to rely #VC to tell us that we missed
validating the page. In my this series so far I have not seen a #VC
due to page-not-validated because we have carefully validated the
pages before they are accessed. I am thinking that
#VC(page-not-validated) should be treated as an error in the first
phase. Incrementally we will add the lazy validation, in which the
#VC(page-not-validated) will provide a hint as to what has not been
validated.
I'm OK with the current handling of #VC(page-not-validated) -- i.e.,
fatal error.

I'm not yet convinced that lazy validation will be useful at all, but
that could be just my lack of understanding.


The MemEncryptSevSnpValidateRam() uses a interval search tree to
keep the record of what has been validated. Before validating the
range, it lookup in its tree and if it finds that range is already
validated then do nothing. If it detects an overlap then it will
validate only non overlapping regions -- see patch #14. hat data
structure is used for implementing the interval tree?
I'm not necessarily looking for a data structure with "nice"
asymptotic time complexity. With pre-validation especially, I think
simplicity (ease of review) is more important for the data structure
than performance. If it's not an actual "tree", we shouldn't call it
a "tree". (An "interval tree" is usually an extension of a Red-Black
Tree, and that's not the simplest data structure in existence;
although edk2 does offer an rbtree library.)

I am using binary search tree. Currently, we don't have many nodes to
track. Most of the guest memory is private and are validated before we
exit from the PEI phase.
My preliminary comments about MemEncryptSevSnpValidateSystemRam():

(1) The function prototype (patch#12) promises that the function can
never fail. In patch#14 however, I see some AddRangeToIntervalTree()
calls whose return values are not checked. I think error checking should
be strict, and if an error is detected (such as memory allocation
failure), a plain ASSERT (FALSE) is not sufficient. Both ASSERT() and
CpuDeadLoop() should be used explicitly, for the sake of RELEASE builds.
And the function prototype (the leading comment) should highlight that
the function either succeeds, or doesn't return.


(2) I don't like the recursive nature of the functions. That's always
convenient first, and almost always a mess to get out of later. However,
this is not a request to complicate the tree's implementation,
because...


(3) ... you mention we don't have many nodes to track. Is a tree
structure, with dynamically allocated nodes, justified at all?


(4) Regarding repeating dynamic allocation -- that's never a great thing
to do in the SEC / PEI phases. Can we use a fixed size, once-allocated,
or even static, array somewhere?

In particular, patch#14 uses AllocatePool() for SNP_VALIDATED_RANGE, in
the PEI phase, that results in EFI_HOB_TYPE_MEMORY_POOL HOBs; see:

AllocatePool() [MdePkg/Library/PeiMemoryAllocationLib/MemoryAllocationLib.c]
PeiServicesAllocatePool() [MdePkg/Library/PeiServicesLib/PeiServicesLib.c]
PeiAllocatePool() [MdeModulePkg/Core/Pei/Memory/MemoryServices.c]

Now, dependent on where you call this AllocatePool() from, it could
occur still on the temporary SEC/PEI RAM -- i.e., before permanent PEI
RAM is installed. That means that such HOBs would be subject to the HOB
list migration, after permanent PEI RAM is installed. The PI spec writes
about EFI_HOB_MEMORY_POOL: "The HOB consumer phase should be able to
ignore these HOBs". The PI spec writes about AllocatePool(): "The early
allocations from temporary memory will be migrated to permanent memory
when permanent main memory is installed; this migration shall occur when
the HOB list is migrated to permanent memory".

So, whenever we use AllocatePool() in the PEI phase, we need an "address
stability proof", namely that the AllocatePool() call, and any later
access to the allocated area (the HOB), are never separated by the
installation of "gEfiPeiMemoryDiscoveredPpiGuid" in the PPI database, by
the PEI Core. If we can't guarantee that, then an access later on could
occur to the original allocation address, which has by then been
invalidated by the temporary RAM migration (= the allocation HOB has
been moved).

AllocatePages() is "stable" however, in this sense; no proof like the
one described above is necessary.


(5) The "mRootNode" variable is defined in the file
"OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiSnpSystemRamValidate.c"
(patch #14). That means that every single module (i.e., PEIM) which
consumes this library instance will own a separate "mRootNode" variable,
with a validation-tracking tree that's correspondingly private to that
PEIM.

But the validation state is global to the entire firmware. I think we
should somehow clarify, or even enforce, that the tracking data
structure is firmware-global.

Perhaps this can be solved by (a) moving the ownership of the data
structure to a particular firmware module, with sole responsibility for
pre-validation, and (b) updating the validation APIs to take this data
structure explicitly.

Alternatively, if multiple modules are expected to inter-operate on the
tracker data structure (by which I mean inter-operation *in sequence*,
not concurrently in a multiprocessing sense), then we might need a
dynamic PCD for inter-driver understanding.


(NB: both points (4) and (5) seem to conflict with *existent* code in
BaseMemEncryptSevLib. However:

- concerning (4), the current allocations are all performed with
AllocateAlignedPages(), which are unaffected by temporary-to-permanent
RAM migration,

- and concerning (5), namely the existent "mPageTablePool" global
variable, is a *known bug* -- see commit b721aa749b86a and
TianoCore#847. This issue (namely, multiple modules messing with the
page tables independently) remains unsolved, and I'd like to avoid
introducing *more* paging-related allocations that are module-owned,
and not firmware-level.)


All in all, here's what I'd prefer (and please correct me if it's
unrealistic):

- make the SNP validation functions take a simple, *unresizeable*
tracker structure for input/output,

- allocate that data structure with a single AllocatePages() call in the
proper PEIM,

- if the SNP validation functions run out of space in the structure,
hang hard,

- avoid recursion, and use a simple linear search through the array,

- evaluate whether graceful handling of overlaps (= range splitting,
merging) is actually a requirement. See more on this below.


(6) Can you please confirm that the order in which
MemEncryptSevSnpValidateSystemRam()'s declaration, call sites, and
implementations are introduced, is sensible? I can't tell immediately,
but I'd like to be sure that the tree at least builds at every stage (no
linkage errors at any stage).


Furthermore, what you describe above is called idempotency. No matter
how many times we attempt to validate a range, it may (or may not
even) cause an actual change in the first action only. Is this
property (=idempotency) an inherent requirement of the technology, or
is it a simplification of the implementation? Put differently: if you
called CpuDeadLoop() in the validation function any time an
overlapping validation request were received, would that hugely
complicate the call sites?
From the technology point of view you can validate the same page again
and again. Hardware does not force any restriction. To protect against
time-based remap attack a guest also need to ensure that it does not
blindly validates the pages otherwise guest is taking risk. Avoiding a
double validation is strong recommendation. In current implementation,
the validation burden is not put on the caller. Caller calls standard
APIs to toggle the encryption attribute. Under the hood, if we see
that page is already validated then no need to issue the PVALIDATE. If
the page is getting changed from C=1 -> 0 then unvalidate it.
There remains a huge gap in my understanding here. (The rest of my
earlier questions did concern the relationship between the C bit and
validation, but I might as well comment right at this spot.)

"Invalidation in connection to C-bit toggling" means that there's going
to be significant "validation churn" in the DXE phase. For example when
fw_cfg is used (which relies on the edk2 IOMMU protocol), or when virtio
transfers are performed (which also relies on the edk2 IOMMU protocol,
although less directly -- through PciIo, namely).

That, however, brings up several problems for me:

- Is the tracker data structure scalable? I don't see any balancing in
it. (And I wouldn't want a new, complex data structure for it anyway.)

- In fact, I don't see any code related to invalidation -- more
precisely, I don't see where and how the C-bit changes consult the
tracker structure *at all*.

- Given issue (5), I don't see how the tracker structure would be
inherited from the PEI phase to the DXE phase.

In particular, in patch#18, the flipping of the C-bit results in calls
to SetPageStateInternal(). But SetPageStateInternal() does not go
*through* the tracker structure. Even in patch#14,
SevSnpValidateSystemRam() calls SetPageStateInternal() *in addition to*
AddRangeToIntervalTree().

In other words, I don't see where validity is tracked in the DXE phase,
in response to the C-bit toggling.


I'm kind of "obsessing" about idempotency because you say we must
*never* doubly-validate a page, so the *difference* between:
- explicitly crashing on such an attempt,
- and silently ignoring such an attempt,
may be meaningful.

I said we should never doubly validate to avoid creating a
vulnerability that can later be exploited by the malicious hypervisor.
Currently we lookup address in our tree to ensure that we don't do a
double validation. The PVALIDATE instruction also can be used to
determine if we are attempting to validate an already validated page.
In current implementation after the PVALIDATE completion, I check the
rflags.CF to ensure that its not a double validation condition. If its
double validation then abort the guest. Its bug in the guest.

It's kind of the difference between "oops this is a call site *bug*,
but we patched it up", vs. "this is expected of call sites, we should
just handle it internally".
Our attempt should be to patch to avoid the double validation. I would
still check the status of PVALIDATE, if instruction detects it as a
double validation then we have a bug in our tracking and should abort
the guest.
Let me rephrase. My question concerns the *abstraction level* at which
we should catch and handle double-validation attempts.

If we silently split and merge ranges in the tracker structure, then the
guest (as a whole) will not double-validate, which is good. However,
double-validation attempts will potentially still *exist*, in
higher-level modules in the firmware. My question is whether it is
*right* to paper over such overlaps, coming from the high-level code, in
the tracker structure. I suspect that it's not right.

In other words, the spot where we should abort, due to
"double-validation detected", is not when the PVALIDATE instruction
reports a problem in the tracker structure. Instead, the spot where we
should abort is when we detect an overlapping request *before* issuing
PVALIDATE. I'd like the tracker to be as simple and dumb as possible,
and I'd like the actual sites that request validation to blow up
immediately, if they issue an overlapping request. Is my expectation
wrong?


The patch #18 extend the MemEncrypt{Set,Clear}PageEncMask() to call
the SNP page state change during the C-bit toggle.
- When exactly do we invalidate?
Basically before the page is transition from C=1 -> 0 in the page
table.


- Does the time and place of invalidation depend on whether we
perform pre-validation, or lazy validation?
It does not matter. Both type of validation update the Validated bit
in the RMP table. The important thing is that invalidation can happen
only for a already validated private page. If the page is shared then
invalidation will result in #GP.

- Is page invalidation idempotent too?
We should treat this as idempotent too. As I said, tacking the page
state is strongly recommended for the security reason. The SNP white
paper has a section dedicated for the validation and may able to
clarify things which is not covered in my response.

- What is the consequence (security impact) if we forget
invalidation?
I can't think of a security implication of it but if we forget to
invalidate then it can lead issues in the tracking data structure when
it comes to validation next time. e.g a page was mapped as C=1 and we
changed it to C=0 without invalidating it. Now if the same page is
being changed from C=0 -> 1 then tracking data structure may think
that page is already validated and will skip the validation process
and thus access will result in #VC (page-not-validated).
As I said above, I don't see where the validation tracking and the C-bit
flipping "meet". Especially across the PEI and DXE phases.

Basically I'm missing the core concept for maintaining a firmware-global
structure, for tracking the validation status of GPA intervals.

Up to and *excluding* the DXE phase, the validation tracking should be
trivial, naive, and unforgiving. This is because, in the PEI phase,
validation is one-shot. I would expect the tracker structure to live in
the PlatformPei module.

Starting with the DXE phase, where we need (in)validation to closely
accompany C-bit toggling, we need a different, performant data structure
for tracking validation status. This data structure should be primed
from the status last set up in the PEI phase. The tracking should still
be unforgiving. I believe *this* tracker structure may have to live in
the IOMMU driver.

BaseMemEncryptSevLib may offer helper functions, but even if it does, it
should never attempt to *own* tracking information, via an internal
global variable. Furthermore, those helper functions that only make
sense for a particular firmware phase, should have such implementations
that hang hard, in library instances that match the *other* firmware
phases. For example, SevSnpValidateSystemRam() should have a
DXE-specific implementation that hangs hard.

It's entirely possible that my points above are complete garbage. I
don't have a grasp on the core concept, for the firmware-wide tracking
of the validation status.

... Ugh, wait. I've just noticed something. In patch#19,
"SEC_SEV_ES_WORK_AREA.SnpSystemRamValidatedRootAddress" is introduced,
which seems (?) to be implementing the kind of global (inter-driver,
inter-phase) tracking that I've been missing. However, even if my guess
(?) is correct about that, this idea definitely doesn't belong in the
last patch, presented as a "bugfix". Even if it were a bugfix, it should
be incorporated into one of the earlier patches (so that we preferably
not introduce the bug in the first place). However, my argument is that
it's not a bugfix -- to me that seems to be the core concept. That's
what the whole series needs to be built upon. (In all of my thinking,
before this particular paragraph, I *never* looked at patch#19. There
was no reason to.)


- There are four page states I can imagine:
- encrypted for host access, valid for guest access
- encrypted for host access, invalid for guest access
- decrypted for host access, valid for guest access
- decrypted for host access, invalid for guest access

Does a state exist, from these four, that's never used? (Potentially
caught by the hardware?)
I would not mix the page state with the host access. Basically there
is two page state exist

- Guest-Valid

- Guest-Invalid

By default all the guest memory is in Guest-Invalid state on boot. The
PVALIDATE instruction is used to transition from Valid->Invalid.
Do you mean "Invalid->Valid" transition?


Guest can use the PVALIDATE to change the state from Valid to Invalid.

A guest private page (i.e page mapped C=1) must be in Guest Valid
state. If hardware see that a private access page is not in the
Guest-Valid state then it will trigger #VC.



Do the patches highlight / explain the validity transitions, in
comments? My understanding is that the C-bit toggle and the guest
access valid/invalid toggle are separate actions, so "middle" states
do exist, but perhaps only temporarily.
I have tried to document the steps in the patch description and I can
add more comments as required. When SEV-SNP is active then we need to
invalidate the page before toggling the C-bit from C=1 -> 0 and
similarly after the page is toggled from C=0 -> 1 we must validate it.


I'm curious how it works, for example, with variousvirtio transfers
(bus master read, bus master write, bus master common buffer). In the
IoMmuDxe driver minimally, we access memory both through PTEs with
the C-bit set and through PTEs with the C-bit clear, meaning that
"encrypted, valid", and "decrypted, valid" are both required states.
But that seems to conflict with the notion that "C-bit toggle" be
directly coupled with a "validity toggle".
That should not be an issue. Each page have entry in the RMP table.
So, you can have one page as guest invalid and another page as a guest
valid.

Put differently, I'm happy that modifying IoMmuDxe appears
unnecessary, but then that tells me I'm missing something about the
state transitions between the above *four* states.

Only thing which I would propose changing in IoMmuDxe driver is to use
a pre-allocated buffer and avoid toggling the C-bit on every
read/write. In past the C-bit toggle was all contained inside the
guest. But when SNP is enabled, each C-bit toggle causes two
additional steps #1 RMPUPDATE and #2 PVALIDATE. The RMPUPDATE is
requested using the VMEXIT so it will cause a VM context switch. I
would prefer to avoid it.

My current patch set does not have this optimization yet.
Please let's stick with the absolute minimum in this series, yes.

I realize my email is complete chaos. That's because my understanding of
this work is complete chaos.

Can we approach this feature as follows:

- Make "validation tracking" the core tenet (central principle) of this
feature.

- Design the validation *patterns* that are required in each firmware
phase.

- Identify the *sole* owner module, for validation tracking, in each
firmware phase. If we can't readily do this, we might have to
introduce new drivers (PEIMs, DXE drivers?) and new interfaces (PPIs,
protocols?)

- Design the hand-off between the owner modules (on firmware phase
boundaries).

- Add only helper functions to BaseMemEncryptSevLib, without any
ownership of tracking info. The helpers should be as restricted as
possible.

I'd like to avoid "smearing" page validation responsibility over a bunch
of modules. (See again: TianoCore#847.) I'd like responsibilities
strictly centralized.

If you feel that my points make no sense, I can start going over the
individual patches. However, even that way, I'd have found patch#19 only
in the end, and that doesn't seem right. And I wouldn't like to go
"numb" on the actual code, being distracted by style issues and so on,
before we have some agreement on the core approach.

Another tip: if you can add content to MdePkg that is governed by AMD
specifications, such as patches #3, #5, #9, I would recommend splitting
that off to a separate (pre-requisite) series. That content seems
unaffected by whatever we do in OvmfPkg, and it would help decrease the
size of the OvmfPkg series. It's not like we're going to abandon the
SEV-SNP feature, so I don't think there's a risk in merging the MdePkg
series first, while the OvmfPkg series is not ready. The MdePkg stuff
will not remain unused indefinitely.

Thanks,
Laszlo


Brijesh Singh
 

On 4/12/21 11:23 AM, Laszlo Ersek wrote:
On 04/10/21 00:43, Brijesh Singh wrote:
On 4/9/21 7:24 AM, Laszlo Ersek wrote:
On 04/08/21 13:59, Brijesh Singh wrote:
On 4/8/21 4:58 AM, Laszlo Ersek wrote:
On 03/24/21 16:31, Brijesh Singh wrote:
At this time we only support the pre-validation. OVMF detects all
the available system RAM in the PEI phase. When SEV-SNP is
enabled, the memory is validated before it is made available to
the EDK2 core.
Can you describe this in a bit more detail, before I look at the
individual patches? Specifically, what existing logic in the PEI
phase was taken, and extended, and how?
One of the key requirement is that the guest private pages much be
validated before the access. If guest tries to access the pages
before the validation then it will result in #VC
(page-not-validated) exception. To avoid the #VC, we propose the
validating the memory before the access. We will incrementally add
the support to lazy validate (i.e validate on access).
What is the primary threat (in simple terms please) that validation
is supposed to prevent?
To protect against the memory re-mapping attack the guest pages must
be validated. The idea is that every guest page can map only to a
single physical memory page at one time.
I don't understand. How can a given guest page frame map to multiple
host page frames?

Do you mean that the situation to prevent is when multiple guest page
frames (GPAs) map to the same host page frame, as set up by the
hypervisor?
A malicious hypervisor may attempt to remap validated gpa to a different
host frame. The PVALIDATE is designed to protect against those type of
attacks. See
https://static.sched.com/hosted_files/lsseu2019/65/SEV-SNP%20Slides%20Nov%201%202019.pdf
(slide 13). You can also find more information about it in the SEV-SNP
whitepaper.


And, against that particular threat, does pre-validation offer some
protection, or will that only come with lazy validation?
For the hardware it does not matter how the memory was validated --
lazy vs prevalidate. Both approaches use the PVALIDATE instruction to
validate the page.

In the case of pre-validation, the memory is validated before the
access. Whereas in the lazy validation, the access will cause a fault
and fault handler should validate the page and retry the access. Its
similar to a page fault handler, OS can populate the page table or
back the pages on demand.

The only downside of pre-validation is, we will take a hit on the boot
time. The GHCB spec provides method by which we can batch multiple
requests at once to minimize the context switches.
If pre-validation is simpler, and the only drawback is a one-time hit
during boot, then wouldn't it be better to stick with pre-validation
forever? I expect that would be a lot simpler for (a) the #VC handlers,
(b) the tracking of the "already validated" information.
This could be an issue for the larger VMs. The boot delay will vary
based on the VM size. In addition to the boot delay,  the PVALIDATE
instruction requires that there is a valid entry for the page in the
nested page table. If the entry does not exist in the NPT then HV will
get #NPF and will fill the NPT with the backing host page. By using the
lazy-validation we can push allocation of the backing pages to only when
required and thus release the memory pressure on the VMM.




Similar to SEV, the OVMF_CODE.fd is encrypted through the SNP
firmware before the launch. The SNP firmware also validates the
memory page after encrypting. This allows us to boot the initial
entry code without guest going through the validation process.

The OVMF reset vector uses few data pages (e.g page table, early Sec
stack). Access to these data pages will result in #VC. There are two
approaches we can take to validate these data pages:

1. Ask SNP firmware to pre-validate it -- SNP firmware provides an
special command that can be used to pre-validate the pages without
affecting the measurement.
This means the two pflash chips, right?
This does not need to be two pflash chips. The SNP firmware command
does not know anything about the ROM or pflash chip. The command
accepts the system physical address that need to be validated by the
firmware. In patch #2, OVMF provides a range of data pages that need
to be validated by the SNP firmware before booting the guest.
I guess my question related to the guest code that executes from pflash,
and/or accesses data from pflash, before the guest itself could ask for
any validation. Such as, the reset vector (which runs from pflash).
Then, some non-volatile UEFI variable data that resides in the other
pflash chip, and is accessed (read-only) during the PEI phase (the
memory type information variable namely). I understood your comments as
QEMU pre-validating those areas before guest launch. Is that correct?

Yes that is correct. All the OVMF pflash0 memory will be pre-validated
before the guest is launched. IIRC, the variables which resides on the
other pflash chip are accessed unencrypted in the guest. Only the
encrypted memory access (aka C=1) need to be validated. The host MMIO
regions does not need to be validated.



Anyway, I guess at this point I should just start reviewing the series.

Some more comments below.


- We need more areas made accessible in SEC than just the
decompression buffer; for example the temporary SEC/PEI heap and
stack, and (IIRC) various special SEV-ES areas laid out via MEMFD.
Where are all of those handled / enumerated?
Sorry, I simplified my response by saying just decompression. You are
right that its more than the decompression buffer. In my current
patch, the SEC phase validates all the memory up to
PcdOvmfDecompressionScratchEnd (which includes more than just output
buffer). See patch #13.
That sounds good (will check the patch out of course).


One of the important thing is we should *never* validate the pages
twice.
What are the symptoms / consequences of:

- the guest accessing an unvalidated page (I understand it causes a
#VC, but what is the direct result of that, when this series is
applied?),
After the series it applied, an access to an unvalidated page will
result in #VC. The series does not extend the VC handler to handle the
page-not-validated error code. The current VC handler
[InternalVmgExitHandleVc()] will inject a #GP to terminate the guest
if #VC is raised for unvalidated page.
OK.


- doubly-validating a page?

The first question is relevant because we should crash as soon as we
touch a page we forgot to validate (we shouldn't let any corruption
or similar spread out until we finally realize there's a problem).

The second question is relevant for security I guess. What attacks
become possible, and/or what symptoms are produced, if we
doubly-validate a page?
Assuming that guest validates its memory, this guarantees the
injective mapping between GPAs and SPAs.

As I noted that the page validation process consist of two steps #1
RMPUPDATE and 2#PVALIDATE. The RMPUPDATE is a hypervisor instruction
and PVALIDATE is a guest instruction. A malicious hypervisor can remap
gpa to different physical address in RMP table using the RMPUPDATE
just before the second PVALIDATE instruction is executed. The guest
need to ensure that it does not leave the security vulnerability that
can be exploited by a malicious hypervisor. The SNP white paper
recommends that a guest OS should keep track of what is has validate
so that it can avoid getting into these situation.
Hm, OK. I think this answers my top-most question.


Furthermore, IIRC, we have separate #VC handlers for the different
firmware phases; do they behave consistently / identicall when a
#VC(page-not-validated) occurs, when this patch set is applied?

My first question is basically asking whether we can *exclusively*
rely on #VC(page-not-validated) faults to tell us that we missed
validating a particular page. If we can do that, then the job is a
bit easier, because from the GPA, we can more or less also derive
*when and where* we should pre-validate the page (at least until
validation is done completely lazily).
Yes, we should be able to rely #VC to tell us that we missed
validating the page. In my this series so far I have not seen a #VC
due to page-not-validated because we have carefully validated the
pages before they are accessed. I am thinking that
#VC(page-not-validated) should be treated as an error in the first
phase. Incrementally we will add the lazy validation, in which the
#VC(page-not-validated) will provide a hint as to what has not been
validated.
I'm OK with the current handling of #VC(page-not-validated) -- i.e.,
fatal error.

I'm not yet convinced that lazy validation will be useful at all, but
that could be just my lack of understanding.

See if my above comment makes sense on why we may need the lazy
validation. IMO, we should not be going with the lazy validation in
OVMF. To minimize the boot delay and avoid the complexity of the code in
the best case OVMF validated lower 4GB memory and mark rest of the
memory as "unaccepted" in the UEFI map.

David from google has started a thread about lazy validation in kernel
https://www.spinics.net/lists/linux-mm/msg243954.html. We would like to
find a method which works for both Intel TDX and AMD SEV. We will build
these support incrementally both in the kernel and OVMF.



The MemEncryptSevSnpValidateRam() uses a interval search tree to
keep the record of what has been validated. Before validating the
range, it lookup in its tree and if it finds that range is already
validated then do nothing. If it detects an overlap then it will
validate only non overlapping regions -- see patch #14. hat data
structure is used for implementing the interval tree?
I'm not necessarily looking for a data structure with "nice"
asymptotic time complexity. With pre-validation especially, I think
simplicity (ease of review) is more important for the data structure
than performance. If it's not an actual "tree", we shouldn't call it
a "tree". (An "interval tree" is usually an extension of a Red-Black
Tree, and that's not the simplest data structure in existence;
although edk2 does offer an rbtree library.)
I am using binary search tree. Currently, we don't have many nodes to
track. Most of the guest memory is private and are validated before we
exit from the PEI phase.
My preliminary comments about MemEncryptSevSnpValidateSystemRam():

(1) The function prototype (patch#12) promises that the function can
never fail. In patch#14 however, I see some AddRangeToIntervalTree()
calls whose return values are not checked. I think error checking should
be strict, and if an error is detected (such as memory allocation
failure), a plain ASSERT (FALSE) is not sufficient. Both ASSERT() and
CpuDeadLoop() should be used explicitly, for the sake of RELEASE builds.
And the function prototype (the leading comment) should highlight that
the function either succeeds, or doesn't return.


(2) I don't like the recursive nature of the functions. That's always
convenient first, and almost always a mess to get out of later. However,
this is not a request to complicate the tree's implementation,
because...


(3) ... you mention we don't have many nodes to track. Is a tree
structure, with dynamically allocated nodes, justified at all?


(4) Regarding repeating dynamic allocation -- that's never a great thing
to do in the SEC / PEI phases. Can we use a fixed size, once-allocated,
or even static, array somewhere?

In particular, patch#14 uses AllocatePool() for SNP_VALIDATED_RANGE, in
the PEI phase, that results in EFI_HOB_TYPE_MEMORY_POOL HOBs; see:

AllocatePool() [MdePkg/Library/PeiMemoryAllocationLib/MemoryAllocationLib.c]
PeiServicesAllocatePool() [MdePkg/Library/PeiServicesLib/PeiServicesLib.c]
PeiAllocatePool() [MdeModulePkg/Core/Pei/Memory/MemoryServices.c]

Now, dependent on where you call this AllocatePool() from, it could
occur still on the temporary SEC/PEI RAM -- i.e., before permanent PEI
RAM is installed. That means that such HOBs would be subject to the HOB
list migration, after permanent PEI RAM is installed. The PI spec writes
about EFI_HOB_MEMORY_POOL: "The HOB consumer phase should be able to
ignore these HOBs". The PI spec writes about AllocatePool(): "The early
allocations from temporary memory will be migrated to permanent memory
when permanent main memory is installed; this migration shall occur when
the HOB list is migrated to permanent memory".

So, whenever we use AllocatePool() in the PEI phase, we need an "address
stability proof", namely that the AllocatePool() call, and any later
access to the allocated area (the HOB), are never separated by the
installation of "gEfiPeiMemoryDiscoveredPpiGuid" in the PPI database, by
the PEI Core. If we can't guarantee that, then an access later on could
occur to the original allocation address, which has by then been
invalidated by the temporary RAM migration (= the allocation HOB has
been moved).

AllocatePages() is "stable" however, in this sense; no proof like the
one described above is necessary.


(5) The "mRootNode" variable is defined in the file
"OvmfPkg/Library/BaseMemEncryptSevLib/X64/PeiSnpSystemRamValidate.c"
(patch #14). That means that every single module (i.e., PEIM) which
consumes this library instance will own a separate "mRootNode" variable,
with a validation-tracking tree that's correspondingly private to that
PEIM.

But the validation state is global to the entire firmware. I think we
should somehow clarify, or even enforce, that the tracking data
structure is firmware-global.

Perhaps this can be solved by (a) moving the ownership of the data
structure to a particular firmware module, with sole responsibility for
pre-validation, and (b) updating the validation APIs to take this data
structure explicitly.

Alternatively, if multiple modules are expected to inter-operate on the
tracker data structure (by which I mean inter-operation *in sequence*,
not concurrently in a multiprocessing sense), then we might need a
dynamic PCD for inter-driver understanding.


(NB: both points (4) and (5) seem to conflict with *existent* code in
BaseMemEncryptSevLib. However:

- concerning (4), the current allocations are all performed with
AllocateAlignedPages(), which are unaffected by temporary-to-permanent
RAM migration,

- and concerning (5), namely the existent "mPageTablePool" global
variable, is a *known bug* -- see commit b721aa749b86a and
TianoCore#847. This issue (namely, multiple modules messing with the
page tables independently) remains unsolved, and I'd like to avoid
introducing *more* paging-related allocations that are module-owned,
and not firmware-level.)


All in all, here's what I'd prefer (and please correct me if it's
unrealistic):

- make the SNP validation functions take a simple, *unresizeable*
tracker structure for input/output,

- allocate that data structure with a single AllocatePages() call in the
proper PEIM,

- if the SNP validation functions run out of space in the structure,
hang hard,

- avoid recursion, and use a simple linear search through the array,

- evaluate whether graceful handling of overlaps (= range splitting,
merging) is actually a requirement. See more on this below.

Yes this will work fine. The primary reason for me adding the interval
tree was to detect the overlap condition before we finalize the
pre-validation in PEI. The validation flow works like this:

- Qemu validated few pages

- SEC validates few more page

- When we detect the memory in the PEI we need to exclude the pages
validated by the Qemu and SEC.

The page ranges validated using the Qemu and SEC are accessible through
a fixed PCDs. I can do simple overlap check with those fixed PCDs and
skip the pre-validated ranges. I don't have any strong reason to use the
interval tree to detect the overlap condition. There is no call to
toggle the C-bit before we finalize the validation. 

IMO, there is no need for tracking the page state after we finish the
pre-validation. After the pre-validtion is completed, the caller will
use either clear the C-bit or set the C-bit. If it attempts to set the
C-bit without clearing it first then its a bug -- which will be caught
by the page state change internal function. We can use the output of CF
flag from the PVALIDATE to determine where caller is trying to do doubly
validation or invalidation and abort it.



(6) Can you please confirm that the order in which
MemEncryptSevSnpValidateSystemRam()'s declaration, call sites, and
implementations are introduced, is sensible? I can't tell immediately,
but I'd like to be sure that the tree at least builds at every stage (no
linkage errors at any stage).
I can certainly say the order in which they are introduced and the call
sites are a sensible. I don't expect any linkage failure but since it
was an RFC so I didn't go through making sure that it builds in every
stage. I wanted to get the overall direction on where community would
like to go before making the final changes. You have been asking some
very good question and that is certainly helping us to reduce some of
the complexity in the patch.



Furthermore, what you describe above is called idempotency. No matter
how many times we attempt to validate a range, it may (or may not
even) cause an actual change in the first action only. Is this
property (=idempotency) an inherent requirement of the technology, or
is it a simplification of the implementation? Put differently: if you
called CpuDeadLoop() in the validation function any time an
overlapping validation request were received, would that hugely
complicate the call sites?
From the technology point of view you can validate the same page again
and again. Hardware does not force any restriction. To protect against
time-based remap attack a guest also need to ensure that it does not
blindly validates the pages otherwise guest is taking risk. Avoiding a
double validation is strong recommendation. In current implementation,
the validation burden is not put on the caller. Caller calls standard
APIs to toggle the encryption attribute. Under the hood, if we see
that page is already validated then no need to issue the PVALIDATE. If
the page is getting changed from C=1 -> 0 then unvalidate it.
There remains a huge gap in my understanding here. (The rest of my
earlier questions did concern the relationship between the C bit and
validation, but I might as well comment right at this spot.)

"Invalidation in connection to C-bit toggling" means that there's going
to be significant "validation churn" in the DXE phase. For example when
fw_cfg is used (which relies on the edk2 IOMMU protocol), or when virtio
transfers are performed (which also relies on the edk2 IOMMU protocol,
although less directly -- through PciIo, namely).

That, however, brings up several problems for me:

- Is the tracker data structure scalable? I don't see any balancing in
it. (And I wouldn't want a new, complex data structure for it anyway.)

- In fact, I don't see any code related to invalidation -- more
precisely, I don't see where and how the C-bit changes consult the
tracker structure *at all*.

- Given issue (5), I don't see how the tracker structure would be
inherited from the PEI phase to the DXE phase.

In particular, in patch#18, the flipping of the C-bit results in calls
to SetPageStateInternal(). But SetPageStateInternal() does not go
*through* the tracker structure. Even in patch#14,
SevSnpValidateSystemRam() calls SetPageStateInternal() *in addition to*
AddRangeToIntervalTree().

In other words, I don't see where validity is tracked in the DXE phase,
in response to the C-bit toggling.
As I highlighted above there is actually no need to track the page state
after we successfully complete the pre-validation. All these guest page
validation or invalidation are applicable to the system RAM. But
AmdSevDxe driver calls to clear the C-bit of the MMIO range. These range
are not a system and have never been validated so we invalidating it
will cause an issue. Since I had the data structures available in PEI
phase for the tracking the page state hence made those available to DXE
to verify that we are called to invalidate the SYSTEM_RAM and not MMIO.
IMO, we should either extend the MemEncryptSev{Set,Clear}PageEncMask()
to pass either a new flag to hint where its system RAM or non-RAM.
Thoughts ?




I'm kind of "obsessing" about idempotency because you say we must
*never* doubly-validate a page, so the *difference* between:
- explicitly crashing on such an attempt,
- and silently ignoring such an attempt,
may be meaningful.
I said we should never doubly validate to avoid creating a
vulnerability that can later be exploited by the malicious hypervisor.
Currently we lookup address in our tree to ensure that we don't do a
double validation. The PVALIDATE instruction also can be used to
determine if we are attempting to validate an already validated page.
In current implementation after the PVALIDATE completion, I check the
rflags.CF to ensure that its not a double validation condition. If its
double validation then abort the guest. Its bug in the guest.

It's kind of the difference between "oops this is a call site *bug*,
but we patched it up", vs. "this is expected of call sites, we should
just handle it internally".
Our attempt should be to patch to avoid the double validation. I would
still check the status of PVALIDATE, if instruction detects it as a
double validation then we have a bug in our tracking and should abort
the guest.
Let me rephrase. My question concerns the *abstraction level* at which
we should catch and handle double-validation attempts.

If we silently split and merge ranges in the tracker structure, then the
guest (as a whole) will not double-validate, which is good. However,
double-validation attempts will potentially still *exist*, in
higher-level modules in the firmware. My question is whether it is
*right* to paper over such overlaps, coming from the high-level code, in
the tracker structure. I suspect that it's not right.
Yes we should bail out if such a request comes from the high level
module. This is why I was actually not checking the tracking structure
when toggling the C-bit. The complete memory much have been validated
before we are asked to toggle the C-bit. If a module is performing a
doubly validation or invalidation then it should be aborted to avoid any
future exploits.



In other words, the spot where we should abort, due to
"double-validation detected", is not when the PVALIDATE instruction
reports a problem in the tracker structure. Instead, the spot where we
should abort is when we detect an overlapping request *before* issuing
PVALIDATE. I'd like the tracker to be as simple and dumb as possible,
and I'd like the actual sites that request validation to blow up
immediately, if they issue an overlapping request. Is my expectation
wrong?
If we go with the tracking only until we complete the pre-validation
then we can use the hardware PVALIDATE help to detect the doubly
validation and abort it. There is no need to check the overlap conditions.


The patch #18 extend the MemEncrypt{Set,Clear}PageEncMask() to call
the SNP page state change during the C-bit toggle.
- When exactly do we invalidate?
Basically before the page is transition from C=1 -> 0 in the page
table.

- Does the time and place of invalidation depend on whether we
perform pre-validation, or lazy validation?
It does not matter. Both type of validation update the Validated bit
in the RMP table. The important thing is that invalidation can happen
only for a already validated private page. If the page is shared then
invalidation will result in #GP.
- Is page invalidation idempotent too?
We should treat this as idempotent too. As I said, tacking the page
state is strongly recommended for the security reason. The SNP white
paper has a section dedicated for the validation and may able to
clarify things which is not covered in my response.
- What is the consequence (security impact) if we forget
invalidation?
I can't think of a security implication of it but if we forget to
invalidate then it can lead issues in the tracking data structure when
it comes to validation next time. e.g a page was mapped as C=1 and we
changed it to C=0 without invalidating it. Now if the same page is
being changed from C=0 -> 1 then tracking data structure may think
that page is already validated and will skip the validation process
and thus access will result in #VC (page-not-validated).
As I said above, I don't see where the validation tracking and the C-bit
flipping "meet". Especially across the PEI and DXE phases.

Basically I'm missing the core concept for maintaining a firmware-global
structure, for tracking the validation status of GPA intervals.

Up to and *excluding* the DXE phase, the validation tracking should be
trivial, naive, and unforgiving. This is because, in the PEI phase,
validation is one-shot. I would expect the tracker structure to live in
the PlatformPei module.

Starting with the DXE phase, where we need (in)validation to closely
accompany C-bit toggling, we need a different, performant data structure
for tracking validation status. This data structure should be primed
from the status last set up in the PEI phase. The tracking should still
be unforgiving. I believe *this* tracker structure may have to live in
the IOMMU driver.

BaseMemEncryptSevLib may offer helper functions, but even if it does, it
should never attempt to *own* tracking information, via an internal
global variable. Furthermore, those helper functions that only make
sense for a particular firmware phase, should have such implementations
that hang hard, in library instances that match the *other* firmware
phases. For example, SevSnpValidateSystemRam() should have a
DXE-specific implementation that hangs hard.

It's entirely possible that my points above are complete garbage. I
don't have a grasp on the core concept, for the firmware-wide tracking
of the validation status.

... Ugh, wait. I've just noticed something. In patch#19,
"SEC_SEV_ES_WORK_AREA.SnpSystemRamValidatedRootAddress" is introduced,
which seems (?) to be implementing the kind of global (inter-driver,
inter-phase) tracking that I've been missing. However, even if my guess
(?) is correct about that, this idea definitely doesn't belong in the
last patch, presented as a "bugfix". Even if it were a bugfix, it should
be incorporated into one of the earlier patches (so that we preferably
not introduce the bug in the first place). However, my argument is that
it's not a bugfix -- to me that seems to be the core concept. That's
what the whole series needs to be built upon. (In all of my thinking,
before this particular paragraph, I *never* looked at patch#19. There
was no reason to.)
It was my bad that I should have highlighted this before and not left it
to the last patch. As you can see the only reason pass the data
structure from PEI to DXE is to catch those non SYSTEM invalidation
request coming from the AmdSev Driver. I am too happy about the solution
I have in this RFC. I want to go with option to change the call sites to
tell us whether its a system RAM or non existence memory type. 



- There are four page states I can imagine:
- encrypted for host access, valid for guest access
- encrypted for host access, invalid for guest access
- decrypted for host access, valid for guest access
- decrypted for host access, invalid for guest access

Does a state exist, from these four, that's never used? (Potentially
caught by the hardware?)
I would not mix the page state with the host access. Basically there
is two page state exist

- Guest-Valid

- Guest-Invalid

By default all the guest memory is in Guest-Invalid state on boot. The
PVALIDATE instruction is used to transition from Valid->Invalid.
Do you mean "Invalid->Valid" transition?

Ah yes that is correct.




Guest can use the PVALIDATE to change the state from Valid to Invalid.

A guest private page (i.e page mapped C=1) must be in Guest Valid
state. If hardware see that a private access page is not in the
Guest-Valid state then it will trigger #VC.


Do the patches highlight / explain the validity transitions, in
comments? My understanding is that the C-bit toggle and the guest
access valid/invalid toggle are separate actions, so "middle" states
do exist, but perhaps only temporarily.
I have tried to document the steps in the patch description and I can
add more comments as required. When SEV-SNP is active then we need to
invalidate the page before toggling the C-bit from C=1 -> 0 and
similarly after the page is toggled from C=0 -> 1 we must validate it.


I'm curious how it works, for example, with variousvirtio transfers
(bus master read, bus master write, bus master common buffer). In the
IoMmuDxe driver minimally, we access memory both through PTEs with
the C-bit set and through PTEs with the C-bit clear, meaning that
"encrypted, valid", and "decrypted, valid" are both required states.
But that seems to conflict with the notion that "C-bit toggle" be
directly coupled with a "validity toggle".
That should not be an issue. Each page have entry in the RMP table.
So, you can have one page as guest invalid and another page as a guest
valid.

Put differently, I'm happy that modifying IoMmuDxe appears
unnecessary, but then that tells me I'm missing something about the
state transitions between the above *four* states.
Only thing which I would propose changing in IoMmuDxe driver is to use
a pre-allocated buffer and avoid toggling the C-bit on every
read/write. In past the C-bit toggle was all contained inside the
guest. But when SNP is enabled, each C-bit toggle causes two
additional steps #1 RMPUPDATE and #2 PVALIDATE. The RMPUPDATE is
requested using the VMEXIT so it will cause a VM context switch. I
would prefer to avoid it.

My current patch set does not have this optimization yet.
Please let's stick with the absolute minimum in this series, yes.

I realize my email is complete chaos. That's because my understanding of
this work is complete chaos.

Can we approach this feature as follows:

- Make "validation tracking" the core tenet (central principle) of this
feature.

- Design the validation *patterns* that are required in each firmware
phase.

- Identify the *sole* owner module, for validation tracking, in each
firmware phase. If we can't readily do this, we might have to
introduce new drivers (PEIMs, DXE drivers?) and new interfaces (PPIs,
protocols?)

- Design the hand-off between the owner modules (on firmware phase
boundaries).

- Add only helper functions to BaseMemEncryptSevLib, without any
ownership of tracking info. The helpers should be as restricted as
possible.

I'd like to avoid "smearing" page validation responsibility over a bunch
of modules. (See again: TianoCore#847.) I'd like responsibilities
strictly centralized.

If you feel that my points make no sense, I can start going over the
individual patches. However, even that way, I'd have found patch#19 only
in the end, and that doesn't seem right. And I wouldn't like to go
"numb" on the actual code, being distracted by style issues and so on,
before we have some agreement on the core approach.

Certainly your points makes sense. Let me know what you think about
removing the tracking data structure and using a liner array of
pre-valiated range (build with a Fixed PCD) works ? I was hoping you to
just glance over it and provide me high level feedback. I will go
through styles and comments in great details on non RFC patches. I still
have some TODO items (e.g Secrets, CPUID etc) before we take off the RFC
tag. Thank you so much for your detail feedback.



Another tip: if you can add content to MdePkg that is governed by AMD
specifications, such as patches #3, #5, #9, I would recommend splitting
that off to a separate (pre-requisite) series. That content seems
unaffected by whatever we do in OvmfPkg, and it would help decrease the
size of the OvmfPkg series. It's not like we're going to abandon the
SEV-SNP feature, so I don't think there's a risk in merging the MdePkg
series first, while the OvmfPkg series is not ready. The MdePkg stuff
will not remain unused indefinitely.

Ah, sure I can do that. 


-Brijsh


Laszlo Ersek
 

On 04/12/21 22:14, Brijesh Singh wrote:

On 4/12/21 11:23 AM, Laszlo Ersek wrote:
On 04/10/21 00:43, Brijesh Singh wrote:
On 4/9/21 7:24 AM, Laszlo Ersek wrote:
What is the primary threat (in simple terms please) that validation
is supposed to prevent?
To protect against the memory re-mapping attack the guest pages must
be validated. The idea is that every guest page can map only to a
single physical memory page at one time.
I don't understand. How can a given guest page frame map to multiple
host page frames?

Do you mean that the situation to prevent is when multiple guest page
frames (GPAs) map to the same host page frame, as set up by the
hypervisor?
A malicious hypervisor may attempt to remap validated gpa to a
different host frame. The PVALIDATE is designed to protect against
those type of attacks. See
https://static.sched.com/hosted_files/lsseu2019/65/SEV-SNP%20Slides%20Nov%201%202019.pdf
(slide 13). You can also find more information about it in the SEV-SNP
whitepaper.
Thanks for the link. Page 7 confirms my understanding:

"""
Memory Aliasing
Example attack: Map two guest pages to same DRAM page
"""

(On the other hand, slide 12 seems to be buggy, "Page validation
guarantees that each GPA only maps to one valid SPA at any given time"
-- the GPA:SPA relationship is n:1, so any given GPA could never ever
map to multiple SPAs, regardless of SEV-SNP. The multiplicity is
possible in the other direction (a single SPA is exposed as multiple
GPAs, maybe in a single guest, maybe in multiple guests), and that's
what SEV-SNP is supposed to prevent. The GPA->SPA mapping must be
injective, but that's not what the words on the slide actually express.)

The page remapping slide (13) does not necessarily break injectivity,
and even if the hypervisor does that, the same GPA *still* corresponds
to a single SPA, at this new ("one") time. It's just that the
correspondence is not what the guest has approved. Expressing this
correctly would require a temporal logic formula of sorts.

Anyway... at least now I (believe I) understand that these are two
separate attacks (aliasing vs. remapping).

If pre-validation is simpler, and the only drawback is a one-time hit
during boot, then wouldn't it be better to stick with pre-validation
forever? I expect that would be a lot simpler for (a) the #VC
handlers, (b) the tracking of the "already validated" information.
This could be an issue for the larger VMs. The boot delay will vary
based on the VM size. In addition to the boot delay, the PVALIDATE
instruction requires that there is a valid entry for the page in the
nested page table. If the entry does not exist in the NPT then HV will
get #NPF and will fill the NPT with the backing host page. By using
the lazy-validation we can push allocation of the backing pages to
only when required and thus release the memory pressure on the VMM.
Ugh... OK, I guess. I'd need to digest this a lot more, for a fully
informed agreement, but I'm happy with this answer (and my rudimentary
understanding thereof) for now.

I guess my question related to the guest code that executes from
pflash, and/or accesses data from pflash, before the guest itself
could ask for any validation. Such as, the reset vector (which runs
from pflash). Then, some non-volatile UEFI variable data that resides
in the other pflash chip, and is accessed (read-only) during the PEI
phase (the memory type information variable namely). I understood
your comments as QEMU pre-validating those areas before guest launch.
Is that correct?

Yes that is correct. All the OVMF pflash0 memory will be pre-validated
before the guest is launched. IIRC, the variables which resides on the
other pflash chip are accessed unencrypted in the guest. Only the
encrypted memory access (aka C=1) need to be validated. The host MMIO
regions does not need to be validated.
Good point. Thanks for the explanation / reminder!

See if my above comment makes sense on why we may need the lazy
validation. IMO, we should not be going with the lazy validation in
OVMF. To minimize the boot delay and avoid the complexity of the code
in the best case OVMF validated lower 4GB memory and mark rest of the
memory as "unaccepted" in the UEFI map.
I'm going to violently support any idea that keeps things simple for
OVMF -- is this "lower 4GB" what the current patch set does already? Or
is it the proposed next step?

David from google has started a thread about lazy validation in kernel
https://www.spinics.net/lists/linux-mm/msg243954.html. We would like
to find a method which works for both Intel TDX and AMD SEV. We will
build these support incrementally both in the kernel and OVMF.
Ah, OK. That explains it. Next step then. Hmm, yes, patch#15 seems to
pre-validate everything now. OK.

Yes this will work fine. The primary reason for me adding the interval
tree was to detect the overlap condition before we finalize the
pre-validation in PEI. The validation flow works like this:

- Qemu validated few pages

- SEC validates few more page

- When we detect the memory in the PEI we need to exclude the pages
validated by the Qemu and SEC.

The page ranges validated using the Qemu and SEC are accessible
through a fixed PCDs. I can do simple overlap check with those fixed
PCDs and skip the pre-validated ranges. I don't have any strong reason
to use the interval tree to detect the overlap condition. There is no
call to toggle the C-bit before we finalize the validation.
Thank you. This is a *huge* relief.

IMO, there is no need for tracking the page state after we finish the
pre-validation. After the pre-validtion is completed, the caller will
use either clear the C-bit or set the C-bit. If it attempts to set the
C-bit without clearing it first then its a bug -- which will be caught
by the page state change internal function. We can use the output of
CF flag from the PVALIDATE to determine where caller is trying to do
doubly validation or invalidation and abort it.
Sounds good. Even in case the final PVALIDATE action proves incorrect,
if we catch that immediately via CF, and refuse to do anything else
(i.e. we won't go ahead and use the page(s) in question), things should
be OK.

(6) Can you please confirm that the order in which
MemEncryptSevSnpValidateSystemRam()'s declaration, call sites, and
implementations are introduced, is sensible? I can't tell
immediately, but I'd like to be sure that the tree at least builds at
every stage (no linkage errors at any stage).
I can certainly say the order in which they are introduced and the
call sites are a sensible. I don't expect any linkage failure but
since it was an RFC so I didn't go through making sure that it builds
in every stage. I wanted to get the overall direction on where
community would like to go before making the final changes. You have
been asking some very good question and that is certainly helping us
to reduce some of the complexity in the patch.
Thanks. I've been very stressed as to whether my questions or
"proposals" make any sense.

And yes, it's a desirable trait for a patch set to build at every stage
(bisectability).

As I highlighted above there is actually no need to track the page
state after we successfully complete the pre-validation.
Again, I welcome this very much.

All these guest page
validation or invalidation are applicable to the system RAM. But
AmdSevDxe driver calls to clear the C-bit of the MMIO range. These
range are not a system and have never been validated so we
invalidating it will cause an issue.
Understood.

Since I had the data structures available in PEI
phase for the tracking the page state hence made those available to
DXE to verify that we are called to invalidate the SYSTEM_RAM and not
MMIO. IMO, we should either extend the
MemEncryptSev{Set,Clear}PageEncMask() to pass either a new flag to
hint where its system RAM or non-RAM. Thoughts ?
I think I'd prefer a new function in the lib class, one that's dedicated
to clearing the C bit on MMIO without attempting invalidation. It's a
special-case function, and the dedicated name helps with two things:

- no need to mess with existing call sites except in AmdSevDxe,
- the new function is easier to grep for.

BTW, there would be at least one other approach for this, I think -- the
PEI instance of the library could consult the HOB list, and the DXE
instance of the library could consult the GCD memory space map, to
decide whether the page being C-bit-flipped is MMIO or not. But that's a
lot of complexity (and likely some performance hit too), when in fact we
know at the call sites whether the area being encrypted/decrypted is
MMIO or RAM.

Note also that (IIUC) in AmdSevDxe, we only *decrypt*, so we only need
*one* new function, "MemEncryptSevClearMmioPageEncMask" or similar.

BTW, in the "MemEncryptSevLib.h" header, the documentation of the
MemEncryptSev{Set,Clear}PageEncMask function declarations should be
updated -- the leading comments should explain that, in case SEV-SNP is
found active, then page (in)validation will occur too (as appropriate).

Yes we should bail out if such a request comes from the high level
module. This is why I was actually not checking the tracking structure
when toggling the C-bit. The complete memory much have been validated
before we are asked to toggle the C-bit. If a module is performing a
doubly validation or invalidation then it should be aborted to avoid
any future exploits.
Great, thank you.

If we go with the tracking only until we complete the pre-validation
then we can use the hardware PVALIDATE help to detect the doubly
validation and abort it. There is no need to check the overlap
conditions.
Again, great. In the pre-validation step, where you explicitly carve out
the PCD-described ranges that have been handled already by QEMU and
earlier phases of the firmware, you can still check the PVALIDATE result
(CF), just to be sure.

Certainly your points makes sense. Let me know what you think about
removing the tracking data structure and using a liner array of
pre-valiated range (build with a Fixed PCD) works ?
I couldn't have hoped for such a great resolution to this conundrum.
Yes, please do that.

I was hoping you to just glance over it and provide me high level
feedback. I will go through styles and comments in great details on
non RFC patches. I still have some TODO items (e.g Secrets, CPUID etc)
before we take off the RFC tag. Thank you so much for your detail
feedback.
Well, this is the first email on the SEV-SNP topic where what I feel can
be described as "relief" :)

Thank you!
Laszlo


Brijesh Singh
 

Since I had the data structures available in PEI
phase for the tracking the page state hence made those available to
DXE to verify that we are called to invalidate the SYSTEM_RAM and not
MMIO. IMO, we should either extend the
MemEncryptSev{Set,Clear}PageEncMask() to pass either a new flag to
hint where its system RAM or non-RAM. Thoughts ?
I think I'd prefer a new function in the lib class, one that's dedicated
to clearing the C bit on MMIO without attempting invalidation. It's a
special-case function, and the dedicated name helps with two things:

- no need to mess with existing call sites except in AmdSevDxe,
- the new function is easier to grep for.

BTW, there would be at least one other approach for this, I think -- the
PEI instance of the library could consult the HOB list, and the DXE
instance of the library could consult the GCD memory space map, to
decide whether the page being C-bit-flipped is MMIO or not. But that's a
lot of complexity (and likely some performance hit too), when in fact we
know at the call sites whether the area being encrypted/decrypted is
MMIO or RAM.

Sound good to me, will introduce new function for the MMIO in v2.


Note also that (IIUC) in AmdSevDxe, we only *decrypt*, so we only need
*one* new function, "MemEncryptSevClearMmioPageEncMask" or similar.

BTW, in the "MemEncryptSevLib.h" header, the documentation of the
MemEncryptSev{Set,Clear}PageEncMask function declarations should be
updated -- the leading comments should explain that, in case SEV-SNP is
found active, then page (in)validation will occur too (as appropriate).
Noted.