[PATCH V5 2/2] OvmfPkg/ResetVector: Enable Intel TDX in ResetVector of Ovmf


Min Xu

RFC: https://bugzilla.tianocore.org/show_bug.cgi?id=3429

Intel Trust Domain Extensions (Intel TDX) is an Intel technology that
extends Virtual Machine Extensions (VMX) and Multi-Key Total Memory
Encryption (MKTME) with a new kind of virtual machine guest called a
Trust Domain (TD). A TD is designed to run in a CPU mode that protects the
confidentiality of its memory contents and its CPU state from other
software, including the hosting Virtual Machine Monitor (VMM), unless
explicitly shared by the TD itself.

Note: Intel TDX is only available on X64, so the TDX-related changes are
in the X64 path. In the IA32 path, null stubs may be provided so that the
build succeeds.

This patch includes the following major changes.

1. TDX_WORK_AREA
It is an internal structure holding the Intel TDX information needed
during the SEC phase. It is a member of the union OVMF_WORK_AREA.
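The assembler does not include WorkArea.h; it hard-codes byte offsets from PcdOvmfWorkAreaBase (TDX_WORK_AREA_PAGELEVEL5 at +4, TDX_WORK_AREA_PGTBL_READY at +5, TDX_WORK_AREA_GPAW at +8). A minimal C sketch of how the structure lines up with those offsets is shown below. It assumes a 4-byte CONFIDENTIAL_COMPUTING_WORK_AREA_HEADER with the guest type at byte 0 and the subtype at byte 1, as implied by WORK_AREA_GUEST_TYPE (+0) and WORK_AREA_GUEST_SUBTYPE (+1); the header type CC_WORK_AREA_HEADER_SKETCH and its field names are hypothetical, and fixed-width types stand in for UINT8/UINT32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CONFIDENTIAL_COMPUTING_WORK_AREA_HEADER:
   assumed to be 4 bytes, guest type at +0 and subtype at +1. */
typedef struct {
  uint8_t GuestType;    /* WORK_AREA_GUEST_TYPE, 2 = VM_GUEST_TDX */
  uint8_t GuestSubType; /* WORK_AREA_GUEST_SUBTYPE */
  uint8_t Reserved[2];
} CC_WORK_AREA_HEADER_SKETCH;

/* Same field order and widths as SEC_TDX_WORK_AREA in the patch. */
typedef struct {
  uint8_t  IsPageLevel5;
  uint8_t  IsPageTableReady;
  uint8_t  Rsvd[2];
  uint32_t Gpaw;
} SEC_TDX_WORK_AREA;

typedef struct {
  CC_WORK_AREA_HEADER_SKETCH Header;
  SEC_TDX_WORK_AREA          SecTdxWorkArea;
} TDX_WORK_AREA;

/* Offsets from PcdOvmfWorkAreaBase used by ResetVector.nasmb. */
#define TDX_WORK_AREA_PAGELEVEL5_OFF  4
#define TDX_WORK_AREA_PGTBL_READY_OFF 5
#define TDX_WORK_AREA_GPAW_OFF        8

/* Fail the build if the C layout and the assembler offsets diverge. */
static_assert(offsetof(TDX_WORK_AREA, SecTdxWorkArea.IsPageLevel5) ==
              TDX_WORK_AREA_PAGELEVEL5_OFF, "PAGELEVEL5 offset mismatch");
static_assert(offsetof(TDX_WORK_AREA, SecTdxWorkArea.IsPageTableReady) ==
              TDX_WORK_AREA_PGTBL_READY_OFF, "PGTBL_READY offset mismatch");
static_assert(offsetof(TDX_WORK_AREA, SecTdxWorkArea.Gpaw) ==
              TDX_WORK_AREA_GPAW_OFF, "GPAW offset mismatch");
```

Checking the layout at compile time like this is one way to keep the "any changes must stay in sync" comment in WorkArea.h enforceable rather than advisory.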

2. X64/IntelTdxMetadata.asm
IntelTdxMetadata describes the image for use by the VMM, for example the
base address and length of the TdHob, TdMailbox, etc. Its offset is
stored in a GUID-ed structure that is appended to the GUID-ed chain at a
fixed GPA (0xffffffd0). The items in TdxMetadata are:
_Bfv:
Boot Firmware Volume
_Cfv:
Configuration Firmware Volume
_Stack:
Initial stack
_Heap:
Initial heap
_MailBox:
TDVF reserves this memory region so that each AP can receive messages
sent by the guest OS.
_OvmfWorkarea:
Confidential Computing work area, which is consumed by CC technologies
such as SEV and TDX.
_TdHob:
The VMM passes resource information to TDVF in the TdHob.
_TdxPageTable:
If 5-level paging is supported (GPAW is 52), a top-level page
directory pointer table (1 * 256TB entry) is generated in this page.
_OvmfPageTable:
Initial page table for standard Ovmf.
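A consumer such as the VMM would read this table as a 16-byte descriptor (the 'TDVF' signature, Length, Version, and number of sections) followed by 32-byte section entries, matching the DD/DD/DQ/DQ/DD/DD layout of _Bfv, _Cfv, _Stack and the other items in IntelTdxMetadata.asm. The sketch below is a hypothetical C parser built on that assumption; TdvfSection and tdvf_parse_metadata are illustrative names, not an existing VMM API, and the Version field is not validated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One 32-byte section entry, fields in declaration order. */
typedef struct {
  uint32_t DataOffset;     /* raw data offset in the image (0 if none) */
  uint32_t RawDataSize;    /* raw data size in the image (0 if none) */
  uint64_t MemoryAddress;  /* guest physical base address */
  uint64_t MemoryDataSize; /* size of the region in guest memory */
  uint32_t Type;           /* 0=BFV, 1=CFV, 2=TD_HOB, 3=TEMP_MEM */
  uint32_t Attributes;     /* bit 0: EXTENDMR */
} TdvfSection;

/* Little-endian readers, independent of host byte order. */
static uint32_t rd32(const uint8_t *p)
{
  return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
         (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t rd64(const uint8_t *p)
{
  return (uint64_t)rd32(p) | (uint64_t)rd32(p + 4) << 32;
}

/* Little-endian writers, used only to build a sample descriptor. */
static void wr32(uint8_t *p, uint32_t v)
{
  for (int i = 0; i < 4; i++)
    p[i] = (uint8_t)(v >> (8 * i));
}

static void wr64(uint8_t *p, uint64_t v)
{
  wr32(p, (uint32_t)v);
  wr32(p + 4, (uint32_t)(v >> 32));
}

/* buf points at the descriptor (just after TdxMetadataGuid).
   Returns the number of sections copied into out, or -1 on error. */
static int tdvf_parse_metadata(const uint8_t *buf, size_t len,
                               TdvfSection *out, size_t max_out)
{
  if (len < 16 || memcmp(buf, "TDVF", 4) != 0)
    return -1;
  uint32_t length = rd32(buf + 4);  /* TdxGuidedStructureEnd - _Descriptor */
  uint32_t count  = rd32(buf + 12); /* (length - 16) / 32 */
  if (length < 16 || length > len ||
      (size_t)count * 32 > length - 16 || count > max_out)
    return -1;
  for (uint32_t i = 0; i < count; i++) {
    const uint8_t *s = buf + 16 + (size_t)i * 32;
    out[i].DataOffset     = rd32(s);
    out[i].RawDataSize    = rd32(s + 4);
    out[i].MemoryAddress  = rd64(s + 8);
    out[i].MemoryDataSize = rd64(s + 16);
    out[i].Type           = rd32(s + 24);
    out[i].Attributes     = rd32(s + 28);
  }
  return (int)count;
}
```

The explicit byte-wise reads keep the sketch correct on hosts of either endianness; a real consumer would also check Version and the section types it expects.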

TDVF declares the above chunks of temporarily initialized memory (_Stack/
_Heap/_MailBox/_OvmfWorkarea/_TdHob/_TdxPageTable/_OvmfPageTable) so that
TDVF code can finish memory initialization, because the remaining memory
is unaccepted and cannot be accessed until it has been accepted.

Since AMD SEV has already defined some SEV-specific memory regions in
MEMFD, TDX reuses them:
- MailBox : PcdOvmfSecGhcbBackupBase|PcdOvmfSecGhcbBackupSize
- TdHob : PcdOvmfSecGhcbBase|PcdOvmfSecGhcbSize
- TdxPageTable : PcdOvmfSecGhcbPageTableBase|PcdOvmfSecGhcbPageTableSize
- OvmfWorkArea : PcdOvmfWorkAreaBase|PcdOvmfWorkAreaSize
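The offset stored in the tdxMetadataOffset block of ResetVectorVtf0.asm assumes the image is mapped so that it ends at 4GB: a flash address A then sits at image offset ImageSize - (4GB - A), and the descriptor starts 16 bytes after TdxMetadataGuid. The GUID bytes in that block use the usual mixed-endian layout (first three fields little-endian, last eight bytes as written). A small C sketch of both conversions follows; tdx_metadata_image_offset and guid_to_bytes are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Image offset of the TDVF descriptor. guid_gpa is the address of
   TdxMetadataGuid; the descriptor follows the 16-byte GUID. Assumes the
   image is mapped so that it ends exactly at 4 GiB. */
static uint32_t tdx_metadata_image_offset(uint32_t image_size_kb, uint32_t guid_gpa)
{
  const uint64_t four_gib = 0x100000000ull;
  return (uint32_t)((uint64_t)image_size_kb * 1024 - (four_gib - guid_gpa - 16));
}

/* Converts a textual GUID to its in-memory byte order: the first three
   fields are stored little-endian, the last eight bytes as written. */
static int guid_to_bytes(const char *text, uint8_t out[16])
{
  unsigned int d1, d2, d3, b[8];
  if (sscanf(text, "%8x-%4x-%4x-%2x%2x-%2x%2x%2x%2x%2x%2x",
             &d1, &d2, &d3, &b[0], &b[1], &b[2], &b[3],
             &b[4], &b[5], &b[6], &b[7]) != 11)
    return -1;
  for (int i = 0; i < 4; i++)
    out[i] = (uint8_t)(d1 >> (8 * i));
  out[4] = (uint8_t)d2;
  out[5] = (uint8_t)(d2 >> 8);
  out[6] = (uint8_t)d3;
  out[7] = (uint8_t)(d3 >> 8);
  for (int i = 0; i < 8; i++)
    out[8 + i] = (uint8_t)b[i];
  return 0;
}
```

For example, guid_to_bytes("e47a6535-984a-4798-865e-4685a7bf8ec2", g) yields the same 16 bytes that the patch emits with the two DB lines in the offset block.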

3. Ia32/IntelTdx.asm
IntelTdx.asm includes the following routines used in the ResetVector:
- IsTdx
Check whether the running system is a TDX guest.

- InitTdxWorkarea
It initializes TDX_WORK_AREA. Because it is called by both the BSP and
the APs, only the BSP initializes the work area, to avoid a race
condition. The APs wait until the TDX_WORK_AREA_PGTBL_READY field is
set.

- ReloadFlat32
After reset, all CPUs in a TDX guest start in 32-bit protected mode,
but the GDT register is not set. So this routine loads the GDT, sets
CR0, and then jumps to flat 32-bit protected mode. After that, CR4 and
the segment registers are set.

- InitTdx
This routine wraps the three routines above together to perform TDX
initialization in the ResetVector phase.

- PostSetCr3PageTables64Tdx
It is called after SetCr3ForPageTables64 in a TDX guest to set CR4
(PAE, plus LA57 when 5-level paging is used). If GPAW is 52, CR3 is
adjusted as well.

- IsTdxEnabled
It is a OneTimeCall routine that probes whether TDX is enabled by
checking the guest type in the CC_WORK_AREA.

- TdxBuildExtraPageTables
It builds the extra TDX page table (the top-level page used for 5-level
paging) and then sets TDX_WORK_AREA_PGTBL_READY so that the APs can
proceed.

- CheckTdxFeaturesBeforeBuildPagetables
This routine checks whether it is running on a non-TDX guest, the TDX
BSP, or a TDX AP. In a TDX guest all initialization (including the page
tables) is done by the BSP, so the APs must not build the tables.
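The checks done by IsTdx and the 0/1/2 result of CheckTdxFeaturesBeforeBuildPagetables can be modeled in C as below. This is a sketch only: the CpuidRegs struct and the function names are hypothetical, and the CPUID results are passed in rather than read from the CPU so the logic can be exercised on any host. The four-character strings are packed the same way CPUID returns them in a register, which is where constants such as 0x756e6547 ("Genu") in IntelTdx.asm come from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VM_GUEST_TDX 2

typedef struct { uint32_t eax, ebx, ecx, edx; } CpuidRegs;

/* Packs four ASCII characters the way CPUID returns them in a register. */
static uint32_t fourcc(const char s[4])
{
  return (uint32_t)(uint8_t)s[0] | (uint32_t)(uint8_t)s[1] << 8 |
         (uint32_t)(uint8_t)s[2] << 16 | (uint32_t)(uint8_t)s[3] << 24;
}

/* Mirrors IsTdx: Intel vendor, hypervisor bit, max leaf >= 0x21,
   and the "IntelTDX    " signature in CPUID leaf 0x21, subleaf 0. */
static bool is_tdx(const CpuidRegs *leaf0, const CpuidRegs *leaf1,
                   const CpuidRegs *leaf21)
{
  if (leaf0->ebx != fourcc("Genu") || leaf0->edx != fourcc("ineI") ||
      leaf0->ecx != fourcc("ntel"))
    return false;
  if (!(leaf1->ecx & 0x80000000u)) /* CPUID.1:ECX[31], hypervisor present */
    return false;
  if (leaf0->eax < 0x21)           /* unsigned compare on the max leaf */
    return false;
  return leaf21->ebx == fourcc("Inte") && leaf21->edx == fourcc("lTDX") &&
         leaf21->ecx == fourcc("    ");
}

enum { TDX_CHECK_NON_TDX = 0, TDX_CHECK_BSP = 1, TDX_CHECK_AP = 2 };

/* Mirrors CheckTdxFeaturesBeforeBuildPagetables: 0 for a non-TDX guest,
   otherwise 1 + the PGTBL_READY byte (0 on the BSP, 1 on the APs). */
static int check_tdx_features(uint8_t guest_type, uint8_t pgtbl_ready)
{
  if (guest_type != VM_GUEST_TDX)
    return TDX_CHECK_NON_TDX;
  return 1 + pgtbl_ready;
}
```

Note that the BSP/AP distinction relies on the AP having already waited for PGTBL_READY in InitTdxWorkarea, so by the time an AP reaches this check the byte is 1.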

4. Main.asm
Previously OvmfPkg/ResetVector used the Main.asm in UefiCpuPkg, which
only has a Main16 entry point. Intel TDX also needs a Main32 entry point.
To avoid adding complexity to the Main.asm in UefiCpuPkg, OvmfPkg now has
its own Main.asm that meets the requirements of Intel TDX. This Main.asm
has the following changes:
- A new entry point (Main32) is added. A TDX guest jumps to Main32
from the ResetVector. In Main32, InitTdx is called to initialize
TDX-specific information.
- In the Main16 entry point, WORK_AREA_GUEST_TYPE is cleared to 0 after
TransitionFromReal16To32BitFlat. WORK_AREA_GUEST_TYPE was previously
cleared in SetCr3ForPageTables64 (see commit ab77b6031b). That no
longer works once TDX is introduced in OVMF, because all TDX CPUs (BSP
and APs) start running from 0xfffffff0, so the previous code would
clear WORK_AREA_GUEST_TYPE multiple times in a TDX guest. Therefore,
for SEV and legacy guests the clearing is moved to the Main16 entry
point (after TransitionFromReal16To32BitFlat). For a TDX guest,
WORK_AREA_GUEST_TYPE is set (and WORK_AREA_GUEST_SUBTYPE cleared) by
the BSP in InitTdxWorkarea.

5. Ia32/PageTables64.asm
The GPAW of a TDX guest can be 48 or 52, which determines the number of
page table levels. If 5-level paging (GPAW 52) is supported, an extra
page is needed to hold the top-level page directory pointers (1 * 256TB
entry).
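The relationship between GPAW, paging depth, and the CR4 bits set by PostSetCr3PageTables64Tdx can be sketched in C. This is an illustrative model only and the helper names are hypothetical: GPAW is taken from bits [5:0] of the initial EBX as InitTdxWorkarea does, and the coverage formula reflects that each paging level resolves 9 address bits above the 12-bit page offset, which is why one level-5 entry spans 2^48 bytes (256TB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_PAE  (1u << 5)  /* bts eax, 5 */
#define CR4_LA57 (1u << 12) /* bts eax, 12 */

/* InitTdxWorkarea: GPAW is in bits [5:0] of the initial EBX (saved in EBP). */
static uint32_t gpaw_from_initial_ebx(uint32_t ebx) { return ebx & 0x3f; }

/* 5-level paging is used when GPAW is 52 (cmp ebp, 52). */
static bool use_5_level_paging(uint32_t gpaw) { return gpaw >= 52; }

/* PostSetCr3PageTables64Tdx: always PAE, plus LA57 for 5-level paging. */
static uint32_t tdx_cr4(uint32_t cr4, bool level5)
{
  cr4 |= CR4_PAE;
  if (level5)
    cr4 |= CR4_LA57;
  return cr4;
}

/* Bytes mapped by one entry at a paging level:
   1 = 4KiB, 2 = 2MiB, 3 = 1GiB, 4 = 512GiB, 5 = 256TiB. */
static uint64_t bytes_per_entry(int level)
{
  return 1ull << (12 + 9 * (level - 1));
}
```

With these helpers, gpaw_from_initial_ebx(0x34) is 52, so use_5_level_paging() is true and tdx_cr4(SEC_DEFAULT_CR4, true) sets both PAE and LA57.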

6. Ia16/ResetVectorVtf0.asm
In TDX, all CPUs "reset" into 32-bit protected mode with flat
descriptors (paging disabled), whereas in a non-TD guest the initial CPU
state is 16-bit real mode. To resolve this conflict, both BITS 16 and
BITS 32 are used in ResetVectorVtf0.asm: the code checks whether the CPU
is in 32-bit protected mode or 16-bit real mode, then jumps to the
corresponding entry point.
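The check itself only looks at CR0.PE (bit 0): smsw stores the low 16 bits of CR0 (the machine status word), and `test al, 1` selects the entry. A tiny C model of that decision is shown below; the ResetEntry enum and select_reset_entry are hypothetical names, and the sample MSW values in the usage note are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the two paths in ResetVectorVtf0.asm. */
typedef enum { ENTRY_EARLY_BSP_INIT_REAL16, ENTRY_MAIN32 } ResetEntry;

/* Models "smsw ax / test al, 1 / jz .Real": CR0.PE set means the vCPU
   started in protected mode (TDX), clear means real mode (legacy VM). */
static ResetEntry select_reset_entry(uint16_t msw)
{
  return (msw & 1u) ? ENTRY_MAIN32 : ENTRY_EARLY_BSP_INIT_REAL16;
}
```

For instance, an MSW with PE set (such as 0x0021) selects Main32, while a real-mode MSW (such as 0x0010) falls through to EarlyBspInitReal16.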

7. ResetVector.nasmb
TDX-related macros and files are added to ResetVector.nasmb.

Cc: Ard Biesheuvel <ardb+tianocore@kernel.org>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Erdem Aktas <erdemaktas@google.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Min Xu <min.m.xu@intel.com>
---
OvmfPkg/Include/WorkArea.h | 30 ++
OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm | 39 +++
OvmfPkg/ResetVector/Ia32/Flat32ToFlat64.asm | 10 +
OvmfPkg/ResetVector/Ia32/IntelTdx.asm | 302 +++++++++++++++++++
OvmfPkg/ResetVector/Ia32/PageTables64.asm | 20 +-
OvmfPkg/ResetVector/Main.asm | 119 ++++++++
OvmfPkg/ResetVector/ResetVector.inf | 10 +
OvmfPkg/ResetVector/ResetVector.nasmb | 47 ++-
OvmfPkg/ResetVector/X64/IntelTdxMetadata.asm | 110 +++++++
9 files changed, 680 insertions(+), 7 deletions(-)
create mode 100644 OvmfPkg/ResetVector/Ia32/IntelTdx.asm
create mode 100644 OvmfPkg/ResetVector/Main.asm
create mode 100644 OvmfPkg/ResetVector/X64/IntelTdxMetadata.asm

diff --git a/OvmfPkg/Include/WorkArea.h b/OvmfPkg/Include/WorkArea.h
index c16030e3ac0a..47a7e40c9078 100644
--- a/OvmfPkg/Include/WorkArea.h
+++ b/OvmfPkg/Include/WorkArea.h
@@ -59,9 +59,39 @@ typedef struct _SEV_WORK_AREA {
SEC_SEV_ES_WORK_AREA SevEsWorkArea;
} SEV_WORK_AREA;

+//
+// Internal structure for holding Intel TDX information needed during SEC phase
+// and valid only during SEC phase and early PEI during platform
+// initialization.
+//
+// This structure is also used by assembler files:
+// OvmfPkg/ResetVector/ResetVector.nasmb
+// OvmfPkg/ResetVector/Ia32/PageTables64.asm
+// OvmfPkg/ResetVector/Ia32/Flat32ToFlat64.asm
+// OvmfPkg/ResetVector/Ia32/IntelTdx.asm
+// OvmfPkg/ResetVector/Main.asm
+// any changes must stay in sync with its usage.
+//
+typedef struct _SEC_TDX_WORK_AREA {
+ UINT8 IsPageLevel5;
+ UINT8 IsPageTableReady;
+ UINT8 Rsvd[2];
+ UINT32 Gpaw;
+} SEC_TDX_WORK_AREA;
+
+//
+// The Intel TDX work area definition.
+//
+typedef struct _TDX_WORK_AREA {
+ CONFIDENTIAL_COMPUTING_WORK_AREA_HEADER Header;
+
+ SEC_TDX_WORK_AREA SecTdxWorkArea;
+} TDX_WORK_AREA;
+
typedef union {
CONFIDENTIAL_COMPUTING_WORK_AREA_HEADER Header;
SEV_WORK_AREA SevWorkArea;
+ TDX_WORK_AREA TdxWorkArea;
} OVMF_WORK_AREA;

#endif
diff --git a/OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm b/OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm
index 7ec3c6e980c3..c3b856754008 100644
--- a/OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm
+++ b/OvmfPkg/ResetVector/Ia16/ResetVectorVtf0.asm
@@ -47,6 +47,25 @@ TIMES (15 - ((guidedStructureEnd - guidedStructureStart + 15) % 16)) DB 0
;
guidedStructureStart:

+%ifdef ARCH_X64
+;
+; TDX Metadata offset block
+;
+; IntelTdxMetadata.asm is included only for ARCH_X64 because Intel TDX is
+; only available on X64. The block below describes the offset of the
+; TdxMetadata block in the OVMF image.
+;
+; GUID : e47a6535-984a-4798-865e-4685a7bf8ec2
+;
+tdxMetadataOffsetStart:
+ DD (OVMF_IMAGE_SIZE_IN_KB * 1024 - (fourGigabytes - TdxMetadataGuid - 16))
+ DW tdxMetadataOffsetEnd - tdxMetadataOffsetStart
+ DB 0x35, 0x65, 0x7a, 0xe4, 0x4a, 0x98, 0x98, 0x47
+ DB 0x86, 0x5e, 0x46, 0x85, 0xa7, 0xbf, 0x8e, 0xc2
+tdxMetadataOffsetEnd:
+
+%endif
+
; SEV Hash Table Block
;
; This describes the guest ram area where the hypervisor should
@@ -158,10 +177,30 @@ resetVector:
;
; This is where the processor will begin execution
;
+; In IA32 we follow the standard reset vector flow. In X64, a TD guest
+; may be supported. A TD guest requires the startup mode to be 32-bit
+; protected mode, while the legacy VM startup mode is 16-bit real mode.
+; To make NASM generate shared entry code that behaves correctly in
+; both 16-bit and 32-bit mode, additional BITS directives are used.
+;
+%ifdef ARCH_IA32
nop
nop
jmp EarlyBspInitReal16

+%else
+
+ smsw ax
+ test al, 1
+ jz .Real
+BITS 32
+ jmp Main32
+BITS 16
+.Real:
+ jmp EarlyBspInitReal16
+
+%endif
+
ALIGN 16

fourGigabytes:
diff --git a/OvmfPkg/ResetVector/Ia32/Flat32ToFlat64.asm b/OvmfPkg/ResetVector/Ia32/Flat32ToFlat64.asm
index c6d0d898bcd1..470f428a1b81 100644
--- a/OvmfPkg/ResetVector/Ia32/Flat32ToFlat64.asm
+++ b/OvmfPkg/ResetVector/Ia32/Flat32ToFlat64.asm
@@ -17,6 +17,15 @@ Transition32FlatTo64Flat:

OneTimeCall SetCr3ForPageTables64

+ OneTimeCall PostSetCr3PageTables64Tdx
+
+ ;
+ ; If it is TDX, we're done and jump to enable paging
+ ;
+ OneTimeCall IsTdxEnabled
+ test eax, eax
+ jnz EnablePaging
+
mov eax, cr4
bts eax, 5 ; enable PAE
mov cr4, eax
@@ -71,6 +80,7 @@ jumpTo64BitAndLandHere:

;
; Check if the second step of the SEV-ES mitigation is to be performed.
+ ; If it is Tdx, ebx is cleared in PostSetCr3PageTables64Tdx.
;
test ebx, ebx
jz InsnCompare
diff --git a/OvmfPkg/ResetVector/Ia32/IntelTdx.asm b/OvmfPkg/ResetVector/Ia32/IntelTdx.asm
new file mode 100644
index 000000000000..dbe0913bd051
--- /dev/null
+++ b/OvmfPkg/ResetVector/Ia32/IntelTdx.asm
@@ -0,0 +1,302 @@
+;------------------------------------------------------------------------------
+; @file
+; Intel TDX routines
+;
+; Copyright (c) 2021, Intel Corporation. All rights reserved.<BR>
+; SPDX-License-Identifier: BSD-2-Clause-Patent
+;
+;------------------------------------------------------------------------------
+
+%define SEC_DEFAULT_CR0 0x00000023
+%define SEC_DEFAULT_CR4 0x640
+%define VM_GUEST_TDX 2
+
+BITS 32
+
+;
+; Check if it is Intel Tdx
+;
+; Modified: EAX, EBX, ECX, EDX
+;
+; If it is Intel Tdx, EAX is zero
+; If it is not Intel Tdx, EAX is non-zero
+;
+IsTdx:
+ ;
+ ; CPUID (0)
+ ;
+ mov eax, 0
+ cpuid
+ cmp ebx, 0x756e6547 ; "Genu"
+ jne IsNotTdx
+ cmp edx, 0x49656e69 ; "ineI"
+ jne IsNotTdx
+ cmp ecx, 0x6c65746e ; "ntel"
+ jne IsNotTdx
+
+ ;
+ ; CPUID (1)
+ ;
+ mov eax, 1
+ cpuid
+ test ecx, 0x80000000 ; CPUID.1:ECX[31], hypervisor present
+ jz IsNotTdx
+
+ ;
+ ; CPUID[0].EAX >= 0x21?
+ ;
+ mov eax, 0
+ cpuid
+ cmp eax, 0x21
+ jb IsNotTdx ; unsigned compare on the maximum leaf
+
+ ;
+ ; CPUID (0x21,0)
+ ;
+ mov eax, 0x21
+ mov ecx, 0
+ cpuid
+
+ cmp ebx, 0x65746E49 ; "Inte"
+ jne IsNotTdx
+ cmp edx, 0x5844546C ; "lTDX"
+ jne IsNotTdx
+ cmp ecx, 0x20202020 ; "    "
+ jne IsNotTdx
+
+ mov eax, 0
+ jmp ExitIsTdx
+
+IsNotTdx:
+ mov eax, 1
+
+ExitIsTdx:
+
+ OneTimeCallRet IsTdx
+
+;
+; Initialize work area if it is Tdx guest. Detailed definition is in
+; OvmfPkg/Include/WorkArea.h.
+; Both the BSP and the APs run this routine. Only the BSP initializes the work area.
+;
+; Param[in] EBP[5:0] CPU Supported GPAW (48 or 52)
+; Param[in] ESI[31:0] vCPU ID (BSP is 0, others are AP)
+;
+; Modified: EAX, EBX, ECX, EDX, EBP
+;
+InitTdxWorkarea:
+
+ ;
+ ; First check if it is Tdx
+ ;
+ OneTimeCall IsTdx
+
+ test eax, eax
+ jnz ExitInitTdxWorkarea
+
+ cmp esi, 0
+ je TdxBspEntry
+
+ ;
+ ; In Td guest, BSP/AP shares the same entry point
+ ; BSP builds up the page table, while APs shouldn't do the same task.
+ ; Instead, APs just leverage the page table which is built by BSP.
+ ; APs will wait until the page table is ready.
+ ;
+TdxApWait:
+ cmp byte[TDX_WORK_AREA_PGTBL_READY], 0
+ je TdxApWait
+ jmp ExitInitTdxWorkarea
+
+TdxBspEntry:
+ ;
+ ; Set Type/Subtype of WORK_AREA_GUEST_TYPE so that the following code can use
+ ; this information.
+ ;
+ mov byte[WORK_AREA_GUEST_TYPE], VM_GUEST_TDX
+ mov byte[WORK_AREA_GUEST_SUBTYPE], 0
+
+ ;
+ ; EBP[5:0] CPU supported GPA width
+ ;
+ and ebp, 0x3f
+ cmp ebp, 52
+ jl NotPageLevel5
+ mov byte[TDX_WORK_AREA_PAGELEVEL5], 1
+
+NotPageLevel5:
+ mov DWORD[TDX_WORK_AREA_GPAW], ebp
+
+ExitInitTdxWorkarea:
+ OneTimeCallRet InitTdxWorkarea
+
+;
+; Load the GDT and set the CR0, then jump to Flat 32 protected mode.
+;
+; Modified: EAX, EBX, CR0, CR4, DS, ES, FS, GS, SS
+;
+ReloadFlat32:
+
+ cli
+ mov ebx, ADDR_OF(gdtr)
+ lgdt [ebx]
+
+ mov eax, SEC_DEFAULT_CR0
+ mov cr0, eax
+
+ jmp LINEAR_CODE_SEL:dword ADDR_OF(jumpToFlat32BitAndLandHere)
+
+jumpToFlat32BitAndLandHere:
+
+ mov eax, SEC_DEFAULT_CR4
+ mov cr4, eax
+
+ debugShowPostCode POSTCODE_32BIT_MODE
+
+ mov ax, LINEAR_SEL
+ mov ds, ax
+ mov es, ax
+ mov fs, ax
+ mov gs, ax
+ mov ss, ax
+
+ OneTimeCallRet ReloadFlat32
+
+;
+; Tdx initialization after entering into ResetVector
+;
+; Modified: EAX, EBX, ECX, EDX, EBP, EDI, ESP
+;
+InitTdx:
+ ;
+ ; Save EBX in EBP because EBX will be changed in ReloadFlat32
+ ;
+ mov ebp, ebx
+
+ ;
+ ; First load the GDT and jump to Flat32 mode
+ ;
+ OneTimeCall ReloadFlat32
+
+ ;
+ ; Initialization of Tdx work area
+ ;
+ OneTimeCall InitTdxWorkarea
+
+ OneTimeCallRet InitTdx
+
+;
+; Called after SetCr3ForPageTables64 in a TDX guest to set CR4.
+; If GPAW is 52, then CR3 is adjusted as well.
+;
+; Modified: EAX, EBX, CR3, CR4
+;
+PostSetCr3PageTables64Tdx:
+ ;
+ ; WORK_AREA_GUEST_TYPE was set in InitTdx if it is Tdx guest
+ ;
+ cmp byte[WORK_AREA_GUEST_TYPE], VM_GUEST_TDX
+ jne ExitPostSetCr3PageTables64Tdx
+
+ mov eax, cr4
+ bts eax, 5 ; enable PAE
+
+ ;
+ ; byte[TDX_WORK_AREA_PAGELEVEL5] holds the indicator whether 52bit is
+ ; supported. if it is the case, need to set LA57 and use 5-level paging
+ ;
+ cmp byte[TDX_WORK_AREA_PAGELEVEL5], 0
+ jz TdxSetCr4
+ bts eax, 12
+
+TdxSetCr4:
+ mov cr4, eax
+ mov ebx, cr3
+
+ ;
+ ; if la57 is not set, we are ok
+ ; if using 5-level paging, adjust top-level page directory
+ ;
+ bt eax, 12
+ jnc TdxSetCr3
+ mov ebx, TDX_PT_ADDR (0)
+
+TdxSetCr3:
+ mov cr3, ebx
+
+ xor ebx, ebx
+
+ExitPostSetCr3PageTables64Tdx:
+ OneTimeCallRet PostSetCr3PageTables64Tdx
+
+;
+; Build TDX Extra page table
+;
+; Modified: EAX, ECX
+;
+TdxBuildExtraPageTables:
+ cmp byte[WORK_AREA_GUEST_TYPE], VM_GUEST_TDX
+ jne ExitTdxBuildExtraPageTables
+
+ xor eax, eax
+ mov ecx, 0x400
+tdClearTdxPageTablesMemoryLoop:
+ mov dword [ecx * 4 + TDX_PT_ADDR(0) - 4], eax
+ loop tdClearTdxPageTablesMemoryLoop
+
+ ;
+ ; Top level Page Directory Pointers (1 * 256TB entry)
+ ;
+ mov dword[TDX_PT_ADDR (0)], PT_ADDR(0) + PAGE_PDP_ATTR
+
+ ;
+ ; Set TDX_WORK_AREA_PGTBL_READY to notify APs to go
+ ;
+ mov byte[TDX_WORK_AREA_PGTBL_READY], 1
+
+ExitTdxBuildExtraPageTables:
+ OneTimeCallRet TdxBuildExtraPageTables
+
+;
+; Check TDX features, Non-TDX or TDX-BSP or TDX-APs?
+;
+; By design the TDX BSP is responsible for initializing the PageTables.
+; After PageTables are ready, byte[TDX_WORK_AREA_PGTBL_READY] is set to 1.
+; APs will spin when byte[TDX_WORK_AREA_PGTBL_READY] is 0 until it is set to 1.
+;
+; When this routine is run on TDX BSP, byte[TDX_WORK_AREA_PGTBL_READY] should be 0.
+; When this routine is run on TDX APs, byte[TDX_WORK_AREA_PGTBL_READY] should be 1.
+;
+;
+; Modified: EAX, EDX
+;
+; 0-NonTdx, 1-TdxBsp, 2-TdxAps
+;
+CheckTdxFeaturesBeforeBuildPagetables:
+ xor eax, eax
+ cmp byte[WORK_AREA_GUEST_TYPE], VM_GUEST_TDX
+ jne NotTdx
+
+ xor edx, edx
+ mov al, byte[TDX_WORK_AREA_PGTBL_READY]
+ inc eax
+
+NotTdx:
+ OneTimeCallRet CheckTdxFeaturesBeforeBuildPagetables
+
+;
+; Check if TDX is enabled
+;
+; Modified: EAX
+;
+; If TDX is enabled then EAX will be 1
+; If TDX is disabled then EAX will be 0.
+;
+IsTdxEnabled:
+ xor eax, eax
+ cmp byte[WORK_AREA_GUEST_TYPE], VM_GUEST_TDX
+ jne TdxNotEnabled
+ mov eax, 1
+
+TdxNotEnabled:
+ OneTimeCallRet IsTdxEnabled
diff --git a/OvmfPkg/ResetVector/Ia32/PageTables64.asm b/OvmfPkg/ResetVector/Ia32/PageTables64.asm
index 07b6ca070909..585325382e18 100644
--- a/OvmfPkg/ResetVector/Ia32/PageTables64.asm
+++ b/OvmfPkg/ResetVector/Ia32/PageTables64.asm
@@ -37,14 +37,22 @@ BITS 32
PAGE_READ_WRITE + \
PAGE_PRESENT)

+%define TDX_BSP 1
+%define TDX_AP 2
;
; Modified: EAX, EBX, ECX, EDX
;
SetCr3ForPageTables64:
-
- ; Clear the WorkArea header. The SEV probe routines will populate the
- ; work area when detected.
- mov byte[WORK_AREA_GUEST_TYPE], 0
+ ; Check the TDX features.
+ ; If it is TDX APs, then jump to SetCr3 directly.
+ ; In TD guest the initialization is done by BSP, including building
+ ; the page tables. APs will spin on until byte[TDX_WORK_AREA_PGTBL_READY]
+ ; is set.
+ OneTimeCall CheckTdxFeaturesBeforeBuildPagetables
+ cmp eax, TDX_BSP
+ je ClearOvmfPageTables
+ cmp eax, TDX_AP
+ je SetCr3

; Check whether the SEV is active and populate the SevEsWorkArea
OneTimeCall CheckSevFeatures
@@ -54,6 +62,7 @@ SetCr3ForPageTables64:
; the page table build below.
OneTimeCall GetSevCBitMaskAbove31

+ClearOvmfPageTables:
;
; For OVMF, build some initial page tables at
; PcdOvmfSecPageTablesBase - (PcdOvmfSecPageTablesBase + 0x6000).
@@ -105,6 +114,9 @@ pageTableEntriesLoop:
; Clear the C-bit from the GHCB page if the SEV-ES is enabled.
OneTimeCall SevClearPageEncMaskForGhcbPage

+ ; Build Tdx extra pages
+ OneTimeCall TdxBuildExtraPageTables
+
SetCr3:
;
; Set CR3 now that the paging structures are available
diff --git a/OvmfPkg/ResetVector/Main.asm b/OvmfPkg/ResetVector/Main.asm
new file mode 100644
index 000000000000..2a7efbc48a2a
--- /dev/null
+++ b/OvmfPkg/ResetVector/Main.asm
@@ -0,0 +1,119 @@
+;------------------------------------------------------------------------------
+; @file
+; Main routine of the pre-SEC code up through the jump into SEC
+;
+; Copyright (c) 2008 - 2009, Intel Corporation. All rights reserved.<BR>
+; SPDX-License-Identifier: BSD-2-Clause-Patent
+;
+;------------------------------------------------------------------------------
+
+
+BITS 16
+
+;
+; Modified: EBX, ECX, EDX, EBP
+;
+; @param[in,out] RAX/EAX Initial value of the EAX register
+; (BIST: Built-in Self Test)
+; @param[in,out] DI 'BP': boot-strap processor, or
+; 'AP': application processor
+; @param[out] RBP/EBP Address of Boot Firmware Volume (BFV)
+; @param[out] DS Selector allowing flat access to all addresses
+; @param[out] ES Selector allowing flat access to all addresses
+; @param[out] FS Selector allowing flat access to all addresses
+; @param[out] GS Selector allowing flat access to all addresses
+; @param[out] SS Selector allowing flat access to all addresses
+;
+; @return None This routine jumps to SEC and does not return
+;
+Main16:
+ OneTimeCall EarlyInit16
+
+ ;
+ ; Transition the processor from 16-bit real mode to 32-bit flat mode
+ ;
+ OneTimeCall TransitionFromReal16To32BitFlat
+
+BITS 32
+%ifdef ARCH_X64
+
+ ; Clear the WorkArea header. The SEV probe routines will populate the
+ ; work area when detected.
+ mov byte[WORK_AREA_GUEST_TYPE], 0
+
+ jmp SearchBfv
+
+;
+; Entry point of Main32
+;
+Main32:
+ OneTimeCall InitTdx
+
+SearchBfv:
+
+%endif
+ ;
+ ; Search for the Boot Firmware Volume (BFV)
+ ;
+ OneTimeCall Flat32SearchForBfvBase
+
+ ;
+ ; EBP - Start of BFV
+ ;
+
+ ;
+ ; Search for the SEC entry point
+ ;
+ OneTimeCall Flat32SearchForSecEntryPoint
+
+ ;
+ ; ESI - SEC Core entry point
+ ; EBP - Start of BFV
+ ;
+
+%ifdef ARCH_IA32
+
+ ;
+ ; Restore initial EAX value into the EAX register
+ ;
+ mov eax, esp
+
+ ;
+ ; Jump to the 32-bit SEC entry point
+ ;
+ jmp esi
+
+%else
+
+ ;
+ ; Transition the processor from 32-bit flat mode to 64-bit flat mode
+ ;
+ OneTimeCall Transition32FlatTo64Flat
+
+BITS 64
+
+ ;
+ ; Some values were calculated in 32-bit mode. Make sure the upper
+ ; 32-bits of 64-bit registers are zero for these values.
+ ;
+ mov rax, 0x00000000ffffffff
+ and rsi, rax
+ and rbp, rax
+ and rsp, rax
+
+ ;
+ ; RSI - SEC Core entry point
+ ; RBP - Start of BFV
+ ;
+
+ ;
+ ; Restore initial EAX value into the RAX register
+ ;
+ mov rax, rsp
+
+ ;
+ ; Jump to the 64-bit SEC entry point
+ ;
+ jmp rsi
+
+%endif
diff --git a/OvmfPkg/ResetVector/ResetVector.inf b/OvmfPkg/ResetVector/ResetVector.inf
index a2520dde5508..d49c7ca37ec9 100644
--- a/OvmfPkg/ResetVector/ResetVector.inf
+++ b/OvmfPkg/ResetVector/ResetVector.inf
@@ -44,6 +44,16 @@
gUefiOvmfPkgTokenSpaceGuid.PcdOvmfSecPeiTempRamBase
gUefiOvmfPkgTokenSpaceGuid.PcdOvmfSecPeiTempRamSize
gUefiOvmfPkgTokenSpaceGuid.PcdOvmfWorkAreaBase
+ gUefiOvmfPkgTokenSpaceGuid.PcdOvmfWorkAreaSize
+ gUefiOvmfPkgTokenSpaceGuid.PcdOvmfSecGhcbBackupBase
+ gUefiOvmfPkgTokenSpaceGuid.PcdOvmfSecGhcbBackupSize
+ gUefiOvmfPkgTokenSpaceGuid.PcdOvmfImageSizeInKb
+ gUefiOvmfPkgTokenSpaceGuid.PcdCfvBase
+ gUefiOvmfPkgTokenSpaceGuid.PcdCfvRawDataOffset
+ gUefiOvmfPkgTokenSpaceGuid.PcdCfvRawDataSize
+ gUefiOvmfPkgTokenSpaceGuid.PcdBfvBase
+ gUefiOvmfPkgTokenSpaceGuid.PcdBfvRawDataOffset
+ gUefiOvmfPkgTokenSpaceGuid.PcdBfvRawDataSize

[FixedPcd]
gUefiOvmfPkgTokenSpaceGuid.PcdSevLaunchSecretBase
diff --git a/OvmfPkg/ResetVector/ResetVector.nasmb b/OvmfPkg/ResetVector/ResetVector.nasmb
index d1d800c56745..996edec07985 100644
--- a/OvmfPkg/ResetVector/ResetVector.nasmb
+++ b/OvmfPkg/ResetVector/ResetVector.nasmb
@@ -67,19 +67,59 @@
%error "This implementation inherently depends on PcdOvmfSecGhcbBase not straddling a 2MB boundary"
%endif

+ %define TDX_BFV_RAW_DATA_OFFSET FixedPcdGet32 (PcdBfvRawDataOffset)
+ %define TDX_BFV_RAW_DATA_SIZE FixedPcdGet32 (PcdBfvRawDataSize)
+ %define TDX_BFV_MEMORY_BASE FixedPcdGet32 (PcdBfvBase)
+ %define TDX_BFV_MEMORY_SIZE FixedPcdGet32 (PcdBfvRawDataSize)
+
+ %define TDX_CFV_RAW_DATA_OFFSET FixedPcdGet32 (PcdCfvRawDataOffset)
+ %define TDX_CFV_RAW_DATA_SIZE FixedPcdGet32 (PcdCfvRawDataSize)
+ %define TDX_CFV_MEMORY_BASE FixedPcdGet32 (PcdCfvBase)
+ %define TDX_CFV_MEMORY_SIZE FixedPcdGet32 (PcdCfvRawDataSize)
+
+ %define TDX_HEAP_MEMORY_BASE FixedPcdGet32 (PcdOvmfSecPeiTempRamBase)
+ %define TDX_HEAP_MEMORY_SIZE (FixedPcdGet32 (PcdOvmfSecPeiTempRamSize) / 2)
+
+ %define TDX_STACK_MEMORY_BASE (TDX_HEAP_MEMORY_BASE + TDX_HEAP_MEMORY_SIZE)
+ %define TDX_STACK_MEMORY_SIZE (FixedPcdGet32 (PcdOvmfSecPeiTempRamSize) / 2)
+
+ %define TDX_HOB_MEMORY_BASE FixedPcdGet32 (PcdOvmfSecGhcbBase)
+ %define TDX_HOB_MEMORY_SIZE FixedPcdGet32 (PcdOvmfSecGhcbSize)
+
+ %define TDX_MAILBOX_MEMORY_BASE FixedPcdGet32 (PcdOvmfSecGhcbBackupBase)
+ %define TDX_MAILBOX_MEMORY_SIZE FixedPcdGet32 (PcdOvmfSecGhcbBackupSize)
+
+ %define OVMF_PAGE_TABLE_BASE FixedPcdGet32 (PcdOvmfSecPageTablesBase)
+ %define OVMF_PAGE_TABLE_SIZE FixedPcdGet32 (PcdOvmfSecPageTablesSize)
+
+ %define TDX_EXTRA_PAGE_TABLE_BASE FixedPcdGet32 (PcdOvmfSecGhcbPageTableBase)
+ %define TDX_EXTRA_PAGE_TABLE_SIZE FixedPcdGet32 (PcdOvmfSecGhcbPageTableSize)
+ %define TDX_PT_ADDR(Offset) (TDX_EXTRA_PAGE_TABLE_BASE + (Offset))
+
+ %define TDX_WORK_AREA_PAGELEVEL5 (FixedPcdGet32 (PcdOvmfWorkAreaBase) + 4)
+ %define TDX_WORK_AREA_PGTBL_READY (FixedPcdGet32 (PcdOvmfWorkAreaBase) + 5)
+ %define TDX_WORK_AREA_GPAW (FixedPcdGet32 (PcdOvmfWorkAreaBase) + 8)
+
%define PT_ADDR(Offset) (FixedPcdGet32 (PcdOvmfSecPageTablesBase) + (Offset))

+ %define OVMF_WORK_AREA_BASE (FixedPcdGet32 (PcdOvmfWorkAreaBase))
+ %define OVMF_WORK_AREA_SIZE (FixedPcdGet32 (PcdOvmfWorkAreaSize))
+
%define GHCB_PT_ADDR (FixedPcdGet32 (PcdOvmfSecGhcbPageTableBase))
%define GHCB_BASE (FixedPcdGet32 (PcdOvmfSecGhcbBase))
%define GHCB_SIZE (FixedPcdGet32 (PcdOvmfSecGhcbSize))
%define WORK_AREA_GUEST_TYPE (FixedPcdGet32 (PcdOvmfWorkAreaBase))
+ %define WORK_AREA_GUEST_SUBTYPE (FixedPcdGet32 (PcdOvmfWorkAreaBase) + 1)
%define SEV_ES_WORK_AREA (FixedPcdGet32 (PcdSevEsWorkAreaBase))
%define SEV_ES_WORK_AREA_RDRAND (FixedPcdGet32 (PcdSevEsWorkAreaBase) + 8)
%define SEV_ES_WORK_AREA_ENC_MASK (FixedPcdGet32 (PcdSevEsWorkAreaBase) + 16)
%define SEV_ES_VC_TOP_OF_STACK (FixedPcdGet32 (PcdOvmfSecPeiTempRamBase) + FixedPcdGet32 (PcdOvmfSecPeiTempRamSize))
-%include "Ia32/Flat32ToFlat64.asm"
-%include "Ia32/AmdSev.asm"
-%include "Ia32/PageTables64.asm"
+
+ %include "X64/IntelTdxMetadata.asm"
+ %include "Ia32/Flat32ToFlat64.asm"
+ %include "Ia32/AmdSev.asm"
+ %include "Ia32/PageTables64.asm"
+ %include "Ia32/IntelTdx.asm"
%endif

%include "Ia16/Real16ToFlat32.asm"
@@ -92,5 +132,6 @@
%define SEV_LAUNCH_SECRET_SIZE FixedPcdGet32 (PcdSevLaunchSecretSize)
%define SEV_FW_HASH_BLOCK_BASE FixedPcdGet32 (PcdQemuHashTableBase)
%define SEV_FW_HASH_BLOCK_SIZE FixedPcdGet32 (PcdQemuHashTableSize)
+ %define OVMF_IMAGE_SIZE_IN_KB FixedPcdGet32 (PcdOvmfImageSizeInKb)
%include "Ia16/ResetVectorVtf0.asm"

diff --git a/OvmfPkg/ResetVector/X64/IntelTdxMetadata.asm b/OvmfPkg/ResetVector/X64/IntelTdxMetadata.asm
new file mode 100644
index 000000000000..ce92795851b2
--- /dev/null
+++ b/OvmfPkg/ResetVector/X64/IntelTdxMetadata.asm
@@ -0,0 +1,110 @@
+;------------------------------------------------------------------------------
+; @file
+; Tdx Virtual Firmware metadata
+;
+; Copyright (c) 2021, Intel Corporation. All rights reserved.<BR>
+; SPDX-License-Identifier: BSD-2-Clause-Patent
+;
+;------------------------------------------------------------------------------
+
+BITS 64
+
+%define TDX_METADATA_SECTION_TYPE_BFV 0
+%define TDX_METADATA_SECTION_TYPE_CFV 1
+%define TDX_METADATA_SECTION_TYPE_TD_HOB 2
+%define TDX_METADATA_SECTION_TYPE_TEMP_MEM 3
+%define TDX_METADATA_VERSION 1
+%define TDX_METADATA_ATTRIBUTES_EXTENDMR 0x00000001
+
+ALIGN 16
+TIMES (15 - ((TdxGuidedStructureEnd - TdxGuidedStructureStart + 15) % 16)) DB 0
+
+TdxGuidedStructureStart:
+
+;
+; TDVF meta data
+;
+TdxMetadataGuid:
+ DB 0xf3, 0xf9, 0xea, 0xe9, 0x8e, 0x16, 0xd5, 0x44
+ DB 0xa8, 0xeb, 0x7f, 0x4d, 0x87, 0x38, 0xf6, 0xae
+
+_Descriptor:
+ DB 'T','D','V','F' ; Signature
+ DD TdxGuidedStructureEnd - _Descriptor ; Length
+ DD TDX_METADATA_VERSION ; Version
+ DD (TdxGuidedStructureEnd - _Descriptor - 16)/32 ; Number of sections
+
+_Bfv:
+ DD TDX_BFV_RAW_DATA_OFFSET
+ DD TDX_BFV_RAW_DATA_SIZE
+ DQ TDX_BFV_MEMORY_BASE
+ DQ TDX_BFV_MEMORY_SIZE
+ DD TDX_METADATA_SECTION_TYPE_BFV
+ DD TDX_METADATA_ATTRIBUTES_EXTENDMR
+
+_Cfv:
+ DD TDX_CFV_RAW_DATA_OFFSET
+ DD TDX_CFV_RAW_DATA_SIZE
+ DQ TDX_CFV_MEMORY_BASE
+ DQ TDX_CFV_MEMORY_SIZE
+ DD TDX_METADATA_SECTION_TYPE_CFV
+ DD 0
+
+_Stack:
+ DD 0
+ DD 0
+ DQ TDX_STACK_MEMORY_BASE
+ DQ TDX_STACK_MEMORY_SIZE
+ DD TDX_METADATA_SECTION_TYPE_TEMP_MEM
+ DD 0
+
+_Heap:
+ DD 0
+ DD 0
+ DQ TDX_HEAP_MEMORY_BASE
+ DQ TDX_HEAP_MEMORY_SIZE
+ DD TDX_METADATA_SECTION_TYPE_TEMP_MEM
+ DD 0
+
+_MailBox:
+ DD 0
+ DD 0
+ DQ TDX_MAILBOX_MEMORY_BASE
+ DQ TDX_MAILBOX_MEMORY_SIZE
+ DD TDX_METADATA_SECTION_TYPE_TEMP_MEM
+ DD 0
+
+_OvmfWorkarea:
+ DD 0
+ DD 0
+ DQ OVMF_WORK_AREA_BASE
+ DQ OVMF_WORK_AREA_SIZE
+ DD TDX_METADATA_SECTION_TYPE_TEMP_MEM
+ DD 0
+
+_TdHob:
+ DD 0
+ DD 0
+ DQ TDX_HOB_MEMORY_BASE
+ DQ TDX_HOB_MEMORY_SIZE
+ DD TDX_METADATA_SECTION_TYPE_TD_HOB
+ DD 0
+
+_TdxPageTable:
+ DD 0
+ DD 0
+ DQ TDX_EXTRA_PAGE_TABLE_BASE
+ DQ TDX_EXTRA_PAGE_TABLE_SIZE
+ DD TDX_METADATA_SECTION_TYPE_TEMP_MEM
+ DD 0
+
+_OvmfPageTable:
+ DD 0
+ DD 0
+ DQ OVMF_PAGE_TABLE_BASE
+ DQ OVMF_PAGE_TABLE_SIZE
+ DD TDX_METADATA_SECTION_TYPE_TEMP_MEM
+ DD 0
+
+TdxGuidedStructureEnd:
+ALIGN 16
--
2.29.2.windows.2


Gerd Hoffmann

Hi,

> _TdxPageTable:
> If 5-level page table is supported (GPAW is 52), a top level page
> directory pointers (1 * 256TB entry) is generated in this page.
> _OvmfPageTable:
> Initial page table for standard Ovmf.
Hmm, isn't 5-level paging independent from TDX? Why mix the two?

I think a top level page directory should be added to the standard ovmf
initial page tables instead, and setting up 5-level paging should not
happen in tdx-specific code.

take care,
Gerd


Min Xu

On Monday, August 30, 2021 3:41 PM, Gerd Hoffmann wrote:
> Hi,
>
> > _TdxPageTable:
> > If 5-level page table is supported (GPAW is 52), a top level page
> > directory pointers (1 * 256TB entry) is generated in this page.
> > _OvmfPageTable:
> > Initial page table for standard Ovmf.
> Hmm, isn't 5-level paging independent from TDX? Why mix the two?
>
> I think a top level page directory should be added to the standard ovmf initial
> page tables instead, and setting up 5-level paging should not happen in
> tdx-specific code.
In current Ovmf implementation (OvmfPkg/ResetVector/Ia32/PageTables64.asm)
there are 6 pages reserved for initial page tables. It doesn't support 5-level paging.

TDX supports GPAW 48 and 52. If GPAW is 52, we need an extra page to hold the top
level page directory pointers (1 * 256TB entry).

This TDX extra page reuses the memory region defined by PcdOvmfSecGhcbPageTableBase
in MEMFD, because this memory region (PcdOvmfSecGhcbPageTableBase) will not
be consumed by SEV code in a TDX guest.
Thanks!
Min


Gerd Hoffmann

On Tue, Aug 31, 2021 at 03:09:08AM +0000, Xu, Min M wrote:
> On Monday, August 30, 2021 3:41 PM, Gerd Hoffmann wrote:
> > Hi,
> >
> > > _TdxPageTable:
> > > If 5-level page table is supported (GPAW is 52), a top level page
> > > directory pointers (1 * 256TB entry) is generated in this page.
> > > _OvmfPageTable:
> > > Initial page table for standard Ovmf.
> > Hmm, isn't 5-level paging independent from TDX? Why mix the two?
> >
> > I think a top level page directory should be added to the standard ovmf initial
> > page tables instead, and setting up 5-level paging should not happen in
> > tdx-specific code.
> In current Ovmf implementation (OvmfPkg/ResetVector/Ia32/PageTables64.asm)
> there are 6 pages reserved for initial page tables. It doesn't support 5-level paging.
Sure. And I think we should add proper 5-level paging support to the
current ovmf implementation instead of adding hacks to the tdx code.

take care,
Gerd


Min Xu

On August 31, 2021 1:35 PM, Gerd Hoffmann wrote:
> On Tue, Aug 31, 2021 at 03:09:08AM +0000, Xu, Min M wrote:
> > On Monday, August 30, 2021 3:41 PM, Gerd Hoffmann wrote:
> > > Hi,
> > >
> > > > _TdxPageTable:
> > > > If 5-level page table is supported (GPAW is 52), a top level page
> > > > directory pointers (1 * 256TB entry) is generated in this page.
> > > > _OvmfPageTable:
> > > > Initial page table for standard Ovmf.
> > > Hmm, isn't 5-level paging independent from TDX? Why mix the two?
> > >
> > > I think a top level page directory should be added to the standard
> > > ovmf initial page tables instead, and setting up 5-level paging
> > > should not happen in tdx-specific code.
> > In current Ovmf implementation
> > (OvmfPkg/ResetVector/Ia32/PageTables64.asm)
> > there are 6 pages reserved for initial page tables. It doesn't support 5-level
> > paging.
>
> Sure. And I think we should add proper 5-level paging support to the current
> ovmf implementation instead of adding hacks to the tdx code.
My understanding is that we should first add 5-level paging support in OVMF, right?
I am planning to add 5-level paging in OvmfPkgX64.dsc. Any comments?

> take care,
> Gerd





Gerd Hoffmann

Hi,

> > Sure. And I think we should add proper 5-level paging support to the current
> > ovmf implementation instead of adding hacks to the tdx code.
> My understanding is that we should first add 5-level paging support in OVMF, right?
Well, the page table setup should be in common code not tdx code as
5-level paging isn't something tdx-specific.

I'd suggest to add this to OvmfPkg/ResetVector/Ia32/PageTables64.asm.
Reserve one more page, setup the tables for 5-level paging by inserting
a level 5 page directory.

When using 5-level paging let cr3 point to the first page (level 5
pagedir), when using 4-level paging let cr3 point to the second page
(level 4 pagedir).
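The suggested layout can be sketched in C as follows. This assumes the level-5 directory occupies the page immediately before the existing level-4 directory (page 0 and page 1 of the reserved area); the function names are hypothetical, SKETCH_PAGE_SIZE is a local constant, and `attr` stands for whatever entry attribute bits the real code uses (for example present | read/write = 0x3). The LA57 capability check uses CPUID.(EAX=07H, ECX=0):ECX[16], as mentioned in the follow-up in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000u

/* CPUID.(EAX=07H, ECX=0):ECX[16] reports LA57 (5-level paging) support. */
static bool la57_supported(uint32_t cpuid7_ecx)
{
  return (cpuid7_ecx >> 16) & 1u;
}

/* Page 0 holds the level-5 directory, page 1 the level-4 directory. */
static uint32_t select_cr3(uint32_t page_tables_base, bool use_la57)
{
  return use_la57 ? page_tables_base : page_tables_base + SKETCH_PAGE_SIZE;
}

/* The single level-5 entry points at the level-4 directory in page 1. */
static uint64_t make_pml5e(uint32_t page_tables_base, uint64_t attr)
{
  return ((uint64_t)page_tables_base + SKETCH_PAGE_SIZE) | attr;
}
```

With this layout the 4-level tables are built exactly as today, and enabling 5-level paging only changes CR3 and CR4.LA57, which keeps the non-TDX path unaffected.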

Can be part of this patch series, just make it a separate patch for
easier review.

Whether we should enable 5-level paging even in non-tdx mode, or use
5-level paging only with tdx, is a separate question. We can continue to
use 4-level paging in non-tdx mode for now and discuss that later.

I'm not sure what implications this would have for booting older
kernels, when handing over control to an OS kernel without 5-level paging
support while 5-level paging is enabled (a non-issue for tdx, as this requires
a new tdx-aware guest kernel anyway ...).

take care,
Gerd


Min Xu
 

On September 2, 2021 3:18 PM, Gerd Hoffmann wrote:
Hi,

Sure. And I think we should add proper 5-level paging support to
the current ovmf implementation instead of adding hacks to the tdx code.
My understanding is that we should first add 5-level paging support in
OVMF, right?

Well, the page table setup should be in common code, not tdx code, as 5-level
paging isn't something tdx-specific.
Agree.

I'd suggest to add this to OvmfPkg/ResetVector/Ia32/PageTables64.asm.
Reserve one more page and set up the tables for 5-level paging by inserting a
level 5 page directory.
In the current patch, a page reserved in MEMFD (defined by PcdOvmfSecGhcbPageTableBase)
is used as the level 5 page directory.
Now one new page will be reserved in MEMFD to hold the level 5 page directory, like below:
0x00C000|0x001000
gUefiOvmfPkgTokenSpaceGuid.PcdOvmfSecGhcbBackupBase|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfSecGhcbBackupSize

+0x00D000|0x001000
+gUefiOvmfPkgTokenSpaceGuid.PcdOvmfPml5Base|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfPml5Size

When using 5-level paging let cr3 point to the first page (level 5 pagedir),
when using 4-level paging let cr3 point to the second page (level 4 pagedir).
Yes. CPUID.(EAX=07H, ECX=0):ECX[bit 16] will be used to check if 5-level paging
is supported.
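The check above reduces to testing one bit of the ECX value returned by CPUID leaf 07H, sub-leaf 0. This sketch takes that ECX value as a parameter so it runs anywhere; obtaining the value (the `cpuid` instruction in the reset vector, or a compiler intrinsic) is left out, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=07H, ECX=0):ECX[bit 16] is the LA57 flag (5-level paging). */
#define CPUID7_ECX_LA57_BIT 16u

/*
 * Hypothetical helper: takes the ECX value already returned by CPUID
 * leaf 07H, sub-leaf 0, and reports whether 5-level paging is supported.
 */
static bool cpu_supports_la57(uint32_t cpuid7_ecx)
{
    return (cpuid7_ecx & (UINT32_C(1) << CPUID7_ECX_LA57_BIT)) != 0;
}
```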

Can be part of this patch series, just make it a separate patch for easier
review.
Sure.

Whether we should enable 5-level paging even in non-tdx mode, or use
5-level paging only with tdx, is a separate question. We can continue to use
4-level paging in non-tdx mode for now and discuss that later.
Agree.

I'm not sure what implications this would have for booting older kernels,
when handing over control to an OS kernel without 5-level paging support while
5-level paging is enabled (a non-issue for tdx, as this requires a new tdx-aware
guest kernel anyway ...).
Thanks!
Min


Yao, Jiewen
 

Hi Min/Gerd,
I think we have multiple ways to enable 5-level paging.

1) We do not switch to 5-level paging in the initial page tables in the reset vector.
We can switch from 4-level to 5-level later, when permanent memory is available.
We don't need to change the flash layout.

2) We can enable 5-level paging in the initial page tables.
2.1) We can enable 5-level paging with 1G paging support.
We don't need to change the flash layout. Only 3 pages are needed (12K).
I don't know whether there is a real case where a CPU supports 5-level paging but not 1G pages.

2.2) We can still enable 5-level paging with 2M pages.
2.2.1) We can change the flash layout to increase the page-table area from 6 pages (24K) to 7 pages (28K).
So the CR3 for 5-level is the same as the CR3 for 4-level.

2.2.2) We don't change the flash layout but steal another page from somewhere else (PcdOvmfPml5Base).
That means the CR3 for 5-level is different from the CR3 for 4-level.
Personally, I don't like the idea of creating PcdOvmfPml5Base/Size.
The APs MUST then check for 5-level vs 4-level to get the right CR3 location. That is tricky and unnecessary.

In the current patch, 2.2.2) is used.

I suggest we also evaluate option 1), 2.1) and 2.2.1).

If changing layout is NOT a concern then we can do 2.2.1).
If we don't want to change layout, we can do 2.1) and fall back to 1).
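The page counts in these options follow from the paging-structure sizes. Each table is one 4 KiB page of 512 entries, a page directory of 2 MiB entries maps 1 GiB, and a PDPT of 1 GiB entries maps 512 GiB. This sketch assumes, as the 6-page figure implies, that the initial tables identity-map the first 4 GiB; the helper name is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GiB (1ull << 30)
#define MiB (1ull << 20)

/* One 4 KiB table holds 512 eight-byte entries. */
#define ENTRIES_PER_TABLE 512ull
/* A page directory full of 2 MiB entries maps 512 * 2 MiB = 1 GiB. */
#define BYTES_PER_PD (ENTRIES_PER_TABLE * 2 * MiB)

/*
 * Hypothetical helper: number of 4 KiB pages needed to identity-map the
 * range [0, bytes). Limited to 512 GiB so that one PML4 entry and one
 * PDPT are enough.
 */
static unsigned page_table_pages(uint64_t bytes, unsigned levels, bool use_1g_pages)
{
    assert(levels == 4 || levels == 5);
    assert(bytes > 0 && bytes <= 512 * GiB);

    unsigned pages = 2;                      /* PML4 + one PDPT */
    if (!use_1g_pages)                       /* 2 MiB pages need PDs */
        pages += (unsigned)((bytes + BYTES_PER_PD - 1) / BYTES_PER_PD);
    if (levels == 5)
        pages += 1;                          /* PML5 above the PML4 */
    return pages;
}
```

For 4 GiB this gives 6 pages (24K) for 4-level with 2M pages, 7 pages (28K) for option 2.2.1, and 3 pages (12K) for option 2.1.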



Thank you
Yao Jiewen

-----Original Message-----
From: Xu, Min M <min.m.xu@intel.com>
Sent: Thursday, September 2, 2021 3:49 PM
To: kraxel@redhat.com
Cc: devel@edk2.groups.io; Ard Biesheuvel <ardb+tianocore@kernel.org>; Justen,
Jordan L <jordan.l.justen@intel.com>; Brijesh Singh <brijesh.singh@amd.com>;
Erdem Aktas <erdemaktas@google.com>; James Bottomley
<jejb@linux.ibm.com>; Yao, Jiewen <jiewen.yao@intel.com>; Tom Lendacky
<thomas.lendacky@amd.com>
Subject: RE: [edk2-devel] [PATCH V5 2/2] OvmfPkg/ResetVector: Enable Intel
TDX in ResetVector of Ovmf



Gerd Hoffmann
 

On Fri, Sep 03, 2021 at 03:03:50AM +0000, Yao, Jiewen wrote:
Hi Min/Gerd,
I think we have multiple ways to enable 5-level paging.

1) We do not switch to 5-level paging in the initial page tables in the reset vector.
We can switch from 4-level to 5-level later, when permanent memory is available.
We don't need to change the flash layout.
Does that work with tdx?

I had the impression that ovmf can't choose whether it uses 4-level or
5-level paging in case tdx is enabled, but instead has to use what the
tdx firmware (or hardware?) dictates, and that this is the reason why we
have to deal with that in the reset vector in the first place.

But maybe I'm wrong here.

If we can use 4-level paging initially, then we surely should go for
option (1) and simply not touch the reset vector's paging code.

2) We can enable 5-level paging in the initial page tables.
2.1) We can enable 5-level paging with 1G paging support.
We don't need to change the flash layout. Only 3 pages are needed (12K).
I don't know whether there is a real case where a CPU supports 5-level paging but not 1G pages.

2.2) We can still enable 5-level paging with 2M pages.
2.2.1) We can change the flash layout to increase the page-table area from 6 pages (24K) to 7 pages (28K).
So the CR3 for 5-level is the same as the CR3 for 4-level.

2.2.2) We don't change the flash layout but steal another page from somewhere else (PcdOvmfPml5Base).
That means the CR3 for 5-level is different from the CR3 for 4-level.
Personally, I don't like the idea of creating PcdOvmfPml5Base/Size.
The APs MUST then check for 5-level vs 4-level to get the right CR3 location. That is tricky and unnecessary.

In the current patch, 2.2.2) is used.

I suggest we also evaluate option 1), 2.1) and 2.2.1).
My idea is 2.2.1 with a fixed, 5-level layout.
Then use 4-level-cr3 == 5-level-cr3 + PAGE_SIZE

2.1 looks good too.

take care,
Gerd


Min Xu
 

On September 3, 2021 1:39 PM, Gerd Hoffmann wrote:
On Fri, Sep 03, 2021 at 03:03:50AM +0000, Yao, Jiewen wrote:
Hi Min/Gerd,
I think we have multiple ways to enable 5-level paging.

1) We do not switch to 5-level paging in the initial page tables in the reset vector.
We can switch from 4-level to 5-level later, when permanent memory is
available.
We don't need to change the flash layout.
Does that work with tdx?

I had the impression that ovmf can't choose whether it uses 4-level or 5-level
paging in case tdx is enabled, but instead has to use what the tdx firmware (or
hardware?) dictates, and that this is the reason why we have to deal with that
in the reset vector in the first place.

But maybe I'm wrong here.

If we can use 4-level paging initially, then we surely should go for option (1)
and simply not touch the reset vector's paging code.
After a PoC, I find this option is not a good one. Although the reset vector is not touched, it requires tricky changes in DxeIpl. To switch from 4-level to 5-level paging, the code must first switch from 64-bit long mode to 32-bit protected mode, turn off paging, clear IA32_EFER.LME, and then set CR4.LA57. The tricky thing is that in TDX, IA32_EFER cannot be changed. MdeModulePkg/.../DxeIpl is widely used, and making such changes to it is high risk.

2) We can enable 5-level paging in the initial page tables.
2.1) We can enable 5-level paging with 1G paging support.
We don't need to change the flash layout. Only 3 pages are needed (12K). I
don't know whether there is a real case where a CPU supports 5-level paging
but not 1G pages.
According to the Intel SDM, Volume 3, Section 4.1.1:
"6. Processors that support 4-level paging or 5-level paging do not necessarily support 1-GByte page; see Section 4.1.4"
So option 2.1 is not feasible.
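Because 1-GByte pages are optional even on 5-level-capable processors, option 2.1 could only be used after checking the Page1GB flag, CPUID.80000001H:EDX[bit 26], and would still need a 2M-page fallback. A sketch of that check, taking the EDX value as a parameter (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.80000001H:EDX[bit 26] reports support for 1-GByte pages. */
#define CPUID_80000001_EDX_PAGE1GB_BIT 26u

/*
 * Hypothetical helper: takes the EDX value already returned by CPUID
 * leaf 80000001H and reports whether 1-GByte pages are supported.
 */
static bool cpu_supports_1g_pages(uint32_t cpuid_80000001_edx)
{
    return (cpuid_80000001_edx & (UINT32_C(1) << CPUID_80000001_EDX_PAGE1GB_BIT)) != 0;
}
```

Since the 2M-page tables are needed anyway for CPUs where this returns false, a single fixed 2M-page layout (option 2.2.1) avoids carrying two page-table variants.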

2.2) We can still enable 5-level paging with 2M pages.
2.2.1) We can change the flash layout to increase the page-table area from
6 pages (24K) to 7 pages (28K).
So the CR3 for 5-level is the same as the CR3 for 4-level.

2.2.2) We don't change the flash layout but steal another page from
somewhere else (PcdOvmfPml5Base). That means the CR3 for 5-level is
different from the CR3 for 4-level.
Personally, I don't like the idea of creating PcdOvmfPml5Base/Size. The
APs MUST then check for 5-level vs 4-level to get the right CR3 location.
That is tricky and unnecessary.

In the current patch, 2.2.2) is used.

I suggest we also evaluate option 1), 2.1) and 2.2.1).
My idea is 2.2.1 with a fixed, 5-level layout.
Then use 4-level-cr3 == 5-level-cr3 + PAGE_SIZE
Agree. 5-level-cr3 = PT_ADDR (0), 4-level-cr3 = PT_ADDR (0x1000)
2.2.1 is preferred.
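With 5-level-cr3 = PT_ADDR (0) and 4-level-cr3 = PT_ADDR (0x1000), the extra PML5 page needs only one present entry, pointing at the existing PML4 page. This is a sketch under those assumptions. The helper name is hypothetical, and the attribute bits (Present and R/W only) are an assumption; the existing asm may set others, such as Accessed.

```c
#include <assert.h>
#include <stdint.h>

#define PT_PAGE_SIZE 0x1000ull
#define PTE_PRESENT  (1ull << 0)   /* P   */
#define PTE_RW       (1ull << 1)   /* R/W */

/*
 * Hypothetical helper: value for entry 0 of the PML5 page.
 * Entry 0 covers linear addresses below 256 TiB, which includes the whole
 * identity-mapped range, so it is the only entry that must be present.
 * The attribute bits (P | R/W only) are an assumption.
 */
static uint64_t pml5_entry0(uint64_t pt_base)
{
    uint64_t pml4 = pt_base + PT_PAGE_SIZE;  /* == 4-level-cr3 */

    assert((pml4 & (PT_PAGE_SIZE - 1)) == 0);  /* tables are 4 KiB aligned */
    return pml4 | PTE_RW | PTE_PRESENT;
}
```

For an example base of 0x800000, the entry is 0x801003: the PML4 address plus the two low attribute bits.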

2.1 looks good too.
As I explained above, 2.1 is not feasible.
I will use 2.2.1 to implement 5-level paging in OvmfPkgX64.

Thanks!
Min


Gerd Hoffmann
 

Hi,

If we can use 4-level paging initially, then we surely should go for option (1)
and simply not touch the reset vector's paging code.
After a PoC, I find this option is not a good one. Although the reset
vector is not touched, it requires tricky changes in DxeIpl. To switch
from 4-level to 5-level paging, the code must first switch from 64-bit
long mode to 32-bit protected mode, turn off paging, clear
IA32_EFER.LME, and then set CR4.LA57. The tricky thing is that in TDX,
IA32_EFER cannot be changed. MdeModulePkg/.../DxeIpl is widely used,
and making such changes to it is high risk.
Ok. One more question: Do we have to use 5-level paging at all?

The only reason I could see is accepting memory with a gpa above 4-level
address space. But with the longer-term plan to support lazy acceptance
(and passing unaccepted memory ranges to the guest kernel) this reason
goes away.
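The limit Gerd refers to can be made concrete. Under an identity map, a GPA is reachable only if it is also a usable linear address, so only the lower canonical half counts. That half ends at 2^47 (128 TiB) with 4-level paging and at 2^56 with 5-level paging, while a 52-bit GPA space extends to 2^52 (4 PiB). A sketch of that bound (helper names hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Exclusive upper bound for a GPA reachable through an identity map
 * (linear address == guest-physical address). Only the lower canonical
 * half can be used: with 48-bit (4-level) or 57-bit (5-level) linear
 * addresses, setting the top implemented bit would require all higher
 * bits to be set too.
 */
static uint64_t identity_map_limit(unsigned levels)
{
    assert(levels == 4 || levels == 5);
    return levels == 4 ? (1ull << 47)   /* 128 TiB */
                       : (1ull << 56);  /*  64 PiB */
}

static bool gpa_identity_mappable(uint64_t gpa, unsigned levels)
{
    return gpa < identity_map_limit(levels);
}
```

Memory that firmware leaves unaccepted never has to be touched through these mappings, which is why lazy acceptance removes this particular need for 5-level paging.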

So I think we could just leave it to the guest kernel to deal with the
switch from 4-level to 5-level paging. Or am I missing something?

take care,
Gerd


Erdem Aktas
 

On Mon, Aug 30, 2021 at 5:35 AM Min Xu <min.m.xu@intel.com> wrote:
+;
+; Check if it is Intel Tdx
+;
+; Modified: EAX, EBX, ECX, EDX
+;
+; If it is Intel Tdx, EAX is zero
+; If it is not Intel Tdx, EAX is non-zero
+;
+IsTdx:
IsTdx returns 0 when TDX is enabled in CPUID, but IsTdxEnabled returns 1
when TDX is enabled. Is this intentional?

Here is how IsTdxEnabled is defined:
; If TDX is enabled then EAX will be 1
; If TDX is disabled then EAX will be 0.
;
IsTdxEnabled:

+ ;
+ ; In Td guest, BSP/AP shares the same entry point
+ ; BSP builds up the page table, while APs shouldn't do the same task.
+ ; Instead, APs just leverage the page table which is built by BSP.
+ ; APs will wait until the page table is ready.
+ ;
+TdxApWait:
+ cmp byte[TDX_WORK_AREA_PGTBL_READY], 0
+ je TdxApWait
+ jmp ExitInitTdxWorkarea
Don't we need a memory fence before je TdxApWait?
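At the instruction level, x86 does not reorder loads with other loads or stores with older stores. So in the asm loop, a byte flag that the BSP writes only after finishing the tables is generally observed in order. A compiled language additionally lets the compiler reorder accesses, so the same handoff needs an explicit release/acquire pair. Below is a portable C11 sketch of that pattern, with threads standing in for BSP and AP. The names `page_tables`, `pgtbl_ready`, `bsp_build` and `ap_wait` are hypothetical stand-ins for the page-table area and TDX_WORK_AREA_PGTBL_READY, not the patch's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the page-table area and the ready flag. */
static uint64_t page_tables[4];
static atomic_uchar pgtbl_ready;

/* BSP: build the tables, then publish them with a release store. */
static void *bsp_build(void *arg)
{
    (void)arg;
    for (int i = 0; i < 4; i++)
        page_tables[i] = (0x1000ull * (uint64_t)(i + 1)) | 0x3;
    atomic_store_explicit(&pgtbl_ready, 1, memory_order_release);
    return NULL;
}

/* AP: spin with acquire loads; once the flag is seen, the tables are visible. */
static void *ap_wait(void *arg)
{
    uint64_t *out = arg;
    while (atomic_load_explicit(&pgtbl_ready, memory_order_acquire) == 0)
        ;  /* a real AP loop would add a pause hint here */
    *out = page_tables[3];
    return NULL;
}

/* Start the AP first so it actually spins, then the BSP; return what the AP saw. */
static uint64_t run_handoff(void)
{
    pthread_t bsp, ap;
    uint64_t seen = 0;

    atomic_store(&pgtbl_ready, 0);
    for (int i = 0; i < 4; i++)
        page_tables[i] = 0;
    pthread_create(&ap, NULL, ap_wait, &seen);
    pthread_create(&bsp, NULL, bsp_build, NULL);
    pthread_join(bsp, NULL);
    pthread_join(ap, NULL);
    return seen;
}
```

The AP always reads the completed entry (0x4003 here), never a zero left over from before the BSP's writes.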


+; Check TDX features, Non-TDX or TDX-BSP or TDX-APs?
+;
+; By design TDX BSP is reponsible for inintializing the PageTables.
s/reponsible/responsible
s/inintializing/initializing


Yao, Jiewen
 

I think it is OK to always enable 4-level paging for now.

5-level paging enabling is NOT super critical for TDX enabling at this moment, as long as we can boot the OS kernel. I am fine with enabling it later, in a separate patch.

Let's cross the bridge when we come to it.

Thank you
Yao Jiewen

-----Original Message-----
From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Gerd
Hoffmann
Sent: Friday, September 10, 2021 4:20 PM
To: Xu, Min M <min.m.xu@intel.com>
Cc: Yao, Jiewen <jiewen.yao@intel.com>; devel@edk2.groups.io; Ard
Biesheuvel <ardb+tianocore@kernel.org>; Justen, Jordan L
<jordan.l.justen@intel.com>; Brijesh Singh <brijesh.singh@amd.com>; Erdem
Aktas <erdemaktas@google.com>; James Bottomley <jejb@linux.ibm.com>;
Tom Lendacky <thomas.lendacky@amd.com>
Subject: Re: [edk2-devel] [PATCH V5 2/2] OvmfPkg/ResetVector: Enable Intel
TDX in ResetVector of Ovmf
