[edk2-devel] RFC: design review for TDVF in OVMF


Yao, Jiewen
 

Hi, All
We plan to do a design review for TDVF in OVMF package.


The TDVF Design slides for TinaoCore Design Review Meeting (Jun 11) is now available in blow link: https://edk2.groups.io/g/devel/files/Designs/2021/0611.

The Bugzilla is https://bugzilla.tianocore.org/show_bug.cgi?id=3429



You can have an offline review first. You comments will be warmly welcomed and we will continuously update the slides based on the feedbacks.



Thank you

Yao Jiewen


Laszlo Ersek
 

On 06/03/21 15:51, Yao, Jiewen wrote:
Hi, All
We plan to do a design review for TDVF in OVMF package.


The TDVF Design slides for TinaoCore Design Review Meeting (Jun 11) is
now available in blow link:
https://edk2.groups.io/g/devel/files/Designs/2021/0611.

The Bugzilla is https://bugzilla.tianocore.org/show_bug.cgi?id=3429



You can have an offline review first. You comments will be warmly
welcomed and we will continuously update the slides based on the
feedbacks.
Resending my earlier comments in this mailing list thread, with the
feedback inserted at the proper spots that has been given in the
off-list thread since then:


*** Slides 4, 6, 7: the "one binary requirement".

(1) The presentation refers to "OvmfPkgX64.dsc" as "the one" on slide#4,
but then the explanation for the requirement, given on slide 7, speaks
about "common attestation interface".

I think we have a misunderstanding here. The "OvmfPkgX64.dsc" platform
indeed contains SEV, SEV-ES, and (in the future, will contain) SEV-SNP
support. In that sense, adding TDX support to the same platform should
be (hopefully) possible, at the cost of ugly gymnastics in the reset
vector.

But "OvmfPkgX64.dsc" is *already* different from the remotely attested
OVMF platform, namely "OvmfPkg/AmdSev/AmdSevX64.dsc".

The latter has some additional modules (secret PEIM and DXE driver), it
has the Grub binary built in, and -- probably most importantly -- it
trusts host-originated information less than "OvmfPkgX64.dsc".

For example, "OvmfPkg/AmdSev/AmdSevX64.dsc" has a dedicated
PlatformBootManagerLib instance, one that ignores the non-volatile UEFI
variables Boot#### and BootOrder, and ignores (thus far) the fw_cfg
originated kernel/initrd/cmdline as well.

It remains an "area of research" to see what else should be removed from
the traditional host-guest integration (which integration is usually
desirable for management and convenience), in the remotely-attested boot
scenario. See e.g.
<https://bugzilla.tianocore.org/show_bug.cgi?id=3180>.

My point is that the "one binary" that the slide deck refers to (i.e.,
OvmfPkgX64.dsc) might prove OK for utilizing the Intel TDX *technology*
in itself. Simply enabling OVMF + a guest OS to boot in a TDX domain.

But "OvmfPkgX64.dsc" is *not* the "one binary" that is suitable for
remote attestation, which has a much broader scope, involving multiple
computers, networking, deployment, guest-owner/host-owner separation,
whatever. For the latter, we needed a separate platform anyway, even
with only SEV in mind; that's why "OvmfPkg/AmdSev/AmdSevX64.dsc" exists.


*** Slides 8-9: general boot flow -- TDVF; TDVF Flow

I'm likely missing a big bunch of background here, so just a few
questions:

(2) Not sure what RTMR is, but it's associated with "Enable TrustedBoot"
-- so is a virtual TPM a hard requirement?

... Back from slide 10: "TCG measurement and event log framework w/o
TPM" -- that's curious.

[response from Dave Gilbert:]
My reading of this is that the RTMR (and another set of similar
registers) are a TDX thing that is like the PCRs from a TPM but
without the rest of the TPM features; so you can do the one-way
measurement into the RTMRs just like you do into a TPM PCR, and the
measurements pop out somewhere in the TDX quote. Just like a TPM you
need the event log to make any sense of how the final hashed value
supposedly got to where it did.
[response from Erdem Aktas:]
+1 to David on this. TDX provides 2 kinds of measurement registers:
MRTDs and RTMRs
(https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-module-1eas-v0.85.039.pdf
section 10.1.2) . MRTDs are build-time measurement registers which are
updated when TDX is being created. Once TDX is finalized (before the
first run), the MRTDs are finalized and cannot be updated anymore. On
the other hand, while the TD is running, TD can extend RTMRs through
TDCALLs which will provide TPM PCR kind of capabilities.
... Back from slide 43: feel free to skip this now; I will comment in
more detail below.

(3) Prepare AcpiTable -- OVMF fetches ACPI tables from QEMU; is this a
new (firmware originated) table that is installed in addition, or does
it replace QEMU's tables?

... Ignore for now, will comment on the MADT stuff later.

(4) Does DMA management mean a different IOMMU protocol? That is going
to conflict with the SEV IOMMU protocol. Depexes in OVMF expect one or
zero IOMMU protocols to be present.

... Back from slide 40: feel free to skip this now; I'll comment on this
separately, below.

(5) Enumerate VirtIO -- virtio enumeration is PCI based on x86. But I
see no mention of PCI. If you mean VirtioMmioDeviceLib, that's no good,
because it does not support virtio-1.0, only virtio-0.9.5.

... Back from slide 42: I got my answer to this on slide 42, so don't
worry about this point.

(6) The PEI phase is skipped as a whole. I don't see how that can be
reasonably brought together with either "OvmfPkgX64.dsc" or
"OvmfPkg/AmdSev/AmdSevX64.dsc". I guess you can always modify SEC to
jump into DXE directly, but then why keep the PEI core and a bunch of
PEIMs in the firmware binary?

Also touched on by slide 9, TDVF Flow -- PEI is eliminated but SEC
becomes more heavy-weight.

Wouldn't this deserve a dedicated, separate platform DSC? The
8-bit/32-bit branching at the front of the reset vector is a smaller
complication in comparison.

Slide 6 references the mailing list archive:

https://edk2.groups.io/g/devel/topic/81969494#74319

and in that message, I wrote:

I'm doubtful that this is a unique problem ("just fix the reset
vector") the likes of which will supposedly never return during the
integration of SEV and TDX

See also:

https://listman.redhat.com/archives/edk2-devel-archive/2021-April/msg00784.html

where I said:

It's not lost on me that we're talking about ~3 instructions.

Let's keep a close eye on further "polymorphisms" that would require
hacks.

The first 9 slides in the presentation introduce much-much more
intrusive problems than the "polymorphism" of the reset vector. Would I
be correct to say that my concern in the above messages was right? I
think I was only given a fraction of the necessary information when I
was asked "do you agree 'one binary' is viable".

[response from Erdem Aktas:]
Let's not worry about this for now. We want the one binary solution
for practical reasons and also for simplicity. In the end, we want to
do what is right and good for everyone.
Those are legit concerns and I think what Intel is trying to do (sorry
for mind reading) is to discuss all those concerns and questions to
make the right decision. I really appreciate their effort on preparing
those slides and bringing it to the community to review.

I will also read your comments more carefully and provide my thoughts
on them.
Sorry for being a little slow on this.

*** Slide 10 -- Key impact to firmware

(7) "CPUs are left running in long mode after exiting firmware" -- what
kind of "pen" are we talking about? Does a HLT loop still work?

(8) "I/O, MMIO, MSR accesses are different" -- those are implemented by
low-level libraries in edk2; how can they be configured dynamically?

... Back from slide 53: I'll comment on slide 53 separately; ignore
this.


*** Slide 11 -- TDVF Image (1)

(9) CFV -- Configuration Firmware Volume (VarStore.fdf.inc),
containing SB keys -- how is this firmware volume populated (at build
time)? Is this a hexdump?

... Back from slide 16: it seems like CFV is a raw hexdump indeed; how
is that managed when keys change (at build time)?

(10) This slide (slide 11) basically describes an intrusive
reorganization of "OvmfPkgX64.fdf". I don't think I can agree to that.
While confidential computing is important, it is not relevant for many
users. Even if we don't cause outright regressions for existent setups,
the maintenance cost of the traditional OVMF platform will skyrocket.

The big bunch of areas that SEV-ES introduced to MEMFD is already a big
complication. I'd feel much better if we could isolate all that to a
dedicated "remote attested boot" firmware platform, and not risk the
functionality and maintenance of the traditional platform. I think this
ties in with my comment (1).

For example, seeing a configuration firmware volume (CFV) with secure
boot keys embedded, in the "usual" FDF, will confuse lots of people, me
included. In the traditional OVMF use case, we use a different method:
namely OvmfPkg/EnrollDefaultKeys, for "priming" a variable store
template, in the build environment.

Edk2 (and PI and UEFI) are definitely flexible enough for accommodating
TDX, but the existent, traditional OVMF platforms are a bad fit. In my
opinion.


*** Slide 12: TDVF Image (2)

(11) "Page table should support both 4-level and 5-level page table"

As a general development strategy, I would suggest building TDX support
in small, well-isolated layers. 5-level paging is not enabled (has never
been tested, to my knowledge) with OVMF on QEMU/KVM, regardless of
confidential computing, for starters. If 5-level paging is a strict
requirement for TDX, then it arguably needs to be implemented
independently of TDX, at first. So that the common edk2 architecture be
at least testable on QEMU/KVM with 5-level paging enabled.


*** Slide 13:

(12) My comment is that the GUID-ed structure chain already starts at a
fixed GPA (the "AMD SEV option"). Ordering between GUID-ed structures is
irrelevant, so any TDX-specific structures should be eligible for
appending to, prepending to, or intermixing with, other (possibly
SEV-specific) structures. There need not be a separate entry point, just
different GUIDS.

(13) Regarding "4G-0x20[0x10] is OVMF AP reset vector (used in OVMF
implementation)" -- I think this is a typo: this "AP reset vector" is
*not* used in OVMF. To my knowledge, it is a vestige from the UefiCpuPkg
reset vector. In OVMF, APs are booted via MpInitLib (in multiple
firmware phases), using INIT-SIPI-SIPI, and the reset vector for the
APs, posited through those IPIs, is prepared in low RAM.


*** Slides 14 through 16:

I consider these TDVF firmware image internals, implementation details
-- do whatever you need to do, just don't interfere with existing
platforms / use cases. See my comment (10) above.


*** Slides 17-21:

(14) Again, a very big difference from traditional OVMF: APs never enter
SEC in traditional OVMF. I assume this new functionality is part of
TdxStartupLib (from slide 18) -- will there be a Null instance of that?

Last week I posted a 43-part patch series to edk2-devel, for separating
out the dynamic Xen enlightenments from the IA32, IA32X64, X64
platforms, in favor of the dedicated OvmfXen platform. TDX seems to
bring in incomparably more complications than Xen, and the OvmfPkg
maintainers have found even the Xen complications troublesome in the
long term.

If I had had access to all this information when we first discussed "one
binary" on the mailing list, I'd have never agreed to "one binary". I'm
OK with attempting one firmware binary for "confidential computing", but
that "one platform" cannot be "OvmfPkgX64.dsc".

Even if I make a comparison with just the "technology" (not the
remotely-attested deployment) of SEV and SEV-ES, as it is included in
"OvmfPkgX64.dsc", TDX is hugely more complicated and intrusive than
that. SEV proved possible to integrate into existing modules, into the
existing boot flow, maybe through the addition of some new drivers (such
as a new IOMMU protocol implementation, and some "clever" depexes). But
we never had to restructure the FD layout, eliminate whole firmware
phases, or think about multiprocessing in the reset vector or the SEC
phase.

In order to bring an example from the ARM world, please note that
platforms that use a PEI phase, and platforms that don't, are distinct
platforms. In ArmVirtPkg, two examples are ArmVirtQemu and
ArmVirtQemuKernel. The latter does not include the PEI Core.


*** Slides 22 through 34:

(15) All these extra tasks and complications are perfectly fine, as long
as they exist peacefully *separately* from the traditional ("legacy")
OVMF platforms.

Honestly, in the virtual world, picking your firmware binary is easy.
The approach here reminds me of a physical firmware binary that includes
everything possible from "edk2-platforms/Features/Intel", just so it can
be deployed to any physical board imaginable. That's not how Intel
builds physical firmware, right? We have "edk2-platforms/Platform/Intel"
and "edk2-platforms/Silicon/Intel" with many-many separate DSC files.


*** Slide 35-36: DXE phase

(16) "Some DXE Drivers not allowed to load/start in Td guest -- Network
stack, RNG, ..."

Same comment as (several times) above. The Linuxboot project is a good
example for eliminating cruft from DXEFV (in fact, for eliminating most
of the DXE phase). In a TDX environment, why include drivers in the
firmware binary that are never used? Meanwhile, DXEFV in OVMF grows by a
MB every 1.5 years or so. Again, remove these drivers from the DSC/FDF
then, and it needs to be a separate platform.

(17) "Other DXE Phase drivers -- [...] AcpiPlatformDxe"

I'm not sure what this section is supposed to mean. Other DXE phase
drivers included, or excluded? Without AcpiPlatformDxe, the guest OS
will not see QEMU's ACPI content, and will almost certainly malfunction.

... Back from slide 48: ignore this for now, I'll comment in more detail
later.


*** Slide 37: DXE Core

(18) says "SMM is not supported in Td guest" -- how is the variable
store protected from direct hardware (pflash) access from the guest OS?
Without SMM, the guest OS need not go through gRT->SetVariable() to
update authenticated non-volatile UEFI variables, and that undermines
Secure Boot.

Note that, while SEV-ES has the same limitation wrt. SMM, the
"OvmfPkg/AmdSev/AmdSevX64.dsc" platform doesn't even include the Secure
Boot firmware feature. For another example, the OVMF firmware binary in
RHEL that's going to support SEV-ES is such a build of "OvmfPkgX64.dsc"
that does not include the Secure Boot feature either.

But in this TDX slide deck, Secure Boot certificates are embedded in the
CFV (configuration firmware volume) -- see slide 11 and slide 16 --,
which suggests that this platform does want secure boot.

... Back from slide 48: I'm going to make *additional* comments on this,
when I'm at slide 48, too.

The rest of this slide (slide 37) looks reasonable (generic DXE Core
changes -- possibly PI spec changes too).


*** Slides 38 through 39:

These seem reasonable (TdxDxe assumes some responsibilities of
OvmfPkg/PlatformPei)


*** Slides 40 through 42:

*If* you really can implement TDX support for the IOMMU driver *this*
surgically, then I'm OK with it. The general logic in the IOMMU driver
was truly difficult to write, and I'd be seriously concerned if those
parts would have to be modified. Customizing just the page
encryption/decryption primitives for TDX vs. SEV is OK.


*** Slides 43 through 47:

(19) Slide 46 and slide 47 are almost identical. Please consolidate them
into a single slide.

(20) the TPM2 infrastructure in edk2 is baroque (over-engineered), in my
opinion. It has so many layers that I can never keep them in mind. When
we added TPM support to OVMF, I required commit messages that would help
us recall the layering. In particular, please refer to commit
0c0a50d6b3ff ("OvmfPkg: include Tcg2Dxe module", 2018-03-09). Here's an
excerpt:

TPM 2 consumer driver
|
v
Tpm2DeviceLib class: Tpm2DeviceLibTcg2 instance
|
v
TCG2 protocol interface
|
v
TCG2 protocol provider: Tcg2Dxe.inf driver
|
v
Tpm2DeviceLib class: Tpm2DeviceLibRouter instance
|
v
NULL class: Tpm2InstanceLibDTpm instance
(via earlier registration)
|
v
TPM2 chip (actual hardware)

The slide deck says that EFI_TD_PROTOCOL is supposed to reuse the
EFI_TCG2_PROTOCOL definition. If that's the case, then why don't we push
the TDX specifics (more or less: the replacement of PCRs with RTMR) down
to the lowest possible level?

Can we have "Tpm2InstanceLibTdxRtmr", plugged into the same Tcg2Dxe.inf
driver?

If not, can we have a new TdTcg2Dxe.inf driver, but make it so that it
install the same protocol as before (EFI_TCG2_PROTOCOL -- same old
protocol GUID)? Then DxeTpmMeasurementLib doesn't have to change.

As long as there is *at most* one EFI_TCG2_PROTOCOL instance published
in the protocol database, DxeTpmMeasurementLib should be fine. In SEV*
guests, the standard Tcg2Dxe driver provides that protocol. In TDX
guests, TdTcg2Dxe.inf should provide the protocol. Arbitration between
the two can be implemented with the pattern seen in the following
commits:

1 05db0948cc60 EmbeddedPkg: introduce EDKII Platform Has ACPI GUID
2 786f476323a6 EmbeddedPkg: introduce PlatformHasAcpiLib
3 65a69b214840 EmbeddedPkg: introduce EDKII Platform Has Device Tree
GUID
4 2558bfe3e907 ArmVirtPkg: add PlatformHasAcpiDtDxe

The basic idea is that Tcg2Dxe can have a depex on a new "running in
SEV" protocol GUID, and TdTcg2Dxe can have a depex on a new "running in
TDX" protocol GUID. A separate platform driver can install the proper
GUID -- possibly *neither* of those GUIDs.

And, we don't have to change the depex section of
"SecurityPkg/Tcg/Tcg2Dxe/Tcg2Dxe.inf" for this; we can implement a
library instance with an empty constructor, but a non-empty depex, and
then hook this lib instance into Tcg2Dxe.inf via module scope NULL lib
class override in the DSC file. Basically we could forcibly restrict
Tcg2Dxe's DEPEX by making it inherit the new DEPEX from the library.


*** Slide 48: DXE Phase -- Other Modules

Regarding IncompatiblePciDeviceSupportDxe: the proposed update sounds
plausible and simple enough.

(21) "AcpiPlatformDxe: Support for MADT/ACPI addition to report Td
Mailbox entry"

Firmware-owned tables must not be installed from this driver.

Please refer to my "Xen removal" patch set again, for
<https://bugzilla.tianocore.org/show_bug.cgi?id=2122>, which I mention
above in point (14). As part of the Xen removal, the AcpiPlatformDxe
driver in OvmfPkg is significantly trimmed: all unused (dead) cruft is
removed, including any ACPI table templates that are built into the
firmware.

OvmfPkg/AcpiPlatformDxe is responsible *solely* for implementing the
client side of the QEMU ACPI linker/loader.

If you need to prepare & install different ACPI tables, please do it
elsewhere, in another DXE driver. A bunch of other firmware modules do
that (NFIT, IBFT, BGRT, ...).

For example, the OvmfPkg/TdxDxe DXE_DRIVER is supposed to be launched
early in the DXE phase, via APRIORI section -- please consider
registering a protocol notify in that driver, for
EFI_ACPI_TABLE_PROTOCOL, and when it becomes available, install whatever
*extra* tables you need.

Note that if you need to *customize* an ACPI table that QEMU already
provides, then you will have to modify the ACPI generator on the QEMU
side. It is a design tenet between QEMU and OVMF that OVMF include no
business logic for either parsing or fixing up ACPI tables that QEMU
provides. AcpiPlatformDxe contains the minimum (which is already a whole
lot, unfortunately) that's necessary for implementing the QEMU ACPI
linker/loader client in the UEFI environment.

The slide deck mentions MADT, which is also known as the "APIC" table --
and indeed, QEMU does provide that. (See acpi_build_madt()
[hw/i386/acpi-common.c].) So if TDX needs MADT customizations, that
should go into QEMU.

(22) EmuVariableFvbRuntimeDxe

Ouch, this is an unpleasant surprise.

First, if you know for a fact that pflash is not part of the *board* in
any TDX setup, then pulling

OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FvbServicesRuntimeDxe.inf

into the firmware platform is useless, as it is mutually exclusive with

OvmfPkg/EmuVariableFvbRuntimeDxe/Fvb.inf

(via dynamic means -- a dynamic PCD).

Note that the FDF file places QemuFlashFvbServicesRuntimeDxe in APRIORI
DXE when SMM_REQUIRE is FALSE. This driver checks for pflash presence,
and lets EmuVariableFvbRuntimeDxe perform its in-RAM flash emulation
only in case pflash is not found.

So this is again in favor of a separate platform -- if we know pflash is
never part of the board, then QemuFlashFvbServicesRuntimeDxe is never
needed, but you cannot remove it from the traditional DSC/FDF files.

Second, EmuVariableFvbRuntimeDxe consumes the PlatformFvbLib class, for
the PlatformFvbDataWritten() API (among other things). This lib class is
implemented by two instances in OvmfPkg, PlatformFvbLibNull and
EmuVariableFvbLib. The latter instance allows Platform BDS to hook an
event (for signaling) via "PcdEmuVariableEvent" into the
EmuVariableFvbRuntimeDxe driver.

In old (very old) QEMU board configurations, namely those without
pflash, this (mis)feature is used by OVMF's PlatformBootManagerLib to
write out all variables to the EFI system partition in a regular file
called \NvVars, with the help of NvVarsFileLib, whenever
EmuVariableFvbRuntimeDxe writes out an emulated "flash" block. For this
purpose, the traditional OVMF DSC files link EmuVariableFvbLib into
EmuVariableFvbRuntimeDxe.

But it counts as an absolute disaster nowadays, and should not be
revived in any platform. If you don't have pflash in TDX guests, just
accept that you won't have non-volatile variables. And link
PlatformFvbLibNull into EmuVariableFvbRuntimeDxe. You're going to need a
separate PlatformBootManagerLib instance anyway.

(We should have removed EmuVariableFvbRuntimeDxe a long time ago from
the traditional OVMF platforms, i.e. made pflash a hard requirement,
even when SMM is not built into the platform -- but whenever I tried
that, Jordan always shot me down.)

My point is: using EmuVariableFvbRuntimeDxe in the TDX platform may be
defensible per se, but we must be very clear that it will never provide
a standards-conformant service for non-volatile UEFI variables, and we
must keep as much of the \NvVars mess out of EmuVariableFvbRuntimeDxe as
possible. This will require a separate PlatformBootManagerLib instance
for TDX anyway (or maybe share PlatformBootManagerLibGrub with
"OvmfPkg/AmdSev/AmdSevX64.dsc").

Apart from the volatility aspect, let's assume we have this in-RAM
emulated "flash device", storing authenticated UEFI variables for Secure
Boot purposes. And we don't have SMM.

What protects this in-RAM variable store from tampering by the guest OS?
It's all guest RAM only, after all. What provides the privilege barrier
between the guest firmware and the guest OS?


*** Slide 50: Library

(23) Should we introduce Null instances for all (or most) new lib
classes here? Code size is a concern (static linking). If we extend a
common OvmfPkg module with new hook points, it's one thing to return
from that hook point early *dynamically*, but it's even better (given
separate platforms) to allow the traditional platform firmware to use a
Null lib instance, to cut out all the dead code statically.


*** Slides 51 through 52

Seems OK.


*** Slide 53:

(24) It might be worth noting that BaseIoLibIntrinsic already has some
SEV enlightenment, as the FIFO IO port operations (which normally use
the REP prefix) are not handled on SEV. I don't have an immediate idea
why this might matter, we should just minimize code duplication if
possible.


*** Slides 54-56:

No comments, this stuff seems reasonable.


*** Slide 57: MpInitLib

I don't know enough to give a summary judgement.


All in all, I see the controversial / messy parts in the platform
bringup, and how all that differs from the traditional ("legacy") OVMF
platforms. I admit I *may* be biased in favor of SEV, possibly because
SEV landed first -- if you find signs of such a bias in my comments,
please don't hesitate to debunk those points. Yet my general impression
is that the early bringup stuff is significantly different from
everything before, and because of this, a separate platform is
justified.

Definitely separate from the traditional OVMF IA32, IA32X64, and X64
platforms, and *possibly* separate from the "remote attestation"
AmdSevX64.dsc platform. I would approach the TDX feature-set in complete
isolation (exactly how Intel commenced the work, if I understand
correctly), modulo obviously shareable / reusable parts, and then slowly
& gradually work on extracting / refactoring commonalities.

(But, given my stance on Xen for example, I could disagree even with the
latter, retroactive kind of unification -- it all boils down to shared
developer and user base. Component sharing should reflect the community
structure, otherwise maintenance will be a nightmare.)

Thanks
Laszlo


Yao, Jiewen
 

Hi Laszlo.

To clarify your "one binary" feedback below, do you mean you suggest A) create a separate DSC (for example OvmfPkg/ConfidentialComputing.dsc) for a full solution including AMD SEC + Intel TDX + NonConfidentialComputing?
Or B) to create a standalone DSC for Intel TDX (for example, create a OvmfPkg/IntelTdx/IntelTdxXS64.dsc) ?

To me, A) does not change too much, we just create another DSC file - that is it.
Then the original OvmfPkgX64.dsc will only be used for POC/Testing purpose. It does not provide any security guarantee.
(The threat model is: we don't trust VMM. Without attestation, you don't know if VMM modified the OVMF.)

I don't know how "simply" it means. To enable TDX to make it work is not a simple work.
Some architecture changes are mandatory, such as reset vector, IO/MMIO access, #VE handler, IOMMU based shared memory access, etc. I think AMD SEV already did those.

===================
My point is that the "one binary" that the slide deck refers to (i.e.,
OvmfPkgX64.dsc) might prove OK for utilizing the Intel TDX *technology*
in itself. Simply enabling OVMF + a guest OS to boot in a TDX domain.

But "OvmfPkgX64.dsc" is *not* the "one binary" that is suitable for
remote attestation, which has a much broader scope, involving multiple
computers, networking, deployment, guest-owner/host-owner separation,
whatever. For the latter, we needed a separate platform anyway, even
with only SEV in mind; that's why "OvmfPkg/AmdSev/AmdSevX64.dsc" exists.
===================

Thank you
Yao Jiewen

-----Original Message-----
From: rfc@edk2.groups.io <rfc@edk2.groups.io> On Behalf Of Laszlo Ersek
Sent: Friday, June 4, 2021 12:12 AM
To: Yao, Jiewen <jiewen.yao@intel.com>; rfc@edk2.groups.io;
devel@edk2.groups.io
Cc: jejb@linux.ibm.com; Brijesh Singh <brijesh.singh@amd.com>; Tom
Lendacky <thomas.lendacky@amd.com>; erdemaktas@google.com;
cho@microsoft.com; bret.barkelew@microsoft.com; Jon Lange
<jlange@microsoft.com>; Karen Noel <knoel@redhat.com>; Paolo Bonzini
<pbonzini@redhat.com>; Nathaniel McCallum <npmccallum@redhat.com>;
Dr. David Alan Gilbert <dgilbert@redhat.com>; Ademar de Souza Reis Jr.
<areis@redhat.com>
Subject: Re: [edk2-rfc] [edk2-devel] RFC: design review for TDVF in OVMF

On 06/03/21 15:51, Yao, Jiewen wrote:
Hi, All
We plan to do a design review for TDVF in OVMF package.


The TDVF Design slides for TinaoCore Design Review Meeting (Jun 11) is
now available in blow link:
https://edk2.groups.io/g/devel/files/Designs/2021/0611.

The Bugzilla is https://bugzilla.tianocore.org/show_bug.cgi?id=3429



You can have an offline review first. You comments will be warmly
welcomed and we will continuously update the slides based on the
feedbacks.
Resending my earlier comments in this mailing list thread, with the
feedback inserted at the proper spots that has been given in the
off-list thread since then:


*** Slides 4, 6, 7: the "one binary requirement".

(1) The presentation refers to "OvmfPkgX64.dsc" as "the one" on slide#4,
but then the explanation for the requirement, given on slide 7, speaks
about "common attestation interface".

I think we have a misunderstanding here. The "OvmfPkgX64.dsc" platform
indeed contains SEV, SEV-ES, and (in the future, will contain) SEV-SNP
support. In that sense, adding TDX support to the same platform should
be (hopefully) possible, at the cost of ugly gymnastics in the reset
vector.

But "OvmfPkgX64.dsc" is *already* different from the remotely attested
OVMF platform, namely "OvmfPkg/AmdSev/AmdSevX64.dsc".

The latter has some additional modules (secret PEIM and DXE driver), it
has the Grub binary built in, and -- probably most importantly -- it
trusts host-originated information less than "OvmfPkgX64.dsc".

For example, "OvmfPkg/AmdSev/AmdSevX64.dsc" has a dedicated
PlatformBootManagerLib instance, one that ignores the non-volatile UEFI
variables Boot#### and BootOrder, and ignores (thus far) the fw_cfg
originated kernel/initrd/cmdline as well.

It remains an "area of research" to see what else should be removed from
the traditional host-guest integration (which integration is usually
desirable for management and convenience), in the remotely-attested boot
scenario. See e.g.
<https://bugzilla.tianocore.org/show_bug.cgi?id=3180>.

My point is that the "one binary" that the slide deck refers to (i.e.,
OvmfPkgX64.dsc) might prove OK for utilizing the Intel TDX *technology*
in itself. Simply enabling OVMF + a guest OS to boot in a TDX domain.

But "OvmfPkgX64.dsc" is *not* the "one binary" that is suitable for
remote attestation, which has a much broader scope, involving multiple
computers, networking, deployment, guest-owner/host-owner separation,
whatever. For the latter, we needed a separate platform anyway, even
with only SEV in mind; that's why "OvmfPkg/AmdSev/AmdSevX64.dsc"
exists.


*** Slides 8-9: general boot flow -- TDVF; TDVF Flow

I'm likely missing a big bunch of background here, so just a few
questions:

(2) Not sure what RTMR is, but it's associated with "Enable TrustedBoot"
-- so is a virtual TPM a hard requirement?

... Back from slide 10: "TCG measurement and event log framework w/o
TPM" -- that's curious.

[response from Dave Gilbert:]
My reading of this is that the RTMR (and another set of similar
registers) are a TDX thing that is like the PCRs from a TPM but
without the rest of the TPM features; so you can do the one-way
measurement into the RTMRs just like you do into a TPM PCR, and the
measurements pop out somewhere in the TDX quote. Just like a TPM you
need the event log to make any sense of how the final hashed value
supposedly got to where it did.
[response from Erdem Aktas:]
+1 to David on this. TDX provides 2 kinds of measurement registers:
MRTDs and RTMRs
(https://software.intel.com/content/dam/develop/external/us/en/docume
nts/tdx-module-1eas-v0.85.039.pdf
section 10.1.2) . MRTDs are build-time measurement registers which are
updated when TDX is being created. Once TDX is finalized (before the
first run), the MRTDs are finalized and cannot be updated anymore. On
the other hand, while the TD is running, TD can extend RTMRs through
TDCALLs which will provide TPM PCR kind of capabilities.
... Back from slide 43: feel free to skip this now; I will comment in
more detail below.

(3) Prepare AcpiTable -- OVMF fetches ACPI tables from QEMU; is this a
new (firmware originated) table that is installed in addition, or does
it replace QEMU's tables?

... Ignore for now, will comment on the MADT stuff later.

(4) Does DMA management mean a different IOMMU protocol? That is
going
to conflict with the SEV IOMMU protocol. Depexes in OVMF expect one or
zero IOMMU protocols to be present.

... Back from slide 40: feel free to skip this now; I'll comment on this
separately, below.

(5) Enumerate VirtIO -- virtio enumeration is PCI based on x86. But I
see no mention of PCI. If you mean VirtioMmioDeviceLib, that's no good,
because it does not support virtio-1.0, only virtio-0.9.5.

... Back from slide 42: I got my answer to this on slide 42, so don't
worry about this point.

(6) The PEI phase is skipped as a whole. I don't see how that can be
reasonably brought together with either "OvmfPkgX64.dsc" or
"OvmfPkg/AmdSev/AmdSevX64.dsc". I guess you can always modify SEC to
jump into DXE directly, but then why keep the PEI core and a bunch of
PEIMs in the firmware binary?

Also touched on by slide 9, TDVF Flow -- PEI is eliminated but SEC
becomes more heavy-weight.

Wouldn't this deserve a dedicated, separate platform DSC? The
8-bit/32-bit branching at the front of the reset vector is a smaller
complication in comparison.

Slide 6 references the mailing list archive:

https://edk2.groups.io/g/devel/topic/81969494#74319

and in that message, I wrote:

I'm doubtful that this is a unique problem ("just fix the reset
vector") the likes of which will supposedly never return during the
integration of SEV and TDX

See also:

https://listman.redhat.com/archives/edk2-devel-archive/2021-
April/msg00784.html

where I said:

It's not lost on me that we're talking about ~3 instructions.

Let's keep a close eye on further "polymorphisms" that would require
hacks.

The first 9 slides in the presentation introduce much-much more
intrusive problems than the "polymorphism" of the reset vector. Would I
be correct to say that my concern in the above messages was right? I
think I was only given a fraction of the necessary information when I
was asked "do you agree 'one binary' is viable".

[response from Erdem Aktas:]
Let's not worry about this for now. We want the one binary solution
for practical reasons and also for simplicity. In the end, we want to
do what is right and good for everyone.
Those are legit concerns and I think what Intel is trying to do (sorry
for mind reading) is to discuss all those concerns and questions to
make the right decision. I really appreciate their effort on preparing
those slides and bringing it to the community to review.

I will also read your comments more carefully and provide my thoughts
on them.
Sorry for being a little slow on this.

*** Slide 10 -- Key impact to firmware

(7) "CPUs are left running in long mode after exiting firmware" -- what
kind of "pen" are we talking about? Does a HLT loop still work?

(8) "I/O, MMIO, MSR accesses are different" -- those are implemented by
low-level libraries in edk2; how can they be configured dynamically?

... Back from slide 53: I'll comment on slide 53 separately; ignore
this.


*** Slide 11 -- TDVF Image (1)

(9) CFV -- Configuration Firmware Volume (VarStore.fdf.inc),
containing SB keys -- how is this firmware volume populated (at build
time)? Is this a hexdump?

... Back from slide 16: it seems like CFV is a raw hexdump indeed; how
is that managed when keys change (at build time)?

(10) This slide (slide 11) basically describes an intrusive
reorganization of "OvmfPkgX64.fdf". I don't think I can agree to that.
While confidential computing is important, it is not relevant for many
users. Even if we don't cause outright regressions for existent setups,
the maintenance cost of the traditional OVMF platform will skyrocket.

The big bunch of areas that SEV-ES introduced to MEMFD is already a big
complication. I'd feel much better if we could isolate all that to a
dedicated "remote attested boot" firmware platform, and not risk the
functionality and maintenance of the traditional platform. I think this
ties in with my comment (1).

For example, seeing a configuration firmware volume (CFV) with secure
boot keys embedded, in the "usual" FDF, will confuse lots of people, me
included. In the traditional OVMF use case, we use a different method:
namely OvmfPkg/EnrollDefaultKeys, for "priming" a variable store
template, in the build environment.

Edk2 (and PI and UEFI) are definitely flexible enough for accommodating
TDX, but the existent, traditional OVMF platforms are a bad fit. In my
opinion.


*** Slide 12: TDVF Image (2)

(11) "Page table should support both 4-level and 5-level page table"

As a general development strategy, I would suggest building TDX support
in small, well-isolated layers. 5-level paging is not enabled (has never
been tested, to my knowledge) with OVMF on QEMU/KVM, regardless of
confidential computing, for starters. If 5-level paging is a strict
requirement for TDX, then it arguably needs to be implemented
independently of TDX, at first. So that the common edk2 architecture be
at least testable on QEMU/KVM with 5-level paging enabled.


*** Slide 13:

(12) My comment is that the GUID-ed structure chain already starts at a
fixed GPA (the "AMD SEV option"). Ordering between GUID-ed structures is
irrelevant, so any TDX-specific structures should be eligible for
appending to, prepending to, or intermixing with, other (possibly
SEV-specific) structures. There need not be a separate entry point, just
different GUIDS.

(13) Regarding "4G-0x20[0x10] is OVMF AP reset vector (used in OVMF
implementation)" -- I think this is a typo: this "AP reset vector" is
*not* used in OVMF. To my knowledge, it is a vestige from the UefiCpuPkg
reset vector. In OVMF, APs are booted via MpInitLib (in multiple
firmware phases), using INIT-SIPI-SIPI, and the reset vector for the
APs, posited through those IPIs, is prepared in low RAM.


*** Slides 14 through 16:

I consider these TDVF firmware image internals, implementation details
-- do whatever you need to do, just don't interfere with existing
platforms / use cases. See my comment (10) above.


*** Slides 17-21:

(14) Again, a very big difference from traditional OVMF: APs never enter
SEC in traditional OVMF. I assume this new functionality is part of
TdxStartupLib (from slide 18) -- will there be a Null instance of that?

Last week I posted a 43-part patch series to edk2-devel, for separating
out the dynamic Xen enlightenments from the IA32, IA32X64, X64
platforms, in favor of the dedicated OvmfXen platform. TDX seems to
bring in incomparably more complications than Xen, and the OvmfPkg
maintainers have found even the Xen complications troublesome in the
long term.

If I had had access to all this information when we first discussed "one
binary" on the mailing list, I'd have never agreed to "one binary". I'm
OK with attempting one firmware binary for "confidential computing", but
that "one platform" cannot be "OvmfPkgX64.dsc".

Even if I make a comparison with just the "technology" (not the
remotely-attested deployment) of SEV and SEV-ES, as it is included in
"OvmfPkgX64.dsc", TDX is hugely more complicated and intrusive than
that. SEV proved possible to integrate into existing modules, into the
existing boot flow, maybe through the addition of some new drivers (such
as a new IOMMU protocol implementation, and some "clever" depexes).
But
we never had to restructure the FD layout, eliminate whole firmware
phases, or think about multiprocessing in the reset vector or the SEC
phase.

In order to bring an example from the ARM world, please note that
platforms that use a PEI phase, and platforms that don't, are distinct
platforms. In ArmVirtPkg, two examples are ArmVirtQemu and
ArmVirtQemuKernel. The latter does not include the PEI Core.


*** Slides 22 through 34:

(15) All these extra tasks and complications are perfectly fine, as long
as they exist peacefully *separately* from the traditional ("legacy")
OVMF platforms.

Honestly, in the virtual world, picking your firmware binary is easy.
The approach here reminds me of a physical firmware binary that includes
everything possible from "edk2-platforms/Features/Intel", just so it can
be deployed to any physical board imaginable. That's not how Intel
builds physical firmware, right? We have "edk2-platforms/Platform/Intel"
and "edk2-platforms/Silicon/Intel" with many-many separate DSC files.


*** Slide 35-36: DXE phase

(16) "Some DXE Drivers not allowed to load/start in Td guest -- Network
stack, RNG, ..."

Same comment as (several times) above. The Linuxboot project is a good
example for eliminating cruft from DXEFV (in fact, for eliminating most
of the DXE phase). In a TDX environment, why include drivers in the
firmware binary that are never used? Meanwhile, DXEFV in OVMF grows by
a
MB every 1.5 years or so. Again, remove these drivers from the DSC/FDF
then, and it needs to be a separate platform.

(17) "Other DXE Phase drivers -- [...] AcpiPlatformDxe"

I'm not sure what this section is supposed to mean. Other DXE phase
drivers included, or excluded? Without AcpiPlatformDxe, the guest OS
will not see QEMU's ACPI content, and will almost certainly malfunction.

... Back from slide 48: ignore this for now, I'll comment in more detail
later.


*** Slide 37: DXE Core

(18) says "SMM is not supported in Td guest" -- how is the variable
store protected from direct hardware (pflash) access from the guest OS?
Without SMM, the guest OS need not go through gRT->SetVariable() to
update authenticated non-volatile UEFI variables, and that undermines
Secure Boot.

Note that, while SEV-ES has the same limitation wrt. SMM, the
"OvmfPkg/AmdSev/AmdSevX64.dsc" platform doesn't even include the
Secure
Boot firmware feature. For another example, the OVMF firmware binary in
RHEL that's going to support SEV-ES is such a build of "OvmfPkgX64.dsc"
that does not include the Secure Boot feature either.

But in this TDX slide deck, Secure Boot certificates are embedded in the
CFV (configuration firmware volume) -- see slide 11 and slide 16 --,
which suggests that this platform does want secure boot.

... Back from slide 48: I'm going to make *additional* comments on this,
when I'm at slide 48, too.

The rest of this slide (slide 37) looks reasonable (generic DXE Core
changes -- possibly PI spec changes too).


*** Slides 38 through 39:

These seem reasonable (TdxDxe assumes some responsibilities of
OvmfPkg/PlatformPei)


*** Slides 40 through 42:

*If* you really can implement TDX support for the IOMMU driver *this*
surgically, then I'm OK with it. The general logic in the IOMMU driver
was truly difficult to write, and I'd be seriously concerned if those
parts would have to be modified. Customizing just the page
encryption/decryption primitives for TDX vs. SEV is OK.


*** Slides 43 through 47:

(19) Slide 46 and slide 47 are almost identical. Please consolidate them
into a single slide.

(20) the TPM2 infrastructure in edk2 is baroque (over-engineered), in my
opinion. It has so many layers that I can never keep them in mind. When
we added TPM support to OVMF, I required commit messages that would
help
us recall the layering. In particular, please refer to commit
0c0a50d6b3ff ("OvmfPkg: include Tcg2Dxe module", 2018-03-09). Here's an
excerpt:

TPM 2 consumer driver
|
v
Tpm2DeviceLib class: Tpm2DeviceLibTcg2 instance
|
v
TCG2 protocol interface
|
v
TCG2 protocol provider: Tcg2Dxe.inf driver
|
v
Tpm2DeviceLib class: Tpm2DeviceLibRouter instance
|
v
NULL class: Tpm2InstanceLibDTpm instance
(via earlier registration)
|
v
TPM2 chip (actual hardware)

The slide deck says that EFI_TD_PROTOCOL is supposed to reuse the
EFI_TCG2_PROTOCOL definition. If that's the case, then why don't we push
the TDX specifics (more or less: the replacement of PCRs with RTMR) down
to the lowest possible level?

Can we have "Tpm2InstanceLibTdxRtmr", plugged into the same Tcg2Dxe.inf
driver?

If not, can we have a new TdTcg2Dxe.inf driver, but make it so that it
install the same protocol as before (EFI_TCG2_PROTOCOL -- same old
protocol GUID)? Then DxeTpmMeasurementLib doesn't have to change.

As long as there is *at most* one EFI_TCG2_PROTOCOL instance published
in the protocol database, DxeTpmMeasurementLib should be fine. In SEV*
guests, the standard Tcg2Dxe driver provides that protocol. In TDX
guests, TdTcg2Dxe.inf should provide the protocol. Arbitration between
the two can be implemented with the pattern seen in the following
commits:

1 05db0948cc60 EmbeddedPkg: introduce EDKII Platform Has ACPI GUID
2 786f476323a6 EmbeddedPkg: introduce PlatformHasAcpiLib
3 65a69b214840 EmbeddedPkg: introduce EDKII Platform Has Device Tree
GUID
4 2558bfe3e907 ArmVirtPkg: add PlatformHasAcpiDtDxe

The basic idea is that Tcg2Dxe can have a depex on a new "running in
SEV" protocol GUID, and TdTcg2Dxe can have a depex on a new "running in
TDX" protocol GUID. A separate platform driver can install the proper
GUID -- possibly *neither* of those GUIDs.

And, we don't have to change the depex section of
"SecurityPkg/Tcg/Tcg2Dxe/Tcg2Dxe.inf" for this; we can implement a
library instance with an empty constructor, but a non-empty depex, and
then hook this lib instance into Tcg2Dxe.inf via module scope NULL lib
class override in the DSC file. Basically we could forcibly restrict
Tcg2Dxe's DEPEX by making it inherit the new DEPEX from the library.


*** Slide 48: DXE Phase -- Other Modules

Regarding IncompatiblePciDeviceSupportDxe: the proposed update sounds
plausible and simple enough.

(21) "AcpiPlatformDxe: Support for MADT/ACPI addition to report Td
Mailbox entry"

Firmware-owned tables must not be installed from this driver.

Please refer to my "Xen removal" patch set again, for
<https://bugzilla.tianocore.org/show_bug.cgi?id=2122>, which I mention
above in point (14). As part of the Xen removal, the AcpiPlatformDxe
driver in OvmfPkg is significantly trimmed: all unused (dead) cruft is
removed, including any ACPI table templates that are built into the
firmware.

OvmfPkg/AcpiPlatformDxe is responsible *solely* for implementing the
client side of the QEMU ACPI linker/loader.

If you need to prepare & install different ACPI tables, please do it
elsewhere, in another DXE driver. A bunch of other firmware modules do
that (NFIT, IBFT, BGRT, ...).

For example, the OvmfPkg/TdxDxe DXE_DRIVER is supposed to be launched
early in the DXE phase, via APRIORI section -- please consider
registering a protocol notify in that driver, for
EFI_ACPI_TABLE_PROTOCOL, and when it becomes available, install
whatever
*extra* tables you need.

Note that if you need to *customize* an ACPI table that QEMU already
provides, then you will have to modify the ACPI generator on the QEMU
side. It is a design tenet between QEMU and OVMF that OVMF include no
business logic for either parsing or fixing up ACPI tables that QEMU
provides. AcpiPlatformDxe contains the minimum (which is already a whole
lot, unfortunately) that's necessary for implementing the QEMU ACPI
linker/loader client in the UEFI environment.

The slide deck mentions MADT, which is also known as the "APIC" table --
and indeed, QEMU does provide that. (See acpi_build_madt()
[hw/i386/acpi-common.c].) So if TDX needs MADT customizations, that
should go into QEMU.

(22) EmuVariableFvbRuntimeDxe

Ouch, this is an unpleasant surprise.

First, if you know for a fact that pflash is not part of the *board* in
any TDX setup, then pulling

OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FvbServicesRuntimeDxe.inf

into the firmware platform is useless, as it is mutually exclusive with

OvmfPkg/EmuVariableFvbRuntimeDxe/Fvb.inf

(via dynamic means -- a dynamic PCD).

Note that the FDF file places QemuFlashFvbServicesRuntimeDxe in APRIORI
DXE when SMM_REQUIRE is FALSE. This driver checks for pflash presence,
and lets EmuVariableFvbRuntimeDxe perform its in-RAM flash emulation
only in case pflash is not found.

So this is again in favor of a separate platform -- if we know pflash is
never part of the board, then QemuFlashFvbServicesRuntimeDxe is never
needed, but you cannot remove it from the traditional DSC/FDF files.

Second, EmuVariableFvbRuntimeDxe consumes the PlatformFvbLib class,
for
the PlatformFvbDataWritten() API (among other things). This lib class is
implemented by two instances in OvmfPkg, PlatformFvbLibNull and
EmuVariableFvbLib. The latter instance allows Platform BDS to hook an
event (for signaling) via "PcdEmuVariableEvent" into the
EmuVariableFvbRuntimeDxe driver.

In old (very old) QEMU board configurations, namely those without
pflash, this (mis)feature is used by OVMF's PlatformBootManagerLib to
write out all variables to the EFI system partition in a regular file
called \NvVars, with the help of NvVarsFileLib, whenever
EmuVariableFvbRuntimeDxe writes out an emulated "flash" block. For this
purpose, the traditional OVMF DSC files link EmuVariableFvbLib into
EmuVariableFvbRuntimeDxe.

But it counts as an absolute disaster nowadays, and should not be
revived in any platform. If you don't have pflash in TDX guests, just
accept that you won't have non-volatile variables. And link
PlatformFvbLibNull into EmuVariableFvbRuntimeDxe. You're going to need
a
separate PlatformBootManagerLib instance anyway.

(We should have removed EmuVariableFvbRuntimeDxe a long time ago
from
the traditional OVMF platforms, i.e. made pflash a hard requirement,
even when SMM is not built into the platform -- but whenever I tried
that, Jordan always shot me down.)

My point is: using EmuVariableFvbRuntimeDxe in the TDX platform may be
defensible per se, but we must be very clear that it will never provide
a standards-conformant service for non-volatile UEFI variables, and we
must keep as much of the \NvVars mess out of EmuVariableFvbRuntimeDxe
as
possible. This will require a separate PlatformBootManagerLib instance
for TDX anyway (or maybe share PlatformBootManagerLibGrub with
"OvmfPkg/AmdSev/AmdSevX64.dsc").

Apart from the volatility aspect, let's assume we have this in-RAM
emulated "flash device", storing authenticated UEFI variables for Secure
Boot purposes. And we don't have SMM.

What protects this in-RAM variable store from tampering by the guest OS?
It's all guest RAM only, after all. What provides the privilege barrier
between the guest firmware and the guest OS?


*** Slide 50: Library

(23) Should we introduce Null instances for all (or most) new lib
classes here? Code size is a concern (static linking). If we extend a
common OvmfPkg module with new hook points, it's one thing to return
from that hook point early *dynamically*, but it's even better (given
separate platforms) to allow the traditional platform firmware to use a
Null lib instance, to cut out all the dead code statically.


*** Slides 51 through 52

Seems OK.


*** Slide 53:

(24) It might be worth noting that BaseIoLibIntrinsic already has some
SEV enlightenment, as the FIFO IO port operations (which normally use
the REP prefix) are not handled on SEV. I don't have an immediate idea
why this might matter, we should just minimize code duplication if
possible.


*** Slides 54-56:

No comments, this stuff seems reasonable.


*** Slide 57: MpInitLib

I don't know enough to give a summary judgement.


All in all, I see the controversial / messy parts in the platform
bringup, and how all that differs from the traditional ("legacy") OVMF
platforms. I admit I *may* be biased in favor of SEV, possibly because
SEV landed first -- if you find signs of such a bias in my comments,
please don't hesitate to debunk those points. Yet my general impression
is that the early bringup stuff is significantly different from
everything before, and because of this, a separate platform is
justified.

Definitely separate from the traditional OVMF IA32, IA32X64, and X64
platforms, and *possibly* separate from the "remote attestation"
AmdSevX64.dsc platform. I would approach the TDX feature-set in complete
isolation (exactly how Intel commenced the work, if I understand
correctly), modulo obviously shareable / reusable parts, and then slowly
& gradually work on extracting / refactoring commonalities.

(But, given my stance on Xen for example, I could disagree even with the
latter, retroactive kind of unification -- it all boils down to shared
developer and user base. Component sharing should reflect the community
structure, otherwise maintenance will be a nightmare.)

Thanks
Laszlo





Min Xu <min.m.xu@...>
 

On 06/04/2021 12:12 AM, Laszlo wrote:
On 06/03/21 15:51, Yao, Jiewen wrote:
Hi, All
We plan to do a design review for TDVF in OVMF package.


The TDVF Design slides for TinaoCore Design Review Meeting (Jun 11) is
now available in blow link:
https://edk2.groups.io/g/devel/files/Designs/2021/0611.

The Bugzilla is https://bugzilla.tianocore.org/show_bug.cgi?id=3429



You can have an offline review first. You comments will be warmly
welcomed and we will continuously update the slides based on the
feedbacks.
Resending my earlier comments in this mailing list thread, with the feedback
inserted at the proper spots that has been given in the off-list thread since
then:


*** Slides 4, 6, 7: the "one binary requirement".
I would like to hear other reviewers' comments on the "one binary requirement".

(1) The presentation refers to "OvmfPkgX64.dsc" as "the one" on slide#4, but
then the explanation for the requirement, given on slide 7, speaks about
"common attestation interface".

I think we have a misunderstanding here. The "OvmfPkgX64.dsc" platform
indeed contains SEV, SEV-ES, and (in the future, will contain) SEV-SNP
support. In that sense, adding TDX support to the same platform should be
(hopefully) possible, at the cost of ugly gymnastics in the reset vector.

But "OvmfPkgX64.dsc" is *already* different from the remotely attested
OVMF platform, namely "OvmfPkg/AmdSev/AmdSevX64.dsc".

The latter has some additional modules (secret PEIM and DXE driver), it has
the Grub binary built in, and -- probably most importantly -- it trusts host-
originated information less than "OvmfPkgX64.dsc".

For example, "OvmfPkg/AmdSev/AmdSevX64.dsc" has a dedicated
PlatformBootManagerLib instance, one that ignores the non-volatile UEFI
variables Boot#### and BootOrder, and ignores (thus far) the fw_cfg
originated kernel/initrd/cmdline as well.

It remains an "area of research" to see what else should be removed from
the traditional host-guest integration (which integration is usually desirable
for management and convenience), in the remotely-attested boot scenario.
See e.g.
<https://bugzilla.tianocore.org/show_bug.cgi?id=3180>.

My point is that the "one binary" that the slide deck refers to (i.e.,
OvmfPkgX64.dsc) might prove OK for utilizing the Intel TDX *technology* in
itself. Simply enabling OVMF + a guest OS to boot in a TDX domain.

But "OvmfPkgX64.dsc" is *not* the "one binary" that is suitable for remote
attestation, which has a much broader scope, involving multiple computers,
networking, deployment, guest-owner/host-owner separation, whatever.
For the latter, we needed a separate platform anyway, even with only SEV in
mind; that's why "OvmfPkg/AmdSev/AmdSevX64.dsc" exists.


*** Slides 8-9: general boot flow -- TDVF; TDVF Flow

I'm likely missing a big bunch of background here, so just a few
questions:

(2) Not sure what RTMR is, but it's associated with "Enable TrustedBoot"
-- so is a virtual TPM a hard requirement?

... Back from slide 10: "TCG measurement and event log framework w/o
TPM" -- that's curious.
Comments of Dave & Erdem are pretty good to explain what RTMR is.
[response from Dave Gilbert:]
My reading of this is that the RTMR (and another set of similar
registers) are a TDX thing that is like the PCRs from a TPM but
without the rest of the TPM features; so you can do the one-way
measurement into the RTMRs just like you do into a TPM PCR, and the
measurements pop out somewhere in the TDX quote. Just like a TPM you
need the event log to make any sense of how the final hashed value
supposedly got to where it did.
[response from Erdem Aktas:]
+1 to David on this. TDX provides 2 kinds of measurement registers:
MRTDs and RTMRs
(https://software.intel.com/content/dam/develop/external/us/en/documen
ts/tdx-module-1eas-v0.85.039.pdf section 10.1.2) . MRTDs are
build-time measurement registers which are updated when TDX is being
created. Once TDX is finalized (before the first run), the MRTDs are
finalized and cannot be updated anymore. On the other hand, while the
TD is running, TD can extend RTMRs through TDCALLs which will provide
TPM PCR kind of capabilities.
... Back from slide 43: feel free to skip this now; I will comment in more detail
below.

(3) Prepare AcpiTable -- OVMF fetches ACPI tables from QEMU; is this a new
(firmware originated) table that is installed in addition, or does it replace
QEMU's tables?

... Ignore for now, will comment on the MADT stuff later.

(4) Does DMA management mean a different IOMMU protocol? That is going
to conflict with the SEV IOMMU protocol. Depexes in OVMF expect one or
zero IOMMU protocols to be present.
There is only one IOMMU protocol. We will merge TDX features into the current SEV IOMMU implementation. Td or Non-Td will be probed in run-time, so that the corresponding APIs will be called to clear/set the memory encryption mask for SEV or the shared bit for TDX.
... Back from slide 40: feel free to skip this now; I'll comment on this
separately, below.

(5) Enumerate VirtIO -- virtio enumeration is PCI based on x86. But I see no
mention of PCI. If you mean VirtioMmioDeviceLib, that's no good, because it
does not support virtio-1.0, only virtio-0.9.5.

... Back from slide 42: I got my answer to this on slide 42, so don't worry
about this point.
I will comment it in my later response.
[...]
(6) The PEI phase is skipped as a whole. I don't see how that can be
reasonably brought together with either "OvmfPkgX64.dsc" or
"OvmfPkg/AmdSev/AmdSevX64.dsc". I guess you can always modify SEC to
jump into DXE directly, but then why keep the PEI core and a bunch of PEIMs
in the firmware binary?
This is because of the *one binary*. In non-Td guest, Legacy OVMF still need PEI core and the PEIMs.

Also touched on by slide 9, TDVF Flow -- PEI is eliminated but SEC becomes
more heavy-weight.
I have to admit SEC is more heavy-weight in Td guest. TdxStartupLib wraps the whole stuff in SEC phase.

Wouldn't this deserve a dedicated, separate platform DSC? The 8-bit/32-bit
branching at the front of the reset vector is a smaller complication in
comparison.

Slide 6 references the mailing list archive:

https://edk2.groups.io/g/devel/topic/81969494#74319

and in that message, I wrote:

I'm doubtful that this is a unique problem ("just fix the reset
vector") the likes of which will supposedly never return during the
integration of SEV and TDX

See also:

https://listman.redhat.com/archives/edk2-devel-archive/2021-
April/msg00784.html

where I said:

It's not lost on me that we're talking about ~3 instructions.

Let's keep a close eye on further "polymorphisms" that would require
hacks.

The first 9 slides in the presentation introduce much-much more intrusive
problems than the "polymorphism" of the reset vector. Would I be correct to
say that my concern in the above messages was right? I think I was only given
a fraction of the necessary information when I was asked "do you agree 'one
binary' is viable".

[response from Erdem Aktas:]
Let's not worry about this for now. We want the one binary solution
for practical reasons and also for simplicity. In the end, we want to
do what is right and good for everyone.
Those are legit concerns and I think what Intel is trying to do (sorry
for mind reading) is to discuss all those concerns and questions to
make the right decision. I really appreciate their effort on preparing
those slides and bringing it to the community to review.

I will also read your comments more carefully and provide my thoughts
on them.
Sorry for being a little slow on this.

*** Slide 10 -- Key impact to firmware

(7) "CPUs are left running in long mode after exiting firmware" -- what kind of
"pen" are we talking about? Does a HLT loop still work?
It is expected that TDX-VMM lauchs all CPUs to the ResetVector(0xfffffff0). After reset, all CPUs run the same initialization code (protected mode -> long mode) until Sec Entrypoint. (see slides 22)
After that APs spin at TdMailBox waiting for the commands (from BSP). Before jumping to DXE phase, TdMailBox will be relocated to a new address which is recorded in MADT table. APs then are waked up by BSP and spin at that new TdMailbox and wait for command.
After exiting firmware, APs are spinning and waiting for OS to wake them up. OS send the wake up command to the TdMailbox by reading the MADT table.

(8) "I/O, MMIO, MSR accesses are different" -- those are implemented by
low-level libraries in edk2; how can they be configured dynamically?
I will add more description of the basic IoLib in next version design slides.
Simply say we probe Td or Non-Td in run-time and then call the corresponding IO operation.
UINT8
EFIAPI
MmioRead8 (
IN UINTN Address
)
{
UINT8 Value;

if (IsTdGuest ()) {
Value = TdMmioRead8 (Address);
return Value;
}

MemoryFence ();
Value = *(volatile UINT8*)Address;
MemoryFence ();

return Value;
}

... Back from slide 53: I'll comment on slide 53 separately; ignore this.


*** Slide 11 -- TDVF Image (1)

(9) CFV -- Configuration Firmware Volume (VarStore.fdf.inc), containing SB
keys -- how is this firmware volume populated (at build time)? Is this a
hexdump?
CFV is populated in post build. We can provide such python scripts to do the SB keys enrollment.

I will continue my comments in my next mail.
[To be continued]
Thanks!


Laszlo Ersek
 

On 06/04/21 01:19, Yao, Jiewen wrote:
Hi Laszlo.

To clarify your "one binary" feedback below, do you mean you suggest A) create a separate DSC (for example OvmfPkg/ConfidentialComputing.dsc) for a full solution including AMD SEC + Intel TDX + NonConfidentialComputing?
Or B) to create a standalone DSC for Intel TDX (for example, create a OvmfPkg/IntelTdx/IntelTdxXS64.dsc) ?

To me, A) does not change too much, we just create another DSC file - that is it.
Then the original OvmfPkgX64.dsc will only be used for POC/Testing purpose. It does not provide any security guarantee.
(The threat model is: we don't trust VMM. Without attestation, you don't know if VMM modified the OVMF.)

I don't know how "simply" it means. To enable TDX to make it work is not a simple work.
Some architecture changes are mandatory, such as reset vector, IO/MMIO access, #VE handler, IOMMU based shared memory access, etc. I think AMD SEV already did those.
I mean option (B). Create a completely separate DSC+FDF for Intel TDX.

In my mind, there are two (very high level) stages for developing the
"Confidential Computing with TDX" feature in edk2.

Stage 1: allow a guest to run in a TDX domain. "Guest owner" and "Cloud
owner" are *not* separate concepts in this stage.

Stage 2: enable such a guest to be deployed remotely to an untrusted
cloud, and ensure its integrity via remote attestation.


Stage 1 is *hugely different* between AMD SEV* technologies and Intel
TDX. That's why we need, in my opinion, a separate DSC+FDF for Intel TDX
right from the start. This does not necessarily mean that we need to
duplicate every single module (library, PEIM, DXE driver, etc) for Intel
TDX. Wherever it makes sense, and the changes are not very intrusive or
wasteful (considering binary code that becomes "dead" on in a TDX
guest), we can modify (extend) existent modules in-place.

Stage 1 is complete for AMD SEV and AMD SEV-ES. AMD SEV-SNP is in
progress. These AMD SEV* technologies have been possible to integrate
(thus far) into the existing "OvmfPkg/OvmfPkgX64.dsc" platform. But
Intel TDX is very different from that, even in Stage 1.


Stage 2 is far out in the future, for Intel TDX. I have no idea about
it, but whatever it's going to look like, it will be based on Stage 1.

The "OvmfPkg/AmdSev/AmdSevX64.dsc" platform is *one approach* for
implementing Stage 2. And this platform utilizes AMD SEV* technologies
only (thus far). *If* and *how much* the approach of
"OvmfPkg/AmdSev/AmdSevX64.dsc" will apply to Intel TDX -- once Stage 1
of Intel TDX is complete --, I cannot predict.


The underlying physical hardware features are completely different
between AMD SEV* and Intel TDX, as much as I can tell. It makes zero
sense to me to start from a unified perspective, and shoehorn everything
possible (and impossible) into a common blob and framework. That
approach has never *ever* worked for me, not even when I started working
on the virtio drivers for OVMF 9 years ago. I extracted VirtioLib only
when I worked on the second virtio driver -- that is, when I was about
to have *working code* for two *distinct* virtio devices, and it was
possible to identify commonalities between the drivers, and to extract
those commonalities. The fact that I knew, in advance, that my "end
goal" was the same with these devices, namely to "boot from them", made
no difference whatsoever. I still I had to start implementing them in
separate sandboxes. Soon enough, VirtioLib emerged, and later
VIRTIO_DEVICE_PROTOCOL emerged even.

I'm 100% incapable of dealing with a top-down approach here. Only
bottom-up works for me.


Most importantly, the OvmfPkgIa32, OvmfPkgIa32X64, OvmfPkgX64 platforms
must be kept regression-free, and preferably their complexity should be
kept manageable for maintenance too. These platforms are *entirely
irrelevant* for Stage 2 (regardless of underlying security technology).
They *happen* to be relevant for Stage 1 of AMD SEV*, purely because SEV
proved possible to integrate into them, in very small, well isolated,
surgical advances, without upsetting everything. But I can tell upfront
that Intel TDX is way more intrusive than anything I've seen thus far in
AMD SEV*. So I want Intel TDX to exist in a brand new platform, even as
Stage 1. And, to reiterate, just because Confidential Computing is the
new hot thing, the use cases for OvmfPkgIa32, OvmfPkgIa32X64, OvmfPkgX64
do not disappear. Regressing them, or making them unmaintainable due to
skyrocketing complexity, is not acceptable.

Thanks
Laszlo


===================
My point is that the "one binary" that the slide deck refers to (i.e.,
OvmfPkgX64.dsc) might prove OK for utilizing the Intel TDX *technology*
in itself. Simply enabling OVMF + a guest OS to boot in a TDX domain.

But "OvmfPkgX64.dsc" is *not* the "one binary" that is suitable for
remote attestation, which has a much broader scope, involving multiple
computers, networking, deployment, guest-owner/host-owner separation,
whatever. For the latter, we needed a separate platform anyway, even
with only SEV in mind; that's why "OvmfPkg/AmdSev/AmdSevX64.dsc" exists.
===================

Thank you
Yao Jiewen

-----Original Message-----
From: rfc@edk2.groups.io <rfc@edk2.groups.io> On Behalf Of Laszlo Ersek
Sent: Friday, June 4, 2021 12:12 AM
To: Yao, Jiewen <jiewen.yao@intel.com>; rfc@edk2.groups.io;
devel@edk2.groups.io
Cc: jejb@linux.ibm.com; Brijesh Singh <brijesh.singh@amd.com>; Tom
Lendacky <thomas.lendacky@amd.com>; erdemaktas@google.com;
cho@microsoft.com; bret.barkelew@microsoft.com; Jon Lange
<jlange@microsoft.com>; Karen Noel <knoel@redhat.com>; Paolo Bonzini
<pbonzini@redhat.com>; Nathaniel McCallum <npmccallum@redhat.com>;
Dr. David Alan Gilbert <dgilbert@redhat.com>; Ademar de Souza Reis Jr.
<areis@redhat.com>
Subject: Re: [edk2-rfc] [edk2-devel] RFC: design review for TDVF in OVMF

On 06/03/21 15:51, Yao, Jiewen wrote:
Hi, All
We plan to do a design review for TDVF in OVMF package.


The TDVF Design slides for TinaoCore Design Review Meeting (Jun 11) is
now available in blow link:
https://edk2.groups.io/g/devel/files/Designs/2021/0611.

The Bugzilla is https://bugzilla.tianocore.org/show_bug.cgi?id=3429



You can have an offline review first. You comments will be warmly
welcomed and we will continuously update the slides based on the
feedbacks.
Resending my earlier comments in this mailing list thread, with the
feedback inserted at the proper spots that has been given in the
off-list thread since then:


*** Slides 4, 6, 7: the "one binary requirement".

(1) The presentation refers to "OvmfPkgX64.dsc" as "the one" on slide#4,
but then the explanation for the requirement, given on slide 7, speaks
about "common attestation interface".

I think we have a misunderstanding here. The "OvmfPkgX64.dsc" platform
indeed contains SEV, SEV-ES, and (in the future, will contain) SEV-SNP
support. In that sense, adding TDX support to the same platform should
be (hopefully) possible, at the cost of ugly gymnastics in the reset
vector.

But "OvmfPkgX64.dsc" is *already* different from the remotely attested
OVMF platform, namely "OvmfPkg/AmdSev/AmdSevX64.dsc".

The latter has some additional modules (secret PEIM and DXE driver), it
has the Grub binary built in, and -- probably most importantly -- it
trusts host-originated information less than "OvmfPkgX64.dsc".

For example, "OvmfPkg/AmdSev/AmdSevX64.dsc" has a dedicated
PlatformBootManagerLib instance, one that ignores the non-volatile UEFI
variables Boot#### and BootOrder, and ignores (thus far) the fw_cfg
originated kernel/initrd/cmdline as well.

It remains an "area of research" to see what else should be removed from
the traditional host-guest integration (which integration is usually
desirable for management and convenience), in the remotely-attested boot
scenario. See e.g.
<https://bugzilla.tianocore.org/show_bug.cgi?id=3180>.

My point is that the "one binary" that the slide deck refers to (i.e.,
OvmfPkgX64.dsc) might prove OK for utilizing the Intel TDX *technology*
in itself. Simply enabling OVMF + a guest OS to boot in a TDX domain.

But "OvmfPkgX64.dsc" is *not* the "one binary" that is suitable for
remote attestation, which has a much broader scope, involving multiple
computers, networking, deployment, guest-owner/host-owner separation,
whatever. For the latter, we needed a separate platform anyway, even
with only SEV in mind; that's why "OvmfPkg/AmdSev/AmdSevX64.dsc"
exists.


*** Slides 8-9: general boot flow -- TDVF; TDVF Flow

I'm likely missing a big bunch of background here, so just a few
questions:

(2) Not sure what RTMR is, but it's associated with "Enable TrustedBoot"
-- so is a virtual TPM a hard requirement?

... Back from slide 10: "TCG measurement and event log framework w/o
TPM" -- that's curious.

[response from Dave Gilbert:]
My reading of this is that the RTMR (and another set of similar
registers) are a TDX thing that is like the PCRs from a TPM but
without the rest of the TPM features; so you can do the one-way
measurement into the RTMRs just like you do into a TPM PCR, and the
measurements pop out somewhere in the TDX quote. Just like a TPM you
need the event log to make any sense of how the final hashed value
supposedly got to where it did.
[response from Erdem Aktas:]
+1 to David on this. TDX provides 2 kinds of measurement registers:
MRTDs and RTMRs
(https://software.intel.com/content/dam/develop/external/us/en/docume
nts/tdx-module-1eas-v0.85.039.pdf
section 10.1.2) . MRTDs are build-time measurement registers which are
updated when TDX is being created. Once TDX is finalized (before the
first run), the MRTDs are finalized and cannot be updated anymore. On
the other hand, while the TD is running, TD can extend RTMRs through
TDCALLs which will provide TPM PCR kind of capabilities.
... Back from slide 43: feel free to skip this now; I will comment in
more detail below.

(3) Prepare AcpiTable -- OVMF fetches ACPI tables from QEMU; is this a
new (firmware originated) table that is installed in addition, or does
it replace QEMU's tables?

... Ignore for now, will comment on the MADT stuff later.

(4) Does DMA management mean a different IOMMU protocol? That is
going
to conflict with the SEV IOMMU protocol. Depexes in OVMF expect one or
zero IOMMU protocols to be present.

... Back from slide 40: feel free to skip this now; I'll comment on this
separately, below.

(5) Enumerate VirtIO -- virtio enumeration is PCI based on x86. But I
see no mention of PCI. If you mean VirtioMmioDeviceLib, that's no good,
because it does not support virtio-1.0, only virtio-0.9.5.

... Back from slide 42: I got my answer to this on slide 42, so don't
worry about this point.

(6) The PEI phase is skipped as a whole. I don't see how that can be
reasonably brought together with either "OvmfPkgX64.dsc" or
"OvmfPkg/AmdSev/AmdSevX64.dsc". I guess you can always modify SEC to
jump into DXE directly, but then why keep the PEI core and a bunch of
PEIMs in the firmware binary?

Also touched on by slide 9, TDVF Flow -- PEI is eliminated but SEC
becomes more heavy-weight.

Wouldn't this deserve a dedicated, separate platform DSC? The
8-bit/32-bit branching at the front of the reset vector is a smaller
complication in comparison.

Slide 6 references the mailing list archive:

https://edk2.groups.io/g/devel/topic/81969494#74319

and in that message, I wrote:

I'm doubtful that this is a unique problem ("just fix the reset
vector") the likes of which will supposedly never return during the
integration of SEV and TDX

See also:

https://listman.redhat.com/archives/edk2-devel-archive/2021-
April/msg00784.html

where I said:

It's not lost on me that we're talking about ~3 instructions.

Let's keep a close eye on further "polymorphisms" that would require
hacks.

The first 9 slides in the presentation introduce much-much more
intrusive problems than the "polymorphism" of the reset vector. Would I
be correct to say that my concern in the above messages was right? I
think I was only given a fraction of the necessary information when I
was asked "do you agree 'one binary' is viable".

[response from Erdem Aktas:]
Let's not worry about this for now. We want the one binary solution
for practical reasons and also for simplicity. In the end, we want to
do what is right and good for everyone.
Those are legit concerns and I think what Intel is trying to do (sorry
for mind reading) is to discuss all those concerns and questions to
make the right decision. I really appreciate their effort on preparing
those slides and bringing it to the community to review.

I will also read your comments more carefully and provide my thoughts
on them.
Sorry for being a little slow on this.

*** Slide 10 -- Key impact to firmware

(7) "CPUs are left running in long mode after exiting firmware" -- what
kind of "pen" are we talking about? Does a HLT loop still work?

(8) "I/O, MMIO, MSR accesses are different" -- those are implemented by
low-level libraries in edk2; how can they be configured dynamically?

... Back from slide 53: I'll comment on slide 53 separately; ignore
this.


*** Slide 11 -- TDVF Image (1)

(9) CFV -- Configuration Firmware Volume (VarStore.fdf.inc),
containing SB keys -- how is this firmware volume populated (at build
time)? Is this a hexdump?

... Back from slide 16: it seems like CFV is a raw hexdump indeed; how
is that managed when keys change (at build time)?

(10) This slide (slide 11) basically describes an intrusive
reorganization of "OvmfPkgX64.fdf". I don't think I can agree to that.
While confidential computing is important, it is not relevant for many
users. Even if we don't cause outright regressions for existent setups,
the maintenance cost of the traditional OVMF platform will skyrocket.

The big bunch of areas that SEV-ES introduced to MEMFD is already a big
complication. I'd feel much better if we could isolate all that to a
dedicated "remote attested boot" firmware platform, and not risk the
functionality and maintenance of the traditional platform. I think this
ties in with my comment (1).

For example, seeing a configuration firmware volume (CFV) with secure
boot keys embedded, in the "usual" FDF, will confuse lots of people, me
included. In the traditional OVMF use case, we use a different method:
namely OvmfPkg/EnrollDefaultKeys, for "priming" a variable store
template, in the build environment.

Edk2 (and PI and UEFI) are definitely flexible enough for accommodating
TDX, but the existent, traditional OVMF platforms are a bad fit. In my
opinion.


*** Slide 12: TDVF Image (2)

(11) "Page table should support both 4-level and 5-level page table"

As a general development strategy, I would suggest building TDX support
in small, well-isolated layers. 5-level paging is not enabled (has never
been tested, to my knowledge) with OVMF on QEMU/KVM, regardless of
confidential computing, for starters. If 5-level paging is a strict
requirement for TDX, then it arguably needs to be implemented
independently of TDX, at first. So that the common edk2 architecture be
at least testable on QEMU/KVM with 5-level paging enabled.


*** Slide 13:

(12) My comment is that the GUID-ed structure chain already starts at a
fixed GPA (the "AMD SEV option"). Ordering between GUID-ed structures is
irrelevant, so any TDX-specific structures should be eligible for
appending to, prepending to, or intermixing with, other (possibly
SEV-specific) structures. There need not be a separate entry point, just
different GUIDS.

(13) Regarding "4G-0x20[0x10] is OVMF AP reset vector (used in OVMF
implementation)" -- I think this is a typo: this "AP reset vector" is
*not* used in OVMF. To my knowledge, it is a vestige from the UefiCpuPkg
reset vector. In OVMF, APs are booted via MpInitLib (in multiple
firmware phases), using INIT-SIPI-SIPI, and the reset vector for the
APs, posited through those IPIs, is prepared in low RAM.


*** Slides 14 through 16:

I consider these TDVF firmware image internals, implementation details
-- do whatever you need to do, just don't interfere with existing
platforms / use cases. See my comment (10) above.


*** Slides 17-21:

(14) Again, a very big difference from traditional OVMF: APs never enter
SEC in traditional OVMF. I assume this new functionality is part of
TdxStartupLib (from slide 18) -- will there be a Null instance of that?

Last week I posted a 43-part patch series to edk2-devel, for separating
out the dynamic Xen enlightenments from the IA32, IA32X64, X64
platforms, in favor of the dedicated OvmfXen platform. TDX seems to
bring in incomparably more complications than Xen, and the OvmfPkg
maintainers have found even the Xen complications troublesome in the
long term.

If I had had access to all this information when we first discussed "one
binary" on the mailing list, I'd have never agreed to "one binary". I'm
OK with attempting one firmware binary for "confidential computing", but
that "one platform" cannot be "OvmfPkgX64.dsc".

Even if I make a comparison with just the "technology" (not the
remotely-attested deployment) of SEV and SEV-ES, as it is included in
"OvmfPkgX64.dsc", TDX is hugely more complicated and intrusive than
that. SEV proved possible to integrate into existing modules, into the
existing boot flow, maybe through the addition of some new drivers (such
as a new IOMMU protocol implementation, and some "clever" depexes).
But
we never had to restructure the FD layout, eliminate whole firmware
phases, or think about multiprocessing in the reset vector or the SEC
phase.

In order to bring an example from the ARM world, please note that
platforms that use a PEI phase, and platforms that don't, are distinct
platforms. In ArmVirtPkg, two examples are ArmVirtQemu and
ArmVirtQemuKernel. The latter does not include the PEI Core.


*** Slides 22 through 34:

(15) All these extra tasks and complications are perfectly fine, as long
as they exist peacefully *separately* from the traditional ("legacy")
OVMF platforms.

Honestly, in the virtual world, picking your firmware binary is easy.
The approach here reminds me of a physical firmware binary that includes
everything possible from "edk2-platforms/Features/Intel", just so it can
be deployed to any physical board imaginable. That's not how Intel
builds physical firmware, right? We have "edk2-platforms/Platform/Intel"
and "edk2-platforms/Silicon/Intel" with many-many separate DSC files.


*** Slide 35-36: DXE phase

(16) "Some DXE Drivers not allowed to load/start in Td guest -- Network
stack, RNG, ..."

Same comment as (several times) above. The Linuxboot project is a good
example for eliminating cruft from DXEFV (in fact, for eliminating most
of the DXE phase). In a TDX environment, why include drivers in the
firmware binary that are never used? Meanwhile, DXEFV in OVMF grows by
a
MB every 1.5 years or so. Again, remove these drivers from the DSC/FDF
then, and it needs to be a separate platform.

(17) "Other DXE Phase drivers -- [...] AcpiPlatformDxe"

I'm not sure what this section is supposed to mean. Other DXE phase
drivers included, or excluded? Without AcpiPlatformDxe, the guest OS
will not see QEMU's ACPI content, and will almost certainly malfunction.

... Back from slide 48: ignore this for now, I'll comment in more detail
later.


*** Slide 37: DXE Core

(18) says "SMM is not supported in Td guest" -- how is the variable
store protected from direct hardware (pflash) access from the guest OS?
Without SMM, the guest OS need not go through gRT->SetVariable() to
update authenticated non-volatile UEFI variables, and that undermines
Secure Boot.

Note that, while SEV-ES has the same limitation wrt. SMM, the
"OvmfPkg/AmdSev/AmdSevX64.dsc" platform doesn't even include the
Secure
Boot firmware feature. For another example, the OVMF firmware binary in
RHEL that's going to support SEV-ES is such a build of "OvmfPkgX64.dsc"
that does not include the Secure Boot feature either.

But in this TDX slide deck, Secure Boot certificates are embedded in the
CFV (configuration firmware volume) -- see slide 11 and slide 16 --,
which suggests that this platform does want secure boot.

... Back from slide 48: I'm going to make *additional* comments on this,
when I'm at slide 48, too.

The rest of this slide (slide 37) looks reasonable (generic DXE Core
changes -- possibly PI spec changes too).


*** Slides 38 through 39:

These seem reasonable (TdxDxe assumes some responsibilities of
OvmfPkg/PlatformPei)


*** Slides 40 through 42:

*If* you really can implement TDX support for the IOMMU driver *this*
surgically, then I'm OK with it. The general logic in the IOMMU driver
was truly difficult to write, and I'd be seriously concerned if those
parts would have to be modified. Customizing just the page
encryption/decryption primitives for TDX vs. SEV is OK.


*** Slides 43 through 47:

(19) Slide 46 and slide 47 are almost identical. Please consolidate them
into a single slide.

(20) the TPM2 infrastructure in edk2 is baroque (over-engineered), in my
opinion. It has so many layers that I can never keep them in mind. When
we added TPM support to OVMF, I required commit messages that would
help
us recall the layering. In particular, please refer to commit
0c0a50d6b3ff ("OvmfPkg: include Tcg2Dxe module", 2018-03-09). Here's an
excerpt:

TPM 2 consumer driver
|
v
Tpm2DeviceLib class: Tpm2DeviceLibTcg2 instance
|
v
TCG2 protocol interface
|
v
TCG2 protocol provider: Tcg2Dxe.inf driver
|
v
Tpm2DeviceLib class: Tpm2DeviceLibRouter instance
|
v
NULL class: Tpm2InstanceLibDTpm instance
(via earlier registration)
|
v
TPM2 chip (actual hardware)

The slide deck says that EFI_TD_PROTOCOL is supposed to reuse the
EFI_TCG2_PROTOCOL definition. If that's the case, then why don't we push
the TDX specifics (more or less: the replacement of PCRs with RTMR) down
to the lowest possible level?

Can we have "Tpm2InstanceLibTdxRtmr", plugged into the same Tcg2Dxe.inf
driver?

If not, can we have a new TdTcg2Dxe.inf driver, but make it so that it
install the same protocol as before (EFI_TCG2_PROTOCOL -- same old
protocol GUID)? Then DxeTpmMeasurementLib doesn't have to change.

As long as there is *at most* one EFI_TCG2_PROTOCOL instance published
in the protocol database, DxeTpmMeasurementLib should be fine. In SEV*
guests, the standard Tcg2Dxe driver provides that protocol. In TDX
guests, TdTcg2Dxe.inf should provide the protocol. Arbitration between
the two can be implemented with the pattern seen in the following
commits:

1 05db0948cc60 EmbeddedPkg: introduce EDKII Platform Has ACPI GUID
2 786f476323a6 EmbeddedPkg: introduce PlatformHasAcpiLib
3 65a69b214840 EmbeddedPkg: introduce EDKII Platform Has Device Tree
GUID
4 2558bfe3e907 ArmVirtPkg: add PlatformHasAcpiDtDxe

The basic idea is that Tcg2Dxe can have a depex on a new "running in
SEV" protocol GUID, and TdTcg2Dxe can have a depex on a new "running in
TDX" protocol GUID. A separate platform driver can install the proper
GUID -- possibly *neither* of those GUIDs.

And, we don't have to change the depex section of
"SecurityPkg/Tcg/Tcg2Dxe/Tcg2Dxe.inf" for this; we can implement a
library instance with an empty constructor, but a non-empty depex, and
then hook this lib instance into Tcg2Dxe.inf via module scope NULL lib
class override in the DSC file. Basically we could forcibly restrict
Tcg2Dxe's DEPEX by making it inherit the new DEPEX from the library.


*** Slide 48: DXE Phase -- Other Modules

Regarding IncompatiblePciDeviceSupportDxe: the proposed update sounds
plausible and simple enough.

(21) "AcpiPlatformDxe: Support for MADT/ACPI addition to report Td
Mailbox entry"

Firmware-owned tables must not be installed from this driver.

Please refer to my "Xen removal" patch set again, for
<https://bugzilla.tianocore.org/show_bug.cgi?id=2122>, which I mention
above in point (14). As part of the Xen removal, the AcpiPlatformDxe
driver in OvmfPkg is significantly trimmed: all unused (dead) cruft is
removed, including any ACPI table templates that are built into the
firmware.

OvmfPkg/AcpiPlatformDxe is responsible *solely* for implementing the
client side of the QEMU ACPI linker/loader.

If you need to prepare & install different ACPI tables, please do it
elsewhere, in another DXE driver. A bunch of other firmware modules do
that (NFIT, IBFT, BGRT, ...).

For example, the OvmfPkg/TdxDxe DXE_DRIVER is supposed to be launched
early in the DXE phase, via APRIORI section -- please consider
registering a protocol notify in that driver, for
EFI_ACPI_TABLE_PROTOCOL, and when it becomes available, install
whatever
*extra* tables you need.

Note that if you need to *customize* an ACPI table that QEMU already
provides, then you will have to modify the ACPI generator on the QEMU
side. It is a design tenet between QEMU and OVMF that OVMF include no
business logic for either parsing or fixing up ACPI tables that QEMU
provides. AcpiPlatformDxe contains the minimum (which is already a whole
lot, unfortunately) that's necessary for implementing the QEMU ACPI
linker/loader client in the UEFI environment.

The slide deck mentions MADT, which is also known as the "APIC" table --
and indeed, QEMU does provide that. (See acpi_build_madt()
[hw/i386/acpi-common.c].) So if TDX needs MADT customizations, that
should go into QEMU.

(22) EmuVariableFvbRuntimeDxe

Ouch, this is an unpleasant surprise.

First, if you know for a fact that pflash is not part of the *board* in
any TDX setup, then pulling

OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FvbServicesRuntimeDxe.inf

into the firmware platform is useless, as it is mutually exclusive with

OvmfPkg/EmuVariableFvbRuntimeDxe/Fvb.inf

(via dynamic means -- a dynamic PCD).

Note that the FDF file places QemuFlashFvbServicesRuntimeDxe in APRIORI
DXE when SMM_REQUIRE is FALSE. This driver checks for pflash presence,
and lets EmuVariableFvbRuntimeDxe perform its in-RAM flash emulation
only in case pflash is not found.

So this is again in favor of a separate platform -- if we know pflash is
never part of the board, then QemuFlashFvbServicesRuntimeDxe is never
needed, but you cannot remove it from the traditional DSC/FDF files.

Second, EmuVariableFvbRuntimeDxe consumes the PlatformFvbLib class,
for
the PlatformFvbDataWritten() API (among other things). This lib class is
implemented by two instances in OvmfPkg, PlatformFvbLibNull and
EmuVariableFvbLib. The latter instance allows Platform BDS to hook an
event (for signaling) via "PcdEmuVariableEvent" into the
EmuVariableFvbRuntimeDxe driver.

In old (very old) QEMU board configurations, namely those without
pflash, this (mis)feature is used by OVMF's PlatformBootManagerLib to
write out all variables to the EFI system partition in a regular file
called \NvVars, with the help of NvVarsFileLib, whenever
EmuVariableFvbRuntimeDxe writes out an emulated "flash" block. For this
purpose, the traditional OVMF DSC files link EmuVariableFvbLib into
EmuVariableFvbRuntimeDxe.

But it counts as an absolute disaster nowadays, and should not be
revived in any platform. If you don't have pflash in TDX guests, just
accept that you won't have non-volatile variables. And link
PlatformFvbLibNull into EmuVariableFvbRuntimeDxe. You're going to need
a
separate PlatformBootManagerLib instance anyway.

(We should have removed EmuVariableFvbRuntimeDxe a long time ago
from
the traditional OVMF platforms, i.e. made pflash a hard requirement,
even when SMM is not built into the platform -- but whenever I tried
that, Jordan always shot me down.)

My point is: using EmuVariableFvbRuntimeDxe in the TDX platform may be
defensible per se, but we must be very clear that it will never provide
a standards-conformant service for non-volatile UEFI variables, and we
must keep as much of the \NvVars mess out of EmuVariableFvbRuntimeDxe
as
possible. This will require a separate PlatformBootManagerLib instance
for TDX anyway (or maybe share PlatformBootManagerLibGrub with
"OvmfPkg/AmdSev/AmdSevX64.dsc").

Apart from the volatility aspect, let's assume we have this in-RAM
emulated "flash device", storing authenticated UEFI variables for Secure
Boot purposes. And we don't have SMM.

What protects this in-RAM variable store from tampering by the guest OS?
It's all guest RAM only, after all. What provides the privilege barrier
between the guest firmware and the guest OS?


*** Slide 50: Library

(23) Should we introduce Null instances for all (or most) new lib
classes here? Code size is a concern (static linking). If we extend a
common OvmfPkg module with new hook points, it's one thing to return
from that hook point early *dynamically*, but it's even better (given
separate platforms) to allow the traditional platform firmware to use a
Null lib instance, to cut out all the dead code statically.


*** Slides 51 through 52

Seems OK.


*** Slide 53:

(24) It might be worth noting that BaseIoLibIntrinsic already has some
SEV enlightenment, as the FIFO IO port operations (which normally use
the REP prefix) are not handled on SEV. I don't have an immediate idea
why this might matter, we should just minimize code duplication if
possible.


*** Slides 54-56:

No comments, this stuff seems reasonable.


*** Slide 57: MpInitLib

I don't know enough to give a summary judgement.


All in all, I see the controversial / messy parts in the platform
bringup, and how all that differs from the traditional ("legacy") OVMF
platforms. I admit I *may* be biased in favor of SEV, possibly because
SEV landed first -- if you find signs of such a bias in my comments,
please don't hesitate to debunk those points. Yet my general impression
is that the early bringup stuff is significantly different from
everything before, and because of this, a separate platform is
justified.

Definitely separate from the traditional OVMF IA32, IA32X64, and X64
platforms, and *possibly* separate from the "remote attestation"
AmdSevX64.dsc platform. I would approach the TDX feature-set in complete
isolation (exactly how Intel commenced the work, if I understand
correctly), modulo obviously shareable / reusable parts, and then slowly
& gradually work on extracting / refactoring commonalities.

(But, given my stance on Xen for example, I could disagree even with the
latter, retroactive kind of unification -- it all boils down to shared
developer and user base. Component sharing should reflect the community
structure, otherwise maintenance will be a nightmare.)

Thanks
Laszlo





Yao, Jiewen
 

thank you Laszlo. Your feedback is received.
I am waiting for comment from other people.

thank you!
Yao, Jiewen

在 2021年6月4日,下午6:11,Laszlo Ersek <lersek@redhat.com> 写道:

On 06/04/21 01:19, Yao, Jiewen wrote:
Hi Laszlo.

To clarify your "one binary" feedback below, do you mean you suggest A) create a separate DSC (for example OvmfPkg/ConfidentialComputing.dsc) for a full solution including AMD SEC + Intel TDX + NonConfidentialComputing?
Or B) to create a standalone DSC for Intel TDX (for example, create a OvmfPkg/IntelTdx/IntelTdxXS64.dsc) ?

To me, A) does not change too much, we just create another DSC file - that is it.
Then the original OvmfPkgX64.dsc will only be used for POC/Testing purpose. It does not provide any security guarantee.
(The threat model is: we don't trust VMM. Without attestation, you don't know if VMM modified the OVMF.)

I don't know how "simply" it means. To enable TDX to make it work is not a simple work.
Some architecture changes are mandatory, such as reset vector, IO/MMIO access, #VE handler, IOMMU based shared memory access, etc. I think AMD SEV already did those.
I mean option (B). Create a completely separate DSC+FDF for Intel TDX.

In my mind, there are two (very high level) stages for developing the
"Confidential Computing with TDX" feature in edk2.

Stage 1: allow a guest to run in a TDX domain. "Guest owner" and "Cloud
owner" are *not* separate concepts in this stage.

Stage 2: enable such a guest to be deployed remotely to an untrusted
cloud, and ensure its integrity via remote attestation.


Stage 1 is *hugely different* between AMD SEV* technologies and Intel
TDX. That's why we need, in my opinion, a separate DSC+FDF for Intel TDX
right from the start. This does not necessarily mean that we need to
duplicate every single module (library, PEIM, DXE driver, etc) for Intel
TDX. Wherever it makes sense, and the changes are not very intrusive or
wasteful (considering binary code that becomes "dead" on in a TDX
guest), we can modify (extend) existent modules in-place.

Stage 1 is complete for AMD SEV and AMD SEV-ES. AMD SEV-SNP is in
progress. These AMD SEV* technologies have been possible to integrate
(thus far) into the existing "OvmfPkg/OvmfPkgX64.dsc" platform. But
Intel TDX is very different from that, even in Stage 1.


Stage 2 is far out in the future, for Intel TDX. I have no idea about
it, but whatever it's going to look like, it will be based on Stage 1.

The "OvmfPkg/AmdSev/AmdSevX64.dsc" platform is *one approach* for
implementing Stage 2. And this platform utilizes AMD SEV* technologies
only (thus far). *If* and *how much* the approach of
"OvmfPkg/AmdSev/AmdSevX64.dsc" will apply to Intel TDX -- once Stage 1
of Intel TDX is complete --, I cannot predict.


The underlying physical hardware features are completely different
between AMD SEV* and Intel TDX, as much as I can tell. It makes zero
sense to me to start from a unified perspective, and shoehorn everything
possible (and impossible) into a common blob and framework. That
approach has never *ever* worked for me, not even when I started working
on the virtio drivers for OVMF 9 years ago. I extracted VirtioLib only
when I worked on the second virtio driver -- that is, when I was about
to have *working code* for two *distinct* virtio devices, and it was
possible to identify commonalities between the drivers, and to extract
those commonalities. The fact that I knew, in advance, that my "end
goal" was the same with these devices, namely to "boot from them", made
no difference whatsoever. I still I had to start implementing them in
separate sandboxes. Soon enough, VirtioLib emerged, and later
VIRTIO_DEVICE_PROTOCOL emerged even.

I'm 100% incapable of dealing with a top-down approach here. Only
bottom-up works for me.


Most importantly, the OvmfPkgIa32, OvmfPkgIa32X64, OvmfPkgX64 platforms
must be kept regression-free, and preferably their complexity should be
kept manageable for maintenance too. These platforms are *entirely
irrelevant* for Stage 2 (regardless of underlying security technology).
They *happen* to be relevant for Stage 1 of AMD SEV*, purely because SEV
proved possible to integrate into them, in very small, well isolated,
surgical advances, without upsetting everything. But I can tell upfront
that Intel TDX is way more intrusive than anything I've seen thus far in
AMD SEV*. So I want Intel TDX to exist in a brand new platform, even as
Stage 1. And, to reiterate, just because Confidential Computing is the
new hot thing, the use cases for OvmfPkgIa32, OvmfPkgIa32X64, OvmfPkgX64
do not disappear. Regressing them, or making them unmaintainable due to
skyrocketing complexity, is not acceptable.

Thanks
Laszlo


===================
My point is that the "one binary" that the slide deck refers to (i.e.,
OvmfPkgX64.dsc) might prove OK for utilizing the Intel TDX *technology*
in itself. Simply enabling OVMF + a guest OS to boot in a TDX domain.

But "OvmfPkgX64.dsc" is *not* the "one binary" that is suitable for
remote attestation, which has a much broader scope, involving multiple
computers, networking, deployment, guest-owner/host-owner separation,
whatever. For the latter, we needed a separate platform anyway, even
with only SEV in mind; that's why "OvmfPkg/AmdSev/AmdSevX64.dsc" exists.
===================

Thank you
Yao Jiewen

-----Original Message-----
From: rfc@edk2.groups.io <rfc@edk2.groups.io> On Behalf Of Laszlo Ersek
Sent: Friday, June 4, 2021 12:12 AM
To: Yao, Jiewen <jiewen.yao@intel.com>; rfc@edk2.groups.io;
devel@edk2.groups.io
Cc: jejb@linux.ibm.com; Brijesh Singh <brijesh.singh@amd.com>; Tom
Lendacky <thomas.lendacky@amd.com>; erdemaktas@google.com;
cho@microsoft.com; bret.barkelew@microsoft.com; Jon Lange
<jlange@microsoft.com>; Karen Noel <knoel@redhat.com>; Paolo Bonzini
<pbonzini@redhat.com>; Nathaniel McCallum <npmccallum@redhat.com>;
Dr. David Alan Gilbert <dgilbert@redhat.com>; Ademar de Souza Reis Jr.
<areis@redhat.com>
Subject: Re: [edk2-rfc] [edk2-devel] RFC: design review for TDVF in OVMF

On 06/03/21 15:51, Yao, Jiewen wrote:
Hi, All
We plan to do a design review for TDVF in OVMF package.


The TDVF Design slides for TinaoCore Design Review Meeting (Jun 11) is
now available in blow link:
https://edk2.groups.io/g/devel/files/Designs/2021/0611.

The Bugzilla is https://bugzilla.tianocore.org/show_bug.cgi?id=3429



You can have an offline review first. You comments will be warmly
welcomed and we will continuously update the slides based on the
feedbacks.
Resending my earlier comments in this mailing list thread, with the
feedback inserted at the proper spots that has been given in the
off-list thread since then:


*** Slides 4, 6, 7: the "one binary requirement".

(1) The presentation refers to "OvmfPkgX64.dsc" as "the one" on slide#4,
but then the explanation for the requirement, given on slide 7, speaks
about "common attestation interface".

I think we have a misunderstanding here. The "OvmfPkgX64.dsc" platform
indeed contains SEV, SEV-ES, and (in the future, will contain) SEV-SNP
support. In that sense, adding TDX support to the same platform should
be (hopefully) possible, at the cost of ugly gymnastics in the reset
vector.

But "OvmfPkgX64.dsc" is *already* different from the remotely attested
OVMF platform, namely "OvmfPkg/AmdSev/AmdSevX64.dsc".

The latter has some additional modules (secret PEIM and DXE driver), it
has the Grub binary built in, and -- probably most importantly -- it
trusts host-originated information less than "OvmfPkgX64.dsc".

For example, "OvmfPkg/AmdSev/AmdSevX64.dsc" has a dedicated
PlatformBootManagerLib instance, one that ignores the non-volatile UEFI
variables Boot#### and BootOrder, and ignores (thus far) the fw_cfg
originated kernel/initrd/cmdline as well.

It remains an "area of research" to see what else should be removed from
the traditional host-guest integration (which integration is usually
desirable for management and convenience), in the remotely-attested boot
scenario. See e.g.
<https://bugzilla.tianocore.org/show_bug.cgi?id=3180>.

My point is that the "one binary" that the slide deck refers to (i.e.,
OvmfPkgX64.dsc) might prove OK for utilizing the Intel TDX *technology*
in itself. Simply enabling OVMF + a guest OS to boot in a TDX domain.

But "OvmfPkgX64.dsc" is *not* the "one binary" that is suitable for
remote attestation, which has a much broader scope, involving multiple
computers, networking, deployment, guest-owner/host-owner separation,
whatever. For the latter, we needed a separate platform anyway, even
with only SEV in mind; that's why "OvmfPkg/AmdSev/AmdSevX64.dsc"
exists.


*** Slides 8-9: general boot flow -- TDVF; TDVF Flow

I'm likely missing a big bunch of background here, so just a few
questions:

(2) Not sure what RTMR is, but it's associated with "Enable TrustedBoot"
-- so is a virtual TPM a hard requirement?

... Back from slide 10: "TCG measurement and event log framework w/o
TPM" -- that's curious.

[response from Dave Gilbert:]
My reading of this is that the RTMR (and another set of similar
registers) are a TDX thing that is like the PCRs from a TPM but
without the rest of the TPM features; so you can do the one-way
measurement into the RTMRs just like you do into a TPM PCR, and the
measurements pop out somewhere in the TDX quote. Just like a TPM you
need the event log to make any sense of how the final hashed value
supposedly got to where it did.
[response from Erdem Aktas:]
+1 to David on this. TDX provides 2 kinds of measurement registers:
MRTDs and RTMRs
(https://software.intel.com/content/dam/develop/external/us/en/docume
nts/tdx-module-1eas-v0.85.039.pdf
section 10.1.2) . MRTDs are build-time measurement registers which are
updated when TDX is being created. Once TDX is finalized (before the
first run), the MRTDs are finalized and cannot be updated anymore. On
the other hand, while the TD is running, TD can extend RTMRs through
TDCALLs which will provide TPM PCR kind of capabilities.
... Back from slide 43: feel free to skip this now; I will comment in
more detail below.

(3) Prepare AcpiTable -- OVMF fetches ACPI tables from QEMU; is this a
new (firmware originated) table that is installed in addition, or does
it replace QEMU's tables?

... Ignore for now, will comment on the MADT stuff later.

(4) Does DMA management mean a different IOMMU protocol? That is
going
to conflict with the SEV IOMMU protocol. Depexes in OVMF expect one or
zero IOMMU protocols to be present.

... Back from slide 40: feel free to skip this now; I'll comment on this
separately, below.

(5) Enumerate VirtIO -- virtio enumeration is PCI based on x86. But I
see no mention of PCI. If you mean VirtioMmioDeviceLib, that's no good,
because it does not support virtio-1.0, only virtio-0.9.5.

... Back from slide 42: I got my answer to this on slide 42, so don't
worry about this point.

(6) The PEI phase is skipped as a whole. I don't see how that can be
reasonably brought together with either "OvmfPkgX64.dsc" or
"OvmfPkg/AmdSev/AmdSevX64.dsc". I guess you can always modify SEC to
jump into DXE directly, but then why keep the PEI core and a bunch of
PEIMs in the firmware binary?

Also touched on by slide 9, TDVF Flow -- PEI is eliminated but SEC
becomes more heavy-weight.

Wouldn't this deserve a dedicated, separate platform DSC? The
8-bit/32-bit branching at the front of the reset vector is a smaller
complication in comparison.

Slide 6 references the mailing list archive:

https://edk2.groups.io/g/devel/topic/81969494#74319

and in that message, I wrote:

I'm doubtful that this is a unique problem ("just fix the reset
vector") the likes of which will supposedly never return during the
integration of SEV and TDX

See also:

https://listman.redhat.com/archives/edk2-devel-archive/2021-
April/msg00784.html

where I said:

It's not lost on me that we're talking about ~3 instructions.

Let's keep a close eye on further "polymorphisms" that would require
hacks.

The first 9 slides in the presentation introduce much-much more
intrusive problems than the "polymorphism" of the reset vector. Would I
be correct to say that my concern in the above messages was right? I
think I was only given a fraction of the necessary information when I
was asked "do you agree 'one binary' is viable".

[response from Erdem Aktas:]
Let's not worry about this for now. We want the one binary solution
for practical reasons and also for simplicity. In the end, we want to
do what is right and good for everyone.
Those are legit concerns and I think what Intel is trying to do (sorry
for mind reading) is to discuss all those concerns and questions to
make the right decision. I really appreciate their effort on preparing
those slides and bringing it to the community to review.

I will also read your comments more carefully and provide my thoughts
on them.
Sorry for being a little slow on this.

*** Slide 10 -- Key impact to firmware

(7) "CPUs are left running in long mode after exiting firmware" -- what
kind of "pen" are we talking about? Does a HLT loop still work?

(8) "I/O, MMIO, MSR accesses are different" -- those are implemented by
low-level libraries in edk2; how can they be configured dynamically?

... Back from slide 53: I'll comment on slide 53 separately; ignore
this.


*** Slide 11 -- TDVF Image (1)

(9) CFV -- Configuration Firmware Volume (VarStore.fdf.inc),
containing SB keys -- how is this firmware volume populated (at build
time)? Is this a hexdump?

... Back from slide 16: it seems like CFV is a raw hexdump indeed; how
is that managed when keys change (at build time)?

(10) This slide (slide 11) basically describes an intrusive
reorganization of "OvmfPkgX64.fdf". I don't think I can agree to that.
While confidential computing is important, it is not relevant for many
users. Even if we don't cause outright regressions for existent setups,
the maintenance cost of the traditional OVMF platform will skyrocket.

The big bunch of areas that SEV-ES introduced to MEMFD is already a big
complication. I'd feel much better if we could isolate all that to a
dedicated "remote attested boot" firmware platform, and not risk the
functionality and maintenance of the traditional platform. I think this
ties in with my comment (1).

For example, seeing a configuration firmware volume (CFV) with secure
boot keys embedded, in the "usual" FDF, will confuse lots of people, me
included. In the traditional OVMF use case, we use a different method:
namely OvmfPkg/EnrollDefaultKeys, for "priming" a variable store
template, in the build environment.

Edk2 (and PI and UEFI) are definitely flexible enough for accommodating
TDX, but the existent, traditional OVMF platforms are a bad fit. In my
opinion.


*** Slide 12: TDVF Image (2)

(11) "Page table should support both 4-level and 5-level page table"

As a general development strategy, I would suggest building TDX support
in small, well-isolated layers. 5-level paging is not enabled (has never
been tested, to my knowledge) with OVMF on QEMU/KVM, regardless of
confidential computing, for starters. If 5-level paging is a strict
requirement for TDX, then it arguably needs to be implemented
independently of TDX, at first. So that the common edk2 architecture be
at least testable on QEMU/KVM with 5-level paging enabled.


*** Slide 13:

(12) My comment is that the GUID-ed structure chain already starts at a
fixed GPA (the "AMD SEV option"). Ordering between GUID-ed structures is
irrelevant, so any TDX-specific structures should be eligible for
appending to, prepending to, or intermixing with, other (possibly
SEV-specific) structures. There need not be a separate entry point, just
different GUIDS.

(13) Regarding "4G-0x20[0x10] is OVMF AP reset vector (used in OVMF
implementation)" -- I think this is a typo: this "AP reset vector" is
*not* used in OVMF. To my knowledge, it is a vestige from the UefiCpuPkg
reset vector. In OVMF, APs are booted via MpInitLib (in multiple
firmware phases), using INIT-SIPI-SIPI, and the reset vector for the
APs, posited through those IPIs, is prepared in low RAM.


*** Slides 14 through 16:

I consider these TDVF firmware image internals, implementation details
-- do whatever you need to do, just don't interfere with existing
platforms / use cases. See my comment (10) above.


*** Slides 17-21:

(14) Again, a very big difference from traditional OVMF: APs never enter
SEC in traditional OVMF. I assume this new functionality is part of
TdxStartupLib (from slide 18) -- will there be a Null instance of that?

Last week I posted a 43-part patch series to edk2-devel, for separating
out the dynamic Xen enlightenments from the IA32, IA32X64, X64
platforms, in favor of the dedicated OvmfXen platform. TDX seems to
bring in incomparably more complications than Xen, and the OvmfPkg
maintainers have found even the Xen complications troublesome in the
long term.

If I had had access to all this information when we first discussed "one
binary" on the mailing list, I'd have never agreed to "one binary". I'm
OK with attempting one firmware binary for "confidential computing", but
that "one platform" cannot be "OvmfPkgX64.dsc".

Even if I make a comparison with just the "technology" (not the
remotely-attested deployment) of SEV and SEV-ES, as it is included in
"OvmfPkgX64.dsc", TDX is hugely more complicated and intrusive than
that. SEV proved possible to integrate into existing modules, into the
existing boot flow, maybe through the addition of some new drivers (such
as a new IOMMU protocol implementation, and some "clever" depexes).
But
we never had to restructure the FD layout, eliminate whole firmware
phases, or think about multiprocessing in the reset vector or the SEC
phase.

In order to bring an example from the ARM world, please note that
platforms that use a PEI phase, and platforms that don't, are distinct
platforms. In ArmVirtPkg, two examples are ArmVirtQemu and
ArmVirtQemuKernel. The latter does not include the PEI Core.


*** Slides 22 through 34:

(15) All these extra tasks and complications are perfectly fine, as long
as they exist peacefully *separately* from the traditional ("legacy")
OVMF platforms.

Honestly, in the virtual world, picking your firmware binary is easy.
The approach here reminds me of a physical firmware binary that includes
everything possible from "edk2-platforms/Features/Intel", just so it can
be deployed to any physical board imaginable. That's not how Intel
builds physical firmware, right? We have "edk2-platforms/Platform/Intel"
and "edk2-platforms/Silicon/Intel" with many-many separate DSC files.


*** Slide 35-36: DXE phase

(16) "Some DXE Drivers not allowed to load/start in Td guest -- Network
stack, RNG, ..."

Same comment as (several times) above. The Linuxboot project is a good
example for eliminating cruft from DXEFV (in fact, for eliminating most
of the DXE phase). In a TDX environment, why include drivers in the
firmware binary that are never used? Meanwhile, DXEFV in OVMF grows by
a
MB every 1.5 years or so. Again, remove these drivers from the DSC/FDF
then, and it needs to be a separate platform.

(17) "Other DXE Phase drivers -- [...] AcpiPlatformDxe"

I'm not sure what this section is supposed to mean. Other DXE phase
drivers included, or excluded? Without AcpiPlatformDxe, the guest OS
will not see QEMU's ACPI content, and will almost certainly malfunction.

... Back from slide 48: ignore this for now, I'll comment in more detail
later.


*** Slide 37: DXE Core

(18) says "SMM is not supported in Td guest" -- how is the variable
store protected from direct hardware (pflash) access from the guest OS?
Without SMM, the guest OS need not go through gRT->SetVariable() to
update authenticated non-volatile UEFI variables, and that undermines
Secure Boot.

Note that, while SEV-ES has the same limitation wrt. SMM, the
"OvmfPkg/AmdSev/AmdSevX64.dsc" platform doesn't even include the
Secure
Boot firmware feature. For another example, the OVMF firmware binary in
RHEL that's going to support SEV-ES is such a build of "OvmfPkgX64.dsc"
that does not include the Secure Boot feature either.

But in this TDX slide deck, Secure Boot certificates are embedded in the
CFV (configuration firmware volume) -- see slide 11 and slide 16 --,
which suggests that this platform does want secure boot.

... Back from slide 48: I'm going to make *additional* comments on this,
when I'm at slide 48, too.

The rest of this slide (slide 37) looks reasonable (generic DXE Core
changes -- possibly PI spec changes too).


*** Slides 38 through 39:

These seem reasonable (TdxDxe assumes some responsibilities of
OvmfPkg/PlatformPei)


*** Slides 40 through 42:

*If* you really can implement TDX support for the IOMMU driver *this*
surgically, then I'm OK with it. The general logic in the IOMMU driver
was truly difficult to write, and I'd be seriously concerned if those
parts would have to be modified. Customizing just the page
encryption/decryption primitives for TDX vs. SEV is OK.


*** Slides 43 through 47:

(19) Slide 46 and slide 47 are almost identical. Please consolidate them
into a single slide.

(20) the TPM2 infrastructure in edk2 is baroque (over-engineered), in my
opinion. It has so many layers that I can never keep them in mind. When
we added TPM support to OVMF, I required commit messages that would
help
us recall the layering. In particular, please refer to commit
0c0a50d6b3ff ("OvmfPkg: include Tcg2Dxe module", 2018-03-09). Here's an
excerpt:

TPM 2 consumer driver
|
v
Tpm2DeviceLib class: Tpm2DeviceLibTcg2 instance
|
v
TCG2 protocol interface
|
v
TCG2 protocol provider: Tcg2Dxe.inf driver
|
v
Tpm2DeviceLib class: Tpm2DeviceLibRouter instance
|
v
NULL class: Tpm2InstanceLibDTpm instance
(via earlier registration)
|
v
TPM2 chip (actual hardware)

The slide deck says that EFI_TD_PROTOCOL is supposed to reuse the
EFI_TCG2_PROTOCOL definition. If that's the case, then why don't we push
the TDX specifics (more or less: the replacement of PCRs with RTMR) down
to the lowest possible level?

Can we have "Tpm2InstanceLibTdxRtmr", plugged into the same Tcg2Dxe.inf
driver?

If not, can we have a new TdTcg2Dxe.inf driver, but make it so that it
install the same protocol as before (EFI_TCG2_PROTOCOL -- same old
protocol GUID)? Then DxeTpmMeasurementLib doesn't have to change.

As long as there is *at most* one EFI_TCG2_PROTOCOL instance published
in the protocol database, DxeTpmMeasurementLib should be fine. In SEV*
guests, the standard Tcg2Dxe driver provides that protocol. In TDX
guests, TdTcg2Dxe.inf should provide the protocol. Arbitration between
the two can be implemented with the pattern seen in the following
commits:

1 05db0948cc60 EmbeddedPkg: introduce EDKII Platform Has ACPI GUID
2 786f476323a6 EmbeddedPkg: introduce PlatformHasAcpiLib
3 65a69b214840 EmbeddedPkg: introduce EDKII Platform Has Device Tree
GUID
4 2558bfe3e907 ArmVirtPkg: add PlatformHasAcpiDtDxe

The basic idea is that Tcg2Dxe can have a depex on a new "running in
SEV" protocol GUID, and TdTcg2Dxe can have a depex on a new "running in
TDX" protocol GUID. A separate platform driver can install the proper
GUID -- possibly *neither* of those GUIDs.

And, we don't have to change the depex section of
"SecurityPkg/Tcg/Tcg2Dxe/Tcg2Dxe.inf" for this; we can implement a
library instance with an empty constructor, but a non-empty depex, and
then hook this lib instance into Tcg2Dxe.inf via module scope NULL lib
class override in the DSC file. Basically we could forcibly restrict
Tcg2Dxe's DEPEX by making it inherit the new DEPEX from the library.


*** Slide 48: DXE Phase -- Other Modules

Regarding IncompatiblePciDeviceSupportDxe: the proposed update sounds
plausible and simple enough.

(21) "AcpiPlatformDxe: Support for MADT/ACPI addition to report Td
Mailbox entry"

Firmware-owned tables must not be installed from this driver.

Please refer to my "Xen removal" patch set again, for
<https://bugzilla.tianocore.org/show_bug.cgi?id=2122>, which I mention
above in point (14). As part of the Xen removal, the AcpiPlatformDxe
driver in OvmfPkg is significantly trimmed: all unused (dead) cruft is
removed, including any ACPI table templates that are built into the
firmware.

OvmfPkg/AcpiPlatformDxe is responsible *solely* for implementing the
client side of the QEMU ACPI linker/loader.

If you need to prepare & install different ACPI tables, please do it
elsewhere, in another DXE driver. A bunch of other firmware modules do
that (NFIT, IBFT, BGRT, ...).

For example, the OvmfPkg/TdxDxe DXE_DRIVER is supposed to be launched
early in the DXE phase, via APRIORI section -- please consider
registering a protocol notify in that driver, for
EFI_ACPI_TABLE_PROTOCOL, and when it becomes available, install
whatever
*extra* tables you need.

Note that if you need to *customize* an ACPI table that QEMU already
provides, then you will have to modify the ACPI generator on the QEMU
side. It is a design tenet between QEMU and OVMF that OVMF include no
business logic for either parsing or fixing up ACPI tables that QEMU
provides. AcpiPlatformDxe contains the minimum (which is already a whole
lot, unfortunately) that's necessary for implementing the QEMU ACPI
linker/loader client in the UEFI environment.

The slide deck mentions MADT, which is also known as the "APIC" table --
and indeed, QEMU does provide that. (See acpi_build_madt()
[hw/i386/acpi-common.c].) So if TDX needs MADT customizations, that
should go into QEMU.

(22) EmuVariableFvbRuntimeDxe

Ouch, this is an unpleasant surprise.

First, if you know for a fact that pflash is not part of the *board* in
any TDX setup, then pulling

OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FvbServicesRuntimeDxe.inf

into the firmware platform is useless, as it is mutually exclusive with

OvmfPkg/EmuVariableFvbRuntimeDxe/Fvb.inf

(via dynamic means -- a dynamic PCD).

Note that the FDF file places QemuFlashFvbServicesRuntimeDxe in APRIORI
DXE when SMM_REQUIRE is FALSE. This driver checks for pflash presence,
and lets EmuVariableFvbRuntimeDxe perform its in-RAM flash emulation
only in case pflash is not found.

So this is again in favor of a separate platform -- if we know pflash is
never part of the board, then QemuFlashFvbServicesRuntimeDxe is never
needed, but you cannot remove it from the traditional DSC/FDF files.

Second, EmuVariableFvbRuntimeDxe consumes the PlatformFvbLib class,
for
the PlatformFvbDataWritten() API (among other things). This lib class is
implemented by two instances in OvmfPkg, PlatformFvbLibNull and
EmuVariableFvbLib. The latter instance allows Platform BDS to hook an
event (for signaling) via "PcdEmuVariableEvent" into the
EmuVariableFvbRuntimeDxe driver.

In old (very old) QEMU board configurations, namely those without
pflash, this (mis)feature is used by OVMF's PlatformBootManagerLib to
write out all variables to the EFI system partition in a regular file
called \NvVars, with the help of NvVarsFileLib, whenever
EmuVariableFvbRuntimeDxe writes out an emulated "flash" block. For this
purpose, the traditional OVMF DSC files link EmuVariableFvbLib into
EmuVariableFvbRuntimeDxe.

But it counts as an absolute disaster nowadays, and should not be
revived in any platform. If you don't have pflash in TDX guests, just
accept that you won't have non-volatile variables. And link
PlatformFvbLibNull into EmuVariableFvbRuntimeDxe. You're going to need
a
separate PlatformBootManagerLib instance anyway.

(We should have removed EmuVariableFvbRuntimeDxe a long time ago
from
the traditional OVMF platforms, i.e. made pflash a hard requirement,
even when SMM is not built into the platform -- but whenever I tried
that, Jordan always shot me down.)

My point is: using EmuVariableFvbRuntimeDxe in the TDX platform may be
defensible per se, but we must be very clear that it will never provide
a standards-conformant service for non-volatile UEFI variables, and we
must keep as much of the \NvVars mess out of EmuVariableFvbRuntimeDxe
as
possible. This will require a separate PlatformBootManagerLib instance
for TDX anyway (or maybe share PlatformBootManagerLibGrub with
"OvmfPkg/AmdSev/AmdSevX64.dsc").

Apart from the volatility aspect, let's assume we have this in-RAM
emulated "flash device", storing authenticated UEFI variables for Secure
Boot purposes. And we don't have SMM.

What protects this in-RAM variable store from tampering by the guest OS?
It's all guest RAM only, after all. What provides the privilege barrier
between the guest firmware and the guest OS?


*** Slide 50: Library

(23) Should we introduce Null instances for all (or most) new lib
classes here? Code size is a concern (static linking). If we extend a
common OvmfPkg module with new hook points, it's one thing to return
from that hook point early *dynamically*, but it's even better (given
separate platforms) to allow the traditional platform firmware to use a
Null lib instance, to cut out all the dead code statically.


*** Slides 51 through 52

Seems OK.


*** Slide 53:

(24) It might be worth noting that BaseIoLibIntrinsic already has some
SEV enlightenment, as the FIFO IO port operations (which normally use
the REP prefix) are not handled on SEV. I don't have an immediate idea
why this might matter, we should just minimize code duplication if
possible.


*** Slides 54-56:

No comments, this stuff seems reasonable.


*** Slide 57: MpInitLib

I don't know enough to give a summary judgement.


All in all, I see the controversial / messy parts in the platform
bringup, and how all that differs from the traditional ("legacy") OVMF
platforms. I admit I *may* be biased in favor of SEV, possibly because
SEV landed first -- if you find signs of such a bias in my comments,
please don't hesitate to debunk those points. Yet my general impression
is that the early bringup stuff is significantly different from
everything before, and because of this, a separate platform is
justified.

Definitely separate from the traditional OVMF IA32, IA32X64, and X64
platforms, and *possibly* separate from the "remote attestation"
AmdSevX64.dsc platform. I would approach the TDX feature-set in complete
isolation (exactly how Intel commenced the work, if I understand
correctly), modulo obviously shareable / reusable parts, and then slowly
& gradually work on extracting / refactoring commonalities.

(But, given my stance on Xen for example, I could disagree even with the
latter, retroactive kind of unification -- it all boils down to shared
developer and user base. Component sharing should reflect the community
structure, otherwise maintenance will be a nightmare.)

Thanks
Laszlo









Michael Brown <mcb30@...>
 

On 04/06/2021 11:11, Laszlo Ersek wrote:
And, to reiterate, just because Confidential Computing is the
new hot thing, the use cases for OvmfPkgIa32, OvmfPkgIa32X64, OvmfPkgX64
do not disappear. Regressing them, or making them unmaintainable due to
skyrocketing complexity, is not acceptable.
Totally agree with this. Confidential Computing is a very niche use case, and there is no justification for exploding the complexity of the standard OVMF build.

If, several years from now, it ever reaches the point that the majority of real-world workloads are using TDX, then there would be an argument that the complexity cost has to be paid and that the standard OVMF build should include TDX features. But that's several years away and may never happen.

Michael


Michael Brown <mcb30@...>
 

On 04/06/2021 11:43, Michael Brown wrote:
On 04/06/2021 11:11, Laszlo Ersek wrote:
And, to reiterate, just because Confidential Computing is the
new hot thing, the use cases for OvmfPkgIa32, OvmfPkgIa32X64, OvmfPkgX64
do not disappear. Regressing them, or making them unmaintainable due to
skyrocketing complexity, is not acceptable.
Totally agree with this.  Confidential Computing is a very niche use case, and there is no justification for exploding the complexity of the standard OVMF build.
If, several years from now, it ever reaches the point that the majority of real-world workloads are using TDX, then there would be an argument that the complexity cost has to be paid and that the standard OVMF build should include TDX features.  But that's several years away and may never happen.
Out of interest: does Intel TDX provide any security benefits beyond the (much simpler) Intel SGX?

As far as I can tell from the various papers, the fundamental difference between TDX and SGX seems to be that TDX deliberately increases the attack surface from "just the application code" to "entire guest VM, including OS kernel, runtime libraries, etc". Increasing the attack surface while adding complexity is a huge cost so I'm assuming that there must be some commensurate benefit, but nothing in the documentation I've seen seems to describe what this benefit actually is.

Thanks,

Michael


James Bottomley <jejb@...>
 

On Fri, 2021-06-04 at 15:52 +0100, Michael Brown wrote:
On 04/06/2021 11:43, Michael Brown wrote:
On 04/06/2021 11:11, Laszlo Ersek wrote:
And, to reiterate, just because Confidential Computing is the
new hot thing, the use cases for OvmfPkgIa32, OvmfPkgIa32X64,
OvmfPkgX64 do not disappear. Regressing them, or making them
unmaintainable due to skyrocketing complexity, is not acceptable.
Totally agree with this. Confidential Computing is a very niche
use case, and there is no justification for exploding the
complexity of the standard OVMF build.

If, several years from now, it ever reaches the point that the
majority of real-world workloads are using TDX, then there would be
an argument that the complexity cost has to be paid and that the
standard OVMF build should include TDX features. But that's
several years away and may never happen.
Out of interest: does Intel TDX provide any security benefits beyond
the (much simpler) Intel SGX?
The main benefit is ease of deployment for unmodified applications.
While you might argue this isn't a "security" benefit, remember that
any security technology that is too complex for most people to deploy
doesn't have much impact, so ease of use is a significant consideration
in security technologies.

As far as I can tell from the various papers, the fundamental
difference between TDX and SGX seems to be that TDX deliberately
increases the attack surface from "just the application code" to
"entire guest VM, including OS kernel, runtime libraries,
etc". Increasing the attack surface while adding complexity is a
huge cost so I'm assuming that there must be some commensurate
benefit, but nothing in the documentation I've seen seems to describe
what this benefit actually is.
The big problems of enclave technology like SGX is rewriting
applications into secure and insecure parts and controlling information
leak across the boundaries of the enclave ... even if you opt to run
the application entirely within the enclave, you still get leaks into
the kernel via syscalls and the machine owner still has a huge amount
of leeway to exfiltrate any secrets in the enclave.

The push towards VM based isolation is because all the handling of the
technology can be done inside an enlightened guest kernel (so any
application will run with no modification) and the guest to host
boundary is a far easier to analyse being a hardware emulation
vmexit/hypercall one rather than the huge and complex syscall
interface.

James


Min Xu <min.m.xu@...>
 

On June 4, 2021 12:12 AM, Laszlo wrote:
On 06/03/21 15:51, Yao, Jiewen wrote:
Hi, All
We plan to do a design review for TDVF in OVMF package.


The TDVF Design slides for TinaoCore Design Review Meeting (Jun 11) is
now available in blow link:
https://edk2.groups.io/g/devel/files/Designs/2021/0611.

The Bugzilla is https://bugzilla.tianocore.org/show_bug.cgi?id=3429



You can have an offline review first. You comments will be warmly
welcomed and we will continuously update the slides based on the
feedbacks.
Resending my earlier comments in this mailing list thread, with the feedback
inserted at the proper spots that has been given in the off-list thread since
then:
Continue my comments from here.

*** Slide 11 -- TDVF Image (1)

(9) CFV -- Configuration Firmware Volume (VarStore.fdf.inc), containing SB
keys -- how is this firmware volume populated (at build time)? Is this a
hexdump?
CFV is populated in post build. We can provide such python scripts to do the
SB keys enrollment.

... Back from slide 16: it seems like CFV is a raw hexdump indeed; how is that
managed when keys change (at build time)?
As I mentioned above, SB keys are enrolled in post build phase. We can provide
a python scripts to add/delete/append the keys.

(10) This slide (slide 11) basically describes an intrusive reorganization of
"OvmfPkgX64.fdf". I don't think I can agree to that.
While confidential computing is important, it is not relevant for many users.
Even if we don't cause outright regressions for existent setups, the
maintenance cost of the traditional OVMF platform will skyrocket.

The big bunch of areas that SEV-ES introduced to MEMFD is already a big
complication. I'd feel much better if we could isolate all that to a dedicated
"remote attested boot" firmware platform, and not risk the functionality and
maintenance of the traditional platform. I think this ties in with my comment
(1).
Actually in our first version of TDVF, it is a separated dsc/fdf. But when I try to
implement the *one binary*, I have to figure out some way to put our mailbox/tdhob.
I checked the OvmfPkgX64.fdf and mimics what SEV-ES does in MEMFD.
I would wait for a conclusion of the *one binary* and then figure out how to
handle the mailbox/tdhob.

For example, seeing a configuration firmware volume (CFV) with secure boot
keys embedded, in the "usual" FDF, will confuse lots of people, me included.
In the traditional OVMF use case, we use a different method:
namely OvmfPkg/EnrollDefaultKeys, for "priming" a variable store template,
in the build environment.
As I mentioned above, the SB keys are enrolled in post-build. The standard build
script:
build -p OvmfPkg/OvmfPkgX64.dsc -a X64 -t GCC5
Its output is a standard OVMF image (with a clean CFV/VarStore)
If the customers want the SB feature configured, it's up to them to enroll the SB
keys.
CFV is just a concept in TDVF. From the perspective of Standard OVMF, it is still
the VarStore.


Edk2 (and PI and UEFI) are definitely flexible enough for accommodating TDX,
but the existent, traditional OVMF platforms are a bad fit. In my opinion.


*** Slide 12: TDVF Image (2)

(11) "Page table should support both 4-level and 5-level page table"

As a general development strategy, I would suggest building TDX support in
small, well-isolated layers. 5-level paging is not enabled (has never been
tested, to my knowledge) with OVMF on QEMU/KVM, regardless of
confidential computing, for starters. If 5-level paging is a strict requirement
for TDX, then it arguably needs to be implemented independently of TDX, at
first. So that the common edk2 architecture be at least testable on
QEMU/KVM with 5-level paging enabled.
Yes, 5-level paging is a strict requirement for TDX.
I would wait for the conclusion of the *one binary*.

*** Slide 13:

(12) My comment is that the GUID-ed structure chain already starts at a fixed
GPA (the "AMD SEV option"). Ordering between GUID-ed structures is
irrelevant, so any TDX-specific structures should be eligible for appending to,
prepending to, or intermixing with, other (possibly
SEV-specific) structures. There need not be a separate entry point, just
different GUIDS.
Yes, we prefer a TDX GUID in ResetVector. In that GUID there is a offset which
points to the actual TDX Metadata blob.

(13) Regarding "4G-0x20[0x10] is OVMF AP reset vector (used in OVMF
implementation)" -- I think this is a typo: this "AP reset vector" is
*not* used in OVMF. To my knowledge, it is a vestige from the UefiCpuPkg
reset vector. In OVMF, APs are booted via MpInitLib (in multiple firmware
phases), using INIT-SIPI-SIPI, and the reset vector for the APs, posited
through those IPIs, is prepared in low RAM.
Thanks Laszlo for explanation.

*** Slides 14 through 16:

I consider these TDVF firmware image internals, implementation details
-- do whatever you need to do, just don't interfere with existing platforms /
use cases. See my comment (10) above.
Sure. All the TDVF changes will not interfere with existing platfomrs/use cases.

*** Slides 17-21:

(14) Again, a very big difference from traditional OVMF: APs never enter SEC
in traditional OVMF. I assume this new functionality is part of TdxStartupLib
(from slide 18) -- will there be a Null instance of that?
Yes, there is a NULL instance of TdxStartupLib.


Last week I posted a 43-part patch series to edk2-devel, for separating out
the dynamic Xen enlightenments from the IA32, IA32X64, X64 platforms, in
favor of the dedicated OvmfXen platform. TDX seems to bring in
incomparably more complications than Xen, and the OvmfPkg maintainers
have found even the Xen complications troublesome in the long term.

If I had had access to all this information when we first discussed "one
binary" on the mailing list, I'd have never agreed to "one binary". I'm OK with
attempting one firmware binary for "confidential computing", but that "one
platform" cannot be "OvmfPkgX64.dsc".

Even if I make a comparison with just the "technology" (not the remotely-
attested deployment) of SEV and SEV-ES, as it is included in
"OvmfPkgX64.dsc", TDX is hugely more complicated and intrusive than that.
SEV proved possible to integrate into existing modules, into the existing boot
flow, maybe through the addition of some new drivers (such as a new
IOMMU protocol implementation, and some "clever" depexes). But we never
had to restructure the FD layout, eliminate whole firmware phases, or think
about multiprocessing in the reset vector or the SEC phase.

In order to bring an example from the ARM world, please note that platforms
that use a PEI phase, and platforms that don't, are distinct platforms. In
ArmVirtPkg, two examples are ArmVirtQemu and ArmVirtQemuKernel. The
latter does not include the PEI Core.
Thanks Laszlo. I will check the example from the ARM world.


*** Slides 22 through 34:

(15) All these extra tasks and complications are perfectly fine, as long as they
exist peacefully *separately* from the traditional ("legacy") OVMF platforms.

Honestly, in the virtual world, picking your firmware binary is easy.
The approach here reminds me of a physical firmware binary that includes
everything possible from "edk2-platforms/Features/Intel", just so it can be
deployed to any physical board imaginable. That's not how Intel builds
physical firmware, right? We have "edk2-platforms/Platform/Intel"
and "edk2-platforms/Silicon/Intel" with many-many separate DSC files.
I will continue my comments in my next mail.

Thanks!
Min


Min Xu <min.m.xu@...>
 

On June 4, 2021 12:12 AM, Laszlo wrote:
On 06/03/21 15:51, Yao, Jiewen wrote:
Hi, All
We plan to do a design review for TDVF in OVMF package.


The TDVF Design slides for TinaoCore Design Review Meeting (Jun 11) is
now available in blow link:
https://edk2.groups.io/g/devel/files/Designs/2021/0611.

The Bugzilla is https://bugzilla.tianocore.org/show_bug.cgi?id=3429



You can have an offline review first. You comments will be warmly
welcomed and we will continuously update the slides based on the
feedbacks.
Resending my earlier comments in this mailing list thread, with the feedback
inserted at the proper spots that has been given in the off-list thread since
then:
Continue my comments from here.

*** Slide 35-36: DXE phase

(16) "Some DXE Drivers not allowed to load/start in Td guest -- Network
stack, RNG, ..."

Same comment as (several times) above. The Linuxboot project is a good
example for eliminating cruft from DXEFV (in fact, for eliminating most of the
DXE phase). In a TDX environment, why include drivers in the firmware binary
that are never used? Meanwhile, DXEFV in OVMF grows by a MB every 1.5
years or so. Again, remove these drivers from the DSC/FDF then, and it needs
to be a separate platform.
It is because of the *one binary* so that we have to include all the drivers in the
firmware binary. Some of the DXE drivers are not called in TDX, but they're dispatched
in Non-Tdx environment.
I will explain more about this topic in next version of design slides.
Also I would wait for the conclusion to the *one binary*.

(17) "Other DXE Phase drivers -- [...] AcpiPlatformDxe"

I'm not sure what this section is supposed to mean. Other DXE phase drivers
included, or excluded? Without AcpiPlatformDxe, the guest OS will not see
QEMU's ACPI content, and will almost certainly malfunction.
Sorry for the confusion about the title. This slides means Other DXE phase drivers
including AcpiPlatformDxe is included. I will update the slides to be more clear
and accurate.

... Back from slide 48: ignore this for now, I'll comment in more detail later.


*** Slide 37: DXE Core

(18) says "SMM is not supported in Td guest" -- how is the variable store
protected from direct hardware (pflash) access from the guest OS?
Without SMM, the guest OS need not go through gRT->SetVariable() to
update authenticated non-volatile UEFI variables, and that undermines
Secure Boot.
Let me explain the SMM and Secure boot in TDX like below:
1) TDX doesn't support virtual SMM in guest. Virtual SMI cannot be injected
into TD guest.
2) SMI/SMM is used to manage variable update to avoid expose Flash direct.
So SMM is not must-to-have for secure boot, but help to mitigate the security risk.
3) We don't trust VMM. That is why we need TDX.
4) If you trust VMM to emulate SMM, then you don't need TDX.

Note that, while SEV-ES has the same limitation wrt. SMM, the
"OvmfPkg/AmdSev/AmdSevX64.dsc" platform doesn't even include the
Secure Boot firmware feature. For another example, the OVMF firmware
binary in RHEL that's going to support SEV-ES is such a build of
"OvmfPkgX64.dsc"
that does not include the Secure Boot feature either.

But in this TDX slide deck, Secure Boot certificates are embedded in the CFV
(configuration firmware volume) -- see slide 11 and slide 16 --, which
suggests that this platform does want secure boot.
Yes, TDVF does support Secure Boot.

... Back from slide 48: I'm going to make *additional* comments on this,
when I'm at slide 48, too.

The rest of this slide (slide 37) looks reasonable (generic DXE Core changes --
possibly PI spec changes too).
Yes, there is new attribute (EFI_RESOURCE_ATTRIBUTE_ENCRYPTED) in PiHob.h



*** Slides 38 through 39:

These seem reasonable (TdxDxe assumes some responsibilities of
OvmfPkg/PlatformPei)


*** Slides 40 through 42:

*If* you really can implement TDX support for the IOMMU driver *this*
surgically, then I'm OK with it. The general logic in the IOMMU driver was
truly difficult to write, and I'd be seriously concerned if those parts would
have to be modified. Customizing just the page encryption/decryption
primitives for TDX vs. SEV is OK.
Actually all the changes we do in IOMMU is to customize the page encryption
/ decryption primitive for TDX and SEV. We will probe the Td or Non-Td in
run-time, then the corresponding APIs will be called.



*** Slides 43 through 47:

(19) Slide 46 and slide 47 are almost identical. Please consolidate them into a
single slide.
Slide 46 is of Td measurement, like TpmMeasurement.
SecurityPkg/Library/DxeTpmMeasurementLib
Slide 47 is of Measure boot.
SecurityPkg/Library/DxeTpm2MeasureBootLib
I will refine the two slides to make it more clear. Thanks for reminder.

(20) the TPM2 infrastructure in edk2 is baroque (over-engineered), in my
opinion. It has so many layers that I can never keep them in mind. When we
added TPM support to OVMF, I required commit messages that would help us
recall the layering. In particular, please refer to commit 0c0a50d6b3ff
("OvmfPkg: include Tcg2Dxe module", 2018-03-09). Here's an
excerpt:

TPM 2 consumer driver
|
v
Tpm2DeviceLib class: Tpm2DeviceLibTcg2 instance
|
v
TCG2 protocol interface
|
v
TCG2 protocol provider: Tcg2Dxe.inf driver
|
v
Tpm2DeviceLib class: Tpm2DeviceLibRouter instance
|
v
NULL class: Tpm2InstanceLibDTpm instance
(via earlier registration)
|
v
TPM2 chip (actual hardware)

The slide deck says that EFI_TD_PROTOCOL is supposed to reuse the
EFI_TCG2_PROTOCOL definition. If that's the case, then why don't we push
the TDX specifics (more or less: the replacement of PCRs with RTMR) down
to the lowest possible level?

Can we have "Tpm2InstanceLibTdxRtmr", plugged into the same Tcg2Dxe.inf
driver?

If not, can we have a new TdTcg2Dxe.inf driver, but make it so that it install
the same protocol as before (EFI_TCG2_PROTOCOL -- same old protocol
GUID)? Then DxeTpmMeasurementLib doesn't have to change.

As long as there is *at most* one EFI_TCG2_PROTOCOL instance published in
the protocol database, DxeTpmMeasurementLib should be fine. In SEV*
guests, the standard Tcg2Dxe driver provides that protocol. In TDX guests,
TdTcg2Dxe.inf should provide the protocol. Arbitration between the two can
be implemented with the pattern seen in the following
commits:

1 05db0948cc60 EmbeddedPkg: introduce EDKII Platform Has ACPI GUID
2 786f476323a6 EmbeddedPkg: introduce PlatformHasAcpiLib
3 65a69b214840 EmbeddedPkg: introduce EDKII Platform Has Device Tree
GUID
4 2558bfe3e907 ArmVirtPkg: add PlatformHasAcpiDtDxe

The basic idea is that Tcg2Dxe can have a depex on a new "running in SEV"
protocol GUID, and TdTcg2Dxe can have a depex on a new "running in TDX"
protocol GUID. A separate platform driver can install the proper GUID --
possibly *neither* of those GUIDs.

And, we don't have to change the depex section of
"SecurityPkg/Tcg/Tcg2Dxe/Tcg2Dxe.inf" for this; we can implement a library
instance with an empty constructor, but a non-empty depex, and then hook
this lib instance into Tcg2Dxe.inf via module scope NULL lib class override in
the DSC file. Basically we could forcibly restrict Tcg2Dxe's DEPEX by making it
inherit the new DEPEX from the library.
This is a big topic. I need to discuss it internally first then give my comments.
Thanks for your patience.


*** Slide 48: DXE Phase -- Other Modules

Regarding IncompatiblePciDeviceSupportDxe: the proposed update sounds
plausible and simple enough.

(21) "AcpiPlatformDxe: Support for MADT/ACPI addition to report Td Mailbox
entry"

Firmware-owned tables must not be installed from this driver.

Please refer to my "Xen removal" patch set again, for
<https://bugzilla.tianocore.org/show_bug.cgi?id=2122>, which I mention
above in point (14). As part of the Xen removal, the AcpiPlatformDxe driver in
OvmfPkg is significantly trimmed: all unused (dead) cruft is removed,
including any ACPI table templates that are built into the firmware.

OvmfPkg/AcpiPlatformDxe is responsible *solely* for implementing the
client side of the QEMU ACPI linker/loader.

If you need to prepare & install different ACPI tables, please do it elsewhere,
in another DXE driver. A bunch of other firmware modules do that (NFIT, IBFT,
BGRT, ...).

For example, the OvmfPkg/TdxDxe DXE_DRIVER is supposed to be launched
early in the DXE phase, via APRIORI section -- please consider registering a
protocol notify in that driver, for EFI_ACPI_TABLE_PROTOCOL, and when it
becomes available, install whatever
*extra* tables you need.

Note that if you need to *customize* an ACPI table that QEMU already
provides, then you will have to modify the ACPI generator on the QEMU side.
It is a design tenet between QEMU and OVMF that OVMF include no business
logic for either parsing or fixing up ACPI tables that QEMU provides.
AcpiPlatformDxe contains the minimum (which is already a whole lot,
unfortunately) that's necessary for implementing the QEMU ACPI
linker/loader client in the UEFI environment.

The slide deck mentions MADT, which is also known as the "APIC" table --
and indeed, QEMU does provide that. (See acpi_build_madt()
[hw/i386/acpi-common.c].) So if TDX needs MADT customizations, that
should go into QEMU.
Thanks for reminder. We will exam the implementation of MADT/ACPI carefully.

(22) EmuVariableFvbRuntimeDxe

Ouch, this is an unpleasant surprise.

First, if you know for a fact that pflash is not part of the *board* in any TDX
setup, then pulling

OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FvbServicesRuntimeDxe.inf

into the firmware platform is useless, as it is mutually exclusive with

OvmfPkg/EmuVariableFvbRuntimeDxe/Fvb.inf

(via dynamic means -- a dynamic PCD).

Note that the FDF file places QemuFlashFvbServicesRuntimeDxe in APRIORI
DXE when SMM_REQUIRE is FALSE. This driver checks for pflash presence,
and lets EmuVariableFvbRuntimeDxe perform its in-RAM flash emulation only
in case pflash is not found.

So this is again in favor of a separate platform -- if we know pflash is never
part of the board, then QemuFlashFvbServicesRuntimeDxe is never needed,
but you cannot remove it from the traditional DSC/FDF files.

Second, EmuVariableFvbRuntimeDxe consumes the PlatformFvbLib class, for
the PlatformFvbDataWritten() API (among other things). This lib class is
implemented by two instances in OvmfPkg, PlatformFvbLibNull and
EmuVariableFvbLib. The latter instance allows Platform BDS to hook an event
(for signaling) via "PcdEmuVariableEvent" into the
EmuVariableFvbRuntimeDxe driver.

In old (very old) QEMU board configurations, namely those without pflash,
this (mis)feature is used by OVMF's PlatformBootManagerLib to write out all
variables to the EFI system partition in a regular file called \NvVars, with the
help of NvVarsFileLib, whenever EmuVariableFvbRuntimeDxe writes out an
emulated "flash" block. For this purpose, the traditional OVMF DSC files link
EmuVariableFvbLib into EmuVariableFvbRuntimeDxe.

But it counts as an absolute disaster nowadays, and should not be revived in
any platform. If you don't have pflash in TDX guests, just accept that you
won't have non-volatile variables. And link PlatformFvbLibNull into
EmuVariableFvbRuntimeDxe. You're going to need a separate
PlatformBootManagerLib instance anyway.

(We should have removed EmuVariableFvbRuntimeDxe a long time ago from
the traditional OVMF platforms, i.e. made pflash a hard requirement, even
when SMM is not built into the platform -- but whenever I tried that, Jordan
always shot me down.)

My point is: using EmuVariableFvbRuntimeDxe in the TDX platform may be
defensible per se, but we must be very clear that it will never provide a
standards-conformant service for non-volatile UEFI variables, and we must
keep as much of the \NvVars mess out of EmuVariableFvbRuntimeDxe as
possible. This will require a separate PlatformBootManagerLib instance for
TDX anyway (or maybe share PlatformBootManagerLibGrub with
"OvmfPkg/AmdSev/AmdSevX64.dsc").

Apart from the volatility aspect, let's assume we have this in-RAM emulated
"flash device", storing authenticated UEFI variables for Secure Boot purposes.
And we don't have SMM.

What protects this in-RAM variable store from tampering by the guest OS?
It's all guest RAM only, after all. What provides the privilege barrier between
the guest firmware and the guest OS?
Thanks Laszlo. I will carefully read your comments and discuss it internally first.


*** Slide 50: Library

(23) Should we introduce Null instances for all (or most) new lib classes here?
Code size is a concern (static linking). If we extend a common OvmfPkg
module with new hook points, it's one thing to return from that hook point
early *dynamically*, but it's even better (given separate platforms) to allow
the traditional platform firmware to use a Null lib instance, to cut out all the
dead code statically.
Yes, agree. We will introduce the NULL instance for the new libs.


*** Slides 51 through 52

Seems OK.


*** Slide 53:

(24) It might be worth noting that BaseIoLibIntrinsic already has some SEV
enlightenment, as the FIFO IO port operations (which normally use the REP
prefix) are not handled on SEV. I don't have an immediate idea why this
might matter, we should just minimize code duplication if possible.
Agree that we should minimize the code duplication if possible.

Before we start to enable IoLib for Tdx, we search out the EDK2 and find:
For BaseIoLibIntrinsicSev.inf, it is imported in OvmfPkg, such as
- OvmfPkg/OvmfXen.dsc
- OvmfPkg/Bhyve/BhyveX64.dsc
- OvmfPkg/OvmfPkgIa32.dsc
- OvmfPkg/OvmfPkgX64.dsc
- OvmfPkg/OvmfPkgIa32X64.dsc
- OvmfPkg/AmdSev/AmdSevX64.dsc

But for BaseIoLibIntrinsic.inf it seems it is not imported in any dsc in OvmfPkg.

BaseIoLibIntrinsic is the right IoLib base that we can enable TDX. But
it doesn't support SEV for the FIFO IO port operations.
BaseIoLibIntrinsicSev handles the FIFO IO on SEV. But we have an open about
the name.
Anyway we agree code duplication should be minimized.


*** Slides 54-56:

No comments, this stuff seems reasonable.


*** Slide 57: MpInitLib

I don't know enough to give a summary judgement.


All in all, I see the controversial / messy parts in the platform bringup, and
how all that differs from the traditional ("legacy") OVMF platforms. I admit I
*may* be biased in favor of SEV, possibly because SEV landed first -- if you
find signs of such a bias in my comments, please don't hesitate to debunk
those points. Yet my general impression is that the early bringup stuff is
significantly different from everything before, and because of this, a
separate platform is justified.

Definitely separate from the traditional OVMF IA32, IA32X64, and X64
platforms, and *possibly* separate from the "remote attestation"
AmdSevX64.dsc platform. I would approach the TDX feature-set in complete
isolation (exactly how Intel commenced the work, if I understand correctly),
modulo obviously shareable / reusable parts, and then slowly & gradually
work on extracting / refactoring commonalities.

(But, given my stance on Xen for example, I could disagree even with the
latter, retroactive kind of unification -- it all boils down to shared developer
and user base. Component sharing should reflect the community structure,
otherwise maintenance will be a nightmare.)

Thanks
Laszlo
Some of the comments need be discussed internally in intel first. So I cannot answer
all your comments right now.
Thanks again Laszlo for your valuable review comments!

Min


Michael Brown <mcb30@...>
 

On 06/06/2021 03:03, Min Xu wrote:
(11) "Page table should support both 4-level and 5-level page table"
As a general development strategy, I would suggest building TDX support in small, well-isolated layers. 5-level paging is not enabled (has never been tested, to my knowledge) with OVMF on QEMU/KVM, regardless of confidential computing, for starters. If 5-level paging is a strict requirement for TDX, then it arguably needs to be implemented independently of TDX, at first. So that the
common edk2 architecture be at least testable on QEMU/KVM with 5-level paging enabled.
Yes, 5-level paging is a strict requirement for TDX. I would wait for
the conclusion of the *one binary*.
The "one binary" decision isn't relevant here, is it? It would make
more sense to implement 5-level paging within the base EDK2
architecture. This would allow that feature to be tested in isolation
from TDX (and consequently tested more widely), and would reduce the
distance between standard builds and TDX builds.

Michael


Michael Brown <mcb30@...>
 

On 06/06/2021 09:52, Min Xu wrote:
On June 4, 2021 12:12 AM, Laszlo wrote:
(18) says "SMM is not supported in Td guest" -- how is the variable store
protected from direct hardware (pflash) access from the guest OS?
Without SMM, the guest OS need not go through gRT->SetVariable() to
update authenticated non-volatile UEFI variables, and that undermines
Secure Boot.
Let me explain the SMM and Secure boot in TDX like below:
1) TDX doesn't support virtual SMM in guest. Virtual SMI cannot be injected
into TD guest.
2) SMI/SMM is used to manage variable update to avoid expose Flash direct.
So SMM is not must-to-have for secure boot, but help to mitigate the security risk.
3) We don't trust VMM. That is why we need TDX.
4) If you trust VMM to emulate SMM, then you don't need TDX.
Secure Boot defines a security boundary between the firmware and the operating system: the operating system is not permitted to make arbitrary changes to firmware variables.

It sounds as though you have decided that the TDX security properties remove the need for the Secure Boot security properties. That would be a viable conclusion: if the user is able to verify that the intended workload is running in the VM (and the VM is disposable anyway) then there is not much value added by also having Secure Boot.

However, it's not valid to pretend to also include Secure Boot, knowing that there is no way to actually provide the security properties of Secure Boot.

If TDX can't support SMM (or some equivalent way for the guest *firmware* to guarantee that the ring 0 guest OS cannot make arbitrary changes to UEFI variables), then TDX cannot support Secure Boot.

Thanks,

Michael


Min Xu <min.m.xu@...>
 

On June 6, 2021 7:30 PM, Michael Brown Wrote:
On 06/06/2021 03:03, Min Xu wrote:
(11) "Page table should support both 4-level and 5-level page table"

As a general development strategy, I would suggest building TDX
support in small, well-isolated layers. 5-level paging is not enabled
(has never been tested, to my knowledge) with OVMF on QEMU/KVM,
regardless of confidential computing, for starters. If 5-level paging
is a strict requirement for TDX, then it arguably needs to be
implemented independently of TDX, at first. So that the common edk2
architecture be at least testable on QEMU/KVM with 5-level paging
enabled.
Yes, 5-level paging is a strict requirement for TDX. I would wait for
the conclusion of the *one binary*.
The "one binary" decision isn't relevant here, is it? It would make more
sense to implement 5-level paging within the base EDK2 architecture. This
would allow that feature to be tested in isolation from TDX (and
consequently tested more widely), and would reduce the distance between
standard builds and TDX builds.
In our first version of TDVF, a static 5-level page table is used. It is simple and
straight forward. But for *one binary* solution, we have to consider the compatibility
with the current 4-level page table. That's why I said "I would wait for the conclusion
of the *one binary*"

Thanks for the suggestion. We will discuss the it internally first.

Michael


Laszlo Ersek
 

On 06/06/21 14:49, Xu, Min M wrote:
On June 6, 2021 7:30 PM, Michael Brown Wrote:
On 06/06/2021 03:03, Min Xu wrote:
(11) "Page table should support both 4-level and 5-level page table"

As a general development strategy, I would suggest building TDX
support in small, well-isolated layers. 5-level paging is not enabled
(has never been tested, to my knowledge) with OVMF on QEMU/KVM,
regardless of confidential computing, for starters. If 5-level paging
is a strict requirement for TDX, then it arguably needs to be
implemented independently of TDX, at first. So that the common edk2
architecture be at least testable on QEMU/KVM with 5-level paging
enabled.
Yes, 5-level paging is a strict requirement for TDX. I would wait for
the conclusion of the *one binary*.
The "one binary" decision isn't relevant here, is it? It would make more
sense to implement 5-level paging within the base EDK2 architecture. This
would allow that feature to be tested in isolation from TDX (and
consequently tested more widely), and would reduce the distance between
standard builds and TDX builds.
In our first version of TDVF, a static 5-level page table is used. It is simple and
straight forward. But for *one binary* solution, we have to consider the compatibility
with the current 4-level page table. That's why I said "I would wait for the conclusion
of the *one binary*"

Thanks for the suggestion. We will discuss the it internally first.
My primary concern with the 5-level paging is not that the core
infrastructure is absent from edk2 -- it is present alright. (Over time,
numerous issues have been found and fixed in it, but that's kind of
expected, with such a big feature.) I understand it has been in use
successfully on a number of physical platforms.

My problem is that, AFAICT, the 5-level paging infrastructure of edk2
has never been *tested* on QEMU/KVM, as a part of OVMF. I have
absolutely no idea what to expect.

The "one binary" decision is a little bit relevant:

- If 5-level paging blows up on QEMU/KVM, as a part of OVMF, then
restricting the breakage (possibly a regression even?) to the new TDX
platform is good.

- On the other hand, both 5-level paging and TDX are complex in their
own rights; developing feature sets in small isolated waves is always
best. There are going to be "bug hunts" in the TDX platform of course;
finding an *orthogonal* 5-level paging bug (anywhere in the virt stack,
for that matter) is not the greatest outcome for a supposed TDX bug hunt.

- I figure users might want 5-level paging for OVMF at some point
anyway, even without TDX.

The last two points (especially the middle point of the three) kind of
outweigh(s) the first point for me.

Thanks
Laszlo


Min Xu <min.m.xu@...>
 

On 06/04/2021 12:12 AM, Laszlo wrote:

(22) EmuVariableFvbRuntimeDxe

Ouch, this is an unpleasant surprise.

First, if you know for a fact that pflash is not part of the *board* in any TDX
setup, then pulling

OvmfPkg/QemuFlashFvbServicesRuntimeDxe/FvbServicesRuntimeDxe.inf

into the firmware platform is useless, as it is mutually exclusive with

OvmfPkg/EmuVariableFvbRuntimeDxe/Fvb.inf

(via dynamic means -- a dynamic PCD).

Note that the FDF file places QemuFlashFvbServicesRuntimeDxe in APRIORI
DXE when SMM_REQUIRE is FALSE. This driver checks for pflash presence,
and lets EmuVariableFvbRuntimeDxe perform its in-RAM flash emulation
only in case pflash is not found.
Right, pflash is not part of the *board* in any TDX setup.
This is a limitation from the Qemu (the tdx-qemu) side. TDVF is copied into
RAM. Generic loader is the one for it.
In this case (pflash is not found), FvbServiceRuntimeDxe.inf will return
EFI_WRITE_PROTECTED in its FvbInitialize().
EmuVariableFvbRuntimeDxe will then perform its in-RAM flash emulation.

So this is again in favor of a separate platform -- if we know pflash is never
part of the board, then QemuFlashFvbServicesRuntimeDxe is never needed,
but you cannot remove it from the traditional DSC/FDF files.
Yes, if TDVF is a separate DSD/FDF, QemuFlashFvbServicesRuntimeDxe will be
removed from the image.

Second, EmuVariableFvbRuntimeDxe consumes the PlatformFvbLib class, for
the PlatformFvbDataWritten() API (among other things). This lib class is
implemented by two instances in OvmfPkg, PlatformFvbLibNull and
EmuVariableFvbLib. The latter instance allows Platform BDS to hook an event
(for signaling) via "PcdEmuVariableEvent" into the
EmuVariableFvbRuntimeDxe driver.

In old (very old) QEMU board configurations, namely those without pflash,
this (mis)feature is used by OVMF's PlatformBootManagerLib to write out all
variables to the EFI system partition in a regular file called \NvVars, with the
help of NvVarsFileLib, whenever EmuVariableFvbRuntimeDxe writes out an
emulated "flash" block. For this purpose, the traditional OVMF DSC files link
EmuVariableFvbLib into EmuVariableFvbRuntimeDxe.
Thanks for the explanation. It's very helpful for me to understand the code.

But it counts as an absolute disaster nowadays, and should not be revived in
any platform. If you don't have pflash in TDX guests, just accept that you
won't have non-volatile variables. And link PlatformFvbLibNull into
EmuVariableFvbRuntimeDxe. You're going to need a separate
PlatformBootManagerLib instance anyway.
I have a question here, that if PlatformFvbLibNull is linked into EmuVaiableFvRuntimeDxe,
Does it mean it cannot write to the in-RAM variable store?


(We should have removed EmuVariableFvbRuntimeDxe a long time ago from
the traditional OVMF platforms, i.e. made pflash a hard requirement, even
when SMM is not built into the platform -- but whenever I tried that, Jordan
always shot me down.)
I am afraid in TDVF we have to use EmuVariableFvRuntimeDxe to emulate the
in-RAM, as I explained pflash is not part of the *board* in TDX setup.
In the traditional EmuVariableFvRuntimeDxe, the FV contents will be initialized.
But TDVF has its own requirement, that the SB keys in CFV need to be copied
into the FV contents. That's why EmuVariableFvRuntimeDxe is updated in
TDVF project.


My point is: using EmuVariableFvbRuntimeDxe in the TDX platform may be
defensible per se, but we must be very clear that it will never provide a
standards-conformant service for non-volatile UEFI variables, and we must
keep as much of the \NvVars mess out of EmuVariableFvbRuntimeDxe as
possible. This will require a separate PlatformBootManagerLib instance for
TDX anyway (or maybe share PlatformBootManagerLibGrub with
"OvmfPkg/AmdSev/AmdSevX64.dsc").

Apart from the volatility aspect, let's assume we have this in-RAM emulated
"flash device", storing authenticated UEFI variables for Secure Boot purposes.
And we don't have SMM.

What protects this in-RAM variable store from tampering by the guest OS?
It's all guest RAM only, after all. What provides the privilege barrier between
the guest firmware and the guest OS?
A good question and we will answer it a bit later. Thanks for your patience.

Thanks
Laszlo


Laszlo Ersek
 

On 06/08/21 14:27, Xu, Min M wrote:
On 06/04/2021 12:12 AM, Laszlo wrote:
But it counts as an absolute disaster nowadays, and should not be revived in
any platform. If you don't have pflash in TDX guests, just accept that you
won't have non-volatile variables. And link PlatformFvbLibNull into
EmuVariableFvbRuntimeDxe. You're going to need a separate
PlatformBootManagerLib instance anyway.
I have a question here, that if PlatformFvbLibNull is linked into EmuVaiableFvRuntimeDxe,
Does it mean it cannot write to the in-RAM variable store?
No, that's not the case; PlatformFvbLibNull only turns the hooks
(special callbacks) into no-ops; the normal operation of
EmuVariableFvbRuntimeDxe is not disrupted.

The APIs in the PlatformFvbLib class are PlatformFvbDataRead,
PlatformFvbDataWritten, PlatformFvbBlocksErased; they are called at the
ends of the functions FvbProtocolEraseBlocks(), FvbProtocolWrite(),
FvbProtocolRead(), in "OvmfPkg/EmuVariableFvbRuntimeDxe/Fvb.c". If the
PlatformFvb* APIs do nothing, that's not a problem for
EmuVariableFvbRuntimeDxe.

(In fact, even in the non-Null instance -- that is, in the
EmuVariableFvbLib instance --, the read and erase callbacks are empty;
and the write callback only signals an event, at best.)

(We should have removed EmuVariableFvbRuntimeDxe a long time ago from
the traditional OVMF platforms, i.e. made pflash a hard requirement, even
when SMM is not built into the platform -- but whenever I tried that, Jordan
always shot me down.)
I am afraid in TDVF we have to use EmuVariableFvRuntimeDxe to emulate the
in-RAM, as I explained pflash is not part of the *board* in TDX setup.
Using EmuVariableFvbRuntimeDxe in the TDVF platform is fine (with the
Null hooks, see above), as long as we carefully document the expected /
resultant behavior of the UEFI variable services.

(This is not a comment on my part on the SB situation, which remains an
open question for TDVF, for the time being.)

Thanks
Laszlo


James Bottomley <jejb@...>
 

On Thu, 2021-06-03 at 13:51 +0000, Yao, Jiewen wrote:
Hi, All
We plan to do a design review for TDVF in OVMF package.


The TDVF Design slides for TinaoCore Design Review Meeting (Jun 11)
is now available in blow link:
https://edk2.groups.io/g/devel/files/Designs/2021/0611.

The Bugzilla is https://bugzilla.tianocore.org/show_bug.cgi?id=3429
It looks like I'll be travelling that day, but should be able to attend
at least the first 45 minutes of the design review from the airport
lounge.

You can have an offline review first. You comments will be warmly
welcomed and we will continuously update the slides based on the
feedbacks.
On TdMailbox and TdHob, we already have two SEV pages in the MEMFD and
since TDX and SEV is an either/or, could we simply not rename both
pages and use them for either boot depending on what CPU type is
detected, so we only have two MEMFD pages, not four?

On your slide 13 Question: "Open: How will the QEMU find the metadata
location?" can't you just use the mechanism for SEV that's already
upstream in both QEMU and OVMF?

On slide 19, the mucking with the reset vector really worries me
because we don't have that much space to play with. Given that you're
starting in 32 bit mode and can thus enter anywhere in the lower 4GB,
why not simply use a different and TDX specific entry point?

I'm not quite sure why you don't have a PEI phase, since TdxStartupLib
seems effectively to be PEI.

On all the Tcg2 changes: what about installing a vTPM driver that
simply translates to your MSRs? That way we can use all the standard
TCG code as is? Plus then we could do SEV-SNP measurement through an
actual vTPM running at higher VMPL or something.

Slide 41: IOMMU operation. The implication is that you only transition
to unencrypted memory for DMA during the actual operation, so do I have
it correct that the guest writes DMA to encrypted memory, then the
iommu marks the region as unencrypted and transforms the memory to be
in the clear and then transforms it back after the DMA operation
completes? Given that SEV operates quite happily with always in the
clear DMA buffers, this seems to have the potential to be a performance
problem, but what security does it gain?

James


Laszlo Ersek
 

(Min Xu got dropped from the CC list for some reason, at *some* point in
this sub-thread! Not sure when. Re-adding him.)

Commenting on excerpts:

On 06/08/21 18:01, James Bottomley wrote:

On TdMailbox and TdHob, we already have two SEV pages in the MEMFD and
since TDX and SEV is an either/or, could we simply not rename both
pages and use them for either boot depending on what CPU type is
detected, so we only have two MEMFD pages, not four?
Great idea, in my opinion.


On your slide 13 Question: "Open: How will the QEMU find the metadata
location?" can't you just use the mechanism for SEV that's already
upstream in both QEMU and OVMF?
I think I made the same comment, in different words. (Point (12) at
<https://listman.redhat.com/archives/edk2-devel-archive/2021-June/msg00143.html>.)


On slide 19, the mucking with the reset vector really worries me
because we don't have that much space to play with. Given that you're
starting in 32 bit mode and can thus enter anywhere in the lower 4GB,
why not simply use a different and TDX specific entry point?
What's more, we should use a dedicated ResetVector (through a DSC+FDF
dedicated solely to TDX).


On all the Tcg2 changes: what about installing a vTPM driver that
simply translates to your MSRs? That way we can use all the standard
TCG code as is?
I believe I made the same comment in point (20) (see URL above).


Slide 41: IOMMU operation.
That's more like slides 40 and 42, no?


The implication is that you only transition to unencrypted memory for
DMA during the actual operation,
Yes, this is the idea behind EDKII_IOMMU_PROTOCOL, which
OvmfPkg/IoMmuDxe implements (for SEV only, currently).


so do I have it correct that the guest writes DMA to encrypted memory,
then the iommu marks the region as unencrypted and transforms the
memory to be in the clear and then transforms it back after the DMA
operation completes?
Effectively, yes. (Your summary corresponds to a BusMasterRead
operation.)


Given that SEV operates quite happily with always in the clear DMA
buffers,
I don't understand this comment -- is it a statement about SEV as a
technology, or about OvmfPkg/IoMmuDxe?

Specifically in the context of OvmfPkg/IoMmuDxe, there is no
"always-in-the-clear DMA".

EDKII_IOMMU_PROTOCOL was designed to fit cleanly into the Map(),
Unmap(), AllocateBuffer(), FreeBuffer() terminology of the UEFI standard
EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL and EFI_PCI_IO_PROTOCOL. As far as I can
tell, the original use case for EDKII_IOMMU_PROTOCOL was VT-d on bare
metal, but the protocol proved a good match for SEV too.

VIRTIO_DEVICE_PROTOCOL has similar member functions
(AllocateSharedPages, FreeSharedPages, MapSharedBuffer,
UnmapSharedBuffer).

As long as a PCI device driver (or virtio driver) uses these member
functions judiciously, only "BusMasterCommonBuffer" operations will be
backed by long-term plaintext (decrypted) pages. One-shot read and write
transactions will be backed by plaintext (decrypted) pages only as long
as necessary.

The transitions you outline already happen in any plain SEV guest that
uses PCI DMA or virtio.


this seems to have the potential to be a performance problem, but what
security does it gain?
We have not experienced performance problems due to this kind of IOMMU
protocol usage, when booting SEV guests.

The basic goal was to keep everything as tightly encrypted as possible
(as permitted by the individual PCI or Virtio driver, through its
conservative usage of BusMasterCommonBuffer operations).

I won't claim that it has zero performance impact, but we should
remember the purpose that firmware serves (namely, "booting an operating
system"). Really -- I don't recall any performance issues. This applies
to such virtio devices & drivers too that aren't "bootable", such as
virtio-gpu-pci (VirtioGpuDxe) and virtio-rng-pci (VirtioRngDxe).

(

If you enable verbose logs, OvmfPkg/IoMmuDxe does produce an immense
amount of messages (with the express purpose of a human reading
through them, and matching up decryption and re-encryption actions --
I've done it, likely with some ad-hoc scripts). *This* does slow down
the boot considerably (if you actually enable the QEMU debug console),
but for a different reason: producing debug logs through the QEMU
debug console (IO Port) is very-very costly in a SEV guest. Not just
because an IO port trap may be more expensive in a SEV guest, but
because SEV does not support REP OUTSB, so every debug character
written traps separately, as opposed to every line written. See
the following commits:

- b6d11d7c4678 ("MdePkg: BaseIoLibIntrinsic (IoLib class) library",
2017-04-13),

- 97353a9c914d ("OvmfPkg: Update dsc to use IoLib from
BaseIoLibIntrinsicSev.inf", 2017-07-10),

- 98a4d04e8fda ("MdePkg/BaseIoLibIntrinsic: fix SEV (=unrolled)
variants of IoWriteFifoXX()", 2017-09-11),

- c09d9571300a ("OvmfPkg: save on I/O port accesses when the debug
port is not in use", 2017-11-17).

)

From my perspective, I find the changes proposed for OvmfPkg/IoMmuDxe to
be among the least intrusive of the whole slide deck (after Min Xu
confirmed that the intent was really only to customize the page
decryption / encryption primitives in the driver, and to leave the
general logic untouched).

That's not to say that I'm unhappy about this topic being raised. To the
contrary!

Thanks
Laszlo


Min Xu <min.m.xu@...>
 

On 06/09/2021 3:33 AM, Laszlo wrote:
(Min Xu got dropped from the CC list for some reason, at *some* point in this
sub-thread! Not sure when. Re-adding him.)

Commenting on excerpts:

On 06/08/21 18:01, James Bottomley wrote:

On TdMailbox and TdHob, we already have two SEV pages in the MEMFD and
since TDX and SEV is an either/or, could we simply not rename both
pages and use them for either boot depending on what CPU type is
detected, so we only have two MEMFD pages, not four?
Great idea, in my opinion.
Agree, it's a good idea.

On your slide 13 Question: "Open: How will the QEMU find the metadata
location?" can't you just use the mechanism for SEV that's already
upstream in both QEMU and OVMF?
I think I made the same comment, in different words. (Point (12) at
<https://listman.redhat.com/archives/edk2-devel-archive/2021-
June/msg00143.html>.)
So my understanding to this solution is that:
1) GUID-ed structure chain is started from a fixed GPA in ResetVector.
2) Append a TDX-specific GUID-ed structure in the chain
3) Qemu search the GUID-ed chain from the fixed GPA and find the TDX-specific
GUID-ed structure based on TDX-specific GUID.
Is the expected process for QEMU?


On slide 19, the mucking with the reset vector really worries me
because we don't have that much space to play with. Given that you're
starting in 32 bit mode and can thus enter anywhere in the lower 4GB,
why not simply use a different and TDX specific entry point?
What's more, we should use a dedicated ResetVector (through a DSC+FDF
dedicated solely to TDX).
If TDVF has a separate DSC/FDF, then this is not a problem. Let's wait for a
conclusion to the *one binary* solution.
Thanks for your suggestion.

On all the Tcg2 changes: what about installing a vTPM driver that
simply translates to your MSRs? That way we can use all the standard
TCG code as is?
I believe I made the same comment in point (20) (see URL above).
I will answer this comments in my later response. Thanks.

Slide 41: IOMMU operation.
That's more like slides 40 and 42, no?


The implication is that you only transition to unencrypted memory for
DMA during the actual operation,
Yes, this is the idea behind EDKII_IOMMU_PROTOCOL, which
OvmfPkg/IoMmuDxe implements (for SEV only, currently).


so do I have it correct that the guest writes DMA to encrypted memory,
then the iommu marks the region as unencrypted and transforms the
memory to be in the clear and then transforms it back after the DMA
operation completes?
Effectively, yes. (Your summary corresponds to a BusMasterRead
operation.)
In TDX there are the concepts of Private-Memory and Shared-Memory.
By default the memory are private. Before the DMA operation, the private
memory should be converted to Shared-Memory by setting the S-Bit. Then
VMM and Guest can do the DMA operation.
(The difference is that in TDX even the Shared-Memory is encrypted with
a shared-key between VMM and Guest. So we call it Shared-Memory)
So all the changes in IOMMU by TDX is just to customize the page
decryption / encryption primitives in the driver , and the general logic will not
be touched.

Given that SEV operates quite happily with always in the clear DMA
buffers,
I don't understand this comment -- is it a statement about SEV as a technology,
or about OvmfPkg/IoMmuDxe?

Specifically in the context of OvmfPkg/IoMmuDxe, there is no "always-in-the-
clear DMA".

EDKII_IOMMU_PROTOCOL was designed to fit cleanly into the Map(), Unmap(),
AllocateBuffer(), FreeBuffer() terminology of the UEFI standard
EFI_PCI_ROOT_BRIDGE_IO_PROTOCOL and EFI_PCI_IO_PROTOCOL. As far as I
can tell, the original use case for EDKII_IOMMU_PROTOCOL was VT-d on bare
metal, but the protocol proved a good match for SEV too.

VIRTIO_DEVICE_PROTOCOL has similar member functions
(AllocateSharedPages, FreeSharedPages, MapSharedBuffer,
UnmapSharedBuffer).

As long as a PCI device driver (or virtio driver) uses these member functions
judiciously, only "BusMasterCommonBuffer" operations will be backed by long-
term plaintext (decrypted) pages. One-shot read and write transactions will be
backed by plaintext (decrypted) pages only as long as necessary.

The transitions you outline already happen in any plain SEV guest that uses PCI
DMA or virtio.


this seems to have the potential to be a performance problem, but what
security does it gain?
As I mentioned above, in TDX DMA operation even the Shared-Memory is encrypted
by a shared-key between VMM and Guest. It is not *plain-text* outside VMM&Guest.

We have not experienced performance problems due to this kind of IOMMU
protocol usage, when booting SEV guests.
In our test, we don't see the performance problem when booting TDX guests.
We do have the performance issue in "Accept Pages". But we have several solutions
to resolve this performance. See below link.
https://software.intel.com/content/dam/develop/external/us/en/documents/tdx-virtual-firmware-design-guide-rev-1.pdf
Section 7.8 Optimization Consideration.

The basic goal was to keep everything as tightly encrypted as possible (as
permitted by the individual PCI or Virtio driver, through its conservative usage
of BusMasterCommonBuffer operations).

I won't claim that it has zero performance impact, but we should remember
the purpose that firmware serves (namely, "booting an operating system").
Really -- I don't recall any performance issues. This applies to such virtio
devices & drivers too that aren't "bootable", such as virtio-gpu-pci
(VirtioGpuDxe) and virtio-rng-pci (VirtioRngDxe).

(

If you enable verbose logs, OvmfPkg/IoMmuDxe does produce an immense
amount of messages (with the express purpose of a human reading
through them, and matching up decryption and re-encryption actions --
I've done it, likely with some ad-hoc scripts). *This* does slow down
the boot considerably (if you actually enable the QEMU debug console),
but for a different reason: producing debug logs through the QEMU
debug console (IO Port) is very-very costly in a SEV guest. Not just
because an IO port trap may be more expensive in a SEV guest, but
because SEV does not support REP OUTSB, so every debug character
written traps separately, as opposed to every line written. See
the following commits:

- b6d11d7c4678 ("MdePkg: BaseIoLibIntrinsic (IoLib class) library",
2017-04-13),

- 97353a9c914d ("OvmfPkg: Update dsc to use IoLib from
BaseIoLibIntrinsicSev.inf", 2017-07-10),

- 98a4d04e8fda ("MdePkg/BaseIoLibIntrinsic: fix SEV (=unrolled)
variants of IoWriteFifoXX()", 2017-09-11),

- c09d9571300a ("OvmfPkg: save on I/O port accesses when the debug
port is not in use", 2017-11-17).

)

From my perspective, I find the changes proposed for OvmfPkg/IoMmuDxe to
be among the least intrusive of the whole slide deck (after Min Xu confirmed
that the intent was really only to customize the page decryption / encryption
primitives in the driver, and to leave the general logic untouched).

That's not to say that I'm unhappy about this topic being raised. To the
contrary!

Thanks
Laszlo
Thanks
Min