Date   

Re: [PATCH v3 0/2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression

Laszlo Ersek
 

On 1/19/23 12:01, Laszlo Ersek wrote:
Repo: https://pagure.io/lersek/edk2.git
Branch: cpuhp-reg-catch-4250-v3
Test build: https://github.com/tianocore/edk2/pull/3930
Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250

v2 was posted at:
- http://mid.mail-archive.com/20230112082845.128463-1-lersek@redhat.com
- https://edk2.groups.io/g/devel/message/98336
- https://listman.redhat.com/archives/edk2-devel-archive/2023-January/057634.html

Please see the Notes sections of the patches, regarding the updates.

The "PlatformCI_OvmfPkg_Windows_VS2019_PR" checks in test build linked
above time out again (as expected); now with the following debug log:

PlatformCpuCountBugCheck: Present=0 Possible=1
PlatformCpuCountBugCheck: Broken CPU hotplug register block found. Update QEMU to version 8+, or
PlatformCpuCountBugCheck: to a stable release with commit dab30fbef389 backported. Refer to
PlatformCpuCountBugCheck: <https://bugzilla.tianocore.org/show_bug.cgi?id=4250>.
PlatformCpuCountBugCheck: Consequences of the QEMU bug may include, but are not limited to:
PlatformCpuCountBugCheck: - all firmware logic, dependent on the CPU hotplug register block,
PlatformCpuCountBugCheck: being confused, for example, multiprocessing-related logic;
PlatformCpuCountBugCheck: - guest OS data loss, including filesystem corruption, due to crash or
PlatformCpuCountBugCheck: hang during ACPI S3 resume;
PlatformCpuCountBugCheck: - SMM privilege escalation, by a malicious guest OS or 3rd partty UEFI
PlatformCpuCountBugCheck: agent, against the platform firmware.
PlatformCpuCountBugCheck: These symptoms need not necessarily be limited to the QEMU user
PlatformCpuCountBugCheck: attempting to hot(un)plug a CPU.
PlatformCpuCountBugCheck: The firmware will now stop (hang) deliberately, in order to prevent the
PlatformCpuCountBugCheck: above symptoms.
PlatformCpuCountBugCheck: You can forcibly override the hang, *at your own risk*, with the
PlatformCpuCountBugCheck: following *experimental* QEMU command line option:
PlatformCpuCountBugCheck: -fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-Override,string=yes
PlatformCpuCountBugCheck: Please only report such bugs that you can reproduce *without* the
PlatformCpuCountBugCheck: override.
Select Item: 0x19
ASSERT d:\a\1\s\OvmfPkg\Library\PlatformInitLib\Platform.c(520): ((BOOLEAN)(0==1))
The "PlatformCI_OvmfPkg_Ubuntu_GCC5_PR" checks succed, probably due to
edk2 commits ef0916009843 ("CI: Use Fedora 35 container (Linux only)",
2023-01-17) and 7fab007f33e9 ("OvmfPkg: CI: Use Fedora 35 container
(Linux only)", 2023-01-17), and due to
<https://github.com/tianocore/containers/commit/47addc9a4f20> applying
the upstream QEMU patch.

Laszlo

Cc: Ard Biesheuvel <ardb+tianocore@...>
Cc: Brijesh Singh <brijesh.singh@...>
Cc: Erdem Aktas <erdemaktas@...>
Cc: Gerd Hoffmann <kraxel@...>
Cc: James Bottomley <jejb@...>
Cc: Jiewen Yao <jiewen.yao@...>
Cc: Jordan Justen <jordan.l.justen@...>
Cc: Michael Brown <mcb30@...>
Cc: Min Xu <min.m.xu@...>
Cc: Oliver Steffen <osteffen@...>
Cc: Sebastien Boeuf <sebastien.boeuf@...>
Cc: Tom Lendacky <thomas.lendacky@...>

Laszlo Ersek (2):
OvmfPkg/PlatformInitLib: factor out PlatformCpuCountBugCheck()
OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression

OvmfPkg/Library/PlatformInitLib/Platform.c | 168 +++++++++++++++++---
1 file changed, 145 insertions(+), 23 deletions(-)
Merged as the last two commits in range 51411435d559..bf5678b58026, via
<https://github.com/tianocore/edk2/pull/3935>.

Laszlo


Re: [RFC PATCH] OvmfPkg/PlatformCI VS2019: Enable temporary workaround for cpuhp bugfix

Laszlo Ersek
 

On 1/19/23 14:43, Ard Biesheuvel wrote:
QEMU for x86 has a nasty CPU hotplug bug of which the ramifications are
difficult to oversee, even though KVM acceleration seems to be
unaffected. This has been addressed in QEMU mainline, and will percolate
through the ecosystem at its usual pace. In the mean time, due to the
potential impact on production workloads, we will be updating OVMF to
abort the boot when it detects a QEMU build that is affected.

Tiancore's platform CI uses QEMU in TCG mode, and is therefore impacted
by this mitigation, unless its QEMU builds are updated. This has been
done for Ubuntu-GCC5, but Windows-VS2019 still uses a QEMU build that is
affected.

Aborting the boot upon detecting the QEMU issue will render all boot
tests carried out on Windows-VS2019 broken unless we implement the
'escape hatch' that enables proceed-at-your-own-risk mode, and permits
the boot to proceed even if the QEMU issue is detected.

So let's enable this for Windows-VS2019, and remove it again once it is
no longer needed.

Cc: Laszlo Ersek <lersek@...>
Cc: Gerd Hoffmann <kraxel@...>
Cc: Jiewen Yao <jiewen.yao@...>
Cc: Michael Brown <mcb30@...>
Cc: Oliver Steffen <osteffen@...>
Cc: Michael Kubacki <michael.kubacki@...>

Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
Signed-off-by: Ard Biesheuvel <ardb@...>
---
OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml | 2 +-
OvmfPkg/PlatformCI/PlatformBuildLib.py | 12 ++++++++++++
2 files changed, 13 insertions(+), 1 deletion(-)
Merged as the first commit in range 51411435d559..bf5678b58026, via
<https://github.com/tianocore/edk2/pull/3935>.

Thank you!
Laszlo


Re: [PATCH v3 2/2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression

Laszlo Ersek
 

On 1/20/23 10:10, Ard Biesheuvel wrote:
On Fri, 20 Jan 2023 at 09:50, Laszlo Ersek <lersek@...> wrote:

a couple of requests to Oliver below:

On 1/19/23 12:27, Ard Biesheuvel wrote:
On Thu, 19 Jan 2023 at 12:01, Laszlo Ersek <lersek@...> wrote:

In QEMU v5.1.0, the CPU hotplug register block misbehaves: the negotiation
protocol is (effectively) broken such that it suggests that switching from
the legacy interface to the modern interface works, but in reality the
switch never happens. The symptom has been witnessed when using TCG
acceleration; KVM seems to mask the issue. The issue persists with the
following (latest) stable QEMU releases: v5.2.0, v6.2.0, v7.2.0. Currently
there is no stable release that addresses the problem.

The QEMU bug confuses the Present and Possible counting in function
PlatformMaxCpuCountInitialization(), in
"OvmfPkg/Library/PlatformInitLib/Platform.c". OVMF ends up with Present=0
Possible=1. This in turn further confuses MpInitLib in UefiCpuPkg (hence
firmware-time multiprocessing will be broken). Worse, CPU hot(un)plug with
SMI will be summarily broken in OvmfPkg/CpuHotplugSmm, which (considering
the privilege level of SMM) is not that great.

Detect the issue in PlatformCpuCountBugCheck(), and print an error message
and *hang* if the issue is present.

Users willing to take risks can override the hang with the experimental
QEMU command line option

-fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-Override,string=yes

(The "-fw_cfg" QEMU option itself is not experimental; its above argument,
as far it concerns the firmware, is experimental.)

The problem was originally reported by Ard [0]. We analyzed it at [1] and
[2]. A QEMU patch was sent at [3]; now merged as commit dab30fbef389
("acpi: cpuhp: fix guest-visible maximum access size to the legacy reg
block", 2023-01-08), to be included in QEMU v8.0.0.

[0] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c2

[1] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c3

[2] IO port write width clamping differs between TCG and KVM
http://mid.mail-archive.com/aaedee84-d3ed-a4f9-21e7-d221a28d1683@redhat.com
https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00199.html

[3] acpi: cpuhp: fix guest-visible maximum access size to the legacy reg block
http://mid.mail-archive.com/20230104090138.214862-1-lersek@redhat.com
https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00278.html

NOTE: PlatformInitLib is used in the following platform DSCs:

OvmfPkg/AmdSev/AmdSevX64.dsc
OvmfPkg/CloudHv/CloudHvX64.dsc
OvmfPkg/IntelTdx/IntelTdxX64.dsc
OvmfPkg/Microvm/MicrovmX64.dsc
OvmfPkg/OvmfPkgIa32.dsc
OvmfPkg/OvmfPkgIa32X64.dsc
OvmfPkg/OvmfPkgX64.dsc

but I can only test this change with the last three platforms, running on
QEMU.

Test results:

TCG QEMU OVMF override result
patched patched
--- ------- ------- -------- --------------------------------------
0 0 0 0 CPU counts OK (KVM masks the QEMU bug)
0 0 1 0 CPU counts OK (KVM masks the QEMU bug)
0 1 0 0 CPU counts OK (QEMU fix, but KVM masks
the QEMU bug anyway)
0 1 1 0 CPU counts OK (QEMU fix, but KVM masks
the QEMU bug anyway)
1 0 0 0 boot with broken CPU counts (original
QEMU bug)
1 0 1 0 broken CPU count caught (boot hangs)
1 0 1 1 broken CPU count caught, bug check
overridden, boot continues
1 1 0 0 CPU counts OK (QEMU fix)
1 1 1 0 CPU counts OK (QEMU fix)

Cc: Ard Biesheuvel <ardb+tianocore@...>
Cc: Brijesh Singh <brijesh.singh@...>
Cc: Erdem Aktas <erdemaktas@...>
Cc: Gerd Hoffmann <kraxel@...>
Cc: James Bottomley <jejb@...>
Cc: Jiewen Yao <jiewen.yao@...>
Cc: Jordan Justen <jordan.l.justen@...>
Cc: Michael Brown <mcb30@...>
Cc: Min Xu <min.m.xu@...>
Cc: Oliver Steffen <osteffen@...>
Cc: Sebastien Boeuf <sebastien.boeuf@...>
Cc: Tom Lendacky <thomas.lendacky@...>
Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
Signed-off-by: Laszlo Ersek <lersek@...>
Thanks a lot for taking the time and investing the effort. I'm quite
happy that we have this 'escape hatch' now, which we could arguably
use temporarily in the VS2019 platform CI until its QEMU binary gets
updated, right?
Yes, I have to agree there.

Right now, because those QEMU binaries are affected by the regression,
and because they use TCG, OVMF already sees Present=0 Possible=1. Due to
the interference of Present=0 with the QEMU v2.7 reset bug workaround,
we also get BootCpuCount=0. Furthermore, MaxCpuCount gets set to 1, from
Possible. Thus, we exit PlatformMaxCpuCountInitialization() with
PcdCpuBootLogicalProcessorNumber=0 (from BootCpuCount) and
PcdCpuMaxLogicalProcessorNumber=1 (from MaxCpuCount).

Then, in the "predictable subset" of consequences of the QEMU
regression, we can say that MpInitLib interprets the above PCD values as
"uniprocessor system with the boot CPU count not exposed by the
platform". This (i.e., *just this*) does not fall outside of MpInitLib's
domain (again, note my qualification "predictable subset").

Now, if we apply the patch and also add the -fw_cfg switch to the
Windows CI, *and* we also don't add any -smp flags (as far as I can
tell, no -smp flag is used now), then the new PCD state will be

PcdCpuBootLogicalProcessorNumber=1 (changed from zero)
PcdCpuMaxLogicalProcessorNumber=1 (stays the same)

As far as I can tell, *right now* this change should have no effect *in
MpInitLib*, IOW nothing gets worse or better there. Namely,
PcdCpuBootLogicalProcessorNumber is only consumed in WakeUpAP(), and
only when InitFlag == ApInitConfig. InitFlag is set like that only in
CollectProcessorCount(). However, CollectProcessorCount() is only called
if PcdCpuMaxLogicalProcessorNumber is >1 (see MaxLogicalProcessorNumber
in MpInitLibInitialize()). Meaning in effect that
PcdCpuMaxLogicalProcessorNumber=1 makes PcdCpuBootLogicalProcessorNumber
irrelevant, so its change from 0 to 1 is invisible *to MpInitLib*.

Oliver:

(1) can you please post a patch for the Windows CI so that the following
option be passed to QEMU:

-fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-Override,string=yes

(This option is harmless when the firmware does not determine the QEMU
bug, so it can be passed in advance; it will have no consequence at all.)

In the patch, please reference

https://bugzilla.tianocore.org/show_bug.cgi?id=4250
Can I take the above as an ack on

https://edk2.groups.io/g/devel/message/98899

?

(2) Please file a separate TianoCore BZ for *backing out* the change (=
for removing the -fw_cfg switch), and assign it to yourself :)

Once the Windows CI advances to a fixed QEMU binary, the "escape hatch"
should be shut welded down.

(3) Please give me a hint when the CI patch (1) has been merged; then I
can go ahead and merge this v3 series as well.
I'll merge the whole lot once you're happy with the CI patch.
(/me checks the timestamps of messages :) my tendency to work in batches
has its downsides as well, alas. Sorry about the confusion; I'll proceed
with the merge in the other thread.)


Re: [PATCH V2 04/10] OvmfPkg/IntelTdx: Implement other helper functions in SecTdxHelperLib

Min Xu
 

On January 20, 2023 6:18 PM, Gerd Hoffmann wrote:
On Fri, Jan 20, 2023 at 08:10:45AM +0000, Yao, Jiewen wrote:
Can we define FV_HANDOFF_TABLE_POINTERS2 and
FV_HANDOFF_TABLE_POINTERS2 in
MdePkg/Include/IndustryStandard/UefiTcgPlatform.h?

[Jiewen] No. We cannot move to MdePkg.
TCG defines the field to be variable length. Something like below:

typedef struct {
UINT8 TableDescriptionSize;
UINT8 TableDescription[TableDescriptionSize];
UINT64 NumberOfTables;
EFI_CONFIGURATION_TABLE TableEntry[NumberOfTables];
} HANDOFF_TABLE_POINTERS2;

typedef struct {
UINT8 BlobDescriptionSize;
UINT8 BlobDescription[BlobDescriptionSize];
EFI_PHYSICAL_ADDRESS BlobBase;
UINT64 BlobLength;
} HANDOFF_TABLE_POINTERS2;

The implementation can choose its own length as they wish.
Why doesn't follow TDX standard TCG practices here?
As Jiewen mentioned TCG defines the field to be variable length. The implementation can choose its own length. Below are some examples.
Tcg2Pei defines its FV_HANDOFF_TABLE_POINTERS2. (https://github.com/tianocore/edk2/blob/master/SecurityPkg/Tcg/Tcg2Pei/Tcg2Pei.c#L126-L136)
SmbiosMeasurementDxe defines its SMBIOS_HANDOFF_TABLE_POINTERS2 (https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Universal/SmbiosMeasurementDxe/SmbiosMeasurementDxe.c#L113-L123)
TcgEventLogRecordLib defines the PLATFORM_FIRMWARE_BLOB2_STRUCT and HANDOFF_TABLE_POINTERS2_STRUCT. https://github.com/tianocore/edk2/blob/master/SecurityPkg/Include/Library/TcgEventLogRecordLib.h#L14-L32

I think TDX follow the same practice above to define its own TDX_HANDOFF_TABLE_POINTERS2 and FV_HANDOFF_TABLE_POINTERS2. (FV_HANDOFF_TABLE_POINTERS2 happens to be same as the one in Tcg2Pei.) To make the definition more clear, TDX can define the name as CFV_HANDOFF_TABLE_POINTERS2.

@Gerd, Hoffmann what's your thought?

Thanks
Min


Re: [PATCH V2 04/10] OvmfPkg/IntelTdx: Implement other helper functions in SecTdxHelperLib

Gerd Hoffmann
 

On Fri, Jan 20, 2023 at 08:10:45AM +0000, Yao, Jiewen wrote:
Can we define FV_HANDOFF_TABLE_POINTERS2 and FV_HANDOFF_TABLE_POINTERS2 in MdePkg/Include/IndustryStandard/UefiTcgPlatform.h?
[Jiewen] No. We cannot move to MdePkg.
TCG defines the field to be variable length. Something like below:

typedef struct {
UINT8 TableDescriptionSize;
UINT8 TableDescription[TableDescriptionSize];
UINT64 NumberOfTables;
EFI_CONFIGURATION_TABLE TableEntry[NumberOfTables];
} HANDOFF_TABLE_POINTERS2;

typedef struct {
UINT8 BlobDescriptionSize;
UINT8 BlobDescription[BlobDescriptionSize];
EFI_PHYSICAL_ADDRESS BlobBase;
UINT64 BlobLength;
} HANDOFF_TABLE_POINTERS2;

The implementation can choose its own length as they wish.
Why doesn't follow TDX standard TCG practices here?

take care,
Gerd


Re: [RFC PATCH] OvmfPkg/PlatformCI VS2019: Enable temporary workaround for cpuhp bugfix

Ard Biesheuvel
 

On Fri, 20 Jan 2023 at 10:25, Laszlo Ersek <lersek@...> wrote:

On 1/19/23 14:43, Ard Biesheuvel wrote:
QEMU for x86 has a nasty CPU hotplug bug of which the ramifications are
difficult to oversee, even though KVM acceleration seems to be
unaffected. This has been addressed in QEMU mainline, and will percolate
through the ecosystem at its usual pace. In the mean time, due to the
potential impact on production workloads, we will be updating OVMF to
abort the boot when it detects a QEMU build that is affected.

Tiancore's platform CI uses QEMU in TCG mode, and is therefore impacted
by this mitigation, unless its QEMU builds are updated. This has been
done for Ubuntu-GCC5, but Windows-VS2019 still uses a QEMU build that is
affected.

Aborting the boot upon detecting the QEMU issue will render all boot
tests carried out on Windows-VS2019 broken unless we implement the
'escape hatch' that enables proceed-at-your-own-risk mode, and permits
the boot to proceed even if the QEMU issue is detected.

So let's enable this for Windows-VS2019, and remove it again once it is
no longer needed.

Cc: Laszlo Ersek <lersek@...>
Cc: Gerd Hoffmann <kraxel@...>
Cc: Jiewen Yao <jiewen.yao@...>
Cc: Michael Brown <mcb30@...>
Cc: Oliver Steffen <osteffen@...>
Cc: Michael Kubacki <michael.kubacki@...>

Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
Signed-off-by: Ard Biesheuvel <ardb@...>
---
OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml | 2 +-
OvmfPkg/PlatformCI/PlatformBuildLib.py | 12 ++++++++++++
2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml b/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml
index 7e63f419b26b..b3b91aa84ea0 100644
--- a/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml
+++ b/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml
@@ -24,7 +24,7 @@ jobs:
package: 'OvmfPkg'
vm_image: 'windows-2019'
should_run: true
- run_flags: "MAKE_STARTUP_NSH=TRUE QEMU_HEADLESS=TRUE"
+ run_flags: "MAKE_STARTUP_NSH=TRUE QEMU_HEADLESS=TRUE QEMU_CPUHP_QUIRK=TRUE"

#Use matrix to speed up the build process
strategy:
diff --git a/OvmfPkg/PlatformCI/PlatformBuildLib.py b/OvmfPkg/PlatformCI/PlatformBuildLib.py
index bfef9849c749..58dc1189a2cc 100644
--- a/OvmfPkg/PlatformCI/PlatformBuildLib.py
+++ b/OvmfPkg/PlatformCI/PlatformBuildLib.py
@@ -170,6 +170,7 @@ class PlatformBuilder( UefiBuilder, BuildSettingsManager):
self.env.SetValue("PRODUCT_NAME", "OVMF", "Platform Hardcoded")
self.env.SetValue("MAKE_STARTUP_NSH", "FALSE", "Default to false")
self.env.SetValue("QEMU_HEADLESS", "FALSE", "Default to false")
+ self.env.SetValue("QEMU_CPUHP_QUIRK", "FALSE", "Default to false")
return 0

def PlatformPreBuild(self):
@@ -211,6 +212,17 @@ class PlatformBuilder( UefiBuilder, BuildSettingsManager):
args += " -pflash " + os.path.join(OutputPath_FV, "OVMF.fd") # path to firmware


+ ###
+ ### NOTE This is a temporary workaround to allow platform CI to cope with
+ ### a QEMU bug in the CPU hotplug code. Once the CI environment has
+ ### been updated to carry a fixed version of QEMU, this can be
+ ### removed again
+ ###
+ ### Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
+ ###
+ if (self.env.GetValue("QEMU_CPUHP_QUIRK").upper() == "TRUE"):
+ args += " -fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-Override,string=yes"
+
if (self.env.GetValue("MAKE_STARTUP_NSH").upper() == "TRUE"):
f = open(os.path.join(VirtualDrive, "startup.nsh"), "w")
f.write("BOOT SUCCESS !!! \n")
Reviewed-by: Laszlo Ersek <lersek@...>

Technically speaking, can I merge this *prepended* to my v3 patch set
([PATCH v3 0/2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg
block regression), in the *same* PR?

Because then I'll do that, saving us a bit of duplicated work.
Yes, please go ahead and merge this if it is all ready to go. I won't
get around to it to later this afternoon.

I'll then also file the BZ for reverting this (once I know the commit
hash), for when a new QEMU is out for Windows -- whom should I assign
that BZ? Oliver or Ard?
You can assign it to me. And please put Michael Kubacki on cc as well.


Re: [RFC PATCH] OvmfPkg/PlatformCI VS2019: Enable temporary workaround for cpuhp bugfix

Laszlo Ersek
 

On 1/19/23 14:43, Ard Biesheuvel wrote:
QEMU for x86 has a nasty CPU hotplug bug of which the ramifications are
difficult to oversee, even though KVM acceleration seems to be
unaffected. This has been addressed in QEMU mainline, and will percolate
through the ecosystem at its usual pace. In the mean time, due to the
potential impact on production workloads, we will be updating OVMF to
abort the boot when it detects a QEMU build that is affected.

Tiancore's platform CI uses QEMU in TCG mode, and is therefore impacted
by this mitigation, unless its QEMU builds are updated. This has been
done for Ubuntu-GCC5, but Windows-VS2019 still uses a QEMU build that is
affected.

Aborting the boot upon detecting the QEMU issue will render all boot
tests carried out on Windows-VS2019 broken unless we implement the
'escape hatch' that enables proceed-at-your-own-risk mode, and permits
the boot to proceed even if the QEMU issue is detected.

So let's enable this for Windows-VS2019, and remove it again once it is
no longer needed.

Cc: Laszlo Ersek <lersek@...>
Cc: Gerd Hoffmann <kraxel@...>
Cc: Jiewen Yao <jiewen.yao@...>
Cc: Michael Brown <mcb30@...>
Cc: Oliver Steffen <osteffen@...>
Cc: Michael Kubacki <michael.kubacki@...>

Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
Signed-off-by: Ard Biesheuvel <ardb@...>
---
OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml | 2 +-
OvmfPkg/PlatformCI/PlatformBuildLib.py | 12 ++++++++++++
2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml b/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml
index 7e63f419b26b..b3b91aa84ea0 100644
--- a/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml
+++ b/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml
@@ -24,7 +24,7 @@ jobs:
package: 'OvmfPkg'
vm_image: 'windows-2019'
should_run: true
- run_flags: "MAKE_STARTUP_NSH=TRUE QEMU_HEADLESS=TRUE"
+ run_flags: "MAKE_STARTUP_NSH=TRUE QEMU_HEADLESS=TRUE QEMU_CPUHP_QUIRK=TRUE"

#Use matrix to speed up the build process
strategy:
diff --git a/OvmfPkg/PlatformCI/PlatformBuildLib.py b/OvmfPkg/PlatformCI/PlatformBuildLib.py
index bfef9849c749..58dc1189a2cc 100644
--- a/OvmfPkg/PlatformCI/PlatformBuildLib.py
+++ b/OvmfPkg/PlatformCI/PlatformBuildLib.py
@@ -170,6 +170,7 @@ class PlatformBuilder( UefiBuilder, BuildSettingsManager):
self.env.SetValue("PRODUCT_NAME", "OVMF", "Platform Hardcoded")
self.env.SetValue("MAKE_STARTUP_NSH", "FALSE", "Default to false")
self.env.SetValue("QEMU_HEADLESS", "FALSE", "Default to false")
+ self.env.SetValue("QEMU_CPUHP_QUIRK", "FALSE", "Default to false")
return 0

def PlatformPreBuild(self):
@@ -211,6 +212,17 @@ class PlatformBuilder( UefiBuilder, BuildSettingsManager):
args += " -pflash " + os.path.join(OutputPath_FV, "OVMF.fd") # path to firmware


+ ###
+ ### NOTE This is a temporary workaround to allow platform CI to cope with
+ ### a QEMU bug in the CPU hotplug code. Once the CI environment has
+ ### been updated to carry a fixed version of QEMU, this can be
+ ### removed again
+ ###
+ ### Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
+ ###
+ if (self.env.GetValue("QEMU_CPUHP_QUIRK").upper() == "TRUE"):
+ args += " -fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-Override,string=yes"
+
if (self.env.GetValue("MAKE_STARTUP_NSH").upper() == "TRUE"):
f = open(os.path.join(VirtualDrive, "startup.nsh"), "w")
f.write("BOOT SUCCESS !!! \n")
Reviewed-by: Laszlo Ersek <lersek@...>

Technically speaking, can I merge this *prepended* to my v3 patch set
([PATCH v3 0/2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg
block regression), in the *same* PR?

Because then I'll do that, saving us a bit of duplicated work.

I'll then also file the BZ for reverting this (once I know the commit
hash), for when a new QEMU is out for Windows -- whom should I assign
that BZ? Oliver or Ard?

Thanks!
Laszlo


Re: [PATCH v3 2/2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression

Laszlo Ersek
 

On 1/20/23 10:17, Laszlo Ersek wrote:
On 1/20/23 09:50, Laszlo Ersek wrote:

Oliver:

(1) can you please post a patch for the Windows CI so that the
following option be passed to QEMU:

-fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-Override,string=yes

(This option is harmless when the firmware does not determine the QEMU
bug, so it can be passed in advance; it will have no consequence at
all.)

In the patch, please reference

https://bugzilla.tianocore.org/show_bug.cgi?id=4250
I *think* the call chain is something like

Platform_CI [edk2/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml]
FlashRomImage [edk2-pytool-extensions/edk2toolext/environment/uefi_build.py]
FlashRomImage [edk2/OvmfPkg/PlatformCI/PlatformBuildLib.py]
qemu-system-x86_64 [edk2/OvmfPkg/PlatformCI/PlatformBuildLib.py]

So I believe a new variable, similar to QEMU_HEADLESS, should be
introduced, in "OvmfPkg/PlatformCI/PlatformBuildLib.py" and
"OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml", for controlling
the above command line switch. Maybe it should be documented even, in
"OvmfPkg/PlatformCI/ReadMe.md".

Call the new variable QEMU_X_CPUHP_BUGCHECK_OVERRIDE?
Haha, I'm sooo late; Ard has already posted such patches ;)

Laszlo


Re: [PATCH v3 2/2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression

Laszlo Ersek
 

On 1/20/23 09:50, Laszlo Ersek wrote:

Oliver:

(1) can you please post a patch for the Windows CI so that the
following option be passed to QEMU:

-fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-Override,string=yes

(This option is harmless when the firmware does not determine the QEMU
bug, so it can be passed in advance; it will have no consequence at
all.)

In the patch, please reference

https://bugzilla.tianocore.org/show_bug.cgi?id=4250
I *think* the call chain is something like

Platform_CI [edk2/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml]
FlashRomImage [edk2-pytool-extensions/edk2toolext/environment/uefi_build.py]
FlashRomImage [edk2/OvmfPkg/PlatformCI/PlatformBuildLib.py]
qemu-system-x86_64 [edk2/OvmfPkg/PlatformCI/PlatformBuildLib.py]

So I believe a new variable, similar to QEMU_HEADLESS, should be
introduced, in "OvmfPkg/PlatformCI/PlatformBuildLib.py" and
"OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml", for controlling
the above command line switch. Maybe it should be documented even, in
"OvmfPkg/PlatformCI/ReadMe.md".

Call the new variable QEMU_X_CPUHP_BUGCHECK_OVERRIDE?

Thank you!
Laszlo


Re: [PATCH v3 2/2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression

Ard Biesheuvel
 

On Fri, 20 Jan 2023 at 09:50, Laszlo Ersek <lersek@...> wrote:

a couple of requests to Oliver below:

On 1/19/23 12:27, Ard Biesheuvel wrote:
On Thu, 19 Jan 2023 at 12:01, Laszlo Ersek <lersek@...> wrote:

In QEMU v5.1.0, the CPU hotplug register block misbehaves: the negotiation
protocol is (effectively) broken such that it suggests that switching from
the legacy interface to the modern interface works, but in reality the
switch never happens. The symptom has been witnessed when using TCG
acceleration; KVM seems to mask the issue. The issue persists with the
following (latest) stable QEMU releases: v5.2.0, v6.2.0, v7.2.0. Currently
there is no stable release that addresses the problem.

The QEMU bug confuses the Present and Possible counting in function
PlatformMaxCpuCountInitialization(), in
"OvmfPkg/Library/PlatformInitLib/Platform.c". OVMF ends up with Present=0
Possible=1. This in turn further confuses MpInitLib in UefiCpuPkg (hence
firmware-time multiprocessing will be broken). Worse, CPU hot(un)plug with
SMI will be summarily broken in OvmfPkg/CpuHotplugSmm, which (considering
the privilege level of SMM) is not that great.

Detect the issue in PlatformCpuCountBugCheck(), and print an error message
and *hang* if the issue is present.

Users willing to take risks can override the hang with the experimental
QEMU command line option

-fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-Override,string=yes

(The "-fw_cfg" QEMU option itself is not experimental; its above argument,
as far it concerns the firmware, is experimental.)

The problem was originally reported by Ard [0]. We analyzed it at [1] and
[2]. A QEMU patch was sent at [3]; now merged as commit dab30fbef389
("acpi: cpuhp: fix guest-visible maximum access size to the legacy reg
block", 2023-01-08), to be included in QEMU v8.0.0.

[0] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c2

[1] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c3

[2] IO port write width clamping differs between TCG and KVM
http://mid.mail-archive.com/aaedee84-d3ed-a4f9-21e7-d221a28d1683@redhat.com
https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00199.html

[3] acpi: cpuhp: fix guest-visible maximum access size to the legacy reg block
http://mid.mail-archive.com/20230104090138.214862-1-lersek@redhat.com
https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00278.html

NOTE: PlatformInitLib is used in the following platform DSCs:

OvmfPkg/AmdSev/AmdSevX64.dsc
OvmfPkg/CloudHv/CloudHvX64.dsc
OvmfPkg/IntelTdx/IntelTdxX64.dsc
OvmfPkg/Microvm/MicrovmX64.dsc
OvmfPkg/OvmfPkgIa32.dsc
OvmfPkg/OvmfPkgIa32X64.dsc
OvmfPkg/OvmfPkgX64.dsc

but I can only test this change with the last three platforms, running on
QEMU.

Test results:

TCG QEMU OVMF override result
patched patched
--- ------- ------- -------- --------------------------------------
0 0 0 0 CPU counts OK (KVM masks the QEMU bug)
0 0 1 0 CPU counts OK (KVM masks the QEMU bug)
0 1 0 0 CPU counts OK (QEMU fix, but KVM masks
the QEMU bug anyway)
0 1 1 0 CPU counts OK (QEMU fix, but KVM masks
the QEMU bug anyway)
1 0 0 0 boot with broken CPU counts (original
QEMU bug)
1 0 1 0 broken CPU count caught (boot hangs)
1 0 1 1 broken CPU count caught, bug check
overridden, boot continues
1 1 0 0 CPU counts OK (QEMU fix)
1 1 1 0 CPU counts OK (QEMU fix)

Cc: Ard Biesheuvel <ardb+tianocore@...>
Cc: Brijesh Singh <brijesh.singh@...>
Cc: Erdem Aktas <erdemaktas@...>
Cc: Gerd Hoffmann <kraxel@...>
Cc: James Bottomley <jejb@...>
Cc: Jiewen Yao <jiewen.yao@...>
Cc: Jordan Justen <jordan.l.justen@...>
Cc: Michael Brown <mcb30@...>
Cc: Min Xu <min.m.xu@...>
Cc: Oliver Steffen <osteffen@...>
Cc: Sebastien Boeuf <sebastien.boeuf@...>
Cc: Tom Lendacky <thomas.lendacky@...>
Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
Signed-off-by: Laszlo Ersek <lersek@...>
Thanks a lot for taking the time and investing the effort. I'm quite
happy that we have this 'escape hatch' now, which we could arguably
use temporarily in the VS2019 platform CI until its QEMU binary gets
updated, right?
Yes, I have to agree there.

Right now, because those QEMU binaries are affected by the regression,
and because they use TCG, OVMF already sees Present=0 Possible=1. Due to
the interference of Present=0 with the QEMU v2.7 reset bug workaround,
we also get BootCpuCount=0. Furthermore, MaxCpuCount gets set to 1, from
Possible. Thus, we exit PlatformMaxCpuCountInitialization() with
PcdCpuBootLogicalProcessorNumber=0 (from BootCpuCount) and
PcdCpuMaxLogicalProcessorNumber=1 (from MaxCpuCount).

Then, in the "predictable subset" of consequences of the QEMU
regression, we can say that MpInitLib interprets the above PCD values as
"uniprocessor system with the boot CPU count not exposed by the
platform". This (i.e., *just this*) does not fall outside of MpInitLib's
domain (again, note my qualification "predictable subset").

Now, if we apply the patch and also add the -fw_cfg switch to the
Windows CI, *and* we also don't add any -smp flags (as far as I can
tell, no -smp flag is used now), then the new PCD state will be

PcdCpuBootLogicalProcessorNumber=1 (changed from zero)
PcdCpuMaxLogicalProcessorNumber=1 (stays the same)

As far as I can tell, *right now* this change should have no effect *in
MpInitLib*, IOW nothing gets worse or better there. Namely,
PcdCpuBootLogicalProcessorNumber is only consumed in WakeUpAP(), and
only when InitFlag == ApInitConfig. InitFlag is set like that only in
CollectProcessorCount(). However, CollectProcessorCount() is only called
if PcdCpuMaxLogicalProcessorNumber is >1 (see MaxLogicalProcessorNumber
in MpInitLibInitialize()). Meaning in effect that
PcdCpuMaxLogicalProcessorNumber=1 makes PcdCpuBootLogicalProcessorNumber
irrelevant, so its change from 0 to 1 is invisible *to MpInitLib*.

Oliver:

(1) can you please post a patch for the Windows CI so that the following
option be passed to QEMU:

-fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-Override,string=yes

(This option is harmless when the firmware does not determine the QEMU
bug, so it can be passed in advance; it will have no consequence at all.)

In the patch, please reference

https://bugzilla.tianocore.org/show_bug.cgi?id=4250
Can I take the above as an ack on

https://edk2.groups.io/g/devel/message/98899

?

(2) Please file a separate TianoCore BZ for *backing out* the change (=
for removing the -fw_cfg switch), and assign it to yourself :)

Once the Windows CI advances to a fixed QEMU binary, the "escape hatch"
should be shut welded down.

(3) Please give me a hint when the CI patch (1) has been merged; then I
can go ahead and merge this v3 series as well.
I'll merge the whole lot once you're happy with the CI patch.


Re: [PATCH v3 2/2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression

Laszlo Ersek
 

a couple of requests to Oliver below:

On 1/19/23 12:27, Ard Biesheuvel wrote:
On Thu, 19 Jan 2023 at 12:01, Laszlo Ersek <lersek@...> wrote:

In QEMU v5.1.0, the CPU hotplug register block misbehaves: the negotiation
protocol is (effectively) broken such that it suggests that switching from
the legacy interface to the modern interface works, but in reality the
switch never happens. The symptom has been witnessed when using TCG
acceleration; KVM seems to mask the issue. The issue persists with the
following (latest) stable QEMU releases: v5.2.0, v6.2.0, v7.2.0. Currently
there is no stable release that addresses the problem.

The QEMU bug confuses the Present and Possible counting in function
PlatformMaxCpuCountInitialization(), in
"OvmfPkg/Library/PlatformInitLib/Platform.c". OVMF ends up with Present=0
Possible=1. This in turn further confuses MpInitLib in UefiCpuPkg (hence
firmware-time multiprocessing will be broken). Worse, CPU hot(un)plug with
SMI will be summarily broken in OvmfPkg/CpuHotplugSmm, which (considering
the privilege level of SMM) is not that great.

Detect the issue in PlatformCpuCountBugCheck(), and print an error message
and *hang* if the issue is present.

Users willing to take risks can override the hang with the experimental
QEMU command line option

-fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-Override,string=yes

(The "-fw_cfg" QEMU option itself is not experimental; its above argument,
as far it concerns the firmware, is experimental.)

The problem was originally reported by Ard [0]. We analyzed it at [1] and
[2]. A QEMU patch was sent at [3]; now merged as commit dab30fbef389
("acpi: cpuhp: fix guest-visible maximum access size to the legacy reg
block", 2023-01-08), to be included in QEMU v8.0.0.

[0] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c2

[1] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c3

[2] IO port write width clamping differs between TCG and KVM
http://mid.mail-archive.com/aaedee84-d3ed-a4f9-21e7-d221a28d1683@redhat.com
https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00199.html

[3] acpi: cpuhp: fix guest-visible maximum access size to the legacy reg block
http://mid.mail-archive.com/20230104090138.214862-1-lersek@redhat.com
https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00278.html

NOTE: PlatformInitLib is used in the following platform DSCs:

OvmfPkg/AmdSev/AmdSevX64.dsc
OvmfPkg/CloudHv/CloudHvX64.dsc
OvmfPkg/IntelTdx/IntelTdxX64.dsc
OvmfPkg/Microvm/MicrovmX64.dsc
OvmfPkg/OvmfPkgIa32.dsc
OvmfPkg/OvmfPkgIa32X64.dsc
OvmfPkg/OvmfPkgX64.dsc

but I can only test this change with the last three platforms, running on
QEMU.

Test results:

TCG QEMU OVMF override result
patched patched
--- ------- ------- -------- --------------------------------------
0 0 0 0 CPU counts OK (KVM masks the QEMU bug)
0 0 1 0 CPU counts OK (KVM masks the QEMU bug)
0 1 0 0 CPU counts OK (QEMU fix, but KVM masks
the QEMU bug anyway)
0 1 1 0 CPU counts OK (QEMU fix, but KVM masks
the QEMU bug anyway)
1 0 0 0 boot with broken CPU counts (original
QEMU bug)
1 0 1 0 broken CPU count caught (boot hangs)
1 0 1 1 broken CPU count caught, bug check
overridden, boot continues
1 1 0 0 CPU counts OK (QEMU fix)
1 1 1 0 CPU counts OK (QEMU fix)

Cc: Ard Biesheuvel <ardb+tianocore@...>
Cc: Brijesh Singh <brijesh.singh@...>
Cc: Erdem Aktas <erdemaktas@...>
Cc: Gerd Hoffmann <kraxel@...>
Cc: James Bottomley <jejb@...>
Cc: Jiewen Yao <jiewen.yao@...>
Cc: Jordan Justen <jordan.l.justen@...>
Cc: Michael Brown <mcb30@...>
Cc: Min Xu <min.m.xu@...>
Cc: Oliver Steffen <osteffen@...>
Cc: Sebastien Boeuf <sebastien.boeuf@...>
Cc: Tom Lendacky <thomas.lendacky@...>
Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
Signed-off-by: Laszlo Ersek <lersek@...>
Thanks a lot for taking the time and investing the effort. I'm quite
happy that we have this 'escape hatch' now, which we could arguably
use temporarily in the VS2019 platform CI until its QEMU binary gets
updated, right?
Yes, I have to agree there.

Right now, because those QEMU binaries are affected by the regression,
and because they use TCG, OVMF already sees Present=0 Possible=1. Due to
the interference of Present=0 with the QEMU v2.7 reset bug workaround,
we also get BootCpuCount=0. Furthermore, MaxCpuCount gets set to 1, from
Possible. Thus, we exit PlatformMaxCpuCountInitialization() with
PcdCpuBootLogicalProcessorNumber=0 (from BootCpuCount) and
PcdCpuMaxLogicalProcessorNumber=1 (from MaxCpuCount).

Then, in the "predictable subset" of consequences of the QEMU
regression, we can say that MpInitLib interprets the above PCD values as
"uniprocessor system with the boot CPU count not exposed by the
platform". This (i.e., *just this*) does not fall outside of MpInitLib's
domain (again, note my qualification "predictable subset").

Now, if we apply the patch and also add the -fw_cfg switch to the
Windows CI, *and* we also don't add any -smp flags (as far as I can
tell, no -smp flag is used now), then the new PCD state will be

PcdCpuBootLogicalProcessorNumber=1 (changed from zero)
PcdCpuMaxLogicalProcessorNumber=1 (stays the same)

As far as I can tell, *right now* this change should have no effect *in
MpInitLib*, IOW nothing gets worse or better there. Namely,
PcdCpuBootLogicalProcessorNumber is only consumed in WakeUpAP(), and
only when InitFlag == ApInitConfig. InitFlag is set like that only in
CollectProcessorCount(). However, CollectProcessorCount() is only called
if PcdCpuMaxLogicalProcessorNumber is >1 (see MaxLogicalProcessorNumber
in MpInitLibInitialize()). Meaning in effect that
PcdCpuMaxLogicalProcessorNumber=1 makes PcdCpuBootLogicalProcessorNumber
irrelevant, so its change from 0 to 1 is invisible *to MpInitLib*.

Oliver:

(1) can you please post a patch for the Windows CI so that the following
option be passed to QEMU:

-fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-Override,string=yes

(This option is harmless when the firmware does not determine the QEMU
bug, so it can be passed in advance; it will have no consequence at all.)

In the patch, please reference

https://bugzilla.tianocore.org/show_bug.cgi?id=4250

(2) Please file a separate TianoCore BZ for *backing out* the change (=
for removing the -fw_cfg switch), and assign it to yourself :)

Once the Windows CI advances to a fixed QEMU binary, the "escape hatch"
should be shut welded down.

(3) Please give me a hint when the CI patch (1) has been merged; then I
can go ahead and merge this v3 series as well.

Thanks!
Laszlo


For the series,

Reviewed-by: Ard Biesheuvel <ardb@...>



---

Notes:
v3:

- reimplement the bug check in the factored out function
PlatformCpuCountBugCheck()

- add override [Ard, Gerd, Michael]

- assert "0 < Present == BootCpuCount <= Possible" whenever
PlatformCpuCountBugCheck() returns

- update commit message

- update test matrix in commit message

- (re)test all "OVMF patched = 1" cases (sigh)

v2:

- V1 was at
<http://mid.mail-archive.com/20230104151234.286030-1-lersek@redhat.com>.

- Repo: <https://pagure.io/lersek/edk2.git>, branch:
cpuhp-reg-catch-4250-v2

- Remove KVM as a proposed workaround from the error message, because in
the QEMU discussion, we had found that the KVM accelerator's behavior
in QEMU (masking the problem) was not right, and that a fix for that
had been in progress for quite some time.

- Add the QEMU commit hash to the commit message, the code comment, and
the error message.

- Pick up Gerd's R-b; add Oliver to the Cc list.

OvmfPkg/Library/PlatformInitLib/Platform.c | 87 ++++++++++++++++++++
1 file changed, 87 insertions(+)

diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c
index d1be5c2d7970..9fee6e481038 100644
--- a/OvmfPkg/Library/PlatformInitLib/Platform.c
+++ b/OvmfPkg/Library/PlatformInitLib/Platform.c
@@ -36,6 +36,9 @@

#include <Library/PlatformInitLib.h>

+#define CPUHP_BUGCHECK_OVERRIDE_FWCFG_FILE \
+ "opt/org.tianocore/X-Cpuhp-Bugcheck-Override"
+
VOID
EFIAPI
PlatformAddIoMemoryBaseSizeHob (
@@ -437,6 +440,87 @@ PlatformCpuCountBugCheck (
{
ASSERT (*BootCpuCount > 0);

+ //
+ // Sanity check: we need at least 1 present CPU (CPU#0 is always present).
+ //
+ // The legacy-to-modern switching of the CPU hotplug register block got broken
+ // (for TCG) in QEMU v5.1.0. Refer to "IO port write width clamping differs
+ // between TCG and KVM" at
+ // <http://mid.mail-archive.com/aaedee84-d3ed-a4f9-21e7-d221a28d1683@redhat.com>
+ // or at
+ // <https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00199.html>.
+ //
+ // QEMU received the fix in commit dab30fbef389 ("acpi: cpuhp: fix
+ // guest-visible maximum access size to the legacy reg block", 2023-01-08), to
+ // be included in QEMU v8.0.0.
+ //
+ // If we're affected by this QEMU bug, then we must not continue: it confuses
+ // the multiprocessing in UefiCpuPkg/Library/MpInitLib, and breaks CPU
+ // hot(un)plug with SMI in OvmfPkg/CpuHotplugSmm.
+ //
+ if (*Present == 0) {
+ UINTN Idx;
+ STATIC CONST CHAR8 *CONST Message[] = {
+ "Broken CPU hotplug register block found. Update QEMU to version 8+, or",
+ "to a stable release with commit dab30fbef389 backported. Refer to",
+ "<https://bugzilla.tianocore.org/show_bug.cgi?id=4250>.",
+ "Consequences of the QEMU bug may include, but are not limited to:",
+ "- all firmware logic, dependent on the CPU hotplug register block,",
+ " being confused, for example, multiprocessing-related logic;",
+ "- guest OS data loss, including filesystem corruption, due to crash or",
+ " hang during ACPI S3 resume;",
+ "- SMM privilege escalation, by a malicious guest OS or 3rd partty UEFI",
+ " agent, against the platform firmware.",
+ "These symptoms need not necessarily be limited to the QEMU user",
+ "attempting to hot(un)plug a CPU.",
+ "The firmware will now stop (hang) deliberately, in order to prevent the",
+ "above symptoms.",
+ "You can forcibly override the hang, *at your own risk*, with the",
+ "following *experimental* QEMU command line option:",
+ " -fw_cfg name=" CPUHP_BUGCHECK_OVERRIDE_FWCFG_FILE ",string=yes",
+ "Please only report such bugs that you can reproduce *without* the",
+ "override.",
+ };
+ RETURN_STATUS ParseStatus;
+ BOOLEAN Override;
+
+ DEBUG ((
+ DEBUG_ERROR,
+ "%a: Present=%u Possible=%u\n",
+ __FUNCTION__,
+ *Present,
+ *Possible
+ ));
+ for (Idx = 0; Idx < ARRAY_SIZE (Message); ++Idx) {
+ DEBUG ((DEBUG_ERROR, "%a: %a\n", __FUNCTION__, Message[Idx]));
+ }
+
+ ParseStatus = QemuFwCfgParseBool (
+ CPUHP_BUGCHECK_OVERRIDE_FWCFG_FILE,
+ &Override
+ );
+ if (!RETURN_ERROR (ParseStatus) && Override) {
+ DEBUG ((
+ DEBUG_WARN,
+ "%a: \"%a\" active. You've been warned.\n",
+ __FUNCTION__,
+ CPUHP_BUGCHECK_OVERRIDE_FWCFG_FILE
+ ));
+ //
+ // The bug is in QEMU v5.1.0+, where we're not affected by the QEMU v2.7
+ // reset bug, so BootCpuCount from fw_cfg is reliable. Assume a fully
+ // populated topology, like when the modern CPU hotplug interface is
+ // unavailable.
+ //
+ *Present = *BootCpuCount;
+ *Possible = *BootCpuCount;
+ return;
+ }
+
+ ASSERT (FALSE);
+ CpuDeadLoop ();
+ }
+
//
// Sanity check: fw_cfg and the modern CPU hotplug interface should expose the
// same boot CPU count.
@@ -596,6 +680,9 @@ PlatformMaxCpuCountInitialization (
} while (Selected > 0);

PlatformCpuCountBugCheck (&BootCpuCount, &Present, &Possible);
+ ASSERT (Present > 0);
+ ASSERT (Present <= Possible);
+ ASSERT (BootCpuCount == Present);

MaxCpuCount = Possible;
}


Re: [PATCH v3 1/5] UefiCpuPkg/SmmBaseHob.h: Add SMM Base HOB Data

Laszlo Ersek
 

On 1/18/23 16:06, Ni, Ray wrote:

#pragma pack(1)
typedef struct {
UINT32 CpuIndex;
UINT32 NumberOfCpus; // align to EFI_SEC_PLATFORM_INFORMATION_RECORD2.NumberOfCpus
UINT64 SmBase[];
} SMM_BASE_HOB_DATA;
#pragma pack()

For system with less than 8K CPUs, one HOB is produced. CpuIndex is set to 0 indicating
the HOB describes the CPU from 0 to NumberOfCpus-1.

The HOB list may contains multiple such HOB instances each describing the information for
CPU from CpuIndex to CpuIndex + NumberOfCpus - 1.
The instance order in the HOB list is random so consumer cannot assume the CpuIndex
of first instance is 0.
When using discontiguous blocks that are limited to ~64KB each:

- The consumer won't be able to access elements of the "conceptual" big
array in a truly random (= random-access) fashion. There won't be a
single contiguous domain of valid subscripts. It's "bank switching", and
switching banks should be avoided IMO. It effectively replaces the
vector data structure with a linked list. The consequence is that the
consumer will have to either (a) build a new (temporary, or permanent)
index table of sorts, for implementing the "conceptual" big array as a
factual contiguous array, or (b) traverse the HOB list multiple times.

- If the element size of the array increases (which is otherwise
possible to do compatibly, e.g. by placing a GUID and/or revision# in
the HOB), the number of elements that fit in a single HOB decreases. I
think that's an artifact that needlessly complicates debugging, and
maybe performance too (it might increase the number of full-list
traversals).

- With relatively many elements fitting into a single HOB, on most
platforms, just one HOB is going to be used. While that may be good for
performance, it is not good for code coverage (testing). The quirky
indexing method will not be exercised by most platforms.

What's wrong with:

- restricting the creation of all such HOBs after
"gEfiPeiMemoryDiscoveredPpiGuid"

- using a page allocation, for representing the array contiguously

- in the HOB, only storing the address of the page allocation.

Page allocations performed after gEfiPeiMemoryDiscoveredPpiGuid will not
be moved, so the address in the HOB would be stable.

Laszlo


Re: [PATCH V2 04/10] OvmfPkg/IntelTdx: Implement other helper functions in SecTdxHelperLib

Yao, Jiewen
 

Can we define FV_HANDOFF_TABLE_POINTERS2 and FV_HANDOFF_TABLE_POINTERS2 in MdePkg/Include/IndustryStandard/UefiTcgPlatform.h?
[Jiewen] No. We cannot move to MdePkg.
TCG defines the field to be variable length. Something like below:

typedef struct {
UINT8 TableDescriptionSize;
UINT8 TableDescription[TableDescriptionSize];
UINT64 NumberOfTables;
EFI_CONFIGURATION_TABLE TableEntry[NumberOfTables];
} HANDOFF_TABLE_POINTERS2;

typedef struct {
UINT8 BlobDescriptionSize;
UINT8 BlobDescription[BlobDescriptionSize];
EFI_PHYSICAL_ADDRESS BlobBase;
UINT64 BlobLength;
} HANDOFF_TABLE_POINTERS2;

The implementation can choose its own length as they wish.

Thank you
Yao, Jiewen

-----Original Message-----
From: Xu, Min M <min.m.xu@...>
Sent: Friday, January 20, 2023 3:40 PM
To: Gerd Hoffmann <kraxel@...>; Yao, Jiewen
<jiewen.yao@...>
Cc: devel@edk2.groups.io; Aktas, Erdem <erdemaktas@...>; James
Bottomley <jejb@...>; Tom Lendacky
<thomas.lendacky@...>; Michael Roth <michael.roth@...>
Subject: RE: [PATCH V2 04/10] OvmfPkg/IntelTdx: Implement other helper
functions in SecTdxHelperLib

On January 19, 2023 5:54 PM, Gerd Hoffmann wrote:

+#pragma pack(1)
+
+#define HANDOFF_TABLE_DESC "TdxTable"
+typedef struct {
+ UINT8 TableDescriptionSize;
+ UINT8 TableDescription[sizeof (HANDOFF_TABLE_DESC)];
+ UINT64 NumberOfTables;
+ EFI_CONFIGURATION_TABLE TableEntry[1];
+} TDX_HANDOFF_TABLE_POINTERS2;
+
+#define FV_HANDOFF_TABLE_DESC "Fv(XXXXXXXX-XXXX-XXXX-XXXX-
XXXXXXXXXXXX)"
+typedef struct {
+ UINT8 BlobDescriptionSize;
+ UINT8 BlobDescription[sizeof (FV_HANDOFF_TABLE_DESC)];
+ EFI_PHYSICAL_ADDRESS BlobBase;
+ UINT64 BlobLength;
+} FV_HANDOFF_TABLE_POINTERS2;
+
+#pragma pack()
Why do you need this? For standard event types we should have those
structs already defined somewhere in edk2 I think ...
FV_HANDOFF_TABLE_POINTERS2 is related to standard event type
(EV_EFI_PLATFORM_FIRMWARE_BLOB2).
According to comment
(https://github.com/tianocore/edk2/blob/master/MdePkg/Include/Industry
Standard/UefiTcgPlatform.h#L145-L156) we can see this event type uses the
structure of UEFI_PLATFORM_FIRMWARE_BLOB2. It is not a data struct with
fixed size. Instead its size depends on BlobDescriptionSize.
Tcg2Pei measures the FV image with the event type
(EV_EFI_PLATFORM_FIRMWARE_BLOB2) and data struct
(FV_HANDOFF_TABLE_POINTERS2).
Tdx measurement does the same measurement to the Configuration FV
image.
@Yao, Jiewen Can we define FV_HANDOFF_TABLE_POINTERS2 and
FV_HANDOFF_TABLE_POINTERS2 in
MdePkg/Include/IndustryStandard/UefiTcgPlatform.h?

Thanks
Min


Re: [PATCH V2 04/10] OvmfPkg/IntelTdx: Implement other helper functions in SecTdxHelperLib

Min Xu
 

On January 19, 2023 5:54 PM, Gerd Hoffmann wrote:

+#pragma pack(1)
+
+#define HANDOFF_TABLE_DESC "TdxTable"
+typedef struct {
+ UINT8 TableDescriptionSize;
+ UINT8 TableDescription[sizeof (HANDOFF_TABLE_DESC)];
+ UINT64 NumberOfTables;
+ EFI_CONFIGURATION_TABLE TableEntry[1];
+} TDX_HANDOFF_TABLE_POINTERS2;
+
+#define FV_HANDOFF_TABLE_DESC "Fv(XXXXXXXX-XXXX-XXXX-XXXX-
XXXXXXXXXXXX)"
+typedef struct {
+ UINT8 BlobDescriptionSize;
+ UINT8 BlobDescription[sizeof (FV_HANDOFF_TABLE_DESC)];
+ EFI_PHYSICAL_ADDRESS BlobBase;
+ UINT64 BlobLength;
+} FV_HANDOFF_TABLE_POINTERS2;
+
+#pragma pack()
Why do you need this? For standard event types we should have those
structs already defined somewhere in edk2 I think ...
FV_HANDOFF_TABLE_POINTERS2 is related to standard event type (EV_EFI_PLATFORM_FIRMWARE_BLOB2).
According to comment (https://github.com/tianocore/edk2/blob/master/MdePkg/Include/IndustryStandard/UefiTcgPlatform.h#L145-L156) we can see this event type uses the structure of UEFI_PLATFORM_FIRMWARE_BLOB2. It is not a data struct with fixed size. Instead its size depends on BlobDescriptionSize.
Tcg2Pei measures the FV image with the event type (EV_EFI_PLATFORM_FIRMWARE_BLOB2) and data struct (FV_HANDOFF_TABLE_POINTERS2).
Tdx measurement does the same measurement to the Configuration FV image.
@Yao, Jiewen Can we define FV_HANDOFF_TABLE_POINTERS2 and FV_HANDOFF_TABLE_POINTERS2 in MdePkg/Include/IndustryStandard/UefiTcgPlatform.h?

Thanks
Min


Re: [PATCH v3 1/1] ShellPkg: Export default shell delay as PCD

Rebecca Cran
 

Could someone review this please?


--

Rebecca Cran

On 1/10/23 05:09, Tomas Pilar (tpilar) wrote:
Hi,

Any chance you could review this change? It's fairly simple.

Cheers,
Tom

On 03/01/2023 17:02, Tomas Pilar (tpilar) wrote:
From: Tomas Pilar <quic_tpilar@...>

Create PcdShellDefaultDelay to configure the default
delay the shell provides for the user at the start time
if the user wishes to cancel the execution of a potential
startup script.

The shell application already allows the user to override
the delay default value by specifying the -delay cmdline
argument. This however cannot be used when loading the
shell application using direct boot or when integrating
the shell into the platform firmware build.

Thus, a PCD can be easily configurerd by the developer
either at build time, or even at runtime.

Cc: Ray Ni <ray.ni@...>
Cc: Zhichao Gao <zhichao.gao@...>
Signed-off-by: Tomas Pilar <tomas@...>
---
  ShellPkg/ShellPkg.dec                | 4 ++++
  ShellPkg/Application/Shell/Shell.inf | 1 +
  ShellPkg/Application/Shell/Shell.c   | 2 +-
  3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/ShellPkg/ShellPkg.dec b/ShellPkg/ShellPkg.dec
index 7b2d1230bd2c..2ebea0a2615f 100644
--- a/ShellPkg/ShellPkg.dec
+++ b/ShellPkg/ShellPkg.dec
@@ -136,3 +136,7 @@ [PcdsFixedAtBuild, PcdsPatchableInModule, PcdsDynamic, PcdsDynamicEx]
    # Up to this many bytes of vendor specific data will be used. Default is 0
    # (disabled).
gEfiShellPkgTokenSpaceGuid.PcdShellVendorExtendedDecode|0|UINT32|0x00000013
+
+  ## Controls the default delay the shell will offer to the user at the
+  # start to check if the user wishes to cancel the script autostart
+ gEfiShellPkgTokenSpaceGuid.PcdShellDefaultDelay|5|UINT32|0x00000015
diff --git a/ShellPkg/Application/Shell/Shell.inf b/ShellPkg/Application/Shell/Shell.inf
index 4c32960a9687..f1e41de133d1 100644
--- a/ShellPkg/Application/Shell/Shell.inf
+++ b/ShellPkg/Application/Shell/Shell.inf
@@ -103,3 +103,4 @@ [Pcd]
    gEfiShellPkgTokenSpaceGuid.PcdShellForceConsole           ## CONSUMES
    gEfiShellPkgTokenSpaceGuid.PcdShellSupplier               ## CONSUMES
    gEfiShellPkgTokenSpaceGuid.PcdShellMaxHistoryCommandCount ## CONSUMES
+  gEfiShellPkgTokenSpaceGuid.PcdShellDefaultDelay           ## CONSUMES
diff --git a/ShellPkg/Application/Shell/Shell.c b/ShellPkg/Application/Shell/Shell.c
index df00adfdfa5b..0ae6e14a34bf 100644
--- a/ShellPkg/Application/Shell/Shell.c
+++ b/ShellPkg/Application/Shell/Shell.c
@@ -990,7 +990,7 @@ ProcessCommandLine (
    ShellInfoObject.ShellInitSettings.BitUnion.Bits.Delay = FALSE;
    ShellInfoObject.ShellInitSettings.BitUnion.Bits.Exit = FALSE;
    ShellInfoObject.ShellInitSettings.BitUnion.Bits.NoNest = FALSE;
-  ShellInfoObject.ShellInitSettings.Delay = 5;
+  ShellInfoObject.ShellInitSettings.Delay = PcdGet32 (PcdShellDefaultDelay);
      //
    // Start LoopVar at 0 to parse only optional arguments at Argv[0]




Re: [PATCH V2 04/10] OvmfPkg/IntelTdx: Implement other helper functions in SecTdxHelperLib

Min Xu
 

On January 19, 2023 5:54 PM, Gerd Hoffmann wrote:
@@ -807,7 +880,47 @@ TdxHelperMeasureTdHob (
VOID
)
{
- return EFI_UNSUPPORTED;
+ EFI_PEI_HOB_POINTERS Hob;
+ EFI_STATUS Status;
+ UINT8 Digest[SHA384_DIGEST_SIZE];
+ OVMF_WORK_AREA *WorkArea;
+ VOID *TdHob;
+
+ TdHob = (VOID *)(UINTN)FixedPcdGet32 (PcdOvmfSecGhcbBase);
+ Hob.Raw = (UINT8 *)TdHob;
+
+ //
+ // Walk thru the TdHob list until end of list.
+ //
+ while (!END_OF_HOB_LIST (Hob)) {
+ Hob.Raw = GET_NEXT_HOB (Hob);
+ }
Hmm? Isn't there just a single TdHob? Why do you need to walk the list here?
No, TdHob is a HobList and it contains several Hobs, including the ResourceDescriptorHobs which describe the memory regions. So we have to walk thru the hob list to find out its length.


+#pragma pack(1)
+
+#define HANDOFF_TABLE_DESC "TdxTable"
+typedef struct {
+ UINT8 TableDescriptionSize;
+ UINT8 TableDescription[sizeof (HANDOFF_TABLE_DESC)];
+ UINT64 NumberOfTables;
+ EFI_CONFIGURATION_TABLE TableEntry[1];
+} TDX_HANDOFF_TABLE_POINTERS2;
+
+#define FV_HANDOFF_TABLE_DESC "Fv(XXXXXXXX-XXXX-XXXX-XXXX-
XXXXXXXXXXXX)"
+typedef struct {
+ UINT8 BlobDescriptionSize;
+ UINT8 BlobDescription[sizeof (FV_HANDOFF_TABLE_DESC)];
+ EFI_PHYSICAL_ADDRESS BlobBase;
+ UINT64 BlobLength;
+} FV_HANDOFF_TABLE_POINTERS2;
+
+#pragma pack()
Why do you need this? For standard event types we should have those
structs already defined somewhere in edk2 I think ...
These structs are defined in SecurityPkg/Tcg/Tcg2Pei/Tcg2Pei.c. Let me think can they be moved to a common header file. Thanks for reminder.

Thanks
Min


[edk2-staging/Intel/HttpProxy] Introduction of new branch

Saloni Kasbekar
 

Hello EDK2 Community,

 

The "Intel/HttpProxy" branch was created in edk2-staging repository.

Role of this branch is to intercept HTTP Proxy feature for Intel product intercept. The branch is based on edk2-staging/HttpProxy branch that follows the code first process for HTTP Proxy feature development and UEFI specification update.

 

Link to branch: https://github.com/tianocore/edk2-staging/tree/Intel/HttpProxy

Base edk2 commit: https://github.com/tianocore/edk2/commit/2c17d676e402d75a3a674499342f7ddaccf387bd

 

Relevant BZs:

https://bugzilla.tianocore.org/show_bug.cgi?id=3951

 

Thanks,

Saloni Kasbekar


[PATCH] MdeModulePkg: allow PlatformBootManagerLib to use BootNext

Jeshua Smith
 

Currently BdsEntry caches BootNext before calling PlatformBootManagerLib
APIs, with the result that:
- If BootNext is already set, a BootNext value written by the APIs will
be ignored and deleted, and the current boot will use the cached
BootNext value.
- If BootNext is not present, a BootNext value written by the APIs will
have no effect on the current boot, but will be used by the next boot.

This patch adds PcdAllowBootNextFromPlatformBootManagerLib so that a
platform can enable PlatformBootManagerLib API calls to set BootNext
to control the current boot.
- If the PCD is FALSE (default), there is no change.
- If the PCD is TRUE, then a BootNext value written by the
PlatformBootManagerLib APIs will affect the current boot.

Signed-off-by: Jeshua Smith <jeshuas@...>
---
MdeModulePkg/MdeModulePkg.dec | 7 +++++
MdeModulePkg/Universal/BdsDxe/BdsDxe.inf | 27 ++++++++++---------
MdeModulePkg/Universal/BdsDxe/BdsEntry.c | 34 ++++++++++++++++++------
3 files changed, 47 insertions(+), 21 deletions(-)

diff --git a/MdeModulePkg/MdeModulePkg.dec b/MdeModulePkg/MdeModulePkg.dec
index 9605c617b7..0e74131712 100644
--- a/MdeModulePkg/MdeModulePkg.dec
+++ b/MdeModulePkg/MdeModulePkg.dec
@@ -1093,6 +1093,13 @@
# @Prompt Enable UEFI Stack Guard.
gEfiMdeModulePkgTokenSpaceGuid.PcdCpuStackGuard|FALSE|BOOLEAN|0x30001055

+ ## Indicates whether PlatformBootManagerLib code can set BootNext for the current boot.
+ # If enabled, setting BootNext in PlatformBootManagerLib controls the current boot.<BR><BR>
+ # TRUE - BootNext value from PlatformBootManagerLib will affect the current boot.<BR>
+ # FALSE - BootNext value from PlatformBootManagerLib will affect the subsequent boot (or be ignored if already set).<BR>
+ # @Prompt Allow PlatformBootManagerLib to set BootNext for the current boot.
+ gEfiMdeModulePkgTokenSpaceGuid.PcdAllowBootNextFromPlatformBootManagerLib|FALSE|BOOLEAN|0x30001056
+
[PcdsFixedAtBuild, PcdsPatchableInModule]
## Dynamic type PCD can be registered callback function for Pcd setting action.
# PcdMaxPeiPcdCallBackNumberPerPcdEntry indicates the maximum number of callback function
diff --git a/MdeModulePkg/Universal/BdsDxe/BdsDxe.inf b/MdeModulePkg/Universal/BdsDxe/BdsDxe.inf
index 5bac635def..b7a3560f5f 100644
--- a/MdeModulePkg/Universal/BdsDxe/BdsDxe.inf
+++ b/MdeModulePkg/Universal/BdsDxe/BdsDxe.inf
@@ -85,19 +85,20 @@
gEfiMdePkgTokenSpaceGuid.PcdUefiVariableDefaultLangDeprecate ## CONSUMES

[Pcd]
- gEfiMdePkgTokenSpaceGuid.PcdUefiVariableDefaultLangCodes ## CONSUMES
- gEfiMdePkgTokenSpaceGuid.PcdUefiVariableDefaultLang ## SOMETIMES_CONSUMES
- gEfiMdePkgTokenSpaceGuid.PcdUefiVariableDefaultPlatformLangCodes ## CONSUMES
- gEfiMdePkgTokenSpaceGuid.PcdUefiVariableDefaultPlatformLang ## CONSUMES
- gEfiMdePkgTokenSpaceGuid.PcdHardwareErrorRecordLevel ## CONSUMES
- gEfiMdePkgTokenSpaceGuid.PcdPlatformBootTimeOut ## CONSUMES
- gEfiMdeModulePkgTokenSpaceGuid.PcdFirmwareVendor ## CONSUMES
- gEfiMdeModulePkgTokenSpaceGuid.PcdFirmwareRevision ## CONSUMES
- gEfiMdeModulePkgTokenSpaceGuid.PcdConInConnectOnDemand ## CONSUMES
- gEfiMdeModulePkgTokenSpaceGuid.PcdErrorCodeSetVariable ## SOMETIMES_CONSUMES
- gEfiMdeModulePkgTokenSpaceGuid.PcdTestKeyUsed ## CONSUMES
- gEfiMdeModulePkgTokenSpaceGuid.PcdCapsuleOnDiskSupport ## CONSUMES
- gEfiMdeModulePkgTokenSpaceGuid.PcdPlatformRecoverySupport ## CONSUMES
+ gEfiMdePkgTokenSpaceGuid.PcdUefiVariableDefaultLangCodes ## CONSUMES
+ gEfiMdePkgTokenSpaceGuid.PcdUefiVariableDefaultLang ## SOMETIMES_CONSUMES
+ gEfiMdePkgTokenSpaceGuid.PcdUefiVariableDefaultPlatformLangCodes ## CONSUMES
+ gEfiMdePkgTokenSpaceGuid.PcdUefiVariableDefaultPlatformLang ## CONSUMES
+ gEfiMdePkgTokenSpaceGuid.PcdHardwareErrorRecordLevel ## CONSUMES
+ gEfiMdePkgTokenSpaceGuid.PcdPlatformBootTimeOut ## CONSUMES
+ gEfiMdeModulePkgTokenSpaceGuid.PcdFirmwareVendor ## CONSUMES
+ gEfiMdeModulePkgTokenSpaceGuid.PcdFirmwareRevision ## CONSUMES
+ gEfiMdeModulePkgTokenSpaceGuid.PcdConInConnectOnDemand ## CONSUMES
+ gEfiMdeModulePkgTokenSpaceGuid.PcdErrorCodeSetVariable ## SOMETIMES_CONSUMES
+ gEfiMdeModulePkgTokenSpaceGuid.PcdTestKeyUsed ## CONSUMES
+ gEfiMdeModulePkgTokenSpaceGuid.PcdCapsuleOnDiskSupport ## CONSUMES
+ gEfiMdeModulePkgTokenSpaceGuid.PcdPlatformRecoverySupport ## CONSUMES
+ gEfiMdeModulePkgTokenSpaceGuid.PcdAllowBootNextFromPlatformBootManagerLib ## CONSUMES

[Depex]
TRUE
diff --git a/MdeModulePkg/Universal/BdsDxe/BdsEntry.c b/MdeModulePkg/Universal/BdsDxe/BdsEntry.c
index 766dde3aae..6450406cce 100644
--- a/MdeModulePkg/Universal/BdsDxe/BdsEntry.c
+++ b/MdeModulePkg/Universal/BdsDxe/BdsEntry.c
@@ -787,15 +787,19 @@ BdsEntry (

//
// Cache the "BootNext" NV variable before calling any PlatformBootManagerLib APIs
- // This could avoid the "BootNext" set by PlatformBootManagerLib be consumed in this boot.
- //
- GetEfiGlobalVariable2 (EFI_BOOT_NEXT_VARIABLE_NAME, (VOID **)&BootNext, &DataSize);
- if (DataSize != sizeof (UINT16)) {
- if (BootNext != NULL) {
- FreePool (BootNext);
- }
+ // if the Platform isn't allowed to override BootNext.
+ // If "BootNext" was already set, a "BootNext" value set in PlatformBootManagerLib APIs
+ // will be ignored; otherwise it will not take effect until the next boot.
+ //
+ if (!PcdGetBool (PcdAllowBootNextFromPlatformBootManagerLib)) {
+ GetEfiGlobalVariable2 (EFI_BOOT_NEXT_VARIABLE_NAME, (VOID **)&BootNext, &DataSize);
+ if (DataSize != sizeof (UINT16)) {
+ if (BootNext != NULL) {
+ FreePool (BootNext);
+ }

- BootNext = NULL;
+ BootNext = NULL;
+ }
}

//
@@ -1048,6 +1052,20 @@ BdsEntry (

EfiBootManagerHotkeyBoot ();

+ //
+ // If PlatformBootManagerLib APIs are allowed to override BootNext, read it just before use
+ //
+ if (PcdGetBool (PcdAllowBootNextFromPlatformBootManagerLib)) {
+ GetEfiGlobalVariable2 (EFI_BOOT_NEXT_VARIABLE_NAME, (VOID **)&BootNext, &DataSize);
+ if (DataSize != sizeof (UINT16)) {
+ if (BootNext != NULL) {
+ FreePool (BootNext);
+ }
+
+ BootNext = NULL;
+ }
+ }
+
if (BootNext != NULL) {
//
// Delete "BootNext" NV variable before transferring control to it to prevent loops.
--
2.17.1


Re: [RFC PATCH] OvmfPkg/PlatformCI VS2019: Enable temporary workaround for cpuhp bugfix

Yao, Jiewen
 

Acked-by: Jiewen Yao <Jiewen.yao@...>

-----Original Message-----
From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Michael
Kubacki
Sent: Friday, January 20, 2023 12:51 AM
To: devel@edk2.groups.io; ardb@...
Cc: Laszlo Ersek <lersek@...>; Gerd Hoffmann
<kraxel@...>; Yao, Jiewen <jiewen.yao@...>; Michael Brown
<mcb30@...>; Oliver Steffen <osteffen@...>; Kubacki, Michael
<michael.kubacki@...>
Subject: Re: [edk2-devel] [RFC PATCH] OvmfPkg/PlatformCI VS2019: Enable
temporary workaround for cpuhp bugfix

Reviewed-by: Michael Kubacki <michael.kubacki@...>

On 1/19/2023 8:43 AM, Ard Biesheuvel wrote:
QEMU for x86 has a nasty CPU hotplug bug of which the ramifications are

difficult to oversee, even though KVM acceleration seems to be

unaffected. This has been addressed in QEMU mainline, and will percolate

through the ecosystem at its usual pace. In the mean time, due to the

potential impact on production workloads, we will be updating OVMF to

abort the boot when it detects a QEMU build that is affected.



Tiancore's platform CI uses QEMU in TCG mode, and is therefore impacted

by this mitigation, unless its QEMU builds are updated. This has been

done for Ubuntu-GCC5, but Windows-VS2019 still uses a QEMU build that
is

affected.



Aborting the boot upon detecting the QEMU issue will render all boot

tests carried out on Windows-VS2019 broken unless we implement the

'escape hatch' that enables proceed-at-your-own-risk mode, and permits

the boot to proceed even if the QEMU issue is detected.



So let's enable this for Windows-VS2019, and remove it again once it is

no longer needed.



Cc: Laszlo Ersek <lersek@...>

Cc: Gerd Hoffmann <kraxel@...>

Cc: Jiewen Yao <jiewen.yao@...>

Cc: Michael Brown <mcb30@...>

Cc: Oliver Steffen <osteffen@...>

Cc: Michael Kubacki <michael.kubacki@...>



Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250

Signed-off-by: Ard Biesheuvel <ardb@...>

---

OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml | 2 +-

OvmfPkg/PlatformCI/PlatformBuildLib.py | 12 ++++++++++++

2 files changed, 13 insertions(+), 1 deletion(-)



diff --git a/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml
b/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml

index 7e63f419b26b..b3b91aa84ea0 100644

--- a/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml

+++ b/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml

@@ -24,7 +24,7 @@ jobs:

package: 'OvmfPkg'

vm_image: 'windows-2019'

should_run: true

- run_flags: "MAKE_STARTUP_NSH=TRUE QEMU_HEADLESS=TRUE"

+ run_flags: "MAKE_STARTUP_NSH=TRUE QEMU_HEADLESS=TRUE
QEMU_CPUHP_QUIRK=TRUE"



#Use matrix to speed up the build process

strategy:

diff --git a/OvmfPkg/PlatformCI/PlatformBuildLib.py
b/OvmfPkg/PlatformCI/PlatformBuildLib.py

index bfef9849c749..58dc1189a2cc 100644

--- a/OvmfPkg/PlatformCI/PlatformBuildLib.py

+++ b/OvmfPkg/PlatformCI/PlatformBuildLib.py

@@ -170,6 +170,7 @@ class PlatformBuilder( UefiBuilder,
BuildSettingsManager):

self.env.SetValue("PRODUCT_NAME", "OVMF", "Platform Hardcoded")

self.env.SetValue("MAKE_STARTUP_NSH", "FALSE", "Default to false")

self.env.SetValue("QEMU_HEADLESS", "FALSE", "Default to false")

+ self.env.SetValue("QEMU_CPUHP_QUIRK", "FALSE", "Default to false")

return 0



def PlatformPreBuild(self):

@@ -211,6 +212,17 @@ class PlatformBuilder( UefiBuilder,
BuildSettingsManager):

args += " -pflash " + os.path.join(OutputPath_FV, "OVMF.fd") #
path to firmware





+ ###

+ ### NOTE This is a temporary workaround to allow platform CI to
cope with

+ ### a QEMU bug in the CPU hotplug code. Once the CI
environment has

+ ### been updated to carry a fixed version of QEMU, this can be

+ ### removed again

+ ###

+ ### Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250

+ ###

+ if (self.env.GetValue("QEMU_CPUHP_QUIRK").upper() == "TRUE"):

+ args += " -fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-
Override,string=yes"

+

if (self.env.GetValue("MAKE_STARTUP_NSH").upper() == "TRUE"):

f = open(os.path.join(VirtualDrive, "startup.nsh"), "w")

f.write("BOOT SUCCESS !!! \n")



Re: [RFC PATCH] OvmfPkg/PlatformCI VS2019: Enable temporary workaround for cpuhp bugfix

Michael Kubacki
 

Reviewed-by: Michael Kubacki <michael.kubacki@...>

On 1/19/2023 8:43 AM, Ard Biesheuvel wrote:
QEMU for x86 has a nasty CPU hotplug bug of which the ramifications are
difficult to oversee, even though KVM acceleration seems to be
unaffected. This has been addressed in QEMU mainline, and will percolate
through the ecosystem at its usual pace. In the mean time, due to the
potential impact on production workloads, we will be updating OVMF to
abort the boot when it detects a QEMU build that is affected.
Tiancore's platform CI uses QEMU in TCG mode, and is therefore impacted
by this mitigation, unless its QEMU builds are updated. This has been
done for Ubuntu-GCC5, but Windows-VS2019 still uses a QEMU build that is
affected.
Aborting the boot upon detecting the QEMU issue will render all boot
tests carried out on Windows-VS2019 broken unless we implement the
'escape hatch' that enables proceed-at-your-own-risk mode, and permits
the boot to proceed even if the QEMU issue is detected.
So let's enable this for Windows-VS2019, and remove it again once it is
no longer needed.
Cc: Laszlo Ersek <lersek@...>
Cc: Gerd Hoffmann <kraxel@...>
Cc: Jiewen Yao <jiewen.yao@...>
Cc: Michael Brown <mcb30@...>
Cc: Oliver Steffen <osteffen@...>
Cc: Michael Kubacki <michael.kubacki@...>
Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
Signed-off-by: Ard Biesheuvel <ardb@...>
---
OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml | 2 +-
OvmfPkg/PlatformCI/PlatformBuildLib.py | 12 ++++++++++++
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml b/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml
index 7e63f419b26b..b3b91aa84ea0 100644
--- a/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml
+++ b/OvmfPkg/PlatformCI/.azurepipelines/Windows-VS2019.yml
@@ -24,7 +24,7 @@ jobs:
package: 'OvmfPkg'
vm_image: 'windows-2019'
should_run: true
- run_flags: "MAKE_STARTUP_NSH=TRUE QEMU_HEADLESS=TRUE"
+ run_flags: "MAKE_STARTUP_NSH=TRUE QEMU_HEADLESS=TRUE QEMU_CPUHP_QUIRK=TRUE"
#Use matrix to speed up the build process
strategy:
diff --git a/OvmfPkg/PlatformCI/PlatformBuildLib.py b/OvmfPkg/PlatformCI/PlatformBuildLib.py
index bfef9849c749..58dc1189a2cc 100644
--- a/OvmfPkg/PlatformCI/PlatformBuildLib.py
+++ b/OvmfPkg/PlatformCI/PlatformBuildLib.py
@@ -170,6 +170,7 @@ class PlatformBuilder( UefiBuilder, BuildSettingsManager):
self.env.SetValue("PRODUCT_NAME", "OVMF", "Platform Hardcoded")
self.env.SetValue("MAKE_STARTUP_NSH", "FALSE", "Default to false")
self.env.SetValue("QEMU_HEADLESS", "FALSE", "Default to false")
+ self.env.SetValue("QEMU_CPUHP_QUIRK", "FALSE", "Default to false")
return 0
def PlatformPreBuild(self):
@@ -211,6 +212,17 @@ class PlatformBuilder( UefiBuilder, BuildSettingsManager):
args += " -pflash " + os.path.join(OutputPath_FV, "OVMF.fd") # path to firmware
+ ###
+ ### NOTE This is a temporary workaround to allow platform CI to cope with
+ ### a QEMU bug in the CPU hotplug code. Once the CI environment has
+ ### been updated to carry a fixed version of QEMU, this can be
+ ### removed again
+ ###
+ ### Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
+ ###
+ if (self.env.GetValue("QEMU_CPUHP_QUIRK").upper() == "TRUE"):
+ args += " -fw_cfg name=opt/org.tianocore/X-Cpuhp-Bugcheck-Override,string=yes"
+
if (self.env.GetValue("MAKE_STARTUP_NSH").upper() == "TRUE"):
f = open(os.path.join(VirtualDrive, "startup.nsh"), "w")
f.write("BOOT SUCCESS !!! \n")