Re: [PATCH v8 08/10] OvmfPkg/CpuHotplugSmm: add EjectCpu()


Laszlo Ersek
 

superficial comments only; the patch is nearly ready:

On 02/22/21 08:19, Ankur Arora wrote:
Add EjectCpu(), which handles the CPU ejection, and provides a holding
area for said CPUs. It is called via SmmCpuFeaturesRendezvousExit(),
at the tail end of the SMI handling.

Also UnplugCpus() now stashes QEMU Selectors of CPUs which need to be
ejected in CPU_HOT_EJECT_DATA.QemuSelectorMap. This is used by
EjectCpu() to identify CPUs marked for ejection.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ard Biesheuvel <ard.biesheuvel@arm.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Aaron Young <aaron.young@oracle.com>
Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=3132
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---

Notes:
Address these review comments from v6:
(1) s/CpuEject/EjectCpu/g
(2) Ensure that the added include is in sorted order.
(3) Switch to a cheaper CpuSleep() based loop instead of
CpuDeadLoop(). Also add the CpuLib LibraryClass.
(4) Remove the nested else clause
(5) Use Laszlo's much clearer comment when we try to map multiple
QemuSelector to the same ProcessorNum.
(6a) Fix indentation of the debug print in the block in (5).
(6b,6c,6d) Fix printf types for ProcessorNum, use FMT_APIC_ID for
APIC_ID and 0x%Lx for QemuSelector[].
() As discussed elsewhere add an DEBUG_INFO print logging the
correspondence between ProcessorNum, APIC_ID, QemuSelector.
(7a,7b) Use EFI_ALREADY_STARTED instead of EFI_INVALID_PARAMETER and
document it in the UnplugCpus() comment block.
() As discussed elsewhere, add the import statement for
PcdCpuHotEjectDataAddress.
(9) Use Laszlo's comment in the PcdGet64(PcdCpuHotEjectDataAddress)
description block.
(10) Change mCpuHotEjectData init state checks from ASSERT to ones
consistent with similar checks for mCpuHotPlugData.
(11-14) Get rid of mCpuHotEjectData init loop: moved to a prior
patch so it can be done at allocation time.
(15) s/SmmCpuFeaturesSmiRendezvousExit/SmmCpuFeaturesRendezvousExit/
(16,17) Document the ordering requirements of
mCpuHotEjectData->Handler, and mCpuHotEjectData->QemuSelectorMap.

Not addressed:
(8) Not removing the EjectCount variable as I'd like to minimize
stores/loads to CPU_HOT_EJECT_DATA->Handler and so would like to do this
a single time at the end of the iteration. (It is safe to write multiple
times to the handler in UnplugCpus() but given the ordering concerns
around it, it seems cleaner to not access it unnecessarily.)

OvmfPkg/CpuHotplugSmm/CpuHotplugSmm.inf | 2 +
OvmfPkg/CpuHotplugSmm/CpuHotplug.c | 157 ++++++++++++++++++++++++++++++--
2 files changed, 151 insertions(+), 8 deletions(-)

diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplugSmm.inf b/OvmfPkg/CpuHotplugSmm/CpuHotplugSmm.inf
index 04322b0d7855..ebcc7e2ac63a 100644
--- a/OvmfPkg/CpuHotplugSmm/CpuHotplugSmm.inf
+++ b/OvmfPkg/CpuHotplugSmm/CpuHotplugSmm.inf
@@ -40,6 +40,7 @@ [Packages]
[LibraryClasses]
BaseLib
BaseMemoryLib
+ CpuLib
DebugLib
LocalApicLib
MmServicesTableLib
@@ -54,6 +55,7 @@ [Protocols]

[Pcd]
gUefiCpuPkgTokenSpaceGuid.PcdCpuHotPlugDataAddress ## CONSUMES
+ gUefiOvmfPkgTokenSpaceGuid.PcdCpuHotEjectDataAddress ## CONSUMES
gUefiOvmfPkgTokenSpaceGuid.PcdQ35SmramAtDefaultSmbase ## CONSUMES

[FeaturePcd]
diff --git a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
index f07b5072749a..851e2b28aad9 100644
--- a/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
+++ b/OvmfPkg/CpuHotplugSmm/CpuHotplug.c
@@ -10,10 +10,12 @@
#include <IndustryStandard/Q35MchIch9.h> // ICH9_APM_CNT
#include <IndustryStandard/QemuCpuHotplug.h> // QEMU_CPUHP_CMD_GET_PENDING
#include <Library/BaseLib.h> // CpuDeadLoop()
+#include <Library/CpuLib.h> // CpuSleep()
#include <Library/DebugLib.h> // ASSERT()
#include <Library/MmServicesTableLib.h> // gMmst
#include <Library/PcdLib.h> // PcdGetBool()
#include <Library/SafeIntLib.h> // SafeUintnSub()
+#include <Pcd/CpuHotEjectData.h> // CPU_HOT_EJECT_DATA
#include <Protocol/MmCpuIo.h> // EFI_MM_CPU_IO_PROTOCOL
#include <Protocol/SmmCpuService.h> // EFI_SMM_CPU_SERVICE_PROTOCOL
#include <Uefi/UefiBaseType.h> // EFI_STATUS
@@ -32,11 +34,12 @@ STATIC EFI_MM_CPU_IO_PROTOCOL *mMmCpuIo;
//
STATIC EFI_SMM_CPU_SERVICE_PROTOCOL *mMmCpuService;
//
-// This structure is a communication side-channel between the
+// These structures serve as communication side-channels between the
// EFI_SMM_CPU_SERVICE_PROTOCOL consumer (i.e., this driver) and provider
// (i.e., PiSmmCpuDxeSmm).
//
STATIC CPU_HOT_PLUG_DATA *mCpuHotPlugData;
+STATIC CPU_HOT_EJECT_DATA *mCpuHotEjectData;
//
// SMRAM arrays for fetching the APIC IDs of processors with pending events (of
// known event types), for the time of just one MMI.
@@ -188,18 +191,72 @@ RevokeNewSlot:
}

/**
+ CPU Hot-eject handler, called from SmmCpuFeaturesRendezvousExit()
+ on each CPU at exit from SMM.
+
+ If, the executing CPU is not being ejected, nothing to be done.
+ If, the executing CPU is being ejected, wait in a halted loop
+ until ejected.
+
+ @param[in] ProcessorNum ProcessorNum denotes the CPU exiting SMM,
+ and will be used as an index into
+ CPU_HOT_EJECT_DATA->QemuSelectorMap. It is
+ identical to the processor handle number in
+ EFI_SMM_CPU_SERVICE_PROTOCOL.
+**/
+VOID
+EFIAPI
+EjectCpu (
+ IN UINTN ProcessorNum
+ )
+{
+ UINT64 QemuSelector;
+
+ QemuSelector = mCpuHotEjectData->QemuSelectorMap[ProcessorNum];
+ if (QemuSelector == CPU_EJECT_QEMU_SELECTOR_INVALID) {
+ return;
+ }
+
+ //
+ // CPU(s) being unplugged get here from SmmCpuFeaturesRendezvousExit()
+ // after having been cleared to exit the SMI by the BSP and thus have
+ // no SMM processing remaining.
+ //
+ // Given that we cannot allow them to escape to the guest, we pen them
+ // here until the BSP tells QEMU to unplug them.
+ //
+ for (;;) {
+ DisableInterrupts ();
+ CpuSleep ();
+ }
+}
+
+/**
Process to be hot-unplugged CPUs, per QemuCpuhpCollectApicIds().

For each such CPU, report the CPU to PiSmmCpuDxeSmm via
- EFI_SMM_CPU_SERVICE_PROTOCOL. If the to be hot-unplugged CPU is
- unknown, skip it silently.
+ EFI_SMM_CPU_SERVICE_PROTOCOL and stash the APIC ID for later ejection.
+ If the to be hot-unplugged CPU is unknown, skip it silently.
+
+ Additonally, if we do stash any APIC IDs, also install a CPU eject handler
+ which would handle the ejection.
(1) Please update the two APIC ID references above to QEMU CPU selectors
(the commit message has been updated already, correctly).



@param[in] ToUnplugApicIds The APIC IDs of the CPUs that are about to be
hot-unplugged.

+ @param[in] ToUnplugSelector The QEMU Selectors of the CPUs that are about to
+ be hot-unplugged.
+
(2) Please rename the parameter to "ToUnplugSelectors" (plural), both
here in the comment and in the actual parameter list.


@param[in] ToUnplugCount The number of filled-in APIC IDs in
ToUnplugApicIds.

+ @retval EFI_ALREADY_STARTED For the ProcessorNum that
+ EFI_SMM_CPU_SERVICE_PROTOCOL had assigned to
+ one of the APIC ID in ToUnplugApicIds,
+ mCpuHotEjectData->QemuSelectorMap already has
+ the QemuSelector value stashed. (This should
+ never happen.)
+
(3) Unfortunately I made a typing error in my v6 review where I
suggested this language: it should say

one of the APIC ID*s* in ToUnplugApicIds

(emphasis only here, in this comment)


@retval EFI_SUCCESS Known APIC IDs have been removed from SMM data
structures.

@@ -210,23 +267,36 @@ STATIC
EFI_STATUS
UnplugCpus (
IN APIC_ID *ToUnplugApicIds,
+ IN UINT32 *ToUnplugSelector,
IN UINT32 ToUnplugCount
)
{
EFI_STATUS Status;
UINT32 ToUnplugIdx;
+ UINT32 EjectCount;
UINTN ProcessorNum;

ToUnplugIdx = 0;
+ EjectCount = 0;
while (ToUnplugIdx < ToUnplugCount) {
APIC_ID RemoveApicId;
+ UINT32 QemuSelector;

RemoveApicId = ToUnplugApicIds[ToUnplugIdx];
+ QemuSelector = ToUnplugSelector[ToUnplugIdx];

//
- // mCpuHotPlugData->ApicId maps ProcessorNum -> ApicId. Use it to find
- // the ProcessorNum for the APIC ID to be removed.
+ // mCpuHotPlugData->ApicId maps ProcessorNum -> ApicId. Use RemoveApicId
+ // to find the corresponding ProcessorNum for the CPU to be removed.
//
+ // With this we can establish a 3 way mapping:
+ // APIC_ID -- ProcessorNum -- QemuSelector
+ //
+ // We stash the ProcessorNum -> QemuSelector mapping so it can later be
+ // used for CPU hot-eject in SmmCpuFeaturesRendezvousExit() context (where
+ // we only have ProcessorNum available.)
+ //
+
for (ProcessorNum = 0;
ProcessorNum < mCpuHotPlugData->ArrayLength;
ProcessorNum++) {
@@ -255,11 +325,64 @@ UnplugCpus (
return Status;
}

+ if (mCpuHotEjectData->QemuSelectorMap[ProcessorNum] !=
+ CPU_EJECT_QEMU_SELECTOR_INVALID) {
(4) Invalid indentation; in this (controlling expression) context,
CPU_EJECT_QEMU_SELECTOR_INVALID should line up with "mCpuHotEjectData".


+ //
+ // mCpuHotEjectData->QemuSelectorMap[ProcessorNum] is set to
+ // CPU_EJECT_QEMU_SELECTOR_INVALID when mCpuHotEjectData->QemuSelectorMap
+ // is allocated, and once the subject processsor is ejected.
+ //
+ // Additionally, mMmCpuService->RemoveProcessor(ProcessorNum) invalidates
+ // mCpuHotPlugData->ApicId[ProcessorNum], so a given ProcessorNum can
+ // never match more than one APIC ID and by transitivity, more than one
+ // QemuSelector in a single invocation of UnplugCpus().
+ //
(5) good comment, but please make it a tiny bit easier to read: please
insert two "em dashes", as follows:

// never match more than one APIC ID -- and by transitivity, designate
// more than one QemuSelector -- in a single invocation of UnplugCpus().

(I've also inserted the verb "designate", because "match" doesn't seem
to apply in that context.)


+ DEBUG ((DEBUG_ERROR, "%a: ProcessorNum %Lu maps to QemuSelector 0x%Lx, "
+ "cannot also map to 0x%Lx\n", __FUNCTION__, (UINT64)ProcessorNum,
+ (UINT64)mCpuHotEjectData->QemuSelectorMap[ProcessorNum], QemuSelector));
(6a) "mCpuHotEjectData->QemuSelectorMap[ProcessorNum]" has type UINT64,
so the cast is superfluous

(6b) we log selectors in all other places in decimal, so please use %Lu
here, not 0x%Lx, for logging
"mCpuHotEjectData->QemuSelectorMap[ProcessorNum]"

(6c) "QemuSelector" has type UINT32 (correctly), so we need to log it
with either "0x%x" or "%u". Given my above point about logging selectors
in decimal everywhere else, please use "%u".

DEBUG ((DEBUG_ERROR, "%a: ProcessorNum %Lu maps to QemuSelector %Lu, "
"cannot also map to %u\n", __FUNCTION__, (UINT64)ProcessorNum,
mCpuHotEjectData->QemuSelectorMap[ProcessorNum], QemuSelector));


+
+ Status = EFI_ALREADY_STARTED;
+ return Status;
+ }
(7) style nit -- this looks better as "return EFI_ALREADY_STARTED".


+
+ //
+ // Stash the QemuSelector so we can do the actual ejection later.
+ //
+ mCpuHotEjectData->QemuSelectorMap[ProcessorNum] = (UINT64)QemuSelector;
+
+ DEBUG ((DEBUG_INFO, "%a: Started hot-unplug on ProcessorNum %Lu, APIC ID "
+ FMT_APIC_ID ", QemuSelector 0x%Lx\n", __FUNCTION__, (UINT64)ProcessorNum,
+ RemoveApicId, mCpuHotEjectData->QemuSelectorMap[ProcessorNum]));
(8a) In the format string, please replace "QemuSelector 0x%Lx" with
"QemuSelector %u" -- the main point is the decimal notation

(8b) As a suggestion, I think we should simplify this DEBUG call by
replacing the "mCpuHotEjectData->QemuSelectorMap[ProcessorNum]" argument
with just "QemuSelector".

DEBUG ((DEBUG_INFO, "%a: Started hot-unplug on ProcessorNum %Lu, APIC ID "
FMT_APIC_ID ", QemuSelector %u\n", __FUNCTION__, (UINT64)ProcessorNum,
RemoveApicId, QemuSelector));


+
+ EjectCount++;
(I'm fine with keeping the "EjectCount" variable.)


ToUnplugIdx++;
}

+ if (EjectCount != 0) {
+ //
+ // We have processors to be ejected; install the handler.
+ //
+ mCpuHotEjectData->Handler = EjectCpu;
+
+ //
+ // The BSP, CPUs to be ejected dereference mCpuHotEjectData->Handler, and
+ // mCpuHotEjectData->QemuSelectorMap[] in SmmCpuFeaturesRendezvousExit().
+ //
+ // Assignments to both of these are ordered-before the BSP's SMI exit signal
+ // which happens via a write to SMM_DISPATCHER_MP_SYNC_DATA->AllCpusInSync.
+ // Dereferences of both are ordered-after the synchronization via
+ // "AllCpusInSync".
+ //
+ // So we are guaranteed that the Handler would see the assignments above.
+ // However, add a MemoryFence() here in-lieu of a compiler barrier to
+ // ensure that the compiler doesn't monkey around with the stores.
+ //
+ MemoryFence ();
+ }
+
(9) Per previous discussion (under patch v8 #7), please replace the last
paragraph of this comment, with a "ReleaseMemoryFence" reference.

The rest looks good.

Thanks!
Laszlo

//
- // We've removed this set of APIC IDs from SMM data structures.
+ // We've removed this set of APIC IDs from SMM data structures and
+ // have installed an ejection handler if needed.
//
return EFI_SUCCESS;
}
@@ -387,7 +510,7 @@ CpuHotplugMmi (
}

if (ToUnplugCount > 0) {
- Status = UnplugCpus (mToUnplugApicIds, ToUnplugCount);
+ Status = UnplugCpus (mToUnplugApicIds, mToUnplugSelector, ToUnplugCount);
if (EFI_ERROR (Status)) {
goto Fatal;
}
@@ -458,9 +581,14 @@ CpuHotplugEntry (

//
// Our DEPEX on EFI_SMM_CPU_SERVICE_PROTOCOL guarantees that PiSmmCpuDxeSmm
- // has pointed PcdCpuHotPlugDataAddress to CPU_HOT_PLUG_DATA in SMRAM.
+ // has pointed:
+ // - PcdCpuHotPlugDataAddress to CPU_HOT_PLUG_DATA in SMRAM,
+ // - PcdCpuHotEjectDataAddress to CPU_HOT_EJECT_DATA in SMRAM, if the
+ // possible CPU count is greater than 1.
//
mCpuHotPlugData = (VOID *)(UINTN)PcdGet64 (PcdCpuHotPlugDataAddress);
+ mCpuHotEjectData = (VOID *)(UINTN)PcdGet64 (PcdCpuHotEjectDataAddress);
+
if (mCpuHotPlugData == NULL) {
Status = EFI_NOT_FOUND;
DEBUG ((DEBUG_ERROR, "%a: CPU_HOT_PLUG_DATA: %r\n", __FUNCTION__, Status));
@@ -472,6 +600,19 @@ CpuHotplugEntry (
if (mCpuHotPlugData->ArrayLength == 1) {
return EFI_UNSUPPORTED;
}
+
+ if (mCpuHotEjectData == NULL) {
+ Status = EFI_NOT_FOUND;
+ } else if (mCpuHotPlugData->ArrayLength != mCpuHotEjectData->ArrayLength) {
+ Status = EFI_INVALID_PARAMETER;
+ } else {
+ Status = EFI_SUCCESS;
+ }
+ if (EFI_ERROR (Status)) {
+ DEBUG ((DEBUG_ERROR, "%a: CPU_HOT_EJECT_DATA: %r\n", __FUNCTION__, Status));
+ goto Fatal;
+ }
+
//
// Allocate the data structures that depend on the possible CPU count.
//

Join devel@edk2.groups.io to automatically receive all group messages.