Re: [PATCH v8 07/10] OvmfPkg/SmmCpuFeaturesLib: call CPU hot-eject handler


Ankur Arora
 

On 2021-02-23 9:18 a.m., Paolo Bonzini wrote:
On 23/02/21 18:06, Laszlo Ersek wrote:
On 02/23/21 08:45, Paolo Bonzini wrote:
On 22/02/21 15:53, Laszlo Ersek wrote:
+
+  if (mCpuHotEjectData != NULL) {
+    CPU_HOT_EJECT_HANDLER Handler;
+
+    Handler = mCpuHotEjectData->Handler;
This patch looks otherwise OK to me, but:

In patch v8 08/10, we have a ReleaseMemoryFence(). (For now, it is only
expressed as a MemoryFence() call; we'll make that more precise later.)

(1) I think that should be paired with an AcquireMemoryFence() call,
just before loading "mCpuHotEjectData->Handler" above -- for now, also
expressed as a MemoryFence() call only.
In Linux terms, there is a control dependency here.  However, it should
at least be a separate statement to load mCpuHotEjectData (which from my
EDK2 reminiscences should be a global) into a local variable.  So

   EjectData = mCpuHotEjectData;
   // Optional AcquireMemoryFence here
   if (EjectData != NULL) {
     CPU_HOT_EJECT_HANDLER Handler;

     Handler = EjectData->Handler;
     if (Handler != NULL) {
       Handler (CpuIndex);
     }
   }
Yes, "mCpuHotEjectData" is a global.

"mCpuHotEjectData" itself is set up on the BSP (from the entry point
function of the PiSmmCpuSmmDxe driver), before any other APs have a
chance to execute any SMM-related code at all. Furthermore, once set up,
mCpuHotEjectData never changes -- it remains set to a particular
non-NULL value forever, or it remains NULL forever. (The latter case
applies when the possible CPU count is 1; IOW, then there is no AP at all.)
Ok, that's what I was missing.  However, your code below has *two* loads of mCpuHotEjectData and the fence would have to go after the second (between the load of mCpuHotEjectData and the load of the Handler field).  Therefore I would still use a local variable even if you decide to put the fence inside the "if", which I agree is the clearest.
Sorry, I'm missing something here. As Laszlo said, mCpuHotEjectData does
not change after being set, so why would referencing it twice be a
problem?

The generated code looks like this (load of mCpuHotEjectData at 0xf54b and
then the dependent mCpuHotEjectData->Handler load at 0xf645):

# 17d60 <mCpuHotEjectData>
f54b: 48 8b 05 0e 88 00 00 mov 0x880e(%rip),%rax
f54e: R_X86_64_PC32 .data+0x1d5c
f552: 48 85 c0 test %rax,%rax
f555: 0f 85 ea 00 00 00 jne f645 <SmiRendezvous+0x17e>

# Handler = mCpuHotEjectData->Handler
f645: 48 8b 40 08 mov 0x8(%rax),%rax
f649: 48 85 c0 test %rax,%rax
f64c: 74 05 je f653 <SmiRendezvous+0x18c>
f64e: 4c 89 e1 mov %r12,%rcx
f651: ff d0 callq *%rax

In the worst case, however, it might look like this (two loads of
mCpuHotEjectData and then the dependent load):

# 17d60 <mCpuHotEjectData>
f54b: 48 8b 05 0e 88 00 00 mov 0x880e(%rip),%rax
f54e: R_X86_64_PC32 .data+0x1d5c
f552: 48 85 c0 test %rax,%rax
f555: 0f 85 ea 00 00 00 jne f645 <SmiRendezvous+0x17e>

# 17d60 <mCpuHotEjectData>
f645: 48 8b 05 0e 88 00 00 mov 0x880e(%rip),%rax
+3: R_X86_64_PC32 .data+0x1d5c

# Handler = mCpuHotEjectData->Handler
+7: 48 8b 40 08 mov 0x8(%rax),%rax
+11: 48 85 c0 test %rax,%rax
+14: 74 05 je f653 <SmiRendezvous+0x18c>
+16: 4c 89 e1 mov %r12,%rcx
+19: ff d0 callq *%rax

As you and Laszlo say -- we do need an acquire fence before this line
(it pairs with the release fence in UnplugCpus() in patch 8 and the
release fence in EjectCpu() in patch 9):

# Handler = mCpuHotEjectData->Handler
48 8b 40 08 mov 0x8(%rax),%rax

A local variable for mCpuHotEjectData would be nice to have, but I'm
not sure it is needed for correctness.
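
To make the pairing concrete, here is a minimal sketch in portable C11
atomics rather than EDK2's MemoryFence(). It is only an illustration under
stated assumptions, not the code from the series: the "mGo" flag and the
BspInstallHandler(), ApRendezvousExit() and EjectCpuStub() names are
hypothetical, and I am assuming the release fence sits between the store to
Handler and some later signal that the other processors observe. The reader
takes one snapshot of the global into a local, issues the acquire fence, and
only then loads Handler.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*CPU_HOT_EJECT_HANDLER)(unsigned CpuIndex);

typedef struct {
  _Atomic(CPU_HOT_EJECT_HANDLER) Handler;   /* rewritten by the BSP over time */
} CPU_HOT_EJECT_DATA;

/* Set once before any AP runs, then never changed. */
CPU_HOT_EJECT_DATA  mStorage;
CPU_HOT_EJECT_DATA *mCpuHotEjectData = &mStorage;

/* HYPOTHETICAL signal: stands in for whatever store follows the release fence. */
atomic_bool mGo;

/* Records the last CpuIndex passed to the stub handler (for illustration). */
unsigned mLastEjected = (unsigned)-1;

void EjectCpuStub(unsigned CpuIndex) { mLastEjected = CpuIndex; }

/* Writer (BSP) side: store Handler, release fence, then raise the signal. */
void BspInstallHandler(CPU_HOT_EJECT_HANDLER Handler)
{
  atomic_store_explicit(&mCpuHotEjectData->Handler, Handler,
                        memory_order_relaxed);
  atomic_thread_fence(memory_order_release);
  atomic_store_explicit(&mGo, true, memory_order_relaxed);
}

/*
 * Reader side. Returns 1 if a handler was called, 0 otherwise.
 * Real firmware would wait for the signal; this sketch just returns early
 * so that it can be exercised from a single thread.
 */
int ApRendezvousExit(unsigned CpuIndex)
{
  CPU_HOT_EJECT_DATA    *EjectData;
  CPU_HOT_EJECT_HANDLER  Handler;

  if (!atomic_load_explicit(&mGo, memory_order_relaxed)) {
    return 0;
  }

  EjectData = mCpuHotEjectData;   /* exactly one load of the global */
  if (EjectData == NULL) {
    return 0;
  }

  /* Acquire fence: pairs with the release fence in BspInstallHandler(). */
  atomic_thread_fence(memory_order_acquire);

  Handler = atomic_load_explicit(&EjectData->Handler, memory_order_relaxed);
  if (Handler == NULL) {
    return 0;
  }
  Handler(CpuIndex);
  return 1;
}
```

With the local snapshot, the question of one versus two loads of the global
no longer depends on what the compiler chooses to emit; the fence placement
relative to the Handler load is the same either way.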

Ankur

Paolo

Because of that, I thought that the first comparison (mCpuHotEjectData
!= NULL) would not need any fence -- I thought it was similar to a
userspace program that (a) set a global variable in the "main" thread,
before calling pthread_create(), (b) treated the global variable as a
constant, ever after (meaning all threads).
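
The userspace analogy can be sketched as follows. This is only an
illustration; the names gConfig, Worker() and RunWorkers() are hypothetical.
It relies on POSIX listing pthread_create() and pthread_join() among the
functions that synchronize memory, so a global written before the threads
are created can be read by them without any explicit fence, as long as
nobody writes it afterwards.

```c
#include <pthread.h>
#include <stddef.h>

#define WORKER_COUNT 4

/* Set once in the "main" thread, treated as a constant by everyone after. */
static int  gConfigStorage;
static int *gConfig;

typedef struct {
  int Seen;   /* value each worker observed through gConfig */
} WORKER_RESULT;

static void *Worker(void *Arg)
{
  WORKER_RESULT *Result = Arg;

  /* No fence here: the write below happened before pthread_create(). */
  Result->Seen = (gConfig != NULL) ? *gConfig : -1;
  return NULL;
}

/* Returns how many workers observed Value through the global pointer. */
int RunWorkers(int Value)
{
  pthread_t     Threads[WORKER_COUNT];
  WORKER_RESULT Results[WORKER_COUNT];
  int           Started = 0;
  int           Matches = 0;
  int           Index;

  /* (a) Set the global before any other thread exists. */
  gConfigStorage = Value;
  gConfig        = &gConfigStorage;

  for (Index = 0; Index < WORKER_COUNT; Index++) {
    if (pthread_create(&Threads[Index], NULL, Worker, &Results[Index]) != 0) {
      break;
    }
    Started++;
  }

  /* (b) From here on the global is only read, never written. */
  for (Index = 0; Index < Started; Index++) {
    pthread_join(Threads[Index], NULL);
    if (Results[Index].Seen == Value) {
      Matches++;
    }
  }
  return Matches;
}
```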

However, mCpuHotEjectData->Handler is changed regularly (modified by the
BSP, and read "later" by all processors). That's why I thought the
acquire fence was needed in the following location:

   if (mCpuHotEjectData != NULL) {
     CPU_HOT_EJECT_HANDLER Handler;

     //
     // HERE -- AcquireMemoryFence()
     //
     Handler = mCpuHotEjectData->Handler;
     if (Handler != NULL) {
       Handler (CpuIndex);
     }
   }

Thanks!
Laszlo
