Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Igor Mammedov <imammedo@...>

On Sat, 24 Aug 2019 01:48:09 +0000
"Yao, Jiewen" <jiewen.yao@...> wrote:

I'll give my thoughts.
Paolo may add more.
Here are some ideas I have on the topic.

-----Original Message-----
From: Kinney, Michael D
Sent: Friday, August 23, 2019 11:25 PM
To: Yao, Jiewen <jiewen.yao@...>; Paolo Bonzini
<pbonzini@...>; Laszlo Ersek <lersek@...>; Kinney, Michael D <michael.d.kinney@...>
Cc: Alex Williamson <alex.williamson@...>;
qemu devel list <qemu-devel@...>; Igor Mammedov
<imammedo@...>; Chen, Yingwen <yingwen.chen@...>;
Nakajima, Jun <jun.nakajima@...>; Boris Ostrovsky
<boris.ostrovsky@...>; Joao Marcal Lemos Martins
<joao.m.martins@...>; Phillip Goerl <phillip.goerl@...>
Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with

Hi Jiewen,

If a hot-added CPU needs to run any code before the
first SMI, I would recommend that it only execute code
from a write-protected FLASH range, without a stack,
and then wait for the first SMI.
[Jiewen] Right.

Another option from Paolo, the new CPU will not run until 0x7b.
To mitigate the DMA threat, someone needs to guarantee that the low-memory SIPI vector is DMA protected.

NOTE: The LOW memory *could* be mapped to a write-protected FLASH area via the PAM register. The host CPU may set that up in SMM.
If that is the case, we don’t need to worry about DMA.

I copied the detailed steps here, because I found it hard to dig them out again.
*) In light of using dedicated SMRAM at 30000 with pre-configured
relocation vector for initial relocation which is not reachable from
non-SMM mode:

(01a) QEMU: create new CPU. The CPU already exists, but it does not
start running code until unparked by the CPU hotplug controller.
we might not need parked CPU (if we ignore attacker's attempt to send
SMI to several new CPUs, see below for issue it causes)

(01b) QEMU: trigger SCI

(02-03) no equivalent

(04) Host CPU: (OS) execute GPE handler from DSDT

(05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
will not enter SMM because SMI is disabled)
I think only CPU that does the write will enter SMM
and we might not need to pull in all already initialized CPUs into SMM.

At this step we could also send a directed SMI to a new CPU from host
CPU that entered SMM on write.

(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
rebase code.
could skip this step as well (*)

(07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
new CPU

(07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
we need to wake up new CPU somehow so it would process (09) pending (05) SMI
before jumping to SIPI vector

(08a) New CPU: (Low RAM) Enter protected mode.

(08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
both of these steps could be changed to just a cli;hlt loop or an INIT reset.
if SMI relocation handler and/or host CPU will pull in the new CPU into OVMF,
we actually don't care about SIPI vector as all firmware initialization
for the new CPU is done in SMM mode (07b triggers 10).
Thus eliminating one attack vector to protect from.

(09) Host CPU: (SMM) Send SMI to the new CPU only.
could be done at (05)

(10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
it could also pull itself into other OVMF structures
(assuming it can use TSEG as stack, as that's rather complex) or
just do relocation and let the host CPU fill in OVMF structures for the new CPU (12).

(11) Host CPU: (SMM) Restore 38000.
could skip this step as well (*)

(12) Host CPU: (SMM) Update located data structure to add the new CPU
information. (This step will involve CPU_SERVICE protocol)

(13) New CPU: (Flash) do whatever other initialization is needed
do we actually need it?

(14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.

(15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.
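
To make the ordering constraints in the list above explicit, here is a toy model in Python. It is only a sketch: `NewCpu`, `hot_add`, `smm_cpu_table` and the TSEG address used in the usage note are made-up names and values for illustration, not QEMU or OVMF interfaces, and it covers only steps (07a), (09), (10), (12) and (15).

```python
from dataclasses import dataclass, field

DEFAULT_SMBASE = 0x30000  # architectural SMBASE value after reset


@dataclass
class NewCpu:
    apic_id: int
    smbase: int = DEFAULT_SMBASE
    parked: bool = True            # (01a): present, but not running yet
    smi_pending: bool = False
    online: bool = False
    trace: list = field(default_factory=list)


def hot_add(cpu: NewCpu, tseg_smbase: int, smm_cpu_table: dict) -> NewCpu:
    # (07a) host CPU, in SMM: unpark through the hotplug controller
    cpu.parked = False
    cpu.trace.append("07a unpark")

    # (09) host CPU, in SMM: directed SMI to the new CPU only
    cpu.smi_pending = True
    cpu.trace.append("09 directed SMI")

    # (10) new CPU, in SMM: runs the handler at the default SMBASE and rebases
    if cpu.parked or not cpu.smi_pending:
        raise RuntimeError("new CPU cannot take the SMI yet")
    cpu.smi_pending = False
    cpu.smbase = tseg_smbase
    cpu.trace.append("10 relocate")

    # (12) host CPU, in SMM: record the new CPU in firmware data structures
    smm_cpu_table[cpu.apic_id] = cpu.smbase
    cpu.trace.append("12 register")

    # (15) OS: INIT-SIPI-SIPI, only once SMBASE no longer points at 0x30000
    if cpu.smbase == DEFAULT_SMBASE:
        raise RuntimeError("refusing to release a CPU that was not relocated")
    cpu.online = True
    cpu.trace.append("15 INIT-SIPI-SIPI")
    return cpu
```

Calling `hot_add(NewCpu(apic_id=4), 0x7F000000, {})` walks the steps in order; the point of the two checks is that the OS-visible wake-up in (15) must never happen before the relocation in (10).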

For this OVMF use case, is any CPU init required
before the first SMI?
[Jiewen] I am not sure what the detailed action in 08b is.
And I am not sure what your "init" means here?
Personally, I don’t think we need too much init work, such as Microcode or MTRR.
But we need detailed info.
Wouldn't it be preferable to do in SMM mode?

From Paolo's list of steps, are steps (8a) and (8b)
really required? Can the SMI monarch use the Local
APIC to send a directed SMI to the hot added CPU?
The SMI monarch needs to know the APIC ID of the
hot added CPU.
[Jiewen] I think it depends on the virtual hardware design.
I'll leave the question to Paolo.
it's not really needed as described in (8x), it could be just
cli;hlt loop so that our SIPI could land at sensible code and stop the new CPU,
it even could be an attacker's code if we do all initialization in SMM mode.

Do we also need to handle the case
where multiple CPUs are added at once? I think we
would need to serialize the use of 3000:8000 for the
SMM rebase operation on each hot added CPU.
It would be simpler if we can guarantee that only
one CPU can be added or removed at a time and the
complete flow of adding a CPU to SMM and the OS
needs to be completed before another add/remove
event needs to be processed.
[Jiewen] Right.
I treat the multiple CPU hot-add at same time as a potential threat.
the problem I see here is the race of saving/restoring to/from SMBASE at 30000,
so a CPU exiting SMM can't be sure whether it restores its own saved area
or another CPU's saved state. (I couldn't find in the SDM what would
happen in this case)
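
A toy illustration of that race, in Python. This models only the naive "last writer wins" overwrite of a single shared save slot; it says nothing about what real hardware or KVM actually does in this situation, which is exactly the open question. `SharedSaveArea` and the register values are hypothetical.

```python
class SharedSaveArea:
    """One save slot at the default SMBASE, shared by every unrelocated CPU."""

    def __init__(self):
        self.saved = None

    def smi_entry(self, cpu: dict) -> None:
        self.saved = dict(cpu)        # overwrites whatever was saved before

    def rsm(self, cpu: dict) -> None:
        cpu.clear()
        cpu.update(self.saved)        # restores whatever is in the slot now


def race_demo() -> dict:
    area = SharedSaveArea()
    cpu_a = {"owner": "A", "rip": 0x1000}
    cpu_b = {"owner": "B", "rip": 0x2000}
    area.smi_entry(cpu_a)   # A enters SMM at the default SMBASE
    area.smi_entry(cpu_b)   # B enters before A has left
    area.rsm(cpu_a)         # A leaves SMM with B's saved state
    return cpu_a
```

In this model `race_demo()` returns CPU A carrying B's `rip`, which is the confusion described above.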

If we consider non-attack flow, then we can serialize sending SMIs
to new CPUs (one at a time) from GPE handler and ensure that
only one CPU can do relocation at a time (i.e. non-enforced serialization).
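
A minimal sketch of that cooperative serialization, in Python. `DefaultSmbaseSlot`, `gpe_handler`, the TSEG base and the per-CPU stride are all assumptions made up for the example. Nothing here is enforced against a caller that skips the loop, which is the "non-enforced" part.

```python
class DefaultSmbaseSlot:
    """Tracks which CPU is currently relocating through the default SMBASE."""

    def __init__(self):
        self.owner = None
        self.history = []

    def enter(self, cpu_id: int) -> None:
        if self.owner is not None:
            raise RuntimeError(f"CPU {cpu_id} raced with CPU {self.owner}")
        self.owner = cpu_id

    def exit(self, cpu_id: int, new_smbase: int) -> None:
        if self.owner != cpu_id:
            raise RuntimeError(f"CPU {cpu_id} does not own the slot")
        self.history.append((cpu_id, new_smbase))
        self.owner = None


def gpe_handler(new_cpus, slot, tseg_base, stride=0x2000):
    # One directed SMI at a time: wait for each relocation to finish
    # before sending the next SMI.
    for index, cpu_id in enumerate(new_cpus):
        slot.enter(cpu_id)
        slot.exit(cpu_id, tseg_base + index * stride)
    return slot.history
```

With well-behaved callers the slot never sees two owners; a second `enter()` without an intervening `exit()` is what the attack case would look like.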

In attack case, attacker would only be able to trigger above race.

We don’t want to trust end user.
The solution could be:
1) Let trusted hardware guarantee hot-add one by one.
so far in QEMU it's not possible. We might be able to implement
"parking/unparking" chipset feature, but that would mean inventing
and maintaining ABI for it, which I'd like to avoid if possible.

That's why I'm curious about what happens if a CPU exits SMM mode with
another CPU's saved register state in case of the race, and whether we could
ignore the consequences of it. (it's fine for the guest OS to crash or for the
new CPU not to work; the attacker would only affect itself)

2) Let trusted software (SMM and init code) guarantee SMREBASE one by one (including any code that runs before SMREBASE)
that would mean pulling all present CPUs into SMM mode so no attack
code could be executing before doing hotplug. With a lot of present CPUs
it could be quite expensive and unlike physical hardware, guest's CPUs
could be preempted arbitrarily long causing long delays.

3) Let trusted software (SMM and init code) support SMREBASE simultaneously (including any code that runs before SMREBASE).
Is it really possible to do in software?
Potentially it could be done in hardware (QEMU/KVM) if each CPU had its
own SMRAM at 30000, so CPUs relocated in parallel won't trample over each other.

But KVM has only 2 address spaces (normal RAM and SMM) so it won't just
work out of the box (and I recall that Paolo had some reservations about adding more).
Also it would mean adding ABI for initializing those SMRAM blocks from
another CPU, which could be complicated.
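
For option 2 above, the rendezvous can be sketched with a barrier. This is a Python model using threads as stand-ins for vCPUs; `smi_rendezvous` is a hypothetical name, and the thread that gets index 0 from the barrier simply plays the role of the SMI monarch.

```python
import threading


def smi_rendezvous(n_cpus: int) -> list:
    entered = []
    lock = threading.Lock()
    barrier = threading.Barrier(n_cpus)
    result = {}

    def cpu(cpu_id: int) -> None:
        with lock:
            entered.append(cpu_id)        # this CPU is now "in SMM"
        # wait() returns a unique index per thread; index 0 acts as monarch.
        # Nobody gets past this line until every CPU has arrived.
        if barrier.wait() == 0:
            result["present_at_hotplug"] = sorted(entered)

    threads = [threading.Thread(target=cpu, args=(i,)) for i in range(n_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["present_at_hotplug"]
```

The monarch only proceeds once all present CPUs have checked in, so the hotplug work is held up by the slowest (possibly preempted) vCPU, which is the cost mentioned above.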

Solutions #1 and #2 are the simple ones.
let's first see if we can ignore the race, and if we can't then
we'll probably end up implementing some form of #1


-----Original Message-----
From: Yao, Jiewen
Sent: Thursday, August 22, 2019 10:00 PM
To: Kinney, Michael D <michael.d.kinney@...>;
Paolo Bonzini <pbonzini@...>; Laszlo Ersek
Cc: Alex Williamson <alex.williamson@...>; qemu devel list <qemu-devel@...>;
Igor Mammedov <imammedo@...>;
Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun
<jun.nakajima@...>; Boris Ostrovsky
<boris.ostrovsky@...>; Joao Marcal Lemos Martins
<joao.m.martins@...>; Phillip Goerl
Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using

Thank you Mike!

That is good reference on the real hardware behavior.
(Glad it is public.)

For the threat model, the unique part in a virtual environment
is temp RAM.
The temp RAM in a real platform is per-CPU cache, while
the temp RAM in a virtual platform is global memory.
That brings one more potential attack surface in a virtual
environment, if the hot-added CPU needs to run code with a stack
or heap before SMI rebase.

Other threats, such as SMRAM or DMA, are the same.

Thank you
Yao Jiewen

-----Original Message-----
From: Kinney, Michael D
Sent: Friday, August 23, 2019 9:03 AM
To: Paolo Bonzini <pbonzini@...>; Laszlo Ersek <lersek@...>;
Yao, Jiewen <jiewen.yao@...>; Kinney, Michael D
Cc: Alex Williamson <alex.williamson@...>; qemu devel list <qemu-devel@...>;
Igor Mammedov <imammedo@...>; Chen, Yingwen <yingwen.chen@...>;
Nakajima, Jun; Boris Ostrovsky <boris.ostrovsky@...>;
Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl
Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with


I found the following links related to the discussions here, along with
one example feature called GENPROTRANGE.
a ges-media/day1_trusted-computing_200-250.pdf

Best regards,


-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Thursday, August 22, 2019 4:12 PM
To: Kinney, Michael D <michael.d.kinney@...>; Laszlo Ersek <lersek@...>;
Yao, Jiewen
Cc: Alex Williamson <alex.williamson@...>; qemu devel list <qemu-devel@...>;
Igor Mammedov <imammedo@...>; Chen, Yingwen <yingwen.chen@...>;
Nakajima, Jun; Boris Ostrovsky <boris.ostrovsky@...>;
Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with

On 23/08/19 00:32, Kinney, Michael D wrote:

It is my understanding that real HW hot plug uses SDM defined
methods. Meaning the initial SMI is to 3000:8000 and they rebase to
TSEG in the first SMI. They must have chipset methods to
protect 3000:8000 from DMA.
It would be great if you could check.

Can we add a chipset feature to prevent DMA to range from
0x30000-0x3FFFF and the UEFI Memory Map and ACPI content can be
updated so the Guest OS knows to not use that range for

If real hardware does it at the chipset level, we will probably use
Igor's suggestion of aliasing A-seg to 3000:0000. Before starting the
new CPU, the SMI handler can prepare the SMBASE trampoline at A000:8000
and the hot-plugged CPU will find it at 3000:8000 when it receives the
initial SMI. Because this is backed by RAM at 0xA0000-0xAFFFF, DMA
cannot access it and would still go through to RAM at 0x30000.
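
The address arithmetic behind that aliasing idea, as a small Python sketch. The decode rule in `resolve` is an assumption for illustration (alias visible only to a CPU that is in SMM; everything else, including DMA, sees plain RAM); it is not how any existing chipset or QEMU machine type is known to behave, and the names are hypothetical.

```python
LOW_WINDOW = range(0x30000, 0x40000)   # 3000:0000 .. 3000:FFFF
ASEG_BASE = 0xA0000                    # A000:0000


def linear(segment: int, offset: int) -> int:
    """Real-mode segment:offset to linear address."""
    return (segment << 4) + offset


def resolve(addr: int, *, cpu_in_smm: bool) -> tuple:
    """Return (backing store, address) for an access under the assumed alias."""
    if cpu_in_smm and addr in LOW_WINDOW:
        return ("smram", ASEG_BASE + (addr - LOW_WINDOW.start))
    return ("ram", addr)               # non-SMM CPU accesses and DMA
```

Under these assumptions `linear(0x3000, 0x8000)` is 0x38000, a CPU in SMM fetching from there lands on the trampoline at 0xA8000, and a DMA write to 0x38000 only ever touches ordinary RAM.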

