Re: CPU hotplug using SMM with QEMU+OVMF

Igor Mammedov <imammedo@...>

On Wed, 14 Aug 2019 16:04:50 +0200
Paolo Bonzini <pbonzini@...> wrote:

On 14/08/19 15:20, Yao, Jiewen wrote:
- Does this part require a new branch somewhere in the OVMF SEC code?
How do we determine whether the CPU executing SEC is BSP or
hot-plugged AP?
[Jiewen] I think this is blocked from hardware perspective, since the first instruction.
There are some hardware specific registers can be used to determine if the CPU is new added.
I don’t think this must be same as the real hardware.
You are free to invent some registers in device model to be used in OVMF hot plug driver.
Yes, this would be a new operation mode for QEMU, that only applies to
hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in
fact it doesn't reply to anything at all.

- How do we tell the hot-plugged AP where to start execution? (I.e. that
it should execute code at a particular pflash location.)
[Jiewen] Same real mode reset vector at FFFF:FFF0.
You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
QEMU. The AP does not start execution at all when it is unplugged, so
no cache-as-RAM etc.

We only need to modify QEMU so that hot-plugged APIs do not reply to

I don’t think there is problem for real hardware, who always has CAR.
Can QEMU provide some CPU specific space, such as MMIO region?
Why is a CPU-specific region needed if every other processor is in SMM
and thus trusted.

Does CPU hotplug apply only at the socket level? If the CPU is
multi-core, what is responsible for hot-plugging all cores present in
the socket?
I can answer this: the SMM handler would interact with the hotplug
controller in the same way that ACPI DSDT does normally. This supports
multiple hotplugs already.

Writes to the hotplug controller from outside SMM would be ignored.

(03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI)
-- I am waiting for hot-add message.
Maybe we can simplify this in QEMU by broadcasting an SMI to existent
processors immediately upon plugging the new CPU.
The QEMU DSDT could be modified (when secure boot is in effect) to OUT
to 0xB2 when hotplug happens. It could write a well-known value to
0xB2, to be read by an SMI handler in edk2.

(NOTE: Host CPU can only
instruction in SMM mode. -- The register is SMM only)
Sorry, I don't follow -- what register are we talking about here, and
why is the BSP needed to send anything at all? What "instruction" do you
have in mind?
[Jiewen] The new CPU does not enable SMI at reset.
At some point of time later, the CPU need enable SMI, right?
The "instruction" here means, the host CPUs need tell to CPU to enable SMI.
Right, this would be a write to the CPU hotplug controller

(04) Host CPU: (OS) get message from board that a new CPU is added.

(05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
will not enter CPU because SMI is disabled)
I don't understand the OS involvement here. But, again, perhaps QEMU can
force all existent CPUs into SMM immediately upon adding the new CPU.
[Jiewen] OS here means the Host CPU running code in OS environment, not in SMM environment.
See above.

(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
rebase code.

(07) Host CPU: (SMM) Send message to New CPU to Enable SMI.
Aha, so this is the SMM-only register you mention in step (03). Is the
register specified in the Intel SDM?
[Jiewen] Right. That is the register to let host CPU tell new CPU to enable SMI.
It is platform specific register. Not defined in SDM.
You may invent one in device model.
See above.

(10) New CPU: (SMM) Response first SMI at 38000, and rebase SMBASE to
What code does the new CPU execute after it completes step (10)? Does it
[Jiewen] The new CPU exits SMM and return to original place - where it is
interrupted to enter SMM - running code on the flash.
So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and (07).
Looking at Q35 code and Seabios SMM relocation as example, if I see it
right QEMU has:
- SMRAM is aliased from DRAM at 0xa0000
- and TSEG steals from the top of low RAM when configured

Now problem is that default SMBASE at 0x30000 isn't backed by anything
in SMRAM address space and default SMI entry falls-through to the same
location in System address space.

The later is not trusted and entry into SMM mode will corrupt area + might
jump to 'random' SMI handler (hence save/restore code in Seabios).

Here is an idea, can we map a memory region at 0x30000 in SMRAM address
space with relocation space/code reserved. It could be a part of TSEG
(so we don't have to invent ABI to configure that)?

In that case we do not have to care about System address space content
anymore and un-trusted code shouldn't be able to supply rogue SMI handler.
(that would cross out one of the reasons for inventing disabled-INIT/SMI state)

(11) Host CPU: (SMM) Restore 38000.
These steps (i.e., (06) through (11)) don't appear RAS-specific. The
only platform-specific feature seems to be SMI masking register, which
could be extracted into a new SmmCpuFeaturesLib API.

Thus, would you please consider open sourcing firmware code for steps
(06) through (11)?

Alternatively -- and in particular because the stack for step (01)
concerns me --, we could approach this from a high-level, functional
perspective. The states that really matter are the relocated SMBASE for
the new CPU, and the state of the full system, right at the end of step

When the SMM setup quiesces during normal firmware boot, OVMF could
existent (finalized) SMBASE infomation to *pre-program* some virtual
QEMU hardware, with such state that would be expected, as "final" state,
of any new hotplugged CPU. Afterwards, if / when the hotplug actually
happens, QEMU could blanket-apply this state to the new CPU, and
broadcast a hardware SMI to all CPUs except the new one.
I'd rather avoid this and stay as close as possible to real hardware.


Join { to automatically receive all group messages.