On 02/24/20 10:10, Laszlo Ersek wrote:
Overnight I managed to think up an attack, from the OS, against the
"SmmVacated" byte (the last byte of the reserved page, i.e. the last
byte of the Post-SMM Pen).
There are three CPUs being hotplugged in one SMI, CPU#1..CPU#3. The OS
boots them all before raising the SMI (i.e. before it allows the ACPI
GPE handler to take effect). After the first CPU (let's say CPU#1)
returns to the OS via the RSM, the OS uses it (=CPU#1) to attack
"SmmVacated", writing 1 to it in a loop.
Meanwhile CPU#2 and CPU#3 are still in SMM; let's say CPU#2 is
relocating SMBASE, while CPU#3 is spinning on the APIC ID gate. And the
SMM Monarch (CPU#0) is waiting for CPU#2 to report back in through
"SmmVacated", from the Post-SMM Pen.
Now, the OS writes 1 to "SmmVacated" early, pretending to be CPU#2. This
causes the SMM Monarch (CPU#0) to open the APIC ID gate for CPU#3 before
CPU#2 actually reaches the RSM. This may cause CPU#2 and CPU#3 to both
reach RSM with the same SMBASE value.
So why did I think to put SmmVacated in normal RAM (in the Post-SMM Pen
reserved page?) Because in the following message:
(alt. link: <https://edk2.groups.io/g/devel/message/48475>)
Igor showed that putting "SmmVacated" in SMRAM is racy, even without a
malicious OS. The problem is that there is no way to flip a byte in
SMRAM *atomically* with RSM. So the CPU that has just completed SMBASE
relocation can only flip the byte before RSM (in SMRAM) or after RSM (in
normal RAM). In the latter case, the OS can attack SmmVacated -- but in
the former case, we get the same race *without* any OS involvement
(because the CPU just about to leave SMM via RSM actively allows the SMM
Monarch to ungate the relocation code for a different CPU).
So I don't think there's a 100% safe & secure way to do this. One thing
we could try -- I could update the code -- is to *combine* both
approaches; use two "SmmVacated" flags: one in SMRAM, set to 1 just
before the RSM instruction, and the other one in normal RAM (reserved
page) that this patch set already introduces. The normal RAM flag would
avoid the race completely for benign OSes (like the present patchset
already does), and the SMRAM flag would keep the racy window to a single
instruction when the OS is malicious and the normal RAM flag cannot be
I've implemented and tested this "combined" approach.
I'd appreciate feedback on this; I don't know how in physical firmware,
the initial SMI handler for hot-added CPUs deals with the same problem.
I guess on physical platforms, the platform manual could simply say,
"only hot-plug CPUs one by one". That eliminates the whole problem. In
such a case, we could safely stick with the current patchset, too.
BTW, I did try hot-plugging multiple CPUs at once, with libvirt:
virsh setvcpu ovmf.fedora.q35 1,3 --enable --liveI think this is because it would require multiple "device-add" commands
error: Operation not supported: only one hotpluggable entity can be
to be sent at the same time over the QMP monitor -- and that's something
that QEMU doesn't support. (Put alternatively, "device-add" does not
take a list of objects to create.) In that regard, the problem described
above is academic, because QEMU already seems like a platform that can
only hotplug one CPU at a time. In that sense, using APIC ID *arrays*
and *loops* per hotplug SMI is not really needed; I did that because we
had discussed this feature in terms of multiple CPUs from the beginning,
and because QEMU's ACPI GPE handler (the CSCN AML method) also loops
over multiple processors.
Again, comments would be most welcome... I wouldn't like to complicate
the SMI handler if it's not absolutely necessary.