CPU hotplug using SMM with QEMU+OVMF
Laszlo Ersek
Hi,
this message is a problem statement, and an initial recommendation for
solving it, from Jiewen, Paolo, Yingwen, and others. I'm cross-posting
the thread starter to the <devel@edk2.groups.io>, <rfc@edk2.groups.io>
and <qemu-devel@...> lists. Please use "Reply All" when commenting.

In response to the initial posting, I plan to ask a number of questions.

The related TianoCore bugzillas are:

  https://bugzilla.tianocore.org/show_bug.cgi?id=1512
  https://bugzilla.tianocore.org/show_bug.cgi?id=1515

SMM is used as a security barrier between the OS kernel and the
firmware. When a CPU is plugged into a running system where this
barrier otherwise exists intact, the new CPU can be considered a means
to attack SMM. When the next SMI is raised (globally, or targeted at
the new CPU), the SMBASE for that CPU is still at 0x30000, which is
normal RAM, not SMRAM. Therefore the OS could place attack code in that
area prior to the SMI. Once in SMM, the new CPU would execute OS-owned
code (from normal RAM) with access to SMRAM and to other SMM-protected
stuff, such as flash. [I stole a few words from Paolo here.]

Jiewen summarized the problem as follows:

- Asset: SMM

- Adversary:
  - System Software Attacker, who can control any OS memory or silicon
    register from OS level, or read/write BIOS data.
  - Simple hardware attacker, who can hot add or hot remove a CPU.

- Non-adversary: The attacker cannot modify the flash BIOS code or
  read-only BIOS data. The flash part itself is treated as TCB and
  protected.

- Threat: The attacker may hot add or hot remove a CPU, then modify
  system memory to tamper with the SMRAM content, or trigger an SMI to
  get privilege escalation by executing code in SMM mode.

We'd like to solve this problem for QEMU/KVM and OVMF.

(At the moment, CPU hotplug doesn't work with OVMF *iff* OVMF was built
with -D SMM_REQUIRE. SMBASE relocation never happens for the new CPU,
the SMM infrastructure in edk2 doesn't know about the new CPU, and so
when the first SMI is broadcast afterwards, we crash. We'd like this
functionality to *work*, in the first place -- but securely at that, so
that an actively malicious guest kernel can't break into SMM.)

Yingwen and Jiewen suggested the following process.

Legend:

- "New CPU":  CPU being hot-added
- "Host CPU": existing CPU
- (Flash):    code running from flash
- (SMM):      code running from SMRAM

Steps:

(01) New CPU: (Flash) enter reset vector, Global SMI disabled by
     default.

(02) New CPU: (Flash) configure memory control to let it access global
     host memory.

(03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI)
     -- I am waiting for hot-add message. (NOTE: Host CPU can only send
     instruction in SMM mode. -- The register is SMM only)

(04) Host CPU: (OS) get message from board that a new CPU is added.
     (GPIO -> SCI)

(05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
     will not enter SMM because SMI is disabled)

(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
     rebase code.

(07) Host CPU: (SMM) Send message to New CPU to Enable SMI.

(08) New CPU: (Flash) Get message - Enable SMI.

(09) Host CPU: (SMM) Send SMI to the new CPU only.

(10) New CPU: (SMM) Respond to first SMI at 38000, and rebase SMBASE
     to TSEG.

(11) Host CPU: (SMM) Restore 38000.

(12) Host CPU: (SMM) Update located data structure to add the new CPU
     information. (This step will involve the CPU_SERVICE protocol.)

===================== (now, the next SMI will bring all CPUs into TSEG)

(13) New CPU: (Flash) run MRC code, to init its own memory.

(14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.

(15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.

(A minimal sketch of the save/fill/restore in steps (06)/(11) follows
at the end of this message.)

Thanks
Laszlo
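To make steps (06), (10) and (11) concrete, here is a minimal
host-CPU-side sketch in C. This is *not* the actual edk2
implementation; REBASE_STUB_SIZE and the stub contents are hypothetical
placeholders, and only the offsets come from the Intel SDM (SMI entry
at SMBASE + 0x8000, SMBASE save state field at SMBASE + 0xFEF8):

  /*
   * Illustrative sketch only. The host CPU, in SMM, saves the
   * OS-owned bytes at the default SMI entry point (0x30000 + 0x8000 =
   * 0x38000), plants a small relocation stub there, and restores the
   * bytes after the new CPU has taken its first SMI.
   */
  #include <stdint.h>
  #include <string.h>

  #define DEFAULT_SMBASE      0x30000u
  #define SMI_ENTRY_OFFSET    0x8000u  /* SMI handler entry, per SDM  */
  #define SMBASE_SAVE_OFFSET  0xFEF8u  /* SMBASE field, save state map */
  #define REBASE_STUB_SIZE    64u      /* hypothetical stub length     */

  static uint8_t SavedEntry[REBASE_STUB_SIZE];

  /* Step (06): preserve the OS-owned bytes, then plant the stub. */
  static void PrepareDefaultSmiEntry(const uint8_t *RebaseStub)
  {
    uint8_t *Entry =
      (uint8_t *)(uintptr_t)(DEFAULT_SMBASE + SMI_ENTRY_OFFSET);

    memcpy(SavedEntry, Entry, REBASE_STUB_SIZE);
    memcpy(Entry, RebaseStub, REBASE_STUB_SIZE);
    /*
     * The stub, entered in SMM by the new CPU, writes the TSEG
     * address into the save state SMBASE field at
     * (DEFAULT_SMBASE + SMBASE_SAVE_OFFSET), so that RSM latches the
     * relocated SMBASE -- step (10).
     */
  }

  /* Step (11): put the OS-owned bytes back. */
  static void RestoreDefaultSmiEntry(void)
  {
    uint8_t *Entry =
      (uint8_t *)(uintptr_t)(DEFAULT_SMBASE + SMI_ENTRY_OFFSET);

    memcpy(Entry, SavedEntry, REBASE_STUB_SIZE);
  }

The window between (06) and (11) is exactly why all host CPUs must sit
in SMM meanwhile: nothing else may observe or overwrite 0x38000.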
Laszlo Ersek
On 08/13/19 16:16, Laszlo Ersek wrote:
> Yingwen and Jiewen suggested the following process. [...]
> (01) New CPU: (Flash) enter reset vector, Global SMI disabled by
>      default.

- What does "Global SMI disabled by default" mean? In particular, what
  is "global" here? Do you mean that the CPU being hot-plugged should
  mask (by default) broadcast SMIs? What about directed SMIs? (An
  attacker could try that too.) And what about other processors? (I'd
  assume step (01) is not relevant for other processors, but "global"
  is quite confusing here.)

- Does this part require a new branch somewhere in the OVMF SEC code?
  How do we determine whether the CPU executing SEC is BSP or
  hot-plugged AP?

- How do we tell the hot-plugged AP where to start execution? (I.e.
  that it should execute code at a particular pflash location.) For
  example, in MpInitLib, we start a specific AP with INIT-SIPI-SIPI,
  where "SIPI" stores the startup address in the "Interrupt Command
  Register" (which is memory-mapped in xAPIC mode, and an MSR in x2APIC
  mode, apparently). That doesn't apply here -- should QEMU auto-start
  the new CPU? (See the ICR sketch at the end of this message.)

- What memory is used as stack by the new CPU, when it runs code from
  flash? QEMU does not emulate CAR (Cache As RAM). The new CPU doesn't
  have access to SMRAM. And we cannot use AcpiNVS or Reserved memory,
  because a malicious OS could use other CPUs -- or PCI device DMA --
  to attack the stack (unless QEMU forcibly paused other CPUs upon
  hotplug; I'm not sure).

- If an attempt is made to hotplug multiple CPUs in quick succession,
  does something serialize those attempts? Again, stack usage could be
  a concern, even with Cache-As-RAM -- HyperThreads (logical
  processors) on a single core don't have dedicated cache. Does CPU
  hotplug apply only at the socket level? If the CPU is multi-core,
  what is responsible for hot-plugging all cores present in the socket?

> (02) New CPU: (Flash) configure memory control to let it access
>      global host memory.

In QEMU/KVM guests, we don't have to enable memory explicitly, it just
exists and works. In OVMF X64 SEC, we can't access RAM above 4GB, but
that shouldn't be an issue per se.

> (03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI)

Maybe we can simplify this in QEMU by broadcasting an SMI to existent
processors immediately upon plugging the new CPU.

>      (NOTE: Host CPU can only send instruction in SMM mode. -- The
>      register is SMM only)

Sorry, I don't follow -- what register are we talking about here, and
why is the BSP needed to send anything at all? What "instruction" do
you have in mind?

> (04) Host CPU: (OS) get message from board that a new CPU is added.
>      (GPIO -> SCI)

I don't understand the OS involvement here. But, again, perhaps QEMU
can force all existent CPUs into SMM immediately upon adding the new
CPU.

> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>      rebase code.

Aha, so this is the SMM-only register you mention in step (03). Is the
register specified in the Intel SDM?

> (08) New CPU: (Flash) Get message - Enable SMI.

What code does the new CPU execute after it completes step (10)? Does
it halt?

> (11) Host CPU: (SMM) Restore 38000.

These steps (i.e., (06) through (11)) don't appear RAS-specific. The
only platform-specific feature seems to be the SMI masking register,
which could be extracted into a new SmmCpuFeaturesLib API. Thus, would
you please consider open sourcing firmware code for steps (06) through
(11)?

Alternatively -- and in particular because the stack for step (01)
concerns me --, we could approach this from a high-level, functional
perspective. The states that really matter are the relocated SMBASE
for the new CPU, and the state of the full system, right at the end of
step (11).

When the SMM setup quiesces during normal firmware boot, OVMF could use
existent (finalized) SMBASE information to *pre-program* some virtual
QEMU hardware, with such state that would be expected, as "final"
state, of any new hotplugged CPU. Afterwards, if / when the hotplug
actually happens, QEMU could blanket-apply this state to the new CPU,
and broadcast a hardware SMI to all CPUs except the new one. The
hardware SMI should tell the firmware that the rest of the process --
step (12) below, and onward -- is being requested.

If I understand right, this approach would produce a firmware & system
state that's identical to what's expected right after step (11):

- all SMBASEs relocated
- all preexistent CPUs in SMM
- new CPU halted / blocked from launch
- DRAM at 0x30000 / 0x38000 contains OS-owned data

Is my understanding correct that this is the expected state after step
(11)?

Three more comments on the "SMBASE pre-config" approach:

- the virtual hardware providing this feature should become locked
  after the configuration, until next platform reset

- the pre-config should occur via simple hardware accesses, so that it
  can be replayed at S3 resume, i.e. as part of the S3 boot script

- from the pre-configured state, and the APIC ID, QEMU itself could
  perhaps calculate the SMI stack location for the new processor

> (12) Host CPU: (SMM) Update located data structure to add the new CPU
>      information. (This step will involve the CPU_SERVICE protocol.)

I commented on EFI_SMM_CPU_SERVICE_PROTOCOL under bullet (4) of
<https://bugzilla.tianocore.org/show_bug.cgi?id=1512#c4>. Calling
EFI_SMM_ADD_PROCESSOR looks justified. What are some of the other
member functions used for? The scary one is
EFI_SMM_REGISTER_EXCEPTION_HANDLER.

> ===================== (now, the next SMI will bring all CPUs into
> TSEG)

OK... but what component injects that SMI, and when?

> (13) New CPU: (Flash) run MRC code, to init its own memory.

Why is this needed esp. after step (10)? The new CPU has accessed DRAM
already. And why are we executing code from pflash, rather than from
SMRAM, given that we're past SMBASE relocation?

> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.

I'm confused by these steps. I thought that step (12) would complete
the hotplug, by updating the administrative data structures internally.
And the next SMI -- raised for the usual purposes, such as a software
SMI for variable access -- would be handled like it always is, except
it would also pull the new CPU into SMM too.

Thanks!
Laszlo
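For reference, here is a rough sketch of the INIT-SIPI-SIPI sequence
through the memory-mapped ICR in xAPIC mode, as MpInitLib-style code
would issue it. Delays and error handling are omitted, and this is not
MpInitLib's actual code; the SIPI vector encodes a 4-KiB-aligned,
below-1-MiB startup address:

  #include <stdint.h>

  #define LAPIC_ICR_LOW   0xFEE00300u  /* xAPIC ICR, low dword  */
  #define LAPIC_ICR_HIGH  0xFEE00310u  /* xAPIC ICR, high dword */

  static void LapicWrite(uint32_t Reg, uint32_t Val)
  {
    *(volatile uint32_t *)(uintptr_t)Reg = Val;
  }

  static void StartupAp(uint32_t ApicId, uint32_t StartupAddress)
  {
    /* Vector = 4-KiB page number of the real-mode startup code. */
    uint32_t Vector = (StartupAddress >> 12) & 0xFF;

    LapicWrite(LAPIC_ICR_HIGH, ApicId << 24);
    LapicWrite(LAPIC_ICR_LOW, 0x00004500);          /* INIT, assert */
    /* ...wait ~10 ms... */
    LapicWrite(LAPIC_ICR_HIGH, ApicId << 24);
    LapicWrite(LAPIC_ICR_LOW, 0x00004600 | Vector); /* SIPI #1 */
    /* ...wait ~200 us... */
    LapicWrite(LAPIC_ICR_HIGH, ApicId << 24);
    LapicWrite(LAPIC_ICR_LOW, 0x00004600 | Vector); /* SIPI #2 */
  }

The point of the question above is precisely that no analogous,
firmware-controlled "kick" exists for a hot-plugged CPU that must start
in flash without OS-visible RAM for its stack.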
Laszlo Ersek
On 08/13/19 18:09, Laszlo Ersek wrote:
> On 08/13/19 16:16, Laszlo Ersek wrote:
>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>>      rebase code.
> Aha, so this is the SMM-only register you mention in step (03). Is
> the register specified in the Intel SDM?

Revisiting some of my notes from earlier, such as
<https://bugzilla.redhat.com/show_bug.cgi?id=1454803#c46> -- apologies,
private BZ... --, we discussed some of this stuff with Mike on the
phone in April. And, it looked like generating a hardware SMI in QEMU,
in association with the hotplug action that was being requested through
the QEMU monitor, would be the right approach.

By now I have forgotten about that discussion -- hence "revisiting my
notes" --, but luckily, it seems consistent with what I've proposed
above, under "alternatively".

Thanks,
Laszlo
Yao, Jiewen
My comments below.
> -----Original Message-----
> - What does "Global SMI disabled by default" mean? In particular,
>   what is "global" here?

[Jiewen] OK. Let's not use the term "global".

> Do you mean that the CPU being hot-plugged should mask (by default)
> broadcast SMIs? What about directed SMIs?

[Jiewen] I mean all SMIs are blocked for this specific hot-added CPU.

> And what about other processors? (I'd assume step (01) is not
> relevant for other processors, but "global" is quite confusing here.)

[Jiewen] No impact to other processors.

> - Does this part require a new branch somewhere in the OVMF SEC code?
>   How do we determine whether the CPU executing SEC is BSP or
>   hot-plugged AP?

[Jiewen] I think this is blocked from the hardware perspective, since
the first instruction. There are some hardware specific registers that
can be used to determine if the CPU is newly added. I don't think this
must be the same as on real hardware. You are free to invent some
registers in the device model, to be used in the OVMF hot plug driver.

> - How do we tell the hot-plugged AP where to start execution? (I.e.
>   that it should execute code at a particular pflash location.)

[Jiewen] Same real mode reset vector at FFFF:FFF0.

> For example, in MpInitLib, we start a specific AP with
> INIT-SIPI-SIPI.

[Jiewen] You can send INIT-SIPI-SIPI to the new CPU only after it can
access memory. SIPI needs to give the AP a below-1M memory address as
waking vector.

> - What memory is used as stack by the new CPU, when it runs code from
>   flash?

[Jiewen] Same as other CPUs in normal boot. You can use special
reserved memory.

> QEMU does not emulate CAR (Cache As RAM). The new CPU doesn't have
> access to SMRAM.

[Jiewen] Excellent point! I don't think there is a problem for real
hardware, which always has CAR. Can QEMU provide some CPU specific
space, such as an MMIO region?

> - If an attempt is made to hotplug multiple CPUs in quick succession,
>   does something serialize those attempts?

[Jiewen] The BIOS needs to consider this as an availability
requirement. I don't have a strong opinion. You can design a system
that requires hotplug to happen one-by-one, failing the hot-add
otherwise, or a system without such a restriction. Again, all we need
to do is to maintain the integrity of SMM. The availability should be
considered as a separate requirement.

> Again, stack usage could be a concern, even with Cache-As-RAM --
> HyperThreads (logical processors) on a single core don't have
> dedicated cache.

[Jiewen] Agree with you on the virtual environment. For real hardware,
we do socket level hot-add only. So HT is not the concern. But if you
want to do that in a virtual environment, a processor specific memory
should be considered.

> Does CPU hotplug apply only at the socket level? If the CPU is
> multi-core, what is responsible for hot-plugging all cores present in
> the socket?

[Jiewen] Ditto.

>> (02) New CPU: (Flash) configure memory control to let it access
>>      global host memory.
> In QEMU/KVM guests, we don't have to enable memory explicitly, it
> just exists and works.

[Jiewen] Agree. I do not see the issue.

>> (03) New CPU: (Flash) send board message to tell host CPU
>>      (GPIO->SCI) (NOTE: Host CPU can only send instruction in SMM
>>      mode. -- The register is SMM only)
> Maybe we can simplify this in QEMU by broadcasting an SMI to existent
> processors immediately upon plugging the new CPU. Sorry, I don't
> follow -- what register are we talking about here, and why is the BSP
> needed to send anything at all? What "instruction" do you have in
> mind?

[Jiewen] The new CPU does not enable SMI at reset. At some point of
time later, the CPU needs to enable SMI, right? The "instruction" here
means: the host CPUs need to tell the new CPU to enable SMI.

>> (04) Host CPU: (OS) get message from board that a new CPU is added.
> I don't understand the OS involvement here. But, again, perhaps QEMU
> can force all existent CPUs into SMM immediately upon adding the new
> CPU.

[Jiewen] OS here means the host CPU running code in the OS
environment, not in the SMM environment.

>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>>      rebase code.
> Aha, so this is the SMM-only register you mention in step (03). Is
> the register specified in the Intel SDM?

[Jiewen] Right. That is the register to let the host CPU tell the new
CPU to enable SMI. It is a platform specific register, not defined in
the SDM. You may invent one in the device model. (A hypothetical
sketch of such a register follows at the end of this message.)

>> (08) New CPU: (Flash) Get message - Enable SMI.
> What code does the new CPU execute after it completes step (10)?
> Does it halt?

[Jiewen] The new CPU exits SMM and returns to the original place --
where it was interrupted to enter SMM -- running code on the flash.

>> (11) Host CPU: (SMM) Restore 38000.
> These steps (i.e., (06) through (11)) don't appear RAS-specific. [...]

[Jiewen] I think you are correct.

> Three more comments on the "SMBASE pre-config" approach: [...]
> Calling EFI_SMM_ADD_PROCESSOR looks justified.

[Jiewen] I think you are correct. Also, REMOVE_PROCESSOR will be used
for the hot-remove action.

> What are some of the other member functions used for? The scary one
> is EFI_SMM_REGISTER_EXCEPTION_HANDLER.

[Jiewen] This is to register a new exception handler in SMM. I don't
think this API is involved in hot-add.

>> ===================== (now, the next SMI will bring all CPUs into
>>                        TSEG)
> OK... but what component injects that SMI, and when?

[Jiewen] Any SMI event. It could be a synchronous SMI or an
asynchronous SMI. It could come from software, such as an IO write, or
from hardware, such as a thermal event.

>> (13) New CPU: (Flash) run MRC code, to init its own memory.
> Why is this needed esp. after step (10)? The new CPU has accessed
> DRAM already.

[Jiewen] On real hardware, it is needed because different CPUs may
have different capabilities to access different DIMMs. I do not think
your virtual platform needs it.

>> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
> I'm confused by these steps. I thought that step (12) would complete
> the hotplug, by updating the administrative data structures
> internally.

[Jiewen] The OS needs to use the new CPU at some point of time, right?
As such, the OS needs to pull the new CPU into its own environment by
INIT-SIPI-SIPI.

Thanks!
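Here is the promised sketch of such an "invented" device-model
register, as QEMU code. Every name is made up for illustration -- QEMU
has no such device -- and making it "SMM only" would be done by
exposing the region solely in the SMM address space:

  /* Hypothetical CPU-hotplug SMI control register for QEMU. */
  #include "qemu/osdep.h"
  #include "exec/memory.h"

  typedef struct HotplugSmiCtl {
      MemoryRegion iomem;
      uint32_t smi_enable_mask;   /* bit N set: CPU N may take SMIs */
  } HotplugSmiCtl;

  static uint64_t hotplug_smi_read(void *opaque, hwaddr addr,
                                   unsigned size)
  {
      HotplugSmiCtl *s = opaque;
      return s->smi_enable_mask;
  }

  static void hotplug_smi_write(void *opaque, hwaddr addr,
                                uint64_t val, unsigned size)
  {
      HotplugSmiCtl *s = opaque;
      /* Set-only: the host CPU unmasks SMI for the new CPU. */
      s->smi_enable_mask |= (uint32_t)val;
  }

  static const MemoryRegionOps hotplug_smi_ops = {
      .read = hotplug_smi_read,
      .write = hotplug_smi_write,
      .endianness = DEVICE_LITTLE_ENDIAN,
  };

  static void hotplug_smi_ctl_init(HotplugSmiCtl *s,
                                   MemoryRegion *smm_as_root,
                                   hwaddr base)
  {
      memory_region_init_io(&s->iomem, NULL, &hotplug_smi_ops, s,
                            "hotplug-smi-ctl", 4);
      /* Mapping only into the SMM view makes the register SMM-only. */
      memory_region_add_subregion(smm_as_root, base, &s->iomem);
  }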
Paolo Bonzini <pbonzini@...>
On 14/08/19 15:20, Yao, Jiewen wrote:
>> - Does this part require a new branch somewhere in the OVMF SEC
>>    code?
> [Jiewen] I think this is blocked from the hardware perspective, since
> the first instruction.

Yes, this would be a new operation mode for QEMU, that only applies to
hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI; in
fact it doesn't reply to anything at all.

>> - How do we tell the hot-plugged AP where to start execution? (I.e.
>>   that it should execute code at a particular pflash location.)
> [Jiewen] Same real mode reset vector at FFFF:FFF0.

You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
QEMU. The AP does not start execution at all when it is unplugged, so
no cache-as-RAM etc. We only need to modify QEMU so that hot-plugged
APs do not reply to INIT/SIPI/SMI.

> I don't think there is a problem for real hardware, which always has
> CAR. Can QEMU provide some CPU specific space, such as an MMIO
> region?

Why is a CPU-specific region needed if every other processor is in SMM
and thus trusted?

>> Does CPU hotplug apply only at the socket level? If the CPU is
>> multi-core, what is responsible for hot-plugging all cores present
>> in the socket?

I can answer this: the SMM handler would interact with the hotplug
controller in the same way that the ACPI DSDT does normally. This
supports multiple hotplugs already. Writes to the hotplug controller
from outside SMM would be ignored.

>>> (03) New CPU: (Flash) send board message to tell host CPU
>>>      (GPIO->SCI)
>> Maybe we can simplify this in QEMU by broadcasting an SMI to
>> existent processors immediately upon plugging the new CPU.

The QEMU DSDT could be modified (when secure boot is in effect) to OUT
to 0xB2 when hotplug happens. It could write a well-known value to
0xB2, to be read by an SMI handler in edk2. (A sketch of the edk2 side
follows at the end of this message.)

>>> (NOTE: Host CPU can only send instruction in SMM mode. -- The
>>>       register is SMM only)
>> Sorry, I don't follow -- what register are we talking about here,
>> and why is the BSP needed to send anything at all?
> [Jiewen] The new CPU does not enable SMI at reset.

Right, this would be a write to the CPU hotplug controller.

>>> (04) Host CPU: (OS) get message from board that a new CPU is added.
>> I don't understand the OS involvement here. But, again, perhaps QEMU
>> can force all existent CPUs into SMM immediately upon adding the new
>> CPU.
> [Jiewen] OS here means the host CPU running code in the OS
> environment, not in the SMM environment.

See above.

>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>>>      rebase code.
>> Aha, so this is the SMM-only register you mention in step (03). Is
>> the register specified in the Intel SDM?
> [Jiewen] Right. That is the register to let the host CPU tell the new
> CPU to enable SMI.

So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and
(07).

>>> (10) New CPU: (SMM) Respond to first SMI at 38000, and rebase
>>>      SMBASE to TSEG.
>> What code does the new CPU execute after it completes step (10)?
>> Does it halt?
> [Jiewen] The new CPU exits SMM and returns to the original place --
> where it was interrupted to enter SMM -- running code on the flash.

>>> (11) Host CPU: (SMM) Restore 38000.
>> These steps (i.e., (06) through (11)) don't appear RAS-specific.
>> [...]

I'd rather avoid this and stay as close as possible to real hardware.

Paolo
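For the edk2 side of the 0xB2 idea, registration would presumably go
through the standard EFI_SMM_SW_DISPATCH2_PROTOCOL. A sketch follows;
HOTPLUG_SMI_VALUE is a made-up constant (it would have to match the
value the modified DSDT writes), and the handler body is a placeholder:

  #include <PiSmm.h>
  #include <Protocol/SmmSwDispatch2.h>
  #include <Library/SmmServicesTableLib.h>

  #define HOTPLUG_SMI_VALUE  0x04  /* hypothetical; must match DSDT */

  EFI_STATUS
  EFIAPI
  CpuHotplugSmiHandler (
    IN EFI_HANDLE  DispatchHandle,
    IN CONST VOID  *Context,
    IN OUT VOID    *CommBuffer,
    IN OUT UINTN   *CommBufferSize
    )
  {
    //
    // Placeholder: relocate SMBASE for the new CPU and update the SMM
    // CPU bookkeeping -- steps (06)..(12) of the proposed process.
    //
    return EFI_SUCCESS;
  }

  EFI_STATUS
  RegisterHotplugHandler (
    VOID
    )
  {
    EFI_SMM_SW_DISPATCH2_PROTOCOL  *SwDispatch;
    EFI_SMM_SW_REGISTER_CONTEXT    SwContext;
    EFI_HANDLE                     Handle;
    EFI_STATUS                     Status;

    Status = gSmst->SmmLocateProtocol (
                      &gEfiSmmSwDispatch2ProtocolGuid,
                      NULL,
                      (VOID **)&SwDispatch
                      );
    if (EFI_ERROR (Status)) {
      return Status;
    }

    //
    // Dispatch CpuHotplugSmiHandler whenever HOTPLUG_SMI_VALUE is
    // written to the APM control port (0xB2).
    //
    SwContext.SwSmiInputValue = HOTPLUG_SMI_VALUE;
    return SwDispatch->Register (
                         SwDispatch,
                         CpuHotplugSmiHandler,
                         &SwContext,
                         &Handle
                         );
  }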
Yao, Jiewen
Hi Paolo
I am not sure what you mean by "You do not need a reset vector ...".
If so, where is the first instruction of the new CPU in the
virtualization environment?

Please help me understand that first. Then we can continue the
discussion.

Thank you
Yao Jiewen

> -----Original Message-----
> We only need to modify QEMU so that hot-plugged APs do not reply to
> INIT/SIPI/SMI.
Paolo Bonzini <pbonzini@...>
On 15/08/19 11:55, Yao, Jiewen wrote:
> Hi Paolo
> I am not sure what you mean by "You do not need a reset vector ...".
> If so, where is the first instruction of the new CPU in the
> virtualization environment?

The BSP starts running from 0xFFFFFFF0. APs do not start running at
all and just sit waiting for an INIT-SIPI-SIPI sequence. Please see my
proposal in the reply to Laszlo.

Paolo
Igor Mammedov <imammedo@...>
On Thu, 15 Aug 2019 18:24:53 +0200
Paolo Bonzini <pbonzini@...> wrote:

> On 15/08/19 18:07, Igor Mammedov wrote:
>> Looking at Q35 code and Seabios SMM relocation as example, if I see
>> it right QEMU has: [...]
> No, there could be real mode code using it.

My impression was that QEMU/KVM's SMM address space is accessible only
from a CPU in SMM mode, so an SMM CPU would access an independent SMRAM
at 0x30000 in the SMM address space, while non-SMM CPUs (including real
mode code) would access 0x30000 in normal system RAM. (A sketch of this
split view follows at the end of this message.)

> What we _could_ do is [...]

Agreed, it's better to follow the spec; that's one of the reasons why I
was toying with the idea of using a separate SMRAM at 0x30000, mapped
only in the SMM address space. Practically, we would be following the
spec:

SDM: 34.4 SMRAM
"System logic can use the SMI acknowledge transaction or the assertion
of the SMIACT# pin to decode accesses to the SMRAM and redirect them
(if desired) to specific SMRAM memory. If a separate RAM memory is used
for SMRAM, system logic should provide a programmable method of mapping
the SMRAM into system memory space when the processor is not in SMM.
This mechanism will enable start-up procedures to initialize the SMRAM
space (that is, load the SMI handler) before executing the SMI handler
during SMM."

Another benefit that gives us is that we won't have to pull all
existing CPUs into SMM (essentially another stop_machine) to guarantee
exclusive access to 0x30000 in normal RAM.
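Minimal sketch of that split view, with invented names and simplified
integration points: the "SMM view" is a container that overlays an
SMRAM-only region on top of an alias of normal system memory, and only
CPUs executing in SMM would dispatch through it:

  #include "qemu/osdep.h"
  #include "exec/memory.h"

  static MemoryRegion *build_smm_view(MemoryRegion *system_memory,
                                      MemoryRegion *smram_at_30000)
  {
      MemoryRegion *smm_view = g_new0(MemoryRegion, 1);
      MemoryRegion *sysmem_alias = g_new0(MemoryRegion, 1);

      memory_region_init(smm_view, NULL, "smm-view", UINT64_MAX);

      /* Non-SMRAM addresses fall through to normal system memory. */
      memory_region_init_alias(sysmem_alias, NULL, "smm-sysmem-alias",
                               system_memory, 0, UINT64_MAX);
      memory_region_add_subregion_overlap(smm_view, 0, sysmem_alias, 0);

      /* ...while 0x30000 is backed by dedicated, SMM-only SMRAM. */
      memory_region_add_subregion_overlap(smm_view, 0x30000,
                                          smram_at_30000, 1);
      return smm_view;
  }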
Igor Mammedov <imammedo@...>
On Wed, 14 Aug 2019 16:04:50 +0200
Paolo Bonzini <pbonzini@...> wrote:

> On 14/08/19 15:20, Yao, Jiewen wrote:
>>> - Does this part require a new branch somewhere in the OVMF SEC
>>>   code?
>> [Jiewen] I think this is blocked from the hardware perspective,
>> since the first instruction.
> Yes, this would be a new operation mode for QEMU, that only applies
> to hot-plugged CPUs.

Looking at Q35 code and Seabios SMM relocation as example, if I see it
right QEMU has:
 - SMRAM aliased from DRAM at 0xa0000
 - TSEG stealing from the top of low RAM, when configured

Now the problem is that the default SMBASE at 0x30000 isn't backed by
anything in the SMRAM address space, and the default SMI entry falls
through to the same location in the System address space. The latter is
not trusted, and entry into SMM mode will corrupt that area + might
jump to a 'random' SMI handler (hence the save/restore code in
Seabios).

Here is an idea: can we map a memory region at 0x30000 in the SMRAM
address space, with relocation space/code reserved? It could be a part
of TSEG (so we don't have to invent an ABI to configure that). In that
case we would not have to care about the System address space content
anymore, and un-trusted code shouldn't be able to supply a rogue SMI
handler. (That would cross out one of the reasons for inventing the
disabled-INIT/SMI state.) A sketch of such a mapping follows at the end
of this message.

>>>> (11) Host CPU: (SMM) Restore 38000.
>>> These steps (i.e., (06) through (11)) don't appear RAS-specific.
>>> [...]
> I'd rather avoid this and stay as close as possible to real hardware.
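Here is the promised sketch of backing 0x30000 in the SMM-only address
space with a slice of TSEG. The function and parameter names are
invented; the exact integration point in hw/pci-host/q35.c is assumed,
not shown:

  #include "qemu/osdep.h"
  #include "exec/memory.h"

  #define DEFAULT_SMBASE_WINDOW_BASE  0x30000u
  #define DEFAULT_SMBASE_WINDOW_SIZE  0x10000u  /* 64 KiB SMRAM image */

  static void map_default_smbase(MemoryRegion *smram_as_root,
                                 MemoryRegion *tseg,
                                 uint64_t offset_in_tseg)
  {
      MemoryRegion *window = g_new0(MemoryRegion, 1);

      /* Alias a 64 KiB chunk of TSEG... */
      memory_region_init_alias(window, NULL, "smbase-window", tseg,
                               offset_in_tseg,
                               DEFAULT_SMBASE_WINDOW_SIZE);
      /* ...and expose it at 0x30000 only via the SMM address space,
       * so the default SMI entry never reaches OS-owned RAM. */
      memory_region_add_subregion_overlap(smram_as_root,
                                          DEFAULT_SMBASE_WINDOW_BASE,
                                          window, 1);
  }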
Paolo Bonzini <pbonzini@...>
On 15/08/19 18:07, Igor Mammedov wrote:
> Looking at Q35 code and Seabios SMM relocation as example, if I see
> it right QEMU has:
> - SMRAM aliased from DRAM at 0xa0000

No, there could be real mode code using it. What we _could_ do is
initialize SMBASE to 0xa0000, but I think it's better to not deviate
too much from processor behavior (even if it's admittedly a 20-year
legacy that doesn't make any sense).

Paolo