Paolo Bonzini <pbonzini@...>
On 21/08/19 19:25, Kinney, Michael D wrote:
Could we have an initial SMBASE that is within TSEG?

If we bring in hot plug CPUs one at a time, then the initial SMBASE in TSEG can reprogram the SMBASE to the correct value for that CPU.

Can we add a register to the hot plug controller that allows the BSP to set the initial SMBASE value for a hot-added CPU? The default can be 3000:8000 for compatibility.

Another idea is when the SMI handler runs for a hot-add CPU event, the SMM Monarch programs the hot plug controller register with the SMBASE to use for the CPU that is being added. As each CPU is added, a different SMBASE value can be programmed by the SMM Monarch.

Yes, all of these would work. Again, I'm interested in having something that has a hope of being implemented in real hardware.

Another, far easier to implement possibility could be a lockable MSR (could be the existing MSR_SMM_FEATURE_CONTROL) that allows programming the SMBASE outside SMM. It would be nice if such a bit could be defined by Intel.

Paolo
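To make the lockable-MSR idea concrete, here is a minimal C sketch of what firmware-side use could look like. It is purely illustrative: MSR_SMM_FEATURE_CONTROL (0x4E0) exists, but the unlock bit below is invented, and IA32_SMBASE (0x9E) is readable only from within SMM on today's processors; the sketch imagines Intel defining it as writable outside SMM while unlocked.

  #include <stdint.h>

  #define MSR_SMM_FEATURE_CONTROL  0x4E0         /* real MSR index */
  #define MSR_IA32_SMBASE          0x9E          /* SMM-only today; imagined writable */
  #define SMBASE_WRITE_UNLOCKED    (1ull << 63)  /* invented unlock bit */

  static inline uint64_t rdmsr(uint32_t msr)
  {
      uint32_t lo, hi;
      __asm__ volatile ("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
      return ((uint64_t)hi << 32) | lo;
  }

  static inline void wrmsr(uint32_t msr, uint64_t val)
  {
      __asm__ volatile ("wrmsr" :: "c"(msr), "a"((uint32_t)val),
                                   "d"((uint32_t)(val >> 32)));
  }

  /* Run by the hot-added CPU, from flash, before its first SMI:
   * returns 0 on success, -1 if the firmware has already locked it. */
  int try_program_smbase(uint32_t new_smbase)
  {
      if (!(rdmsr(MSR_SMM_FEATURE_CONTROL) & SMBASE_WRITE_UNLOCKED))
          return -1;
      wrmsr(MSR_IA32_SMBASE, new_smbase);
      return 0;
  }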
Paolo Bonzini <pbonzini@...>
On 21/08/19 17:48, Kinney, Michael D wrote: Perhaps there is a way to avoid the 3000:8000 startup vector.
If a CPU is added after a cold reset, it is already in a different state because one of the active CPUs needs to release it by interacting with the hot plug controller.
Can the SMRR for CPUs in that state be pre-programmed to match the SMRR in the rest of the active CPUs?
For OVMF we expect all the active CPUs to use the same SMRR value, so a check can be made to verify that all the active CPUs have the same SMRR value. If they do, then any CPU released through the hot plug controller can have its SMRR pre-programmed and the initial SMI will start within TSEG.
We just need to decide what to do in the unexpected case where all the active CPUs do not have the same SMRR value.
This should also reduce the total number of steps.

The problem is not the SMRR but the SMBASE. If the SMBASE area is outside TSEG, it is vulnerable to DMA attacks independently of the SMRR. SMBASE is also different for every CPU, so it cannot be preprogrammed.

(As an aside, virt platforms are also immune to cache poisoning, so they don't have an SMRR yet - we could use one for SMM_CODE_CHK_EN and block execution outside the SMRR, but we never got round to it.)

An even simpler alternative would be to make A0000h the initial SMBASE. However, I would like to understand what hardware platforms plan to do, if anything.

Paolo

Mike
-----Original Message-----
From: rfc@edk2.groups.io [mailto:rfc@edk2.groups.io] On Behalf Of Yao, Jiewen
Sent: Sunday, August 18, 2019 4:01 PM
To: Paolo Bonzini <pbonzini@...>
Cc: Alex Williamson <alex.williamson@...>; Laszlo Ersek <lersek@...>; devel@edk2.groups.io; edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list <qemu-devel@...>; Igor Mammedov <imammedo@...>; Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun <jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>; Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl <phillip.goerl@...>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
In the real world, we deprecate A/B-seg usage because it is vulnerable to SMM cache-poisoning attacks. I assume cache poisoning is out of scope in the virtual world, or that there is a way to prevent A/B-seg cache poisoning.
thank you! Yao, Jiewen
On Aug 19, 2019, at 3:50 AM, Paolo Bonzini <pbonzini@...> wrote:
On 17/08/19 02:20, Yao, Jiewen wrote:

[Jiewen] That is OK. Then we MUST add the third adversary.
-- Adversary: Simple hardware attacker, who can use a device to perform DMA attacks in the virtual world.
NOTE: The DMA attack in the real world is out of scope. That is handled by the IOMMU in the real world, such as VT-d.
Please do clarify if this is TRUE.

In the real world:
#1: the SMM MUST be a non-DMA-capable region.
#2: the MMIO MUST be a non-DMA-capable region.
#3: the stolen memory MIGHT be a DMA-capable region or a non-DMA-capable region. It depends upon the silicon design.
#4: the normal OS-accessible memory - including ACPI reclaim, ACPI NVS, and reserved memory not included by #3 - MUST be a DMA-capable region.

As such, IOMMU protection is NOT required for #1 and #2. IOMMU protection MIGHT be required for #3 and MUST be required for #4. I assume the virtual environment is designed in the same way. Please correct me if I am wrong.

Correct. The 0x30000...0x3ffff area is the only problematic one; Igor's idea (or a variant, for example optionally remapping 0xa0000..0xaffff SMRAM to 0x30000) is becoming more and more attractive.

Paolo
Paolo Bonzini <pbonzini@...>
On 22/08/19 20:29, Laszlo Ersek wrote:
On 08/22/19 08:18, Paolo Bonzini wrote:
On 21/08/19 22:17, Kinney, Michael D wrote:
DMA protection of memory ranges is a chipset feature. For the current QEMU implementation, what ranges of memory are guaranteed to be protected from DMA? Is it only A/B-seg and TSEG?

Yes.

This thread (esp. Jiewen's and Mike's messages) is the first time that I've heard about the *existence* of such RAM ranges / the chipset feature. :)

Out of interest (independently of virtualization), how is a general-purpose OS informed by the firmware, "never try to set up DMA to this RAM area"? Is this communicated through ACPI _CRS perhaps?

... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory Access)". It writes:

For example, if a platform implements a PCI bus that cannot access all of physical memory, it has a _DMA object under that PCI bus that describes the ranges of physical memory that can be accessed by devices on that bus.

Sorry about the digression, and also about being late to this thread, continually -- I'm primarily following and learning.

It's much simpler: these ranges are not in e820. For example:

kernel: BIOS-e820: [mem 0x0000000000059000-0x000000000008bfff] usable
kernel: BIOS-e820: [mem 0x000000000008c000-0x00000000000fffff] reserved

The ranges are not special-cased in any way by QEMU. Simply, the A/B-segs and TSEG RAM are not part of the address space except when in SMM. Therefore, DMA to those ranges ends up respectively in low VGA RAM [1] and in the bit bucket. When the A/B-segs are open, for example, DMA to that area becomes possible.

Paolo

[1] Old timers may remember DEF SEG=&HB800: BLOAD "foo.img",0. It still works with some disk device models.
Paolo Bonzini <pbonzini@...>
On 22/08/19 19:59, Laszlo Ersek wrote: The firmware and QEMU could agree on a formula, which would compute the CPU-specific SMBASE from a value pre-programmed by the firmware, and the initial APIC ID of the hot-added CPU.
Yes, it would duplicate code -- the calculation -- between QEMU and edk2. While that's not optimal, it wouldn't be a first.

No, that would be unmaintainable. The best solution to me seems to be to make SMBASE programmable from non-SMM code if some special conditions hold. Michael, would it be possible to get in contact with the Intel architects?

Paolo
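For contrast, the formula approach Paolo is rejecting would look roughly like the sketch below, duplicated verbatim in both QEMU and edk2. The 0x2000 stride (per-CPU SMI entry point plus save state) and the function name are assumptions.

  #include <stdint.h>

  #define SMM_TILE_SIZE 0x2000u   /* assumed per-CPU SMRAM stride */

  /* Both QEMU and the firmware would have to agree on exactly this
   * calculation forever -- which is what makes it unmaintainable. */
  static inline uint32_t cpu_smbase(uint32_t programmed_base,
                                    uint32_t initial_apic_id)
  {
      return programmed_base + initial_apic_id * SMM_TILE_SIZE;
  }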
Laszlo,
I believe all the code for the AP startup vector is already in edk2.
It is a combination of the reset vector code in UefiCpuPkg/ResetVector/Vtf0 and an IA32/X64-specific feature in the GenFv tool. It sets up a 4KB-aligned location near 4GB which can be used to start an AP using INIT-SIPI-SIPI.
DI is set to 'AP' if the processor is not the BSP. This can be used to choose to put the APs into a wait loop executing from the protected FLASH region.
The SMM Monarch, on a hot-add event, can use the Local APIC to send an INIT-SIPI-SIPI to wake the AP at the 4KB startup vector in FLASH. Later, the SMM Monarch can use the Local APIC to send an SMI to pull the hot-added CPU into SMM. It is not clear if we have to do both the SIPI followed by the SMI, or if we can just do the SMI.
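For reference, the wake-up sequence described above maps onto the xAPIC ICR as in the hedged sketch below (encodings per the Intel SDM; the 0xFEE00000 base is the architectural default, and udelay() is an assumed platform helper). Note the SIPI vector is only 8 bits, encoding the 4KB page of the startup code.

  #include <stdint.h>

  #define LAPIC_ICR_LOW   ((volatile uint32_t *)0xFEE00300u)
  #define LAPIC_ICR_HIGH  ((volatile uint32_t *)0xFEE00310u)

  extern void udelay(unsigned microseconds);      /* assumed helper */

  static void icr_send(uint32_t apic_id, uint32_t cmd)
  {
      *LAPIC_ICR_HIGH = apic_id << 24;            /* physical destination */
      *LAPIC_ICR_LOW  = cmd;                      /* low write sends the IPI */
      while (*LAPIC_ICR_LOW & (1u << 12))
          ;                                       /* wait for delivery idle */
  }

  void wake_ap_then_smi(uint32_t apic_id, uint8_t sipi_page)
  {
      icr_send(apic_id, 0x00004500);              /* INIT, level assert */
      udelay(10000);
      icr_send(apic_id, 0x00004600 | sipi_page);  /* SIPI: start at page<<12 */
      udelay(200);
      icr_send(apic_id, 0x00004600 | sipi_page);  /* conventional 2nd SIPI */
      /* Later, the directed SMI (delivery mode 010b) pulls it into SMM: */
      icr_send(apic_id, 0x00000200);
  }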
Best regards,
Mike
-----Original Message-----
From: devel@edk2.groups.io [mailto:devel@edk2.groups.io] On Behalf Of Laszlo Ersek
Sent: Thursday, August 22, 2019 11:29 AM
To: Paolo Bonzini <pbonzini@...>; Kinney, Michael D <michael.d.kinney@...>; rfc@edk2.groups.io; Yao, Jiewen <jiewen.yao@...>
Cc: Alex Williamson <alex.williamson@...>; devel@edk2.groups.io; qemu devel list <qemu-devel@...>; Igor Mammedov <imammedo@...>; Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun <jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>; Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl <phillip.goerl@...>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
On 08/22/19 08:18, Paolo Bonzini wrote:
On 21/08/19 22:17, Kinney, Michael D wrote:
Paolo,
It makes sense to match real HW.

Note that it'd also be fine to match some kind of official Intel specification even if no processor (currently?) supports it.

I agree, because...

That puts us back to the reset vector and handling the initial SMI at 3000:8000. That is all workable from a FW implementation perspective.

... that would suggest that matching reset vector code already exists, and it would "only" need to be upstreamed to edk2. :)

It looks like the only issue left is DMA.

DMA protection of memory ranges is a chipset feature. For the current QEMU implementation, what ranges of memory are guaranteed to be protected from DMA? Is it only A/B-seg and TSEG?

Yes.

(This thread (esp. Jiewen's and Mike's messages) is the first time that I've heard about the *existence* of such RAM ranges / the chipset feature. :)

Out of interest (independently of virtualization), how is a general-purpose OS informed by the firmware, "never try to set up DMA to this RAM area"? Is this communicated through ACPI _CRS perhaps?

... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory Access)". It writes:

For example, if a platform implements a PCI bus that cannot access all of physical memory, it has a _DMA object under that PCI bus that describes the ranges of physical memory that can be accessed by devices on that bus.

Sorry about the digression, and also about being late to this thread, continually -- I'm primarily following and learning.)

Thanks! Laszlo
Paolo,

The SMBASE register is internal and cannot be directly accessed by any CPU. There is an SMBASE field that is a member of the SMM Save State area; it can only be modified from SMM, and it requires the execution of an RSM instruction from SMM for the SMBASE register to be updated from the current SMBASE field value. The new SMBASE register value is only used on the next SMI.

https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf (Vol. 3C, Section 34.11)

The default base address for the SMRAM is 30000H. This value is contained in an internal processor register called the SMBASE register. The operating system or executive can relocate the SMRAM by setting the SMBASE field in the saved state map (at offset 7EF8H) to a new value (see Figure 34-4). The RSM instruction reloads the internal SMBASE register with the value in the SMBASE field each time it exits SMM. All subsequent SMI requests will use the new SMBASE value to find the starting address for the SMI handler (at SMBASE + 8000H) and the SMRAM state save area (from SMBASE + FE00H to SMBASE + FFFFH). (The processor resets the value in its internal SMBASE register to 30000H on a RESET, but does not change it on an INIT.)

One idea to work around these issues is to start OVMF with the maximum number of CPUs. All the CPUs would be assigned their SMBASE addresses at a safe time, using the initial 3000:8000 SMI vector, because there is a guarantee of no DMA at that point in the FW init. Once all the CPUs have been initialized for SMM, the CPUs that are not needed can be hot removed. As noted above, the SMBASE value does not change on an INIT. So as long as the hot-add operation does not do a RESET, the SMBASE value must be preserved. Of course, this is not a good idea from a boot performance perspective, especially if the max CPU count is large.

Another idea is to emulate this behavior: have the hot plug controller provide registers (only accessible from SMM) to assign the SMBASE address for every CPU. When a CPU is hot added, QEMU can set the internal SMBASE register value from the hot plug controller register value. If the SMM Monarch sends an INIT or an SMI from the Local APIC to the hot-added CPU, then the SMBASE register should not be modified, and the CPU starts execution within TSEG the first time it receives an SMI.

Jiewen and I can collect specific questions on this topic and continue the discussion here. For example, I do not think there is any method other than what I referenced above to program the SMBASE register, but I can ask if there are any other methods.

Thanks,

Mike
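The SDM passage above boils down to a very small relocation stub; a sketch, assuming the default SMBASE of 0x30000 (so the save state map's 7EF8H offset, which is relative to SMBASE+8000H, lands at 0x3FEF8):

  #include <stdint.h>

  #define DEFAULT_SMBASE  0x30000u
  #define SMBASE_SLOT     (DEFAULT_SMBASE + 0x8000u + 0x7EF8u)  /* 0x3FEF8 */

  /* Executed inside the first SMI, entered at SMBASE+8000h = 0x38000. */
  void smbase_relocation_stub(uint32_t new_smbase)
  {
      *(volatile uint32_t *)(uintptr_t)SMBASE_SLOT = new_smbase;
      /* RSM reloads the internal SMBASE register from the slot above;
       * the next SMI is serviced at new_smbase + 0x8000. */
      __asm__ volatile ("rsm");
  }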
-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Thursday, August 22, 2019 11:43 AM
To: Laszlo Ersek <lersek@...>; Kinney, Michael D <michael.d.kinney@...>; rfc@edk2.groups.io; Yao, Jiewen <jiewen.yao@...>
Cc: Alex Williamson <alex.williamson@...>; devel@edk2.groups.io; qemu devel list <qemu-devel@...>; Igor Mammedov <imammedo@...>; Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun <jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>; Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl <phillip.goerl@...>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
On 22/08/19 19:59, Laszlo Ersek wrote:
The firmware and QEMU could agree on a formula, which would compute the CPU-specific SMBASE from a value pre-programmed by the firmware, and the initial APIC ID of the hot-added CPU. Yes, it would duplicate code -- the calculation -- between QEMU and edk2. While that's not optimal, it wouldn't be a first.

No, that would be unmaintainable. The best solution to me seems to be to make SMBASE programmable from non-SMM code if some special conditions hold. Michael, would it be possible to get in contact with the Intel architects?
Paolo
On 08/22/19 08:18, Paolo Bonzini wrote:
On 21/08/19 22:17, Kinney, Michael D wrote:
Paolo,
It makes sense to match real HW.

Note that it'd also be fine to match some kind of official Intel specification even if no processor (currently?) supports it.

I agree, because...

That puts us back to the reset vector and handling the initial SMI at 3000:8000. That is all workable from a FW implementation perspective.

... that would suggest that matching reset vector code already exists, and it would "only" need to be upstreamed to edk2. :)

It looks like the only issue left is DMA.

DMA protection of memory ranges is a chipset feature. For the current QEMU implementation, what ranges of memory are guaranteed to be protected from DMA? Is it only A/B-seg and TSEG?

Yes.

(This thread (esp. Jiewen's and Mike's messages) is the first time that I've heard about the *existence* of such RAM ranges / the chipset feature. :)

Out of interest (independently of virtualization), how is a general-purpose OS informed by the firmware, "never try to set up DMA to this RAM area"? Is this communicated through ACPI _CRS perhaps?

... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory Access)". It writes:

For example, if a platform implements a PCI bus that cannot access all of physical memory, it has a _DMA object under that PCI bus that describes the ranges of physical memory that can be accessed by devices on that bus.

Sorry about the digression, and also about being late to this thread, continually -- I'm primarily following and learning.)

Thanks! Laszlo
On 08/21/19 19:05, Paolo Bonzini wrote:
On 21/08/19 17:48, Kinney, Michael D wrote:
Perhaps there is a way to avoid the 3000:8000 startup vector.

If a CPU is added after a cold reset, it is already in a different state because one of the active CPUs needs to release it by interacting with the hot plug controller.

Can the SMRR for CPUs in that state be pre-programmed to match the SMRR in the rest of the active CPUs?

For OVMF we expect all the active CPUs to use the same SMRR value, so a check can be made to verify that all the active CPUs have the same SMRR value. If they do, then any CPU released through the hot plug controller can have its SMRR pre-programmed and the initial SMI will start within TSEG.

We just need to decide what to do in the unexpected case where all the active CPUs do not have the same SMRR value.

This should also reduce the total number of steps.

The problem is not the SMRR but the SMBASE. If the SMBASE area is outside TSEG, it is vulnerable to DMA attacks independently of the SMRR. SMBASE is also different for every CPU, so it cannot be preprogrammed.

The firmware and QEMU could agree on a formula, which would compute the CPU-specific SMBASE from a value pre-programmed by the firmware, and the initial APIC ID of the hot-added CPU. Yes, it would duplicate code -- the calculation -- between QEMU and edk2. While that's not optimal, it wouldn't be a first.

Thanks
Laszlo
On 08/21/19 17:48, Kinney, Michael D wrote: Perhaps there is a way to avoid the 3000:8000 startup vector.
If a CPU is added after a cold reset, it is already in a different state because one of the active CPUs needs to release it by interacting with the hot plug controller.
Can the SMRR for CPUs in that state be pre-programmed to match the SMRR in the rest of the active CPUs?
For OVMF we expect all the active CPUs to use the same SMRR value, so a check can be made to verify that all the active CPUs have the same SMRR value. If they do, then any CPU released through the hot plug controller can have its SMRR pre-programmed and the initial SMI will start within TSEG.

Yes, that is what I proposed here:

* http://mid.mail-archive.com/effa5e32-be1e-4703-4419-8866b7754e2d@redhat.com
* https://edk2.groups.io/g/devel/message/45570

Namely: when the SMM setup quiesces during normal firmware boot, OVMF could use existent (finalized) SMBASE information to *pre-program* some virtual QEMU hardware, with such state as would be expected, as "final" state, of any newly hotplugged CPU. Afterwards, if / when the hotplug actually happens, QEMU could blanket-apply this state to the new CPU, and broadcast a hardware SMI to all CPUs except the new one.

(I know that Paolo didn't like it; I'm just confirming that I had the same, or at least a very similar, idea.)

Thanks!
Laszlo
Paolo,
It makes sense to match real HW. That puts us back to the reset vector and handling the initial SMI at 3000:8000. That is all workable from a FW implementation perspective. It looks like the only issue left is DMA.
DMA protection of memory ranges is a chipset feature. For the current QEMU implementation, what ranges of memory are guaranteed to be protected from DMA? Is it only A/B seg and TSEG?
Thanks,
Mike
-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Wednesday, August 21, 2019 10:40 AM
To: Kinney, Michael D <michael.d.kinney@...>; rfc@edk2.groups.io; Yao, Jiewen <jiewen.yao@...>
Cc: Alex Williamson <alex.williamson@...>; Laszlo Ersek <lersek@...>; devel@edk2.groups.io; qemu devel list <qemu-devel@...>; Igor Mammedov <imammedo@...>; Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun <jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>; Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl <phillip.goerl@...>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
On 21/08/19 19:25, Kinney, Michael D wrote:
Could we have an initial SMBASE that is within TSEG?

If we bring in hot plug CPUs one at a time, then the initial SMBASE in TSEG can reprogram the SMBASE to the correct value for that CPU.

Can we add a register to the hot plug controller that allows the BSP to set the initial SMBASE value for a hot-added CPU? The default can be 3000:8000 for compatibility.

Another idea is when the SMI handler runs for a hot-add CPU event, the SMM Monarch programs the hot plug controller register with the SMBASE to use for the CPU that is being added. As each CPU is added, a different SMBASE value can be programmed by the SMM Monarch.

Yes, all of these would work. Again, I'm interested in having something that has a hope of being implemented in real hardware.

Another, far easier to implement possibility could be a lockable MSR (could be the existing MSR_SMM_FEATURE_CONTROL) that allows programming the SMBASE outside SMM. It would be nice if such a bit could be defined by Intel.

Paolo
Could we have an initial SMBASE that is within TSEG?

If we bring in hot plug CPUs one at a time, then the initial SMBASE in TSEG can reprogram the SMBASE to the correct value for that CPU.

Can we add a register to the hot plug controller that allows the BSP to set the initial SMBASE value for a hot-added CPU? The default can be 3000:8000 for compatibility.

Another idea is when the SMI handler runs for a hot-add CPU event, the SMM Monarch programs the hot plug controller register with the SMBASE to use for the CPU that is being added. As each CPU is added, a different SMBASE value can be programmed by the SMM Monarch.
Mike
-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Wednesday, August 21, 2019 10:06 AM
To: Kinney, Michael D <michael.d.kinney@...>; rfc@edk2.groups.io; Yao, Jiewen <jiewen.yao@...>
Cc: Alex Williamson <alex.williamson@...>; Laszlo Ersek <lersek@...>; devel@edk2.groups.io; qemu devel list <qemu-devel@...>; Igor Mammedov <imammedo@...>; Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun <jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>; Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl <phillip.goerl@...>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
On 21/08/19 17:48, Kinney, Michael D wrote:
Perhaps there is a way to avoid the 3000:8000 startup vector.

If a CPU is added after a cold reset, it is already in a different state because one of the active CPUs needs to release it by interacting with the hot plug controller.

Can the SMRR for CPUs in that state be pre-programmed to match the SMRR in the rest of the active CPUs?

For OVMF we expect all the active CPUs to use the same SMRR value, so a check can be made to verify that all the active CPUs have the same SMRR value. If they do, then any CPU released through the hot plug controller can have its SMRR pre-programmed and the initial SMI will start within TSEG.

We just need to decide what to do in the unexpected case where all the active CPUs do not have the same SMRR value.

This should also reduce the total number of steps.

The problem is not the SMRR but the SMBASE. If the SMBASE area is outside TSEG, it is vulnerable to DMA attacks independently of the SMRR. SMBASE is also different for every CPU, so it cannot be preprogrammed.

(As an aside, virt platforms are also immune to cache poisoning, so they don't have an SMRR yet - we could use one for SMM_CODE_CHK_EN and block execution outside the SMRR, but we never got round to it.)

An even simpler alternative would be to make A0000h the initial SMBASE. However, I would like to understand what hardware platforms plan to do, if anything.

Paolo
Mike
-----Original Message-----
From: rfc@edk2.groups.io [mailto:rfc@edk2.groups.io] On Behalf Of Yao, Jiewen
Sent: Sunday, August 18, 2019 4:01 PM
To: Paolo Bonzini <pbonzini@...>
Cc: Alex Williamson <alex.williamson@...>; Laszlo Ersek <lersek@...>; devel@edk2.groups.io; edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list <qemu-devel@...>; Igor Mammedov <imammedo@...>; Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun <jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>; Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl <phillip.goerl@...>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
In the real world, we deprecate A/B-seg usage because it is vulnerable to SMM cache-poisoning attacks. I assume cache poisoning is out of scope in the virtual world, or that there is a way to prevent A/B-seg cache poisoning.
thank you! Yao, Jiewen
On Aug 19, 2019, at 3:50 AM, Paolo Bonzini <pbonzini@...> wrote:
On 17/08/19 02:20, Yao, Jiewen wrote:

[Jiewen] That is OK. Then we MUST add the third adversary.
-- Adversary: Simple hardware attacker, who can use a device to perform DMA attacks in the virtual world.
NOTE: The DMA attack in the real world is out of scope. That is handled by the IOMMU in the real world, such as VT-d.
Please do clarify if this is TRUE.

In the real world:
#1: the SMM MUST be a non-DMA-capable region.
#2: the MMIO MUST be a non-DMA-capable region.
#3: the stolen memory MIGHT be a DMA-capable region or a non-DMA-capable region. It depends upon the silicon design.
#4: the normal OS-accessible memory - including ACPI reclaim, ACPI NVS, and reserved memory not included by #3 - MUST be a DMA-capable region.

As such, IOMMU protection is NOT required for #1 and #2. IOMMU protection MIGHT be required for #3 and MUST be required for #4. I assume the virtual environment is designed in the same way. Please correct me if I am wrong.

Correct. The 0x30000...0x3ffff area is the only problematic one; Igor's idea (or a variant, for example optionally remapping 0xa0000..0xaffff SMRAM to 0x30000) is becoming more and more attractive.

Paolo
Paolo Bonzini <pbonzini@...>
On 16/08/19 04:46, Yao, Jiewen wrote: Comment below:
-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Friday, August 16, 2019 12:21 AM
To: Laszlo Ersek <lersek@...>; devel@edk2.groups.io; Yao, Jiewen <jiewen.yao@...>
Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list <qemu-devel@...>; Igor Mammedov <imammedo@...>; Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun <jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>; Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl <phillip.goerl@...>
Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
On 15/08/19 17:00, Laszlo Ersek wrote:
On 08/14/19 16:04, Paolo Bonzini wrote:
On 14/08/19 15:20, Yao, Jiewen wrote:
- Does this part require a new branch somewhere in the OVMF SEC code? How do we determine whether the CPU executing SEC is the BSP or a hot-plugged AP?

[Jiewen] I think this is blocked from the hardware perspective, since the first instruction. There are some hardware-specific registers that can be used to determine if the CPU is newly added. I don't think this must be the same as the real hardware. You are free to invent some registers in the device model to be used in the OVMF hot plug driver.

Yes, this would be a new operation mode for QEMU, which only applies to hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI; in fact, it doesn't reply to anything at all.

- How do we tell the hot-plugged AP where to start execution? (I.e. that it should execute code at a particular pflash location.)

[Jiewen] Same real mode reset vector at FFFF:FFF0.

You do not need a reset vector or INIT/SIPI/SIPI sequence at all in QEMU. The AP does not start execution at all when it is unplugged, so no cache-as-RAM etc. We only need to modify QEMU so that hot-plugged APs do not reply to INIT/SIPI/SMI.

[Jiewen] I don't think there is a problem for real hardware, which always has CAR. Can QEMU provide some CPU-specific space, such as an MMIO region?

Why is a CPU-specific region needed if every other processor is in SMM and thus trusted?

I was going through the steps Jiewen and Yingwen recommended. In step (02), the new CPU is expected to set up RAM access. In step (03), the new CPU, executing code from flash, is expected to "send board message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add message." For that action, the new CPU may need a stack (minimally if we want to use C function calls). Until step (03), there had been no word about any other (= pre-plugged) CPUs (more precisely, Jiewen even confirmed "No impact to other processors"), so I didn't assume that other CPUs had entered SMM.

Paolo, I've attempted to read Jiewen's response, and yours, as carefully as I can. I'm still very confused. If you have a better understanding, could you please write up the 15-step process from the thread starter again, with all QEMU customizations applied? Such as, unnecessary steps removed, and platform specifics filled in.

Sure.

(01a) QEMU: create new CPU. The CPU already exists, but it does not start running code until unparked by the CPU hotplug controller.
(01b) QEMU: trigger SCI.
(02-03) no equivalent
(04) Host CPU: (OS) execute GPE handler from DSDT.
(05) Host CPU: (OS) Port 0xB2 write; all CPUs enter SMM. (NOTE: The new CPU will not enter SMM because its SMI is disabled.)
(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM rebase code.
(07a) Host CPU: (SMM) Write to the CPU hotplug controller to enable the new CPU.
(07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to the new CPU.

[Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no restriction that INIT/SIPI/SIPI can only be sent in SMM.

All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded before 07a, so this is okay. However, I do see a problem, because a PCI device's DMA could overwrite 0x38000 between (06) and (10) and hijack the code that is executed in SMM. How is this avoided on real hardware? By the time the new CPU enters SMM, it doesn't run off cache-as-RAM anymore.

Paolo

(08a) New CPU: (Low RAM) Enter protected mode.

[Jiewen] NOTE: The new CPU still cannot use any physical memory, because the INIT/SIPI/SIPI may be sent by a malicious CPU in a non-SMM environment.

(08b) New CPU: (Flash) Signal the host CPU to proceed and enter a cli;hlt loop.
(09) Host CPU: (SMM) Send SMI to the new CPU only.
(10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to TSEG.
(11) Host CPU: (SMM) Restore 38000.
(12) Host CPU: (SMM) Update located data structure to add the new CPU information. (This step will involve the CPU_SERVICE protocol.)
(13) New CPU: (Flash) Do whatever other initialization is needed.
(14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
(15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull the new CPU in.

In other words, the cache-as-RAM phase of 02-03 is replaced by the INIT-SIPI-SIPI sequence of 07b-08a-08b.

[Jiewen] I am OK with this proposal. I think the rule is the same - the new CPU CANNOT touch any system memory, no matter whether it comes from the reset vector or from INIT/SIPI/SIPI. Or I would say: if the new CPU wants to touch some memory before the first SMI, the memory should be CPU-specific or on the flash.
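To pin down what steps (01a) and (07a) assume of the virtual hardware, here is a rough C sketch of the parked-CPU gate on the QEMU side. It is not existing QEMU code: the names are invented, and 'in_smm' stands for whatever access attribute tells the device model that the writing vCPU was executing in SMM.

  #include <stdbool.h>
  #include <stdint.h>

  typedef struct {
      uint32_t apic_id;
      bool     parked;      /* while true: INIT/SIPI/SMI are all ignored */
  } HotpluggedCpu;

  /* Write handler for the hypothetical hotplug controller register. */
  void hotplug_ctl_write(HotpluggedCpu *cpu, bool in_smm, uint32_t val)
  {
      if (!in_smm)
          return;               /* non-SMM writes are silently ignored */
      if (val & 1)
          cpu->parked = false;  /* step 07a: now 07b's INIT/SIPI/SIPI lands */
  }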
The QEMU DSDT could be modified (when secure boot is in effect) to OUT to 0xB2 when hotplug happens. It could write a well-known value to 0xB2, to be read by an SMI handler in edk2.

I dislike involving QEMU's generated DSDT in anything SMM (even injecting the SMI), because the AML interpreter runs in the OS. If a malicious OS kernel is a bit too enlightened about the DSDT, it could willfully diverge from the process that we design. If QEMU broadcast the SMI internally, the guest OS could not interfere with that. If the purpose of the SMI is specifically to force all CPUs into SMM (and thereby force them into trusted state), then the OS would be explicitly counter-interested in carrying out the AML operations from QEMU's DSDT.

But since the hotplug controller would only be accessible from SMM, there would be no other way to invoke it than to follow the DSDT's instruction and write to 0xB2. FWIW, real hardware also has plenty of 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store access).

Paolo
Paolo Bonzini <pbonzini@...>
On 19/08/19 01:00, Yao, Jiewen wrote:
In the real world, we deprecate A/B-seg usage because it is vulnerable to SMM cache-poisoning attacks. I assume cache poisoning is out of scope in the virtual world, or that there is a way to prevent A/B-seg cache poisoning.

Indeed the SMRR would not cover the A-seg on real hardware. However, if the chipset allowed aliasing A-seg SMRAM to 0x30000, it would only be used for SMBASE relocation of hotplugged CPUs. The firmware would still keep low SMRAM disabled, *except around SMBASE relocation of hotplugged CPUs*. To avoid cache-poisoning attacks, you only have to issue a WBINVD before enabling low SMRAM and before disabling it. Hotplug SMI is not a performance-sensitive path, so it's not a big deal.

So I guess you agree that PCI DMA attacks are a potential vector also on real hardware. As Alex pointed out, VT-d is not a solution because there could be legitimate DMA happening during CPU hotplug. For OVMF we'll probably go with Igor's idea; it would be nice if Intel chipsets supported it too. :)

Paolo
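The flush-on-both-edges rule is small enough to spell out; a sketch, where the open/close helpers stand in for the (chipset-specific) low-SMRAM enable registers and are assumptions:

  extern void open_low_smram(void);    /* assumed chipset accessor */
  extern void close_low_smram(void);   /* assumed chipset accessor */

  static inline void wbinvd(void)
  {
      __asm__ volatile ("wbinvd" ::: "memory");
  }

  void hotplug_smbase_relocation_window(void)
  {
      wbinvd();            /* discard any lines an attacker primed */
      open_low_smram();
      /* ... send the new CPU its first SMI; it rebases SMBASE and RSMs ... */
      wbinvd();            /* flush again before the window closes */
      close_low_smram();
  }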
Paolo Bonzini <pbonzini@...>
On 15/08/19 17:00, Laszlo Ersek wrote:
On 08/14/19 16:04, Paolo Bonzini wrote:
On 14/08/19 15:20, Yao, Jiewen wrote:
- Does this part require a new branch somewhere in the OVMF SEC code? How do we determine whether the CPU executing SEC is the BSP or a hot-plugged AP?

[Jiewen] I think this is blocked from the hardware perspective, since the first instruction. There are some hardware-specific registers that can be used to determine if the CPU is newly added. I don't think this must be the same as the real hardware. You are free to invent some registers in the device model to be used in the OVMF hot plug driver.

Yes, this would be a new operation mode for QEMU, which only applies to hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI; in fact, it doesn't reply to anything at all.

- How do we tell the hot-plugged AP where to start execution? (I.e. that it should execute code at a particular pflash location.)

[Jiewen] Same real mode reset vector at FFFF:FFF0.

You do not need a reset vector or INIT/SIPI/SIPI sequence at all in QEMU. The AP does not start execution at all when it is unplugged, so no cache-as-RAM etc. We only need to modify QEMU so that hot-plugged APs do not reply to INIT/SIPI/SMI.

[Jiewen] I don't think there is a problem for real hardware, which always has CAR. Can QEMU provide some CPU-specific space, such as an MMIO region?

Why is a CPU-specific region needed if every other processor is in SMM and thus trusted?

I was going through the steps Jiewen and Yingwen recommended. In step (02), the new CPU is expected to set up RAM access. In step (03), the new CPU, executing code from flash, is expected to "send board message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add message." For that action, the new CPU may need a stack (minimally if we want to use C function calls). Until step (03), there had been no word about any other (= pre-plugged) CPUs (more precisely, Jiewen even confirmed "No impact to other processors"), so I didn't assume that other CPUs had entered SMM.

Paolo, I've attempted to read Jiewen's response, and yours, as carefully as I can. I'm still very confused. If you have a better understanding, could you please write up the 15-step process from the thread starter again, with all QEMU customizations applied? Such as, unnecessary steps removed, and platform specifics filled in.

Sure.

(01a) QEMU: create new CPU. The CPU already exists, but it does not start running code until unparked by the CPU hotplug controller.
(01b) QEMU: trigger SCI.
(02-03) no equivalent
(04) Host CPU: (OS) execute GPE handler from DSDT.
(05) Host CPU: (OS) Port 0xB2 write; all CPUs enter SMM. (NOTE: The new CPU will not enter SMM because its SMI is disabled.)
(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM rebase code.
(07a) Host CPU: (SMM) Write to the CPU hotplug controller to enable the new CPU.
(07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to the new CPU.
(08a) New CPU: (Low RAM) Enter protected mode.
(08b) New CPU: (Flash) Signal the host CPU to proceed and enter a cli;hlt loop.
(09) Host CPU: (SMM) Send SMI to the new CPU only.
(10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to TSEG.
(11) Host CPU: (SMM) Restore 38000.
(12) Host CPU: (SMM) Update located data structure to add the new CPU information. (This step will involve the CPU_SERVICE protocol.)
(13) New CPU: (Flash) Do whatever other initialization is needed.
(14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
(15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull the new CPU in.

In other words, the cache-as-RAM phase of 02-03 is replaced by the INIT-SIPI-SIPI sequence of 07b-08a-08b.

The QEMU DSDT could be modified (when secure boot is in effect) to OUT to 0xB2 when hotplug happens. It could write a well-known value to 0xB2, to be read by an SMI handler in edk2.

I dislike involving QEMU's generated DSDT in anything SMM (even injecting the SMI), because the AML interpreter runs in the OS. If a malicious OS kernel is a bit too enlightened about the DSDT, it could willfully diverge from the process that we design. If QEMU broadcast the SMI internally, the guest OS could not interfere with that. If the purpose of the SMI is specifically to force all CPUs into SMM (and thereby force them into a trusted state), then the OS would be explicitly counter-interested in carrying out the AML operations from QEMU's DSDT.

But since the hotplug controller would only be accessible from SMM, there would be no other way to invoke it than to follow the DSDT's instruction and write to 0xB2. FWIW, real hardware also has plenty of 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store access).

Paolo
Igor Mammedov <imammedo@...>
On Thu, 15 Aug 2019 17:00:16 +0200, Laszlo Ersek <lersek@...> wrote:
On 08/14/19 16:04, Paolo Bonzini wrote:
On 14/08/19 15:20, Yao, Jiewen wrote:
- Does this part require a new branch somewhere in the OVMF SEC code? How do we determine whether the CPU executing SEC is the BSP or a hot-plugged AP?

[Jiewen] I think this is blocked from the hardware perspective, since the first instruction. There are some hardware-specific registers that can be used to determine if the CPU is newly added. I don't think this must be the same as the real hardware. You are free to invent some registers in the device model to be used in the OVMF hot plug driver.

Yes, this would be a new operation mode for QEMU, which only applies to hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI; in fact, it doesn't reply to anything at all.

- How do we tell the hot-plugged AP where to start execution? (I.e. that it should execute code at a particular pflash location.)

[Jiewen] Same real mode reset vector at FFFF:FFF0.

You do not need a reset vector or INIT/SIPI/SIPI sequence at all in QEMU. The AP does not start execution at all when it is unplugged, so no cache-as-RAM etc. We only need to modify QEMU so that hot-plugged APs do not reply to INIT/SIPI/SMI.

[Jiewen] I don't think there is a problem for real hardware, which always has CAR. Can QEMU provide some CPU-specific space, such as an MMIO region?

Why is a CPU-specific region needed if every other processor is in SMM and thus trusted?

I was going through the steps Jiewen and Yingwen recommended. In step (02), the new CPU is expected to set up RAM access. In step (03), the new CPU, executing code from flash, is expected to "send board message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add message." For that action, the new CPU may need a stack (minimally if we want to use C function calls). Until step (03), there had been no word about any other (= pre-plugged) CPUs (more precisely, Jiewen even confirmed "No impact to other processors"), so I didn't assume that other CPUs had entered SMM.

Paolo, I've attempted to read Jiewen's response, and yours, as carefully as I can. I'm still very confused. If you have a better understanding, could you please write up the 15-step process from the thread starter again, with all QEMU customizations applied? Such as, unnecessary steps removed, and platform specifics filled in.
One more comment below:
Does CPU hotplug apply only at the socket level? If the CPU is multi-core, what is responsible for hot-plugging all cores present in the socket?

I can answer this: the SMM handler would interact with the hotplug controller in the same way that the ACPI DSDT does normally. This supports multiple hotplugs already. Writes to the hotplug controller from outside SMM would be ignored.

(03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add message.

Maybe we can simplify this in QEMU by broadcasting an SMI to existent processors immediately upon plugging the new CPU.

The QEMU DSDT could be modified (when secure boot is in effect) to OUT to 0xB2 when hotplug happens. It could write a well-known value to 0xB2, to be read by an SMI handler in edk2.

(My comment below is general, and may not apply to this particular situation. I'm too confused to figure that out myself, sorry!)

I dislike involving QEMU's generated DSDT in anything SMM (even injecting the SMI), because the AML interpreter runs in the OS. If a malicious OS kernel is a bit too enlightened about the DSDT, it could willfully diverge from the process that we design. If QEMU broadcast the SMI internally, the guest OS could not interfere with that. If the purpose of the SMI is specifically to force all CPUs into SMM (and thereby force them into a trusted state), then the OS would be explicitly counter-interested in carrying out the AML operations from QEMU's DSDT.

It shouldn't matter where the management SMI comes from, as long as the OS is not able to trigger an SMI with untrusted content at SMBASE on the hotplugged (parked) CPU. The worst that could happen is that the new CPU stays parked.

I'd be OK with an SMM / SMI involvement in QEMU's DSDT if, by diverging from that DSDT, the OS kernel could only mess with its own state, and not with the firmware's.
Thanks Laszlo
(NOTE: Host CPU can only send the instruction in SMM mode. -- The register is SMM only.)

Sorry, I don't follow -- what register are we talking about here, and why is the BSP needed to send anything at all? What "instruction" do you have in mind?

[Jiewen] The new CPU does not enable SMI at reset. At some later point in time, the CPU needs to enable SMI, right? The "instruction" here means the host CPUs need to tell the new CPU to enable SMI.

Right, this would be a write to the CPU hotplug controller.

(04) Host CPU: (OS) get message from board that a new CPU is added. (GPIO -> SCI)
(05) Host CPU: (OS) All CPUs enter SMM. (SCI->SWSMI) (NOTE: The new CPU will not enter SMM because its SMI is disabled.)

I don't understand the OS involvement here. But, again, perhaps QEMU can force all existent CPUs into SMM immediately upon adding the new CPU.

[Jiewen] OS here means the Host CPU running code in the OS environment, not in the SMM environment.

See above.

(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM rebase code.
(07) Host CPU: (SMM) Send message to the new CPU to enable SMI.

Aha, so this is the SMM-only register you mention in step (03). Is the register specified in the Intel SDM?

[Jiewen] Right. That is the register to let the host CPU tell the new CPU to enable SMI. It is a platform-specific register, not defined in the SDM. You may invent one in the device model.

See above.

(10) New CPU: (SMM) Respond to the first SMI at 38000, and rebase SMBASE to TSEG.

What code does the new CPU execute after it completes step (10)? Does it halt?

[Jiewen] The new CPU exits SMM and returns to the original place - where it was interrupted to enter SMM - running code on the flash.

So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and (07).

(11) Host CPU: (SMM) Restore 38000.

These steps (i.e., (06) through (11)) don't appear RAS-specific. The only platform-specific feature seems to be the SMI masking register, which could be extracted into a new SmmCpuFeaturesLib API. Thus, would you please consider open-sourcing firmware code for steps (06) through (11)?

Alternatively -- and in particular because the stack for step (01) concerns me --, we could approach this from a high-level, functional perspective. The states that really matter are the relocated SMBASE for the new CPU, and the state of the full system, right at the end of step (11). When the SMM setup quiesces during normal firmware boot, OVMF could use existent (finalized) SMBASE information to *pre-program* some virtual QEMU hardware, with such state as would be expected, as "final" state, of any newly hotplugged CPU. Afterwards, if / when the hotplug actually happens, QEMU could blanket-apply this state to the new CPU, and broadcast a hardware SMI to all CPUs except the new one.

I'd rather avoid this and stay as close as possible to real hardware.
Paolo
Paolo Bonzini <pbonzini@...>
On 17/08/19 02:20, Yao, Jiewen wrote:

[Jiewen] That is OK. Then we MUST add the third adversary.
-- Adversary: Simple hardware attacker, who can use a device to perform DMA attacks in the virtual world.
NOTE: The DMA attack in the real world is out of scope. That is handled by the IOMMU in the real world, such as VT-d.
Please do clarify if this is TRUE.

In the real world:
#1: the SMM MUST be a non-DMA-capable region.
#2: the MMIO MUST be a non-DMA-capable region.
#3: the stolen memory MIGHT be a DMA-capable region or a non-DMA-capable region. It depends upon the silicon design.
#4: the normal OS-accessible memory - including ACPI reclaim, ACPI NVS, and reserved memory not included by #3 - MUST be a DMA-capable region.

As such, IOMMU protection is NOT required for #1 and #2. IOMMU protection MIGHT be required for #3 and MUST be required for #4. I assume the virtual environment is designed in the same way. Please correct me if I am wrong.

Correct. The 0x30000...0x3ffff area is the only problematic one; Igor's idea (or a variant, for example optionally remapping 0xa0000..0xaffff SMRAM to 0x30000) is becoming more and more attractive.

Paolo
Alex Williamson <alex.williamson@...>
On Fri, 16 Aug 2019 22:15:15 +0200, Laszlo Ersek <lersek@...> wrote:

+Alex (direct question at the bottom)
On 08/16/19 09:49, Yao, Jiewen wrote:
below
-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Friday, August 16, 2019 3:20 PM
To: Yao, Jiewen <jiewen.yao@...>; Laszlo Ersek <lersek@...>; devel@edk2.groups.io
Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list <qemu-devel@...>; Igor Mammedov <imammedo@...>; Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun <jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>; Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl <phillip.goerl@...>
Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
On 16/08/19 04:46, Yao, Jiewen wrote:
Comment below:

-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Friday, August 16, 2019 12:21 AM
To: Laszlo Ersek <lersek@...>; devel@edk2.groups.io; Yao, Jiewen <jiewen.yao@...>
Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list <qemu-devel@...>; Igor Mammedov <imammedo@...>; Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun <jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>; Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl <phillip.goerl@...>
Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
On 15/08/19 17:00, Laszlo Ersek wrote:
On 08/14/19 16:04, Paolo Bonzini wrote:
On 14/08/19 15:20, Yao, Jiewen wrote:
- Does this part require a new branch somewhere in the OVMF SEC code? How do we determine whether the CPU executing SEC is the BSP or a hot-plugged AP?

[Jiewen] I think this is blocked from the hardware perspective, since the first instruction. There are some hardware-specific registers that can be used to determine if the CPU is newly added. I don't think this must be the same as the real hardware. You are free to invent some registers in the device model to be used in the OVMF hot plug driver.

Yes, this would be a new operation mode for QEMU, which only applies to hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI; in fact, it doesn't reply to anything at all.

- How do we tell the hot-plugged AP where to start execution? (I.e. that it should execute code at a particular pflash location.)

[Jiewen] Same real mode reset vector at FFFF:FFF0.

You do not need a reset vector or INIT/SIPI/SIPI sequence at all in QEMU. The AP does not start execution at all when it is unplugged, so no cache-as-RAM etc. We only need to modify QEMU so that hot-plugged APs do not reply to INIT/SIPI/SMI.

[Jiewen] I don't think there is a problem for real hardware, which always has CAR. Can QEMU provide some CPU-specific space, such as an MMIO region?

Why is a CPU-specific region needed if every other processor is in SMM and thus trusted?

I was going through the steps Jiewen and Yingwen recommended. In step (02), the new CPU is expected to set up RAM access. In step (03), the new CPU, executing code from flash, is expected to "send board message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add message." For that action, the new CPU may need a stack (minimally if we want to use C function calls). Until step (03), there had been no word about any other (= pre-plugged) CPUs (more precisely, Jiewen even confirmed "No impact to other processors"), so I didn't assume that other CPUs had entered SMM.

Paolo, I've attempted to read Jiewen's response, and yours, as carefully as I can. I'm still very confused. If you have a better understanding, could you please write up the 15-step process from the thread starter again, with all QEMU customizations applied? Such as, unnecessary steps removed, and platform specifics filled in.

Sure.
(01a) QEMU: create new CPU. The CPU already exists, but it does not start running code until unparked by the CPU hotplug controller.
(01b) QEMU: trigger SCI
(02-03) no equivalent
(04) Host CPU: (OS) execute GPE handler from DSDT
(05) Host CPU: (OS) Port 0xB2 write; all CPUs enter SMM. (NOTE: The new CPU will not enter SMM because its SMI is disabled.)
(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM rebase code.
(07a) Host CPU: (SMM) Write to CPU hotplug controller to enable new CPU
(07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU. [Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no restriction that INIT/SIPI/SIPI can only be sent in SMM. All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded before 07a, so this is okay. [Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is delivered at 07a? I don’t see any extra step between 06 and 07a. What is the magic here? The magic is 07a itself, IIUC. The CPU hotplug controller would be accessible only in SMM. And until 07a happens, the new CPU ignores INIT/SIPI/SIPI even if another CPU sends it those, simply because QEMU would implement the new CPU's behavior like that.
However, I do see a problem, because a PCI device's DMA could overwrite 0x38000 between (06) and (10) and hijack the code that is executed in SMM. How is this avoided on real hardware? By the time the new CPU enters SMM, it doesn't run off cache-as-RAM anymore.

[Jiewen] Interesting question. I don't think the DMA attack is considered in the threat model for the virtual environment. We only list the adversaries below:
-- Adversary: System Software Attacker, who can control any OS memory or silicon register from the OS level, or read/write BIOS data.
-- Adversary: Simple hardware attacker, who can hot add or hot remove a CPU.

We do have physical PCI(e) device assignment; sorry for not highlighting that earlier. That feature (VFIO) does rely on the (physical) IOMMU, and it makes sure that the assigned device can only access physical frames that belong to the virtual machine that the device is assigned to. However, as far as I know, VFIO doesn't try to restrict PCI DMA to subsets of guest RAM... I could be wrong about that; I vaguely recall RMRR support, which seems somewhat related.

[Jiewen] I agree it is a threat from the real hardware perspective. SMM may check VT-d to make sure the 38000 region is blocked. I doubt it is a threat in the virtual environment. Do we have a way to block DMA in the virtual environment?

I think that would be a VFIO feature.

Alex: if we wanted to block PCI(e) DMA to a specific part of guest RAM (expressed with guest-physical RAM addresses), perhaps permanently, perhaps just for a while -- not sure about coordination though --, could VFIO accommodate that (I guess by "punching holes" in the IOMMU page tables)?

It depends. For starters, the VFIO mapping API does not allow unmapping arbitrary sub-ranges of previous mappings. So the hole you want to punch would need to be independently mapped. From there you get into the issue of whether this range is a potential DMA target. If it is, then this is the path to data corruption. We cannot interfere with the operation of the device, and we have little to no visibility of active DMA targets.

If we're talking about RAM that is never a DMA target, perhaps e820 reserved memory, then we can make sure certain MemoryRegions are skipped when mapped by QEMU, and we would expect the guest to never map them through a vIOMMU as well. Maybe then it's a question of where we're trying to provide security (it might be more difficult if QEMU needs to sanitize vIOMMU mappings to actively prevent mapping reserved areas).

Is there anything unique about the VM case here? Bare-metal SMM needs to be concerned about protecting itself from I/O devices that operate outside the realm of SMM mode as well, right? Is something "simple" like an AddressSpace switch necessary here, such that an I/O device always has a mapping to a safe guest RAM page while the vCPU AddressSpace can switch to some protected page? The IOMMU and vCPU mappings don't need to be the same. The vCPU is more under our control than the assigned device.

FWIW, RMRRs are a VT-d-specific mechanism to define an address range as persistently, identity-mapped for one or more devices. IOW, the device would always map that range. I don't think that's what you're after here. RMRRs are also an abomination that I hope we never find a requirement for in a VM.

Thanks,
Alex
Perhaps there is a way to avoid the 3000:8000 startup vector.
If a CPU is added after a cold reset, it is already in a different state because one of the active CPUs needs to release it by interacting with the hot plug controller.
Can the SMRR for CPUs in that state be pre-programmed to match the SMRR in the rest of the active CPUs?
For OVMF we expect all the active CPUs to use the same SMRR value, so a check can be made to verify that all the active CPUs have the same SMRR value. If they do, then any CPU released through the hot plug controller can have its SMRR pre-programmed and the initial SMI will start within TSEG.
We just need to decide what to do in the unexpected case where all the active CPUs do not have the same SMRR value.
This should also reduce the total number of steps.
Mike
-----Original Message-----
From: rfc@edk2.groups.io [mailto:rfc@edk2.groups.io] On Behalf Of Yao, Jiewen
Sent: Sunday, August 18, 2019 4:01 PM
To: Paolo Bonzini <pbonzini@...>
Cc: Alex Williamson <alex.williamson@...>; Laszlo Ersek <lersek@...>; devel@edk2.groups.io; edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list <qemu-devel@...>; Igor Mammedov <imammedo@...>; Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun <jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>; Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl <phillip.goerl@...>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
In the real world, we deprecate A/B-seg usage because it is vulnerable to SMM cache-poisoning attacks. I assume cache poisoning is out of scope in the virtual world, or that there is a way to prevent A/B-seg cache poisoning.
thank you! Yao, Jiewen
On Aug 19, 2019, at 3:50 AM, Paolo Bonzini <pbonzini@...> wrote:
On 17/08/19 02:20, Yao, Jiewen wrote:

[Jiewen] That is OK. Then we MUST add the third adversary.
-- Adversary: Simple hardware attacker, who can use a device to perform DMA attacks in the virtual world.
NOTE: The DMA attack in the real world is out of scope. That is handled by the IOMMU in the real world, such as VT-d.
Please do clarify if this is TRUE.

In the real world:
#1: the SMM MUST be a non-DMA-capable region.
#2: the MMIO MUST be a non-DMA-capable region.
#3: the stolen memory MIGHT be a DMA-capable region or a non-DMA-capable region. It depends upon the silicon design.
#4: the normal OS-accessible memory - including ACPI reclaim, ACPI NVS, and reserved memory not included by #3 - MUST be a DMA-capable region.

As such, IOMMU protection is NOT required for #1 and #2. IOMMU protection MIGHT be required for #3 and MUST be required for #4. I assume the virtual environment is designed in the same way. Please correct me if I am wrong.

Correct. The 0x30000...0x3ffff area is the only problematic one; Igor's idea (or a variant, for example optionally remapping 0xa0000..0xaffff SMRAM to 0x30000) is becoming more and more attractive.

Paolo
On 08/19/19 16:10, Paolo Bonzini wrote:
On 19/08/19 01:00, Yao, Jiewen wrote:
In the real world, we deprecate A/B-seg usage because it is vulnerable to SMM cache-poisoning attacks. I assume cache poisoning is out of scope in the virtual world, or that there is a way to prevent A/B-seg cache poisoning.

Indeed the SMRR would not cover the A-seg on real hardware. However, if the chipset allowed aliasing A-seg SMRAM to 0x30000, it would only be used for SMBASE relocation of hotplugged CPUs. The firmware would still keep low SMRAM disabled, *except around SMBASE relocation of hotplugged CPUs*. To avoid cache-poisoning attacks, you only have to issue a WBINVD before enabling low SMRAM and before disabling it. Hotplug SMI is not a performance-sensitive path, so it's not a big deal.

So I guess you agree that PCI DMA attacks are a potential vector also on real hardware. As Alex pointed out, VT-d is not a solution because there could be legitimate DMA happening during CPU hotplug.

Alex, thank you for the help! Please let us know if we should remove you from the CC list, in order not to clutter your inbox. (I've kept your address for now, for saying thanks. Feel free to stop reading here. Thanks!)

For OVMF we'll probably go with Igor's idea; it would be nice if Intel chipsets supported it too. :)

So what is Igor's idea? Please do spoon-feed it to me. I've seen the POC patch, but the memory region manipulation isn't obvious to me.

Regarding TSEG, QEMU doesn't implement it differently from normal RAM. Instead, if memory serves, there is an extra "black hole" region that is overlaid, which hides the RAM contents when TSEG is supposed to be closed (and the guest is not running in SMM). But this time we're doing something else, right? Is the idea to overlay the RAM range at 0x30000 with a window (alias) into the "compatible" SMRAM at 0xA0000-0xBFFFF? I don't know how the "compatible" SMRAM is implemented in QEMU. Does the compatible SMRAM behave in sync with TSEG? OVMF doesn't configure or touch compatible SMRAM at all, at the moment.

Thanks
Laszlo
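Igor's POC itself isn't quoted in this thread, but the remapping variant (alias the A-seg SMRAM at 0x30000, enabled only around hotplug relocation) can be sketched against QEMU's memory API as below. The function and variable names and the attachment point are assumptions; only the memory_region_* calls are real QEMU API.

  #include "qemu/osdep.h"
  #include "exec/memory.h"

  static MemoryRegion smbase_window;

  void smbase_window_init(MemoryRegion *root, MemoryRegion *low_smram)
  {
      /* 64KB view of the A-seg SMRAM, overlaid at 0x30000..0x3FFFF. */
      memory_region_init_alias(&smbase_window, NULL, "smbase-window",
                               low_smram, 0, 0x10000);
      memory_region_add_subregion_overlap(root, 0x30000, &smbase_window, 1);
      memory_region_set_enabled(&smbase_window, false);  /* off by default */
  }

  /* Toggled (from the SMM-only hotplug controller) around relocation: */
  void smbase_window_set(bool enable)
  {
      memory_region_set_enabled(&smbase_window, enable);
  }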
In the real world, we deprecate A/B-seg usage because it is vulnerable to SMM cache-poisoning attacks. I assume cache poisoning is out of scope in the virtual world, or that there is a way to prevent A/B-seg cache poisoning.
thank you! Yao, Jiewen
On Aug 19, 2019, at 3:50 AM, Paolo Bonzini <pbonzini@...> wrote:
On 17/08/19 02:20, Yao, Jiewen wrote:

[Jiewen] That is OK. Then we MUST add the third adversary.
-- Adversary: Simple hardware attacker, who can use a device to perform DMA attacks in the virtual world.
NOTE: The DMA attack in the real world is out of scope. That is handled by the IOMMU in the real world, such as VT-d.
Please do clarify if this is TRUE.

In the real world:
#1: the SMM MUST be a non-DMA-capable region.
#2: the MMIO MUST be a non-DMA-capable region.
#3: the stolen memory MIGHT be a DMA-capable region or a non-DMA-capable region. It depends upon the silicon design.
#4: the normal OS-accessible memory - including ACPI reclaim, ACPI NVS, and reserved memory not included by #3 - MUST be a DMA-capable region.

As such, IOMMU protection is NOT required for #1 and #2. IOMMU protection MIGHT be required for #3 and MUST be required for #4. I assume the virtual environment is designed in the same way. Please correct me if I am wrong.

Correct. The 0x30000...0x3ffff area is the only problematic one; Igor's idea (or a variant, for example optionally remapping 0xa0000..0xaffff SMRAM to 0x30000) is becoming more and more attractive.
Paolo