[edk2-devel] CPU hotplug using SMM with QEMU+OVMF
Laszlo Ersek
On 08/14/19 16:04, Paolo Bonzini wrote:
> On 14/08/19 15:20, Yao, Jiewen wrote:
>>> - Does this part require a new branch somewhere in the OVMF SEC code?
>>> How do we determine whether the CPU executing SEC is BSP or
>>> hot-plugged AP?
>> [Jiewen] I think this is blocked from hardware perspective, since the first instruction.
>> There are some hardware specific registers can be used to determine if the CPU is new added.
>> I don’t think this must be same as the real hardware.
>> You are free to invent some registers in device model to be used in OVMF hot plug driver.
>
> Yes, this would be a new operation mode for QEMU, that only applies to
> hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in
> fact it doesn't reply to anything at all.
>
>>> - How do we tell the hot-plugged AP where to start execution? (I.e. that
>>> it should execute code at a particular pflash location.)
>> [Jiewen] Same real mode reset vector at FFFF:FFF0.
>
> You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
> QEMU. The AP does not start execution at all when it is unplugged, so
> no cache-as-RAM etc.
>
> We only need to modify QEMU so that hot-plugged APs do not reply to
> INIT/SIPI/SMI.
>
>> I don’t think there is problem for real hardware, who always has CAR.
>> Can QEMU provide some CPU specific space, such as MMIO region?
>
> Why is a CPU-specific region needed if every other processor is in SMM
> and thus trusted?

I was going through the steps Jiewen and Yingwen recommended.

In step (02), the new CPU is expected to set up RAM access. In step
(03), the new CPU, executing code from flash, is expected to "send board
message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
message." For that action, the new CPU may need a stack (minimally if we
want to use C function calls).

Until step (03), there had been no word about any other (= pre-plugged)
CPUs (more precisely, Jiewen even confirmed "No impact to other
processors"), so I didn't assume that other CPUs had entered SMM.

Paolo, I've attempted to read Jiewen's response, and yours, as carefully
as I can. I'm still very confused. If you have a better understanding,
could you please write up the 15-step process from the thread starter
again, with all QEMU customizations applied? Such as, unnecessary steps
removed, and platform specifics filled in.

One more comment below:

>
>>> Does CPU hotplug apply only at the socket level? If the CPU is
>>> multi-core, what is responsible for hot-plugging all cores present in
>>> the socket?
>
> I can answer this: the SMM handler would interact with the hotplug
> controller in the same way that ACPI DSDT does normally. This supports
> multiple hotplugs already.
>
> Writes to the hotplug controller from outside SMM would be ignored.
>
>>>> (03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI)
>>>> -- I am waiting for hot-add message.
>>> Maybe we can simplify this in QEMU by broadcasting an SMI to existent
>>> processors immediately upon plugging the new CPU.
>
> The QEMU DSDT could be modified (when secure boot is in effect) to OUT
> to 0xB2 when hotplug happens. It could write a well-known value to
> 0xB2, to be read by an SMI handler in edk2.

(My comment below is general, and may not apply to this particular
situation. I'm too confused to figure that out myself, sorry!)

I dislike involving QEMU's generated DSDT in anything SMM (even
injecting the SMI), because the AML interpreter runs in the OS.

If a malicious OS kernel is a bit too enlightened about the DSDT, it
could willfully diverge from the process that we design. If QEMU
broadcast the SMI internally, the guest OS could not interfere with that.

If the purpose of the SMI is specifically to force all CPUs into SMM
(and thereby force them into trusted state), then the OS would be
explicitly counter-interested in carrying out the AML operations from
QEMU's DSDT.

I'd be OK with an SMM / SMI involvement in QEMU's DSDT if, by diverging
from that DSDT, the OS kernel could only mess with its own state, and
not with the firmware's.

Thanks
Laszlo
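
The SMM-only hotplug controller and the port 0xB2 trigger discussed above (0xB2 being the APM control port, whose writes raise an SMI) can be captured in a small toy model. This is a hypothetical Python sketch, not QEMU or edk2 code: the class, the handler and the command value `HOTPLUG_SMI_CMD` are invented for illustration, and "being in SMM" is reduced to a boolean flag. It shows the two properties Paolo relies on: controller writes from outside SMM have no effect, and a single SMI can service several pending CPUs, for example all cores of a socket.

```python
# Toy model, not QEMU/edk2 code: names and the command value are hypothetical.
HOTPLUG_SMI_CMD = 0x04  # assumed "well-known value" written to port 0xB2


class HotplugController:
    """CPU hotplug controller that honours writes only when issued from SMM."""

    def __init__(self, pending):
        self.pending = list(pending)  # CPUs plugged in but not yet enabled
        self.enabled = []

    def enable(self, cpu_id, in_smm):
        """Try to enable (unpark) one CPU; return True if the write took effect."""
        if not in_smm:
            return False  # writes from outside SMM are ignored
        if cpu_id not in self.pending:
            return False
        self.pending.remove(cpu_id)
        self.enabled.append(cpu_id)
        return True


def smi_handler(apm_cnt_value, controller):
    """Hypothetical edk2 handler for the SMI raised by an OUT to 0xB2.

    It runs in SMM, so its controller writes take effect. It services every
    pending CPU, e.g. all cores of a newly plugged socket.
    """
    if apm_cnt_value != HOTPLUG_SMI_CMD:
        return []  # some other SMI command; not handled here
    serviced = []
    for cpu_id in list(controller.pending):  # copy: enable() mutates pending
        if controller.enable(cpu_id, in_smm=True):
            serviced.append(cpu_id)
    return serviced
```

In this model a guest OS that never performs the 0xB2 write can stall the hotplug, but it cannot enable the new CPU by itself.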

>
>>>> (NOTE: Host CPU can only send
>>>> instruction in SMM mode. -- The register is SMM only)
>>> Sorry, I don't follow -- what register are we talking about here, and
>>> why is the BSP needed to send anything at all? What "instruction" do you
>>> have in mind?
>> [Jiewen] The new CPU does not enable SMI at reset.
>> At some point of time later, the CPU need enable SMI, right?
>> The "instruction" here means, the host CPUs need tell to CPU to enable SMI.
>
> Right, this would be a write to the CPU hotplug controller.
>
>>>> (04) Host CPU: (OS) get message from board that a new CPU is added.
>>>> (GPIO -> SCI)
>>>>
>>>> (05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
>>>> will not enter SMM because SMI is disabled)
>>> I don't understand the OS involvement here. But, again, perhaps QEMU can
>>> force all existent CPUs into SMM immediately upon adding the new CPU.
>> [Jiewen] OS here means the Host CPU running code in OS environment, not in SMM environment.
>
> See above.
>
>>>> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
>>>> rebase code.
>>>>
>>>> (07) Host CPU: (SMM) Send message to New CPU to Enable SMI.
>>> Aha, so this is the SMM-only register you mention in step (03). Is the
>>> register specified in the Intel SDM?
>> [Jiewen] Right. That is the register to let host CPU tell new CPU to enable SMI.
>> It is platform specific register. Not defined in SDM.
>> You may invent one in device model.
>
> See above.
>
>>>> (10) New CPU: (SMM) Response first SMI at 38000, and rebase SMBASE to
>>>> TSEG.
>>> What code does the new CPU execute after it completes step (10)? Does it
>>> halt?
>> [Jiewen] The new CPU exits SMM and return to original place - where it is
>> interrupted to enter SMM - running code on the flash.
>
> So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and (07).
>
>>>> (11) Host CPU: (SMM) Restore 38000.
>>> These steps (i.e., (06) through (11)) don't appear RAS-specific. The
>>> only platform-specific feature seems to be SMI masking register, which
>>> could be extracted into a new SmmCpuFeaturesLib API.
>>>
>>> Thus, would you please consider open sourcing firmware code for steps
>>> (06) through (11)?
>>>
>>> Alternatively -- and in particular because the stack for step (01)
>>> concerns me --, we could approach this from a high-level, functional
>>> perspective. The states that really matter are the relocated SMBASE for
>>> the new CPU, and the state of the full system, right at the end of step
>>> (11).
>>>
>>> When the SMM setup quiesces during normal firmware boot, OVMF could
>>> use existent (finalized) SMBASE information to *pre-program* some virtual
>>> QEMU hardware, with such state that would be expected, as "final" state,
>>> of any new hotplugged CPU. Afterwards, if / when the hotplug actually
>>> happens, QEMU could blanket-apply this state to the new CPU, and
>>> broadcast a hardware SMI to all CPUs except the new one.
>
> I'd rather avoid this and stay as close as possible to real hardware.
>
> Paolo
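
Steps (06) through (11) hinge on one architectural fact: after reset an x86 CPU has SMBASE 0x30000 and enters SMM at SMBASE + 0x8000, that is 0x38000, which is why the host CPU has to save, patch and later restore that area. The sketch below simulates that choreography on a `bytearray` standing in for guest RAM. The function name, the stub bytes and the TSEG address in the usage are assumptions for illustration; the actual rewrite of the SMBASE save-state field is only described in a comment, not emulated.

```python
# Simulation only: guest RAM is a bytearray indexed by physical address.
DEFAULT_SMBASE = 0x30000                 # SMBASE of a CPU fresh out of reset
SMI_ENTRY = DEFAULT_SMBASE + 0x8000      # 0x38000, where its first SMI starts


def relocate_hotplugged_cpu(ram, rebase_stub, new_smbase):
    """Walk through steps (06)-(11) for one hot-plugged CPU.

    Returns (smbase_after_rsm, code_the_new_cpu_ran).
    """
    end = SMI_ENTRY + len(rebase_stub)
    saved = bytes(ram[SMI_ENTRY:end])    # (06) save the bytes at 0x38000
    ram[SMI_ENTRY:end] = rebase_stub     # (06) install simple SMM rebase code

    # (09)/(10) The new CPU takes its first SMI at SMI_ENTRY and runs whatever
    # is stored there. The real stub would write new_smbase (a TSEG address)
    # into the SMBASE field of the CPU's save state; the CPU uses the new
    # value after RSM. Here we only record what would have been executed.
    executed = bytes(ram[SMI_ENTRY:end])

    ram[SMI_ENTRY:end] = saved           # (11) restore the original bytes
    return new_smbase, executed
```

Because `executed` is read back from RAM, anything that rewrites that area between (06) and (10) changes what the new CPU runs.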
Yao, Jiewen
Comment below:

-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Friday, August 16, 2019 12:21 AM
To: Laszlo Ersek <lersek@...>; devel@edk2.groups.io; Yao, Jiewen
<jiewen.yao@...>
Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
<qemu-devel@...>; Igor Mammedov <imammedo@...>;
Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun
<jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>;
Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl
<phillip.goerl@...>
Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

> On 15/08/19 17:00, Laszlo Ersek wrote:
>> Paolo, I've attempted to read Jiewen's response, and yours, as carefully
>> as I can. I'm still very confused. If you have a better understanding,
>> could you please write up the 15-step process from the thread starter
>> again, with all QEMU customizations applied? Such as, unnecessary steps
>> removed, and platform specifics filled in.
>
> Sure.
>
> (01a) QEMU: create new CPU. The CPU already exists, but it does not
> start running code until unparked by the CPU hotplug controller.
> (01b) QEMU: trigger SCI
> (02-03) no equivalent
> (04) Host CPU: (OS) execute GPE handler from DSDT
> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
> will not enter SMM because SMI is disabled)
> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
> rebase code.
> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
> new CPU
> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
[Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no
restriction that INIT/SIPI/SIPI can only be sent in SMM.
> (08a) New CPU: (Low RAM) Enter protected mode.
[Jiewen] NOTE: The new CPU still cannot use any physical memory, because
the INIT/SIPI/SIPI may be sent by malicious CPU in non-SMM environment.
> (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
> (09) Host CPU: (SMM) Send SMI to the new CPU only.
> (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
> TSEG.
> (11) Host CPU: (SMM) Restore 38000.
> (12) Host CPU: (SMM) Update located data structure to add the new CPU
> information. (This step will involve CPU_SERVICE protocol)
> (13) New CPU: (Flash) do whatever other initialization is needed
> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.
>
> In other words, the cache-as-RAM phase of 02-03 is replaced by the
> INIT-SIPI-SIPI sequence of 07b-08a-08b.
[Jiewen] I am OK with this proposal.
I think the rule is same - the new CPU CANNOT touch any system memory,
no matter it is from reset-vector or from INIT/SIPI/SIPI.
Or I would say: if the new CPU want to touch some memory before first SMI, the memory should be
CPU specific or on the flash.
>
>>> The QEMU DSDT could be modified (when secure boot is in effect) to OUT
>>> to 0xB2 when hotplug happens. It could write a well-known value to
>>> 0xB2, to be read by an SMI handler in edk2.
>>
>> I dislike involving QEMU's generated DSDT in anything SMM (even
>> injecting the SMI), because the AML interpreter runs in the OS.
>>
>> If a malicious OS kernel is a bit too enlightened about the DSDT, it
>> could willfully diverge from the process that we design. If QEMU
>> broadcast the SMI internally, the guest OS could not interfere with that.
>>
>> If the purpose of the SMI is specifically to force all CPUs into SMM
>> (and thereby force them into trusted state), then the OS would be
>> explicitly counter-interested in carrying out the AML operations from
>> QEMU's DSDT.
>
> But since the hotplug controller would only be accessible from SMM,
> there would be no other way to invoke it than to follow the DSDT's
> instruction and write to 0xB2. FWIW, real hardware also has plenty of
> 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store
> access).
>
> Paolo
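
Jiewen's rule can be stated as a predicate over the addresses the new CPU touches before its first SMI. The following is a minimal sketch under assumed address ranges: `FLASH` and `CPU_PRIVATE` are placeholders, not QEMU's actual memory map. Note that it rejects a low-RAM access such as the one in step (08a), which is what Jiewen's note on that step points out.

```python
# Hypothetical address ranges; QEMU's real memory map is not modelled here.
FLASH = range(0xFFC00000, 0x1_0000_0000)     # assumed 4 MiB pflash below 4 GiB
CPU_PRIVATE = range(0xFEF00000, 0xFEF01000)  # assumed per-CPU MMIO scratch page


def may_touch(addr, before_first_smi, flash=FLASH, cpu_private=CPU_PRIVATE):
    """Jiewen's rule for the new CPU.

    Before its first SMI, the new CPU may only touch flash or CPU-specific
    memory, never shared system RAM, because whoever sent it INIT/SIPI/SIPI
    may not be trusted. After the first SMI this rule no longer restricts
    anything, so the function returns True.
    """
    if not before_first_smi:
        return True
    return addr in flash or addr in cpu_private
```

For example, the reset vector at 0xFFFFFFF0 falls inside the assumed flash window, while an arbitrary low-RAM address such as 0x8000 does not pass before the first SMI.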
Yao, Jiewen
below
-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Friday, August 16, 2019 3:20 PM
To: Yao, Jiewen <jiewen.yao@...>; Laszlo Ersek
<lersek@...>; devel@edk2.groups.io
Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
<qemu-devel@...>; Igor Mammedov <imammedo@...>;
Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun
<jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>;
Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl
<phillip.goerl@...>
Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

> On 16/08/19 04:46, Yao, Jiewen wrote:
>>> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
>> [Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no
>> restriction that INIT/SIPI/SIPI can only be sent in SMM.
>
> All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded
> before 07a, so this is okay.
[Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is delivered at 07a?
I don’t see any extra step between 06 and 07a.
What is the magic here?
>
> However I do see a problem, because a PCI device's DMA could overwrite
> 0x38000 between (06) and (10) and hijack the code that is executed in
> SMM. How is this avoided on real hardware? By the time the new CPU
> enters SMM, it doesn't run off cache-as-RAM anymore.
[Jiewen] Interesting question.
I don’t think the DMA attack is considered in threat model for the virtual environment. We only list adversary below:
-- Adversary: System Software Attacker, who can control any OS memory or silicon register from OS level, or read write BIOS data.
-- Adversary: Simple hardware attacker, who can hot add or hot remove a CPU.
I agree it is a threat from real hardware perspective. SMM may check VTd to make sure the 38000 is blocked.
I doubt if it is a threat in virtual environment. Do we have a way to block DMA in virtual environment?
>
> Paolo
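
Jiewen's remark that SMM "may check VTd to make sure the 38000 is blocked" amounts to an interval-overlap test between the ranges a device can reach by DMA and the area patched in step (06). The sketch below assumes the IOMMU configuration has already been flattened into half-open guest-physical ranges; real VT-d uses multi-level page tables, and nothing here is a VT-d or VFIO API. The 0x1000 length used in the examples is likewise an assumption.

```python
def dma_exposure(dma_ranges, start, length):
    """Return the parts of [start, start + length) that a device can reach.

    dma_ranges: iterable of half-open (lo, hi) guest-physical ranges that
    the (assumed, flattened) IOMMU mapping lets the device write to.
    An empty result means the area is blocked for this device.
    """
    end = start + length
    exposed = []
    for lo, hi in dma_ranges:
        a, b = max(lo, start), min(hi, end)
        if a < b:  # non-empty intersection
            exposed.append((a, b))
    return exposed
```

A device mapped over all of low guest RAM, e.g. `[(0, 0x80000000)]`, exposes the whole `(0x38000, 0x39000)` window, while a mapping that skips 0x38000-0x40000 exposes nothing.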
Laszlo Ersek
On 08/15/19 18:21, Paolo Bonzini wrote:
> On 15/08/19 17:00, Laszlo Ersek wrote:
>> Paolo, I've attempted to read Jiewen's response, and yours, as carefully
>> as I can. I'm still very confused. If you have a better understanding,
>> could you please write up the 15-step process from the thread starter
>> again, with all QEMU customizations applied? Such as, unnecessary steps
>> removed, and platform specifics filled in.
>
> Sure.
>
> (01a) QEMU: create new CPU. The CPU already exists, but it does not
> start running code until unparked by the CPU hotplug controller.
> (01b) QEMU: trigger SCI
> (02-03) no equivalent
> (04) Host CPU: (OS) execute GPE handler from DSDT
> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
> will not enter SMM because SMI is disabled)
> (06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
> rebase code.

(Could Intel open source code for this?)

> (07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
> new CPU
> (07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
> (08a) New CPU: (Low RAM) Enter protected mode.

PCI DMA attack might be relevant (but yes, I see you've mentioned that
too, down-thread)

> (08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
> (09) Host CPU: (SMM) Send SMI to the new CPU only.
> (10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
> TSEG.

I wish we could simply wake the new CPU -- after step 07a -- with an
SMI. IOW, if we could excise steps 07b, 08a, 08b.

Our CPU hotplug controller, and the initial parked state in 01a for the
new CPU, are going to be home-brewed anyway.

On the other hand...

> (11) Host CPU: (SMM) Restore 38000.
> (12) Host CPU: (SMM) Update located data structure to add the new CPU
> information. (This step will involve CPU_SERVICE protocol)
> (13) New CPU: (Flash) do whatever other initialization is needed
> (14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.

basically step 08b is the environment to which the new CPU returns in
13/14, after the RSM.

Do we absolutely need low RAM for 08a (for entering protected mode)? We
could execute from pflash, no? OTOH we'd still need RAM for the stack,
and that could be attacked with PCI DMA similarly, I believe.

> (15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.
>
> In other words, the cache-as-RAM phase of 02-03 is replaced by the
> INIT-SIPI-SIPI sequence of 07b-08a-08b.

Right.

>>> The QEMU DSDT could be modified (when secure boot is in effect) to OUT
>>> to 0xB2 when hotplug happens. It could write a well-known value to
>>> 0xB2, to be read by an SMI handler in edk2.
>>
>> I dislike involving QEMU's generated DSDT in anything SMM (even
>> injecting the SMI), because the AML interpreter runs in the OS.
>
> But since the hotplug controller would only be accessible from SMM,
> there would be no other way to invoke it than to follow the DSDT's
> instruction and write to 0xB2. FWIW, real hardware also has plenty of
> 0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store
> access).

Thanks
Laszlo
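
Laszlo's suggestion can be compared with Paolo's sequence using a small hypothetical model: a parked CPU that drops INIT, SIPI and SMI until SMM code unparks it (step 07a), plus the step list with 07b, 08a and 08b removed. All names are invented for illustration, and (02-03) is left out because it has no equivalent in Paolo's list. In the variant, the new CPU's first step becomes the SMI in (10), and (13)/(14) directly follow it, so the environment that 08b used to provide for the return from RSM would have to come from somewhere else.

```python
class ParkedCpu:
    """New CPU in the proposed QEMU 'parked' mode (hypothetical model).

    Until SMM code unparks it through the hotplug controller (step 07a),
    every INIT, SIPI or SMI sent to it is dropped.
    """

    SIGNALS = ("INIT", "SIPI", "SMI")

    def __init__(self):
        self.parked = True
        self.accepted = []

    def unpark(self):
        self.parked = False

    def deliver(self, signal):
        if signal not in self.SIGNALS:
            raise ValueError(f"unknown signal: {signal}")
        if self.parked:
            return False  # ignored, whoever sent it
        self.accepted.append(signal)
        return True


# Paolo's list; (02-03) omitted because it has no equivalent.
PROPOSED = ["01a", "01b", "04", "05", "06", "07a", "07b", "08a", "08b",
            "09", "10", "11", "12", "13", "14", "15"]
NEW_CPU_STEPS = {"08a", "08b", "10", "13", "14"}  # steps the new CPU executes


def smi_wake_variant(steps):
    """Laszlo's wish: wake the new CPU with an SMI right after 07a."""
    return [s for s in steps if s not in {"07b", "08a", "08b"}]


def new_cpu_steps(steps):
    """The subsequence of steps executed by the new CPU, in order."""
    return [s for s in steps if s in NEW_CPU_STEPS]
```

With `PROPOSED`, the new CPU runs 08a, 08b, 10, 13 and 14; with `smi_wake_variant(PROPOSED)` it runs only 10, 13 and 14.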
Laszlo Ersek
+Alex (direct question at the bottom)

On 08/16/19 09:49, Yao, Jiewen wrote:
>> All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded
>> before 07a, so this is okay.
> [Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is delivered at 07a?
> I don’t see any extra step between 06 and 07a.
> What is the magic here?

The magic is 07a itself, IIUC. The CPU hotplug controller would be
accessible only in SMM. And until 07a happens, the new CPU ignores
INIT/SIPI/SIPI even if another CPU sends it those, simply because QEMU
would implement the new CPU's behavior like that.

>> However I do see a problem, because a PCI device's DMA could overwrite
>> 0x38000 between (06) and (10) and hijack the code that is executed in
>> SMM. How is this avoided on real hardware? By the time the new CPU
>> enters SMM, it doesn't run off cache-as-RAM anymore.
> [Jiewen] Interesting question.
> I don’t think the DMA attack is considered in threat model for the virtual environment. We only list adversary below:
> -- Adversary: System Software Attacker, who can control any OS memory or silicon register from OS level, or read write BIOS data.
> -- Adversary: Simple hardware attacker, who can hot add or hot remove a CPU.

We do have physical PCI(e) device assignment; sorry for not highlighting
that earlier. That feature (VFIO) does rely on the (physical) IOMMU, and
it makes sure that the assigned device can only access physical frames
that belong to the virtual machine that the device is assigned to.

However, as far as I know, VFIO doesn't try to restrict PCI DMA to
subsets of guest RAM... I could be wrong about that, I vaguely recall
RMRR support, which seems somewhat related.

> I agree it is a threat from real hardware perspective. SMM may check VTd to make sure the 38000 is blocked.

I think that would be a VFIO feature.

Alex: if we wanted to block PCI(e) DMA to a specific part of guest RAM
(expressed with guest-physical RAM addresses), perhaps permanently,
perhaps just for a while -- not sure about coordination though --, could
VFIO accommodate that (I guess by "punching holes" in the IOMMU page
tables)?

Thanks
Laszlo
I doubt if it is a threat in virtual environment. Do we have a way to block DMA in virtual environment?
Alex: if we wanted to block PCI(e) DMA to a specific part of guest RAM
(expressed with guest-physical RAM addresses), perhaps permanently,
perhaps just for a while -- not sure about coordination though --, could
VFIO accommodate that (I guess by "punching holes" in the IOMMU page
tables)?
Thanks
Laszlo
Paolo
(08a) New CPU: (Low RAM) Enter protected mode.
[Jiewen] NOTE: The new CPU still cannot use any physical memory, because
the INIT/SIPI/SIPI may be sent by a malicious CPU in a non-SMM environment.
(08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
[Jiewen] I am OK with this proposal.
(09) Host CPU: (SMM) Send SMI to the new CPU only.
(10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
TSEG.
(11) Host CPU: (SMM) Restore 38000.
(12) Host CPU: (SMM) Update located data structure to add the new CPU
information. (This step will involve CPU_SERVICE protocol)
(13) New CPU: (Flash) do whatever other initialization is needed
(14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
(15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.
In other words, the cache-as-RAM phase of 02-03 is replaced by the
INIT-SIPI-SIPI sequence of 07b-08a-08b.
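The gating that steps (01a) to (08b) depend on can be summarised as a toy model. It encodes only what is stated in this thread: a hot-plugged CPU discards INIT, SIPI and SMI until it is unparked through the CPU hotplug controller in step (07a), and that controller is assumed to be reachable from SMM only. The types and functions are hypothetical and do not describe QEMU's actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a hot-plugged vCPU; not QEMU's real data structures. */
typedef enum { CPU_PARKED, CPU_WAIT_SIPI, CPU_RUNNING } cpu_state_t;
typedef enum { EV_INIT, EV_SIPI, EV_SMI } cpu_event_t;

typedef struct {
    cpu_state_t state;
    bool        unparked;     /* set only through the hotplug controller (07a) */
    bool        smi_pending;  /* step (09) */
    uint32_t    start_addr;   /* real-mode entry point taken from the SIPI vector */
} hp_cpu_t;

/* Step (07a). The controller is assumed to be reachable from SMM only,
 * so a write issued outside SMM has no effect. */
void hotplug_ctrl_unpark(hp_cpu_t *cpu, bool caller_in_smm)
{
    if (caller_in_smm)
        cpu->unparked = true;
}

/* Returns true if the event was accepted, false if it was discarded. */
bool hp_cpu_deliver(hp_cpu_t *cpu, cpu_event_t ev, uint8_t vector)
{
    if (!cpu->unparked)
        return false;                 /* before 07a: INIT/SIPI/SMI are ignored */

    switch (ev) {
    case EV_INIT:
        cpu->state = CPU_WAIT_SIPI;
        return true;
    case EV_SIPI:
        if (cpu->state != CPU_WAIT_SIPI)
            return false;             /* SIPI without a preceding INIT */
        cpu->start_addr = (uint32_t)vector << 12;   /* vector VV -> 0xVV000 */
        cpu->state = CPU_RUNNING;
        return true;
    case EV_SMI:
        cpu->smi_pending = true;
        return true;
    }
    return false;
}
```

Putting the `unparked` check first in the delivery path is the point of the design: before 07a, nothing a malicious sender does can change the new CPU's state.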
I think the rule is the same - the new CPU CANNOT touch any system memory,
no matter whether it starts from the reset vector or from INIT/SIPI/SIPI.
Or I would say: if the new CPU wants to touch some memory before the first
SMI, the memory should be CPU specific or on the flash.
The QEMU DSDT could be modified (when secure boot is in effect) to OUT
to 0xB2 when hotplug happens. It could write a well-known value to
0xB2, to be read by an SMI handler in edk2.
I dislike involving QEMU's generated DSDT in anything SMM (even
injecting the SMI), because the AML interpreter runs in the OS.
If a malicious OS kernel is a bit too enlightened about the DSDT, it
could willfully diverge from the process that we design. If QEMU
broadcast the SMI internally, the guest OS could not interfere with that.
If the purpose of the SMI is specifically to force all CPUs into SMM
(and thereby force them into trusted state), then the OS would be
explicitly counter-interested in carrying out the AML operations from
QEMU's DSDT.
But since the hotplug controller would only be accessible from SMM,
there would be no other way to invoke it than to follow the DSDT's
instruction and write to 0xB2. FWIW, real hardware also has plenty of
0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store
access).
Paolo
Yao, Jiewen
-- Adversary: Simple hardware attacker, who can use device to perform DMA attack in the virtual world.
NOTE: The DMA attack in the real world is out of scope. That is handled by IOMMU in the real world, such as VTd. -- Please do clarify if this is TRUE.
In the real world:
#1: the SMM MUST be non-DMA capable region.
#2: the MMIO MUST be non-DMA capable region.
#3: the stolen memory MIGHT be DMA capable region or non-DMA capable region. It depends upon the silicon design.
#4: the normal OS accessible memory - including ACPI reclaim, ACPI NVS, and reserved memory not included by #3 - MUST be DMA capable region.
As such, IOMMU protection is NOT required for #1 and #2. IOMMU protection MIGHT be required for #3 and MUST be required for #4.
I assume the virtual environment is designed in the same way. Please correct me if I am wrong.
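Jiewen's classification reduces to a small decision table. The sketch below encodes it exactly as stated, with the silicon-dependent case #3 passed in as a flag; the enum and function names are made up for illustration. In this thread the concern is that the 0x30000 relocation area is ordinary guest RAM rather than SMRAM (#1), so nothing in the table shields it from device DMA.

```c
#include <stdbool.h>

/* Region classes #1..#4 as listed above; names are illustrative only. */
typedef enum {
    REGION_SMRAM,      /* #1: MUST be non-DMA capable */
    REGION_MMIO,       /* #2: MUST be non-DMA capable */
    REGION_STOLEN,     /* #3: depends on the silicon design */
    REGION_OS_MEMORY   /* #4: incl. ACPI reclaim/NVS, reserved; MUST be DMA capable */
} region_class_t;

/* `stolen_is_dma_capable` stands in for the silicon-specific answer for #3. */
bool iommu_protection_required(region_class_t cls, bool stolen_is_dma_capable)
{
    switch (cls) {
    case REGION_SMRAM:
    case REGION_MMIO:
        return false;                   /* never a DMA target */
    case REGION_STOLEN:
        return stolen_is_dma_capable;   /* "MIGHT be required" */
    case REGION_OS_MEMORY:
        return true;                    /* "MUST be required" */
    }
    return true;                        /* unknown class: be conservative */
}
```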
I found https://www.kernel.org/doc/Documentation/vfio.txt
Is that what you described above?
Anyway, I believe the problem is clear and the solution in real world is clear.
I will leave the virtual world discussion to Alex, Paolo, Laszlo.
If you need any of my input, please let me know.
-----Original Message-----
From: Alex Williamson [mailto:alex.williamson@...]
Sent: Saturday, August 17, 2019 6:20 AM
To: Laszlo Ersek <lersek@...>
Cc: Yao, Jiewen <jiewen.yao@...>; Paolo Bonzini
<pbonzini@...>; devel@edk2.groups.io; edk2-rfc-groups-io
<rfc@edk2.groups.io>; qemu devel list <qemu-devel@...>; Igor
Mammedov <imammedo@...>; Chen, Yingwen
<yingwen.chen@...>; Nakajima, Jun <jun.nakajima@...>; Boris
Ostrovsky <boris.ostrovsky@...>; Joao Marcal Lemos Martins
<joao.m.martins@...>; Phillip Goerl <phillip.goerl@...>
Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
On Fri, 16 Aug 2019 22:15:15 +0200
Laszlo Ersek <lersek@...> wrote:
+Alex (direct question at the bottom)
On 08/16/19 09:49, Yao, Jiewen wrote:
-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Friday, August 16, 2019 3:20 PM
To: Yao, Jiewen <jiewen.yao@...>; Laszlo Ersek
<lersek@...>; devel@edk2.groups.io
Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
<qemu-devel@...>; Igor Mammedov <imammedo@...>;
Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun
<jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>;
Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl
<phillip.goerl@...>
Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
On 16/08/19 04:46, Yao, Jiewen wrote:
Comment below:
Jiewen
-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Friday, August 16, 2019 12:21 AM
To: Laszlo Ersek <lersek@...>; devel@edk2.groups.io; Yao, Jiewen
<jiewen.yao@...>
Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
<qemu-devel@...>; Igor Mammedov <imammedo@...>;
Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun
<jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>;
Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl
<phillip.goerl@...>
Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
On 15/08/19 17:00, Laszlo Ersek wrote:
On 08/14/19 16:04, Paolo Bonzini wrote:
On 14/08/19 15:20, Yao, Jiewen wrote:
- Does this part require a new branch somewhere in the OVMF SEC code?
How do we determine whether the CPU executing SEC is BSP or
hot-plugged AP?
[Jiewen] I think this is blocked from hardware perspective, since the first instruction.
There are some hardware-specific registers that can be used to determine if the CPU is newly added.
I don't think this must be the same as the real hardware.
You are free to invent some registers in the device model to be used in the OVMF hot plug driver.
Yes, this would be a new operation mode for QEMU, that only applies to
hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in
fact it doesn't reply to anything at all.
- How do we tell the hot-plugged AP where to start execution? (I.e. that
it should execute code at a particular pflash location.)
[Jiewen] Same real mode reset vector at FFFF:FFF0.
You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
QEMU. The AP does not start execution at all when it is unplugged, so
no cache-as-RAM etc.
We only need to modify QEMU so that hot-plugged APs do not reply to
INIT/SIPI/SMI.
[Jiewen] I don't think there is a problem for real hardware, which always has CAR.
Can QEMU provide some CPU-specific space, such as an MMIO region?
Why is a CPU-specific region needed if every other processor is in SMM
and thus trusted?
I was going through the steps Jiewen and Yingwen recommended.
In step (02), the new CPU is expected to set up RAM access. In step
(03), the new CPU, executing code from flash, is expected to "send board
message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
message." For that action, the new CPU may need a stack (minimally if we
want to use C function calls).
Until step (03), there had been no word about any other (= pre-plugged)
CPUs (more precisely, Jiewen even confirmed "No impact to other
processors"), so I didn't assume that other CPUs had entered SMM.
Paolo, I've attempted to read Jiewen's response, and yours, as carefully
as I can. I'm still very confused. If you have a better understanding,
could you please write up the 15-step process from the thread starter
again, with all QEMU customizations applied? Such as, unnecessary steps
removed, and platform specifics filled in.
Sure.
(01a) QEMU: create new CPU. The CPU already exists, but it does not
start running code until unparked by the CPU hotplug controller.
(01b) QEMU: trigger SCI
(02-03) no equivalent
(04) Host CPU: (OS) execute GPE handler from DSDT
(05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
will not enter SMM because SMI is disabled)
(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
rebase code.
(07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
new CPU
(07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
[Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no
restriction that INIT/SIPI/SIPI can only be sent in SMM.
All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded
before 07a, so this is okay.
[Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is delivered at 07a?
I don't see any extra step between 06 and 07a.
What is the magic here?
The magic is 07a itself, IIUC. The CPU hotplug controller would be
accessible only in SMM. And until 07a happens, the new CPU ignores
INIT/SIPI/SIPI even if another CPU sends it those, simply because QEMU
would implement the new CPU's behavior like that.
[Jiewen] Got it. Looks fine to me.
However I do see a problem, because a PCI device's DMA could overwrite
0x38000 between (06) and (10) and hijack the code that is executed in
SMM. How is this avoided on real hardware? By the time the new CPU
enters SMM, it doesn't run off cache-as-RAM anymore.
[Jiewen] Interesting question.
I don't think the DMA attack is considered in the threat model for the virtual environment. We only list the adversaries below:
-- Adversary: System Software Attacker, who can control any OS memory or silicon register from OS level, or read write BIOS data.
-- Adversary: Simple hardware attacker, who can hot add or hot remove a CPU.
We do have physical PCI(e) device assignment; sorry for not highlighting
that earlier.
[Jiewen] That is OK. Then we MUST add the third adversary.
-- Adversary: Simple hardware attacker, who can use device to perform DMA attack in the virtual world.
NOTE: The DMA attack in the real world is out of scope. That is handled by IOMMU in the real world, such as VTd. -- Please do clarify if this is TRUE.
In the real world:
#1: the SMM MUST be non-DMA capable region.
#2: the MMIO MUST be non-DMA capable region.
#3: the stolen memory MIGHT be DMA capable region or non-DMA capable region. It depends upon the silicon design.
#4: the normal OS accessible memory - including ACPI reclaim, ACPI NVS, and reserved memory not included by #3 - MUST be DMA capable region.
As such, IOMMU protection is NOT required for #1 and #2. IOMMU protection MIGHT be required for #3 and MUST be required for #4.
I assume the virtual environment is designed in the same way. Please correct me if I am wrong.
That feature (VFIO) does rely on the (physical) IOMMU, and
it makes sure that the assigned device can only access physical frames
that belong to the virtual machine that the device is assigned to.
[Jiewen] Thank you! Good to know.
I found https://www.kernel.org/doc/Documentation/vfio.txt
Is that what you described above?
Anyway, I believe the problem is clear and the solution in real world is clear.
I will leave the virtual world discussion to Alex, Paolo, Laszlo.
If you need any of my input, please let me know.
However, as far as I know, VFIO doesn't try to restrict PCI DMA to
subsets of guest RAM... I could be wrong about that, I vaguely recall
RMRR support, which seems somewhat related.
I agree it is a threat from real hardware perspective. SMM may check VTd to make sure the 38000 is blocked.
I doubt if it is a threat in virtual environment. Do we have a way to block DMA in virtual environment?
I think that would be a VFIO feature.
Alex: if we wanted to block PCI(e) DMA to a specific part of guest RAM
(expressed with guest-physical RAM addresses), perhaps permanently,
perhaps just for a while -- not sure about coordination though --, could
VFIO accommodate that (I guess by "punching holes" in the IOMMU page
tables)?
It depends. For starters, the vfio mapping API does not allow
unmapping arbitrary sub-ranges of previous mappings. So the hole you
want to punch would need to be independently mapped. From there you
get into the issue of whether this range is a potential DMA target. If
it is, then this is the path to data corruption. We cannot interfere
with the operation of the device and we have little to no visibility of
active DMA targets.
If we're talking about RAM that is never a DMA target, perhaps e820
reserved memory, then we can make sure certain MemoryRegions are
skipped when mapped by QEMU and would expect the guest to never map
them through a vIOMMU as well. Maybe then it's a question of where
we're trying to provide security (it might be more difficult if QEMU
needs to sanitize vIOMMU mappings to actively prevent mapping
reserved areas).
Is there anything unique about the VM case here? Bare metal SMM needs
to be concerned about protecting itself from I/O devices that operate
outside of the realm of SMM mode as well, right? Is something "simple"
like an AddressSpace switch necessary here, such that an I/O device
always has a mapping to a safe guest RAM page while the vCPU
AddressSpace can switch to some protected page? The IOMMU and vCPU
mappings don't need to be the same. The vCPU is more under our control
than the assigned device.
FWIW, RMRRs are a VT-d specific mechanism to define an address range as
persistently, identity mapped for one or more devices. IOW, the device
would always map that range. I don't think that's what you're after
here. RMRRs are also an abomination that I hope we never find a
requirement for in a VM. Thanks,
Alex
Yao, Jiewen
In the real world, we deprecate AB-seg usage because they are vulnerable to SMM cache poison attacks.
I assume cache poison is out of scope in the virtual world, or there is a way to prevent ABseg cache poison.
thank you!
Yao, Jiewen
On Aug 19, 2019, at 3:50 AM, Paolo Bonzini <pbonzini@...> wrote:
On 17/08/19 02:20, Yao, Jiewen wrote:
[Jiewen] That is OK. Then we MUST add the third adversary.
-- Adversary: Simple hardware attacker, who can use device to perform DMA attack in the virtual world.
NOTE: The DMA attack in the real world is out of scope. That is handled by IOMMU in the real world, such as VTd. -- Please do clarify if this is TRUE.
In the real world:
#1: the SMM MUST be non-DMA capable region.
#2: the MMIO MUST be non-DMA capable region.
#3: the stolen memory MIGHT be DMA capable region or non-DMA capable
region. It depends upon the silicon design.
#4: the normal OS accessible memory - including ACPI reclaim, ACPI
NVS, and reserved memory not included by #3 - MUST be DMA capable region.
As such, IOMMU protection is NOT required for #1 and #2. IOMMU
protection MIGHT be required for #3 and MUST be required for #4.
I assume the virtual environment is designed in the same way. Please
correct me if I am wrong.
Correct. The 0x30000...0x3ffff area is the only problematic one;
Igor's idea (or a variant, for example optionally remapping
0xa0000..0xaffff SMRAM to 0x30000) is becoming more and more attractive.
Paolo
Laszlo Ersek
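On 08/19/19 16:10, Paolo Bonzini wrote:
Alex, thank you for the help! Please let us know if we should remove you
from the CC list, in order not to clutter your inbox. (I've kept your
address for now, for saying thanks. Feel free to stop reading here. Thanks!)
So what is Igor's idea? Please do spoon-feed it to me. I've seen the POC
patch but the memory region manipulation isn't obvious to me.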
Regarding TSEG, QEMU doesn't implement it differently from normal RAM.
Instead, if memory serves, there is an extra "black hole" region that is
overlaid, which hides the RAM contents when TSEG is supposed to be
closed (and the guest is not running in SMM).
But this time we're doing something else, right? Is the idea to overlay
the RAM range at 0x30000 with a window (alias) into the "compatible"
SMRAM at 0xA0000-0xBFFFF?
I don't know how the "compatible" SMRAM is implemented in QEMU. Does the
compatible SMRAM behave in sync with TSEG? OVMF doesn't configure or
touch compatible SMRAM at all, at the moment.
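To make the question concrete, here is a toy model of the two mechanisms being discussed: the TSEG "black hole" overlay as described above (recalled from memory, so treat it as an assumption), and one possible reading of the proposed low alias, in which 0x30000-0x3ffff is redirected to A-seg SMRAM at 0xa0000-0xaffff only for SMM accesses while the alias is enabled. Only vCPU accesses are modelled. All names are hypothetical; this is not QEMU's MemoryRegion code.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { VIEW_RAM, VIEW_BLACK_HOLE, VIEW_ASEG_SMRAM } view_t;

typedef struct {
    uint64_t tseg_base, tseg_size;
    bool     tseg_closed;          /* TSEG hidden from non-SMM accesses */
    bool     low_alias_enabled;    /* 0x30000..0x3ffff -> 0xa0000..0xaffff */
} chipset_t;

#define LOW_ALIAS_BASE 0x30000u
#define LOW_ALIAS_SIZE 0x10000u
#define ASEG_BASE      0xa0000u

/* Resolve a vCPU access to `gpa`; `*backing` receives the address used. */
view_t resolve_vcpu_access(const chipset_t *cs, uint64_t gpa, bool in_smm,
                           uint64_t *backing)
{
    *backing = gpa;
    if (gpa >= cs->tseg_base && gpa - cs->tseg_base < cs->tseg_size &&
        cs->tseg_closed && !in_smm)
        return VIEW_BLACK_HOLE;        /* overlay hides the RAM underneath */

    if (cs->low_alias_enabled && in_smm &&
        gpa >= LOW_ALIAS_BASE && gpa - LOW_ALIAS_BASE < LOW_ALIAS_SIZE) {
        *backing = ASEG_BASE + (gpa - LOW_ALIAS_BASE);
        return VIEW_ASEG_SMRAM;        /* window into compatible SMRAM */
    }
    return VIEW_RAM;
}
```

Under this reading, the relocation code at 0x38000 would live in SMRAM (0xa8000) rather than in DMA-reachable RAM, which is what would make the alias attractive.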
Thanks
Laszlo
On 19/08/19 01:00, Yao, Jiewen wrote:
In the real world, we deprecate AB-seg usage because they are vulnerable
to SMM cache poison attacks. I assume cache poison is out of scope in
the virtual world, or there is a way to prevent ABseg cache poison.
Indeed the SMRR would not cover the A-seg on real hardware. However, if
the chipset allowed aliasing A-seg SMRAM to 0x30000, it would only be
used for SMBASE relocation of hotplugged CPU. The firmware would still
keep low SMRAM disabled, *except around SMBASE relocation of hotplugged
CPUs*. To avoid cache poisoning attacks, you only have to issue a
WBINVD before enabling low SMRAM and before disabling it. Hotplug SMI
is not a performance-sensitive path, so it's not a big deal.
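The ordering described here (flush before the low SMRAM window is opened, and again before it is closed) can be written down as a fixed plan. This is only a sketch: the step names are hypothetical, and a real implementation would execute WBINVD and program chipset-specific registers at each step rather than return an array.

```c
#include <stddef.h>

typedef enum {
    STEP_WBINVD,            /* write back and invalidate all caches */
    STEP_OPEN_LOW_SMRAM,    /* hypothetical: expose A-seg SMRAM at 0x30000 */
    STEP_RELOCATE_SMBASE,   /* steps (09)-(10) for the hotplugged CPU */
    STEP_CLOSE_LOW_SMRAM
} step_t;

/* Copies the plan into `out`; returns the number of steps, or 0 if
 * `cap` is too small to hold it. */
size_t hotplug_relocation_plan(step_t *out, size_t cap)
{
    static const step_t plan[] = {
        STEP_WBINVD,            /* before enabling low SMRAM */
        STEP_OPEN_LOW_SMRAM,
        STEP_RELOCATE_SMBASE,
        STEP_WBINVD,            /* before disabling it again */
        STEP_CLOSE_LOW_SMRAM,
    };
    size_t n = sizeof plan / sizeof plan[0];

    if (cap < n)
        return 0;
    for (size_t i = 0; i < n; i++)
        out[i] = plan[i];
    return n;
}
```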
So I guess you agree that PCI DMA attacks are a potential vector also on
real hardware. As Alex pointed out, VT-d is not a solution because
there could be legitimate DMA happening during CPU hotplug.
For OVMF we'll probably go with Igor's idea, it would be nice if Intel chipsets
supported it too. :)
Michael D Kinney
Perhaps there is a way to avoid the 3000:8000 startup
vector.
If a CPU is added after a cold reset, it is already in a
different state because one of the active CPUs needs to
release it by interacting with the hot plug controller.
Can the SMRR for CPUs in that state be pre-programmed to
match the SMRR in the rest of the active CPUs?
For OVMF we expect all the active CPUs to use the same
SMRR value, so a check can be made to verify that all
the active CPUs have the same SMRR value. If they do,
then any CPU released through the hot plug controller
can have its SMRR pre-programmed and the initial SMI
will start within TSEG.
We just need to decide what to do in the unexpected
case where all the active CPUs do not have the same
SMRR value.
This should also reduce the total number of steps.
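The consistency check proposed here could look roughly like this, assuming the SMRR base/mask pair of every active CPU has already been collected (for example by reading the IA32_SMRR_PHYSBASE and IA32_SMRR_PHYSMASK MSRs on each CPU). The function name is hypothetical, and what to do on a mismatch is left to the caller, as in the proposal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t physbase;   /* IA32_SMRR_PHYSBASE value */
    uint64_t physmask;   /* IA32_SMRR_PHYSMASK value */
} smrr_t;

/* Returns true and stores the common value in *out if all `n` active CPUs
 * report identical SMRR settings. Returns false otherwise (or if n == 0),
 * leaving the "unexpected" case to the caller. */
bool common_smrr(const smrr_t *cpus, size_t n, smrr_t *out)
{
    if (n == 0)
        return false;
    for (size_t i = 1; i < n; i++) {
        if (cpus[i].physbase != cpus[0].physbase ||
            cpus[i].physmask != cpus[0].physmask)
            return false;
    }
    *out = cpus[0];      /* value to pre-program into the released CPU */
    return true;
}
```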
Mike
-----Original Message-----
From: rfc@edk2.groups.io [mailto:rfc@edk2.groups.io] On
Behalf Of Yao, Jiewen
Sent: Sunday, August 18, 2019 4:01 PM
To: Paolo Bonzini <pbonzini@...>
Cc: Alex Williamson <alex.williamson@...>; Laszlo
Ersek <lersek@...>; devel@edk2.groups.io; edk2-rfc-groups-io
<rfc@edk2.groups.io>; qemu devel list
<qemu-devel@...>; Igor Mammedov
<imammedo@...>; Chen, Yingwen
<yingwen.chen@...>; Nakajima, Jun
<jun.nakajima@...>; Boris Ostrovsky
<boris.ostrovsky@...>; Joao Marcal Lemos Martins
<joao.m.martins@...>; Phillip Goerl
<phillip.goerl@...>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
SMM with QEMU+OVMF
In the real world, we deprecate AB-seg usage because they
are vulnerable to SMM cache poison attacks.
I assume cache poison is out of scope in the virtual
world, or there is a way to prevent ABseg cache poison.
thank you!
Yao, Jiewen
On Aug 19, 2019, at 3:50 AM, Paolo Bonzini <pbonzini@...> wrote:
On 17/08/19 02:20, Yao, Jiewen wrote:
[Jiewen] That is OK. Then we MUST add the third adversary.
-- Adversary: Simple hardware attacker, who can use device to perform DMA attack in the virtual world.
NOTE: The DMA attack in the real world is out of scope. That is handled by IOMMU in the real world,
such as VTd. -- Please do clarify if this is TRUE.
In the real world:
#1: the SMM MUST be non-DMA capable region.
#2: the MMIO MUST be non-DMA capable region.
#3: the stolen memory MIGHT be DMA capable region or non-DMA capable region. It depends upon the silicon design.
#4: the normal OS accessible memory - including ACPI reclaim, ACPI NVS, and reserved memory not included by #3 - MUST be DMA capable region.
As such, IOMMU protection is NOT required for #1 and #2. IOMMU protection MIGHT be required for #3 and MUST be required for #4.
I assume the virtual environment is designed in the same way. Please correct me if I am wrong.
Correct. The 0x30000...0x3ffff area is the only problematic one;
Igor's idea (or a variant, for example optionally remapping
0xa0000..0xaffff SMRAM to 0x30000) is becoming more and more attractive.
Paolo
Alex Williamson <alex.williamson@...>
On Fri, 16 Aug 2019 22:15:15 +0200
Laszlo Ersek <lersek@...> wrote:
It depends. For starters, the vfio mapping API does not allow
unmapping arbitrary sub-ranges of previous mappings. So the hole you
want to punch would need to be independently mapped. From there you
get into the issue of whether this range is a potential DMA target. If
it is, then this is the path to data corruption. We cannot interfere
with the operation of the device and we have little to no visibility of
active DMA targets.
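Since a hole cannot be carved out of an existing mapping later, the range to be protected would need its own mapping from the start. A minimal sketch of that bookkeeping, assuming half-open, page-aligned guest-physical ranges: split the RAM range around the hole so each piece can be mapped separately (for instance with the type1 backend's VFIO_IOMMU_MAP_DMA ioctl) and the middle piece can later be unmapped on its own. The helper and struct are hypothetical and no VFIO call is made here; as noted above, this is only safe if the hole is never a legitimate DMA target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t start, end; } gpa_range_t;   /* half-open [start, end) */

/* Split `whole` around `hole` so that the hole becomes a separate mapping.
 * Writes up to 3 ranges to `out` and returns how many were written. */
size_t split_around_hole(gpa_range_t whole, gpa_range_t hole, gpa_range_t out[3])
{
    size_t n = 0;
    assert(whole.start <= whole.end && hole.start <= hole.end);

    /* Clamp the hole to the range being mapped. */
    uint64_t hs = hole.start < whole.start ? whole.start : hole.start;
    uint64_t he = hole.end > whole.end ? whole.end : hole.end;

    if (hs >= he) {                   /* no overlap: one mapping suffices */
        out[n++] = whole;
        return n;
    }
    if (whole.start < hs)
        out[n++] = (gpa_range_t){ whole.start, hs };
    out[n++] = (gpa_range_t){ hs, he };           /* independently unmappable */
    if (he < whole.end)
        out[n++] = (gpa_range_t){ he, whole.end };
    return n;
}
```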
If we're talking about RAM that is never a DMA target, perhaps e820
reserved memory, then we can make sure certain MemoryRegions are
skipped when mapped by QEMU and would expect the guest to never map
them through a vIOMMU as well. Maybe then it's a question of where
we're trying to provide security (it might be more difficult if QEMU
needs to sanitize vIOMMU mappings to actively prevent mapping
reserved areas).
Is there anything unique about the VM case here? Bare metal SMM needs
to be concerned about protecting itself from I/O devices that operate
outside of the realm of SMM mode as well, right? Is something "simple"
like an AddressSpace switch necessary here, such that an I/O device
always has a mapping to a safe guest RAM page while the vCPU
AddressSpace can switch to some protected page? The IOMMU and vCPU
mappings don't need to be the same. The vCPU is more under our control
than the assigned device.
FWIW, RMRRs are a VT-d specific mechanism to define an address range as
persistently, identity mapped for one or more devices. IOW, the device
would always map that range. I don't think that's what you're after
here. RMRRs are also an abomination that I hope we never find a
requirement for in a VM. Thanks,
Alex
Laszlo Ersek <lersek@...> wrote:
+Alex (direct question at the bottom)
On 08/16/19 09:49, Yao, Jiewen wrote:
-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Friday, August 16, 2019 3:20 PM
To: Yao, Jiewen <jiewen.yao@...>; Laszlo Ersek
<lersek@...>; devel@edk2.groups.io
Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
<qemu-devel@...>; Igor Mammedov <imammedo@...>;
Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun
<jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>;
Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl
<phillip.goerl@...>
Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
On 16/08/19 04:46, Yao, Jiewen wrote:
Comment below:
Jiewen
-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Friday, August 16, 2019 12:21 AM
To: Laszlo Ersek <lersek@...>; devel@edk2.groups.io; Yao, Jiewen
<jiewen.yao@...>
Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
<qemu-devel@...>; Igor Mammedov <imammedo@...>;
Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun
<jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>;
Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl
<phillip.goerl@...>
Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
On 15/08/19 17:00, Laszlo Ersek wrote:
On 08/14/19 16:04, Paolo Bonzini wrote:
On 14/08/19 15:20, Yao, Jiewen wrote:
- Does this part require a new branch somewhere in the OVMF SEC code?
How do we determine whether the CPU executing SEC is BSP or
hot-plugged AP?
[Jiewen] I think this is blocked from hardware perspective, since the first instruction.
There are some hardware-specific registers that can be used to determine if the CPU is newly added.
I don't think this must be the same as the real hardware.
You are free to invent some registers in the device model to be used in the OVMF hot plug driver.
Yes, this would be a new operation mode for QEMU, that only applies to
hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in
fact it doesn't reply to anything at all.
- How do we tell the hot-plugged AP where to start execution? (I.e. that
it should execute code at a particular pflash location.)
[Jiewen] Same real mode reset vector at FFFF:FFF0.
You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
QEMU. The AP does not start execution at all when it is unplugged, so
no cache-as-RAM etc.
We only need to modify QEMU so that hot-plugged APs do not reply to
INIT/SIPI/SMI.
[Jiewen] I don't think there is a problem for real hardware, which always has CAR.
Can QEMU provide some CPU-specific space, such as an MMIO region?
Why is a CPU-specific region needed if every other processor is in SMM
and thus trusted?
I was going through the steps Jiewen and Yingwen recommended.
In step (02), the new CPU is expected to set up RAM access. In step
(03), the new CPU, executing code from flash, is expected to "send board
message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
message." For that action, the new CPU may need a stack (minimally if we
want to use C function calls).
Until step (03), there had been no word about any other (= pre-plugged)
CPUs (more precisely, Jiewen even confirmed "No impact to other
processors"), so I didn't assume that other CPUs had entered SMM.
Paolo, I've attempted to read Jiewen's response, and yours, as carefully
as I can. I'm still very confused. If you have a better understanding,
could you please write up the 15-step process from the thread starter
again, with all QEMU customizations applied? Such as, unnecessary steps
removed, and platform specifics filled in.
Sure.
(01a) QEMU: create new CPU. The CPU already exists, but it does not
start running code until unparked by the CPU hotplug controller.
(01b) QEMU: trigger SCI
(02-03) no equivalent
(04) Host CPU: (OS) execute GPE handler from DSDT
(05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
will not enter SMM because SMI is disabled)
(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
rebase code.
(07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
new CPU
(07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
[Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no
restriction that INIT/SIPI/SIPI can only be sent in SMM.
All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded
before 07a, so this is okay.
[Jiewen] May I know why INIT/SIPI/SIPI is discarded before 07a but is delivered at 07a?
I don't see any extra step between 06 and 07a.
What is the magic here?
The magic is 07a itself, IIUC. The CPU hotplug controller would be
accessible only in SMM. And until 07a happens, the new CPU ignores
INIT/SIPI/SIPI even if another CPU sends it those, simply because QEMU
would implement the new CPU's behavior like that.
However I do see a problem, because a PCI device's DMA could overwrite
0x38000 between (06) and (10) and hijack the code that is executed in
SMM. How is this avoided on real hardware? By the time the new CPU
enters SMM, it doesn't run off cache-as-RAM anymore.
[Jiewen] Interesting question.
I don't think the DMA attack is considered in the threat model for the virtual environment. We only list the adversaries below:
-- Adversary: System Software Attacker, who can control any OS memory or silicon register from OS level, or read write BIOS data.
-- Adversary: Simple hardware attacker, who can hot add or hot remove a CPU.
We do have physical PCI(e) device assignment; sorry for not highlighting
that earlier. That feature (VFIO) does rely on the (physical) IOMMU, and
it makes sure that the assigned device can only access physical frames
that belong to the virtual machine that the device is assigned to.
However, as far as I know, VFIO doesn't try to restrict PCI DMA to
subsets of guest RAM... I could be wrong about that, I vaguely recall
RMRR support, which seems somewhat related.
I agree it is a threat from real hardware perspective. SMM may check VTd to make sure the 38000 is blocked.
I doubt if it is a threat in virtual environment. Do we have a way to block DMA in virtual environment?
I think that would be a VFIO feature.
Alex: if we wanted to block PCI(e) DMA to a specific part of guest RAM
(expressed with guest-physical RAM addresses), perhaps permanently,
perhaps just for a while -- not sure about coordination though --, could
VFIO accommodate that (I guess by "punching holes" in the IOMMU page
tables)?
It depends. For starters, the vfio mapping API does not allow
unmapping arbitrary sub-ranges of previous mappings. So the hole you
want to punch would need to be independently mapped. From there you
get into the issue of whether this range is a potential DMA target. If
it is, then this is the path to data corruption. We cannot interfere
with the operation of the device and we have little to no visibility of
active DMA targets.
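Alex's point that sub-ranges of an existing mapping cannot be unmapped implies that the hole has to be carved out before the mappings are created. As a minimal sketch, assuming the reserved range is known in advance, a RAM region can be split so that the hole never becomes part of any DMA mapping. `struct dma_range` and `split_around_hole()` are hypothetical names for illustration; the code only computes ranges and does not call any VFIO interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest-physical range [start, end), end exclusive. */
struct dma_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Split the RAM range `ram` into at most two ranges that do not overlap
 * `hole`. Each output range would be mapped independently, so the hole is
 * never part of any IOMMU mapping and never has to be "punched" later.
 * Returns the number of ranges written to `out` (0, 1 or 2).
 */
static size_t split_around_hole(struct dma_range ram, struct dma_range hole,
                                struct dma_range out[2])
{
    size_t n = 0;

    assert(ram.start <= ram.end && hole.start <= hole.end);

    /* No overlap: the RAM range is mapped unchanged. */
    if (hole.end <= ram.start || hole.start >= ram.end) {
        if (ram.start < ram.end)
            out[n++] = ram;
        return n;
    }
    /* Part of RAM below the hole. */
    if (ram.start < hole.start)
        out[n++] = (struct dma_range){ ram.start, hole.start };
    /* Part of RAM above the hole. */
    if (hole.end < ram.end)
        out[n++] = (struct dma_range){ hole.end, ram.end };
    return n;
}
```

Mapping each returned range separately means nothing ever has to be unmapped from the middle of a larger mapping; whether the hole could be a legitimate DMA target remains the open question Alex raises.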
If we're talking about RAM that is never a DMA target, perhaps e820
reserved memory, then we can make sure certain MemoryRegions are
skipped when mapped by QEMU and would expect the guest to never map
them through a vIOMMU as well. Maybe then it's a question of where
we're trying to provide security (it might be more difficult if QEMU
needs to sanitize vIOMMU mappings to actively prevent mapping
reserved areas).
Is there anything unique about the VM case here? Bare metal SMM needs
to be concerned about protecting itself from I/O devices that operate
outside of the realm of SMM mode as well, right? Is something "simple"
like an AddressSpace switch necessary here, such that an I/O device
always has a mapping to a safe guest RAM page while the vCPU
AddressSpace can switch to some protected page? The IOMMU and vCPU
mappings don't need to be the same. The vCPU is more under our control
than the assigned device.
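The AddressSpace idea can be illustrated with a toy model in which one guest-physical page has two backings: the vCPU sees a protected copy while in SMM, and the device always sees a separate safe page. Everything here (`struct split_page`, `cpu_view()`, `dma_view()`) is hypothetical and is not QEMU's AddressSpace/MemoryRegion API; it only demonstrates the invariant that a DMA write cannot reach the code the SMM vCPU executes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical backing for one guest-physical page (e.g. the one at 0x38000). */
struct split_page {
    uint8_t protected_page[PAGE_SIZE]; /* seen by the vCPU while in SMM       */
    uint8_t safe_page[PAGE_SIZE];      /* seen by DMA and by non-SMM vCPUs    */
};

/* vCPU view: which backing is used depends on whether the vCPU is in SMM. */
static uint8_t *cpu_view(struct split_page *p, bool in_smm)
{
    return in_smm ? p->protected_page : p->safe_page;
}

/* Device (IOMMU) view: always the safe page, regardless of any vCPU state. */
static uint8_t *dma_view(struct split_page *p)
{
    return p->safe_page;
}
```

Because the device view never switches, a malicious DMA write lands in the safe page, while the SMM vCPU keeps executing from the protected one.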
FWIW, RMRRs are a VT-d specific mechanism to define an address range as
persistently, identity mapped for one or more devices. IOW, the device
would always map that range. I don't think that's what you're after
here. RMRRs are also an abomination that I hope we never find a
requirement for in a VM. Thanks,
Alex
Paolo Bonzini <pbonzini@...>
On 17/08/19 02:20, Yao, Jiewen wrote:
[Jiewen] That is OK. Then we MUST add the third adversary.
-- Adversary: Simple hardware attacker, who can use device to perform DMA attack in the virtual world.
NOTE: The DMA attack in the real world is out of scope. That is handled by IOMMU in the real world, such as VTd. -- Please do clarify if this is TRUE.
In the real world:
#1: the SMM MUST be non-DMA capable region.
#2: the MMIO MUST be non-DMA capable region.
#3: the stolen memory MIGHT be DMA capable region or non-DMA capable
region. It depends upon the silicon design.
#4: the normal OS accessible memory - including ACPI reclaim, ACPI
NVS, and reserved memory not included by #3 - MUST be DMA capable region.
As such, IOMMU protection is NOT required for #1 and #2. IOMMU
protection MIGHT be required for #3 and MUST be required for #4.
I assume the virtual environment is designed in the same way. Please
correct me if I am wrong.
Correct. The 0x30000...0x3ffff area is the only problematic one;
Igor's idea (or a variant, for example optionally remapping
0xa0000..0xaffff SMRAM to 0x30000) is becoming more and more attractive.
Paolo
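Jiewen's four-way classification of real-world memory can be written down as a small lookup. The enum and function names are illustrative and not taken from any specification. Outside SMM, the 0x30000..0x3ffff default SMBASE area is ordinary OS memory (case #4), which is why it is the problematic range.

```c
#include <assert.h>

/* The four region classes from the list above (names are illustrative). */
enum region_kind {
    REGION_SMRAM,      /* #1: MUST be non-DMA capable                      */
    REGION_MMIO,       /* #2: MUST be non-DMA capable                      */
    REGION_STOLEN,     /* #3: DMA capability depends on the silicon design */
    REGION_OS_MEMORY   /* #4: incl. ACPI reclaim/NVS and reserved memory   */
};

enum iommu_need { IOMMU_NOT_REQUIRED, IOMMU_MAYBE_REQUIRED, IOMMU_REQUIRED };

/* Whether IOMMU protection is needed for a region class, per the list. */
static enum iommu_need iommu_protection(enum region_kind kind)
{
    switch (kind) {
    case REGION_SMRAM:
    case REGION_MMIO:
        return IOMMU_NOT_REQUIRED;   /* not DMA capable in the first place */
    case REGION_STOLEN:
        return IOMMU_MAYBE_REQUIRED; /* silicon specific                   */
    case REGION_OS_MEMORY:
        return IOMMU_REQUIRED;
    }
    assert(0 && "unknown region kind");
    return IOMMU_REQUIRED;
}
```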
Igor Mammedov <imammedo@...>
On Thu, 15 Aug 2019 17:00:16 +0200
Laszlo Ersek <lersek@...> wrote:
On 08/14/19 16:04, Paolo Bonzini wrote:
On 14/08/19 15:20, Yao, Jiewen wrote:
I was going through the steps Jiewen and Yingwen recommended.
- Does this part require a new branch somewhere in the OVMF SEC code?
How do we determine whether the CPU executing SEC is BSP or
hot-plugged AP?
[Jiewen] I think this is blocked from hardware perspective, since the first instruction.
There are some hardware specific registers can be used to determine if the CPU is new added.
I don’t think this must be same as the real hardware.
You are free to invent some registers in device model to be used in OVMF hot plug driver.
Yes, this would be a new operation mode for QEMU, that only applies to
hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in
fact it doesn't reply to anything at all.
- How do we tell the hot-plugged AP where to start execution? (I.e. that
it should execute code at a particular pflash location.)
[Jiewen] Same real mode reset vector at FFFF:FFF0.
You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
QEMU. The AP does not start execution at all when it is unplugged, so
no cache-as-RAM etc.
We only need to modify QEMU so that hot-plugged APs do not reply to
INIT/SIPI/SMI.
I don’t think there is problem for real hardware, who always has CAR.
Can QEMU provide some CPU specific space, such as MMIO region?
Why is a CPU-specific region needed if every other processor is in SMM
and thus trusted.
In step (02), the new CPU is expected to set up RAM access. In step
(03), the new CPU, executing code from flash, is expected to "send board
message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
message." For that action, the new CPU may need a stack (minimally if we
want to use C function calls).
Until step (03), there had been no word about any other (= pre-plugged)
CPUs (more precisely, Jiewen even confirmed "No impact to other
processors"), so I didn't assume that other CPUs had entered SMM.
Paolo, I've attempted to read Jiewen's response, and yours, as carefully
as I can. I'm still very confused. If you have a better understanding,
could you please write up the 15-step process from the thread starter
again, with all QEMU customizations applied? Such as, unnecessary steps
removed, and platform specifics filled in.
One more comment below:
Does CPU hotplug apply only at the socket level? If the CPU is
multi-core, what is responsible for hot-plugging all cores present in
the socket?
I can answer this: the SMM handler would interact with the hotplug
controller in the same way that ACPI DSDT does normally. This supports
multiple hotplugs already.
Writes to the hotplug controller from outside SMM would be ignored.
(03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI)
-- I am waiting for hot-add message.
Maybe we can simplify this in QEMU by broadcasting an SMI to existent
processors immediately upon plugging the new CPU.
The QEMU DSDT could be modified (when secure boot is in effect) to OUT
to 0xB2 when hotplug happens. It could write a well-known value to
0xB2, to be read by an SMI handler in edk2.
(My comment below is general, and may not apply to this particular
situation. I'm too confused to figure that out myself, sorry!)
I dislike involving QEMU's generated DSDT in anything SMM (even
injecting the SMI), because the AML interpreter runs in the OS.
If a malicious OS kernel is a bit too enlightened about the DSDT, it
could willfully diverge from the process that we design. If QEMU
broadcast the SMI internally, the guest OS could not interfere with that.
If the purpose of the SMI is specifically to force all CPUs into SMM
(and thereby force them into trusted state), then the OS would be
explicitly counter-interested in carrying out the AML operations from
QEMU's DSDT.
It shouldn't matter where the management SMI comes from if the OS won't be able
to actually trigger an SMI with untrusted content at SMBASE on the hotplugged (parked) CPU.
The worst that could happen is that the new CPU will stay parked.
I'd be OK with an SMM / SMI involvement in QEMU's DSDT if, by diverging
from that DSDT, the OS kernel could only mess with its own state, and
not with the firmware's.
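Laszlo's condition (diverging from the DSDT should only hurt the OS's own state) and Paolo's answer (the hotplug controller is reachable only from SMM, so following the DSDT's 0xB2 write is the only way in) can be sketched from the SMI handler's side. `SMI_CMD_CPU_HOTPLUG` and `struct hotplug_ctrl` are hypothetical placeholders; the actual command value and controller interface were not decided in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical well-known value the DSDT would OUT to port 0xB2. */
#define SMI_CMD_CPU_HOTPLUG 0x04u   /* placeholder, not an agreed value */

/* Hypothetical view of the SMM-only hotplug controller state. */
struct hotplug_ctrl {
    unsigned pending;    /* CPUs plugged but still parked */
    unsigned relocated;  /* CPUs whose SMBASE was rebased */
};

/*
 * Called in SMM with the value read back from port 0xB2. Only the agreed
 * command triggers the hotplug work; if the OS skips or alters the write,
 * nothing happens and the new CPUs simply stay parked.
 */
static bool smi_dispatch(uint8_t apm_cnt_value, struct hotplug_ctrl *hc)
{
    if (apm_cnt_value != SMI_CMD_CPU_HOTPLUG)
        return false;
    while (hc->pending > 0) {   /* several hotplugs can be handled per SMI */
        hc->pending--;
        hc->relocated++;        /* steps (06)-(12) would happen here       */
    }
    return true;
}
```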
Thanks
Laszlo
(NOTE: Host CPU can only send instruction in SMM mode. -- The register is SMM only)
Sorry, I don't follow -- what register are we talking about here, and
why is the BSP needed to send anything at all? What "instruction" do you
have in mind?
[Jiewen] The new CPU does not enable SMI at reset.
At some point of time later, the CPU need enable SMI, right?
The "instruction" here means, the host CPUs need tell to CPU to enable SMI.
Right, this would be a write to the CPU hotplug controller.
(04) Host CPU: (OS) get message from board that a new CPU is added.
(GPIO -> SCI)
I don't understand the OS involvement here. But, again, perhaps QEMU can
force all existent CPUs into SMM immediately upon adding the new CPU.
[Jiewen] OS here means the Host CPU running code in OS environment, not in SMM environment.
See above.
(05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
will not enter CPU because SMI is disabled)
(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
rebase code.
(07) Host CPU: (SMM) Send message to New CPU to Enable SMI.
Aha, so this is the SMM-only register you mention in step (03). Is the
register specified in the Intel SDM?
[Jiewen] Right. That is the register to let host CPU tell new CPU to enable SMI.
It is platform specific register. Not defined in SDM.
You may invent one in device model.
See above.
(10) New CPU: (SMM) Response first SMI at 38000, and rebase SMBASE to
TSEG.
What code does the new CPU execute after it completes step (10)? Does it
halt?
[Jiewen] The new CPU exits SMM and return to original place - where it is
interrupted to enter SMM - running code on the flash.
So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and (07).
(11) Host CPU: (SMM) Restore 38000.
These steps (i.e., (06) through (11)) don't appear RAS-specific. The
only platform-specific feature seems to be SMI masking register, which
could be extracted into a new SmmCpuFeaturesLib API.
Thus, would you please consider open sourcing firmware code for steps
(06) through (11)?
Alternatively -- and in particular because the stack for step (01)
concerns me --, we could approach this from a high-level, functional
perspective. The states that really matter are the relocated SMBASE for
the new CPU, and the state of the full system, right at the end of step
(11).
When the SMM setup quiesces during normal firmware boot, OVMF could
use existent (finalized) SMBASE information to *pre-program* some virtual
QEMU hardware, with such state that would be expected, as "final" state,
of any new hotplugged CPU. Afterwards, if / when the hotplug actually
happens, QEMU could blanket-apply this state to the new CPU, and
broadcast a hardware SMI to all CPUs except the new one.
I'd rather avoid this and stay as close as possible to real hardware.
Paolo
Paolo Bonzini <pbonzini@...>
On 15/08/19 17:00, Laszlo Ersek wrote:
(01a) QEMU: create new CPU. The CPU already exists, but it does not
start running code until unparked by the CPU hotplug controller.
(01b) QEMU: trigger SCI
(02-03) no equivalent
(04) Host CPU: (OS) execute GPE handler from DSDT
(05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
will not enter CPU because SMI is disabled)
(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
rebase code.
(07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
new CPU
(07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
(08a) New CPU: (Low RAM) Enter protected mode.
(08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
(09) Host CPU: (SMM) Send SMI to the new CPU only.
(10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
TSEG.
(11) Host CPU: (SMM) Restore 38000.
(12) Host CPU: (SMM) Update located data structure to add the new CPU
information. (This step will involve CPU_SERVICE protocol)
(13) New CPU: (Flash) do whatever other initialization is needed
(14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
(15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.
In other words, the cache-as-RAM phase of 02-03 is replaced by the
INIT-SIPI-SIPI sequence of 07b-08a-08b.
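The revised sequence relies on QEMU keeping the new CPU parked until a hotplug controller write that is honoured only from SMM (07a). As a hedged sketch, assuming QEMU implemented the behaviour described in this thread, the new CPU could be modelled as a small state machine. The state names and functions are hypothetical; the point is that INIT/SIPI/SIPI or an SMI arriving before (07a), or a controller write from outside SMM, leaves the CPU parked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a hot-plugged vCPU in the proposed QEMU mode. */
enum new_cpu_state {
    CPU_PARKED,     /* (01a): exists, runs nothing, ignores INIT/SIPI/SMI */
    CPU_UNPARKED,   /* (07a): released by the hotplug controller          */
    CPU_RUNNING,    /* (07b)-(08b): started by INIT/SIPI/SIPI, cli;hlt    */
    CPU_RELOCATED   /* (10): SMBASE rebased to TSEG                       */
};

struct new_cpu {
    enum new_cpu_state state;
};

/* (07a) Writes to the hotplug controller are honoured only from SMM. */
static void hotplug_ctrl_unpark(struct new_cpu *c, bool writer_in_smm)
{
    if (writer_in_smm && c->state == CPU_PARKED)
        c->state = CPU_UNPARKED;
}

/* (07b) INIT/SIPI/SIPI is ignored while the CPU is still parked. */
static void send_init_sipi_sipi(struct new_cpu *c)
{
    if (c->state == CPU_UNPARKED)
        c->state = CPU_RUNNING;
}

/* (09)-(10) The directed SMI runs the relocation code at 0x38000. */
static void send_smi(struct new_cpu *c)
{
    if (c->state == CPU_RUNNING)
        c->state = CPU_RELOCATED;
}
```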
But since the hotplug controller would only be accessible from SMM,
there would be no other way to invoke it than to follow the DSDT's
instruction and write to 0xB2. FWIW, real hardware also has plenty of
0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store
access).
Paolo
Paolo Bonzini <pbonzini@...>
On 19/08/19 01:00, Yao, Jiewen wrote:
in real world, we deprecate AB-seg usage because they are vulnerable
to smm cache poison attack. I assume cache poison is out of scope in
the virtual world, or there is a way to prevent ABseg cache poison.
Indeed the SMRR would not cover the A-seg on real hardware. However, if
the chipset allowed aliasing A-seg SMRAM to 0x30000, it would only be
used for SMBASE relocation of hotplugged CPU. The firmware would still
keep low SMRAM disabled, *except around SMBASE relocation of hotplugged
CPUs*. To avoid cache poisoning attacks, you only have to issue a
WBINVD before enabling low SMRAM and before disabling it. Hotplug SMI
is not a performance-sensitive path, so it's not a big deal.
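Paolo's rule for the aliased A-seg (flush caches with WBINVD before opening and before closing low SMRAM, and keep it closed otherwise) can be checked mechanically. This sketch only models the ordering: the step names and both functions are hypothetical, and nothing here executes WBINVD or programs a chipset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lowsmram_step {
    STEP_WBINVD,
    STEP_OPEN_LOW_SMRAM,   /* enable the A-seg alias at 0x30000 */
    STEP_RELOCATE,         /* SMBASE relocation of the new CPU  */
    STEP_CLOSE_LOW_SMRAM
};

/* Produce the sequence described above; `cap` must hold all steps. */
static size_t lowsmram_relocation_plan(enum lowsmram_step out[], size_t cap)
{
    static const enum lowsmram_step plan[] = {
        STEP_WBINVD, STEP_OPEN_LOW_SMRAM, STEP_RELOCATE,
        STEP_WBINVD, STEP_CLOSE_LOW_SMRAM
    };
    size_t n = sizeof plan / sizeof plan[0];

    assert(cap >= n);
    for (size_t i = 0; i < n; i++)
        out[i] = plan[i];
    return n;
}

/* Every open/close must directly follow a WBINVD, relocation must happen
 * only while low SMRAM is open, and the sequence must end with it closed. */
static bool plan_is_safe(const enum lowsmram_step *s, size_t n)
{
    bool open = false;

    for (size_t i = 0; i < n; i++) {
        switch (s[i]) {
        case STEP_OPEN_LOW_SMRAM:
        case STEP_CLOSE_LOW_SMRAM:
            if (i == 0 || s[i - 1] != STEP_WBINVD)
                return false;          /* transition without a flush */
            open = (s[i] == STEP_OPEN_LOW_SMRAM);
            break;
        case STEP_RELOCATE:
            if (!open)
                return false;          /* relocation needs low SMRAM */
            break;
        case STEP_WBINVD:
            break;
        }
    }
    return !open;
}
```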
So I guess you agree that PCI DMA attacks are a potential vector also on
real hardware. As Alex pointed out, VT-d is not a solution because
there could be legitimate DMA happening during CPU hotplug. For OVMF
we'll probably go with Igor's idea, it would be nice if Intel chipsets
supported it too. :)
Paolo
Paolo Bonzini <pbonzini@...>
On 16/08/19 04:46, Yao, Jiewen wrote:
Comment below:
-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Friday, August 16, 2019 12:21 AM
To: Laszlo Ersek <lersek@...>; devel@edk2.groups.io; Yao, Jiewen
<jiewen.yao@...>
Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
<qemu-devel@...>; Igor Mammedov <imammedo@...>;
Chen, Yingwen <yingwen.chen@...>; Nakajima, Jun
<jun.nakajima@...>; Boris Ostrovsky <boris.ostrovsky@...>;
Joao Marcal Lemos Martins <joao.m.martins@...>; Phillip Goerl
<phillip.goerl@...>
Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
On 15/08/19 17:00, Laszlo Ersek wrote:
On 08/14/19 16:04, Paolo Bonzini wrote:
On 14/08/19 15:20, Yao, Jiewen wrote:
I was going through the steps Jiewen and Yingwen recommended.
- Does this part require a new branch somewhere in the OVMF SEC code?
How do we determine whether the CPU executing SEC is BSP or
hot-plugged AP?
[Jiewen] I think this is blocked from hardware perspective, since the first instruction.
There are some hardware specific registers can be used to determine if the CPU is new added.
I don’t think this must be same as the real hardware.
You are free to invent some registers in device model to be used in OVMF hot plug driver.
Yes, this would be a new operation mode for QEMU, that only applies to
hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in
fact it doesn't reply to anything at all.
- How do we tell the hot-plugged AP where to start execution? (I.e. that
it should execute code at a particular pflash location.)
[Jiewen] Same real mode reset vector at FFFF:FFF0.
You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
QEMU. The AP does not start execution at all when it is unplugged, so
no cache-as-RAM etc.
We only need to modify QEMU so that hot-plugged APs do not reply to
INIT/SIPI/SMI.
I don’t think there is problem for real hardware, who always has CAR.
Can QEMU provide some CPU specific space, such as MMIO region?
Why is a CPU-specific region needed if every other processor is in SMM
and thus trusted.
In step (02), the new CPU is expected to set up RAM access. In step
(03), the new CPU, executing code from flash, is expected to "send board
message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
message." For that action, the new CPU may need a stack (minimally if we
want to use C function calls).
Until step (03), there had been no word about any other (= pre-plugged)
CPUs (more precisely, Jiewen even confirmed "No impact to other
processors"), so I didn't assume that other CPUs had entered SMM.
Paolo, I've attempted to read Jiewen's response, and yours, as carefully
as I can. I'm still very confused. If you have a better understanding,
could you please write up the 15-step process from the thread starter
again, with all QEMU customizations applied? Such as, unnecessary steps
removed, and platform specifics filled in.
Sure.
(01a) QEMU: create new CPU. The CPU already exists, but it does not
start running code until unparked by the CPU hotplug controller.
(01b) QEMU: trigger SCI
(02-03) no equivalent
(04) Host CPU: (OS) execute GPE handler from DSDT
(05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
will not enter CPU because SMI is disabled)
(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
rebase code.
(07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
new CPU
(07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
[Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no
restriction that INIT/SIPI/SIPI can only be sent in SMM.
All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded
before 07a, so this is okay.
However I do see a problem, because a PCI device's DMA could overwrite
0x38000 between (06) and (10) and hijack the code that is executed in
SMM. How is this avoided on real hardware? By the time the new CPU
enters SMM, it doesn't run off cache-as-RAM anymore.
Paolo
(08a) New CPU: (Low RAM) Enter protected mode.
[Jiewen] NOTE: The new CPU still cannot use any physical memory, because
the INIT/SIPI/SIPI may be sent by malicious CPU in non-SMM environment.
(08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.
[Jiewen] I am OK with this proposal.
(09) Host CPU: (SMM) Send SMI to the new CPU only.
(10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
TSEG.
(11) Host CPU: (SMM) Restore 38000.
(12) Host CPU: (SMM) Update located data structure to add the new CPU
information. (This step will involve CPU_SERVICE protocol)
(13) New CPU: (Flash) do whatever other initialization is needed
(14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.
(15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.
In other words, the cache-as-RAM phase of 02-03 is replaced by the
INIT-SIPI-SIPI sequence of 07b-08a-08b.
I think the rule is same - the new CPU CANNOT touch any system memory,
no matter it is from reset-vector or from INIT/SIPI/SIPI.
Or I would say: if the new CPU want to touch some memory before first SMI, the memory should be
CPU specific or on the flash.
The QEMU DSDT could be modified (when secure boot is in effect) to OUT
to 0xB2 when hotplug happens. It could write a well-known value to
0xB2, to be read by an SMI handler in edk2.
I dislike involving QEMU's generated DSDT in anything SMM (even
injecting the SMI), because the AML interpreter runs in the OS.
If a malicious OS kernel is a bit too enlightened about the DSDT, it
could willfully diverge from the process that we design. If QEMU
broadcast the SMI internally, the guest OS could not interfere with that.
If the purpose of the SMI is specifically to force all CPUs into SMM
(and thereby force them into trusted state), then the OS would be
explicitly counter-interested in carrying out the AML operations from
QEMU's DSDT.
But since the hotplug controller would only be accessible from SMM,
there would be no other way to invoke it than to follow the DSDT's
instruction and write to 0xB2. FWIW, real hardware also has plenty of
0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store
access).
Paolo
Michael D Kinney
Could we have an initial SMBASE that is within TSEG.
If we bring in hot plug CPUs one at a time, then initial
SMBASE in TSEG can reprogram the SMBASE to the correct
value for that CPU.
Can we add a register to the hot plug controller that
allows the BSP to set the initial SMBASE value for
a hot added CPU? The default can be 3000:8000 for
compatibility.
Another idea is when the SMI handler runs for a hot add
CPU event, the SMM monarch programs the hot plug controller
register with the SMBASE to use for the CPU that is being
added. As each CPU is added, a different SMBASE value can
be programmed by the SMM Monarch.
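For reference, the 3000:8000 default is a real-mode address: 0x3000 * 16 + 0x8000 = 0x38000, i.e. SMBASE 0x30000 with the SMI entry point at SMBASE + 0x8000. The sketch below adds the register Mike proposes, assuming (hypothetically) that it is writable only from SMM and defaults to 0x30000; the struct and function names, and the TSEG address used in the test, are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SMM_HANDLER_OFFSET 0x8000u   /* SMI entry point is SMBASE + 0x8000 */
#define DEFAULT_SMBASE     0x30000u  /* 3000:8000, entry at 0x38000        */

/* Real-mode segment:offset to a linear address. */
static uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Hypothetical hotplug-controller register holding the initial SMBASE
 * for the next hot-added CPU. */
struct hotplug_smbase_reg {
    uint32_t initial_smbase;
};

/* Only the SMM monarch may program the register; other writes are dropped. */
static bool write_initial_smbase(struct hotplug_smbase_reg *r,
                                 uint32_t smbase, bool writer_in_smm)
{
    if (!writer_in_smm)
        return false;
    r->initial_smbase = smbase;
    return true;
}

/* Address the new CPU would start executing at on its first SMI. */
static uint32_t first_smi_entry(const struct hotplug_smbase_reg *r)
{
    return r->initial_smbase + SMM_HANDLER_OFFSET;
}
```

With an initial SMBASE inside TSEG, the first SMI of the new CPU would never execute from the DMA-reachable 0x38000 page.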
Mike
-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Wednesday, August 21, 2019 10:06 AM
To: Kinney, Michael D <michael.d.kinney@...>;
rfc@edk2.groups.io; Yao, Jiewen <jiewen.yao@...>
Cc: Alex Williamson <alex.williamson@...>; Laszlo
Ersek <lersek@...>; devel@edk2.groups.io; qemu
devel list <qemu-devel@...>; Igor Mammedov
<imammedo@...>; Chen, Yingwen
<yingwen.chen@...>; Nakajima, Jun
<jun.nakajima@...>; Boris Ostrovsky
<boris.ostrovsky@...>; Joao Marcal Lemos Martins
<joao.m.martins@...>; Phillip Goerl
<phillip.goerl@...>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
SMM with QEMU+OVMF
On 21/08/19 17:48, Kinney, Michael D wrote:
Perhaps there is a way to avoid the 3000:8000 startup
vector.
If a CPU is added after a cold reset, it is already in a
different state because one of the active CPUs needs to
release it by interacting with the hot plug controller.
Can the SMRR for CPUs in that state be pre-programmed to
match the SMRR in the rest of the active CPUs?
For OVMF we expect all the active CPUs to use the same
SMRR value, so a check can be made to verify that all
the active CPUs have the same SMRR value. If they do,
then any CPU released through the hot plug controller
can have its SMRR pre-programmed and the initial SMI
will start within TSEG.
We just need to decide what to do in the unexpected
case where all the active CPUs do not have the same
SMRR value.
This should also reduce the total number of steps.
The problem is not the SMRR but the SMBASE. If the
SMBASE area is outside TSEG, it is vulnerable to DMA
attacks independent of the SMRR.
SMBASE is also different for all CPUs, so it cannot be
preprogrammed.
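The two positions can be contrasted in code: Mike's check that every active CPU reports the same SMRR (so it could be copied to a hot-added CPU), and Paolo's point that SMBASE is distinct per CPU and therefore has no single value to copy. The struct, field names, and values are illustrative only and do not reflect the actual SMRR MSR bit layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-CPU SMM state relevant here (illustrative fields). */
struct cpu_smm_state {
    uint64_t smrr_base;
    uint64_t smrr_mask;
    uint32_t smbase;
};

/* Mike's check: all active CPUs share one SMRR, so it could be copied to a
 * hot-added CPU. Returns false in the "unexpected" mismatch case. */
static bool smrr_uniform(const struct cpu_smm_state *cpus, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (cpus[i].smrr_base != cpus[0].smrr_base ||
            cpus[i].smrr_mask != cpus[0].smrr_mask)
            return false;
    }
    return true;
}

/* Paolo's objection: SMBASE differs per CPU, so there is nothing uniform
 * to pre-program; this verifies that no two CPUs share an SMBASE. */
static bool smbase_distinct(const struct cpu_smm_state *cpus, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (cpus[i].smbase == cpus[j].smbase)
                return false;
    return true;
}
```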
(As an aside, virt platforms are also immune to cache
poisoning so they don't have SMRR yet - we could use
them for SMM_CODE_CHK_EN and block execution outside
SMRR but we never got round to it).
An even simpler alternative would be to make A0000h the
initial SMBASE.
However, I would like to understand what hardware
platforms plan to do, if anything.
Paolo
Mike
-----Original Message-----
From: rfc@edk2.groups.io [mailto:rfc@edk2.groups.io] On Behalf Of Yao, Jiewen
Sent: Sunday, August 18, 2019 4:01 PM
To: Paolo Bonzini <pbonzini@...>
Cc: Alex Williamson <alex.williamson@...>; Laszlo Ersek
<lersek@...>; devel@edk2.groups.io; edk2-rfc-groups-io
<rfc@edk2.groups.io>; qemu devel list <qemu-devel@...>; Igor
Mammedov <imammedo@...>; Chen, Yingwen
<yingwen.chen@...>; Nakajima, Jun <jun.nakajima@...>;
Boris Ostrovsky <boris.ostrovsky@...>; Joao Marcal Lemos
Martins <joao.m.martins@...>; Phillip Goerl
<phillip.goerl@...>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with
QEMU+OVMF
in real world, we deprecate AB-seg usage because they are vulnerable
to smm cache poison attack.
I assume cache poison is out of scope in the virtual world, or there
is a way to prevent ABseg cache poison.
thank you!
Yao, Jiewen
On Aug 19, 2019, at 3:50 AM, Paolo Bonzini <pbonzini@...> wrote:
On 17/08/19 02:20, Yao, Jiewen wrote:
[Jiewen] That is OK. Then we MUST add the third adversary.
-- Adversary: Simple hardware attacker, who can use device to perform DMA attack in the virtual world.
NOTE: The DMA attack in the real world is out of scope. That is handled by IOMMU in the real world, such as VTd. -- Please do clarify if this is TRUE.
In the real world:
#1: the SMM MUST be non-DMA capable region.
#2: the MMIO MUST be non-DMA capable region.
#3: the stolen memory MIGHT be DMA capable region or non-DMA capable
region. It depends upon the silicon design.
#4: the normal OS accessible memory - including ACPI reclaim, ACPI
NVS, and reserved memory not included by #3 - MUST be DMA capable region.
As such, IOMMU protection is NOT required for #1 and #2. IOMMU
protection MIGHT be required for #3 and MUST be required for #4.
I assume the virtual environment is designed in the same way. Please
correct me if I am wrong.
Correct. The 0x30000...0x3ffff area is the only problematic one;
Igor's idea (or a variant, for example optionally remapping
0xa0000..0xaffff SMRAM to 0x30000) is becoming more and more attractive.
Paolo
Michael D Kinney
Paolo,
It makes sense to match real HW. That puts us back to
the reset vector and handling the initial SMI at
3000:8000. That is all workable from a FW implementation
perspective. It looks like the only issue left is DMA.
DMA protection of memory ranges is a chipset feature.
For the current QEMU implementation, what ranges of
memory are guaranteed to be protected from DMA? Is
it only A/B seg and TSEG?
Thanks,
Mike
-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@...]
Sent: Wednesday, August 21, 2019 10:40 AM
To: Kinney, Michael D <michael.d.kinney@...>;
rfc@edk2.groups.io; Yao, Jiewen <jiewen.yao@...>
Cc: Alex Williamson <alex.williamson@...>; Laszlo
Ersek <lersek@...>; devel@edk2.groups.io; qemu
devel list <qemu-devel@...>; Igor Mammedov
<imammedo@...>; Chen, Yingwen
<yingwen.chen@...>; Nakajima, Jun
<jun.nakajima@...>; Boris Ostrovsky
<boris.ostrovsky@...>; Joao Marcal Lemos Martins
<joao.m.martins@...>; Phillip Goerl
<phillip.goerl@...>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
SMM with QEMU+OVMF
On 21/08/19 19:25, Kinney, Michael D wrote:
Could we have an initial SMBASE that is within TSEG.
If we bring in hot plug CPUs one at a time, then
initial SMBASE in TSEG can reprogram the SMBASE to the
correct value for that CPU.
Can we add a register to the hot plug controller that
allows the BSP to set the initial SMBASE value for a
hot added CPU? The default can be 3000:8000 for
compatibility.
Another idea is when the SMI handler runs for a hot
add CPU event, the SMM monarch programs the hot plug
controller register with the SMBASE to use for the CPU
that is being added. As each CPU is added, a different
SMBASE value can be programmed by the SMM Monarch.
Yes, all of these would work. Again, I'm interested in
having something that has a hope of being implemented in
real hardware.
Another, far easier to implement possibility could be a
lockable MSR (could be the existing
MSR_SMM_FEATURE_CONTROL) that allows programming the
SMBASE outside SMM. It would be nice if such a bit
could be defined by Intel.
Paolo
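Mike's hot plug controller register and Paolo's lockable MSR share one mechanism: a value holding the SMBASE for the next hot-added CPU, with a compatibility default and a lock. The following is a purely hypothetical model. No such register or MSR is defined by QEMU or by Intel; MSR_SMM_FEATURE_CONTROL is only named in the thread as a possible home, and the fields below are invented. It assumes two rules: writes from SMM (e.g. by the SMM monarch handling a hot-add event) are always accepted, and writes from outside SMM are rejected once the lock bit is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL register model; not an existing QEMU device or Intel MSR. */
#define DEFAULT_SMBASE 0x30000u  /* architectural reset value (3000:8000 entry) */

struct hotplug_smbase_reg {
    uint32_t next_smbase; /* SMBASE the next released CPU will start with */
    bool     locked;      /* once set, stays set until platform reset */
};

/* Platform reset: compatibility default, unlocked. */
static void reg_reset(struct hotplug_smbase_reg *r)
{
    assert(r != NULL);
    r->next_smbase = DEFAULT_SMBASE;
    r->locked = false;
}

/* Writes from SMM are always accepted; writes from outside SMM are
 * rejected (returns false, value unchanged) once the lock is set. */
static bool reg_write_smbase(struct hotplug_smbase_reg *r, uint32_t smbase,
                             bool in_smm)
{
    if (r->locked && !in_smm)
        return false;
    r->next_smbase = smbase;
    return true;
}

/* Set by firmware once SMM setup is finalized. */
static void reg_lock(struct hotplug_smbase_reg *r)
{
    r->locked = true;
}
```

The lock is what would make "programming the SMBASE outside SMM" safe. The firmware sets the value during boot and then locks it, and a later (possibly malicious) OS can no longer redirect a hot-added CPU's first SMI.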
Laszlo Ersek
On 08/21/19 17:48, Kinney, Michael D wrote:
Perhaps there is a way to avoid the 3000:8000 startup
vector.
If a CPU is added after a cold reset, it is already in a
different state because one of the active CPUs needs to
release it by interacting with the hot plug controller.
Can the SMRR for CPUs in that state be pre-programmed to
match the SMRR in the rest of the active CPUs?
For OVMF we expect all the active CPUs to use the same
SMRR value, so a check can be made to verify that all
the active CPUs have the same SMRR value. If they do,
then any CPU released through the hot plug controller
can have its SMRR pre-programmed and the initial SMI
will start within TSEG.
Yes, that is what I proposed here:
* http://mid.mail-archive.com/effa5e32-be1e-4703-4419-8866b7754e2d@redhat.com
* https://edk2.groups.io/g/devel/message/45570
Namely:
When the SMM setup quiesces during normal firmware boot, OVMF could
use existent (finalized) SMBASE information to *pre-program* some
virtual QEMU hardware, with such state that would be expected, as
"final" state, of any new hotplugged CPU. Afterwards, if / when the
hotplug actually happens, QEMU could blanket-apply this state to the
new CPU, and broadcast a hardware SMI to all CPUs except the new one.
(I know that Paolo didn't like it; I'm just confirming that I had the
same, or at least a very similar, idea.)
Thanks!
Laszlo
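Mike's precondition, that all active CPUs use the same SMRR value, comes down to a simple consistency check. Below is a minimal sketch. It assumes the per-CPU IA32_SMRR_PHYSBASE (MSR 0x1F2) and IA32_SMRR_PHYSMASK (MSR 0x1F3) values have already been gathered; how they are collected is not shown. The struct and function names are illustrative, not edk2 APIs. As Paolo points out in this thread, a matching SMRR by itself does not protect an SMBASE area that lies outside TSEG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SMRR pair as read from one CPU (IA32_SMRR_PHYSBASE / IA32_SMRR_PHYSMASK). */
struct smrr {
    uint64_t physbase;
    uint64_t physmask;
};

/* If every active CPU reports the same SMRR pair, store it in *common and
 * return true. Otherwise return false: the "unexpected case" in which a
 * hot-added CPU should not be pre-programmed. */
static bool smrr_common_value(const struct smrr *cpus, size_t count,
                              struct smrr *common)
{
    assert(cpus != NULL && common != NULL);
    if (count == 0)
        return false;
    for (size_t i = 1; i < count; i++) {
        if (cpus[i].physbase != cpus[0].physbase ||
            cpus[i].physmask != cpus[0].physmask)
            return false;
    }
    *common = cpus[0];
    return true;
}
```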
Laszlo Ersek
On 08/21/19 19:05, Paolo Bonzini wrote:
On 21/08/19 17:48, Kinney, Michael D wrote:
Perhaps there is a way to avoid the 3000:8000 startup
vector.
If a CPU is added after a cold reset, it is already in a
different state because one of the active CPUs needs to
release it by interacting with the hot plug controller.
Can the SMRR for CPUs in that state be pre-programmed to
match the SMRR in the rest of the active CPUs?
For OVMF we expect all the active CPUs to use the same
SMRR value, so a check can be made to verify that all
the active CPUs have the same SMRR value. If they do,
then any CPU released through the hot plug controller
can have its SMRR pre-programmed and the initial SMI
will start within TSEG.
We just need to decide what to do in the unexpected
case where all the active CPUs do not have the same
SMRR value.
This should also reduce the total number of steps.
The problem is not the SMRR but the SMBASE. If the SMBASE area is
outside TSEG, it is vulnerable to DMA attacks independent of the SMRR.
SMBASE is also different for all CPUs, so it cannot be preprogrammed.
The firmware and QEMU could agree on a formula, which would compute the
CPU-specific SMBASE from a value pre-programmed by the firmware, and the
initial APIC ID of the hot-added CPU.
Yes, it would duplicate code -- the calculation -- between QEMU and
edk2. While that's not optimal, it wouldn't be a first.
Thanks
Laszlo
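Below is a minimal sketch of the kind of formula Laszlo describes. It assumes a linear layout, SMBASE = base + APIC ID * stride, with both parameters pre-programmed by the firmware. Any real formula would have to be agreed between QEMU and edk2, and all names here are hypothetical. The sketch also checks Paolo's requirement that the SMRAM a CPU actually uses lie inside TSEG. That region runs from the SMI entry at SMBASE + 0x8000 to the top of the 64 KiB window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural offsets relative to SMBASE: the SMI entry is at
 * SMBASE + 0x8000 and the state save map ends at SMBASE + 0xFFFF. */
#define SMI_ENTRY_OFFSET 0x8000u
#define SMRAM_WINDOW     0x10000u

/* HYPOTHETICAL parameters the firmware would pre-program for QEMU. */
struct smbase_formula {
    uint64_t base;   /* SMBASE assigned to APIC ID 0 */
    uint64_t stride; /* distance between consecutive APIC IDs' SMBASEs */
};

struct tseg {
    uint64_t start;
    uint64_t size;
};

/* The shared calculation QEMU and edk2 would both implement. */
static uint64_t smbase_for_apic_id(const struct smbase_formula *f,
                                   uint32_t apic_id)
{
    assert(f->stride != 0);
    return f->base + (uint64_t)apic_id * f->stride;
}

/* True if [SMBASE + 0x8000, SMBASE + 0x10000) lies entirely in TSEG. */
static bool smbase_in_tseg(uint64_t smbase, const struct tseg *t)
{
    uint64_t lo = smbase + SMI_ENTRY_OFFSET;
    uint64_t hi = smbase + SMRAM_WINDOW; /* exclusive */
    return lo >= t->start && hi <= t->start + t->size;
}
```

With a stride of 0x8000, the used upper halves of consecutive CPUs' windows abut without overlapping. In an 8 MiB TSEG starting at `base`, APIC IDs 0 through 254 then fit, while the architectural default SMBASE (0x30000) fails the check.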
Laszlo Ersek
On 08/22/19 08:18, Paolo Bonzini wrote:
On 21/08/19 22:17, Kinney, Michael D wrote:
Paolo,
It makes sense to match real HW.
Note that it'd also be fine to match some kind of official Intel
specification even if no processor (currently?) supports it.
I agree, because...
That puts us back to the reset vector and handling the initial SMI at
3000:8000. That is all workable from a FW implementation
perspective.
... that would suggest that matching reset vector code already exists, and
it would "only" need to be upstreamed to edk2. :)
It looks like the only issue left is DMA.
Yes.
DMA protection of memory ranges is a chipset feature. For the current
QEMU implementation, what ranges of memory are guaranteed to be
protected from DMA? Is it only A/B seg and TSEG?
(
This thread (esp. Jiewen's and Mike's messages) is the first time that
I've heard about the *existence* of such RAM ranges / the chipset
feature. :)
Out of interest (independently of virtualization), how is a general
purpose OS informed by the firmware, "never try to set up DMA to this
RAM area"? Is this communicated through ACPI _CRS perhaps?
... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory
Access)". It writes,
For example, if a platform implements a PCI bus that cannot access
all of physical memory, it has a _DMA object under that PCI bus that
describes the ranges of physical memory that can be accessed by
devices on that bus.
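As a rough illustration of how an OS would honor such a description, a driver placing a DMA buffer would check it against the ranges the _DMA object reports as accessible. This is a hypothetical sketch. Evaluating _DMA and decoding its resource template are not shown, and the example ranges in the note below are invented, with a TSEG-like hole left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One physical range that bus masters may access, as an OS might record
 * it after evaluating a _DMA object (decoding not shown). */
struct dma_range {
    uint64_t start;
    uint64_t length;
};

/* True if [addr, addr + len) fits entirely inside one accessible range.
 * Written to avoid overflow; a buffer straddling two adjacent ranges is
 * conservatively rejected. */
static bool dma_buffer_allowed(const struct dma_range *ranges, size_t count,
                               uint64_t addr, uint64_t len)
{
    assert(len > 0);
    for (size_t i = 0; i < count; i++) {
        if (addr >= ranges[i].start &&
            addr - ranges[i].start <= ranges[i].length &&
            len <= ranges[i].length - (addr - ranges[i].start))
            return true;
    }
    return false;
}
```

For instance, with accessible ranges [0, 0x7f000000) and [0x7f800000, 0x80000000), a buffer at 0x7f000000 is rejected because that region is not described as accessible.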
Sorry about the digression, and also about being late to this thread,
continually -- I'm primarily following and learning.
)
Thanks!
Laszlo