Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Michael D Kinney
 

Hi Jiewen,

If a hot add CPU needs to run any code before the
first SMI, I would recommend it only executes code
from a write protected FLASH range without a stack
and then waits for the first SMI.

For this OVMF use case, is any CPU init required
before the first SMI?

From Paolo's list of steps, are steps (8a) and (8b)
really required? Can the SMI monarch use the Local
APIC to send a directed SMI to the hot added CPU?
The SMI monarch needs to know the APIC ID of the
hot added CPU. Do we also need to handle the case
where multiple CPUs are added at once? I think we
would need to serialize the use of 3000:8000 for the
SMM rebase operation on each hot added CPU.

It would be simpler if we can guarantee that only
one CPU can be added or removed at a time, and that
the complete flow of adding a CPU to SMM and the OS
is completed before another add/remove event is
processed.

Mike
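
A minimal sketch of the serialization discussed above, assuming a single
shared trampoline slot at 3000:8000 and one directed SMI per hot-added CPU.
The names HotCpu, Trampoline, first_smi and rebase_serialized, and the
per-CPU tile size, are hypothetical; nothing here is taken from edk2 or
QEMU, and the SMI is only modeled as a function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_SMBASE 0x30000u  /* SMBASE after RESET, per the SDM */

/* Hypothetical model of a hot-added CPU. */
typedef struct {
    uint32_t apic_id;  /* target of the directed SMI */
    uint32_t smbase;   /* models the internal SMBASE register */
} HotCpu;

/* Hypothetical model of the one shared trampoline at 3000:8000. */
typedef struct {
    bool     busy;           /* true while one CPU is being rebased */
    uint32_t target_smbase;  /* value the trampoline will install */
} Trampoline;

/* Models the first SMI on one CPU: the trampoline stores the target
 * into the save state, and RSM loads it into the SMBASE register. */
static void first_smi(HotCpu *cpu, const Trampoline *slot)
{
    assert(slot->busy);
    cpu->smbase = slot->target_smbase;
}

/* The SMM monarch rebases the CPUs strictly one at a time, so the
 * shared slot never holds the target of more than one CPU. */
static size_t rebase_serialized(HotCpu *cpus, size_t count,
                                uint32_t tseg_base, uint32_t tile_size)
{
    Trampoline slot = { false, 0 };
    size_t rebased = 0;

    for (size_t i = 0; i < count; i++) {
        assert(!slot.busy);
        slot.busy = true;
        slot.target_smbase = tseg_base + (uint32_t)i * tile_size;
        first_smi(&cpus[i], &slot);  /* directed SMI to cpus[i].apic_id */
        slot.busy = false;
        rebased++;
    }
    return rebased;
}
```

If only one CPU may be added at a time, the loop degenerates to a single
iteration and the busy flag is never contended.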

-----Original Message-----
From: Yao, Jiewen
Sent: Thursday, August 22, 2019 10:00 PM
To: Kinney, Michael D <michael.d.kinney@intel.com>;
Paolo Bonzini <pbonzini@redhat.com>; Laszlo Ersek
<lersek@redhat.com>; rfc@edk2.groups.io
Cc: Alex Williamson <alex.williamson@redhat.com>;
devel@edk2.groups.io; qemu devel list <qemu-
devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
<jun.nakajima@intel.com>; Boris Ostrovsky
<boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins
<joao.m.martins@oracle.com>; Phillip Goerl
<phillip.goerl@oracle.com>
Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using
SMM with QEMU+OVMF

Thank you Mike!

That is a good reference on the real hardware behavior.
(Glad it is public.)

For the threat model, the unique part in a virtual environment
is temp RAM.
The temp RAM in a real platform is per-CPU cache, while
the temp RAM in a virtual platform is global memory.
That brings one more potential attack surface in a virtual
environment, if the hot-added CPU needs to run code with stack
or heap before SMI rebase.

Other threats, such as SMRAM or DMA, are the same.

Thank you
Yao Jiewen


-----Original Message-----
From: Kinney, Michael D
Sent: Friday, August 23, 2019 9:03 AM
To: Paolo Bonzini <pbonzini@redhat.com>; Laszlo Ersek
<lersek@redhat.com>; rfc@edk2.groups.io; Yao, Jiewen
<jiewen.yao@intel.com>; Kinney, Michael D
<michael.d.kinney@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>;
devel@edk2.groups.io; qemu devel list <qemu-
devel@nongnu.org>; Igor
Mammedov <imammedo@redhat.com>; Chen, Yingwen
<yingwen.chen@intel.com>; Nakajima, Jun
<jun.nakajima@intel.com>;
Boris Ostrovsky <boris.ostrovsky@oracle.com>; Joao
Marcal Lemos
Martins <joao.m.martins@oracle.com>; Phillip Goerl
<phillip.goerl@oracle.com>
Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using
SMM with
QEMU+OVMF

Paolo,

I find the following links related to the discussions
here along with
one example feature called GENPROTRANGE.

https://csrc.nist.gov/CSRC/media/Presentations/The-Whole-is-Greater/images-media/day1_trusted-computing_200-250.pdf
https://cansecwest.com/slides/2017/CSW2017_Cuauhtemoc-Rene_CPU_Hot-Add_flow.pdf
https://www.mouser.com/ds/2/612/5520-5500-chipset-ioh-datasheet-1131292.pdf

Best regards,

Mike

-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Thursday, August 22, 2019 4:12 PM
To: Kinney, Michael D <michael.d.kinney@intel.com>;
Laszlo Ersek
<lersek@redhat.com>; rfc@edk2.groups.io; Yao, Jiewen
<jiewen.yao@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>;
devel@edk2.groups.io; qemu devel list <qemu-
devel@nongnu.org>; Igor
Mammedov <imammedo@redhat.com>; Chen, Yingwen
<yingwen.chen@intel.com>; Nakajima, Jun
<jun.nakajima@intel.com>;
Boris Ostrovsky <boris.ostrovsky@oracle.com>; Joao
Marcal Lemos
Martins <joao.m.martins@oracle.com>; Phillip Goerl
<phillip.goerl@oracle.com>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug
using SMM with
QEMU+OVMF

On 23/08/19 00:32, Kinney, Michael D wrote:
Paolo,

It is my understanding that real HW hot plug uses the SDM defined
methods. Meaning the initial SMI is to 3000:8000 and they rebase to
TSEG in the first SMI. They must have chipset specific methods to
protect 3000:8000 from DMA.
It would be great if you could check.

Can we add a chipset feature to prevent DMA to 64KB range from
0x30000-0x3FFFF and the UEFI Memory Map and ACPI content can be
updated so the Guest OS knows to not use that range for DMA?

If real hardware does it at the chipset level, we will probably use
Igor's suggestion of aliasing A-seg to 3000:0000. Before starting
the new CPU, the SMI handler can prepare the SMBASE relocation
trampoline at A000:8000 and the hot-plugged CPU will find it at
3000:8000 when it receives the initial SMI. Because this is backed
by RAM at 0xA0000-0xAFFFF, DMA cannot access it and would still go
through to RAM at 0x30000.

Paolo
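
A minimal sketch of the aliasing idea in Paolo's reply, assuming the alias
is only visible to a CPU that is executing in SMM. The Accessor and Backing
enums and the resolve function are hypothetical and do not correspond to
any QEMU API; they only model which backing store an access to
0x30000-0x3FFFF would reach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOW_RAM_BASE 0x30000u  /* 3000:0000 */
#define LOW_RAM_LAST 0x3FFFFu
#define ASEG_BASE    0xA0000u  /* A000:0000 */

typedef enum { ACCESS_CPU_SMM, ACCESS_CPU_NON_SMM, ACCESS_DMA } Accessor;
typedef enum { BACKING_RAM, BACKING_ASEG_SMRAM } Backing;

typedef struct {
    Backing  backing;  /* which store serves the access */
    uint32_t addr;     /* address within that store */
} Resolved;

/* With the alias enabled, an SMM access to 0x30000-0x3FFFF lands in
 * A-seg SMRAM at the same offset. Every other access, DMA included,
 * still reaches ordinary RAM at the original address. */
static Resolved resolve(uint32_t addr, Accessor who, bool alias_enabled)
{
    Resolved r = { BACKING_RAM, addr };

    assert(who == ACCESS_CPU_SMM || who == ACCESS_CPU_NON_SMM ||
           who == ACCESS_DMA);
    if (alias_enabled && who == ACCESS_CPU_SMM &&
        addr >= LOW_RAM_BASE && addr <= LOW_RAM_LAST) {
        r.backing = BACKING_ASEG_SMRAM;
        r.addr = ASEG_BASE + (addr - LOW_RAM_BASE);
    }
    return r;
}
```

Under this model the trampoline written at A000:8000 is what the
hot-plugged CPU fetches at 3000:8000, while a device writing to 0x38000
only modifies ordinary RAM.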


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Laszlo Ersek
 

On 08/22/19 20:51, Paolo Bonzini wrote:
On 22/08/19 20:29, Laszlo Ersek wrote:
On 08/22/19 08:18, Paolo Bonzini wrote:
On 21/08/19 22:17, Kinney, Michael D wrote:
DMA protection of memory ranges is a chipset feature. For the current
QEMU implementation, what ranges of memory are guaranteed to be
protected from DMA? Is it only A/B seg and TSEG?
Yes.
This thread (esp. Jiewen's and Mike's messages) is the first time that
I've heard about the *existence* of such RAM ranges / the chipset
feature. :)

Out of interest (independently of virtualization), how is a general
purpose OS informed by the firmware, "never try to set up DMA to this
RAM area"? Is this communicated through ACPI _CRS perhaps?

... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory
Access)". It writes,

For example, if a platform implements a PCI bus that cannot access
all of physical memory, it has a _DMA object under that PCI bus that
describes the ranges of physical memory that can be accessed by
devices on that bus.

Sorry about the digression, and also about being late to this thread,
continually -- I'm primarily following and learning.
It's much simpler: these ranges are not in e820, for example

kernel: BIOS-e820: [mem 0x0000000000059000-0x000000000008bfff] usable
kernel: BIOS-e820: [mem 0x000000000008c000-0x00000000000fffff] reserved
(1) Sorry, my _DMA quote was a detour from QEMU -- I wondered how a
physical machine with actual RAM at 0x30000, and also chipset level
protection against DMA to/from that RAM range, would expose the fact to
the OS (so that the OS not innocently try to set up DMA there).

(2) In case of QEMU+OVMF, "e820" is not defined at the firmware level.

While
- QEMU exports an "e820 map" (and OVMF does utilize that),
- and Linux parses the UEFI memmap into an "e820 map" (so that dependent
logic only need to deal with e820),

in edk2 the concepts are "GCD memory space map" and "UEFI memmap".

So what OVMF does is, it reserves the TSEG area in the UEFI memmap:

https://github.com/tianocore/edk2/commit/b09c1c6f2569a

(This was later de-constified for the extended TSEG size, in commit
23bfb5c0aab6, "OvmfPkg/PlatformPei: prepare for PcdQ35TsegMbytes
becoming dynamic", 2017-07-05).

This is just to say that with OVMF, TSEG is not absent from the UEFI
memmap, it is reserved instead. (Apologies if I misunderstood and you
didn't actually claim otherwise.)


The ranges are not special-cased in any way by QEMU. Simply, AB-segs
and TSEG RAM are not part of the address space except when in SMM.
(or when TSEG is not locked, and open; but:) yes, this matches my
understanding.

Therefore, DMA to those ranges ends up respectively to low VGA RAM[1]
and to the bit bucket. When AB-segs are open, for example, DMA to that
area becomes possible.
Which seems to imply that, if we alias 0x30000 to the AB-segs, and rely
on the AB-segs for CPU hotplug, OVMF should close and lock down the
AB-segs at first boot. Correct? (Because OVMF doesn't do anything about
AB at the moment.)

Thanks
Laszlo


Paolo

[1] old timers may remember DEF SEG=&HB800: BLOAD "foo.img",0. It still
works with some disk device models.


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Yao, Jiewen
 

Thank you Mike!

That is a good reference on the real hardware behavior. (Glad it is public.)

For the threat model, the unique part in a virtual environment is temp RAM.
The temp RAM in a real platform is per-CPU cache, while the temp RAM in a virtual platform is global memory.
That brings one more potential attack surface in a virtual environment, if the hot-added CPU needs to run code with stack or heap before SMI rebase.

Other threats, such as SMRAM or DMA, are the same.

Thank you
Yao Jiewen

-----Original Message-----
From: Kinney, Michael D
Sent: Friday, August 23, 2019 9:03 AM
To: Paolo Bonzini <pbonzini@redhat.com>; Laszlo Ersek
<lersek@redhat.com>; rfc@edk2.groups.io; Yao, Jiewen
<jiewen.yao@intel.com>; Kinney, Michael D <michael.d.kinney@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>; devel@edk2.groups.io;
qemu devel list <qemu-devel@nongnu.org>; Igor Mammedov
<imammedo@redhat.com>; Chen, Yingwen <yingwen.chen@intel.com>;
Nakajima, Jun <jun.nakajima@intel.com>; Boris Ostrovsky
<boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins
<joao.m.martins@oracle.com>; Phillip Goerl <phillip.goerl@oracle.com>
Subject: RE: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with
QEMU+OVMF

Paolo,

I find the following links related to the discussions here
along with one example feature called GENPROTRANGE.

https://csrc.nist.gov/CSRC/media/Presentations/The-Whole-is-Greater/images-media/day1_trusted-computing_200-250.pdf
https://cansecwest.com/slides/2017/CSW2017_Cuauhtemoc-Rene_CPU_Hot-Add_flow.pdf
https://www.mouser.com/ds/2/612/5520-5500-chipset-ioh-datasheet-1131292.pdf

Best regards,

Mike

-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Thursday, August 22, 2019 4:12 PM
To: Kinney, Michael D <michael.d.kinney@intel.com>;
Laszlo Ersek <lersek@redhat.com>; rfc@edk2.groups.io;
Yao, Jiewen <jiewen.yao@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>;
devel@edk2.groups.io; qemu devel list <qemu-
devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
<jun.nakajima@intel.com>; Boris Ostrovsky
<boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins
<joao.m.martins@oracle.com>; Phillip Goerl
<phillip.goerl@oracle.com>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
SMM with QEMU+OVMF

On 23/08/19 00:32, Kinney, Michael D wrote:
Paolo,

It is my understanding that real HW hot plug uses the SDM defined
methods. Meaning the initial SMI is to 3000:8000 and they rebase to
TSEG in the first SMI. They must have chipset specific methods to
protect 3000:8000 from DMA.
It would be great if you could check.

Can we add a chipset feature to prevent DMA to 64KB range from
0x30000-0x3FFFF and the UEFI Memory Map and ACPI content can be
updated so the Guest OS knows to not use that range for DMA?

If real hardware does it at the chipset level, we will
probably use Igor's suggestion of aliasing A-seg to
3000:0000. Before starting the new CPU, the SMI handler
can prepare the SMBASE relocation trampoline at
A000:8000 and the hot-plugged CPU will find it at
3000:8000 when it receives the initial SMI. Because this
is backed by RAM at 0xA0000-0xAFFFF, DMA cannot access it
and would still go through to RAM at 0x30000.

Paolo


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Michael D Kinney
 

-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Thursday, August 22, 2019 4:12 PM
To: Kinney, Michael D <michael.d.kinney@intel.com>;
Laszlo Ersek <lersek@redhat.com>; rfc@edk2.groups.io;
Yao, Jiewen <jiewen.yao@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>;
devel@edk2.groups.io; qemu devel list <qemu-
devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
<jun.nakajima@intel.com>; Boris Ostrovsky
<boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins
<joao.m.martins@oracle.com>; Phillip Goerl
<phillip.goerl@oracle.com>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
SMM with QEMU+OVMF

On 23/08/19 00:32, Kinney, Michael D wrote:
Paolo,

It is my understanding that real HW hot plug uses the SDM defined
methods. Meaning the initial SMI is to 3000:8000 and they rebase to
TSEG in the first SMI. They must have chipset specific methods to
protect 3000:8000 from DMA.
It would be great if you could check.

Can we add a chipset feature to prevent DMA to 64KB range from
0x30000-0x3FFFF and the UEFI Memory Map and ACPI content can be
updated so the Guest OS knows to not use that range for DMA?

If real hardware does it at the chipset level, we will
probably use Igor's suggestion of aliasing A-seg to
3000:0000. Before starting the new CPU, the SMI handler
can prepare the SMBASE relocation trampoline at
A000:8000 and the hot-plugged CPU will find it at
3000:8000 when it receives the initial SMI. Because this
is backed by RAM at 0xA0000-0xAFFFF, DMA cannot access it
and would still go through to RAM at 0x30000.

Paolo


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Michael D Kinney
 

Paolo,

It is my understanding that real HW hot plug uses the SDM defined
methods. Meaning the initial SMI is to 3000:8000 and they rebase
to TSEG in the first SMI. They must have chipset specific methods
to protect 3000:8000 from DMA.

Can we add a chipset feature to prevent DMA to 64KB range from
0x30000-0x3FFFF and the UEFI Memory Map and ACPI content can be
updated so the Guest OS knows to not use that range for DMA?

Thanks,

Mike

-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Thursday, August 22, 2019 3:18 PM
To: Kinney, Michael D <michael.d.kinney@intel.com>;
Laszlo Ersek <lersek@redhat.com>; rfc@edk2.groups.io;
Yao, Jiewen <jiewen.yao@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>;
devel@edk2.groups.io; qemu devel list <qemu-
devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
<jun.nakajima@intel.com>; Boris Ostrovsky
<boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins
<joao.m.martins@oracle.com>; Phillip Goerl
<phillip.goerl@oracle.com>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
SMM with QEMU+OVMF

On 22/08/19 22:06, Kinney, Michael D wrote:
The SMBASE register is internal and cannot be directly accessed by any
CPU. There is an SMBASE field that is a member of the SMM Save State
area and can only be modified from SMM and requires the execution of
an RSM instruction from SMM for the SMBASE register to be updated from
the current SMBASE field value. The new SMBASE register value is only
used on the next SMI.
Actually there is also an SMBASE MSR, even though in
current silicon it's read-only and its use is
theoretically limited to SMM-transfer monitors. If that
MSR could be made accessible somehow outside SMM, that
would be great.

Once all the CPUs have been initialized for SMM, the CPUs that are not
needed can be hot removed. As noted above, the SMBASE value does not
change on an INIT. So as long as the hot add operation does not do a
RESET, the SMBASE value must be preserved.
IIRC, hot-remove + hot-add will unplug/plug a completely different CPU.

Another idea is to emulate this behavior. The hot plug controller
could provide registers (only accessible from SMM) to assign the
SMBASE address for every CPU. When a CPU is hot added, QEMU can set
the internal SMBASE register value from the hot plug controller
register value. If the SMM Monarch sends an INIT or an SMI from the
Local APIC to the hot added CPU, then the SMBASE register should not
be modified and the CPU starts execution within TSEG the first time
it receives an SMI.

Yes, this would work. But again---if the issue is real
on current hardware too, I'd rather have a matching
solution for virtual platforms.

If the current hardware, for example, remembers INIT-preserved
state across hot-remove/hot-add, we could emulate that.

I guess the fundamental question is: how do bare metal
platforms avoid this issue, or plan to avoid this issue?
Once we know that, we can use that information to find a
way to implement it in KVM. Only if it is impossible
we'll have a different strategy that is specific to our
platform.

Paolo

Jiewen and I can collect specific questions on this topic and continue
the discussion here. For example, I do not think there is any method
other than what I referenced above to program the SMBASE register, but
I can ask if there are any other methods.


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Paolo Bonzini <pbonzini@...>
 

On 21/08/19 22:17, Kinney, Michael D wrote:
Paolo,

It makes sense to match real HW.
Note that it'd also be fine to match some kind of official Intel
specification even if no processor (currently?) supports it.

That puts us back to
the reset vector and handling the initial SMI at
3000:8000. That is all workable from a FW implementation
perspective. It looks like the only issue left is DMA.

DMA protection of memory ranges is a chipset feature.
For the current QEMU implementation, what ranges of
memory are guaranteed to be protected from DMA? Is
it only A/B seg and TSEG?
Yes.

Paolo

Yes, all of these would work. Again, I'm interested in
having something that has a hope of being implemented in
real hardware.

Another, far easier to implement possibility could be a
lockable MSR (could be the existing
MSR_SMM_FEATURE_CONTROL) that allows programming the
SMBASE outside SMM. It would be nice if such a bit
could be defined by Intel.

Paolo


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Paolo Bonzini <pbonzini@...>
 

On 21/08/19 19:25, Kinney, Michael D wrote:
Could we have an initial SMBASE that is within TSEG?

If we bring in hot plug CPUs one at a time, then initial
SMBASE in TSEG can reprogram the SMBASE to the correct
value for that CPU.

Can we add a register to the hot plug controller that
allows the BSP to set the initial SMBASE value for
a hot added CPU? The default can be 3000:8000 for
compatibility.

Another idea is when the SMI handler runs for a hot add
CPU event, the SMM monarch programs the hot plug controller
register with the SMBASE to use for the CPU that is being
added. As each CPU is added, a different SMBASE value can
be programmed by the SMM Monarch.
Yes, all of these would work. Again, I'm interested in having something
that has a hope of being implemented in real hardware.

Another, far easier to implement possibility could be a lockable MSR
(could be the existing MSR_SMM_FEATURE_CONTROL) that allows programming
the SMBASE outside SMM. It would be nice if such a bit could be defined
by Intel.

Paolo


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Paolo Bonzini <pbonzini@...>
 

On 21/08/19 17:48, Kinney, Michael D wrote:
Perhaps there is a way to avoid the 3000:8000 startup
vector.

If a CPU is added after a cold reset, it is already in a
different state because one of the active CPUs needs to
release it by interacting with the hot plug controller.

Can the SMRR for CPUs in that state be pre-programmed to
match the SMRR in the rest of the active CPUs?

For OVMF we expect all the active CPUs to use the same
SMRR value, so a check can be made to verify that all
the active CPUs have the same SMRR value. If they do,
then any CPU released through the hot plug controller
can have its SMRR pre-programmed and the initial SMI
will start within TSEG.

We just need to decide what to do in the unexpected
case where all the active CPUs do not have the same
SMRR value.

This should also reduce the total number of steps.
The problem is not the SMRR but the SMBASE. If the SMBASE area is
outside TSEG, it is vulnerable to DMA attacks independent of the SMRR.
SMBASE is also different for all CPUs, so it cannot be preprogrammed.

(As an aside, virt platforms are also immune to cache poisoning so they
don't have SMRR yet - we could use them for SMM_CODE_CHK_EN and block
execution outside SMRR but we never got round to it).

An even simpler alternative would be to make A0000h the initial SMBASE.
However, I would like to understand what hardware platforms plan to do,
if anything.

Paolo

Mike

-----Original Message-----
From: rfc@edk2.groups.io [mailto:rfc@edk2.groups.io] On
Behalf Of Yao, Jiewen
Sent: Sunday, August 18, 2019 4:01 PM
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>; Laszlo
Ersek <lersek@redhat.com>; devel@edk2.groups.io; edk2-
rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
<qemu-devel@nongnu.org>; Igor Mammedov
<imammedo@redhat.com>; Chen, Yingwen
<yingwen.chen@intel.com>; Nakajima, Jun
<jun.nakajima@intel.com>; Boris Ostrovsky
<boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins
<joao.m.martins@oracle.com>; Phillip Goerl
<phillip.goerl@oracle.com>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
SMM with QEMU+OVMF

In the real world, we deprecate AB-seg usage because it
is vulnerable to SMM cache poisoning attacks.
I assume cache poisoning is out of scope in the virtual
world, or there is a way to prevent AB-seg cache poisoning.

thank you!
Yao, Jiewen


On August 19, 2019, at 3:50 AM, Paolo Bonzini
<pbonzini@redhat.com> wrote:

On 17/08/19 02:20, Yao, Jiewen wrote:
[Jiewen] That is OK. Then we MUST add the third adversary.
-- Adversary: Simple hardware attacker, who can use a device to
perform a DMA attack in the virtual world.
NOTE: The DMA attack in the real world is out of scope. That is
handled by IOMMU in the real world, such as VTd. -- Please do
clarify if this is TRUE.

In the real world:
#1: the SMM MUST be non-DMA capable region.
#2: the MMIO MUST be non-DMA capable region.
#3: the stolen memory MIGHT be DMA capable region or non-DMA capable
region. It depends upon the silicon design.
#4: the normal OS accessible memory - including ACPI reclaim, ACPI
NVS, and reserved memory not included by #3 - MUST be DMA capable
region.
As such, IOMMU protection is NOT required for #1 and #2. IOMMU
protection MIGHT be required for #3 and MUST be required for #4.
I assume the virtual environment is designed in the same way. Please
correct me if I am wrong.
Correct. The 0x30000...0x3ffff area is the only problematic one;
Igor's idea (or a variant, for example optionally remapping
0xa0000..0xaffff SMRAM to 0x30000) is becoming more and more
attractive.

Paolo


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Paolo Bonzini <pbonzini@...>
 

On 22/08/19 20:29, Laszlo Ersek wrote:
On 08/22/19 08:18, Paolo Bonzini wrote:
On 21/08/19 22:17, Kinney, Michael D wrote:
DMA protection of memory ranges is a chipset feature. For the current
QEMU implementation, what ranges of memory are guaranteed to be
protected from DMA? Is it only A/B seg and TSEG?
Yes.
This thread (esp. Jiewen's and Mike's messages) is the first time that
I've heard about the *existence* of such RAM ranges / the chipset
feature. :)

Out of interest (independently of virtualization), how is a general
purpose OS informed by the firmware, "never try to set up DMA to this
RAM area"? Is this communicated through ACPI _CRS perhaps?

... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory
Access)". It writes,

For example, if a platform implements a PCI bus that cannot access
all of physical memory, it has a _DMA object under that PCI bus that
describes the ranges of physical memory that can be accessed by
devices on that bus.

Sorry about the digression, and also about being late to this thread,
continually -- I'm primarily following and learning.
It's much simpler: these ranges are not in e820, for example

kernel: BIOS-e820: [mem 0x0000000000059000-0x000000000008bfff] usable
kernel: BIOS-e820: [mem 0x000000000008c000-0x00000000000fffff] reserved

The ranges are not special-cased in any way by QEMU. Simply, AB-segs
and TSEG RAM are not part of the address space except when in SMM.
Therefore, DMA to those ranges ends up respectively to low VGA RAM[1]
and to the bit bucket. When AB-segs are open, for example, DMA to that
area becomes possible.

Paolo

[1] old timers may remember DEF SEG=&HB800: BLOAD "foo.img",0. It still
works with some disk device models.


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Paolo Bonzini <pbonzini@...>
 

On 22/08/19 19:59, Laszlo Ersek wrote:
The firmware and QEMU could agree on a formula, which would compute the
CPU-specific SMBASE from a value pre-programmed by the firmware, and the
initial APIC ID of the hot-added CPU.

Yes, it would duplicate code -- the calculation -- between QEMU and
edk2. While that's not optimal, it wouldn't be a first.
No, that would be unmaintainable. The best solution to me seems to be
to make SMBASE programmable from non-SMM code if some special conditions
hold. Michael, would it be possible to get in contact with the Intel
architects?

Paolo
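
A minimal sketch of the kind of formula Laszlo describes, assuming a simple
linear layout in which each APIC ID gets its own fixed-size slice above a
firmware-programmed base. The function name smbase_from_apic_id and the
stride parameter are hypothetical; the thread does not specify the formula,
and Paolo's objection is precisely that QEMU and edk2 would both have to
carry a copy of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Computes base + stride * apic_id and stores it in *smbase.
 * Returns false if the result does not fit in 32 bits. */
static bool smbase_from_apic_id(uint32_t base, uint32_t stride,
                                uint32_t apic_id, uint32_t *smbase)
{
    uint64_t value;

    assert(smbase != NULL);
    /* Widen before multiplying so the overflow check is reliable. */
    value = (uint64_t)base + (uint64_t)stride * apic_id;
    if (value > UINT32_MAX) {
        return false;
    }
    *smbase = (uint32_t)value;
    return true;
}
```

Any real formula would also have to account for sparse APIC ID spaces,
which this sketch ignores.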


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Michael D Kinney
 

Laszlo,

I believe all the code for the AP startup vector
is already in edk2.

It is a combination of the reset vector code in
UefiCpuPkg/ResetVector/Vtf0 and an IA32/X64 specific
feature in the GenFv tool. It sets up a 4KB aligned
location near 4GB which can be used to start an AP
using INIT-SIPI-SIPI.

DI is set to 'AP' if the processor is not the BSP.
This can be used to choose to put the APs into a
wait loop executing from the protected FLASH region.

The SMM Monarch on a hot add event can use the Local
APIC to send an INIT-SIPI-SIPI to wake the AP at the 4KB
startup vector in FLASH. Later the SMM Monarch
can use the Local APIC to send an SMI to pull the
hot added CPU into SMM. It is not clear if we have to
do both SIPI followed by the SMI or if we can just do
the SMI.

Best regards,

Mike

-----Original Message-----
From: devel@edk2.groups.io
[mailto:devel@edk2.groups.io] On Behalf Of Laszlo Ersek
Sent: Thursday, August 22, 2019 11:29 AM
To: Paolo Bonzini <pbonzini@redhat.com>; Kinney,
Michael D <michael.d.kinney@intel.com>;
rfc@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>;
devel@edk2.groups.io; qemu devel list <qemu-
devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
<jun.nakajima@intel.com>; Boris Ostrovsky
<boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins
<joao.m.martins@oracle.com>; Phillip Goerl
<phillip.goerl@oracle.com>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
SMM with QEMU+OVMF

On 08/22/19 08:18, Paolo Bonzini wrote:
On 21/08/19 22:17, Kinney, Michael D wrote:
Paolo,

It makes sense to match real HW.
Note that it'd also be fine to match some kind of official Intel
specification even if no processor (currently?) supports it.

I agree, because...

That puts us back to the reset vector and handling the initial SMI at
3000:8000. That is all workable from a FW implementation perspective.
that would suggest that matching reset vector code
already exists, and it would "only" need to be
upstreamed to edk2. :)

It looks like the only issue left is DMA.

DMA protection of memory ranges is a chipset feature. For the current
QEMU implementation, what ranges of memory are guaranteed to be
protected from DMA? Is it only A/B seg and TSEG?
Yes.
(

This thread (esp. Jiewen's and Mike's messages) is the
first time that I've heard about the *existence* of
such RAM ranges / the chipset feature. :)

Out of interest (independently of virtualization), how
is a general purpose OS informed by the firmware,
"never try to set up DMA to this RAM area"? Is this
communicated through ACPI _CRS perhaps?

... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA
(Direct Memory Access)". It writes,

For example, if a platform implements a PCI bus that cannot access
all of physical memory, it has a _DMA object under that PCI bus that
describes the ranges of physical memory that can be accessed by
devices on that bus.

Sorry about the digression, and also about being late
to this thread, continually -- I'm primarily following
and learning.

)

Thanks!
Laszlo


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Michael D Kinney
 

Paolo,

The SMBASE register is internal and cannot be directly accessed
by any CPU. There is an SMBASE field that is member of the SMM Save
State area and can only be modified from SMM and requires the
execution of an RSM instruction from SMM for the SMBASE register to
be updated from the current SMBASE field value. The new SMBASE
register value is only used on the next SMI.

https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf

Vol 3C - Section 34.11

The default base address for the SMRAM is 30000H. This value is contained in an internal processor register called
the SMBASE register. The operating system or executive can relocate the SMRAM by setting the SMBASE field in the
saved state map (at offset 7EF8H) to a new value (see Figure 34-4). The RSM instruction reloads the internal
SMBASE register with the value in the SMBASE field each time it exits SMM. All subsequent SMI requests will use
the new SMBASE value to find the starting address for the SMI handler (at SMBASE + 8000H) and the SMRAM state
save area (from SMBASE + FE00H to SMBASE + FFFFH). (The processor resets the value in its internal SMBASE
register to 30000H on a RESET, but does not change it on an INIT.)
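
A minimal sketch of the address arithmetic in the SDM passage quoted above.
The macro and function names are hypothetical. One assumption is labeled in
the code: the 7EF8H offset of the SMBASE field is taken to be relative to
SMBASE + 8000H, which is how I read the SDM's state save map tables and
which places the field inside the quoted save area range.

```c
#include <assert.h>
#include <stdint.h>

#define SMM_DEFAULT_SMBASE      0x30000u  /* value after RESET */
#define SMM_HANDLER_OFFSET      0x8000u   /* SMI entry = SMBASE + 8000H */
#define SMM_SAVE_STATE_FIRST    0xFE00u   /* save area start, from SMBASE */
#define SMM_SAVE_STATE_LAST     0xFFFFu   /* save area end, from SMBASE */
/* Assumption: this offset is added to SMBASE + 8000H. */
#define SMM_SMBASE_FIELD_OFFSET 0x7EF8u

/* Address where the CPU starts executing on an SMI. */
static uint32_t smi_entry_point(uint32_t smbase)
{
    return smbase + SMM_HANDLER_OFFSET;
}

/* Address of the SMBASE field the SMI handler rewrites to relocate. */
static uint32_t smbase_field_address(uint32_t smbase)
{
    uint32_t addr = smbase + SMM_HANDLER_OFFSET + SMM_SMBASE_FIELD_OFFSET;

    /* The field must lie inside the state save area. */
    assert(addr >= smbase + SMM_SAVE_STATE_FIRST);
    assert(addr <= smbase + SMM_SAVE_STATE_LAST);
    return addr;
}

/* Models RSM: the internal SMBASE register takes the field's value,
 * which is then used for the next SMI. */
static uint32_t rsm_reload_smbase(uint32_t smbase_field_value)
{
    return smbase_field_value;
}
```

With the default SMBASE this gives the 3000:8000 entry point discussed
throughout the thread; after one relocation and RSM, the next SMI enters
at the new SMBASE + 8000H.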

One idea to work around these issues is to start up OVMF with the maximum number of
CPUs. All the CPUs would be assigned their SMBASE values at a safe time, using the
initial 3000:8000 SMI vector, because there is a guarantee of no DMA at that point
in the FW init.

Once all the CPUs have been initialized for SMM, the CPUs that are not needed
can be hot removed. As noted above, the SMBASE value does not change on
an INIT. So as long as the hot add operation does not do a RESET, the
SMBASE value must be preserved.

Of course, this is not a good idea from a boot performance perspective,
especially if the max CPUs is a large value.

Another idea is to emulate this behavior. The hot plug controller could
provide registers (only accessible from SMM) to assign the SMBASE address
for every CPU. When a CPU is hot added, QEMU can set the internal SMBASE
register value from the hot plug controller register value. If the SMM
Monarch sends an INIT or an SMI from the Local APIC to the hot added CPU,
then the SMBASE register should not be modified and the CPU starts execution
within TSEG the first time it receives an SMI.
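
A minimal sketch of this emulation idea, assuming one SMBASE register per
CPU slot in the hot plug controller. The HotplugCtrl type, the MAX_CPUS
limit and the function names are hypothetical and are not an existing QEMU
interface; the in_smm flag stands in for whatever mechanism would restrict
register access to SMM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS       8u        /* hypothetical slot count */
#define DEFAULT_SMBASE 0x30000u  /* architectural default after RESET */

/* Hypothetical hot plug controller state: one SMBASE per CPU slot. */
typedef struct {
    uint32_t smbase_reg[MAX_CPUS];
} HotplugCtrl;

static void hpc_init(HotplugCtrl *ctrl)
{
    assert(ctrl != NULL);
    for (size_t i = 0; i < MAX_CPUS; i++) {
        ctrl->smbase_reg[i] = DEFAULT_SMBASE;
    }
}

/* Writes are honored only when the writer is in SMM. */
static bool hpc_write_smbase(HotplugCtrl *ctrl, size_t cpu,
                             uint32_t smbase, bool in_smm)
{
    if (!in_smm || cpu >= MAX_CPUS) {
        return false;
    }
    ctrl->smbase_reg[cpu] = smbase;
    return true;
}

/* On hot add, the new CPU's internal SMBASE register is loaded from
 * the controller instead of the architectural default. */
static uint32_t cpu_hot_add(const HotplugCtrl *ctrl, size_t cpu)
{
    assert(cpu < MAX_CPUS);
    return ctrl->smbase_reg[cpu];
}
```

A slot that was never programmed keeps the default, so firmware that does
not know about the registers would still see the 3000:8000 entry point.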

Jiewen and I can collect specific questions on this topic and continue
the discussion here. For example, I do not think there is any method
other than what I referenced above to program the SMBASE register, but
I can ask if there are any other methods.

Thanks,

Mike

-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Thursday, August 22, 2019 11:43 AM
To: Laszlo Ersek <lersek@redhat.com>; Kinney, Michael D
<michael.d.kinney@intel.com>; rfc@edk2.groups.io; Yao,
Jiewen <jiewen.yao@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>;
devel@edk2.groups.io; qemu devel list <qemu-
devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
<jun.nakajima@intel.com>; Boris Ostrovsky
<boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins
<joao.m.martins@oracle.com>; Phillip Goerl
<phillip.goerl@oracle.com>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
SMM with QEMU+OVMF

On 22/08/19 19:59, Laszlo Ersek wrote:
The firmware and QEMU could agree on a formula, which would compute
the CPU-specific SMBASE from a value pre-programmed by the firmware,
and the initial APIC ID of the hot-added CPU.

Yes, it would duplicate code -- the calculation -- between QEMU and
edk2. While that's not optimal, it wouldn't be a first.
No, that would be unmaintainable. The best solution to
me seems to be to make SMBASE programmable from non-SMM
code if some special conditions hold. Michael, would it
be possible to get in contact with the Intel architects?

Paolo


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Laszlo Ersek
 

On 08/22/19 08:18, Paolo Bonzini wrote:
On 21/08/19 22:17, Kinney, Michael D wrote:
Paolo,

It makes sense to match real HW.
Note that it'd also be fine to match some kind of official Intel
specification even if no processor (currently?) supports it.
I agree, because...

That puts us back to the reset vector and handling the initial SMI at
3000:8000. That is all workable from a FW implementation
perspective.
that would suggest that matching reset vector code already exists, and
it would "only" need to be upstreamed to edk2. :)

It looks like the only issue left is DMA.

DMA protection of memory ranges is a chipset feature. For the current
QEMU implementation, what ranges of memory are guaranteed to be
protected from DMA? Is it only A/B seg and TSEG?
Yes.
(

This thread (esp. Jiewen's and Mike's messages) is the first time that
I've heard about the *existence* of such RAM ranges / the chipset
feature. :)

Out of interest (independently of virtualization), how is a general
purpose OS informed by the firmware, "never try to set up DMA to this
RAM area"? Is this communicated through ACPI _CRS perhaps?

... Ah, almost: ACPI 6.2 specifies _DMA, in "6.2.4 _DMA (Direct Memory
Access)". It writes,

For example, if a platform implements a PCI bus that cannot access
all of physical memory, it has a _DMA object under that PCI bus that
describes the ranges of physical memory that can be accessed by
devices on that bus.

Sorry about the digression, and also about being late to this thread,
continually -- I'm primarily following and learning.

)

Thanks!
Laszlo


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Laszlo Ersek
 

On 08/21/19 19:05, Paolo Bonzini wrote:
On 21/08/19 17:48, Kinney, Michael D wrote:
Perhaps there is a way to avoid the 3000:8000 startup
vector.

If a CPU is added after a cold reset, it is already in a
different state because one of the active CPUs needs to
release it by interacting with the hot plug controller.

Can the SMRR for CPUs in that state be pre-programmed to
match the SMRR in the rest of the active CPUs?

For OVMF we expect all the active CPUs to use the same
SMRR value, so a check can be made to verify that all
the active CPUs have the same SMRR value. If they do,
then any CPU released through the hot plug controller
can have its SMRR pre-programmed and the initial SMI
will start within TSEG.

We just need to decide what to do in the unexpected
case where all the active CPUs do not have the same
SMRR value.

This should also reduce the total number of steps.
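
A minimal sketch of the consistency check proposed above, assuming the
per-CPU SMRR values (the IA32_SMRR_PHYSBASE / IA32_SMRR_PHYSMASK pair) have
already been collected into an array. The struct and function names are
hypothetical. What to do when the check fails is left to the caller, since
that case is still undecided in the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Snapshot of one CPU's SMRR pair. */
typedef struct {
    uint64_t phys_base;
    uint64_t phys_mask;
} SmrrPair;

/*
 * Returns true and stores the shared value in *common if every active CPU
 * reports the same SMRR pair. Returns false if the list is empty or if any
 * CPU differs from the first one; *common is left untouched in that case.
 */
static bool smrr_common_value(const SmrrPair *active, size_t count,
                              SmrrPair *common)
{
    assert(active != NULL && common != NULL);
    if (count == 0) {
        return false;
    }
    for (size_t i = 1; i < count; i++) {
        if (active[i].phys_base != active[0].phys_base ||
            active[i].phys_mask != active[0].phys_mask) {
            return false;
        }
    }
    *common = active[0];
    return true;
}
```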
The problem is not the SMRR but the SMBASE. If the SMBASE area is
outside TSEG, it is vulnerable to DMA attacks independent of the SMRR.
SMBASE is also different for all CPUs, so it cannot be preprogrammed.
The firmware and QEMU could agree on a formula, which would compute the
CPU-specific SMBASE from a value pre-programmed by the firmware, and the
initial APIC ID of the hot-added CPU.

Yes, it would duplicate code -- the calculation -- between QEMU and
edk2. While that's not optimal, it wouldn't be a first.

Thanks
Laszlo


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Laszlo Ersek
 

On 08/21/19 17:48, Kinney, Michael D wrote:
Perhaps there is a way to avoid the 3000:8000 startup
vector.

If a CPU is added after a cold reset, it is already in a
different state because one of the active CPUs needs to
release it by interacting with the hot plug controller.

Can the SMRR for CPUs in that state be pre-programmed to
match the SMRR in the rest of the active CPUs?

For OVMF we expect all the active CPUs to use the same
SMRR value, so a check can be made to verify that all
the active CPUs have the same SMRR value. If they do,
then any CPU released through the hot plug controller
can have its SMRR pre-programmed and the initial SMI
will start within TSEG.
Yes, that is what I proposed here:

* http://mid.mail-archive.com/effa5e32-be1e-4703-4419-8866b7754e2d@redhat.com
* https://edk2.groups.io/g/devel/message/45570

Namely:

When the SMM setup quiesces during normal firmware boot, OVMF could
use existent (finalized) SMBASE information to *pre-program* some
virtual QEMU hardware, with such state that would be expected, as
"final" state, of any new hotplugged CPU. Afterwards, if / when the
hotplug actually happens, QEMU could blanket-apply this state to the
new CPU, and broadcast a hardware SMI to all CPUs except the new one.
(I know that Paolo didn't like it; I'm just confirming that I had the
same, or at least a very similar, idea.)

Thanks!
Laszlo


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Michael D Kinney
 

Paolo,

It makes sense to match real HW. That puts us back to
the reset vector and handling the initial SMI at
3000:8000. That is all workable from a FW implementation
perspective. It looks like the only issue left is DMA.
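
For readers less used to the segment:offset notation, a small worked example
of why 3000:8000 and "38000" refer to the same location: the default SMBASE
is 0x30000 and the SMI entry point sits at SMBASE + 0x8000. The helper names
are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define SMI_ENTRY_OFFSET 0x8000u  /* SMI handler entry is at SMBASE + 0x8000 */

/* Real-mode segment:offset to linear address: segment * 16 + offset. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Linear address at which a CPU with the given SMBASE starts executing
 * when it takes an SMI. */
static uint32_t smi_entry_point(uint32_t smbase)
{
    assert(smbase <= UINT32_MAX - SMI_ENTRY_OFFSET);
    return smbase + SMI_ENTRY_OFFSET;
}
```

Both real_mode_linear(0x3000, 0x8000) and smi_entry_point(0x30000) evaluate
to 0x38000.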

DMA protection of memory ranges is a chipset feature.
For the current QEMU implementation, what ranges of
memory are guaranteed to be protected from DMA? Is
it only A/B seg and TSEG?

Thanks,

Mike

-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Wednesday, August 21, 2019 10:40 AM
To: Kinney, Michael D <michael.d.kinney@intel.com>;
rfc@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>; Laszlo
Ersek <lersek@redhat.com>; devel@edk2.groups.io; qemu
devel list <qemu-devel@nongnu.org>; Igor Mammedov
<imammedo@redhat.com>; Chen, Yingwen
<yingwen.chen@intel.com>; Nakajima, Jun
<jun.nakajima@intel.com>; Boris Ostrovsky
<boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins
<joao.m.martins@oracle.com>; Phillip Goerl
<phillip.goerl@oracle.com>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
SMM with QEMU+OVMF

On 21/08/19 19:25, Kinney, Michael D wrote:
Could we have an initial SMBASE that is within TSEG?

If we bring in hot plug CPUs one at a time, then
initial SMBASE in
TSEG can reprogram the SMBASE to the correct value for
that CPU.

Can we add a register to the hot plug controller that
allows the BSP
to set the initial SMBASE value for a hot added CPU?
The default can
be 3000:8000 for compatibility.

Another idea is when the SMI handler runs for a hot
add CPU event, the
SMM monarch programs the hot plug controller register
with the SMBASE
to use for the CPU that is being added. As each CPU
is added, a
different SMBASE value can be programmed by the SMM
Monarch.

Yes, all of these would work. Again, I'm interested in
having something that has a hope of being implemented in
real hardware.

Another, far easier to implement possibility could be a
lockable MSR (could be the existing
MSR_SMM_FEATURE_CONTROL) that allows programming the
SMBASE outside SMM. It would be nice if such a bit
could be defined by Intel.
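
A sketch of how such a lockable bit might behave, as a toy model in C. The
bit positions, the names, and the exact lock semantics are assumptions for
illustration only; no such bit is defined today, and the model does not try
to mirror the real MSR_SMM_FEATURE_CONTROL layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FEATURE_LOCK            (1ull << 0)  /* illustrative lock bit */
#define FEATURE_SMBASE_OUTSIDE  (1ull << 8)  /* HYPOTHETICAL enable bit */

/* Hypothetical per-CPU state for the sketch. */
typedef struct {
    uint64_t feature_control;
    uint32_t smbase;
} VCpu;

/* MSR write: once the lock bit is set, further writes are rejected. */
static bool write_feature_control(VCpu *cpu, uint64_t value)
{
    assert(cpu != NULL);
    if (cpu->feature_control & FEATURE_LOCK) {
        return false;
    }
    cpu->feature_control = value;
    return true;
}

/* SMBASE update: always allowed from SMM in this model, and allowed from
 * outside SMM only while the hypothetical enable bit is set. */
static bool write_smbase(VCpu *cpu, uint32_t value, bool in_smm)
{
    assert(cpu != NULL);
    if (!in_smm && !(cpu->feature_control & FEATURE_SMBASE_OUTSIDE)) {
        return false;
    }
    cpu->smbase = value;
    return true;
}
```

In this model the firmware would set the enable bit, program SMBASE, then
clear the bit and set the lock, so that later code cannot turn it back on.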

Paolo


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Michael D Kinney
 

Could we have an initial SMBASE that is within TSEG?

If we bring in hot plug CPUs one at a time, then initial
SMBASE in TSEG can reprogram the SMBASE to the correct
value for that CPU.

Can we add a register to the hot plug controller that
allows the BSP to set the initial SMBASE value for
a hot added CPU? The default can be 3000:8000 for
compatibility.

Another idea is when the SMI handler runs for a hot add
CPU event, the SMM monarch programs the hot plug controller
register with the SMBASE to use for the CPU that is being
added. As each CPU is added, a different SMBASE value can
be programmed by the SMM Monarch.

Mike

-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Wednesday, August 21, 2019 10:06 AM
To: Kinney, Michael D <michael.d.kinney@intel.com>;
rfc@edk2.groups.io; Yao, Jiewen <jiewen.yao@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>; Laszlo
Ersek <lersek@redhat.com>; devel@edk2.groups.io; qemu
devel list <qemu-devel@nongnu.org>; Igor Mammedov
<imammedo@redhat.com>; Chen, Yingwen
<yingwen.chen@intel.com>; Nakajima, Jun
<jun.nakajima@intel.com>; Boris Ostrovsky
<boris.ostrovsky@oracle.com>; Joao Marcal Lemos Martins
<joao.m.martins@oracle.com>; Phillip Goerl
<phillip.goerl@oracle.com>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using
SMM with QEMU+OVMF

On 21/08/19 17:48, Kinney, Michael D wrote:
Perhaps there is a way to avoid the 3000:8000 startup
vector.

If a CPU is added after a cold reset, it is already in
a different
state because one of the active CPUs needs to release
it by
interacting with the hot plug controller.

Can the SMRR for CPUs in that state be pre-programmed
to match the
SMRR in the rest of the active CPUs?

For OVMF we expect all the active CPUs to use the same
SMRR value, so
a check can be made to verify that all the active CPUs
have the same
SMRR value. If they do, then any CPU released through
the hot plug
controller can have its SMRR pre-programmed and the
initial SMI will
start within TSEG.

We just need to decide what to do in the unexpected
case where all the
active CPUs do not have the same SMRR value.

This should also reduce the total number of steps.
The problem is not the SMRR but the SMBASE. If the
SMBASE area is outside TSEG, it is vulnerable to DMA
attacks independent of the SMRR.
SMBASE is also different for all CPUs, so it cannot be
preprogrammed.
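
To make the "SMBASE area outside TSEG" condition concrete, a small sketch of
the range check. It assumes the area in use is the upper half of the 64 KiB
block at SMBASE, from the SMI entry point at SMBASE + 0x8000 up to the end
of the save state area at SMBASE + 0xFFFF. The function name and the TSEG
values in the usage note are examples, not QEMU defaults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SMI_ENTRY_OFFSET  0x8000u   /* first byte executed on SMI */
#define SMBASE_AREA_END   0x10000u  /* one past the save state area */

/*
 * Returns true if [smbase + 0x8000, smbase + 0x10000) lies entirely inside
 * [tseg_base, tseg_base + tseg_size).
 */
static bool smbase_area_in_tseg(uint64_t smbase, uint64_t tseg_base,
                                uint64_t tseg_size)
{
    assert(smbase <= UINT64_MAX - SMBASE_AREA_END);
    assert(tseg_base <= UINT64_MAX - tseg_size);

    uint64_t start = smbase + SMI_ENTRY_OFFSET;
    uint64_t end = smbase + SMBASE_AREA_END;

    return start >= tseg_base && end <= tseg_base + tseg_size;
}
```

For a 16 MiB TSEG at 0x7F000000, the default SMBASE 0x30000 fails the check,
while a relocated SMBASE of 0x7F000000 passes.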

(As an aside, virt platforms are also immune to cache
poisoning so they don't have SMRR yet - we could use
them for SMM_CODE_CHK_EN and block execution outside
SMRR but we never got round to it).

An even simpler alternative would be to make A0000h the
initial SMBASE.
However, I would like to understand what hardware
platforms plan to do, if anything.

Paolo

Mike

-----Original Message-----
From: rfc@edk2.groups.io [mailto:rfc@edk2.groups.io]
On Behalf Of
Yao, Jiewen
Sent: Sunday, August 18, 2019 4:01 PM
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>;
Laszlo Ersek
<lersek@redhat.com>; devel@edk2.groups.io; edk2- rfc-
groups-io
<rfc@edk2.groups.io>; qemu devel list <qemu-
devel@nongnu.org>; Igor
Mammedov <imammedo@redhat.com>; Chen, Yingwen
<yingwen.chen@intel.com>; Nakajima, Jun
<jun.nakajima@intel.com>;
Boris Ostrovsky <boris.ostrovsky@oracle.com>; Joao
Marcal Lemos
Martins <joao.m.martins@oracle.com>; Phillip Goerl
<phillip.goerl@oracle.com>
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug
using SMM with
QEMU+OVMF

In the real world, we deprecate AB-seg usage because it is
vulnerable to SMM cache poisoning attacks.
I assume cache poisoning is out of scope in the virtual
world, or there is a way to prevent AB-seg cache poisoning.

thank you!
Yao, Jiewen


On August 19, 2019, at 3:50 AM, Paolo Bonzini
<pbonzini@redhat.com> wrote:

On 17/08/19 02:20, Yao, Jiewen wrote:
[Jiewen] That is OK. Then we MUST add the third
adversary.
-- Adversary: Simple hardware attacker, who can use
device to perform DMA attack in the virtual world.
NOTE: The DMA attack in the real world is out of
scope. That is handled by the IOMMU in the real world,
such as VTd. --
Please do clarify if this is TRUE.

In the real world:
#1: the SMM MUST be non-DMA capable region.
#2: the MMIO MUST be non-DMA capable region.
#3: the stolen memory MIGHT be DMA capable region
or
non-DMA capable
region. It depends upon the silicon design.
#4: the normal OS accessible memory - including
ACPI
reclaim, ACPI
NVS, and reserved memory not included by #3 - MUST
be
DMA capable region.
As such, IOMMU protection is NOT required for #1
and
#2. IOMMU
protection MIGHT be required for #3 and MUST be
required for #4.
I assume the virtual environment is designed in the
same way. Please
correct me if I am wrong.
Correct. The 0x30000...0x3ffff area is the only
problematic one;
Igor's idea (or a variant, for example optionally
remapping
0xa0000..0xaffff SMRAM to 0x30000) is becoming more
and more attractive.

Paolo


Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

Paolo Bonzini <pbonzini@...>
 

On 16/08/19 04:46, Yao, Jiewen wrote:
Comment below:


-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@redhat.com]
Sent: Friday, August 16, 2019 12:21 AM
To: Laszlo Ersek <lersek@redhat.com>; devel@edk2.groups.io; Yao, Jiewen
<jiewen.yao@intel.com>
Cc: edk2-rfc-groups-io <rfc@edk2.groups.io>; qemu devel list
<qemu-devel@nongnu.org>; Igor Mammedov <imammedo@redhat.com>;
Chen, Yingwen <yingwen.chen@intel.com>; Nakajima, Jun
<jun.nakajima@intel.com>; Boris Ostrovsky <boris.ostrovsky@oracle.com>;
Joao Marcal Lemos Martins <joao.m.martins@oracle.com>; Phillip Goerl
<phillip.goerl@oracle.com>
Subject: Re: [edk2-devel] CPU hotplug using SMM with QEMU+OVMF

On 15/08/19 17:00, Laszlo Ersek wrote:
On 08/14/19 16:04, Paolo Bonzini wrote:
On 14/08/19 15:20, Yao, Jiewen wrote:
- Does this part require a new branch somewhere in the OVMF SEC
code?
How do we determine whether the CPU executing SEC is BSP or
hot-plugged AP?
[Jiewen] I think this is blocked from hardware perspective, since the first
instruction.
There are some hardware specific registers that can be used to determine if
the CPU is newly added.
I don’t think this must be same as the real hardware.
You are free to invent some registers in device model to be used in
OVMF hot plug driver.

Yes, this would be a new operation mode for QEMU, that only applies to
hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in
fact it doesn't reply to anything at all.

- How do we tell the hot-plugged AP where to start execution? (I.e.
that
it should execute code at a particular pflash location.)
[Jiewen] Same real mode reset vector at FFFF:FFF0.
You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
QEMU. The AP does not start execution at all when it is unplugged, so
no cache-as-RAM etc.

We only need to modify QEMU so that hot-plugged APs do not reply to
INIT/SIPI/SMI.

I don’t think there is a problem for real hardware, which always has CAR.
Can QEMU provide some CPU specific space, such as MMIO region?
Why is a CPU-specific region needed if every other processor is in SMM
and thus trusted?
I was going through the steps Jiewen and Yingwen recommended.

In step (02), the new CPU is expected to set up RAM access. In step
(03), the new CPU, executing code from flash, is expected to "send board
message to tell host CPU (GPIO->SCI) -- I am waiting for hot-add
message." For that action, the new CPU may need a stack (minimally if we
want to use C function calls).

Until step (03), there had been no word about any other (= pre-plugged)
CPUs (more precisely, Jiewen even confirmed "No impact to other
processors"), so I didn't assume that other CPUs had entered SMM.

Paolo, I've attempted to read Jiewen's response, and yours, as carefully
as I can. I'm still very confused. If you have a better understanding,
could you please write up the 15-step process from the thread starter
again, with all QEMU customizations applied? Such as, unnecessary steps
removed, and platform specifics filled in.
Sure.

(01a) QEMU: create new CPU. The CPU already exists, but it does not
start running code until unparked by the CPU hotplug controller.

(01b) QEMU: trigger SCI

(02-03) no equivalent

(04) Host CPU: (OS) execute GPE handler from DSDT

(05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
will not enter SMM because SMI is disabled)

(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
rebase code.

(07a) Host CPU: (SMM) Write to CPU hotplug controller to enable
new CPU

(07b) Host CPU: (SMM) Send INIT/SIPI/SIPI to new CPU.
[Jiewen] NOTE: INIT/SIPI/SIPI can be sent by a malicious CPU. There is no
restriction that INIT/SIPI/SIPI can only be sent in SMM.
All of the CPUs are now in SMM, and INIT/SIPI/SIPI will be discarded
before 07a, so this is okay.

However I do see a problem, because a PCI device's DMA could overwrite
0x38000 between (06) and (10) and hijack the code that is executed in
SMM. How is this avoided on real hardware? By the time the new CPU
enters SMM, it doesn't run off cache-as-RAM anymore.

Paolo

(08a) New CPU: (Low RAM) Enter protected mode.
[Jiewen] NOTE: The new CPU still cannot use any physical memory, because
the INIT/SIPI/SIPI may be sent by malicious CPU in non-SMM environment.

(08b) New CPU: (Flash) Signals host CPU to proceed and enter cli;hlt loop.

(09) Host CPU: (SMM) Send SMI to the new CPU only.

(10) New CPU: (SMM) Run SMM code at 38000, and rebase SMBASE to
TSEG.

(11) Host CPU: (SMM) Restore 38000.

(12) Host CPU: (SMM) Update located data structure to add the new CPU
information. (This step will involve CPU_SERVICE protocol)

(13) New CPU: (Flash) do whatever other initialization is needed

(14) New CPU: (Flash) Deadloop, and wait for INIT-SIPI-SIPI.

(15) Host CPU: (OS) Send INIT-SIPI-SIPI to pull new CPU in.


In other words, the cache-as-RAM phase of 02-03 is replaced by the
INIT-SIPI-SIPI sequence of 07b-08a-08b.
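
The core of steps (05) through (11) can be sketched as a toy simulation. This
is only a model under stated assumptions: the Machine struct, the marker
bytes standing in for the rebase stub, and the function names are all
hypothetical, and INIT/SIPI/SIPI (07b-08b) is left out. It shows the
ordering that matters: save 38000, install the stub, unpark the CPU, deliver
the SMI, restore 38000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEFAULT_SMBASE 0x30000u
#define STUB_SIZE      16u

/* Hypothetical machine state for the sketch. */
typedef struct {
    uint8_t  ram_38000[STUB_SIZE]; /* bytes at the default SMI entry point */
    uint32_t new_cpu_smbase;
    bool     new_cpu_unparked;     /* set via the hotplug controller (07a) */
} Machine;

/* Marker bytes standing in for the real SMM rebase code. */
static const uint8_t REBASE_STUB[STUB_SIZE] = { 0xA5, 0x5A, 0xA5, 0x5A };

/* Steps (09)-(10): SMI to the new CPU. A parked CPU ignores the SMI. The
 * rebase only happens if the stub is still intact at 38000; anything else
 * found there would run instead, which is the DMA concern raised above. */
static bool deliver_smi_to_new_cpu(Machine *m, uint32_t target_smbase)
{
    assert(m != NULL);
    if (!m->new_cpu_unparked) {
        return false;
    }
    if (memcmp(m->ram_38000, REBASE_STUB, STUB_SIZE) != 0) {
        return false;
    }
    m->new_cpu_smbase = target_smbase;
    return true;
}

/* Steps (06), (07a), (09)-(10), (11), run by the host CPU in SMM. */
static bool hot_add_flow(Machine *m, uint32_t target_smbase)
{
    uint8_t saved[STUB_SIZE];
    bool ok;

    assert(m != NULL);
    memcpy(saved, m->ram_38000, STUB_SIZE);        /* (06) save 38000    */
    memcpy(m->ram_38000, REBASE_STUB, STUB_SIZE);  /* (06) install stub  */
    m->new_cpu_unparked = true;                    /* (07a) enable CPU   */
    ok = deliver_smi_to_new_cpu(m, target_smbase); /* (09)-(10) rebase   */
    memcpy(m->ram_38000, saved, STUB_SIZE);        /* (11) restore 38000 */
    return ok;
}
```

Calling deliver_smi_to_new_cpu() before hot_add_flow() returns false, which
corresponds to the note in step (05) that the new CPU does not enter SMM.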
[Jiewen] I am OK with this proposal.
I think the rule is the same - the new CPU CANNOT touch any system memory,
no matter whether it starts from the reset vector or from INIT/SIPI/SIPI.
Or I would say: if the new CPU wants to touch some memory before the first
SMI, the memory should be CPU specific or on the flash.



The QEMU DSDT could be modified (when secure boot is in effect) to OUT
to 0xB2 when hotplug happens. It could write a well-known value to
0xB2, to be read by an SMI handler in edk2.
I dislike involving QEMU's generated DSDT in anything SMM (even
injecting the SMI), because the AML interpreter runs in the OS.

If a malicious OS kernel is a bit too enlightened about the DSDT, it
could willfully diverge from the process that we design. If QEMU
broadcast the SMI internally, the guest OS could not interfere with that.

If the purpose of the SMI is specifically to force all CPUs into SMM
(and thereby force them into trusted state), then the OS would be
explicitly counter-interested in carrying out the AML operations from
QEMU's DSDT.
But since the hotplug controller would only be accessible from SMM,
there would be no other way to invoke it than to follow the DSDT's
instruction and write to 0xB2. FWIW, real hardware also has plenty of
0xB2 writes in the DSDT or in APEI tables (e.g. for persistent store
access).

Paolo


Re: CPU hotplug using SMM with QEMU+OVMF

Paolo Bonzini <pbonzini@...>
 

On 15/08/19 18:07, Igor Mammedov wrote:
Looking at Q35 code and Seabios SMM relocation as example, if I see it
right QEMU has:
- SMRAM is aliased from DRAM at 0xa0000
- and TSEG steals from the top of low RAM when configured

Now problem is that default SMBASE at 0x30000 isn't backed by anything
in SMRAM address space and default SMI entry falls-through to the same
location in System address space.

The latter is not trusted and entry into SMM mode will corrupt area + might
jump to 'random' SMI handler (hence save/restore code in Seabios).

Here is an idea, can we map a memory region at 0x30000 in SMRAM address
space with relocation space/code reserved. It could be a part of TSEG
(so we don't have to invent ABI to configure that)?
No, there could be real mode code using it. What we _could_ do is
initialize SMBASE to 0xa0000, but I think it's better to not deviate too
much from processor behavior (even if it's admittedly a 20-years legacy
that doesn't make any sense).

Paolo


Re: CPU hotplug using SMM with QEMU+OVMF

Igor Mammedov <imammedo@...>
 

On Wed, 14 Aug 2019 16:04:50 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

On 14/08/19 15:20, Yao, Jiewen wrote:
- Does this part require a new branch somewhere in the OVMF SEC code?
How do we determine whether the CPU executing SEC is BSP or
hot-plugged AP?
[Jiewen] I think this is blocked from hardware perspective, since the first instruction.
There are some hardware specific registers that can be used to determine if the CPU is newly added.
I don’t think this must be same as the real hardware.
You are free to invent some registers in device model to be used in OVMF hot plug driver.
Yes, this would be a new operation mode for QEMU, that only applies to
hot-plugged CPUs. In this mode the AP doesn't reply to INIT or SMI, in
fact it doesn't reply to anything at all.

- How do we tell the hot-plugged AP where to start execution? (I.e. that
it should execute code at a particular pflash location.)
[Jiewen] Same real mode reset vector at FFFF:FFF0.
You do not need a reset vector or INIT/SIPI/SIPI sequence at all in
QEMU. The AP does not start execution at all when it is unplugged, so
no cache-as-RAM etc.

We only need to modify QEMU so that hot-plugged APs do not reply to
INIT/SIPI/SMI.

I don’t think there is a problem for real hardware, which always has CAR.
Can QEMU provide some CPU specific space, such as MMIO region?
Why is a CPU-specific region needed if every other processor is in SMM
and thus trusted?

Does CPU hotplug apply only at the socket level? If the CPU is
multi-core, what is responsible for hot-plugging all cores present in
the socket?
I can answer this: the SMM handler would interact with the hotplug
controller in the same way that ACPI DSDT does normally. This supports
multiple hotplugs already.

Writes to the hotplug controller from outside SMM would be ignored.

(03) New CPU: (Flash) send board message to tell host CPU (GPIO->SCI)
-- I am waiting for hot-add message.
Maybe we can simplify this in QEMU by broadcasting an SMI to existent
processors immediately upon plugging the new CPU.
The QEMU DSDT could be modified (when secure boot is in effect) to OUT
to 0xB2 when hotplug happens. It could write a well-known value to
0xB2, to be read by an SMI handler in edk2.



(NOTE: Host CPU can only
send
instruction in SMM mode. -- The register is SMM only)
Sorry, I don't follow -- what register are we talking about here, and
why is the BSP needed to send anything at all? What "instruction" do you
have in mind?
[Jiewen] The new CPU does not enable SMI at reset.
At some point later, the CPU needs to enable SMI, right?
The "instruction" here means that the host CPU needs to tell the new CPU to enable SMI.
Right, this would be a write to the CPU hotplug controller.

(04) Host CPU: (OS) get message from board that a new CPU is added.
(GPIO -> SCI)

(05) Host CPU: (OS) All CPUs enter SMM (SCI->SWSMI) (NOTE: New CPU
will not enter SMM because SMI is disabled)
I don't understand the OS involvement here. But, again, perhaps QEMU can
force all existent CPUs into SMM immediately upon adding the new CPU.
[Jiewen] OS here means the Host CPU running code in OS environment, not in SMM environment.
See above.

(06) Host CPU: (SMM) Save 38000, Update 38000 -- fill simple SMM
rebase code.

(07) Host CPU: (SMM) Send message to New CPU to Enable SMI.
Aha, so this is the SMM-only register you mention in step (03). Is the
register specified in the Intel SDM?
[Jiewen] Right. That is the register that lets the host CPU tell the new CPU to enable SMI.
It is a platform specific register. Not defined in SDM.
You may invent one in device model.
See above.

(10) New CPU: (SMM) Respond to first SMI at 38000, and rebase SMBASE to
TSEG.
What code does the new CPU execute after it completes step (10)? Does it
halt?
[Jiewen] The new CPU exits SMM and returns to the original place - where it was
interrupted to enter SMM - running code on the flash.
So in our case we'd need an INIT/SIPI/SIPI sequence between (06) and (07).
Looking at Q35 code and Seabios SMM relocation as example, if I see it
right QEMU has:
- SMRAM is aliased from DRAM at 0xa0000
- and TSEG steals from the top of low RAM when configured

Now problem is that default SMBASE at 0x30000 isn't backed by anything
in SMRAM address space and default SMI entry falls-through to the same
location in System address space.

The latter is not trusted and entry into SMM mode will corrupt area + might
jump to 'random' SMI handler (hence save/restore code in Seabios).

Here is an idea, can we map a memory region at 0x30000 in SMRAM address
space with relocation space/code reserved. It could be a part of TSEG
(so we don't have to invent ABI to configure that)?

In that case we do not have to care about System address space content
anymore and un-trusted code shouldn't be able to supply rogue SMI handler.
(that would cross out one of the reasons for inventing disabled-INIT/SMI state)


(11) Host CPU: (SMM) Restore 38000.
These steps (i.e., (06) through (11)) don't appear RAS-specific. The
only platform-specific feature seems to be SMI masking register, which
could be extracted into a new SmmCpuFeaturesLib API.

Thus, would you please consider open sourcing firmware code for steps
(06) through (11)?

Alternatively -- and in particular because the stack for step (01)
concerns me --, we could approach this from a high-level, functional
perspective. The states that really matter are the relocated SMBASE for
the new CPU, and the state of the full system, right at the end of step
(11).

When the SMM setup quiesces during normal firmware boot, OVMF could
use
existent (finalized) SMBASE information to *pre-program* some virtual
QEMU hardware, with such state that would be expected, as "final" state,
of any new hotplugged CPU. Afterwards, if / when the hotplug actually
happens, QEMU could blanket-apply this state to the new CPU, and
broadcast a hardware SMI to all CPUs except the new one.
I'd rather avoid this and stay as close as possible to real hardware.

Paolo
