A problem with live migration of UEFI virtual machines


wuchenye1995
 

Hi all,
   We found a problem with live migration of UEFI virtual machines, caused by a change in the size of OVMF.fd.
   Specifically, the size of OVMF.fd in older edk2 versions such as edk-2.0-25 is 2MB, while in newer versions such as edk-2.0-30 it is 4MB.
   When we migrate a UEFI virtual machine from a host with the older edk2 to a host with the newer one, QEMU reports an error in the function qemu_ram_resize while checking the size of ovmf_pcbios: "Length mismatch: pc.bios: 0x200000 in != 0x400000: Invalid argument".
   We want to know how to solve this problem after updating edk2.
   Thank you.


Chenye Wu
2020.2.10


wuchenye1995
 

Hi all,
   We found a problem with live migration of UEFI virtual machines, caused by a change in the size of OVMF.fd.
   Specifically, the size of OVMF.fd in older edk2 versions such as edk-2.0-25 is 2MB, while in newer versions such as edk-2.0-30 it is 4MB.
   When we migrate a UEFI virtual machine from a host with the older edk2 to a host with the newer one, QEMU reports an error in the function qemu_ram_resize while checking the size of ovmf_pcbios: "Length mismatch: pc.bios: 0x200000 in != 0x400000: Invalid argument".
   We want to know how to solve this problem after updating edk2.
   Thank you.


Chenye Wu
2020.2.10


Laszlo Ersek
 

(Replying through the groups.io web interface, just this one time)

wuchenye1995 wrote:

> Hi all,
>
> We found a problem with live migration of UEFI virtual machines due to
> size of OVMF.fd changes.
>
> Specifically, the size of OVMF.fd in edk with low version such as
> edk-2.0-25 is 2MB while the size of it in higher version such as
> edk-2.0-30 is 4MB.
>
> When we migrate a UEFI virtual machine from the host with low version
> of edk2 to the host with higher one, qemu component will report an
> error in function qemu_ram_resize while checking size of ovmf_pcbios:
> Length mismatch: pc.bios: 0x200000 in != 0x400000: Invalid argument.
>
> We want to know how to solve this problem after updating the version
> of edk2.

You can't solve it. The 2MB and 4MB builds of OVMF are fundamentally
incompatible with each other. It's actually beneficial that QEMU cleanly
prevents such attempts at migration; otherwise you'd see misbehavior
that would be much less graceful.

Please see commit b24fca05751f ("OvmfPkg: introduce 4MB flash image
(mainly) for Windows HCK", 2017-05-05) for more info:

  https://github.com/tianocore/edk2/commit/b24fca05751f

This is the reason why the Fedora edk2 package build script (= RPM spec
file) uses "-D FD_SIZE_2MB" explicitly.

Thanks
Laszlo


wuchenye1995
 

Hi all,
   We found a problem with live migration of UEFI virtual machines, caused by a change in the size of OVMF.fd.
   Specifically, the size of OVMF.fd in older edk2 versions such as edk-2.0-25 is 2MB, while in newer versions such as edk-2.0-30 it is 4MB.
   When we migrate a UEFI virtual machine from a host with the older edk2 to a host with the newer one, QEMU reports an error in the function qemu_ram_resize while checking the size of ovmf_pcbios: "Length mismatch: pc.bios: 0x200000 in != 0x400000: Invalid argument".
   We want to know how to solve this problem after updating edk2.
   Thank you.


Chenye Wu
2020.2.12


Alex Bennée <alex.bennee@...>
 

wuchenye1995 <wuchenye1995@...> writes:

> Hi all,
> We found a problem with live migration of UEFI virtual machines,
> caused by a change in the size of OVMF.fd.
> Specifically, the size of OVMF.fd in older edk2 versions such as
> edk-2.0-25 is 2MB, while in newer versions such as edk-2.0-30 it is
> 4MB.
> When we migrate a UEFI virtual machine from a host with the older
> edk2 to a host with the newer one, QEMU reports an error in the
> function qemu_ram_resize while checking the size of ovmf_pcbios:
> "Length mismatch: pc.bios: 0x200000 in != 0x400000: Invalid argument".
> We want to know how to solve this problem after updating edk2.

You can only migrate a machine that is identical - so instantiating an
empty machine with a different EDK image is bound to cause a problem,
because the machines don't match.

--
Alex Bennée


Dr. David Alan Gilbert
 

* wuchenye1995 (wuchenye1995@...) wrote:

> We found a problem with live migration of UEFI virtual machines,
> caused by a change in the size of OVMF.fd.
> Specifically, the size of OVMF.fd in older edk2 versions such as
> edk-2.0-25 is 2MB, while in newer versions such as edk-2.0-30 it is
> 4MB.
> When we migrate a UEFI virtual machine from a host with the older
> edk2 to a host with the newer one, QEMU reports an error in the
> function qemu_ram_resize while checking the size of ovmf_pcbios:
> "Length mismatch: pc.bios: 0x200000 in != 0x400000: Invalid argument".
> We want to know how to solve this problem after updating edk2.

When you migrate, you must migrate between identical configurations; so
you need ROM images (including edk2) that are the same size.
There are two answers:
a) Stick with the same version of the ROM between VMs you want to
   migrate.
b) Pad your ROM images to some larger size (e.g. 8MB) so that even if
   they grow a little bigger, you don't hit the problem.

Dave
P.S. Please use plain text email

Dr. David Alan Gilbert / dgilbert@... / Manchester, UK


Daniel P. Berrangé <berrange@...>
 

On Tue, Feb 11, 2020 at 05:39:59PM +0000, Alex Bennée wrote:

> wuchenye1995 <wuchenye1995@...> writes:
>
>> Hi all,
>> We found a problem with live migration of UEFI virtual machines,
>> caused by a change in the size of OVMF.fd.
>> Specifically, the size of OVMF.fd in older edk2 versions such as
>> edk-2.0-25 is 2MB, while in newer versions such as edk-2.0-30 it is
>> 4MB.
>> When we migrate a UEFI virtual machine from a host with the older
>> edk2 to a host with the newer one, QEMU reports an error in the
>> function qemu_ram_resize while checking the size of ovmf_pcbios:
>> "Length mismatch: pc.bios: 0x200000 in != 0x400000: Invalid argument".
>> We want to know how to solve this problem after updating edk2.
>
> You can only migrate a machine that is identical - so instantiating an
> empty machine with a different EDK image is bound to cause a problem,
> because the machines don't match.

I don't believe we are that strict for firmware in general. The firmware
is loaded when QEMU starts, but that only matters for the original
source host QEMU. During migration, the memory content of the original
firmware will be copied during live migration, overwriting whatever the
target QEMU loaded off disk. This works... provided the memory region
is the same size on source & target host, which is where the problem
arises in this case.

If there's a risk that newer firmware will be larger than old firmware,
there are really only two options:

- Keep all firmware images forever, each with a unique versioned
  filename. This ensures the target QEMU will always load the original,
  smaller firmware.

- Add padding to the firmware images. IOW, if the firmware is 2 MB,
  add zero-padding to the end of the image to round it up to 4 MB
  (or whatever you anticipate the largest size will be in future).

Distros have often taken the latter approach for QEMU firmware in the
past. The main issue is that you have to plan ahead of time and get
this padding right from the very start. You can't add the padding after
the fact on an existing VM.
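The size check Daniel describes can be illustrated with a toy version of QEMU's behavior. This is a sketch in the spirit of qemu_ram_resize, not QEMU's actual C code; the function name and signature are invented for illustration:

```python
def check_ram_resize(region_name: str, recorded_size: int,
                     incoming_size: int) -> None:
    """Toy model of the check QEMU applies to a fixed-size RAM region
    during migration: the size arriving in the migration stream must
    match what the destination allocated, or migration is refused."""
    if incoming_size != recorded_size:
        raise ValueError(
            f"Length mismatch: {region_name}: {hex(incoming_size)} in != "
            f"{hex(recorded_size)}: Invalid argument"
        )


# A 2MB pc.bios region arriving at a destination that allocated 4MB
# fails, producing an error shaped like the one quoted in this thread:
# check_ram_resize("pc.bios", 0x400000, 0x200000)
```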

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|


Laszlo Ersek
 

On 02/24/20 16:28, Daniel P. Berrangé wrote:
> On Tue, Feb 11, 2020 at 05:39:59PM +0000, Alex Bennée wrote:
>
>> wuchenye1995 <wuchenye1995@...> writes:
>>
>>> Hi all,
>>> We found a problem with live migration of UEFI virtual machines,
>>> caused by a change in the size of OVMF.fd.
>>> Specifically, the size of OVMF.fd in older edk2 versions such as
>>> edk-2.0-25 is 2MB, while in newer versions such as edk-2.0-30 it
>>> is 4MB.
>>> When we migrate a UEFI virtual machine from a host with the older
>>> edk2 to a host with the newer one, QEMU reports an error in the
>>> function qemu_ram_resize while checking the size of ovmf_pcbios:
>>> "Length mismatch: pc.bios: 0x200000 in != 0x400000: Invalid
>>> argument".
>>> We want to know how to solve this problem after updating edk2.
>>
>> You can only migrate a machine that is identical - so instantiating
>> an empty machine with a different EDK image is bound to cause a
>> problem, because the machines don't match.
>
> I don't believe we are that strict for firmware in general. The
> firmware is loaded when QEMU starts, but that only matters for the
> original source host QEMU. During migration, the memory content of the
> original firmware will be copied during live migration, overwriting
> whatever the target QEMU loaded off disk. This works... provided the
> memory region is the same size on source & target host, which is where
> the problem arises in this case.
>
> If there's a risk that newer firmware will be larger than old
> firmware, there are really only two options:
>
> - Keep all firmware images forever, each with a unique versioned
>   filename. This ensures the target QEMU will always load the
>   original, smaller firmware.
>
> - Add padding to the firmware images. IOW, if the firmware is 2 MB,
>   add zero-padding to the end of the image to round it up to 4 MB
>   (or whatever you anticipate the largest size will be in future).
>
> Distros have often taken the latter approach for QEMU firmware in the
> past. The main issue is that you have to plan ahead of time and get
> this padding right from the very start. You can't add the padding
> after the fact on an existing VM.
Following up here *too*, just for completeness.

The query in this thread has been posted three times now (and I have
zero idea why). Each time it generated a different set of responses. For
completeness, I'm now going to link the other two threads here (because
the present thread seems to have gotten the most feedback).

To the OP:

- please do *NOT* repost the same question once you get an answer. It
only fragments the discussion and creates confusion. It also doesn't
hurt if you *confirm* that you understood the answer.

- Yet further, if your email address has @gmail.com for domain, but your
msgids contain "tencent", that raises some eyebrows (mine for sure).
You say "we" in the query, but never identify the organization behind
the plural pronoun.

(I've been fuming about the triple-posting of the question for a while
now, but it's only now that, upon seeing how much work Dan has put into
his answer, I've decided that dishing out a bit of netiquette would be
in order.)

* First posting:
- msgid: <tencent_F1295F826E46EDFF3D77812B@...>
- edk2-devel: https://edk2.groups.io/g/devel/message/54146
- qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02419.html

* my response:
- msgid: <12553.1581366059422195003@groups.io>
- edk2-devel: https://edk2.groups.io/g/devel/message/54161
- qemu-devel: none, because (as an exception) I used the stupid
groups.io web interface to respond, and so my response
never reached qemu-devel

* Second posting (~4 hours after the first)
- msgid: <tencent_3CD8845EC159F0161725898B@...>
- edk2-devel: https://edk2.groups.io/g/devel/message/54147
- qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02415.html

* Dave's response:
- msgid: <20200220154742.GC2882@work-vm>
- edk2-devel: https://edk2.groups.io/g/devel/message/54681
- qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg05632.html

* Third posting (next day, present thread) -- cross posted to yet
another list (!), because apparently Dave's feedback and mine had not
been enough:
- msgid: <tencent_BC7FD00363690990994E90F8@...>
- edk2-devel: https://edk2.groups.io/g/devel/message/54220
- edk2-discuss: https://edk2.groups.io/g/discuss/message/135
- qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02735.html

Back on topic: see my response again. The answer is, you can't solve the
problem (specifically with OVMF), and QEMU in fact does you a service by
preventing the migration.

Laszlo


Andrew Fish
 

Laszlo,

If I understand this correctly, is it not more complicated than just size? It also assumes the memory layout is the same. The legacy BIOS used fixed magic address ranges, but UEFI uses dynamically allocated memory, so addresses are not fixed. While the UEFI firmware does try to keep S3 and S4 layouts consistent between boots, I'm not aware of any mechanism to keep the memory map addresses the same between versions of the firmware?

Thanks,

Andrew Fish

On Feb 25, 2020, at 9:53 AM, Laszlo Ersek <lersek@...> wrote:

On 02/24/20 16:28, Daniel P. Berrangé wrote:
On Tue, Feb 11, 2020 at 05:39:59PM +0000, Alex Bennée wrote:

wuchenye1995 <wuchenye1995@...> writes:

Hi all,
  We found a problem with live migration of UEFI virtual machines
  due to size of OVMF.fd changes.
  Specifically, the size of OVMF.fd in edk with low version such as
  edk-2.0-25 is 2MB while the size of it in higher version such as
  edk-2.0-30 is 4MB.
  When we migrate a UEFI virtual machine from the host with low
  version of edk2 to the host with higher one, qemu component will
  report an error in function qemu_ram_resize while
checking size of ovmf_pcbios: Length mismatch: pc.bios: 0x200000 in
!= 0x400000: Invalid argument.
  We want to know how to solve this problem after updating the
  version of edk2.

You can only migrate a machine that is identical - so instantiating an
empty machine with a different EDK image is bound to cause a problem,
because the machines don't match.

I don't believe we are that strict for firmware in general. The
firmware is loaded when QEMU starts, but that only matters for the
original source host QEMU. During migration, the memory content of the
original firmware will be copied during live migration, overwriting
whatever the target QEMU loaded off disk. This works....provided the
memory region is the same size on source & target host, which is where
the problem arises in this case.

If there's a risk that newer firmware will be larger than old firmware
there's only really two options:

 - Keep all firmware images forever, each with a unique versioned
   filename. This ensures target QEMU will always load the original
   smaller firmware

 - Add padding to the firmware images. IOW, if the firmware is 2 MB,
   add zero-padding to the end of the image to round it up to 4 MB
   (or whatever you anticipate the largest size will be in future).

Distros have often taken the latter approach for QEMU firmware in the
past. The main issue is that you have to plan ahead of time and get
this padding right from the very start. You can't add the padding
after the fact on an existing VM.

Following up here *too*, just for completeness.

The query in this thread has been posted three times now (and I have
zero idea why). Each time it generated a different set of responses. For
completeness, I'm now going to link the other two threads here (because
the present thread seems to have gotten the most feedback).

To the OP:

- please do *NOT* repost the same question once you get an answer. It
 only fragments the discussion and creates confusion. It also doesn't
 hurt if you *confirm* that you understood the answer.

- Yet further, if your email address has @gmail.com for domain, but your
 msgids contain "tencent", that raises some eyebrows (mine for sure).
 You say "we" in the query, but never identify the organization behind
 the plural pronoun.

(I've been fuming about the triple-posting of the question for a while
now, but it's only now that, upon seeing how much work Dan has put into
his answer, I've decided that dishing out a bit of netiquette would be
in order.)

* First posting:
- msgid:      <tencent_F1295F826E46EDFF3D77812B@...>
- edk2-devel: https://edk2.groups.io/g/devel/message/54146
- qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02419.html

 * my response:
   - msgid:      <12553.1581366059422195003@groups.io>
   - edk2-devel: https://edk2.groups.io/g/devel/message/54161
   - qemu-devel: none, because (as an exception) I used the stupid
                 groups.io web interface to respond, and so my response
                 never reached qemu-devel

* Second posting (~4 hours after the first)
- msgid:      <tencent_3CD8845EC159F0161725898B@...>
- edk2-devel: https://edk2.groups.io/g/devel/message/54147
- qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02415.html

 * Dave's response:
   - msgid:      <20200220154742.GC2882@work-vm>
   - edk2-devel: https://edk2.groups.io/g/devel/message/54681
   - qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg05632.html

* Third posting (next day, present thread) -- cross posted to yet
 another list (!), because apparently Dave's feedback and mine had not
 been enough:
- msgid:        <tencent_BC7FD00363690990994E90F8@...>
- edk2-devel:   https://edk2.groups.io/g/devel/message/54220
- edk2-discuss: https://edk2.groups.io/g/discuss/message/135
- qemu-devel:   https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02735.html

Back on topic: see my response again. The answer is, you can't solve the
problem (specifically with OVMF), and QEMU in fact does you a service by
preventing the migration.

Laszlo




Laszlo Ersek
 

Hi Andrew,

On 02/25/20 19:56, Andrew Fish wrote:
> Laszlo,
>
> If I understand this correctly, is it not more complicated than just
> size? It also assumes the memory layout is the same.

Yes.

> The legacy BIOS used fixed magic address ranges, but UEFI uses
> dynamically allocated memory, so addresses are not fixed. While the
> UEFI firmware does try to keep S3 and S4 layouts consistent between
> boots, I'm not aware of any mechanism to keep the memory map
> addresses the same between versions of the firmware.

It's not about RAM, but platform MMIO.

The core of the issue here is that the -D FD_SIZE_4MB and -D FD_SIZE_2MB
build options (or more directly, the different FD_SIZE_IN_KB macro
settings) set a bunch of flash-related build-time constant macros, and
PCDs, differently, in the following files:

- OvmfPkg/OvmfPkg.fdf.inc
- OvmfPkg/VarStore.fdf.inc
- OvmfPkg/OvmfPkg*.dsc

As a result, the OVMF_CODE.fd firmware binary will have different
hard-coded references to the variable store pflash addresses.
(Guest-physical MMIO addresses that point into the pflash range.)

If someone tries to combine an OVMF_CODE.fd firmware binary from e.g.
the 4MB build, with a variable store file that was originally
instantiated from an OVMF_VARS.fd varstore template from the 2MB build,
then the firmware binary's physical address references and various size
references will not match the contents / layout of the varstore pflash
chip, which maps an incompatibly structured varstore file.

For example, "OvmfPkg/VarStore.fdf.inc" describes two incompatible
EFI_FIRMWARE_VOLUME_HEADER structures (which "build" generates for the
OVMF_VARS.fd template) between the 4MB (total size) build, and the
1MB/2MB (total size) build.

The commit message below summarizes the internal layout differences,
from 1MB/2MB -> 4MB:

https://github.com/tianocore/edk2/commit/b24fca05751f

Excerpt (relevant for OVMF_VARS.fd):

  Description                Compression type   Size [KB]
  -------------------------  -----------------  --------------------
  Non-volatile data storage  open-coded binary  128 -> 528 ( +400)
                             data
  Variable store                                 56 -> 256 ( +200)
  Event log                                       4 ->   4 (   +0)
  Working block                                   4 ->   4 (   +0)
  Spare area                                     64 -> 264 ( +200)
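To make the incompatibility concrete: the EFI_FIRMWARE_VOLUME_HEADER at the start of OVMF_VARS.fd already differs between the builds, including its FvLength field, since the volume sizes differ. A minimal sketch that reads that field follows; the offsets are taken from the PI spec's EFI_FIRMWARE_VOLUME_HEADER layout, while the helper name and usage are my own illustration:

```python
import struct


def read_fv_length(varstore_path: str) -> int:
    """Read FvLength from the EFI_FIRMWARE_VOLUME_HEADER at offset 0
    of an OVMF_VARS.fd file.

    Header layout (PI spec): ZeroVector (16 bytes), FileSystemGuid
    (16 bytes), FvLength (UINT64, little-endian) at offset 32, then
    the 4-byte Signature '_FVH' at offset 40.
    """
    with open(varstore_path, "rb") as f:
        header = f.read(48)
    if len(header) < 48 or header[40:44] != b"_FVH":
        raise ValueError("no firmware volume signature at offset 40")
    (fv_length,) = struct.unpack_from("<Q", header, 32)
    return fv_length


# The value returned differs between varstore templates produced by the
# 2MB and 4MB OVMF builds -- one visible symptom of the incompatible
# EFI_FIRMWARE_VOLUME_HEADER structures described above.
```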

Thanks
Laszlo


On Feb 25, 2020, at 9:53 AM, Laszlo Ersek <lersek@...> wrote:

On 02/24/20 16:28, Daniel P. Berrangé wrote:
On Tue, Feb 11, 2020 at 05:39:59PM +0000, Alex Bennée wrote:

wuchenye1995 <wuchenye1995@...> writes:

Hi all,
We found a problem with live migration of UEFI virtual machines
due to size of OVMF.fd changes.
Specifically, the size of OVMF.fd in edk with low version such as
edk-2.0-25 is 2MB while the size of it in higher version such as
edk-2.0-30 is 4MB.
When we migrate a UEFI virtual machine from the host with low
version of edk2 to the host with higher one, qemu component will
report an error in function qemu_ram_resize while
checking size of ovmf_pcbios: Length mismatch: pc.bios: 0x200000 in
!= 0x400000: Invalid argument.
We want to know how to solve this problem after updating the
version of edk2.
You can only migrate a machine that is identical - so instantiating an
empty machine with a different EDK image is bound to cause a problem,
because the machines don't match.
I don't believe we are that strict for firmware in general. The
firmware is loaded when QEMU starts, but that only matters for the
original source host QEMU. During migration, the memory content of the
original firmware will be copied during live migration, overwriting
whatever the target QEMU loaded off disk. This works....provided the
memory region is the same size on source & target host, which is where
the problem arises in this case.

If there's a risk that newer firmware will be larger than old firmware
there's only really two options:

- Keep all firmware images forever, each with a unique versioned
filename. This ensures target QEMU will always load the original
smaller firmware

- Add padding to the firmware images. IOW, if the firmware is 2 MB,
add zero-padding to the end of the image to round it up to 4 MB
(or whatever you anticipate the largest size will be in future).

Distros have often taken the latter approach for QEMU firmware in the
past. The main issue is that you have to plan ahead of time and get
this padding right from the very start. You can't add the padding
after the fact on an existing VM.
Following up here *too*, just for completeness.

The query in this thread has been posted three times now (and I have
zero idea why). Each time it generated a different set of responses. For
completeness, I'm now going to link the other two threads here (because
the present thread seems to have gotten the most feedback).

To the OP:

- please do *NOT* repost the same question once you get an answer. It
only fragments the discussion and creates confusion. It also doesn't
hurt if you *confirm* that you understood the answer.

- Yet further, if your email address has @gmail.com for domain, but your
msgids contain "tencent", that raises some eyebrows (mine for sure).
You say "we" in the query, but never identify the organization behind
the plural pronoun.

(I've been fuming about the triple-posting of the question for a while
now, but it's only now that, upon seeing how much work Dan has put into
his answer, I've decided that dishing out a bit of netiquette would be
in order.)

* First posting:
- msgid: <tencent_F1295F826E46EDFF3D77812B@...>
- edk2-devel: https://edk2.groups.io/g/devel/message/54146
- qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02419.html

* my response:
- msgid: <12553.1581366059422195003@groups.io>
- edk2-devel: https://edk2.groups.io/g/devel/message/54161
- qemu-devel: none, because (as an exception) I used the stupid
groups.io web interface to respond, and so my response
never reached qemu-devel

* Second posting (~4 hours after the first)
- msgid: <tencent_3CD8845EC159F0161725898B@...>
- edk2-devel: https://edk2.groups.io/g/devel/message/54147
- qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02415.html

* Dave's response:
- msgid: <20200220154742.GC2882@work-vm>
- edk2-devel: https://edk2.groups.io/g/devel/message/54681
- qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg05632.html

* Third posting (next day, present thread) -- cross posted to yet
another list (!), because apparently Dave's feedback and mine had not
been enough:
- msgid: <tencent_BC7FD00363690990994E90F8@...>
- edk2-devel: https://edk2.groups.io/g/devel/message/54220
- edk2-discuss: https://edk2.groups.io/g/discuss/message/135
- qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02735.html

Back on topic: see my response again. The answer is, you can't solve the
problem (specifically with OVMF), and QEMU in fact does you a service by
preventing the migration.

Laszlo



Andrew Fish
 

On Feb 25, 2020, at 12:40 PM, Laszlo Ersek <lersek@...> wrote:

> Hi Andrew,
>
> On 02/25/20 19:56, Andrew Fish wrote:
>> Laszlo,
>>
>> If I understand this correctly, is it not more complicated than just
>> size? It also assumes the memory layout is the same.
>
> Yes.
>
>> The legacy BIOS used fixed magic address ranges, but UEFI uses
>> dynamically allocated memory, so addresses are not fixed. While the
>> UEFI firmware does try to keep S3 and S4 layouts consistent between
>> boots, I'm not aware of any mechanism to keep the memory map
>> addresses the same between versions of the firmware.
>
> It's not about RAM, but platform MMIO.
Laszlo,

The FLASH offsets changing and breaking things makes sense.

I now realize this is like updating the EFI ROM without rebooting the system. Thus changes in how the new EFI code works are not the issue.

Is this migration event visible to the firmware? Traditionally the NVRAM is a region in the FD, so if you update the FD you have to skip the NVRAM region, or save and restore it. Is that activity happening in this case? Even if the ROM layout does not change, how do you not lose the contents of the NVRAM store when the live migration happens? Sorry if this is a remedial question, but I'm trying to learn how this migration works.

Thanks,

Andrew Fish

The core of the issue here is that the -D FD_SIZE_4MB and -D FD_SIZE_2MB
build options (or more directly, the different FD_SIZE_IN_KB macro
settings) set a bunch of flash-related build-time constant macros, and
PCDs, differently, in the following files:

- OvmfPkg/OvmfPkg.fdf.inc
- OvmfPkg/VarStore.fdf.inc
- OvmfPkg/OvmfPkg*.dsc

As a result, the OVMF_CODE.fd firmware binary will have different
hard-coded references to the variable store pflash addresses.
(Guest-physical MMIO addresses that point into the pflash range.)

If someone tries to combine an OVMF_CODE.fd firmware binary from e.g.
the 4MB build, with a variable store file that was originally
instantiated from an OVMF_VARS.fd varstore template from the 2MB build,
then the firmware binary's physical address references and various size
references will not match the contents / layout of the varstore pflash
chip, which maps an incompatibly structured varstore file.

For example, "OvmfPkg/VarStore.fdf.inc" describes two incompatible
EFI_FIRMWARE_VOLUME_HEADER structures (which "build" generates for the
OVMF_VARS.fd template) between the 4MB (total size) build, and the
1MB/2MB (total size) build.

The commit message below summarizes the internal layout differences,
from 1MB/2MB -> 4MB:

https://github.com/tianocore/edk2/commit/b24fca05751f

Excerpt (relevant for OVMF_VARS.fd):

  Description                Compression type   Size [KB]
  -------------------------  -----------------  --------------------
  Non-volatile data storage  open-coded binary  128 -> 528 ( +400)
                             data
  Variable store                                 56 -> 256 ( +200)
  Event log                                       4 ->   4 (   +0)
  Working block                                   4 ->   4 (   +0)
  Spare area                                     64 -> 264 ( +200)

Thanks
Laszlo


On Feb 25, 2020, at 9:53 AM, Laszlo Ersek <lersek@...> wrote:

On 02/24/20 16:28, Daniel P. Berrangé wrote:
On Tue, Feb 11, 2020 at 05:39:59PM +0000, Alex Bennée wrote:

wuchenye1995 <wuchenye1995@...> writes:

Hi all,
We found a problem with live migration of UEFI virtual machines
due to size of OVMF.fd changes.
Specifically, the size of OVMF.fd in edk with low version such as
edk-2.0-25 is 2MB while the size of it in higher version such as
edk-2.0-30 is 4MB.
When we migrate a UEFI virtual machine from the host with low
version of edk2 to the host with higher one, qemu component will
report an error in function qemu_ram_resize while
checking size of ovmf_pcbios: Length mismatch: pc.bios: 0x200000 in
!= 0x400000: Invalid argument.
We want to know how to solve this problem after updating the
version of edk2.
You can only migrate a machine that is identical - so instantiating an
empty machine with a different EDK image is bound to cause a problem,
because the machines don't match.
I don't believe we are that strict for firmware in general. The
firmware is loaded when QEMU starts, but that only matters for the
original source host QEMU. During migration, the memory content of the
original firmware will be copied during live migration, overwriting
whatever the target QEMU loaded off disk. This works....provided the
memory region is the same size on source & target host, which is where
the problem arises in this case.

If there's a risk that newer firmware will be larger than old firmware
there's only really two options:

- Keep all firmware images forever, each with a unique versioned
filename. This ensures target QEMU will always load the original
smaller firmware

- Add padding to the firmware images. IOW, if the firmware is 2 MB,
add zero-padding to the end of the image to round it up to 4 MB
(or whatever you anticipate the largest size will be in future).

Distros have often taken the latter approach for QEMU firmware in the
past. The main issue is that you have to plan ahead of time and get
this padding right from the very start. You can't add the padding
after the fact on an existing VM.
Following up here *too*, just for completeness.

The query in this thread has been posted three times now (and I have
zero idea why). Each time it generated a different set of responses. For
completeness, I'm now going to link the other two threads here (because
the present thread seems to have gotten the most feedback).

To the OP:

- please do *NOT* repost the same question once you get an answer. It
only fragments the discussion and creates confusion. It also doesn't
hurt if you *confirm* that you understood the answer.

- Yet further, if your email address has @gmail.com for domain, but your
msgids contain "tencent", that raises some eyebrows (mine for sure).
You say "we" in the query, but never identify the organization behind
the plural pronoun.

(I've been fuming about the triple-posting of the question for a while
now, but it's only now that, upon seeing how much work Dan has put into
his answer, I've decided that dishing out a bit of netiquette would be
in order.)

* First posting:
- msgid: <tencent_F1295F826E46EDFF3D77812B@...>
- edk2-devel: https://edk2.groups.io/g/devel/message/54146
- qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02419.html

* my response:
- msgid: <12553.1581366059422195003@groups.io>
- edk2-devel: https://edk2.groups.io/g/devel/message/54161
- qemu-devel: none, because (as an exception) I used the stupid
groups.io web interface to respond, and so my response
never reached qemu-devel

* Second posting (~4 hours after the first):
- msgid: <tencent_3CD8845EC159F0161725898B@...>
- edk2-devel: https://edk2.groups.io/g/devel/message/54147
- qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02415.html

* Dave's response:
- msgid: <20200220154742.GC2882@work-vm>
- edk2-devel: https://edk2.groups.io/g/devel/message/54681
- qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg05632.html

* Third posting (next day, present thread) -- cross-posted to yet
another list (!), because apparently Dave's feedback and mine had not
been enough:
- msgid: <tencent_BC7FD00363690990994E90F8@...>
- edk2-devel: https://edk2.groups.io/g/devel/message/54220
- edk2-discuss: https://edk2.groups.io/g/discuss/message/135
- qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02735.html

Back on topic: see my response again. The answer is, you can't solve the
problem (specifically with OVMF), and QEMU in fact does you a service by
preventing the migration.

Laszlo





Laszlo Ersek
 

Hi Andrew,

On 02/25/20 22:35, Andrew Fish wrote:

Laszlo,

The FLASH offsets changing breaking things makes sense.

I now realize this is like updating the EFI ROM without rebooting the
system. Thus changes in how the new EFI code works is not the issue.

Is this migration event visible to the firmware? Traditionally the
NVRAM is a region in the FD so if you update the FD you have to skip
NVRAM region or save and restore it. Is that activity happening in
this case? Even if the ROM layout does not change how do you not lose
the contents of the NVRAM store when the live migration happens? Sorry
if this is a remedial question but I'm trying to learn how this
migration works.
With live migration, the running guest doesn't notice anything. This is
a general requirement for live migration (regardless of UEFI or flash).

You are very correct to ask about "skipping" the NVRAM region. With the
approach that OvmfPkg originally supported, live migration would simply
be unfeasible. The "build" utility would produce a single (unified)
OVMF.fd file, which would contain both NVRAM and executable regions, and
the guest's variable updates would modify the one file that would exist.
This is inappropriate even without considering live migration, because
OVMF binary upgrades (package updates) on the virtualization host would
force guests to lose their private variable stores (NVRAMs).

Therefore, the "build" utility produces "split" files too, in addition
to the unified OVMF.fd file. Namely, OVMF_CODE.fd and OVMF_VARS.fd.
OVMF.fd is simply the concatenation of the latter two.

$ cat OVMF_VARS.fd OVMF_CODE.fd | cmp - OVMF.fd
[prints nothing]

When you define a new domain (VM) on a virtualization host, the domain
definition saves a reference (pathname) to the OVMF_CODE.fd file.
However, the OVMF_VARS.fd file (the variable store *template*) is not
directly referenced; instead, it is *copied* into a separate (private)
file for the domain.

Furthermore, once booted, the guest has two pflash chips: one that maps
the firmware executable OVMF_CODE.fd read-only, and another that maps
its private varstore file read-write.
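In QEMU terms, the split shows up as two -drive if=pflash options. This is only a sketch; the pathnames are illustrative, and in practice libvirt generates the equivalent configuration from the domain definition:

```shell
# Sketch of a direct QEMU invocation with the split firmware files.
# unit=0: read-only code chip (shared, package-owned);
# unit=1: the domain's private, writable copy of the varstore template.
qemu-system-x86_64 \
  -drive if=pflash,format=raw,unit=0,readonly=on,file=/usr/share/OVMF/OVMF_CODE.fd \
  -drive if=pflash,format=raw,unit=1,file=/var/lib/libvirt/qemu/nvram/guest_VARS.fd
```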

This makes it possible to upgrade OVMF_CODE.fd and OVMF_VARS.fd (via
package upgrades on the virt host) without messing with varstores that
were earlier instantiated from OVMF_VARS.fd. What's important here is
that the various constants in the new (upgraded) OVMF_CODE.fd file
remain compatible with the *old* OVMF_VARS.fd structure, across package
upgrades.

If that's not possible for introducing e.g. a new feature, then the
package upgrade must not overwrite the OVMF_CODE.fd file in place, but
must provide an additional firmware binary. This firmware binary can
then only be used by freshly defined domains (old domains cannot be
switched over). Old domains can be switched over manually -- and only if
the sysadmin decides it is OK to lose the current variable store
contents. Then the old varstore file for the domain is deleted
(manually), the domain definition is updated, and then a new (logically
empty, pristine) varstore can be created from the *new* OVMF_2_VARS.fd
that matches the *new* OVMF_2_CODE.fd.


During live migration, the "RAM-like" contents of both pflash chips are
migrated (the guest-side view of both chips remains the same, including
the case when the writeable chip happens to be in "programming mode",
i.e., during a UEFI variable write through the Fault Tolerant Write and
Firmware Volume Block(2) protocols).

Once live migration completes, QEMU dumps the full contents of the
writeable chip to the backing file (on the destination host). Going
forward, flash writes from within the guest are reflected to said
host-side file on-line, just like it happened on the source host before
live migration. If the file backing the r/w pflash chip is on NFS
(shared by both src and dst hosts), then this one-time dumping when the
migration completes is superfluous, but it's also harmless.

The interesting question is, what happens when you power down the VM on
the destination host (= post migration), and launch it again there, from
zero. In that case, the firmware executable file comes from the
*destination host* (it was never persistently migrated from the source
host, i.e. never written out on the dst). It simply comes from the OVMF
package that had been installed on the destination host, by the
sysadmin. However, the varstore pflash does reflect the permanent result
of the previous migration. So this is where things can fall apart, if
both firmware binaries (on the src host and on the dst host) don't agree
about the internal structure of the varstore pflash.

Thanks
Laszlo


Andrew Fish
 



On Feb 26, 2020, at 1:42 AM, Laszlo Ersek <lersek@...> wrote:

Hi Andrew,

On 02/25/20 22:35, Andrew Fish wrote:

Laszlo,

The FLASH offsets changing breaking things makes sense.

I now realize this is like updating the EFI ROM without rebooting the
system.  Thus changes in how the new EFI code works is not the issue.

Is this migration event visible to the firmware? Traditionally the
NVRAM is a region in the FD so if you update the FD you have to skip
NVRAM region or save and restore it. Is that activity happening in
this case? Even if the ROM layout does not change how do you not lose
the contents of the NVRAM store when the live migration happens? Sorry
if this is a remedial question but I'm trying to learn how this
migration works.

With live migration, the running guest doesn't notice anything. This is
a general requirement for live migration (regardless of UEFI or flash).

You are very correct to ask about "skipping" the NVRAM region. With the
approach that OvmfPkg originally supported, live migration would simply
be unfeasible. The "build" utility would produce a single (unified)
OVMF.fd file, which would contain both NVRAM and executable regions, and
the guest's variable updates would modify the one file that would exist.
This is inappropriate even without considering live migration, because
OVMF binary upgrades (package updates) on the virtualization host would
force guests to lose their private variable stores (NVRAMs).

Therefore, the "build" utility produces "split" files too, in addition
to the unified OVMF.fd file. Namely, OVMF_CODE.fd and OVMF_VARS.fd.
OVMF.fd is simply the concatenation of the latter two.

$ cat OVMF_VARS.fd OVMF_CODE.fd | cmp - OVMF.fd
[prints nothing]


Laszlo,

Thanks for the detailed explanation. 

Maybe I was overcomplicating this. Given your explanation, I think the part I was missing is that OVMF implies the FLASH layout, in this split model, based on the sizes of OVMF_CODE.fd and OVMF_VARS.fd. Given that, if OVMF_CODE.fd gets bigger, the variable address changes from QEMU's point of view. So basically it is the QEMU API that is making assumptions about the relative layout of the FD in the split model, which makes a migration to a larger ROM not work. Basically, the way it is currently defined, the -pflash API does not support changing the size of the ROM without moving the NVRAM.

Given the above it seems like the 2 options are:
1) Pad OVMF_CODE.fd to be very large so there is room to grow.
2) Add some feature to QEMU that allows the variable store address to not be based on OVMF_CODE.fd size.

I did see this [1] and combined with your email I either understand, or I'm still confused? :)

I'm not saying we need to change anything, I'm just trying to make sure I understand how OVMF and QEMU are tied together.


Thanks,

Andrew Fish










Laszlo Ersek
 

On 02/28/20 04:20, Zhoujian (jay) wrote:
Hi Laszlo,

I found an earlier thread where you said there are 4 options to use OVMF:

https://lists.gnu.org/archive/html/qemu-discuss/2018-04/msg00045.html

Excerpt:
"(1) If you map the unified image with -bios, all of that becomes ROM --
read-only memory.
(2) If you map the unified image with -pflash, all of that becomes
read-write MMIO.
(3) If you use the split images (OVMF_CODE.fd and a copy of
OVMF_VARS.fd), and map then as flash chips, then the top part
(OVMF_CODE.fd, consisting of SECFV and FVMAIN_COMPACT) becomes
read-only flash (MMIO), and the bottom part (copy of OVMF_VARS.fd,
consisting of FTW Spare, FTW Work, Event log, and NV store) becomes
read-write flash (MMIO).
(4) If you use -bios with OVMF_CODE.fd only, then the top part will be
ROM, and the bottom part will be "black hole" MMIO."

I think you're talking about option (2) (acceptable) and option (3)
(the best solution) in this thread, and I agree.
Yes, exactly.


I'm wondering whether it would be different with the ancient option (1)
under live migration. You tried to add a -DMEM_VARSTORE_EMU_ENABLE=FALSE
build flag to disable -bios support, but option (1) may be used by
old VMs, started several years ago, running on the cloud...
I'm unaware of any VMs running in clouds that use "-bios" with OVMF. It
certainly seems a terrible idea, regardless of live migration.


With new features being developed, the size of OVMF.fd is becoming
larger and larger; that seems to be the trend. It would be nice if it
could be hot-updated to the new version. As Daniel said, could it be
feasible to add zero-padding to the firmware images?
You're mixing up small details. OVMF_CODE.fd is already heavily padded,
internally. We've grown the *internal* DXEFV firmware volume repeatedly
over *years*, without *any* disruption to users. Please see:

- da78c88f4535 ("OvmfPkg: raise DXEFV size to 8 MB", 2014-03-05)

- 08df58ec3043 ("OvmfPkg: raise DXEFV size to 9 MB", 2015-10-07)

- 2f7b34b20842 ("OvmfPkg: raise DXEFV size to 10 MB", 2016-05-31)

- d272449d9e1e ("OvmfPkg: raise DXEFV size to 11 MB", 2018-05-29)

To this day, i.e., with edk2 master @ edfe16a6d9f8, you can build OVMF
in the default feature configuration [*] for -D FD_SIZE_2MB.

[*]
DEFINE SECURE_BOOT_ENABLE = FALSE
DEFINE SMM_REQUIRE = FALSE
DEFINE SOURCE_DEBUG_ENABLE = FALSE
DEFINE TPM2_ENABLE = FALSE
DEFINE TPM2_CONFIG_ENABLE = FALSE

DEFINE NETWORK_TLS_ENABLE = FALSE
DEFINE NETWORK_IP6_ENABLE = FALSE
DEFINE NETWORK_HTTP_BOOT_ENABLE = FALSE

For example:

$ build \
-a IA32 -a X64 \
-b DEBUG \
-p OvmfPkg/OvmfPkgIa32X64.dsc \
-t GCC48 \
-D FD_SIZE_2MB

Note that this build will contain DEBUG messages (at least DEBUG_INFO
level ones) and ASSERT()s too.

The final usage report at the end of the command is:

SECFV [14%Full] 212992 total, 31648 used, 181344 free
PEIFV [31%Full] 917504 total, 284584 used, 632920 free
DXEFV [44%Full] 11534336 total, 5113688 used, 6420648 free
FVMAIN_COMPACT [73%Full] 1753088 total, 1284216 used, 468872 free

What does that mean? It means that the largest firmware volume, DXEFV,
uses just 44% of the 11MB allotted size.

And FVMAIN_COMPACT, which embeds (among other things) DXEFV in
LZMA-compressed format, only uses 73% of its allotted size, which is
1712 KB.

All this means that in the default feature config, there's still a bunch
of room free in the 2MB build, even with DEBUGs and ASSERT()s enabled,
and with an old compiler that does not do link-time optimization.
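Reading the usage report above with a quick computation (the figures are copied verbatim from the report):

```shell
# Free space in FVMAIN_COMPACT is what bounds further growth of the
# compressed executable content within the 2MB build.
echo "FVMAIN_COMPACT free: $(( 468872 / 1024 )) KiB"          # ~457 KiB
echo "DXEFV utilization:   $(( 5113688 * 100 / 11534336 ))%"  # ~44%
```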

I think you must have misunderstood the purpose of the 4MB build. The
4MB build was solely introduced for enlarging the *varstore*. That was
motivated by passing an SVVP check. This is described in detail in the
relevant commit, which I may have linked earlier.

https://github.com/tianocore/edk2/commit/b24fca05751f

(Please consult the diagram in the commit message carefully. It shows
you how the various firmware volumes / flash devices are nested; it will
help you understand where the 1712 KB FVMAIN_COMPACT firmware volume is
placed in the final image, and how FVMAIN_COMPACT embeds / compresses
DXEFV.)

And *given that* we had to introduce an incompatible change (for
enlarging the varstore, for SVVP's sake), it made sense to *also*
enlarge the other parts of the flash content. But the motivation was
strictly the varstore change, and that was inevitably an incompatible
change. In fact, you can see in the commit message that, while the
*outer* container FVMAIN_COMPACT was enlarged from 1712 to 3360
kilobytes, the embedded PEIFV and DXEFV firmware volumes didn't put that
extra space to use.
compressed, but even that firmware volume got no "space injection". So
basically all the size increase that *could* have been exploited for
executable code size was spent on padding.

As far as I can tell, we have never broken compatibility due to
executable code size increases.

Sorry if I over-explained this; I simply don't know how to express this
any better.


Things are a little different here, i.e. the sizes on the src and dest
sides are 2M and 4M respectively. Could we copy the source 2M to the
dest side, and then add zero-padding to the end of the image to round it
up to 4 MB on the dest side (with some modification of qemu_ram_resize
in QEMU to avoid the length-mismatch error report)?
No, this doesn't make any sense.

On both the source host and the destination host, the same pathname (for
example, "/usr/share/OVMF/OVMF_CODE.fd") must point to same-size
(compatible) firmware binaries. Both must be built with the same -D
FD_SIZE_2MB flag, or with the same -D FD_SIZE_4MB flag. Then you can
migrate.
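As a sketch, this invariant can be checked mechanically before migrating. The helper and the stand-in files below are hypothetical, created only for the demonstration:

```shell
# The firmware pathname referenced by the domain must resolve to
# same-size binaries on the source and destination hosts; a size
# mismatch is exactly what makes qemu_ram_resize fail.
same_fw_size() {
  [ "$(stat -c %s "$1")" -eq "$(stat -c %s "$2")" ]
}

# Demonstration with stand-in files: a 2 MB "source host" copy and a
# 4 MB "destination host" copy of the same pathname.
dd if=/dev/zero of=src_OVMF_CODE.fd bs=1M count=2 2>/dev/null
dd if=/dev/zero of=dst_OVMF_CODE.fd bs=1M count=4 2>/dev/null

same_fw_size src_OVMF_CODE.fd dst_OVMF_CODE.fd \
  || echo "size mismatch: migration would fail"
```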

You can offer a 4MB build too on the destination host, but it must be
under a different pathname. Otherwise, after the domain has been
migrated in from the source host and is then re-launched against the
firmware binary that's on the destination host, there is an
incompatibility between the domain's *original* varstore, and the
domain's *new* firmware binary.


The physical address assigned to the OVMF region will change from
0xffe00000 - 0xffffffff to 0xffc00000 - 0xffffffff. After the OS has
started, I see this range is recycled and assigned to other PCI
devices (using the command "cat /proc/iomem") by the guest OS. So
this range change, I think, will not affect the guest.
But if OVMF code is running when the guest is paused on the src side,
will it continue to run on the dest side? I'm not sure...

So, may I ask whether it would be feasible or compatible for option (1)
to live-migrate between different OVMF sizes? Thanks.
Sorry, my brain just cannot cope with the idea of even *running* OVMF in
production with "-bios" -- let alone migrate it.

But anyway... if you are dead set on this, you can try the following:

- On the destination host, rename the 4MB build to a different filename.

- On the destination host, update all your domain definitions to refer
to the renamed filename with "-bios"

- on the destination host, rebuild your current (more modern) firmware
package, using the -D FD_SIZE_2MB flag. If you have not enabled a bunch
of features meanwhile, it will actually succeed.

- on the destination host, put this fresh build (with unified size 2MB)
in the original place (using the original pathname)

- now you can migrate domains from your source host. The pathname they
refer to with "-bios" will exist, and it will be a 2MB build. And the
contents of that build will be more modern (presumably) than what you
are migrating away from.

Please understand this: when you *allowed* OVMF to build with 4MB size,
and installed it under the exact same pathname (on the destination host)
where you previously used to keep a 2MB binary, *that* is when you broke
compatibility.

What's quite unfathomable to me is that the 2MB->4MB change in upstream
was *solely* motivated by varstore enlargement (for passing SVVP with
*flash*-based variables), but you're still using the ancient and
non-conformant \NvVars emulation that comes with "-bios".

Please, flash-based variables with OVMF and QEMU have been supported
since QEMU v1.6.

I've attempted to remove -bios support from OVMF multiple times, I've
always been prevented from doing that, and the damage is obvious only now.

Laszlo


Laszlo Ersek
 

On 02/28/20 05:04, Andrew Fish wrote:

Maybe I was overcomplicating this. Given your explanation I think the part I'm missing is OVMF is implying FLASH layout, in this split model, based on the size of the OVMF_CODE.fd and OVMF_VARS.fd. Given that if OVMF_CODE.fd gets bigger the variable address changes from a QEMU point of view. So basically it is the QEMU API that is making assumptions about the relative layout of the FD in the split model that makes a migration to larger ROM not work.
No, QEMU does not make any assumptions here. QEMU simply grabs both
pflash chips (the order is not random, it can be specified on the
command line -- in fact the QEMU user is expected to specify in the
right order), and then QEMU maps them in decreasing address order from
4GB in guest-phys address space.

If we enlarge OVMF_CODE.fd, then the base address of the varstore
(PcdOvmfFlashNvStorageVariableBase) will sink. That's not a problem per
se, because QEMU doesn't know about PcdOvmfFlashNvStorageVariableBase at
all. QEMU will simply map the varstore, automatically, where the
enlarged OVMF_CODE.fd will look for it.
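The downward mapping can be illustrated with a bit of arithmetic. The sizes below are those of the classic 2 MB split build (128 KiB of vars plus 1920 KiB of code) and are meant as an example, not as a statement about any particular package:

```shell
# Flash chips stack downward from the 4 GiB boundary: the code chip
# ends at 4 GiB, and the varstore chip sits immediately below it.
VARS_KB=128    # varstore chip size (example value)
CODE_KB=1920   # code chip size (example value)
FOUR_GIB=$(( 4 * 1024 * 1024 * 1024 ))

CODE_BASE=$(( FOUR_GIB - CODE_KB * 1024 ))
VARS_BASE=$(( FOUR_GIB - (CODE_KB + VARS_KB) * 1024 ))

printf 'code base: 0x%X\n' "$CODE_BASE"   # 0xFFE20000
printf 'vars base: 0x%X\n' "$VARS_BASE"   # 0xFFE00000
```

Grow CODE_KB and VARS_BASE sinks, which is exactly the "moving NVRAM" effect discussed above; the 0xffe00000 base matches the guest-visible range mentioned elsewhere in this thread for the 2MB build.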

Basically the -pflash API does not support changing the size of the ROM without moving NVRAM given the way it is currently defined.
Let me put it like this: the NVRAM gets moved by virtue of how OVMF is
built, and by how QEMU maps the pflash chips into guest-phys address
space. They are in sync, automatically.

The problem is when the NVRAM is internally restructured, or resized --
the new OVMF_CODE.fd binary will reflect this with changed PCDs, and
look for "stuff" at those addresses. But if you still try to use an old
(differently sized, or differently structured) varstore file, while QEMU
will happily map it, parts of the NVRAM will just not end up in places
where OVMF_CODE.fd expects them.


Given the above it seems like the 2 options are:
1) Pad OVMF_CODE.fd to be very large so there is room to grow.
There's already room to grow, *inside* OVMF_CODE.fd. As I've shown
elsewhere in this thread, even the 2MB build has approx. 457 KB free in
the DXEFV volume, even without link-time optimization and without
DEBUG/ASSERT stripping, if you don't enable additional features.

2) Add some feature to QEMU that allows the variable store address to not be based on OVMF_CODE.fd size.
Yes, this has been proposed over time.

It wouldn't help with the case when you change the internal structure of
the NVRAM, and try to run an incompatible OVMF_CODE.fd against that.

I did see this [1] and combined with your email I either understand, or I'm still confused? :)

I'm not saying we need to change anything, I'm just trying to make sure I understand how OVMF and QEMU are tied together.
I think the most interesting function for you could be
pc_system_flash_map(), in "hw/i386/pc_sysfw.c", in the QEMU source.


[1] https://www.redhat.com/archives/libvir-list/2019-January/msg01031.html

Thanks
Laszlo


Laszlo Ersek
 

On 02/28/20 12:47, Laszlo Ersek wrote:
On 02/28/20 05:04, Andrew Fish wrote:
Given the above it seems like the 2 options are:
1) Pad OVMF_CODE.fd to be very large so there is room to grow.
There's already room to grow, *inside* OVMF_CODE.fd. As I've shown
elsewhere in this thread, even the 2MB build has approx. 457 KB free in
the DXEFV volume, even without link-time optimization and without
DEBUG/ASSERT stripping, if you don't enable additional features.
Typo; I meant FVMAIN_COMPACT, not DXEFV.

Laszlo


Zhoujian (jay) <jianjay.zhou@...>
 

Hi Laszlo,

-----Original Message-----
From: Qemu-devel
[mailto:qemu-devel-bounces+jianjay.zhou=huawei.com@...] On Behalf
Of Laszlo Ersek
Sent: Wednesday, February 26, 2020 5:42 PM
To: Andrew Fish <afish@...>; devel@edk2.groups.io
Cc: berrange@...; qemu-devel@...; Dr. David Alan Gilbert
<dgilbert@...>; zhoujianjay <zhoujianjay@...>; discuss
<discuss@edk2.groups.io>; Alex Bennée <alex.bennee@...>;
wuchenye1995 <wuchenye1995@...>
Subject: Re: [edk2-devel] A problem with live migration of UEFI virtual machines

Hi Andrew,

On 02/25/20 22:35, Andrew Fish wrote:

Laszlo,

The FLASH offsets changing breaking things makes sense.

I now realize this is like updating the EFI ROM without rebooting the
system. Thus changes in how the new EFI code works is not the issue.

Is this migration event visible to the firmware? Traditionally the
NVRAM is a region in the FD so if you update the FD you have to skip
NVRAM region or save and restore it. Is that activity happening in
this case? Even if the ROM layout does not change how do you not lose
the contents of the NVRAM store when the live migration happens? Sorry
if this is a remedial question but I'm trying to learn how this
migration works.
With live migration, the running guest doesn't notice anything. This is a general
requirement for live migration (regardless of UEFI or flash).

You are very correct to ask about "skipping" the NVRAM region. With the
approach that OvmfPkg originally supported, live migration would simply be
unfeasible. The "build" utility would produce a single (unified) OVMF.fd file, which
would contain both NVRAM and executable regions, and the guest's variable
updates would modify the one file that would exist.
This is inappropriate even without considering live migration, because OVMF
binary upgrades (package updates) on the virtualization host would force guests
to lose their private variable stores (NVRAMs).

Therefore, the "build" utility produces "split" files too, in addition to the unified
OVMF.fd file. Namely, OVMF_CODE.fd and OVMF_VARS.fd.
OVMF.fd is simply the concatenation of the latter two.

$ cat OVMF_VARS.fd OVMF_CODE.fd | cmp - OVMF.fd
[prints nothing]

When you define a new domain (VM) on a virtualization host, the domain
definition saves a reference (pathname) to the OVMF_CODE.fd file.
However, the OVMF_VARS.fd file (the variable store *template*) is not directly
referenced; instead, it is *copied* into a separate (private) file for the domain.

Furthermore, once booted, the guest sees two pflash chips: one that maps the
firmware executable OVMF_CODE.fd read-only, and another that maps its private
varstore file read-write.
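This split-file setup corresponds roughly to a QEMU invocation like the
following (a sketch only; the paths and file names are hypothetical, and on a
libvirt host the equivalent options are generated from the domain XML rather
than typed by hand):

```shell
# One-time, at domain definition: copy the varstore *template* into a
# private, per-domain file (hypothetical paths).
cp /usr/share/OVMF/OVMF_VARS.fd /var/lib/libvirt/qemu/nvram/mydomain_VARS.fd

# At boot: chip 1 maps the shared firmware executable read-only,
# chip 2 maps the domain's private varstore read-write.
qemu-system-x86_64 \
  -machine q35 \
  -drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE.fd \
  -drive if=pflash,format=raw,file=/var/lib/libvirt/qemu/nvram/mydomain_VARS.fd
```

Because only the second file is ever written, upgrading the OVMF package
replaces OVMF_CODE.fd without touching any domain's variables.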

This makes it possible to upgrade OVMF_CODE.fd and OVMF_VARS.fd (via
package upgrades on the virt host) without messing with varstores that were
earlier instantiated from OVMF_VARS.fd. What's important here is that the
various constants in the new (upgraded) OVMF_CODE.fd file remain compatible
with the *old* OVMF_VARS.fd structure, across package upgrades.

If that's not possible for introducing e.g. a new feature, then the package
upgrade must not overwrite the OVMF_CODE.fd file in place, but must provide an
additional firmware binary. This firmware binary can then only be used by freshly
defined domains (old domains are not switched over automatically). Old domains
can be switched over manually -- and only if the sysadmin decides it is OK to lose the
current variable store contents. Then the old varstore file for the domain is
deleted (manually), the domain definition is updated, and then a new (logically
empty, pristine) varstore can be created from the *new* OVMF_2_VARS.fd that
matches the *new* OVMF_2_CODE.fd.


During live migration, the "RAM-like" contents of both pflash chips are migrated
(the guest-side view of both chips remains the same, including the case when the
writeable chip happens to be in "programming mode", i.e., during a UEFI variable
write through the Fault Tolerant Write and Firmware Volume Block(2) protocols).

Once live migration completes, QEMU dumps the full contents of the writeable
chip to the backing file (on the destination host). Going forward, flash writes from
within the guest are reflected to said host-side file on-line, just like it happened
on the source host before live migration. If the file backing the r/w pflash chip is
on NFS (shared by both src and dst hosts), then this one-time dumping when the
migration completes is superfluous, but it's also harmless.

The interesting question is, what happens when you power down the VM on the
destination host (= post migration), and launch it again there, from zero. In that
case, the firmware executable file comes from the *destination host* (it was
never persistently migrated from the source host, i.e. never written out on the
dst). It simply comes from the OVMF package that had been installed on the
destination host, by the sysadmin. However, the varstore pflash does reflect the
permanent result of the previous migration. So this is where things can fall apart,
if both firmware binaries (on the src host and on the dst host) don't agree about
the internal structure of the varstore pflash.
Hi Laszlo,

I found an earlier thread in which you said there are 4 options for using OVMF:

https://lists.gnu.org/archive/html/qemu-discuss/2018-04/msg00045.html

Excerpt:
"(1) If you map the unified image with -bios, all of that becomes ROM --
read-only memory.
(2) If you map the unified image with -pflash, all of that becomes
read-write MMIO.
(3) If you use the split images (OVMF_CODE.fd and a copy of
OVMF_VARS.fd), and map them as flash chips, then the top part
(OVMF_CODE.fd, consisting of SECFV and FVMAIN_COMPACT) becomes
read-only flash (MMIO), and the bottom part (copy of OVMF_VARS.fd,
consisting of FTW Spare, FTW Work, Event log, and NV store) becomes
read-write flash (MMIO).
(4) If you use -bios with OVMF_CODE.fd only, then the top part will be
ROM, and the bottom part will be "black hole" MMIO."

I think you're talking about option (2) (acceptable) and option (3)
(best solution) in this thread, and I agree.

I'm wondering whether things will be different for the ancient option (1)
with live migration. You tried adding the -DMEM_VARSTORE_EMU_ENABLE=FALSE
build flag to disable -bios support, but option (1) may still be in use by
old VMs that were started several years ago and are running in the cloud...

As new features are developed, the size of OVMF.fd grows larger and
larger; that seems to be the trend. It would be nice if the firmware could
be hot-updated to the new version. As Daniel said, would it be feasible to
add zero-padding to the firmware images? Things are a little different here,
i.e. the sizes on the src and dest are 2M and 4M respectively: copy the
source 2M image to the dest side, and then add zero-padding to the end of
the image to round it up to 4 MB on the dest side (with some modification of
qemu_ram_resize in QEMU to avoid the length-mismatch error)?
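The padding step of that proposal could be sketched on the destination host
roughly like this (an illustration only, with stand-in file names; it creates
a dummy 2 MiB image instead of a real firmware build, and whether tail-padding
is the correct placement depends on how QEMU maps the image below 4 GiB):

```shell
# Stand-in for the 2 MiB image received from the source host.
dd if=/dev/zero of=src_OVMF.fd bs=1M count=2 status=none

# Zero-pad a copy up to the 4 MiB size expected on the destination.
cp src_OVMF.fd dst_OVMF.fd
truncate -s 4M dst_OVMF.fd     # extends the file with zero bytes

stat -c %s dst_OVMF.fd          # prints 4194304
```

This only makes the sizes match; qemu_ram_resize would still need the
modification mentioned above to accept the resize at all.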

The physical address range assigned to the OVMF region will change from
0xffe00000-0xffffffff to 0xffc00000-0xffffffff. After the OS has started,
I see (using the command "cat /proc/iomem") that this range is recycled
and assigned to other PCI devices by the guest OS. So this range change
seems like it will not affect the guest, I think. But if OVMF code was
running when the VM was paused on the src side, will it continue to run
correctly on the dest side? I'm not sure...

So, may I ask whether it would be feasible or compatible to use option (1)
when live-migrating between different OVMF sizes? Thanks.

Regards,
Jay Zhou


Dr. David Alan Gilbert
 

* Laszlo Ersek (lersek@...) wrote:

The interesting question is, what happens when you power down the VM on
the destination host (= post migration), and launch it again there, from
zero. In that case, the firmware executable file comes from the
*destination host* (it was never persistently migrated from the source
host, i.e. never written out on the dst). It simply comes from the OVMF
package that had been installed on the destination host, by the
sysadmin. However, the varstore pflash does reflect the permanent result
of the previous migration. So this is where things can fall apart, if
both firmware binaries (on the src host and on the dst host) don't agree
about the internal structure of the varstore pflash.
My guess is that over time we're going to need to find a way to handle
this; otherwise people will end up having to maintain old versions of OVMF
just to keep variable store compatibility.

Dave

Thanks
Laszlo
--
Dr. David Alan Gilbert / dgilbert@... / Manchester, UK