[RFC] fix for virt-install failure with OVMF?


Aaron Young
 

Hello, we have found what we think is a BUG in OVMF, but wanted to run it by the rfc list first to confirm.

We have discovered that the following virt-install command causes the latest OVMF code to fail to boot into the installer ISO.

virt-install --boot uefi --name Guest1 --ram 4096 --vcpus 1 --disk
path=/Disks/Guest1_disk.img --location /ISO/OracleLinux-R7-U4-Server-x86_64-dvd.iso --network bridge=virbr0,model=virtio --os-type linux --noreboot --boot=hd,cdrom,loader=/usr/share/OVMF/OVMF_CODE.pure-efi.fd,loader_ro=yes,loader_type=pflash,loader_secure=no,nvram=/Images/OVMF_VARS.pure-efi.Guest1.fd --video vga --graphics vnc,port=5999

Instead of booting to the installer ISO, we drop into the kernel shell like so:
...

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick
or /boot
after mounting them and attach it to a bug report.


:/# ls /sys/block
:/#

---

This change to the code allows this virt-install command (above) to succeed:

diff --git a/OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c b/OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c
index 45d0ee9..997acaf 100644
--- a/OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c
+++ b/OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c
@@ -1514,14 +1514,14 @@ PlatformBootManagerAfterConsole (
   Tcg2PhysicalPresenceLibProcessRequest (NULL);

   //
-  // Process QEMU's -kernel command line option
+  // Perform some platform specific connect sequence
   //
-  TryRunningQemuKernel ();
+  PlatformBdsConnectSequence ();

   //
-  // Perform some platform specific connect sequence
+  // Process QEMU's -kernel command line option
   //
-  PlatformBdsConnectSequence ();
+  TryRunningQemuKernel ();

   EfiBootManagerRefreshAllBootOption ();

That is, if we move the TryRunningQemuKernel() call to after PlatformBdsConnectSequence(), the virt-install command boots to the installer ISO and all is good.

What appears to be happening is that the CD/IDE device for the ISO is not being connected with the existing code. The code change allows the CD to be connected before we call TryRunningQemuKernel(). With the code change we see the following in the debug log; without it we do not:
< Found Mass Storage device: PciRoot(0x0)/Pci(0x1,0x1)
< SataControllerStart START
< InstallProtocolInterface: A1E37052-80D9-4E65-A317-3E9A55C43EC9 BEF07EA0
< SataControllerStart END status = Success
< ==AtaAtapiPassThru Start== Controller = BEFA3C98
< [secondary] channel [master] [cdrom ] device
< CalculateBestPioMode: AdvancedPioMode = 3
< IdeInitCalculateMode: PioMode = 3
< CalculateBestUdmaMode: DeviceUDmaMode = 203F
< IdeInitCalculateMode: UdmaMode = 5
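
The ordering difference described above can be modeled in a few lines. This is a toy Python model, not OVMF code: the function name, parameters, and step labels are made up for illustration, and it assumes that a successful -kernel launch never returns to the later steps.

```python
def after_console(kernel_via_fw_cfg, connect_first):
    """Toy model of the tail of PlatformBootManagerAfterConsole().

    Returns the list of steps that run before the -kernel image is
    launched or normal boot option processing begins.
    """
    steps = []

    def try_running_qemu_kernel():
        # Assumption: when QEMU was given -kernel, the firmware launches
        # that kernel and does not come back to the remaining steps.
        if kernel_via_fw_cfg:
            steps.append("launch -kernel image")
            return True
        return False

    def connect_sequence():
        steps.append("connect devices")

    # Current code: kernel first. Proposed patch: connect first.
    order = [connect_sequence, try_running_qemu_kernel]
    if not connect_first:
        order.reverse()

    for step in order:
        if step():
            return steps

    steps.append("refresh boot options")
    return steps


current = after_console(kernel_via_fw_cfg=True, connect_first=False)
patched = after_console(kernel_via_fw_cfg=True, connect_first=True)
print(current)
print(patched)
```

In this model the two orderings differ only when a kernel is supplied: the current order launches it with no devices connected, while the patched order connects devices first. Without -kernel both orders produce the same steps.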


My question is: is this a legitimate change to the OVMF boot flow? Is there a reason that TryRunningQemuKernel() is currently called prior to PlatformBdsConnectSequence(), other than as an optimization to reduce unnecessary connections? I fear that this change in the boot flow fixes this case but may break something else.

For completeness, here is the qemu command that is exec'd by the virt-install command for reference:
/usr/bin/qemu-system-x86_64 \
-name guest=Guest1,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-5-Guest1/master-key.aes \
-machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off \
-cpu Skylake-Client-IBRS \
-drive file=/usr/share/OVMF/OVMF_CODE.pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=/Images/OVMF_VARS.pure-efi.Guest1.fd,if=pflash,format=raw,unit=1 \
-m 4096 \
-realtime mlock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 5a3a191d-8cac-4400-80bb-59136bd05580 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=26,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-reboot \
-global PIIX4_PM.disable_s3=1 \
-global PIIX4_PM.disable_s4=1 \
-boot strict=on \
-kernel /var/lib/libvirt/boot/virtinst-vmlinuz.jdC2ks \
-initrd /var/lib/libvirt/boot/virtinst-initrd.img.aHyTeQ \
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \
-drive file=/Disks/Guest1_disk.img,format=raw,if=none,id=drive-ide0-0-0 \
-device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \
-drive file=/ISO/OracleLinux-R7-U4-Server-x86_64-dvd.iso,format=raw,if=none,id=drive-ide0-0-1,readonly=on \
-device ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 \
-netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=30 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:04:f4:9e,bus=pci.0,addr=0x3 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-vnc 127.0.0.1:99 \
-device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on

Any comments/suggestions/etc. are welcome. I plan to send the patch to the devel list in the next day or so.

Thanks in advance,

-Aaron Young


Laszlo Ersek
 

On 04/22/20 01:12, aaron.young@oracle.com wrote:

Hello, we have found what we think is a BUG in OVMF, but wanted to run
it by the rfc list first to confirm.

We have discovered that the following virt-install command causes the
latest OVMF code to fail to boot into the installer ISO.

virt-install --boot uefi --name Guest1 --ram 4096 --vcpus 1 --disk
path=/Disks/Guest1_disk.img --location
/ISO/OracleLinux-R7-U4-Server-x86_64-dvd.iso --network
bridge=virbr0,model=virtio --os-type linux --noreboot
--boot=hd,cdrom,loader=/usr/share/OVMF/OVMF_CODE.pure-efi.fd,loader_ro=yes,loader_type=pflash,loader_secure=no,nvram=/Images/OVMF_VARS.pure-efi.Guest1.fd
--video vga --graphics vnc,port=5999

Instead of booting to the installer ISO, we drop into the kernel shell
like so:

This is by design. The behavior is expected, and your virt-install
command line is wrong.

The "--location" parameter invokes a direct kernel boot. When you pass a
local ISO image with "--location", that makes no difference in this
regard; it's still a direct kernel boot. The virt-install documentation
explains:

"""
-l LOCATION
--location OPTIONS
Distribution tree installation source. virt-install can
recognize certain distribution trees and fetches a bootable
kernel/initrd pair to launch the install.
[...]
--location allows things like --extra-args for kernel
arguments, and using --initrd-inject. If you want to use
those options with CDROM media, you have a few options:

* Run virt-install as root and do --location ISO

* Mount the ISO at a local directory, and do --location
DIRECTORY
[...]
DIRECTORY
Path to a local directory containing an installable
distribution image. Note that the directory will not be
accessible by the guest after initial boot, so the OS
installer will need another way to access the rest of
the install media.

ISO Mount the ISO and probe the directory. This requires
running virt-install as root, and has the same VM
access caveat as DIRECTORY.
"""

In other words, with your above command line, you are not booting the
El Torito UEFI boot image that's embedded in the ISO -- you are not
performing a "UEFI CD-ROM boot". Instead, virt-install mounts the ISO
image, locates the kernel (vmlinuz) and initrd files in the directory
tree, and launches a guest with direct (that is, fw_cfg) kernel boot.

[snip]

For completeness, here is the qemu command that is exec'd by the
virt-install command for reference:

Right, please see the "-kernel" and "-initrd" options on the cmdline:

/usr/bin/qemu-system-x86_64 [...] -kernel
/var/lib/libvirt/boot/virtinst-vmlinuz.jdC2ks -initrd
/var/lib/libvirt/boot/virtinst-initrd.img.aHyTeQ [...]

Those are temporary kernel and initrd files; extracted from your ISO.
The way OVMF reacts to these QEMU options is expected.
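
As a quick way to check a generated command line for this, here is a minimal Python sketch. It is not part of virt-install, libvirt, or QEMU; the helper name is hypothetical, and the sample command line is abbreviated from the one quoted above.

```python
import shlex


def direct_kernel_boot_args(qemu_cmdline):
    """Return the -kernel/-initrd/-append values found on a QEMU command line."""
    argv = shlex.split(qemu_cmdline)
    found = {}
    # Stop one short of the end so argv[i + 1] always exists.
    for i, arg in enumerate(argv[:-1]):
        if arg in ("-kernel", "-initrd", "-append"):
            found[arg] = argv[i + 1]
    return found


cmdline = (
    "/usr/bin/qemu-system-x86_64 -boot strict=on "
    "-kernel /var/lib/libvirt/boot/virtinst-vmlinuz.jdC2ks "
    "-initrd /var/lib/libvirt/boot/virtinst-initrd.img.aHyTeQ "
    "-device ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1"
)
args = direct_kernel_boot_args(cmdline)
if "-kernel" in args:
    print("direct kernel boot:", args["-kernel"])
else:
    print("no -kernel option; the firmware picks the boot device")
```

If "-kernel" shows up, the guest is started by direct kernel boot regardless of which CD-ROM or disk devices are also attached.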

Remove the "--location" option from your virt-install command line.
Instead, use

--disk path=/ISO/OracleLinux-R7-U4-Server-x86_64-dvd.iso,device=cdrom

Further comments I have:

* "--boot uefi" is not needed if you spell out "--boot
loader=/.../OVMF_CODE.fd,loader_ro=yes,loader_type=pflash,nvram_template=/.../OVMF_VARS.fd".

"--boot uefi" is a shorthand for the latter, relying on libvirtd and
its configuration to locate a firmware executable (and a compatible
variable store template) for you.

* "--boot loader_secure=no" is the default, so no need to spell it out.

* Not sure what "--noreboot" is useful for, I never use it.

* Setting the boot order with "--boot=hd,cdrom" is legacy BIOS style;
it's best not to use it with UEFI. Instead, use the "boot_order"
property with the individual "--disk" options.

* "--vcpus 1" is also the default, no need to spell it out.

* The "nvram=..." property for "--boot" does not seem very useful. It
means that you want to reuse a pre-existent variable store file with
the newly installed domain. This is useful only in exceptional cases;
normally you want libvirtd to create the new domain's variable store
from the variable store template given with "nvram_template".

So ultimately I would try:

virt-install \
--name Guest1 \
--ram 4096 \
--disk path=/Disks/Guest1_disk.img,size=10,format=qcow2,boot_order=1 \
--disk path=/ISO/OracleLinux-R7-U4-Server-x86_64-dvd.iso,device=cdrom,readonly,boot_order=2 \
--network bridge=virbr0,model=virtio \
--os-type linux \
--boot loader=/usr/share/OVMF/OVMF_CODE.pure-efi.fd,loader_ro=yes,loader_type=pflash,nvram_template=/usr/share/OVMF/OVMF_VARS.pure-efi.fd \
--video vga \
--graphics vnc,port=5999

Hope this helps,
Laszlo


Aaron Young
 

Hi Laszlo, thanks for the comments.

My apologies - the actual virt-install command our QA team is using is this (which has extra args for a kickstart file):

virt-install --name guest-31156337 --memory 4096 --vcpus 4 --disk path=/guest-31156337/guest-31156337-kvm.img,size=10,device=disk,bus=virtio,sparse=yes --location /ISO/OracleLinux-R7-U7-Server-x86_64-dvd.iso --nographics --initrd-inject=/guest-31156337/log/guest-31156337-ks.cfg --network bridge=virbr0,model=virtio --os-type linux --os-variant ol7.7 --noreboot --boot=hd,cdrom,loader=//usr/share/OVMF/OVMF_CODE.pure-efi-guest-31156337.fd,loader_ro=yes,loader_type=pflash,loader_secure=no,nvram=/usr/share/OVMF/OVMF_VARS.pure-efi-guest-31156337.fd --extra-args="ks=file:/guest-31156337-ks.cfg ip=dhcp console=tty0 console=ttyS0,115200n8"

I had changed the args in my original message in an effort to simplify things (i.e. I had removed the --extra-args and --initrd-inject args for the kickstart file, which I had thought were not germane to the issue).

Per the virt-install docs (which you cite) and the other examples online of using virt-install with a kickstart (e.g. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_deployment_and_administration_guide/sect-guest_virtual_machine_installation_overview-creating_guests_with_virt_install), the --location arg seems to be necessary to use a kickstart file (?).

Do you have any further comments/suggestions in light of needing to use a kickstart file and thus the --location arg?

Thanks again in advance,

-Aaron

On 04/22/2020 11:33 AM, Laszlo Ersek wrote:
[snip]



Laszlo Ersek
 

Hi Aaron.

(I'm responding through the groups.io webui because it's way too late for a full email fetch now. So the threading & quoting will most likely be broken. Sorry about that.)

The "--initrd-inject" and "--extra-args" options are very relevant in this case. (And yes they do justify using --location.) The idea is that "--initrd-inject" modifies the temporary copy of the initrd that was copied out of the ISO (which was mounted as a local directory containing a distro install tree). "--initrd-inject" then adds the "guest-31156337-ks.cfg" file to the root of the initrd.

The "--extra-args" option goes with it. It creates an "-append" option on the QEMU command line. It tells the kernel [*] to look for a kickstart file by the name of "guest-31156337-ks.cfg" in the initial ramdisk.

[*] More precisely, the guest kernel doesn't care about "ks=..."; it's the user-space installer program (Anaconda), launched from the initrd by the kernel, that consumes "ks=..." from the kernel command line.

So these virt-install options are actually crucial (and so are the contents of the kickstart file), because they tell the guest installer what to do, after OVMF launches the guest kernel.
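
To illustrate how a user-space program can pick "ks=..." out of the kernel command line, here is a minimal Python sketch. It is not Anaconda's actual parser; the function name and the returned (scheme, path) shape are assumptions made for this example.

```python
def find_kickstart(kernel_cmdline):
    """Return (scheme, path) for the first ks=... argument, or None if absent."""
    for token in kernel_cmdline.split():
        if token.startswith("ks="):
            value = token[len("ks="):]
            # "file:/x.cfg" splits into scheme "file" and path "/x.cfg".
            scheme, sep, path = value.partition(":")
            return (scheme, path) if sep else (None, value)
    return None


# The arguments passed via --extra-args end up on the guest kernel command line.
extra_args = "ks=file:/guest-31156337-ks.cfg ip=dhcp console=tty0 console=ttyS0,115200n8"
print(find_kickstart(extra_args))
```

With "ks=file:/guest-31156337-ks.cfg" the path is looked up inside the initial ramdisk, which is why the file has to be injected into the initrd from the host side.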

What devices OVMF connects in the BDS phase (that is, before ExitBootServices()), should be irrelevant wrt. how Anaconda locates a kickstart file in the initial ramdisk, and how Anaconda processes the looked-up kickstart file.

So with the new information available, I agree that "--location" is needed, because you, indeed, do *not* want a "UEFI CD-ROM boot". Instead you want a direct kernel boot, with a kickstart file injected into the initrd from the host side. However, I think OVMF's current code still does the right thing in that case, because the UEFI drivers that may or may not have connected e.g. a SATA device in the BDS phase should be totally irrelevant after ExitBootServices(). And Anaconda runs (and consumes the kickstart file) way after ExitBootServices(). The set of devices that were connected under UEFI could play a role for the kernel's UEFI stub, yes, but I don't think the kernel's UEFI stub plays any role in the installation you're trying to perform.

I'd suggest looking into the kernel log, the systemd log, and the installer logs, produced in the guest. From the log you pasted before ("Entering emergency mode. Exit the shell to continue."), it seems like the initrd is broken, and Anaconda cannot be started. This kind of error message is usually printed when the root filesystem is busted, and the kernel cannot "switch root" from the initrd to the on-disk root filesystem.

Regarding why it seems to work when you reorder PlatformBdsConnectSequence() vs. TryRunningQemuKernel() -- I have no idea. In that case, is your kickstart file really correctly processed by Anaconda?

I could imagine a (virtual) hardware problem -- assuming the firmware SATA driver doesn't ever initialize the hardware, the kernel driver could fail to access the disk, or some such... But that would require either QEMU's device model or the kernel's driver to be horribly broken. Do you see identical behavior if you use virtio-scsi?

Thanks
Laszlo