Re: [OVMF] resource assignment fails for passthrough PCI GPU


dann frazier

On Fri, Nov 15, 2019 at 11:51:18PM +0100, Laszlo Ersek wrote:
On 11/15/19 19:56, dann frazier wrote:
Hi,
I'm trying to passthrough an Nvidia GPU to a q35 KVM guest, but UEFI
is failing to allocate resources for it. I have no issues if I boot w/
a legacy BIOS, and it works fine if I tell the linux guest to do the
allocation itself - but I'm looking for a way to make this work w/
OVMF by default.

I posted a debug log here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563/+attachment/5305740/+files/q35-uefidbg.log

Linux guest lspci output is also available for both seabios/OVMF boots here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563
By default, OVMF exposes a 64-bit MMIO aperture for PCI MMIO BAR
allocation that is 32GB in size. The generic PciBusDxe driver collects,
orders, and assigns / allocates the MMIO BARs, but it can only work out
of the aperture that platform code advertises.

Your GPU's region 1 is itself 32GB in size. Given that there are further
PCI devices in the system with further 64-bit MMIO BARs, the default
aperture cannot accommodate everything. In such an event, PciBusDxe
avoids assigning the largest BARs (to my knowledge), in order to
conserve as much of the aperture as possible for other devices -- hence
breaking the fewest possible PCI devices.

You can control the aperture size from the QEMU command line. You can
also do it from the libvirt domain XML, technically speaking. The knob
is experimental, so no stability or compatibility guarantees are made.
(That's also the reason why it's a bit of a hack in the libvirt domain XML.)

The QEMU command line option is described in the following edk2 commit message:

https://github.com/tianocore/edk2/commit/7e5b1b670c38
Hi Laszlo,

Thanks for taking the time to describe this in detail! The -fw_cfg
option did avoid the problem for me. I also noticed that the above
commit message mentions the existence of a 24GB card as the reasoning
behind choosing the 32GB default aperture. From what you say below, I
understand that bumping this above 64GB could break hosts w/ fewer than
37 physical address bits. What would be the downside of bumping the
default aperture to, say, 48GB?

-dann

For example, to set a 64GB aperture, pass:

-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
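The knob takes the aperture size in MiB, so a quick way to compute the
string for a desired size in GiB (editor's sketch) is:

```shell
# The X-PciMmio64Mb value is in MiB: multiply the desired GiB size by 1024.
aperture_gib=64
printf -- '-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=%s\n' $((aperture_gib * 1024))
```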

The libvirt domain XML syntax is a bit tricky (and it might "taint" your
domain, as it goes outside of the QEMU features that libvirt directly
maps to):

<domain type='kvm'
        xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <qemu:commandline>
    <qemu:arg value='-fw_cfg'/>
    <qemu:arg value='name=opt/ovmf/X-PciMmio64Mb,string=65536'/>
  </qemu:commandline>
</domain>

Some notes:

(1) The "xmlns:qemu" namespace definition attribute in the <domain> root
element is important. You have to add it manually when you add
<qemu:commandline> and <qemu:arg> too. Without the namespace
definition, the latter elements will make no sense, and libvirt will
delete them immediately.

(2) The above change will grow your guest's physical address space to
more than 64GB. As a consequence, on your *host*, *if* your physical CPU
supports nested paging (called "ept" on Intel and "npt" on AMD), *then*
the CPU will have to support at least 37 physical address bits too, for
the guest to work. Otherwise, the guest will break, hard.
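As a quick arithmetic check (editor's sketch): 64GB is exactly 2^36
bytes, so addressing anything above it requires a 37th bit:

```shell
# 64 GiB == 2^36 bytes; anything above it needs at least 37 address bits.
sixty_four_gb=$((64 * 1024 * 1024 * 1024))
echo "$sixty_four_gb"        # prints 68719476736
echo $((1 << 36))            # prints the same value: 2^36
```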

Here's how to verify (on the host):

(2a) run "egrep -w 'npt|ept' /proc/cpuinfo" --> if this does not produce
output, then stop reading here; things should work. Your CPU does not
support nested paging, so KVM will use shadow paging, which is slower,
but at least you don't have to care about the CPU's phys address width.

(2b) otherwise (i.e. when you do have nested paging), run "grep 'bits
physical' /proc/cpuinfo" --> if the physical address width is >=37,
you're good.

(2c) if you have nested paging but exactly 36 phys address bits, then
you'll have to forcibly disable nested paging (assuming you want to run
a guest with larger than 64GB guest-phys address space, that is). On
Intel, issue:

rmmod kvm_intel
modprobe kvm_intel ept=N

On AMD, go with:

rmmod kvm_amd
modprobe kvm_amd npt=N
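The three checks above can be rolled into one helper (editor's sketch,
not an official tool; check_host is a made-up name). It reads
cpuinfo-formatted text on stdin, e.g. "check_host < /proc/cpuinfo":

```shell
# Sketch combining checks (2a)-(2c): classify a host from its cpuinfo text.
check_host() {
  cpuinfo=$(cat)
  # (2a) no ept/npt flag -> shadow paging; address width does not matter
  if ! printf '%s\n' "$cpuinfo" | grep -Eqw 'npt|ept'; then
    echo "no nested paging: OK (shadow paging)"
    return
  fi
  # (2b) nested paging present: extract the physical address width
  bits=$(printf '%s\n' "$cpuinfo" | grep -m1 'bits physical' \
           | grep -oE '[0-9]+' | head -n1)
  if [ "$bits" -ge 37 ]; then
    echo "nested paging, $bits physical address bits: OK"
  else
    # (2c) applies: disable ept/npt before running a >64GB-phys guest
    echo "nested paging, only $bits physical address bits: disable ept/npt"
  fi
}
```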

Hope this helps,
Laszlo
