
Problems in running EmulatorPkg.

Diego.Marcelino@...
 

I am facing some struggles trying to run EmulatorPkg.

I am using an x86-64 machine running Ubuntu 18.04 with GCC5, and I installed the dependencies libx11-dev and libxext-dev according to https://github.com/tianocore/tianocore.github.io/wiki/EmulatorPkg.

When running Host directly (./Host in the Build folder), I got the following output:

EDK II UNIX Host Emulation Environment from http://www.tianocore.org/edk2/
BootMode 0x00
OS Emulator passing in 128 KB of temp RAM at 0x40000000 to SEC
FD loaded from ../FV/FV_RECOVERY.fd at 0x102000000 contains SEC Core

0x102000400 Loading /home/diego/Desktop/edk2/Build/Emulator/DEBUG_GCC5/X64/EmulatorPkg/Sec/Sec/DEBUG/EmuSec.dll with entry point 0x10200163f
SEC Has Started
0x102002780 Loading /home/diego/Desktop/edk2/Build/Emulator/DEBUG_GCC5/X64/MdeModulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll with entry point 0x10200c4cf
0x102014200 Loading /home/diego/Desktop/edk2/Build/Emulator/DEBUG_GCC5/X64/MdeModulePkg/Universal/ReportStatusCodeRouter/Pei/ReportStatusCodeRouterPei/DEBUG/ReportStatusCodeRouterPei.dll with entry point 0x102014fdb
0x102015c80 Loading /home/diego/Desktop/edk2/Build/Emulator/DEBUG_GCC5/X64/MdeModulePkg/Universal/StatusCodeHandler/Pei/StatusCodeHandlerPei/DEBUG/StatusCodeHandlerPei.dll with entry point 0x10201767a
PROGRESS CODE: V03020003 I0
0x10200ed00 Loading /home/diego/Desktop/edk2/Build/Emulator/DEBUG_GCC5/X64/MdeModulePkg/Universal/PCD/Pei/Pcd/DEBUG/PcdPeim.dll with entry point 0x102012c0a
Loading PEIM at 0x0010200EAC0 EntryPoint=0x00102012C0A PcdPeim.efi
PROGRESS CODE: V03020002 I0
Install PPI: 06E81C58-4AD7-44BC-8390-F10265F72480
Install PPI: 01F34D25-4DE2-23AD-3FF3-36353FF323F1
Install PPI: 4D8B155B-C059-4C8F-8926-06FD4331DB8A
Install PPI: A60C6B59-E459-425D-9C69-0BCC9CB27D81
Register PPI Notify: 605EA650-C65C-42E1-BA80-91A52AB618C6
PROGRESS CODE: V03020003 I0
0x102018800 Loading /home/diego/Desktop/edk2/Build/Emulator/DEBUG_GCC5/X64/EmulatorPkg/BootModePei/BootModePei/DEBUG/BootModePei.dll with entry point 0x10201908b
Loading PEIM at 0x001020185C0 EntryPoint=0x0010201908B BootModePei.efi
PROGRESS CODE: V03020002 I0
Emu Boot Mode PEIM Loaded
Install PPI: 7408D748-FC8C-4EE6-9288-C4BEC092A410
PROGRESS CODE: V03020003 I0
0x102019b00 Loading /home/diego/Desktop/edk2/Build/Emulator/DEBUG_GCC5/X64/EmulatorPkg/AutoScanPei/AutoScanPei/DEBUG/AutoScanPei.dll with entry point 0x10201a4b3
Loading PEIM at 0x001020198C0 EntryPoint=0x0010201A4B3 AutoScanPei.efi
PROGRESS CODE: V03020002 I0
Emu Autoscan PEIM Loaded
PeiInstallPeiMemory MemoryBegin 0x41000000, MemoryLength 0x4000000
PROGRESS CODE: V03020003 I0
Temp Stack : BaseAddress=0x40000000 Length=0x10000
Temp Heap : BaseAddress=0x40010000 Length=0x10000
Total temporary memory: 131072 bytes.
temporary memory stack ever used: 65532 bytes.
temporary memory heap used for HobList: 5240 bytes.
temporary memory heap occupied by memory pages: 0 bytes.
Old Stack size 65536, New stack size 131072
Stack Hob: BaseAddress=0x41000000 Length=0x20000
Heap Offset = 0x1010000 Stack Offset = 0x1010000
0x44ff2240 Loading /home/diego/Desktop/edk2/Build/Emulator/DEBUG_GCC5/X64/MdeModulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll with entry point 0x44ffbf8f
Segmentation fault (core dumped)

-------------------------------------------------------------------------------------------------------------------------------------------

When running it via EmulatorPkg/build.sh run, a gdb terminal opens but the application does not run:

Initializing workspace
/home/diego/Desktop/edk2/BaseTools
Loading previous configuration from /home/diego/Desktop/edk2/Conf/BuildEnv.sh
WORKSPACE: /home/diego/Desktop/edk2
EDK_TOOLS_PATH: /home/diego/Desktop/edk2/BaseTools
CONF_PATH: /home/diego/Desktop/edk2/Conf
using prebuilt tools
Reading symbols from /home/diego/Desktop/edk2/Build/Emulator/DEBUG_GCC5/X64/Host...(no debugging symbols found)...done.
/home/diego/Desktop/edk2/EmulatorPkg/Unix/GdbRun:79: Error in sourced command file:
No symbol table is loaded. Use the "file" command.
(gdb) c
The program is not being run.
(gdb)

-------------------------------------------------------------------------------------------------------------------------------------------

I tried both the master branch and the tag vUDK2018.


[EDK2] OS booting from SPI flash device on Leaf Hill-based new board

violet7027@...
 

Hi.
Could you please help take a look at my problem?
I have designed a new Leaf Hill-based board and am testing it now.
The BIOS boots successfully, and memory and most peripherals are listed in the EFI shell.
There are two SPI flash devices on the board: one is for the OS and the other is for image files.

Q1. The SPI flash devices for the OS and image files are not accessible (FS0, FS1).
The EFI shell displays no mappings for them.
How can I get access to the SPI flash?

Q2. How can I boot from SPI flash?
Booting from a USB memory stick worked, and I created a file system on the SPI flash and copied the OS image onto it.
But I couldn't boot from the SPI flash, and there are no options for it in the boot order setup menu.
How can I boot the OS image from SPI flash?
Any help would be appreciated.

Thank you.


How to determine/confirm whether SCT test being run is against the intended driver

kusumakaralthi@...
 

Hi,
I am trying to run the SCT test on a NIC (Network Interface Card) Option ROM/driver, but I am not sure whether the test is performed against my particular Option ROM driver or some other UEFI driver.
I built a debug driver by adding debug messages at multiple locations and ran SCT, but none of those debug messages were printed during the test.
Q1) How can I confirm that the SCT is being run against my driver?
Q2) Can we provide a driver handle to the UEFI SCT test to test against?
Q3) I could not find much documentation on SCT in the UEFI forums. I am looking for documentation that tells me, when SCT throws a warning/error, which line of code/method in the driver that error comes from. Can you point me to such documentation?

--
Thanks,
Kusumakar Althi
Broadcom


Re: Examples opening and reading/writing a file with EDK2

Gao, Zhichao
 

Hi Alejandro,

I am trying to answer your questions. If anything is missing, please feel free to let me know.

1. First, I think CapsuleApp is a good example of consuming the file protocol.
2. A file handle is like a standard C FILE pointer; it has multiple required protocols installed on it, such as the device path protocol, the simple file system protocol, and so on.
We can do operations on the file through the file handle.
3. We can use DevicePathFromHandle to get the file's device path from the file handle, and ConvertDevicePathToText to convert that path to text for viewing.

All of the above can be found in CapsuleApp.
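
For illustration, a minimal sketch of point 3 (my own code, not taken from CapsuleApp; the function name and the reduced error handling are assumptions):

#include <Uefi.h>
#include <Library/DevicePathLib.h>
#include <Library/MemoryAllocationLib.h>
#include <Library/UefiLib.h>

//
// Print the device path text of a handle (for example the device handle
// backing a file), using the DevicePathFromHandle() and
// ConvertDevicePathToText() helpers mentioned above.
//
VOID
PrintDevicePathOfHandle (
  IN EFI_HANDLE  Handle
  )
{
  EFI_DEVICE_PATH_PROTOCOL  *DevicePath;
  CHAR16                    *Text;

  DevicePath = DevicePathFromHandle (Handle);
  if (DevicePath == NULL) {
    Print (L"no device path protocol on this handle\n");
    return;
  }

  Text = ConvertDevicePathToText (DevicePath, FALSE, FALSE);
  if (Text != NULL) {
    Print (L"%s\n", Text);
    FreePool (Text);
  }
}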

Thanks,
Zhichao

-----Original Message-----
From: discuss@edk2.groups.io [mailto:discuss@edk2.groups.io] On Behalf Of
alejandro.estay@gmail.com
Sent: Friday, November 22, 2019 9:54 AM
To: discuss@edk2.groups.io
Subject: [edk2-discuss] Examples opening and reading/writing a file with EDK2

Hi, I'm making a little UEFI app, just to check basic functionality of the firmware.
Inside this app I want to load, read and write a file, binary or text. However I
can't find a "complete explanation" or examples about the use of the
procedures (EFI_FILE_PROTOCOL.Open(), EFI_FILE_PROTOCOL.Read()) from the
UEFI API (steps, what to check). The only thing I found was some little UEFI Shell
apps doing this using the shell API. However I would like to do it using the "bare
firmware" instead of loading the shell. For me, the most confusing part is when
the program has to check the handle database to find the particular handle of
the file that is being opened. Also I have some doubts about how to check,
without the shell, which volume or partition would have the exact file I'm looking
for (i.e. what if 2 volumes have similar, or even identical, root directories).

Thanks in advance


Re: [OVMF] resource assignment fails for passthrough PCI GPU

Eduardo Habkost <ehabkost@...>
 

(+Jiri, +libvir-list)

On Fri, Nov 22, 2019 at 04:58:25PM +0000, Dr. David Alan Gilbert wrote:
* Laszlo Ersek (lersek@redhat.com) wrote:
(+Dave, +Eduardo)

On 11/22/19 00:00, dann frazier wrote:
On Tue, Nov 19, 2019 at 06:06:15AM +0100, Laszlo Ersek wrote:
On 11/19/19 01:54, dann frazier wrote:
On Fri, Nov 15, 2019 at 11:51:18PM +0100, Laszlo Ersek wrote:
On 11/15/19 19:56, dann frazier wrote:
Hi,
I'm trying to passthrough an Nvidia GPU to a q35 KVM guest, but UEFI
is failing to allocate resources for it. I have no issues if I boot w/
a legacy BIOS, and it works fine if I tell the linux guest to do the
allocation itself - but I'm looking for a way to make this work w/
OVMF by default.

I posted a debug log here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563/+attachment/5305740/+files/q35-uefidbg.log

Linux guest lspci output is also available for both seabios/OVMF boots here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563
By default, OVMF exposes such a 64-bit MMIO aperture for PCI MMIO BAR
allocation that is 32GB in size. The generic PciBusDxe driver collects,
orders, and assigns / allocates the MMIO BARs, but it can work only out
of the aperture that platform code advertizes.

Your GPU's region 1 is itself 32GB in size. Given that there are further
PCI devices in the system with further 64-bit MMIO BARs, the default
aperture cannot accommodate everything. In such an event, PciBusDxe
avoids assigning the largest BARs (to my knowledge), in order to
conserve the most aperture possible, for other devices -- hence break
the fewest possible PCI devices.

You can control the aperture size from the QEMU command line. You can
also do it from the libvirt domain XML, technically speaking. The knob
is experimental, so no stability or compatibility guarantees are made.
(That's also the reason why it's a bit of a hack in the libvirt domain XML.)

The QEMU cmdline option is described in the following edk2 commit message:

https://github.com/tianocore/edk2/commit/7e5b1b670c38
Hi Laszlo,

Thanks for taking the time to describe this in detail! The -fw_cfg
option did avoid the problem for me.
Good to hear, thanks.

I also noticed that the above
commit message mentions the existence of a 24GB card as a reasoning
behind choosing the 32GB default aperture. From what you say below, I
understand that bumping this above 64GB could break hosts w/ <= 37
physical address bits.
Right.

What would be the downside of bumping the
default aperture to, say, 48GB?
The placement of the aperture is not trivial (please see the code
comments in the linked commit). The base address of the aperture is
chosen so that the largest BAR that can fit in the aperture may be
naturally aligned. (BARs are whole powers of two.)

The largest BAR that can fit in a 48 GB aperture is 32 GB. Therefore
such an aperture would be aligned at 32 GB -- the lowest base address
(dependent on guest RAM size) would be 32 GB. Meaning that the aperture
would end at 32 + 48 = 80 GB. That still breaches the 36-bit phys
address width.

32 GB is the largest aperture size that can work with 36-bit phys
address width; that's the aperture that ends at 64 GB exactly.
Thanks, yeah - now that I read the code comments that is clear (as
clear as it can be w/ my low level of base knowledge). In the commit you
mention Gerd (CC'd) had suggested a heuristic-based approach for
sizing the aperture. When you say "PCPU address width" - is that a
function of the available physical bits?
"PCPU address width" is not a "function" of the available physical bits
-- it *is* the available physical bits. "PCPU" simply stands for
"physical CPU".

IOW, would that approach
allow OVMF to automatically grow the aperture to the max ^2 supported
by the host CPU?
Maybe.

The current logic in OVMF works from the guest-physical address space
size -- as deduced from multiple factors, such as the 64-bit MMIO
aperture size, and others -- towards the guest-CPU (aka VCPU) address
width. The VCPU address width is important for a bunch of other purposes
in the firmware, so OVMF has to calculate it no matter what.

Again, the current logic is to calculate the highest guest-physical
address, and then deduce the VCPU address width from that (and then
expose it to the rest of the firmware).

Your suggestion would require passing the PCPU (physical CPU) address
width from QEMU/KVM into the guest, and reversing the direction of the
calculation. The PCPU address width would determine the VCPU address
width directly, and then the 64-bit PCI MMIO aperture would be
calculated from that.

However, there are two caveats.

(1) The larger your guest-phys address space (as exposed through the
VCPU address width to the rest of the firmware), the more guest RAM you
need for page tables. Because, just before entering the DXE phase, the
firmware builds 1:1 mapping page tables for the entire guest-phys
address space. This is necessary e.g. so you can access any PCI MMIO BAR.

Now consider that you have a huge beefy virtualization host with say 46
phys address bits, and a wimpy guest with say 1.5GB of guest RAM. Do you
absolutely want tens of *terabytes* for your 64-bit PCI MMIO aperture?
Do you really want to pay for the necessary page tables with that meager
guest RAM?

(Such machines do exist BTW, for example:

http://mid.mail-archive.com/9BD73EA91F8E404F851CF3F519B14AA8036C67B5@DGGEMI521-MBX.china.huawei.com
)

In other words, you'd need some kind of knob anyway, because otherwise
your aperture could grow too *large*.


(2) Exposing the PCPU address width to the guest may have nasty
consequences at the QEMU/KVM level, regardless of guest firmware. For
example, that kind of "guest enlightenment" could interfere with migration.

If you boot a guest let's say with 16GB of RAM, and tell it "hey friend,
have 40 bits of phys address width!", then you'll have a difficult time
migrating that guest to a host with a CPU that only has 36-bits wide
physical addresses -- even if the destination host has plenty of RAM
otherwise, such as a full 64GB.

There could be other QEMU/KVM / libvirt issues that I'm unaware of
(hence the CC to Dave and Eduardo).
host physical address width gets messy. There are differences as well
between upstream qemu behaviour, and some downstreams.
I think the story is that:

a) Qemu default: 40 bits on any host
b) -cpu blah,host-phys-bits=true to follow the host.
c) RHEL has host-phys-bits=true by default

As you say, the only real problem with host-phys-bits is migration -
between say an E3 and an E5 xeon with different widths. The magic 40's
is generally wrong as well - I think it came from some ancient AMD,
but it's the default on QEMU TCG as well.
Yes, and because it affects live migration ability, we have two
constraints:
1) It needs to be exposed in the libvirt domain XML;
2) QEMU and libvirt can't choose a value that works for everybody
(because neither QEMU nor libvirt knows where the VM might be
migrated later).

Which is why the BZ below is important:


I don't think there's a way to set it in libvirt;
https://bugzilla.redhat.com/show_bug.cgi?id=1578278 is a bz asking for
that.

IMHO host-phys-bits is actually pretty safe; and makes most sense in a
lot of cases.
Yeah, it is mostly safe and makes sense, but messy if you try to
migrate to a host with a different size.


Dave


Thanks,
Laszlo


-dann

For example, to set a 64GB aperture, pass:

-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536

The libvirt domain XML syntax is a bit tricky (and it might "taint" your
domain, as it goes outside of the QEMU features that libvirt directly
maps to):

<domain
type='kvm'
xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
<qemu:commandline>
<qemu:arg value='-fw_cfg'/>
<qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=65536'/>
</qemu:commandline>
</domain>

Some notes:

(1) The "xmlns:qemu" namespace definition attribute in the <domain> root
element is important. You have to add it manually when you add
<qemu:commandline> and <qemu:arg> too. Without the namespace
definition, the latter elements will make no sense, and libvirt will
delete them immediately.

(2) The above change will grow your guest's physical address space to
more than 64GB. As a consequence, on your *host*, *if* your physical CPU
supports nested paging (called "ept" on Intel and "npt" on AMD), *then*
the CPU will have to support at least 37 physical address bits too, for
the guest to work. Otherwise, the guest will break, hard.

Here's how to verify (on the host):

(2a) run "egrep -w 'npt|ept' /proc/cpuinfo" --> if this does not produce
output, then stop reading here; things should work. Your CPU does not
support nested paging, so KVM will use shadow paging, which is slower,
but at least you don't have to care about the CPU's phys address width.

(2b) otherwise (i.e. when you do have nested paging), run "grep 'bits
physical' /proc/cpuinfo" --> if the physical address width is >=37,
you're good.

(2c) if you have nested paging but exactly 36 phys address bits, then
you'll have to forcibly disable nested paging (assuming you want to run
a guest with larger than 64GB guest-phys address space, that is). On
Intel, issue:

rmmod kvm_intel
modprobe kvm_intel ept=N

On AMD, go with:

rmmod kvm_amd
modprobe kvm_amd npt=N

Hope this helps,
Laszlo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Eduardo


Re: [OVMF] resource assignment fails for passthrough PCI GPU

Laszlo Ersek
 

On 11/22/19 17:58, Dr. David Alan Gilbert wrote:
* Laszlo Ersek (lersek@redhat.com) wrote:
(+Dave, +Eduardo)

On 11/22/19 00:00, dann frazier wrote:
On Tue, Nov 19, 2019 at 06:06:15AM +0100, Laszlo Ersek wrote:
On 11/19/19 01:54, dann frazier wrote:
On Fri, Nov 15, 2019 at 11:51:18PM +0100, Laszlo Ersek wrote:
On 11/15/19 19:56, dann frazier wrote:
Hi,
I'm trying to passthrough an Nvidia GPU to a q35 KVM guest, but UEFI
is failing to allocate resources for it. I have no issues if I boot w/
a legacy BIOS, and it works fine if I tell the linux guest to do the
allocation itself - but I'm looking for a way to make this work w/
OVMF by default.

I posted a debug log here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563/+attachment/5305740/+files/q35-uefidbg.log

Linux guest lspci output is also available for both seabios/OVMF boots here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563
By default, OVMF exposes such a 64-bit MMIO aperture for PCI MMIO BAR
allocation that is 32GB in size. The generic PciBusDxe driver collects,
orders, and assigns / allocates the MMIO BARs, but it can work only out
of the aperture that platform code advertizes.

Your GPU's region 1 is itself 32GB in size. Given that there are further
PCI devices in the system with further 64-bit MMIO BARs, the default
aperture cannot accommodate everything. In such an event, PciBusDxe
avoids assigning the largest BARs (to my knowledge), in order to
conserve the most aperture possible, for other devices -- hence break
the fewest possible PCI devices.

You can control the aperture size from the QEMU command line. You can
also do it from the libvirt domain XML, technically speaking. The knob
is experimental, so no stability or compatibility guarantees are made.
(That's also the reason why it's a bit of a hack in the libvirt domain XML.)

The QEMU cmdline option is described in the following edk2 commit message:

https://github.com/tianocore/edk2/commit/7e5b1b670c38
Hi Laszlo,

Thanks for taking the time to describe this in detail! The -fw_cfg
option did avoid the problem for me.
Good to hear, thanks.

I also noticed that the above
commit message mentions the existence of a 24GB card as a reasoning
behind choosing the 32GB default aperture. From what you say below, I
understand that bumping this above 64GB could break hosts w/ <= 37
physical address bits.
Right.

What would be the downside of bumping the
default aperture to, say, 48GB?
The placement of the aperture is not trivial (please see the code
comments in the linked commit). The base address of the aperture is
chosen so that the largest BAR that can fit in the aperture may be
naturally aligned. (BARs are whole powers of two.)

The largest BAR that can fit in a 48 GB aperture is 32 GB. Therefore
such an aperture would be aligned at 32 GB -- the lowest base address
(dependent on guest RAM size) would be 32 GB. Meaning that the aperture
would end at 32 + 48 = 80 GB. That still breaches the 36-bit phys
address width.

32 GB is the largest aperture size that can work with 36-bit phys
address width; that's the aperture that ends at 64 GB exactly.
Thanks, yeah - now that I read the code comments that is clear (as
clear as it can be w/ my low level of base knowledge). In the commit you
mention Gerd (CC'd) had suggested a heuristic-based approach for
sizing the aperture. When you say "PCPU address width" - is that a
function of the available physical bits?
"PCPU address width" is not a "function" of the available physical bits
-- it *is* the available physical bits. "PCPU" simply stands for
"physical CPU".

IOW, would that approach
allow OVMF to automatically grow the aperture to the max ^2 supported
by the host CPU?
Maybe.

The current logic in OVMF works from the guest-physical address space
size -- as deduced from multiple factors, such as the 64-bit MMIO
aperture size, and others -- towards the guest-CPU (aka VCPU) address
width. The VCPU address width is important for a bunch of other purposes
in the firmware, so OVMF has to calculate it no matter what.

Again, the current logic is to calculate the highest guest-physical
address, and then deduce the VCPU address width from that (and then
expose it to the rest of the firmware).

Your suggestion would require passing the PCPU (physical CPU) address
width from QEMU/KVM into the guest, and reversing the direction of the
calculation. The PCPU address width would determine the VCPU address
width directly, and then the 64-bit PCI MMIO aperture would be
calculated from that.

However, there are two caveats.

(1) The larger your guest-phys address space (as exposed through the
VCPU address width to the rest of the firmware), the more guest RAM you
need for page tables. Because, just before entering the DXE phase, the
firmware builds 1:1 mapping page tables for the entire guest-phys
address space. This is necessary e.g. so you can access any PCI MMIO BAR.

Now consider that you have a huge beefy virtualization host with say 46
phys address bits, and a wimpy guest with say 1.5GB of guest RAM. Do you
absolutely want tens of *terabytes* for your 64-bit PCI MMIO aperture?
Do you really want to pay for the necessary page tables with that meager
guest RAM?

(Such machines do exist BTW, for example:

http://mid.mail-archive.com/9BD73EA91F8E404F851CF3F519B14AA8036C67B5@DGGEMI521-MBX.china.huawei.com
)

In other words, you'd need some kind of knob anyway, because otherwise
your aperture could grow too *large*.


(2) Exposing the PCPU address width to the guest may have nasty
consequences at the QEMU/KVM level, regardless of guest firmware. For
example, that kind of "guest enlightenment" could interfere with migration.

If you boot a guest let's say with 16GB of RAM, and tell it "hey friend,
have 40 bits of phys address width!", then you'll have a difficult time
migrating that guest to a host with a CPU that only has 36-bits wide
physical addresses -- even if the destination host has plenty of RAM
otherwise, such as a full 64GB.

There could be other QEMU/KVM / libvirt issues that I'm unaware of
(hence the CC to Dave and Eduardo).
host physical address width gets messy. There are differences as well
between upstream qemu behaviour, and some downstreams.
I think the story is that:

a) Qemu default: 40 bits on any host
b) -cpu blah,host-phys-bits=true to follow the host.
c) RHEL has host-phys-bits=true by default

As you say, the only real problem with host-phys-bits is migration -
between say an E3 and an E5 xeon with different widths. The magic 40's
is generally wrong as well - I think it came from some ancient AMD,
but it's the default on QEMU TCG as well.

I don't think there's a way to set it in libvirt;
https://bugzilla.redhat.com/show_bug.cgi?id=1578278 is a bz asking for
that.

IMHO host-phys-bits is actually pretty safe; and makes most sense in a
lot of cases.
Thanks -- this is a useful piece of the puzzle to know. It seems that
the guest can learn about the guest-phys address width via CPUID.
(cpu_x86_cpuid() in "target/i386/cpu.c" consumes "cpu->phys_bits", which
seems to be set in x86_cpu_realizefn().)

Cheers!
Laszlo


Re: Examples opening and reading/writing a file with EDK2

Laszlo Ersek
 

On 11/22/19 02:54, alejandro.estay@gmail.com wrote:
Hi, I'm making a little UEFI app, just for check basic functionality
of the firmware. inside this app I want to load, read and write a
file, binary or text. However I can't find a "complete explanation" or
examples about the use of the procedures (EFI_FILE_PROTOCOL.Open(),
EFI_FILE_PROTOCOL.Read()) from the UEFI API (steps, what to check).
The only thing I found was some little Uefi Shell apps doing this
using the shell API. However I would like to do it using the "bare
firmware" instead of loading the shell. For me, the most confusing
part, is when the program has to check the handle database to find
the particular handle of the file that is being opened. Also I have
some doubts about how to check, without the shell, what volume or
partition would have the exact file I'm looking for (i.e. what if 2
volumes have simmilar, or even identical root directories).
First, you need to find the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL instance in
the protocol database that is right for your purposes. You could locate
this protocol instance for example with the LocateDevicePath() boot
service. There could be other ways for you to locate the right handle,
and then open the Simple File System protocol interface on that handle.

This really depends on your use case. It's your application that has to
know on what device (such as, what PCI(e) controller, what SCSI disk,
what ATAPI disk, what partition, etc) to look for the interesting file.

For example, if you simply check every EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
in the protocol database, and among those, you cannot distinguish two
from each other (because both look suitable), then you'll have to
investigate the device path protocol installed on each handle. You might
be able to make a decision based on the structure / semantics of the
device paths themselves. Alternatively, you might have to traverse the
device paths node by node, and open further protocol interfaces on the
corresponding handles, to ultimately pick the right
EFI_SIMPLE_FILE_SYSTEM_PROTOCOL.
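
As an illustration of the "check every instance" approach, here is a minimal sketch of my own (not from any particular package; the function name and the reduced error handling are assumptions):

#include <Uefi.h>
#include <Library/DevicePathLib.h>
#include <Library/MemoryAllocationLib.h>
#include <Library/UefiBootServicesTableLib.h>
#include <Library/UefiLib.h>
#include <Protocol/SimpleFileSystem.h>

//
// Enumerate every handle carrying EFI_SIMPLE_FILE_SYSTEM_PROTOCOL and print
// its device path, so that the caller can decide which filesystem instance
// is the interesting one.
//
EFI_STATUS
ListSimpleFileSystems (
  VOID
  )
{
  EFI_STATUS  Status;
  EFI_HANDLE  *Handles;
  UINTN       HandleCount;
  UINTN       Index;

  Status = gBS->LocateHandleBuffer (
                  ByProtocol,
                  &gEfiSimpleFileSystemProtocolGuid,
                  NULL,
                  &HandleCount,
                  &Handles
                  );
  if (EFI_ERROR (Status)) {
    return Status;
  }

  for (Index = 0; Index < HandleCount; Index++) {
    EFI_DEVICE_PATH_PROTOCOL  *DevicePath;
    CHAR16                    *Text;

    DevicePath = DevicePathFromHandle (Handles[Index]);
    if (DevicePath == NULL) {
      continue;
    }

    Text = ConvertDevicePathToText (DevicePath, FALSE, FALSE);
    if (Text != NULL) {
      Print (L"filesystem #%u: %s\n", (UINT32)Index, Text);
      FreePool (Text);
    }
  }

  FreePool (Handles);
  return EFI_SUCCESS;
}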


Once you have the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL interface open, you
need to call its OpenVolume() member function. See the UEFI spec for
details please. It will give you the EFI_FILE_PROTOCOL for the root
directory of that file system.

Once you got the EFI_FILE_PROTOCOL interface for the root directory, you
can call the Open() member function for opening files or directories
relative to the root directory. Either way, you'll get a new
EFI_FILE_PROTOCOL interface for the opened object (file or directory).
If you've opened a directory previously, then you can issue further
Open() calls for opening files or directories relative to *that*
(sub)directory.
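
For example, reading the first bytes of a file through these two member functions could look like this (just a sketch; the "\dir1\hello.txt" pathname and the 256-byte buffer are made up):

#include <Uefi.h>
#include <Protocol/SimpleFileSystem.h>

//
// Open "\dir1\hello.txt" relative to the volume root and read up to 256
// bytes from it. SimpleFs is an EFI_SIMPLE_FILE_SYSTEM_PROTOCOL interface
// that has already been opened on the right handle.
//
EFI_STATUS
ReadHelloTxt (
  IN EFI_SIMPLE_FILE_SYSTEM_PROTOCOL  *SimpleFs
  )
{
  EFI_STATUS         Status;
  EFI_FILE_PROTOCOL  *Root;
  EFI_FILE_PROTOCOL  *File;
  UINT8              Buffer[256];
  UINTN              BufferSize;

  Status = SimpleFs->OpenVolume (SimpleFs, &Root);
  if (EFI_ERROR (Status)) {
    return Status;
  }

  Status = Root->Open (
                   Root,
                   &File,
                   L"\\dir1\\hello.txt",
                   EFI_FILE_MODE_READ,
                   0
                   );
  if (EFI_ERROR (Status)) {
    Root->Close (Root);
    return Status;
  }

  BufferSize = sizeof Buffer;
  Status = File->Read (File, &BufferSize, Buffer);
  //
  // On success, BufferSize now holds the number of bytes actually read.
  //

  File->Close (File);
  Root->Close (Root);
  return Status;
}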

In case you start with an EFI_DEVICE_PATH_PROTOCOL instance that
identifies a particular file in a particular filesystem, then the device
path protocol will contain *at least one* File Path Media Device Path
node. It is important that there may be more than one such device path
node, and the full pathname (within the filesystem), from the root
directory to the particular file, may be split over a number of device
path nodes.

For example, you could have just one File Path node containing
"\dir1\dir2\hello.txt". Or you could have three File Path nodes
containing "dir1", "dir2", "hello.txt", respectively. Or you could have
two File Path nodes containing "dir1\dir2\" and "hello.txt",
respectively.

In these cases, you'd need one, three, or two, EFI_FILE_PROTOCOL.Open()
calls, accordingly.

Alternatively, you'd need to concatenate the pathname fragments into a
whole pathname, making sure that there be precisely one backslash
separator between each pair of pathname components, and then issue a
single EFI_FILE_PROTOCOL.Open() call in the end.


You can find a helper function called EfiOpenFileByDevicePath() in
"MdePkg/Library/UefiLib/UefiLib.c".

A somewhat similar function is GetFileBufferByFilePath(), in
"MdePkg/Library/DxeServicesLib/DxeServicesLib.c".

Thanks,
Laszlo


Re: [OVMF] resource assignment fails for passthrough PCI GPU

Dr. David Alan Gilbert <dgilbert@...>
 

* Laszlo Ersek (lersek@redhat.com) wrote:
(+Dave, +Eduardo)

On 11/22/19 00:00, dann frazier wrote:
On Tue, Nov 19, 2019 at 06:06:15AM +0100, Laszlo Ersek wrote:
On 11/19/19 01:54, dann frazier wrote:
On Fri, Nov 15, 2019 at 11:51:18PM +0100, Laszlo Ersek wrote:
On 11/15/19 19:56, dann frazier wrote:
Hi,
I'm trying to passthrough an Nvidia GPU to a q35 KVM guest, but UEFI
is failing to allocate resources for it. I have no issues if I boot w/
a legacy BIOS, and it works fine if I tell the linux guest to do the
allocation itself - but I'm looking for a way to make this work w/
OVMF by default.

I posted a debug log here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563/+attachment/5305740/+files/q35-uefidbg.log

Linux guest lspci output is also available for both seabios/OVMF boots here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563
By default, OVMF exposes such a 64-bit MMIO aperture for PCI MMIO BAR
allocation that is 32GB in size. The generic PciBusDxe driver collects,
orders, and assigns / allocates the MMIO BARs, but it can work only out
of the aperture that platform code advertizes.

Your GPU's region 1 is itself 32GB in size. Given that there are further
PCI devices in the system with further 64-bit MMIO BARs, the default
aperture cannot accommodate everything. In such an event, PciBusDxe
avoids assigning the largest BARs (to my knowledge), in order to
conserve the most aperture possible, for other devices -- hence break
the fewest possible PCI devices.

You can control the aperture size from the QEMU command line. You can
also do it from the libvirt domain XML, technically speaking. The knob
is experimental, so no stability or compatibility guarantees are made.
(That's also the reason why it's a bit of a hack in the libvirt domain XML.)

The QEMU cmdline option is described in the following edk2 commit message:

https://github.com/tianocore/edk2/commit/7e5b1b670c38
Hi Laszlo,

Thanks for taking the time to describe this in detail! The -fw_cfg
option did avoid the problem for me.
Good to hear, thanks.

I also noticed that the above
commit message mentions the existence of a 24GB card as a reasoning
behind choosing the 32GB default aperture. From what you say below, I
understand that bumping this above 64GB could break hosts w/ <= 37
physical address bits.
Right.

What would be the downside of bumping the
default aperture to, say, 48GB?
The placement of the aperture is not trivial (please see the code
comments in the linked commit). The base address of the aperture is
chosen so that the largest BAR that can fit in the aperture may be
naturally aligned. (BARs are whole powers of two.)

The largest BAR that can fit in a 48 GB aperture is 32 GB. Therefore
such an aperture would be aligned at 32 GB -- the lowest base address
(dependent on guest RAM size) would be 32 GB. Meaning that the aperture
would end at 32 + 48 = 80 GB. That still breaches the 36-bit phys
address width.

32 GB is the largest aperture size that can work with 36-bit phys
address width; that's the aperture that ends at 64 GB exactly.
Thanks, yeah - now that I read the code comments that is clear (as
clear as it can be w/ my low level of base knowledge). In the commit you
mention Gerd (CC'd) had suggested a heuristic-based approach for
sizing the aperture. When you say "PCPU address width" - is that a
function of the available physical bits?
"PCPU address width" is not a "function" of the available physical bits
-- it *is* the available physical bits. "PCPU" simply stands for
"physical CPU".

IOW, would that approach
allow OVMF to automatically grow the aperture to the max ^2 supported
by the host CPU?
Maybe.

The current logic in OVMF works from the guest-physical address space
size -- as deduced from multiple factors, such as the 64-bit MMIO
aperture size, and others -- towards the guest-CPU (aka VCPU) address
width. The VCPU address width is important for a bunch of other purposes
in the firmware, so OVMF has to calculate it no matter what.

Again, the current logic is to calculate the highest guest-physical
address, and then deduce the VCPU address width from that (and then
expose it to the rest of the firmware).

Your suggestion would require passing the PCPU (physical CPU) address
width from QEMU/KVM into the guest, and reversing the direction of the
calculation. The PCPU address width would determine the VCPU address
width directly, and then the 64-bit PCI MMIO aperture would be
calculated from that.

However, there are two caveats.

(1) The larger your guest-phys address space (as exposed through the
VCPU address width to the rest of the firmware), the more guest RAM you
need for page tables. Because, just before entering the DXE phase, the
firmware builds 1:1 mapping page tables for the entire guest-phys
address space. This is necessary e.g. so you can access any PCI MMIO BAR.

Now consider that you have a huge beefy virtualization host with say 46
phys address bits, and a wimpy guest with say 1.5GB of guest RAM. Do you
absolutely want tens of *terabytes* for your 64-bit PCI MMIO aperture?
Do you really want to pay for the necessary page tables with that meager
guest RAM?

(Such machines do exist BTW, for example:

http://mid.mail-archive.com/9BD73EA91F8E404F851CF3F519B14AA8036C67B5@DGGEMI521-MBX.china.huawei.com
)

In other words, you'd need some kind of knob anyway, because otherwise
your aperture could grow too *large*.


(2) Exposing the PCPU address width to the guest may have nasty
consequences at the QEMU/KVM level, regardless of guest firmware. For
example, that kind of "guest enlightenment" could interfere with migration.

If you boot a guest let's say with 16GB of RAM, and tell it "hey friend,
have 40 bits of phys address width!", then you'll have a difficult time
migrating that guest to a host with a CPU that only has 36-bits wide
physical addresses -- even if the destination host has plenty of RAM
otherwise, such as a full 64GB.

There could be other QEMU/KVM / libvirt issues that I'm unaware of
(hence the CC to Dave and Eduardo).
host physical address width gets messy. There are differences as well
between upstream qemu behaviour, and some downstreams.
I think the story is that:

a) Qemu default: 40 bits on any host
b) -cpu blah,host-phys-bits=true to follow the host.
c) RHEL has host-phys-bits=true by default

As you say, the only real problem with host-phys-bits is migration -
between say an E3 and an E5 xeon with different widths. The magic 40's
is generally wrong as well - I think it came from some ancient AMD,
but it's the default on QEMU TCG as well.

I don't think there's a way to set it in libvirt;
https://bugzilla.redhat.com/show_bug.cgi?id=1578278 is a bz asking for
that.

IMHO host-phys-bits is actually pretty safe; and makes most sense in a
lot of cases.

Dave


Thanks,
Laszlo


-dann

For example, to set a 64GB aperture, pass:

-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536

The libvirt domain XML syntax is a bit tricky (and it might "taint" your
domain, as it goes outside of the QEMU features that libvirt directly
maps to):

<domain
type='kvm'
xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
<qemu:commandline>
<qemu:arg value='-fw_cfg'/>
<qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=65536'/>
</qemu:commandline>
</domain>

Some notes:

(1) The "xmlns:qemu" namespace definition attribute in the <domain> root
element is important. You have to add it manually when you add
<qemu:commandline> and <qemu:arg> too. Without the namespace
definition, the latter elements will make no sense, and libvirt will
delete them immediately.

(2) The above change will grow your guest's physical address space to
more than 64GB. As a consequence, on your *host*, *if* your physical CPU
supports nested paging (called "ept" on Intel and "npt" on AMD), *then*
the CPU will have to support at least 37 physical address bits too, for
the guest to work. Otherwise, the guest will break, hard.

Here's how to verify (on the host):

(2a) run "egrep -w 'npt|ept' /proc/cpuinfo" --> if this does not produce
output, then stop reading here; things should work. Your CPU does not
support nested paging, so KVM will use shadow paging, which is slower,
but at least you don't have to care about the CPU's phys address width.

(2b) otherwise (i.e. when you do have nested paging), run "grep 'bits
physical' /proc/cpuinfo" --> if the physical address width is >=37,
you're good.

(2c) if you have nested paging but exactly 36 phys address bits, then
you'll have to forcibly disable nested paging (assuming you want to run
a guest with larger than 64GB guest-phys address space, that is). On
Intel, issue:

rmmod kvm_intel
modprobe kvm_intel ept=N

On AMD, go with:

rmmod kvm_amd
modprobe kvm_amd npt=N

Hope this helps,
Laszlo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Re: [OVMF] resource assignment fails for passthrough PCI GPU

Gerd Hoffmann <kraxel@...>
 

Hi,

Thanks, yeah - now that I read the code comments that is clear (as
clear as it can be w/ my low level of base knowledge). In the commit you
mention Gerd (CC'd) had suggested a heuristic-based approach for
sizing the aperture. When you say "PCPU address width" - is that a
function of the available physical bits? IOW, would that approach
allow OVMF to automatically grow the aperture to the max ^2 supported
by the host CPU?
Yes. You can see it as "address sizes" in /proc/cpuinfo

Problem is this isn't reliable in virtual machines. qemu reports 40
bits physical even in case the host supports less. Intel hardware often
has 36 or 39 bits (depending on age). So if edk2 went with the 40 bits
(=> 1TB physical address space) and reserved -- for example -- the
topmost 25% of that (everything above 768 GB) for I/O, things would
simply not work on a host with 39 (or fewer) bits of physical address
space, because the 64-bit PCI BARs would not be addressable by the CPU.

So edk2 tries to be as conservative as possible by default ...

cheers,
Gerd


Re: [OVMF] resource assignment fails for passthrough PCI GPU

Laszlo Ersek
 

On 11/22/19 07:18, Gerd Hoffmann wrote:
Hi,

Thanks, yeah - now that I read the code comments that is clear (as
clear as it can be w/ my low level of base knowledge). In the commit you
mention Gerd (CC'd) had suggested a heuristic-based approach for
sizing the aperture. When you say "PCPU address width" - is that a
function of the available physical bits? IOW, would that approach
allow OVMF to automatically grow the aperture to the max ^2 supported
by the host CPU?
Yes. You can see it as "address sizes" in /proc/cpuinfo

Problem is this isn't reliable in virtual machines. qemu reports 40
bits physical even in case the host supports less. Intel hardware often
has 36 or 39 bits (depending on age). So if edk2 went with the 40 bits
(=> 1TB physical address space) and reserved -- for example -- the
topmost 25% of that (everything above 768 GB) for I/O, things would
simply not work on a host with 39 (or fewer) bits of physical address
space, because the 64-bit PCI BARs would not be addressable by the CPU.

So edk2 tries to be as conservative as possible by default ...
Heh, now that you explain this, I *vaguely* recall it from discussions
conducted maybe years ago. :)

It's just as well that I wrote, in my sibling response, "There could be
other QEMU/KVM / libvirt issues that I'm unaware of" ;)

Thanks!
Laszlo


Re: [OVMF] resource assignment fails for passthrough PCI GPU

Laszlo Ersek
 

(+Dave, +Eduardo)

On 11/22/19 00:00, dann frazier wrote:
On Tue, Nov 19, 2019 at 06:06:15AM +0100, Laszlo Ersek wrote:
On 11/19/19 01:54, dann frazier wrote:
On Fri, Nov 15, 2019 at 11:51:18PM +0100, Laszlo Ersek wrote:
On 11/15/19 19:56, dann frazier wrote:
Hi,
I'm trying to passthrough an Nvidia GPU to a q35 KVM guest, but UEFI
is failing to allocate resources for it. I have no issues if I boot w/
a legacy BIOS, and it works fine if I tell the linux guest to do the
allocation itself - but I'm looking for a way to make this work w/
OVMF by default.

I posted a debug log here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563/+attachment/5305740/+files/q35-uefidbg.log

Linux guest lspci output is also available for both seabios/OVMF boots here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563
By default, OVMF exposes such a 64-bit MMIO aperture for PCI MMIO BAR
allocation that is 32GB in size. The generic PciBusDxe driver collects,
orders, and assigns / allocates the MMIO BARs, but it can work only out
of the aperture that platform code advertizes.

Your GPU's region 1 is itself 32GB in size. Given that there are further
PCI devices in the system with further 64-bit MMIO BARs, the default
aperture cannot accommodate everything. In such an event, PciBusDxe
avoids assigning the largest BARs (to my knowledge), in order to
conserve the most aperture possible, for other devices -- hence break
the fewest possible PCI devices.

You can control the aperture size from the QEMU command line. You can
also do it from the libvirt domain XML, technically speaking. The knob
is experimental, so no stability or compatibility guarantees are made.
(That's also the reason why it's a bit of a hack in the libvirt domain XML.)

The QEMU cmdline option is described in the following edk2 commit message:

https://github.com/tianocore/edk2/commit/7e5b1b670c38
Hi Laszlo,

Thanks for taking the time to describe this in detail! The -fw_cfg
option did avoid the problem for me.
Good to hear, thanks.

I also noticed that the above
commit message mentions the existence of a 24GB card as a reasoning
behind choosing the 32GB default aperture. From what you say below, I
understand that bumping this above 64GB could break hosts w/ <= 37
physical address bits.
Right.

What would be the downside of bumping the
default aperture to, say, 48GB?
The placement of the aperture is not trivial (please see the code
comments in the linked commit). The base address of the aperture is
chosen so that the largest BAR that can fit in the aperture may be
naturally aligned. (BARs are whole powers of two.)

The largest BAR that can fit in a 48 GB aperture is 32 GB. Therefore
such an aperture would be aligned at 32 GB -- the lowest base address
(dependent on guest RAM size) would be 32 GB. Meaning that the aperture
would end at 32 + 48 = 80 GB. That still breaches the 36-bit phys
address width.

32 GB is the largest aperture size that can work with 36-bit phys
address width; that's the aperture that ends at 64 GB exactly.
Thanks, yeah - now that I read the code comments that is clear (as
clear as it can be w/ my low level of base knowledge). In the commit you
mention Gerd (CC'd) had suggested a heuristic-based approach for
sizing the aperture. When you say "PCPU address width" - is that a
function of the available physical bits?
"PCPU address width" is not a "function" of the available physical bits
-- it *is* the available physical bits. "PCPU" simply stands for
"physical CPU".

IOW, would that approach
allow OVMF to automatically grow the aperture to the max ^2 supported
by the host CPU?
Maybe.

The current logic in OVMF works from the guest-physical address space
size -- as deduced from multiple factors, such as the 64-bit MMIO
aperture size, and others -- towards the guest-CPU (aka VCPU) address
width. The VCPU address width is important for a bunch of other purposes
in the firmware, so OVMF has to calculate it no matter what.

Again, the current logic is to calculate the highest guest-physical
address, and then deduce the VCPU address width from that (and then
expose it to the rest of the firmware).

Your suggestion would require passing the PCPU (physical CPU) address
width from QEMU/KVM into the guest, and reversing the direction of the
calculation. The PCPU address width would determine the VCPU address
width directly, and then the 64-bit PCI MMIO aperture would be
calculated from that.

However, there are two caveats.

(1) The larger your guest-phys address space (as exposed through the
VCPU address width to the rest of the firmware), the more guest RAM you
need for page tables. Because, just before entering the DXE phase, the
firmware builds 1:1 mapping page tables for the entire guest-phys
address space. This is necessary e.g. so you can access any PCI MMIO BAR.

Now consider that you have a huge beefy virtualization host with say 46
phys address bits, and a wimpy guest with say 1.5GB of guest RAM. Do you
absolutely want tens of *terabytes* for your 64-bit PCI MMIO aperture?
Do you really want to pay for the necessary page tables with that meager
guest RAM?

(Such machines do exist BTW, for example:

http://mid.mail-archive.com/9BD73EA91F8E404F851CF3F519B14AA8036C67B5@DGGEMI521-MBX.china.huawei.com
)

In other words, you'd need some kind of knob anyway, because otherwise
your aperture could grow too *large*.


(2) Exposing the PCPU address width to the guest may have nasty
consequences at the QEMU/KVM level, regardless of guest firmware. For
example, that kind of "guest enlightenment" could interfere with migration.

If you boot a guest let's say with 16GB of RAM, and tell it "hey friend,
have 40 bits of phys address width!", then you'll have a difficult time
migrating that guest to a host with a CPU that only has 36-bits wide
physical addresses -- even if the destination host has plenty of RAM
otherwise, such as a full 64GB.

There could be other QEMU/KVM / libvirt issues that I'm unaware of
(hence the CC to Dave and Eduardo).

Thanks,
Laszlo


-dann

For example, to set a 64GB aperture, pass:

-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536

The libvirt domain XML syntax is a bit tricky (and it might "taint" your
domain, as it goes outside of the QEMU features that libvirt directly
maps to):

<domain
type='kvm'
xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
<qemu:commandline>
<qemu:arg value='-fw_cfg'/>
<qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=65536'/>
</qemu:commandline>
</domain>

Some notes:

(1) The "xmlns:qemu" namespace definition attribute in the <domain> root
element is important. You have to add it manually when you add
<qemu:commandline> and <qemu:arg> too. Without the namespace
definition, the latter elements will make no sense, and libvirt will
delete them immediately.

(2) The above change will grow your guest's physical address space to
more than 64GB. As a consequence, on your *host*, *if* your physical CPU
supports nested paging (called "ept" on Intel and "npt" on AMD), *then*
the CPU will have to support at least 37 physical address bits too, for
the guest to work. Otherwise, the guest will break, hard.

Here's how to verify (on the host):

(2a) run "egrep -w 'npt|ept' /proc/cpuinfo" --> if this does not produce
output, then stop reading here; things should work. Your CPU does not
support nested paging, so KVM will use shadow paging, which is slower,
but at least you don't have to care about the CPU's phys address width.

(2b) otherwise (i.e. when you do have nested paging), run "grep 'bits
physical' /proc/cpuinfo" --> if the physical address width is >=37,
you're good.

(2c) if you have nested paging but exactly 36 phys address bits, then
you'll have to forcibly disable nested paging (assuming you want to run
a guest with larger than 64GB guest-phys address space, that is). On
Intel, issue:

rmmod kvm_intel
modprobe kvm_intel ept=N

On AMD, go with:

rmmod kvm_amd
modprobe kvm_amd npt=N

Hope this helps,
Laszlo


Examples opening and reading/writing a file with EDK2

alejandro.estay@...
 

Hi, I'm making a little UEFI app, just to check basic functionality of the firmware. Inside this app I want to load, read and write a file, binary or text. However I can't find a "complete explanation" or examples about the use of the procedures (EFI_FILE_PROTOCOL.Open(), EFI_FILE_PROTOCOL.Read()) from the UEFI API (steps, what to check). The only thing I found was some little UEFI Shell apps doing this using the shell API. However I would like to do it using the "bare firmware" instead of loading the shell. For me, the most confusing part is when the program has to check the handle database to find the particular handle of the file that is being opened. Also I have some doubts about how to check, without the shell, which volume or partition would have the exact file I'm looking for (i.e. what if 2 volumes have similar, or even identical, root directories).

Thanks in advance


Re: [OVMF] resource assignment fails for passthrough PCI GPU

dann frazier
 

On Tue, Nov 19, 2019 at 06:06:15AM +0100, Laszlo Ersek wrote:
On 11/19/19 01:54, dann frazier wrote:
On Fri, Nov 15, 2019 at 11:51:18PM +0100, Laszlo Ersek wrote:
On 11/15/19 19:56, dann frazier wrote:
Hi,
I'm trying to passthrough an Nvidia GPU to a q35 KVM guest, but UEFI
is failing to allocate resources for it. I have no issues if I boot w/
a legacy BIOS, and it works fine if I tell the linux guest to do the
allocation itself - but I'm looking for a way to make this work w/
OVMF by default.

I posted a debug log here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563/+attachment/5305740/+files/q35-uefidbg.log

Linux guest lspci output is also available for both seabios/OVMF boots here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563
By default, OVMF exposes such a 64-bit MMIO aperture for PCI MMIO BAR
allocation that is 32GB in size. The generic PciBusDxe driver collects,
orders, and assigns / allocates the MMIO BARs, but it can work only out
of the aperture that platform code advertizes.

Your GPU's region 1 is itself 32GB in size. Given that there are further
PCI devices in the system with further 64-bit MMIO BARs, the default
aperture cannot accommodate everything. In such an event, PciBusDxe
avoids assigning the largest BARs (to my knowledge), in order to
conserve the most aperture possible, for other devices -- hence break
the fewest possible PCI devices.

You can control the aperture size from the QEMU command line. You can
also do it from the libvirt domain XML, technically speaking. The knob
is experimental, so no stability or compatibility guarantees are made.
(That's also the reason why it's a bit of a hack in the libvirt domain XML.)

The QEMU cmdline option is described in the following edk2 commit message:

https://github.com/tianocore/edk2/commit/7e5b1b670c38
Hi Laszlo,

Thanks for taking the time to describe this in detail! The -fw_cfg
option did avoid the problem for me.
Good to hear, thanks.

I also noticed that the above
commit message mentions the existence of a 24GB card as a reasoning
behind choosing the 32GB default aperture. From what you say below, I
understand that bumping this above 64GB could break hosts w/ <= 37
physical address bits.
Right.

What would be the downside of bumping the
default aperture to, say, 48GB?
The placement of the aperture is not trivial (please see the code
comments in the linked commit). The base address of the aperture is
chosen so that the largest BAR that can fit in the aperture may be
naturally aligned. (BARs are whole powers of two.)

The largest BAR that can fit in a 48 GB aperture is 32 GB. Therefore
such an aperture would be aligned at 32 GB -- the lowest base address
(dependent on guest RAM size) would be 32 GB. Meaning that the aperture
would end at 32 + 48 = 80 GB. That still breaches the 36-bit phys
address width.

32 GB is the largest aperture size that can work with 36-bit phys
address width; that's the aperture that ends at 64 GB exactly.
Thanks, yeah - now that I read the code comments that is clear (as
clear as it can be w/ my low level of base knowledge). In the commit you
mention Gerd (CC'd) had suggested a heuristic-based approach for
sizing the aperture. When you say "PCPU address width" - is that a
function of the available physical bits? IOW, would that approach
allow OVMF to automatically grow the aperture to the max ^2 supported
by the host CPU?

-dann

For example, to set a 64GB aperture, pass:

-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536

The libvirt domain XML syntax is a bit tricky (and it might "taint" your
domain, as it goes outside of the QEMU features that libvirt directly
maps to):

<domain
type='kvm'
xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
<qemu:commandline>
<qemu:arg value='-fw_cfg'/>
<qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=65536'/>
</qemu:commandline>
</domain>

Some notes:

(1) The "xmlns:qemu" namespace definition attribute in the <domain> root
element is important. You have to add it manually when you add
<qemu:commandline> and <qemu:arg> too. Without the namespace
definition, the latter elements will make no sense, and libvirt will
delete them immediately.

(2) The above change will grow your guest's physical address space to
more than 64GB. As a consequence, on your *host*, *if* your physical CPU
supports nested paging (called "ept" on Intel and "npt" on AMD), *then*
the CPU will have to support at least 37 physical address bits too, for
the guest to work. Otherwise, the guest will break, hard.

Here's how to verify (on the host):

(2a) run "egrep -w 'npt|ept' /proc/cpuinfo" --> if this does not produce
output, then stop reading here; things should work. Your CPU does not
support nested paging, so KVM will use shadow paging, which is slower,
but at least you don't have to care about the CPU's phys address width.

(2b) otherwise (i.e. when you do have nested paging), run "grep 'bits
physical' /proc/cpuinfo" --> if the physical address width is >=37,
you're good.

(2c) if you have nested paging but exactly 36 phys address bits, then
you'll have to forcibly disable nested paging (assuming you want to run
a guest with larger than 64GB guest-phys address space, that is). On
Intel, issue:

rmmod kvm_intel
modprobe kvm_intel ept=N

On AMD, go with:

rmmod kvm_amd
modprobe kvm_amd npt=N

Hope this helps,
Laszlo


Design discussion for SEV-ES

Tom Lendacky <thomas.lendacky@...>
 

I'd like to be added to the TianoCore Design Meeting to discuss support
for SEV-ES in OVMF.

Looking at the calendar, the meeting scheduled for December 12, 2019 would
be best.

Discussion length will depend on how much everyone understands the current
SEV support and the additional requirements of SEV-ES.

Thank you,
Tom Lendacky


Re: [OVMF] resource assignment fails for passthrough PCI GPU

Laszlo Ersek
 

On 11/19/19 01:54, dann frazier wrote:
On Fri, Nov 15, 2019 at 11:51:18PM +0100, Laszlo Ersek wrote:
On 11/15/19 19:56, dann frazier wrote:
Hi,
I'm trying to passthrough an Nvidia GPU to a q35 KVM guest, but UEFI
is failing to allocate resources for it. I have no issues if I boot w/
a legacy BIOS, and it works fine if I tell the linux guest to do the
allocation itself - but I'm looking for a way to make this work w/
OVMF by default.

I posted a debug log here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563/+attachment/5305740/+files/q35-uefidbg.log

Linux guest lspci output is also available for both seabios/OVMF boots here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563
By default, OVMF exposes such a 64-bit MMIO aperture for PCI MMIO BAR
allocation that is 32GB in size. The generic PciBusDxe driver collects,
orders, and assigns / allocates the MMIO BARs, but it can work only out
of the aperture that platform code advertizes.

Your GPU's region 1 is itself 32GB in size. Given that there are further
PCI devices in the system with further 64-bit MMIO BARs, the default
aperture cannot accommodate everything. In such an event, PciBusDxe
avoids assigning the largest BARs (to my knowledge), in order to
conserve the most aperture possible, for other devices -- hence break
the fewest possible PCI devices.

You can control the aperture size from the QEMU command line. You can
also do it from the libvirt domain XML, technically speaking. The knob
is experimental, so no stability or compatibility guarantees are made.
(That's also the reason why it's a bit of a hack in the libvirt domain XML.)

The QEMU cmdline option is described in the following edk2 commit message:

https://github.com/tianocore/edk2/commit/7e5b1b670c38
Hi Laszlo,

Thanks for taking the time to describe this in detail! The -fw_cfg
option did avoid the problem for me.
Good to hear, thanks.

I also noticed that the above
commit message mentions the existence of a 24GB card as reasoning
behind choosing the 32GB default aperture. From what you say below, I
understand that bumping this above 64GB could break hosts w/ fewer than 37
physical address bits.
Right.

What would be the downside of bumping the
default aperture to, say, 48GB?
The placement of the aperture is not trivial (please see the code
comments in the linked commit). The base address of the aperture is
chosen so that the largest BAR that can fit in the aperture may be
naturally aligned. (BAR sizes are whole powers of two.)

The largest BAR that can fit in a 48 GB aperture is 32 GB. Therefore
such an aperture would be aligned at 32 GB -- the lowest base address
(dependent on guest RAM size) would be 32 GB. Meaning that the aperture
would end at 32 + 48 = 80 GB. That still breaches the 36-bit phys
address width.

32 GB is the largest aperture size that can work with 36-bit phys
address width; that's the aperture that ends at 64 GB exactly.
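
To make the arithmetic concrete, here is a back-of-the-envelope sketch (my own simplification; the real firmware also accounts for the guest RAM layout, hotplug area, etc., and the 4 GB "memory top" below is just an assumption) of where apertures of various sizes would end:

# Simplified sketch of the placement rule described above (not the
# firmware's code): the aperture base is aligned to the largest
# power-of-two BAR size that fits inside the aperture.

GB = 1 << 30

def aperture_end(aperture_size, mem_top=4 * GB):
    # Largest power-of-two BAR that still fits in the aperture.
    largest_bar = 1 << (aperture_size.bit_length() - 1)
    # Lowest possible base: round the top of guest memory up to that alignment.
    base = (mem_top + largest_bar - 1) // largest_bar * largest_bar
    return base + aperture_size

for gib in (32, 48, 64):
    end = aperture_end(gib * GB)
    print(f"{gib:2d} GiB aperture -> ends at {end // GB} GiB, "
          f"needs {(end - 1).bit_length()} phys address bits")

# Output: 32 GiB ends at 64 GiB (36 bits), 48 GiB ends at 80 GiB (37 bits),
# 64 GiB ends at 128 GiB (37 bits) -- matching the reasoning above.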

Thanks
Laszlo


-dann

For example, to set a 64GB aperture, pass:

-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536
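
Note that the X-PciMmio64Mb value is the aperture size in MiB. A quick conversion (my addition, not from the original mail) for the sizes discussed in this thread:

# X-PciMmio64Mb takes the 64-bit PCI MMIO aperture size in MiB.
for gib in (32, 48, 64):
    print(f"{gib} GiB aperture -> string={gib * 1024}")
# 32 GiB -> 32768, 48 GiB -> 49152, 64 GiB -> 65536 (the value used above)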

The libvirt domain XML syntax is a bit tricky (and it might "taint" your
domain, as it goes outside of the QEMU features that libvirt directly
maps to):

<domain
type='kvm'
xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
<qemu:commandline>
<qemu:arg value='-fw_cfg'/>
<qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=65536'/>
</qemu:commandline>
</domain>

Some notes:

(1) The "xmlns:qemu" namespace definition attribute in the <domain> root
element is important. You have to add it manually when you add
<qemu:commandline> and <qemu:arg> too. Without the namespace
definition, the latter elements will make no sense, and libvirt will
delete them immediately.

(2) The above change will grow your guest's physical address space to
more than 64GB. As a consequence, on your *host*, *if* your physical CPU
supports nested paging (called "ept" on Intel and "npt" on AMD), *then*
the CPU will have to support at least 37 physical address bits too, for
the guest to work. Otherwise, the guest will break, hard.

Here's how to verify (on the host):

(2a) run "egrep -w 'npt|ept' /proc/cpuinfo" --> if this does not produce
output, then stop reading here; things should work. Your CPU does not
support nested paging, so KVM will use shadow paging, which is slower,
but at least you don't have to care about the CPU's phys address width.

(2b) otherwise (i.e. when you do have nested paging), run "grep 'bits
physical' /proc/cpuinfo" --> if the physical address width is >=37,
you're good.

(2c) if you have nested paging but exactly 36 phys address bits, then
you'll have to forcibly disable nested paging (assuming you want to run
a guest with larger than 64GB guest-phys address space, that is). On
Intel, issue:

rmmod kvm_intel
modprobe kvm_intel ept=N

On AMD, go with:

rmmod kvm_amd
modprobe kvm_amd npt=N

Hope this helps,
Laszlo


Re: [OVMF] resource assignment fails for passthrough PCI GPU

dann frazier
 

On Fri, Nov 15, 2019 at 11:51:18PM +0100, Laszlo Ersek wrote:
On 11/15/19 19:56, dann frazier wrote:
Hi,
I'm trying to passthrough an Nvidia GPU to a q35 KVM guest, but UEFI
is failing to allocate resources for it. I have no issues if I boot w/
a legacy BIOS, and it works fine if I tell the linux guest to do the
allocation itself - but I'm looking for a way to make this work w/
OVMF by default.

I posted a debug log here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563/+attachment/5305740/+files/q35-uefidbg.log

Linux guest lspci output is also available for both seabios/OVMF boots here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563
By default, OVMF exposes a 64-bit MMIO aperture for PCI MMIO BAR
allocation that is 32GB in size. The generic PciBusDxe driver collects,
orders, and assigns / allocates the MMIO BARs, but it can work only out
of the aperture that platform code advertises.

Your GPU's region 1 is itself 32GB in size. Given that there are further
PCI devices in the system with further 64-bit MMIO BARs, the default
aperture cannot accommodate everything. In such an event, PciBusDxe
avoids assigning the largest BARs (to my knowledge), in order to
conserve as much aperture as possible for other devices -- hence breaking
the fewest possible PCI devices.

You can control the aperture size from the QEMU command line. You can
also do it from the libvirt domain XML, technically speaking. The knob
is experimental, so no stability or compatibility guarantees are made.
(That's also the reason why it's a bit of a hack in the libvirt domain XML.)

The QEMU cmdline option is described in the following edk2 commit message:

https://github.com/tianocore/edk2/commit/7e5b1b670c38
Hi Laszlo,

Thanks for taking the time to describe this in detail! The -fw_cfg
option did avoid the problem for me. I also noticed that the above
commit message mentions the existence of a 24GB card as reasoning
behind choosing the 32GB default aperture. From what you say below, I
understand that bumping this above 64GB could break hosts w/ fewer than 37
physical address bits. What would be the downside of bumping the
default aperture to, say, 48GB?

-dann

For example, to set a 64GB aperture, pass:

-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536

The libvirt domain XML syntax is a bit tricky (and it might "taint" your
domain, as it goes outside of the QEMU features that libvirt directly
maps to):

<domain
type='kvm'
xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
<qemu:commandline>
<qemu:arg value='-fw_cfg'/>
<qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=65536'/>
</qemu:commandline>
</domain>

Some notes:

(1) The "xmlns:qemu" namespace definition attribute in the <domain> root
element is important. You have to add it manually when you add
<qemu:commandline> and <qemu:arg> too. Without the namespace
definition, the latter elements will make no sense, and libvirt will
delete them immediately.

(2) The above change will grow your guest's physical address space to
more than 64GB. As a consequence, on your *host*, *if* your physical CPU
supports nested paging (called "ept" on Intel and "npt" on AMD), *then*
the CPU will have to support at least 37 physical address bits too, for
the guest to work. Otherwise, the guest will break, hard.

Here's how to verify (on the host):

(2a) run "egrep -w 'npt|ept' /proc/cpuinfo" --> if this does not produce
output, then stop reading here; things should work. Your CPU does not
support nested paging, so KVM will use shadow paging, which is slower,
but at least you don't have to care about the CPU's phys address width.

(2b) otherwise (i.e. when you do have nested paging), run "grep 'bits
physical' /proc/cpuinfo" --> if the physical address width is >=37,
you're good.

(2c) if you have nested paging but exactly 36 phys address bits, then
you'll have to forcibly disable nested paging (assuming you want to run
a guest with larger than 64GB guest-phys address space, that is). On
Intel, issue:

rmmod kvm_intel
modprobe kvm_intel ept=N

On AMD, go with:

rmmod kvm_amd
modprobe kvm_amd npt=N

Hope this helps,
Laszlo


Re: [OVMF] resource assignment fails for passthrough PCI GPU

Laszlo Ersek
 

On 11/15/19 19:56, dann frazier wrote:
Hi,
I'm trying to passthrough an Nvidia GPU to a q35 KVM guest, but UEFI
is failing to allocate resources for it. I have no issues if I boot w/
a legacy BIOS, and it works fine if I tell the linux guest to do the
allocation itself - but I'm looking for a way to make this work w/
OVMF by default.

I posted a debug log here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563/+attachment/5305740/+files/q35-uefidbg.log

Linux guest lspci output is also available for both seabios/OVMF boots here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563
By default, OVMF exposes a 64-bit MMIO aperture for PCI MMIO BAR
allocation that is 32GB in size. The generic PciBusDxe driver collects,
orders, and assigns / allocates the MMIO BARs, but it can work only out
of the aperture that platform code advertises.

Your GPU's region 1 is itself 32GB in size. Given that there are further
PCI devices in the system with further 64-bit MMIO BARs, the default
aperture cannot accommodate everything. In such an event, PciBusDxe
avoids assigning the largest BARs (to my knowledge), in order to
conserve as much aperture as possible for other devices -- hence breaking
the fewest possible PCI devices.

You can control the aperture size from the QEMU command line. You can
also do it from the libvirt domain XML, technically speaking. The knob
is experimental, so no stability or compatibility guarantees are made.
(That's also the reason why it's a bit of a hack in the libvirt domain XML.)

The QEMU cmdline option is described in the following edk2 commit message:

https://github.com/tianocore/edk2/commit/7e5b1b670c38

For example, to set a 64GB aperture, pass:

-fw_cfg name=opt/ovmf/X-PciMmio64Mb,string=65536

The libvirt domain XML syntax is a bit tricky (and it might "taint" your
domain, as it goes outside of the QEMU features that libvirt directly
maps to):

<domain
type='kvm'
xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
<qemu:commandline>
<qemu:arg value='-fw_cfg'/>
<qemu:arg value='opt/ovmf/X-PciMmio64Mb,string=65536'/>
</qemu:commandline>
</domain>

Some notes:

(1) The "xmlns:qemu" namespace definition attribute in the <domain> root
element is important. You have to add it manually when you add
<qemu:commandline> and <qemu:arg> too. Without the namespace
definition, the latter elements will make no sense, and libvirt will
delete them immediately.

(2) The above change will grow your guest's physical address space to
more than 64GB. As a consequence, on your *host*, *if* your physical CPU
supports nested paging (called "ept" on Intel and "npt" on AMD), *then*
the CPU will have to support at least 37 physical address bits too, for
the guest to work. Otherwise, the guest will break, hard.

Here's how to verify (on the host):

(2a) run "egrep -w 'npt|ept' /proc/cpuinfo" --> if this does not produce
output, then stop reading here; things should work. Your CPU does not
support nested paging, so KVM will use shadow paging, which is slower,
but at least you don't have to care about the CPU's phys address width.

(2b) otherwise (i.e. when you do have nested paging), run "grep 'bits
physical' /proc/cpuinfo" --> if the physical address width is >=37,
you're good.

(2c) if you have nested paging but exactly 36 phys address bits, then
you'll have to forcibly disable nested paging (assuming you want to run
a guest with larger than 64GB guest-phys address space, that is). On
Intel, issue:

rmmod kvm_intel
modprobe kvm_intel ept=N

On AMD, go with:

rmmod kvm_amd
modprobe kvm_amd npt=N

Hope this helps,
Laszlo


[OVMF] resource assignment fails for passthrough PCI GPU

dann frazier
 

Hi,
I'm trying to passthrough an Nvidia GPU to a q35 KVM guest, but UEFI
is failing to allocate resources for it. I have no issues if I boot w/
a legacy BIOS, and it works fine if I tell the linux guest to do the
allocation itself - but I'm looking for a way to make this work w/
OVMF by default.

I posted a debug log here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563/+attachment/5305740/+files/q35-uefidbg.log

Linux guest lspci output is also available for both seabios/OVMF boots here:
https://bugs.launchpad.net/ubuntu/+source/edk2/+bug/1849563

-dann


Re: Establish network and run 'ping'

King Sumo
 

Reconnecting the Intel driver was without success. After unloading the Intel driver, I cannot reload it, as I don't know how to reload a driver for which I do not have an .efi file...
TIP: you can use the FvSimpleFileSystem.efi module to mount the Firmware Volume of your BIOS and then locate the EFI drivers / files.
(build FvSimpleFileSystem from edk2 sources)

E.g.
load FvSimpleFileSystem.efi
FS0:
dir
Directory of: FS0:\
00/00/0000 00:00 r 11,040 FbGop.efi
00/00/0000 00:00 r 12,446 7BB28B99-61BB-11D5-9A5D-0090273FC14D
00/00/0000 00:00 r 918,880 Shell.efi
00/00/0000 00:00 r 55,040 dpDynamicCommand.efi
00/00/0000 00:00 r 35,744 tftpDynamicCommand.efi
00/00/0000 00:00 r 24,704 OhciDxe.efi
00/00/0000 00:00 r 14,624 UsbMassStorageDxe.efi
00/00/0000 00:00 r 19,680 UsbKbDxe.efi
00/00/0000 00:00 r 21,728 UsbBusDxe.efi
00/00/0000 00:00 r 35,392 XhciDxe.efi
00/00/0000 00:00 r 22,656 EhciDxe.efi
00/00/0000 00:00 r 20,032 UhciDxe.efi
00/00/0000 00:00 r 15,328 SdDxe.efi
...


Re: Establish network and run 'ping'

Laszlo Ersek
 

On 11/05/19 11:47, Tomas Pilar (tpilar) wrote:
I am rather surprised that the Network Stack defaults to disabled on a platform. If the platform has a working implementation, I would strongly suggest you use that.

Otherwise you'll probably need to spend a lot more time poking around and familiarising yourself with the environment and the individual modules that comprise the network stack. Also note that platform vendors often modify the upstream network stack code to add new features or optimise the way it works on their hardware.
Agreed -- if there is a platform-specific HII knob in the Setup UI, then
it can control anything at all.

Your question is very generic and not something I can walk you through using email (maybe someone else here can), but I am happy to try and answer more specific questions when you have any (though admittedly I am not an expert on the network stack).

If you do want to learn more and play around, I would suggest starting with OVMF, rather than a platform, for a number of different reasons.
OVMF *is* a firmware platform, it's just not a physical one. :)

(But, of course, I agree with you -- OVMF is fully open source, the
"boards" underneath are fully open source (QEMU, KVM, Xen), and having
software, as opposed to hardware, beneath the software that you want to
debug, is helpful.)

Thanks
Laszlo
