Proposal to add support for PCIe enumeration protocols in PEI


Albecki, Mateusz
 

Hi All,

I would like to propose a new PPI which would standardize the way a device driver PEIM communicates with the PEIM that discovers PCI devices and manages their resources (later called the enumerator PEIM).


1. Background - why we need access to PCIe devices in PEI

As I understand it, historically PEI was supposed to do only the minimum initialization required to bring up RAM, and as soon as that completed it was supposed to transition to the DXE phase, which offers much more robust services and a working driver model. With the introduction of S3 resume, which skips the DXE phase, we had to push more and more responsibilities into PEI. Current features supported in PEI which require access to PCI devices are:
* S3 resume Opal unlock (the BIOS needs to unlock storage with the saved password)
* Capsule update flows (the capsule is located on mass storage)
* Boot from block devices (EFI variables are stored on those)


2. Current solution



To meet those use cases EDK2 maintains a set of PEI drivers for mass storage devices such as UFS (Bus/Pci/UfsPciHcPei), AHCI (Bus/Ata/AhciPei), NVMe (Bus/Pci/NvmExpressPei), etc. Each of those PEIMs uses a custom PPI to interact with HW. For instance, the AHCI PEIM uses EDKII_ATA_AHCI_HOST_CONTROLLER_PPI while NVMe uses EDKII_NVM_EXPRESS_HOST_CONTROLLER_PPI. Both PPIs offer the same functionality (get BAR, get device path), just under different names. It is the platform code's responsibility to install those PPIs and ensure the proper PEIM dispatch sequence. For instance, in the case of multiple AHCI controllers, platform code needs to ensure that all PPIs are installed before the AhciPei PEIM is allowed to execute, as it won't register for notification of new devices discovered on the system.


3. Problems with the current solution

The enumerator PEIM needs to track the class code of each discovered PCI device and select the appropriate PPI to install. Not only is that cumbersome for the enumerator PEIM, it also makes it harder to scale the system with support for new device classes. For instance, if in the future we want to support another device class in PEI, we will need to define yet another PPI in EDK2 and modify the enumerator PEIM to support it. This makes the rollout of new features in PEI harder and is not in line with what the DXE PCI stack offers. The current solution also lacks a model of what the sequence of events in PCI enumeration should look like.

4. Solution

Given that we can't easily get rid of PEI or replace PPIs with protocols in PEI, the solution I would like to propose is to introduce an EDKII_PCI_DEVICE_PPI which would leverage EFI_PCI_IO_PROTOCOL (to allow access to hardware) and EFI_DEVICE_PATH_PROTOCOL (used by Opal unlock to match the password with the device). This new PPI could be used by device PEIMs instead of the old device-class-specific PPIs. This would bring the PEI solution more in line with the DXE stack and would allow us to seamlessly extend the system with support for new device classes. The definition of the PPI can be found below. Protocols from DXE are used intentionally to reduce unnecessary delta between the two environments.

On the sequencing side, I propose that device PEIMs install a notify for EDKII_PCI_DEVICE_PPI installation. This would ensure that every time we discover a new device, an appropriate device PEIM is loaded for it. Device PEIMs should also have EDKII_PCI_DEVICE_PPI in their depex section.

For example, let's assume a system which boots from UFS and has the EFI variable store located on it; the system also has a SATA drive which is locked with Opal and needs to be unlocked on S3 resume. Platform BIOS writers are aware of this configuration and have put the UFS block IO device PEIM and the PCI enumerator PEIM into pre-mem firmware volumes, while the AHCI device PEIM is put into a post-mem firmware volume.

The enumerator PEIM and the UFS device PEIM are dispatched in pre-memory:
-> assuming the UFS PEIM runs first, it checks in its entry point whether EDKII_PCI_DEVICE_PPI is installed and, if not, simply installs a notify and exits
-> the enumerator PEIM scans the PCI tree and assigns resources to the UFS controller; EDKII_PCI_DEVICE_PPI is installed
-> the UFS PEIM notify is invoked; it checks whether the PPI matches its device class and proceeds with UFS PPI installation, which makes the EFI varstore available
-> the platform proceeds with boot and memory training
-> post-mem modules are dispatched
-> the AHCI device PEIM executes and installs its notify
-> the enumerator PEIM is invoked again through a platform-specific mechanism
-> on this second pass the enumerator PEIM assigns resources to AHCI devices and installs the corresponding EDKII_PCI_DEVICE_PPI
-> both the UFS and AHCI device PEIM notifies are invoked
-> the AHCI device PEIM finds its device and proceeds with Opal unlock
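The notify-based sequencing above can be sketched with a toy, self-contained C model. This is not real EDK2 code: install_ppi, register_notify, and the GUID strings are invented stand-ins for the EFI_PEI_SERVICES InstallPpi/NotifyPpi services, just to show how late-registered notifies still see already-installed PPIs.

```c
#include <assert.h>
#include <string.h>

/* Toy model of PEI PPI installation with notifies. Real EDK2 code would
   use EFI_PEI_SERVICES->InstallPpi/NotifyPpi; everything here is an
   invented illustration of the dispatch flow only. */

#define MAX_NOTIFIES 8
#define MAX_PPIS     8

typedef void (*NOTIFY_FN)(const char *guid, void *ppi);

static struct { const char *guid; NOTIFY_FN fn; } notifies[MAX_NOTIFIES];
static int notify_count;

static struct { const char *guid; void *ppi; } ppis[MAX_PPIS];
static int ppi_count;

/* A device PEIM calls this in its entry point: the callback fires for
   every already-installed and every future matching PPI. */
static void register_notify(const char *guid, NOTIFY_FN fn) {
  notifies[notify_count].guid = guid;
  notifies[notify_count].fn   = fn;
  notify_count++;
  for (int i = 0; i < ppi_count; i++)         /* replay existing PPIs */
    if (strcmp(ppis[i].guid, guid) == 0) fn(guid, ppis[i].ppi);
}

/* The enumerator PEIM calls this once per enumerated PCI device. */
static void install_ppi(const char *guid, void *ppi) {
  ppis[ppi_count].guid = guid;
  ppis[ppi_count].ppi  = ppi;
  ppi_count++;
  for (int i = 0; i < notify_count; i++)      /* fire matching notifies */
    if (strcmp(notifies[i].guid, guid) == 0) notifies[i].fn(guid, ppi);
}

static int ufs_seen, ahci_seen;
static void ufs_notify(const char *guid, void *ppi)  { (void)guid; (void)ppi; ufs_seen++;  }
static void ahci_notify(const char *guid, void *ppi) { (void)guid; (void)ppi; ahci_seen++; }
```

In the real design both device PEIMs would register for the single EDKII_PCI_DEVICE_PPI GUID and check the class code inside the callback; the replay step is what lets the post-mem AHCI PEIM catch PPIs installed during pre-mem.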

If we can agree that such a PPI is a good addition, I will proceed with submitting this PPI to EDK2, followed by patches to add support for this PPI in device PEIMs in the near future.

Patch:

From c573594b4d3adf3c750f63f5f7ac333549dbc1de Mon Sep 17 00:00:00 2001
From: Mateusz Albecki <mateusz.albecki@intel.com>
Date: Thu, 28 Oct 2021 14:31:10 +0200
Subject: [PATCH 1/1] MdeModulePkg: Add EDKII_PCI_DEVICE_PPI definition

New PPI will allow developers to easily add new device PEIMs.
---
MdeModulePkg/Include/Ppi/PciDevicePpi.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
create mode 100644 MdeModulePkg/Include/Ppi/PciDevicePpi.h

diff --git a/MdeModulePkg/Include/Ppi/PciDevicePpi.h b/MdeModulePkg/Include/Ppi/PciDevicePpi.h
new file mode 100644
index 0000000000..ab2e5c6b07
--- /dev/null
+++ b/MdeModulePkg/Include/Ppi/PciDevicePpi.h
@@ -0,0 +1,12 @@
+#ifndef _EDKII_PCI_DEVICE_PPI_H_
+#define _EDKII_PCI_DEVICE_PPI_H_
+
+#include <Protocol/PciIo.h>
+#include <Protocol/DevicePath.h>
+
+typedef struct {
+ EFI_PCI_IO_PROTOCOL PciIo;
+ EFI_DEVICE_PATH_PROTOCOL *DevicePath;
+} EDKII_PCI_DEVICE_PPI;
+
+#endif



Thanks,
Mateusz


Ni, Ray
 

1. Background - why we need access to PCIe devices in PEI
I agree. It also matches today's SBL and coreboot behavior of enumerating PCI in an early stage.


-> enumerator PEIM scans the PCI tree and assigns resources to UFS controller, EDKII_PCI_DEVICE_PPI is installed
...
-> on another pass enumerator PEIM will assign resources to AHCI devices and install corresponding EDKII_PCI_DEVICE_PPI
Can you please explain how the enumerator PEIM first assigns resources only to UFS, and then in another pass assigns resources to AHCI?
Usually, one pass of enumeration assigns resources to all PCI devices.

Thanks,
Ray


Albecki, Mateusz
 

That's my bad on the extra resource assignment. While I believe it is possible to use a less memory-efficient algorithm to assign resources in a second pass, there should be no need for such behavior on the majority of systems. What the flow should look like is:

-> the enumerator PEIM scans the PCI tree, assigns resources to all controllers, and installs EDKII_PCI_DEVICE_PPI for the UFS controller
-> on another pass the enumerator PEIM installs the corresponding EDKII_PCI_DEVICE_PPI for the AHCI controller

The reason why a platform might want to do multiple passes is to save CAR space and only install EDKII_PCI_DEVICE_PPI for required devices in pre-mem, and that is what I wanted to emphasize with this flow. Of course, for small configurations with few controllers to enumerate, it should be possible to do one pass and install the PPI for every device.


Abner Chang
 

Hi Mateusz,
Is the PEI PCI enumeration on demand according to the platform configuration, or are you proposing that PCI enumeration is mandatory for each boot?
Either way, it would be good if the PCI device information enumerated in the PEI phase could be passed to the DXE phase; then we wouldn't need a second PCI enumeration in DXE if the device was already enumerated in PEI.

Another benefit of PCI device information:
The PCI device information (e.g. a PCI Device Table) can provide information such as PCI BDF, device class, device type, embedded or on slot, IRQ routing, etc. (maybe incorporating platform libraries). This table can be leveraged by other upper-layer EFI drivers (DXE/PEI/SMM) for other purposes without accessing PCI registers through the PCI I/O protocol (for example, to create the SMBIOS table, ACPI tables, ACPI methods, etc.).

Thanks
Abner





Albecki, Mateusz
 

Hi Abner,

The way I see it, the platform should be in control of starting the enumeration and of which devices it wants enumerated (by enumerated I mean resources assigned and PPI installed). Right now our idea is to simply have our own PEIM implementation that will hold Intel-specific logic, and in the next step propose a common enumeration PEIM to EDK2 with several hooks that will allow platform configurability.

As for the second part of your mail, I see two points to unpack. First is the optimization that would eliminate the need for two resource assignment passes during boot (PEI + DXE). It should be possible for the enumeration PEIM to install a HOB that would later be consumed by the DXE PCI bus driver, informing it that resources are already assigned and that it only needs to scan the hierarchy and install the required protocols, which would save some boot time. The problem with this solution is that while it could probably work quite well on a system where PEI and DXE are both 32-bit or both 64-bit, it is hard to achieve in what I think is currently the most popular configuration: 32-bit PEI with 64-bit DXE. Due to the limited address space in 32-bit PEI, the enumerator PEIM might be forced to make suboptimal resource assignment decisions, or might even decide to skip some devices altogether if it starts running out of memory.
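To make the idea concrete, a hypothetical shape for such a HOB payload could look like the struct below. Every name and field here is invented for illustration; this is not an existing EDK2 definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical HOB payload recording that PEI already assigned resources
   to one PCI device. All names are invented for illustration only. */
typedef struct {
  uint8_t  Segment;
  uint8_t  Bus;
  uint8_t  Device;
  uint8_t  Function;
  uint64_t BarBase[6];      /* assigned BAR addresses, 0 if unused */
  uint64_t BarSize[6];
  uint8_t  ResourcesValid;  /* 0 tells DXE to re-enumerate, e.g. when a
                               32-bit PEI had to skip or shrink a BAR */
} PEI_PCI_DEVICE_INFO_HOB;
```

The ResourcesValid flag is one possible way to encode the 32-bit-PEI limitation discussed above: DXE could trust the PEI assignment where the flag is set and fall back to full enumeration elsewhere.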

The second point is about PCI device information that, if I understood you correctly, would be gathered from the PCI config space of each device, organized into a table, and probably installed into a HOB. I can see some value in such a table, as it could be used by DXE code that runs before DXE enumeration; however, I don't really see why DXE code would prefer it over locating protocols. Note that protocols are required to be installed anyway to make DXE device drivers work, so I don't really see how we could skip them.


Thanks,
Mateusz


Brian J. Johnson
 

Mateusz,

Unfortunately, the continued migration of DXE code into PEI (specifically, into PEI code running before RAM is available) is putting a huge crunch on BIOS size. Server processors have a limited amount of cache-as-RAM (CAR) space available, and extremely complex code for initializing memory and the system fabric. Space is at a premium today, and adding PCIe initialization to CAR will make the problem worse.

Servers also tend to have very large, complex PCIe hierarchies, most of which aren't needed for S3 or the other flows you mentioned. (Not that servers do much with S3 anyway....) Spending CAR space on them doesn't seem like a good tradeoff.

Can PCIe enumeration and the things that depend on it (S3, capsules, recovery, etc.) isolate their code to a compressed FD placed outside of the CAR region, and not run until RAM is available?

Or can you think of another way to reduce the amount of code and data in CAR?

Thanks,

Brian J. Johnson
HPE


--

Brian

--------------------------------------------------------------------

"As an adolescent I aspired to lasting fame, I craved factual
certainty, and I thirsted for a meaningful vision of human life - so
I became a scientist. This is like becoming an archbishop so you
can meet girls."
-- M. Cartmill, sociologist


Ni, Ray
 

The CAR size concern creates two additional design requirements:
1. a separate FV that is loaded after memory is ready
2. a pre-scan in pre-mem and a full scan in post-mem

So I am wondering whether the pre-mem scan is really needed.
If not, let's just say: one full scan in post-mem.






Abner Chang
 


-----Original Message-----
From: rfc@edk2.groups.io <rfc@edk2.groups.io> On Behalf Of Brian J. Johnson
Sent: Thursday, November 11, 2021 7:22 AM
To: rfc@edk2.groups.io; Albecki, Mateusz <mateusz.albecki@intel.com>; Chang, Abner <abner.chang@hpe.com>
Subject: Re: [edk2-rfc] Proposal to add support for PCIe enumeration protocols in PEI

Mateusz,

Unfortunately, the continued migration of DXE code into PEI
(specifically, into PEI code running before RAM is available) is putting
a huge crunch on BIOS size. Server processors have a limited amount of
cache-as-RAM (CAR) space available, and extremely complex code for
initializing memory and the system fabric. Space is at a premium today,
and adding PCIe initialization to CAR will make the problem worse.

Servers also tend to have very large, complex PCIe hierarchies, most of
which isn't needed for S3 or the other flows you mentioned. (Not that
servers do much with S3 anyway....) Spending CAR space on it doesn't
seem like a good tradeoff.
Agreed, that would be tough for server platforms that have PCI enum in the CAR phase.



-------- Original Message --------
From: Albecki, Mateusz [mailto:mateusz.albecki@intel.com]
Sent: Wednesday, November 10, 2021, 6:54 AM
To: Abner Chang <abner.chang@hpe.com>, rfc@edk2.groups.io
Subject: [edk2-rfc] Proposal to add support for PCIe enumeration
protocols in PEI

Hi Abner,

The way I see it is that platform should be in control of starting the
enumeration and which devices it wants enumerated(by enumerated I
mean
resources assigned and PPI installed). Right now our idea is to simply
have our own PEIM implementation that will hold Intel specific logic
inside of it and in the next step propose a common enumeration PEIM to
EDK2 with several hooks that will allow platform configurability.

As for second part of your mail I see 2 points to unpack. First is the
optimization that would eliminate the need for 2 resource assignment
passes during boot(PEI + DXE). It should be possible to install HOB in
enumeration PEIM that would later on be consumed by DXE PCI bus driver
which would inform it that resources are assigned and it should only
scan the hierarchy and install required protocols which would save us
some boot time. The problem with this solution is that while it probably
could work quite well on system on which both PEI and DXE are either
32-bit or 64-bit it is hard to achieve on what I think is right now the
most popular configuration which is 32-bit PEI and 64-bit DXE. Due to
limited system addresses in 32-bit PEI enumerator PEIM might be forced
to make suboptimal decisions in resource assignment or might even decide
to skip some devices altogether if it starts running out of memory.
I think this could be fixed if we can have the record in a HOB for the 32-bit/64-bit mixed boot phases.
DXE PCI enumeration can reassign the resources for the 64-bit boot phase if necessary.


Second point is about PCI device information that, if I understood you
correctly, would contain PCI device information that we can get from PCI
config of a device organized in a table and probably installed into a
HOB. I can see some value in such table as it could be used by DXE code
that runs before DXE enumeration however I don't really see why DXE
code
would prefer to use it over locating protocols. Note that protocols are
required to be installed anyway to make DXE device drivers work so I
don't really see how we could skip them.
I know this subject could be apart from your proposal. I added this suggestion because I noticed that we may need to carry some PCI information discovered during the PEI PCI enum to the DXE phase, in order to reduce the duplicated PCI enum effort in DXE. We could still have that PCI device information even when the PEI PCI enum is not enabled on a platform. Some information is not retrievable from the existing EFI protocols defined in the UEFI spec, such as the PCI port number, slot number, embedded or slot, bifurcation, topology, device type, etc. We could either enhance the EFI device path to accommodate this information, or install an additional protocol that points to the information on the PCI EFI handle. The use case could be in either the DXE phase (with PEI PCI enum enabled) or in the BDS phase.

Thanks
Abner











Albecki, Mateusz
 

Brian & Ray,

In general I think we can categorize platforms into 4 types for the purpose of this discussion:

1. Platforms that don't care about PEI enumeration due to a lack of use cases
2. Platforms that need devices enumerated in post-mem (for instance desktops that want to support S3 Opal unlock + capsule update)
3. Platforms that need devices enumerated in pre-mem and post-mem and can accommodate the enumerator PEIM in a pre-mem FV (for instance tablets or laptops that boot FW from block storage and need to support S3 Opal unlock)
4. Platforms that need devices enumerated in pre-mem and post-mem and can't accommodate the enumerator PEIM in a pre-mem FV (same as 3 but with a smaller CAR)

For servers I think the majority of platforms fall into 1 and 2. For 1, the platform just doesn't include the enumerator and device driver PEIMs. For 2, the platform includes those PEIMs in post-mem FVs. My initial description focused on 3, since that is the most complicated one from the enumerator PEIM's perspective. 4 is something each platform will have to decide on its own how to support. One possible solution for such platforms is to prepare a very simple PEIM that would only assign some hardcoded resources from a special address pool to the block IO device and skip over everything else. Later on, the platform would tell the enumerator PEIM to skip over such a controller during full enumeration (this can be accomplished via platform hooks).
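As a toy illustration of the hardcoded-resource idea for type 4 platforms: the config-space array, offsets below 0x40, and the address value are stand-ins (a real PEIM would use CPU I/O or ECAM access and a platform-chosen pool), but the shape of "write a fixed BAR, enable memory decode, skip enumeration" is what's meant.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: a 256-byte config space of one device. A minimal pre-mem
   PEIM just writes a fixed MMIO address into BAR0 and sets the Memory
   Space Enable bit, skipping real enumeration entirely. */

#define PCI_COMMAND 0x04
#define PCI_BAR0    0x10

static void write32(uint8_t *cfg, unsigned off, uint32_t val) {
  cfg[off]     = (uint8_t)val;
  cfg[off + 1] = (uint8_t)(val >> 8);
  cfg[off + 2] = (uint8_t)(val >> 16);
  cfg[off + 3] = (uint8_t)(val >> 24);
}

static uint32_t read32(const uint8_t *cfg, unsigned off) {
  return (uint32_t)cfg[off]
       | (uint32_t)cfg[off + 1] << 8
       | (uint32_t)cfg[off + 2] << 16
       | (uint32_t)cfg[off + 3] << 24;
}

/* Assign a hardcoded BAR from a platform-reserved pool and enable MMIO
   decode (Memory Space Enable is bit 1 of the Command register). */
static void assign_fixed_bar(uint8_t *cfg, uint32_t fixed_base) {
  write32(cfg, PCI_BAR0, fixed_base);
  write32(cfg, PCI_COMMAND, read32(cfg, PCI_COMMAND) | 0x2);
}
```

The full enumerator would then be told (via a platform hook) to leave this device's resources alone on its post-mem pass.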

So to answer the question - "Can PCIe enumeration and the things that depend on it (S3, capsules, recovery, etc.) isolate their code to a compressed FD placed outside of the CAR region, and not run until RAM is available?"

Yes, on systems which do not require pre-mem enumeration this should be possible.


Abner,

Yes, it should be possible to record in a HOB that enumeration was done in a 32-bit environment and should be redone in 64-bit DXE. I was simply raising this point to note that, right now at least, the majority of systems unfortunately won't be improved by it.

As for the additional info from PEI to DXE - I think we should have a separate discussion to see what would be needed, why, and how PEI would produce it. At its base, the enumerator PEIM won't have access to any special information about PCIe devices, so I am not sure we can demand that it produce it. Going over the list you prepared, I see that:

* PCI port number, slot number - I think those are the same and can be retrieved from the Physical Slot Number field of the PCI Express capability structure
* Embedded or slot - can be retrieved from the Slot Implemented field in the PCI Express Capabilities register
* Bifurcation - this should correspond to the Maximum Link Width field in Link Capabilities
* Topology - I am not sure what you are referring to here. If this is just about the PCIe topology (i.e. how many switches and other devices are between the root bus and the endpoint), I think it is at least partially covered by the device path
* Device type - can be retrieved from the Device/Port Type field

So far it seems like all of this information can be retrieved from the HW itself and the device path protocol. Are you proposing to add an abstraction that would shield the user from the need to access HW to get this information?
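For reference, the bit positions of the fields named in the list above can be extracted as below (positions per the PCI Express Base Specification; offsets are within the PCI Express capability structure). This is a self-contained sketch, not EDK2 code; in practice the values would be read through EFI_PCI_IO_PROTOCOL config accesses.

```c
#include <assert.h>
#include <stdint.h>

/* PCI Express Capabilities Register (offset 02h of the capability):
   Device/Port Type is bits [7:4], Slot Implemented is bit [8]. */
static unsigned device_port_type(uint16_t pcie_cap)  { return (pcie_cap >> 4) & 0xF; }
static unsigned slot_implemented(uint16_t pcie_cap)  { return (pcie_cap >> 8) & 0x1; }

/* Link Capabilities Register (offset 0Ch):
   Maximum Link Width is bits [9:4]. */
static unsigned max_link_width(uint32_t link_cap)    { return (link_cap >> 4) & 0x3F; }

/* Slot Capabilities Register (offset 14h):
   Physical Slot Number is bits [31:19]. */
static unsigned physical_slot_number(uint32_t slot_cap) { return (slot_cap >> 19) & 0x1FFF; }
```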

Thanks,
Mateusz


Brian J. Johnson
 

Mateusz,

Thank you for your enumeration of platform types, that helps clarify the discussion.

Most servers I'm familiar with would fall in category 1. Large servers can't fit the entire PCIe config space below 4GB; there's simply not enough room. In some cases only the devices connected to the BSP socket are accessible from 32-bit code. That's about 1/32 of the system on our large machines.... So it's impossible to enumerate the entire system in 32-bit PEI, even if we wanted to.

EDK2 is used by a huge variety of platforms, as you pointed out. OEMs who develop BIOSes like to share as much code as possible across platforms, to reduce development and test effort. So large portions of EDK2 are often shared across multiple types of platforms, for instance your type 1 and type 3.

When an architectural feature is present for one type of platform, it can become difficult to disable it for other types of platforms. On a type 1 platform, disabling unneeded PEI device drivers and the infrastructure they need (such as the proposed enumeration PPI) isn't difficult, as long as they are self-contained. But if they generate HOBs, PPIs, PCDs, etc. which are required by other code, it may become impossible. That will break our platforms.

I'd prefer that PCIe enumeration not be done in PEI at all. If the code is there, someone working on a type 3 or 4 platform may add a dependency on it to common code, without realizing that they are breaking type 1 platforms. It may take months or even years before the change reaches us, at which point it could be very difficult to undo.

If it really has to be there for type 3 and 4 platforms, please make it clear that it's an optional feature, and shouldn't be assumed to be present by code which doesn't use it directly. I know that's not a very well-defined request, and I don't know how best to implement it.... Build options? Feature PCDs? Keep it in Edk2Platforms?

At least, please think of PEI PCIe enumeration as a self-contained thing, used only by PEI drivers. I know the natural next step is to remove it from DXE since it's being done in PEI. But DXE needs full, standalone PCIe enumeration for large servers. We can't do it in 32-bit code, let alone in CAR. Do we need different PCI host bridge implementations for PEI vs. DXE enumeration?

Thanks,
Brian J. Johnson
HPE

-------- Original Message --------
From: Albecki, Mateusz [mailto:mateusz.albecki@intel.com]
Sent: Friday, November 12, 2021, 10:03 AM
To: Abner Chang <abner.chang@hpe.com>, rfc@edk2.groups.io
Subject: [edk2-rfc] Proposal to add support for PCIe enumeration protocols in PEI

Brian & Ray,

In general I think we can categorize platforms into 4 types for purpose of this discussion

1. Platform that doesn't care about PEI enumeration due to lack of use cases
2. Platform that needs devices enumerated in post mem(for instance desktops that want to support S3 opal unlock + capsule update)
3. Platform that needs devices enumerated in pre-mem and post-mem and can accommodate enumerator PEIM in pre-mem FV(for instance tablets or laptops that boot fw from block storage and need to support S3 opal unlock)
4. Platform that needs devices enumerated in pre-mem and post-mem and can't accommodate enumerator PEIM in pre-mem FV(same as 3 but smaller CAR)

For servers I think the majority of those platforms fall into types 1 and 2. For type 1, the platform just doesn't include the enumerator and device driver PEIMs. For type 2, the platform includes those PEIMs in post-mem FVs. My initial description focused on type 3, since that is the most complicated one from the enumerator PEIM perspective. Type 4 is something that each platform will have to decide on its own how to support. One possible solution for such platforms is to prepare a very simple PEIM that would only assign some hardcoded resources from a special address pool to the block IO device and skip over everything else. Later on the platform would tell the enumerator PEIM to skip over such a controller during full enumeration (this can be accomplished via platform hooks).

So to answer the question - "Can PCIe enumeration and the things that depend on it (S3, capsules, recovery, etc.) isolate their code to a compressed FD placed outside of the CAR region, and not run until RAM is available?"

Yes, on systems which do not require pre-mem enumeration this should be possible.


Abner,

Yes, it should be possible to record in a HOB that enumeration was done in a 32-bit environment and should be redone in 64-bit DXE. I was simply raising this point to note that, right now at least, the majority of systems unfortunately won't be improved by it.

As for the additional info from PEI to DXE - I think we should have a separate discussion to establish what would be needed, why, and how PEI would produce it. At its base the enumerator PEIM won't have access to any special information about PCIe devices, so I am not sure we can demand that it produce it. Going over the list you have prepared I see that:

* PCI port number, slot number - I think those are the same and can be retrieved from the Physical Slot Number field in the PCI Express Capability structure
* Embedded or Slot - can be retrieved from the Slot Implemented field in the PCI Express Capabilities register
* Bifurcation - this should correspond to the Maximum Link Width field in Link Capabilities
* Topology - I am not sure what you are referring to here. If this is just about PCIe topology (i.e. how many switches and other devices are there between the root bus and the endpoint) I think it is at least partially covered by the device path
* Device type - can be retrieved from the Device/Port Type field

So far it seems like all of the information can be retrieved from HW itself and device path protocol. Are you proposing to add an abstraction that would shield the user from the necessity to access HW to get this information?
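For reference, the bit positions of those fields in the PCI Express Capability structure are fixed by the PCIe Base Specification, so the extraction is simple shifting and masking. A minimal sketch (function names are illustrative, not an EDK2 API; register values are assumed to have been read from config space already):

```c
#include <assert.h>
#include <stdint.h>

/* PCI Express Capabilities register (offset 0x02 in the capability):
   Device/Port Type is bits 7:4, Slot Implemented is bit 8. */
static uint8_t PcieDevicePortType(uint16_t PcieCaps)  { return (PcieCaps >> 4) & 0xF; }
static int     PcieSlotImplemented(uint16_t PcieCaps) { return (PcieCaps >> 8) & 0x1; }

/* Link Capabilities register (offset 0x0C): Maximum Link Width, bits 9:4. */
static uint8_t PcieMaxLinkWidth(uint32_t LinkCaps)    { return (LinkCaps >> 4) & 0x3F; }

/* Slot Capabilities register (offset 0x14): Physical Slot Number, bits 31:19. */
static uint16_t PciePhysicalSlotNumber(uint32_t SlotCaps) { return (SlotCaps >> 19) & 0x1FFF; }
```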

Thanks,
Mateusz







--
Brian J. Johnson
Enterprise X86 Lab

Hewlett Packard Enterprise

brian.johnson@hpe.com
hpe.com


Albecki, Mateusz
 

Brian,

I fully agree that the PEI enumeration should be a self-contained (as in there shouldn't be a hard dependency on anything produced by this stack; a soft dependency should be allowed, I think), optional feature. I am also not sure what the best way to achieve it is, but I am leaning towards feature PCDs. Given that the device driver PEIMs are available in the EDK2 repo, I think both the new PCI_DEVICE_PPI and the implementation of the enumerator PEIM should be in the EDK2 repo as well. BTW, did I understand correctly that you wouldn't like to see the optimization for platform types 2, 3 and 4 which would skip resource assignment in DXE when possible? While I also wouldn't like to introduce such a dependency, I am not sure I can guarantee that either I or somebody else won't eventually propose such an optimization, as the performance gain might be substantial on desktop systems (especially when running 64-bit PEI). At the very least, I understand that such a dependency should be handled in a way that wouldn't break the DXE driver if the enumerator PEIM is not present.

BTW, I agree that we shouldn't be doing enumeration in PEI. This proposition is just a band-aid that needs to be applied to fix several architectural mistakes that we have made in the past (the S3 resume definition, mostly...). From my point of view, the ideal solution for a large chunk of our pain points is to simply remove PEI and re-architect DXE. Having a uniform execution environment across all Sx flows would help a lot, in my opinion. Sadly, such a change is probably not possible in the short term, and even in the long term I wouldn't know how to go about it.

Thanks,
Mateusz


Brian J. Johnson
 

Mateusz,

Your changes provide a nice cleanup to the driver mess in PEI. My main concern is that someone will later add a dependency which *requires* PCIe enumeration in PEI, without realizing that this is impossible on some platforms. The only ways to prevent that which I can think of are to document it clearly, in the code, and to choose not to introduce such dependencies now.

That's why I was suggesting not trying to optimize DXE resource assignment at this time. That may not be a reasonable request... code size and execution speed tend to trump other concerns in firmware. I'd rather see us put our effort into removing things from PEI than adding things to it. But I'm not the only consumer of TianoCore....

I'm relieved you think that there have been some architectural mistakes made, particularly around S3 resume. I wanted to say so, but I'm glad you did so first. :) Putting device drivers in PEI at all seems to stray from the original design. But I don't know how to fix it either. And I don't even work on platforms which use S3, so I know nothing about the requirements which led to the current design.

Maybe the stewards will have a suggestion. But as you said, this seems like it could easily become a very, very large effort.

Thanks,
Brian J. Johnson
HPE

-------- Original Message --------
From: Albecki, Mateusz [mailto:mateusz.albecki@intel.com]
Sent: Tuesday, November 16, 2021, 11:35 AM
To: Brian J. Johnson <brian.johnson@hpe.com>, rfc@edk2.groups.io
Subject: [edk2-rfc] Proposal to add support for PCIe enumeration protocols in PEI



Nate DeSimone
 

Hi All,

I would like to add a few points here.

First of all, allocating fixed BAR resources from a special address pool is not really scalable for any devices outside of Bus 0. Any device that is behind a PCI/PCI bridge needs to have its MMIO BARs fall within a single contiguous region of physical address space that contains all MMIO BARs for all devices attached downstream from that bridge. If there is a chain of PCI/PCI bridges between the device and the root complex, then any PCI/PCI bridges that are not at the root need to have their MMIO base/limits programmed to a subset of the parent bridge's MMIO base/limit. Because of that, it may not be possible to set the MMIO base/limit for the PCI/PCI bridge to cover the MMIO needed by both the device with the special fixed BAR address and all of the other regular devices which do not have restrictions on where their BARs end up. For that reason, assigning a pre-determined fixed BAR to a PCI device really only makes sense for devices that happen to be on Bus 0. In a modern PCIe system the only devices that could ever be on Bus 0 are devices that are embedded into the chipset.
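To illustrate why the requirement propagates through every level of bridges, here is a bottom-up computation of the MMIO window each bridge must decode. The Node type is hypothetical, and the model is simplified (it ignores I/O windows and the prefetchable/non-prefetchable split, and assumes power-of-two BAR sizes):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical topology node: either an endpoint BAR requirement or a
   PCI/PCI bridge with children. Illustrative only, not EDK2 code. */
typedef struct Node {
  int          IsBridge;
  uint64_t     BarSize;      /* endpoint: MMIO needed (power of two)  */
  struct Node *Children;     /* bridge: downstream devices/bridges    */
  size_t       ChildCount;
} Node;

static uint64_t AlignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

/* A bridge's memory window must cover everything below it, and bridge
   memory windows are 1 MB aligned per the PCI bridge spec, so the
   requirement accumulates bottom-up through every nested bridge. */
static uint64_t RequiredMmio(const Node *N) {
  if (!N->IsBridge) {
    return N->BarSize;
  }
  uint64_t Total = 0;
  for (size_t i = 0; i < N->ChildCount; i++) {
    uint64_t Need = RequiredMmio(&N->Children[i]);
    /* BARs are naturally aligned to their size; bridge windows to 1 MB. */
    uint64_t Align = N->Children[i].IsBridge ? 0x100000 : Need;
    Total = AlignUp(Total, Align) + Need;
  }
  return AlignUp(Total, 0x100000);  /* round the window up to 1 MB */
}
```

Note how even a single 16 KB NVMe BAR behind a bridge consumes a full 1 MB window, which is part of why partial assignments fragment the address space.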

For this reason, any partial enumeration like what Mateusz is suggesting for special cases like platforms that store their UEFI firmware on block I/O devices would end up allocating BAR assignments that would need to be considered temporary. Certain block I/O devices like NVMe will always be devices behind a PCI/PCI bridge, so this issue is highly relevant to pre-memory PCI enumeration. Only after a full enumeration of the entire PCI topology is done would you be able to assume BAR assignments won't need to be reallocated. In general, a partial PCI enumeration has the potential to become very wasteful of MMIO address space if that address space is not reclaimed and reused during the full PCI enumeration. Additionally, we should aim to only do a single partial enumeration per boot. If more than one partial enumeration is performed (each time for separate devices), then it is possible for them to come up with conflicting BAR assignments.

Ray and I had a discussion on supporting BARs allocated from special MMIO ranges reserved by platform code on Bugzilla last year (https://bugzilla.tianocore.org/show_bug.cgi?id=2958) and we collectively came to the conclusion that a feature like this would be more trouble than it is worth. When you consider the continuing increase in SoC complexity, it is unlikely that we will be able to fit 100% of embedded devices in Bus 0 for very much longer. On server systems, we crossed that point back in Sandy Bridge 10 years ago. In the end, a feature like that will continue to become less useful over time and it is already close to the point of irrelevancy. Besides, a lot of the use cases for hardcoded BAR values would go away if we did a full PCI enumeration in PEI anyway. For example, SMM using a different BAR value for the SPI controller or the HECI controller than what was assigned by PCI enumeration would no longer be a problem if enumeration is done early.

Second, I would like to point out that the situation on server systems is a little more complicated than "the platform doesn't care about PCI enumeration during PEI". On a multi-socket server system, the UPI controllers need to know which PCI segment+bus numbers to route to which socket. Moreover, the MMIO ranges containing any BARs for PCIe devices that are connected to PCIe root ports that are off-socket need to also be routed through the UPI controller. So while you are right that it is unlikely that we will need to access PCIe devices outside the SoC during PEI on server systems... being able to count up those devices and enumerate the system topology is very useful as we would then be able to efficiently program which MMIO and PCI bus ranges to route to each UPI controller. Therefore I would like to propose that this PEI PCI enumeration feature have THREE modes:

MODE 1: Find all PCI devices, count the number of PCI bus numbers that would be needed to make all PCI devices visible, but do not assign PCI bus numbers. Count the amount of MMIO space each device needs but DO NOT assign actual BAR values. Detect and report any VGA devices found in the PCI topology.
MODE 2: Find all PCI devices, assign all PCI bus numbers (if > 256 PCI bus numbers are needed to expose the full system topology, assign PCI segment numbers as well), count the amount of MMIO space each device needs but only assign actual BAR values to a SUBSET of PCI devices. Detect and report any VGA devices found in the PCI topology.
MODE 3: Perform a full PCI enumeration, assign all PCI bus numbers (if > 256 PCI bus numbers are needed to expose the full system topology, assign PCI segment numbers as well), assign all BAR values. Detect and report any VGA devices found in the PCI topology.

MODE 1 is somewhat useful on client systems as well. MODE 1 could find all the VGA devices in the system, which helps us decide which device to route the legacy VGA framebuffer BAR to (hardcoded to address 0xA0000 for legacy reasons). In addition, if the user wishes to boot a 32-bit OS, knowing the full amount of MMIO space required for PCI devices enables us to configure the system memory map such that the MMIO region below 4GB is large enough to contain all BARs, while still providing as much address space for RAM as possible. A feature like this is not as useful as it was in the past, since very few systems support 32-bit OSes anymore, and having <=3GB of MMIO BARs is less common now. But even on 64-bit systems there is a niche use for this feature, as some older PCI devices have BARs that only support 32-bit addressing.

MODE 2 is useful for the block I/O use cases Mateusz mentions. MODE 3 makes a lot of sense for any systems that use 64-bit PEI, which I suspect we will start to see more of in the future. I think we should define a new PPI that allows platform code to request PCI enumeration. Whether to use MODE1/2/3 should be controllable via this PPI. In addition, for MODE 2 & 3 we would need the equivalent of the PCI host bridge driver to allow platform code to have a say in how PCI bus numbers and BARs are allocated.
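A rough sketch of what such a PPI could look like follows. The names, the plain int return type, and the layout are purely illustrative assumptions, not a proposed EDK2 interface:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the proposed enumeration-request PPI. */
typedef enum {
  PciEnumModeResourceDiscovery = 1,  /* MODE 1: count buses/MMIO, assign nothing   */
  PciEnumModePartialAssignment = 2,  /* MODE 2: assign buses, BARs for a subset    */
  PciEnumModeFull              = 3   /* MODE 3: full enumeration, all BARs assigned */
} PCI_ENUMERATION_MODE;

typedef struct _EDKII_PEI_PCI_ENUMERATION_PPI EDKII_PEI_PCI_ENUMERATION_PPI;

/* Platform code calls this to request enumeration at the desired depth;
   a real definition would use EFI_STATUS and EFI_PEI_SERVICES. */
typedef int (*PCI_ENUMERATE)(
  EDKII_PEI_PCI_ENUMERATION_PPI *This,
  PCI_ENUMERATION_MODE           Mode
  );

struct _EDKII_PEI_PCI_ENUMERATION_PPI {
  PCI_ENUMERATE  Enumerate;
};
```

For MODE 2, the PPI would presumably also need a way for the platform to name the subset of devices that should receive BARs, analogous to how the DXE host bridge driver lets the platform steer allocation.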

Sorry for the long email but hopefully it helps!

Thanks,
Nate

-----Original Message-----
From: rfc@edk2.groups.io <rfc@edk2.groups.io> On Behalf Of Brian J. Johnson
Sent: Tuesday, November 16, 2021 1:48 PM
To: rfc@edk2.groups.io; Albecki, Mateusz <mateusz.albecki@intel.com>
Subject: Re: [edk2-rfc] Proposal to add support for PCIe enumeration protocols in PEI



Brian J. Johnson
 

Nate,

I agree that MODE 3 mainly makes sense for systems with 64-bit PEI, since only a limited number of PCI segments and MMIO space can fit below the 4GB boundary. On most of our large systems, only the legacy socket and its devices fit there.

Plus, not all PCI root ports are visible in PEI on our systems, especially before memory initialization. Some can only be seen after all of the system interconnect is initialized, which requires memory and 64-bit addressing.

So whatever is done, we should recognize that some systems will need to (re)do PCI enumeration and BAR assignment in DXE, and will treat all early PCIe init as temporary.

Thanks,
Brian J. Johnson
HPE

-------- Original Message --------
From: Desimone, Nathaniel L [mailto:nathaniel.l.desimone@intel.com]
Sent: Tuesday, November 30, 2021, 4:53 PM
To: rfc@edk2.groups.io <rfc@edk2.groups.io>, Johnson, Brian <brian.johnson@hpe.com>, Albecki, Mateusz <mateusz.albecki@intel.com>
Cc: Ni, Ray <ray.ni@intel.com>
Subject: [edk2-rfc] Proposal to add support for PCIe enumeration protocols in PEI











Albecki, Mateusz
 

Hi Nate,

I see in your mail that you are referring to segment assignment, and that is something that is firmly in the scope of the platform/silicon code. In general, my assumption was that by the time PEI enumeration runs, the root bridges are already configured, which should take care of segment, bus range, MMIO range and IO range assignment. The PEI enumerator would get the information about root bridges on the system via PciHostBridgeLib, the same way DXE does.

With that assumption in mind, MODE3 and MODE2 are fairly straightforward to support, since MODE3 is the same mode as the one we have in DXE, while MODE2 is a reduced MODE3 which we can support by giving the platform control over which devices are needed via a PPI. MODE1 is the tricky one, as root bridges need to be in a half-configured state in which segments and bus number ranges are configured while MMIO and IO are not (or at least are not in their final state). This weird root bridge state aside, the functionality itself would be useful even for client systems, as you have mentioned, so maybe we just need to accept that we need code which works under such conditions. I think the following algorithm for MODE1 would work:
1. Platform code assigns a wide range of bus resources to the root bridge to make sure the enumerator will be able to access all PCI devices under that bridge. No memory or IO resources are assigned.
2. The platform requests "enumeration" (or maybe resource discovery would be a better name?) from the enumerator PEIM via a PPI, or maybe via an event (which I guess is also a PPI).
3. The enumerator PEIM scans the root bridge and reports the actual number of buses and the amount of MMIO and IO needed to fully enumerate the bridge. We probably need separate reporting for video BARs, as you mentioned.
4. Repeat for each root bridge; once every root bridge is scanned, balance the resources in platform code.
5. Finally, start the enumeration at whatever time suits the platform.

In the larger picture, the MODE1 resource scan would need to be executed before RAM is initialized if we want to use it for optimizing MEM32 usage, as follows:
Client systems that only need opal unlock:
Resource discovery (MODE1) -> Resource balancing -> RAM configuration -> Enumeration (resource assignment and PPI installation)
Client systems with the BIOS image on block IO:
Resource discovery -> Resource balancing -> Resource assignment (full) -> Install PPI for bootable device -> RAM configuration -> Install PPIs for remaining required devices
Server systems:
Resource discovery -> Resource balancing -> RAM configuration -> Optional resource assignment and enumeration

Regarding the partial enumeration - partial resource assignment is probably not needed if, on systems which boot from block IO, we can do resource discovery and full assignment before we assign any resources to that device (which sounds strange, but you can use one of the other agents running on your platform as a proxy). If we can't, the only solution for such systems is special pools, I am afraid. I think we only need to support several passes of the enumerator to install new PPIs as platform use cases demand. For instance, for a client system with boot from block IO we need to have the PPI for the bootable block IO device in pre-mem, but we can wait with the installation of PPIs for the remaining storage devices until post-mem. In this model we do a single resource assignment and, during later passes, simply install the PPIs without modifying the resource map.

Thanks,
Mateusz


Nate DeSimone
 

Hi Brian,

For server systems MODE3 would be useful in Post-Memory PEI. Assuming MODE3 is executed in Post-Memory PEI, it would be possible to replace the DXE PCI enumeration with a simple driver that reads a HOB from PEI and creates the appropriate PCI_IO_PROTOCOL et al.

Thanks,
Nate

-----Original Message-----
From: Brian J. Johnson <brian.johnson@hpe.com>
Sent: Wednesday, December 1, 2021 4:07 PM
To: Desimone, Nathaniel L <nathaniel.l.desimone@intel.com>; rfc@edk2.groups.io; Albecki, Mateusz <mateusz.albecki@intel.com>
Cc: Ni, Ray <ray.ni@intel.com>
Subject: Re: [edk2-rfc] Proposal to add support for PCIe enumeration protocols in PEI

Nate,

I agree that MODE 3 mainly makes sense for systems with 64-bit PEI, since only a limited number of PCI segments and MMIO space can fit below the 4GB boundary. On most of our large systems, only the legacy socket and its devices fit there.

Plus, not all PCI root ports are visible in PEI on our systems, especially before memory initialization. Some can only be seen after all of the system interconnect is initialized, which requires memory and 64-bit addressing.

So whatever is done, we should recognize that some systems will need to (re)do PCI enumeration and BAR assignment in DXE, and will treat all early PCIe init as temporary.

Thanks,
Brian J. Johnson
HPE

-------- Original Message --------
From: Desimone, Nathaniel L [mailto:nathaniel.l.desimone@intel.com]
Sent: Tuesday, November 30, 2021, 4:53 PM
To: rfc@edk2.groups.io <rfc@edk2.groups.io>, Johnson, Brian <brian.johnson@hpe.com>, Albecki, Mateusz <mateusz.albecki@intel.com>
Cc: Ni, Ray <ray.ni@intel.com>
Subject: [edk2-rfc] Proposal to add support for PCIe enumeration protocols in PEI

Hi All,

I would like to add a few points here.

First of all, allocating fixed BAR resources from a special address pool is not really scalable for any devices outside of Bus 0. Any device that is behind a PCI/PCI bridge needs to have its MMIO BARs fall within a single contiguous region of physical address space that contains all MMIO BARs for all devices attached downstream from that bridge. If there is a chain of PCI/PCI bridges between the device and the root complex, then any PCI/PCI bridges that are not at the root need to have their MMIO base/limits programmed to a subset of the parent bridge's MMIO base/limit. Because of that, it may not be possible to set the MMIO base/limit for the PCI/PCI bridge to cover the MMIO needed by both the device with the special fixed BAR address and all of the other regular devices which do not have restrictions on where their BARs end up. For that reason, assigning a pre-determined fixed BAR to a PCI device really only makes sense for devices that happen to be on Bus 0. In a modern PCIe system the only devices that could ever be on Bus 0 are devices that are embedded into the chipset.
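The containment rule above can be sketched as a plain C check (types and names here are hypothetical, not EDK2 code): a fixed BAR behind a bridge chain is only usable if it falls inside the MMIO window of every bridge between it and the root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a PCI/PCI bridge MMIO window; base and limit
   are inclusive, mirroring the bridge Memory Base/Limit registers. */
typedef struct {
  uint64_t MmioBase;
  uint64_t MmioLimit;
} BRIDGE_WINDOW;

/* A fixed BAR is only legal if [BarBase, BarBase+BarSize-1] lies
   inside every ancestor bridge's window. Illustrative helper only. */
static bool
FixedBarFitsChain (uint64_t BarBase, uint64_t BarSize,
                   const BRIDGE_WINDOW *Chain, unsigned Depth)
{
  for (unsigned Index = 0; Index < Depth; Index++) {
    if (BarBase < Chain[Index].MmioBase ||
        BarBase + BarSize - 1 > Chain[Index].MmioLimit) {
      return false;
    }
  }
  return true;
}
```

Note that a BAR can fit the root bridge's window and still fail on an inner bridge, which is exactly the conflict described above.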

For this reason, any partial enumeration like what Mateusz is suggesting for special cases like platforms that store their UEFI firmware on block I/O devices would end up allocating BAR assignments that would need to be considered temporary. Certain block I/O devices like NVMe will always be devices behind a PCI/PCI bridge, so this issue is highly relevant to pre-memory PCI enumeration. Only after a full enumeration of the entire PCI topology is done would you be able to assume BAR assignments won't need to be reallocated. In general, a partial PCI enumeration has the potential to become very wasteful of MMIO address space if that address space is not reclaimed and reused during the full PCI enumeration.
Additionally, we should aim to only do a single partial enumeration per boot. If more than one partial enumeration is performed (each time for separate devices), then it is possible for them to come up with conflicting BAR assignments.

Ray and I had a discussion on supporting BARs allocated from special MMIO ranges reserved by platform code on Bugzilla last year (https://bugzilla.tianocore.org/show_bug.cgi?id=2958) and we collectively came to the conclusion that a feature like this would be more trouble than it is worth. When you consider the continuing increase in SoC complexity, it is unlikely that we will be able to fit 100% of embedded devices in Bus 0 for very much longer. On server systems, we crossed that point back in Sandy Bridge 10 years ago. In the end, a feature like that will continue to become less useful over time and it is already close to the point of irrelevancy. Besides, a lot of the use cases for hardcoded BAR values would go away if we did a full PCI enumeration in PEI anyway. For example, SMM using a different BAR value for the SPI controller or the HECI controller than what was assigned by PCI enumeration would no longer be a problem if enumeration is done early.

Second, I would like to point out that the situation on server systems is a little more complicated than "the platform doesn't care about PCI enumeration during PEI". On a multi-socket server system, the UPI controllers need to know which PCI segment+bus numbers to route to which socket. Moreover, the MMIO ranges containing any BARs for PCIe devices that are connected to PCIe root ports that are off-socket need to also be routed through the UPI controller. So while you are right that it is unlikely that we will need to access PCIe devices outside the SoC during PEI on server systems... being able to count up those devices and enumerate the system topology is very useful as we would then be able to efficiently program which MMIO and PCI bus ranges to route to each UPI controller. Therefore I would like to propose that this PEI PCI enumeration feature have THREE modes:

MODE 1: Find all PCI devices, count the number of PCI bus numbers that would be needed to make all PCI devices visible, but do not assign PCI bus numbers. Count the amount of MMIO space each device needs but DO NOT assign actual BAR values. Detect and report any VGA devices found in the PCI topology.
MODE 2: Find all PCI devices, assign all PCI bus numbers (if > 256 PCI bus numbers are needed to expose the full system topology, assign PCI segment numbers as well), count the amount of MMIO space each device needs but only assign actual BAR values to a SUBSET of PCI devices. Detect and report any VGA devices found in the PCI topology.
MODE 3: Perform a full PCI enumeration, assign all PCI bus numbers (if > 256 PCI bus numbers are needed to expose the full system topology, assign PCI segment numbers as well), assign all BAR values. Detect and report any VGA devices found in the PCI topology.

MODE 1 is somewhat useful on client systems as well. MODE 1 could find all the VGA devices in the system, which helps us decide which device to route the legacy VGA framebuffer BAR to (hardcoded to address 0xA0000 for legacy reasons). In addition, if the user wishes to boot a 32-bit OS, knowing the full amount of MMIO space required for PCI devices enables us to configure the system memory map such that the MMIO region below 4GB is large enough to contain all BARs, while still providing as much address space for RAM as possible. A feature like this is not as useful as it was in the past, since very few systems support 32-bit OSes anymore, and having <=3GB of MMIO BARs is less common now. But even on 64-bit systems there is a niche use for this feature, as some older PCI devices have BARs that only support 32-bit addressing.

MODE 2 is useful for the block I/O use cases Mateusz mentions. MODE 3 makes a lot of sense for any systems that use 64-bit PEI, which I suspect we will start to see more of in the future. I think we should define a new PPI that allows platform code to request PCI enumeration.
Whether to use MODE1/2/3 should be controllable via this PPI. In addition, for MODE 2 & 3 we would need the equivalent of the PCI host bridge driver to allow platform code to have a say in how PCI bus numbers and BARs are allocated.
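A rough sketch of what the mode selection in such a PPI could look like, in plain C. Every name below is illustrative rather than a ratified EDK2 interface, and plain `int` stands in for EFI_STATUS:

```c
#include <assert.h>

/* Hypothetical mode selector for the proposed enumeration PPI. */
typedef enum {
  PciEnumModeResourceDiscovery = 1,  /* MODE 1: count buses/MMIO, assign nothing   */
  PciEnumModePartial           = 2,  /* MODE 2: assign buses, BARs for a subset    */
  PciEnumModeFull              = 3   /* MODE 3: full enumeration, all BARs         */
} PCI_ENUMERATION_MODE;

/* Sketch of the request interface: platform code picks the mode,
   the enumerator PEIM reports success/failure (0 = success here). */
typedef struct {
  int (*Enumerate) (PCI_ENUMERATION_MODE Mode);
} PEI_PCI_ENUMERATION_PPI;
```

The mode enum keeps the platform in control of how much work happens in PEI, while the host-bridge-style callbacks mentioned above would govern how buses and BARs are actually allocated for modes 2 and 3.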

Sorry for the long email but hopefully it helps!

Thanks,
Nate

-----Original Message-----
From: rfc@edk2.groups.io <rfc@edk2.groups.io> On Behalf Of Brian J. Johnson
Sent: Tuesday, November 16, 2021 1:48 PM
To: rfc@edk2.groups.io; Albecki, Mateusz <mateusz.albecki@intel.com>
Subject: Re: [edk2-rfc] Proposal to add support for PCIe enumeration protocols in PEI

Mateusz,

Your changes provide a nice cleanup to the driver mess in PEI. My main concern is that someone will later add a dependency which *requires* PCIe enumeration in PEI, without realizing that this is impossible on some platforms. The only ways to prevent that which I can think of are to document it clearly, in the code, and to choose not to introduce such dependencies now.

That's why I was suggesting not trying to optimize DXE resource assignment at this time. That may not be a reasonable request... code size and execution speed tend to trump other concerns in firmware.
I'd rather see us put our effort into removing things from PEI than adding things to it. But I'm not the only consumer of TianoCore....

I'm relieved you think that there have been some architectural mistakes made, particularly around S3 resume. I wanted to say so, but I'm glad you did so first. :) Putting device drivers in PEI at all seems to stray from the original design. But I don't know how to fix it either.
And I don't even work on platforms which use S3, so I know nothing about the requirements which led to the current design.

Maybe the stewards will have a suggestion. But as you said, this seems like it could easily become a very, very large effort.

Thanks,
Brian J. Johnson
HPE

-------- Original Message --------
From: Albecki, Mateusz [mailto:mateusz.albecki@intel.com]
Sent: Tuesday, November 16, 2021, 11:35 AM
To: Brian J. Johnson <brian.johnson@hpe.com>, rfc@edk2.groups.io
Subject: [edk2-rfc] Proposal to add support for PCIe enumeration protocols in PEI

Brian,

I fully agree that the PEI enumeration should be a self-contained (as in there shouldn't be a hard dependency on stuff produced by this stack; a soft dependency should be allowed, I think), optional feature. I am also not sure what the best way to achieve that is, but I am leaning towards using feature PCDs. Given that device driver PEIMs are available in the EDK2 repo, I think that both the new PCI_DEVICE_PPI and the implementation of the enumerator PEIM should be in the EDK2 repo as well.
BTW, did I understand correctly that you wouldn't like to see the optimization for platform types 2, 3 and 4 which would skip the resource assignment in DXE when possible? While I also wouldn't like to introduce such a dependency, I am not sure I can guarantee that either I or somebody else won't eventually propose such an optimization, as the performance gain might be substantial on desktop systems (especially if running 64-bit PEI). At the very least, I understand that such a dependency should be handled in a way that wouldn't break the DXE driver if the enumerator PEIM is not present.

BTW, I agree that we shouldn't be doing enumeration in PEI. This proposal is just a band-aid that needs to be applied to fix several architectural mistakes that we have made in the past (the S3 resume definition, mostly...). From my point of view, the ideal solution for a large chunk of our pain points is to simply remove PEI and re-architect DXE.
Having a uniform execution environment across all Sx flows would help a lot, in my opinion. Sadly, such a change is probably not possible in the short term, and even in the long term I wouldn't know how to go about it.

Thanks,
Mateusz



Nate DeSimone
 

Hi Mateusz,

I agree that deciding how many PCI segments worth of MMIO to allocate for PCI configuration space and what the BAR(s) are for those PCI segments is a platform decision. Where the PCI root ports end up in a multi-segment system should mostly be determined by PCI enumeration however. I agree that there are some corner cases like Thunderbolt where we will want to dedicate specific PCI segments for certain PCIe root ports, and we will need to enable platform code to specify those details somehow (maybe via PCI Host Bridge PPI?)

Keep in mind the server use case as well. On server there are so many "uncore" (aka System Agent) PCI devices that even just a single CPU north cluster can now easily consume the entirety of the 256 buses in the first PCI segment. Accordingly, a multi-socket system is almost always multi-segment... and the multiple segments are used for regular PCI devices... not special cases like Thunderbolt root ports. In such a system... it is more helpful to think of (Segment 0/Bus 255) as "bus 255" and (Segment 1/Bus 0) as "bus 256"... more or less an extension of the maximum available PCI bus numbers. With one important corner case... the downstream devices attached to a single PCI root port all need to fit within the same PCI segment. So that may force you to place a specific PCI root port into a different segment so there are enough PCI bus numbers to fully enumerate all downstream devices. On the server side almost all devices are fully capable of being mapped to a PCI segment other than Segment 0. For server, the PCH is considered "legacy" and is the only source of PCI devices that must be on Bus 0 in the system.
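The "bus 256" intuition above is just flat indexing over (segment, bus), since each PCI segment contributes another 256 bus numbers; a trivial sketch (illustrative helper, not EDK2 API):

```c
#include <assert.h>
#include <stdint.h>

/* Treat (Segment, Bus) as one flat bus index: each segment extends
   the bus number space by another 256 entries. */
static uint32_t
FlatBusIndex (uint16_t Segment, uint8_t Bus)
{
  return ((uint32_t)Segment << 8) | Bus;
}
```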

"Resource Discovery" seems like a good name for MODE1. Actually I don't think MODE1 will require a very large amount of bus resources in most cases. It only needs enough to expose the deepest part of the PCI topology. Once we have reached the leaf nodes of the tree we can reclaim all the bus numbers that were used to traverse down to the leaf(s) by disabling the PCI/PCI bridges and resetting the secondary/subordinate bus numbers back to zero.
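The reclaim idea can be sketched as a depth-first walk over a toy bridge tree (plain C; the structure and field names are illustrative, and real code would of course program config space rather than a struct): temporary bus numbers are handed out on the way down and zeroed on the way back up, so the scan never needs more bus numbers than the topology is deep.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a PCI/PCI bridge during the MODE1 scan. */
typedef struct BRIDGE {
  struct BRIDGE *Child[4];
  unsigned       ChildCount;
  uint8_t        Secondary;   /* temporary bus number, 0 = unassigned */
} BRIDGE;

/* Depth-first scan: assign a temporary secondary bus to each child,
   recurse, then reset it so the same number is reused for siblings.
   Returns the highest bus number the scan had to touch. */
static uint8_t
ScanAndReclaim (BRIDGE *Bridge, uint8_t MyBus)
{
  uint8_t Deepest = MyBus;
  for (unsigned Index = 0; Index < Bridge->ChildCount; Index++) {
    uint8_t ChildBus = (uint8_t)(MyBus + 1);
    Bridge->Child[Index]->Secondary = ChildBus;      /* assign  */
    uint8_t Reached = ScanAndReclaim (Bridge->Child[Index], ChildBus);
    if (Reached > Deepest) {
      Deepest = Reached;
    }
    Bridge->Child[Index]->Secondary = 0;             /* reclaim */
  }
  return Deepest;
}
```

For a root with two children where one child has a grandchild, the scan only ever touches bus numbers 1 and 2 even though a full enumeration would hand out distinct numbers to the siblings.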

I actually implemented the algorithm for MODE1 a few years ago. It is in the closed source code right now; I'll send you a pointer to it.

For MODE2, yes I agree that it more or less requires "special pools" but my point is that we can reclaim the memory address space from those special pools once the full enumeration is done. And yes, I agree that for client systems it is probably feasible to just go through the full MODE3 in Pre-Memory PEI, which would make MODE2 rather redundant.

Thanks,
Nate



Nate DeSimone
 

Actually, I just remembered the reason for MODE2: 32-bit PEI doesn't have enough MMIO space for a full enumeration even on client systems. A lot of discrete GPUs use >8GB of MMIO space these days.

Thanks,
Nate



Albecki, Mateusz
 

Just to clarify, I wasn't proposing to do MODE3 on all client systems. I was just commenting on how MODE2 is simply a restricted version of MODE3, so it shouldn't be too much of a problem to support it once we have a way to communicate restrictions to the generic code.

For the PCI segment assignment - I think I understand you better now. I thought you were suggesting shifting segment numbers around during MODE2/3 execution, which should be impossible, as at that time the segment numbers and the MEM and IO ranges decoded by the root bus should already have been established. But now I think the suggestion is to include segment number shifting in the resource rebalancing done after resource discovery (MODE1), which makes sense to me.

Thanks,
Mateusz


Albecki, Mateusz
 

I know it has been a while, but we have finally managed to POC a solution, so I have filed a Bugzilla here: https://bugzilla.tianocore.org/show_bug.cgi?id=3907 which contains the link to the EDK2 fork that adds the PCI_DEVICE_PPI definition and rewrites the AHCI driver to actually use it. We can't disclose the enumerator part since it uses some proprietary code, but it essentially does a similar thing to the DXE host bridge driver (just simpler). We will try to upstream the enumerator once we are done with the cleanup (might take a while).

Link to POC for convenience: https://github.com/mczaj/edk2/commits/AhciPei_PciDevicePpi_Both

Thanks,
Mateusz