[PATCH v3 03/16] ArmVirtPkg: make EFI_LOADER_DATA non-executable


Ard Biesheuvel
 

When the memory protections were implemented and enabled on ArmVirtQemu
5+ years ago, we had to work around the fact that GRUB at the time
expected EFI_LOADER_DATA to be executable, as that is the memory type it
allocates when loading its modules.

This has been fixed in GRUB in August 2017, so by now, we should be able
to tighten this, and remove execute permissions from EFI_LOADER_DATA
allocations.

Signed-off-by: Ard Biesheuvel <ardb@...>
---
ArmVirtPkg/ArmVirt.dsc.inc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ArmVirtPkg/ArmVirt.dsc.inc b/ArmVirtPkg/ArmVirt.dsc.inc
index 34575585adbb..462073517a22 100644
--- a/ArmVirtPkg/ArmVirt.dsc.inc
+++ b/ArmVirtPkg/ArmVirt.dsc.inc
@@ -368,7 +368,7 @@ [PcdsFixedAtBuild.common]
# reserved ones, with the exception of LoaderData regions, of which OS l=
oaders=0D
# (i.e., GRUB) may assume that its contents are executable.=0D
#=0D
- gEfiMdeModulePkgTokenSpaceGuid.PcdDxeNxMemoryProtectionPolicy|0xC0000000=
00007FD1=0D
+ gEfiMdeModulePkgTokenSpaceGuid.PcdDxeNxMemoryProtectionPolicy|0xC0000000=
00007FD5=0D
=0D
[Components.common]=0D
#=0D
--=20
2.35.1


Leif Lindholm
 

On 2022-09-26 01:24, Ard Biesheuvel wrote:
When the memory protections were implemented and enabled on ArmVirtQemu
5+ years ago, we had to work around the fact that GRUB at the time
expected EFI_LOADER_DATA to be executable, as that is the memory type it
allocates when loading its modules.
This has been fixed in GRUB in August 2017, so by now, we should be able
to tighten this, and remove execute permissions from EFI_LOADER_DATA
allocations.
Signed-off-by: Ard Biesheuvel <ardb@...>


---
ArmVirtPkg/ArmVirt.dsc.inc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ArmVirtPkg/ArmVirt.dsc.inc b/ArmVirtPkg/ArmVirt.dsc.inc
index 34575585adbb..462073517a22 100644
--- a/ArmVirtPkg/ArmVirt.dsc.inc
+++ b/ArmVirtPkg/ArmVirt.dsc.inc
@@ -368,7 +368,7 @@ [PcdsFixedAtBuild.common]
# reserved ones, with the exception of LoaderData regions, of which OS loaders
# (i.e., GRUB) may assume that its contents are executable.
Should the comment be updated too ("old versions of GRUB")?

Regardless:
Reviewed-by: Leif Lindholm <quic_llindhol@...>

/
Leif

#
- gEfiMdeModulePkgTokenSpaceGuid.PcdDxeNxMemoryProtectionPolicy|0xC000000000007FD1
+ gEfiMdeModulePkgTokenSpaceGuid.PcdDxeNxMemoryProtectionPolicy|0xC000000000007FD5
[Components.common]
#


Gerd Hoffmann
 

On Mon, Sep 26, 2022 at 10:24:58AM +0200, Ard Biesheuvel wrote:
When the memory protections were implemented and enabled on ArmVirtQemu
5+ years ago, we had to work around the fact that GRUB at the time
expected EFI_LOADER_DATA to be executable, as that is the memory type it
allocates when loading its modules.

This has been fixed in GRUB in August 2017, so by now, we should be able
to tighten this, and remove execute permissions from EFI_LOADER_DATA
allocations.
Data point: https://bugzilla.redhat.com/show_bug.cgi?id=2149020
tl;dr: fedora 37 grub.efi is still broken.

take care,
Gerd


dann frazier
 

On Mon, Nov 28, 2022 at 04:46:10PM +0100, Gerd Hoffmann wrote:
On Mon, Sep 26, 2022 at 10:24:58AM +0200, Ard Biesheuvel wrote:
When the memory protections were implemented and enabled on ArmVirtQemu
5+ years ago, we had to work around the fact that GRUB at the time
expected EFI_LOADER_DATA to be executable, as that is the memory type it
allocates when loading its modules.

This has been fixed in GRUB in August 2017, so by now, we should be able
to tighten this, and remove execute permissions from EFI_LOADER_DATA
allocations.
Data point: https://bugzilla.redhat.com/show_bug.cgi?id=2149020
tl;dr: fedora 37 grub.efi is still broken.
This is also the case with existing Ubuntu releases, as well as
AlmaLinux 9.1 and RHEL 8.7[*]. While it does appear to be fixed for
the upcoming Ubuntu 23.04 (presumably via [**]), I plan to revert this
patch in Debian/Ubuntu until it is more ubiquitous. Do you want to do
the same upstream? I'm not sure at what point it would make sense to
reintroduce it, given we can't force users to upgrade their bootloaders.

-dann

[*] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1025656
[**] https://git.launchpad.net/~ubuntu-core-dev/grub/+git/ubuntu/commit/?id=a0ee822f1c85fcf55066996ab51c5cf77e2728b2)


Ard Biesheuvel
 

On Thu, 29 Dec 2022 at 19:00, dann frazier <dann.frazier@...> wrote:

On Mon, Nov 28, 2022 at 04:46:10PM +0100, Gerd Hoffmann wrote:
On Mon, Sep 26, 2022 at 10:24:58AM +0200, Ard Biesheuvel wrote:
When the memory protections were implemented and enabled on ArmVirtQemu
5+ years ago, we had to work around the fact that GRUB at the time
expected EFI_LOADER_DATA to be executable, as that is the memory type it
allocates when loading its modules.

This has been fixed in GRUB in August 2017, so by now, we should be able
to tighten this, and remove execute permissions from EFI_LOADER_DATA
allocations.
Data point: https://bugzilla.redhat.com/show_bug.cgi?id=2149020
tl;dr: fedora 37 grub.efi is still broken.
This is also the case with existing Ubuntu releases, as well as
AlmaLinux 9.1 and RHEL 8.7[*]. While it does appear to be fixed for
the upcoming Ubuntu 23.04 (presumably via [**]), I plan to revert this
patch in Debian/Ubuntu until it is more ubiquitous. Do you want to do
the same upstream? I'm not sure at what point it would make sense to
reintroduce it, given we can't force users to upgrade their bootloaders.
Thanks for the report.

You can override PCDs on the build command line, so I suggest you use
that for building these images as long as it is needed.

E.g,, append this to the build.sh command line

--pcd PcdDxeNxMemoryProtectionPolicy=0xC000000000007FD1

to undo the effects of this patch.

I do not intend to revert this patch - the trend under EFI is towards
much stricter memory permissions, also on the MS side, and this is
especially important under CC scenarios. And if 5+ years is not
sufficient for out-of-tree GRUB to catch up, what is the point of
waiting for it?

Thanks,
Ard.


Alexander Graf
 

Hey Ard,

On 03.01.23 10:59, Ard Biesheuvel wrote:
On Thu, 29 Dec 2022 at 19:00, dann frazier <dann.frazier@...> wrote:
On Mon, Nov 28, 2022 at 04:46:10PM +0100, Gerd Hoffmann wrote:
On Mon, Sep 26, 2022 at 10:24:58AM +0200, Ard Biesheuvel wrote:
When the memory protections were implemented and enabled on ArmVirtQemu
5+ years ago, we had to work around the fact that GRUB at the time
expected EFI_LOADER_DATA to be executable, as that is the memory type it
allocates when loading its modules.

This has been fixed in GRUB in August 2017, so by now, we should be able
to tighten this, and remove execute permissions from EFI_LOADER_DATA
allocations.
Data point: https://bugzilla.redhat.com/show_bug.cgi?id=2149020
tl;dr: fedora 37 grub.efi is still broken.
This is also the case with existing Ubuntu releases, as well as
AlmaLinux 9.1 and RHEL 8.7[*]. While it does appear to be fixed for
the upcoming Ubuntu 23.04 (presumably via [**]), I plan to revert this
patch in Debian/Ubuntu until it is more ubiquitous. Do you want to do
the same upstream? I'm not sure at what point it would make sense to
reintroduce it, given we can't force users to upgrade their bootloaders.
Thanks for the report.

You can override PCDs on the build command line, so I suggest you use
that for building these images as long as it is needed.

E.g,, append this to the build.sh command line

--pcd PcdDxeNxMemoryProtectionPolicy=0xC000000000007FD1

to undo the effects of this patch.

I do not intend to revert this patch - the trend under EFI is towards
much stricter memory permissions, also on the MS side, and this is
especially important under CC scenarios. And if 5+ years is not
sufficient for out-of-tree GRUB to catch up, what is the point of
waiting for it?

I think we need to be smarter here. ArmVirtPkg is used by a lot of virtualization software - such as QEMU. Typically, users (and developers) expect that an update to a newer version will preserve compatibility.

Let me contrive an example: In 1 year, QEMU updates to the latest AAVMF. It ships that as part of its pc-bios directory. Suddenly, we accidentally break old (immutable!) iso images that used to work. So someone that updates QEMU to a newer version will have a regression in running a Fedora 37 installation. Or RHEL 8.7 installation. Not good :).

Outside of OSVs providing new iso images, there is also not much you can do about this. The grub binary here runs way before any updater code that could pull a fixed version from the internet.

What alternatives do we have? Assuming grub is the only offender we have to worry about, is there a way we can identify that the allocation came from an unpatched version? Or have a database of hashes (with all known grub binaries that have this bug in existing isos) that would force disable NX protection for it if we hit it? Or disable NX when Secure Boot is disabled?

I really think we need to be a bit more creative than make NX mandatory in all cases when we know the are immutable images out there that won't work with it, otherwise we break very legit use cases.


Alex


dann frazier
 

On Tue, Jan 03, 2023 at 08:39:24PM +0100, Alexander Graf wrote:
Hey Ard,

On 03.01.23 10:59, Ard Biesheuvel wrote:
On Thu, 29 Dec 2022 at 19:00, dann frazier <dann.frazier@...> wrote:
On Mon, Nov 28, 2022 at 04:46:10PM +0100, Gerd Hoffmann wrote:
On Mon, Sep 26, 2022 at 10:24:58AM +0200, Ard Biesheuvel wrote:
When the memory protections were implemented and enabled on ArmVirtQemu
5+ years ago, we had to work around the fact that GRUB at the time
expected EFI_LOADER_DATA to be executable, as that is the memory type it
allocates when loading its modules.

This has been fixed in GRUB in August 2017, so by now, we should be able
to tighten this, and remove execute permissions from EFI_LOADER_DATA
allocations.
Data point: https://bugzilla.redhat.com/show_bug.cgi?id=2149020
tl;dr: fedora 37 grub.efi is still broken.
This is also the case with existing Ubuntu releases, as well as
AlmaLinux 9.1 and RHEL 8.7[*]. While it does appear to be fixed for
the upcoming Ubuntu 23.04 (presumably via [**]), I plan to revert this
patch in Debian/Ubuntu until it is more ubiquitous. Do you want to do
the same upstream? I'm not sure at what point it would make sense to
reintroduce it, given we can't force users to upgrade their bootloaders.
Thanks for the report.

You can override PCDs on the build command line, so I suggest you use
that for building these images as long as it is needed.

E.g,, append this to the build.sh command line

--pcd PcdDxeNxMemoryProtectionPolicy=0xC000000000007FD1

to undo the effects of this patch.

I do not intend to revert this patch - the trend under EFI is towards
much stricter memory permissions, also on the MS side, and this is
especially important under CC scenarios. And if 5+ years is not
sufficient for out-of-tree GRUB to catch up, what is the point of
waiting for it?

I think we need to be smarter here. ArmVirtPkg is used by a lot of
virtualization software - such as QEMU. Typically, users (and developers)
expect that an update to a newer version will preserve compatibility.

Let me contrive an example: In 1 year, QEMU updates to the latest AAVMF. It
ships that as part of its pc-bios directory. Suddenly, we accidentally break
old (immutable!) iso images that used to work. So someone that updates QEMU
to a newer version will have a regression in running a Fedora 37
installation. Or RHEL 8.7 installation. Not good :).

Outside of OSVs providing new iso images, there is also not much you can do
about this. The grub binary here runs way before any updater code that could
pull a fixed version from the internet.

What alternatives do we have? Assuming grub is the only offender we have to
worry about, is there a way we can identify that the allocation came from an
unpatched version? Or have a database of hashes (with all known grub
binaries that have this bug in existing isos) that would force disable NX
protection for it if we hit it? Or disable NX when Secure Boot is disabled?

I really think we need to be a bit more creative than make NX mandatory in
all cases when we know the are immutable images out there that won't work
with it, otherwise we break very legit use cases.
While it has its own issues, I suppose another way to go about it is
for distributors to provide multiple AAVMF_CODE images, and perhaps
invent a "feature" flag for the json descriptor for management tools
to select an appropriate one.

-dann


Ard Biesheuvel
 

On Tue, 3 Jan 2023 at 23:47, dann frazier <dann.frazier@...> wrote:

On Tue, Jan 03, 2023 at 08:39:24PM +0100, Alexander Graf wrote:
Hey Ard,

On 03.01.23 10:59, Ard Biesheuvel wrote:
On Thu, 29 Dec 2022 at 19:00, dann frazier <dann.frazier@...> wrote:
On Mon, Nov 28, 2022 at 04:46:10PM +0100, Gerd Hoffmann wrote:
On Mon, Sep 26, 2022 at 10:24:58AM +0200, Ard Biesheuvel wrote:
When the memory protections were implemented and enabled on ArmVirtQemu
5+ years ago, we had to work around the fact that GRUB at the time
expected EFI_LOADER_DATA to be executable, as that is the memory type it
allocates when loading its modules.

This has been fixed in GRUB in August 2017, so by now, we should be able
to tighten this, and remove execute permissions from EFI_LOADER_DATA
allocations.
Data point: https://bugzilla.redhat.com/show_bug.cgi?id=2149020
tl;dr: fedora 37 grub.efi is still broken.
This is also the case with existing Ubuntu releases, as well as
AlmaLinux 9.1 and RHEL 8.7[*]. While it does appear to be fixed for
the upcoming Ubuntu 23.04 (presumably via [**]), I plan to revert this
patch in Debian/Ubuntu until it is more ubiquitous. Do you want to do
the same upstream? I'm not sure at what point it would make sense to
reintroduce it, given we can't force users to upgrade their bootloaders.
Thanks for the report.

You can override PCDs on the build command line, so I suggest you use
that for building these images as long as it is needed.

E.g,, append this to the build.sh command line

--pcd PcdDxeNxMemoryProtectionPolicy=0xC000000000007FD1

to undo the effects of this patch.

I do not intend to revert this patch - the trend under EFI is towards
much stricter memory permissions, also on the MS side, and this is
especially important under CC scenarios. And if 5+ years is not
sufficient for out-of-tree GRUB to catch up, what is the point of
waiting for it?

I think we need to be smarter here. ArmVirtPkg is used by a lot of
virtualization software - such as QEMU. Typically, users (and developers)
expect that an update to a newer version will preserve compatibility.

Let me contrive an example: In 1 year, QEMU updates to the latest AAVMF. It
ships that as part of its pc-bios directory. Suddenly, we accidentally break
old (immutable!) iso images that used to work. So someone that updates QEMU
to a newer version will have a regression in running a Fedora 37
installation. Or RHEL 8.7 installation. Not good :).

Outside of OSVs providing new iso images, there is also not much you can do
about this. The grub binary here runs way before any updater code that could
pull a fixed version from the internet.

What alternatives do we have? Assuming grub is the only offender we have to
worry about, is there a way we can identify that the allocation came from an
unpatched version? Or have a database of hashes (with all known grub
binaries that have this bug in existing isos) that would force disable NX
protection for it if we hit it? Or disable NX when Secure Boot is disabled?

I really think we need to be a bit more creative than make NX mandatory in
all cases when we know the are immutable images out there that won't work
with it, otherwise we break very legit use cases.
While it has its own issues, I suppose another way to go about it is
for distributors to provide multiple AAVMF_CODE images, and perhaps
invent a "feature" flag for the json descriptor for management tools
to select an appropriate one.
I don't think having different versions of the image makes sense, tbh,
but of course, this is up to the distros.

Compatibility with ancient downstream GRUB builds is not a goal of the
EDK2 upstream, so as long as distros can tweak the build to their
needs, I don't see a reason to revert this change upstream.


Gerd Hoffmann
 

Hi,

You can override PCDs on the build command line, so I suggest you use
that for building these images as long as it is needed.

E.g,, append this to the build.sh command line

--pcd PcdDxeNxMemoryProtectionPolicy=0xC000000000007FD1

to undo the effects of this patch.
Can this also be flipped at runtime?
Does this pcd work the same way on all architectures?

I don't think having different versions of the image makes sense, tbh,
but of course, this is up to the distros.
Fedora has reverted the patch for now, and I don't see how we can enable
that anytime soon given that RHEL-8,9 with loooooong support times ship
broken grub binaries today.

Compatibility with ancient downstream GRUB builds is not a goal of the
EDK2 upstream, so as long as distros can tweak the build to their
needs, I don't see a reason to revert this change upstream.
The versions are not that ancient. The problem is more that upstream
grub is really slow on integrating patches so every distro does carry
a huge pile of downstream patches. And they seem to re-introduce the
bug ...

But, yes, just reverting upstream too doesn't look like a good option
either, we need at least a little pressure to get things fixed.

take care,
Gerd


Ard Biesheuvel
 

On Wed, 4 Jan 2023 at 12:11, Gerd Hoffmann <kraxel@...> wrote:

Hi,

You can override PCDs on the build command line, so I suggest you use
that for building these images as long as it is needed.

E.g,, append this to the build.sh command line

--pcd PcdDxeNxMemoryProtectionPolicy=0xC000000000007FD1

to undo the effects of this patch.
Can this also be flipped at runtime?
Currently, it is fixed or patchable, which means that you can override
it at build time only. I don't think making this a dynamic PCD would
be difficult, and on QEMU, we can set the value early enough if we key
it off fw_cfg or something like that.

But that implies that you need a 'permissive' mode to invoke QEMU,
which ends up being always enabled, most likely, so I'm not sure this
is an improvement.

Does this pcd work the same way on all architectures?
In principle, yes. However, I cannot vouch for the X86 code not doing
dodgy things with data regions, so whether the same *value* works
reliably across all architectures is a separate matter.

I don't think having different versions of the image makes sense, tbh,
but of course, this is up to the distros.
Fedora has reverted the patch for now, and I don't see how we can enable
that anytime soon given that RHEL-8,9 with loooooong support times ship
broken grub binaries today.
Yeah. This is really disappointing.

Compatibility with ancient downstream GRUB builds is not a goal of the
EDK2 upstream, so as long as distros can tweak the build to their
needs, I don't see a reason to revert this change upstream.
The versions are not that ancient. The problem is more that upstream
grub is really slow on integrating patches so every distro does carry
a huge pile of downstream patches. And they seem to re-introduce the
bug ...

But, yes, just reverting upstream too doesn't look like a good option
either, we need at least a little pressure to get things fixed.
Indeed.


Gerd Hoffmann
 

On Wed, Jan 04, 2023 at 01:04:41PM +0100, Ard Biesheuvel wrote:
On Wed, 4 Jan 2023 at 12:11, Gerd Hoffmann <kraxel@...> wrote:

Hi,

--pcd PcdDxeNxMemoryProtectionPolicy=0xC000000000007FD1
Can this also be flipped at runtime?
Currently, it is fixed or patchable, which means that you can override
it at build time only. I don't think making this a dynamic PCD would
be difficult, and on QEMU, we can set the value early enough if we key
it off fw_cfg or something like that.

But that implies that you need a 'permissive' mode to invoke QEMU,
which ends up being always enabled, most likely, so I'm not sure this
is an improvement.
It works both ways. Being able to enable nx protection at runtime on
builds which have it disabled by default would be quite useful. Write
test cases. Write reproducer instructions which don't include building
edk2 yourself.

take care,
Gerd


Alexander Graf
 

On 04.01.23 10:35, Ard Biesheuvel wrote:
On Tue, 3 Jan 2023 at 23:47, dann frazier <dann.frazier@...> wrote:
On Tue, Jan 03, 2023 at 08:39:24PM +0100, Alexander Graf wrote:
Hey Ard,

On 03.01.23 10:59, Ard Biesheuvel wrote:
On Thu, 29 Dec 2022 at 19:00, dann frazier <dann.frazier@...> wrote:
On Mon, Nov 28, 2022 at 04:46:10PM +0100, Gerd Hoffmann wrote:
On Mon, Sep 26, 2022 at 10:24:58AM +0200, Ard Biesheuvel wrote:
When the memory protections were implemented and enabled on ArmVirtQemu
5+ years ago, we had to work around the fact that GRUB at the time
expected EFI_LOADER_DATA to be executable, as that is the memory type it
allocates when loading its modules.

This has been fixed in GRUB in August 2017, so by now, we should be able
to tighten this, and remove execute permissions from EFI_LOADER_DATA
allocations.
Data point: https://bugzilla.redhat.com/show_bug.cgi?id=2149020
tl;dr: fedora 37 grub.efi is still broken.
This is also the case with existing Ubuntu releases, as well as
AlmaLinux 9.1 and RHEL 8.7[*]. While it does appear to be fixed for
the upcoming Ubuntu 23.04 (presumably via [**]), I plan to revert this
patch in Debian/Ubuntu until it is more ubiquitous. Do you want to do
the same upstream? I'm not sure at what point it would make sense to
reintroduce it, given we can't force users to upgrade their bootloaders.
Thanks for the report.

You can override PCDs on the build command line, so I suggest you use
that for building these images as long as it is needed.

E.g,, append this to the build.sh command line

--pcd PcdDxeNxMemoryProtectionPolicy=0xC000000000007FD1

to undo the effects of this patch.

I do not intend to revert this patch - the trend under EFI is towards
much stricter memory permissions, also on the MS side, and this is
especially important under CC scenarios. And if 5+ years is not
sufficient for out-of-tree GRUB to catch up, what is the point of
waiting for it?
I think we need to be smarter here. ArmVirtPkg is used by a lot of
virtualization software - such as QEMU. Typically, users (and developers)
expect that an update to a newer version will preserve compatibility.

Let me contrive an example: In 1 year, QEMU updates to the latest AAVMF. It
ships that as part of its pc-bios directory. Suddenly, we accidentally break
old (immutable!) iso images that used to work. So someone that updates QEMU
to a newer version will have a regression in running a Fedora 37
installation. Or RHEL 8.7 installation. Not good :).

Outside of OSVs providing new iso images, there is also not much you can do
about this. The grub binary here runs way before any updater code that could
pull a fixed version from the internet.

What alternatives do we have? Assuming grub is the only offender we have to
worry about, is there a way we can identify that the allocation came from an
unpatched version? Or have a database of hashes (with all known grub
binaries that have this bug in existing isos) that would force disable NX
protection for it if we hit it? Or disable NX when Secure Boot is disabled?

I really think we need to be a bit more creative than make NX mandatory in
all cases when we know the are immutable images out there that won't work
with it, otherwise we break very legit use cases.
While it has its own issues, I suppose another way to go about it is
for distributors to provide multiple AAVMF_CODE images, and perhaps
invent a "feature" flag for the json descriptor for management tools
to select an appropriate one.
I don't think having different versions of the image makes sense, tbh,
but of course, this is up to the distros.

Compatibility with ancient downstream GRUB builds is not a goal of the
EDK2 upstream, so as long as distros can tweak the build to their
needs, I don't see a reason to revert this change upstream.

First of all, I don't think we should revert this change. We should augment it to give users the out-of-the-box experience they expect.

On top of that, I don't think it's a problem of "distros". Every consumer of AAVMF will run into this problem as soon as their users will want to run any Red Hat OS (installer / image) all the way into 2022. That's  very likely >90% of the user base. Because of that, I'm pretty sure no Cloud vendor will be able to enable NX in its current shape for example.

I'm very happy to see NX proliferate through the stack, but let's please make sure we do it compatibly by default, otherwise the net result is that *everyone* who compiles AAVMF will disable NX by default again - and all you end up with is more frustration and more downstream code / forks.

IMHO the most obvious approach would be a fingerprint based override. There should be a finite number of known broken grub binaries. If we just maintain a database with them and then apply some magic when we detect such a binary gets loaded, we'll have tightened security by default, without breaking backwards compat.

For environments that know they're running in environments with CC requirements, we can automatically disable the fingerprint override :).


Alex


Alexander Graf
 

On 04.01.23 14:13, Alexander Graf wrote:

On 04.01.23 10:35, Ard Biesheuvel wrote:
On Tue, 3 Jan 2023 at 23:47, dann frazier <dann.frazier@...> wrote:
On Tue, Jan 03, 2023 at 08:39:24PM +0100, Alexander Graf wrote:
Hey Ard,

On 03.01.23 10:59, Ard Biesheuvel wrote:
On Thu, 29 Dec 2022 at 19:00, dann frazier <dann.frazier@...> wrote:
On Mon, Nov 28, 2022 at 04:46:10PM +0100, Gerd Hoffmann wrote:
On Mon, Sep 26, 2022 at 10:24:58AM +0200, Ard Biesheuvel wrote:
When the memory protections were implemented and enabled on ArmVirtQemu
5+ years ago, we had to work around the fact that GRUB at the time
expected EFI_LOADER_DATA to be executable, as that is the memory type it
allocates when loading its modules.

This has been fixed in GRUB in August 2017, so by now, we should be able
to tighten this, and remove execute permissions from EFI_LOADER_DATA
allocations.
Data point: https://bugzilla.redhat.com/show_bug.cgi?id=2149020
tl;dr: fedora 37 grub.efi is still broken.
This is also the case with existing Ubuntu releases, as well as
AlmaLinux 9.1 and RHEL 8.7[*]. While it does appear to be fixed for
the upcoming Ubuntu 23.04 (presumably via [**]), I plan to revert this
patch in Debian/Ubuntu until it is more ubiquitous. Do you want to do
the same upstream? I'm not sure at what point it would make sense to
reintroduce it, given we can't force users to upgrade their bootloaders.
Thanks for the report.

You can override PCDs on the build command line, so I suggest you use
that for building these images as long as it is needed.

E.g,, append this to the build.sh command line

--pcd PcdDxeNxMemoryProtectionPolicy=0xC000000000007FD1

to undo the effects of this patch.

I do not intend to revert this patch - the trend under EFI is towards
much stricter memory permissions, also on the MS side, and this is
especially important under CC scenarios. And if 5+ years is not
sufficient for out-of-tree GRUB to catch up, what is the point of
waiting for it?
I think we need to be smarter here. ArmVirtPkg is used by a lot of
virtualization software - such as QEMU. Typically, users (and developers)
expect that an update to a newer version will preserve compatibility.

Let me contrive an example: In 1 year, QEMU updates to the latest AAVMF. It
ships that as part of its pc-bios directory. Suddenly, we accidentally break
old (immutable!) iso images that used to work. So someone that updates QEMU
to a newer version will have a regression in running a Fedora 37
installation. Or RHEL 8.7 installation. Not good :).

Outside of OSVs providing new iso images, there is also not much you can do
about this. The grub binary here runs way before any updater code that could
pull a fixed version from the internet.

What alternatives do we have? Assuming grub is the only offender we have to
worry about, is there a way we can identify that the allocation came from an
unpatched version? Or have a database of hashes (with all known grub
binaries that have this bug in existing isos) that would force disable NX
protection for it if we hit it? Or disable NX when Secure Boot is disabled?

I really think we need to be a bit more creative than make NX mandatory in
all cases when we know the are immutable images out there that won't work
with it, otherwise we break very legit use cases.
While it has its own issues, I suppose another way to go about it is
for distributors to provide multiple AAVMF_CODE images, and perhaps
invent a "feature" flag for the json descriptor for management tools
to select an appropriate one.
I don't think having different versions of the image makes sense, tbh,
but of course, this is up to the distros.

Compatibility with ancient downstream GRUB builds is not a goal of the
EDK2 upstream, so as long as distros can tweak the build to their
needs, I don't see a reason to revert this change upstream.

First of all, I don't think we should revert this change. We should augment it to give users the out-of-the-box experience they expect.

On top of that, I don't think it's a problem of "distros". Every consumer of AAVMF will run into this problem as soon as their users will want to run any Red Hat OS (installer / image) all the way into 2022. That's  very likely >90% of the user base. Because of that, I'm pretty sure no Cloud vendor will be able to enable NX in its current shape for example.

I'm very happy to see NX proliferate through the stack, but let's please make sure we do it compatibly by default, otherwise the net result is that *everyone* who compiles AAVMF will disable NX by default again - and all you end up with is more frustration and more downstream code / forks.

IMHO the most obvious approach would be a fingerprint based override. There should be a finite number of known broken grub binaries. If we just maintain a database with them and then apply some magic when we detect such a binary gets loaded, we'll have tightened security by default, without breaking backwards compat.

For environments that know they're running in environments with CC requirements, we can automatically disable the fingerprint override :).

To clarify, I mean something like the patch below, but with an additional callback notification similar to the Emu one in LoadImage(), so that we can make sure we only enable the quirk when we load a known-bad grub binary. That way we still force distros to ship fixed versions of their code, but enable old code to continue running.


Alex


diff --git a/ArmVirtPkg/Library/PlatformBootManagerLib/PlatformBm.c b/ArmVirtPkg/Library/PlatformBootManagerLib/PlatformBm.c
index 3ad1ecd9d2..365eb1c157 100644
--- a/ArmVirtPkg/Library/PlatformBootManagerLib/PlatformBm.c
+++ b/ArmVirtPkg/Library/PlatformBootManagerLib/PlatformBm.c
@@ -902,6 +902,25 @@ PlatformBootManagerBeforeConsole (
   FilterAndProcess (&gEfiPciIoProtocolGuid, IsVirtioPciRng, Connect);
 }

+static EFI_ALLOCATE_PAGES RealAllocatePages;
+
+static EFI_STATUS EFIAPI AllocatePagesForceLoaderCode(
+  IN     EFI_ALLOCATE_TYPE            Type,
+  IN     EFI_MEMORY_TYPE              MemoryType,
+  IN     UINTN                        Pages,
+  IN OUT EFI_PHYSICAL_ADDRESS         *Memory
+)
+{
+    /*
+     * Broken grub versions do LoaderData allocations for code. Let's patch
+     * them into LoaderCode allocations instead.
+     */
+    if (MemoryType == EfiLoaderData)
+        MemoryType = EfiLoaderCode;
+
+    return RealAllocatePages(Type, MemoryType, Pages, Memory);
+}
+
 /**
   Do the platform specific action after the console is ready
   Possible things that can be done in PlatformBootManagerAfterConsole:
@@ -964,6 +983,14 @@ PlatformBootManagerAfterConsole (
   SetBootOrderFromQemu ();

   PlatformBmPrintScRegisterHandler ();
+
+  /* TODO: Only run this as part of a notify callback in ImageLoad() when we
+           load a grub binary with a known-broken hash */
+  BOOLEAN is_broken_grub = TRUE;
+  if (is_broken_grub) {
+    RealAllocatePages = gBS->AllocatePages;
+    gBS->AllocatePages = AllocatePagesForceLoaderCode;
+  }
 }

 /**


Gerd Hoffmann
 

Hi,

To clarify, I mean something like the patch below, but with an additional
callback notification similar to the Emu one in LoadImage(), so that we can
make sure we only enable the quirk when we load a known-bad grub binary.
That way we still force distros to ship fixed versions of their code, but
enable old code to continue running.
+  /* TODO: Only run this as part of a notify callback in ImageLoad() when
we
+           load a grub binary with a known-broken hash */
+  BOOLEAN is_broken_grub = TRUE;
+  if (is_broken_grub) {
+    RealAllocatePages = gBS->AllocatePages;
+    gBS->AllocatePages = AllocatePagesForceLoaderCode;
+  }
You left out the hard part, which is the list of hashes. And I suspect
you underestimate the number of broken grub binaries in the wild ...

take care,
Gerd


Alexander Graf
 

Am 05.01.2023 um 09:11 schrieb Gerd Hoffmann <kraxel@...>:

 Hi,

To clarify, I mean something like the patch below, but with an additional
callback notification similar to the Emu one in LoadImage(), so that we can
make sure we only enable the quirk when we load a known-bad grub binary.
That way we still force distros to ship fixed versions of their code, but
enable old code to continue running.
+ /* TODO: Only run this as part of a notify callback in ImageLoad() when
we
+ load a grub binary with a known-broken hash */
+ BOOLEAN is_broken_grub = TRUE;
+ if (is_broken_grub) {
+ RealAllocatePages = gBS->AllocatePages;
+ gBS->AllocatePages = AllocatePagesForceLoaderCode;
+ }
You left out the hard part, which is the list of hashes.
Yes, I'd crowd source that list. If someone has vested interest to keep their old grub binaries working, they can send an upstream patch to add their hash :). At least we'd have a path forward to make things work that is not "revert NX enablement".

And I suspect
you underestimate the number of broken grub binaries in the wild ...
What number would you expect? I'd hope that we get to <100 realistically.

I'm happy to hear about alternatives to this approach. I'm very confident that forcing NX on always is going to have the opposite effect of what we want: Everyone who ships AAVMF binaries will disable NX because they eventually get bug reports from users that their shiny update regressed some legit use case.

The only alternative I can think of would be logic similar to the patch I sent without any grub hash check: Exclude AllocatePages for LoaderData from the NX logic. Keep NX for any other non-code memory type as well as LoaderData pool allocations.


Alex


Ard Biesheuvel
 

On Thu, 5 Jan 2023 at 09:43, Alexander Graf <agraf@...> wrote:



Am 05.01.2023 um 09:11 schrieb Gerd Hoffmann <kraxel@...>:

 Hi,

To clarify, I mean something like the patch below, but with an additional
callback notification similar to the Emu one in LoadImage(), so that we can
make sure we only enable the quirk when we load a known-bad grub binary.
That way we still force distros to ship fixed versions of their code, but
enable old code to continue running.
+ /* TODO: Only run this as part of a notify callback in ImageLoad() when
we
+ load a grub binary with a known-broken hash */
+ BOOLEAN is_broken_grub = TRUE;
+ if (is_broken_grub) {
+ RealAllocatePages = gBS->AllocatePages;
+ gBS->AllocatePages = AllocatePagesForceLoaderCode;
+ }
You left out the hard part, which is the list of hashes.
Yes, I'd crowd source that list. If someone has vested interest to keep their old grub binaries working, they can send an upstream patch to add their hash :). At least we'd have a path forward to make things work that is not "revert NX enablement".

And I suspect
you underestimate the number of broken grub binaries in the wild ...
What number would you expect? I'd hope that we get to <100 realistically.

I'm happy to hear about alternatives to this approach. I'm very confident that forcing NX on always is going to have the opposite effect of what we want: Everyone who ships AAVMF binaries will disable NX because they eventually get bug reports from users that their shiny update regressed some legit use case.

The only alternative I can think of would be logic similar to the patch I sent without any grub hash check: Exclude AllocatePages for LoaderData from the NX logic. Keep NX for any other non-code memory type as well as LoaderData pool allocations.
Another thing we might consider is trapping exec permission violations
and switching the pages in question from rw- to r-x.

Does GRUB generally load/map executable modules at page granularity?


Gerd Hoffmann
 

Hi,

What number would you expect? I'd hope that we get to <100 realistically.

I'm happy to hear about alternatives to this approach. I'm very confident that forcing NX on always is going to have the opposite effect of what we want: Everyone who ships AAVMF binaries will disable NX because they eventually get bug reports from users that their shiny update regressed some legit use case.

The only alternative I can think of would be logic similar to the patch I sent without any grub hash check: Exclude AllocatePages for LoaderData from the NX logic. Keep NX for any other non-code memory type as well as LoaderData pool allocations.
Another thing we might consider is trapping exec permission violations
and switching the pages in question from rw- to r-x.
That sounds neat, especially as we can print a big'n'fat warning in that
case, so the problem gets attention without actually breaking users.

Looking at the efi calls it looks like edk2 doesn't track the owner of
an allocation (say by image handle), so I suspect it is not possible to
automatically figure who is to blame?

Does GRUB generally load/map executable modules at page granularity?
I don't think so, at least the code handles modules not being page
aligned. But I think it's not grub modules, that fix was actually
picked up meanwhile. But there are downstream patches for image
loader code which look suspicious to me ...

take care,
Gerd


Ard Biesheuvel
 

On Thu, 5 Jan 2023 at 12:19, Gerd Hoffmann <kraxel@...> wrote:

Hi,

What number would you expect? I'd hope that we get to <100 realistically.

I'm happy to hear about alternatives to this approach. I'm very confident that forcing NX on always is going to have the opposite effect of what we want: Everyone who ships AAVMF binaries will disable NX because they eventually get bug reports from users that their shiny update regressed some legit use case.

The only alternative I can think of would be logic similar to the patch I sent without any grub hash check: Exclude AllocatePages for LoaderData from the NX logic. Keep NX for any other non-code memory type as well as LoaderData pool allocations.
Another thing we might consider is trapping exec permission violations
and switching the pages in question from rw- to r-x.
That sounds neat, especially as we can print a big'n'fat warning in that
case, so the problem gets attention without actually breaking users.
That, and a sleep(5)

Looking at the efi calls it looks like edk2 doesn't track the owner of
an allocation (say by image handle), so I suspect it is not possible to
automatically figure who is to blame?

Does GRUB generally load/map executable modules at page granularity?
I don't think so, at least the code handles modules not being page
aligned. But I think it's not grub modules, that fix was actually
picked up meanwhile. But there are downstream patches for image
loader code which look suspicious to me ...
OK, so the GRUB PE/COFF loader strikes again :-(

So shim may be broken in the exact same way, and the things shim loads
may not adhere to page alignment for the sections. Loading the kernel
itself this way should be fine, though - we always had section
alignment in the EFI stub kernel.

The downside of this approach is that it can only be done on a
page-by-page basis, given that the EFI_LOADER_DATA region in question
will cover both the .text/.rodata and the .data/.bss of the loaded
image.

Could someone check/confirm whether shim builds need to be take into
account here? Thanks.


Gerd Hoffmann
 

Hi,

That sounds neat, especially as we can print a big'n'fat warning in that
case, so the problem gets attention without actually breaking users.
That, and a sleep(5)

Looking at the efi calls it looks like edk2 doesn't track the owner of
an allocation (say by image handle), so I suspect it is not possible to
automatically figure who is to blame?

Does GRUB generally load/map executable modules at page granularity?
I don't think so, at least the code handles modules not being page
aligned. But I think it's not grub modules, that fix was actually
picked up meanwhile. But there are downstream patches for image
loader code which look suspicious to me ...
OK, so the GRUB PE/COFF loader strikes again :-(
Yep.

Could someone check/confirm whether shim builds need to be take into
account here? Thanks.
Tried booting grub.efi directly and via shim.efi, on Fedora 37 GA.

In both cases I get a pagefault on linux kernel boot (before any other
message is printed), which I guess happens because the loader places the
linux kernel efi stub in EfiLoaderData memory.

I'd say that confirms grub.efi being buggy.

Not sure about shim.efi. It managed to run grub.efi without hitting a
fault, which is good. But it also installs efi protocols for the boot
loader to call, so it could be involved too. But maybe that happens
only in case secure boot is active, which is not supported by
ArmVirtPkg.

take care,
Gerd


Gerd Hoffmann
 

Hi,

Tried booting grub.efi directly and via shim.efi, on Fedora 37 GA.

In both cases I get a pagefault on linux kernel boot (before any other
message is printed), which I guess happens because the loader places the
linux kernel efi stub in EfiLoaderData memory.
When compiling ovmf with the same pcd value I get the same behavior
on x64, so it's consistently buggy across architectures.

take care,
Gerd