Re: [PATCH 1/2] BaseTools/GenFw AARCH64: convert ADRP to ADR if binary size allows it


Leif Lindholm <leif.lindholm@...>
 

Apologies, lost track of this one.

On Mon, Aug 01, 2016 at 01:53:09PM +0200, Ard Biesheuvel wrote:
On 27 July 2016 at 13:26, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
The ADRP instruction in the AArch64 ISA requires the link-time and load-time
offsets of a binary to be equal modulo 4 KB. The reason is that this
instruction always produces a 4 KB-aligned address (the base of the page
containing the symbol) and relies on a subsequent ADD or LDR instruction to
set the offset into the page. The resulting
symbol reference only produces the correct value if the symbol in question
resides at that exact offset into the page, and so loading the binary at
arbitrary offsets is not possible.
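To make the alignment constraint concrete, here is a small C model of how an ADRP/ADD pair that was fixed up at link time behaves when the whole image is later loaded `delta` bytes away from its link-time address. It is a sketch, not GenFw code: the function names are hypothetical, and it assumes 4 KB pages and an ADD that carries the symbol's offset into its page (the low 12 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Base of the 4 KB page containing address a. */
static uint64_t page_of(uint64_t a) {
  return a & ~(uint64_t)0xfff;
}

/*
 * Hypothetical model: an ADRP/ADD pair referencing 'sym' from an ADRP at
 * link-time address 'pc', executed after the image has moved by 'delta'.
 * Both immediates were fixed at link time and do not change on load.
 */
static uint64_t adrp_add_at_runtime(uint64_t pc, uint64_t sym, uint64_t delta) {
  /* ADRP immediate: page distance from the ADRP's page to the symbol's page. */
  int64_t page_delta = (int64_t)(page_of(sym) - page_of(pc));
  /* ADD immediate: the symbol's offset into its page. */
  uint64_t lo12 = sym & 0xfff;
  /* At run time, ADRP starts from the page of the *runtime* PC. */
  uint64_t run_pc = pc + delta;
  return page_of(run_pc) + (uint64_t)page_delta + lo12;
}
```

With `pc = 0x1010` and a symbol at `0x2040`, a displacement of `0x3000` yields `0x5040`, the correct runtime address, whereas a displacement of `0x800` still yields `0x2040` instead of `0x2840`, because ADRP discards the low 12 bits of the runtime PC.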

Due to the various levels of padding when packing FVs into FVs into FDs,
this alignment is very costly for XIP code, and so we would like to relax
this alignment requirement if possible.

Given that symbols that are sufficiently close to the reference (within
+/- 1 MB) can also be reached using an ADR instruction, which does not
suffer from this alignment issue, let's replace ADRP instructions with ADR
after linking if the offset can be encoded in this instruction's immediate
field. Note that this only makes sense if the section alignment is < 4 KB.
Otherwise, replacing the ADRP has no benefit, considering that the
subsequent ADD or LDR instruction is retained, and that micro-architectures
are more likely to be optimized for ADRP/ADD pairs (e.g., via micro-op
fusion) than for ADR/ADD pairs, which are atypical.
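The rewrite can be sketched as below, assuming the ADRP's link-time address (`pc`) is known and that the ADR should produce the same page base the ADRP would have, so the following ADD or LDR, which supplies the offset into the page, is left untouched. The helpers `is_adrp`, `decode_imm21` and `adrp_to_adr` are hypothetical names for illustration, not the code from the patch. Encodings follow the A64 ADR/ADRP format: a signed 21-bit immediate split into immlo (bits 30:29) and immhi (bits 23:5), scaled by 4 KB for ADRP and by 1 byte for ADR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ADRP: op (bit 31) = 1 and bits 28:24 = 0b10000. */
static bool is_adrp(uint32_t insn) {
  return (insn & 0x9f000000u) == 0x90000000u;
}

/* Decode the signed 21-bit immhi:immlo field shared by ADR and ADRP. */
static int64_t decode_imm21(uint32_t insn) {
  uint32_t immlo = (insn >> 29) & 0x3u;
  uint32_t immhi = (insn >> 5) & 0x7ffffu;
  int64_t imm = (int64_t)((immhi << 2) | immlo);
  if (imm & 0x100000)
    imm -= 0x200000; /* sign-extend from 21 bits */
  return imm;
}

/*
 * Hypothetical helper: rewrite the ADRP at link-time address 'pc' into an
 * ADR producing the same page base. Returns false and leaves *insn
 * unchanged if it is not an ADRP or the page is outside ADR's +/- 1 MB.
 */
static bool adrp_to_adr(uint32_t *insn, uint64_t pc) {
  if (!is_adrp(*insn))
    return false;

  /* ADRP yields page_of(pc) + imm21 * 4 KB. */
  int64_t page_off = decode_imm21(*insn) * 4096;
  uint64_t target_page = (pc & ~(uint64_t)0xfff) + (uint64_t)page_off;

  /* ADR's offset is relative to the instruction itself, not its page. */
  int64_t off = (int64_t)(target_page - pc);
  if (off < -0x100000 || off > 0xfffff)
    return false;

  uint32_t imm = (uint32_t)off & 0x1fffffu;
  uint32_t rd = *insn & 0x1fu; /* keep the destination register */
  *insn = 0x10000000u | ((imm & 0x3u) << 29) | ((imm >> 2) << 5) | rd;

  /* Sanity check: the result must now decode as an ADR (op = 0). */
  assert((*insn & 0x9f000000u) == 0x10000000u);
  return true;
}
```

For example, `0xB0000000` (ADRP x0 to the next page) at `pc = 0x1010` becomes `0x10007F80`, an ADR with offset `0xFF0`. Because the ADR offset is PC-relative at byte granularity, the rewritten pair yields the correct address at any load offset, which is what allows the section alignment to drop below 4 KB.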

Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
@Liming, @Leif:

Are there any objections to these patches? I know it is unfortunate
that we need to modify instructions as part of the ELF to PE/COFF
conversion, but it is very effective.
It's absolutely horrid, but extremely useful.
For the series:
Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>

ArmVirtQemu-AARCH64 built with CLANG35:

Before:

FVMAIN_COMPACT [41%Full] 2093056 total, 868416 used, 1224640 free
FVMAIN [99%Full] 4848064 total, 4848008 used, 56 free

After:

FVMAIN_COMPACT [36%Full] 2093056 total, 768064 used, 1324992 free
FVMAIN [99%Full] 4848064 total, 4848008 used, 56 free

For comparison, GCC49:

FVMAIN_COMPACT [35%Full] 2093056 total, 749960 used, 1343096 free
FVMAIN [99%Full] 3929088 total, 3929032 used, 56 free

and GCC5 (with LTO):

FVMAIN_COMPACT [34%Full] 2093056 total, 732400 used, 1360656 free
FVMAIN [99%Full] 3730240 total, 3730216 used, 24 free
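As a quick check on the FVMAIN_COMPACT figures above, the deltas can be worked out directly. This is only arithmetic on the byte counts quoted in this message; the enum names are hypothetical labels for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* FVMAIN_COMPACT byte counts copied from the build output above. */
enum {
  FV_TOTAL       = 2093056,
  CLANG35_BEFORE = 868416,
  CLANG35_AFTER  = 768064,
  GCC49_USED     = 749960,
  GCC5_LTO_USED  = 732400,
};

/* Whole-percent fill level, truncated by integer division. */
static uint32_t percent_full(uint32_t used, uint32_t total) {
  return (uint32_t)((uint64_t)used * 100 / total);
}
```

This gives a saving of exactly 100352 bytes (98 KiB) for CLANG35, leaving it 18104 bytes behind GCC49 and 35664 bytes behind GCC5 with LTO. The reported 41% and 36% fill levels are consistent with the percentage being truncated rather than rounded (768064 / 2093056 is about 36.7%).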

In other words, it turns CLANG35 from a pathetic outlier into
something usable :-)

Regards,
Ard.
