Managing GCC Assembly Code Size (AArch64)

Cohen, Eugene <eugene@...>

Ard and Leif,

I've been too backlogged to provide a real patchset at this point but wanted to get your approval on this proposal...

As you know we have some code size sensitive uncompressed XIP stuff going on. For C code we get dead code stripping thanks to the "-ffunction-sections" switch which places each function in its own section so the linker can strip unreferenced sections.
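To make the C side concrete, here is a minimal sketch of what "-ffunction-sections" buys us. The function names are made up for illustration, and the build commands in the comment assume a GNU toolchain; nothing here is taken from the actual tree.

```c
/*
 * Hypothetical example, not code from the tree.
 *
 * Assumed build steps with a GNU toolchain:
 *   gcc -ffunction-sections -c demo.c     (one section per function)
 *   objdump -h demo.o                     (lists the resulting sections)
 * and, at link time, pass --gc-sections to the linker so that
 * unreferenced sections can be discarded.
 */

/* With -ffunction-sections this lands in section .text.used_helper. */
int used_helper(int x)
{
    return x * 2;
}

/*
 * This lands in section .text.unused_helper. If nothing references it,
 * the linker is free to drop the whole section.
 */
int unused_helper(int x)
{
    return x + 100;
}
```

Without the switch both functions would share a single .text section, and the linker could only keep or drop them together.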

For assembly there is no equally easy solution. For RVCT we handled this with an assembler macro that combined the procedure label definition, the export of the global symbol and the placement of the procedure in its own section. For GCC I haven't found a way to do all of this, because we rely on the C preprocessor for assembly, which means a macro cannot expand to multiple lines. (The label and the assembler directives each require their own line, but the preprocessor collapses a macro body onto one line because newlines don't matter in the C language.)
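The collapsing behaviour can be seen from plain C by stringifying a macro whose definition spans two source lines. This is only a sketch with made-up macro and function names; it shows the preprocessor rule, not our assembly sources.

```c
#include <string.h>

/* Two-level stringification so the argument is macro-expanded first. */
#define STRINGIFY_(x) #x
#define STRINGIFY(x)  STRINGIFY_(x)

/*
 * Hypothetical stand-in for an assembler macro that "wants" two lines.
 * The backslash-newline is removed before the macro is even defined,
 * so the body is a single logical line.
 */
#define TWO_STATEMENTS  first_statement \
                        second_statement

/* Returns the macro body as the preprocessor sees it. */
const char *expanded_text(void)
{
    return STRINGIFY(TWO_STATEMENTS);
}

/* Returns nonzero if the expansion contains a newline character. */
int expansion_has_newline(void)
{
    return strchr(expanded_text(), '\n') != NULL;
}
```

The two tokens come out separated by a single space rather than a line break, which is why a directive and a label cannot both be produced on their own lines by one macro.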

So the solution I've settled on is to do this:

in MdePkg\Include\AArch64\ProcessorBind.h define:

/// Macro to place a function in its own section for dead code elimination
/// This must be placed directly before the corresponding code since the
/// .section directive applies to the code that follows it.
#define GCC_ASM_EXPORT_SECTION(func__) \
.global _CONCATENATE (__USER_LABEL_PREFIX__, func__) ;\
.section .text._CONCATENATE (__USER_LABEL_PREFIX__, func__) ;\
.type ASM_PFX(func__), %function;

This has the effect of placing the function in a section called .text.<func__> so the linker can do its dead code stripping. It also takes over making the symbol globally visible, so the corresponding GCC_ASM_EXPORT statement can be removed.

then for every single assembly procedure change from this:

[top of file]
GCC_ASM_EXPORT (ArmInvalidateDataCacheEntryByMVA)

[lower down]
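ASM_PFX(ArmInvalidateDataCacheEntryByMVA):
  dc ivac, x0 // Invalidate single data cache line

to this:

GCC_ASM_EXPORT_SECTION(ArmInvalidateDataCacheEntryByMVA)
ASM_PFX(ArmInvalidateDataCacheEntryByMVA):
  dc ivac, x0 // Invalidate single data cache line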

Because the assembly label must appear in column 1, I couldn't find a way to have the C preprocessor absorb it, hence the two lines. If you can find a way to improve on this, that would be great.

I'm not sure what impact this might have on other toolchains - can this be translated to CLANG and ARM Compiler?

I'd like to get your OK on this conceptually, and then I could upstream some patches that modify the AArch64 *.S files to use this approach. Unfortunately the conversion won't be complete, because I have only updated the libraries that we use. My hope is that in the long term all assembly (or at least assembly in libraries) adopts this approach so we are positioned for maximum dead code stripping.


