Managing GCC Assembly Code Size (AArch64)


Cohen, Eugene <eugene@...>
 

Ard and Leif,

I've been too backlogged to provide a real patchset at this point but wanted to get your approval on this proposal...


As you know, we have some code-size-sensitive, uncompressed XIP stuff going on. For C code we get dead code stripping thanks to the "-ffunction-sections" switch, which places each function in its own section so the linker can strip unreferenced sections.

For assembly there is no solution that's as easy. For RVCT we handled this with an assembler macro that combined the procedure label definition, the export of global symbols, and placement of the procedure in its own section. For GCC I haven't found a way to fully do this because we rely on the C preprocessor for assembly, which means a macro cannot expand to multiple lines. (The label and assembler directives require their own lines, but the preprocessor collapses everything onto one line because newlines don't matter in the C language.)

So the solution I've settled on is to do this:

In MdePkg\Include\AArch64\ProcessorBind.h, define:

/// Macro to place a function in its own section for dead code elimination.
/// This must be placed directly before the corresponding code since the
/// .section directive applies to the code that follows it.
#define GCC_ASM_EXPORT_SECTION(func__) \
  .global _CONCATENATE (__USER_LABEL_PREFIX__, func__) ;\
  .section .text._CONCATENATE (__USER_LABEL_PREFIX__, func__) ;\
  .type ASM_PFX(func__), %function;

This has the effect of placing the function in a section called .text.<func__> so the linker can do its dead code stripping. It also takes care of making the symbol globally visible, so the corresponding GCC_ASM_EXPORT statement can be removed.

Then, for every single assembly procedure, change from this:

[top of file]
GCC_ASM_EXPORT (ArmInvalidateDataCacheEntryByMVA)

[lower down]
ASM_PFX(ArmInvalidateDataCacheEntryByMVA):
dc ivac, x0 // Invalidate single data cache line
ret

to this:

GCC_ASM_EXPORT_SECTION(ArmInvalidateDataCacheEntryByMVA)
ASM_PFX(ArmInvalidateDataCacheEntryByMVA):
dc ivac, x0 // Invalidate single data cache line
ret

Because the assembly label must appear in column 1, I couldn't find a way to use the C preprocessor to absorb it, hence the two lines. If you can find a way to improve on this it would be great.

I'm not sure what impact this might have on other toolchains - can this be translated to CLANG and ARM Compiler?

I'd like to get your OK on this conceptually and then I could upstream some patches that modify the AArch64 *.S files to use this approach. Unfortunately it won't be complete because I only updated the libraries that we use. My hope is that long term all assembly (or at least assembly in libraries) adopts this approach so we are positioned for maximum dead code stripping.

Thanks,

Eugene


Ard Biesheuvel
 

On 4 August 2016 at 20:08, Cohen, Eugene <eugene@hp.com> wrote:
> Because the assembly label must appear in column 1, I couldn't find a way to use the C preprocessor to absorb it, hence the two lines. If you can find a way to improve on this it would be great.
What about GAS macros (.macro / .endm)? I prefer those over cpp macros
in assembler anyway.

> I'm not sure what impact this might have on other toolchains - can this be translated to CLANG and ARM Compiler?
The asm dialect is 99% aligned between CLANG and GNU as, so this
shouldn't be a problem.

> I'd like to get your OK on this conceptually and then I could upstream some patches that modify the AArch64 *.S files to use this approach. Unfortunately it won't be complete because I only updated the libraries that we use. My hope is that long term all assembly (or at least assembly in libraries) adopts this approach so we are positioned for maximum dead code stripping.
I think this would be an improvement, so go for it. The only thing to
be wary of is routines that fall through into the subsequent one.
Those need to remain in the same section.


Ard Biesheuvel
 

On 4 August 2016 at 21:18, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> What about GAS macros (.macro / .endm)? I prefer those over cpp macros
> in assembler anyway.
FYI there is a null token \() for GAS which you can use to concatenate
a string with a macro argument, e.g.,

.macro func, x
.globl \x
.type \x, %function
.section .text.\x
\x\():
.endm




Cohen, Eugene <eugene@...>
 

Ard, as usual you rock...

> FYI there is a null token \() for GAS which you can use to concatenate
> a string with a macro argument, e.g.,
>
> .macro func, x
> .globl \x
> .type \x, %function
> .section .text.\x
> \x\():
> .endm
Using the GAS .macro syntax this all collapses nicely. I tested it with one assembly function and all the right stuff happens.

So the request becomes: can we modify all of the assembly (at least AArch64, please) to use this? How would you like to phase this in?

> I think this would be an improvement, so go for it. The only thing to
> be wary of is routines that fall through into the subsequent one.
> Those need to remain in the same section.
Yes, I've accidentally modified these with disastrous results. I now know to stay away from them (ExceptionSupport.S in particular). :)

Thanks,

Eugene

-----Original Message-----
From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
Sent: Thursday, August 04, 2016 1:47 PM
To: Cohen, Eugene <eugene@hp.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>; edk2-devel@lists.01.org
Subject: Re: Managing GCC Assembly Code Size (AArch64)
