[PATCH] BaseTools: fix decoding issue in file operation


Wang, Jian J
 

The build tool reports a failure on file read, for example when calling
trim to clean preprocessed source files, if the tool is running on an OS
with a non-western code page and the source file has non-ascii characters.

Even utf-8 has problems when it encounters some characters encoded in
cp1252 (such as 0x92, 0x96, 0xa0, etc.).

Currently, the safest way to read a file in python code is to use
'latin-1' (iso-8859-1), because it maps every byte value between 00-FF
to a character and so cannot cause an encoding/decoding error. It
behaves almost the same as reading the file in binary mode.
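
As a rough illustration of the two paragraphs above, the following sketch
feeds the cp1252 byte values mentioned earlier to the utf-8 decoder and then
round-trips all 256 byte values through latin-1. The variable names are
invented for this example and are not part of BaseTools.

```python
# Byte values that are valid cp1252 characters but not valid utf-8 on their own.
cp1252_bytes = bytes([0x92, 0x96, 0xA0])

try:
    cp1252_bytes.decode('utf-8')
    utf8_decodes = True
except UnicodeDecodeError:
    # 0x92 is a utf-8 continuation byte and cannot start a character.
    utf8_decodes = False

# latin-1 maps byte N to code point N, so every byte string round-trips.
all_bytes = bytes(range(0x100))
round_trip = all_bytes.decode('latin-1').encode('latin-1')

print("utf-8 decodes cp1252 bytes:", utf8_decodes)
print("latin-1 round trip is lossless:", round_trip == all_bytes)
```

The text produced by a latin-1 decode is not necessarily what the author of
the file meant to write, but no byte is lost, which is what the build tool
needs here.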

cp1252 is similar to latin-1, but it cannot encode the characters '\x80'
to '\x9f' and cannot decode the following bytes:

  '\x81', '\x8d', '\x8f', '\x90', '\x9d'

So if a file contains utf-8/16 encoded characters, reading it as cp1252
will sometimes fail.

Refer to the following links for details:
https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)
https://en.wikipedia.org/wiki/Windows-1252
https://kb.iu.edu/d/aepu
https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html

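One can use the following python code to verify this.

  # Report every code point in 00-FF that latin-1 cannot encode.
  for i in range(0x100):
      try:
          chr(i).encode('latin-1')
      except UnicodeEncodeError:
          print(" %s cannot encode %02x" % ('latin-1', i))

  # Report every byte value in 00-FF that latin-1 cannot decode.
  for i in range(0x100):
      try:
          b = bytes([i])
          b.decode('latin-1')
      except UnicodeDecodeError:
          print(" %s cannot decode %02x" % ('latin-1', i))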
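
The same check can be run against other codecs for comparison. The sketch
below is only a generalization of the loops above; the helper names
undecodable_bytes() and unencodable_code_points() are made up for this
example and do not exist in BaseTools.

```python
def undecodable_bytes(codec):
    """Return the byte values in 00-FF that `codec` cannot decode."""
    bad = []
    for i in range(0x100):
        try:
            bytes([i]).decode(codec)
        except UnicodeDecodeError:
            bad.append(i)
    return bad


def unencodable_code_points(codec):
    """Return the code points in 00-FF that `codec` cannot encode."""
    bad = []
    for i in range(0x100):
        try:
            chr(i).encode(codec)
        except UnicodeEncodeError:
            bad.append(i)
    return bad


for codec in ('latin-1', 'cp1252'):
    print(codec, "cannot decode:", ["%02x" % i for i in undecodable_bytes(codec)])
    print(codec, "cannot encode:", ["%02x" % i for i in unencodable_code_points(codec)])
```

Both lists come out empty for 'latin-1', while 'cp1252' reports gaps in each
direction.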

This patch adds code to enforce 'latin-1' as the encoding argument of
open() in function OpenLongFilePath() whenever the open mode is a text
mode. Binary modes are left unchanged. This can solve the file decoding
issue completely.
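
A minimal sketch of the resulting behavior, assuming a simplified stand-in
for the patched function: open_text_as_latin1() is a hypothetical name, and
the LongFilePath() wrapping done by the real OpenLongFilePath() is left out
so that the example runs on its own.

```python
import os
import tempfile


def open_text_as_latin1(file_name, mode='r', buffering=-1):
    # Hypothetical stand-in for the patched OpenLongFilePath(): same
    # encoding selection, but without the LongFilePath() wrapping.
    encoding = None if 'b' in mode else 'latin-1'
    return open(file_name, mode, buffering, encoding)


# Bytes that cp1252 (0x81) or utf-8 (0x92, 0xa0) would reject when decoding.
raw = b'caf\x92 \x81 \xa0'

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as f:
        f.write(raw)
    with open_text_as_latin1(path) as f:        # text mode: latin-1 enforced
        text = f.read()
    with open_text_as_latin1(path, 'rb') as f:  # binary mode: no encoding
        binary = f.read()
finally:
    os.remove(path)

print(text.encode('latin-1') == raw, binary == raw)
```

Text mode still applies newline translation, which is one way in which it
differs from a binary read.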

The possibly related BZs:
https://bugzilla.tianocore.org/show_bug.cgi?id=1434
https://bugzilla.tianocore.org/show_bug.cgi?id=1637
https://bugzilla.tianocore.org/show_bug.cgi?id=2578
https://bugzilla.tianocore.org/show_bug.cgi?id=2709
https://bugzilla.tianocore.org/show_bug.cgi?id=2829

Cc: Bob Feng <bob.c.feng@...>
Cc: Liming Gao <gaoliming@...>
Cc: Yuwei Chen <yuwei.chen@...>
Signed-off-by: Jian J Wang <jian.j.wang@...>
---
BaseTools/Source/Python/Common/LongFilePathSupport.py | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/BaseTools/Source/Python/Common/LongFilePathSupport.py b/BaseTools/Source/Python/Common/LongFilePathSupport.py
index 38c4396544..c8dce077f2 100644
--- a/BaseTools/Source/Python/Common/LongFilePathSupport.py
+++ b/BaseTools/Source/Python/Common/LongFilePathSupport.py
@@ -30,7 +30,8 @@ def LongFilePath(FileName):
 # wrap open to support opening a long file path
 #
 def OpenLongFilePath(FileName, Mode='r', Buffer= -1):
-    return open(LongFilePath(FileName), Mode, Buffer)
+    Encoding = None if 'b' in Mode else 'latin-1'
+    return open(LongFilePath(FileName), Mode, Buffer, Encoding)
 
 def CodecOpenLongFilePath(Filename, Mode='rb', Encoding=None, Errors='strict', Buffering=1):
     return codecs.open(LongFilePath(Filename), Mode, Encoding, Errors, Buffering)
-- 
2.24.0.windows.2
