How to read UTF8 files?


Konstantin Aladyshev
 

Hello!
What is the best way to handle files encoded in UTF-8?
I'm looking for ways to read strings from such files, print these
strings, or compare them to my own CHAR16* strings.

For example, suppose I have read a UTF-8 string into a buffer via a `ShellReadFile` call:
    EFI_STATUS
    EFIAPI
    ShellReadFile (
      IN     SHELL_FILE_HANDLE  FileHandle,
      IN OUT UINTN              *ReadSize,
      OUT    VOID               *Buffer
      );
How can I print this string? The Print function only has format
options for ASCII (%a) or UTF-16 (%s) strings.

Best regards,
Konstantin Aladyshev


Andrew Fish
 

On Jul 7, 2021, at 3:01 AM, Konstantin Aladyshev <aladyshev22@...> wrote:

> Hello!
> What is the best way to handle files encoded in UTF8?

Konstantin,

You need to deserialize the UTF-8 to Unicode and then serialize that to UCS-2 (CHAR16).

The Terminal driver has support for UTF-8 terminals so you can probably leverage some of that code [1]. The Terminal code is probably a little more complex than you need since it has to deal with the data coming in a byte at a time over serial, but the conversion logic is what you are looking for.

[1] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Universal/Console/TerminalDxe/Vtutf8.c#L19
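
As a rough illustration of the decode-then-encode step described above, here is a minimal sketch in plain C. It is not taken from the Terminal driver: `Utf8ToUcs2` is a hypothetical helper name, `uint16_t` stands in for `CHAR16` so the snippet compiles outside EDK2, and the choice to substitute U+FFFD for malformed input or for code points that do not fit in UCS-2 is an assumption, not EDK2 behavior.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an EDK2 API). uint16_t stands in for CHAR16.
 *
 * Decodes Utf8[0..Utf8Len) and writes a NUL-terminated UCS-2 string to Out,
 * which has room for OutCap elements including the terminator.
 * Returns the number of UCS-2 characters written (terminator excluded),
 * or -1 if Out is too small.
 * Malformed sequences, overlong encodings, surrogates, and code points
 * above U+FFFF (not representable in UCS-2) are replaced with U+FFFD.
 */
long Utf8ToUcs2(const uint8_t *Utf8, size_t Utf8Len,
                uint16_t *Out, size_t OutCap)
{
  size_t   i = 0;   /* read position in Utf8 */
  size_t   n = 0;   /* write position in Out */
  size_t   k;
  size_t   extra;   /* continuation bytes expected after the lead byte */
  uint32_t cp;      /* decoded Unicode code point */
  uint8_t  b;
  int      bad;

  if (OutCap == 0) {
    return -1;
  }

  while (i < Utf8Len) {
    b = Utf8[i++];

    /* The lead byte tells us how many continuation bytes follow. */
    if (b < 0x80)                { cp = b;        extra = 0; }
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
    else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
    else                         { cp = 0xFFFD;   extra = 0; } /* invalid lead */

    /* Each continuation byte is 10xxxxxx and contributes 6 bits. */
    bad = 0;
    for (k = 0; k < extra; k++) {
      if (i >= Utf8Len || (Utf8[i] & 0xC0) != 0x80) {
        bad = 1;
        break;
      }
      cp = (cp << 6) | (uint32_t)(Utf8[i++] & 0x3F);
    }

    /* 4-byte sequences never fit in UCS-2, so they are always replaced. */
    if (bad || extra == 3 ||
        (extra == 1 && cp < 0x80) ||
        (extra == 2 && cp < 0x800) ||
        (cp >= 0xD800 && cp <= 0xDFFF)) {
      cp = 0xFFFD;
    }

    if (n + 1 >= OutCap) {
      return -1;   /* no room for this character plus the terminator */
    }
    Out[n++] = (uint16_t)cp;
  }

  Out[n] = 0;
  return (long)n;
}
```

In an EDK2 application the output type would be `CHAR16`, so the converted buffer could then be passed to `Print` with `%s` or compared against another `CHAR16*` string with `StrCmp`. Note that a file starting with a UTF-8 byte order mark (EF BB BF) would yield U+FEFF as the first character under this sketch, so a caller may want to skip it.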

Thanks,

Andrew Fish
