Re: How to read UTF8 files?


Andrew Fish
 

On Jul 7, 2021, at 3:01 AM, Konstantin Aladyshev <aladyshev22@gmail.com> wrote:

Hello!
What is the best way to handle files encoded in UTF8?

Konstantin,

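You need to decode the UTF-8 bytes into Unicode code points and then encode those code points as UCS-2 (CHAR16). Keep in mind that UCS-2 cannot represent code points above U+FFFF, so such characters need a substitute.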

The Terminal driver has support for UTF-8 terminals so you can probably leverage some of that code [1]. The Terminal code is probably a little more complex than you need since it has to deal with the data coming in a byte at a time over serial, but the conversion logic is what you are looking for.

[1] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Universal/Console/TerminalDxe/Vtutf8.c#L19
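
To make the decode-then-encode step concrete, here is a minimal sketch in plain C. It is not the TerminalDxe code and not an EDK II API: `Utf8ToUcs2` and `CHAR16_T` are hypothetical names, with `CHAR16_T` standing in for CHAR16 so the example builds outside the firmware tree. It assumes the whole file is already in a buffer (for example after a `ShellReadFile` call), and it replaces malformed sequences and anything that does not fit in UCS-2 with U+FFFD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for EDK II's CHAR16 (assumption for this sketch). */
typedef uint16_t CHAR16_T;

/*
 * Hypothetical helper: convert SrcLen bytes of UTF-8 into a NUL-terminated
 * UCS-2 string of at most DstMax units (terminator included).
 * Malformed input, surrogates, and code points above U+FFFF become U+FFFD.
 * Returns the number of units written, not counting the terminator.
 */
size_t Utf8ToUcs2(const uint8_t *Src, size_t SrcLen,
                  CHAR16_T *Dst, size_t DstMax)
{
  size_t In = 0;
  size_t Out = 0;

  assert(Dst != NULL && DstMax > 0);

  while (In < SrcLen && Out + 1 < DstMax) {
    uint8_t  Lead = Src[In++];
    uint32_t Cp;    /* decoded code point */
    uint32_t Min;   /* smallest value allowed for this length (rejects overlongs) */
    int      Extra; /* continuation bytes still expected */

    if (Lead < 0x80)                { Cp = Lead;        Extra = 0; Min = 0; }
    else if ((Lead & 0xE0) == 0xC0) { Cp = Lead & 0x1F; Extra = 1; Min = 0x80; }
    else if ((Lead & 0xF0) == 0xE0) { Cp = Lead & 0x0F; Extra = 2; Min = 0x800; }
    else if ((Lead & 0xF8) == 0xF0) { Cp = Lead & 0x07; Extra = 3; Min = 0x10000; }
    else                            { Cp = 0xFFFD;      Extra = 0; Min = 0; }

    /* Fold in continuation bytes of the form 10xxxxxx. */
    while (Extra > 0) {
      if (In >= SrcLen || (Src[In] & 0xC0) != 0x80) {
        Cp = 0xFFFD;  /* truncated or malformed sequence */
        Min = 0;
        break;
      }
      Cp = (Cp << 6) | (uint32_t)(Src[In++] & 0x3F);
      Extra--;
    }

    /* UCS-2 holds only U+0000..U+FFFF, excluding the surrogate range. */
    if (Cp < Min || Cp > 0xFFFF || (Cp >= 0xD800 && Cp <= 0xDFFF)) {
      Cp = 0xFFFD;
    }
    Dst[Out++] = (CHAR16_T)Cp;
  }

  Dst[Out] = 0;
  return Out;
}
```

In an EDK II application the same logic would write into a CHAR16 buffer, which could then presumably be printed with `Print (L"%s\n", Buffer)` or compared against your own strings with `StrCmp`. If the file starts with a UTF-8 byte order mark (EF BB BF), you would likely want to skip those three bytes before converting.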

Thanks,

Andrew Fish

I'm looking for ways to read strings from such files, print these
strings or compare them to my own CHAR16* strings.

For example, if I have read a UTF-8 string into a buffer via a `ShellReadFile` call:
EFI_STATUS
EFIAPI
ShellReadFile(
  IN SHELL_FILE_HANDLE FileHandle,
  IN OUT UINTN *ReadSize,
  OUT VOID *Buffer
  );
How do I print this string? The Print function only has options for ASCII
(%a) or UTF-16 (%s) strings.

Best regards,
Konstantin Aladyshev



