Re: [edk2-rfc] RFC: EXT4 filesystem driver


gaoliming
 

-----Original Message-----
From: rfc@edk2.groups.io <rfc@edk2.groups.io> On Behalf Of Pedro Falcato
Sent: July 22, 2021 7:12
To: devel@edk2.groups.io
Cc: rfc@edk2.groups.io
Subject: [edk2-rfc] RFC: EXT4 filesystem driver

EXT4 (fourth extended filesystem) is a filesystem developed for Linux
that has been in wide use (desktops, servers, smartphones) since 2008.

The Ext4Pkg implements the Simple File System Protocol for a partition
that is formatted with the EXT4 file system. This allows UEFI Drivers,
UEFI Applications, UEFI OS Loaders, and the UEFI Shell to access files
on an EXT4 partition and supports booting a UEFI OS Loader from an
EXT4 partition.
This project is one of the TianoCore Google Summer of Code projects.

Right now, Ext4Pkg only contains a single member, Ext4Dxe, which is a
UEFI driver that consumes Block I/O, Disk I/O and (optionally) Disk
I/O 2 Protocols, and produces the Simple File System protocol. It
allows mounting ext4 filesystems exclusively.

Brief overview of EXT4:
Layout of an EXT2/3/4 filesystem:
(note: this driver has been developed using
https://www.kernel.org/doc/html/latest/filesystems/ext4/index.html as
documentation).

An ext2/3/4 filesystem (here on out referred to as simply an ext4 filesystem,
due to the similarities) is composed of various concepts:

1) Superblock
The superblock is the structure located 1024 bytes from the start of
the partition, and describes the filesystem in general.
Here, we get to know the size of the filesystem's blocks, which features
it supports or not, whether it's been cleanly unmounted, how many blocks
we have, etc.
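
A minimal sketch of probing for the superblock, assuming the partition's
first bytes are already in memory. The 0x38 offset of s_magic and the
0xEF53 magic value come from the kernel documentation; ext4_probe and
read_le16 are hypothetical names for illustration, not Ext4Dxe functions.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT4_SUPERBLOCK_OFFSET 1024u   /* bytes from the start of the partition */
#define EXT4_MAGIC_OFFSET      0x38u   /* offset of s_magic inside the superblock */
#define EXT4_MAGIC             0xEF53u /* s_magic value shared by ext2/3/4 */

/* Read a little-endian 16-bit value regardless of host endianness. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 1 if the buffer (the first bytes of a partition) carries the
 * ext2/3/4 magic in the expected place, 0 otherwise. */
int ext4_probe(const uint8_t *partition, size_t len)
{
    size_t magic_pos = EXT4_SUPERBLOCK_OFFSET + EXT4_MAGIC_OFFSET;

    if (partition == NULL || len < magic_pos + 2) {
        return 0; /* too short to contain the magic field */
    }
    return read_le16(partition + magic_pos) == EXT4_MAGIC;
}
```

A real driver would go on to validate the feature flags and the unmount
state before trusting anything else in the superblock.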

2) Block groups
EXT4 filesystems are divided into block groups, and each block group covers
s_blocks_per_group (8 * Block Size) blocks. Each block group has an
associated block group descriptor; these are present directly after the
superblock. Each block group descriptor contains the location of the
inode table, and the inode and block bitmaps (note these bitmaps are only
a block long, which gets us the 8 * Block Size formula covered previously).
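
The arithmetic behind the 8 * Block Size formula can be sketched as
follows. This is an illustration only: a real driver reads
s_blocks_per_group from the superblock instead of recomputing it, and
the function names here are hypothetical. The group count below also
ignores any offset of the first data block.

```c
#include <stdint.h>

/* A bitmap that is one block long has block_size * 8 bits, and each bit
 * tracks one block, so a group can cover at most that many blocks. */
uint32_t blocks_per_group(uint32_t block_size)
{
    return 8u * block_size;
}

/* Number of block groups needed to cover total_blocks, rounding up so a
 * partial last group is still counted. */
uint64_t block_group_count(uint64_t total_blocks, uint32_t block_size)
{
    uint64_t per_group = blocks_per_group(block_size);

    return (total_blocks + per_group - 1) / per_group;
}
```

For example, with 4096-byte blocks each group covers 32768 blocks, which
is 128 MiB of disk space.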

3) Blocks
The ext4 filesystem is divided into blocks, of size 1024 << s_log_block_size
bytes. Blocks can be allocated using individual block groups' bitmaps. Note
that block 0 is invalid and its presence on extents/block tables means
it's part of a file hole, and that particular location must be read as
a block full of zeros.
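
A small sketch of both rules, assuming the whole filesystem image is
available as one in-memory buffer that is large enough for the requested
block. The helper names are hypothetical and no bounds checking is done.

```c
#include <stdint.h>
#include <string.h>

/* Block size in bytes: 1024 shifted left by s_log_block_size
 * (0 -> 1 KiB, 1 -> 2 KiB, 2 -> 4 KiB, ...). */
uint32_t ext4_block_size(uint32_t s_log_block_size)
{
    return 1024u << s_log_block_size;
}

/* Copy one block of file data into out. A physical block number of 0
 * marks a hole, so the caller gets a block full of zeros instead of
 * whatever is stored at the start of the disk. */
void read_file_block(const uint8_t *disk, uint32_t block_size,
                     uint64_t physical_block, uint8_t *out)
{
    if (physical_block == 0) {
        memset(out, 0, block_size);
        return;
    }
    memcpy(out, disk + physical_block * block_size, block_size);
}
```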

4) Inodes
The ext4 filesystem divides files/directories into inodes (originally
index nodes). Each file/socket/symlink/directory/etc (here on out referred
to as a file, since there is no distinction under the ext4 filesystem) is
stored as a /nameless/ inode, that is stored in some block group's inode
table. Each inode has s_inode_size size (or GOOD_OLD_INODE_SIZE if it's
an old filesystem), and holds various metadata about the file. Since the
largest inode structure right now is ~160 bytes, the rest of the inode
contains inline extended attributes. Inodes' data is stored using either
data blocks (under ext2/3) or extents (under ext4).
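
Finding an inode inside "some block group's inode table" can be sketched
as below. The sketch assumes inode numbers start at 1, as the kernel
documentation describes, and that the caller has already rejected inode
number 0. The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t group;       /* block group whose inode table holds the inode */
    uint32_t index;       /* position of the inode inside that table */
    uint64_t byte_offset; /* offset from the start of that inode table */
} inode_location;

/* Translate an inode number into a position in a block group's inode
 * table. inodes_per_group and inode_size come from the superblock
 * (s_inodes_per_group and s_inode_size). */
inode_location locate_inode(uint32_t inode_number,
                            uint32_t inodes_per_group,
                            uint16_t inode_size)
{
    inode_location loc;

    loc.group = (inode_number - 1) / inodes_per_group;
    loc.index = (inode_number - 1) % inodes_per_group;
    loc.byte_offset = (uint64_t)loc.index * inode_size;
    return loc;
}
```

The group number selects a block group descriptor, which gives the
location of the inode table; byte_offset is then added to the start of
that table.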

5) Extents
Ext4 inodes store data in extents. These let N contiguous logical blocks
that are represented by N contiguous physical blocks be represented by a
single extent structure, which minimizes filesystem metadata bloat and
speeds up block mapping (particularly due to the fact that high-quality
ext4 implementations like linux's try /really/ hard to make the file
contiguous, so it's common to have files with almost 0 fragmentation).
Inodes that use extents store them in a tree, and the top of the tree
is stored on i_data. Each node of the tree always starts with an
EXT4_EXTENT_HEADER and contains EXT4_EXTENT_INDEX entries when
eh_depth != 0 and EXT4_EXTENT entries when eh_depth = 0; these entries
are always sorted by logical block.
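
Because the entries are sorted by logical block, a lookup within one
leaf node can use a binary search. The sketch below uses a simplified,
hypothetical in-memory extent record (not the on-disk EXT4_EXTENT
layout) and returns 0, the hole marker, when no extent covers the
requested block.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified extent: `length` contiguous logical blocks starting at
 * logical_start map to the physical blocks starting at physical_start. */
typedef struct {
    uint32_t logical_start;
    uint16_t length;
    uint64_t physical_start;
} simple_extent;

/* Map a logical block to a physical block using a leaf's sorted extents.
 * Returns 0 if the block falls in a hole. */
uint64_t map_logical_block(const simple_extent *extents, size_t count,
                           uint32_t logical)
{
    size_t lo = 0, hi = count;

    /* Find the first extent whose logical_start is greater than `logical`. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (extents[mid].logical_start <= logical) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo == 0) {
        return 0; /* before the first extent: hole */
    }

    /* The candidate is the last extent starting at or before `logical`. */
    const simple_extent *e = &extents[lo - 1];
    if (logical - e->logical_start >= e->length) {
        return 0; /* past the end of the candidate: hole */
    }
    return e->physical_start + (logical - e->logical_start);
}
```

The same search applies to index nodes, except that the match yields the
block of the next tree node to read instead of a data block.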

6) Directories
Ext4 directories are files that store name -> inode mappings for the
logical directory; this is where files get their names, which means ext4
inodes do not themselves have names, since they can be linked (present)
multiple times with different names. Directories can store entries in two
different ways:
1) Classical linear directories: They store entries as a mostly-linked
mostly-list of EXT4_DIR_ENTRY.
2) Hash tree directories: These are used for larger directories, with
hundreds of entries, and are designed in a backwards-compatible way.
These are not yet implemented in the Ext4Dxe driver.
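
A lookup in one block of a classical linear directory can be sketched as
below. The entry layout assumed here (32-bit inode, 16-bit rec_len,
8-bit name length, 8-bit file type, then the name) follows the kernel
documentation for filesystems with the filetype feature; the function
name is hypothetical and this is not the Ext4Dxe code. Each entry's
rec_len gives the distance to the next entry, which is what makes the
block behave like a linked list.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIRENT_HEADER_SIZE 8u /* inode (4) + rec_len (2) + name_len (1) + type (1) */

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Search one directory block for `name`. Returns the inode number, or 0
 * if the name is absent or the block looks malformed. */
uint32_t dir_block_lookup(const uint8_t *block, size_t block_size,
                          const char *name)
{
    size_t name_len = strlen(name);
    size_t pos = 0;

    while (pos + DIRENT_HEADER_SIZE <= block_size) {
        const uint8_t *entry = block + pos;
        uint32_t inode = rd32(entry);
        uint16_t rec_len = rd16(entry + 4);
        uint8_t entry_name_len = entry[6];

        /* Reject entries that would loop forever or run past the block. */
        if (rec_len < DIRENT_HEADER_SIZE || rec_len > block_size - pos ||
            (size_t)entry_name_len + DIRENT_HEADER_SIZE > rec_len) {
            return 0;
        }
        /* Inode 0 marks an unused entry; skip it but keep walking. */
        if (inode != 0 && entry_name_len == name_len &&
            memcmp(entry + DIRENT_HEADER_SIZE, name, name_len) == 0) {
            return inode;
        }
        pos += rec_len;
    }
    return 0;
}
```

The rec_len sanity checks matter for a firmware driver, since a corrupt
or malicious disk image could otherwise cause an endless loop or an
out-of-bounds read.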

7) Journal
Ext3/4 filesystems have a journal to help protect the filesystem against
system crashes. This is not yet implemented in Ext4Dxe but is described
in detail in the Linux kernel's documentation.

The EDK2 implementation of ext4 is based only on the public documentation
available at
https://www.kernel.org/doc/html/latest/filesystems/ext4/index.html
and the FreeBSD ext2fs driver (available at
https://github.com/freebsd/freebsd-src/tree/main/sys/fs/ext2fs,
BSD-2-Clause-FreeBSD licensed). It is licensed as
SPDX-License-Identifier: BSD-2-Clause-Patent.

After a brief discussion with the community, the proposed package
location is edk2-platforms/Features/Ext4Pkg
(relevant discussion: https://edk2.groups.io/g/devel/topic/83060185).

I was the main contributor and I would like to maintain the package in
the future, if possible.

Current limitations:
1) The Ext4Dxe driver is, at the moment, read-only.
2) The Ext4Dxe driver, at the moment, cannot mount older (ext2/3)
filesystems. Ensuring compatibility with those may not be a bad idea.

I intend to test the package using the UEFI SCTs present in edk2-test,
and implement any other needed unit tests myself using the already
available unit test framework. I also intend to (privately) fuzz the
UEFI driver with bad/unusual disk images, to improve the security and
reliability of the driver.

In the future, ext4 write support should be added so edk2 has a
fully-featured RW ext4 driver. There could also be a focus on
supporting the older ext4-like filesystems, as I mentioned in the
limitations, but that is open for discussion.

It is unclear how the driver should handle filesystems that were not
cleanly unmounted, for example after a forced shutdown.
Is there a position in edk2 on how to handle such cases? I don't think
FAT32 has a "this filesystem is/was dirty" flag, and even though it seems
to me that stopping a system from booting/opening the partition because
"we may find some tiny irregularities" is not the best course of
action, I can't find a clear answer.

The driver also had to add implementations of CRC32C and CRC16, and
after talking with my mentor we quickly reached the conclusion that
these may be good candidates for inclusion in MdePkg. We also
discussed moving the Ucs2 <-> Utf8 conversion library in RedfishPkg
(BaseUcs2Utf8Lib) into MdePkg as well. Any comments?
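
For reference, a bitwise CRC32C (Castagnoli polynomial, reflected form
0x82F63B78) fits in a few lines. This is a generic sketch with a
hypothetical function name, not the code in Ext4Dxe or an existing
BaseLib API; the seed and final-inversion conventions shown are the
common ones and may differ from what ext4 metadata checksums expect.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C. Pass 0 as `crc` for a fresh checksum, or the previous
 * return value to continue over another buffer. */
uint32_t crc32c(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the dropped bit was 1. */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

With these conventions the standard check string "123456789" yields
0xE3069283. A table-driven variant would be faster, at the cost of a
1 KiB lookup table.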
The current MdePkg BaseLib already has CalculateCrc32(), so CRC32C and CRC16 can be added to BaseLib.

If more modules need to consume the Ucs2 <-> Utf8 conversion library, BaseUcs2Utf8Lib is generic enough
to be placed in MdePkg.

Thanks
Liming

Feel free to ask any questions you may find relevant.

Best Regards,

Pedro Falcato


