Re: Windows 2019 VM fails to boot from vhost-scsi with UEFI mode


annie li

On 5/27/2020 2:00 PM, Laszlo Ersek wrote:
On 05/27/20 17:58, annie li wrote:
Hi Laszlo,

(I sent out my reply to your original response twice, but my reply
somehow doesn't show up in https://edk2.groups.io/g/discuss. It is
confusing.
Apologies for that -- while I'm one of the moderators on edk2-devel (I
get moderation notifications with the other mods, and we distribute the
mod workload the best we can), I'm not one of the edk2-discuss mods.

Hmm, wait a sec -- it seems like I am? And I just don't get mod
notifications for edk2-discuss? Let me poke around in the settings :/

edk2-devel:

- Spam Control
  - Messages are not moderated
  - New Members moderated
    - Unmoderate after 1 approved message
- Message Policies
  - Allow Nonmembers to post (messages from nonmembers will be
    moderated instead of rejected)

edk2-discuss:

- Spam Control
  - Messages are not moderated
  - New Members ARE NOT moderated
- Message Policies
  - Allow Nonmembers to post (messages from nonmembers will be
    moderated instead of rejected)

So I think the bug in our configuration is that nonmembers are moderated
on edk2-discuss just the same (because of the identical "Allow
Nonmembers to post" setting), *however*, mods don't get notified because
of the "New Members ARE NOT moderated" setting.

So let me tweak this -- I'm setting the same

- Spam Control
- New Members moderated
- Unmoderate after 1 approved message

for edk2-discuss as we have on edk2-devel, *plus* I'm removing the
following from the edk2-discuss list description: "Basically
unmoderated". (I mean I totally agree that it *should* be unmoderated,
but fully open posting doesn't seem possible on groups.io at all!)
Thanks for addressing it.
Another email I sent out yesterday didn't reach edk2-discuss either.
I have joined this group now and hope this email shows up this time.
See my comments below.
Anyway, re-sending it here, hope you can get it...)
Thanks -- if you CC me personally in addition to messaging the list
(which is the common "best practice" for mailing lists), then I'll
surely get it.

Following up below:

On 5/27/2020 7:43 AM, Laszlo Ersek wrote:
(2) Regarding "max_sectors", the spec says:

max_sectors is a hint to the driver about the maximum transfer
size to use.

OvmfPkg/VirtioScsiDxe honors and exposes this field to higher level
protocols, as follows:

(2.1) in VirtioScsiInit(), the field is read and saved. It is also
checked to be at least 2 (due to the division quoted in the next
bullet).
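
(For reference, a from-memory sketch of that VirtioScsiInit() logic --
close to, but not guaranteed verbatim from,
OvmfPkg/VirtioScsiDxe/VirtioScsi.c:)

  //
  // Read the device's "max_sectors" hint into the driver context; a
  // value below 2 cannot be halved for bidirectional requests, so
  // such a device is rejected.
  //
  Status = VIRTIO_CFG_READ (Dev, MaxSectors, &Dev->MaxSectors);
  if (EFI_ERROR (Status)) {
    goto Failed;
  }
  if (Dev->MaxSectors < 2) {
    Status = EFI_UNSUPPORTED;
    goto Failed;
  }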

(2.2) PopulateRequest() contains the following logic:

  //
  // Catch oversized requests eagerly. If this condition evaluates to
  // false, then the combined size of a bidirectional request will not
  // exceed the virtio-scsi device's transfer limit either.
  //
  if (ALIGN_VALUE (Packet->OutTransferLength, 512) / 512
        > Dev->MaxSectors / 2 ||
      ALIGN_VALUE (Packet->InTransferLength, 512) / 512
        > Dev->MaxSectors / 2) {
    Packet->InTransferLength  = (Dev->MaxSectors / 2) * 512;
    Packet->OutTransferLength = (Dev->MaxSectors / 2) * 512;
    Packet->HostAdapterStatus =
      EFI_EXT_SCSI_STATUS_HOST_ADAPTER_DATA_OVERRUN_UNDERRUN;
    Packet->TargetStatus    = EFI_EXT_SCSI_STATUS_TARGET_GOOD;
    Packet->SenseDataLength = 0;
    return EFI_BAD_BUFFER_SIZE;
  }

That is, VirtioScsiDxe only lets requests reach the device if they do
not exceed *half* of "max_sectors" *per direction*. Meaning that, for
uni-directional requests, the check is stricter than "max_sectors"
requires, and for bi-directional requests, it is exactly as safe as
"max_sectors" requires. (VirtioScsiDxe will indeed refuse to drive a
device that has just 1 in "max_sectors", per (2.1), but that's not a
*practical* limitation, I would say.)
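
(A quick worked example: with "max_sectors" at 2048, each direction is
capped at 2048 / 2 = 1024 sectors, i.e. 1024 * 512 = 512K; even a
bidirectional request moving 512K each way thus stays within the
device's 1M, i.e. 2048-sector, total limit.)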

(2.3) When the above EFI_BAD_BUFFER_SIZE branch is taken, the maximum
transfer sizes that the device supports are exposed to the caller
(per direction), in accordance with the UEFI spec.

(2.4) The ScsiDiskRead10(), ScsiDiskWrite10(), ScsiDiskRead16(),
ScsiDiskWrite16() functions in
"MdeModulePkg/Bus/Scsi/ScsiDiskDxe/ScsiDisk.c" set the "NeedRetry"
output param to TRUE upon seeing EFI_BAD_BUFFER_SIZE.
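
(To connect the dots, the caller's retry loop in ScsiDisk.c looks
roughly like this -- a from-memory sketch, not verbatim code:)

  MaxRetry = 2;
  for (Index = 0; Index < MaxRetry; Index++) {
    Status = ScsiDiskRead10 (ScsiDiskDevice, &NeedRetry, Timeout,
               PtrBuffer, &ByteCount, (UINT32) Lba, SectorCount);
    if (!EFI_ERROR (Status)) {
      break;
    }
    if (!NeedRetry) {
      return EFI_DEVICE_ERROR;
    }
    //
    // After EFI_BAD_BUFFER_SIZE, ByteCount holds the device's
    // per-direction maximum, so the next attempt is made with a
    // correspondingly smaller SectorCount.
    //
    SectorCount = (UINT32) (ByteCount / BlockSize);
  }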
Thanks for the detailed explanation, it is very helpful.
I recently added more logging in
MdeModulePkg/Bus/Scsi/ScsiDiskDxe/ScsiDisk.c, which holds the maximum
settings related to the max SCSI I/O size.

For example, in the Read(10) command, MaxBlock is 0xFFFF and BlockSize
is 0x200, so the max ByteCount is 0xFFFF * 0x200 = 0x1FFFE00 (~32M).
After setting MaxBlock to 0x4000 to limit the max ByteCount to 8M,
Windows 2019 can boot up from vhost-scsi in my local environment.
It looks like this 32M limit in ScsiDiskDxe is consistent with the one
you mentioned under (3.2) for QEMU?
Yes, that's possible -- maybe the caller starts with an even larger
transfer size, and then the EFI_BAD_BUFFER_SIZE logic is already at
work, but it only reduces the transfer size to 32MB (per "max_sectors"
from QEMU). And then all the protocols expect that to succeed, and when
it fails, the failure is propagated to the outermost caller.

(4) Annie: can you try launching QEMU with the following flag:

-global vhost-scsi-pci.max_sectors=2048
This limits the I/O size to 1M, and the EFI_BAD_BUFFER_SIZE logic then
reduces the I/O size to 512K for uni-directional requests.
To send the biggest I/O (8M) allowed by the current vhost-scsi
setting, I adjusted the value to 0x3FFF; the EFI_BAD_BUFFER_SIZE logic
then reduces the I/O size to 4M for uni-directional requests:

   -global vhost-scsi-pci.max_sectors=0x3FFF

0x4000 doesn't survive here.
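
(Arithmetic for that value: 0x3FFF sectors * 512 bytes = 0x7FFE00
bytes, just under 8M total, and the per-direction cap becomes
(0x3FFF / 2) * 512 = 0x1FFF * 512 = 0x3FFE00 bytes, just under 4M.)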

If that works, then I *guess* the kernel-side vhost device model
could interrogate the virtio-scsi config space for "max_sectors", and
use the value seen there in place of PREALLOC_SGLS /
PREALLOC_PROT_SGLS.
You mean the vhost device on the guest side here, right? The Windows
virtio-scsi driver does read out max_sectors. Even though the driver
doesn't make use of it later, it could be used to adjust the transfer
length of I/O.

I guess you are not referring to the vhost-scsi on the host?
Both VHOST_SCSI_PREALLOC_SGLS (2048) and
TCM_VHOST_PREALLOC_PROT_SGLS (512) are hard-coded in vhost/scsi.c:

    ...
    sgl_count = vhost_scsi_calc_sgls(prot_iter, prot_bytes,
                                     TCM_VHOST_PREALLOC_PROT_SGLS);
    ...
    sgl_count = vhost_scsi_calc_sgls(data_iter, data_bytes,
                                     VHOST_SCSI_PREALLOC_SGLS);


In vhost_scsi_calc_sgls(), an error is printed if sgl_count exceeds
TCM_VHOST_PREALLOC_PROT_SGLS or VHOST_SCSI_PREALLOC_SGLS:

    sgl_count = iov_iter_npages(iter, 0xffff);
    if (sgl_count > max_sgls) {
        pr_err("%s: requested sgl_count: %d exceeds pre-allocated"
               " max_sgls: %d\n", __func__, sgl_count, max_sgls);
        return -EINVAL;
    }
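
(For scale, assuming 4K pages: 2048 preallocated SGL entries cover at
most 2048 * 4K = 8M of data per request, which is where the 8M figure
above comes from; the 512 protection SGLs similarly cap protection
data at 2M.)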
It looks like vhost-scsi doesn't interrogate the virtio-scsi config
space for "max_sectors".

Although Win2019 boots from vhost-scsi with the above flag, I assume
we still need to enlarge the value of VHOST_SCSI_PREALLOC_SGLS in
vhost-scsi for the final fix, instead of setting max_sectors through
QEMU options? Adding a QEMU command line option specifically for
booting Win2019 from vhost-scsi seems inappropriate.
Suggestions?
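
(For sizing, if enlarging the constant is the route: ScsiDiskDxe's
maximum Read(10) transfer of 0xFFFF * 0x200 = 0x1FFFE00 bytes can span
up to 8193 4K pages when the buffer is not page-aligned, so the
constant would need to grow to at least that -- hypothetical numbers
only, not a tested patch:)

  /* drivers/vhost/scsi.c -- illustrative bump, sized for ~32M I/O */
  #define VHOST_SCSI_PREALLOC_SGLS  16384  /* was 2048; power-of-two
                                              headroom over the
                                              8193-page worst case */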

Thanks
Annie
Cool!

I can boot the Win2019 VM up from vhost-scsi with the flag above.
Thank you for confirming!

Laszlo
