Re: Windows 2019 VM fails to boot from vhost-scsi with UEFI mode

annie li

On 5/28/2020 6:08 PM, Laszlo Ersek wrote:
On 05/28/20 18:39, annie li wrote:
On 5/27/2020 2:00 PM, Laszlo Ersek wrote:
(4) Annie: can you try launching QEMU with the following flag:

    -global vhost-scsi-pci.max_sectors=2048
This limits the I/O size to 1M.
Indeed -- as I just pointed out under your other email, I previously
missed that the host kernel-side unit was not "sector" but "4K page". So
yes, the value 2048 above is too strict.

The EFI_BAD_BUFFER_SIZE logic reduces
I/O size to 512K for uni-directional requests.
To send the biggest I/O (8M) allowed by the current vhost-scsi setting, I adjusted the
value to 0x3FFF. The EFI_BAD_BUFFER_SIZE logic then reduces the I/O size to 4M
for uni-directional requests.
   -global vhost-scsi-pci.max_sectors=0x3FFF

0x4000 doesn't survive here.
That's really interesting.
I'm not sure why that happens.
Then I found out it is related to how this VM was previously operated; see below.
... Is it possible that vhost_scsi_handle_vq() -- in the host kernel --
puts stuff in the scatter-gather list *other* than the transfer buffers?
Some headers and such? Maybe those headers need an extra page.
I ran more tests, and found that the boot failure happens randomly when I boot the VM
right after it was terminated with Ctrl+C directly from the QEMU monitor, no matter
whether max_sectors is 2048, 16383, or 16384. The failure rate is about 7 out of 20.

So my previous statement about 0x4000 and 0x3FFF isn't accurate.
It is just that booting happened to succeed with 0x3FFF (16383) but not with 0x4000 (16384).

Also, when this failure happens, dmesg doesn't print out the following error:
vhost_scsi_calc_sgls: requested sgl_count: 2368 exceeds pre-allocated max_sgls: 2048

This new failure is a totally different issue from the one caused by maximum-sized I/O. Per my
OVMF debug log, the biggest I/O size is only about 1M. This means Windows 2019
hadn't sent out any large I/O yet.

The interesting part is that I didn't see this new failure happen if I boot a VM that
was previously shut down gracefully from inside the Windows guest.

If that works, then I *guess* the kernel-side vhost device model
could interrogate the virtio-scsi config space for "max_sectors", and
use the value seen there in place of PREALLOC_SGLS /
You mean the vhost device on the guest side here, right? In the Windows
virtio-scsi driver, it does read out max_sectors. Even though the driver
doesn't make use of it later, it could be used to adjust the transfer length
of I/O.
With vhost, the virtio-scsi device model is split between QEMU and the
host kernel. While QEMU manages the "max_sectors" property (= accepts it
from the command line, and exposes it to the guest driver), the host
kernel (i.e., the other half of the device model) ignores the same property.

Consequently, although the guest driver obeys "max_sectors" for limiting
the transfer size, the host kernel's constants may prove *stricter* than
that, because the host kernel ignores "max_sectors". So one idea is to
make the host kernel honor the "max_sectors" limit that QEMU manages.
This involves changes in both the kernel and QEMU. I guess it may be more straightforward
for the kernel to control the transfer size based on the memory consumed.

The other two ideas are: use larger constants in the kernel, or use a
smaller "max_sectors" default in QEMU.
I prefer fixing it by using larger constants in the kernel; this also avoids splitting
large I/O, which a smaller "max_sectors" default in QEMU would cause.
Following is the code change I made in the kernel code vhost/scsi.c:


The goal behind all three alternatives is the same: the limit that QEMU
exposes to the guest driver should satisfy the host kernel.

