Re: Windows 2019 VM fails to boot from vhost-scsi with UEFI mode

annie li

On 6/3/2020 9:33 AM, Laszlo Ersek wrote:
On 06/03/20 00:19, annie li wrote:
On 6/2/2020 7:44 AM, Laszlo Ersek wrote:
On 05/29/20 16:47, annie li wrote:

I ran more tests, and found that the boot failure happens randomly when I
boot the VM right after it was terminated by Ctrl+C directly from the
QEMU monitor, no matter whether max_sectors is 2048, 16383 or 16384.
The failure rate is about 7 out of 20.

So my previous statement about 0x4000 and 0x3FFF isn't accurate. It is
just that booting happened to succeed with 0x3FFF (16383), but not
with 0x4000 (16384).

Also, when this failure happens, dmesg doesn't print the following
error: vhost_scsi_calc_sgls: requested sgl_count: 2368 exceeds
pre-allocated max_sgls: 2048

This new failure is a totally different issue from the one caused by
maximum-sized I/O. According to my OVMF debug log, the biggest I/O size
is only about 1MB. This means Windows 2019 didn't send out any big I/O.

The interesting part is that I didn't see this new failure happen if I
boot a VM that was previously shut down gracefully from inside the
Windows guest.
Can you build the host kernel with "CONFIG_VHOST_SCSI=m", and repeat
your Ctrl-C test such that you remove and re-insert "vhost_scsi.ko"
after every Ctrl-C?
I am using targetcli to create the SCSI LUN that the VM boots from. The
module gets loaded right after I create the target in /vhost. However, I
cannot remove the vhost_scsi module after that; it always complains
"Module vhost_scsi is in use"
(even after I delete the target in targetcli).
Maybe it is related to targetcli, but I haven't tried other tools yet.
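For reference, the setup is roughly along these lines; the backstore name, image path, and WWN below are placeholders, not values from this thread:

```shell
# Rough shape of the targetcli vhost setup described above; disk0, the image
# path and the naa. WWN are placeholder examples.
targetcli /backstores/fileio create disk0 /var/lib/vhost/disk0.img 20G
targetcli /vhost create                  # loads vhost_scsi, prints a naa. WWN
targetcli /vhost/naa.0123456789abcdef/tpg1/luns create /backstores/fileio/disk0
# QEMU then attaches the target with something like:
#   -device vhost-scsi-pci,wwpn=naa.0123456789abcdef
```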
Can you check with "lsmod" if other modules use vhost_scsi?
lsmod shows vhost_scsi has a use count of 4; I assume these 4 references
are related to targetcli.
lsmod |grep vhost_scsi
vhost_scsi             36864  4
vhost                      53248  1 vhost_scsi
target_core_mod       380928  14 target_core_file,target_core_iblock,iscsi_target_mod,vhost_scsi,target_core_pscsi,target_core_user

I was thinking maybe these target_* modules were using vhost_scsi, so I
removed them with modprobe -r where possible;
lsmod then shows the use count down to 3,
vhost_scsi             36864  3
vhost                  53248  1 vhost_scsi
target_core_mod       380928  6 iscsi_target_mod,vhost_scsi
However, the others cannot be removed; "rmmod --force" doesn't help.
"dmesg | grep vhost_scsi" doesn't show much useful information either.
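A small way to watch whether the use count ever drops, instead of eyeballing the whole lsmod table each time; the snapshot below is the lsmod output quoted above, and on a live system the echo would be replaced by plain `lsmod`:

```shell
# Extract the "used by" count for vhost_scsi from lsmod-style output.
# The snapshot mirrors the lsmod lines quoted earlier in this thread.
lsmod_snapshot='vhost_scsi             36864  4
vhost                  53248  1 vhost_scsi
target_core_mod       380928  14 target_core_file,target_core_iblock,iscsi_target_mod,vhost_scsi,target_core_pscsi,target_core_user'

# On a live host, use: count=$(lsmod | awk '$1 == "vhost_scsi" { print $3 }')
count=$(echo "$lsmod_snapshot" | awk '$1 == "vhost_scsi" { print $3 }')
echo "vhost_scsi use count: $count"
```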

If you shut down QEMU gracefully, can you rmmod vhost_scsi in that case?
No, I cannot rmmod these modules right after I create the target in targetcli,
no matter whether I start a VM or not. Deleting the target in targetcli doesn't
help either.
Before I create the target in targetcli, I can add and remove the vhost_scsi
module; its use count is 0.
See the following steps, taken right after rebooting my host,
# modprobe vhost_scsi
# lsmod |grep vhost
vhost_scsi             36864  0
vhost                  53248  1 vhost_scsi
target_core_mod       380928  1 vhost_scsi
# modprobe -r vhost_scsi
# lsmod |grep vhost
Right after I set up the LUNs in targetcli, the use count is always 4, no
matter whether I stop the VM with Ctrl-C or shut it down gracefully, and no
matter whether the VM is running or not. So targetcli is the suspect for
these 4 references.


I wonder if the failure to remove the vhost_scsi module is actually
another sign of the same (as yet unknown) leaked reference.


My guess is that, when you kill QEMU with Ctrl-C, "vhost_scsi.ko" might
not clean up something, and that could break the next guest boot. If you
re-insert "vhost_scsi.ko" for each QEMU launch, and that ends up masking
the symptom, then there's likely some resource leak in "vhost_scsi.ko".
Nods, it is possible.


Just a guess.
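If the target could actually be torn down, the per-boot reload experiment suggested above would look roughly like this; the WWN is a placeholder, and Annie reports that today even deleting the target does not release the module:

```shell
# Sketch of the reload-per-boot test: tear down the target, cycle the module,
# rebuild, boot, Ctrl-C, repeat. A leaked reference shows up as modprobe -r
# failing with "Module vhost_scsi is in use".
targetcli /vhost delete naa.0123456789abcdef   # placeholder WWN
modprobe -r vhost_scsi
modprobe vhost_scsi
# recreate the target and LUN, relaunch QEMU, Ctrl-C it, and repeat
```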

