TFTP failing in tunneled network


Georg Pfuetzenreuter
 

Hello,

I have two KVM hypervisors:
- Hypervisor A: internal VM network
- Hypervisor B: internal VM network, bridged to Hypervisor A's internal VM network through an Open vSwitch GRE tunnel (VXLAN was tried as well)

On Hypervisor A, VMs provide DHCP and TFTP services for PXE boot.
VMs on Hypervisor A can successfully PXE boot in EFI mode.

VMs on Hypervisor B, however, report receiving an IP address but eventually time out with `PXE-E18: Server response timeout.`

Interestingly, this seems to be specific to EFI's TFTP implementation.
If I boot an already installed OS on the same VM on Hypervisor B, it obtains an IP address via DHCP, and I can download my boot file manually with regular *nix TFTP clients.

Judging by tcpdump, the boot file is successfully requested, but the transfer is never acknowledged as complete. Instead, the boot file seems to be transferred over and over again until the transfer eventually times out, with "User aborted the transfer" appearing in the hex output.
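To read the hex in the captures packet by packet, a small decoder for the TFTP wire format can help. This is a minimal sketch based on RFC 1350 (RRQ, WRQ, DATA, ACK, ERROR) and RFC 2347 (OACK); the function names are hypothetical, and it expects only the UDP payload, i.e. the bytes after the IPv4 and UDP headers in `tcpdump -X` output (28 bytes, assuming no IP options).

```python
import struct

# Opcodes from RFC 1350, plus OACK (6) from RFC 2347
OPCODES = {1: "RRQ", 2: "WRQ", 3: "DATA", 4: "ACK", 5: "ERROR", 6: "OACK"}


def _split_strings(payload: bytes) -> list:
    """Split a run of NUL-terminated ASCII strings (filename, mode, options)."""
    parts = payload.split(b"\x00")
    if parts and parts[-1] == b"":
        parts = parts[:-1]  # drop the empty piece after the final NUL
    return [p.decode("ascii", errors="replace") for p in parts]


def _pairs(strings: list) -> dict:
    """Turn [name, value, name, value, ...] into a dict; option names are case-insensitive."""
    return {name.lower(): value for name, value in zip(strings[0::2], strings[1::2])}


def decode_tftp(packet: bytes) -> dict:
    """Decode the UDP payload of one TFTP packet into a small dict."""
    if len(packet) < 2:
        raise ValueError("too short to hold a TFTP opcode")
    (opcode,) = struct.unpack("!H", packet[:2])
    kind = OPCODES.get(opcode, f"UNKNOWN({opcode})")
    body = packet[2:]
    if kind in ("RRQ", "WRQ"):
        strings = _split_strings(body)
        if len(strings) < 2:
            raise ValueError("request is missing filename or mode")
        return {"op": kind, "file": strings[0], "mode": strings[1],
                "options": _pairs(strings[2:])}
    if kind == "OACK":
        return {"op": kind, "options": _pairs(_split_strings(body))}
    if kind in ("DATA", "ACK", "ERROR") and len(body) < 2:
        raise ValueError(f"{kind} packet is truncated")
    if kind == "ACK":
        (block,) = struct.unpack("!H", body[:2])
        return {"op": kind, "block": block}
    if kind == "DATA":
        (block,) = struct.unpack("!H", body[:2])
        return {"op": kind, "block": block, "length": len(body) - 2}
    if kind == "ERROR":
        (code,) = struct.unpack("!H", body[:2])
        message = body[2:].split(b"\x00", 1)[0].decode("ascii", errors="replace")
        return {"op": kind, "code": code, "message": message}
    return {"op": kind}
```

Hex copied from a capture can be fed in with `decode_tftp(bytes.fromhex("..."))`. Per RFC 1350, a DATA packet shorter than the negotiated block size marks the last block, and the server only learns that the transfer succeeded once the ACK for that block arrives, so comparing which block numbers get ACKed in the two captures may be informative.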

In the TFTP server logs (I tried both tftp-hpa and OpenBSD's tftpd), only the read request shows up, never a success message.

The attached "tcpdump_failed.txt" is a tcpdump capture of an attempted PXE boot on that VM on Hypervisor B.

For comparison, "tcpdump_ok.txt" is a capture of a successful PXE boot on a VM on Hypervisor A.

I then tried a manual TFTP download using the TFTP client in the EFI shell:

Shell> tftp 172.16.25.2 grub.efi
Unable to get the size of the file 'grub.efi' on 'eth0' - No mapping

Specifying the blocksize does not seem to help:

Shell> tftp -S 2424 172.16.25.2 grub.efi
Downloading the file 'grub.efi'
tftp: Cannot open file - 'grub.efi'
Unable to download the file 'grub.efi' on 'eth0' - Not Found
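The first error suggests the shell's `tftp` command asks for the file size before downloading. In TFTP that is normally done with the RFC 2349 `tsize` option (the client sends `tsize` 0 in the read request and the server returns the real size in an OACK), and a block size such as the `2424` above travels as the RFC 2348 `blksize` option. The sketch below only shows how such a read request is laid out on the wire; `build_rrq` is a hypothetical helper, not taken from the EDK2 source, and which options the shell actually sends would have to be confirmed in a capture.

```python
import struct


def build_rrq(filename: str, mode: str = "octet", **options: int) -> bytes:
    """Build a TFTP read request (RFC 1350) with RFC 2347 options appended."""
    blksize = options.get("blksize")
    if blksize is not None and not 8 <= blksize <= 65464:
        raise ValueError("RFC 2348 allows blksize values from 8 to 65464")
    packet = struct.pack("!H", 1)  # opcode 1 = RRQ
    for field in (filename, mode):
        packet += field.encode("ascii") + b"\x00"
    for name, value in options.items():
        # Each option is a NUL-terminated name followed by a NUL-terminated ASCII value.
        packet += name.encode("ascii") + b"\x00" + str(value).encode("ascii") + b"\x00"
    return packet


# A size query combined with the 2424-byte block size (assumed option set).
request = build_rrq("grub.efi", blksize=2424, tsize=0)
print(request)
```

Comparing such a request byte for byte with the RRQ in each capture would show whether the EFI client and the *nix clients ask for different options.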

The IP address configuration seems to be fine (it applied the DHCP configuration it received during the PXE process):

Shell> ifconfig -l eth0
name : eth0
Media State : Media present
policy : dhcp
mac addr : 52:54:00:C5:1A:63
ipv4 address : 172.16.25.16
subnet mask : 255.255.255.192
default gateway: 172.16.25.1
Routes (2 entries):
Entry[0]
Subnet : 172.16.25.0
Netmask: 255.255.255.192
Gateway: 0.0.0.0
Entry[1]
Subnet : 0.0.0.0
Netmask: 0.0.0.0
Gateway: 172.16.25.1
DNS server :
172.16.25.1
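As a quick sanity check of the configuration above, Python's standard `ipaddress` module can confirm that the TFTP server used in the shell commands is on-link for this interface. The sketch uses the 255.255.255.192 netmask from the route table; nothing here is specific to EFI.

```python
import ipaddress

# Values copied from the `ifconfig -l eth0` output and the tftp commands above
client = ipaddress.ip_interface("172.16.25.16/255.255.255.192")
server = ipaddress.ip_address("172.16.25.2")
gateway = ipaddress.ip_address("172.16.25.1")

print(client.network)             # 172.16.25.0/26
print(server in client.network)   # True: reachable directly, no gateway involved
print(gateway in client.network)  # True
```

This matches Entry[0] of the route table (172.16.25.0 with gateway 0.0.0.0), so requests to 172.16.25.2 should go straight out of eth0.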

I also tried:
- DHCP server with and without Option 13 (boot file size)
- TFTP server with and without the "max block size" option
- TFTP server with a drastically increased timeout
- TFTP server with a drastically increased retransmit setting
- Compiling the latest OVMF from the Git master branch (pc-q35-5.2)
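For reference when toggling Option 13: RFC 2132 (section 3.15) defines it as the length of the boot image in 512-octet blocks, encoded as an unsigned 16-bit integer with option length 2. A minimal encoding sketch follows; `boot_file_size_option` is a hypothetical helper, and the 1,000,000-byte size in the test is an arbitrary example, not the size of the real grub.efi.

```python
import struct


def boot_file_size_option(file_size_bytes: int) -> bytes:
    """Encode DHCP option 13 (RFC 2132, 3.15): boot file size in 512-octet blocks."""
    blocks = (file_size_bytes + 511) // 512  # round up to whole blocks
    if blocks > 0xFFFF:
        raise ValueError("boot file too large for a 16-bit block count")
    # option code 13, option length 2, then the block count in network byte order
    return struct.pack("!BBH", 13, 2, blocks)
```

Since the value is rounded up to whole blocks, it can only give the client an approximate size; the exact size would come from the TFTP `tsize` option.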

I want to emphasize again that I can successfully download the same file using UNIX-based TFTP clients on the same VM from an installed OS, both with and without specifying a block size.

There is no routing and there are no firewalls in between. The bridge acts like a Layer 2 switch, and other network applications communicate over it without issues. The VMs' network interfaces are `virtio`.
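Because the bridge between the hypervisors is a GRE (or VXLAN) tunnel, every frame crossing it carries extra encapsulation headers that a purely local network does not. As a hedged back-of-the-envelope check, the sketch below compares the IP packet size of one full TFTP DATA block against the largest inner packet each tunnel type can carry unfragmented. It assumes a 1500-byte underlay MTU, IPv4 outer headers, Ethernet-over-GRE without a key or checksum, and no VLAN tags; 2424 is the block size from the shell test, the other block sizes are arbitrary examples.

```python
# Standard header sizes in bytes
IPV4_HEADER = 20
UDP_HEADER = 8
TFTP_DATA_HEADER = 4  # opcode (2) + block number (2)
ETHERNET_HEADER = 14  # inner Ethernet header carried inside an L2 tunnel

# Assumed per-packet tunnel overhead (outer IPv4, no GRE key/checksum, no VLAN tag)
TUNNEL_OVERHEAD = {
    "gre": IPV4_HEADER + 4 + ETHERNET_HEADER,  # outer IP + GRE + inner Ethernet = 38
    "vxlan": IPV4_HEADER + UDP_HEADER + 8 + ETHERNET_HEADER,  # + VXLAN header = 50
}


def inner_ip_mtu(underlay_mtu: int, tunnel: str) -> int:
    """Largest inner IP packet that fits through the tunnel without fragmentation."""
    return underlay_mtu - TUNNEL_OVERHEAD[tunnel]


def tftp_datagram_size(blksize: int) -> int:
    """Size of the IP packet carrying one full TFTP DATA block."""
    return IPV4_HEADER + UDP_HEADER + TFTP_DATA_HEADER + blksize


for tunnel in ("gre", "vxlan"):
    limit = inner_ip_mtu(1500, tunnel)
    for blksize in (512, 1408, 1468, 2424):
        size = tftp_datagram_size(blksize)
        print(f"{tunnel}: blksize={blksize} -> {size} bytes, fits={size <= limit}")
```

Under these assumptions, a block size that fills a normal 1500-byte packet would not fit through either tunnel in one piece, while small transfers would; whether the EFI client actually negotiates such a block size is something the captures would have to show.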

I apologize for the long email, but I would appreciate any hints on what the EFI TFTP implementation does differently that stops it from working on a network where a regular TFTP client works flawlessly.

Thanks for reading!
Best
Georg