Virsh Cloning VMs: DAD, IPv6 Duplicate Address Detection (dadfailed errors), CentOS/RHEL 8
I was doing a little experiment to spin up a few VMs for some Docker/Kubernetes experiments. As a first step, I created a VM using virsh (called rhel8a) and used that as a template to spin up the rest of the VMs: rhel8[b,c,d].
I brought all of them up, and ssh into each one took a considerably long time, so something was going on within the VMs.
So the initial step was to create rhel8a and clone it to create the rest of the fleet.
# Create a vdisk, followed by the VM installation.
virsh vol-create-as sdd00 rhel8a.img 15G
virsh start rhel8a --console
# Use the GUI with the standard installation process and, once created, check the domain.
[root@panchajanya ~]# virsh list --all | grep rhel8a
 7    rhel8a    running
# Run domifaddr to identify the IP address of the machine.
[root@panchajanya ~]# virsh domifaddr rhel8a
Name MAC address Protocol Address
----------------------------------------------------
vnet0 52:54:00:56:0a:a2 ipv4 192.168.122.20/24
- - ipv6 2001:db8:ca2:2:1::4b/64
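By default domifaddr reads libvirt's DHCP lease file. If the qemu-guest-agent package is installed and running in the guest (and the agent channel is configured), the addresses can also be queried from the guest directly, which is a handy cross-check when leases look stale:
# domifaddr reads the DHCP leases by default; with qemu-guest-agent
# running inside the guest, query the guest directly instead:
virsh domifaddr rhel8a --source agent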
I wanted to set some static IPs and, perhaps being finicky, aligned the MAC addresses too so they are easy to remember. The key part of this network definition, though, is the DHCP range defined for IPv6.
[root@panchajanya ~]# virsh net-dumpxml default
<network connections='4'>
  <name>default</name>
  <uuid>e4c0c5d7-d44f-4596-a7b0-eba6ec22756a</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:5c:61:e1'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
      <host mac='52:54:00:56:0a:a2' name='rhel8a' ip='192.168.122.20'/>
      <host mac='52:54:00:56:0a:b3' name='rhel8b' ip='192.168.122.30'/>
      <host mac='52:54:00:56:0a:c4' name='rhel8c' ip='192.168.122.40'/>
      <host mac='52:54:00:56:0a:d5' name='rhel8d' ip='192.168.122.50'/>
    </dhcp>
  </ip>
  <ip family='ipv6' address='2001:db8:ca2:2::1' prefix='64'>
    <dhcp>
      <range start='2001:db8:ca2:2:1::10' end='2001:db8:ca2:2:1::ff'/>
    </dhcp>
  </ip>
</network>
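As an aside, these host entries can also be added without hand-editing the full XML: virsh net-update can modify a live network definition. A sketch using one of the same host entries:
# Sketch: add a static DHCP host entry to the default network;
# --live applies it to the running network, --config persists it.
virsh net-update default add ip-dhcp-host \
    "<host mac='52:54:00:56:0a:b3' name='rhel8b' ip='192.168.122.30'/>" \
    --live --config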
The network was edited, destroyed, and restarted to make the new entries active, and then the cloning was triggered to create the rest of the VMs.
# Reload the static IP/MAC entries that were just added.
virsh net-destroy default
virsh net-start default
# Clone, destroy, update the MAC & start the VM.
virt-clone --original rhel8a --name rhel8b --file /data/sdd00/rhel8b.img
# Destroy the VM and edit the XML to update the MAC address to match the network XML above.
virsh destroy rhel8b
virsh edit rhel8b    # Search for the interface section and update the MAC
virsh start rhel8b
# Update the hostname (run inside the guest).
hostnamectl set-hostname rhel8b
# Repeat all the above steps to create the rest of the VMs.
[root@panchajanya ~]# virsh list --all | grep rhel8
7 rhel8a running
8 rhel8b running
9 rhel8c running
10 rhel8d running
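Since the per-clone steps are identical, the clone step itself can be looped. A rough sketch, assuming the same storage path as above (the MAC edit and hostname change still happen per VM):
# Sketch: create the remaining clones from rhel8a in one pass.
for vm in rhel8b rhel8c rhel8d; do
    virt-clone --original rhel8a --name "$vm" --file "/data/sdd00/${vm}.img"
done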
So all these machines were created using the cloning process. I had the IPv4 addresses added to /etc/hosts and was able to ssh into each of them, but it took quite some time for ssh to start responding.
[root@rhel8a ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:56:0a:a2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.20/24 brd 192.168.122.255 scope global dynamic noprefixroute enp1s0
       valid_lft 3111sec preferred_lft 3111sec
    inet6 fe80::7c8c:5c26:f8da:7a24/64 scope link dadfailed tentative noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::680e:a987:4db8:92c4/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
The fe80:: prefixes are LLAs, otherwise known as link-local addresses: local to the link, non-routable, and always generated by the operating system itself (CentOS 8 inside these VMs). Note the dadfailed tentative flags on the first one.
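The link-local addresses can also be listed in isolation, which makes the dadfailed flag easy to spot:
# List only the link-local (scope link) IPv6 addresses on the interface;
# the dadfailed flag shows up here when Duplicate Address Detection failed.
ip -6 addr show dev enp1s0 scope link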
Looking deeper into dmesg, I found additional logs as below:
dmesg
..< output trimmed >...
[  11.105231] snd_hda_codec_generic hdaudioC0D0:    Line=0x5
[  11.534145] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null)
[ 200.005779] IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
[ 202.594616] IPv6: enp1s0: IPv6 duplicate address fe80::7c8c:5c26:f8da:7a24 used by 52:54:00:56:0a:b3 detected!
[ 202.778596] IPv6: enp1s0: IPv6 duplicate address fe80::680e:a987:4db8:92c4 used by 52:54:00:56:0a:c4 detected!
[ 203.169845] IPv6: enp1s0: IPv6 duplicate address fe80::7c8c:5c26:f8da:7a24 used by 52:54:00:56:0a:b3 detected!
[ 203.745747] IPv6: enp1s0: IPv6 duplicate address fe80::7c8c:5c26:f8da:7a24 used by 52:54:00:56:0a:b3 detected!
[ 203.745775] IPv6: enp1s0: IPv6 duplicate address fe80::3072:1a67:bf57:fffa used by 52:54:00:56:0a:d5 detected!
[ 203.938659] IPv6: enp1s0: IPv6 duplicate address fe80::680e:a987:4db8:92c4 used by 52:54:00:56:0a:c4 detected!
So something was clearly going on with the IPv6 addresses. The link-local address generation turned out to be tied to the machine-id of the VM, and since the environment was cloned, the machine-ids of all the servers were identical. Each clone therefore generated the same link-local addresses on boot, and IPv6 Duplicate Address Detection (DAD, part of Neighbor Discovery) found each address already in use by another MAC on the local link, hence the failures.
[root@rhel8a ~]# cat /etc/machine-id
e802042b1af441919ce59bbf63928302
[root@rhel8c ~]# cat /etc/machine-id
e802042b1af441919ce59bbf63928302
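To confirm the cloning left identical IDs across the whole fleet (assuming ssh access to each guest, with the hostnames already in /etc/hosts):
# Sketch: compare machine-ids across the fleet; identical output lines
# confirm the cloned IDs.
for h in rhel8a rhel8b rhel8c rhel8d; do
    echo -n "$h: "; ssh "root@$h" cat /etc/machine-id
done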
So CentOS/RHEL 8 relies on the machine-id when generating the LLAs: NetworkManager's default ipv6.addr-gen-mode is stable-privacy (RFC 7217), which derives the interface identifier from a stable per-host secret (tied to /etc/machine-id) rather than from the MAC address. Because cloning gave all the VMs the same machine-id, they generated exactly the same link-local addresses at boot. During Neighbor Discovery each machine then found its address already in use and had to resolve the conflict by generating a new one, and that detect-and-regenerate cycle is what stretched out the boot time (and the slow ssh) on these machines.
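The generation mode is visible per connection. A quick check, assuming the NetworkManager connection profile is named enp1s0 like the device:
# Show the IPv6 address generation mode for this connection profile.
# "stable-privacy" means RFC 7217 addresses derived from a stable
# per-host secret (hence the collision on clones) rather than the MAC.
nmcli -g ipv6.addr-gen-mode connection show enp1s0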
The solution was to reset the machine-id on each of these VMs so it is unique. This is one more thing that has to be done, along with changing the hostname, when you clone virtual machines.
# Change permissions and add write access
chmod ugo+w /etc/machine-id
cat /dev/null > /etc/machine-id
# Generate a new machine ID and reset the permissions back
systemd-machine-id-setup
chmod ugo-w /etc/machine-id
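As an alternative worth noting, virt-sysprep from libguestfs-tools can do this at clone time from the host, before the clone's first boot. A sketch (the VM must be shut off):
# Sketch: blank the machine-id inside the shut-off clone's disk image so
# a fresh ID is generated on first boot.
virt-sysprep -d rhel8b --operations machine-id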
After performing the above on all four servers and rebooting them, the issue was resolved.
[root@panchajanya ~]# virsh net-dhcp-leases default | grep -v Expiry | sort -k5
--------------------------------------------------------------------
2021-05-26 15:23:41   52:54:00:56:0a:a2   ipv4   192.168.122.20/24         rhel8a   01:52:54:00:56:0a:a2
2021-05-26 15:18:11   52:54:00:56:0a:b3   ipv4   192.168.122.30/24         rhel8b   01:52:54:00:56:0a:b3
2021-05-26 15:18:15   52:54:00:56:0a:c4   ipv4   192.168.122.40/24         rhel8c   01:52:54:00:56:0a:c4
2021-05-26 15:18:17   52:54:00:56:0a:d5   ipv4   192.168.122.50/24         rhel8d   01:52:54:00:56:0a:d5
2021-05-26 15:18:17   52:54:00:56:0a:c4   ipv6   2001:db8:ca2:2:1::10/64   rhel8c   00:04:7d:a4:1e:e8:b3:24:61:bb:69:cf:0e:a7:2f:fd:7b:46
2021-05-26 15:18:18   52:54:00:56:0a:d5   ipv6   2001:db8:ca2:2:1::21/64   rhel8d   00:04:71:34:56:bf:ed:66:23:71:d8:19:e5:6b:18:73:22:e2
2021-05-26 15:23:43   52:54:00:56:0a:a2   ipv6   2001:db8:ca2:2:1::4b/64   rhel8a   00:04:52:22:63:aa:67:7c:98:85:c2:df:18:03:95:d5:cd:0b
2021-05-26 15:20:14   52:54:00:56:0a:a2   ipv6   2001:db8:ca2:2:1::87/64   rhel8a   00:04:c4:c9:d8:fa:77:eb:71:7c:6a:91:e7:15:30:23:9e:3b
[root@rhel8a ~]# ip -6 addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
inet6 2001:db8:ca2:2:1::4b/128 scope global dynamic noprefixroute
valid_lft 2676sec preferred_lft 2676sec
inet6 fe80::f07a:9952:46c1:9d32/64 scope link noprefixroute
valid_lft forever preferred_lft forever
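To verify the fix fleet-wide, a quick check that no interface still carries the dadfailed flag (again assuming ssh access to each guest):
# Sketch: report any remaining dadfailed addresses on each VM.
for h in rhel8a rhel8b rhel8c rhel8d; do
    echo "== $h =="
    ssh "root@$h" "ip -6 addr show | grep dadfailed || echo OK"
done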
These are just some scratch notes I took while triaging the issue, not a detailed blog post ;-)