MigrationToOpenNebula
Some information about the project itself
The project started at the end of February 2015. The HA ZFS fileserver "volta" was recently put into service; it is dedicated to the storage and sharing of VM disk images.
Some facts about the situation of the hypervisors before the migration
Heterogeneous farm of KVM vHosts :
- different OS's and KVM versions;
- different hardware (disk capacities, number of NICs...);
- vHosts are not connected to the same networks.
The migration will be an opportunity to reinstall the vHosts with the same OS, and to change the resources (CPU and RAM) allocated to the VMs (most of the time, we were too generous when giving resources to the VMs).
Installation procedure
Preparation of the NFS fileserver
- The pool "tank" is NFS shared with no-root-squash and rw for the vHosts
- Inside the pool "tank", several shared directories were created :
- one : to share the cloud config between the front-end and the nodes
- disk_images : to store the disk images of VMs
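For reference, this kind of setup can be reproduced on the fileserver with ZFS on Linux roughly as follows (a sketch only; the exact client range and export options used on volta are assumptions):

# on volta : create the shared directories inside the pool "tank"
zfs create tank/one
zfs create tank/disk_images
# export the pool read-write to the vHosts, without root squashing
# (ZFS on Linux syntax; the client range is an assumption)
zfs set sharenfs="rw=@192.168.0.0/16,no_root_squash" tank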
Reinstallation of all the vHosts
The goal is to install CentOS 7.0 on all the vHosts.
A problem was met on some Dell machines (like dom08) during the partitioning step, due to Grub2 in MBR mode : https://bugzilla.redhat.com/show_bug.cgi?id=986431. It comes from the fact that the "Dell diagnostic" partition starts at sector 36, so there is not enough space between the MBR and the start of the first partition for Grub2 to put its core.img file. The solution is simply to remove the two Dell partitions (see the sketch below).
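Removing them can be done from a shell before restarting the installation, for instance with parted (a sketch; it assumes the Dell utility and diagnostic partitions are partitions 1 and 2 of /dev/sda, so check the output of "print" first):

# list the partition table and identify the Dell utility/diagnostic partitions
parted /dev/sda print
# remove them (assumed here to be partitions 1 and 2)
parted /dev/sda rm 1
parted /dev/sda rm 2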
Network and firewall
We don't disable iptables, but we add a rule to allow input traffic from other servers in the private range :
-A INPUT -p tcp -m iprange --src-range 192.168.10.1-192.168.10.253 -j ACCEPT
The front-end also needs these two rules :
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9869 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 29876 -j ACCEPT
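On CentOS 7 the rules have to be made persistent; assuming the iptables-services package is used instead of firewalld (an assumption about the setup), something like this does the job:

# insert the rule live, then save it so that it survives a reboot
iptables -A INPUT -p tcp -m iprange --src-range 192.168.10.1-192.168.10.253 -j ACCEPT
service iptables save    # writes the current rules to /etc/sysconfig/iptables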
When configuring the bridges on the hosts, to keep some uniformity, we adopted the following naming rules :
- br0 -> public range (193.58.172.0/25)
- br1 -> private range (192.168.0.0/16)
- br3 -> old public range (193.190.247.0/24)
Not having the same bridge names for each network would be a problem when moving VMs from one host to another !
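As an illustration, a bridge on a CentOS 7 vHost is defined with a pair of ifcfg files like the ones below (device names and addresses are examples, not the actual configuration of a dom):

# /etc/sysconfig/network-scripts/ifcfg-br1 : the bridge itself (private range)
DEVICE=br1
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.10.11
NETMASK=255.255.0.0

# /etc/sysconfig/network-scripts/ifcfg-em2 : the physical NIC enslaved to the bridge
DEVICE=em2
TYPE=Ethernet
ONBOOT=yes
BRIDGE=br1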
Installation of OpenNebula
- OpenNebula 4.8
- Documentation followed : here
- Front-end on dom02
Installation of the front-end
- Firewall rules...
- Sharing the cloud config /var/lib/one -> we moved everything to the NFS server "volta" for sharing between the front-end and the nodes
- Datastore also on NFS server "volta"
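Both shares are simply NFS-mounted on the front-end and on the nodes; the /etc/fstab entries look roughly like the ones below (the mount point of the disk_images share and the mount options are assumptions, the real ones are on dom02):

# cloud configuration shared between front-end and nodes
volta:/tank/one          /var/lib/one        nfs   defaults,_netdev   0 0
# storage for the VM disk images
volta:/tank/disk_images  /tank/disk_images   nfs   defaults,_netdev   0 0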
Troubleshooting live migration
Each time a new vHost is added to the cloud, it is tested by creating a VM on it and then trying to move it "live" to other vHosts. If this migration test fails, check the following things (a quick check sequence is sketched after this list) :
- Make sure SELinux is really disabled with the command "getenforce" -> you need to reboot the machine after having set "disabled" in /etc/sysconfig/selinux.
- Check the firewall rules on each vHost : on each node, you should add a rule to accept traffic from other nodes.
- The libvirtd daemon needs a restart after the installation of OpenNebula, otherwise you might get permission problems.
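The checks above boil down to a few commands on the failing node (a sketch; adapt to the node):

getenforce                    # must print "Disabled"; reboot after editing /etc/sysconfig/selinux
iptables -L INPUT -n          # the rule accepting traffic from the other nodes must be present
systemctl restart libvirtd    # restart libvirtd once the OpenNebula node packages are installed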
Configuration of OpenNebula
Infrastructure - Hosts
I've added an attribute "PUBLIC_NETWORK_AVAILABLE" whose value can be "old" or "new". This way, when creating VM templates, you can add this expression in the "Scheduling" section :
PUBLIC_NETWORK_AVAILABLE = \"new\" or PUBLIC_NETWORK_AVAILABLE = \"old\"
so that VMs get instantiated in a suitable vHost.
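For example, the attribute is set on a host with "onehost update" (or via Infrastructure -> Hosts -> Attributes in Sunstone), and the expression above goes in the "Scheduling" section of the VM template; a sketch, with the host name and value as examples only:

# on the front-end : open the host template in an editor...
onehost update dom08
# ...and append a line like this one :
PUBLIC_NETWORK_AVAILABLE = "old"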
Infrastructure - Virtual Networks
Three different virtual networks were created : "OldPublic_T2B", "Public_T2B" and "Private_T2B". In the "Addresses" section of each network, I've added the addresses already in use in the KVM farm, together with the corresponding MAC addresses. This way, when importing existing VMs in the T2B cloud, you just have to specify the IP in the machine template, and then it gets leased when the template is instantiated.
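In an OpenNebula 4.8 virtual network, such a reserved address is just an address range of size 1 carrying the existing IP and MAC; a sketch of one entry as it would appear in the network template (the bridge name follows the rules above, the addresses are made up):

BRIDGE = "br0"
AR = [
  TYPE = "IP4",
  IP   = "193.58.172.100",
  MAC  = "52:54:00:aa:bb:cc",
  SIZE = "1"
]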
Procedure to migrate a VM to the T2B cloud
1. Shut down the machine.
2. Copy its image to volta:/tank/disk_images (have a look at /etc/fstab on dom02 to see how it is mounted).
3. In Sunstone, import the image : create a new one, provide the path of the original image, and DON'T FORGET TO SELECT THE "Persistent" CHECKBOX ! In the advanced options, set "Device prefix" to "vd" so that the virtio driver is used, and set "Driver" to "raw".
4. In Sunstone, create the machine template... (Storage : select the VM image you've created in the previous step, and do the following : "Device prefix" -> "vd", "Driver" -> "raw", "Cache" -> "none"; ...)
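Steps 2 and 3 can also be done from the command line on the front-end; a sketch, assuming the image goes to the default datastore, the VM is called "myvm", and volta:/tank/disk_images is mounted on /tank/disk_images on dom02 (all of these are assumptions):

# copy the disk image of the stopped VM to the NFS share
scp /var/lib/libvirt/images/myvm.img volta:/tank/disk_images/
# register it in OpenNebula as a persistent raw image using the virtio prefix
oneimage create --name myvm --path /tank/disk_images/myvm.img \
                --persistent --prefix vd --driver raw -d default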
Status of the migration
18th of March 2015
vHosts in the OpenNebula cloud :
- dom02 and dom11 : connected to new public network
- dom07, dom08, dom09 : connected to the old public network
vHosts that are still to migrate :
- dom04 : connected to 3 networks (old public, new public, and private); at least one VM on it (egon) is actually using all three
- dom05 : connected to the old public and private networks; the VM qrepo_begrid is using a volume group begrid_qrepo_data that is on /dev/sdb1
- dom06 : connected to 3 networks like dom04; it only hosts cdns02 (private DNS ns10); this machine is also used as a MySQL server for accounting statistics
- dom10 : connected to new public and private networks
25th of March 2015
We have upgraded the whole cloud (front-end and nodes) to OpenNebula 4.12 following this procedure. We faced only one problem after the migration : the Sunstone web interface was not reachable. The reason was that /etc/one/sunstone-server.conf had been replaced during the upgrade, so the server configuration went back to :
:host: 127.0.0.1
instead of :
:host: 0.0.0.0
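A one-liner to put the setting back and restart Sunstone (service name as in the CentOS 7 packages):

# listen on all interfaces again and restart the Sunstone server
sed -i 's/^:host: 127.0.0.1/:host: 0.0.0.0/' /etc/one/sunstone-server.conf
systemctl restart opennebula-sunstone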
Problems I've met
VMs with several NICs
I tried to install with Quattor a VM having two network interfaces (eth0 -> public network, eth1 -> private network). The machine was expected to PXE-boot from the second NIC, but that's not what happened : it first tried to boot from the first NIC, and then it stopped, instead of going on with the second NIC. The problem came from the default behavior of the iPXE rom. In the past, in the KVM farm, the vHosts were using gPXE, whose behavior was correct in the case of several NICs. With CentOS 7, gPXE has been replaced by iPXE, a fork of the gPXE project. Solutions :
1. You can alter the behavior of an iPXE rom with an embedded menu, as explained here. On this page, it is explained how to create iPXE menus. (I've also found this page, but I don't know if it can be used to create roms compatible with KVM.)
2. You can get back to gPXE :
[root@dom06 ~]# scp -r /usr/share/gpxe dom02.wn.iihe.ac.be:/usr/share/
[root@dom02 ~]# cd /usr/share/qemu-kvm/
[root@dom02 qemu-kvm]# rm pxe-*
[root@dom02 qemu-kvm]# ln -s ../gpxe/e1000-0x100e.rom pxe-e1000.rom
[root@dom02 qemu-kvm]# ln -s ../gpxe/ne.rom pxe-ne2k_pci.rom
[root@dom02 qemu-kvm]# ln -s ../gpxe/pcnet32.rom pxe-pcnet.rom
[root@dom02 qemu-kvm]# ln -s ../gpxe/rtl8139.rom pxe-rtl8139.rom
[root@dom02 qemu-kvm]# ln -s ../gpxe/virtio-net.rom pxe-virtio.rom
If you want to restore iPXE :
pxe-e1000.rom -> ../ipxe/8086100e.rom
pxe-ne2k_pci.rom -> ../ipxe/10ec8029.rom
pxe-pcnet.rom -> ../ipxe/10222000.rom
pxe-rtl8139.rom -> ../ipxe/10ec8139.rom
pxe-virtio.rom -> ../ipxe/1af41000.rom
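Which corresponds to the following commands (assuming the iPXE roms are still present in /usr/share/ipxe, which is where the CentOS 7 packages put them):

# recreate the default iPXE symlinks in /usr/share/qemu-kvm/
cd /usr/share/qemu-kvm/
rm pxe-*
ln -s ../ipxe/8086100e.rom pxe-e1000.rom
ln -s ../ipxe/10ec8029.rom pxe-ne2k_pci.rom
ln -s ../ipxe/10222000.rom pxe-pcnet.rom
ln -s ../ipxe/10ec8139.rom pxe-rtl8139.rom
ln -s ../ipxe/1af41000.rom pxe-virtio.rom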