RestoringCloudFrontendFromBackup
Context
In July 2024, we have lost our OpenNebula frontend VM (cloud2) after an attempt to reboot it. It was hosted on domz02, a basic QEMU/KVM standalone hypervizor managed with libvirt. The problem seemed to be that the system was not able to find the partition table in the qcow2 image. As there was no backup of the OpenNebula data and config files, the only solution to recover these things was to attach the image to a new VM and to recreate the partition table in the mounted image using the tool gpart. It eventually worked, but things would have far more easier if we had a simple backup of the directories containing the important ONE data and configuration files of our cloud system. And above all, the procedure we followed to restore the machine was sketched in a situation of emergency with no warranty that it would succeed.
Data and configuration items to backup
Here is a list of the important files/directories to backup:
- /var/lib/one
- /var/lib/mysql
- /etc/one
- /etc/my.cnf
- /etc/my.cnf.d
Procedure to restore the backup
On the standalone hypervizor, create a new VM with same hardware characteristics as the previous (mac address, disk size, memory, cpu,...). The easiest way is to copy-paste the xml of the previous. Be careful that libvirt will complain about the fact that you reuse the mac address. The solution is simple: remove the nic of the previous VM to free the mac address. Here are some commands that might be useful for this step:
virsh list virsh edit <machine_name> virsh create <xml_description_of_vm>
Of course, you can also use the GUI virt-manager for most tasks.
Once the VM is running, you'll have to reinstall the frontend on it with Quattor and Puppet. However, the VM must initially be reinstalled with machine-type 'puppet_node' and Puppet app set to 'servers' and role to 'none'. Why? Because if you directly reinstall the VM as frontend, the initizations scripts that come with the ONE packages and also with Puppet will generate some new settings that might be tricky to overwrite with the backup. So to say, in the beginning of the restore process, the machine must be a vanilla one.