Important note:
This process is generally used to recover a primary headnode from a failure state (for example, filesystem corruption). It does not replace a good backup regime.
If you intend to use this process to recover a primary headnode, we recommend contacting Bright Support first so we may assess your cluster.
Yes, it is indeed possible to reclone the primary headnode from the secondary headnode.
The overview of the process is as follows:
It is assumed, for the purposes of this article, that headnode01 has experienced a failure and is now the primary passive headnode.
- The secondary headnode (headnode02) must be in the active state. This is an important precondition, which can be checked with the “cmha status” command or with “cat /var/spool/cmd/state”.
- PXE boot the primary passive headnode (headnode01) off the secondary active headnode (headnode02) and select “rescue” mode at the GRUB prompt.
- On the primary passive headnode (headnode01), while booted into rescue mode, run the clone command at the prompt (see the command sketch after this list): /cm/cm-clone-install --clone --hostname=headnode01
- This will now clone headnode02 -> headnode01.
- After entering the interface and running through the steps, you may receive the error “unable to contact master”. This happens because the passive headnode is trying to contact itself: the regular cloning process expects the “master” record to resolve to the node being cloned from, but on headnode01 it resolves to headnode01 itself.
- Modify /etc/hosts on headnode01, while it is still booted in rescue mode, so that the “master” record points to headnode02. For example (replace 10.X.X.X with the internalnet IP of headnode02):
10.X.X.X master.cm.cluster master localmaster.cm.cluster localmaster ldapserver.cm.cluster ldapserver
- Rerun the clone process.
- Reboot the primary passive headnode.
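For illustration only, the commands involved in the steps above look roughly like this (the 10.X.X.X address is a placeholder for the internalnet IP of headnode02, and exact paths and prompts may differ between Bright versions):
  # On headnode02 (the active headnode), confirm the HA state first:
  cmha status
  cat /var/spool/cmd/state
  # On headnode01, PXE booted into rescue mode, start the clone:
  /cm/cm-clone-install --clone --hostname=headnode01
  # If this stops with "unable to contact master", edit /etc/hosts in the rescue
  # environment so the "master" record points at headnode02, for example:
  #   10.X.X.X master.cm.cluster master localmaster.cm.cluster localmaster ldapserver.cm.cluster ldapserver
  # then rerun the clone and reboot headnode01:
  /cm/cm-clone-install --clone --hostname=headnode01
  reboot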
At this point, the following steps may or may not be required; they can vary from cluster to cluster.
- Modify /etc/my.cnf on headnode01 so that it matches headnode02, except that the “server-id” parameter needs to be 1 on headnode01 and 2 on headnode02 (an example is given after this list).
- Restart mariadb / MySQL service on headnode01.
- On headnode02 run the “cmha dbreclone headnode01” command.
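A minimal sketch of these database steps, assuming a systemd-based system where the service is called mariadb (on older distributions it may be mysqld) and the configuration file is /etc/my.cnf:
  # /etc/my.cnf on headnode01 should match headnode02, apart from the server id:
  #   server-id = 1    on headnode01
  #   server-id = 2    on headnode02
  # On headnode01, restart the database after editing the file:
  systemctl restart mariadb
  # On headnode02, reclone the databases to headnode01:
  cmha dbreclone headnode01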
If the above dbreclone process fails, you may need to do the following:
- Stop cmd on both head nodes
- Stop mariadb / mysqld on both head nodes
- Create a backup of “/var/lib/mysql/mysql” on headnode02
- Copy “/var/lib/mysql/mysql” from headnode02 to headnode01
- Check that the permissions on /var/lib/mysql are correct on headnode01
- Start mariadb / mysqld on both head nodes
- Check mysql connectivity again
- On headnode02, rerun “cmha dbreclone headnode01” (see the sketch below)
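A hedged sketch of this manual fallback, assuming systemd service names cmd and mariadb, passwordless ssh from headnode02 to headnode01, and rsync being available; the backup name mysql.backup is just an example:
  # On both headnodes, stop CMDaemon and the database:
  systemctl stop cmd
  systemctl stop mariadb
  # On headnode02, keep a backup copy of the mysql system database directory:
  cp -a /var/lib/mysql/mysql /var/lib/mysql/mysql.backup
  # Still on headnode02, copy the directory over to headnode01:
  rsync -a /var/lib/mysql/mysql/ headnode01:/var/lib/mysql/mysql/
  # On headnode01, make sure ownership and permissions are correct:
  chown -R mysql:mysql /var/lib/mysql
  # On both headnodes, start the database again and check connectivity:
  systemctl start mariadb
  mysql -u root -p -e "status"
  # On headnode02, rerun the database reclone:
  cmha dbreclone headnode01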
At this point, check the cluster status on headnode02 with “cmha status”. It should show headnode01 as up and OK.
Congratulations, you have brought the primary headnode back online 🙂