Prerequisites
- The following article was written with Bright Cluster Manager version 10 (10.23.09 or newer) in mind.
- We assume shared storage is available as a mount on /cm/shared; we will create a target directory there for our Etcd backups (a quick mount check is shown after this list).
- The backup part of this KB article (Section 2) is always a good idea, and it can be executed on a running system, without downtime of Etcd. A snapshot is always made on the Etcd leader, and can be used in the future to restore all Etcd members of the cluster in the case of required disaster recovery.
- The restore part of this KB article (Section 3) should only be followed if your entire Etcd cluster has to be recreated from the backup, and/or downtime is acceptable. If you run a multi-node Etcd cluster, broken members can be replaced or fixed by synchronizing from the remaining working Etcd members. This is often a better approach when possible, and is explained in this KB article: https://kb.brightcomputing.com/knowledge-base/etcd-membership-reconfiguration-in-bright-9-0/
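As a quick sanity check (a sketch; hostnames and filesystem details will differ per site), confirm that /cm/shared is mounted on every node that will run Etcd:
# run on each Etcd host; should print the shared filesystem backing /cm/shared:
df -h /cm/shared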
1. Etcd installations
- Bright Kubernetes setups always require an odd number of Etcd nodes. Typically this is one or three nodes.
- Three Etcd nodes are recommended, but single-node Etcd deployments are also possible. If you have a single-node deployment it is worth considering adding two more nodes by following the “Add new nodes” portion of the following KB article: https://kb.brightcomputing.com/knowledge-base/etcd-membership-reconfiguration-in-bright-9-0/
- Etcd members, when installed on Compute Nodes, are marked as datanodes. This prevents Full Provisioning from unintentionally wiping the Etcd database (usually stored in /var/lib/etcd).
- Etcd stores its data in /var/lib/etcd by default, which it calls the spool directory.
The configured spool directory can be checked in the Etcd::Host role via cmsh. Please note that in this example the Kubernetes cluster's label is default; the name of the Configuration Overlay is different for each Kubernetes cluster managed by BCM. In the example below the spool directory was not configured to something different.
[cluster->configurationoverlay[kube-default-etcd]->roles[Etcd::Host]]% get spool
/var/lib/etcd
If your spool directory is different, please use that instead of /var/lib/etcd throughout the rest of the KB article.
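The same query can also be run non-interactively, which is convenient for scripting. A minimal sketch, assuming the Configuration Overlay is named kube-default-etcd:
root@bcm:~# cmsh -c "configurationoverlay; use kube-default-etcd; roles; use etcd::host; get spool"
/var/lib/etcd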
2. Create the snapshot
2.1. Check the cluster health
First log in to one of the Etcd cluster nodes (any of the devices with the Etcd::Host role). Load the module file and check the health. Please note that we will set an additional environment variable (ETCDCTL_ENDPOINTS, only relevant for multi-member Etcd clusters) to get the output for all endpoints at once; the pipeline below extracts the client URL (the fifth comma-separated field of etcdctl member list) for every member and joins them with commas. Note again that in the example below the Kubernetes cluster has the label/name default; this label can be different.
# module load etcd/kube-default/<version>
# ETCDCTL_ENDPOINTS=$(etcdctl member list | awk -F ',' '{print $5}' | sed 's/\s//' | paste -sd ",")
# etcdctl -w table endpoint health
+-----------------------------+--------+-------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+-----------------------------+--------+-------------+-------+
| https://10.141.0.2:2379 | true | 15.065402ms | |
| https://10.141.0.1:2379 | true | 22.261303ms | |
| https://10.141.255.254:2379 | true | 15.98154ms | |
+-----------------------------+--------+-------------+-------+
In this example output the health is good for all endpoints, which means we're good to go for creating a snapshot. If not, please fix the broken node first (see https://kb.brightcomputing.com/knowledge-base/etcd-membership-reconfiguration-in-bright-9-0/).
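For scripting, the exit code of etcdctl can serve as a simple gate. A minimal sketch (to the best of our knowledge, etcdctl endpoint health exits non-zero when any endpoint is unhealthy):
# exits 0 only when every endpoint in ETCDCTL_ENDPOINTS reports healthy:
etcdctl endpoint health > /dev/null 2>&1 && echo "all healthy" || echo "unhealthy endpoint(s) found"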
2.2. Find the Etcd leader
Now we execute another query.
# etcdctl -w table endpoint status
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.141.0.2:2379 | 10cee25dc156ff4a | 3.5.15 | 34 MB | false | false | 21 | 132769 | 132769 | |
| https://10.141.255.254:2379 | 1e8ae2b8f6e1cbd9 | 3.5.15 | 34 MB | false | false | 21 | 132769 | 132769 | |
| https://10.141.0.1:2379 | 4a336cbcb0bafdc0 | 3.5.15 | 35 MB | true | false | 21 | 132769 | 132769 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Here we see that the leader is the endpoint with internal IP 10.141.0.1, which belongs to node001 in this example cluster. We will now switch to that node (if it happens to be the node we are already on, there is no need to ssh). Finding the leader can also be scripted, as shown in the sketch below.
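A minimal sketch, assuming jq is installed; it selects the endpoint whose own member ID equals the leader ID it reports:
# prints the client URL of the current Etcd leader:
etcdctl endpoint status -w json | jq -r '.[] | select(.Status.header.member_id == .Status.leader) | .Endpoint'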
root@??# ssh root@10.141.0.1 # node001
We will have to "reload" the module file, and we will not set the $ETCDCTL_ENDPOINTS environment variable this time!
root@node001:~# module unload etcd/kube-default/3.5.15
root@node001:~# module load etcd/kube-default/3.5.15
(If we just SSH-ed to node001, no module is loaded yet; the module unload will then print an error, which can be ignored.)
We can verify we indeed only have the leader as an endpoint to talk to using:
root@node001:~# echo $ETCDCTL_ENDPOINTS
https://10.141.0.1:2379
2.3. Prepare backup location
We will use a directory on /cm/shared, the shared storage that is available on all Etcd hosts in our example cluster.
root@node001:~# mkdir -p /cm/shared/backup/etcd
2.4. Use etcdctl to save the snapshot
We make the snapshot on this Etcd leader node. The snapshot contains the entire Etcd state for all members.
root@node001:~# etcdctl snapshot save "/cm/shared/backup/etcd/etcd-$(hostname)-$(date +"%Y%m%d%H%M%S")"
{"level":"info","ts":"2024-10-17T16:38:09.943338+0200","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/cm/shared/backup/etcd/etcd-node001-20241017163809.part"}
{"level":"info","ts":"2024-10-17T16:38:09.953540+0200","logger":"client","caller":"v3@v3.5.15/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2024-10-17T16:38:09.954000+0200","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://10.141.0.1:2379"}
{"level":"info","ts":"2024-10-17T16:38:10.432409+0200","logger":"client","caller":"v3@v3.5.15/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2024-10-17T16:38:10.705496+0200","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://10.141.0.1:2379","size":"35 MB","took":"now"}
{"level":"info","ts":"2024-10-17T16:38:10.708300+0200","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/cm/shared/backup/etcd/etcd-node001-20241017163809"}
Snapshot saved at /cm/shared/backup/etcd/etcd-node001-20241017163809
As can be seen in the above output, we saved a snapshot to /cm/shared/backup/etcd/etcd-node001-20241017163809.
This file can be used to restore all Etcd members when needed, see the next section (Section 3).
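Optionally, the snapshot file can be inspected before relying on it. A sketch using etcdctl snapshot status (still functional in etcd 3.5, though deprecated in favor of etcdutl):
# prints hash, revision, total keys and total size of the snapshot file:
etcdctl snapshot status "/cm/shared/backup/etcd/etcd-node001-20241017163809" -w table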
3. Restore the snapshot
This section assumes that there is an actual need to restore a snapshot, for example after hardware failure of all Etcd members. If some members are still up and running, then replacing, adding or removing members while the cluster remains operational is possibly a better solution. Again, the following KB article might be more appropriate for this case: https://kb.brightcomputing.com/knowledge-base/etcd-membership-reconfiguration-in-bright-9-0/.
If the above KB article cannot save the Etcd cluster, then this section will explain how to restore all Etcd members at once from the snapshot. This will require downtime of Etcd; we don't want Etcd to corrupt its database while we are restoring the snapshot.
3.1. Make sure all Etcd Hosts are UP
Assuming this is a disaster recovery scenario (perhaps one server is completely new, or all servers are corrupted or suffered data loss), all the nodes need to be up and running before we can restore the database. We can check this using cmsh.
root@bcm:~# cmsh
[bcm]% device
[bcm->device]% list -l etcd::host
Type Hostname (key) MAC Category Ip Network Status
---------------------- ----------------- ------------------ ---------------- --------------- -------------- ----------------
HeadNode bcm FA:16:3E:CB:0F:A1 10.141.255.254 internalnet [ UP ]
PhysicalNode node001 FA:16:3E:47:A8:B6 default 10.141.0.1 internalnet [ UP ]
PhysicalNode node002 FA:16:3E:C9:2D:95 default 10.141.0.2 internalnet [ UP ]
Above we listed all the devices with the etcd::host role and confirmed their status to be UP.
3.2. Stop all Etcd services
We can then also use cmsh to stop the etcd services. Let's first get the list of hostnames: bcm, node001 and node002 in this example.
root@bcm:~# cmsh
[bcm]% device
[bcm->device]% roleoverview -v | grep Etcd
Etcd::Host bcm overlay:kube-default-etcd
Etcd::Host node001 overlay:kube-default-etcd
Etcd::Host node002 overlay:kube-default-etcd
We can stop the etcd service on those nodes using:
[bcm->device]% foreach -n bcm,node001,node002 (services; stop etcd)
Thu Oct 17 16:10:30 2024 [notice] bcm: Service etcd was stopped
Thu Oct 17 16:10:31 2024 [notice] node001: Service etcd was stopped
Thu Oct 17 16:10:32 2024 [notice] node002: Service etcd was stopped
This might take a few seconds. After that, we want to confirm that the services have indeed stopped:
[bcm->device]% foreach -n bcm,node001,node002 (services; status etcd)
Service Status
------------ -----------
etcd [STOPPED ]
Service Status
------------ -----------
etcd [STOPPED ]
Service Status
------------ -----------
etcd [STOPPED ]
Now we are safe to continue restoring the snapshot, without running processes corrupting the database.
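As an extra OS-level double check, a sketch (systemctl is-active prints "inactive" for a stopped service; the || true suppresses pdsh noise about non-zero remote exit codes):
# expect "inactive" from every node:
pdsh -w bcm,node001,node002 'systemctl is-active etcd || true'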
3.3. Recreate the spool dir
We will use pdsh to reset the spool directory for all Etcd members at once.
# move the spool dir out of the way:
pdsh -w bcm,node001,node002 mv /var/lib/etcd /var/lib/etcd.old
# create new:
pdsh -w bcm,node001,node002 mkdir /var/lib/etcd
# fix permissions and ownership:
pdsh -w bcm,node001,node002 chmod 0700 /var/lib/etcd
pdsh -w bcm,node001,node002 chown etcd:etcd /var/lib/etcd
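The same three steps can also be combined into a single pdsh invocation, a sketch:
# move the old spool dir aside, recreate it with 0700 permissions and etcd:etcd ownership:
pdsh -w bcm,node001,node002 'mv /var/lib/etcd /var/lib/etcd.old && mkdir -m 0700 /var/lib/etcd && chown etcd:etcd /var/lib/etcd'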
As can be seen, we use the same comma-separated list of nodes that we previously determined in Section 3.2.
None of these pdsh commands give any output; to verify that they worked correctly, we can run the following and compare against the expected output:
root@bcm:~# pdsh -w bcm,node001,node002 'ls -al /var/lib/etcd'
node002: total 3
node002: drwx------ 3 etcd etcd 20 Oct 18 03:56 .
node002: drwxr-xr-x 62 root root 4096 Oct 18 03:59 ..
node001: total 3
node001: drwx------ 3 etcd etcd 20 Oct 18 03:57 .
node001: drwxr-xr-x 62 root root 4096 Oct 18 03:59 ..
bcm: total 3
bcm: drwx------ 3 etcd etcd 20 Oct 18 03:57 .
bcm: drwxr-xr-x 79 root root 4096 Oct 18 03:59 ..
The drwx------ (0700) permissions and the etcd:etcd ownership are important for /var/lib/etcd, and the directory is expected to be empty (no member subdirectory, which would exist for a populated database).
3.4. Import the snapshot
Again, we will use pdsh to do it all at once. Please note that this is unfortunately a slightly tedious step: all nodes need the list of all Etcd members, as documented here: https://etcd.io/docs/v3.5/op-guide/recovery/.
Therefore, please edit the pdsh command below, replacing <etcd-node1-hostname>, <etcd-node1-ip>, <etcd-node2-hostname>, <etcd-node2-ip>, <etcd-node3-hostname> and <etcd-node3-ip>:
pdsh -w bcm,node001,node002 ". /etc/profile.d/modules.sh; module load etcd && etcdctl snapshot restore --data-dir=/var/lib/etcd --name \$(hostname) --initial-cluster <etcd-node1-hostname>=https://<etcd-node1-ip>:2380,<etcd-node2-hostname>=https://<etcd-node2-ip>:2380,<etcd-node3-hostname>=https://<etcd-node3-ip>:2380 --initial-advertise-peer-urls=https://\$(hostname -i):2380 /cm/shared/backup/etcd/etcd-node001-20241017163809"
For a single-member Etcd cluster:
pdsh -w node001 ". /etc/profile.d/modules.sh; module load etcd && etcdctl snapshot restore --data-dir=/var/lib/etcd --name \$(hostname) --initial-cluster <etcd-node1-hostname>=https://<etcd-node1-ip>:2380 --initial-advertise-peer-urls=https://\$(hostname -i):2380 /cm/shared/backup/etcd/etcd-node001-20241017163809"
For a five-member Etcd cluster:
pdsh -w node001,node002,node003,node004,node005 ". /etc/profile.d/modules.sh; module load etcd && etcdctl snapshot restore --data-dir=/var/lib/etcd --name \$(hostname) --initial-cluster <etcd-node1-hostname>=https://<etcd-node1-ip>:2380,<etcd-node2-hostname>=https://<etcd-node2-ip>:2380,<etcd-node3-hostname>=https://<etcd-node3-ip>:2380,<etcd-node4-hostname>=https://<etcd-node4-ip>:2380,<etcd-node5-hostname>=https://<etcd-node5-ip>:2380 --initial-advertise-peer-urls=https://\$(hostname -i):2380 /cm/shared/backup/etcd/etcd-node001-20241017163809"
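If preferred, the --initial-cluster value can be generated rather than typed by hand. A minimal sketch, assuming each Etcd hostname resolves to its internal IP on the node where you run it (adjust the ETCD_NODES list to your members):
# builds "<host1>=https://<ip1>:2380,<host2>=https://<ip2>:2380,..." from a hostname list:
ETCD_NODES="bcm node001 node002"
INITIAL_CLUSTER=$(for h in $ETCD_NODES; do printf '%s=https://%s:2380\n' "$h" "$(getent hosts "$h" | awk '{print $1}')"; done | paste -sd ",")
echo "$INITIAL_CLUSTER"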
To continue with the same example cluster that we’ve dealt with so far in this KB article, the command with replaced values looks like this:
pdsh -w bcm,node001,node002 ". /etc/profile.d/modules.sh; module load etcd && etcdctl snapshot restore --data-dir=/var/lib/etcd --name \$(hostname) --initial-cluster bcm=https://10.141.255.254:2380,node001=https://10.141.0.1:2380,node002=https://10.141.0.2:2380 --initial-advertise-peer-urls=https://\$(hostname -i):2380 /cm/shared/backup/etcd/etcd-node001-20241017163809"
Example output of the above command. The deprecation warning can be ignored for now.
bcm: Deprecated: Use `etcdutl snapshot restore` instead.
bcm:
bcm: 2024-10-30T11:51:42+01:00 info snapshot/v3_snapshot.go:265 restoring snapshot {"path": "/cm/shared/backup/etcd/etcd-bcm-20241030112048", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap", "initial-memory-map-size": 0}
node002: Deprecated: Use `etcdutl snapshot restore` instead.
node002:
node002: 2024-10-30T11:51:42+01:00 info snapshot/v3_snapshot.go:265 restoring snapshot {"path": "/cm/shared/backup/etcd/etcd-bcm-20241030112048", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap", "initial-memory-map-size": 0}
node001: Deprecated: Use `etcdutl snapshot restore` instead.
node001:
node001: 2024-10-30T11:51:42+01:00 info snapshot/v3_snapshot.go:265 restoring snapshot {"path": "/cm/shared/backup/etcd/etcd-bcm-20241030112048", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap", "initial-memory-map-size": 0}
node002: 2024-10-30T11:51:43+01:00 info membership/store.go:141 Trimming membership information from the backend...
node001: 2024-10-30T11:51:43+01:00 info membership/store.go:141 Trimming membership information from the backend...
node002: 2024-10-30T11:51:43+01:00 info membership/cluster.go:421 added member {"cluster-id": "601f2e306eb58e49", "local-member-id": "0", "added-peer-id": "39fd91fec477489", "added-peer-peer-urls": ["https://10.141.0.1:2380"]}
node002: 2024-10-30T11:51:43+01:00 info membership/cluster.go:421 added member {"cluster-id": "601f2e306eb58e49", "local-member-id": "0", "added-peer-id": "10f8defdec135a31", "added-peer-peer-urls": ["https://10.141.255.254:2380"]}
node002: 2024-10-30T11:51:43+01:00 info membership/cluster.go:421 added member {"cluster-id": "601f2e306eb58e49", "local-member-id": "0", "added-peer-id": "5a2d7ce8e12419a7", "added-peer-peer-urls": ["https://10.141.0.2:2380"]}
node001: 2024-10-30T11:51:43+01:00 info membership/cluster.go:421 added member {"cluster-id": "601f2e306eb58e49", "local-member-id": "0", "added-peer-id": "39fd91fec477489", "added-peer-peer-urls": ["https://10.141.0.1:2380"]}
node001: 2024-10-30T11:51:43+01:00 info membership/cluster.go:421 added member {"cluster-id": "601f2e306eb58e49", "local-member-id": "0", "added-peer-id": "10f8defdec135a31", "added-peer-peer-urls": ["https://10.141.255.254:2380"]}
node001: 2024-10-30T11:51:43+01:00 info membership/cluster.go:421 added member {"cluster-id": "601f2e306eb58e49", "local-member-id": "0", "added-peer-id": "5a2d7ce8e12419a7", "added-peer-peer-urls": ["https://10.141.0.2:2380"]}
bcm: 2024-10-30T11:51:43+01:00 info membership/store.go:141 Trimming membership information from the backend...
node002: 2024-10-30T11:51:43+01:00 info snapshot/v3_snapshot.go:293 restored snapshot {"path": "/cm/shared/backup/etcd/etcd-bcm-20241030112048", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap", "initial-memory-map-size": 0}
bcm: 2024-10-30T11:51:43+01:00 info membership/cluster.go:421 added member {"cluster-id": "601f2e306eb58e49", "local-member-id": "0", "added-peer-id": "39fd91fec477489", "added-peer-peer-urls": ["https://10.141.0.1:2380"]}
bcm: 2024-10-30T11:51:43+01:00 info membership/cluster.go:421 added member {"cluster-id": "601f2e306eb58e49", "local-member-id": "0", "added-peer-id": "10f8defdec135a31", "added-peer-peer-urls": ["https://10.141.255.254:2380"]}
bcm: 2024-10-30T11:51:43+01:00 info membership/cluster.go:421 added member {"cluster-id": "601f2e306eb58e49", "local-member-id": "0", "added-peer-id": "5a2d7ce8e12419a7", "added-peer-peer-urls": ["https://10.141.0.2:2380"]}
node001: 2024-10-30T11:51:43+01:00 info snapshot/v3_snapshot.go:293 restored snapshot {"path": "/cm/shared/backup/etcd/etcd-bcm-20241030112048", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap", "initial-memory-map-size": 0}
bcm: 2024-10-30T11:51:43+01:00 info snapshot/v3_snapshot.go:293 restored snapshot {"path": "/cm/shared/backup/etcd/etcd-bcm-20241030112048", "wal-dir": "/var/lib/etcd/member/wal", "data-dir": "/var/lib/etcd", "snap-dir": "/var/lib/etcd/member/snap", "initial-memory-map-size": 0}
We need to fix the ownership once more, since the import recreates subdirectories as the root user by default.
pdsh -w bcm,node001,node002 chown etcd:etcd -R /var/lib/etcd
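To verify the ownership fix, a sketch (expect no output when everything under /var/lib/etcd is owned by etcd:etcd):
# lists anything NOT owned by etcd:etcd; no output means we are good:
pdsh -w bcm,node001,node002 'find /var/lib/etcd ! \( -user etcd -a -group etcd \)'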
3.5. Start the Etcd services
This will be different from Section 3.2: starting via cmsh is not going to work, because all etcd services need to start in parallel. So we will have to use pdsh instead of cmsh.
root@bcm:~# pdsh -w bcm,node001,node002 systemctl start etcd
If the restore command was executed correctly (no typos in hostnames and/or IP addresses), this start command should exit without hanging. Next we can double-check that the services have indeed started:
root@bcm:~# pdsh -w bcm,node001,node002 systemctl status etcd | grep Active:
bcm: Active: active (running) since Thu 2024-10-31 09:12:52 CET; 1min 51s ago
node001: Active: active (running) since Thu 2024-10-31 09:12:52 CET; 1min 51s ago
node002: Active: active (running) since Thu 2024-10-31 09:12:52 CET; 1min 51s ago
Now Sections 2.1 and 2.2 can be repeated in order to query Etcd for its endpoint health status.
root@bcm:~# module load etcd/kube-default/3.5.15
root@bcm:~# ETCDCTL_ENDPOINTS=$(etcdctl member list | awk -F ',' '{print $5}' | sed 's/\s//' | paste -sd ",")
root@bcm:~# etcdctl -w table endpoint health
+-----------------------------+--------+-------------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+-----------------------------+--------+-------------+-------+
| https://10.141.255.254:2379 | true | 18.23326ms | |
| https://10.141.0.2:2379 | true | 25.209949ms | |
| https://10.141.0.1:2379 | true | 28.479844ms | |
+-----------------------------+--------+-------------+-------+
root@bcm:~# etcdctl -w table endpoint status
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://10.141.0.1:2379 | 39fd91fec477489 | 3.5.15 | 34 MB | false | false | 2 | 1126 | 1126 | |
| https://10.141.255.254:2379 | 10f8defdec135a31 | 3.5.15 | 34 MB | true | false | 2 | 1126 | 1126 | |
| https://10.141.0.2:2379 | 5a2d7ce8e12419a7 | 3.5.15 | 34 MB | false | false | 2 | 1126 | 1126 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
root@bcm:~# etcdctl -w table member list
+------------------+---------+---------+-----------------------------+-----------------------------+------------+
|        ID        | STATUS  |  NAME   |         PEER ADDRS          |        CLIENT ADDRS         | IS LEARNER |
+------------------+---------+---------+-----------------------------+-----------------------------+------------+
|  39fd91fec477489 | started | node001 |     https://10.141.0.1:2380 |     https://10.141.0.1:2379 |      false |
| 10f8defdec135a31 | started |     bcm | https://10.141.255.254:2380 | https://10.141.255.254:2379 |      false |
| 5a2d7ce8e12419a7 | started | node002 |     https://10.141.0.2:2380 |     https://10.141.0.2:2379 |      false |
+------------------+---------+---------+-----------------------------+-----------------------------+------------+
Pay special attention to the fact that the members are all part of the same cluster, and have not each been restored as an individual single-member cluster. In that (incorrect) case, the member list output would show only one member on each of the nodes. The above output is correct.
If Etcd is reporting a healthy status, the next logical check is whether the Kubernetes API server is working with our restored Etcd database. A quick sanity check is to see whether commands such as the following yield the expected output.
# kubectl get nodes -o wide
...
# kubectl get pod -A -o wide
...
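A slightly stronger end-to-end check is to write and read a throwaway object through the API server, which exercises the restored Etcd database. A sketch (the ConfigMap name etcd-restore-smoketest is just an example):
# create, read back, and clean up a test ConfigMap in the default namespace:
kubectl create configmap etcd-restore-smoketest --from-literal=ok=true
kubectl get configmap etcd-restore-smoketest -o jsonpath='{.data.ok}'; echo
kubectl delete configmap etcd-restore-smoketest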