How do I add Ceph OSDs to an existing Ceph cluster?

This article is being updated. Please be aware that the content herein, including but not limited to version numbers and slight syntax changes, may not match the output from the most recent versions of Bright. This notation will be removed when the content has been updated.

This procedure was tested on Bright 7.3.

Case 1: No separate journal device.

Step 1: 
Set the datanode property on all of your Ceph nodes to yes. For example:

#cmsh
[am-test3]% device use node001
[am-test3->device[node001]]% set datanode yes
[am-test3->device*[node001*]]% commit
[am-test3->device[node001]]%
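
If you have several Ceph nodes, cmsh's foreach can set the property on all of them in one pass. A sketch, assuming your Ceph nodes are node001..node003 (adjust the range to your own nodes):

#cmsh
[am-test3]% device
[am-test3->device]% foreach -n node001..node003 (set datanode yes)
[am-test3->device*]% commit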

Step 2: 
Identify which of your nodes you want to add disks to. Make a list of those nodes.
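
To see which block devices a node has, and which of them are still unused, you can inspect the node directly. A minimal sketch using the standard lsblk tool (node001 and /dev/sdd are just example names):

# ssh node001 lsblk

Disks that show no partitions and no mountpoints in the output, such as a freshly installed /dev/sdd, are the candidates for new OSDs.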

Step 3: 
Starting with the first node (you need to carry out this procedure one node at a time):

#cmsh
[am-powercontrol]% device use node001
[am-powercontrol->device[node001]]% set disksetup

After the last <device></device> stanza in the disksetup, you can add your new devices. An example of adding a new Ceph OSD device:

<device origin="cm-ceph-setup">
    <blockdev>/dev/<device id></blockdev>
    <partition id="osdN">
      <cephosdassociation>osdN</cephosdassociation>
      <size>max</size>
      <type>linux</type>
      <filesystem>xfs</filesystem>
      <mkfsFlags>-i size=2048</mkfsFlags>
      <mountOptions>defaults,noatime,nodiratime,inode64</mountOptions>
    </partition>
  </device>
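
A filled-in sketch of the same stanza, assuming the new disk is /dev/sdd and osd3 is the next free association (both are example values only, explained in step 4 below):

<device origin="cm-ceph-setup">
    <blockdev>/dev/sdd</blockdev>
    <partition id="osd3">
      <cephosdassociation>osd3</cephosdassociation>
      <size>max</size>
      <type>linux</type>
      <filesystem>xfs</filesystem>
      <mkfsFlags>-i size=2048</mkfsFlags>
      <mountOptions>defaults,noatime,nodiratime,inode64</mountOptions>
    </partition>
  </device>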

Step 4:

1. /dev/<device id> should be your block device path, e.g.: /dev/sdd

2. osdN: change N into a number. This number can be any number that does not conflict with the other <cephosdassociation> values on the same node.

For example, if your last <cephosdassociation> value is osd2, then your new OSD can be osd3.

Keep adding <device></device> stanzas as in the previous example, one for each of your new devices.

3. Execute the following:

#cmsh

[am-powercontrol]% device use node001
[am-powercontrol->device[node001]]% roles
[am-powercontrol->device[node001]->roles]% use cephosd
[am-powercontrol->device[node001]->roles[cephosd]]% osdassociations
[am-powercontrol->device[node001]->roles[cephosd]->osdassociations]% add osdN
[am-powercontrol->device[node001]->roles[cephosd]->osdassociations]% commit

Change osdN to match the <cephosdassociation> you defined in your disk setup. Repeat this step for each of the OSDs that you defined in the disk setup of the node.

4. Execute the following:

[am-powercontrol]% device use node001
[am-powercontrol->device[node001]]% append blockdevicesclearedonnextboot /dev/sdd
[am-powercontrol->device[node001]]% commit

You need to append each of your new OSD devices to blockdevicesclearedonnextboot, and commit.
Now you can reboot the node.
Make sure to follow these steps for each of your nodes, one at a time.
Then wait for Ceph to be healthy again. Do not reboot the next node until Ceph is healthy again.
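
You can check the cluster state from the head node with the standard Ceph status commands (the exact output depends on your cluster); move on to the next node only once the cluster reports HEALTH_OK again:

#ceph health
#ceph status
#ceph osd tree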

Case 2: With a separate journal device.

Follow steps 1 & 2 from case 1.
If the SSD journal device is shared by the OSDs on each node, then you will need to:

1. Stop (take out) your OSDs running on that node.

2. Flush the journal of each of those OSDs.

To do this you can execute the following:

ceph osd getcrushmap -o map
crushtool -d map -o dmap

You will have your decompiled CRUSH map in "dmap".

Edit that file and note the OSDs running on your nodes. Do not confuse the OSDs in the CRUSH map with the OSD associations defined in CMDaemon. They are different.

An example would be:

host node002 {
        id -2           # do not change unnecessarily
        # weight 0.098
        alg straw
        hash 0  # rjenkins1
        item osd.1 weight 0.098
}

In this case I have:

item osd.1 running on node002.
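
As a quicker cross-check, the same host-to-OSD mapping can also be read from the output of ceph osd tree, without decompiling the CRUSH map:

#ceph osd tree
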
You have two options:

1. Take osd.1 out; Ceph will rebalance itself.

2. Set the noout flag on your Ceph cluster, then take down the OSD. Ceph will not rebalance itself, but the PGs hosted on that OSD will be in a degraded state.
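
For reference, option one would be a single command from the head node; the OSD number 1 below comes from the example CRUSH map above:

#ceph osd out 1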

We are going with option two. We suggest you stop any virtual instances before you take any further steps, because data corruption on your virtual machines is otherwise possible:

#ceph osd set noout
#cmsh
[am-powercontrol]% device use node001
[am-powercontrol->device[node001]]% services
[am-powercontrol->device[node001]->services]% list
Service (key)            Monitored  Autostart
------------------------ ---------- ----------
ceph-osd-osd0            yes        yes
nslcd                    yes        yes
[am-powercontrol->device[node001]->services]% stop ceph-osd-osd0
Fri Feb  3 16:29:35 2017 [notice] node001: Service ceph-osd-osd0 was stopped
[am-powercontrol->device[node001]->services]%

Check Ceph: it should be in the HEALTH_ERR state and you should have degraded PGs. This is normal.
Now you need to flush the journal on each of your Ceph OSDs on that node:

ceph-osd -i <osd number> --flush-journal

Replace <osd number> with the number of the OSD that you got from the CRUSH map.
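
For node002 in the example CRUSH map above, which hosts osd.1, that would be:

ceph-osd -i 1 --flush-journal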

If you know which device is used for your journal, then proceed. If you do not, then execute the following:

%device use node001 ; roles ; use cephosd ; foreach * (show)

This will show you which device is being used for journaling; it is the value of the journaldata property.
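
Alternatively, a filestore OSD keeps a journal link inside its data directory on the node itself, so the journal device can also be read from there. A sketch, assuming the default Ceph data path and osd.1 from the example above:

# ssh node002 ls -l /var/lib/ceph/osd/ceph-1/journal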

Follow step 3 from case 1 and add your devices.

Scan your disk setup and locate your journal device within the <device></device> tags.

You can change the layout of the drive as you like. For example:

 <device origin="cm-ceph-setup">
    <blockdev>/dev/sde</blockdev>
    <partition id="/dev/sde1">
      <size>1/2</size>
      <type>linux</type>
    </partition>
    <partition id="/dev/sde2">
      <size>1/2</size>
      <type>linux</type>
    </partition>
  </device>

 

Here I have two partitions, each taking half of the disk space. I am going to add two more:

  <device origin="cm-ceph-setup">
    <blockdev>/dev/sde</blockdev>
    <partition id="/dev/sde1">
      <size>1/4</size>
      <type>linux</type>
    </partition>
    <partition id="/dev/sde2">
      <size>1/4</size>
      <type>linux</type>
    </partition>
    <partition id="/dev/sde3">
      <size>1/4</size>
      <type>linux</type>
    </partition>
    <partition id="/dev/sde5">
      <size>max</size>
      <type>linux</type>
    </partition>
  </device>

Save your disk setup and commit.

Now follow step 4 from case 1 through to the end of case 1. The only change is that, for each of the new Ceph OSD associations you just added, you must set the journaldata property to its journal partition.

An example would be:

set journaldata /dev/sde3
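
Putting it together, adding one new OSD association with its journal partition could look like the following cmsh session; osd3 and /dev/sde3 are the example values used above:

#cmsh
[am-powercontrol]% device use node001
[am-powercontrol->device[node001]]% roles
[am-powercontrol->device[node001]->roles]% use cephosd
[am-powercontrol->device[node001]->roles[cephosd]]% osdassociations
[am-powercontrol->device[node001]->roles[cephosd]->osdassociations]% add osd3
[am-powercontrol->device*[node001*]->roles*[cephosd*]->osdassociations*[osd3*]]% set journaldata /dev/sde3
[am-powercontrol->device*[node001*]->roles*[cephosd*]->osdassociations*[osd3*]]% commit
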
At this point you have configured your disks, your journal, and your Ceph OSD associations.
You now need to set your Ceph OSDs on that node to autostart = no:

#cmsh
%device use node001 ; services
% foreach * (set autostart no)
%commit

Reboot your node.

After you reboot your node, execute the following:

#cmsh
%ceph
%osdinfo

Note your OSD numbers and the mapping to the OSD associations you added to your node.

From your head node:

#ceph-osd -i N --mkjournal

Replace N with the OSD number that you got from the previous command.
Execute the previous step for each of your OSDs on the node you just rebooted.
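
For instance, if osdinfo shows that the association you added maps to OSD number 3 (a hypothetical value), the command would be:

#ceph-osd -i 3 --mkjournal
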
After you are done, execute the following:

#ceph osd unset noout

Ceph will at first be in HEALTH_WARN status, because it will first need to rebalance its data to the new OSDs that have been added to your cluster.
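
You can follow the recovery from the head node until the cluster reports HEALTH_OK again, for example with:

#ceph status
#ceph -w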
