
ID #1349

How do I add Ceph OSDs to an existing Ceph cluster?

How do I add an OSD to a Ceph cluster manually?

 

How to manually add an OSD to a Ceph cluster

 

This procedure was tested on Bright 7.3.


Case 1: No separate journal device.

 

Step 1: Set the datanode property on all of your Ceph nodes to yes. For example:

 

#cmsh

[am-test3]% device use node001

[am-test3->device[node001]]% set datanode yes

[am-test3->device*[node001*]]% commit

[am-test3->device[node001]]%

 

Step 2: Identify which of your nodes you want to add disks to. Make a list of those nodes.

 

Step 3: Starting with the first node (you need to carry out this procedure one node at a time):

 

#cmsh

[am-powercontrol]% device use node001

[am-powercontrol->device[node001]]% set disksetup

 

After the last <device></device> stanza in the disk setup, you can add your new devices. An example of adding a new Ceph OSD device:

 

  <device origin="cm-ceph-setup">

    <blockdev>/dev/<device id></blockdev>

    <partition id="osdN">

      <cephosdassociation>osdN</cephosdassociation>

      <size>max</size>

      <type>linux</type>

      <filesystem>xfs</filesystem>

      <mkfsFlags>-i size=2048</mkfsFlags>

      <mountOptions>defaults,noatime,nodiratime,inode64</mountOptions>

    </partition>

  </device>

 

 

Step 4:

 

1. /dev/<device id> should be your block device path, e.g. /dev/sdd.

2. In osdN, change N into a number. This can be any number that does not conflict with the other <cephosdassociation> values on the same node.

 

For example, if your last <cephosdassociation> value is osd2, then your new OSD can be osd3.
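As a quick sketch, the next free association name can be derived from the existing names with plain shell, assuming they all follow the osdN pattern (the names below are example values):

```shell
# Existing <cephosdassociation> names on the node (example values):
existing="osd0
osd1
osd2"

# Strip the "osd" prefix, take the highest number, and add one:
highest=$(printf '%s\n' "$existing" | sed 's/^osd//' | sort -n | tail -1)
echo "osd$((highest + 1))"
```

For the example values above this prints osd3.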

 

Add a <device></device> stanza like the one in the previous example for each of your new devices.

 

3. Execute the following:

 

#cmsh

[am-powercontrol]% device use node001

[am-powercontrol->device[node001]]% roles

[am-powercontrol->device[node001]->roles]% use cephosd

[am-powercontrol->device[node001]->roles[cephosd]]% osdassociations

[am-powercontrol->device[node001]->roles[cephosd]->osdassociations]% add osdN

[am-powercontrol->device[node001]->roles[cephosd]->osdassociations]% commit

 

Change osdN to match the <cephosdassociation> value you defined in your disk setup. Repeat this step for each of the OSDs that you defined in the disk setup of the node.

 

4. Execute the following:

 

[am-powercontrol]% device use node001

[am-powercontrol->device[node001]]% append blockdevicesclearedonnextboot /dev/sdd

[am-powercontrol->device[node001]]% commit

 

You need to append each of your new OSD devices to blockdevicesclearedonnextboot, and commit.
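If you are adding several devices, the per-device append commands can be generated with a short loop and then pasted into cmsh; the device names below are examples:

```shell
# Print one cmsh "append" command per new OSD device (example device names):
for dev in /dev/sdd /dev/sde; do
  echo "append blockdevicesclearedonnextboot $dev"
done
```

Review the output, then run those lines inside the device context in cmsh, followed by commit.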

 

Now you can reboot your node.

 

Make sure to follow these steps for each of your nodes one at a time.

 

Then wait for Ceph to be healthy again. Do not reboot any of your nodes until Ceph is healthy again.
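A simple way to wait is to poll the cluster health in a loop. This sketch stubs out the health command with a shell function so it runs standalone; on the head node you would use the real `ceph health` command instead:

```shell
# Stub standing in for `ceph health`; replace with the real command on a
# live cluster:
check_health() { echo HEALTH_OK; }

# Poll the given health command until it reports HEALTH_OK:
wait_for_health() {
  until "$1" | grep -q HEALTH_OK; do
    sleep 10
  done
  echo "cluster healthy"
}

wait_for_health check_health
```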

 

 

Case 2: With a separate journal device.

 

Follow steps 1 & 2 from case 1.

 

If the SSD used for journals is shared by the OSDs on each node, then you will need to:

 

1. Stop (take out) the OSDs running on that node.

2. Flush the journal of each of those OSDs.

 

To identify the OSDs running on the node, execute the following:

 

ceph osd getcrushmap -o map

crushtool -d map -o dmap

 

You will now have your decompiled crush map in "dmap".

 

Open that file and note the OSDs running on your nodes. Do not confuse the OSDs in the crush map with the OSD associations defined in CMDaemon; they are different.

 

An example would be:

 

host node002 {

        id -2           # do not change unnecessarily

        # weight 0.098

        alg straw

        hash 0  # rjenkins1

        item osd.1 weight 0.098

}

 

 

In this case I have:

 

item osd.1 running on node002.
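The OSD numbers can also be pulled out of the decompiled crush map with grep. A short sample dmap is written inline here so the sketch runs standalone; on a real cluster use the dmap produced by crushtool:

```shell
# Write a minimal sample of a decompiled crush map (stand-in for the real
# dmap produced by crushtool):
cat > dmap <<'EOF'
host node002 {
        id -2
        alg straw
        hash 0
        item osd.1 weight 0.098
}
EOF

# List every osd.<number> entry in the map:
grep -o 'osd\.[0-9][0-9]*' dmap
```

For this sample it prints osd.1.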

 

You have two options:

1. Take osd.1 out; Ceph will rebalance itself.

2. Set the noout flag on your Ceph cluster, then take down the OSD. Ceph will not rebalance itself, but the PGs hosted on that OSD will be in degraded mode.

 

We are going with option two. We suggest that you stop any virtual instances before taking any further steps, because data corruption on your virtual machines is otherwise possible:

 

#ceph osd set noout

#cmsh

[am-powercontrol]% device use node001

[am-powercontrol->device[node001]]% services

[am-powercontrol->device[node001]->services]% list

Service (key)            Monitored  Autostart

------------------------ ---------- ----------

ceph-osd-osd0            yes        yes

nslcd                    yes        yes

[am-powercontrol->device[node001]->services]% stop ceph-osd-osd0

Fri Feb  3 16:29:35 2017 [notice] node001: Service ceph-osd-osd0 was stopped

[am-powercontrol->device[node001]->services]%

 

 

Check Ceph: it should be in the HEALTH_ERR state, and you should have degraded PGs. This is normal.

 

Now you need to flush the journal of each of your Ceph OSDs on that node:

 

ceph-osd -i <osd number> --flush-journal

Replace <osd number> with the number of the OSD you got from the crush map.

 

If you already know which device is used for your journal, then proceed. If you do not, execute the following:

 

% device use node001; roles; use cephosd; foreach * (show);

 

This will show you which device is being used for journaling; it is the value of the journaldata property.

 

Follow step 3 from case 1 and add your devices.

 

Scan your disk setup and locate your journal device within the <device></device> tags.

 

You can change the layout of the drive as you like. For example:

 

  <device origin="cm-ceph-setup">

    <blockdev>/dev/sde</blockdev>

    <partition id="/dev/sde1">

      <size>1/2</size>

      <type>linux</type>

    </partition>

    <partition id="/dev/sde2">

      <size>1/2</size>

      <type>linux</type>

    </partition>

  </device>

 

Here I have two partitions, each taking half of the disk space. I am going to add two more:

 

  <device origin="cm-ceph-setup">

    <blockdev>/dev/sde</blockdev>

    <partition id="/dev/sde1">

      <size>1/4</size>

      <type>linux</type>

    </partition>

    <partition id="/dev/sde2">

      <size>1/4</size>

      <type>linux</type>

    </partition>

    <partition id="/dev/sde3">

      <size>1/4</size>

      <type>linux</type>

    </partition>

    <partition id="/dev/sde5">

      <size>max</size>

      <type>linux</type>

    </partition>

  </device>

 

Save your disk setup, and commit.

 

Now follow step 4 from case 1 to the end of case 1. The only change is that you must set the journaldata property to your journal partition in the new Ceph associations that you just added.

 

An example would be:

 

set journaldata /dev/sde3
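Put together with the osdassociations commands from case 1, the sequence for one new association might look like the following sketch; osd3 and /dev/sde3 are example values, and the exact cmsh prompts may differ slightly:

```
#cmsh
[am-powercontrol]% device use node001
[am-powercontrol->device[node001]]% roles
[am-powercontrol->device[node001]->roles]% use cephosd
[am-powercontrol->device[node001]->roles[cephosd]]% osdassociations
[am-powercontrol->device[node001]->roles[cephosd]->osdassociations]% add osd3
[am-powercontrol->device[node001]->roles[cephosd]->osdassociations[osd3]]% set journaldata /dev/sde3
[am-powercontrol->device[node001]->roles[cephosd]->osdassociations[osd3]]% commit
```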

 

Right now you have configured your disks, your journal and your Ceph associations.

 

You need to set the Ceph OSD services on that node to autostart = no:

 

#cmsh

%device use node001 ; services

% foreach * (set autostart no)

%commit

 

Reboot your node.

After you reboot your node, execute the following:

 

#cmsh

%ceph

%osdinfo

 

Note your OSD numbers and the mapping to the OSD associations you added to your node.

 

From your head node:

#ceph-osd -i <osd number> --mkjournal

 

Replace <osd number> with the number of the OSD you got from the previous command.

 

Execute the previous step for each of your OSDs on the node you just rebooted.

After you are done, execute the following:

 

#ceph osd unset noout

 

Ceph will at first be in HEALTH_WARN status, because it needs to rebalance its data onto the new OSDs that have been added to your cluster.

 
