Operating a Ceph cluster

Where to operate?

All operations should be done on the ceph-admin machine. In our experimental testbed, it is cephq1.wn.iihe.ac.be.

Check Ceph cluster status

The command:

ceph -s

shows the current status of the Ceph cluster:

[root@cephq1 ~]# ceph -s
    cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
     health HEALTH_OK
     monmap e1: 3 mons at {cephq2=192.168.41.2:6789/0,cephq3=192.168.41.3:6789/0,cephq4=192.168.41.4:6789/0}
            election epoch 8, quorum 0,1,2 cephq2,cephq3,cephq4
     osdmap e78: 6 osds: 6 up, 6 in
      pgmap v1293: 192 pgs, 2 pools, 0 bytes data, 0 objects
            27920 kB used, 4021 GB / 4106 GB avail
                 192 active+clean
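
Besides HEALTH_OK, the health line can report HEALTH_WARN or HEALTH_ERR. In that case, the following command lists the individual problems:

    ceph health detail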

The following command displays a real-time summary of the cluster status and major events:

ceph -w
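
"ceph -w" runs until interrupted, so it is meant for interactive use. For unattended scripts, a minimal sketch that polls the one-line health summary instead (the 30-second interval is an arbitrary choice):

    # Wait until the cluster reports HEALTH_OK again; interval is arbitrary
    while ! ceph health | grep -q HEALTH_OK; do
        sleep 30
    done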

Remove OSDs

When you want to remove a machine that contains OSDs (for example, when decommissioning old equipment that is out of warranty), there is a manual procedure to follow in order to do things cleanly and avoid problems:

  1. Identify the OSDs hosted by the machine with the command (sample output is sketched after this list):
    ceph osd tree
    
  2. Take the OSDs out of the cluster: before you remove an OSD, it is usually up and in. You need to take it out of the cluster so that Ceph can begin rebalancing and copying its data to other OSDs. First, set its weight to 0 in the CRUSH map:
    ceph osd crush reweight osd.{osd-num} 0.0

    Repeat this operation for all the OSDs on the machine (a loop is sketched after this list).

  3. Monitor the data migration: once you have taken the OSDs out of the cluster, Ceph will begin rebalancing by migrating placement groups off the OSDs you are removing. You can follow this process with the following command (a non-interactive alternative is sketched after this list):
    ceph -w
    

    You should see the placement group states change from active+clean to active with some degraded objects, and finally back to active+clean when the migration completes. Once it is complete, take the OSD out of the cluster:

    ceph osd out {osd-num}
  4. Stop the OSD daemons: after you take an OSD out of the cluster, its daemon may still be running; that is, the OSD may be up and out. You must stop your OSDs before you remove them from the configuration:
    ssh {osd-host}
    /etc/init.d/ceph stop osd.{osd-num}
    

    Repeat the last command for all the OSDs on the machine (a loop is sketched after this list). Afterwards, "ceph -s" should show these OSDs as down.

  5. Remove the OSDs (a combined loop is sketched after this list):
    1. Remove the OSDs from the CRUSH map:
      ceph osd crush remove osd.{osd-num}
      
    2. Remove the OSD authentication key:
      ceph auth del osd.{osd-num}
      
    3. Remove the OSDs from the cluster:
      ceph osd rm {osd-num}
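
Sample output for step 1 (illustrative only: the IDs, weights, and the host name cephq5 are hypothetical, and the exact column layout varies between Ceph releases):

    ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT
    -1 2.00000 root default
    -2 2.00000     host cephq5
     0 1.00000         osd.0         up  1.00000
     1 1.00000         osd.1         up  1.00000

Here osd.0 and osd.1 are the OSDs that have to be drained and removed.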
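
For step 2, the reweight command has to be run once per OSD. A minimal loop, assuming the hypothetical IDs 0 and 1 found in step 1:

    # Drain all OSDs of the machine; the IDs are placeholders
    for id in 0 1; do
        ceph osd crush reweight osd.$id 0.0
    done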
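
For step 3, instead of watching "ceph -w" interactively, the one-line placement group summary can be checked repeatedly; the migration is done when every placement group is active+clean again:

    ceph pg stat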
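
For step 4, a sketch of stopping every OSD daemon of the machine over SSH (host name and IDs are again placeholders):

    # Stop the daemons on the OSD host; adjust host and IDs
    for id in 0 1; do
        ssh cephq5 "/etc/init.d/ceph stop osd.$id"
    done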
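
For step 5, the three removal commands can be combined into one loop per OSD (placeholder IDs again):

    # Remove each drained, stopped OSD from the CRUSH map, the auth database, and the cluster
    for id in 0 1; do
        ceph osd crush remove osd.$id
        ceph auth del osd.$id
        ceph osd rm $id
    done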