
Removing OSDs

This guide describes the procedure of removing an OSD from a Ceph cluster.

Note: This method uses the ceph-osd charm’s remove-disk action, which first appeared in the charm’s quincy/stable channel.

  1. Before removing an OSD unit, first ensure that the cluster is healthy:

    juju ssh ceph-mon/leader sudo ceph status
    
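For scripting, the health field can be checked before proceeding. A minimal sketch, using a sample status string in place of the real output of juju ssh ceph-mon/leader sudo ceph status:

```shell
#!/bin/sh
# Sketch: abort unless the cluster reports HEALTH_OK.
# status_output stands in for the captured output of:
#   juju ssh ceph-mon/leader sudo ceph status
status_output='  cluster:
    health: HEALTH_OK'

health=$(printf '%s\n' "$status_output" | awk '/health:/ {print $2}')
if [ "$health" != "HEALTH_OK" ]; then
    echo "cluster not healthy ($health); aborting" >&2
    exit 1
fi
echo "cluster healthy"
```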
  2. Identify the target OSD

    Check the OSD tree to map OSDs to their host machines:

    juju ssh ceph-mon/leader sudo ceph osd tree
    

    Sample output:

    ID  CLASS  WEIGHT   TYPE NAME             STATUS  REWEIGHT  PRI-AFF
    -1         0.09357  root default                                   
    -5         0.03119      host finer-shrew                           
    2    hdd  0.03119          osd.2             up   1.00000  1.00000
    ...
    

    Assume we want to remove osd.2. As shown in the output, it is hosted on the machine finer-shrew.
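This lookup can also be scripted. The sketch below parses the sample tree output with awk, tracking the most recent host line and printing the host that contains the target OSD; against a live cluster you would pipe the real command output instead:

```shell
#!/bin/sh
# Sketch: map an OSD to its host from ceph osd tree output.
# tree stands in for: juju ssh ceph-mon/leader sudo ceph osd tree
tree='-1         0.09357  root default
-5         0.03119      host finer-shrew
2    hdd  0.03119          osd.2             up   1.00000  1.00000'

target=osd.2
host=$(printf '%s\n' "$tree" | awk -v osd="$target" '
    /host / { for (i = 1; i <= NF; i++) if ($i == "host") h = $(i+1) }
    $4 == osd { print h }')
echo "$target is hosted on $host"
```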

    Check which unit is deployed on this machine:

    juju status
    

    Sample output:

    ...
    Unit         Workload  Agent  Machine  Public address  Ports  Message
    ...
    ceph-osd/1*  blocked   idle   1        192.168.122.48         No block devices detected using current configuration
    ...
    
    Machine  State    DNS             Inst id              Series  AZ       Message
    ...
    1        started  192.168.122.48  finer-shrew          focal   default  Deployed
    ...
    

    In this case, ceph-osd/1 is the unit we want to remove.

    Therefore, the target OSD can be identified by the following properties:

    OSD_UNIT=ceph-osd/1
    OSD=osd.2
    OSD_ID=2
    
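The numeric id does not need to be set by hand; it can be derived from the osd.N name with standard shell parameter expansion:

```shell
#!/bin/sh
# Derive OSD_ID from the OSD name (osd.N -> N).
OSD_UNIT=ceph-osd/1
OSD=osd.2
OSD_ID=${OSD#osd.}   # strip the leading "osd." prefix
echo "OSD_ID=$OSD_ID"
```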
  3. Remove and purge the disk with a Juju action

    First reweight the OSD to zero:

    juju run-action --wait ceph-mon/leader change-osd-weight osd=$OSD_ID weight=0
    

    Monitor the resulting data movement across the cluster:

    juju ssh ceph-mon/leader sudo ceph -w
    

    When the movement has ceased, remove the OSD:

    juju run-action --wait $OSD_UNIT remove-disk osd-ids=$OSD purge=true
    

    Note: The remove-disk action attempts to safely remove the target OSD from the cluster. It will fail with a timeout error if the OSD cannot be safely removed within the timeout period (five minutes by default), or if the removal would leave too few OSDs in the cluster to meet the pool-level replication requirements. To remove the disk even when it is considered unsafe, add force=true to the command when running the action.
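The "wait until movement has ceased" step can be automated by polling the cluster health until it returns to HEALTH_OK. A sketch under that assumption; check_health here is a hypothetical stand-in for juju ssh ceph-mon/leader sudo ceph health, simulated so that recovery finishes on the third poll:

```shell
#!/bin/sh
# Sketch: wait for data movement to finish by polling cluster health.
# check_health is a hypothetical stand-in for:
#   juju ssh ceph-mon/leader sudo ceph health
# Here it simulates recovery completing on the third poll.
attempt=0
check_health() {
    attempt=$((attempt + 1))
    if [ "$attempt" -ge 3 ]; then
        health=HEALTH_OK
    else
        health=HEALTH_WARN
    fi
}

health=""
until [ "$health" = "HEALTH_OK" ]; do
    check_health
    # sleep 30   # poll interval against a real cluster
done
echo "rebalance complete after $attempt checks"
```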

  4. (Optional) If the unit hosting the target OSD has no other active OSDs attached and you would like to delete it, run:

    juju remove-unit $OSD_UNIT
    

    Caution: Removing a unit that still has active OSDs attached will produce unexpected errors.
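Before removing the unit, it is worth confirming that no OSDs remain under its host in the OSD tree. A minimal sketch that counts osd.* entries under a host in a sample post-purge tree (other-node and osd.0 are made-up entries for illustration; the real input would again come from juju ssh ceph-mon/leader sudo ceph osd tree):

```shell
#!/bin/sh
# Sketch: count OSDs still listed under a host before removing its unit.
# tree stands in for post-purge ceph osd tree output; other-node and
# osd.0 are made-up entries for illustration.
tree='-5         0.03119      host finer-shrew
-3         0.03119      host other-node
0    hdd  0.03119          osd.0             up   1.00000  1.00000'

count=$(printf '%s\n' "$tree" | awk '
    /host finer-shrew/ { in_host = 1; next }
    in_host && /host /  { in_host = 0 }
    in_host && /osd\./  { n++ }
    END { print n + 0 }')
echo "OSDs remaining on finer-shrew: $count"
```

A count of zero means the unit can be removed without hitting the caution above.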

  5. Ensure the cluster is in a healthy state after being scaled down:

    juju ssh ceph-mon/leader sudo ceph status
    
