How to confirm that everything is good after OSD reweight to 0.0 and subsequent purge #53
Note that due to Ceph limitations, reweighting to 0 will prevent pgremapper from being able to do anything. We usually reweight to 0.0001 instead.
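A minimal sketch of that step, reusing the osd 36-43 range from the original post further down in this thread:
for i in {36..43}; do ceph osd reweight $i 0.0001; done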
You're missing a pile of steps here, and this has more to do with Ceph knowledge than with pgremapper itself. When you reweighted your OSDs, Ceph scheduled backfill to move data off of them. Normally, you would wait for that backfill to complete and verify that the OSDs you want to remove no longer hold any PGs before outing and purging them.
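One way to check that (my example command, not necessarily the one this comment had in mind) is the PGS column of ceph osd df:
ceph osd df tree   # the PGS column should read 0 for every OSD you are about to remove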
Yes, it can be run at any time. Setting the flags just lets you remain in control the whole time, reducing wasted work on backfill that you would only end up cancelling with pgremapper anyway. It is possible for pgremapper to occasionally fail if backfill finishes or gets scheduled while the tool runs, though.
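If such a race does happen, rerunning the command is enough (as confirmed later in this thread); a minimal retry sketch, assuming the tool exits non-zero on failure:
until pgremapper cancel-backfill --yes; do sleep 30; done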
I'm not sure what you mean by this - can you elaborate?
injectargs should almost never be used as of Nautilus - Ceph's centralized conf is almost always what you want to use instead.
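For illustration (osd_max_backfills is my example option, not one this comment names), the centralized-config equivalent looks like:
ceph config set osd osd_max_backfills 2         # persistent, applies to all OSDs
ceph config show osd.36 osd_max_backfills       # running value on one OSD, including any injectargs override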
I see, so what would be the proper steps if I want to remove OSDs (or even a complete host) using the pgremapper cancel-backfill feature, to avoid overloading OSDs on the same node or same-class devices on other nodes? Let's say I have this node:
I want to remove these OSDs and then add them back later (they currently run 2 OSDs per NVMe and I want to re-add them as 1 OSD per NVMe).
Moving forward, when I add the OSDs back, the steps would be as written in my first post, am I correct?
May I have an example of the command to use for that?
If such a failure happens, do I just need to rerun the pgremapper command again?
Sure, so let's say something goes wrong with the pgremapper tool; what would the fallback steps be? Or let's say I want to speed up the process that is handled at this stage by the balancer to optimize the PG allocation. Are these things possible?
We are running Octopus (not cephadm). I want to control the backfill gradually, so first I increase the backfill and recovery ops on the newly added OSDs, let's say to 4, while all the rest stay at 1. When the process starts to slow down, I increase the rest of them to 2, or increase the newly added ones even further. I think controlling this OSD by OSD is more work with the config DB.
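For what it's worth, the config DB can also target individual daemons, so the staged approach described here is possible without injectargs; a sketch using hypothetical newly added OSD IDs 44-47:
for i in 44 45 46 47; do ceph config set osd.$i osd_max_backfills 4; ceph config set osd.$i osd_recovery_max_active 4; done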
OK, if you use the balancer in those steps then you should be OK.
Setting an OSD out is equivalent to setting its weight to 0; that can be done at the very end, when there is little backfill remaining.
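As a sketch, reusing the osd 36-43 IDs from the original post, that final step would be:
for i in {36..43}; do ceph osd out $i; done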
Yes
I'll try to keep this high-level; this tooling is very much advanced-user territory, and while I don't know of cases where it can do dangerous things, without understanding what it's doing at a deep level it could make your life harder, not easier. After you reweight OSDs to 0.0001, Ceph will reassign the PGs associated with those OSDs to other OSDs. Running cancel-backfill at that point captures those moves as upmap entries, pinning the PGs where they are, and you can then remove those upmaps in a controlled fashion. Running that in a loop should cause one backfill to be scheduled per source OSD and target OSD at a time, until all upmaps targeting OSDs in the host have been removed.
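The command stripped out of the comment above is presumably pgremapper's undo-upmaps subcommand; the flags in this sketch are my assumption based on the pgremapper README, so verify them against pgremapper undo-upmaps --help before use:
# node1 is a hypothetical hostname; ceph osd ls-tree lists the OSD IDs under that bucket
while true; do
    pgremapper undo-upmaps $(ceph osd ls-tree node1) --target --yes
    sleep 120   # let the scheduled backfill make progress before scheduling more
done
# stop the loop once ceph osd dump no longer shows pg_upmap_items entries pointing at those OSDs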
Yeah
This is out of scope for pgremapper. There are standard ways of affecting the speed at which backfill operates, via balancer and backfill settings.
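For example (these are the standard option names; this comment does not list them explicitly):
ceph config set mgr target_max_misplaced_ratio 0.05   # how much of the cluster the balancer may keep misplaced at once
ceph config set osd osd_max_backfills 2               # per-OSD concurrent backfill limit
ceph config set osd osd_recovery_max_active 4         # per-OSD concurrent recovery ops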
With the config DB, you can set these settings at a host level (though unfortunately that might be broken in Octopus, not sure). The issue with injectargs is that the config DB can no longer control those settings in the OSDs; anything set by injectargs remains until the OSD restarts and takes precedence over config DB settings. You can remove the overrides that you have injected so that the config values take effect again, you just need to remember to do this.
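A host-level override through the config DB would look something like this (node1 is a hypothetical hostname; as noted above, the host mask may not behave correctly on Octopus):
ceph config set osd/host:node1 osd_max_backfills 4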
Hi,
Thank you for the effort of creating this tool; hopefully it will help me with my issue of somehow lowering the load on my OSDs while adding/removing OSDs.
Currently I'm testing the pgremapper cancel-backfill --yes option as written in the documentation.
I would like to ask 2 questions:
Q1 regarding OSD removal:
What I've done in a test environment:
ceph osd set nobackfill; ceph osd set norebalance
for i in {36..43}; do ceph osd reweight $i 0.0; done
pgremapper cancel-backfill --yes
ceph osd unset norebalance;ceph osd unset nobackfill
for num in {36..43}; do ceph osd out osd.$num; systemctl disable --now ceph-osd@$num; ceph osd purge $num --yes-i-really-mean-it; umount /var/lib/ceph/osd/ceph-$num; done
In theory I'm now missing a lot of chunks, so how do I know that all the data has actually been recovered/regenerated somewhere else in the cluster?
Q2 regarding the process:
I see in the documentation that this needs to be done before unsetting the flags. Is it possible to do it if recovery is already in progress? Let's say I'm experiencing an issue during recovery, so I want to somehow limit it.
Also this question vice versa: if the remapping is too slow, is it possible to cancel that and speed it back up to the original rate? Let's say I'd increase osd_max_backfills and osd_recovery_max_op on all OSDs with the injectargs command.
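For reference, the injectargs form described here would be a sketch like the one below (per the advice earlier in this thread, note that such an override shadows the config DB until the OSDs restart):
ceph tell 'osd.*' injectargs '--osd_max_backfills 4'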