Troubleshooting Bricks
On this page you’ll find troubleshooting tips for when bricks are down.
If bricks are down, try to follow the steps below:
1. Check whether there is enough disk space available. Gluster core dumps or logs may have filled up the root filesystem or /var, causing Gluster to crash again (see the sketch after this list).
2. If a brick is down, you can start it with: gluster volume start $volname force
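A quick way to check free space and spot oversized logs or core dumps (a sketch; where core dumps land depends on the node's core dump configuration):

df -h / /var                                                     # free space on / and /var
du -sh /var/log/glusterfs                                        # size of the Gluster logs
find / -xdev -maxdepth 2 -name 'core*' -size +10M 2>/dev/null    # look for large core dumps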
Mass-start volumes
Force all bricks to start:
gluster volume list | xargs --max-procs=3 --max-args=5 bash -c '
for i; do
    if /usr/lib64/nagios/plugins/check_gluster_volume --retries 1 "$i" > /dev/null; then
        continue
    fi
    gluster --mode=script volume start "$i" force
done
' --
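To see which volumes still fail the brick check after the mass start (a sketch; it relies on the exit code of the same Nagios plugin used above):

gluster volume list | while read -r vol; do
    /usr/lib64/nagios/plugins/check_gluster_volume --retries 1 "$vol" > /dev/null \
        || echo "still unhealthy: $vol"
done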
Start heal on all unhealthy volumes:
gluster volume list | xargs --max-procs=3 --max-args=5 bash -c '
for i; do
    if /usr/lib64/nagios/plugins/check_gluster_volume_heal --retries 1 "$i" > /dev/null; then
        continue
    fi
    gluster --mode=script volume heal "$i" enable && \
        gluster --mode=script volume heal "$i"
done
' --
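Heal progress can then be followed per volume; gluster volume heal <volname> info lists the entries that still need healing. A sketch that walks all volumes (this can be slow on clusters with many volumes):

gluster volume list | while read -r vol; do
    echo "== $vol =="
    gluster volume heal "$vol" info    # entries still pending heal, per brick
done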
Free up log space
ansible mungg_gluster_server -m shell -a 'find /var/log/glusterfs -mtime +50 -delete; logrotate --force /etc/logrotate.conf'
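To verify how much space this actually frees, the same Ansible group can be queried before and after the cleanup (a sketch):

ansible mungg_gluster_server -m shell -a 'du -sh /var/log/glusterfs; df -h /var'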
Rejoin a glusterfs process
This should only be done if the brick process has died and is not restarted by the management process glusterd.
# gluster volume status gluster-pvxxxx
Status of volume: gluster-pvxxxx
Gluster process                                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick storage2:/data/shared_ssd/gluster-pvxxxx/brick  N/A       N/A        N       N/A
Brick storage3:/data/shared_ssd/gluster-pvxxxx/brick  49163     0          Y       13471
Brick storage1:/data/shared_ssd/gluster-pvxxxx/brick  49163     0          Y       26454
Self-heal Daemon on localhost                         N/A       N/A        Y       25299
Self-heal Daemon on x.x.x.x                           N/A       N/A        Y       2478
Self-heal Daemon on x.x.x.x                           N/A       N/A        Y       10545

# ps -f --pid $(cat /run/gluster/vols/gluster-pvxxxx/*.pid)
23077 ?  Ssl  1:33  /usr/sbin/glusterfs --read-only --log-file=/var/log/glusterfs/backup-gluster-pv1099.log --volfile-server=x.x.x.x --volfile-server=x.x.x.x --volfile-server=x.x.x.x --volfile-id=/gluster-pvxxxx /var/lib/gluster-backup/mnt/gluster-pvxxxx

# kill $pid
Afterwards, check the Remove obsolete PID file section below.
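Once glusterd has respawned the brick (or after gluster volume start $volname force), the Online column should show Y again. A quick check (a sketch; gluster-pvxxxx is a placeholder):

# gluster volume status gluster-pvxxxx | grep '^Brick'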
Remove obsolete PID file
If a glusterfs process crashes, it leaves its PID file in place, which prevents the glusterfs process from starting again.
volume=<gluster volume name>   # e.g. gluster-pv42
# ps -f --pid $(cat /run/gluster/vols/$volume/*.pid)
<must be empty>
# rm /run/gluster/vols/$volume/*${volume}-brick.pid
# gluster volume start $volume force
Clean up all stale PID files:
find /run/gluster/vols/ -name '*.pid' | \
while read -r pidfile; do
    vol="$(basename "$(dirname "$pidfile")")"
    ps -f --pid "$(<"$pidfile")" | grep -q glusterfsd || rm -vf "$pidfile"
done
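After removing stale PID files, the affected bricks still have to be started again. An extended variant of the loop above (a sketch) that also force-starts the affected volume after removing its stale PID file:

find /run/gluster/vols/ -name '*.pid' | \
while read -r pidfile; do
    vol="$(basename "$(dirname "$pidfile")")"
    if ! ps -f --pid "$(<"$pidfile")" | grep -q glusterfsd; then
        rm -vf "$pidfile"                                  # drop the stale PID file
        gluster --mode=script volume start "$vol" force    # let glusterd respawn the brick
    fi
done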