Troubleshooting Bricks

On this page you’ll find troubleshooting tips for when bricks are down.

If bricks are down, try to follow the steps below:

- Check if there is enough disk space available. Gluster core dumps or logs may have filled up the root filesystem or `/var`, causing Gluster to crash again.
- If a brick is down, you can start it with: `gluster volume start $volname force`
- Check for free RAM: `ansible mungg_gluster_server -a "free -h"`

Mass-start volumes

Force all bricks to start:

```
gluster volume list | xargs --max-procs=3 --max-args=5 bash -c '
for i; do
    if /usr/lib64/nagios/plugins/check_gluster_volume --retries 1 "$i" > /dev/null; then
        continue
    fi

    gluster --mode=script volume start "$i" force
done
' --
```

Start heal on all unhealthy volumes:

```
gluster volume list | xargs --max-procs=3 --max-args=5 bash -c '
for i; do
    if /usr/lib64/nagios/plugins/check_gluster_volume_heal --retries 1 "$i" > /dev/null; then
        continue
    fi

    gluster --mode=script volume heal "$i" enable && \
        gluster --mode=script volume heal "$i"
done
' --
```
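The disk-space check in the first step can be scripted across the relevant filesystems. A minimal sketch — the 90% threshold and the filesystem list are assumptions here, not site policy:

```shell
# Flag filesystems above a usage threshold (assumed 90%; adjust as needed).
threshold=90
for fs in / /var; do
    # Column 5 of POSIX `df -P` output is the usage percentage.
    usage=$(df -P "$fs" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    if [ "$usage" -ge "$threshold" ]; then
        echo "WARNING: $fs is ${usage}% full"
    fi
done
```

This can be run ad hoc or pushed through the same ansible group as the RAM check above.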

Free up log space

`ansible mungg_gluster_server -m shell -a 'find /var/log/glusterfs -mtime +50 -delete; logrotate --force /etc/logrotate.conf'`
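The `-delete` in the command above can be previewed first: dropping it and using `-print` turns the same `find` expression into a dry run (sketch; same path and age threshold as above):

```shell
# Preview which log files the cleanup above would delete:
# the same find expression, with -print instead of -delete.
logdir=/var/log/glusterfs
find "$logdir" -mtime +50 -print 2>/dev/null || true
```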

Rejoin a glusterfs brick process

This should only be done if the brick process has died and is not restarted by the management process glusterd.

```
# gluster volume status gluster-pvxxxx
Status of volume: gluster-pvxxxx
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick storage2:/data/shared_s
sd/gluster-pvxxxx/brick                     N/A       N/A        N       N/A
Brick storage3:/data/shared_s
sd/gluster-pvxxxx/brick                     49163     0          Y       13471
Brick storage1:/data/shared_s
sd/gluster-pvxxxx/brick                     49163     0          Y       26454
Self-heal Daemon on localhost               N/A       N/A        Y       25299
Self-heal Daemon on x.x.x.x                 N/A       N/A        Y       2478
Self-heal Daemon on x.x.x.x                 N/A       N/A        Y       10545

# ps -f --pid $(cat /run/gluster/vols/gluster-pvxxxx/*.pid)
23077 ?  Ssl  1:33  /usr/sbin/glusterfs --read-only --log-file=/var/log/glusterfs/backup-gluster-pv1099.log --volfile-server=x.x.x.x --volfile-server=x.x.x.x --volfile-server=x.x.x.x --volfile-id=/gluster-pvxxxx /var/lib/gluster-backup/mnt/gluster-pvxxxx

# kill 23077
```

Remove obsolete PID file

If a glusterfs process crashes, it leaves its PID file in place, preventing the glusterfs process from starting again.

```
volume=gluster-pv42    # set to the affected volume name

# ps -f --pid $(cat /run/gluster/vols/$volume/*.pid)
<must be empty>

# rm /run/gluster/vols/$volume/*${volume}-brick.pid

# gluster volume start $volume force
```

Clean up all stale PID files:

```
find /run/gluster/vols/ -name '*.pid' | \
while read -r pidfile; do
    ps -f --pid "$(<"$pidfile")" | grep -q glusterfsd || rm -vf "$pidfile"
done
```
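Since the cleanup loop deletes files, its logic can be exercised first against a scratch directory, echoing instead of deleting. The scratch path, volume name, and PIDs below are illustrative, and the liveness check here matches any live process rather than `glusterfsd` specifically:

```shell
# Scratch layout mimicking /run/gluster/vols/<volume>/*.pid
# (illustrative paths, not the real runtime files).
tmp=$(mktemp -d)
mkdir -p "$tmp/vols/gluster-pv42"

sh -c 'exit 0' &     # short-lived child gives us a guaranteed-dead PID
deadpid=$!
wait "$deadpid"

echo "$deadpid" > "$tmp/vols/gluster-pv42/stale-brick.pid"
echo $$         > "$tmp/vols/gluster-pv42/live-brick.pid"

# Same shape as the real loop, but a dry run: echo instead of rm.
stale=$(find "$tmp/vols" -name '*.pid' | while read -r pidfile; do
    ps -p "$(cat "$pidfile")" > /dev/null 2>&1 || echo "stale: $pidfile"
done)

echo "$stale"
rm -rf "$tmp"
```

Only the PID belonging to the already-exited child should be reported as stale; the file holding the running shell's own PID is left alone.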