The availability of data on your VPS or Big Storage is very important. Even when downtime is short and rare, it can be costly for your business. Where resources permit, we therefore recommend setting up your systems redundantly.
GlusterFS is a scalable network file system that is well suited to storing data redundantly on two or more VPSs and/or Big Storages. It is designed to perform well in a wide range of scenarios, including:
- Working with large files or very large numbers of small files
- Read and write intensive operations
- Sequential and random access tasks
- Large numbers of clients using the available storage
In this guide, we show how to set up VPSs with Ubuntu, Debian, AlmaLinux, Rocky Linux, or CentOS Stream as GlusterFS servers (hosting the redundant storage) and GlusterFS clients (using the redundant storage).
- For this guide, at least the following hardware is needed:
- Two or more VPSs serving as GlusterFS servers. In the examples in this article, GlusterFS server 1 and 2 use the private network IPs 192.168.1.1 and 192.168.1.2, respectively.
- One or more client VPSs using data on the GlusterFS servers.
- Optionally one Big Storage per GlusterFS server.
- Optionally a Private Network.
- Perform the steps in this guide as root, or as a user with root rights.
Private network vs public network
Where possible, we recommend using a private network. GlusterFS does not encrypt traffic between the different servers and clients. A private network provides protection because you are using a shielded network.
Use the public IP addresses (or DNS names) of your servers and clients only when computers or servers cannot be added to a private network but still need direct access to the redundant data. In that case, it is advisable to use a VPN connection between all involved computers and servers for additional security.
If you are unsure whether a private network is sufficient for your use case, feel free to ask us for advice via a message from the TransIP control panel.
Install GlusterFS
Step 1
Connect to the VPSs (servers and clients) via SSH or the VPS console in the TransIP control panel.
Step 2
Update and restart your VPSs:
Ubuntu / Debian:
apt -y update && apt -y upgrade
reboot
CentOS Stream / AlmaLinux / Rocky Linux:
dnf -y update
reboot
Step 3
Install GlusterFS (use the instructions for your operating system):
Ubuntu:
Install the software properties common package if it's not already present on your servers:
apt -y install software-properties-common
Add the GlusterFS PPA (repository) to your VPSs:
add-apt-repository ppa:gluster/glusterfs-7
apt -y update
Install the GlusterFS-server software on the VPSs that you will use as GlusterFS servers with the command:
apt -y install glusterfs-server
Finally, install the GlusterFS-client software on the clients that will use the redundant storage:
apt -y install glusterfs-client
CentOS Stream / AlmaLinux / Rocky Linux:
Install the GlusterFS-server software on your GlusterFS servers with the command:
dnf -y install glusterfs-server
Install the GlusterFS-client software on the clients that will use the redundant storage:
dnf -y install glusterfs-client
Debian:
Add the GPG-key to apt:
wget -O - https://download.gluster.org/pub/gluster/glusterfs/9/rsa.pub | apt-key add -
Add the apt-source:
DEBID=$(grep 'VERSION_ID=' /etc/os-release | cut -d '=' -f 2 | tr -d '"')
DEBVER=$(grep 'VERSION=' /etc/os-release | grep -Eo '[a-z]+')
DEBARCH=$(dpkg --print-architecture)
echo deb https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/${DEBID}/${DEBARCH}/apt ${DEBVER} main > /etc/apt/sources.list.d/gluster.list
Then update the package list:
apt -y update
Install the GlusterFS-server software on your GlusterFS servers with the command:
apt -y install glusterfs-server
Install the GlusterFS-client software on the clients that will use the redundant storage:
apt -y install glusterfs-client
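The values that the Debian commands derive from /etc/os-release can be inspected before anything is written to /etc/apt/sources.list.d/gluster.list. The following is a minimal sketch, assuming a Debian-style os-release file; gluster_apt_line is a hypothetical helper name, and it only prints the source line.

```shell
#!/bin/sh
# Hypothetical helper: print the apt source line for GlusterFS without
# writing it anywhere, so the derived values can be checked first.
# $1 = path to an os-release file
# $2 = architecture (for example the output of dpkg --print-architecture)
gluster_apt_line() {
    os_release="$1"
    arch="$2"
    # VERSION_ID="11"          -> 11
    debid=$(grep '^VERSION_ID=' "$os_release" | cut -d '=' -f 2 | tr -d '"')
    # VERSION="11 (bullseye)"  -> bullseye (the lowercase codename)
    debver=$(grep '^VERSION=' "$os_release" | grep -Eo '[a-z]+')
    echo "deb https://download.gluster.org/pub/gluster/glusterfs/LATEST/Debian/${debid}/${arch}/apt ${debver} main"
}

# Example (only meaningful on Debian):
# gluster_apt_line /etc/os-release "$(dpkg --print-architecture)"
```

If the printed line contains an empty version number or codename, the os-release file does not have the expected layout and the source line should be written by hand.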
Configure GlusterFS servers
Step 1
Enable GlusterFS on the VPSs that you will use as GlusterFS servers and start the GlusterFS service:
systemctl enable glusterd.service
systemctl start glusterd.service
Step 2
The Gluster daemon uses port 24007 for communication between the GlusterFS servers. Therefore, execute the command below on each of the GlusterFS-servers to open port 24007 in your firewall.
- Replace 192.168.1.2 with the private network IP address of the GlusterFS server(s) that you want to give access to this server (and not the IP address of the server from which you are executing this command).
- Repeat the command for each VPS used as a GlusterFS server.
- Are you using the VPS firewall in the TransIP control panel? Then also open the ports there.
Ubuntu/Debian (UFW):
ufw allow from 192.168.1.2 to any port 24007
CentOS Stream / AlmaLinux / Rocky Linux:
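firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.2" port protocol="tcp" port="24007" accept'
firewall-cmd --reload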
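With more than two servers, repeating the firewall command per peer by hand becomes error-prone. The sketch below is one possible way to generate the UFW rules for a server; peer_rules is a hypothetical helper name, and it only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical helper: print one UFW rule per peer server, skipping this
# server's own address. Nothing is executed.
# $1 = this server's IP, remaining arguments = all GlusterFS server IPs
peer_rules() {
    self="$1"
    shift
    for ip in "$@"; do
        # A server does not need a rule for its own address.
        [ "$ip" = "$self" ] && continue
        echo "ufw allow from $ip to any port 24007"
    done
}

# Example: run on server 192.168.1.1 in a pool of three servers.
peer_rules 192.168.1.1 192.168.1.1 192.168.1.2 192.168.1.10
```

Passing the full server list on every server keeps the call identical everywhere; only the first argument changes.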
Step 3
Now connect the various GlusterFS servers with each other using the gluster peer probe command. It does not matter on which server you execute this command.
gluster peer probe 192.168.1.2
Replace 192.168.1.2 with the (private) IP address of the server with which you want to establish a connection.
With this command, you tell your VPS that you trust the server with the IP address 192.168.1.2 and that it should be registered as part of the storage pool.
Step 4
GlusterFS uses 'volumes'. Simply put, a volume is a collection of servers that together form a storage pool. The volume uses a daemon that runs on each GlusterFS server. In turn, the daemon uses a 'brick process' (called glusterfsd) to address the underlying storage. In practice, a GlusterFS server is often called a 'brick'.
Create a volume now by executing the following command on one of the GlusterFS servers (see the points under the command):
gluster volume create volume_name replica number_of_servers 192.168.1.1:/data/directory 192.168.1.2:/data/directory force
- gluster volume create: Creates a volume.
- volume_name: You are free in how you name the volume, but it is useful to use a name that easily recognizes what the purpose of the volume is (for example, a project name).
- replica: replica here is the type of volume. This means that data is replicated on the servers that you define in this command.
- number_of_servers: The number of GlusterFS servers.
- 192.168.1.1:/data/directory 192.168.1.2:/data/directory: For each GlusterFS server, specify the IP address followed by the directory you want to use for the redundant storage.
- force: forces the creation of the volume and ignores any warnings.
An example for a private network with two GlusterFS servers each with a Big Storage might look like this:
gluster volume create storage1 replica 2 192.168.1.1:/mnt/bigstorage 192.168.1.2:/mnt/bigstorage force
The handy thing about working with volumes is that you can also create multiple volumes on the same servers. This is useful, for example, if you want to use a volume per software package, customer, or project.
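The replica count in the create command has to match the number of servers listed. As an illustration, the sketch below derives the count from the argument list; volume_create_cmd is a hypothetical helper name, and it assumes every server uses the same directory. It prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: build the "gluster volume create" command for a
# replicated volume, deriving the replica count from the number of servers.
# $1 = volume name, $2 = brick directory, remaining arguments = server IPs
volume_create_cmd() {
    name="$1"
    dir="$2"
    shift 2
    # After the shift, $# is the number of server IPs that were passed.
    cmd="gluster volume create $name replica $#"
    for ip in "$@"; do
        cmd="$cmd ${ip}:${dir}"
    done
    echo "$cmd force"
}

# Example: reproduces the Big Storage command shown above.
volume_create_cmd storage1 /mnt/bigstorage 192.168.1.1 192.168.1.2
```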
Step 5
A new volume is not started automatically. Start it with the command:
gluster volume start volume_name
Replace volume_name with the name you used in step 4 when creating the volume.
Step 6
Check the status of the GlusterFS servers with the command:
gluster volume status
In the output, you see, among other things, the TCP port on which the 'brick' is active, usually 49152:
Status of volume: volume_name
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 192.168.1.1:/data/storage             49152     0          Y       15179
Brick 192.168.1.2:/data/storage             49152     0          Y       799
Self-heal Daemon on localhost               N/A       N/A        Y       15200
Self-heal Daemon on 192.168.1.3             N/A       N/A        Y       828

Task Status of Volume volume1
------------------------------------------------------------------------------
There are no active volume tasks
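The brick port can also be read from this output with a small filter. The following is a rough sketch, assuming each brick is printed on a single line as in the example output (long brick paths may wrap in practice); brick_ports is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read "gluster volume status" output on stdin and
# print "brick port" for every line that starts with the word Brick.
brick_ports() {
    awk '$1 == "Brick" { print $2, $3 }'
}

# Typical use on a GlusterFS server:
# gluster volume status volume_name | brick_ports
```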
Step 7
On all GlusterFS servers, open the port on which the 'brick' is active for your clients.
- Replace 192.168.1.3 with the (private) IP address of the client where you want to mount the GlusterFS volume.
- Repeat this step on all your GlusterFS servers for each client on which you want to mount the GlusterFS volume.
- Did you see a different port than 49152 in step 6? Then adjust that in the commands below.
Ubuntu/Debian:
ufw allow from 192.168.1.3 to any port 49152
CentOS Stream / AlmaLinux / Rocky Linux:
firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.3" port protocol="tcp" port="49152" accept'
firewall-cmd --reload
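When several clients need access, the firewalld rule has to be repeated per client. The sketch below shows one way to generate those rules; client_rules is a hypothetical helper name, and the second client address is only an example. It prints the commands rather than running them, followed by the reload that makes permanent rules active.

```shell
#!/bin/sh
# Hypothetical helper: print one firewalld rich rule per client for the
# brick port. Nothing is executed.
# $1 = brick port, remaining arguments = client IPs
client_rules() {
    port="$1"
    shift
    for ip in "$@"; do
        printf '%s\n' "firewall-cmd --permanent --zone=public --add-rich-rule='rule family=\"ipv4\" source address=\"$ip\" port protocol=\"tcp\" port=\"$port\" accept'"
    done
    # Rules added with --permanent only take effect after a reload.
    echo "firewall-cmd --reload"
}

# Example: two clients on the default brick port.
client_rules 49152 192.168.1.3 192.168.1.4
```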
Step 8 - optional
In the previous step, you limited access to the 'brick' port. However, other computers still have free access to the GlusterFS volume. This is not a very big security risk on a private network, but it is if you used the public network connection of your VPSs for the steps in this guide. It is then advisable to also restrict access to the volume to the IPs of your clients as follows:
gluster volume set volume_name auth.allow 192.168.1.3
- Adjust volume_name to the name of the GlusterFS volume that you created in step 4.
- Replace 192.168.1.3 with the (private) IP address of your client.
For multiple clients, separate the IP addresses with commas and no spaces, as follows:
gluster volume set volume_name auth.allow 192.168.1.3,192.168.1.4,192.168.1.5
You can remove this restriction later with the command:
gluster volume set volume_name auth.allow '*'
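The auth.allow value is passed to gluster as a single argument, so the client addresses must be joined with commas and without spaces. As a small illustration, the sketch below builds that argument from a list of addresses; auth_allow_cmd is a hypothetical helper name, and it prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: join client IPs with commas (no spaces) and print
# the matching "gluster volume set ... auth.allow" command.
# $1 = volume name, remaining arguments = client IPs
auth_allow_cmd() {
    volume="$1"
    shift
    list=""
    for ip in "$@"; do
        # Prepend "list," only when list already holds an address.
        list="${list:+$list,}$ip"
    done
    echo "gluster volume set $volume auth.allow $list"
}

# Example with three clients.
auth_allow_cmd volume_name 192.168.1.3 192.168.1.4 192.168.1.5
```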
Configure GlusterFS clients
Step 1
The GlusterFS volume that you created in the previous section can easily be mounted on a client with a mount command. First, create a directory on your client(s) where you will mount the volume, for example:
mkdir /mnt/storage
You are free to adjust the directory in this example.
Step 2
Next, mount the GlusterFS volume. Adjust the following details:
- 192.168.1.1: Adjust to the (private) IP address of one of your GlusterFS servers. It does not matter which of your GlusterFS servers you use for this; GlusterFS sees your servers as a single entity. Even if the server with IP address 192.168.1.1 goes offline, GlusterFS still knows which IP addresses the other servers are active on.
- volume_name: Adjust to the name of the volume that you created in step 4 of the previous section.
- /mnt/storage: Adjust this directory to the directory that you created in the previous step.
mount -t glusterfs 192.168.1.1:/volume_name /mnt/storage
The addition -t glusterfs indicates that a GlusterFS type filesystem is being mounted.
That's it! You can now test your redundant storage directly by creating a file on a client...
touch /mnt/storage/testfile
...and then checking on a server to see if you see it back:
ls /data/directory
Adjust /data/directory here to the directory that you used in step 4 of the server configuration, for example /mnt/bigstorage.
Automatically mounting
In most scenarios, it is desirable that the GlusterFS clients automatically remount the GlusterFS volume after a restart of the underlying VPS.
To do this, first add an entry to /etc/fstab (this file controls which filesystems are mounted when your VPS boots):
echo "192.168.1.1:/volume_name /mnt/storage glusterfs defaults,_netdev 0 0" >> /etc/fstab
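Running the echo command twice leaves a duplicate line in /etc/fstab. A minimal sketch of a safer variant follows, assuming the standard whitespace-separated fstab layout in which the second field is the mount point; add_gluster_fstab is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: append the GlusterFS entry to an fstab file only if
# the mount point is not listed there yet.
# $1 = fstab path, $2 = server:/volume, $3 = mount point
add_gluster_fstab() {
    fstab="$1"
    src="$2"
    mountpoint="$3"
    # Field 2 of a non-comment fstab line is the mount point.
    if awk -v mp="$mountpoint" '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"; then
        echo "entry for $mountpoint already present"
    else
        echo "$src $mountpoint glusterfs defaults,_netdev 0 0" >> "$fstab"
    fi
}

# Example (as root):
# add_gluster_fstab /etc/fstab 192.168.1.1:/volume_name /mnt/storage
```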
Normally, this is sufficient, but it takes a while during the (re)start of your VPS for the glusterfs-server service to start (which is needed to be able to mount the volume). The systemd-mount process takes care of the mount points in /etc/fstab and starts faster than the glusterfs-server service. As a result, the attempt by systemd-mount to automatically mount the GlusterFS volume fails.
Solve this with the following commands:
touch /etc/systemd/system/gluster-mount.service
echo "[Unit]" >> /etc/systemd/system/gluster-mount.service
echo "After=glusterfs-server.service" >> /etc/systemd/system/gluster-mount.service
echo "Wants=glusterfs-server.service" >> /etc/systemd/system/gluster-mount.service
This creates a service that ensures the GlusterFS volume is only mounted when the glusterfs-server service is available.
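The commands above only write the [Unit] section of the service file. For illustration, the sketch below writes a complete unit in one step. The [Service] and [Install] sections, the network-online.target dependency, and the choice of a oneshot service that runs mount for all glusterfs entries in /etc/fstab are assumptions and do not come from the commands above; write_gluster_mount_unit is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: write a complete oneshot unit that mounts all
# glusterfs entries from /etc/fstab.
# Assumption: everything outside the After=/Wants= lines for
# glusterfs-server.service is added here for illustration only.
# $1 = target directory (normally /etc/systemd/system)
write_gluster_mount_unit() {
    cat > "$1/gluster-mount.service" <<'EOF'
[Unit]
Description=Mount GlusterFS volumes listed in /etc/fstab
After=network-online.target glusterfs-server.service
Wants=network-online.target glusterfs-server.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/mount -a -t glusterfs

[Install]
WantedBy=multi-user.target
EOF
}

# Example (as root):
# write_gluster_mount_unit /etc/systemd/system
# systemctl daemon-reload && systemctl enable gluster-mount.service
```

A unit like this only runs at boot once it has been enabled, which is why the example ends with systemctl enable.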
Adding/removing servers to a volume
Once you have created a volume, you may later want to add an additional server to it. To do so, first go through the steps in the section 'Install GlusterFS' and steps 1 to 3 of the section 'Configure GlusterFS servers' on the new server, with two small adjustments:
- Perform step 2 on the new and existing GlusterFS servers to allow the servers to access each other.
- Use the (private) IP address of the new server in step 3.
Now execute the following command from any existing GlusterFS server:
gluster volume add-brick volume_name replica 3 192.168.1.10:/data/directory
- gluster volume add-brick: The basic command for adding a new server to an existing volume.
- volume_name: Adjust to the name of the existing volume.
- replica 3: Here adjust the 3 to the new total number of GlusterFS servers that the volume offers. If you are upgrading from 2 servers to 3 servers, then you use replica 3 here.
- 192.168.1.10:/data/directory: Specify the IP address of the new server, followed by the directory that you want to use for the redundant storage.
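The replica value in the add-brick command is the new total, not the number of servers being added. The small sketch below makes that arithmetic explicit; add_brick_cmd is a hypothetical helper name, it assumes one server is added at a time, and it prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: build the add-brick command for a replicated volume
# when one server is added.
# $1 = volume name, $2 = current number of servers, $3 = new brick (ip:/dir)
add_brick_cmd() {
    # The replica count passed to gluster is the new total.
    new_count=$(( $2 + 1 ))
    echo "gluster volume add-brick $1 replica $new_count $3"
}

# Example: going from 2 to 3 servers.
add_brick_cmd volume_name 2 192.168.1.10:/data/directory
```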
You remove a server (brick) from a volume with a few commands:
gluster volume remove-brick volume_name 192.168.1.1:/data/directory start
- volume_name: Adjust to the name of the volume from which the server is being removed.
- 192.168.1.1:/data/directory: Specify the (private) IP address of the server to be removed, followed by the brick directory on that server.
Check the status of the removal of the brick/server:
gluster volume remove-brick volume_name 192.168.1.1:/data/directory status
Once the removal is complete, confirm the removal by committing it:
gluster volume remove-brick volume_name 192.168.1.1:/data/directory commit
Adding clients
When you want to mount a volume on a new client, go through the following sections/steps in this order:
- Go through the section 'Install GlusterFS' on the new client.
- Go through step 7 of the section 'Configure GlusterFS servers' on the GlusterFS servers.
- Go through the section 'Configure GlusterFS clients' on the new client.
Manage GlusterFS
Finally, it is useful to be familiar with the commands you can use to manage your GlusterFS setup (as far as they have not already been covered). A useful selection can be found below, but also take a look at GlusterFS' own documentation.
gluster volume stop volume_name
gluster volume start volume_name
- Brings the volume with the name 'volume_name' offline / online.
gluster volume set volume_name group metadata-cache
- Enables metadata caching; this improves the performance of the cluster except when many clients are editing the same file simultaneously.
gluster volume delete volume_name
- Deletes the volume with the name volume_name, but not the data on your servers. You can therefore delete a volume and create it again without affecting the data on your servers.
gluster volume status
- Although mentioned earlier in this article, we list this command again because it is the first thing to check when troubleshooting your setup.
gluster volume info
- Displays slightly more detailed information about the available volumes.
gluster volume profile volume_name start
- Starts gathering information about the performance of the given volume.
gluster volume profile volume_name info
- Displays the collected performance information of the given volume.
gluster
- Starts the gluster console. Use the command 'help' to see available options or 'exit' to close the gluster console.
With that, we have come to the end of this guide on installing and configuring redundant storage in Linux using GlusterFS.