Labs: installation & configuration of GlusterFS as a synchronous data storage solution.
By: Pascal Charest, Freesoftware consultant
Date: September, 2008.
Keeping files synchronized in a cloud environment is a challenge on the path to high-{availability, performance}. From simple load-balanced web sites to full-blown applications, some files always need to be in sync. Some people, for simplicity, rely on asynchronous transfer (e.g. rsync); others deploy bigger solutions (e.g. block device replication through DRBD, or shared storage through the AoE protocol with concurrency management by OCFS2), or even go for the “lazy”, “no-shared-storage” solution through NFS.
To address this problem in the PraizedMedia software stack, I decided to give the FUSE-based GlusterFS a try. Awesome, really! The technical knowledge needed to deploy a basic solution is very low. The modularity of the program also helps to get “something working right now”. This isn’t meant as a direct alternative to DRBD or a good SAN deployment, but in my use case it fits perfectly.
In this lab, I will guide you through the installation of GlusterFS on 2 networked systems. Both will be used as “server” and “client” for the GlusterFS filesystem. They will share a directory (on both systems: /mnt/production/brick), re-mounted as /mnt/production/static through GlusterFS. Any write I/O on this directory (from any client) will be synchronized to the pool. This last feature is called “AFR” (for automatic file replication) and is a module (called a translator) of the GlusterFS file system.
The particularity of my environment concerns file-locking management: I don’t need any. By design, the application will never try to write the same file twice on any of the servers.
#Installation of requirements (standard tools)
apt-get install flex bison libfuse-dev linux-headers-`uname -r` curl
#download of the sources
cd /usr/local/src/
curl -O http://ftp.zresearch.com/pub/gluster/glusterfs/1.3/glusterfs-CURRENT.tar.gz
tar zxf glusterfs-CURRENT.tar.gz
# configure
cd glusterfs-1.3.11
./configure --prefix=/usr/local/glusterfs-1.3.11
make && make install
ln -s /usr/local/glusterfs-1.3.11 /usr/local/glusterfs
So we now have GlusterFS installed on both servers. Let’s be honest, that wasn’t really hard! We are still missing the configuration files, though.
#Editing /usr/local/glusterfs/etc/glusterfs/glusterfs-server.vol
#
# glusterfs-servers definition
# volume definition are on first lvl, other are on second lvl (tabbed)
volume brick
    type storage/posix
    option directory /mnt/production/brick
end-volume
volume server
    type protocol/server
    option transport-type tcp/server
    option auth.ip.brick.allow *
    subvolumes brick
end-volume
#Editing the /usr/local/glusterfs/etc/glusterfs/glusterfs-client.vol
#
# glusterfs-client.vol
# volume definition are on first lvl, other are on second lvl (tabbed)
#
volume remote1
    type protocol/client
    option transport-type tcp/client
    option remote-host 002.praized.com
    option remote-subvolume brick
end-volume
volume remote2
    type protocol/client
    option transport-type tcp/client
    option remote-host 001.praized.com
    option remote-subvolume brick
end-volume
volume mirror0
    type cluster/afr
    subvolumes remote1 remote2
end-volume
#Launching services (servers and clients)
mkdir -p /mnt/production/brick
/usr/local/glusterfs-1.3.11/sbin/glusterfsd -f /usr/local/glusterfs-1.3.11/etc/glusterfs/glusterfs-server.vol
mkdir -p /mnt/production/static
/usr/local/glusterfs-1.3.11/sbin/glusterfs -f /usr/local/glusterfs-1.3.11/etc/glusterfs/glusterfs-client.vol /mnt/production/static/
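A quick way to convince yourself that the mount really writes through to the brick is a small smoke test. The sketch below is only an illustration: the function name verify_replication and the marker file name are made up for this post, and it only checks the local brick of the server it runs on.

```shell
#!/bin/sh
# Sketch: write a marker file through the GlusterFS mount and confirm that it
# shows up in the local brick directory. The function name and the marker
# file name are illustrative, not part of GlusterFS.
verify_replication() {
    mount_dir=$1    # e.g. /mnt/production/static (the GlusterFS mount)
    brick_dir=$2    # e.g. /mnt/production/brick  (the local backing store)
    marker="afr-smoke-test.$$"

    # Write through the mount so the AFR translator handles the file.
    echo "written on $(uname -n)" > "$mount_dir/$marker" || return 1

    if [ -f "$brick_dir/$marker" ]; then
        echo "OK: $marker found in $brick_dir"
        status=0
    else
        echo "FAIL: $marker missing from $brick_dir"
        status=1
    fi

    rm -f "$mount_dir/$marker"   # clean up through the mount
    return "$status"
}

# Usage, on each server:
#   verify_replication /mnt/production/static /mnt/production/brick
```

To check the other side of the mirror, create a file by hand under /mnt/production/static on one server and look for it under /mnt/production/brick on the other.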
You now have a directory synchronized between your two systems. Please note that GlusterFS requires TCP port 6996 to be open. This setup can also be improved by adding a locking mechanism and I/O threads; I don’t currently need them, but you might.
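For reference, here is roughly what the server volume file could look like with locking and I/O threads stacked on top of the storage volume. I have not deployed this variant: the translator names (features/posix-locks, performance/io-threads) and the thread-count option are written from memory of the 1.3 documentation, so treat them as assumptions and check them against the documentation for your release. The exported volume keeps the name “brick” so that the client file does not need to change.

```
# glusterfs-server.vol with locking and I/O threads (untested sketch)
volume posix
    type storage/posix
    option directory /mnt/production/brick
end-volume

volume locks
    type features/posix-locks
    subvolumes posix
end-volume

volume brick
    type performance/io-threads
    option thread-count 4
    subvolumes locks
end-volume

volume server
    type protocol/server
    option transport-type tcp/server
    option auth.ip.brick.allow *
    subvolumes brick
end-volume
```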
Enjoy!
Debugging notes: after starting the server you should see a running glusterfsd process. All log files are in /usr/local/glusterfs/var/log/glusterfs*. After starting the client, “df -h” should show your new mount point. Be careful with UID/GID (and permissions): there is no such thing as root_squash_fs in GlusterFS yet.
Other notes: Amazon EBS would have been the perfect solution if it allowed a volume to be mounted on multiple servers and let us deal with the concurrency/locking problems ourselves. But it doesn’t.
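The checks above can be scripted. The two helpers below are a minimal sketch, not something shipped with GlusterFS: the function names are mine, is_mounted assumes a Linux /proc/mounts, and port_open assumes bash and the coreutils timeout command are installed.

```shell
#!/bin/sh
# Sketch of two debugging helpers; the function names are illustrative.

# Succeeds if something is mounted exactly at the given path.
# Reads /proc/mounts by default (Linux-specific); a different file can be
# passed as second argument.
is_mounted() {
    mounts_file=${2:-/proc/mounts}
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' "$mounts_file"
}

# Succeeds if a TCP connection to host:port can be opened within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device and on coreutils' timeout.
port_open() {
    timeout 3 bash -c 'exec 3<>/dev/tcp/$1/$2' _ "$1" "$2" 2>/dev/null
}

# Usage:
#   is_mounted /mnt/production/static && echo "client mount is up"
#   port_open 001.praized.com 6996    && echo "server port reachable"
```

Pass the mount point without a trailing slash, since that is how it appears in /proc/mounts.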
I’ve been asked about the possibility of harnessing the power “of the cloud” in the context of an email server. For the simplicity of this blog post, I’ll assume the definition of “cloud computing” to be equivalent to the “Amazon AWS” offering.
When emails go in
This is the easy part. Receiving email in an EC2 (Elastic Compute Cloud) instance is as easy as receiving it anywhere. You launch 2 instances in different availability zones, grab 2 IPs and change your MX records. With the recent availability of EBS (Elastic Block Store), you even have access to permanent storage for email. In hours (at the very most) you have a complete setup supporting fail-over and backup capability (leave your queue/data store on EBS for persistence, and snapshot it for backup).
Being in a fully virtual environment also negates most scaling problems. You dynamically start and stop anti-{spam,virus} scanning instances following the needs of your clients and customers. This setup is also very cost-effective: you don’t have to pay for hardware (servers, switches, hard drives...), maintenance, power and all the network management involved in having public infrastructure (BGP, firewall, etc.). You don’t even have to commit to a long-term contract.
For your customers, this represents a very decent offer: speed and latency in the Amazon cloud are very good, way better than what most small technical shops can afford.
Then emails have recipients
Emails are not only coming INTO your infrastructure; they sometimes must be transmitted to other people’s networks. This is where the archaic email management style really fails. Email as a service is a dynasty built on the conception that internet properties are big, controllable, static and permanent. This is the exact opposite of what you get by placing an email server inside the Amazon cloud.
You do not control the IP space/range, even if you are leased “1” IP. This is the big “bug”. You have no idea what people do in their instances. Get used to it: your range will be tagged and {grey,black}listed often in DNS-based blocking lists. Very often. Whitelists will refuse your requests, since you cannot vouch for how Amazon customers use the cloud.
As a solution, you can still use an SMTP server installed somewhere else, but that kind of defeats the whole purpose. The financial exercise of fighting DNSBLs vs. maintaining hardware infrastructure is left to the reader.
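If you want to see where a leased address stands, a DNS-based blocking list is queried by reversing the four octets of the IP and appending the list’s zone. The helper below is a small sketch of that convention; the function name is made up, and the zone used in the usage comment is only an example of a well-known list, not a recommendation.

```shell
#!/bin/sh
# Build the DNS name used to look up an IPv4 address in a DNS-based blocking
# list: the four octets are reversed and the list's zone is appended.
# The function name is illustrative.
dnsbl_query_name() {
    echo "$1" | awk -F. -v zone="$2" '
        NF == 4 { print $4 "." $3 "." $2 "." $1 "." zone; ok = 1 }
        END     { exit !ok }'
}

# Usage (192.0.2.10 is a documentation address, the zone is an example):
#   host "$(dnsbl_query_name 192.0.2.10 zen.spamhaus.org)"
# A listed address resolves (typically to a 127.0.0.x answer);
# a "not found" reply means the address is not on that list.
```

Run it against each address you are leased, and against each list your recipients are likely to use, before moving outbound mail onto an instance.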
Amazon Elastic Block Store (EBS)
Amazon Elastic Block Store (EBS) provides block level storage volumes for use with Amazon EC2 instances. Amazon EBS volumes are off-instance storage that persists independently from the life of an instance. Amazon Elastic Block Store provides highly available, highly reliable storage volumes that can be attached to a running Amazon EC2 instance and exposed as a device within the instance. Amazon EBS is particularly suited for applications that require a database, file system, or access to raw block level storage.
source: Amazon AWS