glusterfs & synchronous data storage

Labs: installation & configuration of GlusterFS as a synchronous data storage solution.
By: Pascal Charest, free software consultant
Date: September 2008.

Synchronization of files in a cloud environment is a challenge on the road to high availability and high performance. From simple load-balanced web sites to full-blown applications, some files always need to be in sync. Some people, for simplicity, rely on asynchronous transfer (e.g. rsync), others deploy bigger solutions (e.g. block-device replication with DRBD, or shared storage over the AoE protocol with concurrency managed by OCFS2), or even go for the “lazy”, “no-shared-storage” solution through NFS.

To address this problem in the PraizedMedia software stack, I decided to give the FUSE-based GlusterFS a try. Awesome, really! The technical knowledge needed to deploy a basic setup is very low, and the modularity of the program helps to get “something working right now”. This isn’t meant as a direct alternative to DRBD or a good SAN deployment, but for my use case it fits perfectly.

In this lab, I will guide you through the installation of GlusterFS on 2 networked systems. Both will act as GlusterFS “servers” and “clients”. They will share a directory (on both systems: /mnt/production/brick), re-mounted as /mnt/production/static through GlusterFS. Any write to this mount, from either client, is replicated to the whole pool. This feature is called “AFR” (Automatic File Replication) and is a module (called a translator) of the GlusterFS file system.

The particularity of my environment is file-locking management: I don’t need any. By design, the application never writes the same file twice, from any of the servers.

# Install the requirements (standard tools)
apt-get install build-essential flex bison libfuse-dev linux-headers-`uname -r` curl

# Download the sources
cd /usr/local/src/
curl -O http://ftp.zresearch.com/pub/gluster/glusterfs/1.3/glusterfs-CURRENT.tar.gz
tar zxf glusterfs-CURRENT.tar.gz

# Configure, build and install
# (the directory name depends on the version packed in glusterfs-CURRENT.tar.gz)
cd glusterfs-1.3.11
./configure --prefix=/usr/local/glusterfs-1.3.11
make && make install
ln -s /usr/local/glusterfs-1.3.11 /usr/local/glusterfs
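Because glusterfs-CURRENT.tar.gz does not say which release it contains, the `cd glusterfs-1.3.11` step may need adjusting. A small helper can read the top-level directory straight from the archive. This is a sketch: `tarball_topdir` is a hypothetical name (not part of GlusterFS), and it assumes the tarball holds a single top-level directory, as release tarballs usually do.

```shell
#!/bin/sh
# tarball_topdir ARCHIVE  (hypothetical helper, not part of GlusterFS)
# Print the top-level directory stored in a gzip-compressed tarball.
tarball_topdir() {
    # List the archive members and keep only the first path component
    # of the first entry, dropping a leading "./" if there is one.
    # "sed -n" reads the whole listing, so tar never writes to a closed pipe.
    tar tzf "$1" | sed -n '1{s|^\./||;s|/.*||;p;}'
}
```

For example: `cd "$(tarball_topdir glusterfs-CURRENT.tar.gz)"`.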


GlusterFS is now installed on both systems. Let’s be honest, that wasn’t really hard! We are still missing the configuration files, though.

# Edit /usr/local/glusterfs/etc/glusterfs/glusterfs-server.vol
#
# glusterfs-server definition
# volume definitions are on the first level, options on the second level (tabbed)
volume brick
	type storage/posix
	option directory /mnt/production/brick
end-volume

volume server
	type protocol/server
	option transport-type tcp/server
	option auth.ip.brick.allow *
	subvolumes brick
end-volume


# Edit /usr/local/glusterfs/etc/glusterfs/glusterfs-client.vol
#
# glusterfs-client.vol
# volume definitions are on the first level, options on the second level (tabbed)
#
volume remote1
	type protocol/client
	option transport-type tcp/client
	option remote-host 002.praized.com
	option remote-subvolume brick
end-volume

volume remote2
	type protocol/client
	option transport-type tcp/client
	option remote-host 001.praized.com
	option remote-subvolume brick
end-volume

volume mirror0
	type cluster/afr
	subvolumes remote1 remote2
end-volume
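The client file grows in a regular way: one `protocol/client` volume per server, all listed as `subvolumes` of the `cluster/afr` volume. As a sketch, the hypothetical function `gen_client_vol` prints the same layout for any list of hosts, assuming each host runs glusterfsd exporting a volume named `brick`. Only the two-host output is known to match the file above; using it with more replicas is an untested assumption.

```shell
#!/bin/sh
# gen_client_vol HOST...  (hypothetical helper)
# Print a client volume file that mirrors the "brick" volume exported
# by every HOST through the cluster/afr translator.
gen_client_vol() {
    i=0
    subvols=""
    for host in "$@"; do
        i=$((i + 1))
        # One protocol/client volume per server.
        printf 'volume remote%d\n' "$i"
        printf '\ttype protocol/client\n'
        printf '\toption transport-type tcp/client\n'
        printf '\toption remote-host %s\n' "$host"
        printf '\toption remote-subvolume brick\n'
        printf 'end-volume\n\n'
        subvols="$subvols remote$i"
    done
    # The AFR volume lists every client volume as a replica.
    printf 'volume mirror0\n'
    printf '\ttype cluster/afr\n'
    printf '\tsubvolumes%s\n' "$subvols"
    printf 'end-volume\n'
}
```

For example: `gen_client_vol 002.praized.com 001.praized.com > /usr/local/glusterfs/etc/glusterfs/glusterfs-client.vol`.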


# Launch the services (server, then client) on both systems
mkdir -p /mnt/production/brick
/usr/local/glusterfs-1.3.11/sbin/glusterfsd -f /usr/local/glusterfs-1.3.11/etc/glusterfs/glusterfs-server.vol

mkdir -p /mnt/production/static
/usr/local/glusterfs-1.3.11/sbin/glusterfs -f /usr/local/glusterfs-1.3.11/etc/glusterfs/glusterfs-client.vol /mnt/production/static/


You now have a directory synchronized between your two systems. Please note that GlusterFS requires TCP port 6996 to be open. This setup can also be improved by adding a locking mechanism and I/O threads; I don’t currently need them, but you might.
Enjoy!
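For those who do need them, here is a sketch of a server file with locking and I/O threads, written as a hypothetical shell function that prints the volfile. It assumes the GlusterFS 1.3 translator names `features/posix-locks` and `performance/io-threads` and the `thread-count` option; check them against the documentation of your release. The top of the stack keeps the name `brick`, so the client file (`remote-subvolume brick`) does not have to change. The default of 4 threads is arbitrary.

```shell
#!/bin/sh
# gen_server_vol_locks DIRECTORY [THREADS]  (hypothetical helper)
# Print a server volume file stacking posix-locks and io-threads on top
# of the storage/posix volume; the exported volume is still "brick".
gen_server_vol_locks() {
    dir=$1
    threads=${2:-4}   # arbitrary default, tune for your hardware
    # Unquoted heredoc: $dir and $threads expand, "*" stays literal.
    cat <<EOF
volume posix
	type storage/posix
	option directory $dir
end-volume

volume locks
	type features/posix-locks
	subvolumes posix
end-volume

volume brick
	type performance/io-threads
	option thread-count $threads
	subvolumes locks
end-volume

volume server
	type protocol/server
	option transport-type tcp/server
	option auth.ip.brick.allow *
	subvolumes brick
end-volume
EOF
}
```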

Debugging notes: after starting the server you should see a glusterfsd process running (and, once the client is started, a glusterfs process). All log files are in /usr/local/glusterfs/var/log/glusterfs*. After starting the client, “df -h” should show your new mount point. Be careful with UIDs/GIDs (and permissions): GlusterFS has no equivalent of NFS root_squash yet.
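Instead of eyeballing `df -h`, a script can check /proc/mounts, whose second field is the mount point. The function name `is_mounted` is hypothetical; the optional second argument lets you point it at a copy of the file for testing. This sketch does not decode the `\040`-style escapes that /proc/mounts uses for spaces in paths.

```shell
#!/bin/sh
# is_mounted MOUNTPOINT [MOUNTS_FILE]  (hypothetical helper)
# Succeed if MOUNTPOINT is listed as a mount point in /proc/mounts,
# or in MOUNTS_FILE when one is given (handy for offline testing).
is_mounted() {
    mp=${1%/}                  # accept "/mnt/production/static/" too
    if [ -z "$mp" ]; then
        mp=/                   # "/" itself was stripped to an empty string
    fi
    # Field 2 of each /proc/mounts line is the mount point.
    awk -v m="$mp" '$2 == m { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

For example: `is_mounted /mnt/production/static || echo "GlusterFS client is not mounted"`.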


Other notes: Amazon EBS would have been the perfect solution if it allowed a volume to be mounted on multiple servers at once and let us deal with the concurrency/locking problems ourselves. But it doesn’t.

mass-storage.org

In the last couple of days, I’ve been doing a lot of experiments on mass-storage systems. I do not want to flood this blog with high-end labs when most of my friends and family don’t clearly see the difference between a SAN and a NAS. On the other hand, I still want to publish my research process. “Research” might seem a bit presumptuous in light of what I’ve published so far, but this is really just a side effect of this dichotomy.

www.mass-storage.org is my answer to this dilemma. One of my pet projects, it is an oasis (OK: a small wiki) where I (and any like-minded researcher) can publish information related to mass storage. I’ve already published 2 articles about the storage labs I recently completed (DRBD, OCFS2, AoE), and more are coming about labs currently in progress (Lustre, AoE, DRBD optimization)…

I should start posting more insight into my own life here (hey, it was always meant to be MY private little place), and move the storage-related (and more "permanent") info to m-s.org.

If you have any comments, as always, feel free to post.

Pascal Charest, directly from Camellia Sinensis on an IleSansfil connection.

drbd-8.2.4 as P/P setup (storage fun, part 2)

NOTE: This post now lives on www.mass-storage.org, and this blog version isn’t up to date anymore. Please see mass-storage.org for the up-to-date lab notes.

Fun stuff with DRBD

OK, so yesterday I tried, without much success, to rebuild my computer lab with Debian/SID and the unstable DRBD-8.2.5. Now that I know the main branch of DRBD can contain "unusable versions", it will go a bit faster.

Installation of DRBD-8.2.4 took around 60 seconds, most of it being the download from their website and the copy of the source tree between Crystal and Ruby, my two lab systems.

# cd /usr/local/src
# wget http://oss.linbit.com/drbd/8.2/drbd-8.2.4.tar.gz
# tar xvf drbd-8.2.4.tar.gz
# apt-get install linux-headers-`uname -r` build-essential flex docbook-utils
# cd /usr/local/src/drbd-8.2.4
# make all
# make install

Online verification of the sync status

Now the fun part:

(ruby)# drbdadm verify store

It worked like a charm. I used the "verify-alg md5;" line in my config since the kernel crypto API already had this algorithm available and loaded. Being able to run an online verification allows me to remove the "data-integrity-alg" option from some of my setups: verifying once in a while costs much less CPU than checksumming every replicated block, which is what data-integrity-alg does.

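The relative speed of the hash algorithms can be estimated with the command below (it benchmarks OpenSSL’s userspace implementations, not the kernel crypto API itself):

# openssl speed

and the algorithms currently available (loaded) in the kernel crypto API can be listed with: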

# cat /proc/crypto
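Each entry in /proc/crypto starts with a `name : <algorithm>` line, so checking whether an algorithm such as md5 (or crc32c, for verify-alg) is already registered takes a short awk filter. `crypto_has_alg` is a hypothetical helper; an algorithm built as a module only shows up once it has been loaded, and the optional second argument lets you test against a saved copy of the file.

```shell
#!/bin/sh
# crypto_has_alg ALGORITHM [CRYPTO_FILE]  (hypothetical helper)
# Succeed if ALGORITHM appears as a "name" entry in /proc/crypto
# (or in CRYPTO_FILE when one is given).
crypto_has_alg() {
    # Lines look like "name         : md5": field 1 is "name",
    # field 2 is ":" and field 3 is the algorithm name.
    awk -v alg="$1" '
        $1 == "name" && $3 == alg { found = 1 }
        END { exit !found }
    ' "${2:-/proc/crypto}"
}
```

For example: `crypto_has_alg md5 || modprobe md5`.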

Adding some security

Another thing I had never tried before is activating DRBD’s peer authentication:

(/etc/drbd.conf, net section)# cram-hmac-alg "md5";
(/etc/drbd.conf, net section)# shared-secret "password";

Once again, it worked as expected. I can now see the HMAC handshake when the peer connects. The required module is automatically loaded into the kernel crypto API.
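The shared secret deserves something better than "password", and it must be identical on both nodes. A minimal sketch, assuming a Linux /dev/urandom: the hypothetical `drbd_auth_snippet` prints a `net` section fragment with a random 32-character hex secret. Generate it once and paste the same output into the resource definition on both nodes.

```shell
#!/bin/sh
# drbd_auth_snippet  (hypothetical helper)
# Print a drbd.conf "net" section fragment enabling peer authentication
# with a freshly generated random shared secret.
drbd_auth_snippet() {
    # 16 random bytes from the kernel, rendered as 32 lowercase hex digits.
    secret=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    printf 'net {\n'
    printf '    cram-hmac-alg "md5";\n'
    printf '    shared-secret "%s";\n' "$secret"
    printf '}\n'
}
```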

Primary/Primary setup?

Now, here is the true test I wanted to do.

(/etc/drbd.conf, net section)# uncomment the "allow-two-primaries" line
(ruby&crystal)# /etc/init.d/drbd stop ; /etc/init.d/drbd start
(ruby&crystal)# drbdadm primary store
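Before putting a cluster filesystem on top, it is worth checking that both roles really are Primary. As far as I know, DRBD 8.2 reports the roles in an `st:` field on the device line of /proc/drbd (for example `st:Primary/Primary`), and later releases renamed that field `ro:`. The hypothetical helper below accepts either spelling; treat the exact /proc/drbd layout as an assumption and compare it with your own output.

```shell
#!/bin/sh
# drbd_is_primary_primary MINOR [DRBD_FILE]  (hypothetical helper)
# Succeed if device MINOR is Primary on both nodes according to
# /proc/drbd (or DRBD_FILE when one is given).
drbd_is_primary_primary() {
    awk -v dev="$1:" '
        # Device lines start with the minor number, e.g. " 0: cs:...".
        $1 == dev {
            for (i = 2; i <= NF; i++)
                if ($i == "st:Primary/Primary" || $i == "ro:Primary/Primary")
                    found = 1
        }
        END { exit !found }
    ' "${2:-/proc/drbd}"
}
```

For example: `drbd_is_primary_primary 0 || echo "drbd0 is not Primary/Primary yet"`.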

I now have a Primary/Primary setup. Fun, but we still need a filesystem that supports concurrent access. Let’s go with OCFS2 (the docs say that GFS is also supported).

(ruby&crystal)# apt-get install ocfs2-tools
(ruby&crystal)# mkdir /etc/ocfs2

The creation of the config file is very straightforward:

(/etc/ocfs2/cluster.conf)

node:
	ip_port = 7777
	ip_address = 10.0.0.18
	number = 0
	name = crystal
	cluster = lab

node:
	ip_port = 7777
	ip_address = 10.0.0.19
	number = 1
	name = ruby
	cluster = lab

cluster:
	node_count = 2
	name = lab
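The file follows a strict layout: stanza headers (`node:`, `cluster:`) start at the beginning of a line and their parameters are indented with a tab. With more nodes, writing it by hand gets error-prone, so here is a sketch of a hypothetical generator that builds the same layout from `name=ip` pairs, numbering nodes from 0 and reusing port 7777 from above. The node names should match each machine’s hostname.

```shell
#!/bin/sh
# gen_ocfs2_cluster_conf CLUSTER NAME=IP...  (hypothetical helper)
# Print an OCFS2 cluster.conf with one node stanza per NAME=IP pair.
gen_ocfs2_cluster_conf() {
    cluster=$1
    shift
    number=0
    for node in "$@"; do
        name=${node%%=*}   # text before the first "="
        ip=${node#*=}      # text after the first "="
        printf 'node:\n'
        printf '\tip_port = 7777\n'
        printf '\tip_address = %s\n' "$ip"
        printf '\tnumber = %d\n' "$number"
        printf '\tname = %s\n' "$name"
        printf '\tcluster = %s\n\n' "$cluster"
        number=$((number + 1))
    done
    # node_count is the number of node stanzas written above.
    printf 'cluster:\n'
    printf '\tnode_count = %d\n' "$number"
    printf '\tname = %s\n' "$cluster"
}
```

For example: `gen_ocfs2_cluster_conf lab crystal=10.0.0.18 ruby=10.0.0.19 > /etc/ocfs2/cluster.conf`.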

Configuring the O2CB cluster service (which runs the heartbeat) is also very easy (be careful to use the right cluster name, here "lab").

(ruby&crystal)# dpkg-reconfigure ocfs2-tools

Then the magic begins:

(ruby&crystal)# /etc/init.d/o2cb start
(ruby)# mkfs.ocfs2 /dev/drbd0
(ruby&crystal)# mkdir -p /storage
(ruby&crystal)# mount -t ocfs2 /dev/drbd0 /storage

Et Voila.

Concurrent access to the same filesystem from 2 computers. Did someone say "cheap load balancing/hot failover for web servers"? As for optimization, let me loudly suggest going, at the very minimum, with gigabit network interfaces… which brings up the point that InfiniBand isn’t as expensive as it used to be… and its performance/latency are a really big step forward…

DRBD-8.2.5 on Debian/SID

While updating my GNU/Linux lab, I decided to put the latest versions of DRBD (stable: 8.2.4, unstable: 8.2.5) on the test bench. I wanted to try the "online verification" and the "primary/primary" state for cluster filesystems (OCFS2, GFS).

The current version available in the Debian repository is out of date (v8.0.8) and doesn’t have the online verification option, so I had no choice but to build my own modules & utils. Another problem was the out-of-date ./drbd-8.2/INSTALL file, especially regarding Debian systems; in fact, most of the Debian-related material seems to be broken.

So here is the missing "INSTALL.debian" for DRBD-8.2.x. It is hosted on Google Docs and will change as I invest time into it.

The whole "normal procedure" for the unstable version of DRBD on a minimal Debian/SID install can be summarized as:

# apt-get install git-core
# cd /usr/local/src
# git clone git://git.drbd.org/drbd-8.2.git drbd-8.2
# apt-get install linux-headers-`uname -r` build-essential flex docbook-utils
# cd /usr/local/src/drbd-8.2
# make
# make doc
# make install
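Since Debian’s own packages ship the older 8.0.8 module, it is worth confirming that the module actually loaded is the one you just built (you may have to unload the old module and `modprobe drbd` again). The first line of /proc/drbd starts with `version:`; the hypothetical `drbd_version` below prints that number, with an optional file argument for testing.

```shell
#!/bin/sh
# drbd_version [DRBD_FILE]  (hypothetical helper)
# Print the version of the loaded DRBD module, read from the
# "version: X.Y.Z ..." line of /proc/drbd (or DRBD_FILE).
drbd_version() {
    awk '$1 == "version:" { print $2; exit }' "${1:-/proc/drbd}"
}
```

For example: `[ "$(drbd_version)" = "8.2.5" ] || echo "an older DRBD module is still loaded"`.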

This will give you a valid DRBD-8.2.5 installation. You’ll need to modify /etc/drbd.conf to match your setup. One cool new feature is the "online verification":

Add the following line to the syncer section of /etc/drbd.conf and load the kernel module with modprobe:

(/etc/drbd.conf, syncer section)# verify-alg crc32c;
# modprobe crc32c

# drbdadm verify store

where store is my resource name. But… this isn’t the end of my problems, because the command doesn’t work here: it causes my primary system to lose its connection with the secondary node. Humfff… I’ll see what I can do about that tomorrow.

NOTE: In the end, the problem was simple: the unstable branch is not a working version of DRBD.