Another technical post:
Untangle installation in router mode.
Untangle is an all-inclusive, stateful packet router. It handles virus analysis, spam filtering, intrusion detection, firewalling, NAT, a VPN server, a remote access portal and much, much more. It comes as a
live Knoppix CD-ROM of 400 MB with an intuitive installation wizard (note: it will wipe your hard drive). It seems to support a lot of hardware configurations ‘out of the box’: it installed on a Dell 4600 (dual Xeon 2.4 GHz) with an old RAID controller (PERC/3), with only a small difficulty concerning the USB controller (the keyboard interface, not the mouse), which was easily fixed by switching to PS/2 devices. I was unable to see the actual error, since the keyboard was automatically deactivated by the hardware detection process and “Alt-F2” (to show the boot process) was unavailable.
The installation process is very straightforward. You don’t even need the online documentation (wiki, UserGuide, QuickStart): even the admin password is defined by the user during the first boot. There is one ‘must-know’ caveat: the post-installation process (configuration of the ‘rack’, the list of software affecting inbound connections) requires Internet access and access to the Untangle ‘web store’. This isn’t much fun if you want to replace a live router or if you are installing behind a proxy.
Untangle is a great product, and it allows a fallback to the console/terminal for advanced tech guys. Still, I’ve had quite a bit of trouble with this error:
cannot start a transaction within a transaction. Untangle uses SQLite databases, which easily end up deadlocked when two operations are committed at the same time (like two hits on “save”). The best advice I can give you: if you see this error, immediately go through the computer restart procedure. It seems a harsh solution, but it works, and it prevents your database from queuing requests that will never be completed anyway.
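For the curious, the error message itself is easy to reproduce with plain SQLite. The sketch below uses Python’s built-in sqlite3 module and an in-memory database. It only shows how SQLite reacts to a second BEGIN on a connection that already has an open transaction; it says nothing about how Untangle uses its databases internally, and the function name is mine.

```python
import sqlite3


def nested_begin_error() -> str:
    """Open a transaction twice on one connection and return SQLite's complaint."""
    # isolation_level=None disables the module's implicit transaction handling,
    # so the explicit BEGIN statements below reach SQLite unchanged.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    try:
        conn.execute("BEGIN")
        try:
            conn.execute("BEGIN")  # second BEGIN before any COMMIT/ROLLBACK
        except sqlite3.OperationalError as exc:
            conn.execute("ROLLBACK")  # close the first transaction cleanly
            return str(exc)
        return ""
    finally:
        conn.close()


print(nested_begin_error())  # cannot start a transaction within a transaction
```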
Conclusion: a good GUI for an easy-to-configure router, with an easy fallback to GNU/Linux if you want to modify the system. Available as a VMware image, a Windows installer (Re-Router) or a downloadable ISO. Recommended.
Stuff moving pretty fast:
In the next few days, Laboratoires Phoenix (my corp.) will be unveiling new services to current holders of premium accounts; those are normally people with whom I already have a business relationship. We are talking about computer-based monitoring from locations around the world, human-based monitoring and generic maintenance.
Since I already have a very decent job (“CloudMaster @ PraizedMedia” / operations specialist), the income generated by Laboratoires Phoenix is reinvested in infrastructure to help Canadian start-ups. Quite a few people have already shown interest. If you have an interesting project, feel free to reach me at pascal.charest@gmail.com; if I’m interested and have free time, I can contribute servers, bandwidth and even some free-of-charge consulting. My network of contacts can also be of use.
Since I’m speaking about myself (;-)): I have also been asked to draft an 8-10 page article on “initiation au cloud computing” (an introduction to cloud computing), to be published with a print run of 25k+ copies in Europe. On the same subject, I’ve been selected to contribute a chapter to an upcoming guide @ O’Reilly. Life could be harder ;-) - but this is meant as an early “warning”: if anyone is interested in developing a career as a technical writer, feel free to get in touch with me. An author can easily become a co-author ;-).
Bear with me, this should not be too painful - will start weird, but everything will be clear pretty quickly.
.
You have a shower in your bathroom (warned ya!). The water pressure in the pipe is an important factor in how much you enjoy your time in said shower. This is the usual analogy for network bandwidth (sometimes the size of the pipe is used instead, but then, who can “dynamically change the pipe diameter”?). It is the basis of most network scalability research: apply more or less pressure to a given pipe and observe the result. Everyone understands this comparison and those studies, but it doesn’t go far enough. Pressure isn’t everything!
.
What most people don’t take into account is the time it takes to go from turning the knob to ‘wetness’. Think about it: there is a time threshold beyond which your ‘morning schedule’ would have to change if that wait got any longer. Well, this wait, in the networked world, is called latency, and its limit has been the subject of studies by Google and Amazon:
.
So, Marissa ran an experiment where Google increased the number of search results to thirty. Traffic and revenue from Google searchers in the experimental group dropped by 20%.
[...]
After a bit of looking, Marissa explained that they found an uncontrolled variable. The page with 10 results took .4 seconds to generate. The page with 30 results took .9 seconds.
Source: Greg Linden (blog), reporting on Google VP Marissa Mayer’s speech @ Web 2.0 conference in 2006.
.
This conclusion may be surprising — people notice a half second delay? — but we had a similar experience at Amazon.com. In A/B tests, we tried delaying the page in increments of 100 milliseconds and found that even very small delays would result in substantial and costly drops in revenue.
Source: Ibid, now speaking about Amazon
.
This latency (well, duh!) is the underdog of network variables. It is also the one that is going to cost you the most customers. Think about it: when your app/site fails, you’re usually not lacking bandwidth; it’s your latency that goes over a given limit, and things start to fail as a safety measure. Without this limit, websites would work (or not work, but that is another subject) like queues in any gov. agency: you wait for the resource to become available… You can leave any time you want, but you must wait to get the resource.
.
Now, why am I writing about this?
.
Thinking about modern websites, I can’t see how they can be built without thinking about latency. Each user connected to a system comes at a cost (OK, a small cost) to the other users. This is a given; this is why capacity planning exists. You should know where your latency becomes a limiting factor in your infrastructure.
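To make “know where latency becomes a limiting factor” a bit more concrete, here is a back-of-the-envelope sketch. It assumes the simplest textbook queueing model (a single server with random arrivals, usually called M/M/1), where the average response time is 1 / (service rate - arrival rate). Real infrastructure is messier, and the numbers used here (a server handling 100 requests/s, a 0.5 s limit) are made up purely for illustration.

```python
def mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Average time (seconds) a request spends in an M/M/1 system."""
    if arrival_rate >= service_rate:
        return float("inf")  # the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)


def max_arrival_rate(service_rate: float, latency_limit_s: float) -> float:
    """Highest arrival rate whose average response time does not exceed the limit."""
    return max(0.0, service_rate - 1.0 / latency_limit_s)


if __name__ == "__main__":
    service_rate = 100.0  # requests per second one server can handle (assumed)
    for load in (50.0, 90.0, 98.0, 99.5):
        latency = mean_response_time(load, service_rate)
        print(f"{load:5.1f} req/s -> {latency:.3f} s")
    print("limit of 0.5 s reached at", max_arrival_rate(service_rate, 0.5), "req/s")
```

Under this model latency stays low for most of the range and then climbs very quickly near saturation, which is why the threshold tends to be hit suddenly during a traffic spike.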
.
Here’s an example where they didn’t know the limit: buzzz.tv. Their latency went over the threshold during the last election talk and the system wasn’t available anymore. What does this error cost? In my humble opinion, they burned themselves. Their idea/concept is so nice, giving a voice back to the TV consumer… I can’t understand why they did not pay for 20 (random number) servers @ Amazon AWS for this one night (20 large servers x 24 h x $0.40/h = $192)! That should have let them go from 200 users to a couple of thousand, easily. This kind of service would then (once you have shown you are able to build reliable infrastructure) be so easy to sell to any “series pilot”/“interactive show”.
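The arithmetic behind that figure, as a tiny sketch. The 20 servers, 24 hours and $0.40 per server-hour are the numbers quoted in the paragraph; the function name is only for illustration, and the amounts are kept in cents to avoid floating-point rounding.

```python
def rental_cost_cents(servers: int, hours: int, cents_per_server_hour: int) -> int:
    """Total rental cost in cents for a fixed number of servers and hours."""
    return servers * hours * cents_per_server_hour


cost = rental_cost_cents(servers=20, hours=24, cents_per_server_hour=40)
print(f"${cost // 100}.{cost % 100:02d}")  # $192.00
```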
.
Anyway, I don’t want to start rambling about random corporations/groups. The bottom line is: “If you have a known incoming traffic surge, spend a couple of bucks on some capacity planning… in the end, it makes a big difference to the success of your service.” The service might not always be cheap, but relying on “it’s going to work/survive the spike” isn’t a bright business move… (once again, my opinion, tainted by the fact that I do some capacity planning consulting).
Labs: installation & configuration of GlusterFS as a synchronous data storage solution.
By: Pascal Charest, free software consultant
Date: September, 2008.
Synchronization of files in a cloud environment is a challenge on the path to high-{availability, performance}. From simple load-balanced web sites to full-blown applications, some files always need to be in sync. Some people, for simplicity, rely on asynchronous transfer (e.g. rsync); others deploy bigger solutions (e.g. block device replication through DRBD, or shared storage through the AoE protocol with concurrency management by OCFS2), or even go for the “lazy” “no-shared-storage” solution through NFS.
To address this problem in the PraizedMedia software stack, I decided to give the FUSE-based GlusterFS a try. Awesome, really! The technical knowledge needed to deploy a basic solution is very, very low. The modularity of the program also helps you get “something working right now”. This isn’t meant as a direct alternative to DRBD or a good SAN deployment, but in my use case it fits perfectly.
In this lab, I will guide you through the installation of GlusterFS on 2 networked systems. Both will be used as “server” & “client” for the GlusterFS filesystem. They will share a directory (on both systems: /mnt/production/brick), re-mounted as /mnt/production/static through GlusterFS. Any write I/O on this directory (from any client) will be synchronized to the pool. This last feature is called “AFR” (for automatic file replication) and is a module (called a translator) of the GlusterFS file system.
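If the idea of synchronous replication is new to you, the toy sketch below may help. It is not how GlusterFS implements AFR; it only illustrates the principle that a write returns once every replica holds the file, and that a read can then be served by any replica. The function names and the temporary directories standing in for the two bricks are all hypothetical.

```python
import os
import tempfile


def replicated_write(replica_dirs, relative_name: str, data: bytes) -> None:
    """Write the same file into every replica directory before returning."""
    for directory in replica_dirs:
        path = os.path.join(directory, relative_name)
        with open(path, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # make sure this copy is on disk


def read_from_any(replica_dirs, relative_name: str) -> bytes:
    """Return the file content from the first replica that has it."""
    for directory in replica_dirs:
        path = os.path.join(directory, relative_name)
        if os.path.exists(path):
            with open(path, "rb") as handle:
                return handle.read()
    raise FileNotFoundError(relative_name)


if __name__ == "__main__":
    # Two temporary directories play the role of the two bricks.
    with tempfile.TemporaryDirectory() as brick_a, tempfile.TemporaryDirectory() as brick_b:
        replicated_write([brick_a, brick_b], "logo.png", b"fake image bytes")
        print(os.listdir(brick_a), os.listdir(brick_b))
```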
The specificity of my environment is around file-locking management: I don’t need any. By design, the application will never try to write the same file twice on any of the servers.
#Installation of requirements (standard tools)
apt-get install flex bison libfuse-dev linux-headers-`uname -r` curl
#download of the sources
cd /usr/local/src/
curl -O http://ftp.zresearch.com/pub/gluster/glusterfs/1.3/glusterfs-CURRENT.tar.gz
tar zxf glusterfs-CURRENT.tar.gz
# configure, build and install
cd glusterfs-1.3.11
./configure --prefix=/usr/local/glusterfs-1.3.11
make && make install
ln -s /usr/local/glusterfs-1.3.11 /usr/local/glusterfs
So we now have a basic GlusterFS installation (run the same steps on both servers). Let’s be honest, that wasn’t really hard! We are still missing the configuration files, though.
#Editing /usr/local/glusterfs/etc/glusterfs/glusterfs-server.vol
#
# glusterfs-server definition
# volume definitions are on the first level, everything else is on the second level (indented)
volume brick
    type storage/posix
    option directory /mnt/production/brick
end-volume
volume server
    type protocol/server
    option transport-type tcp/server
    option auth.ip.brick.allow *
    subvolumes brick
end-volume
#Editing /usr/local/glusterfs/etc/glusterfs/glusterfs-client.vol
#
# glusterfs-client.vol
# volume definitions are on the first level, everything else is on the second level (indented)
#
volume remote1
    type protocol/client
    option transport-type tcp/client
    option remote-host 002.praized.com
    option remote-subvolume brick
end-volume
volume remote2
    type protocol/client
    option transport-type tcp/client
    option remote-host 001.praized.com
    option remote-subvolume brick
end-volume
volume mirror0
    type cluster/afr
    subvolumes remote1 remote2
end-volume
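The client volfile is repetitive enough that you may prefer to generate it. Here is a minimal sketch, assuming you simply want one protocol/client volume per host plus one cluster/afr volume on top, exactly as in the file above. The function name and its defaults are mine; it only reproduces the options used in this lab and has not been tried with more than two hosts.

```python
def render_client_volfile(hosts, remote_subvolume="brick", mirror_name="mirror0"):
    """Render a GlusterFS 1.3-style client volfile mirroring the given hosts."""
    lines = []
    names = []
    for index, host in enumerate(hosts, start=1):
        name = f"remote{index}"
        names.append(name)
        lines += [
            f"volume {name}",
            "    type protocol/client",
            "    option transport-type tcp/client",
            f"    option remote-host {host}",
            f"    option remote-subvolume {remote_subvolume}",
            "end-volume",
        ]
    # The AFR translator sits on top of every remote volume declared above.
    lines += [
        f"volume {mirror_name}",
        "    type cluster/afr",
        "    subvolumes " + " ".join(names),
        "end-volume",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_client_volfile(["002.praized.com", "001.praized.com"]), end="")
```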
#Launching services (servers and clients)
mkdir -p /mnt/production/brick
/usr/local/glusterfs-1.3.11/sbin/glusterfsd -f /usr/local/glusterfs-1.3.11/etc/glusterfs/glusterfs-server.vol
mkdir -p /mnt/production/static
/usr/local/glusterfs-1.3.11/sbin/glusterfs -f /usr/local/glusterfs-1.3.11/etc/glusterfs/glusterfs-client.vol /mnt/production/static/
You now have a directory synchronized between your two systems. Please note that GlusterFS requires TCP port 6996 to be open. This setup can also be improved by adding a locking mechanism & I/O threads; I don’t currently need them, but you might.
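If the client refuses to mount, a quick way to rule out a firewall problem is to test the port from the other machine. A minimal sketch using only Python’s standard library; the function name is mine, and 127.0.0.1 is just a placeholder to replace with the host name of your other server.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Replace 127.0.0.1 with the other GlusterFS server.
    print(tcp_port_open("127.0.0.1", 6996, timeout=1.0))
```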
Enjoy!
Debugging notes: after starting the server you should see a running glusterfs server process (glusterfsd). All log files are in /usr/local/glusterfs/var/log/glusterfs*. After starting the client, “df -h” should show you your new mount point. Be careful with UID/GID (& permissions): there is no equivalent of NFS’s root_squash in GlusterFS yet.
Other notes: Amazon EBS would have been the perfect solution if it allowed a volume to be mounted on multiple servers and let us deal with the concurrency/locking problems. But it doesn’t.
The OpenSSH client keeps a fingerprint of every server it has connected to. These fingerprints are stored in .ssh/known_hosts and are automatically compared with the server’s current fingerprint when a connection is established.
Hence, the .ssh/known_hosts file is crucial to protect against man-in-the-middle attacks in a networked environment. This file is also a very, very good attack vector on a system administrator’s computer: in plain text, it lists the hosts that user connects to. Hashing the content of the file is therefore good practice, especially with the current wave of big bugs hitting GNU/Linux systems.
The first step is to enable hashing of the new fingerprints:
# cat /etc/ssh/ssh_config
Host *
    SendEnv LANG LC_*
    HashKnownHosts yes
    GSSAPIAuthentication no
    GSSAPIDelegateCredentials no
    TCPKeepAlive yes
    ServerAliveInterval 60
The “HashKnownHosts yes” configuration option is the way to go; set here, it is a general setting affecting all users on your system (Host *). If you don’t have access to the central ssh_config file, don’t forget that you have per-user settings in .ssh/config.
This enables hashing of future fingerprints. To convert your existing file, use the following ssh-keygen command. Your unmodified known_hosts will be saved as known_hosts.old.
# ssh-keygen -H -f .ssh/known_hosts
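If you wonder what a hashed entry actually contains: to my understanding, the host field becomes |1|salt|hash, where the salt is random, the hash is an HMAC-SHA1 of the host name keyed with that salt, and both are base64-encoded. The rest of the line (key type and key) is unchanged. The sketch below rebuilds such a field and checks a host name against it. It is an illustration only, with function names of my own; use ssh-keygen for real files.

```python
import base64
import hashlib
import hmac
import os


def hash_host(hostname: str, salt: bytes = None) -> str:
    """Build the hashed host field of a known_hosts line for this host name."""
    if salt is None:
        salt = os.urandom(20)  # random salt, so every entry looks different
    digest = hmac.new(salt, hostname.encode("utf-8"), hashlib.sha1).digest()
    return "|1|{}|{}".format(
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    )


def host_matches(hostname: str, hashed_field: str) -> bool:
    """Check whether a host name corresponds to a hashed host field."""
    parts = hashed_field.split("|")
    if len(parts) != 4 or parts[0] != "" or parts[1] != "1":
        return False  # not a hashed entry in the expected format
    salt = base64.b64decode(parts[2])
    expected = base64.b64decode(parts[3])
    actual = hmac.new(salt, hostname.encode("utf-8"), hashlib.sha1).digest()
    return hmac.compare_digest(actual, expected)


if __name__ == "__main__":
    field = hash_host("001.praized.com")
    print(field)
    print(host_matches("001.praized.com", field))
    print(host_matches("002.praized.com", field))
```

The point of the salt is that someone who steals the file cannot simply read the host names; they can only test guesses one by one, as host_matches does.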
Have fun, stay safe.