email servers “in the cloud”

I’ve been asked about the possibility of harnessing the power “of the cloud” for an email server. To keep this blog post simple, I’ll treat “cloud computing” as equivalent to the Amazon AWS offering.

When email comes in
This is the easy part. Receiving email on an EC2 (Elastic Compute Cloud) instance is as easy as receiving it anywhere. You launch 2 instances in different availability zones, grab 2 IP addresses and change your MX records. With the recent availability of EBS (Elastic Block Store), you even have access to persistent storage for email. Within hours (at the very most) you have a complete setup with fail-over and backup capability (leave your queue/data store on EBS for persistence and take snapshots for backup).

Being in a fully virtual environment also negates most scaling problems. You dynamically start and stop anti-{spam,virus} scanning instances according to the needs of your clients and customers. This setup is also very cost-effective: you don’t have to pay for hardware (servers, switches, hard drives…), maintenance, power and all the network management involved in running public infrastructure (BGP, firewalls, etc.). You don’t even have to commit to a long-term contract.

For your customers, this represents a very decent offer: speed and latency in the Amazon cloud are very good - way better than what most small technical shops can afford.

Then emails have recipients
Emails are not only coming INTO your infrastructure; sometimes they must be transmitted to other people’s networks. This is where the archaic style of email management really fails. Email as a service is a legacy built on the assumption that Internet properties are big, controllable, static and permanent. This is the exact opposite of what you get by placing an email server inside the Amazon cloud.

You do not control the IP space/range, even if you are leased “1” IP. This is the big “bug”. You have no idea what other people do in their instances. Get used to it: your range will often be tagged and {grey,black}listed in DNS-based blocking lists. Very often. Whitelists will refuse your requests, since you cannot vouch for how other Amazon customers use the cloud.
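To see where a leased address stands, it helps to know how a DNS-based blocking list is queried: the IPv4 octets are reversed, the list’s zone is appended, and the resulting name is looked up; an answer means “listed”. Below is a minimal sketch of that convention. The zone name dnsbl.example.org is a placeholder and the function names are made up for this example; substitute a list you actually use.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    # Validate the input; raises ValueError on a malformed address.
    addr = ipaddress.IPv4Address(ip)
    # DNSBL convention: reverse the octets, then append the list's zone.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the DNSBL answers for this address.

    Any resolution failure is treated as "not listed", so a broken
    resolver will look the same as a clean address.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; the zone is hypothetical.
    print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
```

Running such a check periodically against the lists your recipients rely on at least tells you when your address has been tagged, even if it does not fix the underlying problem.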

One solution: you can still relay through an SMTP server installed somewhere else, but… that kind of defeats the whole purpose. The financial exercise of fighting DNSBLs vs. maintaining hardware infrastructure is left to the reader.

hashing the known_hosts file

The OpenSSH client keeps a fingerprint of every server it has connected to. These fingerprints are stored in .ssh/known_hosts and are automatically compared with the server’s current fingerprint on each new connection.

Hence, the .ssh/known_hosts file is crucial for protecting against man-in-the-middle attacks in a networked environment. This file is also a very good attack vector against a system administrator’s computer: in clear text, it lists every host the administrator connects to, so hashing its content is good practice. Especially with the current wave of big bugs hitting GNU/Linux systems.

The first step is to enable hashing of the new fingerprints:

# cat /etc/ssh/ssh_config
Host *
SendEnv LANG LC_*
HashKnownHosts yes
GSSAPIAuthentication no
GSSAPIDelegateCredentials no
TCPKeepAlive yes
ServerAliveInterval 60

The “HashKnownHosts yes” configuration option is the way to go. Placed under “Host *” in the system-wide file, it is a general setting affecting all users on your system. If you don’t have access to the central ssh_config file, don’t forget you have per-user settings in .ssh/config.

This enables hashing of future fingerprints. To convert your existing file, use the following ssh-keygen command. Your unmodified known_hosts will be saved as known_hosts.old.

# ssh-keygen -H -f .ssh/known_hosts
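For the curious, here is a rough sketch of what a hashed entry looks like. As far as I understand the OpenSSH format, the host field becomes “|1|salt|hash”, where the salt is random and the hash is an HMAC-SHA1 of the hostname keyed with that salt, both base64-encoded. The function names below are made up for this illustration; this is not OpenSSH’s own code.

```python
import base64
import hashlib
import hmac
import os
from typing import Optional


def hash_host(hostname: str, salt: Optional[bytes] = None) -> str:
    """Return a hashed host field in the "|1|salt|hash" layout."""
    if salt is None:
        salt = os.urandom(20)  # assumption: 20 random bytes, matching SHA-1's size
    digest = hmac.new(salt, hostname.encode("utf-8"), hashlib.sha1).digest()
    return "|1|{}|{}".format(
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    )


def host_matches(hostname: str, hashed_field: str) -> bool:
    """Check whether a hostname corresponds to a hashed host field."""
    parts = hashed_field.split("|")
    # Expected pieces: "", "1", base64 salt, base64 digest.
    if len(parts) != 4 or parts[0] != "" or parts[1] != "1":
        return False
    salt = base64.b64decode(parts[2])
    expected = base64.b64decode(parts[3])
    actual = hmac.new(salt, hostname.encode("utf-8"), hashlib.sha1).digest()
    return hmac.compare_digest(actual, expected)


if __name__ == "__main__":
    field = hash_host("server.example.org")
    print(field, host_matches("server.example.org", field))
```

The point of the salt is that someone who steals the file cannot simply read the list of hosts; they have to guess hostnames and test each one against every entry. You can still look up a host yourself with “ssh-keygen -F hostname”.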

Have fun, stay safe.

security of package managers

The subject of the week seems to be information security, so I’ll get on with another post that should keep you awake - well… if you are a system administrator doing your job.

With the DNS vulnerability, we thought we had hit the bottom of the barrel. Yet researchers are always able to amaze us: Attacks on package managers.

Ok, I must admit that it isn’t as bad as other bugs. Most of the risk can be mitigated by requiring metadata verification (openssl) from your package source or by selecting a trusted repository. Still - I’ll verify all my sources…
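As a reminder of what “verification” means at its simplest, here is a sketch of the checksum half only: compare the SHA-256 of a downloaded package with the digest published in the repository metadata. It assumes you obtained that digest from metadata you already trust (checking the signature on the metadata itself is a separate step, not shown), and the function names are made up for this example.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def package_matches(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with the one taken from trusted metadata."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A matching checksum only proves the file is the one the metadata describes; if the metadata itself is stale or forged, the check passes happily, which is why the trusted source matters.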

The day the routers died…

By now, most of you must have seen the news and already freaked out, as it warrants - but I’ll take a guess and suppose that some of you, like me, have been hidden away all day working on some obscure deployment and haven’t heard the news:

So, “Multiple DNS implementations vulnerable to cache poisoning”… In my book, this goes right alongside the debian-openssl fiasco as one of the worst bugs I’ve seen in years (well, Solaris telnet & the Microsoft ping of death might also be good candidates).

So get ready for another night of server updates…

Issues of data in the cloud…

If you exclude all discussions about who invented what and whose name should appear near the definition of cloud computing (which is still less than an embryo), there are some pretty good threads going on over at this “Cloud-Computing” group. One of my favorites is about the challenges that computing “in the cloud” is bringing us.

I’m not that interested in defining “cloud computing” - there is so much discussion around the exact wording and how it compares to grid computing, SaaS, utility computing… it’s not even funny anymore. In addition, I built my first “cloud-like” system last January (2008), a good 6 months after Google Trends started registering the term. I’m kinda late to this party.

Yet, in order to allow everyone to understand the next few posts, I’ll need to explain what it is. Do not mistake this text for a definition; it’s really only a very general, non-technical description of how a cloud might appear:

a fully virtualized environment where the client controls the application (sometimes integrated with an operating system) and the provider offers a virtualization layer over physically distributed hardware. The easiest examples: Amazon EC2 & Enomaly.

Your application (which is part of, or a whole, operating system) runs dynamically on computers around the world. If the computer running your code crashes, another one takes the load. Your application can be migrated without you knowing it (no slow-down or interruption of service) and the provider’s infrastructure can easily evolve.

You control your application - they control the hardware. In other words, we are speaking of adding a layer of abstraction between the device driver and the application - a second operating system.

So, in a third-party cloud system, we are in the presence of resources you do not own, dynamically allocated to your application. As a preview of my next posts, let’s see how this might be dangerous.

Security, laws and localization of data in the cloud
This is really the issue that will be most present in the next few years: in such services, you can’t know the exact location of your data. Which means you can’t know which law applies - and when.

Through an automatic process, your application can be migrated to another datacenter, in another country, under that country’s specific laws. Which could then allow… you to run your precious code… or… them to read your private files.

Cloud computing corporations still have a lot of work to do to define all those variables - especially if they want to be an interesting option for corporations & governments where privacy is important.