Thursday, October 18, 2007

Mixing HTTP and HTTPS without getting browser warnings

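Great tip by Ned Batchelder: on HTTPS sites you can reference assets without triggering a mixed-content browser warning by using, instead of
  http://fast.cdn.net/pix/smiley.jpg
the following scheme-relative syntax:
  //fast.cdn.net/pix/smiley.jpg
The browser then fetches the asset with the same scheme as the page itself, so the asset host must also answer over HTTPS.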
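A quick way to see what such a URL turns into is to resolve it against a page address. The following is only a sketch using Python's standard urljoin; the www.example.com page addresses are placeholders, not part of the original tip.

```python
from urllib.parse import urljoin

# The scheme-relative asset reference from the tip above.
asset = "//fast.cdn.net/pix/smiley.jpg"

# Hypothetical page addresses, used only to show how the reference resolves.
https_url = urljoin("https://www.example.com/index.html", asset)
http_url = urljoin("http://www.example.com/index.html", asset)

print(https_url)  # the asset inherits https from the page
print(http_url)   # the asset inherits http from the page
```

The reference keeps its own host and path and only borrows the scheme from the page it is embedded in.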

Wednesday, October 17, 2007

Why Erlang ?

Other functional languages such as Haskell or OCaml are, for most people, just a bit too academic. RubyOnRails is reaching end-of-hype, and its underlying language Ruby is dog slow and not scalable. Java is far too complicated for me, and without blooooated tools such as Eclipse or IDEA it is just a mess to deal with. At least that was my painful recent experience: I developed a prototype of a Flash video streaming server in Erlang. There is a more complete open source equivalent written in Java: Red5. I have a couple of years of experience as a Java developer, so I thought the easiest way would be to analyze the Java source code and port it to Erlang. Wrong. The Java code is so complex that it was easier for me to reverse engineer the proprietary RTMP video streaming protocol by analyzing the TCP/IP packet flow of a running Red5 instance with a network sniffer than to read its source code. Is there any other language I think one should not choose instead of Erlang? Python. I had actually only heard good things about it, until I learned about its dictated indentation style, which put an abrupt end to any further digging into that otherwise probably great language ...

Ok, end of ranting. Developers are a highly opinionated species. Sometimes discussions among developers about the right programming language take on amusing, ridiculous and even religious dimensions, despite everybody trying to be objective (well, except about his/her preferred language). So just use the right tool for the right job and don't listen to me... But do listen to what the really smart and experienced people have to say. There is an interesting thread currently going on at the Erlang mailing list. A Java coder asks whether he should learn Erlang or OCaml next. Bob Ippolito (recently interviewed) says:
When I was evaluating Python alternatives for building the core
technology behind MochiAds I tried out a bunch of languages and Erlang
was the only one that was easy for me to learn and had the right
balance of features, performance, and reliability. A year later we
have about 16 machines running 80 Erlang nodes powering about 16
different "components" of our infrastructure and 4 people working on
it at the moment (originally it was just me). It worked out so well
that we rewrote the server component of our MochiBot service in Erlang
and we've been using it to build lots of internal tools such as our
monitoring software, our single sign-on service, etc. as well. None of
us had previous Erlang experience, but we're all very comfortable with
it now.

After about a year with Erlang, I'm not sure I could part with hot
code loading, light-weight processes, and multiplexed socket IO for
writing servers. Also, Mnesia has been really useful to us to
temporarily store "real-time" data (ram_copies) so that we don't have
to make users wait for it to get batched into the SQL databases. The
distribution stuff mostly Just Works once you figure out how to set it
up (though we did have one bad experience with a network partition due
to a switch acting up, it was recoverable manually).

O'Caml is a useful language too, but for writing a network app I can't
really imagine going with anything but Erlang if you're looking for
redundancy and scale. Unless you want to write your own half-baked
Erlang-like system before even trying to solve something a little
closer to your actual problem domain.

Joe Armstrong, one of the original creators of the Erlang language and author of Programming Erlang, says:
Erlang/OCaml/Haskell belong to the same language family - if you learn any one
of them then learning the next one in the family will be a lot easier
than starting from scratch.

These language differ - but have the same core concepts - the idea of immutable
state - programming with immutable state is the thing that you need to learn.
The details of how you do this vary from language to language (you can use
processes with tail recursion to model state in Erlang, or monads in haskell, etc.).

I'd start with the language that most suits your problem domain - a rough guess
might be to think of these languages as follows:

OCaml - use as a replacement for C - good for implementing virtual machine
emulators tightly coded non-distributed applications.

Erlang - use as a replacement for Java - good for programming distributed
fault-tolerant applications - good support for multicores/concurrency. Good as a
glue language to glue together components co-ordinate activities on different
machines etc.

Haskell - use for implementing domain specific languages, symbolic
computations etc.

And what has Erlang in the bag for Web developers ?

Not so much yet, if you are looking for an easy-to-learn, convention-over-configuration, one-size-fits-all framework a la RubyOnRails. Web companies use Erlang today to overcome scalability problems in web-based instant messaging (e.g. ejabberd at Twitter and Meebo). Among the few publicly known partially-to-mostly Erlang-powered sites are MochiAds and Slideshare.

Here is a little overview of some Erlang based web servers and frameworks:
  • Yaws, the most popular Erlang web server. Active development since 2002. Many contributors, lots of add-ons. If you are looking for an Erlang-based, Apache-like web server, then Yaws is the right thing for you.
  • Erlyweb, by Yariv Sadan. The most popular Erlang MVC framework, built on top of Yaws. It enables you to easily do any-web-thing you can imagine, if you are comfortable with Erlang and don't mind integrating the AJAX toolkit of your choice yourself. But don't expect a learning curve like that of RubyOnRails, where you can start in the morning without ever having heard anything about Ruby before, and at night you have your first simple web app running and have learnt Ruby without even noticing it.
  • Tercio, by Eric Merritt. Different philosophy than Erlyweb, targeting AJAX apps which do most or even all rendering on the client side.
  • Mochiweb by Bob Ippolito. My preferred toolkit to easily build a custom HTTP server.
  • And last and least, a shameless plug for my own upcoming web framework and service, which aims to lower the barrier to entry for individuals and companies doing utility computing based development and hosting of scalable AJAX / Comet web apps. You won't even need to know Erlang to start with, unless you want to customize the framework itself. More about this when I actually have something to show ...

Tuesday, October 16, 2007

Amazon EC2 instances now with up to 15 GB RAM

Utility computing is getting more and more interesting for scalable web hosting. So far Amazon EC2 had only one instance type, with 1.7 GB RAM, 160 GB HD and a 32-bit platform, at $0.10 per instance hour. Now there are two new Amazon EC2 instance types, both on a 64-bit platform and with significantly more power:

Large instance (new):
7.5 GB RAM, 850 GB HD, four times more computing performance, $0.40 per instance hour

Extra large instance (new):
15 GB RAM, 1690 GB HD, eight times more computing performance, $0.80 per instance hour
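
To put the three instance types side by side, a bit of back-of-the-envelope arithmetic helps. The sketch below uses only the figures quoted in this post. The 30-day always-on month, and reading "four/eight times more computing performance" as 4 and 8 units relative to the original instance, are assumptions made for illustration.

```python
# Figures as quoted in this post; "compute" is relative to the original
# (small) instance and is an interpretation, not an official unit.
INSTANCES = {
    "small": {"ram_gb": 1.7, "disk_gb": 160, "compute": 1, "usd_per_hour": 0.10},
    "large": {"ram_gb": 7.5, "disk_gb": 850, "compute": 4, "usd_per_hour": 0.40},
    "extra large": {"ram_gb": 15.0, "disk_gb": 1690, "compute": 8, "usd_per_hour": 0.80},
}

HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month, instance always on


def summarize(spec):
    """Derive monthly cost and price per resource unit for one instance type."""
    price = spec["usd_per_hour"]
    return {
        "usd_per_month": round(price * HOURS_PER_MONTH, 2),
        "usd_per_gb_ram_hour": round(price / spec["ram_gb"], 4),
        "usd_per_compute_hour": round(price / spec["compute"], 4),
    }


summary = {name: summarize(spec) for name, spec in INSTANCES.items()}

for name, row in summary.items():
    print(name, row)
```

Under these assumptions the price per unit of computing performance is the same for all three types, while RAM comes out slightly cheaper per GB on the two new ones.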

I am working right now on some Erlang tools to simplify web application hosting on utility computing infrastructure, and I look forward to getting my hands dirty with these new EC2 instances!

Monday, October 15, 2007

Improving web application security without degrading user experience

Generally speaking, web application security involves a trade-off between security and the user's convenience. But I am not generally speaking in this article. I am trying to explore some areas where it is possible to improve security while keeping the user experience the same or even improving it.

Single-Sign-On with OpenID based authentication

First, this makes life easier for the application developer, because to start with he only has to integrate a client library (available for PHP, Ruby, Python, ...) into his server back end, and not a whole authentication server. The end user gets Single-Sign-On and can choose any public OpenID provider he wants. That's the theory. In practice, OpenID has so far been something new that just confused the user. But this is changing rapidly: according to this document, the adoption rate is exponential and there are already 120 million OpenIDs out there.
Because OpenID decouples the choice of an authentication server from the web application itself, the user can minimize the trade-off between security and convenience purely based on his preferences or his particular situation. And now there are also innovative, password-less approaches:
  • Image-sequence based login: Vidoop. Instead of a password, the user needs to remember an image sequence. There are quite a few interesting facts about Vidoop: e.g. it can generate ad-revenue from the images for the Web application provider, it is resistant against repetitive logins with a stolen token (the image sequence) because it is based on a challenge-response method which is different for every login attempt.
  • Browser-certificate based login: MyOpenID. This is the most secure and most convenient method, as long as, and only as long as, the authorized user is the only person with physical access to the computer whose browser contains that certificate.

Javascript filtering of user generated content and applications

Javascript is dangerous. When users are allowed to provide content, e.g. in the comment section of a blog, common practice today is to filter out, on the server side, any Javascript from the HTML text they submit. If interactivity is explicitly desired for user generated applications which run on the platform of the hosting application provider, then a subset of Javascript must be allowed, otherwise the Web 2.0 user experience is gone. Facebook deals with this by defining its own Javascript subset called FBJS. Recently various technical approaches which deal with filtering of Javascript have been announced / discussed / released:
  • ADsafe: A Javascript subset defined by Douglas Crockford. He is initially targeting the advertising industry with it, so that interactive ads can be placed on web pages without compromising the user's security. ADsafe can also be used for mashup components such as widgets.
  • Caja: From Google. Does Javascript source-to-source translation. Currently written in Javascript, but the source code repository contains an empty folder titled "Java", so I guess that is what will come next.
  • JStify: Filtering and also automatic replacement of insecure Javascript code. Announced, no code released yet, written in OCaml.
  • and last and least my own not yet formally announced approach based on lexical analysis and parsing of Javascript in Erlang, as part of a Javascript-to-Erlang compiler (to be open sourced) which I'm gonna use at the server side of my startup skast.com.
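
For the simpler case mentioned at the start of this section, stripping Javascript from user comments on the server side, a minimal sketch could look like the following. It is not how FBJS, ADsafe, Caja or JStify work: it is a plain allowlist filter built on Python's standard html.parser, and the lists of allowed tags, attributes and URL schemes are assumptions chosen for illustration. A real site should rely on a well-tested sanitizing library.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative allowlists (assumptions, not a vetted policy).
ALLOWED_TAGS = {"a", "b", "i", "em", "strong", "p", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://")


class CommentSanitizer(HTMLParser):
    """Re-emit only allowlisted tags and attributes; drop everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            # Drops onclick= and friends, plus any attribute not allowlisted.
            if name not in ALLOWED_ATTRS.get(tag, set()) or value is None:
                continue
            # Drops javascript: and other non-http(s) link targets.
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text inside <script>/<style> is discarded; all other text is escaped.
        if not self.skip_depth:
            self.out.append(escape(data))


def sanitize(html_text):
    parser = CommentSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="evil()">Hi <b>there</b><script>alert(1)</script></p>'))
```

Allowing only what is known to be harmless, instead of trying to list everything dangerous, is the design choice here: anything the filter does not recognize is dropped.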

Sunday, October 14, 2007

Mount Amazon S3 on your Mac

I recently discovered s3fs, a FUSE based file system which lets you mount an Amazon S3 bucket like a normal file system on your PC. In other words, it is now very easy and relatively cheap to expand your local hard drive with terabytes of backup or photo/video storage. And with the recently announced Amazon S3 SLA, it seems Amazon is committed to continuing this service.
s3fs is currently only provided in the form of source code. But with a little bit of tweaking I got it compiling and running on my Mac. See below for the instructions and the required Mac specific modifications:

Get and install the latest MacFUSE-Core. This is just a background process, without any GUI elements.

Check out the s3fs source code. Because it is just one file, you can even copy it manually into a newly created directory. Then add the following line to s3fs.cpp
(after #define FUSE_USE_VERSION 26):
 #define __off_t off_t
Start a terminal, change to the directory with the s3fs.cpp file inside and prepare the environment:
  export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
Next modify the Makefile (or create one, if you just downloaded the s3fs.cpp). Note that the command lines below each target must start with a tab character:
all:
	g++ -Wall -D__FreeBSD__=10 -D_FILE_OFFSET_BITS=64 $(shell pkg-config fuse --cflags --libs) -lcurl -lcrypto $(shell xml2-config --cflags --libs) -ggdb s3fs.cpp -o s3fs
	@echo ok!

clean:
	rm -f s3fs s3fs.o
Run make. If everything went well, you should now have a binary file s3fs in that directory. Now create a file /etc/passwd-s3fs which contains just your Amazon ID and secret key separated by ":", e.g.:
  example-id:example-secret-key
Next, create a new Amazon S3 bucket. I have been using the S3 Browser for that. Then define a mount point; I just created a new directory in my home folder for that. Now you can mount your newly created Amazon S3 bucket at that directory by running the following command:
  ./s3fs your-bucket-name your-mount-point-directory
Now you should see the MacFUSE icon in the Mac OS X Finder. Any file you put into your mount point directory is now physically stored in your bucket at Amazon S3. It's not very user friendly yet, so let's hope the MacFusion guys integrate it soon into their excellent FUSE tool. I plan to integrate this S3 bucket access via s3fs into my software development toolchain. Because everything is scripted there, s3fs works fine for me, even if lots of important features are still missing.
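
When using the mount from scripts, one thing worth guarding against is copying files into the mount point directory while s3fs is not actually running, in which case they would silently end up on the local disk. The helper below is a hypothetical sketch (the name copy_to_bucket is made up for illustration) which relies on Python's os.path.ismount; it assumes the FUSE mount shows up as a regular mount point.

```python
import os
import shutil


def copy_to_bucket(src_path, mount_point):
    """Copy src_path into mount_point, but only if something is mounted there."""
    if not os.path.ismount(mount_point):
        # Without this check the file would land in the plain local directory.
        raise RuntimeError(f"{mount_point} is not a mount point; is s3fs running?")
    return shutil.copy(src_path, mount_point)
```

A backup script would call it with the file to store and your-mount-point-directory from the mount command above, and stop with an error if the bucket is not mounted.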

Update:

In case you don't have libxml2 installed already, you need to install it first:
  sudo port install libxml2