Apr 05 2008

In my previous posts about running Blat searches on the Sun Grid Engine, I mentioned I would follow up to report what kind of money we’re looking at when running such searches to map large amounts of sequence. The results were pretty impressive and nothing like what my first benchmarks suggested. At first I reported that it would cost approximately $730 to map about 280Mb of DNA sequence. This was somewhat expensive, but not expensive enough to prevent us from running it. However, I had extrapolated this figure from the results of a single run, when the better approach would have been to compare two runs, because of the fixed overhead of starting a program (and loading all the needed resources). It turns out that the run time of the test program was mostly overhead.

To my great surprise, when I did the actual blat run on the 280Mb of sequences, which I expected to cost roughly $400-500 (or as many cpu hours), it took only 11 cpu hours! In addition, and also surprisingly, the RepeatMasker step is now the more costly one: whereas it used to take roughly a third or a fourth of the compute time of blat on a standalone workstation, it now costs about 3 times more in the grid engine setup.
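To illustrate why a single benchmark run can be so misleading when startup overhead dominates, here is a minimal sketch in Python. It assumes run time is roughly a fixed overhead plus a constant cost per Mb of sequence, and recovers both from two runs of different sizes. The function names and the benchmark numbers are made up for illustration; they are not my actual measurements.

```python
def fit_overhead_and_rate(size_a, hours_a, size_b, hours_b):
    """Solve hours = overhead + rate * size from two benchmark runs.

    Assumes a linear cost model, which is only an approximation.
    """
    if size_a == size_b:
        raise ValueError("the two runs must use different input sizes")
    rate = (hours_b - hours_a) / (size_b - size_a)  # cpu hours per Mb
    overhead = hours_a - rate * size_a              # fixed startup cost in cpu hours
    return overhead, rate


def predict_hours(size, overhead, rate):
    """Predicted cpu hours for one job of the given size in Mb."""
    return overhead + rate * size


def naive_hours(size, bench_size, bench_hours):
    """Single-run extrapolation: scales the overhead along with the work."""
    return bench_hours / bench_size * size


# Hypothetical benchmarks: 1 Mb took 1.25 cpu hours, 2 Mb took 1.5 cpu hours.
overhead, rate = fit_overhead_and_rate(1, 1.25, 2, 1.5)
print(overhead, rate)                        # 1.0 0.25
print(predict_hours(280, overhead, rate))    # 71.0
print(naive_hours(280, 1, 1.25))             # 350.0
```

With these invented numbers the single-run extrapolation overshoots by about a factor of five, which is the same kind of error I made with my first estimate. Note that if the work is split across many jobs, each job pays the overhead again, so the overhead term has to be multiplied by the number of jobs.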

Under the grid engine setup on Network.com, not only can we run blat on several hundred nodes, but it also runs roughly 10 times faster on each node than it does on my workstation. It seems that either blat benefits a lot from being run in 64-bit mode, or the memory installed on the Sun hardware is top of the line, since blat's run time is mostly spent on memory accesses.