Apr 05 2008

In my previous posts about running Blat searches on the Sun Grid Engine, I said I would follow up and report what kind of money we’re looking at when running such searches to map large amounts of sequence. The results were pretty impressive and nothing like what my first benchmarks suggested. At first I reported that it would cost approximately $730 to map about 280 Mb of DNA sequence. That was somewhat expensive, but not expensive enough to prevent us from running it. However, I had inferred this from the results of a single run, when the better thing to do would have been to base the estimate on two runs, because of the overhead cost of starting a program (and loading all the needed resources). It turns out that the run time of the test program was mostly overhead. To my great surprise, the actual blat run on the 280 Mb of sequences, which I expected to cost roughly $400-500 (or as many CPU hours), took only 11 CPU hours! Also surprisingly, the Repeat Masker step is now the more costly one: whereas it used to take roughly a third or a fourth of blat’s compute time on a standalone workstation, it now costs about 3 times more in the grid engine setup.
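To make the overhead point concrete, here is a minimal sketch of the two-run estimate I should have done. It assumes run time is roughly a fixed startup overhead plus a per-megabase rate, and that the two benchmark runs use different input sizes. The function names and the numbers in the demo are made up for illustration; they are not my actual measurements.

```python
def fit_overhead(size_a, time_a, size_b, time_b):
    """Fit time = overhead + rate * size through two benchmark runs.

    Sizes are in Mb and times in CPU hours (units are an assumption of
    this sketch; any consistent pair works).
    """
    if size_a == size_b:
        raise ValueError("the two runs must use different input sizes")
    rate = (time_b - time_a) / (size_b - size_a)
    overhead = time_a - rate * size_a
    return overhead, rate


def predict(size, overhead, rate):
    """Predict the run time for an input of the given size."""
    return overhead + rate * size


if __name__ == "__main__":
    # Hypothetical benchmark: 1 Mb took 2.0 h, 3 Mb took 2.5 h.
    overhead, rate = fit_overhead(1.0, 2.0, 3.0, 2.5)
    # Naive single-run extrapolation scales the whole 2.0 h, overhead included.
    naive = 2.0 / 1.0 * 280
    print("naive estimate for 280 Mb:", naive)
    print("two-run estimate for 280 Mb:", predict(280, overhead, rate))
```

With a single run, the startup overhead gets multiplied along with everything else, which is how an estimate can end up far above the real cost.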

Under the grid engine setup on Network.com, not only can we run blat on several hundred nodes, but it also runs roughly 10 times faster on each node than it does on my workstation. It seems that either blat benefits a lot from running in 64-bit mode, or the memory installed on the Sun hardware is top of the line, since blat’s work is mostly memory accesses.

Mar 08 2008

Some follow-up on running blat on network.com. Unfortunately, using blat on smaller chunks of DNA, like chromosomes, wasn’t the way to go. I quickly rearranged my qsub submissions so that individual chromosomes would be searched instead of whole genomes at once. twoBitInfo, which retrieves information about the chromosomes in a 2bit file, was my friend for that. However, I got the same “killed” error message from the twoBitInfo utility, which seriously hints at compilation issues. I went back to my efforts to cross-compile blat for amd64 on my RHEL4 box, but that still gave me error messages on the grid engine.
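For the record, the per-chromosome split can be sketched roughly like this. It assumes twoBitInfo writes one `name<TAB>size` line per sequence, and that blat accepts a `file.2bit:seqName` target to restrict the search to one sequence; check both against your build. The file names, sequence names and sizes are hypothetical, and this is not the actual script I submitted.

```python
import shlex


def parse_two_bit_info(text):
    """Parse 'name<TAB>size' lines (assumed twoBitInfo output) into tuples."""
    seqs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, size = line.split("\t")
        seqs.append((name, int(size)))
    return seqs


def blat_commands(two_bit, query, out_dir, seqs, blat="blat"):
    """Build one blat command line per sequence in the 2bit file."""
    cmds = []
    for name, _size in seqs:
        target = f"{two_bit}:{name}"      # assumed file.2bit:seqName syntax
        out = f"{out_dir}/{name}.psl"     # one output file per chromosome
        cmds.append(" ".join(shlex.quote(a) for a in (blat, target, query, out)))
    return cmds


if __name__ == "__main__":
    # Made-up sequence names and sizes, standing in for real twoBitInfo output.
    info = "chrA\t1000\nchrB\t2000\n"
    for cmd in blat_commands("genome.2bit", "ends.fa", "out", parse_two_bit_info(info)):
        print(cmd)
```

Each printed line would then go into its own qsub job, so a single chromosome is loaded per process instead of the whole genome.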

By the way, I digress a bit, but in case you hadn’t realized, testing those binaries isn’t the most fun thing to do if you don’t have a Solaris box set up: each time, you need to re-upload the binaries and scripts and test them live, while wasting 1 CPU hour. Also, lately jobs are somehow taking forever to start up, even though the job ID increments only by one (hinting that there were no other jobs running during my wait time).

Seeing how my efforts at cross-compilation failed miserably, I decided that my next move would be to try to compile blat natively… After running grid engine jobs solely to compile blat (a job that takes 5 seconds on my machine, so using the cluster is a bit of overkill, but thank god I’m not using all 5000 CPUs), I managed to compile the blatSuite successfully. The major obstacle in “porting” was this error message:

pscmGfx.c: In function `colinearPoly':
pscmGfx.c:390: warning: implicit declaration of function `isinf'
gmake[1]: *** [pscmGfx.o] Error 1
gmake: *** [topLibs] Error 2

It seems the isinf function is a source of headaches for people porting to Solaris. Some suggestions (http://www.ruby-forum.com/topic/70926) are to change the compiler flags to gnu99, but that was to no avail. I resolved it by removing pscmGfx.o from the makefile, since it’s not used by blat. The mods I made to the common make file follow; some changes may be useless, but I did not take the time to test:

CFLAGS=-L/usr/sfw/lib/amd64 -L/usr/lib -lnsl -lsocket -lresolv
HG_INC=-I../inc -I../../inc -I../../../inc -I../../../../inc -I../../../../../inc -I/usr/include -I/usr/sfw/lib/gcc/i386-pc-solaris2.10/3.4.3/include/

More on benchmarking later…

Update: benchmarking estimates tell me that it will cost $730 to repeat mask and map a whole library’s worth of end sequences (about 280 megabases) onto the human genome.

Mar 06 2008

The standard Solaris and Opteron binaries for blat don’t work on network.com’s grid engine. The processes don’t get a chance to start and are killed on the spot.

The error message is pretty cryptic too:

/var/tmp/spool/r130c23z1/job_scripts/295234: line 52: 7090 Killed "$BLAT" "$DBPATH" "$CHUNK" "$OUTFILE" $PARAMS -dots=`echo $(($CHUNKSIZE/10))`

I will have to recompile them from the sources with different architecture flags.

Update: recompilation won’t be necessary. It seems blat is also being spontaneously killed locally on my server, a behavior I had forgotten about. It seems that blatting individual chromosomes instead of whole genomes will be the way to go.