Patch for openMosix 2.4.17 and NCBI BLAST
Installing NCBI BLAST (v. 20020426) on openMosix-2.4.1
0. Introduction
1. Downloading NCBI BLAST
2. Preparing NCBI BLAST for openMosix
3. Tips and Troubleshooting
0. Introduction
openMosix (http://www.openMosix.org) on a Linux cluster provides a transparent and very convenient mechanism for migrating BLAST processes across the cluster and load balancing them, without needing to link in special libraries or re-engineer existing code. If one installs duplicate copies of the BLAST database files on every node, the DFSA/MFS filesystem included with openMosix takes job I/O load into account when load balancing and eliminates the need to send massive sequence databases across the network as BLAST jobs migrate. Every BLAST job spawned from the user's login node can transparently migrate to another node and read the *local* database on that node.
Unfortunately, NCBI BLAST (and WU-BLAST) have features enabled that are not yet supported by openMosix process migration, specifically memory-mapped I/O to the migratable file system and multithreading with shared memory partitions. Hence one must recompile BLAST before installing it on the openMosix cluster. Note that this recompile is necessary even for users with small search databases, as the memory-mapped I/O in BLAST will prevent openMosix process migration even in the absence of MFS!
0a. Prerequisites
You must verify that simple processes, such as Perl, awk, or Python scripts, can migrate to other nodes. You should also verify that the /mfs filesystem is mounted and can access files on all your other nodes. For both of these, please visit the openMosix.org website for setup information.
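The /mfs part of this check can be scripted. The following is only a minimal sketch: the helper names and the MFS_ROOT variable are hypothetical (they are not openMosix tools), and it assumes that each node appears as a numbered directory under /mfs.

```shell
#!/bin/sh
# Hypothetical prerequisite check. MFS_ROOT and the helper names below
# are illustrative assumptions, not part of openMosix.
MFS_ROOT=${MFS_ROOT:-/mfs}

# Succeeds when MFS_ROOT appears as a mount point in the given mount
# table (default: /proc/mounts).
mfs_mounted() {
    awk -v mp="$MFS_ROOT" '$2 == mp { found = 1 } END { exit !found }' \
        "${1:-/proc/mounts}"
}

# Succeeds when the root directory of node $1 can be listed through MFS.
node_readable() {
    ls "$MFS_ROOT/$1" >/dev/null 2>&1
}

# Prints one status line per node number given as an argument and
# returns non-zero if any of them could not be read.
check_nodes() {
    rc=0
    for node in "$@"; do
        if node_readable "$node"; then
            echo "node $node: readable"
        else
            echo "node $node: NOT readable"
            rc=1
        fi
    done
    return $rc
}
```

After sourcing the file, something like `mfs_mounted && check_nodes 2 3 4` would report on three nodes; substitute your own node numbers.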
The machines are adequate for running BLAST, but one may want more RAM for very heavily loaded clusters. Also, I assume throughout this install that you are doing everything AS ROOT. This is mostly a convenience for me, and you may choose to do otherwise. Keep in mind that permissions and the like will differ if you work as a non-root user, which could lead to errors not detailed here.
1. Downloading NCBI BLAST
NCBI BLAST is available from ftp://ftp.ncbi.nih.gov. The version as tested was the 20020426 release. You need all three files: ncbi.tar.gz, data.tar.gz, and README.bls.
1a. Test compile the original installation by extracting the files into one directory:
Note: '>' below is the UNIX prompt; do not type it in!
>cd ~
>mkdir ncbiToolbox
>cp ncbi.tar.gz data.tar.gz README.bls ./ncbiToolbox
>cd ncbiToolbox
>gunzip *.tar.gz
>tar xvf data.tar
>tar xvf ncbi.tar
Then compile BLAST by running:
>./ncbi/make/makedis.csh
The compile will proceed. You can ignore most errors as long as they are not fatal. If the compile fails to complete successfully, go no further, as you will not be able to run this installation on your system without resolving the failure first!
1b. Create the BLAST database
You can read the file ncbi/doc/README.formatdb and create your own database following its suggestions, or you can choose to use a pre-formatted database. We use the pre-formatted 'nt' database for testing. Download it at: ftp://ftp.ncbi.nih.gov/blast/db/FormattedDatabases/nt.tar.gz
Create a future home for BLAST:
>mkdir /opt/blast
For testing, install the database into the future BLAST database directory /opt/blast/db (or your directory of choice; it could also be /usr/local/blast or whatever you feel is most convenient). Create that directory first, then move the database file into it:
>mkdir -p /opt/blast/db
>mv ~/nt.tar.gz /opt/blast/db
>cd /opt/blast/db
>gunzip nt.tar.gz
>tar xvf nt.tar
In nt's case, you will end up with the 7 database files of the 'nt' release:
These seven files are the database, and ALL of them ultimately need to be duplicated onto every node of your cluster.
1c. Move the BLAST home with databases onto every node in your cluster
The BLAST executables themselves are small and shouldn't require installation on every single node. The databases, however, are very large and will be read in their entirety by every search, so having them stored locally on all other nodes is a must. After extracting and installing nt on the master node, copy your currently empty (well, with databases!) BLAST home to all nodes. Depending on your network and the size of the databases, this will take several minutes to hours. (You should probably write a shell script to do this.) The long way, copying into each node's /opt through its MFS path so that the result is /opt/blast on that node:
>cp -vr /opt/blast /mfs/2/opt
>cp -vr /opt/blast /mfs/3/opt
>cp -vr /opt/blast /mfs/4/opt
...etc
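The shell script suggested above could look roughly like the following sketch. It assumes that /mfs/N is the root filesystem of node N, so it copies into each node's /opt to end up with /opt/blast there. The function name and the MFS_ROOT and BLAST_HOME variables are hypothetical and can be overridden for a different layout.

```shell
#!/bin/sh
# Sketch of a copy script. Assumes /mfs/N is the root of node N;
# MFS_ROOT, BLAST_HOME and copy_blast_home are illustrative names.
MFS_ROOT=${MFS_ROOT:-/mfs}
BLAST_HOME=${BLAST_HOME:-/opt/blast}

# Usage: copy_blast_home NODE...
# Copies BLAST_HOME to the same absolute path on every listed node and
# stops at the first failure.
copy_blast_home() {
    for node in "$@"; do
        # Parent directory of the BLAST home as seen through MFS,
        # for example /mfs/2/opt for node 2.
        parent="$MFS_ROOT/$node$(dirname "$BLAST_HOME")"
        mkdir -p "$parent" || return 1
        cp -r "$BLAST_HOME" "$parent/" || return 1
        echo "copied $BLAST_HOME to node $node"
    done
}
```

After sourcing the file, `copy_blast_home 2 3 4` would replace the three cp commands above. The copies run one after another, so the total time is still dominated by the database size.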
After a while, make sure all the databases made it there. Note: the directory structure you choose to house the databases MUST be replicated on every node throughout your cluster! It is crucial that the directory structures are identical; otherwise migrating BLAST processes will not know where to find the local DBs!
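One way to check that the copies arrived is to compare each node's database directory with the local one. The sketch below only compares file names and sizes, not contents, and again assumes that /mfs/N is the root of node N. The function name and variables are hypothetical.

```shell
#!/bin/sh
# Sketch of a replication check (names and sizes only, no checksums).
# MFS_ROOT, BLAST_HOME and verify_node_db are illustrative names.
MFS_ROOT=${MFS_ROOT:-/mfs}
BLAST_HOME=${BLAST_HOME:-/opt/blast}

# Usage: verify_node_db NODE
# Succeeds when every file in the local db directory exists on NODE
# at the same path and with the same size in bytes.
verify_node_db() {
    node=$1
    local_db="$BLAST_HOME/db"
    remote_db="$MFS_ROOT/$node$BLAST_HOME/db"
    for f in "$local_db"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        if [ ! -f "$remote_db/$name" ]; then
            echo "node $node: missing $name"
            return 1
        fi
        if [ "$(wc -c < "$f")" -ne "$(wc -c < "$remote_db/$name")" ]; then
            echo "node $node: size mismatch for $name"
            return 1
        fi
    done
    echo "node $node: ok"
}
```

Running `for n in 2 3 4; do verify_node_db "$n"; done` after sourcing the file gives one line per node. A matching size does not prove the contents are identical, so treat this as a quick sanity check only.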
Before proceeding, make sure that BLAST runs at least on your local node and can read the local database. You must create a query file (make one up in FASTA format). Create a file called testblast.fa and paste the following two lines into it exactly (including the '>'):
>AF24986.1|AF116242_1 (AF116242) K-Cl cotransporter KCC3 [Homo sapiens]
ATAGGATAGGACCAGATTAGGACCACACAGGATAGGGACCACCCCCAAGAGAATAAGGACACAAACCACA
Copy it into your BLAST home on your local machine (it doesn't need to be copied to all the others):
>cp testblast.fa /opt/blast
Test your recently compiled BLAST:
>cd ~/ncbiToolbox/ncbi/build/
>./blastall -p blastn -d /opt/blast/db/nt -i /opt/blast/testblast.fa
It should run smoothly and give you several matches against the 'nt' dataset. Now for the fun part...
2. Preparing NCBI BLAST for openMosix
Once you have successfully shown that you can build and run a simple BLAST query on the local node, you'll have to apply a patch so that BLAST runs without the several conflicting features that are enabled by default. The patch is available at: http://www.stl.bioinformatics.med.umich.edu/OM_BLAST_patch/ncbi_blast_openmosix2.4.17.patch
Test the patch harmlessly as follows:
>cd ~
>patch -Np1 -d ./ncbiToolbox/ --verbose -i ~/ncbi_blast_openmosix2.4.17.patch --dry-run
If it does not work, check the paths to make sure all directories are correct.
2a. Apply the patch to your NCBI BLAST source directory:
>patch -Np1 -d ./ncbiToolbox/ --verbose -i ~/ncbi_blast_openmosix2.4.17.patch
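The dry run and the real run can be chained so that the source tree is only touched when the dry run succeeds. This is a minimal sketch using the same patch options as above; the function name is hypothetical, and the patch file should be given as an absolute path because -d changes directory before the patch file is opened.

```shell
#!/bin/sh
# Sketch: apply a patch only if a dry run succeeds first.
# apply_patch_checked is an illustrative name, not an existing tool.
#
# Usage: apply_patch_checked SOURCE_DIR /absolute/path/to/file.patch
apply_patch_checked() {
    src=$1
    patchfile=$2
    # --dry-run reports what would happen without changing any files.
    if patch -Np1 -d "$src" -i "$patchfile" --dry-run >/dev/null; then
        patch -Np1 -d "$src" --verbose -i "$patchfile"
    else
        echo "dry run failed; $src left untouched" >&2
        return 1
    fi
}
```

For this tutorial the call would be along the lines of `apply_patch_checked ~/ncbiToolbox ~/ncbi_blast_openmosix2.4.17.patch`.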
2b. Now recompile BLAST as before
>cd ncbiToolbox
>./ncbi/make/makedis.csh
The recompilation should proceed without error and overwrite the ./ncbi/build directory with new binaries.
2c. Test these new binaries against a local database
After recompiling, verify that your new binaries work locally by doing a sample query:
>cd ~/ncbiToolbox/ncbi/build/
>./blastall -p blastn -d /opt/blast/db/nt -i /opt/blast/testblast.fa
Then test DFSA migration!
1. Open a second terminal window and run the openMosix status monitor so you can see the load on each node:
>mosmon -t
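2. In your main login, run BLAST from your main node, but using a database stored elsewhere, for example the copy on node 2 reached through its MFS path (an assumed path, matching the location of the database on that node):
>./blastall -p blastn -d /mfs/2/opt/blast/db/nt -i /opt/blast/testblast.fa
You should see the process load on node #2 increase!
2d. Copy your new binaries into their final BLAST home:
>cd ~/ncbiToolbox/ncbi/build/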
>cp bl2seq blastall blastclust blastpgp coat megablast rpsblast seedtop /opt/blast
Also make sure you have data.tar.gz extracted into the BLAST home:
>mv ~/data.tar.gz /opt/blast
>cd /opt/blast
>gunzip data.tar.gz
>tar xvf data.tar
Congrats! You are done.
3. Running simultaneous BLAST jobs
We have found that node migration works best on our cluster if we slightly space out the requests to the individual nodes. You can let process migration happen automatically (without specifying a node, as in the last query above) by running BLAST through the "here" MFS symlink:
>/opt/blast/blastall -p blastn -d /mfs/here/opt/blast/db/nt -i ./testblast1.fa -o out.txt &
>sleep 1
>/opt/blast/blastall -p blastn -d /mfs/here/opt/blast/db/nt -i ./testblast2.fa -o out2.txt &
>sleep 1
>/opt/blast/blastall -p blastn -d /mfs/here/opt/blast/db/nt -i ./testblast3.fa -o out3.txt &
>sleep 1
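For more than a handful of queries, the spacing can be done in a loop. The sketch below uses the same blastall options and the /mfs/here database path shown above; the function name, the BLASTALL, DB and DELAY variables, and the output naming scheme (query name with .fa replaced by .out) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: launch one background BLAST job per query file, pausing
# between launches. Variable and function names are illustrative.
BLASTALL=${BLASTALL:-/opt/blast/blastall}
DB=${DB:-/mfs/here/opt/blast/db/nt}
DELAY=${DELAY:-1}

# Usage: launch_staggered QUERY.fa...
# Writes each result next to its query, then waits for all jobs.
launch_staggered() {
    for query in "$@"; do
        out="${query%.fa}.out"
        "$BLASTALL" -p blastn -d "$DB" -i "$query" -o "$out" &
        # Space out the requests, as recommended above.
        sleep "$DELAY"
    done
    # Block until every background job has finished.
    wait
}
```

After sourcing the file, `launch_staggered ./testblast1.fa ./testblast2.fa ./testblast3.fa` corresponds to the three commands above. A larger DELAY may suit a slower network, but the best value is something to find by experiment on your own cluster.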
Please visit the openMosix mailing list for more details on this.
Carlos Santos