Installing NCBI BLAST (v. 20020426) on openMosix-2.4.17

0. Introduction
0a. Prerequisites

1. Downloading NCBI BLAST
1a. Test compile the original application
1b. Create/download a test database
1c. Move the databases onto every node in your cluster
1d. Test that BLAST works locally!

2. Preparing NCBI BLAST for openMosix
2a. Apply the openMosix-BLAST patch
2b. Recompile BLAST
2c. Test BLAST migration
2d. Install built executables in final BLAST home

3. Running simultaneous BLAST jobs


0. Introduction

openMosix (http://www.openMosix.org) on a Linux cluster provides a transparent and very convenient mechanism for migrating BLAST processes across the cluster and load balancing without needing to link in special libraries or re-engineer existing code.

If one installs duplicate copies of the BLAST database files on every node, the DFSA/MFS filesystem included with openMosix takes job I/O load into account when load balancing and eliminates the need to send massive sequence databases across the network as BLAST jobs migrate. Every BLAST job spawned from the user's login node can transparently migrate to another node and read the *local* database on that node.

Unfortunately, NCBI BLAST (and WU-BLAST) have features enabled that are not yet supported by openMosix process migration, specifically memory-mapped I/O to the migratable file system and multithreading with shared-memory partitions. Hence one must recompile BLAST before installing it on the openMosix cluster.

Note that this recompile is necessary even for users with small search databases as the memory-mapped I/O in BLAST will prevent openMosix process migration even in the absence of MFS!

0a. Prerequisites

  • 1 Fully functioning openMosix-enabled cluster

At a minimum, you must verify that simple Perl/awk/Python scripts can migrate to other nodes!

Also you should verify that the /mfs filesystem is mounted and can access files on all your other nodes. For both of these, please visit the openMosix.org website for setup information.
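
Both checks can be scripted. The following is a minimal sketch, not part of the original setup: `is_mounted` and `cpu_burn` are hypothetical helper names, and reading /proc/mounts is an assumption about how you want to test for the mount.

```shell
# is_mounted MOUNT_POINT [MOUNTS_TABLE]
# Succeeds if MOUNT_POINT appears as a mount point in the mounts table
# (assumed default: /proc/mounts, where field 2 is the mount point).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# cpu_burn N
# A throwaway CPU-bound awk job: sums (i mod 7) for i = 0 .. N-1.
# It does no file I/O, so it is a reasonable candidate for migration.
cpu_burn() {
    awk -v n="$1" 'BEGIN { s = 0; for (i = 0; i < n; i++) s += i % 7; print s }'
}
```

For example, run `is_mounted /mfs && echo "/mfs is mounted"`, then start several `cpu_burn 200000000 &` jobs (the count is arbitrary; pick one large enough to run for a minute or so) and watch the openMosix monitor to see whether the load spreads to other nodes.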

Example cluster (at UM Bioinformatics)

  • 13 Dell Dimension workstations, each with:
      • 2 x 1.4 GHz Xeon CPUs
      • 500 MB RAM (should perhaps be more)
      • 40 GB hard disk
      • 100 Mbit Ethernet

Software on every node

  • Linux RedHat 7.2 (Last release before RH7.3)
  • openMosix-2.4.17-smp kernel applied (using the RPM!)
  • MFS filesystem mounted on every node as /mfs

The machines are adequate for running BLAST, but one may want more RAM for very heavily loaded clusters.

Also, I assume throughout this install that you are doing everything AS ROOT. This is mostly a convenience for me, and you may choose to do otherwise. Keep in mind that if you work as non-root, permissions and the like will differ and could lead to errors not detailed here.


1. Downloading BLAST

The current BLAST programs are part of the NCBI Toolbox and are freely available from the NCBI FTP server:

ftp://ftp.ncbi.nih.gov

The version tested was the 20020426 release. You need all three files: ncbi.tar.gz, data.tar.gz, and README.bls.

1a. Test compile the original installation. Start by extracting the files into one directory:

Note: '>' below is the UNIX prompt. Do not type it in!

>cd ~
>mkdir ncbiToolbox
>cp ncbi.tar.gz data.tar.gz README.bls ./ncbiToolbox
>cd ncbiToolbox
>gunzip *.tar.gz
>tar xvf data.tar
>tar xvf ncbi.tar

Then compile BLAST by running:

>./ncbi/make/makedis.csh

The compile will proceed. You can ignore most warnings and errors as long as they are not fatal. If the compile fails to complete successfully, go no further: you will not be able to run this installation on your system until you have resolved the failures.

1b. Create the BLAST database

You can read the file ncbi/doc/README.formatdb and create your own database following their suggestions, or you can choose to use a pre-formatted database. We use 'nt' pre-formatted for testing. Download it at:

ftp://ftp.ncbi.nih.gov/blast/db/FormattedDatabases/nt.tar.gz

Create a future home for BLAST, including its database directory:

>mkdir -p /opt/blast/db

For testing, install the database into the future BLAST database directory /opt/blast/db (or a directory of your choice, such as /usr/local/blast or whatever you find most convenient). Move the database archive into this directory and extract it:

>mv ~/nt.tar.gz /opt/blast/db
>cd /opt/blast/db
>gunzip nt.tar.gz
>tar xvf nt.tar

In nt's case, you will end up with the 7 database files of the 'nt' release:

  1. nt.nnd
  2. nt.nni
  3. nt.nsd
  4. nt.nsi
  5. nt.nsq
  6. nt.ntd
  7. nt.nti

These seven files are the database and ALL of them need to ultimately be duplicated onto every node of your cluster.
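
Since a missing file is easy to overlook, a quick completeness check can help. This is a minimal sketch; `check_nt_files` is a hypothetical helper name, and the list of extensions is taken from the seven files above.

```shell
# check_nt_files DIR
# Prints one line per 'nt' database file missing from DIR and
# succeeds only if all seven files are present.
check_nt_files() {
    dir=$1
    missing=0
    for ext in nnd nni nsd nsi nsq ntd nti; do
        if [ ! -f "$dir/nt.$ext" ]; then
            echo "missing: $dir/nt.$ext"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For example, `check_nt_files /opt/blast/db && echo "nt is complete"`. The same function can be pointed at any other directory that is supposed to hold a copy of the database.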

1c. Move the BLAST home with databases onto every node in your cluster

The BLAST executables themselves are small and shouldn't require installation on every single node. The databases, however, are very large and will be read in their entirety by every search. Having them stored locally on all other nodes is a must.

After extracting and installing nt on the master node, copy your BLAST home (which so far contains only the databases) to all nodes. Depending on your network and the size of the databases, this will take anywhere from several minutes to hours. (You should probably write a shell script to do this.)

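The long way (note the destination is each node's /opt, so that the copy ends up at /opt/blast on that node):

>cp -vr /opt/blast /mfs/2/opt
>cp -vr /opt/blast /mfs/3/opt
>cp -vr /opt/blast /mfs/4/opt
         ...etc

Once the copies have finished, make sure all the databases arrived on every node.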

Note: The directory structure you choose to house the databases MUST be replicated on every node throughout your cluster!

It is crucial that the directory structures are identical. Otherwise migrating BLAST processes will not know where to find the local databases!
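
The copy to every node can be wrapped in a small loop. The following is a minimal sketch under stated assumptions: `replicate_blast_home` is a hypothetical helper name, node roots are assumed to be reachable as /mfs/<node number>, and the copy is placed under the same parent directory on each node so that the path (for example /opt/blast) is identical everywhere.

```shell
# replicate_blast_home SRC MFS_ROOT NODE...
# Copies the directory SRC to the same absolute path on each listed node,
# going through the MFS mount (MFS_ROOT/NODE is assumed to be that node's /).
replicate_blast_home() {
    src=$1
    mfs_root=$2
    shift 2
    parent=$(dirname "$src")
    for node in "$@"; do
        dest="$mfs_root/$node$parent"
        mkdir -p "$dest" || return 1
        cp -r "$src" "$dest/" || return 1
        echo "copied $src to $dest"
    done
}
```

For example, `replicate_blast_home /opt/blast /mfs 2 3 4` copies the BLAST home to /opt/blast on nodes 2, 3, and 4. The function stops at the first node where the copy fails, so a failure is not silently skipped.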


1d. Test the original BLAST executables against your new database

Before proceeding, make sure that BLAST runs at least on your local node and can read the local database. You must create a query file (make one up in FASTA format).

Create a file called testblast.fa and paste the following two lines into it exactly (including the '>'):

>AF24986.1|AF116242_1 (AF116242) K-Cl cotransporter KCC3 [Homo sapiens]
ATAGGATAGGACCAGATTAGGACCACACAGGATAGGGACCACCCCCAAGAGAATAAGGACACAAACCACA
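
Because the FASTA header starts with the same '>' character used for the prompt in this tutorial, it can be less error-prone to create the file from the shell. A minimal sketch, where `write_test_query` is a hypothetical helper name and the two lines are the query given above:

```shell
# write_test_query FILE
# Writes the two-line FASTA test query to FILE. The quoted 'EOF' keeps the
# shell from interpreting anything inside the here-document.
write_test_query() {
    cat > "$1" <<'EOF'
>AF24986.1|AF116242_1 (AF116242) K-Cl cotransporter KCC3 [Homo sapiens]
ATAGGATAGGACCAGATTAGGACCACACAGGATAGGGACCACCCCCAAGAGAATAAGGACACAAACCACA
EOF
}
```

For example, `write_test_query testblast.fa` creates the query file in the current directory.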

Copy it into your BLAST home on your local machine (doesn't need to be copied to all others):

>cp testblast.fa /opt/blast

Test your recently compiled BLAST:

>cd ~/ncbiToolbox/ncbi/build/
>./blastall -p blastn -d /opt/blast/db/nt -i /opt/blast/testblast.fa

It should run smoothly and give you several matches against the 'nt' dataset.

Now for the fun part...


2. Preparing NCBI BLAST for openMosix

Once you have shown that you can build and run a simple BLAST query on the local node, you'll have to apply a patch that disables several features, enabled by default, that conflict with openMosix process migration.

The patch is available at:

http://www.stl.bioinformatics.med.umich.edu/OM_BLAST_patch/ncbi_blast_openmosix2.4.17.patch


Download the patch and save it in your home directory (the commands below expect it at ~/ncbi_blast_openmosix2.4.17.patch).

Test the patch harmlessly as follows:

>cd ~
>patch -Np1 -d ./ncbiToolbox/ --verbose -i ~/ncbi_blast_openmosix2.4.17.patch --dry-run

If it does not work, check the path to make sure all directories are correct.
If it still doesn't work, check the version of BLAST you downloaded (should be 20020426!)
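
The dry run above and the real application in step 2a below can be chained so that the source tree is only touched when the dry run succeeds. This is a minimal sketch; `apply_patch_checked` is a hypothetical helper name, and it uses only the patch options already shown in this tutorial.

```shell
# apply_patch_checked SRC_DIR PATCH_FILE
# Runs a dry run first and applies the patch only if the dry run succeeds.
# Give PATCH_FILE as an absolute path, since patch changes into SRC_DIR.
apply_patch_checked() {
    src_dir=$1
    patch_file=$2
    if patch -Np1 -d "$src_dir" -i "$patch_file" --dry-run >/dev/null; then
        patch -Np1 -d "$src_dir" -i "$patch_file"
    else
        echo "dry run failed; $src_dir left untouched" >&2
        return 1
    fi
}
```

For example, `apply_patch_checked ~/ncbiToolbox ~/ncbi_blast_openmosix2.4.17.patch`. If you use it, skip the separate patch command in step 2a, since the patch will already have been applied.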

2a. Apply the patch to your NCBI BLAST source directory:

>patch -Np1 -d ./ncbiToolbox/ --verbose -i ~/ncbi_blast_openmosix2.4.17.patch

2b. Now recompile BLAST as before

>cd ncbiToolbox
>./ncbi/make/makedis.csh

The recompilation should proceed without error and overwrite the ./ncbi/build directory with new binaries.

2c. Test these new binaries against a local database

After recompiling, verify that your new binaries work locally by doing a sample query:

>cd ~/ncbiToolbox/ncbi/build/
>./blastall -p blastn -d /opt/blast/db/nt -i /opt/blast/testblast.fa

Then test DFSA migration!

1. Open a second terminal window and run the openMosix status monitor so you can watch the load on each node:

>mosmon -t

2. In your main login, run BLAST from your main node, but using a database stored elsewhere:

>cd ~/ncbiToolbox/ncbi/build/
>./blastall -p blastn -d /mfs/2/opt/blast/db/nt -i /opt/blast/testblast.fa

You should see the process load on node #2 increase!

2d. Copy your new binaries into their final BLAST home:

>cd ~/ncbiToolbox/ncbi/build/
>cp bl2seq blastall blastclust blastpgp coat megablast rpsblast seedtop /opt/blast
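
If one of the binaries failed to build, the plain cp above reports it but still copies the rest, which is easy to miss. A minimal sketch of a stricter copy follows; `install_blast_binaries` is a hypothetical helper name, and the list of binaries is taken verbatim from the command above.

```shell
# install_blast_binaries BUILD_DIR DEST_DIR
# Copies each expected binary from BUILD_DIR to DEST_DIR, names any that
# are missing on stderr, and fails if at least one could not be installed.
install_blast_binaries() {
    build=$1
    dest=$2
    status=0
    mkdir -p "$dest" || return 1
    for bin in bl2seq blastall blastclust blastpgp coat megablast rpsblast seedtop; do
        if [ -f "$build/$bin" ]; then
            cp "$build/$bin" "$dest/" || status=1
        else
            echo "not built: $build/$bin" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `install_blast_binaries ~/ncbiToolbox/ncbi/build /opt/blast`.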

Also make sure you have the data.tar.gz extracted into the BLAST home:

>mv ~/data.tar.gz /opt/blast
>cd /opt/blast
>gunzip data.tar.gz
>tar xvf data.tar

Congrats! You are done.

3. Running simultaneous BLAST jobs

We have found that the node migration works best on our cluster if we slightly space the requests to the individual nodes.

You can also let process migration happen automatically (without specifying a node, as in the last query above) by running BLAST through the MFS "here" symlink, which resolves to whichever node the process is currently running on:

>/opt/blast/blastall -p blastn -d /mfs/here/opt/blast/db/nt -i ./testblast1.fa -o out.txt &
>sleep 1
>/opt/blast/blastall -p blastn -d /mfs/here/opt/blast/db/nt -i ./testblast2.fa -o out2.txt &
>sleep 1
>/opt/blast/blastall -p blastn -d /mfs/here/opt/blast/db/nt -i ./testblast3.fa -o out3.txt &
>sleep 1


Do enough of these and you should see them distributing across the cluster.
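
For more than a handful of queries, the spaced submission can be written as a loop. This is a minimal sketch, not the author's script: `submit_spaced` and the BLASTALL, BLAST_DB, and BLAST_DELAY variables are hypothetical names, with defaults taken from the commands above, and each output file name is derived from the query name as an assumption.

```shell
# submit_spaced QUERY.fa...
# Starts one background blastn job per query file, pausing between
# launches, then waits for all of them to finish.
submit_spaced() {
    blastall=${BLASTALL:-/opt/blast/blastall}
    db=${BLAST_DB:-/mfs/here/opt/blast/db/nt}
    delay=${BLAST_DELAY:-1}
    for query in "$@"; do
        out="${query%.fa}.out"   # testblast1.fa -> testblast1.out
        "$blastall" -p blastn -d "$db" -i "$query" -o "$out" &
        sleep "$delay"
    done
    wait
}
```

For example, `submit_spaced testblast1.fa testblast2.fa testblast3.fa` launches three jobs one second apart. Raising BLAST_DELAY gives the load balancer more time between launches; the one-second spacing is simply the value used in the commands above.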

Please visit the openMosix mailing list for more details on this.

Carlos Santos
University of Michigan Bioinformatics

csantos at umich.edu


Copyright (c) 2006 by David J. States