No. Although it may be possible to parallelize these search algorithms using database segmentation, our preliminary studies indicate that they would not benefit from such a parallelization scheme as much as the other BLAST search types do.
No. We are focusing our efforts on blastn, blastp, blastx, tblastn, and tblastx.
Yes. On systems without local storage, turn on the use-virtual-frags option for better performance.
Yes, simply execute the desired number of MPI processes using the -np flag. The minimum is -np 3.
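As a rough illustration of the rule above, the following shell sketch refuses to build a launch command with fewer than three processes. The helper name `mpiblast_cmd`, the use of `mpirun` as the launcher, and the `-p`/`-d`/`-i`/`-o` arguments are assumptions made for illustration; only the `-np` flag and the minimum of 3 come from the answer above. The function only prints the command, so it can be inspected before anything is launched.

```shell
#!/bin/sh
# Hypothetical helper: print an mpiBLAST launch command, enforcing -np >= 3.
# The launcher name (mpirun) and the mpiblast arguments are assumptions;
# adjust them to match your MPI installation and search.
mpiblast_cmd() {
    np=$1; program=$2; db=$3; query=$4; out=$5
    if [ "$np" -lt 3 ]; then
        echo "error: mpiBLAST needs at least -np 3 (got $np)" >&2
        return 1
    fi
    echo "mpirun -np $np mpiblast -p $program -d $db -i $query -o $out"
}

# Example: print the command for an 8-process blastn search.
mpiblast_cmd 8 blastn nt query.fas results.txt
```

Printing rather than executing keeps the check separate from the launch; the printed line can be pasted into a job script once it looks right.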
mpiBLAST yields super-linear speedup only when the database being searched is significantly larger than the core memory of an individual node. The super-linear speedup results published in the ClusterWorld 2003 paper describing mpiBLAST are measurements of mpiBLAST v0.9 searching a 1.2 GB (compressed) database on a cluster where each node has 640 MB of RAM. Under those conditions, a single-node search results in heavy disk I/O and a long search time.
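The memory argument can be made concrete with a back-of-envelope calculation. The sketch below estimates how many equal-sized database fragments are needed for each fragment to fit in one node's RAM, on the assumption that the speedup comes from fragments small enough to stay in memory instead of being read repeatedly from disk. The function name `min_fragments` and the 3000 MB formatted-database size are hypothetical; only the 640 MB node memory comes from the text above, and the memory actually available to a search will be lower than the installed amount.

```shell
#!/bin/sh
# Hypothetical helper: smallest number of equal fragments such that each
# fragment is no larger than the memory available on one node.
# Both arguments are sizes in MB; the result is ceil(db_mb / node_mb).
min_fragments() {
    db_mb=$1; node_mb=$2
    echo $(( (db_mb + node_mb - 1) / node_mb ))
}

# Example with an assumed 3000 MB formatted database and 640 MB nodes.
min_fragments 3000 640
```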
Yes, mpiBLAST versions 1.3.0 or later support Mac OS X.
Please see the instructions on the development page.
Large databases like nt can consume several gigabytes of disk space, so it is preferable to store them in compressed form. Starting with mpiBLAST 1.4.0, it is possible to pipe FASTA-formatted sequence data into mpiformatdb. This feature makes it possible to format a compressed (gzip, bzip, etc.) database directly, using command-line syntax like:
zcat nt.gz | mpiformatdb -i stdin -N 100 -t nt -p F
mpiformatdb needs the -t <title> and -p <T|F> options to format a database piped via standard input.
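The piping approach can be wrapped in a small helper. The sketch below chooses a decompressor from the file extension and prints the resulting pipeline without running it. The function name `format_cmd` and the extension-to-tool mapping (`zcat` for `.gz`, `bzcat` for `.bz2`) are assumptions; the `mpiformatdb` options are the ones shown in the example command above.

```shell
#!/bin/sh
# Hypothetical helper: print a pipeline that formats a compressed FASTA file
# via standard input. Arguments: file, fragment count, title, protein flag (T|F).
format_cmd() {
    file=$1; frags=$2; title=$3; protein=$4
    case "$file" in
        *.gz)  reader="zcat" ;;
        *.bz2) reader="bzcat" ;;
        *)     reader="cat" ;;   # assume the file is already uncompressed
    esac
    echo "$reader $file | mpiformatdb -i stdin -N $frags -t $title -p $protein"
}

# Example: rebuild the nt command from the text above.
format_cmd nt.gz 100 nt F
```

Because the title and protein flag are required positional arguments here, the helper cannot produce a piped command that omits `-t` or `-p`.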
In mpiBLAST 1.3 or later, e-values are exact for all supported search types. In versions 1.2.1 and earlier, e-values for blastn were loosely approximated using a linear equation, and those for blastp, blastx, tblastn, and tblastx were inaccurate. Note that by "exact" we mean exactly the same as those generated by NCBI-BLAST with the traditional search engine. As of 2009, NCBI is still refining the e-value calculations in its BLAST implementation.