mpiBLAST: Open-Source Parallel BLAST

Version History for mpiBLAST Releases

Known Bugs and Limitations of mpiBLAST 1.4.x

  • A maximum of 250 database fragments can be created by mpiformatdb on most systems
  • Tiny differences in e-value between NCBI BLAST and mpiBLAST may exist due to numerical instability in result processing. Most results are unaffected.

Changes between 1.3.0 and 1.4.0

General changes:

  • mpiBLAST more accurately reproduces NCBI BLAST output formats. Specifically, results are now output in the same order even if they have identical bit scores and e-values. Search statistics such as the number of hits and number of extensions are correctly collected.
  • Scalability has improved dramatically. The speedup results from streamlined communication between worker and writer processes. Thanks to Mike Firpo and Adam Moody of LLNL for the suggestion.
  • mpiBLAST performs better when the database has not already been distributed. A new fragment copy scheduling algorithm copies each fragment to at least one worker prior to search startup. By default, a single complete copy of the database is distributed. The number of replicate copies of the database that will be distributed can be controlled with the --db-replicate-count argument to mpiblast. Settings larger than 1 enable mpiblast to load balance the search.
  • An MPI program called mpiblast_cleanup has been added. When run on a set of nodes using mpirun, it cleans up mpiblast data on each node's local storage device.
  • MPICH 2 is supported on Unix, and is required on Windows
  • An optional --with-mpi=</path/to/mpi> argument to ./configure can specify the path to the MPI installation during compilation
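The effect of the replicate count described above can be pictured with a short sketch. This is an illustration only, not mpiBLAST's actual scheduling algorithm: the function name assign_fragments and the round-robin placement are assumptions made for the example.

```python
def assign_fragments(num_fragments, num_workers, replicate_count=1):
    """Map each fragment id to the workers that receive a copy.

    Hypothetical round-robin placement; mpiBLAST's real scheduler may differ.
    """
    if num_workers < 1:
        raise ValueError("need at least one worker")
    # A fragment cannot be copied to more distinct workers than exist.
    copies = min(replicate_count, num_workers)
    placement = {}
    for frag in range(num_fragments):
        # Start at a different worker for each fragment, then take the next
        # `copies` workers so the copies land on distinct nodes.
        placement[frag] = [(frag + k) % num_workers for k in range(copies)]
    return placement
```

With a replicate count of 1 every fragment lives on exactly one worker, so only that worker can search it. With a count of 2 or more, a fragment can be searched by whichever of its holders becomes idle first, which is what makes load balancing possible.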

Changes to mpiblast:

  • Using more workers than fragments has been fixed
  • --copy-via=mpi has been fixed
  • --removedb has been fixed. The new behavior is to create a temporary directory on local storage where all work gets done. The temporary directory is automatically removed upon successful job completion.
  • timestamp checking has been fixed. mpiblast now checks the BLAST database timestamp in the .phr file to see whether fragments on local storage match fragments on shared storage. If the fragments on shared storage have a different timestamp, the local fragments are ignored.
  • during result communication between workers and the writer, the workers send only the portion of aligned biosequences used in the alignment instead of the entire biosequence. This change allows nucleotide databases with large sequences such as nt and human_chromosome to be searched without the --disable-mpi-db flag.
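The timestamp check above amounts to a comparison per fragment. A minimal sketch, assuming the timestamps have already been extracted into dictionaries keyed by fragment id (reading the timestamp out of the .phr file is not shown, and the function name fragments_to_copy is hypothetical):

```python
def fragments_to_copy(shared_stamps, local_stamps):
    """Return fragment ids whose local copy is missing or stale.

    Both arguments map fragment id -> timestamp string; how the timestamp is
    read out of the .phr file is deliberately left out of this sketch.
    """
    stale = []
    for frag, shared_ts in sorted(shared_stamps.items()):
        # A local fragment is reused only when its timestamp is identical.
        if local_stamps.get(frag) != shared_ts:
            stale.append(frag)
    return stale
```

Any difference counts as a mismatch, not only an older local copy, which matches the wording "a different timestamp" above.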

Changes to mpiformatdb:

  • By default mpiformatdb writes a temporary file with reordered input sequences. Reordering the sequence file facilitates balanced fragment sizes. Rewriting the input database slows formatting and can be skipped with the --skip-reorder option.
  • mpiformatdb supports reading databases from stdin using -i stdin in conjunction with --skip-reorder -t <db title>.

Known Bugs and Limitations of mpiBLAST 1.3.x

  • A maximum of 250 database fragments can be created by mpiformatdb on most systems
  • NCBI blastall may report results with the same e-value and bit score in a different order than mpiBLAST does
  • mpiBLAST does not report search statistics like the number of hits to the database or the number of extensions
  • When requesting XML format output, mpiBLAST may generate warning messages about deleting a locked sequence. This is due to a bug in the NCBI Toolbox and can be safely ignored. Search results do not appear to be affected.

Changes between 1.2.1 and 1.3.0

IMPORTANT! The build process has changed. The NCBI Toolbox must be patched and re-compiled prior to mpiBLAST compilation. See the README for more details.

General changes:

  • mpiBLAST requires the October 2004 release of the NCBI Toolbox (version 2.2.10 of blastall)
  • Up to 250 database fragments are supported by default. No patching of the NCBI Toolbox is necessary.
  • mpiblast.conf is no longer used! The shared and local storage directories should be specified directly in the .ncbirc configuration file in the following manner:

    [mpiBLAST]
    Shared=/path/to/shared/storage
    Local=/path/to/local/storage

  • If the shared and local storage paths can't be found in .ncbirc, then the environment variables MPIBLAST_SHARED and MPIBLAST_LOCAL are checked. If neither .ncbirc nor MPIBLAST_LOCAL specifies the local storage directory, then $TMPDIR/mpiBLAST_local_db is used; if $TMPDIR isn't defined, it defaults to /tmp.
  • Shared and local directories are checked for existence and proper permissions. If they don't exist, they are created with permissions 775.
  • replaced setenv() with putenv() to support Solaris
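The fallback order for the local storage directory can be sketched as follows. This is an illustration of the lookup order only, not mpiBLAST's own code: the function name resolve_local_storage is hypothetical, and the sketch assumes the .ncbirc file parses as a standard INI file with one key per line.

```python
import configparser
import os


def resolve_local_storage(ncbirc_path, environ=None):
    """Pick the local storage directory using the fallback order above."""
    environ = os.environ if environ is None else environ
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case ("Local", "Shared")
    parser.read(ncbirc_path)  # a missing file is silently skipped
    # 1. The [mpiBLAST] section of .ncbirc wins if it names a local directory.
    if parser.has_option("mpiBLAST", "Local"):
        return parser.get("mpiBLAST", "Local")
    # 2. Otherwise fall back to the environment variable.
    if environ.get("MPIBLAST_LOCAL"):
        return environ["MPIBLAST_LOCAL"]
    # 3. Finally use $TMPDIR/mpiBLAST_local_db, with /tmp if TMPDIR is unset.
    tmpdir = environ.get("TMPDIR") or "/tmp"
    return os.path.join(tmpdir, "mpiBLAST_local_db")
```

Passing the environment as an argument keeps the function easy to exercise without modifying the real process environment.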

Changes to mpiblast:

  • mpiblast does e-value adjustment using both the effective database and the effective query lengths, leading to more accurate e-value statistics.
  • mpiblast has been updated with the latest changes to blastall version 2.2.9
  • mpiblast implements query pipelining. This means that queries are searched in order and the workers send results to the master as queries complete. As the master receives results it writes them to disk. Query pipelining eliminates the extreme memory requirements that mpiblast previously had for large query sets. It also permits better parallelization for jobs with few database fragments and many queries.
  • The text output formats better reflect the actual NCBI text output
  • mpiblast supports database pipelining. This means that instead of all the workers copying their fragments from shared storage at once, users have the option to limit the number concurrently accessing shared storage via the --concurrent option. Slow NFS disks should probably use --concurrent=1 in order to see speedups of a factor of 5 or greater. Faster shared storage should use a higher value or use "--copy-via=none" instead (see below for details).
  • Users can set the copy method at runtime through the --copy-via option. Possibilities are "cp" to use the standard copy command, "scp" to use ssh, "rcp" to use rsh, "none" to not copy at all (very useful for fast parallel file systems like GFS, PFS, and PVFS), and "mpi" to use MPI_Send/MPI_Recv.
  • If using --copy-via=mpi, users can set the maximum buffer size that MPI will use when copying files through the "--mpi-size=" option.
  • The -z option can be used to specify an effective database size
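The idea behind limiting concurrent access to shared storage can be shown with a small analogy. mpiBLAST workers are MPI processes, so the thread-based sketch below is not how mpiblast implements --concurrent; the function name copy_fragments and its parameters are hypothetical.

```python
import shutil
import threading


def copy_fragments(jobs, concurrent=1, copy=shutil.copy):
    """Copy (src, dst) pairs with at most `concurrent` copies in flight."""
    gate = threading.Semaphore(concurrent)

    def worker(src, dst):
        with gate:  # blocks while `concurrent` copies are already running
            copy(src, dst)

    threads = [threading.Thread(target=worker, args=job) for job in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

A limit of 1 serializes the copies, which avoids overwhelming a slow file server with many simultaneous readers; larger limits trade that protection for more parallel transfer.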

Changes to mpiformatdb:

  • mpiformatdb now creates exactly the requested number of fragments reliably
  • created fragments will be identically sized for better load-balance. In order to support this behavior, the input database is rewritten with its sequence entries in a different order in the system's temp directory.
  • the reordering can be skipped with the --skip-reorder option
  • the temp directory for reordering defaults to $TMPDIR; if $TMPDIR is not set then /tmp is used.
  • mpiformatdb supports GI list creation using formatdb's -F -B -L options
  • --decomp is no longer supported
  • 3 digit fragment identifiers are ALWAYS used
  • mpiformatdb now returns 0 on success instead of the number of fragments created, because it guarantees the requested number of fragments
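One way to see why reordering sequences helps produce evenly sized fragments is a greedy longest-first assignment. The sketch below is a generic heuristic chosen for illustration, not the algorithm mpiformatdb uses; the function name balance_fragments is hypothetical.

```python
import heapq


def balance_fragments(seq_lengths, num_fragments):
    """Assign sequences to fragments so total residues per fragment are close.

    Greedy longest-first heuristic, used here only for illustration.
    Returns a list of (total_length, [sequence indices]), one per fragment.
    """
    heap = [(0, frag) for frag in range(num_fragments)]
    heapq.heapify(heap)
    members = [[] for _ in range(num_fragments)]
    # Place the longest sequences first; short ones then fill the gaps.
    order = sorted(range(len(seq_lengths)),
                   key=lambda i: seq_lengths[i], reverse=True)
    for i in order:
        total, frag = heapq.heappop(heap)  # currently smallest fragment
        members[frag].append(i)
        heapq.heappush(heap, (total + seq_lengths[i], frag))
    totals = [0] * num_fragments
    for total, frag in heap:
        totals[frag] = total
    return [(totals[f], members[f]) for f in range(num_fragments)]
```

Because every fragment always receives the next sequence only when it is the smallest so far, the requested number of fragments is produced regardless of how uneven the input lengths are.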

Known Bugs and Limitations of mpiBLAST 1.0.x, 1.1.x, and 1.2.x

  • mpiBLAST 1.2.0 outputs invalid alignment results when the query set contains multiple queries with the same defline and different sequences.
  • When writing results from translated searches in XML or tab-delimited text format, mpiBLAST may print warning messages like "[blastall] ERROR: query 1;: BioseqFindFunc: couldn't uncache". When this happens, the query sequence in the alignments may be replaced with X's.
  • Errors can occur when using the -m [1-6] output options with tblastn searches
  • Translated searches (blastx, tblastn, and tblastx) in the 1.0.x and 1.1.x releases do not include alignments in the results file. This problem was fixed in the 1.2.0 release.
  • mpiBLAST runs out of memory when formatting result output for very large query sets, causing a crash.
  • mpiBLAST 1.0.x and 1.1.x occasionally crash during result output, especially when XML or tab-delimited text output has been selected (-m 7, -m 8, or -m 9). This problem was fixed in the 1.2.0 release.
  • The current release of mpiBLAST does not print the Karlin-Altschul statistics or the database info at the bottom of each query's BLAST results.
  • mpiBLAST uses the actual number of nucleotides in the database to calculate the E-value instead of the effective number of nucleotides in the database. In some cases this results in a discrepancy between the E-value reported by mpiBLAST and that reported by NCBI-BLAST. For protein sequence searches the difference in E-value is more pronounced due to higher variability of effective database search lengths.
  • BLAST results for a query that have the same bit score may be returned in a different order by mpiBLAST than they would by NCBI-BLAST.
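The E-value discrepancy described above follows from the standard Karlin-Altschul relation E = m * n * 2^(-S'), where m is the query length, n is the database length, and S' is the bit score. A minimal sketch, with made-up lengths chosen only for illustration:

```python
def evalue(bit_score, query_len, db_len):
    """Expected number of chance alignments: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical numbers: the effective length is shorter than the actual one.
actual_db_len = 1_000_000
effective_db_len = 900_000
e_actual = evalue(40.0, 200, actual_db_len)
e_effective = evalue(40.0, 200, effective_db_len)
# The two E-values differ by the ratio of the two database lengths.
print(e_actual / e_effective)
```

Since E scales linearly with the database length, using the actual length in place of the effective length inflates every E-value by the same factor for a given search.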

Changes between 1.2.0 and 1.2.1:

General changes:

  • Added a 1000 fragments patch for the November 2003 NCBI toolbox release
  • Bugs discovered:
    • Under Windows mpiBLAST may print incorrect alignments for queries which have a large number (e.g. > 100) of results. mpiBLAST prints the following error message when this occurs: "Error: Timed out waiting for biosequence from workers". This problem can be avoided by setting a more restrictive E-value cutoff using the -e command line option. This behavior has not been observed under Linux.

Changes to mpiblast:

  • Fixed a memory deallocation bug that caused crashes during result output
  • mpiblast now prints "No hits found" for all queries without results at the end of a query set. Previously if a query file had no results mpiblast would write an empty results file.

Changes between 1.1.1 and 1.2.0:

New Features:

  • When writing result alignments, mpiBLAST uses the database as distributed on the worker nodes instead of reading the database from the shared filesystem. This can result in significant speedups, especially when the file server is slow or loaded.
  • mpiBLAST prints alignments for translated searches (blastx,tblastn,tblastx)
  • Database update functionality. New sequences can be added to an existing mpiBLAST database.
  • mpiBLAST has a web interface. A script to interface mpiBLAST to NCBI's wwwblast web service has been included.
  • Updated the BLAST functionality to be consistent with the latest NCBI-BLAST release. mpiBLAST now accepts the -m 10 and -m 11 output format options to write output in text or binary ASN.1 format. Previously the -O option was used.

General changes:

  • mpiBLAST 1.2.0 requires the April 2003 or later release of NCBI Toolbox
  • getopt1.c is now included in the build for systems lacking getopt_long (AIX)
  • MS Visual Studio 7 .NET projects are now included to assist users who would like to compile mpiBLAST under Windows.

Fixes to mpiBLAST:

  • Fixed a crash when writing tabular format output (-m 8 and -m 9)
  • Fixed a crash when writing XML format output (-m 7)
  • The --removedb option will now remove the database even if a search is not being performed

Changes between 1.1.0 and 1.1.1:

General Fixes:

  • mpiblast now correctly looks for the configuration file in the $INSTALL_PREFIX/etc/ directory if it is not at $HOME/.mpiblastrc
  • Fixed text README to reflect that the configuration file only contains two lines
  • mpiblast no longer prints warnings when using databases formatted without indices (the -o F formatdb option)
  • Numerous compiler-specific compilation error workarounds

Fixes to mpiformatdb:

  • mpiformatdb directly outputs the database to the destination instead of trying to copy it
  • The gcc 3.x standard C++ library has a large file bug that prevented counting the database size correctly on databases > 2GB. A workaround using C file I/O was contributed by Cesar Delgado.
  • mpiformatdb does a better job fragmenting the database into the requested number of fragments, thanks to a patch by Jason Gans
  • There is a --decomp option to mpiformatdb that prints APPROXIMATE database sizes based on number of fragments (also by J.D. Gans)

Fixes to mpiblast:

  • mpiblast now uses MPI_Abort() when exiting on an error condition
  • Several memory leaks were corrected and memory requirements reduced by a patch contributed by Jason Gans

Changes between 1.0.1 and 1.1.0:

  • Ported to Windows/mpich-nt
  • Rewrote mpiformatdb in C++; it now links directly to the NCBI formatdb code. As a result, it is no longer necessary to install the formatdb or BLAST executables, or to specify their location in the mpiBLAST configuration file.
  • Fixed a file copy bug when formatting protein databases with mpiformatdb
  • mpiformatdb no longer needs to be run from the directory containing the unformatted database.
  • The default configuration file semantics have been changed. Under Unix both mpiblast and mpiformatdb default to ~/.mpiblastrc. If ~/.mpiblastrc does not exist or the $HOME environment variable is not set, then $INSTALL_PREFIX/etc/mpiblast.conf is used, where $INSTALL_PREFIX is the path given to ./configure for your mpiBLAST installation. If the configuration file is specified on the command line using --config-file, the defaults are overridden.
  • Under Windows the default configuration file location is \.mpiblastrc. If \.mpiblastrc does not exist or is not set then mpiBLAST tries \mpiblast.ini. As in Unix, the default config file location can be overridden with the --config-file command line argument.
  • Some versions of the NCBI toolkit have a bug that causes mpiBLAST to print warning messages about taxdb.bti. These are harmless. To eliminate the warning message the following line can be deleted from tools/readdb.c in the NCBI development library. Of course you will need to recompile after the modification:

--- /home/koadman/software/ncbi/tools/readdb-orig.c 2003-02-17 12:29:30.000000000 -0800
+++ /home/koadman/software/ncbi/tools/readdb.c 2003-02-17 12:29:51.000000000 -0800
@@ -2497,7 +2497,6 @@
             return buffer;
         } else {
             /* we cannot find directory :( */
-            ErrPostEx(SEV_WARNING, 0, 0, "Could not find %s", filename);
             MemFree(buffer); MemFree(buffer1);
             return NULL;

Changes between 1.0.0 and 1.0.1:

  • Fixed a bug causing a crash when the database was formatted without indexes
  • Added support for up to 1000 database fragments (see README for details)
  • Added support for cleaning up local storage of database fragments
  • Included GNU getopt in the distribution for compilation on systems such as AIX and Solaris that do not have getopt_long().

Changes between 0.9.0 and 1.0.0:

  • Dynamic database distribution: This change has many implications. Database fragments are no longer distributed when formatting the database with mpiformatdb. Instead, database fragments are copied from shared storage to worker nodes as necessary in order to complete each BLAST search. Once copied, a fragment remains on the worker's local storage for use by future searches.
  • Use of NCBI library to output merged results: Previously results were merged with a text file parser. BLAST results are now merged by mpiBLAST and output directly using the NCBI library. As a result, users can now choose to output BLAST results in most formats supported by NCBI BLAST, including XML and ASN.1.
  • Corrected E-value statistics: E-values are now adjusted based on the size of the entire database being searched. NOTE 06/11/2005: The 1.0.0 release of mpiBLAST corrected only blastn (nucleotide) search statistics!
  • All nucleotide DB fragment index files are now copied to workers correctly
  • mpiformatdb is now standalone and should be run without mpirun. This is a side effect of the dynamic database distribution described above.