Overview
Description
mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. It uses explicit MPI communication to exploit distributed computational resources such as a cluster, and can therefore use all available resources, unlike standard NCBI BLAST, which can only take advantage of shared-memory multiprocessors (SMPs).
The primary advantage of mpiBLAST over traditional NCBI BLAST is performance. mpiBLAST can increase performance by several orders of magnitude while still producing results identical to those of NCBI BLAST.
Specifically, mpiBLAST performs a BLAST search in parallel through database fragmentation and query segmentation. Database fragmentation partitions a database into multiple fragments and distributes them across many computational resources (e.g., cluster nodes, CPU cores, or clusters) so that each fragment can be searched simultaneously. Query segmentation splits the query into multiple independent searches that can also be performed simultaneously. With intelligent job scheduling, the database fragments and query segments are searched in parallel with fault-tolerant execution.
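The partitioning described above can be sketched in a few lines of Python. This is only an illustration: the function names, the round-robin fragment assignment, and the toy sequences are assumptions made for the example and are not taken from the mpiBLAST source code.

```python
from itertools import product

def fragment_database(sequences, num_fragments):
    """Deal sequences round-robin into num_fragments lists (hypothetical scheme)."""
    fragments = [[] for _ in range(num_fragments)]
    for i, seq in enumerate(sequences):
        fragments[i % num_fragments].append(seq)
    return fragments

def segment_query(queries, num_segments):
    """Split a multi-sequence query into contiguous, near-equal segments."""
    size, extra = divmod(len(queries), num_segments)
    segments, start = [], 0
    for s in range(num_segments):
        end = start + size + (1 if s < extra else 0)
        segments.append(queries[start:end])
        start = end
    return segments

def build_tasks(fragments, segments):
    """Each (fragment index, segment index) pair is an independent unit of work."""
    return list(product(range(len(fragments)), range(len(segments))))

# Toy data, invented for this example.
database = ["ACGTAC", "TTGACA", "GGCATG", "CATGCA", "ACACAC"]
queries = ["ACG", "CAT", "TTG"]

fragments = fragment_database(database, 2)
segments = segment_query(queries, 2)
tasks = build_tasks(fragments, segments)
```

With two fragments and two segments there are four independent tasks, which a scheduler could hand out to workers in any order.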
In short, database fragmentation reduces execution latency because each node's database fragment resides in main memory, yielding a significant speedup due to the elimination of disk I/O. Query segmentation increases execution throughput because a single multi-sequence query is split into separate, independent subqueries that are executed in parallel.
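As a rough illustration of why independent subsearches can be merged into the same answer as a single serial search, the sketch below runs a toy search over every fragment and subquery pair and combines the hits. It makes several assumptions: substring matching stands in for BLAST, a thread pool stands in for MPI worker processes, and all names and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def search(fragment, subquery):
    """Toy stand-in for BLAST: (query, subject) pairs where query is a substring."""
    return [(q, s) for q in subquery for s in fragment if q in s]

def parallel_search(fragments, subqueries, max_workers=4):
    """Search every (fragment, subquery) pair concurrently and merge the hits."""
    tasks = [(f, q) for f in fragments for q in subqueries]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial = pool.map(lambda task: search(*task), tasks)
        merged = [hit for hits in partial for hit in hits]
    return sorted(merged)

# Toy data, invented for this example.
database = ["ACGTAC", "TTGACA", "GGCATG", "CATGCA", "ACACAC"]
queries = ["ACG", "CAT", "TTG"]

fragments = [database[0::2], database[1::2]]   # two database fragments
subqueries = [queries[:2], queries[2:]]        # two query segments

serial_hits = sorted(search(database, queries))
parallel_hits = parallel_search(fragments, subqueries)
```

Because every sequence belongs to exactly one fragment and every query to exactly one segment, the merged hits match the serial ones in this toy setting.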
An advanced feature of mpiBLAST-PIO is its parallel input/output optimization. mpiBLAST-PIO offloads the formatting and writing of results from the master to the workers, thereby increasing the scalability of mpiBLAST to hundreds of thousands of processors.
mpiBLAST is also portable across many operating systems and computer systems. It runs on Linux, BSD, Windows, Mac OS X, and many varieties of Unix, and on hardware configurations ranging from single-processor PCs to clusters of AMD Opterons to IBM Blue Gene systems.
The only requirements for running mpiBLAST are:
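One common way to let workers write results themselves is to give each worker a byte offset into a shared output file, so no single process has to collect and write everything. The sketch below illustrates that general idea only. The offset scheme, the function names, and the sample output are assumptions, and it should not be read as the actual mpiBLAST-PIO implementation.

```python
import os
import tempfile
from itertools import accumulate

def compute_offsets(sizes):
    """Exclusive prefix sum: the byte offset where each worker's block starts."""
    return list(accumulate(sizes, initial=0))[:-1]

def write_block(path, offset, data):
    """Write one worker's formatted output at its own offset in the shared file."""
    with open(path, "r+b") as fh:
        fh.seek(offset)
        fh.write(data)

# Hypothetical formatted output, one block per worker.
worker_output = [
    b"hits for query 1\n",
    b"hits for query 2, longer\n",
    b"hits for query 3\n",
]
sizes = [len(block) for block in worker_output]
offsets = compute_offsets(sizes)

# Create an output file of the final size.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as fh:
    fh.truncate(sum(sizes))

# Workers may write in any order; reverse order is used here to show that.
for rank in reversed(range(len(worker_output))):
    write_block(path, offsets[rank], worker_output[rank])

with open(path, "rb") as fh:
    combined = fh.read()
os.remove(path)
```

Only the block sizes need to be exchanged before writing, which is far less data than the formatted results themselves.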
- An installed and working MPI implementation. Two popular and free implementations are:
- A compiled and matching version of the NCBI Toolbox, ftp://ftp.ncbi.nih.gov/toolbox/
For more information about mpiBLAST, please peruse this website by using the links in the menu on the left. You can find further technical information about mpiBLAST in the many papers, presentations, and posters found on the publications page. Additionally, more information regarding the usage, installation, development, and future of mpiBLAST can be found through the mailing lists.
Supporters
The mpiBLAST project would like to thank the following institutions for their support:
IBM
AMD
Eli Lilly
Users
A brief listing of known users of mpiBLAST is below:
- Hospitals and Computing Centers
- Software Suites
- Education Courses and Tutorials
- IEEE/ACM Supercomputing Conference 2006. Course: Tutorial M11 - High-Performance Computing Methods for Computational Genomics. Year: 2006.
- Miami University of Ohio. Course: Introduction to Cluster Computing. Year: 2006.
- 13th International Conference on Intelligent Systems for Molecular Biology. Course: Tutorial PM12 - A Bioinformatics Introduction to Cluster Computing by Boyd and Bose. Year: 2005.
- University of Arkansas. Course: CSCE 5203 Advanced Database Management. Year: 2005.
- University of Puerto Rico. Course: ICOM 6025 High-Performance Computing. Year: 2005.
- College of William & Mary, B.S. Honors Thesis. Student: Evan McCreedy. Year: 2004.
- UC-Berkeley. Course: CS 267 Applications of Parallel Computers. Year: 2004.
- University of Maryland. Course: CMSC 838T Bioinformatics and High Performance Computing. Year: 2004.
(Please send email to info AT mpiblast DOT org to have your organization included.)