DNA Sequence Comparison
String comparison is a critical issue in many application domains, including speech recognition, contents search, and bioinformatics. In this experience we present a parallel implementation of string comparison on the BioWall.


Nowadays the comparison and the alignment of characters taken from a finite alphabet is a fundamental task in many applications, ranging from full-text search to computational biology. In particular, string comparison is a critical issue in the field of molecular biology. In fact, both DNA fragments and proteins can be represented as sequences of characters (taken from alphabets of 4 and 20 symbols, respectively) and sequence similarities provide useful information about the functional, structural and evolutionary relationships between the corresponding molecules. Biological sequences may differ because of local substitutions, insertions and deletions of one or more characters. The complexity of string comparison comes from the large number of possible combinations of these three basic mutations.

The similarity between two strings can be evaluated either in terms of edit distance or in terms of similarity score. The edit distance is a measure of the minimum number of mutations that may have transformed the first string in the second, or, in other terms, of the minimum number of edit operations required to make the two strings equal to each other. The similarity score is a measure of the maximum number of residual matches between the two strings. Both metrics can be evaluated in polynomial time by means of dynamic programming techniques.

The key algorithm for evaluating the similarity between two strings of length N and M was developed by Needleman and Wunsch and takes O(NxM) steps to complete execution. The structure of the algorithm makes it suitable for a parallel implementation on a systolic array. In particular, hardware parallelism can be exploited to perform string comparison in O(N+M) steps. In this experience we present a parallel implementation of the Needleman-Wunsch algorithm on the BioWall.

The BioWall cannot compete in performance with existing parallel implementations of the Needleman-Wunsch algorithm, since it suffers from the typical performance limitations of a large prototyping platform. Nevertheless, the implementation of the Needleman-Wunsch algorithm on the BioWall is a significant design experiment in the field of reconfigurable computing because of the peculiarities of the target architecture.


For further information
  • M. Canella, F. Miglioli, A. Bogliolo, E. Petraglio, E. Sanchez. "Performing DNA Comparison on a Bio-Inspired Tissue of FPGAs". Proc. 10th Reconfigurable Architectures Workshop (RAW03), Nice, France, April 2003.
  • S. B. Needleman and C. D. Wunsch. "A General Method Applicable to the Search for Similarities in the Amino Acid Sequences of Two Proteins" J. Mol. Biol., 48:443-453, 1970.

Resources

The application on the BioWall.
© E. Petraglio


1,223KB JPEG