Graduation Semester and Year

2007

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

Nikola Stojanovic

Abstract

Biologists study evolution and discover functional and structural information in genomic or protein data through the extensive use of sequence alignments. Aligning long regions manually is very difficult, so the development of methods for this task remains an active area of research. Alignment construction algorithms based on dynamic programming approach the problem from a mathematical perspective: the optimal alignment is computed relative to a scoring scheme, and the resulting layout is guaranteed to be mathematically optimal, though not necessarily biologically meaningful. Sequence alignments usually account only for single-letter substitutions and relatively short indels, representing the latter as gaps. Other possible evolutionary scenarios, such as rearrangements and inversions, are generally not considered. In multiple sequence alignment, the most popular method for reining in complexity is the progressive approach. Progressive alignment techniques can incorporate limited evolutionary information, in the form of a phylogenetic tree, while building alignments. Beyond that, they generally do not make use of knowledge that sheds light on the sequences in a biological context, although there are some notable exceptions. Progressive techniques may be implemented with iterative refinement steps; however, they still remain both computationally and biologically approximate. At present, biologists are often forced to manually adjust alignments built through automated means. These adjustments include the placement of sites of experimentally confirmed homology or characteristic structural features in conformation when such similarities are not well reflected in the sequences and therefore misguide the automated (mathematical) optimization process. Many tools for alignment visualization provide extensive annotation facilities, but in most cases they are passive. Some feature editors are available; however, these are predominantly sequence-only editors.
Since they permit sequences to be removed, new bases to be introduced, and existing ones to be deleted, the entire concept is somewhat at odds with the idea of alignment editing, in which the sequences and the order of residues in individual sequences should not be disturbed. With these issues in mind, we have undertaken the design of an editor that facilitates post-processing of sequence alignments. The core editor features provide drag-and-drop movement of regions within the aligned sequences, followed by realignment of the affected area or a broader context, depending on the user's selection. The realignment is done using external, freely available software. However, not all choices will be supported at every installation, as the external software packages may not run on every platform. Region movement and realignment can be repeated as many times as necessary. The utility has been developed using the Java Swing library. It runs locally on the user's machine, on any installation that supports Java.
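
The constraint described above, that moving a region must not disturb the residues or their order within a sequence, can be illustrated with a minimal Java sketch. This is not code from the thesis: the class name `RowShift`, the methods `shiftBlock` and `ungapped`, and the use of `-` as the gap character are all assumptions made for illustration. The sketch moves a block within one gapped row only when every column it passes over is a gap, so the ungapped sequence is unchanged. The subsequent realignment step, which the editor delegates to external software, is not shown.

```java
public final class RowShift {
    // Assumed gap symbol; real alignment formats may use other characters.
    private static final char GAP = '-';

    private RowShift() {}

    /** Returns the residues of a gapped row with all gap characters removed. */
    public static String ungapped(String row) {
        return row.replace(String.valueOf(GAP), "");
    }

    /**
     * Moves the block row[start, end) by offset columns (positive = right,
     * negative = left). The move is allowed only if every column the block
     * passes over holds a gap, so residue order is preserved.
     */
    public static String shiftBlock(String row, int start, int end, int offset) {
        if (start < 0 || end > row.length() || start >= end) {
            throw new IllegalArgumentException("bad block bounds");
        }
        if (offset == 0) {
            return row;
        }
        int newStart = start + offset;
        int newEnd = end + offset;
        if (newStart < 0 || newEnd > row.length()) {
            throw new IllegalArgumentException("block would leave the row");
        }
        // Columns swept by the block: right of it when moving right,
        // left of it when moving left.
        int from = offset > 0 ? end : newStart;
        int to = offset > 0 ? newEnd : start;
        for (int i = from; i < to; i++) {
            if (row.charAt(i) != GAP) {
                throw new IllegalArgumentException(
                        "move would cross a residue at column " + i);
            }
        }
        String block = row.substring(start, end);
        String gaps = row.substring(from, to);
        // Swap the block with the run of gaps it moves across.
        return offset > 0
                ? row.substring(0, start) + gaps + block + row.substring(newEnd)
                : row.substring(0, newStart) + block + gaps + row.substring(end);
    }
}
```

For example, `RowShift.shiftBlock("AC-GT--", 3, 5, 2)` moves the block `GT` two columns to the right; the row keeps its length, and `RowShift.ungapped` returns the same residues before and after the move. A request that would carry the block across a residue is rejected with an exception rather than reordering the sequence.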

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
