Graduation Semester and Year

2005

Language

English

Document Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science and Engineering

First Advisor

David Levine

Abstract

In the last few years, many computer and laboratory improvements in the production and analysis of DNA sequences have made the complete sequencing of whole genomes possible. This provides us with a wealth of raw genomes that need to be processed and annotated. Repetitive DNA, consisting of transposable elements and tandem repeats, makes up 5% to 80% of eukaryotic genomes and needs to be identified, classified and annotated in order to sequence and annotate the entire genome accurately. Existing tools allow us to identify and annotate transposable elements (TEs), but no tool exists for their classification. This thesis introduces REPCLASS, an automated tool for the classification of transposable elements that are identified de novo in new genomes. REPCLASS is a workflow that combines several methods to provide a tentative classification of TE consensus sequences. REPCLASS is also a distributed application that uses high-performance cluster computing to perform the computationally intensive task of classification.

Disciplines

Computer Sciences | Physical Sciences and Mathematics

Comments

Degree granted by The University of Texas at Arlington
