Biologically Relevant Multiple Sequence Alignment

BIOLOGICALLY RELEVANT MULTIPLE SEQUENCE ALIGNMENT

by

Hyrum D. Carroll

A dissertation submitted to the faculty of Brigham Young University

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Department of Computer Science
Brigham Young University
December 2008

Copyright © 2008 Hyrum D. Carroll
All Rights Reserved

BRIGHAM YOUNG UNIVERSITY GRADUATE COMMITTEE APPROVAL

of a dissertation submitted by Hyrum D. Carroll

This dissertation has been read by each member of the following graduate committee and by majority vote has been found to be satisfactory.

Date    Mark J. Clement, Chair
Date    Quinn O. Snell
Date    David A. McClellan
Date    Kevin D. Seppi
Date    Daniel Zappala

BRIGHAM YOUNG UNIVERSITY

As chair of the candidate's graduate committee, I have read the dissertation of Hyrum D. Carroll in its final form and have found that (1) its format, citations, and bibliographical style are consistent and acceptable and fulfill university and department style requirements; (2) its illustrative materials including figures, tables, and charts are in place; and (3) the final manuscript is satisfactory to the graduate committee and is ready for submission to the university library.

Date

Mark J. Clement
Chair, Graduate Committee

Accepted for the Department

Date

Kent E. Seamons
Graduate Coordinator

Accepted for the College

Date

Thomas W. Sederberg
Associate Dean, College of Physical and Mathematical Sciences

ABSTRACT

BIOLOGICALLY RELEVANT MULTIPLE SEQUENCE ALIGNMENT

Hyrum D. Carroll
Department of Computer Science

Doctor of Philosophy

Researchers use multiple sequence alignment algorithms to detect conserved regions in genetic sequences and to identify drug docking sites for drug development. This dissertation presents a novel algorithm that uses physicochemical properties to increase the accuracy of multiple sequence alignments. The evaluation function also incorporates secondary structures and their locations. Multiple properties are combined using weights determined from the prediction accuracies of protein secondary structures obtained with artificial neural networks.

A new metric, the PPD Score, is developed that captures the average change in physicochemical properties. Using physicochemical properties and secondary structures for multiple sequence alignment results in alignments that are more accurate, more biologically relevant, and more useful for drug development and other medical uses.

In addition to a novel multiple sequence alignment algorithm, we also propose a new protein-coding DNA reference alignment database. This database is a collection of multiple sequence alignment data sets derived from tertiary structural alignments. The primary purpose of the database is to benchmark new and existing multiple sequence alignment algorithms with DNA data. The first known comparative study of protein-coding DNA alignment accuracies is also included in this work.

ACKNOWLEDGMENTS

I would like to first and foremost thank the two most important people in my life, my wife Melissa and my Heavenly Father. They have provided encouragement, faith in me and positive attitudes throughout the entire project.

I would also like to thank my father for his interest, insightful questions and love.

My advisor, Dr. Mark J. Clement, and "pseudo-advisor", Dr. Quinn O. Snell, have been supportive, interested and flexible. Additionally, Dr. David A. McClellan has always provided encouraging discussions.

Finally, I am grateful for the stimulating conversations about doctoral degrees with the late Dr. James Edwin Dalley (1922–2008), my maternal grandfather, to whom this work is dedicated.
