Mavis blog: 2010

Task 1 Research topic:

Sequence Alignment

Task 2 Journal review:

Nucleic Acids Research – Analysis of protein sequence and interaction data for candidate disease gene prediction

Summary:

Inflammatory bowel disease (IBD) is a group of inflammatory conditions of the colon and small intestine. New evidence shows that patients with IBD may have an elevated risk of endothelial dysfunction and coronary artery disease. Hence it is crucial to know the functional loci and to identify the relationships among these loci and their significance.

Task 3:

Using the National Center for Biotechnology Information (NCBI) data bank, choose two proteins with relatively similar structure and sequence, and compare them to assess their significance.

Task 4:

Method:

Two similar bowel protein sequences are chosen from the NCBI data bank and compared in GenBank, EMBL and FASTA formats; both sequences are then copied in FASTA format into Notepad. The results are shown as graphics giving the genome/chromosome map location of each gene. Both sequences are then run through the bioinformatics tool BLASTp to compare their similarity.

Task 5:

The following proteins are selected from the National Center for Biotechnology Information (NCBI):

1. NCBI Reference Sequence: NP_033429.1

2. NCBI Reference Sequence: NP_005840.2

FASTA format of both sequences:

First sequence:

NCBI Reference Sequence: NP_005840.2

>gi|93004094|ref|NP_005840.2| immunoglobulin superfamily member 6 precursor [Homo sapiens]

MGTASRSNIARHLQTNLILFCVGAVGACTLSVTQPWYLEVDYTHEAVTIKCTFSATGCPSEQPTCLWFRYGAHQPENLCLDGCKSEADKFTVREALKENQVSLTVNRVTSNDSAIYICGIAFPSVPEARAKQTGGGTTLVVREIKLLSKELRSFLTALVSLLSVYVTGVCVAFILLSKSKSNPLRNKEIKEDSQKKKSARRIFQEIAQELYHKRHVETNQQSEKDNNTYENRRVLSNYERP

Second sequence:

NCBI Reference Sequence: NP_033429.1

>gi|6678387|ref|NP_033429.1| tumor necrosis factor ligand superfamily member 8 [Mus musculus]

MEPGLQQAGSCGAPSPDPAMQVQPGSVASPWRSTRPWRSTSRSYFYLSTTALVCLVVAVAIILVLVVQKKDSTPNTTEKAPLKGGNCSEDLFCTLKSTPSKKSWAYLQVSKHLNNTKLSWNEDGTIHGLIYQDGNLIVQFPGLYFIVCQLQFLVQCSNHSVDLTLQLLINSKIKKQTLVTVCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLDNVLSVFLYSSSD
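Records in FASTA format, like the two above, consist of a header line starting with ">" followed by the sequence. A minimal sketch of reading such records in Python is given here; the function name parse_fasta and the toy records are assumptions made for illustration and are not part of the original project.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {header: sequence} for every record in a FASTA-formatted string."""
    records: Dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = ""
        elif header is not None:
            records[header] += line  # a sequence may span several lines
    return records


# Hypothetical toy records, not the real NCBI sequences.
example = """>seq_a toy protein
MGTAS
RSNIA
>seq_b toy protein
MEPGLQ
"""

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

Pasting the two real records into one string and passing it to parse_fasta would give the sequence lengths, which can then be checked against the query and subject lengths that BLAST reports.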

Running the BLASTp tool gives the following result:

BLAST

Basic Local Alignment Search Tool

Blast 2 sequences:

Protein Sequence (241 letters)


Query ID: lcl|25993

Description: None

Molecule type: amino acid

Query Length: 241

Subject ID: 25995

Description: None

Molecule type: amino acid

Subject Length: 239

Program: BLASTP 2.2.24+

Search parameter name Search parameter value

Program: blastp

Word size: 3

Expect value: 10

Hitlist size: 100

Gapcosts: 11,1

Matrix: BLOSUM62

Filter string: F

Genetic Code: 1

Window Size: 40

Threshold: 11

Composition-based stats: 2

Karlin-Altschul statistics

Params   Ungapped   Gapped
Lambda   0.317322   0.267
K        0.130185   0.041
H        0.38006    0.14

Results Statistics

Effective search space: 47088
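The gapped Karlin-Altschul parameters and the effective search space can be used to relate a raw alignment score to its bit score and E-value. The sketch below assumes the standard relations S' = (lambda * S - ln K) / ln 2 and E = search space * 2^(-S'); the function names are made up for illustration, and the inputs are the rounded values printed in the report, so small differences from BLAST's own figures are to be expected.

```python
import math

# Gapped parameters and search space as printed in the report (rounded values).
LAMBDA_GAPPED = 0.267
K_GAPPED = 0.041
EFFECTIVE_SEARCH_SPACE = 47088


def bit_score(raw_score: float, lam: float = LAMBDA_GAPPED, k: float = K_GAPPED) -> float:
    """Convert a raw score S to a bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, search_space: float = EFFECTIVE_SEARCH_SPACE) -> float:
    """Expected number of chance alignments: E = search_space * 2 ** (-S')."""
    return search_space * 2.0 ** (-bits)


bits = bit_score(45)  # 45 is the raw score BLAST prints in parentheses for the best alignment
print(f"{bits:.1f} bits, E = {expect_value(bits):.3f}")
```

For the best alignment (raw score 45) this gives about 21.9 bits and an E-value close to 0.012, in line with the report.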

Graphic Summary

Distribution of 2 Blast Hits on the Query Sequence

An overview of the database sequences aligned to the query sequence is shown. The score of each alignment is indicated by one of five different colors, which divides the range of scores into five groups. Multiple alignments on the same database sequence are connected by a striped line.

Dot Matrix View

Plot of lcl|25993 vs 25995

This dot matrix view shows regions of similarity based upon the BLAST results. The query sequence is represented on the X-axis and the numbers represent the bases/residues of the query. The subject is represented on the Y-axis and again the numbers represent the bases/residues of the subject. Alignments are shown in the plot as lines. Plus strand and protein matches are slanted from the bottom left to the upper right corner, minus strand matches are slanted from the upper left to the lower right. The number of lines shown in the plot is the same as the number of alignments found by BLAST.
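The idea behind a dot matrix can be sketched in a few lines of Python. Note that this is a simplification: the BLAST view draws the alignments BLAST reported, whereas the sketch marks every position where an identical word of a given length starts in both sequences. The function name dot_matrix, the word length of 3 and the toy sequences are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def dot_matrix(query: str, subject: str, word: int = 3) -> List[Tuple[int, int]]:
    """Return 1-based (query_pos, subject_pos) pairs where identical words of length `word` start."""
    # Index every word of the subject by the positions at which it starts.
    index: Dict[str, List[int]] = {}
    for j in range(len(subject) - word + 1):
        index.setdefault(subject[j:j + word], []).append(j + 1)

    dots: List[Tuple[int, int]] = []
    for i in range(len(query) - word + 1):
        for j in index.get(query[i:i + word], []):
            dots.append((i + 1, j))
    return dots


# Hypothetical toy sequences, not the real proteins.
dots = dot_matrix("MKVLSNYERP", "TTVLSNFPLD")
print(dots)
```

Consecutive dots along a diagonal, such as (3, 3) followed by (4, 4), correspond to the slanted lines in the plot.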

Descriptions


Sequences producing significant alignments:

Accession | Description | Max score | Total score | Query coverage | E value
25995 | unnamed protein product | 21.9 | 38.5 | 36% | 0.012
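The query coverage value can be checked from the query ranges of the two alignments listed in the Alignments section (184 to 238 and 152 to 192) and the query length of 241. The sketch below assumes that coverage is the share of query positions covered by at least one alignment; the function name query_coverage is made up for illustration.

```python
from typing import Iterable, Tuple


def query_coverage(hsp_ranges: Iterable[Tuple[int, int]], query_length: int) -> float:
    """Fraction of query positions covered by at least one range (1-based, inclusive)."""
    covered = 0
    current_end = 0  # last query position that has already been counted
    for start, end in sorted(hsp_ranges):
        start = max(start, current_end + 1)  # skip the part overlapping earlier ranges
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / query_length


# Query ranges of the two alignments reported by BLAST.
hsps = [(184, 238), (152, 192)]
coverage = query_coverage(hsps, 241)
print(f"{coverage:.0%}")
```

The two ranges overlap between positions 184 and 192, so together they cover positions 152 to 238, that is 87 of 241 residues or about 36%.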

Alignments

>lcl|25995 unnamed protein product

Length=239


Score = 21.9 bits (45), Expect = 0.012, Method: Compositional matrix adjust.

Identities = 17/66 (26%), Positives = 29/66 (44%), Gaps = 11/66 (16%)

Query  184  LRNKEIKEDS-----QKKKSARRIFQEIAQELYHKRHVETNQQSEKDN------NTYENR  232
            L N +IK+ +     +    ++ I+Q ++Q L H   V +      DN      NT+
Sbjct  168  LINSKIKKQTLVTVCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLD  227

Query  233  RVLSNY  238
             VLS +
Sbjct  228  NVLSVF  233

Score = 16.5 bits (31), Expect = 0.54, Method: Compositional matrix adjust.

Identities = 11/49 (23%), Positives = 19/49 (39%), Gaps = 8/49 (16%)

Query  152  RSFLTALVSLLSVYVTGVCVAFILLSKSKSN--------PLRNKEIKED  192
            RS+     + L   V  V +  +L+ + K +        PL+     ED
Sbjct  42   RSYFYLSTTALVCLVVAVAIILVLVVQKKDSTPNTTEKAPLKGGNCSED  90
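The identity and gap counts that BLAST prints for the best alignment (Identities = 17/66, Gaps = 11/66) can be re-derived from the aligned rows themselves. The following is a minimal sketch; the function name alignment_stats is an assumption, and the two rows are the query and subject lines of that alignment joined across the line break.

```python
from typing import Tuple


def alignment_stats(query_row: str, subject_row: str) -> Tuple[int, int, int]:
    """Return (identities, gap columns, alignment length) for two aligned rows."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    identities = sum(q == s and q != "-" for q, s in zip(query_row, subject_row))
    gaps = sum(q == "-" or s == "-" for q, s in zip(query_row, subject_row))
    return identities, gaps, len(query_row)


# Rows of the best alignment (Query 184-238, Sbjct 168-233).
query_row = (
    "LRNKEIKEDS-----QKKKSARRIFQEIAQELYHKRHVETNQQSEKDN------NTYENR"
    "RVLSNY"
)
subject_row = (
    "LINSKIKKQTLVTVCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLD"
    "NVLSVF"
)

identities, gaps, length = alignment_stats(query_row, subject_row)
print(f"Identities = {identities}/{length}, Gaps = {gaps}/{length}")
```

Counting positives would additionally need the substitution matrix (BLOSUM62 in this run), which is left out to keep the sketch short.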

Results and Discussion:

After comparing the two selected sequences with BLAST, the results show two local alignments between them, the better one with 26% identity over 66 positions and an E-value of 0.012. This suggests that there may be a relationship between the two sequences and the bowel disease.

For further research, additional sequences are required so that their results can be compared with those of the current sequences. If the additional sequences prove to be more relevant than the current ones, they are said to be more closely linked to the inflammatory disease; if they are less relevant, the current sequences are said to be more closely linked to it.

HTI 5052 Bioinformatics in Health Sciences

Mini Research Project

Student Name: Tse Mavis Wing Yee

Student No. : 10677249G

Task 1:

Research topic is Sequence Alignment.

Task 2:

Journal review: Nucleic Acids Research

Journal topic: AlexSys: a knowledge-based expert system for multiple sequence alignment construction and analysis

Mohamed Radhouene Aniba (1,2,3,4), Olivier Poch (1,2,3,4), Aron Marchler-Bauer (5) and Julie Dawn Thompson (1,2,3,4,*)

(1) Department of Structural Biology and Genomics, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), (2) Institut National de la Santé et de la Recherche Médicale (INSERM), (3) The Centre National de la Recherche Scientifique (CNRS), UMR7104, F-67400 Illkirch, (4) Université Louis Pasteur, F-67000 Strasbourg, France and (5) NCBI/NLM/NIH, 8600 Rockville Pike, Bldg. 38A, Bethesda, MD 20894, USA

Received January 29, 2010; Revised April 26, 2010; Accepted May 25, 2010

Multiple sequence alignment (MSA) is a sequence alignment of three or more biological sequences. The input set of query sequences is assumed to share a common ancestor through an evolutionary lineage, so the resulting MSA can be used for phylogenetic analysis to infer the evolutionary origins of the sequences. MSAs are computationally more complex than pairwise alignments and hence require more sophisticated methodologies.

Many different algorithms have been developed to construct MSAs. AlexSys is a new intelligent engine that automatically selects an appropriate aligner a priori, based on the input sequences. It is suitable for high-throughput projects because it offers a good compromise between alignment quality and running time. Previous studies show that no single aligner outperforms all the others. Although earlier methods provide more accurate alignments, they are less efficient, because the sequences must first be run through the aligners and the best solution is then chosen a posteriori. AlexSys is therefore designed to combine the power of the existing approaches in a single system which is both efficient and easy to use.

Task 3:

AlexSys is therefore designed to combine the power of the existing approaches in a single system which is both efficient and easy to use for the biologist; however, there is no single algorithm that works best on all problems.

The problem can be solved by altering the system from a combined system to a multi-step single system; the disadvantage, however, is that collecting the data will take longer.

Mavis blog

Tuesday, 23 November 2010

Bioinformatics in Health Sciences Mini Research Project Final Report

Tuesday, 19 October 2010

Bioinformatics in Health Sciences Mini Research Project submission
