Mavis blog: Bioinformatics in Health Sciences Mini Research Project Final Report

Task 1 Research topic:

Sequence Alignment

Task 2 Journal review:

Nucleic Acids Research – Analysis of protein sequence and interaction data for candidate disease gene prediction

Summary:

Inflammatory bowel disease (IBD) is a group of inflammatory conditions of the colon and small intestine. New evidence shows that patients with IBD may have an elevated risk of endothelial dysfunction and coronary artery disease. Hence it is crucial to know the functional loci and to identify how these loci relate to one another and how significant each of them is.

Task 3:

Using the National Center for Biotechnology Information (NCBI) data bank, choose two proteins of relatively similar structure and sequence and compare them to assess their significance.

Task 4:

Method:

Two similar bowel protein sequences are chosen from the NCBI data bank and viewed in GenBank, EMBL and FASTA formats for comparison; the FASTA-format data for both sequences are then copied into Notepad. The results are displayed as graphics showing the genome/chromosome map location of the gene. Both sequences are then run through the bioinformatics tool BLASTp to compare their similarity.

Task 5:

The following proteins are selected from the National Center for Biotechnology Information (NCBI):

1. NCBI Reference Sequence: NP_005840.2

2. NCBI Reference Sequence: NP_033429.1

FASTA format of both sequences:

First sequence:

NCBI Reference Sequence: NP_005840.2

>gi|93004094|ref|NP_005840.2| immunoglobulin superfamily member 6 precursor [Homo sapiens]

MGTASRSNIARHLQTNLILFCVGAVGACTLSVTQPWYLEVDYTHEAVTIKCTFSATGCPSEQPTCLWFRYGAHQPENLCLDGCKSEADKFTVREALKENQVSLTVNRVTSNDSAIYICGIAFPSVPEARAKQTGGGTTLVVREIKLLSKELRSFLTALVSLLSVYVTGVCVAFILLSKSKSNPLRNKEIKEDSQKKKSARRIFQEIAQELYHKRHVETNQQSEKDNNTYENRRVLSNYERP

Second sequence:

NCBI Reference Sequence: NP_033429.1

>gi|6678387|ref|NP_033429.1| tumor necrosis factor ligand superfamily member 8 [Mus musculus]

MEPGLQQAGSCGAPSPDPAMQVQPGSVASPWRSTRPWRSTSRSYFYLSTTALVCLVVAVAIILVLVVQKKDSTPNTTEKAPLKGGNCSEDLFCTLKSTPSKKSWAYLQVSKHLNNTKLSWNEDGTIHGLIYQDGNLIVQFPGLYFIVCQLQFLVQCSNHSVDLTLQLLINSKIKKQTLVTVCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLDNVLSVFLYSSSD
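As a small illustration of what the FASTA definition lines above contain, the following Python sketch splits each header into its fields. It assumes the older `>gi|...|ref|...|` header layout shown here; the regular expression and the function name `parse_header` are illustrative only and are not part of any NCBI tool.

```python
import re

# Old-style NCBI header: >gi|<GI number>|ref|<accession>| <description> [<organism>]
HEADER_RE = re.compile(
    r"^>gi\|(?P<gi>\d+)\|ref\|(?P<accession>[^|]+)\|\s*"
    r"(?P<description>.*?)\s*\[(?P<organism>[^\]]+)\]\s*$"
)


def parse_header(line):
    """Split one FASTA definition line into its named fields."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised FASTA header: {line!r}")
    return match.groupdict()


# The two definition lines exactly as they appear in this report.
headers = [
    ">gi|93004094|ref|NP_005840.2| immunoglobulin superfamily member 6 precursor [Homo sapiens]",
    ">gi|6678387|ref|NP_033429.1| tumor necrosis factor ligand superfamily member 8 [Mus musculus]",
]

records = [parse_header(h) for h in headers]
for rec in records:
    print(rec["accession"], "|", rec["description"], "|", rec["organism"])
```

Parsing the headers this way makes it easy to see that the first sequence is a human protein and the second is a mouse protein.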

By running BLASTp tool, the following result is obtained:

BLAST

Basic Local Alignment Search Tool

Blast 2 sequences:

Protein Sequence (241 letters)

Query ID: lcl|25993

Description: None

Molecule type: amino acid

Query Length: 241

Subject ID: 25995

Description: None

Molecule type: amino acid

Subject Length: 239

Program: BLASTP 2.2.24+

Search parameter name Search parameter value

Program: blastp

Word size: 3

Expect value: 10

Hitlist size: 100

Gapcosts: 11,1

Matrix: BLOSUM62

Filter string: F

Genetic Code: 1

Window Size: 40

Threshold: 11

Composition-based stats: 2
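The report used the NCBI web page, but the same comparison could probably be repeated with the command-line BLAST+ package. The sketch below only builds a `blastp` command that mirrors the parameter table above; the two FASTA file names are hypothetical, and it assumes BLAST+ is installed and each file holds one of the sequences.

```python
import shlex


def build_blastp_command(query_fasta, subject_fasta):
    """Return a blastp argument list mirroring the parameter table above."""
    return [
        "blastp",
        "-query", query_fasta,      # first sequence (241 residues)
        "-subject", subject_fasta,  # second sequence (239 residues)
        "-word_size", "3",
        "-evalue", "10",
        "-gapopen", "11",
        "-gapextend", "1",
        "-matrix", "BLOSUM62",
        "-window_size", "40",
        "-threshold", "11",
        "-comp_based_stats", "2",
    ]


# Hypothetical file names, one FASTA record per file.
command = build_blastp_command("NP_005840.2.fasta", "NP_033429.1.fasta")
print(shlex.join(command))
```

The printed line can be pasted into a terminal, or the list can be passed to `subprocess.run(command, capture_output=True, text=True)` to collect the report from Python.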

Karlin-Altschul statistics

Params Ungapped Gapped

Lambda 0.317322 0.267

K 0.130185 0.041

H 0.38006 0.14

Results Statistics

Results Statistics parameter name Results Statistics parameter value

Effective search space 47088
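To show how these statistics relate to the scores reported for the alignments, here is a minimal Python sketch of the standard Karlin-Altschul formulas, using the gapped Lambda and K values and the effective search space listed above. The raw score of 45 is the number in parentheses for the top alignment in this report. Because BLAST also applies compositional adjustment (Composition-based stats: 2), this simple calculation should be read as an approximation rather than as BLAST's exact procedure.

```python
import math

# Gapped Karlin-Altschul parameters and effective search space from the report.
LAMBDA_GAPPED = 0.267
K_GAPPED = 0.041
EFFECTIVE_SEARCH_SPACE = 47088


def bit_score(raw_score, lam=LAMBDA_GAPPED, k=K_GAPPED):
    """Normalise a raw alignment score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, search_space=EFFECTIVE_SEARCH_SPACE):
    """Expected number of chance alignments this good: E = N * 2**(-S')."""
    return search_space * 2.0 ** (-bit_score(raw_score))


bits = bit_score(45)        # raw score of the top alignment
e_value = expect_value(45)
print(f"{bits:.1f} bits, E = {e_value:.3f}")  # prints "21.9 bits, E = 0.012"
```

The result agrees with the 21.9 bits and Expect = 0.012 reported for the top alignment.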

Graphic Summary

Distribution of 2 Blast Hits on the Query Sequence

An overview of the database sequences aligned to the query sequence is shown. The score of each alignment is indicated by one of five different colors, which divides the range of scores into five groups. Multiple alignments on the same database sequence are connected by a striped line.

Dot Matrix View

Plot of lcl|25993 vs 25995

This dot matrix view shows regions of similarity based upon the BLAST results. The query sequence is represented on the X-axis and the numbers represent the bases/residues of the query. The subject is represented on the Y-axis and again the numbers represent the bases/residues of the subject. Alignments are shown in the plot as lines. Plus strand and protein matches are slanted from the bottom left to the upper right corner, minus strand matches are slanted from the upper left to the lower right. The number of lines shown in the plot is the same as the number of alignments found by BLAST.

Descriptions

Sequences producing significant alignments:

Accession   Description               Max score   Total score   Query coverage   E value
25995       unnamed protein product   21.9        38.5          36%              0.012

Alignments

>lcl|25995 unnamed protein product

Length=239

Score = 21.9 bits (45), Expect = 0.012, Method: Compositional matrix adjust.

Identities = 17/66 (26%), Positives = 29/66 (44%), Gaps = 11/66 (16%)

Query  184  LRNKEIKEDS-----QKKKSARRIFQEIAQELYHKRHVETNQQSEKDN------NTYENR  232
            L N +IK+ +     +    ++ I+Q ++Q L H   V +      DN      NT+
Sbjct  168  LINSKIKKQTLVTVCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLD  227

Query  233  RVLSNY  238
             VLS +
Sbjct  228  NVLSVF  233
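The Identities and Gaps figures for this alignment can be checked by counting columns directly. The following Python sketch does this for the two aligned rows, joined end to end; the function name `alignment_stats` is made up for this example. Counting Positives would additionally need the BLOSUM62 matrix, so it is left out here.

```python
def alignment_stats(query_aln, subject_aln):
    """Count identical columns, gap characters and total columns of a pairwise alignment."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    identities = sum(
        1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-"
    )
    gaps = query_aln.count("-") + subject_aln.count("-")
    return identities, gaps, len(query_aln)


# Both display rows of the first alignment, joined end to end.
query_aln = (
    "LRNKEIKEDS-----QKKKSARRIFQEIAQELYHKRHVETNQQSEKDN------NTYENR"
    "RVLSNY"
)
subject_aln = (
    "LINSKIKKQTLVTVCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLD"
    "NVLSVF"
)

identities, gaps, columns = alignment_stats(query_aln, subject_aln)
print(f"Identities = {identities}/{columns}, Gaps = {gaps}/{columns}")
```

This reproduces the 17/66 identities and 11/66 gaps shown in the BLAST output.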

Score = 16.5 bits (31), Expect = 0.54, Method: Compositional matrix adjust.

Identities = 11/49 (23%), Positives = 19/49 (39%), Gaps = 8/49 (16%)

Query  152  RSFLTALVSLLSVYVTGVCVAFILLSKSKSN--------PLRNKEIKED  192
            RS+     + L   V  V +  +L+ + K +        PL+     ED
Sbjct  42   RSYFYLSTTALVCLVVAVAIILVLVVQKKDSTPNTTEKAPLKGGNCSED  90
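The 36% query coverage in the description table can be worked out from the query positions of the two alignments (184 to 238 and 152 to 192), which overlap. The Python sketch below merges the ranges and divides by the query length of 241; the helper name `covered_positions` is only for illustration, and it is an assumption that BLAST computes coverage in this way.

```python
def covered_positions(ranges):
    """Number of distinct positions covered by 1-based inclusive ranges."""
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(ranges):
        if current_end is None or start > current_end + 1:
            # Disjoint from the running block: bank it and start a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered


QUERY_LENGTH = 241
hsp_query_ranges = [(184, 238), (152, 192)]  # query start/end of the two alignments

covered = covered_positions(hsp_query_ranges)
coverage = 100 * covered / QUERY_LENGTH
print(f"{covered} of {QUERY_LENGTH} residues covered ({coverage:.0f}%)")
```

The two alignments together span query positions 152 to 238, that is 87 of the 241 residues, which gives the reported 36%.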

Results and Discussion:

After comparing the two selected sequences with BLASTp, the results show two short local alignments between them: 17/66 identities (26%) with an E value of 0.012, and 11/49 identities (23%) with an E value of 0.54, together covering 36% of the query. This suggests that there may be a relationship between the two sequences and the bowel disease.

For further research, additional sequences are needed so that their results can be compared with those of the current sequences. If the additional sequences prove more relevant than the current ones, they are said to be more closely associated with the inflammatory disease; if they prove less relevant, the current sequences are said to be more closely associated with it.

Tuesday, November 23, 2010
