Tuesday, November 23, 2010

Bioinformatics in Health Sciences Mini Research Project Final Report

Task 1 Research topic:
Sequence Alignment

Task 2 Journal review:
Nucleic Acids Research – Analysis of protein sequence and interaction data for candidate disease gene prediction
Summary:
Inflammatory bowel disease (IBD) is a group of inflammatory conditions of the colon and small intestine. New evidence shows that patients with IBD may have an elevated risk of endothelial dysfunction and coronary artery disease. Hence it is crucial to identify the functional loci involved and to establish how they relate to one another and how significant each of them is.

Task 3:
Using the National Center for Biotechnology Information (NCBI) data bank, choose two proteins with relatively similar structures and sequences and assess the significance of their similarity.

Task 4:
Method:
        Two similar bowel-related protein sequences are chosen from the NCBI data bank, and their GenBank, EMBL and FASTA records are compared. Both sequences are then copied in FASTA format into a text editor (Notepad). The result is shown graphically, giving the genome/chromosome map location of each gene. Finally, both sequences are run through the bioinformatics tool BLASTp to compare their similarity.
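Copying the records into Notepad works for two sequences, but the same step can also be scripted. The sketch below is a minimal, hypothetical helper (`parse_fasta` is not an NCBI tool) that assumes the standard FASTA layout of one `>` header line followed by one or more sequence lines, and returns the records as (header, sequence) pairs.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA-formatted text into (header, sequence) pairs.

    Assumes each record starts with a '>' header line followed by one or
    more sequence lines; blank lines are ignored.
    """
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Shortened toy records, used only to illustrate the format.
example = """>seq1 toy record
MGTASRSNIA
RHLQTNLILF
>seq2 another toy record
MEPGLQQAGS
"""

for name, seq in parse_fasta(example):
    print(name, len(seq))
```

Sequence lines are joined without separators, so records wrapped at any line width give the same result.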

Task 5:
The following proteins are selected from the National Center for Biotechnology Information (NCBI):
1.    NCBI Reference Sequence: NP_033429.1
2.    NCBI Reference Sequence: NP_005840.2
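Instead of copying each record by hand, the two accessions can be downloaded in FASTA format through NCBI's E-utilities `efetch` service. This is a sketch under stated assumptions: it uses the `db`, `id`, `rettype` and `retmode` parameters of `efetch`, the helper names `efetch_fasta_url` and `fetch_fasta` are hypothetical, and `fetch_fasta` needs network access and should respect NCBI's request-rate guidelines.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accessions: list[str]) -> str:
    """Build an efetch URL that returns protein records as plain-text FASTA."""
    params = {
        "db": "protein",
        "id": ",".join(accessions),  # several IDs are sent as one comma-separated list
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accessions: list[str]) -> str:
    """Download the FASTA text for the given accessions (requires network access)."""
    with urlopen(efetch_fasta_url(accessions)) as response:
        return response.read().decode("utf-8")


# Only the URL is built here; call fetch_fasta(...) to actually download.
print(efetch_fasta_url(["NP_033429.1", "NP_005840.2"]))
```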


FASTA format of both sequences:

First sequence:
NCBI Reference Sequence: NP_005840.2
>gi|93004094|ref|NP_005840.2| immunoglobulin superfamily member 6 precursor [Homo sapiens]
MGTASRSNIARHLQTNLILFCVGAVGACTLSVTQPWYLEVDYTHEAVTIKCTFSATGCPSEQPTCLWFRYGAHQPENLCLDGCKSEADKFTVREALKENQVSLTVNRVTSNDSAIYICGIAFPSVPEARAKQTGGGTTLVVREIKLLSKELRSFLTALVSLLSVYVTGVCVAFILLSKSKSNPLRNKEIKEDSQKKKSARRIFQEIAQELYHKRHVETNQQSEKDNNTYENRRVLSNYERP

Second sequence:
NCBI Reference Sequence: NP_033429.1
>gi|6678387|ref|NP_033429.1| tumor necrosis factor ligand superfamily member 8 [Mus musculus]
MEPGLQQAGSCGAPSPDPAMQVQPGSVASPWRSTRPWRSTSRSYFYLSTTALVCLVVAVAIILVLVVQKKDSTPNTTEKAPLKGGNCSEDLFCTLKSTPSKKSWAYLQVSKHLNNTKLSWNEDGTIHGLIYQDGNLIVQFPGLYFIVCQLQFLVQCSNHSVDLTLQLLINSKIKKQTLVTVCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLDNVLSVFLYSSSD
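A quick consistency check is to count the residues in each record and compare them with the lengths BLAST reports (query 241, subject 239). The sketch below assumes the sequences are copied exactly as listed above; `segment` is a hypothetical helper that extracts a 1-based, inclusive residue range, which is how BLAST numbers alignment coordinates.

```python
# Sequences as listed in the FASTA records above (split into 60-residue lines).
SEQUENCES = {
    "NP_005840.2": (
        "MGTASRSNIARHLQTNLILFCVGAVGACTLSVTQPWYLEVDYTHEAVTIKCTFSATGCPS"
        "EQPTCLWFRYGAHQPENLCLDGCKSEADKFTVREALKENQVSLTVNRVTSNDSAIYICGI"
        "AFPSVPEARAKQTGGGTTLVVREIKLLSKELRSFLTALVSLLSVYVTGVCVAFILLSKSK"
        "SNPLRNKEIKEDSQKKKSARRIFQEIAQELYHKRHVETNQQSEKDNNTYENRRVLSNYER"
        "P"
    ),
    "NP_033429.1": (
        "MEPGLQQAGSCGAPSPDPAMQVQPGSVASPWRSTRPWRSTSRSYFYLSTTALVCLVVAVA"
        "IILVLVVQKKDSTPNTTEKAPLKGGNCSEDLFCTLKSTPSKKSWAYLQVSKHLNNTKLSW"
        "NEDGTIHGLIYQDGNLIVQFPGLYFIVCQLQFLVQCSNHSVDLTLQLLINSKIKKQTLVT"
        "VCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLDNVLSVFLYSSSD"
    ),
}

# The 20 standard amino-acid one-letter codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def segment(seq: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive coordinates."""
    return seq[start - 1:end]


for accession, seq in SEQUENCES.items():
    nonstandard = sorted(set(seq) - STANDARD_AA)
    print(accession, len(seq), "residues; non-standard codes:", nonstandard or "none")

# Residues 184-238 of NP_005840.2, the query range of the first BLAST alignment.
print(segment(SEQUENCES["NP_005840.2"], 184, 238))
```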

Running the two sequences through BLASTp gives the following result:

BLAST
Basic Local Alignment Search Tool
Blast 2 sequences:
Protein Sequence (241 letters)
Query ID: lcl|25993
Description: None
Molecule type: amino acid
Query Length: 241
Subject ID: 25995
Description: None
Molecule type: amino acid
Subject Length: 239
Program: BLASTP 2.2.24+
Search parameters:
Program: blastp
Word size: 3
Expect value: 10
Hitlist size: 100
Gapcosts: 11,1
Matrix: BLOSUM62
Filter string: F
Genetic Code: 1
Window Size: 40
Threshold: 11
Composition-based stats: 2
Karlin-Altschul statistics:
Params    Ungapped    Gapped
Lambda    0.317322    0.267
K         0.130185    0.041
H         0.38006     0.14
Results statistics:
Effective search space: 47088
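The Karlin-Altschul parameters above link the raw alignment scores to the bit scores and E values reported for each hit: the bit score is S' = (lambda * S - ln K) / ln 2, and the E value is approximately the effective search space multiplied by 2^(-S'). The sketch below applies these standard formulas with the rounded gapped values lambda = 0.267 and K = 0.041 printed in the report; it is an approximation, not BLAST's own code path.

```python
import math

# Gapped Karlin-Altschul parameters and search space, as printed in the report.
LAMBDA_GAPPED = 0.267
K_GAPPED = 0.041
SEARCH_SPACE = 47088  # "Effective search space"


def bit_score(raw_score: int, lam: float = LAMBDA_GAPPED, k: float = K_GAPPED) -> float:
    """Normalise a raw score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, search_space: float = SEARCH_SPACE) -> float:
    """Approximate number of chance hits scoring at least `bits`: E = m'n' * 2^(-S')."""
    return search_space * 2.0 ** (-bits)


# Raw scores of the two alignments, shown in parentheses after the bit scores.
for raw in (45, 31):
    bits = bit_score(raw)
    print(f"raw={raw}  bits={bits:.1f}  E={expect_value(bits):.3f}")
```

For the first hit (raw score 45) this reproduces 21.9 bits and an E value of about 0.012. For the second hit (raw score 31) it gives 16.5 bits but an E value of about 0.49 rather than the reported 0.54; the printed lambda and K are rounded and composition-based statistics were switched on, so exact agreement should not be expected.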

Graphic Summary: Distribution of 2 Blast Hits on the Query Sequence
Dot Matrix View: Plot of lcl|25993 vs 25995
Descriptions

Sequences producing significant alignments:
Accession   Description               Max score   Total score   Query coverage   E value
25995       unnamed protein product   21.9        38.5          36%              0.012
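The "Query coverage" value can be reproduced from the alignment coordinates listed under Alignments: the first alignment spans query residues 184 to 238 and the second 152 to 192. The sketch assumes coverage means the share of query residues covered by at least one alignment, with overlapping stretches counted once.

```python
# Query ranges (1-based, inclusive) of the two alignments in the BLAST report.
HSP_QUERY_RANGES = [(184, 238), (152, 192)]
QUERY_LENGTH = 241  # "Query Length" in the report


def query_coverage(ranges, query_length):
    """Fraction of query positions covered by at least one alignment.

    Overlapping ranges are merged by collecting positions in a set.
    """
    covered = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    return len(covered) / query_length


print(f"Query coverage: {query_coverage(HSP_QUERY_RANGES, QUERY_LENGTH):.0%}")
# prints: Query coverage: 36%
```

The two ranges overlap at residues 184 to 192, so together they cover residues 152 to 238, that is 87 of 241 positions.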



Alignments
>lcl|25995 unnamed protein product
Length=239

Score = 21.9 bits (45), Expect = 0.012, Method: Compositional matrix adjust.
Identities = 17/66 (26%), Positives = 29/66 (44%), Gaps = 11/66 (16%)
Query  184  LRNKEIKEDS-----QKKKSARRIFQEIAQELYHKRHVETNQQSEKDN------NTYENR  232
            L N +IK+ +     +    ++ I+Q ++Q L H   V +      DN      NT+
Sbjct  168  LINSKIKKQTLVTVCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLD  227

Query  233  RVLSNY  238
             VLS +
Sbjct  228  NVLSVF  233

Score = 16.5 bits (31), Expect = 0.54, Method: Compositional matrix adjust.
Identities = 11/49 (23%), Positives = 19/49 (39%), Gaps = 8/49 (16%)

Query  152  RSFLTALVSLLSVYVTGVCVAFILLSKSKSN--------PLRNKEIKED  192
            RS+     + L   V  V +  +L+ + K +        PL+     ED
Sbjct  42   RSYFYLSTTALVCLVVAVAIILVLVVQKKDSTPNTTEKAPLKGGNCSED  90
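The identity and gap counts in each alignment header can be cross-checked against the aligned rows themselves. The hypothetical helper `summarize_hsp` below (not part of BLAST) counts identical columns and gap characters and derives each row's end coordinate from its start position. Positive (similar-residue) counts are left out because they depend on the scoring matrix.

```python
def summarize_hsp(query_aln: str, subject_aln: str, query_start: int, subject_start: int) -> dict:
    """Count identical columns and gap characters in one pairwise alignment.

    Coordinates are 1-based and inclusive, as in the BLAST report; '-' marks a gap.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned rows must have the same length")
    identities = sum(q == s and q != "-" for q, s in zip(query_aln, subject_aln))
    gaps = query_aln.count("-") + subject_aln.count("-")
    # Each row's end coordinate advances only over residues, not over gaps.
    query_end = query_start + len(query_aln.replace("-", "")) - 1
    subject_end = subject_start + len(subject_aln.replace("-", "")) - 1
    return {
        "length": len(query_aln),
        "identities": identities,
        "gaps": gaps,
        "query_range": (query_start, query_end),
        "subject_range": (subject_start, subject_end),
    }


# First alignment (Score = 21.9 bits): its two displayed blocks joined row by row.
hsp1 = summarize_hsp(
    "LRNKEIKEDS-----QKKKSARRIFQEIAQELYHKRHVETNQQSEKDN------NTYENR" "RVLSNY",
    "LINSKIKKQTLVTVCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLD" "NVLSVF",
    query_start=184,
    subject_start=168,
)
# Second alignment (Score = 16.5 bits).
hsp2 = summarize_hsp(
    "RSFLTALVSLLSVYVTGVCVAFILLSKSKSN--------PLRNKEIKED",
    "RSYFYLSTTALVCLVVAVAIILVLVVQKKDSTPNTTEKAPLKGGNCSED",
    query_start=152,
    subject_start=42,
)
print(hsp1)
print(hsp2)
```

The counts match the report (17 identities and 11 gap positions over 66 columns; 11 identities and 8 gaps over 49 columns), and the computed ranges show that the first alignment covers subject residues 168 to 233 in total.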

Results and Discussion:
 
After comparing the two selected sequences with BLAST, the results show two local alignments between them (the better one spans 66 aligned positions with 26% identity and an E value of 0.012). This suggests that there may be a relationship between both sequences and bowel disease.
For further research, additional sequences need to be analysed and their results compared with those of the current sequences. If the additional sequences prove more relevant than the current ones, they can be considered more closely associated with the inflammatory disease; if they prove less relevant, the current sequences remain the more closely associated ones.

1 comment:

  1. It is unclear why these two protein sequences were selected for analysis.
    What is the implication of good similarity between them?
    Why was protein interaction not considered?
    The research question was not stated.

    Grade: B
