Task 1 Research topic:
Sequence Alignment
Task 2 Journal review:
Nucleic Acids Research – Analysis of protein sequence and interaction data for candidate disease gene prediction
Summary:
Inflammatory bowel disease (IBD) is a group of inflammatory conditions of the colon and small intestine. New evidence shows that patients with IBD may have an elevated risk of endothelial dysfunction and coronary artery disease. Hence it is crucial to know the functional loci involved and to assess the significance of, and the relationships between, those loci.
Task 3:
Using the National Center for Biotechnology Information (NCBI) data bank, choose two proteins that are relatively similar in structure and sequence and compare them to determine the significance of their similarity.
Task 4:
Method:
Two similar bowel-related protein sequences are chosen from the NCBI data bank and viewed in GenBank, EMBL and FASTA formats for comparison. The FASTA format is then used to copy both sequences into Notepad. The results are displayed graphically, showing the genome/chromosome map location of the gene. Finally, both sequences are run through the bioinformatics tool BLASTp and their similarity is compared.
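The FASTA handling step in the method can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used for the report (which copied the records by hand). The function names `parse_fasta` and `accession` are hypothetical helpers introduced here, and the two records are the ones listed under Task 5.

```python
# Minimal FASTA reader, assuming plain ">header" lines followed by sequence lines.
FASTA_TEXT = """
>gi|93004094|ref|NP_005840.2| immunoglobulin superfamily member 6 precursor [Homo sapiens]
MGTASRSNIARHLQTNLILFCVGAVGACTLSVTQPWYLEVDYTHEAVTIKCTFSATGCPS
EQPTCLWFRYGAHQPENLCLDGCKSEADKFTVREALKENQVSLTVNRVTSNDSAIYICGI
AFPSVPEARAKQTGGGTTLVVREIKLLSKELRSFLTALVSLLSVYVTGVCVAFILLSKSK
SNPLRNKEIKEDSQKKKSARRIFQEIAQELYHKRHVETNQQSEKDNNTYENRRVLSNYER
P
>gi|6678387|ref|NP_033429.1| tumor necrosis factor ligand superfamily member 8 [Mus musculus]
MEPGLQQAGSCGAPSPDPAMQVQPGSVASPWRSTRPWRSTSRSYFYLSTTALVCLVVAVA
IILVLVVQKKDSTPNTTEKAPLKGGNCSEDLFCTLKSTPSKKSWAYLQVSKHLNNTKLSW
NEDGTIHGLIYQDGNLIVQFPGLYFIVCQLQFLVQCSNHSVDLTLQLLINSKIKKQTLVT
VCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLDNVLSVFLYSSSD
"""


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def accession(header):
    """Extract the accession from a 'gi|number|ref|accession|' style header."""
    return header.split("|")[3]


records = parse_fasta(FASTA_TEXT)
for header, sequence in records:
    print(accession(header), len(sequence))
```

The lengths printed here can be compared with the Query Length and Subject Length that BLAST reports, as a quick check that nothing was lost when the sequences were copied.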
Task 5:
The following proteins are selected from the National Center for Biotechnology Information (NCBI):
1. NCBI Reference Sequence: NP_033429.1
2. NCBI Reference Sequence: NP_005840.2
FASTA format of both sequences:
First sequence:
NCBI Reference Sequence: NP_005840.2
>gi|93004094|ref|NP_005840.2| immunoglobulin superfamily member 6 precursor [Homo sapiens]
MGTASRSNIARHLQTNLILFCVGAVGACTLSVTQPWYLEVDYTHEAVTIKCTFSATGCPSEQPTCLWFRYGAHQPENLCLDGCKSEADKFTVREALKENQVSLTVNRVTSNDSAIYICGIAFPSVPEARAKQTGGGTTLVVREIKLLSKELRSFLTALVSLLSVYVTGVCVAFILLSKSKSNPLRNKEIKEDSQKKKSARRIFQEIAQELYHKRHVETNQQSEKDNNTYENRRVLSNYERP
Second sequence:
NCBI Reference Sequence: NP_033429.1
>gi|6678387|ref|NP_033429.1| tumor necrosis factor ligand superfamily member 8 [Mus musculus]
MEPGLQQAGSCGAPSPDPAMQVQPGSVASPWRSTRPWRSTSRSYFYLSTTALVCLVVAVAIILVLVVQKKDSTPNTTEKAPLKGGNCSEDLFCTLKSTPSKKSWAYLQVSKHLNNTKLSWNEDGTIHGLIYQDGNLIVQFPGLYFIVCQLQFLVQCSNHSVDLTLQLLINSKIKKQTLVTVCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLDNVLSVFLYSSSD
By running the BLASTp tool, the following result is obtained:
BLAST
Basic Local Alignment Search Tool
Blast 2 sequences:
Protein Sequence (241 letters)
Query ID: lcl|25993
Description: None
Molecule type: amino acid
Query Length: 241
Subject ID: 25995
Description: None
Molecule type: amino acid
Subject Length: 239
Program: BLASTP 2.2.24+
Search parameter name Search parameter value
Program: blastp
Word size: 3
Expect value: 10
Hitlist size: 100
Gapcosts: 11,1
Matrix: BLOSUM62
Filter string: F
Genetic Code: 1
Window Size: 40
Threshold: 11
Composition-based stats: 2
Karlin-Altschul statistics
Params    Ungapped    Gapped
Lambda    0.317322    0.267
K         0.130185    0.041
H         0.38006     0.14
Results Statistics
Results Statistics parameter name Results Statistics parameter value
Effective search space 47088
Graphic Summary
Distribution of 2 Blast Hits on the Query Sequence
An overview of the database sequences aligned to the query sequence is shown. The score of each alignment is indicated by one of five different colors, which divides the range of scores into five groups. Multiple alignments on the same database sequence are connected by a striped line. Mousing over a hit sequence causes the definition and score to be shown in the window at the top, clicking on a hit sequence takes the user to the associated alignments.
Dot Matrix View
Plot of lcl|25993 vs 25995
This dot matrix view shows regions of similarity based upon the BLAST results. The query sequence is represented on the X-axis and the numbers represent the bases/residues of the query. The subject is represented on the Y-axis and again the numbers represent the bases/residues of the subject. Alignments are shown in the plot as lines. Plus strand and protein matches are slanted from the bottom left to the upper right corner, minus strand matches are slanted from the upper left to the lower right. The number of lines shown in the plot is the same as the number of alignments found by BLAST.
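To make the description of the dot matrix concrete, the sketch below derives the diagonal line segments for the first alignment reported in the Alignments section. It assumes that each ungapped run of aligned residues is drawn as one diagonal piece and that a gap shifts the line to a neighbouring diagonal; the BLAST page may render this differently. `diagonal_segments` is a hypothetical helper name.

```python
# Aligned rows of the best alignment (Query 184-238, Sbjct 168-233), both blocks joined.
QUERY_ROW = ("LRNKEIKEDS-----QKKKSARRIFQEIAQELYHKRHVETNQQSEKDN------NTYENR"
             "RVLSNY")
SBJCT_ROW = ("LINSKIKKQTLVTVCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLD"
             "NVLSVF")


def diagonal_segments(query_row, sbjct_row, q_start, s_start):
    """Split a gapped alignment into ungapped diagonal runs.

    Returns ((q_first, s_first), (q_last, s_last)) tuples in 1-based
    residue coordinates: x = query position, y = subject position.
    """
    segments = []
    q_pos, s_pos = q_start, s_start
    run_start, last = None, None
    for q_char, s_char in zip(query_row, sbjct_row):
        if q_char != "-" and s_char != "-":
            if run_start is None:
                run_start = (q_pos, s_pos)  # a new diagonal run begins
            last = (q_pos, s_pos)
        elif run_start is not None:
            segments.append((run_start, last))  # a gap ends the current run
            run_start = None
        # Advance only in the sequence(s) that contributed a residue.
        if q_char != "-":
            q_pos += 1
        if s_char != "-":
            s_pos += 1
    if run_start is not None:
        segments.append((run_start, last))
    return segments


segments = diagonal_segments(QUERY_ROW, SBJCT_ROW, q_start=184, s_start=168)
for start, end in segments:
    print(start, "->", end)
```

Each printed pair gives the end points of one diagonal piece, with the query position on the X-axis and the subject position on the Y-axis.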
Descriptions
Sequences producing significant alignments:
Accession   Description               Max score   Total score   Query coverage   E value
25995       unnamed protein product   21.9        38.5          36%              0.012
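The bit score and E value in this table can be related to the Karlin-Altschul parameters listed earlier. The sketch below assumes the standard relations S' = (lambda * S - ln K) / ln 2 and E = (effective search space) * 2^(-S'), applied with the gapped parameters and the raw score of 45 shown with the best alignment. The function names are illustrative, and BLAST's own calculation with composition-based statistics may differ in detail.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a bit score using lambda and K."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, search_space):
    """Expected number of chance alignments scoring at least this many bits."""
    return search_space * 2.0 ** (-bits)


# Values copied from the BLAST report (gapped Karlin-Altschul parameters).
LAMBDA_GAPPED = 0.267
K_GAPPED = 0.041
SEARCH_SPACE = 47088  # effective search space

bits = bit_score(45, LAMBDA_GAPPED, K_GAPPED)
e_value = expect_value(bits, SEARCH_SPACE)
print(f"{bits:.1f} bits, E = {e_value:.3f}")  # prints "21.9 bits, E = 0.012"
```

Under these assumptions the computed values agree with the Max score and E value reported for the best alignment.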
Alignments
>lcl|25995 unnamed protein product
Length=239
Score = 21.9 bits (45), Expect = 0.012, Method: Compositional matrix adjust.
Identities = 17/66 (26%), Positives = 29/66 (44%), Gaps = 11/66 (16%)
Query  184  LRNKEIKEDS-----QKKKSARRIFQEIAQELYHKRHVETNQQSEKDN------NTYENR  232
            L N +IK+ +     +    ++ I+Q ++Q L H   V +      DN      NT+
Sbjct  168  LINSKIKKQTLVTVCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLD  227

Query  233  RVLSNY  238
             VLS +
Sbjct  228  NVLSVF  233
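The identity and gap counts that BLAST prints for each alignment can be recounted directly from the aligned rows. The sketch below does this for both reported alignments; `alignment_stats` is a hypothetical helper. Positives are not recounted because they depend on the compositionally adjusted scoring matrix, which is not reproduced here.

```python
def alignment_stats(query_row, sbjct_row):
    """Return (identities, gaps, alignment length) for two aligned rows."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query_row, sbjct_row))
    # An identity is a column where both rows hold the same residue.
    identities = sum(q == s and q != "-" for q, s in pairs)
    # A gap column has a '-' in either row.
    gaps = sum(q == "-" or s == "-" for q, s in pairs)
    return identities, gaps, len(pairs)


# Best alignment (Score = 21.9 bits), both blocks joined.
first = alignment_stats(
    "LRNKEIKEDS-----QKKKSARRIFQEIAQELYHKRHVETNQQSEKDN------NTYENR" "RVLSNY",
    "LINSKIKKQTLVTVCESGVQSKNIYQNLSQFLLHYLQVNSTISVRVDNFQYVDTNTFPLD" "NVLSVF",
)

# Second alignment (Score = 16.5 bits).
second = alignment_stats(
    "RSFLTALVSLLSVYVTGVCVAFILLSKSKSN--------PLRNKEIKED",
    "RSYFYLSTTALVCLVVAVAIILVLVVQKKDSTPNTTEKAPLKGGNCSED",
)

for identities, gaps, length in (first, second):
    print(f"Identities = {identities}/{length}, Gaps = {gaps}/{length}")
```

The counts agree with the Identities and Gaps fractions in the BLAST output for both alignments.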
Score = 16.5 bits (31), Expect = 0.54, Method: Compositional matrix adjust.
Identities = 11/49 (23%), Positives = 19/49 (39%), Gaps = 8/49 (16%)
Query  152  RSFLTALVSLLSVYVTGVCVAFILLSKSKSN--------PLRNKEIKED  192
            RS+ + L V V + +L+ + + K PL+ ED
Sbjct  42   RSYFYLSTTALVCLVVAVAIILVLVVQKKDSTPNTTEKAPLKGGNCSED  90
Results and Discussion:
After comparing the two selected sequences using BLAST, the results show two local alignments between them. The best alignment scores 21.9 bits (E = 0.012) with 17/66 identities (26%), and together the alignments cover 36% of the query. This suggests that there may be a relationship between the two sequences and the bowel disease, although similarity at this level is weak evidence on its own.
For further research, additional sequences are required so that their results can be compared with those of the current sequences. If the additional sequences prove more relevant than the current ones, they can be said to be more closely associated with the inflammatory disease; if they prove less relevant, the current sequences are the more closely associated.