As you have learned in class, a single nucleotide change can change everything. The codon that usually codes for glutamate is instead changed to one that codes for valine. The result is a drastic change from a normal-looking red blood cell to a sickle-shaped red blood cell. Let's try to find the gene.
You, the investigator, currently have only two pieces of data. You know that (a) sickle cell is caused by a point mutation that affects red blood cells, and that (b) the following sequence:
tgggggatat tatgaagggc cttgagcatt tggattctgc
is located within the gene that sickle cell affects. You do not know where the sickle cell mutation occurs (e.g. in which protein), only that it is found in the red blood cell.
To figure out which gene you need to target, BLAST the sequence by going to:
http://www.ncbi.nlm.nih.gov/BLAST
and click on “nucleotide blast”. Then click on the blastn tab, and under Database choose Reference mRNA sequences.
Then click BLAST! It may take a while, since many requests are being made at any time. Look at the top hit that is human (Homo sapiens).
What does this gene code for? _______________________________________________________________________
Notice that this sequence is mRNA, which is ultimately translated into protein. Notice also that the BLAST result supplies you with the amino acid sequence. Write down the first 15 amino acids from your search.
First 15 amino acids: ____________________________________
In sickle cell, a glutamate is replaced by a valine. Notice that the first string below is a substring of the 15 amino acids you wrote above.
Your job is to hunt down whether or not a valine exists within your DNA code. Say your patient has the following sequence:
CAC CTG ACT CCT GTG GAG AAG TCT GCC
Your goal is simply to see if a valine exists within your sequence. Write down the amino acids below for the given DNA sequence. Use the table below.
TTT | Phe | TCT | Ser | TAT | Tyr | TGT | Cys |
TTC | Phe | TCC | Ser | TAC | Tyr | TGC | Cys |
TTA | Leu | TCA | Ser | TAA | STOP | TGA | STOP |
TTG | Leu | TCG | Ser | TAG | STOP | TGG | Trp |
CTT | Leu | CCT | Pro | CAT | His | CGT | Arg |
CTC | Leu | CCC | Pro | CAC | His | CGC | Arg |
CTA | Leu | CCA | Pro | CAA | Gln | CGA | Arg |
CTG | Leu | CCG | Pro | CAG | Gln | CGG | Arg |
ATT | Ile | ACT | Thr | AAT | Asn | AGT | Ser |
ATC | Ile | ACC | Thr | AAC | Asn | AGC | Ser |
ATA | Ile | ACA | Thr | AAA | Lys | AGA | Arg |
ATG | Met* | ACG | Thr | AAG | Lys | AGG | Arg |
GTT | Val | GCT | Ala | GAT | Asp | GGT | Gly |
GTC | Val | GCC | Ala | GAC | Asp | GGC | Gly |
GTA | Val | GCA | Ala | GAA | Glu | GGA | Gly |
GTG | Val | GCG | Ala | GAG | Glu | GGG | Gly |
Amino acid sequence for the codons above: ______________________________________
Does your patient contain a valine (and thus the mutation for sickle cell)? YES / NO
What is the index of this codon within your DNA sequence? __________
Searching a sequence for one type of amino acid can be tedious by hand. Let's write a program to do the same thing.
Given a strand of DNA and a query amino-acid symbol, return the index of the first location of a codon in the strand that codes for the amino-acid. Return -1 if there is no codon that codes for the amino acid. Take a look at the assignment here.
1. You are going to store the codons and their corresponding amino acids into two String arrays.
List of codons:
String codons [] = {"TTT", "TTC", "TTA", "TTG", "TCT", "TCC", "TCA", "TCG", "TAT", "TAC", "TGT", "TGC", "TGG", "CTT", "CTC", "CTA", "CTG", "CCT", "CCC", "CCA", "CCG", "CAT", "CAC", "CAA", "CAG", "CGT", "CGC", "CGA", "CGG", "ATT", "ATC", "ATA", "ATG", "ACT", "ACC", "ACA", "ACG", "AAT", "AAC", "AAA", "AAG", "AGT", "AGC", "AGA", "AGG", "GTT", "GTC", "GTA", "GTG", "GCT", "GCC", "GCA", "GCG", "GAT", "GAC", "GAA", "GAG", "GGT", "GGC", "GGA", "GGG"};
Corresponding amino acids for each codon above:
String aas [] = {"F", "F", "L", "L", "S", "S", "S", "S", "Y", "Y", "C", "C", "W", "L", "L", "L", "L", "P", "P", "P", "P", "H", "H", "Q", "Q", "R", "R", "R", "R", "I", "I", "I", "M", "T", "T", "T", "T", "N", "N", "K", "K", "S", "S", "R", "R", "V", "V", "V", "V", "A", "A", "A", "A", "D", "D", "E", "E", "G", "G", "G", "G"};
Thus, the two arrays will look like this:
index | 0 | 1 | 2 | 3 | 4 | . . . |
codons | "TTT" | "TTC" | "TTA" | "TTG" | "TCT" | . . . |
a.a.s | "F" | "F" | "L" | "L" | "S" | . . . |
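The two parallel arrays above can be read as a lookup table: find the position of a codon in codons, then read the same position in aas. The following is a minimal sketch of that idea, not the assignment's required solution. The class name CodonLookup and the helper aminoAcidFor are hypothetical names chosen for illustration, and returning null for a codon that is not in the list (such as a stop codon) is an assumption.

```java
public class CodonLookup {
    // Parallel arrays copied from the assignment: aas[i] is the amino acid for codons[i].
    // The three stop codons (TAA, TAG, TGA) do not appear in the list.
    static final String[] codons = {"TTT", "TTC", "TTA", "TTG", "TCT", "TCC", "TCA", "TCG",
        "TAT", "TAC", "TGT", "TGC", "TGG", "CTT", "CTC", "CTA", "CTG", "CCT", "CCC", "CCA",
        "CCG", "CAT", "CAC", "CAA", "CAG", "CGT", "CGC", "CGA", "CGG", "ATT", "ATC", "ATA",
        "ATG", "ACT", "ACC", "ACA", "ACG", "AAT", "AAC", "AAA", "AAG", "AGT", "AGC", "AGA",
        "AGG", "GTT", "GTC", "GTA", "GTG", "GCT", "GCC", "GCA", "GCG", "GAT", "GAC", "GAA",
        "GAG", "GGT", "GGC", "GGA", "GGG"};
    static final String[] aas = {"F", "F", "L", "L", "S", "S", "S", "S",
        "Y", "Y", "C", "C", "W", "L", "L", "L", "L", "P", "P", "P",
        "P", "H", "H", "Q", "Q", "R", "R", "R", "R", "I", "I", "I",
        "M", "T", "T", "T", "T", "N", "N", "K", "K", "S", "S", "R",
        "R", "V", "V", "V", "V", "A", "A", "A", "A", "D", "D", "E",
        "E", "G", "G", "G", "G"};

    // Hypothetical helper (not part of the assignment's required method):
    // returns the one-letter amino-acid symbol for a codon,
    // or null if the codon is not in the codons array (an assumption).
    static String aminoAcidFor(String codon) {
        for (int i = 0; i < codons.length; i++) {
            if (codons[i].equals(codon)) {   // compare String contents, not references
                return aas[i];               // same index in the parallel array
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(aminoAcidFor("GAG")); // E (glutamate)
        System.out.println(aminoAcidFor("GTG")); // V (valine)
        System.out.println(aminoAcidFor("TAA")); // null (stop codon, not in the arrays)
    }
}
```

Note that the strings are compared with equals rather than ==, since == on two String objects checks whether they are the same object, not whether they hold the same letters.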
2. The general body of code will be as follows:
Parameters: String, String
Returns: int
Method signature: public int find(String strand, String aa) (be sure your method is public)
public class ProteinLocater {
    public int find(String strand, String aa) {
        // fill in code here
    }
}
Your job is to fill in find. Make sure you keep the following constraints in mind:
2. What you need to do:
3. Tips and tricks:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
A | B | C | D | E | F | G | H | I | J | K |
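The index table above appears to illustrate that positions in a Java String start at 0. As a sketch under that reading, the example below uses the letters A through K to show charAt, substring, and indexOf, and then applies the same indexing to cut a strand into three-letter codons. The class name CodonWalk and the helper splitIntoCodons are hypothetical names for illustration. Starting the reading frame at index 0 and ignoring one or two leftover bases at the end are assumptions, not requirements stated in the assignment.

```java
import java.util.ArrayList;
import java.util.List;

public class CodonWalk {
    // Hypothetical helper: splits a strand into consecutive 3-letter codons.
    // Assumes the reading frame starts at index 0; any 1 or 2 leftover
    // bases at the end of the strand are ignored.
    static List<String> splitIntoCodons(String strand) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 3 <= strand.length(); i += 3) {
            // substring(begin, end) includes begin and excludes end
            result.add(strand.substring(i, i + 3));
        }
        return result;
    }

    public static void main(String[] args) {
        String letters = "ABCDEFGHIJK";                  // indices 0 through 10
        System.out.println(letters.charAt(0));           // A
        System.out.println(letters.substring(3, 6));     // DEF
        System.out.println(letters.indexOf("DEF"));      // 3
        System.out.println(splitIntoCodons("ATGGTGAA")); // [ATG, GTG]
    }
}
```

In the loop, i is the position of the first base of each codon, so i / 3 gives the codon's number counting from 0. Which of the two the assignment means by "index" should be checked against its examples.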
Tiffany Chen
APT taken from here
information about sickle cell:
http://www.carnegieinstitution.org/first_light_case/horn/lessons/sickle.html
http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/C/Codons.html
Last Revised: 26 Nov 2007