CompSci 18S – Disease, DNA, and Proteins – FALL 2009


Classwork 12: 10 pts

Disease Hunting

You are a computational disease detective. At your disposal are sequencers, alignment tools (BLAST), and knowledge of sickle-cell anemia. You want to figure out whether your patient has the sickle-cell mutation.

PART ONE: Hunting Down the Gene--starting at 3 billion base pairs

From what you have learned in class, a change in a single nucleotide can change everything. The codon that normally codes for glutamate instead codes for valine. The result is a drastic change from a normal-looking red blood cell to a sickle-shaped red blood cell. Let's try to find the gene.

You, the investigator, currently have only two pieces of data. You know (a) that sickle cell is a point mutation that affects red blood cells, and (b) that the following sequence:

tgggggatat tatgaagggc cttgagcatt tggattctgc

is located within the gene that sickle cell affects. You do not know which gene (or protein) is affected, only that it is found in red blood cells.

To figure out what gene you need to target, blast the sequence by going to:

http://www.ncbi.nlm.nih.gov/BLAST

and click on “nucleotide blast”, then click on the blastn tab, and under Database choose Reference mRNA sequences. Then click BLAST! It may take a while, since many requests are being made at any time. Look at the top hit that is human (Homo sapiens).

 

 

What does this gene code for? _______________________________________________________________________

 

Notice that this sequence is mRNA, which is ultimately translated into protein. Notice also that the BLAST results give you the amino acid sequence. Write down the first 15 amino acids from your search.

First 15 amino acids: ____________________________________

PART TWO: Hunting Down the Mutation--searching strings by hand

In sickle-cell anemia, a glutamate is replaced by a valine. Notice that the DNA sequence below corresponds to a stretch of the 15 amino acids you wrote above.

Your job is to find out whether a valine occurs in your patient's DNA. Say your patient has the following sequence:

CAC CTG ACT CCT GTG GAG AAG TCT GCC

Using the table below, translate the given DNA sequence into amino acids and write them down. Your goal is simply to see whether a valine appears in the sequence.

The Genetic Code (DNA)

TTT Phe TCT Ser TAT Tyr TGT Cys
TTC Phe TCC Ser TAC Tyr TGC Cys
TTA Leu TCA Ser TAA STOP TGA STOP
TTG Leu TCG Ser TAG STOP TGG Trp
CTT Leu CCT Pro CAT His CGT Arg
CTC Leu CCC Pro CAC His CGC Arg
CTA Leu CCA Pro CAA Gln CGA Arg
CTG Leu CCG Pro CAG Gln CGG Arg
ATT Ile ACT Thr AAT Asn AGT Ser
ATC Ile ACC Thr AAC Asn AGC Ser
ATA Ile ACA Thr AAA Lys AGA Arg
ATG Met* ACG Thr AAG Lys AGG Arg
GTT Val GCT Ala GAT Asp GGT Gly
GTC Val GCC Ala GAC Asp GGC Gly
GTA Val GCA Ala GAA Glu GGA Gly
GTG Val GCG Ala GAG Glu GGG Gly
*Within a gene, ATG codes for Met; at the beginning of a gene, ATG also signals the start of translation.
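If you want to check your hand translation, the table can be turned into a lookup map. The Java sketch below is only an illustration, not part of the assignment: the class TranslateByHand and its methods are hypothetical names, the rows are copied from the table above (with Met written without the asterisk), and it assumes uppercase codons separated by spaces, as in the patient sequence.

```java
import java.util.HashMap;
import java.util.Map;

public class TranslateByHand {
    // Rows copied from the genetic code table above: codon, name, codon, name, ...
    private static final String[] TABLE_ROWS = {
        "TTT Phe TCT Ser TAT Tyr TGT Cys",
        "TTC Phe TCC Ser TAC Tyr TGC Cys",
        "TTA Leu TCA Ser TAA STOP TGA STOP",
        "TTG Leu TCG Ser TAG STOP TGG Trp",
        "CTT Leu CCT Pro CAT His CGT Arg",
        "CTC Leu CCC Pro CAC His CGC Arg",
        "CTA Leu CCA Pro CAA Gln CGA Arg",
        "CTG Leu CCG Pro CAG Gln CGG Arg",
        "ATT Ile ACT Thr AAT Asn AGT Ser",
        "ATC Ile ACC Thr AAC Asn AGC Ser",
        "ATA Ile ACA Thr AAA Lys AGA Arg",
        "ATG Met ACG Thr AAG Lys AGG Arg",
        "GTT Val GCT Ala GAT Asp GGT Gly",
        "GTC Val GCC Ala GAC Asp GGC Gly",
        "GTA Val GCA Ala GAA Glu GGA Gly",
        "GTG Val GCG Ala GAG Glu GGG Gly"
    };

    // Build a codon -> three-letter amino acid map (64 entries, stops included).
    public static Map<String, String> buildCodonMap() {
        Map<String, String> map = new HashMap<>();
        for (String row : TABLE_ROWS) {
            String[] parts = row.split(" ");
            for (int i = 0; i < parts.length; i += 2) {
                map.put(parts[i], parts[i + 1]);
            }
        }
        return map;
    }

    // Translate space-separated codons; unknown codons come out as "???".
    public static String translate(String codonsWithSpaces, Map<String, String> map) {
        StringBuilder sb = new StringBuilder();
        for (String codon : codonsWithSpaces.trim().split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(map.getOrDefault(codon, "???"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = buildCodonMap();
        String patient = "CAC CTG ACT CCT GTG GAG AAG TCT GCC";
        System.out.println(translate(patient, map));
    }
}
```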

amino acid sequence for codons above: ______________________________________

Does your patient's sequence contain a valine (and thus the mutation for sickle cell)? YES / NO

What is the index of this codon within your DNA sequence? __________

 

PART THREE: Writing Code to Search for Amino Acids

Searching a sequence for one type of amino acid can be tedious by hand. Let's write a program to do the same thing.

Given a strand of DNA and a query amino acid symbol, return the index of the first location of a codon in the strand that codes for that amino acid. Return -1 if no codon in the strand codes for the amino acid.

1. You will store the codons and their corresponding amino acids in two parallel String arrays.

List of codons:
	String codons [] = {"TTT",  "TTC",  "TTA",  "TTG",  "TCT",  "TCC",  "TCA",  "TCG",  
	"TAT",  "TAC",  "TGT",  "TGC",  "TGG",  "CTT",  "CTC",  "CTA",  
	"CTG",  "CCT",  "CCC",  "CCA",  "CCG",  "CAT",  "CAC",  "CAA",  
	"CAG",  "CGT",  "CGC",  "CGA",  "CGG",  "ATT",  "ATC",  "ATA",  
	"ATG",  "ACT",  "ACC",  "ACA",  "ACG",  "AAT",  "AAC",  "AAA",  
	"AAG",  "AGT",  "AGC",  "AGA",  "AGG",  "GTT",  "GTC",  "GTA",  
	"GTG",  "GCT",  "GCC",  "GCA",  "GCG",  "GAT",  "GAC",  "GAA",  
	"GAG",  "GGT",  "GGC",  "GGA",  "GGG"};
    
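Corresponding amino acids (one-letter codes) for each codon above, in the same order. The three stop codons are left out, so both arrays have 61 entries: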

	String aas [] = {"F", "F", "L", "L", "S", "S", "S", "S", 
	"Y", "Y", "C", "C", "W", "L", "L", "L", 
	"L", "P", "P", "P", "P", "H", "H", "Q", 
	"Q", "R", "R", "R", "R", "I", "I", "I", 
	"M", "T", "T", "T", "T", "N", "N", "K", 
	"K", "S", "S", "R", "R", "V", "V", "V", 
	"V", "A", "A", "A", "A", "D", "D", "E", 
	"E", "G", "G", "G", "G"};

Thus, the two arrays will look like this:

index    0      1      2      3      4      . . .
codons   "TTT"  "TTC"  "TTA"  "TTG"  "TCT"  . . .
aas      "F"    "F"    "L"    "L"    "S"    . . .
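The parallel-array idea can be sketched in a few lines of Java. The class ParallelArrayDemo below is hypothetical and uses only the five entries shown in the diagram: looking up a codon means finding its index in codons and reading the same index in aas.

```java
public class ParallelArrayDemo {
    // Only the first five entries from the diagram; the full arrays have 61.
    static final String[] CODONS = {"TTT", "TTC", "TTA", "TTG", "TCT"};
    static final String[] AAS    = {"F",   "F",   "L",   "L",   "S"};

    // Linear search for the codon, then read the same index in AAS.
    // Returns null if the codon is not in this partial table.
    public static String lookup(String codon) {
        for (int i = 0; i < CODONS.length; i++) {
            if (CODONS[i].equals(codon)) {
                return AAS[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(lookup("TTA")); // prints L
        System.out.println(lookup("TCT")); // prints S
    }
}
```

A linear search over 61 entries is fast enough for this exercise; a HashMap from codon to amino acid would be the usual alternative for larger tables.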

 

2. The general structure of the code is as follows:

public class ProteinLocater {
    public int find(String strand, String aa) {
        // fill in code here
        return -1; // placeholder so the skeleton compiles
    }
}

Your job is to fill in the find method. Make sure you keep the following constraints in mind:

Constraints

2. What you need to do:

3. Tips and tricks:
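One possible way to fill in find is sketched below. It is an illustration under stated assumptions, not the official solution: it assumes the strand is uppercase and is read in frame from position 0 (codons start at 0, 3, 6, ...), and that the answer is the character index where the matching codon starts. If the assignment counts codons instead, divide the result by 3. The helper aminoAcidFor is a hypothetical name.

```java
public class ProteinLocater {
    // Parallel arrays from step 1: codons[i] codes for aas[i].
    // The three stop codons (TAA, TAG, TGA) are not listed.
    private final String[] codons = {"TTT", "TTC", "TTA", "TTG", "TCT", "TCC", "TCA", "TCG",
        "TAT", "TAC", "TGT", "TGC", "TGG", "CTT", "CTC", "CTA",
        "CTG", "CCT", "CCC", "CCA", "CCG", "CAT", "CAC", "CAA",
        "CAG", "CGT", "CGC", "CGA", "CGG", "ATT", "ATC", "ATA",
        "ATG", "ACT", "ACC", "ACA", "ACG", "AAT", "AAC", "AAA",
        "AAG", "AGT", "AGC", "AGA", "AGG", "GTT", "GTC", "GTA",
        "GTG", "GCT", "GCC", "GCA", "GCG", "GAT", "GAC", "GAA",
        "GAG", "GGT", "GGC", "GGA", "GGG"};
    private final String[] aas = {"F", "F", "L", "L", "S", "S", "S", "S",
        "Y", "Y", "C", "C", "W", "L", "L", "L",
        "L", "P", "P", "P", "P", "H", "H", "Q",
        "Q", "R", "R", "R", "R", "I", "I", "I",
        "M", "T", "T", "T", "T", "N", "N", "K",
        "K", "S", "S", "R", "R", "V", "V", "V",
        "V", "A", "A", "A", "A", "D", "D", "E",
        "E", "G", "G", "G", "G"};

    // Hypothetical helper: the one-letter amino acid for a codon,
    // or null if the codon is not in the table (stop codon or bad input).
    private String aminoAcidFor(String codon) {
        for (int i = 0; i < codons.length; i++) {
            if (codons[i].equals(codon)) {
                return aas[i];
            }
        }
        return null;
    }

    // ASSUMPTION: read in frame from position 0 and return the character
    // index where the first matching codon starts, or -1 if there is none.
    public int find(String strand, String aa) {
        for (int start = 0; start + 3 <= strand.length(); start += 3) {
            String codon = strand.substring(start, start + 3);
            if (aa.equals(aminoAcidFor(codon))) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ProteinLocater locater = new ProteinLocater();
        // The Part Two codons with the spaces removed.
        String patient = "CACCTGACTCCTGTGGAGAAGTCTGCC";
        System.out.println(locater.find(patient, "V")); // prints 12
        System.out.println(locater.find(patient, "W")); // prints -1
    }
}
```

Scanning only in-frame positions avoids matching a codon that straddles two real codons; for example, in "AGTGTT" the off-frame "GTG" at index 1 is skipped and the in-frame "GTT" at index 3 is reported.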

 

TEST APT




Tiffany Chen
APT taken from here

Information about sickle cell:

http://www.carnegieinstitution.org/first_light_case/horn/lessons/sickle.html

http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/C/Codons.html
Last Revised: 26 Nov 2007