Generally, three base pairs in DNA (a codon) code for one amino acid, and a chain of amino acids forms a protein (the process is complex, but the general idea is simple). To help build your nascent Java programming skills, you'll work on the process of generating the linear protein structure from a digital representation of DNA. Your task is to do this in a straightforward way, without using complicated Java data structures or code, basically using the language tools we've talked about and seen in class. This will involve some code drudgery, but we'll use this drudgery as a motivator for exploring better (and more exciting!) ways of solving the problem.
For example, if your program processes this strand:
CGATGCATCCCTTTAATTAA
it should return
HPFN
which represents the protein sequence Histidine, Proline, Phenylalanine, Asparagine.
You'll code by yourself for one week, then have a partner for one week. You'll need to turn in Part 1 working by yourself.
You must write a class named DnaToProtein with a method named convert that converts a String representing a strand of DNA to a String representing the first-found protein as described above. In brief, find the first start codon, find a stop codon after it, and convert the codons between these. If no start codon or no stop codon is found, return an empty string: "".
In implementing DnaToProtein you must write and use another class named CodonToProtein, which has been started for you. You'll need to complete the method convert shown in the code. The comments are intended to be enough of a specification; if you have questions, use the class bulletin board.
To find the start and stop codons you must use one of the two String methods named indexOf that search for one String in another. The specification for these can be found in the online javadoc for java.lang.String, which is part of the Javadoc for all Java classes. You'll need the two-parameter indexOf method to find a stop codon that occurs after the start codon.
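As a rough illustration of the two-parameter indexOf, the sketch below searches the example strand for a start codon and then for a stop codon after it. It assumes ATG is the start codon, treats TAA as the only stop codon, and assumes a stop codon only counts when it lies a whole number of codons past the start. The class name IndexOfSketch and the helper inFrameStop are made up for this example; your own design may well differ.

```java
class IndexOfSketch {
    // Hypothetical helper: index of the first "TAA" at or after `from` whose
    // distance from `from` is a multiple of three, or -1 if there is none.
    static int inFrameStop(String dna, int from) {
        int stop = dna.indexOf("TAA", from);
        while (stop != -1 && (stop - from) % 3 != 0) {
            // Out of frame: keep searching one position further along.
            stop = dna.indexOf("TAA", stop + 1);
        }
        return stop;
    }

    public static void main(String[] args) {
        String dna = "CGATGCATCCCTTTAATTAA";
        int start = dna.indexOf("ATG");               // one-parameter form
        int firstTaa = dna.indexOf("TAA", start + 3); // two-parameter form
        int stop = inFrameStop(dna, start + 3);
        System.out.println(start + " " + firstTaa + " " + stop);
    }
}
```

Running it prints 2 13 17: the first TAA after the start codon (index 13) is out of frame, so the loop keeps searching and settles on index 17, which leaves exactly the four codons CAT, CCC, TTT, and AAT in between.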
DnaToProteinTester has been started for you. You should fill in the main method with as many test cases as you think are needed to test your code (and to test anyone else's code). When testing the code you write to convert DNA to a protein, you can use the testing class as well as the class that actually does the conversion. Ultimately you'll want to run just the testing class, since it will thoroughly test your code every time it's run.
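One possible shape for such a tester is a table of strands paired with their expected proteins, checked in a loop. This is only a sketch: the class TesterSketch, the Converter interface, and the method countFailures are hypothetical names, and the interface merely stands in for whatever class is under test (in your tester you would call your DnaToProtein code directly). The expected values come from the example strand and from the rule that a missing start or stop codon yields "".

```java
class TesterSketch {
    // Hypothetical stand-in for the class under test.
    interface Converter {
        String convert(String dna);
    }

    // Runs every {strand, expected protein} pair and returns how many failed.
    static int countFailures(Converter converter, String[][] cases) {
        int failures = 0;
        for (String[] testCase : cases) {
            String actual = converter.convert(testCase[0]);
            if (!testCase[1].equals(actual)) {
                failures++;
                System.out.println("FAIL \"" + testCase[0] + "\": expected \""
                        + testCase[1] + "\" but got \"" + actual + "\"");
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"CGATGCATCCCTTTAATTAA", "HPFN"}, // example strand from the handout
            {"", ""},                          // empty strand
            {"CCCCCC", ""},                    // no start codon
            {"ATGCATCCC", ""},                 // start codon but no stop codon
        };
        // A deliberately wrong converter, to show what a failure report looks like.
        Converter alwaysEmpty = dna -> "";
        System.out.println(countFailures(alwaysEmpty, cases) + " failure(s)");
    }
}
```

Keeping the cases in one array makes it cheap to add a new strand whenever you think of an edge case, and the same table can be pointed at anyone else's implementation.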
After you've tested your program and are confident that it works, you should change the signature of the method convert in CodonToProtein. Currently it takes a String parameter, which means you'll have used the substring method of the String class to extract codon triplets. As we'll discuss, this isn't very efficient, and for large strands of DNA the inefficiency could be important. Instead, the code in CodonToProtein should be rewritten to use the String method regionMatches. You'll need to pass the entire String (the DNA strand) to the convert method in CodonToProtein, along with the index at which the codon being checked starts.
The general idea is to avoid creating substrings in the DnaToProtein code you write. Instead, pass the entire String and an index for each codon converted in the process of converting a region of DNA to protein. As your code in DnaToProtein loops over the codons between the start and stop codons, the index passed will change, but the DNA strand passed remains the same.
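To make the index-passing idea concrete, here is a minimal sketch built on regionMatches. It is not the CodonToProtein starter code: the class name RegionMatchSketch, the char return type, the '?' fallback, and the deliberately tiny codon table (only the four codons in the example strand) are all assumptions made for illustration.

```java
class RegionMatchSketch {
    // Partial table for illustration only: the codons in the example strand.
    private static final String[] CODONS = {"CAT", "CCC", "TTT", "AAT"};
    private static final String AMINO_ACIDS = "HPFN";

    // Returns the one-letter code for the codon that starts at `index` in
    // `dna`, or '?' if that codon is not in the partial table above.
    static char convert(String dna, int index) {
        for (int k = 0; k < CODONS.length; k++) {
            // Compares three characters of dna in place; no substring is created.
            if (dna.regionMatches(index, CODONS[k], 0, 3)) {
                return AMINO_ACIDS.charAt(k);
            }
        }
        return '?';
    }

    public static void main(String[] args) {
        String dna = "CGATGCATCCCTTTAATTAA";
        StringBuilder protein = new StringBuilder();
        // In this strand the codons to convert start at index 5 and end before index 17.
        for (int i = 5; i < 17; i += 3) {
            protein.append(convert(dna, i)); // same strand, changing index
        }
        System.out.println(protein); // prints HPFN
    }
}
```

Note that regionMatches simply returns false when the requested region runs past the end of the strand, so an index near the end does not throw an exception.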
For more extra credit you'll write and test new methods to find all proteins.
For A+/Extra credit, create a new method in DnaToProtein that returns all the proteins found in a string of DNA. This means you'll need to find every start codon and its corresponding stop codon and convert the codon regions to proteins. Your new method should return an array of String objects, where the String stored at index zero is the first protein found, the String stored at index one is the second protein found, and so on. If no proteins are found, return an array with no elements. Your new method should be named convertAll; its signature follows.
convert method in DnaToProtein that you've already written and tested. In addition to writing the convertAll code you'll need to write testing code. You should add a new series of tests in the class DnaToProteinTester. To facilitate passing an array representing the correct proteins, see the valid Java code fragment that follows for testing a DNA strand with no proteins and one with two proteins.
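As a sketch of the idea (not the course's own fragment), expected results can be written inline as anonymous arrays and compared with java.util.Arrays.equals. The class ConvertAllTestSketch and the helper sameProteins are hypothetical, and the arrays named none and two are hard-coded stand-ins for whatever your convertAll would return.

```java
import java.util.Arrays;

class ConvertAllTestSketch {
    // True when both arrays hold equal Strings in the same order.
    static boolean sameProteins(String[] expected, String[] actual) {
        return Arrays.equals(expected, actual);
    }

    public static void main(String[] args) {
        // Stand-ins for convertAll results; in a real test these would come
        // from calling convertAll on a strand.
        String[] none = new String[0];
        String[] two = {"H", "P"};

        // Expected values passed inline as anonymous arrays.
        System.out.println(sameProteins(new String[] {}, none));        // true
        System.out.println(sameProteins(new String[] {"H", "P"}, two)); // true
        System.out.println(sameProteins(new String[] {"H"}, two));      // false
    }
}
```

Comparing with == would only check whether the two variables refer to the same array object, which is why the sketch relies on Arrays.equals for an element-by-element comparison.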
Your convertAll code should not create any substrings. Don't worry about this part at first. You should write and test convertAll and, when you're confident it works properly, try to think of a refactoring that avoids substring creation.
Submit DnaToProtein, DnaToCodon, and DnaToProteinTester. Grading criteria follow.
Component | Points | Criteria |
---|---|---|
First Code | | |
DnaToProtein | 10 points | Works correctly (7 points), well-structured and well-written (3 points) |
DnaToCodon | 4 points | All cases covered, well-structured |
testing | 10 points | Thorough and complete |
README | 2 points | Exists and complete |
Refactored Code | | |
DnaToProtein | 6 points | Works correctly (4 points), well-structured and well-written (2 points) |
DnaToCodon | 2 points | All cases covered, well-structured |
README | 2 points | Exists and complete |
Extra Credit | | |
DnaToProtein/all | 10 points | Works correctly (7 points), well-structured and well-written (3 points) |
testing | 10 points | Thorough and complete |
testing breaks other code | +2 per valid test | If a valid test causes other code to fail |