merge
method so that strands are correctly
removed and added to the strand collection when merging occurs.
After you've added this code correctly, you can test your code with the simplefasta.txt file which should create this strand:
tgaaaattcctttctattttaggcccatgcaatggcattagggcggttaaYou can also test with the largefasta.txt file which should run quickly. However, if you test with the hugefasta.txt file, you'll find that your program takes a long time to run. If you look at the code in ShotgunReconstructor you'll see that the threshold is hard-wired to be 12. Eventually you'll need to change this so that the value can be set in the main method from Shotgun.
You'll need to make two changes in this version of the program.
regionMatches
and/or other string methods
to determine if two strands overlap. Your goal is to make
to develop an implementation that is correct, but which yields
much faster shotgun reconstruction. The points/score you
earn on this part are based 80% on correctness and 20%
on the efficiency of your code (it should be faster).
You'll need to create a new IStrandFactory implementation as well, and configure the reading class appropriately.
contains
method always return false
and moving its functionality into the merge
method.
(This is a code-refactoring). In this case you'll want
to change code in
ShotgunReconstructor
too so that the reduce
method doesn't iterator
over all strands, but simply returns false.
This should save lots of time since you'll avoid iterating over
the strings twice: once for contains and once for merge. Since
the merge
method already looks for matches, you could
change it to look for containing matches too. In this case
you can return the containing strand rather than creating
a new one.
In your final README/write-up you should discuss if your refactoring was successful, and whether it reduced the time for completing a shotgun reconstruction.
IStrand
implementation that is based on
the Sequence returned when reading, this will avoid, perhaps, creating
strings for storing the DNA strands.
The new IStrand
implementation will store
the Sequence, and use it and other biojava objects rather than
Strings to do the merging and containing. You'll need to
examine base-pairs in a sequence one-at-a-time, you might want
to develop a method similar to the String method
regionMatches
for this purpose.