Compsci 6, Timing FUN II, March 2

Name____________________   net-id _________       

Name____________________   net-id _________       

Name____________________   net-id _________       

GCTiming

The questions here are about the code in GCTiming.py in which two algorithms are used to calculate the largest gc-window in a strand of DNA (a likely protein-coding region).

The first data is for function max_index, a plot and fit from Excel is shown for the data on the right. The data is generated using a fixed strand size (10000 nucleotides) and varying the windowsize as shown.

timings strand size = 10000 function = <function max_index > win = 1000 index = 3565 time=1.525 win = 1100 index = 3486 time=1.701 win = 1200 index = 3567 time=1.857 win = 1300 index = 3218 time=1.955 win = 1400 index = 3166 time=2.055 win = 1500 index = 2897 time=2.214 win = 1600 index = 2952 time=2.322 win = 1700 index = 2881 time=2.425 win = 1800 index = 2941 time=2.750 win = 1900 index = 2950 time=2.639 win = 2000 index = 2397 time=3.055 win = 2100 index = 2413 time=3.237 win = 2200 index = 2385 time=3.018 win = 2300 index = 2285 time=3.163 win = 2400 index = 2181 time=3.129 win = 2500 index = 2378 time=3.387 win = 2600 index = 2182 time=3.311 win = 2700 index = 2182 time=3.396 win = 2800 index = 2414 time=3.475 win = 2900 index = 2414 time=3.524 win = 3000 index = 2316 time=3.609

  1. Which is the best explanation as to why increasing the window-size increases the running time (see function max_index for coding details).

    1. The loop over range(0,len(strand)-windowSize+1)) will iterate more times as the value of windowSize increases.

    2. The time to execute the function call cgratio(strand[i:i+windowSize]) increases as the value of windowSize increases.

    3. The time to pass the value of parameter windowSize to the function max_index increases as the value of windowSize increases.

  2. If the window size is fixed, e.g., at 1000 and the size of the strand increases from 10,000 to 80,000, what do you expect to happen to the running time of max_index?

    1. The running time will increase linearly, e.g., as the strand size doubles the running time will double.

    2. The running time will increas quadratically, e.g., as the strand size doubles the running time will quadruple.

    3. The running time depends on the window-size and not on the strand size, so the running time will remain constant.


The timings below are for a fixed strand size of 10,000 as the window-size varies for the function running_max.

function = <function running_max > win = 1000 index = 3565 time=0.008 win = 1100 index = 3486 time=0.005 win = 1200 index = 3567 time=0.006 win = 1300 index = 3218 time=0.005 win = 1400 index = 3166 time=0.006 win = 1500 index = 2897 time=0.005 win = 1600 index = 2952 time=0.005 win = 1700 index = 2881 time=0.006 win = 1800 index = 2941 time=0.005 win = 1900 index = 2950 time=0.005 win = 2000 index = 2397 time=0.006 win = 2100 index = 2413 time=0.005 win = 2200 index = 2385 time=0.007 win = 2300 index = 2285 time=0.005 win = 2400 index = 2181 time=0.005 win = 2500 index = 2378 time=0.006 win = 2600 index = 2182 time=0.005 win = 2700 index = 2182 time=0.005 win = 2800 index = 2414 time=0.005 win = 2900 index = 2414 time=0.006 win = 3000 index = 2316 time=0.005

  1. Which best describes the value stored in counters[i]?

    1. The number of occurrences of 'g' or 'c' in range(0,i).

    2. The number of occurrences of 'g' or 'c' in range(0,i+1).

    3. The number of occurrences of 'g' or 'c' in range(i,len(strand)).

  2. Which best explains why the running time is roughly constant?

    1. The running time of the two loops in running_max do not depend on the value of windowSize at all.

    2. The first loop in running_max does not depend on the value of windowSize and the second loop does roughly the same amount of "work" as the first loop even though its running time will depend to some extent on the value of windowSize

    3. The running time is not constant, but it's so small that the timings don't truly show the variation, that's why the times go up and down as the window size increases.

    4. True or False: the running time of running_max will increase linearly with a fixed window-size (e.g., 1000) as the size of the strand varies (e.g., from 10,000 to 100,000).