The Sorting Cheat Sheet

J. Forbes

This handout is intended as a very rough guide. It's for those of you who want the quick and dirty answers to what how does sorting algorithm "x" work and how does it compare to the other ones we've learned. If you want to really understand these algorithms, you should look at the code and notes from lecture.

For any particular sorting algorithm, we want to do it fast. In practice, constant factors and ease of implementation can be important in picking an algorithm. We'll assume for all of these algorithms that our input is presented in an array of length n.

AVERAGE CASE TIME
given an arbitrary input, what do we expect the running time to be
WORST CASE TIME
for a particular degenerate case, how bad will the algorithm perform
BEST CASE TIME
for a particularly benevolent input case, what is the best case performance. This will always be at least O(n). Why?
SITUATIONS WHERE USEFUL
for what inputs and applications, is this kind of sort useful

Basic sorts


The "basic" sorts generally make n passes through the input vector An important measure for any of these sorts is:
SITUATION AFTER ITH PASS
what the vector looks like after the going through the vector i times

Insertion sort

insert an as-yet-unprocessed record into a (sorted) list of the records processed so far

Shell's sort

The problem with insertion sort is that elements can only be moved one slot at a time. In the reverse order case, we had to move each element all the way down the vector each time. Shell's sort makes things better on average by sorting the vector with various "strides."

Selection sort

Just find the minimum of the remaining elements each time.

Bubble sort

On each pass, if two elements are out of order swap them. You're done when you can go through the entire vector with no swaps.

Divide and Conquer Sorts


Well, n2 or even n1.25 is good, but we can do better. In fact, we can prove that we can do better.
BASE CASE
the end of the recursion

DIVIDE STEP
splitting up the problem into more manageable pieces

Merge sort

Quick sort

An interesting result is that for any sort which relies on just comparisons, you will always have to do O(nlogn) comparisons in the worst case. Why is this?


"Restricted" Sorts


If we only rely on comparisons, our algorithm must take O(nlogn) worst case time. When we know more about our input, we can possibly make our bound O(n).
RESTRICTION
in what form do the input elements need to be.

K FACTOR
how does the form of the input elements affect running time

Distribution sort

Distribution sort is useful when the records to be sorted are in small range of integers or other cardinal values. So let's say our numbers are between U and L. Thus, we store our range (k) as U-L+1. Set counting vector of size k to all zeroes For each of n in original set (n of them) increment C[i] every time we see the number L+i. (O(1)) // C[i] now contains elements equal to L+i For each element in counting vector (k of them) C[j] += C[j-1] // C[i] now has contains the number of elements <= L+i // Copy elements from original vector A into new vector B into a place // spot designated by C[i] For each of n in original set (n of them) put element in new vector B from A[i] in the spot dictated by C[A[i]-L] since C[A[i]-L] is the number of elements less than A[i], that will be a proper spot for the sorted A[i] increment count vector to correspond with new position for next example of A[i]

Radix sort

For each digit going least significant to most significant: For all n numbers Place number in proper bin Distribution sort each bin Concatenate sorted results from each bin and repeat

Bucket sort

Bucket sort assumes that the input is generated by some random process which uniformly distributes the numbers over some interval. Divide interval into n equally sized buckets. for each element in original vector insert sorted into proper bucket O(1 + length of list in bucket) concatenate the lists from all of the buckets together

If we have an input of size n, we have n buckets. On average each bucket will have 1 element, so the length of the list in each bucket is 1. We can then do insertion sort for a particular bucket in constant time.


Jeff Forbes
Last modified: Tue Nov 4 12:11:58 EST 2003