Randomized Algorithms

 

* The minimum cut problem  (Chapters 1.1, 10.2)

  A simple, fast Monte Carlo algorithm

 

===========================================================================

 

THE MINIMUM CUT PROBLEM

 

Given a graph G, a CUT is a set of edges whose removal splits the

graph into at least two connected components.  The MINIMUM CUT is the

cut of minimum size.  For instance, say this graph represents a

network, and we want to know how "robust" it is, in the sense of

the minimum number of links whose failure causes the network to become

disconnected.   The minimum cut problem is to find a cut of minimum size.

 

Easy fact: the minimum cut has size at most the minimum degree of any

node, since the edges incident to a single node form a cut.

 

You're probably more familiar with the "s-t minimum cut" problem,

where you want a cut that separates two specific vertices s and t,

since this is the dual of the max flow problem.  In fact, for a while,

the best algorithm for finding the global minimum cut was to solve the

s-t min cut problem n-1 times (fix s, and try every other vertex as t).

 

Here's a really simple randomized algorithm:  [due to D. Karger]

 

THE SIMPLE ALGORITHM  (view #1)

 

  1. Pick an edge (x,y) at random in G.

 

  2. CONTRACT the edge, keeping multi-edges, but removing self-loops.

 

     (i.e., if there were edges (x,v) and (y,v), we keep both of them.)

 

  3. If there are more than 2 nodes, go back to 1.  Else, output the

     edges remaining as your cut.

 

  (Note: step 1 is NOT the same as picking two connected vertices x,y

     at random.  Why?  A random edge picks the pair (x,y) with

     probability proportional to the number of edges between them.)

 

 

Do example on graph:     *-------*

                        /|       |\

                       / |       | \

                      *  |       |  *

                       \ |       | /

                        \|       |/

                         *-------*

 

Easy algorithm, want to show it has a reasonable (1/n^2) chance of working.

(So, run n^2 * log n times to get success w.h.p.)
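
To make the procedure concrete, here is a minimal sketch in Python (my
illustration, not from the notes; it assumes vertices labeled 0..n-1,
an edge list that may contain parallel edges, and a connected graph):

    import random

    def karger_min_cut(n, edges):
        """One run of the contraction algorithm (illustrative sketch).
        Returns the multi-edges crossing the final two super-nodes."""
        leader = list(range(n))           # leader[v] = super-node holding v

        def find(v):                      # follow merge pointers
            while leader[v] != v:
                leader[v] = leader[leader[v]]
                v = leader[v]
            return v

        live = list(edges)                # edges between distinct super-nodes
        for _ in range(n - 2):
            x, y = random.choice(live)    # step 1: uniformly random edge
            leader[find(x)] = find(y)     # step 2: contract it
            # keep parallel edges, drop self-loops created by the merge
            live = [e for e in live if find(e[0]) != find(e[1])]
        return live                       # step 3: surviving edges = the cut

    # amplification: keep the smallest cut over ~ n^2 log n runs, e.g.
    # best = min((karger_min_cut(n, edges) for _ in range(runs)), key=len)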

 

EQUIVALENT FORMULATION  (view #2):

 

   1. Randomly rank the edges.

 

   2. Use Kruskal's algorithm to find the minimum spanning tree under

     this ranking (i.e., start with the lightest edge, then put in the

     next lightest that connects two different components, etc.)

 

 

   3. Remove the last tree edge added (equivalently, stop one merge

      early).  This gives you the two components.  (A sketch of this

      view follows below.)
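
A minimal sketch of this view (again my illustration; shuffling stands
in for the random ranking, and the same pointer-chasing union-find
tracks components):

    import random

    def karger_via_kruskal(n, edges):
        """Equivalent formulation: random ranks + Kruskal's algorithm,
        stopped one merge short of a spanning tree.  Returns the two
        components (graph assumed connected)."""
        order = list(edges)
        random.shuffle(order)             # step 1: random ranking
        leader = list(range(n))

        def find(v):
            while leader[v] != v:
                leader[v] = leader[leader[v]]
                v = leader[v]
            return v

        merges = 0
        for u, v in order:                # step 2: Kruskal's algorithm
            ru, rv = find(u), find(v)
            if ru != rv:
                merges += 1
                if merges == n - 1:       # step 3: skip the last tree edge
                    break
                leader[ru] = rv

        sides = {}                        # group vertices by super-node
        for w in range(n):
            sides.setdefault(find(w), []).append(w)
        return list(sides.values())      # exactly two components

The edges running between the two returned components are exactly the
cut that the contraction view would have output.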

 

 

 

ANALYSIS

--------

1) Say we fix some cut C.  So long as none of the edges in C has been

selected, C remains a cut.  Reason: only contract edges, so things on

left side of C stay on left side, and things on right side stay on

right side.

 

2) In the course of the algorithm, does the min cut size go down or

up?  Claim: any cut in the new graph is also a cut in the original

graph.  Why?  (Expanding the super-nodes on each side gives a partition

of the original vertices with exactly the same crossing edges.)  So,

the min cut size can't go down.

 

3) Say the original graph has min cut of size k.  Claim: when n-i

nodes remain (for some i >= 0), there are at least (n-i)k/2 edges in

the graph.  Why?  (Each remaining super-node has degree at least k,

since its incident edges form a cut in the original graph; summing

degrees counts each edge twice.)

 

   The point of (3) is that we want to have lots of edges to reduce

   the chance that we kill our favorite cut.

 

So: fix some minimum cut C, of size k.  What is the prob that C survives

the first round?

 

     1 - k/(# edges) >=  1 - k/(nk/2)  =  1 - 2/n.

 

Suppose C has survived so far, and there are now n-i vertices.  Prob of

survival in this round is at least:

                     1 - 2/(n-i).

 

So, prob that C survives overall is at least:

 

     Prod_{i=0}^{n-3} (1 - 2/(n-i))

 

     = (n-2)/n * (n-3)/(n-1) * (n-4)/(n-2) * ... * 1/3

 

        = 2/(n(n-1))
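
A quick numeric check of the telescoping product (my addition):

    n = 50
    prod = 1.0
    for i in range(n - 2):                # i = 0, ..., n-3
        prod *= 1 - 2 / (n - i)
    print(prod, 2 / (n * (n - 1)))        # both print ~0.000816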

 

Neat implication: how many minimum cuts can a graph have?  A: at most

n(n-1)/2.  (For distinct cuts C, the events "the algorithm outputs C"

are disjoint, and each has probability at least 2/(n(n-1)).)
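
As a sanity check (my addition), we can run the karger_min_cut sketch
from above on the example graph (two triangles joined by two edges, so
its minimum cuts have size 2) and compare the empirical success rate
to the 2/(n(n-1)) guarantee:

    n = 6
    edges = [(0, 1), (4, 5),              # the two joining edges
             (0, 2), (2, 4), (0, 4),      # left triangle
             (1, 3), (3, 5), (1, 5)]      # right triangle
    trials = 10_000
    hits = sum(len(karger_min_cut(n, edges)) == 2 for _ in range(trials))
    print(hits / trials, ">=", 2 / (n * (n - 1)))   # bound is 1/15

The empirical rate should comfortably exceed the bound, which is
worst-case over graphs and cuts.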

 

Running time: one run can be implemented in O(m log m) time, or O(n^2)

time; with the O(m log m) version, running it O(n^2 log n) times gives

O(mn^2 log^2 n) overall.  Simple, but not all THAT fast.  How can we

speed this up?

 

 

SPEEDING UP THE ALGORITHM   [Due to D. Karger and C. Stein]

-------------------------

 

Claim: earlier steps are less risky than later steps.  Why?  (The

chance of killing C in a round is at most 2/(# nodes left), which

grows as the graph shrinks.)

 

At what point is our chance of having lost C about 50/50? 

 

Ans: when we have about n/sqrt(2) vertices left.  Contracting from n

down to t vertices preserves C with probability about t^2/n^2 (the

same telescoping product, stopped early), so at t = n/sqrt(2) our

survival probability is about:

          (n/sqrt(2))^2 / n^2 = 1/2.

 

So, this suggests a speedup:

 

     Instead of repeating from the start, let's repeat from only

     partway down.  E.g., from when we had just n/sqrt(2) vertices left.

 

Idea: build a binary tree of runs: contract down to n/sqrt(2)

vertices, then branch into two independent continuations, and recurse.



     Depth is 2lg(n).  # leaves is n^2.

 

  Can think of it as: each tree edge is independently destroyed with

  prob 1/2 (an edge "survives" if its contraction phase preserves C).

  What's the probability some root-to-leaf path survives?



     (Compare: the old strategy of n^2 independent repetitions is a

     root with n^2 branches, each a path of depth 2lg(n) to its own

     leaf.  Each such path survives with prob 2^{-2lg(n)} = 1/n^2, so

     some leaf survives with constant probability.)

 

 

         

 

First, what is the running time?

 

     T(n) = O(n^2) + 2T(n/sqrt(2)) = O(n^2 log n)
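
A minimal sketch of the recursion (my illustration; contract_to
generalizes the earlier contraction loop to stop at a target size, and
tiny graphs are handled by a few full contractions):

    import math
    import random

    def contract_to(n, edges, target):
        """Randomly contract down to `target` super-nodes, relabeled
        0..target-1.  Returns (target, surviving_edges)."""
        leader = list(range(n))
        def find(v):
            while leader[v] != v:
                leader[v] = leader[leader[v]]
                v = leader[v]
            return v
        live = list(edges)
        for _ in range(n - target):
            x, y = random.choice(live)
            leader[find(x)] = find(y)
            live = [e for e in live if find(e[0]) != find(e[1])]
        names = {}                        # compact names for super-nodes
        for v in range(n):
            names.setdefault(find(v), len(names))
        return target, [(names[find(u)], names[find(v)]) for u, v in live]

    def karger_stein(n, edges):
        """Contract to ~n/sqrt(2), recurse twice, keep the better cut.
        Returns the size of the cut found."""
        if n <= 6:                        # base case: a few full runs
            return min(len(contract_to(n, edges, 2)[1]) for _ in range(10))
        t = int(math.ceil(1 + n / math.sqrt(2)))
        return min(karger_stein(*contract_to(n, edges, t)),
                   karger_stein(*contract_to(n, edges, t)))

(The list-based contraction here costs more than O(n^2) per call; an
O(n^2) implementation would keep a matrix of edge multiplicities.)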

 

 

Now: want to show that a path survives with probability at least

1/(2 lg n), i.e., Omega(1/log n).

 

Proof:

 

For a tree of depth d, let P_d be the probability that some

root-to-leaf path survives.  Looking at the root's two subtrees (with

prob 1/2 exactly one root edge survives, leaving one usable subtree;

with prob 1/4 both survive), we get:

 

     P_d = (1/2) P_{d-1} + (1/4)(1 - (1-P_{d-1})^2)

 

            = (1/2) P_{d-1} + (1/4)(2P_{d-1} - (P_{d-1})^2)

 

            = P_{d-1} - (1/4) (P_{d-1})^2.

 

We can now verify by induction that P_d > 1/d for all d >= 2 (base

case: P_2 = 39/64 > 1/2).  If P_{d-1} > 1/(d-1), then since

x - x^2/4 is increasing for x in [0,2]:



      P_d  >  1/(d-1) - 1/(4(d-1)^2)  >  1/(d-1) - 1/(d(d-1))  =  1/d.
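
A quick numeric check of the recurrence (my addition): with the base
case P_0 = 1 (a depth-0 tree is a single leaf), the bound P_d > 1/d
holds for every d >= 2:

    p = 1.0                               # P_0
    for d in range(1, 41):
        p = p - p * p / 4                 # P_d = P_{d-1} - (1/4) P_{d-1}^2
        print(d, round(p, 4), "> 1/d:", p > 1 / d)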

 

 

So, we can just run our recursive algorithm log^2(n) times to get a

good probability of success: each run succeeds with probability about

1/log(n), so log^2(n) runs fail with probability about e^{-log n}.

 

 

One good way of looking at what we've shown is this: we have shown

that we can replace n^2 independent runs of the algorithm by n^2

DEPENDENT runs (sharing prefixes in the recursion tree) in a way that

drastically reduces the running time (n^2 * cost-of-one-run versus

about log(n) * cost-of-one-run), but only slightly reduces the chance

of success (constant versus 1/log(n)).