Randomized Algorithms
Today: 2 important ideas:
* k-wise independence
  -- application: computing an average while preserving secrecy
* linearity of expectation
  -- application: the gambler's odds problem
  -- application: analyzing randomized quicksort
=============================================================================
FINDING THE AVERAGE GRADE, WITHOUT GIVING AWAY YOUR OWN SCORE
-------------------------------------------------------------
* The problem: A group of N friends wants to determine their average
  score on a test, but nobody wants to reveal their own score.

* Even if K of the N "friends" are dishonest, they should still learn
  no more than the sum of the remaining N-K honest ones (which they
  can figure out anyway from the answer). [In a second we'll get back
  to the question of what it means to say that the protocol leaks no
  additional information.]
Here's a protocol: We'll compute the sum mod M, where M is some
sufficiently large number greater than the sum; e.g., if scores are
between 0 and 100, take M = 101*N. Say player i has secret score S.
Player i chooses N-1 numbers X_1, X_2, ..., X_{N-1} independently at
random in the range 0,...,M-1 and writes them each on a piece of
paper. Then he chooses Y in this range so that
X_1 + ... + X_{N-1} + Y = S mod M, and writes Y on a piece of paper.
Player i then distributes N-1 of these pieces of paper to the others
(one per person) and keeps one.

Then, each player takes the number he kept, plus all of the others he
received, adds them up (mod M), and announces the result.

Finally, all players add up the announced results (mod M), which is
the desired sum.
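
Here is a minimal Python sketch of the protocol (the function name,
example scores, and parameters below are illustrative choices, not
part of the notes):

    import random

    def secure_sum(scores, M):
        """Simulate the protocol; returns the sum of the scores mod M."""
        N = len(scores)

        # Step 1: each player i splits his secret S into N random shares
        # that sum to S (mod M); shares[i][j] is the share player i hands
        # to player j (he keeps one of them for himself).
        shares = []
        for S in scores:
            xs = [random.randrange(M) for _ in range(N - 1)]
            y = (S - sum(xs)) % M          # chosen so the N shares sum to S mod M
            shares.append(xs + [y])

        # Step 2: each player j announces the sum (mod M) of the shares he holds.
        announced = [sum(shares[i][j] for i in range(N)) % M for j in range(N)]

        # Step 3: everyone adds up the announced values (mod M).
        return sum(announced) % M

    scores = [93, 67, 80, 100]                  # example secret scores in 0..100
    M = 101 * len(scores)                       # M exceeds any possible sum
    assert secure_sum(scores, M) == sum(scores)
    print(secure_sum(scores, M) / len(scores))  # the average: 85.0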
Analysis: We can see that the sum is right by writing out the matrix
of values (row i holds the N numbers player i created; each column is
the set of numbers one player holds and adds up). But how do we show
that this is secure?
First some definitions: Two events A and B are INDEPENDENT if
Pr[A and B] = Pr[A] * Pr[B]. An easier way to think of it (when
Pr[B] > 0): Pr[A | B] = Pr[A]. More generally, a collection of events
is independent if for any subset, the probability of their
intersection is the product of their probabilities.
A collection of events is K-WISE INDEPENDENT if any subset of K of
them is independent.

Similarly, random variables X, Y are independent if for each x,y the
events X=x and Y=y are independent. K-wise independence is defined in
the same way as with events.
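
A classic illustration (my example, not from the notes): for two fair
coin flips, the events "first flip is heads", "second flip is heads",
and "the two flips differ" are 2-wise (pairwise) independent but not
mutually independent. A quick check:

    from itertools import product

    # Sample space: the 4 equally likely outcomes of two fair coin flips.
    outcomes = list(product([0, 1], repeat=2))

    A = {w for w in outcomes if w[0] == 1}       # first flip is heads
    B = {w for w in outcomes if w[1] == 1}       # second flip is heads
    C = {w for w in outcomes if w[0] != w[1]}    # the two flips differ

    def pr(event):
        return len(event) / len(outcomes)

    # Any two of A, B, C are independent...
    assert pr(A & B) == pr(A) * pr(B)
    assert pr(A & C) == pr(A) * pr(C)
    assert pr(B & C) == pr(B) * pr(C)

    # ...but all three together are not: any two of them determine the third.
    assert pr(A & B & C) != pr(A) * pr(B) * pr(C)   # 0 != 1/8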
Some useful facts: Say X is a random variable uniformly distributed in
S = {0, 1, ..., m-1}. Then, clearly, for any fixed a, the variable
Y = X + a (mod m) is also uniform in S (though X and Y are dependent).
If X_1, ..., X_k are independent r.v.'s uniform in S and Y is defined
to be a - (X_1 + ... + X_k) (mod m) for some fixed a, then these k+1
r.v.'s are k-wise independent. Just check that
Pr[Y = y | X_2 = x_2, X_3 = x_3, ..., X_k = x_k] = 1/m.
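
Here's a brute-force check of this fact for small parameters of my own
choosing (m = 5, k = 2, a = 3): enumerate the whole probability space
and confirm that Y is uniform and that any 2 of the 3 variables are
independent.

    from itertools import product
    from fractions import Fraction

    m, k, a = 5, 2, 3                    # small arbitrary parameters

    # Each pair (x1, x2) has probability 1/m^2; Y is determined by them.
    triples = [(x1, x2, (a - x1 - x2) % m)
               for x1, x2 in product(range(m), repeat=k)]
    total = Fraction(len(triples))

    def pr(pred):
        """Probability of the event {t : pred(t)} in this uniform space."""
        return Fraction(sum(1 for t in triples if pred(t))) / total

    # Y alone is uniform in {0, ..., m-1} ...
    assert all(pr(lambda t, y=y: t[2] == y) == Fraction(1, m) for y in range(m))

    # ... and any 2 of the 3 variables (X_1, X_2, Y) are independent.
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        for vi, vj in product(range(m), repeat=2):
            joint = pr(lambda t: t[i] == vi and t[j] == vj)
            assert joint == pr(lambda t: t[i] == vi) * pr(lambda t: t[j] == vj)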
Proof for protocol: The claim is that it's secure in the following
sense: for any set of t honest players, the only information that the
rest of the players have, even if they cheat, is the sum of the t
players' values. What does this really mean? What is "information"?
What we mean is that if you modify the values of these t players while
keeping their sum the same, then the probability distribution on the
sequence of outputs produced by those players doesn't change. Proof:
imagine the t honest players all sent their Y values to the same
honest player (this doesn't change the protocol). For any fixed
sequence of coin tosses, all sets of t values with the same sum lead
to exactly the same information revealed to the other players. So all
sets with the same sum have the same distribution.
GAMBLER'S ODDS PROBLEM
----------------------
Say you go into a casino. Say it has fair games: at any time t you can
bet some amount and, for even money, you have a 50/50 chance of
winning. Say you have some strategy -- it determines how much to bet
and when, and when to leave, but you must leave by the end of the day.
What's the best strategy for the highest expected amount by the end of
the day?

Answer: it doesn't matter. All strategies result in the same expected
amount.
* Defn of expectation: E[X] = SUM_a[a * Pr(X=a)].
* Let's define the random variable X_i to be the gain at time i. If X
  is the random variable denoting the amount we win by the end of the
  day, then X = X_1 + X_2 + ... + X_N, where N = #seconds in a day.
  Note that these X_i's could be horribly dependent (e.g., depending
  on whether you won or lost your last bet, you might bet more or
  less). Nonetheless, if we just look at one of the X_i, it's easy to
  see that E[X_i] = 0. For instance: let p_ij be the a priori
  probability that you will bet j dollars at time i (which might be
  horrible to calculate, but that's ok, we don't need to). So,

      E[X_i] = SUM_j [ j * p_ij/2 + (-j) * p_ij/2 ] = 0.

  Now, use linearity of expectation:

      E[X] = E[X_1 + ... + X_N] = E[X_1] + ... + E[X_N] = 0.
(Lesson: simplify by setting up simple random variables and using
linearity of expectation.)
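
To see this concretely, here's a small Monte Carlo sketch (the "double
your bet after a loss, up to a cap" strategy and all parameters are
made up for illustration; any strategy would do):

    import random

    def play_one_day(num_bets=100, max_bet=16):
        """One day of fair, even-money bets with a made-up strategy:
        double the bet after a loss (capped), reset after a win."""
        gain, bet = 0, 1
        for _ in range(num_bets):
            if random.random() < 0.5:          # win the even-money bet
                gain += bet
                bet = 1                        # reset after a win
            else:                              # lose
                gain -= bet
                bet = min(2 * bet, max_bet)    # bet more after a loss
        return gain

    trials = 100_000
    avg = sum(play_one_day() for _ in range(trials)) / trials
    print(avg)   # close to 0, tiny compared to the ~50-dollar daily swings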
Similar problem: Say there is a stock that's at some price x. Each
day, it goes up or down by $1, independent of the past, unless it's at
0, in which case it stays there. You have some amount of money m.
Each day you can buy or sell as much as you want, until you die, at
which point your holdings are converted back into cash. Is there a
strategy such that the expected amount of money you will have at death
is more than what you started with? No.
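
Same idea, as a quick sanity-check sketch (the trading rule, starting
price, and horizon below are arbitrary choices for illustration):

    import random

    def trade_until_death(m=100, start_price=10, days=200):
        """A made-up rule: buy one share on every down day (if affordable),
        sell one share on every up day. The price is a fair +/- $1 walk
        absorbed at 0, so the expected final wealth is still m."""
        cash, shares, price = m, 0, start_price
        for _ in range(days):
            move = random.choice([-1, 1]) if price > 0 else 0
            price += move
            if move < 0 and price > 0 and cash >= price:
                cash -= price
                shares += 1
            elif move > 0 and shares > 0:
                cash += price
                shares -= 1
        return cash + shares * price      # holdings converted to cash at death

    trials = 100_000
    avg = sum(trade_until_death() for _ in range(trials)) / trials
    print(avg)   # close to the starting amount m = 100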
LINEARITY OF EXPECTATION
------------------------
We want to show that for 2 random variables X and Y,

    E[X+Y] = E[X] + E[Y],

even if X and Y are dependent. Let's assume X and Y take on discrete
values so that we can use summations instead of integrals.
    E[X+Y] = SUM_a [ a * Pr(X+Y = a) ]

           = SUM_(b,c) [ (b+c) * Pr(X = b and Y = c) ]

             (just splitting the event that the sum is "a" into all the
             (disjoint) ways that this could occur)

           = SUM_b SUM_c [ b * Pr(X=b and Y=c) ] + SUM_c SUM_b [ c * Pr(X=b and Y=c) ]

           = SUM_b [ b * Pr(X=b) ] + SUM_c [ c * Pr(Y=c) ]

           = E[X] + E[Y].
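
A tiny numerical check (an example of my own choosing, not from the
notes): let X be a fair die roll and Y = X^2, so Y is completely
determined by X; linearity still holds exactly.

    from fractions import Fraction

    # X is a fair 6-sided die roll; Y = X^2 is a function of X (very dependent).
    outcomes = range(1, 7)
    p = Fraction(1, 6)

    E_X   = sum(p * x         for x in outcomes)    # 7/2
    E_Y   = sum(p * x * x     for x in outcomes)    # 91/6
    E_XpY = sum(p * (x + x*x) for x in outcomes)    # expectation of X + Y

    assert E_XpY == E_X + E_Y      # linearity holds despite the dependence
    print(E_X, E_Y, E_XpY)         # 7/2 91/6 56/3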
RANDOMIZED QUICKSORT
--------------------
--> The algorithm: pick a pivot uniformly at random, compare every
    other element to it to split the list into two parts, and recurse
    on each part. (Assume no equal values for simplicity.) A runnable
    sketch appears below.
--> What is the expected number of comparisons?
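
Here is a minimal runnable sketch (illustrative code, not the notes'
own), which also counts comparisons so the bound derived below can be
checked empirically; n, the number of trials, and the value range are
arbitrary choices:

    import random

    def quicksort(lst, counter):
        """Randomized quicksort on distinct values; counter[0] counts
        comparisons (one per non-pivot element at each level)."""
        if len(lst) <= 1:
            return lst
        pivot = random.choice(lst)          # pivot chosen uniformly at random
        counter[0] += len(lst) - 1          # every other element is compared to it
        less    = [x for x in lst if x < pivot]
        greater = [x for x in lst if x > pivot]
        return quicksort(less, counter) + [pivot] + quicksort(greater, counter)

    # Empirical check against the bound derived below: E[#comparisons] < 2*n*H_n.
    n, trials = 1000, 50
    H_n = sum(1.0 / k for k in range(1, n + 1))
    counter = [0]
    for _ in range(trials):
        data = random.sample(range(10**6), n)    # n distinct values
        assert quicksort(data, counter) == sorted(data)
    print(counter[0] / trials, "vs bound", 2 * n * H_n)   # average is below the bound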
Let e_1 be the smallest element and e_n the largest. Define the random
variable X_ij = 1 if e_i is compared with e_j in running the
algorithm, and 0 otherwise.

So, the expected number of comparisons is just SUM_{i<j} E[X_ij], by
linearity of expectation. E[X_ij] = p_ij, the probability that e_i is
compared to e_j.

p_ij is easy to calculate. Consider the elements e_i, e_{i+1}, ...,
e_j. We compare e_i and e_j if and only if we pick one of them as a
pivot before choosing any of the other elements in this list as a
pivot. So,

    p_ij = 2/(j-i+1).
So, the expected cost is

    SUM_{i<j} p_ij = SUM_{i=1 to n} SUM_{d=2 to n-i+1} 2/d
                   < 2 * n * SUM_{k=1 to n} 1/k
                   = 2 * n * H_n,

where H_n is approximately ln(n).
A strange puzzle
----------------
Consider the following game. We have a probability distribution on
the following countably infinite deck of cards.
    Card #    front    back    probability
    ------    -----    ----    -----------
      1         1        3        1/2
      2         3        9        1/4
      3         9       27        1/8
     ...       ...      ...       ...
A card is selected according to this distribution, and you are shown,
at random, one of its two sides. (Let's say the number you see is X,
and the one on the other side is Y.) You may now decide to "keep" or
"flip". If you decide "keep", you gain X-Y. If you decide "flip", you
gain Y-X.

What should you do?

Well, if you see a 1, you definitely should flip. But what if you see
a 3? In this case, with probability 2/3, Y is 1 and with probability
1/3, Y is 9. (More generally, if you see X > 1, then with prob 2/3,
Y is X/3 and with prob 1/3, Y is 3X.) So the expected gain of "flip" is

    2/3 * (-2) + 1/3 * 6 = 2/3 > 0.

So you should flip.
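
A simulation sketch of this calculation (the deck is truncated at 40
cards, an assumption of mine that doesn't affect what you can see when
X = 3): conditioned on seeing a 3, the average gain of "flip" comes
out near 2/3.

    import random

    def draw():
        """Draw card n with prob 2^-n (faces 3^(n-1) and 3^n), show a random
        side. Returns (shown, hidden). Deck truncated at 40 cards."""
        n = 1
        while random.random() < 0.5 and n < 40:   # geometric(1/2) card index
            n += 1
        front, back = 3 ** (n - 1), 3 ** n
        return (front, back) if random.random() < 0.5 else (back, front)

    gains, seen = 0, 0
    while seen < 200_000:
        x, y = draw()
        if x == 3:              # condition on the visible side being a 3
            gains += y - x      # gain from choosing "flip"
            seen += 1
    print(gains / seen)         # close to 2/3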
But we are saying that no matter what you see, you should flip. So you
might as well flip before observing anything. But this is just like
not flipping, since you were shown a random one of the two sides....

So, what gives.....?