Randomized Algorithms
Today: 2 important ideas:
* k-wise independence
  -- application: computing an average while preserving secrecy
* linearity of expectation
  -- application: the gambler's odds problem
  -- application: analyzing randomized quicksort
=============================================================================
FINDING THE AVERAGE GRADE, WITHOUT GIVING AWAY YOUR OWN SCORE
-------------------------------------------------------------
* The problem: A group of N friends wants to determine their average
  score on a test, but nobody wants to reveal their own score.

* Even if K of the N "friends" are dishonest, they should still learn
  no more than the sum of the remaining N-K honest ones (which they
  can figure out anyway from the answer). [In a second we'll get back
  to the question of what it means to say that the protocol leaks no
  additional information.]
Here's a protocol: We'll compute the sum mod M, where M is some
sufficiently large number greater than the sum; e.g., if scores are
between 0 and 100, take M = 101*N. Say player i has secret score S.
Player i chooses N-1 numbers X_1, X_2, ..., X_{N-1} independently at
random in the range 0,...,M-1 and writes them each on a piece of
paper. Then he chooses Y in this range so that
X_1 + ... + X_{N-1} + Y = S mod M, and writes Y on a piece of paper.
Player i then distributes N-1 of these pieces of paper to the others
(one per person) and keeps one.

Then, each player takes the number he kept, plus all of the others he
received, adds them up (mod M), and announces the result.

Finally, all players add up the announced results (mod M), which is
the desired sum.
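
Here is a minimal Python sketch of the protocol (the function name,
example scores, and parameters below are illustrative choices, not
part of the notes):

    import random

    def secure_sum(scores, M):
        """Simulate the protocol; returns the sum of the scores mod M."""
        N = len(scores)

        # Step 1: each player i splits his secret S into N random shares
        # that sum to S (mod M); shares[i][j] is the share player i hands
        # to player j (he keeps one of them for himself).
        shares = []
        for S in scores:
            xs = [random.randrange(M) for _ in range(N - 1)]
            y = (S - sum(xs)) % M          # chosen so the N shares sum to S mod M
            shares.append(xs + [y])

        # Step 2: each player j announces the sum (mod M) of the shares he holds.
        announced = [sum(shares[i][j] for i in range(N)) % M for j in range(N)]

        # Step 3: everyone adds up the announced values (mod M).
        return sum(announced) % M

    scores = [93, 67, 80, 100]                  # example secret scores in 0..100
    M = 101 * len(scores)                       # M exceeds any possible sum
    assert secure_sum(scores, M) == sum(scores)
    print(secure_sum(scores, M) / len(scores))  # the average: 85.0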
Analysis: We can see that the sum is right by writing out the matrix
of values (row i holds the N numbers player i created; each column is
the set of numbers one player holds and adds up). But how do we show
that this is secure?
First some definitions: Two events A and B are INDEPENDENT if
Pr[A and B] = Pr[A] * Pr[B]. An easier way to think of it (when
Pr[B] > 0): Pr[A | B] = Pr[A]. More generally, a collection of events
is independent if for any subset, the probability of their
intersection is the product of their probabilities.
A collection of events is K-WISE INDEPENDENT if any subset of K of
them is independent.

Similarly, random variables X, Y are independent if for each x,y the
events X=x and Y=y are independent. K-wise independence is defined in
the same way as with events.
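
A classic illustration (my example, not from the notes): for two fair
coin flips, the events "first flip is heads", "second flip is heads",
and "the two flips differ" are 2-wise (pairwise) independent but not
mutually independent. A quick check:

    from itertools import product

    # Sample space: the 4 equally likely outcomes of two fair coin flips.
    outcomes = list(product([0, 1], repeat=2))

    A = {w for w in outcomes if w[0] == 1}       # first flip is heads
    B = {w for w in outcomes if w[1] == 1}       # second flip is heads
    C = {w for w in outcomes if w[0] != w[1]}    # the two flips differ

    def pr(event):
        return len(event) / len(outcomes)

    # Any two of A, B, C are independent...
    assert pr(A & B) == pr(A) * pr(B)
    assert pr(A & C) == pr(A) * pr(C)
    assert pr(B & C) == pr(B) * pr(C)

    # ...but all three together are not: any two of them determine the third.
    assert pr(A & B & C) != pr(A) * pr(B) * pr(C)   # 0 != 1/8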
Some useful facts: Say X is a random variable uniformly distributed in
S = {0, 1, ..., m-1}. Then, clearly, for any fixed a, the variable
Y = X + a (mod m) is also uniform in S (though X and Y are dependent).
If X_1, ..., X_k are independent r.v.'s uniform in S and Y is defined
to be a - (X_1 + ... + X_k) (mod m) for some fixed a, then these k+1
r.v.'s are k-wise independent. Just check that
Pr[Y = y | X_2 = x_2, X_3 = x_3, ..., X_k = x_k] = 1/m.
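
Here's a brute-force check of this fact for small parameters of my own
choosing (m = 5, k = 2, a = 3): enumerate the whole probability space
and confirm that Y is uniform and that any 2 of the 3 variables are
independent.

    from itertools import product
    from fractions import Fraction

    m, k, a = 5, 2, 3                    # small arbitrary parameters

    # Each pair (x1, x2) has probability 1/m^2; Y is determined by them.
    triples = [(x1, x2, (a - x1 - x2) % m)
               for x1, x2 in product(range(m), repeat=k)]
    total = Fraction(len(triples))

    def pr(pred):
        """Probability of the event {t : pred(t)} in this uniform space."""
        return Fraction(sum(1 for t in triples if pred(t))) / total

    # Y alone is uniform in {0, ..., m-1} ...
    assert all(pr(lambda t, y=y: t[2] == y) == Fraction(1, m) for y in range(m))

    # ... and any 2 of the 3 variables (X_1, X_2, Y) are independent.
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        for vi, vj in product(range(m), repeat=2):
            joint = pr(lambda t: t[i] == vi and t[j] == vj)
            assert joint == pr(lambda t: t[i] == vi) * pr(lambda t: t[j] == vj)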
Proof for protocol: The claim is that it's secure in the following
sense: for any set of t honest players, the only information that the
rest of the players have, even if they cheat, is the sum of the t
players' values. What does this really mean? What is "information"?
What we mean is that if you modify the values of these t players while
keeping their sum the same, then the probability distribution on the
sequence of outputs produced by those players doesn't change. Proof:
imagine the t honest players all sent their Y values to the same
honest player (this doesn't change the protocol). For any fixed
sequence of coin tosses, all sets of t values with the same sum lead
to exactly the same information revealed to the other players. So all
sets with the same sum have the same distribution.
GAMBLER'S ODDS PROBLEM
----------------------
Say you go into a casino. Say it has fair games: at any time t you can
bet some amount and, for even money, you have a 50/50 chance of
winning. Say you have some strategy -- it determines how much to bet
and when, and when to leave, but you must leave by the end of the day.
What's the best strategy for the highest expected amount by the end of
the day?

Answer: it doesn't matter. All strategies result in the same expected
amount.
* Defn of expectation: E[X] = SUM_a[a * Pr(X=a)].
* Let's define the random variable X_i to be the gain at time i. If X
  is the random variable denoting the amount we win by the end of the
  day, then X = X_1 + X_2 + ... + X_N, where N = #seconds in a day.
  Note that these X_i's could be horribly dependent (e.g., depending
  on whether you won or lost your last bet, you might bet more or
  less). Nonetheless, if we just look at one of the X_i, it's easy to
  see that E[X_i] = 0. For instance: let p_ij be the a priori
  probability that you will bet j dollars at time i (which might be
  horrible to calculate, but that's ok, we don't need to). So,

      E[X_i] = SUM_j [ j * p_ij/2 + (-j) * p_ij/2 ] = 0.

  Now, use linearity of expectation:

      E[X] = E[X_1 + ... + X_N] = E[X_1] + ... + E[X_N] = 0.
(Lesson: simplify by setting up simple random variables and using
linearity of expectation.)
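
To see this concretely, here's a small Monte Carlo sketch (the "double
your bet after a loss, up to a cap" strategy and all parameters are
made up for illustration; any strategy would do):

    import random

    def play_one_day(num_bets=100, max_bet=16):
        """One day of fair, even-money bets with a made-up strategy:
        double the bet after a loss (capped), reset after a win."""
        gain, bet = 0, 1
        for _ in range(num_bets):
            if random.random() < 0.5:          # win the even-money bet
                gain += bet
                bet = 1                        # reset after a win
            else:                              # lose
                gain -= bet
                bet = min(2 * bet, max_bet)    # bet more after a loss
        return gain

    trials = 100_000
    avg = sum(play_one_day() for _ in range(trials)) / trials
    print(avg)   # close to 0, tiny compared to the ~50-dollar daily swings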
Similar problem: Say there is a stock that's at some price x. Each
day, it goes up or down by $1, independent of the past, unless it's at
0, in which case it stays there. You have some amount of money m.
Each day you can buy or sell as much as you want, until you die, at
which point your holdings are converted back into cash. Is there a
strategy such that the expected amount of money you will have at death
is more than what you started with? No.
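
Same idea, as a quick sanity-check sketch (the trading rule, starting
price, and horizon below are arbitrary choices for illustration):

    import random

    def trade_until_death(m=100, start_price=10, days=200):
        """A made-up rule: buy one share on every down day (if affordable),
        sell one share on every up day. The price is a fair +/- $1 walk
        absorbed at 0, so the expected final wealth is still m."""
        cash, shares, price = m, 0, start_price
        for _ in range(days):
            move = random.choice([-1, 1]) if price > 0 else 0
            price += move
            if move < 0 and price > 0 and cash >= price:
                cash -= price
                shares += 1
            elif move > 0 and shares > 0:
                cash += price
                shares -= 1
        return cash + shares * price      # holdings converted to cash at death

    trials = 100_000
    avg = sum(trade_until_death() for _ in range(trials)) / trials
    print(avg)   # close to the starting amount m = 100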
LINEARITY OF EXPECTATION
------------------------
We want to show that for 2 random variables X and Y,

    E[X+Y] = E[X] + E[Y],

even if X and Y are dependent. Let's assume X and Y take on discrete
values so that we can use summations instead of integrals.
    E[X+Y] = SUM_a [ a * Pr(X+Y = a) ]

           = SUM_(b,c) [ (b+c) * Pr(X = b and Y = c) ]

             (just splitting the event that the sum is "a" into all the
             (disjoint) ways that this could occur)

           = SUM_b SUM_c [ b * Pr(X=b and Y=c) ] + SUM_c SUM_b [ c * Pr(X=b and Y=c) ]

           = SUM_b [ b * Pr(X=b) ] + SUM_c [ c * Pr(Y=c) ]

           = E[X] + E[Y].
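
A tiny numerical check (an example of my own choosing, not from the
notes): let X be a fair die roll and Y = X^2, so Y is completely
determined by X; linearity still holds exactly.

    from fractions import Fraction

    # X is a fair 6-sided die roll; Y = X^2 is a function of X (very dependent).
    outcomes = range(1, 7)
    p = Fraction(1, 6)

    E_X   = sum(p * x         for x in outcomes)    # 7/2
    E_Y   = sum(p * x * x     for x in outcomes)    # 91/6
    E_XpY = sum(p * (x + x*x) for x in outcomes)    # expectation of X + Y

    assert E_XpY == E_X + E_Y      # linearity holds despite the dependence
    print(E_X, E_Y, E_XpY)         # 7/2 91/6 56/3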
RANDOMIZED QUICKSORT
--------------------
--> The algorithm: pick a pivot uniformly at random, compare every
    other element to it to split the list into two parts, and recurse
    on each part. (Assume no equal values for simplicity.) A runnable
    sketch appears below.
--> What is the expected number of comparisons?
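
Here is a minimal runnable sketch (illustrative code, not the notes'
own), which also counts comparisons so the bound derived below can be
checked empirically; n, the number of trials, and the value range are
arbitrary choices:

    import random

    def quicksort(lst, counter):
        """Randomized quicksort on distinct values; counter[0] counts
        comparisons (one per non-pivot element at each level)."""
        if len(lst) <= 1:
            return lst
        pivot = random.choice(lst)          # pivot chosen uniformly at random
        counter[0] += len(lst) - 1          # every other element is compared to it
        less    = [x for x in lst if x < pivot]
        greater = [x for x in lst if x > pivot]
        return quicksort(less, counter) + [pivot] + quicksort(greater, counter)

    # Empirical check against the bound derived below: E[#comparisons] < 2*n*H_n.
    n, trials = 1000, 50
    H_n = sum(1.0 / k for k in range(1, n + 1))
    counter = [0]
    for _ in range(trials):
        data = random.sample(range(10**6), n)    # n distinct values
        assert quicksort(data, counter) == sorted(data)
    print(counter[0] / trials, "vs bound", 2 * n * H_n)   # average is below the bound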
Let e_1 be the smallest element and e_n the largest. Define the random
variable X_ij = 1 if e_i is compared with e_j in running the
algorithm, and 0 otherwise.

So, the expected number of comparisons is just SUM_{i<j} E[X_ij], by
linearity of expectation. E[X_ij] = p_ij, the probability that e_i is
compared to e_j.

p_ij is easy to calculate. Consider the elements e_i, e_{i+1}, ...,
e_j. We compare e_i and e_j if and only if we pick one of them as a
pivot before choosing any of the other elements in this list as a
pivot. So,

    p_ij = 2/(j-i+1).
So, the expected cost is

    SUM_{i<j} p_ij = SUM_{i=1 to n} SUM_{d=2 to n-i+1} 2/d
                   < 2 * n * SUM_{k=1 to n} 1/k
                   = 2 * n * H_n,

where H_n is approximately ln(n).
A strange puzzle
----------------
Consider the following game. We have a probability distribution on
the following countably infinite deck of cards.
    Card #    front    back    probability
    ------    -----    ----    -----------
      1         1        3        1/2
      2         3        9        1/4
      3         9       27        1/8
     ...       ...      ...       ...
A card is selected according to this distribution, and you are shown,
at random, one of its two sides. (Let's say the number you see is X,
and the one on the other side is Y.) You may now decide to "keep" or
"flip". If you decide "keep", you gain X-Y. If you decide "flip", you
gain Y-X.

What should you do?

Well, if you see a 1, you definitely should flip. But what if you see
a 3? In this case, with probability 2/3, Y is 1 and with probability
1/3, Y is 9. (More generally, if you see X > 1, then with prob 2/3,
Y is X/3 and with prob 1/3, Y is 3X.) So the expected gain of "flip" is

    2/3 * (-2) + 1/3 * 6 = 2/3 > 0.

So you should flip.
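
A simulation sketch of this calculation (the deck is truncated at 40
cards, an assumption of mine that doesn't affect what you can see when
X = 3): conditioned on seeing a 3, the average gain of "flip" comes
out near 2/3.

    import random

    def draw():
        """Draw card n with prob 2^-n (faces 3^(n-1) and 3^n), show a random
        side. Returns (shown, hidden). Deck truncated at 40 cards."""
        n = 1
        while random.random() < 0.5 and n < 40:   # geometric(1/2) card index
            n += 1
        front, back = 3 ** (n - 1), 3 ** n
        return (front, back) if random.random() < 0.5 else (back, front)

    gains, seen = 0, 0
    while seen < 200_000:
        x, y = draw()
        if x == 3:              # condition on the visible side being a 3
            gains += y - x      # gain from choosing "flip"
            seen += 1
    print(gains / seen)         # close to 2/3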
But we are saying that no matter what you see, you should flip. So you
might as well flip before observing anything. But this is just like
not flipping, since you were shown a random one of the two sides....

So, what gives.....?