Compsci 6/101, Spring 2011, Lab 5

Introduction: Monte Carlo Simulations

A room of computers working at the National Advisory Committee for Aeronautics, 1949. Courtesy of Wikipedia.

"Monte Carlo" simulations are a very powerful technique for calculating values which cannot be calculated exactly. They have been used since the dawn of computing, when scientists on the Manhattan Project were working to design an atomic bomb. Back then, a "computer" was a woman whose job was to do math problems all day long. The nuclear physicists would construct various scenarios of uranium reactions, and then hand these problems off to a room full of computers, similar to a modern data center. Each individual problem didn't explain much about fission, but taking hundreds of scenarios in aggregate allowed the physicists to design the bomb.

There are four parts of this lab: the first doesn't involve writing code, the others do. For the other parts you should snarf the code for the lab which you can browse online here.

Part 0: Thinking about Code

For this problem you'll be asked to think about the code in both DiceRolling.py and HeadsTails.py and to reason about the code.

In DiceRolling.py the function dice returns the sum of two dice rolls, it's reproduced below. If the last statement is changed to return a+a what will the bar-graph that's generated look like? Draw a graph you think is reasonable and provide a justification on the handin page.
In DiceRolling.py the function print_stats takes a list of dice-rolls and analyzes the list. As an alternative, the function could take an int parameter named numRolls indicating the number of dice rolls to generate. Instead of looping over the list, the loop would be: On the handin page describe why this approach could be preferred to generating a list of all the dice rolls.
In HeadsTails.py write a few sentences explaining the variable rc in the function stats. Describe its initialization, use, and update. The name is meant to signify run count.
Explain the initialization, use, and update of counts in the stats function. What does counts[k] represent (in terms of k)?

Part I: Craps and determining Probabilities/Odds

Craps is a dice game played in nearly all casinos. You can play online and you can see what Wikipedia says about the game.

In craps if the player rolls a 7 or 11 on the first/come out roll then you win. If the roll is 2, 3, or 12 then you lose. If the first roll is something else, a more complicated set of bets follows. For this problem you should modify the code in DiceRolling.py so that it prints two statistics both numbers between 0 and 1.

Do this by creating a function named craps_simulation and calling it from the name/main block so that it's the only function executed when the program runs.

The probability that a 7 or 11 is rolled when two dice are rolled and the probabliity that 2, 3, or 12 are rolled. You should simulate 100,000 dice rolls and write code that determines the number of wins (7 or 11) and the number of losses (2, 3, or 12). If there are 50,000 rolls that are either 7 or 11 then the probability of winning is 1/2 (50,000/100,000). You should strive to write code that does not store the 100,000 rolls, but calculates the probabilities by simulating rolling two dice 100,000 times without storing all the rolls, but by incrementing two different counters/variables appropriately. Write your code and answers on the handin pages.

Part II: Poker Odds

In this part of the lab, we'll be estimating the odds of various hands of poker, specifically a version called 5 card stud, using a Monte Carlo method. The idea is to generate many, many random poker hands, and count which fraction of them are two pairs, full houses, straight flushes, which fraction are four of a kind, etc. See the Wikipedia/Poker Odds page for probabilities/odds.

The program to calculate odds is started in Cardtester.py.

Answers on handin page:

Most of the code provided is for dealing cards, managing hands, and printing the hands. You should look carefully at both get_rank_counts, is_pair, and simulate to see how the provided code determines how many hands contain one pair. Run the program a few times and estimate what percentage of five-card poker hands contain one pair. Put that answer on the handin page.
Write a function is_two_pair and implement it so that it returns true if a hand contains exactly two pairs. What's the body of the function and the probability/odds for two pairs? You'll need to modify simulate too.
Write a function is_threekind to determine when a hand has three of a kind exactly (not four of a kind and not a full house which is three of one rank and two of another rank). For example, [J,J,J,Q,3] is three of a kind, but [3,3,3,8,8] is not because it's a full house.
Write a function is_fullhouse and provide both the code and probability of getting a full house.
Challenge: In poker a flush is all cards of the same suit, e.g., all spades, all hearts, etc. Write code to determine if a poker hand is a flush, simulate to find out how many hands are flushes. This might be an overestimate because a straight flush is better than a flush, but that's an extra challenge.
Challenge: In poker a straight is five cards whose ranks are consecutive, e.g., 2,3,4,5,6 or A,2,3,4,5,6 or 10,J,Q,K,A (where Aces are high or low). Write code to determine if a hand is a straight and simulate to estimate the probability of dealing a straight.

Part III: Estimate Pi

Bonus Challenge Pi (written as π) is an irrational number used in calculating trigonometry, measurements of certain curves, and many practical applications such as audio processing. It is approximately 3.14159265, but since it is an irrational number it cannot be written down as a finite decimal number. The area of a circle is π times the square of its radius.

Write code to perform a Monte Carlo simulation to estimate the value of π. To do this, generate random points from the unit square, i.e., points whose (X,Y) coordinates are each between 0 and 1.

unit square

Count how many of these points are within a distance of 1 from the origin --- these points are in one quadrant of the unit-circle, the circle of radius one centered at the origin.

The total area of the square is 1, and the area of the quarter circle within that square is π/4, so the ratio of points we pick that are in the circle to points we pick anywhere should also be π/4.

You will write a function pi_estimate() that performs this calculation and returns an estimate for pi.

For very large simulations, do not use the range() function to make your loop; it takes up an amount of memory proportional to its length. Instead, use the xrange() function, which works exactly the same but takes up constant memory.