In [1]:
from lec_utils import *

Discussion Slides: Arrays and Probability

Agenda šŸ“†

  • Lecture recap šŸ—’ļø.
  • Probability review šŸŽ².
  • Worksheet šŸ“.

Lecture recap šŸ—’ļø

  • Python basics: lists, strings, mutability vs. immutability, functions, dictionaries, and for-loops.
  • numpy arrays and methods, with applications to image processing and matrix multiplication.
  • np.random.choice, np.random.multinomial, and np.random.permutation.
  • What questions do we have about these ideas?

Probability review šŸŽ²


Probability is covered in EECS 203 and will be needed to answer questions on Homework 2. It may have been a while since you took EECS 203, so let's review!

Example: Sampling candy

  • Say you have a bag of candy with 3 Jolly Ranchers, 4 Starbursts, and 5 Reese's Pieces.

    šŸ”µšŸ”µšŸ”µšŸŸ šŸŸ šŸŸ šŸŸ šŸŸ£šŸŸ£šŸŸ£šŸŸ£šŸŸ£
  • Suppose you pick a piece of candy 3 times, and put it back each time you pick one.
    What is the probability you draw Jolly Ranchers all 3 times?
  • Since we're sampling with replacement, the distribution of the bag does not change from pick to pick!
$$\begin{align*} P(\text{all $3$ JR}) &= P(\text{first is JR}) \cdot P(\text{second is JR, given that first is JR}) \cdot P(\text{third is JR, given that first two are JR}) \\ &= {\frac{\text{total number of JR}}{\text{total pieces of candy}}} \cdot {\frac{\text{total number of JR}}{\text{total pieces of candy}}} \cdot {\frac{\text{total number of JR}}{\text{total pieces of candy}}}\\ &= \left(\frac{3}{3 + 4 + 5} \right)^3\\ &= \left(\frac{1}{4} \right) ^3\\ &= \frac{1}{64} \end{align*}$$
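  • As a quick sanity check, we can carry out this arithmetic in Python (a small sketch, not one of the original slide cells):

```python
# With replacement, each of the 3 picks is a Jolly Rancher
# with the same probability, 3/12.
p_jr = 3 / (3 + 4 + 5)      # 1/4
p_all_three = p_jr ** 3
print(p_all_three)           # 0.015625, i.e. 1/64
```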

Example continued

  • Say you (still) have a bag of candy with 3 Jolly Ranchers, 4 Starbursts, and 5 Reese's Pieces.

    šŸ”µšŸ”µšŸ”µšŸŸ šŸŸ šŸŸ šŸŸ šŸŸ£šŸŸ£šŸŸ£šŸŸ£šŸŸ£
  • Suppose you pick a piece of candy 3 times, and don't put it back each time you pick one.
    Now, what is the probability you draw Jolly Ranchers all 3 times?
  • Since we're sampling without replacement, the distribution in the bag does change from pick to pick, so the three probabilities we multiply are not identical.
$$\begin{align*} P(\text{all $3$ JR}) &= P(\text{first is JR}) \cdot P(\text{second is JR, given that first is JR}) \cdot P(\text{third is JR, given that first two are JR}) \\ &= \frac{3}{3 + 4 + 5} \cdot \frac{2}{2 + 4 + 5} \cdot \frac{1}{1 + 4 + 5} \\ &= \frac{1}{4} \cdot \frac{2}{11} \cdot \frac{1}{10} \\ &= \frac{1}{220} \end{align*}$$
  • Note that this probability is much lower than in the sampling with replacement case! Why?

np.random.choice

  • np.random.choice can simulate samples drawn with and without replacement.
  • For instance, the following cell simulates the act of drawing:
    • 3 elements,
    • without replacement,
    • from options.
In [2]:
# Each time you run this cell, the result (likely) changes!
options = ['JR'] * 3 + ['Star'] * 4 + ['Reese'] * 5
np.random.choice(options, 3, replace=False)
Out[2]:
array(['Star', 'Reese', 'JR'], dtype='<U5')
  • The default is to sample with replacement.
    There are other optional arguments, too, like a p argument, which lets you specify the probability of each outcome (by default, all outcomes are equally likely).
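  • For instance, instead of listing all 12 pieces individually, we could list each type once and pass its probability via p (an illustrative sketch, not one of the original slide cells):

```python
import numpy as np

# One entry per candy type; p gives each type's probability.
# Sampling is with replacement by default.
sample = np.random.choice(['JR', 'Star', 'Reese'], 3, p=[3/12, 4/12, 5/12])
sample
```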

Estimating probabilities via simulation

  • Previously, we found – by hand – that the probability of picking 3 Jolly Ranchers from our bag when sampling without replacement is $\frac{1}{220}$.
  • One way to estimate this probability is to run the following cell a large number (e.g. 100,000) of times and compute the proportion of runs in which all 3 candies are Jolly Ranchers!
In [3]:
np.random.choice(options, 3, replace=False)
Out[3]:
array(['Reese', 'JR', 'JR'], dtype='<U5')
  • To do so, we can initialize a counter, all_three, that we add to if we see a sample that consists of 3 Jolly Ranchers.
In [4]:
all_three = 0

for i in range(100_000):
    sample = np.random.choice(options, 3, replace=False)
    if (sample == 'JR').sum() == 3:
        all_three += 1
        
all_three / 100_000
Out[4]:
0.00431
  • all_three / 100_000 is the proportion of simulations in which we saw 3 Jolly Ranchers. It's very close to the true, theoretical answer!
    We use simulations in practice when the theoretical answer is too difficult to calculate.
In [5]:
1 / 220
Out[5]:
0.004545454545454545

Example: Coin flipping

  • Let's consider experiments made up of several trials, each of which is independent of all others and only has two possible outcomes, e.g. flipping a coin repeatedly.
  • Suppose I flip a biased coin 5 times. Each time we flip it, independently of the other flips, the chance of seeing heads is $\frac{3}{4}$. What is the probability of seeing 3 heads and 2 tails?
  1. The probability the coin lands on heads once is $\frac{3}{4}$, so the probability of it landing on heads 3 times is $\left(\frac{3}{4}\right)^3$.
  2. Following similar logic, the probability of the coin landing on tails 2 times is $\left(1 - \frac{3}{4}\right)^2 = \left(\frac{1}{4}\right)^2$.
  3. Finally, we need to account for the number of possible sequences of 3 heads and 2 tails:

    HTHHT, TTHHH, ...
    There are ${5 \choose 3}$ such sequences.
  4. All together, the probability of seeing 3 heads and 2 tails is:
$$ P(H = 3) = \binom{5}{3} \left( \frac{3}{4} \right)^3 \left( \frac{1}{4} \right)^2 $$
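  • Evaluating this expression in Python (a quick check, not one of the original slide cells) gives a concrete number:

```python
from math import comb

# P(exactly 3 heads in 5 flips of a coin with P(heads) = 3/4)
p = comb(5, 3) * (3 / 4) ** 3 * (1 / 4) ** 2
print(p)  # 0.263671875
```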

The binomial distribution

  • In the previous slide, we used the binomial distribution. The binomial distribution models experiments made up of several repetitions of a binary trial.
  • In general, if our experiment consists of $n$ repeated trials, each of which has a probability $p$ of success, then the probability of $k$ successes is:
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$
You'll need to understand this formula for Homework 2!
  • Think of each binomial trial as a sample drawn with replacement, since each trial is independent.
  • Remember that the binomial distribution only holds in cases where the process we are repeating (e.g. a coin flip) only has two possible outcomes.
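  • One way to build intuition for the formula is to compare it against a simulation. The sketch below (the helper binomial_pmf is our own, not from lecture) checks the coin example using np.random.binomial, which simulates the number of successes in $n$ trials:

```python
import numpy as np
from math import comb

def binomial_pmf(n, k, p):
    # P(X = k) for X ~ Binomial(n, p).
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Theoretical answer for the coin example: n = 5, k = 3, p = 3/4.
theory = binomial_pmf(5, 3, 3 / 4)

# Estimate the same probability by simulating 100,000 sets of 5 flips.
flips = np.random.binomial(5, 3 / 4, size=100_000)  # heads count per run
estimate = (flips == 3).mean()
```

The estimate should land close to the theoretical value of about 0.2637.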

The multinomial distribution

  • What if we're repeating some process in which each trial is independent, but there are more than two possible outcomes?
  • For example, say we roll a die 10 times. What is the probability we see two 3s, four 5s, and four 6s?
$$P(X_1 = 0, X_2 = 0, \boxed{X_3 = 2}, X_4 = 0, \boxed{X_5 = 4}, \boxed{X_6 = 4}) = \frac{10!}{2!4!4!} \left( \frac{1}{6} \right)^2 \left( \frac{1}{6} \right)^4 \left( \frac{1}{6} \right)^4$$
  • In general, the answer comes from the multinomial distribution:

    $$P(X_1 = x_1, X_2 = x_2, \dots, X_k = x_k) = \frac{n!}{x_1! x_2! \cdots x_k!} \cdot p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$

    where:

    • $n$ represents the total number of trials.
    • $k$ represents the total number of categories.
    • $p_1, p_2, \dots, p_k$ represent the probabilities of each of the $k$ possible outcomes.
    • $x_1, x_2, \dots, x_k$ represent the number of trials that resulted in each outcome.
  • This is a generalization of the binomial distribution to allow for trials with more than 2 outcomes.
    But remember, each trial – here, each die roll – is independent.

np.random.multinomial

  • np.random.multinomial, shown in lecture, allows us to draw simulated samples from a multinomial distribution!
  • It takes in (at least) two arguments:

    • n: number of experiments.
    • pvals: probability of each outcome; this should be a list/array that sums to 1.

    The output is an array with the same length as pvals, containing the number of times each outcome occurred.

  • For instance, to simulate drawing 3 candies with replacement from a bag with 3 Jolly Ranchers, 4 Starbursts, and 5 Reese's Pieces:
In [6]:
np.random.multinomial(3, [3 / 12, 4 / 12, 5 / 12])
Out[6]:
array([2, 0, 1])
  • While np.random.choice can be used for sampling with and without replacement, np.random.multinomial can only be used to sample with replacement. Why?
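  • Tying things together: since np.random.multinomial samples with replacement, we can use it to estimate the $\frac{1}{64}$ answer from earlier. The size argument runs many experiments at once (a sketch, not one of the original slide cells):

```python
import numpy as np

# 100,000 experiments, each drawing 3 candies with replacement.
# Column 0 counts Jolly Ranchers in each experiment.
draws = np.random.multinomial(3, [3/12, 4/12, 5/12], size=100_000)
estimate = (draws[:, 0] == 3).mean()
estimate  # should be close to 1/64 = 0.015625
```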

np.random.permutation

  • Finally, np.random.permutation shuffles the elements of the input sequence.
In [7]:
np.random.permutation(['hey', 'there', 'how', 'are', 'you'])
Out[7]:
array(['there', 'are', 'hey', 'how', 'you'], dtype='<U5')
  • The result always has the same length as the input.
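  • Note also that np.random.permutation returns a shuffled copy and leaves the input unchanged, as this small sketch (not one of the original slide cells) shows:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])
shuffled = np.random.permutation(values)

# shuffled contains the same elements in a (likely) different order;
# values itself is untouched.
values
```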