In [1]:
from lec_utils import *

Discussion Slides: Arrays and Probability

Agenda šŸ“†

  • Lecture recap šŸ—’ļø.
  • Probability review šŸŽ².
  • Worksheet šŸ“.

Lecture recap šŸ—’ļø

  • Python basics: lists, strings, mutability vs. immutability, functions, dictionaries, and for-loops.
  • numpy arrays and methods, with applications to image processing and matrix multiplication.
  • np.random.choice, np.random.multinomial, and np.random.permutation.
  • What questions do we have about these ideas?

Probability review šŸŽ²


Probability is covered in EECS 203 and will be needed to answer questions on Homework 2. It may have been a while since you took EECS 203, so let's review!

Example: Sampling candy

  • Say you have a bag of candy with 3 Jolly Ranchers, 4 Starbursts, and 5 Reese's Pieces.

    šŸ”µšŸ”µšŸ”µšŸŸ šŸŸ šŸŸ šŸŸ šŸŸ£šŸŸ£šŸŸ£šŸŸ£šŸŸ£
  • Suppose you pick a piece of candy 3 times, and put it back each time you pick one.
    What is the probability you draw Jolly Ranchers all 3 times?
  • Since we're sampling with replacement, the distribution of the bag does not change from pick to pick!
$$\begin{align*} P(\text{all $3$ JR}) &= P(\text{first is JR}) \cdot P(\text{second is JR, given that first is JR}) \cdot P(\text{third is JR, given that first two are JR}) \\ &= {\frac{\text{total number of JR}}{\text{total pieces of candy}}} \cdot {\frac{\text{total number of JR}}{\text{total pieces of candy}}} \cdot {\frac{\text{total number of JR}}{\text{total pieces of candy}}}\\ &= \left(\frac{3}{3 + 4 + 5} \right)^3\\ &= \left(\frac{1}{4} \right) ^3\\ &= \frac{1}{64} \end{align*}$$
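  • As a quick sanity check, we can carry out this arithmetic in Python (a small sketch, not one of the original slide cells):

```python
# With replacement, each of the 3 picks is a Jolly Rancher
# with the same probability, 3/12.
p_jr = 3 / (3 + 4 + 5)      # 1/4
p_all_three = p_jr ** 3
print(p_all_three)           # 0.015625, i.e. 1/64
```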

Example continued

  • Say you (still) have a bag of candy with 3 Jolly Ranchers, 4 Starbursts, and 5 Reese's Pieces.

    šŸ”µšŸ”µšŸ”µšŸŸ šŸŸ šŸŸ šŸŸ šŸŸ£šŸŸ£šŸŸ£šŸŸ£šŸŸ£
  • Suppose you pick a piece of candy 3 times, and don't put it back each time you pick one.
    Now, what is the probability you draw Jolly Ranchers all 3 times?
  • Since we're sampling without replacement, the distribution in the bag does change from pick to pick, so the three probabilities we multiply are not identical.
$$\begin{align*} P(\text{all $3$ JR}) &= P(\text{first is JR}) \cdot P(\text{second is JR, given that first is JR}) \cdot P(\text{third is JR, given that first two are JR}) \\ &= \frac{3}{3 + 4 + 5} \cdot \frac{2}{2 + 4 + 5} \cdot \frac{1}{1 + 4 + 5} \\ &= \frac{1}{4} \cdot \frac{2}{11} \cdot \frac{1}{10} \\ &= \frac{1}{220} \end{align*}$$
  • Note that this probability is much lower than in the sampling with replacement case! Why?

np.random.choice

  • np.random.choice can simulate samples drawn with and without replacement.
  • For instance, the following cell simulates the act of drawing:
    • 3 elements,
    • without replacement,
    • from options.
In [2]:
# Each time you run this cell, the result (likely) changes!
options = ['JR'] * 3 + ['Star'] * 4 + ['Reese'] * 5
np.random.choice(options, 3, replace=False)
Out[2]:
array(['Star', 'Reese', 'JR'], dtype='<U5')
  • The default is to sample with replacement.
    There are other optional arguments, too, like a p argument, which lets you specify the probability of each outcome (by default, all outcomes are equally likely).
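  • For instance, instead of listing all 12 pieces individually, we could list each type once and pass its probability via p (an illustrative sketch, not one of the original slide cells):

```python
import numpy as np

# One entry per candy type; p gives each type's probability.
# Sampling is with replacement by default.
sample = np.random.choice(['JR', 'Star', 'Reese'], 3, p=[3/12, 4/12, 5/12])
sample
```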

Estimating probabilities via simulation

  • Previously, we found – by hand – that the probability of picking 3 Jolly Ranchers from our bag when sampling without replacement is $\frac{1}{220}$.
  • One way to estimate this probability is to run the following cell a large number (e.g. 100,000) of times and compute the proportion of runs in which all 3 candies are Jolly Ranchers!
In [3]:
np.random.choice(options, 3, replace=False)
Out[3]:
array(['Reese', 'JR', 'JR'], dtype='<U5')
  • To do so, we can initialize a counter, all_three, that we add to if we see a sample that consists of 3 Jolly Ranchers.
In [4]:
all_three = 0

for i in range(100_000):
    sample = np.random.choice(options, 3, replace=False)
    if (sample == 'JR').sum() == 3:
        all_three += 1
        
all_three / 100_000
Out[4]:
0.00431
  • all_three / 100_000 is the proportion of simulations in which we saw 3 Jolly Ranchers. It's very close to the true, theoretical answer!
    We use simulations in practice when the theoretical answer is too difficult to calculate.
In [5]:
1 / 220
Out[5]:
0.004545454545454545

Example: Coin flipping

  • Let's consider experiments made up of several trials, each of which is independent of all others and only has two possible outcomes, e.g. flipping a coin repeatedly.
  • Suppose I flip a biased coin 5 times. Each time we flip it, independently of the other flips, the chance of seeing heads is $\frac{3}{4}$. What is the probability of seeing 3 heads and 2 tails?
  1. The probability the coin lands on heads once is $\frac{3}{4}$, so the probability of it landing on heads 3 times is $\left(\frac{3}{4}\right)^3$.
  2. Following similar logic, the probability of the coin landing on tails 2 times is $\left(1 - \frac{3}{4}\right)^2 = \left(\frac{1}{4}\right)^2$.
  3. Finally, we need to account for the number of possible sequences of 3 heads and 2 tails:

    HTHHT, TTHHH, ...
    There are ${5 \choose 3}$ such sequences.
  4. All together, the probability of seeing 3 heads and 2 tails is:
$$ P(H = 3) = \binom{5}{3} \left( \frac{3}{4} \right)^3 \left( \frac{1}{4} \right)^2 $$
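  • Evaluating this expression in Python (a quick check, not one of the original slide cells) gives a concrete number:

```python
from math import comb

# P(exactly 3 heads in 5 flips of a coin with P(heads) = 3/4)
p = comb(5, 3) * (3 / 4) ** 3 * (1 / 4) ** 2
print(p)  # 0.263671875
```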

The binomial distribution

  • In the previous slide, we used the binomial distribution. The binomial distribution models experiments made up of several repetitions of a binary trial.
  • In general, if our experiment consists of $n$ repeated trials, each of which has a probability $p$ of success, then the probability of $k$ successes is:
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$
You'll need to understand this formula for Homework 2!
  • Think of each binomial trial as a sample drawn with replacement, since each trial is independent.
  • Remember that the binomial distribution only holds in cases where the process we are repeating (e.g. a coin flip) only has two possible outcomes.
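  • One way to build intuition for the formula is to compare it against a simulation. The sketch below (the helper binomial_pmf is our own, not from lecture) checks the coin example using np.random.binomial, which simulates the number of successes in $n$ trials:

```python
import numpy as np
from math import comb

def binomial_pmf(n, k, p):
    # P(X = k) for X ~ Binomial(n, p).
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Theoretical answer for the coin example: n = 5, k = 3, p = 3/4.
theory = binomial_pmf(5, 3, 3 / 4)

# Estimate the same probability by simulating 100,000 sets of 5 flips.
flips = np.random.binomial(5, 3 / 4, size=100_000)  # heads count per run
estimate = (flips == 3).mean()
```

The estimate should land close to the theoretical value of about 0.2637.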

The multinomial distribution

  • What if we're repeating some process in which each trial is independent, but there are more than two possible outcomes?
  • For example, say we roll a die 10 times. What is the probability we see two 3s, four 5s, and four 6s?
$$P(X_1 = 0, X_2 = 0, \boxed{X_3 = 2}, X_4 = 0, \boxed{X_5 = 4}, \boxed{X_6 = 4}) = \frac{10!}{2!4!4!} \left( \frac{1}{6} \right)^2 \left( \frac{1}{6} \right)^4 \left( \frac{1}{6} \right)^4$$
  • In general, the answer comes from the multinomial distribution:

    $$P(X_1 = x_1, X_2 = x_2, \dots, X_k = x_k) = \frac{n!}{x_1! x_2! \cdots x_k!} \cdot p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$

    where:

    • $n$ represents the total number of trials.
    • $k$ represents the total number of categories.
    • $p_1, p_2, \dots, p_k$ represent the probabilities of each of the $k$ possible outcomes.
    • $x_1, x_2, \dots, x_k$ represent the number of trials that resulted in each outcome.
  • This is a generalization of the binomial distribution to allow for trials with more than 2 outcomes.
    But remember, each trial – here, each die roll – is independent.

np.random.multinomial

  • np.random.multinomial, shown in lecture, allows us to draw simulated samples from a multinomial distribution!
  • It takes in (at least) two arguments:

    • n: number of experiments.
    • pvals: probability of each outcome; this should be a list/array that sums to 1.

    The output is an array with the same length as pvals, containing the number of times each outcome occurred.

  • For instance, to simulate drawing 3 candies with replacement from a bag with 3 Jolly Ranchers, 4 Starbursts, and 5 Reese's Pieces:
In [6]:
np.random.multinomial(3, [3 / 12, 4 / 12, 5 / 12])
Out[6]:
array([2, 0, 1])
  • While np.random.choice can be used for sampling with and without replacement, np.random.multinomial can only be used to sample with replacement. Why?
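  • Tying things together: since np.random.multinomial samples with replacement, we can use it to estimate the $\frac{1}{64}$ answer from earlier. The size argument runs many experiments at once (a sketch, not one of the original slide cells):

```python
import numpy as np

# 100,000 experiments, each drawing 3 candies with replacement.
# Column 0 counts Jolly Ranchers in each experiment.
draws = np.random.multinomial(3, [3/12, 4/12, 5/12], size=100_000)
estimate = (draws[:, 0] == 3).mean()
estimate  # should be close to 1/64 = 0.015625
```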

np.random.permutation

  • Finally, np.random.permutation shuffles the elements of the input sequence.
In [7]:
np.random.permutation(['hey', 'there', 'how', 'are', 'you'])
Out[7]:
array(['there', 'are', 'hey', 'how', 'you'], dtype='<U5')
  • The result always has the same length as the input.
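  • Note also that np.random.permutation returns a shuffled copy and leaves the input unchanged, as this small sketch (not one of the original slide cells) shows:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])
shuffled = np.random.permutation(values)

# shuffled contains the same elements in a (likely) different order;
# values itself is untouched.
values
```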