Summary Statistics and the Constant Model



The problems in this worksheet are taken from past exams in similar classes. Work on them on paper, since the exams you take in this course will also be on paper.

We encourage you to complete this worksheet in a live discussion section. Solutions will be made available after all discussion sections have concluded. You don’t need to submit your answers anywhere.

Note: We do not plan to cover all problems here in the live discussion section; the problems we don’t cover can be used for extra practice.


Problem 1

The mean of 12 non-negative numbers is 45. Suppose we remove 2 of these numbers. What is the largest possible value of the mean of the remaining 10 numbers? Show your work.
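
If you'd like to sanity-check your reasoning numerically, here is a minimal sketch in Python (the specific numbers below are made up for illustration; any 12 non-negative numbers with mean 45 would do) showing how removing two values changes the mean of the remaining ten.

```python
import numpy as np

# A made-up dataset of 12 non-negative numbers whose mean is 45 (sum = 540).
data = np.array([0, 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 90])
assert np.isclose(data.mean(), 45)

# Try removing different pairs of values and see how the mean of the
# remaining 10 numbers changes.
for idx in [(0, 1), (2, 3), (10, 11)]:
    remaining = np.delete(data, idx)
    print(idx, remaining.mean())
```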


Problem 2

You may find the following properties of logarithms helpful in this question. Assume that all logarithms in this question are natural logarithms, i.e. of base e.

\log(a \cdot b) = \log a + \log b \qquad \log \left( \frac{a}{b} \right) = \log a - \log b \qquad \frac{d}{dx} \log x = \frac{1}{x}

Billy is trying his hand at coming up with loss functions. He comes up with the Billy loss, L_B(y_i, h), defined as follows:

L_B(y_i, h) = \left[ \log \left( \frac{y_i}{h} \right) \right]^2

Throughout this problem, assume that all y_i's are positive.


Problem 2.1

Show that: \frac{d}{dh} L_B(y_i, h) = - \frac{2}{h} \log \left( \frac{y_i}{h} \right)
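
One quick way to sanity-check an identity like this (it is not a substitute for the derivation) is to compare it against a finite-difference approximation. A minimal sketch, using an arbitrarily chosen y_i and h:

```python
import numpy as np

def billy_loss(y, h):
    """Billy loss: L_B(y, h) = (log(y / h))^2."""
    return np.log(y / h) ** 2

def billy_loss_deriv(y, h):
    """The claimed derivative of Billy loss with respect to h."""
    return -(2 / h) * np.log(y / h)

# Central finite-difference approximation at an arbitrary point.
y, h, eps = 5.0, 2.0, 1e-6
numerical = (billy_loss(y, h + eps) - billy_loss(y, h - eps)) / (2 * eps)
print(numerical, billy_loss_deriv(y, h))  # these should agree closely
```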


Problem 2.2

Show that the constant prediction h^* that minimizes average Billy loss for the constant model is:

h^* = \left(y_1 \cdot y_2 \cdot ... \cdot y_n \right)^{\frac{1}{n}}

You do not need to perform a second derivative test, but otherwise you must show your work.

Hint: To confirm that you’re interpreting the result correctly, h^* for the dataset 3, 5, 16 is (3 \cdot 5 \cdot 16)^{\frac{1}{3}} = 240^{\frac{1}{3}} \approx 6.214.
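
If you want to double-check the hint (and, later, your own derivation) numerically, the sketch below brute-force minimizes average Billy loss over a grid of candidate h values and compares the result to the geometric mean of the data.

```python
import numpy as np

ys = np.array([3, 5, 16])

def avg_billy_loss(h, ys):
    """Average Billy loss for a constant prediction h."""
    return np.mean(np.log(ys / h) ** 2)

# Brute-force minimization over a fine grid of candidate predictions.
hs = np.linspace(0.1, 20, 100_000)
losses = [avg_billy_loss(h, ys) for h in hs]
h_star = hs[np.argmin(losses)]

geometric_mean = np.prod(ys) ** (1 / len(ys))
print(h_star, geometric_mean)  # both should be approximately 6.214
```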



Problem 3

Biff the Wolverine just made an Instagram account and has been keeping track of the number of likes his posts have received so far.

His first 7 posts have received a mean of 16 likes; the specific like counts in sorted order are

8, 12, 12, 15, 18, 20, 27

Biff the Wolverine wants to predict the number of likes his next post will receive, using a constant prediction rule h. For each loss function L(y_i, h), determine the constant prediction h^* that minimizes average loss. If you believe there are multiple minimizers, specify them all. If you believe you need more information to answer the question or that there is no minimizer, state that clearly. Give a brief justification for each answer.
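
You should reason about each loss function directly, but if you'd like to check an answer afterwards, a generic brute-force sketch like the one below, which simply evaluates the average loss on a grid of candidate predictions, can help. (A grid search is only meaningful when a minimizer actually exists.)

```python
import numpy as np

likes = np.array([8, 12, 12, 15, 18, 20, 27])

def best_constant(loss, ys, hs=np.linspace(0, 40, 40_001)):
    """Return the grid value of h that minimizes average loss on ys."""
    avg_losses = np.array([np.mean(loss(ys, h)) for h in hs])
    return hs[np.argmin(avg_losses)]

# Example: absolute loss (Problem 3.1). Swap in other loss functions to
# check your answers to the remaining subparts.
print(best_constant(lambda y, h: np.abs(y - h), likes))
```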


Problem 3.1

L(y_i, h) = |y_i - h|


Problem 3.2

L(y_i, h) = (y_i - h)^2


Problem 3.3

L(y_i, h) = 4(y_i - h)^2


Problem 3.4

L(y_i, h) = \begin{cases} 0 & h = y_i \\ 100 & h \neq y_i \end{cases}


Problem 3.5

L(y_i, h) = (3y_i - 4h)^2


Problem 3.6

L(y_i, h) = (y_i - h)^3

Hint: Do not spend too long on this subpart.



Problem 4

Let R_{sq}(h) represent the mean squared error of a constant prediction h for a given dataset. Find a dataset \{y_1, y_2\} such that the graph of R_{sq}(h) has its minimum at the point (7,16).
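
Recall that for a dataset \{y_1, y_2\}, R_{sq}(h) = \frac{1}{2}\left[(y_1 - h)^2 + (y_2 - h)^2\right]. Once you've picked a candidate dataset, a sketch like the one below (with placeholder values y1 and y2 that you'd replace with your answer) can confirm where the minimum of R_{sq} lies and what its value is.

```python
import numpy as np

# Placeholder values; replace these with your candidate dataset.
y1, y2 = 1.0, 2.0

def R_sq(h):
    """Mean squared error of the constant prediction h on {y1, y2}."""
    return ((y1 - h) ** 2 + (y2 - h) ** 2) / 2

hs = np.linspace(-10, 20, 300_001)
values = R_sq(hs)
i = np.argmin(values)
print(hs[i], values[i])  # for a correct dataset, prints approximately 7 and 16
```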


Problem 5

Consider a dataset D with 5 data points \{7,5,1,2,a\}, where a is a positive real number. Note that a is not necessarily an integer.


Problem 5.1

Express the mean of D as a function of a, simplifying the expression as much as possible.


Problem 5.2

Depending on the range of a, the median of D could assume one of three possible values. Write out all possible medians of D along with the corresponding range of a for each case. Express the ranges using double inequalities, e.g. 3 < a \leq 8.


Problem 5.3

Determine the range of a that satisfies \text{Mean}(D) < \text{Median}(D). Make sure to show your work.
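
Each of these subparts can be checked numerically. The sketch below computes the mean and median of D over a range of a values; you can use it to verify your expressions and the range you find in Problem 5.3.

```python
import numpy as np

# Mean and median of D = {7, 5, 1, 2, a} as a varies.
for a in np.linspace(0.5, 10, 20):
    D = np.array([7, 5, 1, 2, a])
    print(f"a = {a:5.2f}   mean = {D.mean():5.2f}   median = {np.median(D):4.1f}")
```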



Problem 6

Consider a dataset of n integers, y_1, y_2, ..., y_n, whose histogram is given below:


Problem 6.1

Which of the following is closest to the constant prediction h^* that minimizes:

\displaystyle \frac{1}{n} \sum_{i = 1}^n \begin{cases} 0 & y_i = h \\ 1 & y_i \neq h \end{cases}


Problem 6.2

Which of the following is closest to the constant prediction h^* that minimizes: \displaystyle \frac{1}{n} \sum_{i = 1}^n |y_i - h|


Problem 6.3

Which of the following is closest to the constant prediction h^* that minimizes: \displaystyle \frac{1}{n} \sum_{i = 1}^n (y_i - h)^2


Problem 6.4

Which of the following is closest to the constant prediction h^* that minimizes: \displaystyle \lim_{p \rightarrow \infty} \frac{1}{n} \sum_{i = 1}^n |y_i - h|^p
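
Since the histogram isn't reproduced in this text version of the worksheet, the sketch below uses a made-up dataset purely to illustrate how each of the four objectives above could be checked numerically; with the actual histogram, you would read the analogous quantities off of it.

```python
import numpy as np

# Made-up integer data standing in for the dataset shown in the histogram.
ys = np.array([1, 2, 2, 2, 3, 4, 7, 9, 10])
hs = np.linspace(0, 12, 12_001)  # grid of candidate constant predictions

objectives = {
    "0-1 loss":        lambda h: np.mean(~np.isclose(ys, h)),
    "absolute loss":   lambda h: np.mean(np.abs(ys - h)),
    "squared loss":    lambda h: np.mean((ys - h) ** 2),
    "|.|^p, large p":  lambda h: np.mean(np.abs(ys - h) ** 50.0),
}

for name, objective in objectives.items():
    values = np.array([objective(h) for h in hs])
    print(name, hs[np.argmin(values)])
```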



Problem 7

Suppose there is a dataset containing 10000 integers:


Problem 7.1

Calculate the median of this dataset.


Problem 7.2

How does the mean of this dataset compare to its median?



Problem 8

Define the extreme mean (EM) of a dataset to be the average of its largest and smallest values. Let f(x)=-3x+4. Show that for any dataset x_1\leq x_2 \leq \dots \leq x_n, EM(f(x_1), f(x_2), \dots, f(x_n)) = f(EM(x_1, x_2, \dots, x_n)).
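
A small numerical check is not a proof, but it can build intuition before you write the general argument. A minimal sketch on a randomly generated dataset:

```python
import numpy as np

def extreme_mean(xs):
    """Average of the largest and smallest values of xs."""
    return (np.max(xs) + np.min(xs)) / 2

def f(x):
    return -3 * x + 4

# Check the claimed identity on a randomly generated dataset.
rng = np.random.default_rng(42)
xs = np.sort(rng.uniform(-10, 10, size=8))
print(extreme_mean(f(xs)), f(extreme_mean(xs)))  # these should match
```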


Problem 9

Consider a dataset of n values, y_1, y_2, ..., y_n, all of which are non-negative. We’re interested in fitting a constant model, H(x) = h, to the data, using the new “Wolverine” loss function:

L_\text{wolverine}(y_i, h) = w_i \left( y_i^2 - h^2 \right)^2

Here, w_i corresponds to the “weight” assigned to the data point y_i, the idea being that different data points can be weighted differently when finding the optimal constant prediction, h^*.

For example, for the dataset y_1 = 1, y_2 = 5, y_3 = 2, we will end up with different values of h^* when we use the weights w_1 = w_2 = w_3 = 1 and when we use weights w_1 = 8, w_2 = 4, w_3 = 3.


Problem 9.1

Find \frac{\partial L_\text{wolverine}}{\partial h}, the derivative of the Wolverine loss function with respect to h. Show your work, and put a \boxed{\text{box}} around your final answer.


Problem 9.2

Prove that the constant prediction that minimizes average loss for the Wolverine loss function is:

h^* = \sqrt{\frac{\sum_{i = 1}^n w_i y_i^2}{\sum_{i = 1}^n w_i}}
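
To convince yourself that this formula is plausible before proving it, you can compare it against a brute-force numerical minimization on the example dataset and weights given above; a sketch:

```python
import numpy as np

ys = np.array([1, 5, 2])
ws = np.array([8, 4, 3])

def avg_wolverine_loss(h, ys, ws):
    """Average Wolverine loss for a constant prediction h."""
    return np.mean(ws * (ys ** 2 - h ** 2) ** 2)

# Brute-force minimization over a grid of candidate predictions.
hs = np.linspace(0, 10, 100_001)
losses = np.array([avg_wolverine_loss(h, ys, ws) for h in hs])
h_star_numeric = hs[np.argmin(losses)]

# Closed-form prediction from the formula above.
h_star_formula = np.sqrt(np.sum(ws * ys ** 2) / np.sum(ws))
print(h_star_numeric, h_star_formula)  # these should agree closely
```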


Problem 9.3

For a dataset of non-negative values y_1, y_2, ..., y_n with weights w_1, 1, ..., 1, evaluate: \displaystyle \lim_{w_1 \rightarrow \infty} h^*


