The problems in this worksheet are taken from past exams in similar classes. Work on them on paper, since the exams you take in this course will also be on paper.
We encourage you to complete this worksheet in a live discussion section. Solutions will be made available after all discussion sections have concluded. You don’t need to submit your answers anywhere.
Note: We do not plan to cover all problems here in the live discussion section; the problems we don’t cover can be used for extra practice.
Billy's aunt owns a jewellery store, and gives him data on 5000 of the diamonds in her store. For each diamond, we have:
carat: the weight of the diamond, in carats
length: the length of the diamond, in centimeters
width: the width of the diamond, in centimeters
price: the price of the diamond
The first 5 rows of the 5000-row dataset are shown below:
carat | length | width | price |
---|---|---|---|
0.40 | 4.81 | 4.76 | 1323 |
1.04 | 6.58 | 6.53 | 5102 |
0.40 | 4.74 | 4.76 | 696 |
0.40 | 4.67 | 4.65 | 798 |
0.50 | 4.90 | 4.95 | 987 |
Billy has enlisted our help in predicting the price of a diamond given various other features.
Suppose we want to fit a linear prediction rule that uses two features, carat and length, to predict price. Specifically, our prediction rule will be of the form
\text{predicted price} = w_0 + w_1 \cdot \text{carat} + w_2 \cdot \text{length}
We will use least squares to find \vec{w}^* = \begin{bmatrix} w_0^* \\ w_1^* \\ w_2^* \end{bmatrix}.
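A minimal numpy sketch of this setup, assuming arrays named carat, length, and price hold the observed values (the array names are ours, not part of the problem; only the first 5 rows from the table are used below, but on the full dataset each array would have 5000 entries):

```python
import numpy as np

# Hypothetical arrays of observed values (first 5 rows from the table above).
carat = np.array([0.40, 1.04, 0.40, 0.40, 0.50])
length = np.array([4.81, 6.58, 4.74, 4.67, 4.90])
price = np.array([1323, 5102, 696, 798, 987])

# Design matrix: a column of 1s for the intercept w_0, then carat, then length.
X = np.column_stack([np.ones(len(carat)), carat, length])

# w* solves the normal equations X^T X w = X^T y.
w_star = np.linalg.solve(X.T @ X, X.T @ price)
print(w_star)  # [w_0*, w_1*, w_2*]
```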
Write out the first 5 rows of the design matrix, X. Your matrix should not have any variables in it.
Suppose the optimal parameter vector \vec{w}^* is given by
\vec{w}^* = \begin{bmatrix} 2000 \\ 10000 \\ -1000 \end{bmatrix}
What is the predicted price of a diamond with 0.65 carats and a length of 4 centimeters? Show your work.
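The prediction is just the dot product of \vec{w}^* with the feature vector \begin{bmatrix} 1 & \text{carat} & \text{length} \end{bmatrix}^T; a one-line numpy sketch of that computation:

```python
import numpy as np

w_star = np.array([2000, 10000, -1000])   # [w_0*, w_1*, w_2*] as given above
x = np.array([1, 0.65, 4])                # [1, carat, length] for this diamond
print(w_star @ x)                         # predicted price
```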
Suppose \vec{e} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} is the error/residual vector, defined as
\vec{e} = \vec{y} - X \vec{w}^*
where \vec{y} is the observation vector containing the prices for each diamond.
For each of the following quantities, state whether it is guaranteed to be equal to the scalar 0, the vector \vec{0} of all 0s, or neither. No justification is necessary.
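A fact that is useful when reasoning about the residual vector: whenever \vec{w}^* solves the normal equations, X^T \vec{e} = \vec{0}. Here is a small numpy sketch, on made-up data, that you can use to see this numerically:

```python
import numpy as np

# Made-up data; any design matrix whose first column is all 1s will do.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6), rng.random(6), rng.random(6)])
y = rng.random(6)

w_star = np.linalg.solve(X.T @ X, X.T @ y)   # solve the normal equations
e = y - X @ w_star

print(X.T @ e)    # approximately the zero vector
print(e.sum())    # approximately 0, since the first column of X is all 1s
```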
Suppose we introduce two more features: the width of the diamond, and the product \text{length} \cdot \text{width}.
Suppose we also decide to remove the intercept term of our prediction rule. With all of these changes, our prediction rule is now
\text{predicted price} = w_1 \cdot \text{carat} + w_2 \cdot \text{length} + w_3 \cdot \text{width} + w_4 \cdot (\text{length} \cdot \text{width})
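A minimal numpy sketch of the corresponding design matrix, again using assumed array names for the observed columns:

```python
import numpy as np

# Hypothetical arrays of observed values (first 5 rows from the table above).
carat = np.array([0.40, 1.04, 0.40, 0.40, 0.50])
length = np.array([4.81, 6.58, 4.74, 4.67, 4.90])
width = np.array([4.76, 6.53, 4.76, 4.65, 4.95])

# No intercept term, so there is no column of 1s.
X = np.column_stack([carat, length, width, length * width])
print(X.shape)   # (5, 4) here; (5000, 4) on the full dataset
```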
Suppose we want to fit a hypothesis function of the form:
H(x_i) = w_0 + w_1 x_i^2
Note that this is not the simple linear regression hypothesis function, H(x_i) = w_0 + w_1x_i.
To do so, we will find the optimal parameter vector \vec{w}^* = \begin{bmatrix} w_0^* \\ w_1^* \end{bmatrix} that satisfies the normal equations. The first 5 rows of our dataset are as follows, though note that our dataset has n rows in total.
x | y |
---|---|
2 | 4 |
-1 | 4 |
3 | 4 |
-7 | 4 |
3 | 4 |
Suppose that x_1, x_2, ..., x_n have a mean of \bar{x} = 2 and a variance of \sigma_x^2 = 10.
Write out the first 5 rows of the design matrix, X.
Suppose, just in part (b), that after solving the normal equations, we find \vec{w}^* = \begin{bmatrix} 2 \\ -5 \end{bmatrix}. What is the predicted y value for x = 2? Give your answer as an integer with no variables. Show your work.
Let X_\text{tri} = 3 X. Using the fact that \sum_{i = 1}^n x_i^2 = n \sigma_x^2 + n \bar{x}^2, determine the value of the bottom-left value in the matrix X_\text{tri}^T X_\text{tri}, i.e. the value in the second row and first column. Give your answer as an expression involving n. Show your work.
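The fact quoted in this part is a rearrangement of the definition of variance, \sigma_x^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2; a quick numpy check on made-up data:

```python
import numpy as np

x = np.array([2.0, -1.0, 3.0, -7.0, 3.0, 12.0])   # any made-up values work
n, x_bar = len(x), x.mean()
sigma_sq = x.var()                                # (1/n) * sum((x_i - x_bar)^2)

print((x ** 2).sum())                             # sum of x_i^2
print(n * sigma_sq + n * x_bar ** 2)              # equal, up to floating-point error
```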
Consider the vectors \vec{u} and \vec{v}, defined below.
\vec{u} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \qquad \vec{v} = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}
We define X \in \mathbb{R}^{3 \times 2} to be the matrix whose first column is \vec u and whose second column is \vec v.
In this part only, let \vec{y} = \begin{bmatrix} -1 \\ k \\ 252 \end{bmatrix}.
Find a scalar k such that \vec{y} is in \text{span}(\vec u, \vec v). Give your answer as a constant with no variables.
Show that: (X^TX)^{-1}X^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} \end{bmatrix}
Hint: If A = \begin{bmatrix} a_1 & 0 \\ 0 & a_2 \end{bmatrix}, then A^{-1} = \begin{bmatrix} \frac{1}{a_1} & 0 \\ 0 & \frac{1}{a_2} \end{bmatrix}.
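Here is a short numpy check of the stated result, using the X built from \vec u and \vec v:

```python
import numpy as np

u = np.array([1, 0, 0])
v = np.array([0, 1, 1])
X = np.column_stack([u, v])          # X is 3 x 2

XtX = X.T @ X                        # diagonal, so the hint applies
print(np.linalg.inv(XtX) @ X.T)      # [[1, 0, 0], [0, 0.5, 0.5]]
```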
In parts (c) and (d) only, let \vec{y} = \begin{bmatrix} 4 \\ 2 \\ 8 \end{bmatrix}.
Find scalars a and b such that a \vec u + b \vec v is the vector in \text{span}(\vec u, \vec v) that is as close to \vec{y} as possible. Give your answers as constants with no variables.
Let \vec{e} = \vec{y} - (a \vec u + b \vec v), where a and b are the values you found in part (c).
What is \lVert \vec{e} \rVert?
0
3 \sqrt{2}
4 \sqrt{2}
6
6 \sqrt{2}
2\sqrt{21}
Is it true that, for any vector \vec{y} \in \mathbb{R}^3, we can find scalars c and d such that the sum of the entries in the vector \vec{y} - (c \vec u + d \vec v) is 0?
Yes, because \vec{u} and \vec{v} are linearly independent.
Yes, because \vec{u} and \vec{v} are orthogonal.
Yes, but for a reason that isn’t listed here.
No, because \vec{y} is not necessarily in \text{span}(\vec u, \vec v).
No, because neither \vec{u} nor \vec{v} is equal to the vector \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.
No, but for a reason that isn’t listed here.
Suppose that Q \in \mathbb{R}^{100 \times 12}, \vec{s} \in \mathbb{R}^{100}, and \vec{f} \in \mathbb{R}^{12}. What are the dimensions of the following product?
\vec{s}^T Q \vec{f}
scalar
12 \times 1 vector
100 \times 1 vector
100 \times 12 matrix
12 \times 12 matrix
12 \times 100 matrix
undefined
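If you want to check your dimension analysis, you can build arrays with the stated shapes (the values below are random placeholders, since only the shapes matter):

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.random((100, 12))
s = rng.random((100, 1))    # written as an explicit column vector
f = rng.random((12, 1))     # written as an explicit column vector

print((s.T @ Q @ f).shape)  # the shape of s^T Q f
```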
Suppose we want to predict how long it takes to run a Jupyter notebook on Datahub. For 100 different Jupyter notebooks, we collect the following 5 pieces of information:
cells: number of cells in the notebook
lines: number of lines of code
max iterations: largest number of iterations in any loop in the notebook, or 1 if there are no loops
variables: number of variables defined in the notebook
runtime: number of seconds for the notebook to run on Datahub
Then we use multiple regression to fit a prediction rule of the form
H(\text{cells}_i, \text{lines}_i, \text{max iterations}_i, \text{variables}_i) = w_0 + w_1 \cdot \text{cells}_i \cdot \text{lines}_i + w_2 \cdot (\text{max iterations}_i)^{\text{variables}_i - 10}
What are the dimensions of the design matrix X? That is, if X \in \mathbb{R}^{r \times c}, what are r and c?
In one sentence, what does the entry in row 3, column 2 of the design matrix X represent? (Count rows and columns starting at 1, not 0).
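A sketch of how this design matrix could be assembled, assuming arrays named cells, lines, max_iterations, and variables hold the 100 collected values (the random numbers below are placeholders for that data):

```python
import numpy as np

# Hypothetical arrays, each of length 100, standing in for the collected data.
rng = np.random.default_rng(0)
cells = rng.integers(1, 50, size=100)
lines = rng.integers(10, 500, size=100)
max_iterations = rng.integers(1, 1000, size=100)
variables = rng.integers(1, 30, size=100)

# One column per parameter in H: the intercept w_0, the feature multiplied by w_1,
# and the feature multiplied by w_2.
X = np.column_stack([
    np.ones(100),
    cells * lines,
    max_iterations.astype(float) ** (variables - 10),
])
print(X.shape)
```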
Consider the dataset shown below.
x^{(1)} | x^{(2)} | x^{(3)} | y |
---|---|---|---|
0 | 6 | 8 | -5 |
3 | 4 | 5 | 7 |
5 | -1 | -3 | 4 |
0 | 2 | 1 | 2 |
We want to use multiple regression to fit a prediction rule of the form H(x_i^{(1)}, x_i^{(2)}, x_i^{(3)}) = w_0 + w_1 x_i^{(1)} x_i^{(3)} + w_2 (x_i^{(2)} - x_i^{(3)})^2. Write down the design matrix X and observation vector \vec{y} for this scenario. No justification needed.
For the X and \vec{y} that you have written down, let \vec{w} be the optimal parameter vector, which comes from solving the normal equations X^TX\vec{w}=X^T\vec{y}. Let \vec{e} = \vec{y} - X \vec{w} be the error vector, and let e_i be the ith component of this error vector. Show that 4e_1+e_2+4e_3+e_4=0.
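The problem asks for an algebraic argument, but a numpy check of the claim on the four rows above can help confirm your setup of X and \vec{y}:

```python
import numpy as np

x1 = np.array([0, 3, 5, 0])
x2 = np.array([6, 4, -1, 2])
x3 = np.array([8, 5, -3, 1])
y = np.array([-5, 7, 4, 2])

# Design matrix for H = w_0 + w_1 * x1 * x3 + w_2 * (x2 - x3)^2.
X = np.column_stack([np.ones(4), x1 * x3, (x2 - x3) ** 2])

w = np.linalg.solve(X.T @ X, X.T @ y)      # solve the normal equations
e = y - X @ w

print(4 * e[0] + e[1] + 4 * e[2] + e[3])   # approximately 0
```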
Let X be a design matrix with 4 columns, such that the first column is a column of all 1s. Let \vec{y} be an observation vector. Let \vec{w}^* = (X^TX)^{-1}X^T\vec{y}. We’ll name the components of \vec{w}^* as follows:
\vec{w}^* = \begin{bmatrix} w_0^* \\ w_1^* \\ w_2^* \\ w_3^* \end{bmatrix}
In this problem, we’ll consider various modifications to the design matrix and see how they affect the solution to the normal equations.
Let X_a be the design matrix that comes from interchanging the first two columns of X. Let \vec{v}^* = (X_a^TX_a)^{-1}X_a^T\vec{y}. Express the components of \vec{v}^* in terms of w_0^*, w_1^*, w_2^*, and w_3^* (the components of \vec{w}^*).
Let X_b be the design matrix that comes from adding one to each entry of the first column of X. Let \vec{v}^* = (X_b^TX_b)^{-1}X_b^T\vec{y}. Express the components of \vec{v}^* in terms of w_0^*, w_1^*, w_2^*, and w_3^* (the components of \vec{w}^*).
Let X_c be the design matrix that comes from adding one to each entry of the third column of X. Let \vec{v}^* = (X_c^TX_c)^{-1}X_c^T\vec{y}. Express the components of \vec{v}^* in terms of w_0^*, w_1^*, w_2^*, and w_3^* (the components of \vec{w}^*).
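Each of these can be answered by reasoning about the normal equations directly, but a small numpy experiment on made-up data is a handy way to check your conjectures:

```python
import numpy as np

# Made-up data: any X whose first column is all 1s (and which has full column rank)
# will do, since only the structure of the modifications matters.
rng = np.random.default_rng(0)
n = 8
X = np.column_stack([np.ones(n), rng.random(n), rng.random(n), rng.random(n)])
y = rng.random(n)

def solve_normal_equations(design):
    return np.linalg.solve(design.T @ design, design.T @ y)

w = solve_normal_equations(X)

X_a = X[:, [1, 0, 2, 3]]   # interchange the first two columns
X_b = X.copy()
X_b[:, 0] += 1             # add one to each entry of the first column
X_c = X.copy()
X_c[:, 2] += 1             # add one to each entry of the third column

print("w*:", w)
for name, modified in [("X_a", X_a), ("X_b", X_b), ("X_c", X_c)]:
    print(name, solve_normal_equations(modified))
```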