Feature representation (encoding) \[\Phi: \text{item}^{(m)} \rightarrow \mathbf{x}^{(m)}=\begin{pmatrix}x_1^{(m)} \\ \vdots \\ x_N^{(m)}\end{pmatrix}\]
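As a hypothetical illustration of \(\Phi\) (the genre list and duration scaling below are invented, not from the notes), an item such as a movie might be encoded as genre indicators plus a scaled duration feature:

```python
import numpy as np

# Hypothetical encoding Phi: item -> N-dimensional feature vector x.
GENRES = ["action", "comedy", "drama"]

def encode(item: dict) -> np.ndarray:
    """Phi(item): one-hot genre indicators plus a scaled duration feature."""
    genre_flags = [1.0 if g in item["genres"] else 0.0 for g in GENRES]
    return np.array(genre_flags + [item["duration_min"] / 100.0])

x = encode({"genres": {"action", "drama"}, "duration_min": 125})
# x = [1.0, 0.0, 1.0, 1.25]  (N = 4 features for this toy encoding)
```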
Design matrix \(\mathbf{X}\)
| item \(m\) ↓ \ feature \(n\) → | 1 | 2 | … | \(N\) |
|---|---|---|---|---|
| 1 | \(x^{(1)}_1\) | \(x^{(1)}_2\) | … | \(x^{(1)}_N\) |
| 2 | \(x^{(2)}_1\) | \(x^{(2)}_2\) | … | \(x^{(2)}_N\) |
| ⋮ | ⋮ | ⋮ | ⋱ | ⋮ |
| \(M\) | \(x^{(M)}_1\) | \(x^{(M)}_2\) | … | \(x^{(M)}_N\) |
Ratings \(\mathbf{y}\)
| \(y^{(m)}\) |
|---|
| \(y^{(1)}\) |
| \(y^{(2)}\) |
| ⋮ |
| \(y^{(M)}\) |
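Stacking one encoded vector per item as rows gives \(\mathbf{X}\) (shape \(M \times N\)), with the ratings collected in \(\mathbf{y}\). A minimal NumPy sketch with invented numbers:

```python
import numpy as np

# Hypothetical toy data: M = 4 items, N = 3 features; all values invented.
X = np.array([
    [1.0, 0.0, 2.5],   # x^(1)
    [0.0, 1.0, 1.0],   # x^(2)
    [1.0, 1.0, 0.5],   # x^(3)
    [0.0, 0.0, 3.0],   # x^(4)
])                      # design matrix, shape (M, N)
y = np.array([3.5, 2.0, 4.0, 1.5])   # ratings y^(1)..y^(M), shape (M,)
M, N = X.shape
```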
Learn a predictor \(f\) that maps an \(N\)-dimensional vector representation of an item (a row of \(\mathbf{X}\)) to an output value (the corresponding element of \(\mathbf{y}\)):
\[f\left(\mathbf{x}^{(m)}\right) \rightarrow y^{(m)}\]
Hypothesis, e.g. linear: \(f(\mathbf{x}^{(m)})=\boldsymbol{\theta}^T\mathbf{x}^{(m)}\)
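Applied row-wise, the linear hypothesis is a single matrix-vector product. A sketch continuing the toy arrays above (the zero initialisation of \(\boldsymbol{\theta}\) is an assumption, not part of the notes):

```python
import numpy as np

def predict(X, theta):
    """Linear hypothesis f(x^(m)) = theta^T x^(m), evaluated for every row of X."""
    return X @ theta                 # predictions, shape (M,)

theta = np.zeros(N)                  # assumed starting point, shape (N,)
y_hat = predict(X, theta)
```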
Loss function: \(\mathcal{L}=\sum_{m=1}^{M}\left(y^{(m)}-f(\mathbf{x}^{(m)})\right)^2\)
\(+\) regularisation
Cost function: \[J(\boldsymbol{\theta})=\frac{1}{2M}\sum_{m=1}^{M}\left(y^{(m)}-\boldsymbol{\theta}^T\mathbf{x}^{(m)}\right)^2+\frac{\lambda}{2}\|\boldsymbol{\theta}\|^2\]
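The cost translates directly into code; `lam` stands for the regularisation weight \(\lambda\):

```python
def cost(theta, X, y, lam):
    """J(theta) = (1/2M) * sum_m (y^(m) - theta^T x^(m))^2 + (lam/2) * ||theta||^2."""
    M = X.shape[0]
    r = y - X @ theta                # residuals, shape (M,)
    return (r @ r) / (2 * M) + (lam / 2) * (theta @ theta)
```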
Minimise \(J(\boldsymbol{\theta})\): the cost is quadratic in \(\boldsymbol{\theta}\), so it can be solved analytically by setting \(\nabla J = 0\), or iteratively by gradient descent.
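Both routes as a sketch: setting \(\nabla J = 0\) for the cost above gives the regularised normal equations \((\mathbf{X}^T\mathbf{X} + \lambda M\,\mathbf{I})\,\boldsymbol{\theta} = \mathbf{X}^T\mathbf{y}\), while gradient descent iterates with \(\nabla J = \frac{1}{M}\mathbf{X}^T(\mathbf{X}\boldsymbol{\theta} - \mathbf{y}) + \lambda\boldsymbol{\theta}\). The step size `alpha` and iteration count are illustrative hyperparameters, not values from the notes:

```python
import numpy as np

def fit_analytic(X, y, lam):
    """Closed form: solve (X^T X + lam*M*I) theta = X^T y."""
    M, N = X.shape
    return np.linalg.solve(X.T @ X + lam * M * np.eye(N), X.T @ y)

def fit_gradient_descent(X, y, lam, alpha=0.1, steps=5000):
    """Iterative: theta <- theta - alpha * grad J(theta)."""
    M, N = X.shape
    theta = np.zeros(N)
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / M + lam * theta
        theta -= alpha * grad
    return theta

# J is convex, so for a small enough step size the two agree
# (lam = 0.1 is a hypothetical choice):
# fit_analytic(X, y, lam=0.1) ≈ fit_gradient_descent(X, y, lam=0.1)
```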