\(\mathbf{Y}: M \text{ users} \times N \text{ items}\)
| item n → user m ↓ |
1 | 2 | … | \(N\) | |
|---|---|---|---|---|---|
| 1 | \(y^{(1)}_1\) | \(y^{(1)}_2\) | \(y^{(1)}_N\) | ||
| 2 | \(y^{(2)}_1\) | \(y^{(2)}_2\) | \(y^{(2)}_N\) | ||
| ⋮ | |||||
| \(M\) | \(y^{(M)}_1\) | \(y^{(M)}_2\) | \(y^{(M)}_N\) | ||
Rating matrix is very sparse
| item n → user m ↓ |
1 | 2 | … | \(N\) | |
|---|---|---|---|---|---|
| 1 | 5 | 3 | |||
| 2 | 1 | ||||
| ⋮ | 2 | ||||
| \(M\) | 1 | ||||
Using the Nearest Neighbour approach
Calculate the unknown rating as the average rating of the other users weighted by similarity
E.g. by cosine similarity
\[s_{m,m'}=\frac{\mathbf{x}^{(m)} \cdot \mathbf{x}^{(m')}}{|\mathbf{x}^{(m)}||\mathbf{x}^{(m')}|}\]
Only consider items rated by both users
\[ \begin{align*} s_{1,2} &= \frac{\mathbf{x}^{(1)} \cdot \mathbf{x}^{(2)}}{|\mathbf{x}^{(1)}||\mathbf{x}^{(2)}|} \\ &= \frac{1\cdot 5+3\cdot 4+1 \cdot 4+5 \cdot 1}{\sqrt{1^2+3^2+1^2+5^2}\cdot \sqrt{5^2+4^2+4^2+1^2}} = 0.57 \end{align*} \]
Predicted rating for user \(m=2\), item \(n=4\): \[ \begin{align*} \hat{y}_{2,4} &= \frac{1}{s_{2,3}+s_{2,5}+s_{2,6}} \left(s_{2,3}\cdot y_{3,4}+ s_{2,5} \cdot y_{5,4}+s_{2,6}\cdot y_{6,4}\right) \\ &= \frac{1}{0.73+0.65+1.0} \left( 0.73 \cdot 4+0.65 \cdot 5+1.0 \cdot 4 \right) \\ &= 4.27 \end{align*} \]
For a matrix \(\mathbf{Y}:M\times N\) of rank \(K\) there exist \(\mathbf{U}: N\times K\) and \(\mathbf{V}: M\times K\), such that
\[\mathbf{Y}=\mathbf{U}\mathbf{V}^T\]
\[\mathbf{Y}=\mathbf{U}\mathbf{V}^T\]
\[ \begin{align*} \text{Cost: } J(U,V) &= \frac{1}{2}\sum_{(m,n) \text{ where}\\ y_{m,n}\neq 0} \left(y_{m,n}- \left(\mathbf{u}^{(m)}\cdot \mathbf{v}^{(n)}+b_u^{(m)}+b_v^{(n)}\right)\right)^2 \\ &+ \frac{\lambda}{2}\sum||\mathbf{u}^{(m)}||^2 + \frac{\lambda}{2}\sum||\mathbf{v}^{(n)}||^2 \end{align*} \]
Minimise by alternating least squares or stochastic gradient descent