Rating Matrix

\(\mathbf{Y}: M \text{ users} \times N \text{ items}\)

item n →
user m ↓
1 2 \(N\)
1 \(y^{(1)}_1\) \(y^{(1)}_2\)   \(y^{(1)}_N\)
2 \(y^{(2)}_1\) \(y^{(2)}_2\)   \(y^{(2)}_N\)
       
\(M\) \(y^{(M)}_1\) \(y^{(M)}_2\)   \(y^{(M)}_N\)

Matrix Filling Task

Rating matrix is very sparse

item n →
user m ↓
1 2 \(N\)
1 5     3
2   1    
      2
\(M\) 1      

Collaborative Filtering - The Principle

Using the Nearest Neighbour approach

 

  • User-based vs. item-based

The Nearest Neighbour Approach

Calculate the unknown rating as the average rating of the other users weighted by similarity

E.g. by cosine similarity

\[s_{m,m'}=\frac{\mathbf{x}^{(m)} \cdot \mathbf{x}^{(m')}}{|\mathbf{x}^{(m)}||\mathbf{x}^{(m')}|}\]

  • by row → user-based
  • By columns → item-based

 

Similarity of Users 1 and 2

 

Only consider items rated by both users

\[ \begin{align*} s_{1,2} &= \frac{\mathbf{x}^{(1)} \cdot \mathbf{x}^{(2)}}{|\mathbf{x}^{(1)}||\mathbf{x}^{(2)}|} \\ &= \frac{1\cdot 5+3\cdot 4+1 \cdot 4+5 \cdot 1}{\sqrt{1^2+3^2+1^2+5^2}\cdot \sqrt{5^2+4^2+4^2+1^2}} = 0.57 \end{align*} \]

Similarity Matrix

 

Calculation of an Unknown Rating

 

Predicted rating for user \(m=2\), item \(n=4\): \[ \begin{align*} \hat{y}_{2,4} &= \frac{1}{s_{2,3}+s_{2,5}+s_{2,6}} \left(s_{2,3}\cdot y_{3,4}+ s_{2,5} \cdot y_{5,4}+s_{2,6}\cdot y_{6,4}\right) \\ &= \frac{1}{0.73+0.65+1.0} \left( 0.73 \cdot 4+0.65 \cdot 5+1.0 \cdot 4 \right) \\ &= 4.27 \end{align*} \]

Matrix Factorisation

For a matrix \(\mathbf{Y}:M\times N\) of rank \(K\) there exist \(\mathbf{U}: N\times K\) and \(\mathbf{V}: M\times K\), such that

\[\mathbf{Y}=\mathbf{U}\mathbf{V}^T\]

 

Matrix Factorisation

\[\mathbf{Y}=\mathbf{U}\mathbf{V}^T\]

  • Problem: \(\mathbf{Y}\) is sparse
  • Approach:
    • Calculate \(\mathbf{U}\) and \(\mathbf{V}\) based on available entries in \(\mathbf{Y}\)
    • Use \(\mathbf{U}\) and \(\mathbf{V}\) to predict unknown ratings \(\mathbf{\hat{Y}}\)

Factorisation Machines

 

\[ \begin{align*} \text{Cost: } J(U,V) &= \frac{1}{2}\sum_{(m,n) \text{ where}\\ y_{m,n}\neq 0} \left(y_{m,n}- \left(\mathbf{u}^{(m)}\cdot \mathbf{v}^{(n)}+b_u^{(m)}+b_v^{(n)}\right)\right)^2 \\ &+ \frac{\lambda}{2}\sum||\mathbf{u}^{(m)}||^2 + \frac{\lambda}{2}\sum||\mathbf{v}^{(n)}||^2 \end{align*} \]

Minimise by alternating least squares or stochastic gradient descent

Summary

  • No item features needed
  • User ratings required
  • Current interests infered from historic user behavior
  • Sparsity
  • Cold start problems
  • Users’s range of interests can be expanded