A Brief Introduction to Recommendation Systems in Python

Kelvin Waters
May 15, 2020

Recommendation systems have been with us since the late ’90s and are only now coming into their own. Hell, just this morning I was personally informed via email that my Steam library, which is nearing 100 titles, would now be used to recommend even more games (that I’ll probably never have the time to play). Count Hulu, Spotify, and Netflix among them, along with all the social media websites and applications at our disposal, where membership is the commodity being recommended.

Two Types of Recommendation Engines

The two most prevalent are content-based and collaborative filtering recommender systems. Collaborative filtering is widely used to make recommendations based on your ratings of and history with items, anything from the car you drive to a recent movie you watched while you “Netflixed and chilled.” You did eventually watch the movie, right? Or was the movie watching YOU? Ahem.

The two collaborative approaches prevalent today are memory-based and model-based methods, with memory-based being the easier to implement and explain.

User-based collaborative filtering can be described as the social network of collaborative filtering. It relies on users’ interactions with items or merchandise and compares your history with that of the many other folks using or viewing the same stuff. This similarity between people is the bread and butter (or, for some, the margarine) of this filtering technique.
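To make the user-based idea concrete, here is a minimal sketch in Python with NumPy. The rating matrix is made up, and 0 stands for “not rated.” The helpers `cosine_user_similarity` and `predict_user_based` are hypothetical names used only for illustration. The sketch predicts a user’s rating for an item as the cosine-similarity-weighted average of the ratings from every other user who rated that item. Real systems usually limit this to the top-k most similar neighbours.

```python
import numpy as np

# Made-up user-by-item rating matrix: rows are users, columns are items,
# and 0 means "not rated".
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def cosine_user_similarity(ratings):
    """Cosine similarity between every pair of users (rows)."""
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    unit = ratings / np.where(norms == 0, 1.0, norms)  # guard against empty rows
    return unit @ unit.T

def predict_user_based(ratings, user, item):
    """Similarity-weighted average of the ratings other users gave `item`."""
    sims = cosine_user_similarity(ratings)[user]
    # Only users (other than the target) who actually rated the item count.
    raters = [u for u in range(ratings.shape[0]) if u != user and ratings[u, item] > 0]
    weights = sims[raters]
    if weights.sum() == 0:
        return 0.0
    return float(weights @ ratings[raters, item] / weights.sum())

# User 0 rates like user 1, who disliked item 2, so the prediction comes out low.
print(round(predict_user_based(R, user=0, item=2), 2))  # 2.09
```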

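A content-based recommender uses item metadata to predict what a potential customer may want to purchase, based on similar items they have already purchased. This is sometimes lumped together with item-based collaborative filtering, but the two differ: item-based collaborative filtering treats items as similar because the same users rated them alike, not because their metadata matches. Duff drinkers would want that Duff T-shirt, right?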

Content-based recommenders are good for music, movies, and books, which come with a lot of collectable metadata. That metadata makes it easy to draw similarities from a favorite actor, director, or genre. It can also tell whether the new keyboard you’ve been shopping for has Bluetooth and a trackpad, and item metadata like this can be used to garner your attention. There are also hybrid recommender systems that combine both approaches into a single system.
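A content-based recommender can be sketched in plain Python by comparing metadata tags. The catalog, titles, and tags below are hypothetical. Jaccard similarity (shared tags divided by all tags) is just one simple choice of metadata similarity, picked here for illustration; TF-IDF vectors with cosine similarity are another common option.

```python
# Hypothetical catalog: each item mapped to a set of metadata tags
# (genre, director, features...). Titles and tags are made up.
catalog = {
    "Movie A": {"Comedy", "Romance"},
    "Movie B": {"Comedy", "Romance", "Drama"},
    "Movie C": {"Action", "Sci-Fi"},
    "Movie D": {"Action", "Sci-Fi", "Thriller"},
}

def jaccard(a, b):
    """Size of the tag intersection divided by the size of the tag union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_similar(liked, catalog, n=1):
    """Rank items the user hasn't liked yet by their best tag overlap
    with anything they did like."""
    scores = {
        title: max(jaccard(tags, catalog[seen]) for seen in liked)
        for title, tags in catalog.items()
        if title not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]

print(recommend_similar({"Movie A"}, catalog))  # ['Movie B']
```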

Let’s briefly walk through a MovieLens dataset, just enough to get our feet wet.

We’ll read in some data:
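A minimal sketch of reading the data with pandas follows. It assumes the MovieLens ratings file has the columns `userId`, `movieId`, `rating`, and `timestamp`. So that it runs without a download, a few made-up rows are read from a string. With the real file you would call `pd.read_csv` on its path, and the `"ratings.csv"` path shown in the comment is an assumption.

```python
import io

import pandas as pd

# Made-up rows in the MovieLens ratings.csv layout
# (userId, movieId, rating, timestamp). With the real file downloaded,
# this would be pd.read_csv("ratings.csv"); that path is an assumption.
SAMPLE_RATINGS = """userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
2,1,3.5,1445714835
2,2,2.5,1445714952
3,3,5.0,1306463578
"""

ratings = pd.read_csv(io.StringIO(SAMPLE_RATINGS))
print(ratings.head())
```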

Gather some info:
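The usual pandas calls for a first look at a table include `info()`, `shape`, `nunique()`, and a missing-value count. The sketch below runs them on the same made-up sample rows, which stand in for the real ratings file.

```python
import io

import pandas as pd

# Made-up sample standing in for the real ratings.csv.
ratings = pd.read_csv(io.StringIO("""userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
2,1,3.5,1445714835
2,2,2.5,1445714952
3,3,5.0,1306463578
"""))

ratings.info()               # dtypes, non-null counts, memory usage
print(ratings.shape)         # (rows, columns)
print(ratings.nunique())     # distinct users, movies, rating values, timestamps
print(ratings.isna().sum())  # missing values per column
```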

This could be useful:
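Summary statistics such as the average and maximum rating can come from `Series.describe()`. The sketch below uses a small made-up ratings table, so its numbers will not match the real dataset’s.

```python
import pandas as pd

# Made-up ratings standing in for the real ratings table.
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [1, 3, 1, 2, 3],
    "rating": [4.0, 4.0, 3.5, 2.5, 5.0],
})

# count, mean, std, min, quartiles and max of the rating column
summary = ratings["rating"].describe()
print(summary)
print("mean:", summary["mean"], "max:", summary["max"])
```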

The average rating is 3.5, with a maximum of 5.

And the majority of the ratings sit somewhere between 2.5 and 4.
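A claim like “most ratings sit between 2.5 and 4” can be checked by counting ratings per star value and taking the share inside that range. The sketch below does both on made-up values. If matplotlib is installed, `ratings["rating"].hist()` would draw the same distribution.

```python
import pandas as pd

ratings = pd.DataFrame({"rating": [4.0, 4.0, 3.5, 2.5, 5.0]})  # made-up values

# Number of ratings at each star value, lowest first.
counts = ratings["rating"].value_counts().sort_index()
print(counts)

# Share of ratings between 2.5 and 4 stars (both ends inclusive).
share = ratings["rating"].between(2.5, 4.0).mean()
print(f"{share:.0%} of ratings fall between 2.5 and 4")
```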

Feel free to repeat the same steps on the movie CSV file; that part is omitted here.

Here the timestamp of a rating isn’t useful to us, so it’s discarded, and for the sake of time I’ll drop the genres column as well (oh no, not that metadata!). Both drops happen as we merge the two datasets.
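A minimal pandas sketch of the drop-and-merge step follows. It assumes the ratings file has a `timestamp` column, the movie file has `movieId`, `title`, and `genres`, and both share the `movieId` key. The ratings are made up, and the three movie rows only illustrate the layout.

```python
import io

import pandas as pd

# Made-up ratings in the ratings.csv layout.
ratings = pd.read_csv(io.StringIO("""userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
2,1,3.5,1445714835
2,2,2.5,1445714952
3,3,5.0,1306463578
"""))

# A few rows in the movie file's layout (movieId, title, genres).
movies = pd.read_csv(io.StringIO("""movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
"""))

# Drop the columns we won't use, then join ratings to titles on movieId.
data = pd.merge(
    ratings.drop(columns="timestamp"),
    movies.drop(columns="genres"),
    on="movieId",
)
print(data)
```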

Now we use a pivot_table to get our data into the shape of the user-by-movie matrix we want to work with.
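The sketch below shows one way to build that matrix with `DataFrame.pivot_table`. It assumes users become rows and movie titles become columns, which is the layout the later movie-to-movie correlation needs. The merged frame is a made-up stand-in.

```python
import pandas as pd

# Made-up stand-in for the merged ratings + titles frame.
data = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "title": ["Toy Story (1995)", "Grumpier Old Men (1995)", "Toy Story (1995)",
              "Jumanji (1995)", "Grumpier Old Men (1995)"],
    "rating": [4.0, 4.0, 3.5, 2.5, 5.0],
})

# One row per user, one column per movie, ratings as cell values.
# Pairs with no rating become NaN; duplicate (user, title) pairs would be averaged.
user_ratings = data.pivot_table(index="userId", columns="title", values="rating")
print(user_ratings)
```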

Houston, we have a problem! Most cells in this matrix are empty (NaN) because each user has rated only a handful of these films, and quite a few films may have fewer than n user ratings. There is no magic method for handling situations like this, but here’s what we’ll do.

Let’s drop any movies that don’t have at least ten user ratings. Since no real rating is 0 (the scale tops out at 5), we’ll get rid of the remaining NaN non-values by replacing them with 0’s.
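Both steps can be expressed in pandas with `DataFrame.dropna(thresh=..., axis=1)` followed by `fillna(0)`. The sketch below assumes movies are the columns of the pivoted matrix. The matrix is made up and tiny, so the threshold is lowered from the post’s ten to 2, which is purely an illustration choice.

```python
import numpy as np
import pandas as pd

# Made-up user-by-movie matrix with gaps (NaN = not rated).
user_ratings = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, np.nan, 1.0],
        "Movie B": [4.0, np.nan, 2.0, 2.0],
        "Movie C": [np.nan, np.nan, 3.0, np.nan],
    },
    index=pd.Index([1, 2, 3, 4], name="userId"),
)

# The post keeps movies with at least ten ratings; 2 is used here only
# because the toy matrix is tiny.
MIN_RATINGS = 2

# thresh = minimum number of non-NaN values a column needs to be kept;
# the remaining gaps become 0, a value no real rating uses.
filtered = user_ratings.dropna(thresh=MIN_RATINGS, axis=1).fillna(0)
print(filtered)
```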

This cuts our dataset by more than half, but we can adjust the threshold as needed, and it also reduces unnecessary noise in our data.

There are three common ways to create a similarity matrix for predictions: Euclidean distance, cosine similarity, or pandas’ built-in correlation with the Pearson method. Pearson does double duty by centering and scaling each movie’s ratings behind the scenes, so there’s no further need to standardize the data.
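All three similarity matrices can be computed side by side on a small made-up matrix, with movies as columns. Pearson uses pandas’ `DataFrame.corr(method="pearson")`. Cosine and Euclidean are computed with NumPy. Turning a Euclidean distance into a similarity via `1 / (1 + d)` is one common convention, chosen here as an assumption.

```python
import numpy as np
import pandas as pd

# Made-up filtered user-by-movie matrix (rows = users, 0 = not rated).
M = pd.DataFrame({
    "Movie A": [5, 4, 1, 2],
    "Movie B": [4, 3, 0, 1],  # Movie A shifted down one star
    "Movie C": [1, 2, 5, 4],  # Movie A mirrored around its mean
})
X = M.to_numpy(dtype=float)

# 1) Pearson: pandas correlates each pair of columns after centring and scaling them.
pearson = M.corr(method="pearson")

# 2) Cosine: normalise each movie column to unit length, then take dot products.
unit = X / np.linalg.norm(X, axis=0)
cosine = pd.DataFrame(unit.T @ unit, index=M.columns, columns=M.columns)

# 3) Euclidean: distance between movie columns, mapped to a similarity
#    with 1 / (1 + d) so identical columns score exactly 1.
diff = X[:, :, None] - X[:, None, :]  # shape (users, movies, movies)
dist = np.sqrt((diff ** 2).sum(axis=0))
euclidean = pd.DataFrame(1 / (1 + dist), index=M.columns, columns=M.columns)

print(pearson.round(2), cosine.round(2), euclidean.round(2), sep="\n\n")
```

Pearson scores Movie A and Movie B as a perfect 1.0 because B is just A shifted down a star. Cosine scores the same pair slightly below 1 (about 0.98). That gap is the mean adjustment at work.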

Here we can clearly see a perfect correlation (1.0) between each film and itself along the diagonal. Now to take advantage of the other similarities to make predictions.

And finally make some predictions!
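One common way to turn a correlation matrix into predictions works like this. Take each movie you rated, scale its correlation column by how far your rating sits from a neutral midpoint, and sum the results. The sketch below follows that approach, but it is not necessarily the post’s exact method. The 2.5 midpoint, the `get_similar`/`recommend` helpers, and the `exclude_rated` switch are all hypothetical. The matrix is made up.

```python
import pandas as pd

# Made-up filtered user-by-movie matrix (rows = users).
M = pd.DataFrame({
    "Movie A": [5, 4, 1, 2],
    "Movie B": [4, 5, 2, 1],
    "Movie C": [1, 2, 5, 4],
    "Movie D": [2, 1, 4, 5],
})
corr = M.corr(method="pearson")

def get_similar(title, rating, corr, midpoint=2.5):
    """Scale a movie's correlation column by how far the user's rating is
    from an assumed neutral midpoint: liked movies lift correlated titles,
    disliked movies push them down."""
    return corr[title] * (rating - midpoint)

def recommend(my_ratings, corr, n=3, exclude_rated=False):
    """Sum the scaled correlation columns for every (title, rating) pair."""
    scores = pd.concat(
        [get_similar(title, rating, corr) for title, rating in my_ratings], axis=1
    ).sum(axis=1)
    if exclude_rated:
        # Drop titles the user already rated; each correlates perfectly with itself.
        scores = scores.drop([title for title, _ in my_ratings])
    return scores.sort_values(ascending=False).head(n)

my_ratings = [("Movie A", 5), ("Movie C", 1)]
print(recommend(my_ratings, corr))                      # Movie A tops the list
print(recommend(my_ratings, corr, exclude_rated=True))  # Movie B tops the list
```

Without `exclude_rated`, a movie you already loved comes back as its own top recommendation, because it correlates perfectly with itself. Dropping already-rated titles is one simple way to avoid that.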

Here we can see that from the 3 movies we rated, we received three recommendations we haven’t viewed. The first two results, though, were exact matches for movies we had already rated, much like YouTube recommending videos you’ve probably already watched a hundred freaking times! (Okay, maybe not a hundred.)

Hopefully this puts you on the right track to building your own recommendation system. Let me know how you would improve on this model: maybe you would’ve used genres as metadata, or maybe you would have adjusted the threshold. Thank you!
