datacommunitydc.org/blog/2013/05/recommendation-engines-why-you-shouldnt-...

Recommendation engines are arguably one of the trendiest uses of data science in startups today. How many new apps have you heard of that claim to "learn your tastes"? However, recommendations engines are widely misunderstood both in terms of what is involved in building a one as well as what problems they actually solve. A true recommender system involves some fairly hefty data science -- it's not something you can build by simply installing a plugin without writing code. With the exception of very rare cases, it is not the killer feature of your minimum viable product (MVP) that will make users flock to you -- especially since there are so many fake and poorly performing recommender systems out there.

A recommendation engine is a feature (not a product) that filters items by predicting how a user might rate them. It solves the problem of connecting your existing users with the right items in your massive inventory (i.e. tens of thousands to millions) of products or content. Which means that if you don't have existing users and a massive inventory, a recommendation engine does not truly solve a problem for you. If I can view the entire inventory of your e-commerce store in just a few pages, I really don't need a recommendation system to help me discover products! And if your e-commerce store has no customers, who are you building a recommendation system for? It works for Netflix and Amazon because they have untold millions of titles and products and a large existing user base who are already there to stream movies or buy products. Presenting users with  recommended movies and products increases usage and sales, but doesn't create either to begin with.

There are two basic approaches to building a recommendation system: the collaborative filtering method and the content-based approach. Collaborative filtering algorithms take user ratings or other user behavior and make recommendations based on what users with similar behavior liked or purchased. For example, a widely used technique in the Netflix prize was to use machine learning to build a model that predicts how a user would rate a film based solely on the giant sparse matrix of how 480,000 users rated 18,000 films (100 million data points in all). This approach has the advantage of not requiring an understanding of the content itself, but does require a significant amount of data, ideally millions of data points or more, on user behavior. The more data the better. With little or no data, you won't be able to make recommendations at all -- a pitfall of this approach known as the cold-start problem. This is why you cannot use this approach in a brand new MVP. 

The content-based approach requires deep knowledge of your massive inventory of products. Each item must be profiled based on its characteristics. For a very large inventory (the only type of inventory you need a recommender system for), this process must be automatic, which can prove difficult depending on the nature of the items. A user's tastes are then deduced based on either their ratings, behavior, or directly entering information about their preferences. The pitfalls of this approach are that an automated classification system could require significant algorithmic development and is likely not available as a commodity technical solution. Second, as with the collaborative filtering approach, the user needs to input information on their personal tastes, though not on the same scale. One advantage of the content-based approach is that it doesn't suffer from the cold-start problem -- even the first user can gain useful recommendations if the content is classified well. But the benefit that recommendations offer to the user must justify the effort required to offer input on personal tastes. That is, the recommendations must be excellent and the effort required to enter personal preferences must be minimal and ideally baked into the general usage. (Note that if your offering is an e-commerce store, this data entry amounts to adding a step to your funnel and could hurt sales more than it helps.) One product that has been successful with this approach is Pandora. Based on naming a single song or artist, Pandora can recommend songs that you will likely enjoy. This is because a single song title offers hundreds of points of data via the Music Genome Project. The effort required to classify every song in the Music Genome Project cannot be understated -- it took 5 years to develop the algorithm and classify the inventory of music offered in the first launch of Pandora. Once again, this is not something you can do with a brand new MVP.

Pandora may be the only example of a successful business where the recommendation engine itself is the core product, not a feature layered onto a different core product. Unless you have the domain expertise, algorithm development skill, massive inventory, and frictionless user data entry design to build your vision of the Pandora for socks / cat toys / nail polish / etc, your recommendation system will not be the milkshake that brings all the boys to the yard. Instead, you should focus on building your core product, optimizing your e-commerce funnel, growing your user base, developing user loyalty, and growing your inventory. Then, maybe one day, when you are the next Netflix or Amazon, it will be worth it to add on a recommendation system to increase your existing usage and sales. In the mean time, you can drive serendipitous discovery simply by offering users a selection of most popular content or editor's picks.


Comments (0)

Sign in to post comments.