Recommendations Galore: How Zalando Tech Makes It Happen
Diving deeper into the tools that allow us to power our recommendation engines.
If you’re a frequent shopper on Zalando, you will have noticed our recommendations for similar items and brands while you browse. It’s a feature we’re constantly iterating on to make the experience as personalised as possible.
Our use of algorithms and data to suggest your next purchase was recently featured in the Financial Times, so we’ve decided to dive a little deeper into what these powerful tools can do for customers and for our business.
What metrics can Zalando offer that prove the efficacy of using recommendations in terms of items sold, basket values, etc?
Click-through rate is a core metric for recommendations, but you cannot forget about tracking gross revenue. Revenue tracking tells us whether customers are making purchases on the same day as the click-through, which shows how much revenue is influenced by recommendations in general. We also use visitor conversion rate, revenue per visitor, average order value, average number of recommendation impressions, and the share of sold items attributed to recommendations to measure their impact. Alongside this, recommendations are powered by different engines to ensure we’re getting the whole picture behind user interaction and article preference. Though this makes it difficult to attribute an increase in KPIs to one engine, it is crucial because it lets us track the overall effect on the customer experience.
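To make the metrics above concrete, here is a minimal sketch of how they could be computed from aggregated daily counts. The `RecoStats` fields and the `summarise` function are hypothetical names chosen for illustration, and the exact definitions (for example, what counts as a converting visitor) are assumptions, not Zalando’s production definitions.

```python
from dataclasses import dataclass


@dataclass
class RecoStats:
    """Aggregated counts for one day or one test variant (hypothetical schema)."""
    visitors: int
    converting_visitors: int   # visitors who placed at least one order
    orders: int
    revenue: float
    reco_impressions: int      # recommendation slots shown
    reco_clicks: int           # clicks on recommended items
    items_sold: int
    items_sold_via_reco: int   # sold items attributed to a recommendation click


def ratio(numerator: float, denominator: float) -> float:
    """Safe division: returns 0.0 when there is no denominator data."""
    return numerator / denominator if denominator else 0.0


def summarise(s: RecoStats) -> dict:
    """Compute the recommendation KPIs named in the text."""
    return {
        "click_through_rate": ratio(s.reco_clicks, s.reco_impressions),
        "visitor_conversion_rate": ratio(s.converting_visitors, s.visitors),
        "revenue_per_visitor": ratio(s.revenue, s.visitors),
        "average_order_value": ratio(s.revenue, s.orders),
        "impressions_per_visitor": ratio(s.reco_impressions, s.visitors),
        "reco_share_of_sold_items": ratio(s.items_sold_via_reco, s.items_sold),
    }


example = RecoStats(
    visitors=1000, converting_visitors=50, orders=60, revenue=3000.0,
    reco_impressions=8000, reco_clicks=400, items_sold=120, items_sold_via_reco=30,
)
kpis = summarise(example)
```

Computing the same dictionary per engine and per day is one way to watch the overall effect even when a single KPI change cannot be cleanly attributed to one engine.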
Do you use a content-based or collaborative filtering approach for recommendations - or both? Where do the strengths and weaknesses of each approach lie?
It’s important to establish what we mean when we talk about data derived from recommendation engines. User data is what we get when we’re looking at clicks and views, whereas product data is concerned with the items themselves. In most cases, for item-based recommendations, we use a combination of collaborative filtering and content-based filtering, as that works best for collecting and assessing the data we need. In cases when we have to tackle a cold start problem, we use plain content-based recommendations. Alongside this, we have implemented several content-based algorithms and we are also working on some interesting new ones.
In our view, one of the benefits of collaborative filtering is that it is completely user-behaviour based, meaning we can derive similarities between items that might not be observed solely through product data. However, collaborative filtering is only effective with a lot of traffic, because the data needs to be an accurate representation of user behaviour.
For content-based filtering, we can use product data that we already have, which makes it a great base to start from when we don’t have much user-behaviour data. Its effectiveness, though, can vary depending on your data model and how richly defined your data is.
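The combination described above can be sketched as a blended item-to-item similarity that falls back to content alone for cold-start items. This is an illustrative toy, not the production approach: the cosine measure, the `alpha` blending weight, and the item names are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_similarity(item_a, item_b, interactions, attributes, alpha=0.5):
    """Blend collaborative and content-based similarity (alpha is an assumed weight).

    interactions: item -> vector with one entry per user (1 = user interacted)
    attributes:   item -> vector of product features (e.g. one-hot brand/category)
    """
    content = cosine(attributes[item_a], attributes[item_b])
    # Cold start: an item nobody has interacted with carries no behavioural
    # signal, so we rely on product data alone.
    if not any(interactions[item_a]) or not any(interactions[item_b]):
        return content
    collaborative = cosine(interactions[item_a], interactions[item_b])
    return alpha * collaborative + (1 - alpha) * content


# Toy data: four users, three product attributes.
interactions = {
    "sneaker_a": [1, 1, 0, 1],
    "sneaker_b": [1, 1, 0, 0],
    "new_boot": [0, 0, 0, 0],   # just added to the catalogue, no traffic yet
}
attributes = {
    "sneaker_a": [1, 0, 1],
    "sneaker_b": [1, 0, 0],
    "new_boot": [0, 1, 1],
}

warm = hybrid_similarity("sneaker_a", "sneaker_b", interactions, attributes)
cold = hybrid_similarity("sneaker_a", "new_boot", interactions, attributes)
```

Here `warm` mixes both signals, while `cold` is purely content-based because `new_boot` has no interaction history. How rich the attribute vectors are directly limits how useful the fallback is.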
What are the biggest challenges involved in implementing a recommendation engine? What advice would you pass on to other organisations thinking of going down this road?
One of the biggest challenges involved in implementing an effective recommendation engine comes from developing a deep understanding of customer preferences. Fashion purchasing is an emotional interaction, based on evolving personal tastes, which makes it almost impossible to create a “master formula”. Still, this is the joy of recommendation engines: creating something that helps people connect with things they love. It is key that companies understand that customers don’t expect them to be mind readers; rather, the expectation is that they are helpful and improve the customer experience.
An additional challenge is that for many ecommerce sites, consumer preferences must be inferred from implicit feedback such as online behaviour, rather than explicit feedback such as a simple “like” button. Actions such as time spent browsing a particular product or placing a product in a shopping basket give us an insight into what a customer is interested in, but they can also be misleading. What if they left the computer to get a drink? Or purchased a product as a present? For an engine to be successful and provide a positive service, it does not need to be flawless, as long as the customer is receiving relevant recommendations.
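One common way to handle implicit feedback is to turn events into a rough preference score per user and item. The sketch below is only an illustration: the event names, the weights in `EVENT_WEIGHTS`, and the dwell-time cap are assumptions, not values from our systems. Capping dwell time limits the damage from the “left the computer to get a drink” case; it does nothing for gift purchases, which remain noise.

```python
from collections import defaultdict

# Assumed weights: stronger actions count for more. Purely illustrative.
EVENT_WEIGHTS = {"view": 1.0, "add_to_basket": 3.0, "purchase": 5.0}
# Assumed cap: dwell time beyond this adds no further evidence of interest.
MAX_DWELL_SECONDS = 120.0


def preference_scores(events):
    """Aggregate implicit-feedback events into (user, item) -> score."""
    scores = defaultdict(float)
    for event in events:
        weight = EVENT_WEIGHTS.get(event["type"], 0.0)
        if event["type"] == "view":
            # Scale a view by capped dwell time, so an idle tab left open
            # for an hour counts no more than a two-minute look.
            dwell = min(event.get("dwell_seconds", 0.0), MAX_DWELL_SECONDS)
            weight *= dwell / MAX_DWELL_SECONDS
        scores[(event["user"], event["item"])] += weight
    return dict(scores)


events = [
    {"user": "u1", "item": "shoe", "type": "view", "dwell_seconds": 3600.0},
    {"user": "u1", "item": "shoe", "type": "add_to_basket"},
    {"user": "u1", "item": "hat", "type": "view", "dwell_seconds": 30.0},
]
scores = preference_scores(events)
```

The resulting scores are a noisy signal, not ground truth, which is why the engine only needs to be relevant rather than flawless.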
There is also a gap between offline prototyping and online implementation that can prove challenging, as it is very difficult to evaluate our models offline. The key challenge here is that in an offline setting, you are working with past data, and you don’t really know how a user would have reacted. Instead, you fix a point in time and then see how well you can forecast the user’s behaviour after that point based on the behaviour before it. While offline tests are an important step, the real evaluation has to happen online in live A/B tests. Here, users are randomly assigned to two or more different algorithmic variants and the metrics are tracked separately. Measurements are sometimes noisy and fluctuate over time, so A/B tests have to be run for a long time to get significant results for metrics like click-through rate. This timeframe can vary from one week to several weeks before we see a difference between variants that is statistically significant.
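As an illustration of why A/B tests need enough data, here is a minimal sketch of a two-sided two-proportion z-test on click-through rate. It is a textbook test, not necessarily the one we run, and it makes a simplifying assumption: every impression is treated as an independent trial, which real traffic (many impressions per user) violates. The function name and the example counts are hypothetical.

```python
import math


def two_proportion_z_test(clicks_a, impressions_a, clicks_b, impressions_b):
    """Return (z, two-sided p-value) for the difference in CTR between B and A."""
    ctr_a = clicks_a / impressions_a
    ctr_b = clicks_b / impressions_b
    # Pooled CTR under the null hypothesis that both variants perform equally.
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b))
    if std_err == 0:
        return 0.0, 1.0
    z = (ctr_b - ctr_a) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: variant B shows a 5.6% CTR against 5.0% for variant A.
z, p_value = two_proportion_z_test(500, 10_000, 560, 10_000)
```

With these made-up numbers, a 12% relative lift on 10,000 impressions per variant still does not clear the conventional 0.05 threshold, which is the practical reason tests run for a week or more.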
This is the biggest piece of advice we would offer to other companies adopting recommendation engines: never lose sight of the end-user. While issues such as tracking online behaviour, data quality and effectively implementing prototypes can cause problems, they can also distract you from focusing on how the engine operates in the real world.
In other words, it is about continuous improvement. A recommendation engine should never be considered complete. Instead, it should be in a constant state of improvement, adopting the newest techniques to create the best results possible, which is the approach we follow at Zalando.
We're hiring! Do you like working in an ever-evolving organization such as Zalando? Consider joining our teams as an Applied Scientist!