Reducing Bias in Recommender Systems

Qiang Chen · Published in Tubi Engineering · Apr 26, 2022

Authors: Qiang Chen, Jiayuan Dong, Tal Levy, and Jaya Kawale

Tubi is an advertiser-based video on demand (AVOD) streaming company that allows viewers to watch any content available on the service for free. We have an extensive content library spanning movies, TV series, live news, and sports. Naturally, recommender systems form a crucial component of the service.

Tubi Homepage — Machine learning powers the different recommendations that generate the homepage.

Presentation Bias in Recommender Systems

Recommender systems help surface content from our vast library of thousands of movies and TV shows. They learn from feedback collected on user behavior. Typically, the recommendation algorithm provides a ranked list of recommendations, and the user takes an action, for example, playing a title. This feedback is collected and used to train new models, so the current recommendations impact future recommendations. This scenario leads to the dreaded presentation bias: over time, the recommendations for a user converge to a small set, known as an information island. What's more, some videos are never exposed to users, so we never learn whether users would have enjoyed watching them.

The feedback loop: recommendations drive user actions, which become the training data for future models, so current recommendations shape future ones.

We utilize user behavior, such as click and view data, to generate personalized recommendations. If we don't carefully account for presentation bias, it creates feedback loop issues. For example, suppose a user is only presented with horror content on Halloween. The user watches a horror film, so the recommender system recommends more horror content, the user watches more of it, and the problem perpetuates itself. Yet the user might also be interested in titles from other genres. Therefore, we need an escape hatch.

In this blog, we present some solutions we have adopted to address the problem of presentation bias. Let's examine them in the following sections.

Feature Engineering

Let's first examine simple feature engineering approaches that help reduce presentation bias in the system. A feature typically used by many machine learning models to recommend items is the popularity of content. Popularity, by definition, is entangled with presentation: popular content gets many presentation opportunities, and that presentation keeps its popularity high even if only a tiny percentage of the users who see it actually like it.

How do we reduce the popularity bias? A quick fix is to not use the popularity metric directly but to normalize it by presentation, for example, a popularity-per-presentation metric. One caveat is that this metric is unstable for titles with little presentation. To address this caveat, we first order the content by popularity and group it into X buckets, where X is a hyperparameter, such that each bucket holds 1/Xth of the overall popularity. Then we rank the content within each bucket by its popularity per presentation, under the assumption that content in the same bucket has a comparable confidence level for that metric. Online experiments showed that adding this new feature to content ranking gives more presentation opportunities to long-tail titles that we would otherwise miss. The figure below shows the improvement in rank for the long-tail content.

In the figures, each point is a piece of content. The x-axis is the popularity rank, where a lower rank means more popular content. The y-axis represents the improved rank, calculated by ranking content within each bucket by popularity per presentation. The left plot covers the top 300 most popular titles; the right one covers the top 3,000. As you can see, the improved ranks differ substantially between the head and tail buckets.
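To make the bucketing concrete, here is a minimal pandas sketch of the feature computation. The column names (`popularity`, `presentations`) and the cumulative-share bucket assignment are our illustrative assumptions, not Tubi's production code.

```python
import pandas as pd

def bucketed_rank(df: pd.DataFrame, num_buckets: int = 10) -> pd.DataFrame:
    """Rank titles by popularity per presentation within equal-popularity buckets.

    Assumes hypothetical columns `popularity` (e.g., play counts) and
    `presentations` (impression counts); `num_buckets` is the
    hyperparameter X from the text.
    """
    df = df.sort_values("popularity", ascending=False).reset_index(drop=True)

    # Assign buckets so each holds roughly 1/Xth of the total popularity mass.
    cum_share = df["popularity"].cumsum() / df["popularity"].sum()
    df["bucket"] = (cum_share * num_buckets).clip(upper=num_buckets - 1e-9).astype(int)

    # Within each bucket, rank by popularity per presentation.
    df["pop_per_presentation"] = df["popularity"] / df["presentations"].clip(lower=1)
    df["improved_rank"] = df.groupby("bucket")["pop_per_presentation"].rank(
        ascending=False, method="first"
    )
    return df
```

Because titles only compete within their own popularity bucket, a long-tail title with a strong per-presentation signal can earn a better rank without being drowned out by the head of the catalog.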

Leveraging Additional Sources of Engagement

This section examines additional sources of engagement beyond the homepage recommendations.

We can incorporate many additional sources, such as thumbs up/down on content, users' search queries, and watch behavior originating from search. Let's take the example of watch behavior from search. We found that some users' watch history from search differs from their homepage watch history, so there is an opportunity to leverage search history to improve homepage recommendations.

Let's illustrate this with a real example. The figure below shows a user's watch history from the homepage, indicating that they enjoyed watching thrillers and horror movies.

User A’s watch history from the homepage

However, this user also searched for and watched many documentaries. The figure below shows the user's watch history based on their past searches.

User A’s watch history based upon the searches

This example indicates that our recommender system can benefit from the user's search history: we can recommend some documentaries to this user in addition to horror. More broadly, whenever a user has to search for something, it can signal a missed opportunity for the recommender system. We use collaborative filtering-based approaches to leverage the user's search history while learning to rank content. Online experiment results indicated that this additional signal improved streaming and retention metrics.
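As a sketch of how such a signal could enter training, the snippet below merges homepage and search watch events into a single implicit-feedback matrix, weighting search watches more heavily. The event format, the `search_weight` parameter, and the helper itself are hypothetical illustrations; the post does not disclose Tubi's exact formulation.

```python
from collections import defaultdict
import scipy.sparse as sp

def build_feedback_matrix(homepage_events, search_events, num_users, num_items,
                          search_weight: float = 2.0) -> sp.csr_matrix:
    """Merge homepage and search watch events into one implicit-feedback matrix.

    Each event is a hypothetical (user_id, item_id) pair. Search watches get a
    higher confidence weight since the user actively sought the title out.
    The result can feed any implicit-feedback collaborative filtering trainer
    (e.g., ALS).
    """
    weights = defaultdict(float)
    for u, i in homepage_events:
        weights[(u, i)] += 1.0
    for u, i in search_events:
        weights[(u, i)] += search_weight

    rows, cols, vals = zip(*((u, i, w) for (u, i), w in weights.items()))
    return sp.csr_matrix((vals, (rows, cols)), shape=(num_users, num_items))
```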

Since the additional sources of engagement bring in a different dimension, they help reduce the homepage presentation bias.

Exploration

One key way to reduce bias is exploration. The idea behind exploration is to expose users to parts of the catalog the current ranker would not otherwise surface. Exploration helps account for uncertainty when user feedback is sparse. We usually use it to find audiences for new content and to discover new tastes for existing users, and it naturally alleviates presentation bias, since it gathers feedback on items the recommender system would not have surfaced on its own. In general, exploration has a near-term cost, which can range from a degraded user experience to user churn, depending on the nature of the exploration. Here we focus on exploration as a tool to reduce presentation bias and present some simple solutions. The following paragraphs describe some of the methods.

We first tried a straightforward idea: adding jitter to our ranking model scores. The ranking scores typically come from an offline model, and we use Boltzmann exploration, which is widely used in the literature, to add randomness to the ranking online. One problem we saw in our online tests was that this exploration hurt the overall user experience.
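For illustration, here is a minimal sketch of Boltzmann exploration over a score vector: scores go through a temperature-scaled softmax, and a full ranking is sampled without replacement (a Plackett-Luce-style draw). The `temperature` value and the list-sampling choice are our assumptions, not Tubi's exact setup.

```python
import numpy as np

def boltzmann_sample_ranking(scores: np.ndarray, temperature: float = 0.1,
                             rng=None) -> np.ndarray:
    """Sample a ranking via Boltzmann (softmax) exploration over model scores.

    Higher `temperature` means more jitter (more exploration); as it
    approaches 0, the result converges to the deterministic argsort.
    Returns item indices in sampled rank order, best first.
    """
    rng = rng or np.random.default_rng()
    # Softmax with max-subtraction for numerical stability.
    logits = scores / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sample all positions without replacement, weighted by the softmax.
    return rng.choice(len(scores), size=len(scores), replace=False, p=probs)
```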

Another, more successful idea is to have constant exploration running to collect unbiased feedback from users. We need to weigh the cost of exploring content against its benefit: new titles benefit from exploration, while for existing titles it is essential to break the feedback loop and collect feedback from users with different tastes. We continuously run various exploration algorithms in a dedicated container on our homepage called "Something Completely Different." The feedback from this container has been very helpful in improving the rest of the recommendation pipelines, and we have seen successful follow-up experiments leveraging it.

“Something Completely Different” Row

Bandits

Finally, the tradeoff between exploration and exploitation is crucial to ensure we do not hurt the user experience while still gathering feedback in regions where it is sparse. This tradeoff is central to the design of reinforcement learning and bandit algorithms: the framework either exploits the best results the model can present or explores less ideal alternatives. During the exploration phase, we might learn that a user who only watches horror films is also interested in other genres, like comedies. A bandit exploration strategy is also a great way to cold start a title, where cold starting refers to recommending content that was just added and has no training data for the model to learn from.
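As a concrete illustration, below is a minimal Thompson sampling sketch with a Beta-Bernoulli posterior per title. The reward definition (play = 1, ignored impression = 0) and the class itself are illustrative assumptions, not Tubi's production model.

```python
import numpy as np

class BetaBernoulliBandit:
    """Thompson sampling over titles with Beta-Bernoulli posteriors.

    One (alpha, beta) pair per title; reward 1 can stand for a play,
    0 for an ignored impression. New titles start at the uniform prior
    Beta(1, 1), so they receive exploration traffic until evidence
    accumulates, which addresses cold start.
    """

    def __init__(self, num_titles: int):
        self.alpha = np.ones(num_titles)
        self.beta = np.ones(num_titles)

    def recommend(self, k: int = 5) -> np.ndarray:
        # Sample a plausible play rate per title, then exploit the samples.
        samples = np.random.beta(self.alpha, self.beta)
        return np.argsort(-samples)[:k]

    def update(self, title: int, reward: int) -> None:
        self.alpha[title] += reward
        self.beta[title] += 1 - reward
```

Thompson sampling naturally shifts traffic from exploration to exploitation as the posteriors sharpen, which is what makes the tradeoff discussed next manageable.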

The tricky part is finding the sweet spot for the exploration/exploitation tradeoff. You don't want to exploit so much that you never get a chance to learn from exploration; on the other hand, exploring too much will hurt the user experience. There is no magic bullet, and with every experiment you get closer to the right balance for each bandit use case.

One area where we have seen bandits help is new user recommendations.

New users have no historical viewing data and very few available features, so it is easy to end up in an endless feedback loop of recommending the same popular titles and learning only from that small subset. Our new-user bandits model gets us out of that loop by allowing a more diverse set of recommendations while maintaining a great user experience, thanks to the bandit's explore/exploit strategy.

Summary

We introduced some practical ideas to reduce bias in recommender systems. They can be quickly adapted to new systems and new problem settings. Want to learn and develop more ways to reduce bias? Follow our blog and join us.
