Junpei Komiyama, Shunya Noda
We propose deviation-based learning, a new approach to training recommender systems. In the beginning, the recommender and rational users have different pieces of knowledge, and the recommender needs to learn the users’ knowledge to make better recommendations. The recommender learns users’ knowledge by observing whether each user followed or deviated from her recommendations. We show that learning frequently stalls if the recommender always recommends a choice: users tend to follow the recommendation blindly, and their choices do not reflect their knowledge. Social welfare and the learning rate are improved drastically if the recommender abstains from recommending a choice when she predicts that multiple arms will produce a similar payoff.