for Search, Recommendation and Ad Placement
Counterfactual Risk Minimization: Learning from Logged Bandit Feedback
Off-Policy Evaluation for Slate Recommendation