for Search, Recommendation and Ad Placement

Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

Off-Policy Evaluation for Slate Recommendation