Local Time (EST) / Event

0900

Introduction and opening remarks

0910

Responsible Decision-Making in Batch RL Settings

Responsible decision-making is tough in batch settings because policy improvement involves doing something different from the current behavior policy -- but we only have data from that current behavior policy. In this talk, I'll first briefly share not only the variety of approaches we have taken to identify better hypotension treatment policies for ICU patients, but also our difficulties in trying to validate them. Next, I'll describe recent work which focuses on (a) identifying where clinicians disagree and (b) only making recommendations at those decision points. (The core idea being that, statistically, we only have evidence to suggest an alternate policy in areas where we have observed clinician disagreement.) The result is a set of recommendations that has both more statistical support and is easier for clinicians to inspect for validity.
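
As a rough sketch of the gating idea described above (not the speakers' actual method), the snippet below flags decision points where the observed clinicians' action distribution has high entropy and only issues an alternate-policy recommendation there; the discrete state representation, the entropy threshold, and the `learned_policy` callable are all hypothetical.

```python
import numpy as np
from collections import defaultdict

def clinician_action_entropy(transitions):
    """Estimate per-state entropy of the observed clinician (behavior) policy.

    transitions: iterable of (state, action) pairs from the batch dataset,
    assuming discrete states and actions (a simplifying assumption).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for state, action in transitions:
        counts[state][action] += 1
    entropy = {}
    for state, acts in counts.items():
        p = np.array(list(acts.values()), dtype=float)
        p /= p.sum()
        entropy[state] = float(-(p * np.log(p + 1e-12)).sum())
    return entropy

def gated_recommendation(state, learned_policy, behavior_action, entropy, threshold=0.5):
    """Recommend the learned policy's action only where clinicians visibly
    disagree; otherwise defer to the observed behavior (no recommendation)."""
    if entropy.get(state, 0.0) >= threshold:
        return learned_policy(state)
    return behavior_action
```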

0940

Poster Session (with coffee break)

1100

Robust Multivalid Uncertainty Quantification

When deciding how to act as a function of our predictions, it is important to be able to quantify the uncertainty in those predictions. Traditional conformal prediction methods give a very simple, general way of attaching uncertainty sets to black-box predictions, but they have two well-known shortcomings. First, they generally require that the future look like the past --- i.e. that the data be i.i.d. or exchangeable. While mathematically convenient, this often fails in the face of various kinds of distribution shift. Second, they provide marginal coverage guarantees --- i.e. guarantees that are valid only as averaged over all instances. This can paper over weaknesses of a model on small sub-populations, which is especially worrisome when we are making predictions about people. I'll present an exciting (I think :-) new method that solves both of these problems; it can endow arbitrary black-box predictors with prediction sets that promise statistically optimal empirical coverage guarantees not just marginally, but conditionally on a large number of arbitrarily defined (possibly intersecting!) subsets of the data, and does so without requiring any distributional assumptions at all --- i.e. it has guarantees even against adversarially chosen streams of data.
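
For intuition only, here is a minimal sketch of a much weaker, exchangeability-based version of group-conditional coverage: per-group conformal thresholds applied to (possibly intersecting) subsets. It does not reproduce the adversarial, assumption-free guarantee described in the abstract; the function names and the score/threshold construction are illustrative assumptions.

```python
import numpy as np

def groupwise_thresholds(scores, group_masks, alpha=0.1):
    """Compute a conformal-style score threshold per group.

    scores: nonconformity scores on calibration data, shape (n,)
    group_masks: dict mapping group name -> boolean mask of shape (n,)
    Returns the (1 - alpha) calibration quantile for each group.
    """
    thresholds = {}
    for name, mask in group_masks.items():
        s = np.sort(scores[mask])
        k = int(np.ceil((1 - alpha) * (len(s) + 1))) - 1
        thresholds[name] = s[min(k, len(s) - 1)]
    return thresholds

def prediction_set(candidate_scores, point_groups, thresholds):
    """Build a prediction set for a test point belonging to `point_groups`.

    Using the largest threshold among the point's groups keeps coverage at
    least (1 - alpha) within each group (under exchangeability), at the cost
    of larger sets.
    """
    t = max(thresholds[g] for g in point_groups)
    return [label for label, s in candidate_scores.items() if s <= t]
```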

1130

Yahav Bechavod
Contributed talk - Individually Fair Learning with One-Sided Feedback

We consider an online learning problem with one-sided feedback, in which the learner is able to observe the true label only for positively predicted instances. On each round, instances arrive and receive classification outcomes according to a randomized policy deployed by the learner, whose goal is to maximize accuracy while deploying individually fair policies. We first extend the framework of Bechavod et al. (2020), which relies on the existence of a human fairness auditor for detecting fairness violations, to instead incorporate feedback from dynamically selected panels of multiple, possibly inconsistent, auditors. We then construct an efficient reduction from our problem of online learning with one-sided feedback and a panel reporting fairness violations to the contextual combinatorial semi-bandit problem (Cesa-Bianchi & Lugosi, 2009; György et al., 2007). Finally, we show how to leverage the guarantees of two algorithms in the contextual combinatorial semi-bandit setting, Exp2 (Bubeck et al., 2012) and the oracle-efficient Context-Semi-Bandit-FTPL (Syrgkanis et al., 2016), to provide multi-criteria no-regret guarantees simultaneously for accuracy and fairness. Our results resolve an open question of Bechavod et al. (2020), showing that individually fair and accurate online learning with auditor feedback can be carried out in the one-sided feedback setting.
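
A toy rendering of the one-sided feedback model only (the setting, not the authors' reduction or algorithms): the true label is revealed solely for instances the randomized policy classifies as positive. The `policy` callable and data layout are assumptions for illustration.

```python
import random

def one_sided_feedback_round(policy, instances, true_labels):
    """One round with one-sided feedback.

    policy: maps an instance x to the probability of predicting positive.
    Returns the (instance, label) pairs whose labels were actually revealed.
    """
    observed = []
    for x, y in zip(instances, true_labels):
        if random.random() < policy(x):   # randomized positive classification
            observed.append((x, y))       # label revealed only here
        # negatively classified instances yield no feedback at all
    return observed
```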

1145

Nathan O Lambert
Contributed talk - Reward Reports for Reinforcement Learning

The desire to build good systems in the face of complex societal effects requires a dynamic approach towards equity and access. Recent approaches to machine learning (ML) documentation have demonstrated the promise of discursive frameworks for deliberation about these complexities. However, these developments have been grounded in a static ML paradigm, leaving the role of feedback and post-deployment performance unexamined. Meanwhile, recent work in reinforcement learning design has shown that the effects of optimization objectives on the resultant system behavior can be wide-ranging and unpredictable. In this paper we sketch a framework for documenting deployed learning systems, which we call Reward Reports.

1200

Lunch Break

1400

Dimension Reduction Tools and Their Use in Responsible Data Understanding in Dynamic Environments

Dimension reduction (DR) techniques such as t-SNE, UMAP, and TriMap have demonstrated impressive visualization performance on many real world datasets. They are useful for understanding data and trustworthy decision-making, particularly for biological data. One tension that has always faced these methods is the trade-off between preservation of global structure and preservation of local structure -- past methods can either handle one or the other, but not both. In this work, our main goal is to understand what aspects of DR methods are important for preserving both local and global structure -- it is difficult to design a better method without a true understanding of the choices we make in our algorithms and their empirical impact on the lower-dimensional embeddings they produce. Towards the goal of local structure preservation, we provide several useful design principles for DR loss functions based on our new understanding of the mechanisms behind successful DR methods. Towards the goal of global structure preservation, our analysis illuminates that the choice of which components to preserve is important. We leverage these insights to design a new algorithm for DR, called Pairwise Controlled Manifold Approximation Projection (PaCMAP), which preserves both local and global structure. Our work provides several unexpected insights into what design choices both to make and avoid when constructing DR algorithms.

1430

Explanations in Whose Interests?

In the United States, the law requires that lenders explain their adverse decisions to consumers, one goal of which is to educate consumers about how to receive more favorable decisions in the future. Scholars have recently proposed a range of new techniques to help lenders realize this goal when their decision making relies on machine learning. However, attempts to directly map these techniques onto applications in finance are often somewhat stylized, failing to take into account important aspects of lending in practice. Lending decisions are rarely binary (i.e., lend/don't lend). Machine learning models are often used by lenders to estimate consumers' risk of default, not to classify applicants as creditworthy or not; these estimates of risk inform a more complex decision about the terms on which lenders are willing to grant credit to consumers. Differences in the terms of a loan often result in very different utility for consumers and lenders. In fact, access to credit on unfavorable terms can be actively harmful to consumers, even if it is profitable for lenders. Very little of the existing scholarship on explainable AI in finance---or that uses lending as a motivating example---takes these crucial details into account. As a result, many of the proposed methods for explaining adverse lending decisions may not help consumers achieve better outcomes---and may even harm them in some cases.

1500

Poster Session (with coffee break)

1600

Exposure-Aware Recommendation using Contextual Bandits

Exposure bias is a well-known issue in recommender systems: items and suppliers are not equally represented in the recommendation results. This is especially problematic when bias is amplified over time, as a few items (e.g., popular ones) are repeatedly over-represented in recommendation lists, and users' interactions with those items further amplify the bias, resulting in a feedback loop. This issue has been extensively studied in the literature on model-based and neighborhood-based recommendation algorithms, but less work has been done on online recommendation models, such as those based on top-K contextual bandits, where recommendation models are dynamically updated with ongoing user feedback. In this work, we study exposure bias in a class of well-known contextual bandit algorithms known as Linear Cascading Bandits. We analyze the ability of these algorithms to handle exposure bias and to provide fair representation for items in the recommendation results. Our analysis reveals that these algorithms tend to amplify exposure disparity among items over time. In particular, we observe that these algorithms do not properly adapt to the feedback provided by users and frequently recommend certain items even when those items are not selected by users. To mitigate this bias, we propose an Exposure-Aware reward model that updates the model parameters based on two factors: (1) user feedback (i.e., clicked or not), and (2) the position of the item in the recommendation list. In this way, the proposed model controls the utility assigned to items based on their exposure in the recommendation list. Extensive experiments on two real-world datasets using three contextual bandit algorithms show that the proposed reward model reduces exposure bias amplification in the long run while maintaining recommendation accuracy.
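
The exact reward model is not specified in the abstract; the sketch below is one hedged interpretation of an exposure-aware pseudo-reward for a cascading bandit round, combining click feedback with a position-based exposure weight. The functional form and the `gamma` parameter are illustrative assumptions, not the paper's definition.

```python
import math

def exposure_aware_rewards(ranked_items, click_index, gamma=0.1):
    """Illustrative exposure-aware pseudo-rewards for one recommendation round.

    ranked_items: item ids in the order they were shown.
    click_index: position of the clicked item, or None if nothing was clicked.
    gamma: penalty scale for exposure without a click (hypothetical parameter).

    Clicked items receive full reward; items examined but not clicked receive a
    penalty that grows with the exposure implied by their position, so items
    that are repeatedly shown and ignored gradually lose estimated utility.
    """
    rewards = {}
    last_examined = click_index if click_index is not None else len(ranked_items) - 1
    for pos, item in enumerate(ranked_items[: last_examined + 1]):
        exposure = 1.0 / math.log2(pos + 2)        # position-based exposure weight
        if pos == click_index:
            rewards[item] = 1.0                    # positive feedback
        else:
            rewards[item] = -gamma * exposure      # penalize unclicked exposure
    return rewards
```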

1630

Modeling Recommender Ecosystems - Some Considerations

An important goal for recommender systems is to make recommendations that maximize some form of user utility, ideally over extended periods of time. While reinforcement learning has started to find limited application in recommendation settings, for the most part, practical recommender systems remain myopic (i.e., focused on immediate user responses rather than long-term user value). Moreover, they are local in the sense that they rarely consider the impact that a recommendation made to one user may have on the ability to serve other users. These latter "ecosystem effects" play a critical role in optimizing long-term user utility. In this talk, I describe an approach to optimizing user utility and social welfare using reinforcement learning and equilibrium modeling of the recommender ecosystem. I will also draw connections between these models and notions such as fairness and incentive design and outline some future challenges for the community.

1700

Yulian Wu
Contributed talk - Optimal Rates of (Locally) Differentially Private Heavy-tailed Multi-Armed Bandits

In this paper we investigate the problem of stochastic multi-armed bandits (MAB) in the (local) differential privacy (DP/LDP) model. Unlike previous results that assume bounded/sub-Gaussian reward distributions, we focus on the setting where each arm's reward distribution only has a finite $(1+v)$-th moment for some $v\in (0, 1]$. In the first part, we study the problem in the central $\epsilon$-DP model. We first provide a near-optimal result by developing a private and robust Upper Confidence Bound (UCB) algorithm. Then, we improve the result via a private and robust version of the Successive Elimination (SE) algorithm. Finally, we establish a lower bound to show that the instance-dependent regret of our improved algorithm is optimal. In the second part, we study the problem in the $\epsilon$-LDP model. We propose an algorithm that can be seen as a locally private and robust version of the SE algorithm, which provably achieves (near) optimal rates for both instance-dependent and instance-independent regret. Our results reveal differences between the problem of private MAB with bounded/sub-Gaussian rewards and that with heavy-tailed rewards. To achieve these (near) optimal rates, we develop several new hard instances and private robust estimators as byproducts, which may be useful for other related problems.
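
As a hedged illustration of the central-DP ingredients mentioned above (truncation for heavy-tailed rewards plus calibrated noise for privacy), not the paper's exact estimator or rates, one might compute a private robust arm index along these lines:

```python
import numpy as np

def private_truncated_mean(rewards, epsilon, B):
    """Illustrative eps-DP robust mean: truncate rewards to [-B, B] for
    robustness to heavy tails, then add Laplace noise calibrated to the
    sensitivity of the truncated mean (2B / n)."""
    clipped = np.clip(np.asarray(rewards, dtype=float), -B, B)
    sensitivity = 2.0 * B / len(clipped)
    noise = np.random.laplace(scale=sensitivity / epsilon)
    return clipped.mean() + noise

def ucb_index(rewards, t, epsilon, B, c=2.0):
    """A UCB-style index built on the private robust mean; the exploration
    bonus and its privacy correction are illustrative, not the paper's bound."""
    n = len(rewards)
    bonus = c * (np.log(t) / n) ** 0.5 + c * B * np.log(t) / (epsilon * n)
    return private_truncated_mean(rewards, epsilon, B) + bonus
```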

1715

Contributed talk - A Game-Theoretic Perspective on Trust in Recommendation

Recommendation platforms---such as Amazon, Netflix, and Facebook---use various strategies in order to engage and retain users, from tracking their data to showing addictive content. Ostensibly these measures improve performance, but they can also erode trust. In this work, we study the role of trust in recommendation, and show that trust is important to a recommendation platform's success because users are the platform's data sources. Our main contribution is a game-theoretic view of recommender systems and a corresponding formal definition of trust. Namely, if a user trusts their recommendation platform, then their optimal long-term strategy is to act greedily---and thus report their preferences truthfully---at all times. Our definition reflects the intuition that trust arises when the incentives of the user and the platform are sufficiently aligned. To illustrate the implications of this definition, we explore two simple examples of trust. We show that distrust can hurt the platform and that building trust can be good for both the user and the platform.