Papers
Paper

Details

Video

Combining Counterfactuals With Shapley Values To Explain Image Models
Lahiri, Aditya*; Alipour, Kamran; Adeli, Ehsan; Salimi, Babak 
With the widespread use of sophisticated machine learning models in sensitive applications, understanding their decisionmaking has become an essential task. Models trained on tabular data have witnessed significant progress in explanations of their underlying decision making processes by virtue of having a small number of discrete features. However, applying these methods to highdimensional inputs such as images is not a trivial task. Images are composed of pixels at an atomic level and do not carry any interpretability by themselves. In this work, we seek to use annotated highlevel interpretable features of images to provide explanations. We leverage the Shapley value framework from Game Theory, which has garnered wide acceptance in general XAI problems. By developing a pipeline to generate counterfactuals and subsequently using it to estimate Shapley values, we obtain contrastive and interpretable explanations with strong axiomatic guarantees.
Paper
Poster


Individually Fair Learning with OneSided Feedback
Contributed talk
Bechavod, Yahav*; Roth, Aaron 
We consider an online learning problem with onesided feedback, in which the learner is able to observe the true label only for positively predicted instances. On each round, instances arrive and receive classification outcomes according to a randomized policy deployed by the learner, whose goal is to maximize accuracy while deploying individually fair policies. We first extend the framework of Bechavod et al. (2020), which relies on the existence of a human fairness auditor for detecting fairness violations, to instead incorporate feedback from dynamicallyselected panels of multiple, possibly inconsistent, auditors. We then construct an efficient reduction from our problem of online learning with onesided feedback and a panel reporting fairness violations to the contextual combinatorial semibandit problem (CesaBianchi & Lugosi, 2009, Gyšrgy et al., 2007). Finally, we show how to leverage the guarantees of two algorithms in the contextual combinatorial semibandit setting Exp2 (Bubeck et al., 2012) and the oracleefficient ContextSemiBanditFTPL (Syrgkanis et al., 2016), to provide multicriteria no regret guarantees simultaneously for accuracy and fairness. Our results resolve an open question of Bechavod et al. (2020), showing that individually fair and accurate online learning with auditor feedback can be carried out in the onesided feedback setting.
Paper


Robust Reinforcement Learning with Distributional Riskaverse formulation
Clavier, Pierre*; Allassonniere, Stephanie; LE PENNEC, Erwann 
Robust Reinforcement Learning tries to make predictions more robust to changes in the dynamics or rewards of the system. This problem is particularly important when the dynamics and rewards of the environment are estimated from the data. In this paper, we approximate the Robust Reinforcement Learning constrained with a $\Phi$divergence using an approximate RiskAverse formulation. We show that the classical Reinforcement Learning formulation can be robustified using standard deviation penalization of the objective. Two algorithms based on Distributional Reinforcement Learning, one for discrete and one for continuous action spaces are proposed and tested in a classical Gym environment to demonstrate the robustness of the algorithms.
Paper
Poster


Optimal Dynamic Regret in LQR Control
Baby, Dheeraj*; Wang, YuXiang 
We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. We provide an efficient online algorithm that achieves an optimal dynamic (policy) regret of $\tilde{O}(n^{1/3} \TV(M_{1:n}^{2/3} \vee 1)$, where $\TV(M_{1:n})$ is the total variation of any oracle sequence of \emph{Disturbance Action} policies parameterized by $M_1,...,M_n$  chosen in hindsight to cater to unknown nonstationarity. The rate improves the best known rate of $\tilde{O}(\sqrt{n (\TV(M_{1:n})+1)} )$ for general convex losses and is informationtheoretically optimal for LQR. Main technical components include the reduction of LQR to online linear regression with delayed feedback due to Foster and Simchowitz 2020, as well as a new \emph{proper} learning algorithm with an optimal $\tilde{O}(n^{1/3})$ dynamic regret on a family of ``minibatched'' quadratic losses, which could be of independent interest.
Paper
Poster


Optimal Rates of (Locally) Differentially Private Heavytailed MultiArmed Bandits
Contributed talk
Wu, Yulian*; Tao, Youming; Zhao, Peng; Wang, Di 
In this paper we investigate the problem of stochastic multiarmed bandits (MAB) in the (local) differential privacy (DP/LDP) model. Unlike previous results that assume bounded/subGaussian reward distributions, we focus on the setting where each arm's reward distribution only has $(1+v)$th moment with some $v\in (0, 1]$. In the first part, we study the problem in the central $\epsilon$DP model. We first provide a nearoptimal result by developing a private and robust Upper Confidence Bound (UCB) algorithm. Then, we improve the result via a private and robust version of the Successive Elimination (SE) algorithm. Finally, we establish the lower bound to show that the instancedependent regret of our improved algorithm is optimal. In the second part, we study the problem in the $\epsilon$LDP model. We propose an algorithm that can be seen as locally private and robust version of SE algorithm, which provably achieves (near) optimal rates for both instancedependent and instanceindependent regret. Our results reveal differences between the problem of private MAB with bounded/subGaussian rewards and heavytailed rewards. To achieve these (near) optimal rates, we develop several new hard instances and private robust estimators as byproducts, which might be used to other related problems.
Paper
Poster


RISE  Robust Individualized Decision Learning with Sensitive Variables
Tan, Xiaoqing*; Qi, Zhengling; Seymour, Christopher; Tang, Lu 
This paper introduces RISE, a robust individualized decision learning framework with sensitive variables, where sensitive variables are collectible data and important to the intervention decision, but their inclusion in decision making is prohibited due to reasons such as delayed availability or fairness concerns. The convention is to ignore these sensitive variables in learning decision rules, leading to significant uncertainty and bias. To address this, we propose a decision learning framework to incorporate sensitive variables during offline training but do not include them in the input of the learned decision rule during deployment. Specifically, from a causal perspective, the proposed framework intends to improve the worstcase outcomes of individuals caused by sensitive variables that are unavailable at the time of decision. Unlike most existing literature that uses meanoptimal objectives, the proposed learning framework robustifies sensitive variables via finding a newly defined quantile or infimumoptimal decision rule for improving the worstoff group among all sensitive variable realizations. The reliable performance of the proposed method is demonstrated through synthetic experiments and three realdata applications.
Paper
Poster


Acting Optimistically in Choosing Safe Actions
Chen, Tianrui*; Gangrade, Aditya; Saligrama, Venkatesh 
We investigate a natural but surprisingly unstudied approach to the multiarmed bandit problem under safety risk constraints. Each arm is associated with an unknown law on safety risks and rewards, and the learner's goal is to maximise reward whilst not playing unsafe arms, as determined by a given threshold on the mean risk. We formulate a pseudoregret for this setting that enforces this safety constraint in a perround way by softly penalising any violation, regardless of the gain in reward due to the same. This has practical relevance to scenarios such as clinical trials, where one must maintain safety for each round rather than in an aggregated sense. We describe doubly optimistic strategies for this scenario, which maintain optimistic indices for both safety risk and reward. We show that schema based on both frequentist and Bayesian indices satisfy tight gapdependent logarithmic regret bounds, and further that these play unsafe arms only logarithmically many times in total. This theoretical analysis is complemented by simulation studies demonstrating the effectiveness of the proposed schema, and probing the domains in which their use is appropriate
Paper


Dynamic Positive Reinforcement For LongTerm Fairness
Puranik, Bhagyashree*; Madhow, Upamanyu; Pedarsani, Ramtin 
As AIbased decisionmaking becomes increasingly impactful on human society, the study of the influence of fairnessaware policies on the population becomes important. In this work, we propose a framework for sequential decisionmaking aimed at dynamically influencing longterm societal fairness, illustrated via the problem of selecting applicants from a pool consisting of two groups, one of which is underrepresented. We consider a dynamic model for the composition of the applicant pool, where the admission of more applicants from a particular group positively reinforces more such candidates to participate in the selection process. Under such a model, we show the efficacy of the proposed FairGreedy selection policy which systematically trades greedy score maximization against fairness objectives. In addition to experimenting on synthetic data, we adapt static realworld datasets on law school candidates and credit lending to simulate the dynamics of the composition of the applicant pool.
Paper


An Investigation into the Open World Survival Game Crafter
Stanic, Aleksandar*; Tang, Yujin; Ha, David; Schmidhuber, JŸrgen 
We share our experience with the recently released Crafter benchmark, a 2D open world survival game. Crafter allows tractable investigation of novel agents and their generalization, exploration and longterm reasoning capabilities. We evaluate agents on the original Crafter environment, as well as on a newly introduced set of generalization environments, suitable for evaluating agents' robustness to unseen objects and fastadaptation (metalearning) capabilities. Through several experiments we provide a couple of critical insights that are of general interest for future work on Crafter. We find that (1) Simple agents with tuned hyperparameters outperform all previous agents. (2) Feedforward agents can unlock almost all achievements by relying on the inventory display. (3) Recurrent agents improve on feedforward ones, also without the inventory information. (4) All agents (including interpretable objectcentric ones) fail to generalize to OOD objects. We will opensource our code.
Paper


Equity and Equality in Fair Federated Learning
Mozaffari, Hamid*; Houmansadr, Amir 
Federated Learning (FL) enables data owners to train a shared global model without sharing their private data. Unfortunately, FL is susceptible to an intrinsic fairness issue due to heterogeneity in clients' data distributions, the final trained model can give disproportionate advantages across the participating clients. In this work, we present Equal and Equitable Federated Learning (E2FL) to produce fair federated learning models by preserving two main fairness properties, equity and equality, concurrently. We validate the efficiency and fairness of E2FL in different realworld FL applications, and show that E2FL outperforms existing baselines in terms of the resulting efficiency, fairness of different groups, and fairness among all individual clients.
Paper
Poster


Certifiably Robust MultiAgent Reinforcement Learning against Adversarial Communication
Sun, Yanchao*; Zheng, Ruijie; Hassanzadeh, Parisa; Liang, Yongyuan; Feizi, Soheil; Ganesh, Sumitra; Huang, Furong 
Communication is important in many multiagent reinforcement learning (MARL) problems for agents to share information and make good decisions. However, when deploying trained communicative agents in a realworld application where noise and potential attackers exist, the safety of communicationbased policies becomes a severe issue that is underexplored. Specifically, if communication messages are manipulated by malicious attackers, agents relying on untrustworthy communication may take unsafe actions that lead to catastrophic consequences. Therefore, it is crucial to ensure that agents will not be misled by corrupted communication, while still benefiting from benign communication. In this work, we consider an environment with $N$ agents, where the attacker may arbitrarily change the communication from any $C<\frac{N1}{2}$ agents to a victim agent. For this strong threat model, we propose a certifiable defense by constructing a messageensemble policy that aggregates multiple randomly ablated message sets. Theoretical analysis shows that this messageensemble policy can utilize benign communication while being certifiably robust to adversarial communication, regardless of the attacking algorithm. Experiments in multiple environments verify that our defense significantly improves the robustness of trained policies against various types of attacks.
Paper
Poster


Prisoners of Their Own Devices  How Models Induce Data Bias in Performative Prediction
Pombal, Jose*; Saleiro, Pedro; Figueiredo , Mario; Bizarro, Pedro 
The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society. Much work has been devoted to measuring unfairness in static ML environments, but not in dynamic, performative prediction ones, in which most real world use cases operate. In the latter, the predictive model itself plays a pivotal role in shaping the distribution of the data. However, little attention has been heeded to relating unfairness to these interactions. Thus, to further the understanding of unfairness in these settings, we propose a taxonomy to characterize bias in the data, and study cases where it is shaped by model behaviour. Using a realworld account opening fraud detection case study as an example, we explore the dangers to both performance and fairness of two typical biases in performative prediction  distribution shifts, and the problem of selective labels.
Paper
Poster


A Decision Metric for the Use of a Deep Reinforcement Learning Policy
Selby, Christina* 
Uncertainty estimation techniques such as those found in Osband et al. (2018) and Burda et al. (2019) have been shown to be useful for efficient exploration during training. This paper demonstrates that such uncertainty estimation techniques can also be used as part of a timeseries based methodology for outofdistribution (OOD) detection for an offline modelfree deep reinforcement learning policy. In particular, this paper defines a "decision metric" that can be utilized for determining when another decisionmaking process should be used in place of the deep reinforcement learning policy.
Paper
Poster


Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms
Sa_lam, Baturay*; Cicek, Dogan Can; Mutlu, Furkan Burak; Kozat, Suleyman S 
Learning in high dimensional continuous tasks is challenging, mainly when the experience replay memory is very limited. We introduce a simple yet effective experience sharing mechanism for deterministic policies in continuous action domains for the future offpolicy deep reinforcement learning applications in which the allocated memory for the experience replay buffer is limited. To overcome the extrapolation error induced by learning from other agents' experiences, we facilitate our algorithm with a novel offpolicy correction technique without any action probability estimates. We test the effectiveness of our method in challenging OpenAI Gym continuous control tasks and conclude that it can achieve a safe experience sharing across multiple agents and exhibits a robust performance when the replay memory is strictly limited.
Paper
Poster


Planning to Fairly Allocate  Probabilistic Fairness in the Restless Bandit Setting
Herlihy, Christine R*; Prins, Aviva; Srinivasan, Aravind; Dickerson, John P 
Restless and collapsing bandits are often used to model budgetconstrained resource allocation in settings where receiving the resource increases the probability that an arm will transition to, or remain in, a desirable state. However, SOTA Whittleindexbased approaches to this planning problem either do not consider fairness among arms, or incentivize fairness without guaranteeing it. We introduce ProbFair, an algorithm which finds the best (rewardmaximizing) policy that  (a) satisfies the budget constraint; and (b) enforces bounds $[\ell, u]$ on the probability of being pulled at each timestep. We evaluate our algorithm on a realworld application, where interventions support continuous positive airway pressure (CPAP) therapy adherence among patients, as well as on a broader class of synthetic transition matrices. ProbFair preserves utility while providing fairness guarantees.
Paper


Exposing Algorithmic Bias through Inverse Design
Mazijn, Carmen A.C.*; Prunkl, Carina EA; Algaba, Andres; Danckaert, Jan; Ginis, Vincent 
Traditional group fairness notions assess a modelÕs equality of outcome by computing statistical metrics on the outputs. We argue that these output metrics encounter fundamental obstacles and present a novel approach that aligns with equality of treatment. Through gradientbased inverse design, we generate a canonical set that shows the desired inputs for a model given a preferred output. The canonical set reveals the internal logic of the model and thereby exposes potential unethical biases. For the UCI Adult data set, we find that the biases detected by a canonical set interestingly differ from those of output metrics.
Paper
Poster


Reward Reports for Reinforcement Learning
Contributed talk
Krendl Gilbert, Thomas; Dean, Sarah*; Lambert, Nathan O; Zick, Tom; Snoswell, Aaron J 
The desire to build good systems in the face of complex societal effects requires a dynamic approach towards equity and access. Recent approaches to machine learning (ML) documentation have demonstrated the promise of discursive frameworks for deliberation about these complexities. However, these developments have been grounded in a static ML paradigm, leaving the role of feedback and postdeployment performance unexamined. Meanwhile, recent work in reinforcement learning design has shown that the effects of optimization objectives on the resultant system behavior can be wideranging and unpredictable. In this paper we sketch a framework for documenting deployed learning systems, which we call \textit{Reward Reports}.
Paper


Rashomon Capacity Measuring Predictive Multiplicity in Probabilistic Classification
Hsu, Hsiang*; Calmon, Flavio 
Predictive multiplicity occurs when classification models with nearly indistinguishable average performances assign conflicting predictions to individual samples. When used for decisionmaking in applications of consequence (e.g., lending, education, criminal justice), models developed without regard for predictive multiplicity may result in unjustified and arbitrary decisions for specific individuals. We introduce a new measure of predictive multiplicity in probabilistic classification called Rashomon Capacity. We show that Rashomon Capacity yields principled strategies for disclosing conflicting models to stakeholders. Our numerical experiments illustrate how Rashomon Capacity captures predictive multiplicity in various datasets and learning models, including neural networks. The tools introduced in this paper can help data scientists measure, report, and ultimately resolve predictive multiplicity prior to model deployment.
Paper


Counterfactual Metrics for Auditing BlackBox Recommender Systems for Ethical Concerns
Akpinar , NilJana *; Leqi, Liu; HadfieldMenell, Dylan; Lipton, Zachary 
Recommender systems can shape peoples' online experience in powerful ways which makes close scrutiny of ethical implications imperative. Most existing work in this area attempts to measure induced harm exclusively based on observed recommendations under a set policy. This neglects potential dependencies on other quantities and can lead to misleading conclusions about the behavior of the algorithm. Instead, we propose counterfactual metrics for auditing recommender systems for ethical concerns. By asking how recommendations would change if users behaved differently or if the training data was different, we are able to isolate the effects of the recommendation algorithm from components like user preference and information. We discuss the ethical context of the suggested metrics and propose directions for future work.
Paper
Poster


Adaptive Data Debiasing Through Bounded Exploration
Yang, Yifan*; Liu, Yang; Naghizadeh, Parinaz 
Biases in existing datasets used to train algorithmic decision rules can raise ethical and economic concerns due to the resulting disparate treatment of different groups. We propose an algorithm for sequentially debiasing such datasets through adaptive and bounded exploration in a classification problem with costly and censored feedback. Our proposed algorithm includes parameters that can be used to balance between the ultimate goal of removing data biases  which will in turn lead to more accurate and fair decisions, and the exploration risks incurred to achieve this goal. We analytically show that such exploration can help debias data in certain distributions. We further investigate how fairness criteria can work in conjunction with our data debiasing algorithm. We illustrate the performance of our algorithm using experiments on synthetic and realworld datasets.
Paper
Poster


Fairness Over Utilities Via MultiObjective Rewards
Blandin, Jack*; Kash, Ian 
Group fairness definitions make assumptions about the underlying decisionproblem that restrict them to classification problems. Numerous bespoke interpretations of group fairness definitions exist as attempts to extend them to specific applications. In an effort to generalize group fairness definitions beyond classification, Blandin & Kash (2021) explore using utility functions to define group fairness measures. In addition to the decisionmaker's utility function, they introduce a benefit function that represents the individual's utility from encountering a given decisionmaker policy. Using this framework, we interpret fairness problems as a multiobjective optimization, where we aim to optimize for both the decisionmaker's utility and the individual's benefit, as well as reduce the individual benefit difference across protected groups. We demonstrate our instantiation of this multiobjective approach in a reinforcement learning simulation.
Paper


Defining and Characterizing Reward Gaming
Skalse, Joar; Howe, Nikolaus HR; Krasheninnikov, Dmitrii; Krueger, David* 
We provide the first formal definition of \textbf{reward gaming}, a phenomenon where optimizing an imperfect \emph{proxy reward function}, $\mathcal{\tilde{R}}$, leads to poor performance according to a true reward function, $\mathcal{R}$. We say that a proxy is \emph{ungameable} if increasing the expected proxy return can never decrease the expected true return. Intuitively, it should be possible to create an ungameable proxy by overlooking finegrained distinctions between roughly equivalent outcomes, but we show this is usually not the case. A key insight is that the linearity of reward (as a function of stateaction visit counts) makes ungameability a very strong condition. In particular, for the set of all stochastic policies, two reward functions can only be ungameable if one of them is constant. We thus turn our attention to deterministic policies and finite sets of stochastic policies, where nontrivial ungameable pairs always exist, and establish necessary and sufficient conditions for the existence of simplifications, an important special case of ungameability. Our results reveal a tension between using reward functions to specify narrow tasks and aligning AI systems with human values.
Paper
Poster


Endtoend Auditing of Decision Pipelines
Laufer, Benjamin D*; Pierson, Emma; Garg, Nikhil 
Many highstakes policies can be modeled as a sequence of decisions along a \textit{pipeline}. We are interested in auditing such pipelines for both \textit{efficiency} and \textit{equity}. Using a dataset of over 100,000 crowdsourced resident requests for potentially hazardous tree maintenance in New York City, we observe a sequence of city government decisions about whether to inspect and work on a reported incident. At each decision in the pipeline, we define parity definitions and tests to identify inefficient, inequitable treatment. Disparities in resource allocation and scheduling across census tracts are reported as preliminary results.
Paper
Poster


Efficient Adversarial Training without Attacking  WorstCaseAware Robust Reinforcement Learning
Liang, Yongyuan*; Sun, Yanchao; Zheng, Ruijie; Huang, Furong 
Recent studies reveal that a welltrained deep reinforcement learning (RL) policy can be particularly vulnerable to adversarial perturbations on input observations. Therefore, it is crucial to train RL agents that are robust against any attacks with a bounded budget. Existing robust training methods in deep RL either treat correlated steps separately, ignoring the robustness of longterm reward, or train the agents and RLbased attacker together, doubling the computational burden and sample complexity of the training process. In this work, we propose a strong and efficient robust training framework for RL, named Worstcaseaware Robust RL (WocaRRL), that directly estimates and optimizes the worstcase reward of a policy under bounded $\ell_p$ attacks without requiring extra samples for learning an attacker. Experiments on multiple environments show that WocaRRL achieves stateoftheart performance under various strong attacks, and obtains significantly higher training efficiency than prior stateoftheart robust training methods.
Paper
Poster


Engineering a Safer Recommender System
Leqi, Liu*; Dean, Sarah 
While recommender systems suffuse our daily life, influencing information we receive, products we purchase, and beliefs we form, few works have systematically examined the safety of these systems. This can be partly attributed to the complex feedback loops. In this work, we take a systems safety perspective and focus on a particular feedback loop in recommender systems where users react to recommendations they receive. We characterize the difficulties of designing a safe recommender within this feedback loop. Further, we connect the causes of widely covered recommender system failures to flaws of the system in treating the feedback loop. Our analysis suggests lines of future work on designing safer recommender systems and more broadly systems that interact with people psychologically.
Paper


RiskyZoo  A Library for RiskSensitive Supervised Learning
Wong, William*; Leqi, Liu; Huang, Audrey; Azizzadenesheli, Kamyar; Lipton, Zachary 
Supervised learning models are increasingly used in algorithmic decisionmaking. The traditional assumption on the training and testing data being independently and identically distributed is often violated in practical learning settings, due to different types of distribution shifts. To mitigate the effects of such nonstationarities, risksensitive learning is proposed to train models under different (risk) functionals beyond the expected loss. For example, learning under the conditional valueatrisk of the losses is equivalent to training a model under a particular type of worstcase distribution shift. While many risk functionals and learning procedures have been proposed, their implementations are either nonexistent or in individualized repositories. With no common implementations and baseline test beds, it is difficult to decide which risk functionals and learning procedures to use. To address this, we introduce a library (RiskyZoo) for risksensitive supervised learning. The library contains implementations of risksensitive learning objectives and optimization procedures that can be used as addons to the PyTorch library. We also provide datasets that can be used to compare these learning methods. We demonstrate how our library can be used through comparing models learned under different risk objectives, optimization performances of different methods for a single objective, and risk assessments of pretrained ImageNet models.
Paper
Poster


Open Problems in (Un)fairness of the Retail Food Safety Inspection Process
BergerWolf, Tanya; Howell, Allison; Kanich, Chris; Kash, Ian*; Keymanesh, Moniba; Kowalcyk, Barbara; Nicholson Kramer, Gina; Perrault, Andrew; Singh, Shubham 
The inspection of retail food establishments is an essential public health intervention. We discuss existing work on roles AI techniques can play in food inspections and resulting fairness and interpretability challenges. We also examine open problems stemming from the complex and dynamic nature of the inspections.
Paper


From Soft Trees to Hard Trees  Gains and Losses
Zeng, Xin*; Yao, Jiayu; DoshiVelez, Finale; Pan, Weiwei 
Trees are widely used as interpretable models. However, when they are greedily trained they can yield suboptimal predictive performance. Train ing soft trees, with probabilistic splits rather than deterministic, provides a way to globally optimize tree models. For interpretability purposes, a hard tree can be obtained from a soft tree by binarizing the probabilistic splits, called hardening. Unfortunately, the good performance of the soft model is often lost after hardening. We systemmatically study two factors contributing to the performance drop and show that simple mitigation methods in literature do not fully address these issues.
Paper
Poster


Success of UncertaintyAware Deep Models Depends on Data Manifold Geometry
Penrod, Mark*; Termotto, Harrison; Reddy, Varshini; Yao, Jiayu; Pan, Weiwei; DoshiVelez, Finale 
For responsible decision making in safetycritical settings, machine learning models must effectively detect and process edgecase data. Al though existing works show that predictive un certainty is useful for these tasks, it is not evident from literature which uncertaintyaware models are best suited for a given dataset. Thus, we com pare six uncertaintyaware deep learning models on a set of edgecase tasks  robustness to adversarial attacks as well as outofdistribution and adversarial detection. We find that the geometry of the data submanifold is an important factor in determining the success of various models. Our finding suggests an interesting direction in the study of uncertaintyaware deep learning models
Paper
Poster


Long Term Fairness for Minority Groups via Performative Distributionally Robust Optimization
PeetPare, Garnet L*; Fyshe, Alona; Hegde, Nidhi 
In recent years fairness researchers in machine learning (ML) have coalesced around several fairness criteria which provide formal definitions of what it means for an ML model to be fair. These criteria have some serious limitations, however, We identify four key shortcomings of these formal fairness criteria, and aim to help to address them by extending performative prediction to include a distributionally robust objective.
Paper
Poster


A GameTheoretic Perspective on Trust in Recommendation
Contributed talk
Cen, Sarah H*; Ilyas, Andrew; Madry, Aleksander 
Recommendation platformssuch as Amazon, Netflix, and Facebookuse various strategies in order to engage and retain users, from tracking their data to showing addicting content. Ostensibly these measures improve performance, but they can also erode {\em trust}. In this work, we study the role of trust in recommendation, and show that trust is important to a recommendation platform's success because users are the platforms' data sources. Our main contribution is a gametheoretic view of recommender systems and a corresponding formal definition of trust. Namely, if a user trusts their recommendation platform, then their optimal longterm strategy is to act greedilyand thus report their preferences truthfullyat all times. Our definition reflects the intuition that trust arises when the incentives of the user and plaform are sufficiently aligned. To illustrate the implications of this definition, we explore two simple examples of trust. We show that distrust can hurt the platform and building trust can be good for both the user and the platform.
Paper
Poster


Optimizing Personalized Assortment Decisions in the Presence of Platform Disengagement
Sumida, Mika; Zhou, Angela* 
We consider a problem where customers repeatedly interact with a platform. During each interaction with the platform, the customer is shown an assortment of items and selects among these items according to a Multinomial Logit choice model. The probability that a customer interacts with the platform in the next period depends on the customer's cumulative number of past purchases. The goal of the platform is to maximize the total revenue obtained from each customer over a finite time horizon. We first study a nonlearning version of the problem where consumer preferences are completely known. We formulate the problem as a dynamic program and prove structural properties of the optimal policy. Next, we provide a formulation in a contextual episodic reinforcement learning setting, where the parameters governing consumer preferences and return probabilities are unknown and learned over multiple episodes. We develop an algorithm based on the principle of optimism under uncertainty for this contextual reinforcement learning problem and provide a regret bound.
Poster


Machine Learning Explainability & Fairness  Insights from Consumer Lending
Yazdi, Sormeh*; Blattner, Laura; McElfresh, Duncan; Stark, PR; Spiess, Jann L 
An ongoing debate in the credit lending domain is whether lenders and regulators can responsibly use machine learning models. Transparency is a major concern in this area, but machine learning models can be difficult to interpret. Our work is a case study of the models and explainability tools currently in use in the finance industry that advertise better transparency. In this submission we focus on the effects of the explainability tools on disparate impact / fair lending in credit underwriting. We evaluate these explainability tools on a criteria we dub Òusability,Ó which indicates a toolÕs ability to identify less discriminatory alternative models. Notably, we find that in the context of credit lending, the intuitive method of feature dropping, which attempts to reduce bias by removing ÒproblematicÓ features, does not lead to less discriminatory alternative models, and often leads to substantial performance deterioration. In contrast, automated tools that search for a range of less discriminatory alternative models can successfully improve fairness metrics. The findings presented here are a subset of results from a larger study "Machine Learning Explainability & Fairness  Insights from Consumer Lending". In the full report we investigate several other aspects of ML in financial services, including the use of explainability tools in Adverse Action Notices.
Paper


Policy Fairness in Sequential Allocations under Bias Dynamics
Segal, Meirav*; George, AnneMarie; Dimitrakakis, Christos 
This work considers a dynamic decision making framework for allocating opportunities over time to advantaged and disadvantaged individuals. Here, individuals in the disadvantaged group are assumed to experience a societal bias that limits their success probability. A policy of allocating opportunities stipulates thresholds on the success probability for the advantaged and disadvantaged group. We analyse the interplay between utility and a novel measure of fairness for different dynamics that dictate how the societal bias changes based on the current thresholds while the group sizes are fixed. Our theoretical analysis is supported by experimental results on synthetic data for the use case of college admissions.
Paper
Poster


A law of adversarial risk, interpolation, and label noise
Paleka, Daniel; Sanyal, Amartya* 
In supervised learning, it is known that label noise in the data can be interpolated without penalties on test accuracy. We show that interpolating label noise induces adversarial vulnerability, and prove the first theorem showing the dependence of label noise and adversarial risk in terms of the data distribution. Our results are almost sharp without accounting for the inductive bias of the learning algorithm. We also show that inductive bias makes the effect of label noise much stronger.
Paper


LPI  Learned Positional Invariances for Transfer of Task Structure and Zeroshot Planning
Madarasz, Tamas J* 
Realworld tasks often include interactions with the environment where an agent's actions can drastically change the available or desirable longterm outcomes. One formulation of this in the reinforcement learning setting is in terms of nonMarkovian rewards, where the reward function, and thus the available rewards, are themselves historydependent, and dynamically change given the agentenvironment interactions. An important challenge for navigating such environments is to be able to capture the structure of this dynamic reward function in a way that is interpretable and allows for optimal planning. This structure, in conjunction with the particular task setting at hand, can determine the optimal order in which actions should executed or subtasks completed, however, planning methods face the challenge of combinatorial explosion if all such orderings need to be evaluated. Learning invariances inherent in the task structure can alleviate this pressure, allowing the planning method to recognise task segments where temporal ordering is irrelevant for predicting outcomes downstream. To enable this efficiency in planning, we simultaneously train a model to segment a task and track the changing reward function that results from the agentÕs actions, while also learning about the permutation invariances relevant for this prediction, allowing zeroshot or fewshot generalisation for complex, dynamic reinforcement learning tasks.
Paper


The Backfire Effects of Fairness Constraints
Sun, Yi*; Cuesta Infante, Alfredo; Veeramachaneni, Kalyan 
Recently, the fairness community has shifted from achieving oneshot fair decisions to striving for longterm fairness. In this work, we propose a metric to measure the longterm impact of a policy on the target variable distributions. We theoretically characterize the conditions under which threshold policies could lead to a backfire on population groups. We conduct experiments with a set of wellused fairness constraints on both synthetic and realworld datasets.
Paper


Beyond Adult and COMPAS  Fairness in MultiClass Prediction
Alghamdi, Wael*; Hsu, Hsiang; Jeong, Haewon; Wang, Hao; Michalak, Peter Winston; Asoodeh, Shahab; Calmon, Flavio 
We consider the problem of producing fair probabilistic classifiers for multiclass classification tasks. We formulate this problem in terms of ``projecting'' a pretrained (and potentially unfair) classifier onto the set of models that satisfy target groupfairness requirements. The new, projected model is given by postprocessing the outputs of the pretrained classifier by a multiplicative factor. We provide a parallelizable iterative algorithm for computing the projected classifier and derive both sample complexity and convergence guarantees. Comprehensive numerical comparisons with stateoftheart benchmarks demonstrate that our approach maintains competitive performance in terms of accuracyfairness tradeoff curves, while achieving favorable runtime on large datasets.
Paper
Poster
