Fair Ranking with Biased Data - Interview with Thorsten Joachims
01-03-2020, by Jakub Zavrel
Starting with his PhD at the University of Dortmund in the late 1990s, Thorsten Joachims has pioneered and popularized the use of machine learning in NLP and search. A professor at Cornell University since the early 2000s, and one of the first to focus on the role of user feedback in improving search rankings, he and his students have enjoyed considerable success with work on a variety of theoretical and practical problems in machine learning, search, and recommender systems. Increasing awareness of the mixed effects of these systems on society has led him to turn his attention more and more to issues of bias and fairness.
After his Search Engines Amsterdam talk “Fair Ranking with Biased Data”, we were lucky to have the opportunity to ask Thorsten a few questions about his current research interests and views.
JZ: Thanks for the great lecture, Thorsten. Maybe you can reiterate for us, in layman's terms, a few of the points that you made. How do you see where we are now with AI applications, and how has that influenced your research agenda?
TJ: So I've been working on online systems for a long time. And I think those are the first kind of AI systems that actually proved themselves in the real world. Maybe we don't think about them as AI anymore, but search engines and recommender systems have come a long way, and they are amazingly good these days at actually understanding people, in their respective domains.
One conceptual shift that has influenced my research in the past few years, both on the technical side and on the conceptual side, is that these systems really intervene in the world. Whenever a search engine presents a ranking, that's an intervention. When an e-commerce application presents a certain ranking, that's an intervention. Those actions have tangible effects, so thinking about these systems from a causal perspective, as intelligent agents that are acting and that affect the world, is an interesting conceptual realization. And it also opens up a whole interesting set of methodologies coming from causal inference to understand and build these systems in a more rigorous way.
So if you think about these systems from this interventional, causal point of view, I think it's very natural to then realize that we have to think about the effects in this much broader context. What are the effects on the various groups of stakeholders that our ranking policy has? The systems themselves have never even learned about that, have they?
There are a number of examples where the actual effects were not what we intended. I was one of the people who wrote the first papers on how to learn from click data. And it seemed like a great idea, right? It has this participatory aspect to it, that we would actually give everybody a voice. We kind of leave it up to the users to define what the right answer to a query is. Now, in a lot of cases, that actually works very well, but in some cases, it spirals into conspiracy theories. I certainly did not see this coming.
Part of this comes from a first-order approximation of ranking systems that we have traditionally made -- namely that we can treat each interaction in isolation. There is a query, the system responds, and then there is an immediate utility. This is naive in how it ignores dynamic long-term effects. For example, it ignores that the system actually changes people's beliefs, and that there's a dynamic to these systems that we also have to take into account. For many systems, we use first-order approximations of what the system is supposed to do, but these kinds of dynamic effects we really haven't fully considered in the design of our systems. And so I think there are big challenges ahead, especially as we move into new applications that have even bigger societal consequences, like hiring and other matching-market applications.
JZ: In your talk at Search Engines Amsterdam, you argued that it's really important to address the two-sided nature of these markets, that fairness is really a two-sided affair. Can you maybe elaborate on that a bit more?
TJ: Underlying most algorithms and the way we train them is still a very conventional view of what ranking systems are supposed to optimize, namely that we want to be maximally useful to the person typing the query.
In many domains, that's good and that's sufficient. If you want to find a book in the library, that's certainly the main objective. But if you are mediating, let's say, an e-commerce site, then there are the interests of the customers: they want to purchase something, and we want to present them a good ranking. But there are also the sellers of the items, who have an interest in their items being found, and who have an interest in being treated in a fair way. Two people, or two sellers, that are both equally competent and sell items of equal quality should not be treated in different ways. I think we perceive that as the minimum level of fairness: that you're not arbitrarily discriminating between people of similar merit.
Whenever we violate these fairness considerations, there's obviously an ethical problem, but also a problem for the self-interest of the platform. If we are creating an environment that is not operating in the best possible way, people who feel unfairly treated may cease to participate and may leave. So, platforms should be interested in creating the kind of two-sided markets that are actually perceived as fair. The examples that I used in the talk are the most egregious violations of fairness, where you had two job candidates, or you had two items, that were actually equally good, but by perturbations of probability estimates or by a small bias against minority users, they were treated in very disparate ways. Again, that's really one of the lowest common denominators of fairness that we can agree upon: that two items, or two people, that are equally meritorious should not be treated in very disparate ways.
JZ: So if you try to translate that into more technical terms, what would you recommend as the most practical and applicable approach that AI developers can actually embrace to ensure fairness?
TJ: Ah, this is a very difficult question! I think there is no general rule for what it means to be fair. What's fair or desirable in a newspaper is probably different from what's fair or desirable in a job-ranking application. So, fairness depends on the application, including its legal and regulatory requirements. In each application, what we consider fair requires a broad discussion. It's a policy choice. I think we're at a stage where we have a number of different methods for enforcing various fairness criteria, but it's still a space that's maturing.
Many of these fairness criteria in two-sided markets are criteria of fairness for the items and for the sellers, and fairness of exposure is one of the key concepts. It could be defined differently for different applications, for example to account for differences between user groups and between item groups, but a big unified method we just don't have yet.
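To make the idea of fairness of exposure concrete, here is a minimal sketch in Python. It assumes a standard DCG-style position-bias model of exposure (exposure decays logarithmically with rank) and checks how far each item's share of exposure deviates from its share of merit; the function names and the merit-proportionality criterion are illustrative choices, not a specific implementation from the talk.

```python
import math

def exposure(rank):
    # Assumed position-bias model: exposure decays with rank (DCG-style discount).
    return 1.0 / math.log2(rank + 1)

def merit_proportional_gap(ranking, merit):
    """Compare each item's share of exposure to its share of merit.

    ranking: list of item ids, best first (ranks 1, 2, ...)
    merit:   dict mapping item id to a non-negative merit score
    Returns a dict of (exposure share - merit share) per item; values near
    zero indicate merit-proportional exposure, one common notion of
    fairness of exposure.
    """
    total_merit = sum(merit.values())
    exp = {item: exposure(r) for r, item in enumerate(ranking, start=1)}
    total_exp = sum(exp.values())
    return {item: exp[item] / total_exp - merit[item] / total_merit
            for item in ranking}

# Two items of equal merit placed at ranks 1 and 2: in a single static
# ranking the top item gets a strictly larger exposure share than its
# merit share, and the second item a smaller one.
gap = merit_proportional_gap(["a", "b"], {"a": 1.0, "b": 1.0})
```

In practice, a stochastic ranking policy that alternates the two equally meritorious items can drive these gaps to zero in expectation, which is one way such disparities are addressed.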
JZ: Originally, these sorts of ranking systems in Information Retrieval came from library scientists looking for scientific knowledge: research, papers, books. How do you think fairness issues affect this area and the systems we use to surface relevant research in this ocean of new work being published every day?
TJ: Right, so we're using more and more recommender systems to help us discover and stay on top of research publications. And I think there's both a danger as well as an opportunity. The danger is that recommender systems just perpetuate old structures, where what you read depends heavily on the institution of the authors and who you went to grad school with. The opportunity is that recommender systems help us broaden the reach of which literature actually gets read. Do we actually manage to get these new systems to give exposure based on merit of the work?
At the KDD 2020 conference last summer, we built a recommender system that recommended papers to participants, and we actually built some fairness mechanisms into this that we wanted to try out, that would kind of lead to a broader distribution of exposure for innovative papers.
The key point of this system was to model the uncertainty that we have about how relevant a paper is. Papers with particularly innovative ideas may fare badly under standard ranking methods. The issue is that content-based techniques, especially, are more confident about which paper is relevant to which person if it's an established topic. If I am interested in established topic X and there's a new paper on topic X, it is easy to make the match. But if there's an exciting new paper on a new topic Y, it may not be clear who it is relevant to. The idea was to allocate exposure to these novel topics in a fair way that does not drown them out among the established topics, and that is desirable for us as a scientific community.
Just another example that there are many different aspects of fairness: how do you deal with uncertainty?
JZ: What's your personal approach to stay up to date in your own field, in the surrounding areas and to be knowledgeable about the progress of AI?
TJ: Staying on top of the field is very hard, since the number of new papers is so high. If you just take NeurIPS and ICML, last year, roughly 3000 papers got published. There's just no way you could read them all. That is almost a prototypical application scenario for recommender systems.
So I do use recommendations that I get from various systems, but quite frankly, it's mostly social: PhD students, undergrads, and collaborators. However, if I compare how broadly I can discover papers now versus 20 years ago, I feel that search and recommender systems have made a big difference.
JZ: You started your academic career in Europe and moved to the US. What do you see as the main differences, and what does Europe need to play a global role in AI in the future?
TJ: I'm actually really out of the European scene at this point, so I'm not sure I can point to anything in particular. At the time my wife and I were on the job market for academic positions, US universities were much quicker to act. And that's why we are where we are now.
I think it can be a strategic advantage of Europe that it is actually willing and able to set up regulatory environments and lead in that way.
Investments in basic education and research are another very important aspect. Machine learning and artificial intelligence are among the top funding priorities in the US and other nations.
And so we're actually educating a lot of people with the skills that they need. The machine learning class that I teach at Cornell is highly oversubscribed, it's a huge class.
But I think the public perception, at least in Germany, is not the same, though that may differ between European countries. So there may be a need for an education campaign, in the sense of getting bright students into computer science and AI, who can then become the next generation of leaders in AI.
JZ: Right. Thank you for the interview, Thorsten. So I will close by seconding Maarten de Rijke’s comment that for strengthening the European AI ecosystem, we would certainly love to have you back sometime for a visit in Amsterdam.
TJ: Always love to visit Amsterdam, thank you.