How can AI contribute to a fairer tax system or better climate change agreements? Multi-agent reinforcement learning has recently made significant progress in modeling complex economic systems. One of the main drivers of this research is Stephan Zheng, who grew up and studied in the Netherlands before moving to the US to pursue a distinguished research career at the crossroads of AI and economics.
First of all, I'm really curious to know how you got to where you are today. You started out in theoretical physics in the Netherlands, and I know that great physicists can do almost everything. But what was your path into AI and, more precisely, into using AI for economic modeling?
It's been a while! But yes, in my undergrad studies, I started doing math and physics, and I was really fascinated by trying to understand the universe. Physics is a fascinating field because you can make models and predictions about things that you otherwise can't really see. It's mind-boggling how the universe actually turns out to work! But at some point, it became very abstract. I found myself studying theories of, for instance, what the interior of a black hole might look like.
People have come up with these beautiful models called string theory, which study the "extreme universe", such as what happened during the Big Bang and what happens inside black holes. This is fascinating, but at the same time, it also became clear that a lot of those models won't be testable anytime soon.
So in my PhD, I started looking for something else. And I was fortunate with the timing: around 2013, just as I started to think about what else to do research on, deep learning really crashed onto the scene.
A lot of great researchers in AI are motivated by wanting to understand the human way of thinking and human cognition. But what made you pursue this use of AI methodology for economic modeling? It's yet another perspective on human behavior, and quite an external one: you're looking at groups rather than individual humans.
If I look at my personal research history, there are two phases. My PhD advisor had a unique data set of NBA basketball players, so I started with that. The data set was very high-resolution and had this “team element” in it, because you had two basketball teams competing with each other. With deep learning, some of my early research was about trying to deconstruct levels of human behavior. You could ask the computer, “If you were Kobe Bryant and you got the ball, where would you go? Where would you run to? Or would you pass the ball to your teammate?”
The data was captured at about 30 frames per second, so you had this pretty high-resolution video (or rather, tracks of XY coordinates over time). It turns out that deep learning was really good at modeling human behavior from this data.
Part of my research was looking into decomposing that behavior into, for instance, short-term and long-term decision-making. Towards the end of my PhD, I became interested in modeling teams and how people cooperate. When I became a researcher at Salesforce, I continued in that direction, because collaboration fascinated me. Humans and computers communicate through chatbots, for instance. In other words, a lot of intelligence does not exist in a vacuum. Intelligence is usually formed in a community where we're constantly communicating with and responding to each other.
So generally speaking, you became very interested in this multi-agent view of AI?
Indeed. This rolled into economics because in economics, people think a lot about the incentives behind human behavior. Not just whether you can use deep learning to predict what a human is going to do, but why people do what they do. Why do people cooperate? Why do people on a basketball court go to the left instead of the right? So, I started doing economics to study these incentives.
In old-fashioned economic theory, people do everything because they're perfectly rational beings, given the information they have, isn't that right?
Yes, in traditional economics, rationality is a very basic assumption. Rationality means something like the human being a perfect computer that knows exactly what numerical objective it should optimize and then finds the optimal behavior. But that doesn't always seem to hold in the real world. People do have models of bounded or imperfect rationality, but they're not as prevalent.
Does this naturally lead to the multi-agent reinforcement learning systems that you later made use of?
Yeah, it's a very natural framework for many reasons. In multi-agent reinforcement learning, you think of the world in terms of a simulation and all the agents in the world are controlled by a reinforcement learning model. This is attractive because you can use reinforcement learning to ask questions like, if I give you a numerical objective, what is the optimal behavior?
AI researchers have shown that reinforcement learning methods are very good at finding good behavior, even if it's not always theoretically clear whether they find the optimal behavior. But what's conceptually interesting about reinforcement learning is that it's very flexible. If I change the numerical objective from a purely rational one to a boundedly rational one, that's no problem for reinforcement learning.
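To make the flexibility point concrete, here is a minimal sketch, assuming a single agent that picks how many hours to work and an epsilon-greedy multi-armed bandit (the simplest reinforcement learning setting) in place of the deep multi-agent RL used in the actual research. The wage, the two objective functions, and their effort weights are all hypothetical. The point is that the learning loop only ever calls reward_fn, so swapping the objective requires no change to the learner.

```python
import math
import random
from typing import Callable, List

# Hypothetical setup: an agent chooses how many hours to work (0..10) at a fixed wage.
HOURS: List[int] = list(range(11))
WAGE = 10.0


def baseline_objective(hours: int) -> float:
    """Hypothetical objective: concave utility of income minus a linear effort cost."""
    return math.log1p(WAGE * hours) - 0.2 * hours


def effort_averse_objective(hours: int) -> float:
    """Hypothetical alternative objective that weighs effort more heavily."""
    return math.log1p(WAGE * hours) - 0.5 * hours


def train_bandit(reward_fn: Callable[[int], float], n_steps: int = 2000,
                 epsilon: float = 0.1, seed: int = 0) -> int:
    """Epsilon-greedy bandit: learns by trial and error which action maximizes reward_fn."""
    rng = random.Random(seed)
    q = [0.0] * len(HOURS)     # running mean reward per action
    counts = [0] * len(HOURS)  # how often each action was tried
    for _ in range(n_steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(HOURS))                  # explore
        else:
            a = max(range(len(HOURS)), key=q.__getitem__)  # exploit
        r = reward_fn(HOURS[a])  # the only place the objective is used
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]  # incremental mean update
    return HOURS[max(range(len(HOURS)), key=q.__getitem__)]


if __name__ == "__main__":
    # Same learner, two objectives, two different learned behaviors.
    print(train_bandit(baseline_objective))       # 5
    print(train_bandit(effort_averse_objective))  # 2
```

Because the rewards here are deterministic, the bandit's estimate for each action is exact after one try, so it recovers the same answer as brute-force enumeration; the research setting is harder because many agents learn at once and each one's best action depends on the others.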
In your research, you showed that if you optimize government policy using reinforcement learning, you can go away from the traditional objectives that economists have been using, right?
Yes. The traditional view in economics, or of government policy, is that the government maximizes a sum of everyone's happiness, everyone's utility. The government supposedly can add up everyone's happiness: every person has a number, and the higher the number, the happier they are. The government somehow adds up all these numbers and then finds the policy that maximizes the sum.
But that's unrealistic for many reasons, isn't it?
Indeed. In our own research, we wondered: what if we just try to optimize directly for things like economic growth or economic equality? That's the main idea behind our paper “The AI Economist”, which got a lot of attention, including from the mainstream press and economists. We showed that this way of optimizing policies can lead to very different outcomes.
I'm happy with this work for many reasons. One of the interesting features of reinforcement learning is that you can ask it to optimize a policy for what people actually care about, not some fictitious objective like total utility, which is immeasurable. We can't measure total utility, but we can measure economic equality and economic productivity, and optimize for them directly. That's also what people talk about, right?
Can you briefly summarize the outcomes of your policy optimization? What kind of tax policies did your algorithms discover?
Yes. The AI tax policies in this work are not like the textbook policies taught to students, or classic economic tax policies. What we do is build a simulated economy that is complex enough that you can't use economic theory to answer questions like: what is the optimal tax policy in this world? The AI finds that a sort of “up and down” tax schedule is optimal, or at least does better than the baselines.
What do you mean by up and down exactly?
Classic economics usually predicts a progressive system, where the more money you make, the higher the rate at which each additional bucket of income is taxed. For instance, if you make $10,000, the tax schedule might say that on the first $2,000 you give 10% to the government, on the next $2,000 you give 20%, and so on. But in this case, because the simulation is more complicated, there's an up-and-down schedule: at the lower end of your income, you sometimes pay higher taxes; then there's a valley in the middle, a part of your income where you pay very low taxes; and at the very high end, you start to pay high tax rates again. That's a bit counterintuitive.
But wasn't it kind of baked into your loss functions for your reinforcement learning? Or was it really surprising to you that you got that outcome?
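The bucket arithmetic in that answer can be written as a single marginal-rate formula, where T(z) is the tax on income z, b_k are the bracket boundaries (with b_0 = 0), and tau_k is the rate on the slice of income between b_{k-1} and b_k. Only the two buckets actually named in the example are plugged in, because the remaining brackets (the "and so on") are not specified:

```latex
T(z) = \sum_{k=1}^{K} \tau_k \,\max\bigl(0,\; \min(z, b_k) - b_{k-1}\bigr), \qquad b_0 = 0

\underbrace{0.10 \cdot (2000 - 0)}_{\text{first bucket}} \;+\; \underbrace{0.20 \cdot (4000 - 2000)}_{\text{second bucket}} \;=\; 200 + 400 \;=\; 600
```

So someone earning $10,000 pays $600 on their first $4,000, and the unspecified higher brackets determine the rest. A progressive schedule has tau_1 <= tau_2 <= ... <= tau_K; the up-and-down schedule described here breaks that ordering, with a high rate on the lowest bucket, a low-rate valley in the middle, and high rates again at the top.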
We didn't predict that this would be the result — this up-and-down tax schedule. And I want to be careful, I'm not saying that we should be implementing this in the real world all of a sudden.
But still, the connection with the objective function is very interesting. It has to do with the fact that we're optimizing for a combination of equality and productivity. In the simulation, the government gives the tax money back to the people, so there's a balance that this tax policy manages to strike between two forces. If the government lowers the tax rate, working becomes more attractive, because people pay less in taxes. But at the same time, if tax rates are higher, the government can raise more revenue and has more money to redistribute.
One way to interpret this up-and-down schedule is that it balances these two forces: at the low end, you can actually raise taxes, because you lower the tax rates in a slightly higher bucket. The reinforcement learning agents start to realize that if they "just work a little bit more", they can benefit hugely from the low tax rates. It becomes more attractive, so to speak, to work a bit more and end up with your income in this valley, to really benefit from the low rates there.
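The two forces can be seen in a toy model. This is a minimal sketch under loudly hypothetical assumptions: a single flat tax rate instead of brackets, four agents with made-up fixed skills who each choose labor to maximize (1 - t) * skill * labor - labor**2 / 2 (so their best response is labor = (1 - t) * skill), and all revenue redistributed equally as a lump sum. Equality is 1 minus a Gini coefficient normalized to run from 0 to 1, and the objective is equality times productivity; that is modeled on the AI Economist paper's social-welfare metric, but this is a simplified reconstruction, not its code.

```python
from typing import List, Tuple

SKILLS: List[float] = [1.0, 2.0, 3.0, 4.0]  # hypothetical agent skills


def gini(incomes: List[float]) -> float:
    """Gini coefficient via the mean absolute difference over all ordered pairs."""
    n, total = len(incomes), sum(incomes)
    if total == 0:
        return 0.0
    diff = sum(abs(x - y) for x in incomes for y in incomes)
    return diff / (2 * n * total)


def equality(incomes: List[float]) -> float:
    """1 = perfectly equal; 0 = one agent holds everything (Gini scaled by n / (n - 1))."""
    n = len(incomes)
    return 1.0 - n / (n - 1) * gini(incomes)


def simulate(rate: float) -> Tuple[float, float]:
    """Return (productivity, equality) for a flat tax rate with equal lump-sum rebates."""
    # Each agent's best response to the tax: labor = (1 - rate) * skill.
    pre_tax = [skill * (1.0 - rate) * skill for skill in SKILLS]
    revenue = rate * sum(pre_tax)
    rebate = revenue / len(SKILLS)  # redistributed equally
    post_tax = [(1.0 - rate) * z + rebate for z in pre_tax]
    productivity = sum(pre_tax)              # force 1: higher taxes -> less work
    return productivity, equality(post_tax)  # force 2: higher taxes -> more equal


def best_flat_rate(steps: int = 100) -> float:
    """Grid search over rates 0, 1/steps, ..., 1 for the highest equality * productivity."""
    rates = [i / steps for i in range(steps + 1)]

    def welfare(r: float) -> float:
        prod, eq = simulate(r)
        return eq * prod

    return max(rates, key=welfare)


if __name__ == "__main__":
    print(best_flat_rate())  # 0.1
```

With these made-up skills, a 0% rate maximizes productivity but leaves post-tax incomes unequal, while a 100% rate makes incomes perfectly equal because nobody works; the product peaks in between, at 10%. A richer simulation with brackets and learning agents no longer has a closed-form answer, which is where schedules like the up-and-down one can emerge.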
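The "work a little more" effect can be reproduced with a toy best-response calculation. Everything here is hypothetical: a wage of 1,000 per hour, an effort cost of 100 * hours**2, a progressive schedule that extends the 10%/20% example with made-up upper brackets, and a made-up up-and-down schedule whose valley lies between 2,000 and 6,000. These are not the rates the AI Economist found; the sketch only shows the mechanism described above.

```python
import math
from typing import List, Tuple

# (upper bound of bracket, marginal rate); all numbers are hypothetical.
Schedule = List[Tuple[float, float]]
PROGRESSIVE: Schedule = [(2000, 0.10), (4000, 0.20), (6000, 0.30), (8000, 0.40), (math.inf, 0.50)]
UP_AND_DOWN: Schedule = [(2000, 0.30), (6000, 0.05), (math.inf, 0.45)]
WAGE = 1000.0
EFFORT_COST = 100.0


def tax_owed(income: float, schedule: Schedule) -> float:
    """Apply each marginal rate only to the slice of income inside its bracket."""
    tax, lower = 0.0, 0.0
    for upper, rate in schedule:
        if income <= lower:
            break
        tax += (min(income, upper) - lower) * rate
        lower = upper
    return tax


def utility(hours: int, schedule: Schedule) -> float:
    """Post-tax income minus a convex effort cost."""
    income = WAGE * hours
    return income - tax_owed(income, schedule) - EFFORT_COST * hours ** 2


def best_hours(schedule: Schedule, max_hours: int = 10) -> int:
    """The agent's best response: the number of hours with the highest utility."""
    return max(range(max_hours + 1), key=lambda h: utility(h, schedule))


if __name__ == "__main__":
    for name, sched in [("progressive", PROGRESSIVE), ("up-and-down", UP_AND_DOWN)]:
        h = best_hours(sched)
        print(name, h, WAGE * h, tax_owed(WAGE * h, sched))
```

With these numbers, the agent works 4 hours (income 4,000, tax 600) under the progressive schedule but 5 hours (income 5,000, inside the valley, tax 750) under the up-and-down one: the steep first bracket raises more revenue from the bottom, while the cheap middle bracket makes the extra hour worth working.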
Now these AI policies are hitting the real world. What were some of the first reactions and what is your assessment of their success in the real world? Do you think AI policies should be implemented in the real world? How does the confrontation with real economic policymaking go?
That's a huge topic, and there's a lot involved with it. We certainly shouldn't deploy the AI policies we've come up with so far directly in the real world, for many reasons. Economists have had a mixed response; it's a sort of Rorschach test, I'd say. Some economists look at it from a more computational point of view and say that modeling both the government's policy and the adaptation of the agents in the economy with reinforcement learning is very novel, and something classic economic theory can't do. It's clear that we're solving a very hard problem. But then, what does the solution we found mean? How do we interpret it?
That's hard because most economists, when they do economics, think like physicists. They want to understand how the world works, so they want causality. The Holy Grail is to have some sort of causal model: if I do this, then this happens in the economy. Or: this outcome, say economic growth, can be explained by causal factors such as population growth or unemployment rates.
Then many will push back on the simulation. They will say this economy is maybe too simple, or that you didn't include population growth in the simulation. That can lead to a lack of realism, or too much realism. So the question is: are the policies you come up with relevant to the real world? A mismatch between the simulation and reality is the sim-to-real transfer problem, which is still unsolved. Can this be used in the real world? Under what conditions is a policy robust, so that it would still work if the world is not like the simulation? That's very much tied to the realism and complexity of the simulation.
What kind of scale do these simulations operate on? And how many agents are you modeling in work like this?
In this first paper, we went up to 10 economic agents, so it's a very small scale in some sense. But the paper focused on the methodology: the government uses reinforcement learning, and all the economic agents use reinforcement learning as well. Multi-agent reinforcement learning is a very hard problem, so we had to use a lot of tricks to make the machine learning work.
We have follow-up work where we've made the simulation bigger. With some interns, we added companies and had somewhere between 100 and 1,000 economic agents in the simulation. We also developed open-source simulation software called WarpDrive, which lets you run these economic simulations with thousands of agents on a GPU.
Does the economic modeling outcome change a lot with a bigger and richer toolkit?
Yes. The level of realism goes up because the diversity among agents in the simulation increases. My intern Michael Curry found that the AI policies outperform classic policies when there are more economic agents. The policy won't look exactly the same, but if you look at the numerical objective, like equality times productivity or total utility, you will beat the economic baselines.
From an economic modeling point of view, these models can be more realistic, and you don't need massive sizes for that. Even in the 10-agent version, there's a notion of skill differentiation.
This is one of our most recent ongoing projects, and it's open to the public: it's called AI for Global Climate Cooperation. We're asking how countries can work together more. What are the right incentives and mutual agreements that need to exist to incentivize countries to work together on mitigating climate change? We made a simulation of the real world, where we model the economy but also the global climate. We have multiple countries, each with its own internal economy, that can also trade and talk with each other.
When it comes to climate change, the scientific doubt is gone: we need to find ways to mitigate the rise in global temperature. What has been unclear in climate change research and policymaking is why countries would coordinate their efforts and stick with their agreements.
Climate policy and climate negotiations in the real world are decentralized. There's no central authority, like a police officer, telling countries to talk to each other, send money to other countries, and invest in green energy, with punishments for those that don't. That's not how the real world works.
It's a negotiation game with lots of complexity. It's hard for any sort of agreement to arise, and it's even harder for countries to stick with their agreements due to domestic and global politics. The IPCC's climate report from last year started to look into these questions, asking what happens if certain countries follow through with certain climate policies and how that affects climate simulations.
There's a whole question around what kind of policies countries have and their impact on the world. It's been shown that the impact on global temperatures can range from a 1.5 degree Celsius increase in 50 or 100 years all the way to 5 degrees Celsius, so it's the difference between life being OK and life being hell, essentially. It's clear that global cooperation is crucial, and that it has to be made attractive by providing the right incentives for countries to execute certain climate policies.
In this project, we're asking whether you can use simulations to study the effect of certain negotiations. Can you study which agreements countries should make, use simulations to show that the incentives are right, and then, eventually, work out what that means for the climate? That's what the project hopes to contribute.
Around the same time as your work, people from other research groups, like Meta and DeepMind, started working on multi-agent systems that play negotiation games such as Diplomacy; Meta's Cicero is one example. The agents used language models to negotiate, being persuasive or hiding certain facts from the other players. Now, with GPT-4 becoming more capable at generating language, reasoning, and long-term planning, how do you see the parallels between your work on climate negotiation and the work on Diplomacy? How do you see this evolving as language models become more powerful?
On Diplomacy, that's amazing work. Our project asks the inverse game theory question. In Diplomacy and Cicero, the researchers were given the rules and all the pieces, and used language models and reinforcement learning to find a good strategy. We're asking the opposite: what are the right rules, and what types of negotiation lead to more cooperation?
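The inverse question can be illustrated with a toy game. This is a minimal sketch, not the project's climate-economy simulation: two hypothetical countries each decide whether to mitigate, with made-up payoffs that form a prisoner's dilemma, and the "rule" being designed is a single penalty charged to any country that does not mitigate, standing in for the agreements and incentives the project studies. Instead of solving for play under fixed rules, the search looks for the mildest rule under which mutual mitigation is the only pure Nash equilibrium.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical two-country climate game. Action 1 = mitigate, 0 = don't.
BENEFIT = 3.0  # benefit each country gets per mitigating country (assumed)
COST = 5.0     # private cost of mitigating (assumed); BENEFIT < COST < 2 * BENEFIT
ACTIONS = (0, 1)


def payoff(own: int, other: int, penalty: float) -> float:
    """A country's payoff; penalty is the designed rule, charged to non-mitigators."""
    return BENEFIT * (own + other) - COST * own - penalty * (1 - own)


def pure_nash_equilibria(penalty: float) -> List[Tuple[int, int]]:
    """All action profiles where neither country gains by deviating on its own."""
    eqs = []
    for a, b in product(ACTIONS, ACTIONS):
        a_ok = all(payoff(a, b, penalty) >= payoff(x, b, penalty) for x in ACTIONS)
        b_ok = all(payoff(b, a, penalty) >= payoff(y, a, penalty) for y in ACTIONS)
        if a_ok and b_ok:
            eqs.append((a, b))
    return eqs


def smallest_penalty_for_cooperation(candidates: List[float]) -> float:
    """Inverse question: the mildest rule whose only equilibrium is mutual mitigation."""
    for p in sorted(candidates):
        if pure_nash_equilibria(p) == [(1, 1)]:
            return p
    raise ValueError("no candidate rule makes cooperation the unique equilibrium")


if __name__ == "__main__":
    print(pure_nash_equilibria(0.0))                                     # [(0, 0)]
    print(smallest_penalty_for_cooperation([k * 0.5 for k in range(11)]))  # 2.5
```

Without a penalty, the only equilibrium is that nobody mitigates, even though both countries would be better off mitigating. With these made-up numbers, the search returns 2.5: the penalty has to exceed the gap between a country's private cost of mitigating (5) and the benefit it gets from its own mitigation (3). The real negotiation game has far more countries, actions, and dynamics, which is why the project turns to simulation rather than hand analysis.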
In terms of language models, it's fascinating to see how language can inform strategy and help us negotiate. Language models can be useful in different parts of the problem.
Firstly, when you think about negotiations, there is a lot of text out there about historical outcomes, contracts, results, and the behavior of countries after agreements were reached. Language models can read all of that and then suggest other types of negotiations, or things to say during a negotiation, or come up with contracts. Because the language models have read all that historical text, this could provide a strategic planning element.
It can also be useful for training agents for this climate negotiation game. For instance, if a country makes its own Cicero bot, it can help that country negotiate in upcoming climate negotiations. The bot has seen all the texts and can provide insights on what has been successful and what has not. These large language models can analyze the corpus of texts more quickly and thoroughly than humans could manage.
Do you see a future in this line of work where AI models, under the supervision of humans, are able to act as strategy advisors in complex multi-agent games? One of the main concerns is still hallucination — the fact that they can dream up facts. If you're talking about climate negotiations and optimal strategies, you would want to make sure that whatever the language model is saying is based on real facts and that its logical inferences are correct.
When you want to use it as a strategic advisor, you have to deal with the hallucination problem, and that's something we need to do more research on. Additionally, ideas are cheap and execution is expensive: it's all about the little details. For instance, the AI bot might suggest negotiating harder on mitigation or on cutting fossil fuels, but it's not clear that you can always get that level of detail from the model reliably. Are the suggested ranges for fossil fuel cuts actually optimal? Or are they just something the model came up with?
Wrapping up, what are the next steps for this AI for global climate cooperation? Are you guys organizing some workshops or some next steps?
On April 26th, we're going to have a workshop that reviews the first phase. In the first phase, we had a competition with multiple teams working on this problem; they submitted their solutions and wrote up explanations of what they found. On April 26th, we're going to have a little get-together with talks from some of the teams and a retrospective of what we've learned.
We're still brainstorming about the next phase a little bit, but there's definitely a big appetite to take this further. We've had a lot of communication with various experts (people in climate science, economics, and political science) who have looked at the simulation and given us their take. At a high level, the next step is to bring together a group of people who are all really interested in one specific framing of what we could do with this climate-economic framework and AI, and then to work together on that.
Alright, that was it for today, thanks for your time Stephan.
Thank you for having me!
The AI Economist (paper): https://arxiv.org/abs/2108.02755
WarpDrive framework: https://github.com/salesforce/warp-drive