AI research is truly global. We recently had the opportunity to interview one of the rising stars of the Indian AI scene, Ramit Sawhney, now the Lead AI Scientist at ShareChat. Partially an autodidact, he has set up his own research collaborations with IIIT in Delhi, Georgia Tech in the US, and University of Marburg in Germany. These collaborations have resulted in a wealth of papers at top tier conferences. In his previous job, Ramit was a Technical Lead at Tower Research Capital where he worked on AI for Trading Strategies. Interview by Jakub Zavrel, September 2021.
Hi Ramit, it's a pleasure to have the opportunity to have this interview with you. We have quite a lot of readers who are doing research or engineering work in NLP and finance. Your work is fascinating because you sketch a path for AI systems to learn an end-to-end system that basically reads news, financial reports, social media, and actually generates profitable trading strategies (Sawhney et al., 2021).
Figure 1. Quantitative Day Trading from Natural Language using Reinforcement Learning, and related work, visualized on Zeta Alpha.
First of all, can you tell us a little bit about your background, and what was your journey to get into NLP and AI?
I finished my undergraduate degree back in New Delhi, India about three years ago. In India, we have to do something called an undergraduate thesis, which is pretty much six months of work, where we would hope to solve more of an open problem. I picked the topic of suicide ideation on social media, because I wanted to tackle a problem that has a real world impact, as opposed to something that's more theoretical or mathematically centered.
And that was also an easy way, starting with an application, for a beginner like me. I come from a background where we've not had formal courses, or a Master's education in AI or machine learning. It's pretty much all been self taught. A good propellant, in terms of how I've learned, has been the extremely inclusive NLP community that we have. The first paper that I had back in 2018, I submitted to the ACL student research workshop. And attending the student research workshop back then was the gateway for me to understand how inclusive the community is, how willing everyone is to help each other out, to review work, to give feedback and to learn much more. So I saw the kind of work other people did at ACL, and that it's not limited to a specific domain, they're welcoming a lot of different domains, a lot of different ideas and thoughts, and different people from different backgrounds. And that's where I felt that this is the community I'd like to be involved with.
Since 2018, I have been doing a full time job that deals with the financial aspects of algorithmic trading, which is how I started my journey from NLP all the way to using NLP for finance. And over these three years, I feel that although we've made a lot of progress in NLP, as a field, and we have made a lot of progress in finance, there's an underexplored section or intersection of AI and finance, specifically NLP, which seems to be something that's booming. So more recently, we've been fortunate to have papers at pretty much all of the big NLP conferences, ACL, NAACL, EMNLP, and the most common feedback I get from people is we've not seen something like this, it seems quite different. And that's been the motivation for me: I want to set an example that we can come from a background where we may not have formal education, we can come from a background where we do a certain kind of application or a domain of work.
The reason why these papers seem to be interesting and seem to have got good reviews or traction, is I think, the fact that they bring a very new and distinct perspective to how the NLP community works at the moment.
Part of it is also, and this is one of the important things I've learned, that there's a lot of power in teamwork. Although I have many papers, the people that I've worked with on each of these papers are from different groups. So often, you know, people can complement each other's strengths. In certain cases, people are able to help with more aspects. I've had great people who I've worked with, and their support and their skills, complementary to what I bring to the table, are what's helped us handle multiple papers, despite, you know, strict deadlines and all kinds of barriers.
So if you look at the current state of AI and NLP, what kind of topics are you most interested in these days?
There's definitely a part of me that's very interested in the kind of work that has measurable real world impact. Finance is a good example. We can very easily measure the kind of impact that we're making, we can see it in terms of how profitable our strategies are. Another line of work that's very interesting to me is studying mental health on social media. I think we've come an extremely long way from what was being done three years back to what's being done at the moment. In fact, with things like COVID, with more and more people using social media, filtering even to smaller cities and more remote areas, there's a widespread usage of social media. And definitely, although it is a great platform, there are often aspects like suicide ideation, hate speech, and misinformation, which can have significant adverse real world impacts on all of these different communities that exist online.
From a more technical perspective, I'm definitely interested in multimodal aspects. One of the papers we had in NAACL was focusing not just on the textual aspects, but also combined speech and audio processing. I think the world is definitely multimodal, when we talk about emotion, we talk about sentiment, we talk about any of these problems, although we'll try to isolate the modalities, often they exist together. All of these problems deal with multiple modalities, multiple tasks going hand in hand together. And I feel one of the most interesting fields to me at the moment is studying all of these tasks from a more multi-modal perspective, what is the best way to generate not only individual representations, but what are the best ways to combine these representations for solving specific problems.
Let's zoom in a little bit more on your finance and trading paper. Can you please explain the core idea of the paper?
So the core idea is rather simple. I'd say that financial models date back to the 1900s. Finance and stock trading have been going on for a long time. So we have a lot of models that work extremely well. Most of these strategies, models like simple linear regression, fancier versions of regression models, more quantitative models, have been doing really well. So the first question is: are there any cases where they didn't do really well? The interesting thing is, as we see social media grow, we see that a lot of people are now using the internet and using social media as a platform to talk about finance and stocks. A recent good example is the short squeeze that happened with GameStop. Conventional financial models are not aware of what's being discussed on Reddit, for example. Another example in the paper is this tweet from Elon Musk a while ago, where he said something like: the Tesla stock price is too high. Now this is just a tiny tweet. From a conventional financial model perspective, it means nothing. But such tweets hold immense power. Social media is extremely accessible, it influences a lot of people. So the impact is huge. It influences a huge number of people who are now led to act on this tweet. Talking in quantitative terms, this one tweet that is literally six words that Elon Musk posted led to a stock price reduction of more than 20% in a few minutes, and the impact over a week was a loss of around 14 billion in terms of market valuation for Tesla, right? So this is exactly what motivates our work. We wondered whether, beyond historical financial metrics, there are other sources of data that perhaps the market is reacting to. So this motivated us to look beyond just numbers. We thought, why don't we try to build a stock trading model that also reads tweets, so that we can quickly start processing tweets, trying to get a sense of what the market feels towards these kinds of stocks?
And based on that make a predictive trading strategy. Is the price of this stock going to increase or decrease? And then try to take the most profitable decision.
So how do you deal with the variable and large amount of text on social media in your model? And how do you model influence versus content?
There are two parts to this. So the first is a feature design kind of perspective. So when we look at tweet based features, we don't just look at the content of the tweet. Rather, we augment this with signals such as the number of tweets, retweets, comments, likes, and the number of followers of the person who's posted the tweet. Adding these kinds of features is an explicit way of adding some kind of bias to the model, where it would most likely be prompted to give more importance to the kinds of users which may have a greater influence.
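As a rough illustration of the feature design described above, the minimal Python sketch below concatenates a content embedding with engagement counts. The log scaling, the function names, and the sample values are assumptions made for illustration; the interview does not say how the paper normalizes these counts.

```python
import math


def engagement_features(retweets, comments, likes, followers):
    """Log-scale raw counts so very large accounts do not swamp the rest.

    The log1p scaling is an assumption for this sketch, not a detail
    taken from the paper.
    """
    return [math.log1p(x) for x in (retweets, comments, likes, followers)]


def tweet_feature_vector(text_embedding, retweets, comments, likes, followers):
    """Concatenate a content embedding with the engagement features."""
    return list(text_embedding) + engagement_features(
        retweets, comments, likes, followers
    )


# Hypothetical 3-dimensional content embedding and made-up counts.
vec = tweet_feature_vector(
    [0.12, -0.40, 0.93],
    retweets=1200,
    comments=85,
    likes=15000,
    followers=2_000_000,
)
print(len(vec))
```

A downstream model then sees both what was said and how far it is likely to travel, which is one simple way to give influential accounts more weight.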
Looking at content features, we compare simple models like GloVe, or even n-gram based approaches, and see how well BERT compares to these, and the difference is more than 40 to 50% in terms of profit. So BERT does a better job than some of the other language models in understanding social media. For example, people would often talk in more slang, people would often use short forms, people would often have typos in what they write on social media, like Twitter, as opposed to maybe news articles that more conventional models like word2vec have been trained on. And then the second aspect is that as BERT is a more contextual model, it's often able to understand financial nuance. So for example, when we talk in layman terms, the word ‘bull’ often refers to an animal, which is what more conventional models like GloVe and word2vec make of it, but when we think of bull from a financial perspective, we will say something like: “the Apple stock has been on a bull run”. In a financial context, this means that the Apple stock price is increasing, and we expect it to increase for the foreseeable future. So models like BERT that have this inbuilt contextual understanding in their training are able to pick this up and definitely have a much bigger impact in terms of how well the model learns about the actual financial nuances of predicting what might be the next best profitable decision to make.
And the good thing with financial data is we don't need to look for human annotation or manual labels. If we look at all the tweets that have been posted yesterday for a specific stock, I can go to the stock market where I have public information that the stock price increased or decreased. So in essence, you already have some kind of indicator that these tweets are representative of a stock price increase or decrease. At this point, you can model this like a regression or a classification task, where given a set of tweets, you need to predict that after all of these tweets were posted, the stock price increases or decreases. This binary classification task, which is something that has been done by a lot of the existing works that we built upon, we use this classification task as the fine tuning task for our BERT model before we use the fine tuned embeddings for our reinforcement learning model. So the [CLS] representations in a way get fine tuned, and we use these [CLS] token representations for the reinforcement learning model.
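The labelling idea above, where public price movements stand in for human annotation, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the rule used here (label a day's tweets 1 if the next close is higher than that day's close), the function names, and the prices are all hypothetical, and the paper may define the label window differently.

```python
def label_days(closing_prices):
    """Map each date to 1 if the next close is higher, else 0.

    Dates are ISO strings, so sorting them lexicographically also sorts
    them chronologically. The last date gets no label because there is
    no following close to compare against.
    """
    dates = sorted(closing_prices)
    labels = {}
    for today, tomorrow in zip(dates, dates[1:]):
        labels[today] = 1 if closing_prices[tomorrow] > closing_prices[today] else 0
    return labels


def build_examples(tweets_by_date, closing_prices):
    """Pair each day's tweets with the price-movement label for that day."""
    labels = label_days(closing_prices)
    return [
        (tweets, labels[date])
        for date, tweets in sorted(tweets_by_date.items())
        if date in labels
    ]


# Made-up prices and placeholder tweets, purely for illustration.
closing_prices = {"2021-03-01": 100.0, "2021-03-02": 103.5, "2021-03-03": 101.0}
tweets_by_date = {
    "2021-03-01": ["tweet one", "tweet two"],
    "2021-03-02": ["tweet three"],
    "2021-03-03": ["tweet four"],
}
examples = build_examples(tweets_by_date, closing_prices)
print(examples)
```

Pairs like these could then serve as the binary classification data for fine-tuning a text encoder before its representations are passed on to a trading model.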
It's a fascinating idea, to use this directly for trading. What people want to know, of course, is does it actually work to make money in trading?
Yes. We've seen results that have been even better in terms of profitability than what we got in the paper. Part of the reason is that more recently, in 2021, social media has been a big driver of a lot of stocks. It started with things like GameStop, and now we've reached a point where most of the cryptocurrencies are being discussed on social media. Now, a drawback and limitation of not just our model, but NLP for trading in general, which limits the usage of such models in an industrial setting, is the fact that in industrial settings, we often look at more low latency trading or high frequency trading, where trading often happens at, you know, hundreds of nanoseconds or microseconds. But as you would imagine, even reinforcement learning models are generally deep neural networks. The inference time is something that, you know, sits at around a few seconds or milliseconds. So there's definitely a time gap, which I think we're closing with hardware, as well as with newer, more compact architectures. I think we'll reach a point in the future where inference and retraining times of such models would reduce, but at present, it's not yet competitive with the ultra low latency models that are more commonly used in the industry.
Do you see a kind of arms race developing between financial institutions and investors towards fully AI based trading? And do you think that human judgments still add something special?
It's a combination of both. I think we're at the point that AI based strategies alone might be able to beat what a human's judgment would be. But we should be making AI more interpretable, where it can start complementing human decisions, because on average, the model does well, but there might be certain decisions that could be extremely risky. So the overall stance I would go with is, yes perhaps in the future, we're making progress towards it, but at the end of the day, AI without having a human component in it is something that might be a very risky strategy.
In some of your other papers and work you focus more on social media analysis in general. What role do you see for NLP and AI to automatically monitor and filter and interact with people on social media?
Here, my stance is slightly stronger. Even in the future, these should not be systems that are just operating in isolation. For systems such as suicide ideation detection or hate speech moderation, you know, we could have mechanisms like perhaps flagging the tweet, perhaps, you know, sending it to some kind of moderator, or maybe shortlisting them, or creating some kind of priority based assortment, where at a future point of time, these would go through a human moderator. Specific to suicide ideation detection, we've worked with a lot of people in clinical psychology who are experts in this kind of domain. And given the sensitive nature of such kinds of tasks, it becomes important that we have some kind of human intervention.
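One way to picture the flag-and-prioritize mechanism mentioned above is a priority queue that hands the highest-risk posts to a human moderator first. The sketch below is only an illustration: the risk scores, the 0.5 threshold, and the function names are hypothetical and do not describe any deployed system.

```python
import heapq


def build_review_queue(scored_posts, flag_threshold=0.5):
    """Keep posts scoring at or above the threshold, highest risk first.

    heapq is a min-heap, so scores are negated to pop the largest first.
    """
    heap = [
        (-score, post_id)
        for post_id, score in scored_posts
        if score >= flag_threshold
    ]
    heapq.heapify(heap)
    return heap


def next_for_review(heap):
    """Pop the post a human moderator should look at next."""
    neg_score, post_id = heapq.heappop(heap)
    return post_id, -neg_score


# Hypothetical model scores for three posts.
scored = [("post-a", 0.20), ("post-b", 0.95), ("post-c", 0.70)]
queue = build_review_queue(scored)
first = next_for_review(queue)
print(first)
```

The model only orders the work here; the decision on each flagged post stays with the moderator.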
The goal here is for AI to not replace humans, but to make their life easier and to assist them. A very recent example of this is a chatbot for mental health that was created leveraging GPT. And when GPT was prompted by a user that asked if they should kill themselves, the language model suggested that killing themselves would be a good idea. Now, the problem with such models is we don't have complete insights into how they operate. Interpretability and explainability are still a work in progress. So it's kind of difficult to know at what point these models can go rogue. You know, even having something like 99% accuracy for suicide ideation detection might sound great. But if the model mispredicts even one case, that's possibly one life in danger. So at that point, having these models running completely automatically is risky. Whereas having that one layer of human intervention for some of the more extreme cases would be a good way to prevent such cases.
Another concern, and this is something that I've been working on with Professor Lucie Flek at the University of Marburg, is privacy. So commonly, what I've learned over these three years is the more data you feed to these models, the better they become in terms of understanding. But then one drawback here is, the more data you're feeding, the more privacy you're taking away from a user. At this point, what happens is not only are we directly infringing a user's privacy, but we're also allowing language models to expose this to other people. Recently, I think a study from Google, if I'm not wrong, showed that if you probe a language model like GPT with someone's name and address, the language model was able to retrieve their social security number and it was able to retrieve their house address. At this point, these models are not just infringing privacy, but are also becoming dangerous.
There was a tool back in 2016 on Twitter and the mission behind it was great. It was a tool to keep track of the mental health and just make sure the well being of the people you love is going fine and just kind of monitoring them. But what started happening was these deep learning models started collecting more and more information about those specific users. And at a later point, adversaries on social media started using it to identify people who could be bullied or to target these specific people who might not be in their best mental health. And that ultimately led to the downfall of such models and Twitter prohibited them. I'm seeing there's a recent surge in people working on ethics and privacy, and most conferences, like ACL, EMNLP, and NAACL, all have separate ethics sections in their reviewing forms, and they have separate ethics committees. The goal of this is to make sure that we're not only just chasing the best performing models, but we're also analyzing things from the perspective of ethics, fairness, bias, and these kinds of aspects.
Do you think that these kinds of safety mechanisms can be built into AI systems like for example the European Union is now trying to push for? What do you think of the technical approach of building such safety measures?
I think it's a good step. Differential privacy is one of them; that seems to be the most recent thing that's working, where, in short, we make a trade-off between how much performance we want and getting a formal guarantee on how much privacy is infringed by the kind of data we use. So we are making advances, but again, I don't think we're at the point where AI alone can do this. In some cases, we might need humans for more sensitive applications. In other cases, like maybe fake news, the negative impact might not be as harmful as mispredicting suicide ideation. In such cases, it's often good to combine AI based systems with more conventional rule based systems. I think the more common word here is an expert system, which combines the intelligence that AI brings with heuristics and rules that have been defined by human experts. Combining these is a good way of making sure that we're not just allowing AI to completely go rogue and potentially make incorrect decisions or judgments.
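The privacy versus performance trade-off in differential privacy can be made concrete with the classic Laplace mechanism on a counting query. The sketch below is a textbook illustration, not the method used in any of the papers discussed here; the function names and the count of 1000 are made up. A smaller epsilon means a stronger privacy guarantee and noisier answers.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with epsilon-differential privacy.

    A counting query changes by at most 1 when one person is added or
    removed, so its sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(epsilon, rng, true_count=1000, trials=2000):
    """Average absolute error of the private count over many releases."""
    errors = [
        abs(private_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    ]
    return sum(errors) / len(errors)


rng = random.Random(0)
for epsilon in (0.1, 1.0, 10.0):
    print(epsilon, mean_abs_error(epsilon, rng))
```

Running the loop shows the average error shrinking as epsilon grows, which is the trade-off in miniature: more accuracy is bought with a weaker privacy guarantee.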
One last question that we have to ask: What do you do to keep up-to-date with all the progress in your own field and to keep track of what's happening in AI more in general?
There's an exponential rise in terms of everything that's coming out. And all of these things are filled with valuable insights. I use a combination of two things. So the first is just playing on social media. There are these bots on social media that keep posting arXiv links to some of the best new papers. But more recently, tools like the Zeta Alpha platform and Semantic Scholar have made more progress in terms of searching, aggregating information, and summarizing. It's often difficult to read 10 papers, or even five papers in a week. But it's very easy to read a one or two line summary, or maybe just the abstracts, or to search by specific keywords and filter these based on the kind of content or how related they are. If you type the name of a paper, you discover several other papers that are very recent and related to it. And more recently, this is something that all conferences are doing now as well: when we try to explore papers, it's no longer just a list of their titles with the author list. Rather, people are representing papers and all of these things more as Knowledge Graphs, and traversing these through interactive tools like Zeta Alpha and Semantic Scholar. These have been the best way for me to keep in touch with the research that's most relevant to what I do, rather than just looking at everything that's new.