From emailing Jupyter Notebooks to Comet ML - the entrepreneurial journey of Gideon Mendels
Meet Gideon Mendels, CEO and co-founder of Comet ML, an MLOps vendor that has built a popular platform for experiment management, model management, and model monitoring.
Hi Gideon, can you tell us about how you got started on Comet ML, and how you define what the company is doing right now?
I'm originally from Israel. I started my career as a software engineer about 16 or 17 years ago and switched to ML about nine years ago, when I went to Columbia University in New York as a grad student, doing research in speech recognition, low-resource language models, and motion detection.
After that, Nimrod, who is also my co-founder at Comet ML, and I started another company where we built a large-scale chat analytics app. We processed over a billion chats, mostly from applications like WhatsApp, Viber, and GroupMe, and deployed over 50 ML models in production, mostly for document classification tasks. That app really blew up from a user perspective, but it didn't make much sense in terms of monetization. So after that, I joined Google to work on Deep Learning research, specifically on hate speech detection in YouTube comments.
When I joined Google, I worked with a team on hate speech detection, and the team already had a model in production. One of my tasks right after joining was to try to build a better model and beat the production baseline. The first thing you typically ask when you're trying to beat a model is: what is the model that I'm trying to beat?
To my surprise, I had a very hard time answering that question. There was no system of record of the experiments they had done before. There were some slides and some emails, of course, but it was very, very hard to figure out exactly what that model was, and what data it was trained on, which parameters, and so on.
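Editor's illustration: the kind of "system of record" described here can be sketched in a few lines. This is a hypothetical, minimal example using only the Python standard library; it is not Comet's API and not anything used at Google. The function names (`fingerprint`, `log_experiment`, `best_run`) and the JSON-lines file layout are assumptions made for illustration. Each run stores the model name, its parameters, a hash of the training data, and its metrics, so that "which model am I trying to beat?" can be answered from the log.

```python
import hashlib
import json
import time
from pathlib import Path


def fingerprint(data: bytes) -> str:
    """Return a short, stable hash identifying the exact training data."""
    return hashlib.sha256(data).hexdigest()[:12]


def log_experiment(log_path, model_name, params, train_data, metrics):
    """Append one experiment record as a JSON line and return it."""
    record = {
        "timestamp": time.time(),
        "model": model_name,
        "params": params,
        "data_fingerprint": fingerprint(train_data),
        "metrics": metrics,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def best_run(log_path, metric):
    """Return the logged run with the highest value of the given metric."""
    with Path(log_path).open(encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]
    return max(runs, key=lambda r: r["metrics"][metric])
```

Calling `log_experiment(...)` after every training run and `best_run("experiments.jsonl", "f1")` before starting a new one would recover the current baseline, its parameters, and the data it was trained on. The file name and metric name here are placeholders.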
I was really surprised. I had seen these challenges before in academia, but this was Google, arguably the company with the best developer practices in the world. In the first two weeks, they teach you how to check in code and how to style it; they're excellent at it. But then you join a Machine Learning team and someone emails you a Jupyter Notebook.
That was my first insight that this is a much bigger problem: if a company such as Google hadn't solved it yet, I was pretty confident others hadn't either. At Google, that really impacted our ability to deliver. Eventually, because we wanted to potentially publish something, we had to start from scratch and train a baseline model so we could get it published.
To our surprise, within three or four weeks, using simple techniques such as n-grams and logistic regression, we managed to beat the production model, which was a fancy deep learning model. And I was like, wow, okay. This team has been working on this for a year, and if they had had a system of record, if someone had actually tested the baseline (and I'm sure someone did, but it kind of fell through with people leaving and switching projects), we would have saved a year of research. So that's really how we got into working on Comet. I called Nimrod, with whom I had already worked before, and went "hey, remember all this stuff we were talking about? I'm at Google, and it's exactly the same here."
So we started the company in 2017. Obviously, the product and the platform have evolved, but essentially, what we do is provide a platform that allows ML teams, Data Science teams, and individuals to manage their ML workflow: from tracking datasets, code, experiments, and models during early experimentation and prototyping, all the way to monitoring those models in production.
We have a free community edition that's used by tens of thousands of data scientists around the world, and we have a commercial offering as well. We power some of the best ML teams in the world, like Etsy, Uber, Zappos, and many more.
If I were an engineer setting up the whole tooling and DevOps around ML, what would be your advice for me?
The answer is similar regardless of the size of your team and company, though not exactly the same. My first piece of advice for someone trying to set up a platform, product, or team is to start by solving the existing pain points. Often people try to solve the entire end-to-end stack and workflow before it's even relevant, while things might still change down the road.
For instance, if you're still in early experimentation, trying to get your model to work well enough, it's probably too early to figure out deployment and retraining. So focus on solving the pain points that you and the other data scientists on the team actually experience. The most basic thing you need is a way to run your jobs, right? Whether that's on-premise, on your laptop, or on a cloud provider, you need some way to manage the workload, track experiments, and make sure that you can collaborate. That's typically where we see a lot of our customers start. Then you evolve as you get to experience other pain points like deployment and monitoring.
Again, this depends on the use case and what you're trying to do, but as a rule of thumb for deployment, our view is that deploying ML models is not that different from deploying applications. So if your team or company already has a CI/CD process and stack, reusing that is probably a good idea.
For monitoring, depending on what you're trying to do, you can get started with open source tools like Grafana, or maybe your APM product like Datadog or New Relic. When you get to the point where model quality and production data drift could impact business or product results, that's where it makes sense to start looking into a more advanced monitoring solution.
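Editor's illustration: one simple way to put a number on "production data drift" for a single numeric feature is the Population Stability Index (PSI), which compares how a training-time reference sample and a production sample are distributed across the same buckets. The sketch below is a minimal standard-library version written for this article, not something taken from Comet or any monitoring product. The equal-width bucketing, the default of 10 bins, and the small floor for empty buckets are all assumptions.

```python
import math


def psi(reference, production, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    # Equal-width buckets over the reference range; fall back to 1.0
    # when every reference value is identical.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bucket.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) and division by zero for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    ref = bucket_shares(reference)
    prod = bucket_shares(production)
    return sum((p - r) * math.log(p / r) for p, r in zip(prod, ref))
```

A PSI of zero means the bucketed distributions match; larger values mean a larger shift. A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, but any alerting threshold should be validated against your own data.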
Regarding the “build vs. buy” dichotomy, it's largely a function of how generalizable your domain is. If you have unique pain points, you will likely need to build for them yourself. For other areas of the workflow that are more generalizable, it's probably not the best use of your cycles to try to build it yourself: definitely if you're a small team, but also if you're a large one.
For instance, in financial services, one might have specific needs around auditing that make existing general MLOps tools unsuitable.
But build and buy are not necessarily mutually exclusive choices. If we look at one of our biggest clients, Uber, they built what was considered the first proper MLOps platform in the world: Michelangelo. That's an amazing platform built by an amazing team. I've never seen anything like it; it's extremely impressive. But they came to us about two years ago and said, "Look, we have to focus on other pieces of the workflow that are very specific to us. You guys go deeper and do better experimentation management than anyone else, so we want to integrate you into the platform." So we're part of it. We're not replacing the platform, but we are giving their customers and users significantly more value on top of what they already built.
Since Comet ML was born in 2017, there's been an explosion of startups and products that solve problems revolving around ML engineering. Do you think we're still in this growing exploration phase, or is it already converging into fewer categories with stronger players?
It's a little bit of both. We are going to continue to see new solutions, new vendors, for more niche use cases. For example, in data anonymization, there's an open-source project by Microsoft called Presidio that focuses on that. Another example is solutions for synthetic data, because they're typically use-case specific: you can't generate synthetic data for all machine learning problems.
But with regard to the core categories as we know them today, I strongly believe we'll see consolidation. One example is experimentation management and model monitoring. They have historically been seen as separate categories, mostly because in ML we tend to inherit from software engineering by default. In software, GitHub and Datadog are completely different beasts from a product perspective. But if you think about it on the ML side of things, it's almost the same product, just with different “data streams”: experimentation management deals with offline experiments, offline data, and static datasets, while monitoring deals with live production data.
But experimentation doesn't end the first time you deploy a model to production; in some ways, it's just the start. It's a very iterative process. Our view on those two categories specifically is that they're going to become the same category. That's what we're seeing with our customers as well: they are using those two solutions together.
Orchestration will remain a strong focus area for cloud providers. That is very similar to software engineering, and we might see things built on top of it, but even there, we're seeing a lot of the same tools used in software and data engineering, like Kubernetes and Airflow, to drive that. In terms of categories, there are definitely a lot of companies there, and most of them, unfortunately, are not going to make it, so there are going to be acquisitions.
How have things shifted since you started Comet ML? Is the problem you're solving today the same one you thought you were going to solve when you started the company? What is a belief of yours that has been proven wrong recently?
We've actually been quite consistent in doing the same thing, though obviously we do more now than we did on day one. We're fortunate to know the pain point personally: we are users of our own product. So it didn't surprise us that other people experienced it as well.
However, on the ML progress side of things, NLP is one thing that really surprised me. I worked a lot on NLP and spent a lot of time using the latest Deep Learning techniques of the time, such as LSTMs, to generate embeddings. It was almost impossible to beat “traditional” NLP techniques, things like n-grams and stop word removal. I was very confident it would stay that way for a long time. But I was obviously wrong, because Transformer-based large language models are definitely doing better on NLP tasks. That was exciting to see. I wish I had access to these models back in the day!
Another recent example is Uber using Deep Learning for ETA. We work very closely with that team, and that was exciting to see because I think most people assumed it's almost impossible to beat XGBoost on real-life structured or tabular data. But it's very exciting to see that if you do everything right, spend enough time on it, and have enough data, you can actually get improvements.
Thanks for this interview, Gideon. We look forward to your future work.
Thanks for having me!