
Making sense of the MLOps space — Interview with Jakub Czakon

Jakub Czakon is currently Head of Growth at Neptune AI (a metadata store for MLOps; more on that later) after years working as a Deep Learning specialist and Data Scientist. We take this opportunity to talk about what he has learned and his perspective on the MLOps tooling space after years of working with clients that face ever-evolving challenges.

About Jakub Czakon

Jakub had an unusual career before getting into the world of Data Science: he used to be a chess player at the International Master level, which he still fondly remembers. He followed up that stage in an equally unconventional fashion, studying Finance and Accounting alongside Theoretical Physics, where he acquired a taste for working with data and statistics. He then worked for many years at DeepSense, a consultancy specialized in Deep Learning, where Neptune was brewed and emerged as a spin-off, which he joined right away in 2016. Now he's leading growth at Neptune, a startup in the space of metadata storage and management, which in a nutshell consists of tooling to manage data versioning, experiment tracking and model registry: "A metadata store makes it easy to log, store, display, organize, compare and query all metadata generated during the ML model lifecycle."

Finding what you need in terms of tooling for MLOps can be a daunting task: the amount of tools, frameworks and software out there is so vast it gets overwhelming. What are your thoughts on the current state of MLOps tools?

Indeed this "primordial soup of MLOps" is very nascent. In my opinion the reason for this is that the vast majority of people don't know what they're doing, and it's fine, because we also don't know what we're doing in a sense! We're at a point where big companies developed their in-house stuff that's good for them in terms of MLOps, but now we're trying to figure out how those things can translate to smaller companies: what problems do they face and what products solve more pains than they introduce? The market is still struggling to stabilize into categories and products that clearly solve the problems that people have.

When it comes to Neptune in particular, we started as an experiment manager (experiment tracking tool would be even more accurate) and now, because our customers needed it, we're expanding into a bigger scope: a metadata store, which includes versioning of models, datasets and everything else. There are folks out there (e.g. Databricks or SageMaker) trying to build end-to-end solutions, but we don't think that will be the dominant approach in the long run. I think in the next few years we will see a few categories become established, with various players in them, and our strategy is to become the "best-in-breed" tool for the specific segment known as metadata stores. End-to-end solutions will definitely exist and keep improving, but we firmly believe that being the best-in-breed for a specific category is the winning bet.

An important open question is still: which categories will stick? Is it model tracking? Or model tracking bundled with experiment tracking and a model registry? In other words, it's still unclear what the optimal category scopes are. In our case, we want to be straightforward, transparent about what we do, and non-opinionated. Transparent because we're not interested in adding a small hosting feature just to say "hey, we're also doing hosting now!" when our offering is not strong. People in ML are smart enough to figure out when a company is doing precisely that. And "non-opinionated" because we really believe nobody knows yet how this space should look, or what the best practices are for the various data and problem modalities. We want to let people do what is best for them with a tool that fits their workflow. You can think of Neptune's data model as a Python dictionary: you define the structure and organize it the way you want. The model can be 1 binary, 5 binaries, anything. We avoid having fixed "recipes" (e.g. this specific format for the model, 3 stages for deployment, etc.) and instead let people define those themselves.
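To make that dictionary-like data model concrete, here is a minimal, self-contained sketch. This is a toy stand-in written for illustration, not Neptune's actual client API: metadata is addressed by slash-separated paths, and the user decides the structure rather than fitting a fixed recipe.

```python
# Toy metadata store: a nested dict addressed by "a/b/c"-style paths.
# The structure is entirely user-defined, as in the interview's analogy.
class MetadataStore:
    def __init__(self):
        self._data = {}

    def __setitem__(self, path, value):
        # "params/lr" becomes {"params": {"lr": value}}.
        node = self._data
        *parents, leaf = path.split("/")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value

    def __getitem__(self, path):
        node = self._data
        for key in path.split("/"):
            node = node[key]
        return node


run = MetadataStore()
run["params/lr"] = 0.001
run["metrics/val/accuracy"] = 0.94
# "The model can be 1 binary, 5 binaries, anything":
run["model/binaries"] = ["model_a.pt", "model_b.pt"]
```

The point of the sketch is the absence of a schema: nothing forces a particular model format or a fixed set of lifecycle stages; the nesting is whatever the user's workflow calls for.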

Regarding the marketing side of things, we've shifted a bit since we started. At the beginning we focused on Kaggle competitions. After all, that's where Neptune started! It was created during one of the competitions which the team from DeepSense won. But we eventually realized that it wasn't for everyone, so we shifted our efforts to a blog which focuses on the problems of ML practitioners, which has been going great so far.

In the past 5 years, how have Data Science teams' challenges changed?

I'd say the biggest difference comes from the expectations on the business side. Machine Learning used to be "presentation driven": create a proof of concept, hack it in a notebook and present it in a nice way. This would fly just a few years ago. But now it's really become more grounded from the business perspective: does this make sense? what are the baselines? are they already in production? how does the new approach compare to the existing ones? what are the tradeoffs?

There's a level of pain that business and org folks can endure without seeing a return on investment but it's not infinite, and now people start to expect this return soon(ish).

Another reason for the rising expectations on returns is how dramatically the tooling surrounding Machine Learning has improved. For instance, in Natural Language Processing (NLP), with Hugging Face, building and training those models is way simpler than it was 5 years ago. In Computer Vision, not so long ago we had Theano, which was tremendously complex compared to Keras/TensorFlow and PyTorch nowadays: if you don't need to go deeper, building a model is simple, and even if you do need to change things deep inside a model, the abstractions hold up very well, which wasn't the case until recently.

On the topic of scale: we see ever-growing models and datasets in the research world, especially coming from big companies. Is this a trend where industry diverges from research, or is it something that will eventually be adopted there too?

We have clients that work on improving the deployment of these big models. It is still a painful and expensive task, which requires a lot of expertise to pull off successfully.

It really depends on the use case: are you at the stage where getting from 90% to 94% will make a big difference? You might be, but the vast majority of companies out there won't be at that stage. It will make more business sense to work on a new feature, or on making your current solution more robust, explainable or efficient.

One way in which very big datasets could play an important role in industry is if out of the box pretrained models perform really well on "any" data you throw at them, or training in a few-shot setting becomes very easy.

To summarize, I don't think it will go that much bigger on the modeling side because the tradeoff just doesn't pay off for most people; however on the dataset side we will probably keep seeing bigger and better datasets because those are still further from saturating in many tasks and are really useful in the context of pretrained models and transfer learning. On the model side, it is more like "how can we make it smaller but good enough, simpler but good enough, etc."

To give an example, if you look at Kaggle competitions for Computer Vision, the top people try all the new big stuff like Transformers, but ResNets and the very established architectures still rule. If the new stuff were truly ready for prime time (i.e. on non-ideal data) we'd be seeing it win competitions and become the new baseline architecture, I think.

Alright, this was it for today, thanks for your time and let's keep in touch!



We hope you found this piece useful and interesting. If you want to stay up to date on the world of data, ML and AI, follow us on Twitter and subscribe to our newsletter.
