A monthly selection of recent implementations, tools and software gaining traction in the ML-sphere right now: Google's Vertex AI, PyTorchVideo, Hugging Face Accelerate and more.
Building ML software and jumping from research to production is no easy feat, even more so when the sea of resources is vast: it’s hard to keep up. We encounter precisely this problem at Zeta Alpha as we work to put the latest models into production and find the best paper implementations, so we want to contribute by sharing a monthly selection of our findings across the latest repositories, tools, libraries and software news that are quickly gaining traction. Enjoy!
⚙️ MLOps highlights
The space of ML Operations (MLOps) is moving at dazzling speed, which can make it confusing: in the last couple of years hundreds of companies and products have emerged to help bring ML to production in a more reliable and robust way, but the result is a very crowded, not-yet-settled marketplace that is still hard to navigate. As we try to make sense of it ourselves, we closely follow the latest news and developments. Here’s a selection:
Google’s release of Vertex AI: a “unified platform for MLOps”. The big cloud providers (Microsoft’s Azure, Amazon’s AWS) are quickly building ML tools that integrate with their technologies, probably in an attempt to lock in more customers, whereas companies like Databricks, with the new Databricks ML, or frameworks like MLflow and Kubeflow, keep pushing for a future that’s agnostic of where you run your ML projects. We’ll see whether built-in or third-party solutions get wider adoption in the next few years, and whether end-to-end MLOps products will win over the sea of feature-specific tools: labeling with Snorkel, experiment tracking with Neptune, orchestration with Airflow.
Hugging Face can be credited with bringing Transformers to the masses and speeding up the adoption of these architectures in production. With Transformers having made the jump to Computer Vision tasks in research a few months ago, Hugging Face recently made its first full “vision release”, which includes CLIP¹, ViT² and DeiT³. Models used in industry are always a few months behind those in research, as the performance/simplicity trade-off applies differently. So, despite the recent success of ViTs, we’re still wondering: will Transformers replace good ol’ CNNs in production anytime soon?
Andrew Ng’s new Coursera course “ML in production”: Andrew’s original Machine Learning course acted as a catalyst for ML’s meteoric rise by introducing the field to millions of practitioners. If that’s any indicator, this new course has the potential to narrow the gap between research and production by educating thousands of engineers and researchers.
Meanwhile, on GitHub
A selection of recently released libraries, frameworks and implementations.
👾 facebookresearch/pytorchvideo ⭐️ 1.4k | 📄 Documentation | 🌐 Website
👉 PyTorchVideo provides reusable, modular and efficient components needed to accelerate video understanding research. It is built on PyTorch and supports deep learning video components such as video models, video datasets, and video-specific transforms.
🚀 Great for building and benchmarking custom video-understanding research models; a minimal usage sketch follows the list. Features and characteristics:
A “Model Zoo” with several implementations of existing models, along with dataloaders for the most common datasets.
Efficient video components: components are optimized for video and support accelerated inference on hardware.
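For instance, pretrained models from the zoo can be loaded through torch.hub. A minimal sketch, assuming the “slow_r50” zoo entrypoint and a dummy input clip:

```python
# Minimal sketch: pull a pretrained video model from the PyTorchVideo model
# zoo via torch.hub ("slow_r50" is one of the zoo's Kinetics-400 models).
import torch

model = torch.hub.load("facebookresearch/pytorchvideo", "slow_r50", pretrained=True)
model.eval()

# Dummy clip with shape (batch, channels, frames, height, width).
clip = torch.randn(1, 3, 8, 224, 224)
with torch.no_grad():
    logits = model(clip)  # scores over the 400 Kinetics classes
print(logits.shape)       # torch.Size([1, 400])
```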
👾 microsoft/Swin-Transformer ⭐️ 3.4k | 📄 Paper
👉 If you want to play around with vision transformers, here’s the backbone you probably need right now.
❓ This repo is the official implementation of “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”, and it also includes the code for the following papers and tasks:
Object Detection and Instance Segmentation: See Swin Transformer for Object Detection.
Semantic Segmentation: See Swin Transformer for Semantic Segmentation.
Self-Supervised Learning: See Transformer-SSL.
In short, this implementation is a solid starting point if you want to experiment with a mature vision transformer variant.
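If you just want to poke at a pretrained Swin backbone without the full repo, the timm library also ships Swin weights. A minimal sketch, assuming the “swin_base_patch4_window7_224” variant (note this swaps in timm rather than the official repo’s pipeline):

```python
# Minimal sketch: run a pretrained Swin Transformer from timm
# (quick inference only, not the official repo's training pipeline).
import timm
import torch

model = timm.create_model("swin_base_patch4_window7_224", pretrained=True)
model.eval()

x = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
with torch.no_grad():
    logits = model(x)            # ImageNet-1k class logits
print(logits.shape)              # torch.Size([1, 1000])
```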
👾 huggingface/accelerate ⭐️1.4k | 📄 Documentation
👉 A dead simple way to run PyTorch on different hardware acceleration configurations (single/multiple GPUs and TPUs on single/multiple nodes) without hardware-specific boilerplate code.
🚀 Define once, train anywhere. Fast. What Accelerate is for and what it’s not (a minimal training loop follows the list):
Accelerate lets you run your training scripts in distributed environments (e.g. across multiple compute nodes, GPUs or TPUs) without giving up control over the training loop.
Accelerate does not abstract away the training loop itself, unlike higher-level frameworks on top of PyTorch such as pytorch/ignite.
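The core pattern is to wrap your model, optimizer and dataloader with accelerator.prepare() and swap loss.backward() for accelerator.backward(). A minimal sketch with a toy model:

```python
# Minimal sketch of the Accelerate pattern: the training loop stays yours,
# Accelerate handles device placement and distribution.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # picks up the launch configuration (CPU/GPU/TPU)

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)

# Moves everything to the right device(s) and shards data across processes.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```

The same script then runs on any setup via the accelerate CLI: configure once with accelerate config, then launch with accelerate launch.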
👾 open-mmlab/mmocr ⭐️ 1.2k | 📄 Documentation
👉 An open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the corresponding downstream tasks including key information extraction.
🚀 Great for building text-processing pipelines over images or video; an inference sketch follows the list. Main features and components:
A comprehensive pipeline which includes downstream tasks for information extraction.
A variety of state-of-the-art models for text detection and recognition.
A modular design to define your own optimizers, data preprocessing, model backbones or loss functions.
Several utilities for visualization, such as drawing ground truth and predicted bounding boxes, among others.
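A hedged sketch of single-image text detection based on MMOCR’s inference helpers; the exact API varies across versions, and the config/checkpoint paths below are placeholders (check the repo’s model zoo for real ones):

```python
# Hedged sketch: text detection on one image with MMOCR's inference API.
# Config/checkpoint paths are placeholders, not guaranteed to exist as-is.
from mmocr.apis import init_detector, model_inference

config = "configs/textdet/psenet/psenet_r50_fpnf_600e_icdar2015.py"  # placeholder
checkpoint = "psenet_r50_fpnf_600e_icdar2015.pth"                    # placeholder

model = init_detector(config, checkpoint, device="cuda:0")
result = model_inference(model, "demo_text_det.jpg")  # detected text boxes
print(result)
```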
👾 linkedin/greykite ⭐️ 939 | 📄 Documentation
👉 The Greykite library provides flexible, intuitive and fast forecasts through its flagship algorithm, Silverkite. The Silverkite algorithm works well on most time series, and is especially adept at those with changepoints in trend or seasonality, event/holiday effects, and temporal dependencies. Its forecasts are interpretable and therefore useful for trusted decision-making and insights.
🚀 Great for quick, off-the-shelf time-series applications; a minimal forecasting sketch follows the list. Top features and characteristics:
Flexible design: provides basic regressors to detect trends, seasonality, holidays and changepoints, as well as state-of-the-art ML models to choose from. The same pipeline enables preprocessing, cross-validation, backtesting, forecasting and evaluation for all models.
An intuitive interface for visualizations, templates that work well for certain data characteristics, and interpretable output to inspect the contributions of each regressor.
Quick training, benchmarking and prototyping with grid search for model selection.
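A minimal sketch following the shape of Greykite’s quickstart: forecast a daily series 30 days ahead with the Silverkite template (the “ts”/“y” column names and the toy data are assumptions):

```python
# Minimal sketch of a Greykite forecast with the SILVERKITE model template.
import pandas as pd
from greykite.framework.templates.autogen.forecast_config import (
    ForecastConfig,
    MetadataParam,
)
from greykite.framework.templates.forecaster import Forecaster

# Toy daily series; replace with your own dataframe.
df = pd.DataFrame({
    "ts": pd.date_range("2020-01-01", periods=365, freq="D"),
    "y": range(365),
})

forecaster = Forecaster()
result = forecaster.run_forecast_config(
    df=df,
    config=ForecastConfig(
        model_template="SILVERKITE",  # the flagship algorithm
        forecast_horizon=30,          # predict 30 steps (days) ahead
        metadata_param=MetadataParam(time_col="ts", value_col="y"),
    ),
)
print(result.forecast.df.tail())  # forecasted values
```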
👾 google/lyra ⭐️ 2.6k | Blog post | 📄 Paper
👉 A high-quality, low-bitrate speech codec that makes voice communication available even on the slowest networks. To do this, it applies traditional codec techniques while leveraging advances in machine learning (ML), with models trained on thousands of hours of data, to create a novel method for compressing and transmitting voice signals.
🚀 This work is similar in spirit to Nvidia’s GAN-based very-low-bitrate video compression. The method works by extracting speech features every 40ms, quantizing them based on psychoacoustic characteristics (i.e. removing aspects the human ear is barely sensitive to) and transmitting that signal through the channel. The decoder is a generative model that receives the speech features as input and outputs an audio waveform; a conceptual sketch of this pipeline follows the list. This is great for building bandwidth-constrained mobile apps. Main features and characteristics include:
Very low bitrate speech encoding/decoding (3 kbps).
Easy integration for Android and Linux-based applications.
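Lyra itself is a C++ library, but the pipeline described above is easy to picture. A purely conceptual Python sketch where every function is a hypothetical stand-in, not Lyra’s real API:

```python
# Conceptual sketch of the Lyra-style encode/transmit/decode pipeline.
# NOT the real C++ API: all functions here are hypothetical stand-ins.
import numpy as np

SAMPLE_RATE = 16_000                        # Lyra operates on 16 kHz speech
FRAME_MS = 40                               # features extracted every 40 ms
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 640 samples per frame
BITS_PER_FRAME = 3000 * FRAME_MS // 1000    # 3 kbps -> 120 bits per frame

def extract_features(frame):
    """Hypothetical stand-in for the spectral feature extractor."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    return np.log1p(spectrum)

def quantize(features):
    """Stub for the psychoacoustically weighted quantizer (BITS_PER_FRAME budget)."""
    return np.round(features * 4) / 4       # placeholder uniform quantization

def generative_decode(features):
    """Stub for the generative model that synthesizes audio from features."""
    return np.zeros(FRAME_LEN)              # a real decoder runs a neural vocoder

signal = np.random.randn(SAMPLE_RATE)       # one second of stand-in "speech"
frames = [signal[i:i + FRAME_LEN] for i in range(0, len(signal), FRAME_LEN)]
waveform = np.concatenate(
    [generative_decode(quantize(extract_features(f))) for f in frames]
)
```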
If you’re more interested in the vast world of MLOps, I can’t recommend enough that you check out Chip Huyen’s survey and diagnosis of the tooling space on her blog, as well as the ml-ops.org website and their list of resources on GitHub.
That’s it for this month. If you want to keep up with the latest developments in the world of ML, follow us on Twitter @zetavector. To learn more about interesting recent research, check out our other monthly blog series, Best of arXiv, which focuses on academic literature. See you next month!
[All GitHub repo star counts are as of June 1, 2021.]
References
[1] Learning Transferable Visual Models From Natural Language Supervision, by Alec Radford, Jong Wook Kim et al., 2021.
[2] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai et al., 2021.
[3] Training data-efficient image transformers & distillation through attention, by Hugo Touvron et al., 2020.