The AI landscape is buzzing with developments, from massive funding rounds to strategic acquisitions and groundbreaking model releases. Join us for an overview of the latest news in AI R&D and a curated list of the month's top 10 trending research papers.
>> Join us for Transformers at Work, live from the Bay Area on Friday, September 20th! <<
Transformers are the backbone of modern AI, and we're here to celebrate recent progress. In the fifth edition of our workshop, Transformers at Work, we explore recent breakthroughs with a focus on LLM applications in the enterprise!
In this program, we bridge the gap between cutting-edge AI research and building practical applications that deliver value. Learn about progress in AI from world-renowned researchers and engineers pushing the boundaries of Neural Search, RAG, LLMOps, Prompt Optimization, Agents, and AI Hardware.
Sign up here; space is limited: https://www.zeta-alpha.com/events/transformers-at-work-2024
News Articles
EPP Group shares Draghi's wake-up call on competitiveness and single market completion
AMD acquires Europe's largest private AI lab, Silo AI, in $665M deal
Three cofounders leave French AI startup H just three months after raising $220M seed
Groq Raises $640M To Meet Soaring Demand for Fast AI Inference
Ilya Sutskever’s startup, Safe Superintelligence, raises $1B
Model Releases
Zeta Alpha: Zeta-Alpha-E5-Mistral
OpenAI: o1
Mistral: Mathstral, Codestral Mamba, Mistral NeMo, Mistral Large 2, Pixtral
xAI: Grok 2
AI21: Jamba 1.5
Nous Research: Hermes 3
DeepSeek AI: DeepSeek V2.5
Qwen: Qwen2-Audio, Qwen2-VL
Microsoft: Phi-3.5
Trending AI Papers for September 2024
[1] ColPali: Efficient Document Retrieval with Vision Language Models - M. Faysse et al. (Illuin Tech, CentraleSupélec) - 27 June 2024
→ ColPali: a document retrieval model that uses Vision-Language Models to understand complex and visually rich document formats.
🤔 Why? It radically simplifies the document indexing pipeline and shows strong performance on visual question-answering tasks involving figures, tables, and other visual elements.
💡 Key Findings:
ColPali improves retrieval performance while reducing latency and enabling faster indexing compared to standard retrieval methods.
It beats both CLIP-style and text-only models by a large margin on the ViDoRe benchmark.
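To make the late-interaction idea concrete: ColPali embeds every page as a bag of patch vectors and scores it ColBERT-style, matching each query token to its best patch. Here is a minimal NumPy sketch of that MaxSim scoring (toy shapes and random vectors, not the actual model):

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """ColBERT-style late interaction: match each query-token vector to its
    best page-patch vector, then sum those maxima over the query tokens."""
    sims = query_emb @ page_emb.T        # (n_query_tokens, n_patches)
    return float(sims.max(axis=1).sum())

# Toy usage: rank two "pages" of 16 patch vectors against a 4-token query.
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))
pages = [rng.normal(size=(16, 128)) for _ in range(2)]
best = max(range(len(pages)), key=lambda i: maxsim_score(query, pages[i]))
print("best page:", best)
```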
[2] RouterRetriever: Exploring the Benefits of Routing over Multiple Expert Embedding Models - H. Lee et al. (KAIST AI, AI2) - 04 September 2024
→ RouterRetriever: a retrieval system comprising multiple domain-specific embedding models that uses a routing mechanism to select the best expert for each query.
🤔 Why? It addresses the limitations of models trained on single, static, large-scale general-domain datasets.
💡 Key Findings:
Outperforms both MS MARCO-trained and multi-task-trained models on BEIR.
Good zero-shot generalizability, which further improves by including more expert models.
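A minimal sketch of the routing step, assuming the paper's setup of per-expert "pilot" embeddings derived from each expert's training domain: the query is compared against every expert's pilots and dispatched to the closest one (names and shapes here are illustrative):

```python
import numpy as np

def route_query(query_vec: np.ndarray, pilots: dict) -> str:
    """Return the expert whose pilot embeddings best match the query.
    pilots: expert name -> (n_pilots, dim) matrix of pilot embeddings."""
    q = query_vec / np.linalg.norm(query_vec)
    def best_sim(p):
        p = p / np.linalg.norm(p, axis=1, keepdims=True)
        return (p @ q).max()             # cosine similarity to the closest pilot
    return max(pilots, key=lambda name: best_sim(pilots[name]))

# Toy usage with three hypothetical domain experts.
rng = np.random.default_rng(1)
pilots = {d: rng.normal(size=(8, 64)) for d in ("bio", "law", "code")}
expert = route_query(rng.normal(size=64), pilots)
print("routed to:", expert)  # embed the query with this expert's encoder next
```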
[3] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters - C. Snell et al. (Google DeepMind) - 06 August 2024
→ Shows that allocating test-time compute adaptively, based on prompt difficulty, can substantially improve LLM performance, especially on hard prompts.
🤔 Why? We can enhance the effectiveness of LLMs without increasing model size or pre-training compute.
💡 Key Findings:
Analysis of two primary mechanisms:
iterative answer revisions
process-based verifier re-ranking
Experiments show that compute-optimal allocation improves test-time compute efficiency by up to 4x compared to a best-of-N baseline.
For some tasks, test-time compute can significantly substitute for pre-training.
With a smaller but capable base model, additional test-time compute can outperform a 14× larger model.
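The simplest of these strategies is easy to show in code. Below is a toy sketch of verifier re-ranking (best-of-N): sample several candidate answers and keep the one a verifier scores highest. `generate` and `verifier_score` are hypothetical stand-ins for an LLM sampler and a process reward model; the paper's compute-optimal policy additionally decides how much compute and which search strategy to use per prompt.

```python
import random

def best_of_n(prompt: str, generate, verifier_score, n: int = 8) -> str:
    """Sample n candidate answers and return the one the verifier prefers."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=verifier_score)

# Toy stand-ins so the sketch runs end to end.
generate = lambda prompt: f"answer-{random.randint(0, 99)}"
verifier_score = lambda ans: -abs(int(ans.split("-")[1]) - 42)  # prefers 42
print(best_of_n("what is 2 + 40?", generate, verifier_score))
```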
[4] Automated Design of Agentic Systems - S. Hu et al. (Vector Institute) - 15 August 2024
→ Proposes Automated Design of Agentic Systems (ADAS), a framework in which a meta agent programs new agents in code, iteratively inventing building blocks and combining them into stronger agentic systems.
🤔 Why? The framework can reduce the effort required in designing complex agentic systems, potentially leading to more efficient, robust, and innovative solutions than manually engineered ones.
💡 Key Findings:
Outperforms state-of-the-art hand-designed agents in reading comprehension (+13.6 F1) and math tasks (+14.4% accuracy).
The agents demonstrate high transferability, showing improvements across various held-out models and non-math domains.
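A rough sketch of the search loop behind this (the paper calls it Meta Agent Search): the meta agent writes each new agent as code, conditioned on an archive of everything discovered so far. Here, `propose` and `evaluate` are hypothetical stand-ins for the LLM call and the task harness:

```python
import random

def meta_agent_search(propose, evaluate, iterations: int = 10):
    """ADAS-style loop: propose a new agent (as code) given the archive,
    score it on validation tasks, and grow the archive."""
    archive = []                              # (agent_code, score) pairs
    for _ in range(iterations):
        agent_code = propose(archive)         # meta agent writes a new agent
        score = evaluate(agent_code)          # run it on held-out tasks
        archive.append((agent_code, score))
    return max(archive, key=lambda entry: entry[1])

# Toy stand-ins so the loop runs.
propose = lambda archive: f"agent_v{len(archive)}"
evaluate = lambda code: random.random()
print(meta_agent_search(propose, evaluate))
```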
[5] FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision - J. Shah et al. (Colfax, NVIDIA, Together AI) - 11 July 2024
→ FlashAttention-3: an attention implementation that leverages asynchrony and FP8 precision for enhanced speed and accuracy on NVIDIA Hopper GPUs.
🤔 Why? Asynchrony allows computational tasks to overlap and reduces idle times, while low-precision computations ensure faster processing without significant loss of accuracy.
💡 Key Findings:
Compared to FlashAttention-2, using FP16:
1.5-2x faster for forward passes, 1.5-1.75x faster for backward passes
Compared to standard attention: 3-16x faster
2.6x lower numerical error with FP8 compared to a baseline FP8 attention.
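For context, all FlashAttention generations share the same tiled online-softmax recurrence, which avoids ever materializing the full attention matrix; version 3's contribution is layering Hopper-specific asynchrony and FP8 on top of it. A minimal NumPy sketch of just that shared recurrence (not the hardware scheduling):

```python
import numpy as np

def tiled_attention(Q, K, V, block: int = 64):
    """Online-softmax attention: process K/V in blocks, keeping a running
    row max m, normalizer l, and output accumulator O per query row."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O, m, l = np.zeros((n, d)), np.full(n, -np.inf), np.zeros(n)
    for j in range(0, K.shape[0], block):
        S = (Q @ K[j:j + block].T) * scale       # scores for this block
        m_new = np.maximum(m, S.max(axis=1))
        corr = np.exp(m - m_new)                 # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * corr + P.sum(axis=1)
        O = O * corr[:, None] + P @ V[j:j + block]
        m = m_new
    return O / l[:, None]

# Check against naive attention on random inputs.
rng = np.random.default_rng(2)
Q, K, V = (rng.normal(size=(128, 32)) for _ in range(3))
S = (Q @ K.T) / np.sqrt(32)
P = np.exp(S - S.max(axis=1, keepdims=True))
naive = (P / P.sum(axis=1, keepdims=True)) @ V
print(np.allclose(tiled_attention(Q, K, V), naive))  # True
```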
[6] The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery - C. Lu et al. (Sakana AI, FLAIR, Vector Institute) - 12 August 2024
→ The AI Scientist: an end-to-end framework for fully automated scientific discovery using LLMs. It generates research ideas, performs experiments, analyzes results, and writes scientific papers autonomously.
🤔 Why? The goal is to accelerate scientific progress. By automating the entire research process, this framework can help overcome human limitations related to time, expertise, and biases.
💡 Key Findings:
Automated Review Process: an LLM-based reviewing agent assesses the quality of the generated papers. This agent is benchmarked against human reviewers using ICLR ’22 data.
Case Study: A specific example, "Adaptive Dual-Scale Denoising," is explored in detail, showcasing the framework's capability to generate credible research outcomes.
[7] De novo design of high-affinity protein binders with AlphaProteo - V. Zambaldi et al. (Google DeepMind) - 05 September 2024
→ AlphaProteo: a computational deep-learning-based system capable of designing high-affinity protein binders de novo without requiring extensive rounds of experimental optimization.
🤔 Why? Traditional methods for producing such binders are labor-intensive. AlphaProteo could be a transformative tool in drug development, diagnostics, and biomedical research.
💡 Key Findings:
High success rates (9-88%) across different targets, outperforming current SOTA methods.
The binders obtained exhibited high affinities, with KD < 1 nM for several proteins.
Selected binders neutralized SARS-CoV-2 in live-virus assays, indicating functional efficacy.
The AlphaProteo system designs proteins using a combination of generative modeling (distilled from AlphaFold3) and filtering.
[8] OLMoE: Open Mixture-of-Experts Language Models - N. Muennighoff et al. (Contextual AI, AI2) - 03 September 2024
→ OLMoE: an open mixture-of-experts (MoE) language model. OLMoE-1B-7B has 7 billion parameters but only activates 1.3 billion parameters per token.
🤔 Why? It aspires to democratize access to high-performing language models, with insights for the community on optimizing MoE architectures.
💡 Key Findings:
Outperforms much larger models like Llama2-13B and DeepSeekMoE-16B on benchmarks such as MMLU and HellaSwag.
Ablations on router saturation, expert co-activation, and domain & vocabulary specialization.
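The 1.3B-of-7B figure comes from sparse expert routing: each MoE layer scores all experts but runs only the top-k per token (OLMoE routes each token to 8 of its 64 experts per layer). A minimal single-token sketch with toy shapes:

```python
import numpy as np

def moe_layer(x, W_router, experts, k: int = 2):
    """Top-k MoE for one token: route to the k best-scoring experts and
    mix their outputs with softmax-renormalized router weights."""
    logits = W_router @ x                        # one score per expert
    top = np.argsort(logits)[-k:]                # indices of the k winners
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                         # softmax over chosen experts
    return sum(g * experts[i](x) for g, i in zip(gates, top))

# Toy usage: 8 linear "experts" on a 16-dim token.
rng = np.random.default_rng(3)
dim, n_experts = 16, 8
experts = [(lambda W: lambda x: W @ x)(rng.normal(size=(dim, dim)))
           for _ in range(n_experts)]
W_router = rng.normal(size=(n_experts, dim))
print(moe_layer(rng.normal(size=dim), W_router, experts).shape)  # (16,)
```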
[9] Diffusion Models Are Real-Time Game Engines - D. Valevski et al. (Google) - 27 August 2024
→ GameNGen: a game engine powered entirely by a neural model, capable of interactively simulating complex video games. It runs DOOM at over 20 FPS on a single TPU.
🤔 Why? A major shift from traditional game engines with handcrafted code and predefined rules, to a model where game worlds are generated with neural networks.
💡 Key Findings:
The quality of the simulated game clips is comparable to JPEG compression (PSNR=29.4).
The model is evaluated on its ability to generate long trajectories by comparing predicted frames against actual gameplay.
Models trained on agent-generated data perform better, highlighting the importance of realistic training data.
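In spirit, the engine is an action-conditioned autoregressive sampling loop: each new frame is drawn from a diffusion model conditioned on recent frames and player actions. A sketch of that loop, where `denoise`, `encode_history`, `policy`, and the context length `ctx` are all illustrative stand-ins, not GameNGen's actual interfaces:

```python
def play(denoise, encode_history, policy, frames, actions, steps=100, ctx=16):
    """GameNGen-style loop: condition a (stand-in) diffusion sampler on a
    window of past frames and actions, sample the next frame, repeat."""
    for _ in range(steps):
        action = policy(frames[-1])                          # player input
        cond = encode_history(frames[-ctx:], actions[-ctx:] + [action])
        frames.append(denoise(cond))                         # next frame
        actions.append(action)
    return frames

# Toy stand-ins (ints in place of images) so the loop runs.
out = play(denoise=lambda c: c % 256,
           encode_history=lambda f, a: sum(f) + sum(a),
           policy=lambda frame: 1,
           frames=[0], actions=[], steps=5)
print(out)
```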
[10] Sapiens: Foundation for Human Vision Models - R. Khirodkar et al. (Meta) - 22 August 2024
→ Sapiens: a family of models for human-centric vision tasks: 2D pose estimation, body-part segmentation, depth prediction, and surface normal estimation.
🤔 Why? It addresses the challenge of creating robust and generalizable vision models that can perform well in diverse in-the-wild conditions.
💡 Key Findings:
Good data and scale are the keys to success (as with DINO).
Outperforms SOTA methods on 2D Pose Estimation and Body-Part Segmentation.
More accurate relative depth estimation and higher precision of surface normal predictions.
And a few runners-up:
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers - C. Si et al. (Stanford) - 06 September 2024
In Defense of RAG in the Era of Long-Context Language Models - T. Yu et al. (NVIDIA) - 03 September 2024
Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment - K. Luo et al. (BAAI) - 22 August 2024
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation - D. Ru et al. (Amazon AWS) - 15 August 2024
You can find an annotated collection of these papers (+ more that didn't make the cut) in Zeta Alpha, allowing you to easily discover relevant literature and dive deeper into any topic that interests you.
Here is a 3-minute preview of the papers in our top-10 list:
The full recording of our latest Trends in AI episode is available on our YouTube channel, covering all of the papers in depth. Sign up to join us live for the next edition in October. Until then, enjoy discovery!