Must-See AI Innovations This Week - Cache-Augmented Generation (CAG), UC Berkeley's Sky-T1-32B, MemoRAG, Microsoft Phi-4, Nebius AI
Latest AI breakthroughs: UC Berkeley's cost-efficient LLM training, Microsoft's open-source Phi-4 release, Cache-Augmented Generation
🔬 Research Highlights
1. UC Berkeley's Sky-T1-32B-Preview
UC Berkeley's NovaSky team released Sky-T1-32B-Preview, matching o1-preview's reasoning capabilities for roughly $450 and 19 hours of training. The model fine-tunes Alibaba's Qwen2.5-32B-Instruct on training data generated by QwQ-32B-Preview, demonstrating that high-level reasoning can be replicated cheaply.
Technical Details:
Training completed in 19 hours on 8 H100 GPUs
Total cost around $450
Open-source pipeline including training data and code
Excels in mathematics and coding tasks
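The cost figure is easy to sanity-check against the reported hardware; a quick back-of-the-envelope calculation (assuming the $450 covers only the 19-hour run on all 8 GPUs, which the announcement does not break down):

```python
# Back-of-the-envelope check on Sky-T1's reported training cost.
# Assumption: the $450 covers the full 19-hour run on all 8 H100s.
hours = 19
gpus = 8
total_cost = 450

gpu_hours = hours * gpus          # 152 GPU-hours
rate = total_cost / gpu_hours     # ~$2.96 per H100-hour

print(f"{gpu_hours} GPU-hours at ${rate:.2f}/GPU-hour")
# -> 152 GPU-hours at $2.96/GPU-hour
```

That rate is in line with spot-market H100 pricing, which is what makes the headline number plausible.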
2. MemoRAG: Advanced Memory-Driven RAG Framework
A new open-source RAG framework that adds long-term memory, handling contexts of up to 1 million tokens. It also ships MemoRAGLite, a lightweight variant for quick prototyping on 16GB GPUs.
Technical Details:
30x speedup in context pre-filling
Context processing reduced from 35s to 1.5s for 200K tokens
Supports OpenAI and DeepSeek APIs
Compatible with Meta-Llama-3.1-8B and custom LLMs
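The distinguishing idea is memory-then-retrieve: a compressed global memory of the corpus first drafts answer clues, and those clues (not the raw question) drive retrieval. A toy sketch of that pattern; this is not MemoRAG's actual API, and every name and heuristic below is invented for illustration:

```python
# Toy illustration of the memory-then-retrieve pattern behind MemoRAG.
# NOT MemoRAG's real API: names and logic here are invented.

# A long "corpus" split into passages.
passages = [
    "The treaty was signed in 1648 in Westphalia.",
    "The company reported record revenue in Q3.",
    "Westphalia is a region in northwestern Germany.",
]

# Stage 1: a compressed "memory" of the whole corpus.
# (A real system builds this with a memory model; here, just keywords.)
memory = {w.lower().strip(".,") for p in passages for w in p.split()}

def generate_clues(question: str) -> list[str]:
    # A real system asks the memory model to draft answer clues;
    # here we keep question words the memory has seen before.
    words = [w.strip("?.,") for w in question.lower().split()]
    return [w for w in words if w in memory]

def retrieve(clues: list[str]) -> list[str]:
    # Stage 2: the clues, not the raw question, drive retrieval.
    return [p for p in passages if any(c in p.lower() for c in clues)]

hits = retrieve(generate_clues("Where is Westphalia?"))
```

The payoff of the real framework is that the expensive memory stage is built once per corpus, which is where the pre-filling speedup above comes from.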
☑️ New Models and Updates
1. Microsoft Phi-4 Goes Open Source
Microsoft released Phi-4 on Hugging Face under MIT license, featuring:
14B parameters
16k token context window
Trained on ~10 trillion tokens
Outperforms larger models on STEM tasks
Runs on consumer hardware
2. Nebius AI Studio Expansion
Platform enhanced with new models:
Qwen2-VL-72B-Instruct for visual tasks
Meta Llama-3.3-70B-Instruct supporting 8 languages
Dolphin-2.9.2-mixtral-8x22b for coding
New embedding models including BGE-ICL and e5-mistral-7b-instruct
⚡ Performance Benchmarks
1. NVIDIA's Video Processing
New benchmarks for autonomous vehicle data processing:
Processes 100M video clips from 20M hours of training data
Automated video captioning every 256 frames
H.264 hardware implementations
Efficient fine-tuning capabilities for World Foundation Models
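Captioning every 256 frames amounts to slicing each clip into fixed-size frame windows and captioning one window at a time. A minimal sketch of that windowing (the helper name is mine, not NVIDIA's pipeline; only the 256-frame cadence comes from the benchmark description):

```python
# Minimal sketch of a fixed-cadence captioning schedule.
# The 256-frame window is from the benchmark description; the helper
# itself is illustrative, not NVIDIA's actual pipeline.
WINDOW = 256

def caption_windows(num_frames: int, window: int = WINDOW) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges, one caption per range."""
    return [(start, min(start + window, num_frames))
            for start in range(0, num_frames, window)]

# A ~34-second clip at 30 fps (1024 frames) yields 4 caption windows.
ranges = caption_windows(1024)
```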
2. Cache-Augmented Generation (CAG) vs Traditional RAG
New performance metrics show CAG outperforming traditional RAG systems:
Eliminates retrieval latency
Reduces architecture complexity
Optimized for limited knowledge bases
Significantly faster inference times
Improved memory efficiency
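The core idea behind CAG is to preload the entire (small) knowledge base into the model's KV cache once, so each query skips retrieval and goes straight to decoding. A toy cost model of that trade-off; all the timing constants below are invented for illustration, not measured numbers:

```python
# Toy model of CAG vs. traditional RAG query cost.
# All timings are made-up constants; real numbers depend on the
# model, corpus size, and retriever.
RETRIEVAL_COST = 0.20   # per-query retrieval latency (RAG only)
PREFILL_COST = 1.00     # one-time cost to preload knowledge (CAG only)
DECODE_COST = 0.10      # per-query generation cost (both)

def rag_total(queries: int) -> float:
    # RAG retrieves fresh context on every query.
    return queries * (RETRIEVAL_COST + DECODE_COST)

def cag_total(queries: int) -> float:
    # CAG pays the knowledge prefill once, then decodes from cache.
    return PREFILL_COST + queries * DECODE_COST

# Past the break-even point, CAG pulls ahead.
break_even = PREFILL_COST / RETRIEVAL_COST
```

This also shows why CAG is pitched at limited knowledge bases: the whole corpus must fit in the context window for the one-time prefill to be possible at all.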