Must-See AI Innovations This Week - Cache-Augmented Generation (CAG), UC Berkeley's Sky-T1-32B, MemoRAG, Microsoft Phi-4, Nebius AI
Latest AI breakthroughs: UC Berkeley's cost-efficient LLM training, Microsoft's open-source Phi-4 release, Cache-Augmented Generation
🔬 Research Highlights
1. UC Berkeley's Sky-T1-32B-Preview
UC Berkeley's NovaSky team released Sky-T1-32B-Preview, matching o1-preview's reasoning capabilities for roughly $450 and 19 hours of training. The model fine-tunes Alibaba's Qwen2.5-32B-Instruct on training data generated by QwQ-32B-Preview, demonstrating that high-level reasoning can be replicated cheaply.
Technical Details:
Training completed in 19 hours on 8 H100 GPUs
Total cost around $450
Open-source pipeline including training data and code
Excels in mathematics and coding tasks
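The cost figure is easy to sanity-check against the reported hardware; a quick back-of-the-envelope calculation (assuming the $450 covers only the 19-hour run on all 8 GPUs, which the announcement does not break down):

```python
# Back-of-the-envelope check on Sky-T1's reported training cost.
# Assumption: the $450 covers the full 19-hour run on all 8 H100s.
hours = 19
gpus = 8
total_cost = 450

gpu_hours = hours * gpus          # 152 GPU-hours
rate = total_cost / gpu_hours     # ~$2.96 per H100-hour

print(f"{gpu_hours} GPU-hours at ${rate:.2f}/GPU-hour")
# -> 152 GPU-hours at $2.96/GPU-hour
```

That rate is in line with spot-market H100 pricing, which is what makes the headline number plausible.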
2. MemoRAG: Advanced Memory-Driven RAG Framework
A new open-source RAG framework that adds long-term memory, handling contexts of up to 1 million tokens. It also ships MemoRAGLite, a lightweight variant for quick prototyping on 16GB GPUs.
Technical Details:
30x speedup in context pre-filling
Context processing reduced from 35s to 1.5s for 200K tokens
Supports OpenAI and DeepSeek APIs
Compatible with Meta-Llama-3.1-8B and custom LLMs
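The distinguishing idea is memory-then-retrieve: a compressed global memory of the corpus first drafts answer clues, and those clues (not the raw question) drive retrieval. A toy sketch of that pattern; this is not MemoRAG's actual API, and every name and heuristic below is invented for illustration:

```python
# Toy illustration of the memory-then-retrieve pattern behind MemoRAG.
# NOT MemoRAG's real API: names and logic here are invented.

# A long "corpus" split into passages.
passages = [
    "The treaty was signed in 1648 in Westphalia.",
    "The company reported record revenue in Q3.",
    "Westphalia is a region in northwestern Germany.",
]

# Stage 1: a compressed "memory" of the whole corpus.
# (A real system builds this with a memory model; here, just keywords.)
memory = {w.lower().strip(".,") for p in passages for w in p.split()}

def generate_clues(question: str) -> list[str]:
    # A real system asks the memory model to draft answer clues;
    # here we keep question words the memory has seen before.
    words = [w.strip("?.,") for w in question.lower().split()]
    return [w for w in words if w in memory]

def retrieve(clues: list[str]) -> list[str]:
    # Stage 2: the clues, not the raw question, drive retrieval.
    return [p for p in passages if any(c in p.lower() for c in clues)]

hits = retrieve(generate_clues("Where is Westphalia?"))
```

The payoff of the real framework is that the expensive memory stage is built once per corpus, which is where the pre-filling speedup above comes from.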
☑️ New Models and Updates
1. Microsoft Phi-4 Goes Open Source
Microsoft released Phi-4 on Hugging Face under MIT license, featuring:
14B parameters
16k token context window
Trained on ~10 trillion tokens
Outperforms larger models on STEM tasks
Runs on consumer hardware
2. Nebius AI Studio Expansion
Platform enhanced with new models:
Qwen2-VL-72B-Instruct for visual tasks
Meta Llama-3.3-70B-Instruct supporting 8 languages
Dolphin-2.9.2-mixtral-8x22b for coding
New embedding models including BGE-ICL and e5-mistral-7b-instruct
⚡ Performance Benchmarks
1. NVIDIA's Video Processing
New benchmarks for autonomous vehicle data processing:
Processes 100M video clips from 20M hours of training data
Automated video captioning every 256 frames
H.264 hardware implementations
Efficient fine-tuning capabilities for World Foundation Models
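Captioning every 256 frames amounts to slicing each clip into fixed-size frame windows and captioning one window at a time. A minimal sketch of that windowing (the helper name is mine, not NVIDIA's pipeline; only the 256-frame cadence comes from the benchmark description):

```python
# Minimal sketch of a fixed-cadence captioning schedule.
# The 256-frame window is from the benchmark description; the helper
# itself is illustrative, not NVIDIA's actual pipeline.
WINDOW = 256

def caption_windows(num_frames: int, window: int = WINDOW) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges, one caption per range."""
    return [(start, min(start + window, num_frames))
            for start in range(0, num_frames, window)]

# A ~34-second clip at 30 fps (1024 frames) yields 4 caption windows.
ranges = caption_windows(1024)
```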
2. Cache-Augmented Generation (CAG) vs Traditional RAG
New performance metrics show CAG outperforming traditional RAG systems:
Eliminates retrieval latency
Reduces architecture complexity
Optimized for limited knowledge bases
Significantly faster inference times
Improved memory efficiency
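The core idea behind CAG is to preload the entire (small) knowledge base into the model's KV cache once, so each query skips retrieval and goes straight to decoding. A toy cost model of that trade-off; all the timing constants below are invented for illustration, not measured numbers:

```python
# Toy model of CAG vs. traditional RAG query cost.
# All timings are made-up constants; real numbers depend on the
# model, corpus size, and retriever.
RETRIEVAL_COST = 0.20   # per-query retrieval latency (RAG only)
PREFILL_COST = 1.00     # one-time cost to preload knowledge (CAG only)
DECODE_COST = 0.10      # per-query generation cost (both)

def rag_total(queries: int) -> float:
    # RAG retrieves fresh context on every query.
    return queries * (RETRIEVAL_COST + DECODE_COST)

def cag_total(queries: int) -> float:
    # CAG pays the knowledge prefill once, then decodes from cache.
    return PREFILL_COST + queries * DECODE_COST

# Past the break-even point, CAG pulls ahead.
break_even = PREFILL_COST / RETRIEVAL_COST
```

This also shows why CAG is pitched at limited knowledge bases: the whole corpus must fit in the context window for the one-time prefill to be possible at all.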