Mustafa Eyceoz's contributions
Article
Get started with language model post-training using Training Hub
Mustafa Eyceoz
Simplify LLM post-training with the Training Hub library, which provides a common, Pythonic interface for running language model post-training algorithms.
Article
Post-training methods for language models
Mustafa Eyceoz and 1 other
Dive into LLM post-training methods, from supervised fine-tuning and continual learning to parameter-efficient and reinforcement learning approaches.
Article
Async-GRPO: Open, fast, and performant
Aldo Pareja and 1 other
Discover Async-GRPO, a new library for reinforcement learning tasks that efficiently handles large models, eliminates bottlenecks, and accelerates experiments.
Article
Sculpting subspaces: How we solved continual learning in LLMs
Nikhil Shivakumar Nayak and 10 others
Discover how the adaptive SVD approach enables LLMs to continually learn and adapt without forgetting previously acquired knowledge.
Article
Lessons on reproducing R1-like reasoning in small LLMs
Akash Srivastava and 8 others
Learn about an efficient inference scaling method that can improve your model's reasoning ability and performance at runtime while saving on compute costs.
Article
On reasoning versus inference-time scaling
Akash Srivastava and 8 others
Progress in small LLM reasoning: Our Qwen-32B model, using particle filtering, now surpasses o1-preview on MATH500.
Article
Granite, LIMO, and small LLM reasoning
Akash Srivastava and 8 others
On reproducing R1-like reasoning in small LLMs: the LIMO dataset proves ineffective for Llama and Granite models; synthetic data generation shows promise, but fine-tuning remains tricky.
Article
How particle filtering makes small LLMs think big
Akash Srivastava and 8 others
An update on reproducing R1-like reasoning in small LLMs: Granite models show big gains with particle filtering, outperforming GPT-4o on benchmarks.