Optimizing Large Scale RL With - Detailed Analysis & Overview

Optimizing Large-Scale RL with SGLang | Chenyang Zhao | AER Labs

This talk addresses the training-inference mismatch problem commonly encountered in ...

Optimizing Large-Scale LLM RL Training with SGLang

... What I want to introduce is some recent updates, a topic we are moving forward on ...

Pivot RL Explained: Efficient Reinforcement Learning for AI Agents

[2024 Best AI Paper] Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human

Join Discord to tell us your ideas about the video: https://discord.gg/nPUm3ThuBc Title: Back to Basics: Revisiting REINFORCE ...

DeepSeek's GRPO (Group Relative Policy Optimization) | Reinforcement Learning for LLMs

In this video, I break down DeepSeek's Group Relative Policy ...

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation (Feb 2026)

NVIDIA's GDPO: Optimising Multi-Reward RL for Better LLM Performance

Learn how NVIDIA researchers introduced GDPO to enhance multi-reward reinforcement learning for ...

The Art of Scaling Reinforcement Learning Compute for LLMs (Oct 2025)

Title: The Art of Scaling Reinforcement Learning Compute for LLMs (Oct 2025) Link: http://arxiv.org/abs/2510.13786v1 Date: ...

SAPO: Stable RL Policy Optimization for LLMs

In this AI Research Roundup episode, Alex discusses the paper: 'Soft Adaptive Policy ...

Large-scale deep learning to augment production RL workloads at Riot Games

Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 9: RL for LLMs

To learn more about enrolling in the graduate course, visit: ...

RLHF, PPO & GRPO Explained: A Top-Down Guide to LLM Policy Optimization

A top-down, self-contained guide to RLHF, PPO, and GRPO: how ...

Prompt Learning: A Reinforcement Learning-Inspired Approach to AI Optimization | Ray Summit 2025

At Ray Summit 2025, Jason Lopatecki from Arize AI shares a new paradigm for iterative model improvement—Prompt Learning ...