Sandip Kulkarni

A Practical Guide to Reinforcement Learning from Human Feedback - Foundations, aligning large language models, and the evolution of preference-based methods