Improving Composer through real-time RL · Cursor

We apply online reinforcement learning to Composer, serving model checkpoints to production and using real user interactions as reward signals to ship an improved checkpoint multiple times a day.

Read in full here: