Using GRPO to Beat o1, o3-mini and R1 at "Temporal Clue" - OpenPipe

Convert expensive LLM prompts into fast, cheap fine-tuned models

Read in full here: