Compare frontier models across reasoning, coding, and speed-focused tasks.
This project is part of the RAG & Agents category and is recommended for learners at Levels 4-6. Expected difficulty: Advanced.
Compare frontier models with cost, latency, and red-team checks using Promptfoo.
Ship structured outputs, agent tools, and long-context workflows with GPT-5.4.
Track traces, metrics, and evaluation runs for LLM and agent workflows.