
We discuss use cases for LLM-cluster interpretability in agent analysis, experimental designs for testing these pipelines, and conclusions on how we use these techniques to analyze our own agents.

Rival proposes a new benchmark for evaluating AI-driven SAST vulnerability triage, offering data-backed validation for our agentic design. We are open-sourcing the benchmark to advance research in application security.

In this blog post, we're excited to present 🚕 Taxi, a new tool we're developing to tackle the challenge of understanding what agents actually do. Taxi is a generic, trajectory-oriented taxonomy generator that helps you make sense of your agent's behavior at scale.

A new wave of activity consistent with the Shai Hulud supply chain attack pattern is emerging right now.

Introducing Conductor: Rival Security’s reasoning engine built for real‑world complexity. Unlike today’s fragile agentic systems, Conductor delivers verifiable, scalable results and achieves breakthrough performance on Spider 2.0, marking a serious step forward in enterprise‑grade cybersecurity AI.

Follow our journey on the new Rival Security research blog, where we share our path from early experiments to building foundational AI systems that redefine cybersecurity.

Rival is redefining AI reasoning in cybersecurity by solving real-world analytical challenges that traditional agentic systems fail to handle. In this post, we explore how Rival’s orchestrated workflows outperform state-of-the-art models on Spider 2.0 and similar complex benchmarks.