Open Source Platform & SDK

Collaborative testing for LLM & agentic apps

AI-powered test generation and multi-turn conversation simulation, plus review workflows for cross-functional teams, so you catch issues before production.

Dive Into Testing

Fast, thorough, and surprisingly painless.

Platform

Get Your Whole Team Involved

Legal, PMs, and domain experts capture requirements in plain language. Rhesis turns them into realistic test scenarios and a review flow, so teams spot failures early and agree on what “good” looks like.

SDK

Test Without Leaving Your IDE

Integrate Rhesis directly into your development workflow. Generate, run, and analyze tests from code, then sync results back to the platform for review. Fewer context switches, safer releases.
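In code, that workflow could look something like the following. This is a minimal illustrative sketch in plain Python, not the actual Rhesis SDK API — `TestScenario`, `run_suite`, and `my_chatbot` are hypothetical names standing in for your own test definitions and application:

```python
# Illustrative sketch only -- TestScenario, run_suite, and my_chatbot are
# hypothetical stand-ins, not the actual Rhesis SDK API.
from dataclasses import dataclass


@dataclass
class TestScenario:
    prompt: str
    must_contain: str  # a simple pass/fail expectation for the reply


def my_chatbot(prompt: str) -> str:
    """Stand-in for your Gen AI application under test."""
    return f"Thanks for asking about {prompt}. Please consult a professional."


def run_suite(app, scenarios):
    """Run each scenario against the app and record pass/fail results."""
    results = []
    for s in scenarios:
        reply = app(s.prompt)
        results.append({"prompt": s.prompt, "passed": s.must_contain in reply})
    return results


scenarios = [
    TestScenario("refund policies", must_contain="consult a professional"),
    TestScenario("medical advice", must_contain="consult a professional"),
]
print(run_suite(my_chatbot, scenarios))
```

In a real setup, the results list is what you would sync back to the platform so reviewers can inspect failures alongside the original requirements.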

END-TO-END Solution

Full testing cycle coverage

From 'I hope this works' to 'I know this works.' Everything you need to develop and ship with confidence instead of crossed fingers.

Automated scenario creation at scale

Domain-specific testing intelligence

Real-world simulation engine

Clear insights, actionable results

Works with your existing stack

Reliable by design. Fun by nature.

From 'It works on my machine' to production-ready

You've spent weeks, maybe months, building something cool. Don't let sloppy testing ruin the release. Your Gen AI deserves testing that's as thoughtful as your architecture.

Advanced testing architecture, collaborative by design.
Built for teams, proven in production.

How it works

Great AI teams know what they're shipping before users do. Let's turn testing from "crossing fingers" into something as sophisticated as your development process.

Connect application

Our API and SDK work with any Gen AI system, from simple chatbots to complex multi-agent architectures.

Generate tests

Your team defines what matters: legal requirements, business rules, edge cases. We automatically generate thousands of test scenarios based on that expertise.

Select metrics

Set quality benchmarks that actually matter to your team. Track performance, safety, compliance, and user experience with clear analytics.

Improve quality

Receive detailed analyses that help you understand exactly how your Gen AI performs before your users do.
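The four steps above can be condensed into a toy evaluation loop. This is a conceptual sketch in plain Python, not the Rhesis API — the generator and metric are deliberately simplistic stand-ins for what the platform automates:

```python
# Conceptual sketch of the connect -> generate -> measure -> improve loop.
# All names are illustrative; this is not the Rhesis API.

def connect_app():
    """Step 1: wrap your Gen AI system behind a single callable."""
    return lambda prompt: f"Answer: {prompt.lower()}"


def generate_tests(requirements):
    """Step 2: expand each plain-language requirement into prompt variants."""
    templates = ["Tell me about {}", "What should I know about {}?"]
    return [t.format(req) for req in requirements for t in templates]


def score(reply: str) -> float:
    """Step 3: a toy quality metric -- replies longer than 10 chars score 1.0."""
    return 1.0 if len(reply) > 10 else 0.0


app = connect_app()
prompts = generate_tests(["refunds", "data privacy"])
scores = [score(app(p)) for p in prompts]
print(f"{len(prompts)} tests, mean score {sum(scores) / len(scores):.2f}")
# Step 4: use low-scoring prompts to target improvements before release.
```

Real metrics would cover safety, compliance, and user experience rather than reply length, but the loop shape — connect, generate, measure, improve — stays the same.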

Platypus Pond

Frequently asked questions

Everything you need to know about Rhesis AI, served with a smile.

What's with the platypus?
What makes Rhesis different from other AI testing tools?
Is this really enterprise-ready if it's open source?
What can I actually do with it?
Is there a cloud version?

Collaboration > Computation: Why domain experts matter more than "AI Skills"

The shift AI has brought to software development goes beyond coding assistants and faster deployments. The more fundamental change is that the people who understand the problem domain can no longer sit on the sidelines.
Harry Cruz
December 8, 2025
13 mins

Building MCP connections for the Rhesis platform: what I learnt about PRDs & shipping simple MVPs

It started the same way many of my engineering mistakes begin: with a beautifully over-designed document. I had spent hours writing a lengthy, thoughtful Product Requirements Document (PRD) for our Model Context Protocol (MCP) integration...
Emanuele de Rossi
December 2, 2025
7 mins

Our first community hour: Building together

We just hosted our first Community Hour, a new regular virtual meetup for everyone building, testing, and evaluating Gen AI agents and LLM applications. Join our growing community where testing is a collaborative conversation, not an afterthought.
Dr. Nicolai Bohn
November 7, 2025
3 mins