The latest updates from Rhesis AI.

Our Blog

How to Build Your Own Custom LLM Evaluation Metric

LLM-as-a-judge is an evaluation approach where a language model is used to assess the quality of another model’s output. Instead of relying solely on human annotators, an LLM is prompted to evaluate a response according to predefined criteria such as correctness, helpfulness, or relevance.
Emanuele de Rossi
December 28, 2025
13 mins

An Engineer’s Guide for Testing Conversational AI

This guide walks through everything you need to know about testing conversational AI from the perspective of a developer who needs to ship a production-ready system. You'll learn what makes these systems unique to test, which metrics actually signal quality versus noise...
Dr. Harry Cruz
December 26, 2025
25 mins

Collaboration > Computation: Why domain experts matter more than "AI Skills"

The shift AI has brought to software development goes beyond coding assistants and faster deployments. The more fundamental change is that the people who understand the problem domain can no longer sit on the sidelines.
Dr. Harry Cruz
December 8, 2025
13 mins

Building MCP connections for the Rhesis platform: what I learnt about PRDs & shipping simple MVPs

It started the same way many of my engineering mistakes begin: with a beautifully over-designed document. I had spent hours writing a lengthy, thoughtful Product Requirements Document (PRD) for our Model Context Protocol (MCP) integration...
Emanuele de Rossi
December 2, 2025
7 mins

Our first community hour: Building together

We just hosted our first Community Hour, a new regular virtual meetup for everyone building, testing, and evaluating Gen AI agents and LLM applications. Join our growing community where testing is a collaborative conversation, not an afterthought.
Dr. Nicolai Bohn
November 7, 2025
3 mins

Self-hosting Rhesis with Docker Compose: Our journey to a one-command setup

A behind-the-scenes look at how we made Rhesis run anywhere and what we learned along the way. It started with a simple question from our first Objectives & Roadmap session: "Can I run Rhesis on my laptop without dealing with cloud credentials?"
Md Asaduzzaman Miah
November 4, 2025
11 mins

From enterprise SaaS to open source: Why we rebranded Rhesis AI

Discover how Rhesis AI pivoted from enterprise SaaS to open source, what drove the rebrand, and the lessons every AI startup can learn about aligning brand, product, and community.
Dr. Nicolai Bohn
October 28, 2025
8 mins

Ensuring Trustworthy AI: Why Quality Assurance Matters

Artificial Intelligence (AI) is transforming numerous sectors, profoundly impacting task performance and decision-making processes. However, as AI's prevalence increases, so does the need for trustworthiness, i.e., ensuring that AI applications operate as intended and meet required quality standards.
Dr. Nicolai Bohn
September 29, 2025
8 mins

LLM Chatbots in the Insurance Industry: Are they Trustworthy?

As Gen AI technology, particularly Large Language Models (LLMs), continues to shape industries, it is crucial to understand how these applications perform in real-world scenarios and to assess their overall quality and trustworthiness.
Dr. Nicolai Bohn
September 29, 2025
7 mins
