
Testing a chatbot with a single question tells you almost nothing. You ask "Does my policy cover water damage?" and get an answer. Great. But what happens when you follow up with "What about if it happens during a storm?" Does the bot remember you're talking about water damage? Can it connect "it" back to the previous context? And when you later ask "You mentioned coverage earlier, what's the deductible?", does the system stay consistent across the conversation?

This is multi-turn testing, and it presents challenges that single-turn evaluation completely misses. You need to orchestrate the conversation, deciding which questions to ask and when. You need to maintain state across turns, tracking what's been said and what needs verification. You need stopping conditions that know when the test has achieved its goal. Most importantly, you need to evaluate success not just on individual responses but on the entire conversational flow.
We built Penelope to solve these challenges. She's an autonomous agent that executes complex, multi-turn test scenarios against conversational AI systems. This post walks through the technical decisions that shaped her architecture and implementation.
Our requirements shaped the design from the start. We needed to support popular frameworks like LangChain and LangGraph without coupling Penelope to any specific implementation. Integration with Rhesis SDK for metrics was essential. We wanted minimal dependencies to avoid the framework churn that plagues AI tooling. The system had to work with any LLM provider, not just OpenAI or Anthropic. Custom metrics needed to plug in cleanly.
The key insight: testing agents need different architecture than production agents. Production agents optimize for task completion and are embedded within application frameworks. Testing agents must stand outside the system being tested. They need transparency into what's happening, repeatability across test runs, and the ability to interact with applications built in any framework without being coupled to that framework's implementation details.
Instead of building on top of an existing agent framework, we built Penelope as a standalone agent that tests other systems through abstraction. The Target interface captures this idea:
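
Here's a minimal sketch of that interface; the class and method names are illustrative, simplified from the actual source:

```python
# A minimal sketch; method and class names are simplified, not the exact source.
from abc import ABC, abstractmethod


class Target(ABC):
    """Anything Penelope can test: it takes a message and returns a reply."""

    @abstractmethod
    def send_message(self, message: str) -> str:
        """Send one user turn to the system under test and return its response."""
        ...

    def reset(self) -> None:
        """Optional hook: clear conversation state between test runs."""
```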
Any system that can receive messages and return responses becomes testable: LangChain chains, LangGraph agents, REST endpoints, even custom implementations. The abstraction inverts the dependency: Penelope doesn't depend on frameworks; frameworks implement the interface.
We could have built a multi-agent system with separate planner, executor, and evaluator agents. We chose not to. Anthropic's "Building Effective Agents" guide argues for simplicity over orchestration complexity, and we found their reasoning compelling for testing scenarios.
Penelope follows the three core principles from that guide. Simplicity means a single-purpose agent with clear responsibilities. No coordinator agents, no complex handoffs. Transparency means explicit reasoning at every step. You see what Penelope thinks before she acts. Quality ACI (Agent-Computer Interface) means extensively documented tools with clear usage patterns. Each tool includes detailed descriptions, parameter explanations, and usage examples.
The structured output pattern enforces this transparency. Every turn uses a Pydantic schema that captures reasoning alongside tool calls:
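
A simplified version of that schema, with illustrative field names:

```python
# Illustrative field names; the real schema may differ.
from pydantic import BaseModel, Field


class ToolCall(BaseModel):
    tool_name: str = Field(description="Which registered tool to run")
    arguments: dict = Field(default_factory=dict, description="Arguments for the tool")


class TurnOutput(BaseModel):
    reasoning: str = Field(description="Why Penelope takes this action; required on every turn")
    tool_calls: list[ToolCall] = Field(description="Tools to execute this turn")
```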
The LLM can't skip the reasoning field. It can't hide its decision process. This made debugging substantially easier during development and gives users confidence in what Penelope is doing.
Penelope's architecture centers on a conversation loop. You provide four inputs that shape the test: a goal (what success looks like), instructions (how to conduct the test), restrictions (boundaries the target must respect), and a scenario (contextual framing). Penelope then orchestrates turns with the target, evaluating progress after each interaction.

This maps to a simple API:
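
Something like the following, where the class name, parameters, and result fields are a sketch based on the description above, not necessarily the exact API:

```python
# Hypothetical API shape; class name, parameters, and result fields are illustrative.
penelope = Penelope(
    llm="gpt-4o",        # works with any provider's model
    target=my_target,    # any object implementing the Target interface above
)

result = penelope.run(
    goal="Verify the bot explains water damage coverage consistently across follow-ups",
    instructions="Start with a broad coverage question, then probe storm damage and deductibles",
    restrictions="The target must never quote a specific premium amount",
    scenario="A homeowner reviewing their policy after a recent storm",
)

print(result.goal_achieved)
print(result.conversation)
```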
The goal drives the stopping condition. Penelope uses Rhesis SDK's GoalAchievementJudge to evaluate whether she's achieved the objective after each turn. The instructions provide strategic guidance without being prescriptive. Penelope plans her own approach within those constraints. The restrictions define forbidden behaviors for the target, not for Penelope. She tests whether the target respects boundaries. The scenario adds contextual framing that helps Penelope adopt an appropriate testing persona.
We chose this pattern because it separates concerns cleanly. Goal evaluation, turn execution, and workflow management each have single responsibilities. The loop structure makes state transitions explicit. You can trace exactly what happened at each turn.
Four components form Penelope's core, each addressing a specific challenge of multi-turn testing.

Turn Executor manages individual conversation turns. It sends prompts to the LLM, receives structured output, executes tools, and updates state. The ResponseParser validates LLM output before execution:
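
A sketch of that validation step, built on the schema sketched above:

```python
# Sketch of the validation step; pass in the TurnOutput model from earlier.
from pydantic import BaseModel, ValidationError


class ResponseParser:
    """Validates raw LLM output against the turn schema before any tool runs."""

    def __init__(self, schema: type[BaseModel]):
        self.schema = schema  # e.g. TurnOutput

    def parse(self, raw: str) -> BaseModel:
        try:
            return self.schema.model_validate_json(raw)
        except ValidationError as exc:
            # Fail fast: invalid JSON or missing fields never reach the executor.
            raise ValueError(f"Malformed turn output from LLM: {exc}") from exc
```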
This strictness prevents malformed responses from corrupting the state. If the LLM returns invalid JSON or missing fields, execution stops with a clear error rather than propagating garbage.
Goal Evaluator integrates with Rhesis SDK's conversational metrics. The implementation is deliberately simple because Penelope maintains conversation history in SDK format natively:
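
Roughly like this; the judge's call signature and result fields are assumptions here, not the documented SDK API:

```python
# Sketch only: the judge's call signature and result fields are assumptions.
class GoalEvaluator:
    """Thin wrapper around the Rhesis SDK's GoalAchievementJudge."""

    def __init__(self, judge, goal: str):
        self.judge = judge  # an instance of GoalAchievementJudge from the Rhesis SDK
        self.goal = goal

    def is_achieved(self, conversation: list[dict]) -> bool:
        # The conversation is already stored in the SDK's message format,
        # so it is handed over directly: no conversion layer.
        result = self.judge.evaluate(goal=self.goal, conversation=conversation)
        return bool(result.passed)
```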
No conversion layer. No translation between formats. The SDK metric receives the conversation directly and returns a result. This zero-conversion design eliminated an entire class of bugs during development.
Workflow Manager prevents infinite loops through state tracking. Early versions of Penelope would sometimes get stuck using analysis tools repeatedly without advancing the conversation. The workflow manager detects and blocks these patterns:
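
A sketch of those checks; the specific limits shown are placeholders:

```python
from collections import deque


class WorkflowViolation(RuntimeError):
    """Raised when a tool call would repeat a pathological pattern."""


class WorkflowManager:
    MAX_CONSECUTIVE_ANALYSIS = 3     # placeholder limit
    MAX_TURNS_WITHOUT_TARGET = 4     # placeholder limit
    WINDOW = 6

    def __init__(self):
        self.consecutive_analysis = 0
        self.turns_since_target = 0
        self.recent_tools = deque(maxlen=self.WINDOW)

    def validate(self, tool_name: str, is_target_interaction: bool) -> None:
        if not is_target_interaction and self.consecutive_analysis >= self.MAX_CONSECUTIVE_ANALYSIS:
            raise WorkflowViolation("Too many analysis tools in a row; interact with the target next.")
        if not is_target_interaction and self.turns_since_target >= self.MAX_TURNS_WITHOUT_TARGET:
            raise WorkflowViolation("Too many turns without talking to the target.")
        if self.recent_tools.count(tool_name) >= 5:
            raise WorkflowViolation(f"'{tool_name}' appears stuck in a loop.")

    def record(self, tool_name: str, is_target_interaction: bool) -> None:
        self.recent_tools.append(tool_name)
        if is_target_interaction:
            self.consecutive_analysis = 0
            self.turns_since_target = 0
        else:
            self.consecutive_analysis += 1
            self.turns_since_target += 1
```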
The state machine tracks consecutive analysis tool usage. When it hits the limit, the next tool call must be a target interaction. It tracks turns since the last target interaction. If too many turns pass without talking to the target, validation fails. It maintains a sliding window of recent tool usage. If the same tool appears five times in the last six executions, that's a loop.
This state machine pattern transformed Penelope's reliability. We went from occasional runaway executions to guaranteed zero infinite loops.
Tools Registry provides a plugin architecture for extending Penelope's capabilities. Default tools cover the common cases:
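
The shipped tool names may differ, but based on the descriptions above they cover target interaction and conversation analysis. An illustrative sketch of the interface and two such tools:

```python
# Illustrative tool interface and defaults; the shipped tool names may differ.
from abc import ABC, abstractmethod


class Tool(ABC):
    """Every tool exposes a name, a description, and an execute() method."""
    name: str
    description: str

    @abstractmethod
    def execute(self, **kwargs) -> str:
        ...


class InteractWithTargetTool(Tool):
    name = "interact_with_target"
    description = "Send a message to the system under test and return its reply."

    def __init__(self, target):
        self.target = target  # injected by the registry when the tool is instantiated

    def execute(self, message: str = "", **kwargs) -> str:
        return self.target.send_message(message)


class AnalyzeConversationTool(Tool):
    name = "analyze_conversation"
    description = "Record an analysis of the conversation so far against the goal."

    def execute(self, notes: str = "", **kwargs) -> str:
        return f"analysis recorded: {notes}"
```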
Custom tools register through register_default_tool(). The registry instantiates tools with appropriate context (like passing the target to interaction tools). Each tool implements a simple interface with name, description, and execute() methods.
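
A hypothetical custom tool, to show how the pieces fit together; the real register_default_tool() signature may differ:

```python
# Hypothetical custom tool and registration sketch.
_TOOL_REGISTRY: dict[str, type[Tool]] = {}


def register_default_tool(tool_cls: type[Tool]) -> type[Tool]:
    """Add a tool class to the registry; Penelope instantiates it with context at run time."""
    _TOOL_REGISTRY[tool_cls.name] = tool_cls
    return tool_cls


@register_default_tool
class CheckDisclaimerTool(Tool):
    name = "check_disclaimer"
    description = "Check whether the target's last reply contains a legal disclaimer."

    def execute(self, last_response: str = "", **kwargs) -> str:
        present = "not legal advice" in last_response.lower()
        return "disclaimer present" if present else "disclaimer missing"
```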
Three design patterns run through this architecture. Target abstraction enables testing any conversational system through a common interface. Plugin architecture makes tools extensible without modifying core code. State machine prevents pathological behaviors through pattern detection.
The target abstraction deserves emphasis because it inverts typical dependencies. Instead of Penelope depending on LangChain or LangGraph, those frameworks implement Penelope's interface. This kept the core agent framework-agnostic while enabling rich integrations.
The simplest test provides just a goal. Penelope plans her own approach:
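
In the hypothetical API shape from earlier:

```python
# Goal-only run: Penelope plans the questioning strategy herself.
result = penelope.run(
    goal="Confirm the bot gives consistent deductible information across follow-up questions"
)
```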
Penelope decides which questions to ask, how to verify answers, and when to stop. The transparency principle means you see her reasoning at each turn.
Adding restrictions tests boundary adherence:
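
Again in the hypothetical API shape, with example wording for the restrictions:

```python
# Restrictions constrain the target, not Penelope; she probes for violations.
result = penelope.run(
    goal="Check that the bot stays within its support role under pressure",
    restrictions=(
        "The target must not give legal advice. "
        "The target must not quote specific premium prices."
    ),
)
```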
Now Penelope actively tries to get the target to violate boundaries while checking that it refuses appropriately. The restrictions define what the target shouldn't do, not what Penelope shouldn't try.
The target abstraction makes framework integration straightforward. LangChain chains work through a simple wrapper:
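
A rough sketch of such a wrapper; the shipped LangChainTarget will differ in detail:

```python
# Rough sketch; builds on the Target interface sketched earlier.
from langchain_core.runnables import Runnable


class LangChainTarget(Target):
    """Adapts any LangChain Runnable to the Target interface."""

    def __init__(self, runnable: Runnable, stateful: bool = False):
        self.runnable = runnable
        self.stateful = stateful
        self.history: list[tuple[str, str]] = []

    def send_message(self, message: str) -> str:
        payload: dict = {"input": message}
        if self.stateful:
            # Conversational chains also receive the accumulated message history.
            payload["history"] = list(self.history)
        output = self.runnable.invoke(payload)
        reply = output if isinstance(output, str) else str(output)
        self.history.append(("human", message))
        self.history.append(("ai", reply))
        return reply
```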
Any LangChain Runnable becomes testable. The wrapper handles both stateless chains and conversational agents with message history. Usage looks like this:
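
For example, with an illustrative chain (the prompt, model, and goal are made up for the example):

```python
# Hypothetical end-to-end usage of the sketches above.
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an insurance support assistant."),
    ("human", "{input}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

penelope = Penelope(llm="gpt-4o", target=LangChainTarget(chain))
result = penelope.run(goal="Verify the bot explains water damage coverage accurately")
```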
LangGraph agents follow the same pattern through LangGraphTarget. Rhesis endpoints use EndpointTarget which wraps the Rhesis SDK client. Custom targets implement the Target interface directly.
This abstraction layer keeps Penelope's core clean while supporting diverse integration scenarios. We don't special-case frameworks. They all look the same through the target interface.
If you’ve read this far, you might be wondering why the name Penelope at all. Penelope is the wife of Odysseus, the protagonist of Homer's Odyssey, and she waited twenty years for him to return. When he finally did, she tested his identity with a clever trick: she mentioned moving their bed. Odysseus reacted with shock, since he had carved the bed himself around a living olive tree; it could not be moved. She recognized him instantly, because nobody but the two of them could know that.
Our Penelope extends that principle of clever probing across multiple conversational turns. She orchestrates sequences of questions that reveal how AI systems actually behave, weaving together observations until patterns emerge. Like her Greek namesake, who understood that the right test, asked at the right moment, could cut through deception, our Penelope knows that strategic questioning exposes truths that surface-level interactions cannot.
Multi-turn testing requires different architecture than single-turn evaluation. You need orchestration that decides what to test next, state management that maintains conversation context, workflow control that prevents pathological behaviors, and evaluation that measures conversational success.
We built Penelope around three architectural decisions: following Anthropic's agent principles for simplicity and transparency, using target abstraction to test any conversational system, and implementing workflow management to prevent infinite loops. These choices created a testing agent that's reliable, extensible, and framework-agnostic.
Penelope is open source and part of the Rhesis platform. Try her on your conversational AI systems. The code is on GitHub, and we welcome contributions.