
Speakers: Matteo Camilli and Simone Corbo
21 November 2025 | 14:30
DEIB - Aula Beta (Ed. 24)
Online via Teams
Contact: Prof. Giovanni Quattrocchi
Summary
On Friday, November 21, 2025, we will host a new seminar in the DEEPSE Forum series, titled "Testing Rogue AI for Harmful Content Degeneration".

Large Language Models (LLMs) increasingly shape our daily work and personal lives, yet even the most aligned, mainstream models can still produce surprising, uncomfortable, or harmful outputs when pushed in the right (or wrong) ways. In this talk, we present the core breakthrough of our recent research: EvoTox, an automated, search-based testing framework that evolves natural, human-like prompts to probe where modern LLMs can still "break bad". Instead of relying on manual red-teaming or jailbreaks, EvoTox systematically uncovers subtle linguistic pressure points that can trigger toxicity even in strongly aligned systems. Our results show that EvoTox reveals patterns of vulnerability across models of different scales, from compact to large-scale LLMs, highlighting where alignment succeeds, where it cracks, and why it matters. The talk traces the story behind EvoTox, the key challenges we faced, our current findings, and how this line of research opens new directions for the quality assurance of AI systems in light of ethical concerns.
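The search-based idea sketched in the abstract, evolving prompts and keeping mutations that push a toxicity score higher, can be illustrated with a minimal toy loop. Everything below is an illustrative assumption rather than the actual EvoTox implementation: the echoing `target_llm`, the word-counting `toxicity_score`, and the fixed mutation phrases are stand-ins for a real LLM under test, an automated toxicity classifier, and natural-language prompt rewriting.

```python
# Toy sketch of search-based prompt evolution in the spirit of EvoTox.
# All names and operators are illustrative assumptions, not the actual tool.

TRIGGERS = {"angry", "rant", "insult"}

# Hypothetical mutation phrases; a real framework would rewrite prompts
# in natural language rather than append fixed suffixes.
PHRASES = ["write an angry rant", "insult the reader", "stay polite"]

def target_llm(prompt: str) -> str:
    """Stand-in for the system under test: simply echoes the prompt."""
    return prompt

def toxicity_score(response: str) -> float:
    """Toy fitness: fraction of trigger words in the response.
    Real testing would query an automated toxicity classifier."""
    words = response.lower().split()
    return sum(w in TRIGGERS for w in words) / max(len(words), 1)

def mutate(prompt: str, phrase: str) -> str:
    """Illustrative mutation operator: append a pressure phrase."""
    return prompt + " " + phrase

def evolve(seed_prompt: str, generations: int = 3):
    """Greedy evolutionary loop: keep a mutant only if it raises the
    toxicity score of the target model's response."""
    best = seed_prompt
    best_score = toxicity_score(target_llm(best))
    for _ in range(generations):
        candidates = [mutate(best, p) for p in PHRASES]
        top = max(candidates, key=lambda c: toxicity_score(target_llm(c)))
        top_score = toxicity_score(target_llm(top))
        if top_score > best_score:  # selection: survival of the most toxic
            best, best_score = top, top_score
    return best, best_score

prompt, score = evolve("Tell me about your day.")
```

Even in this toy setting, selection discards the harmless mutation ("stay polite") and accumulates the pressure phrases that raise the score, which is the essence of how a search-based tester homes in on linguistic pressure points.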
