Automatically run the same prompt against multiple AI providers (OpenAI, Anthropic, Amazon Bedrock) and generate a comparison report that helps teams choose the best model for a specific task.
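
The idea can be sketched in a few lines of Python. This is a minimal illustration, not the actual implementation: it assumes each provider is wrapped in a hypothetical adapter function that takes a prompt and returns text. The three stubs stand in for real OpenAI, Anthropic, and Bedrock SDK calls, and the report columns (latency, output length, status) are illustrative choices that the description above does not specify.

```python
import time
from typing import Callable, Dict, List, Optional

# Assumption: every provider is hidden behind the same hypothetical signature,
# prompt in, text out. Real adapters would call the vendor SDKs here.
ProviderFn = Callable[[str], str]


def stub_openai(prompt: str) -> str:
    return f"[openai stub] {prompt}"


def stub_anthropic(prompt: str) -> str:
    return f"[anthropic stub] {prompt}"


def stub_bedrock(prompt: str) -> str:
    return f"[bedrock stub] {prompt}"


# Hypothetical registry; the names are labels for the report only.
STUB_PROVIDERS: Dict[str, ProviderFn] = {
    "openai": stub_openai,
    "anthropic": stub_anthropic,
    "bedrock": stub_bedrock,
}


def compare_providers(prompt: str, providers: Dict[str, ProviderFn]) -> List[dict]:
    """Send one prompt to every provider and record output, latency, and errors."""
    results = []
    for name, call in providers.items():
        output: str = ""
        error: Optional[str] = None
        start = time.perf_counter()
        try:
            output = call(prompt)
        except Exception as exc:  # one failing provider should not abort the run
            error = str(exc)
        latency = time.perf_counter() - start
        results.append(
            {
                "provider": name,
                "latency_s": latency,
                "chars": len(output),
                "output": output,
                "error": error,
            }
        )
    return results


def render_report(prompt: str, results: List[dict]) -> str:
    """Format the collected results as a plain-text comparison table."""
    lines = [
        f"Prompt: {prompt}",
        "",
        "| Provider | Latency (s) | Output chars | Status |",
        "|---|---|---|---|",
    ]
    for r in results:
        status = "ok" if r["error"] is None else f"error: {r['error']}"
        lines.append(
            f"| {r['provider']} | {r['latency_s']:.3f} | {r['chars']} | {status} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = "Summarize the benefits of unit testing in one sentence."
    print(render_report(prompt, compare_providers(prompt, STUB_PROVIDERS)))
```

Keeping every provider behind one function signature means the comparison loop and the report never need provider-specific logic. Swapping a stub for a real SDK call, or adding a quality score column, changes only the adapter or the report formatter.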