William Caban Babilonia's contributions
Article
Understanding evaluation collections in EvalHub
William Caban Babilonia
+2
Learn how to read an existing system collection, understand its threshold logic, and build your own collection that encodes your actual measurement strategy with thresholds that mean something.
Article
Evaluation-driven development with EvalHub
William Caban Babilonia
+1
Learn how evaluation-driven development (EDD) turns AI optimization from an art into an engineering discipline with EvalHub.
Article
EvalHub: Because "looks good to me" isn't a benchmark
William Caban Babilonia
+1
Learn about the five primary structural challenges in enterprise AI evaluation and how EvalHub addresses them with a unified foundation for AI evaluation.
Article
How EvalHub manages two-layer Kubernetes control planes
William Caban Babilonia
+4
Learn how Red Hat AI 3.4 uses EvalHub to orchestrate AI evaluations on Kubernetes. Scale frameworks like Garak and LightEval with built-in MLflow tracking.
Article
Synthetic data for RAG evaluation: Why your RAG system needs better testing
Aditi Saluja
+2
Build better RAG systems with SDG Hub. Generate high-quality question-answer-context triplets to benchmark retrievers and track LLM performance over time.