Moving machine learning model evaluations from development to production means configuring your runtime to talk to tightly protected endpoints. This practical guide shows you how to connect your EvalHub runtime to internal or external model servers using service account tokens, API keys, or custom certificates.
Series note
This is part 9 in a series covering how to build a scalable, reproducible AI evaluation infrastructure using the EvalHub project and Red Hat AI. Catch up on the other parts in the series:
- Part 1: How EvalHub manages two-layer Kubernetes control planes
- Part 2: EvalHub: Because "looks good to me" isn't a benchmark
- Part 3: Evaluation-driven development with EvalHub
- Part 4: Understanding evaluation collections in EvalHub
- Part 5: Bring your own evaluation framework to EvalHub
- Part 6: Add automated AI evaluations to your CI/CD pipeline
- Part 7: Store immutable AI evaluation records with EvalHub and OCI
- Part 8: Manage LLM evaluation workloads at scale with EvalHub and Kueue
The problem
EvalHub runs each evaluation in a job pod whose runtime sends requests to your model’s inference endpoint. In development, this works without interruption because model endpoints are often open or easy to reach. In production, those same endpoints are protected: Red Hat OpenShift AI model serving can require a ServiceAccount token and SubjectAccessReview; external APIs like OpenAI may need bearer tokens; any endpoint might use TLS with custom CA certificates that aren't in the default trust bundle.
When you submit an evaluation job against a protected endpoint, you might use something like:
$ curl -X POST "$EVALHUB_URL/api/v1/evaluations/jobs" \
-H "Authorization: Bearer $USER_TOKEN" \
-H "X-Tenant: my-tenant" \
-H "Content-Type: application/json" \
-d '{
"name":"arc_easy evaluation",
"model":{
"url":"https://flan-t5-prod.apps.example.com/v1",
"name":"google/flan-t5-small"
},
"benchmarks":[
{
"id":"arc_easy",
"provider_id":"lm_evaluation_harness"
}
]
}'
The POST succeeds (the job is accepted). If the runtime cannot authenticate to the model, the failure shows up when you fetch the job:
$ curl -X GET "$EVALHUB_URL/api/v1/evaluations/jobs/$JOB_ID" \
-H "Authorization: Bearer $USER_TOKEN" \
-H "X-Tenant: my-tenant"
Response:
{
"resource": {
"id": "951d2728-3a3c-4f5d-b717-7c38d798f3e0",
"tenant": "dataplane",
"created_at": "2026-05-04T13:52:34.534558Z",
"updated_at": "2026-05-04T13:52:51.302725Z",
"owner": "system:serviceaccount:my-tenant:evalhub-evalhub-job"
},
"status": {
"state": "failed",
"message": {
"message": "Evaluation job is failed. \nBenchmark arc_easy failed with message: Evaluation failed: 401 Client Error: Unauthorized for url: https://flan-t5-prod.apps.example.com/v1/completions\n",
"message_code": "evaluation_job_updated"
},
"benchmarks": [
{
"provider_id": "lm_evaluation_harness",
"id": "arc_easy",
"benchmark_index": 0,
"status": "failed",
"error_message": {
"message": "Evaluation failed: 401 Client Error: Unauthorized for url: https://flan-t5-prod.apps.example.com/v1/completions",
"message_code": "evaluation_failed"
}
}
]
},
"results": {
"benchmarks": [
{
"id": "arc_easy",
"provider_id": "lm_evaluation_harness",
"benchmark_index": 0
}
]
},
"name": "arc_easy evaluation",
"model": {
"url": "https://flan-t5-prod.apps.example.com/v1",
"name": "google/flan-t5-small"
},
"benchmarks": [
{
"id": "arc_easy",
"provider_id": "lm_evaluation_harness"
}
]
}
You authenticated to EvalHub successfully. The runtime still needs credentials to reach the model: API keys for external endpoints, tokens for internal models, or CA certificates for custom TLS.
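When a job fails this way, the useful detail sits under status.benchmarks in the response. The following is a minimal sketch of pulling out the per-benchmark errors. The helper name failed_benchmarks is hypothetical, and the field names are taken only from the response shown above, not from the OpenAPI specification.

```python
def failed_benchmarks(job: dict) -> list:
    """Return (benchmark id, error message) pairs for failed benchmarks.

    Field names follow the example response in this guide (an assumption).
    """
    failures = []
    for bench in job.get("status", {}).get("benchmarks", []):
        if bench.get("status") == "failed":
            message = bench.get("error_message", {}).get("message", "")
            failures.append((bench.get("id", ""), message))
    return failures


# Trimmed copy of the failed job response shown above.
sample_job = {
    "status": {
        "state": "failed",
        "benchmarks": [
            {
                "provider_id": "lm_evaluation_harness",
                "id": "arc_easy",
                "benchmark_index": 0,
                "status": "failed",
                "error_message": {
                    "message": "Evaluation failed: 401 Client Error: Unauthorized for url: "
                    "https://flan-t5-prod.apps.example.com/v1/completions",
                    "message_code": "evaluation_failed",
                },
            }
        ],
    }
}

for bench_id, message in failed_benchmarks(sample_job):
    print(f"{bench_id}: {message}")
```

A 401 in this output points at the model endpoint rather than at EvalHub itself, because the job was accepted before the runtime made its first model call.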
How you supply those credentials depends on whether your model is internal or external and on what it requires. The following patterns cover each case.
Note
In the examples throughout this guide, my-tenant refers to both the Kubernetes namespace and the X-Tenant header value.
How EvalHub handles authentication
Depending on your model's setup, EvalHub reaches it in one of three modes. The first needs no configuration at all:
- No authentication: For internal models served over plain HTTP on a trusted network, point to the cluster-local URL—no configuration needed.
- Implicit authentication: For models served by OpenShift AI, EvalHub uses the job pod's ServiceAccount token. OpenShift AI automatically creates a view Role for each InferenceService. You bind the EvalHub job ServiceAccount to that role via a RoleBinding, and EvalHub handles the rest. No secrets, no credentials; just RBAC.
- Explicit authentication: For external APIs or models requiring API keys, you create a Kubernetes secret containing the credentials. EvalHub mounts this secret into the evaluation pod, and the runtime reads it. The secret can contain an API key, a CA certificate, or both. The credentials live in Kubernetes secrets, are mounted read-only into pods, and are accessed only by the evaluation runtime, not by EvalHub's control plane.
Pattern 1: ServiceAccount tokens for internal models
This pattern applies to models deployed on OpenShift AI model serving in the same cluster as EvalHub. OpenShift AI deploys a kube-rbac-proxy sidecar alongside each model container to enforce token-based authentication—this is specific to OpenShift AI. When you create an InferenceService, OpenShift AI automatically creates a view Role scoped to that model. The Role name follows the pattern {inferenceservice-name}-view-role, and it grants access to the InferenceService resource. Requests to the protected endpoint without a valid token are rejected with HTTP 401.
EvalHub job pods run with a dedicated ServiceAccount that is internal to EvalHub, following the pattern {evalhub-name}-{evalhub-namespace}-job. For example, if EvalHub is named evalhub and deployed in the namespace evalhub, the job ServiceAccount is evalhub-evalhub-job, created in the tenant namespace.
To find the auto-created view Role for your model:
$ kubectl get roles -n my-tenant | grep view-role
my-model-view-role
Granting the EvalHub job ServiceAccount access to the model is a one-time setup task performed by the tenant admin. The admin needs to know the EvalHub job SA name (derived from the naming convention just described) and create a RoleBinding in the model's namespace:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: evalhub-job-model-access
  namespace: my-tenant
subjects:
- kind: ServiceAccount
  name: evalhub-evalhub-job
  namespace: my-tenant
roleRef:
  kind: Role
  name: my-model-view-role
  apiGroup: rbac.authorization.k8s.io
Note
This is a tenant admin responsibility. The RoleBinding must be created in the model's namespace (my-tenant). The EvalHub job ServiceAccount (evalhub-evalhub-job in this example) is also in the tenant namespace.
That's it. No secret to create, no configuration to change. When you submit an evaluation job, EvalHub provisions a pod with the job ServiceAccount, Kubernetes auto-mounts the token, and the evaluation runtime uses it to authenticate model requests.
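If you manage several models, the RoleBinding can be generated from the two naming conventions described in this section. The following is a minimal sketch under that assumption. The function names are hypothetical, and the example values (evalhub, my-model, my-tenant) are the ones used in this guide.

```python
import json


def job_service_account(evalhub_name: str, evalhub_namespace: str) -> str:
    """Job ServiceAccount name: {evalhub-name}-{evalhub-namespace}-job."""
    return f"{evalhub_name}-{evalhub_namespace}-job"


def view_role(inference_service: str) -> str:
    """Auto-created Role name: {inferenceservice-name}-view-role."""
    return f"{inference_service}-view-role"


def role_binding(binding_name: str, tenant_namespace: str,
                 service_account: str, role_name: str) -> dict:
    """Build a RoleBinding manifest equivalent to the YAML in this section."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": binding_name, "namespace": tenant_namespace},
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": service_account,
                "namespace": tenant_namespace,
            }
        ],
        "roleRef": {
            "kind": "Role",
            "name": role_name,
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


manifest = role_binding(
    "evalhub-job-model-access",
    "my-tenant",
    job_service_account("evalhub", "evalhub"),
    view_role("my-model"),
)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests as well as YAML, the printed output can be saved to a file and applied like any other manifest.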
The job configuration requires no authentication block:
{
"name": "arc_easy evaluation",
"model": {
"url": "https://my-internal-model.my-tenant.svc.cluster.local/v1",
"name": "llama-3"
},
"benchmarks": [
{
"id": "arc_easy",
"provider_id": "lm_evaluation_harness"
}
]
}
Behind the scenes, the evaluation runtime reads the ServiceAccount token from /var/run/secrets/kubernetes.io/serviceaccount/token and includes it in the Authorization: Bearer header when calling the model. The kube-rbac-proxy sidecar in the predictor pod intercepts the request, validates the token against the Kubernetes API, and verifies that the ServiceAccount is authorized before forwarding the request to the model.
This pattern is specific to OpenShift AI model serving, which deploys kube-rbac-proxy as part of its model serving stack. If you are using a standalone KServe deployment or a custom model server, the authentication mechanism might differ.
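To make the token flow concrete, here is a minimal sketch of turning the mounted token file into a request header. It illustrates the mechanism described in this section and is not EvalHub's actual runtime code. The function name service_account_headers is hypothetical, and a temporary file stands in for the mounted token so the example runs outside a cluster.

```python
import tempfile
from pathlib import Path

# Standard in-pod location of the auto-mounted ServiceAccount token.
SA_TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def service_account_headers(token_path: Path = SA_TOKEN_PATH) -> dict:
    """Build an Authorization header from a mounted ServiceAccount token.

    Hypothetical helper: shows the flow, not EvalHub's real implementation.
    """
    token = token_path.read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"ServiceAccount token at {token_path} is empty")
    return {"Authorization": f"Bearer {token}"}


# Demo outside a cluster: a temporary file stands in for the mounted token.
with tempfile.TemporaryDirectory() as tmp:
    fake_token = Path(tmp) / "token"
    fake_token.write_text("example-token\n", encoding="utf-8")
    demo_headers = service_account_headers(fake_token)
    print(demo_headers)
```

Reading the file on each call, rather than caching the value, is a deliberate choice in this sketch: Kubernetes can rotate projected ServiceAccount tokens while a pod is running.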
Pattern 2: API keys for external models
External APIs—OpenAI, Azure OpenAI, or self-hosted vLLM instances with --api-key—require explicit credentials. To protect these credentials, create a Kubernetes secret in the tenant namespace:
$ kubectl create secret generic openai-credentials \
--from-literal=api-key="sk-..." \
-n my-tenant
The secret key must be named api-key (hyphen, not underscore). This is the contract between EvalHub and the evaluation runtime: if a secret is mounted, the runtime looks for a field called api-key and uses it as the bearer token.
The job configuration references the secret by name:
{
"name":"hellaswag evaluation",
"model":{
"url":"https://api.openai.com/v1",
"name":"gpt-4-turbo",
"auth":{
"secret_ref":"openai-credentials"
}
},
"benchmarks":[
{
"id":"hellaswag",
"provider_id":"lm_evaluation_harness"
}
]
}
When EvalHub schedules the job, it mounts the secret read-only at /var/run/secrets/model/. The evaluation runtime reads /var/run/secrets/model/api-key and passes it as a bearer token in the Authorization header when calling the model.
Pattern 3: Combining API keys and custom CA certificates
A self-hosted vLLM deployment behind an HTTPS route might present a certificate signed by a private or corporate CA, which standard TLS clients will reject.
The fix is to provide the CA certificate to the evaluation runtime. You add it to the same Kubernetes secret, using the key name ca_cert (underscore, not hyphen):
$ kubectl create secret generic vllm-credentials \
--from-literal=api-key="my-secret-key" \
--from-file=ca_cert=./internal-ca.crt \
-n my-tenant
A single secret can hold both the API key and the CA certificate. EvalHub mounts the entire secret, and the runtime uses whichever fields are present. If only the api-key exists, TLS verification uses the system CA bundle. If only ca_cert exists, the runtime uses it for TLS verification without authentication. If both exist, both are used.
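The three cases just described can be sketched as a small function that maps the mounted files to HTTP client options. This is an illustration of the rule, not the runtime's real code. The name request_options is hypothetical, and the verify value follows the convention of the Python requests library, where True means the system CA bundle and a string is a path to a CA file.

```python
import tempfile
from pathlib import Path

# Mount point used by EvalHub for the model secret, as described in this guide.
SECRET_DIR = Path("/var/run/secrets/model")


def request_options(secret_dir: Path = SECRET_DIR) -> dict:
    """Map mounted secret fields to HTTP client options (hypothetical helper).

    - api-key present -> Authorization: Bearer header
    - ca_cert present -> verify TLS against that file
    - ca_cert absent  -> verify=True (system CA bundle)
    """
    options = {"headers": {}, "verify": True}
    api_key_file = secret_dir / "api-key"
    ca_cert_file = secret_dir / "ca_cert"
    if api_key_file.is_file():
        api_key = api_key_file.read_text(encoding="utf-8").strip()
        options["headers"]["Authorization"] = f"Bearer {api_key}"
    if ca_cert_file.is_file():
        options["verify"] = str(ca_cert_file)
    return options


# Demo with a temporary directory standing in for the mounted secret.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "api-key").write_text("my-secret-key", encoding="utf-8")
    key_only = request_options(demo_dir)

    (demo_dir / "ca_cert").write_text("placeholder PEM", encoding="utf-8")
    both = request_options(demo_dir)

    (demo_dir / "api-key").unlink()
    cert_only = request_options(demo_dir)

print(key_only)
print(both)
print(cert_only)
```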
The job configuration references the secret the same way, even though it now contains both credentials:
{
"name":"hellaswag evaluation",
"model":{
"url":"https://vllm-secure.internal.corp/v1",
"name":"mistral-7b",
"auth":{
"secret_ref":"vllm-credentials"
}
},
"benchmarks":[
{
"id":"hellaswag",
"provider_id":"lm_evaluation_harness"
}
]
}
The runtime detects the ca_cert field and passes its path to the HTTP client as the verify parameter. For example, with the provider lm-evaluation-harness, this becomes the verify_certificate model argument:
lm_eval.evaluator - INFO - Initializing local-completions model, with arguments: {'model':....., 'verify_certificate': '/var/run/secrets/model/ca_cert'}
Putting it all together: A real-world scenario
Imagine you're evaluating three models:
- Internal Llama model on OpenShift AI model serving (ServiceAccount token)
- OpenAI GPT-4 (API key)
- Self-hosted Mistral with custom TLS and API key (both)
You create two secrets (the internal model needs none):
# OpenAI
$ kubectl create secret generic openai-key \
--from-literal=api-key=sk-proj-... \
-n my-tenant
# Self-hosted Mistral
$ kubectl create secret generic mistral-auth \
--from-literal=api-key=mistral-secret-key \
--from-file=ca_cert=/path/to/mistral-ca.pem \
-n my-tenant
You grant the job ServiceAccount access to the internal model:
$ kubectl create rolebinding evalhub-internal-access \
--role=my-model-view-role \
--serviceaccount=my-tenant:evalhub-evalhub-job \
-n my-tenant
You submit three evaluation jobs. Each job uses the same REST API endpoint and headers; only the model block changes.
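Because only the model block differs, the three request bodies can be produced by one helper. The following is a minimal sketch. The function name build_job is hypothetical, and the payload shape is copied from the job examples in this guide rather than from the OpenAPI specification. The curl commands that follow send the same bodies.

```python
import json


def build_job(name, url, model_name, benchmark_id,
              secret_ref=None, provider_id="lm_evaluation_harness"):
    """Build an evaluation job body; add model.auth only when a secret is given."""
    model = {"url": url, "name": model_name}
    if secret_ref is not None:
        # Explicit authentication: reference the Kubernetes secret by name.
        model["auth"] = {"secret_ref": secret_ref}
    return {
        "name": name,
        "model": model,
        "benchmarks": [{"id": benchmark_id, "provider_id": provider_id}],
    }


jobs = [
    # Internal model: no auth block, the ServiceAccount token is used.
    build_job("Llama evaluation",
              "https://my-internal-model.my-tenant.svc.cluster.local/v1",
              "llama-3", "arc_easy"),
    # External API: secret holds api-key.
    build_job("GPT-4 evaluation", "https://api.openai.com/v1",
              "gpt-4", "arc_easy", secret_ref="openai-key"),
    # Self-hosted model: secret holds api-key and ca_cert.
    build_job("Mistral evaluation", "https://mistral-internal.company.com/v1",
              "mistral-7b", "arc_easy", secret_ref="mistral-auth"),
]

for job in jobs:
    print(json.dumps(job))
```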
Internal Llama (ServiceAccount token):
$ curl -X POST "$EVALHUB_URL/api/v1/evaluations/jobs" \
-H "Authorization: Bearer $USER_TOKEN" \
-H "X-Tenant: my-tenant" \
-H "Content-Type: application/json" \
-d '{
"name":"Llama evaluation",
"model":{
"url":"https://my-internal-model.my-tenant.svc.cluster.local/v1",
"name":"llama-3"
},
"benchmarks":[
{
"id":"arc_easy",
"provider_id":"lm_evaluation_harness"
}
]
}'
OpenAI GPT-4 (API key):
$ curl -X POST "$EVALHUB_URL/api/v1/evaluations/jobs" \
-H "Authorization: Bearer $USER_TOKEN" \
-H "X-Tenant: my-tenant" \
-H "Content-Type: application/json" \
-d '{
"name":"GPT-4 evaluation",
"model":{
"url":"https://api.openai.com/v1",
"name":"gpt-4",
"auth":{
"secret_ref":"openai-key"
}
},
"benchmarks":[
{
"id":"arc_easy",
"provider_id":"lm_evaluation_harness"
}
]
}'
Self-hosted Mistral (API key and CA certificate):
$ curl -X POST "$EVALHUB_URL/api/v1/evaluations/jobs" \
-H "Authorization: Bearer $USER_TOKEN" \
-H "X-Tenant: my-tenant" \
-H "Content-Type: application/json" \
-d '{
"name":"Mistral evaluation",
"model":{
"url":"https://mistral-internal.company.com/v1",
"name":"mistral-7b",
"auth":{
"secret_ref":"mistral-auth"
}
},
"benchmarks":[
{
"id":"arc_easy",
"provider_id":"lm_evaluation_harness"
}
]
}'
Check the results:
$ curl -X GET "$EVALHUB_URL/api/v1/evaluations/jobs/$JOB_ID" \
-H "Authorization: Bearer $USER_TOKEN" \
-H "X-Tenant: my-tenant"
Response snippet:
{
"status": {"state": "completed"},
"results": {
"benchmarks": [{
"id": "arc_easy",
"provider_id": "lm_evaluation_harness",
"metrics": {
"acc": 0.5063131313131313,
"acc_norm": 0.4831649831649832,
"acc_norm_stderr": 0.010253966261288895,
"acc_stderr": 0.01025896566804444
}
}]
}
}
All three jobs run successfully, each using the appropriate authentication method.
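To compare the three models side by side, you might reduce each completed response to one line per metric. The following is a minimal sketch. The helper name summarize is hypothetical, and it assumes every metric's standard error is stored under the same name with a _stderr suffix, as in the snippet above.

```python
def summarize(job: dict) -> list:
    """Format each metric as 'benchmark metric: value +/- stderr'."""
    lines = []
    for bench in job.get("results", {}).get("benchmarks", []):
        metrics = bench.get("metrics", {})
        for name in sorted(metrics):
            if name.endswith("_stderr"):
                continue  # reported alongside its parent metric
            stderr = metrics.get(f"{name}_stderr")
            suffix = f" +/- {stderr:.4f}" if stderr is not None else ""
            lines.append(f"{bench.get('id')} {name}: {metrics[name]:.4f}{suffix}")
    return lines


# The response snippet shown above.
completed_job = {
    "status": {"state": "completed"},
    "results": {
        "benchmarks": [{
            "id": "arc_easy",
            "provider_id": "lm_evaluation_harness",
            "metrics": {
                "acc": 0.5063131313131313,
                "acc_norm": 0.4831649831649832,
                "acc_norm_stderr": 0.010253966261288895,
                "acc_stderr": 0.01025896566804444,
            },
        }]
    },
}

for line in summarize(completed_job):
    print(line)
```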
Troubleshooting: When authentication fails
If your deployment does not connect properly, use these common error messages and solutions to help locate the issue.
Error: "Unauthorized" or HTTP 401
This means the credentials weren't accepted by the model. Common causes:
The secret doesn't exist in the tenant namespace. Verify:
$ kubectl get secret <name> -n <namespace>
The secret key is misnamed. It must be api-key (hyphen), not apiKey or api_key. Check:
$ kubectl get secret <name> -n <namespace> -o yaml
The API key is incorrect. Test manually:
$ curl -H "Authorization: Bearer $(kubectl get secret <name> -n <namespace> -o jsonpath='{.data.api-key}' | base64 -d)" <model-url>/v1/models
For ServiceAccount token authentication, verify the RoleBinding exists and targets the correct ServiceAccount:
$ kubectl get rolebinding evalhub-job-model-access -n my-tenant
Error: "SSL certificate verification failed"
The model's certificate is self-signed or signed by a private CA, and the runtime has no CA certificate to verify it against. Add the CA certificate to your secret:
$ kubectl patch secret <name> -n <namespace> \
--type=merge -p="{\"data\":{\"ca_cert\":\"$(base64 -w0 ./ca.crt)\"}}"
Error: "Secret not found"
The secret must exist in the tenant namespace, not the EvalHub namespace. If you deployed EvalHub in the namespace evalhub but are submitting jobs as tenant my-tenant, the secret must be in my-tenant.
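Several of the failures in this section come down to a misnamed key, which is easy to catch before submitting a job. The following is a minimal sketch of such a check. The helper name check_secret_keys is hypothetical, and the list of key names is assumed to come from the data field of the secret (for example, from the YAML output shown earlier in this section).

```python
# The two key names the evaluation runtime looks for, per this guide.
EXPECTED_KEYS = {"api-key", "ca_cert"}


def _normalize(key: str) -> str:
    """Lowercase and drop separators so apiKey, api_key, api-key compare equal."""
    return key.lower().replace("-", "").replace("_", "")


def check_secret_keys(keys) -> list:
    """Return warnings for keys that look like misspellings of expected names."""
    keys = list(keys)
    warnings = []
    for key in keys:
        if key in EXPECTED_KEYS:
            continue
        for expected in sorted(EXPECTED_KEYS):
            if _normalize(key) == _normalize(expected):
                warnings.append(f"'{key}' should be '{expected}'")
    if not EXPECTED_KEYS & set(keys):
        warnings.append("no 'api-key' or 'ca_cert' key found")
    return warnings


print(check_secret_keys(["api-key", "ca_cert"]))
print(check_secret_keys(["api_key", "ca-cert"]))
```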
With authentication configured, EvalHub can evaluate any protected model without exposing credentials. Set up the secrets once, reference them in your job configuration, and the runtime handles the rest.
Start here
- EvalHub GitHub page
- EvalHub source code
- EvalHub SDK (evalhub CLI, REST client, BYOF adapter, MCP server, OCI persistence)
- OpenAPI specification
- TrustyAI Operator (for Kubernetes/OpenShift deployment)