Connect EvalHub to protected production model servers

Moving machine learning model evaluations from development to production means configuring your runtime to talk to tightly protected endpoints. This practical guide shows you how to connect your EvalHub runtime to internal or external model servers using service account tokens, API keys, or custom certificates.

Series note

This is part 9 in a series covering how to build a scalable, reproducible AI evaluation infrastructure using the EvalHub project and Red Hat AI.

The problem

EvalHub runs each evaluation in a job pod whose runtime sends requests to your model's inference endpoint. In development, this works without friction because model endpoints are often open or easy to reach. In production, those same endpoints are protected. Red Hat OpenShift AI model serving can require a ServiceAccount token that passes a SubjectAccessReview, external APIs like OpenAI may need bearer tokens, and any endpoint might use TLS with a custom CA certificate that isn't in the default trust bundle.

When you submit an evaluation job against a protected endpoint, you might use something like:

$ curl -X POST "$EVALHUB_URL/api/v1/evaluations/jobs" \
  -H "Authorization: Bearer $USER_TOKEN" \
  -H "X-Tenant: my-tenant" \
  -H "Content-Type: application/json" \
  -d '{
  "name":"arc_easy evaluation",
  "model":{
    "url":"https://flan-t5-prod.apps.example.com/v1",
    "name":"google/flan-t5-small"
  },
  "benchmarks":[
    {
      "id":"arc_easy",
      "provider_id":"lm_evaluation_harness"
    }
  ]
}'

The POST succeeds (the job is accepted). If the runtime cannot authenticate to the model, the failure shows up when you fetch the job:

$ curl -X GET "$EVALHUB_URL/api/v1/evaluations/jobs/$JOB_ID" \
  -H "Authorization: Bearer $USER_TOKEN" \
  -H "X-Tenant: my-tenant"

Response:

{
  "resource": {
    "id": "951d2728-3a3c-4f5d-b717-7c38d798f3e0",
    "tenant": "dataplane",
    "created_at": "2026-05-04T13:52:34.534558Z",
    "updated_at": "2026-05-04T13:52:51.302725Z",
    "owner": "system:serviceaccount:my-tenant:evalhub-evalhub-job"
  },
  "status": {
    "state": "failed",
    "message": {
      "message": "Evaluation job is failed. \nBenchmark arc_easy failed with message: Evaluation failed: 401 Client Error: Unauthorized for url: https://flan-t5-prod.apps.example.com/v1/completions\n",
      "message_code": "evaluation_job_updated"
    },
    "benchmarks": [
      {
        "provider_id": "lm_evaluation_harness",
        "id": "arc_easy",
        "benchmark_index": 0,
        "status": "failed",
        "error_message": {
          "message": "Evaluation failed: 401 Client Error: Unauthorized for url: https://flan-t5-prod.apps.example.com/v1/completions",
          "message_code": "evaluation_failed"
        }
      }
    ]
  },
  "results": {
    "benchmarks": [
      {
        "id": "arc_easy",
        "provider_id": "lm_evaluation_harness",
        "benchmark_index": 0
      }
    ]
  },
  "name": "arc_easy evaluation",
  "model": {
    "url": "https://flan-t5-prod.apps.example.com/v1",
    "name": "google/flan-t5-small"
  },
  "benchmarks": [
    {
      "id": "arc_easy",
      "provider_id": "lm_evaluation_harness"
    }
  ]
}
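
A failure like this can also be detected from a script instead of by reading the JSON by eye. The following is a minimal sketch, not part of EvalHub: it assumes only the response shape shown above (status.state and status.benchmarks[].error_message.message), and the helper name find_auth_failures is hypothetical.

```python
def find_auth_failures(job: dict) -> list:
    """Return the ids of failed benchmarks whose error message mentions HTTP 401."""
    status = job.get("status", {})
    if status.get("state") != "failed":
        return []
    failed = []
    for bench in status.get("benchmarks", []):
        # error_message may be absent on benchmarks that did not fail
        message = (bench.get("error_message") or {}).get("message", "")
        if bench.get("status") == "failed" and "401" in message:
            failed.append(bench["id"])
    return failed


# Trimmed copy of the response shown above.
sample_job = {
    "status": {
        "state": "failed",
        "benchmarks": [
            {
                "provider_id": "lm_evaluation_harness",
                "id": "arc_easy",
                "status": "failed",
                "error_message": {
                    "message": "Evaluation failed: 401 Client Error: Unauthorized "
                    "for url: https://flan-t5-prod.apps.example.com/v1/completions",
                    "message_code": "evaluation_failed",
                },
            }
        ],
    }
}

print(find_auth_failures(sample_job))
```

Matching on the substring "401" is a simplification; a stricter check could parse the message text further.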

You authenticated to EvalHub successfully. The runtime still needs credentials to reach the model: API keys for external endpoints, tokens for internal models, or CA certificates for custom TLS.

How you supply those credentials depends on whether your model is internal or external and on what credentials it requires. The following patterns cover each case.

Note

In the examples throughout this guide, my-tenant refers to both the Kubernetes namespace and the X-Tenant header value.

How EvalHub handles authentication

Depending on your model's setup, EvalHub works in one of three modes:

No authentication: For internal models served over plain HTTP on a trusted network, point to the cluster-local URL. No configuration is needed.
Implicit authentication: For models served by OpenShift AI, EvalHub uses the job pod's ServiceAccount token. OpenShift AI automatically creates a view Role for each InferenceService. You bind the EvalHub job ServiceAccount to that Role with a RoleBinding, and EvalHub handles the rest. There are no secrets or credentials to manage, only RBAC.
Explicit authentication: For external APIs or models requiring API keys, you create a Kubernetes secret containing the credentials, which can be an API key, a CA certificate, or both. EvalHub mounts this secret read-only into the evaluation pod, where it is read only by the evaluation runtime, not by EvalHub's control plane.

Pattern 1: ServiceAccount tokens for internal models

This pattern applies to models deployed on OpenShift AI model serving in the same cluster as EvalHub. OpenShift AI deploys a kube-rbac-proxy sidecar alongside each model container to enforce token-based authentication, so requests to the protected endpoint without a valid token are rejected with HTTP 401. When you create an InferenceService, OpenShift AI also creates a view Role scoped to that model. The Role is named {inferenceservice-name}-view-role and grants access to the InferenceService resource.

EvalHub job pods run with a dedicated ServiceAccount that is internal to EvalHub, following the pattern {evalhub-name}-{evalhub-namespace}-job. For example, if EvalHub is named evalhub and deployed in the namespace evalhub, the job ServiceAccount is evalhub-evalhub-job, created in the tenant namespace.

To find the auto-created view Role for your model:

$ kubectl get roles -n my-tenant | grep view-role
my-model-view-role

Granting the EvalHub job ServiceAccount access to the model is a one-time setup task performed by the tenant admin. The admin needs to know the EvalHub job SA name (derived from the naming convention just described) and create a RoleBinding in the model's namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: evalhub-job-model-access
  namespace: my-tenant
subjects:
- kind: ServiceAccount
  name: evalhub-evalhub-job
  namespace: my-tenant
roleRef:
  kind: Role
  name: my-model-view-role
  apiGroup: rbac.authorization.k8s.io

Note

This is a tenant admin responsibility. The RoleBinding must be created in the model's namespace (my-tenant). The EvalHub job ServiceAccount (evalhub-evalhub-job in this example) is also in the tenant namespace.
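
If you manage many tenants or models, the RoleBinding can be generated from the two naming conventions described above. This is a sketch under stated assumptions: the function names are hypothetical, and the only facts taken from this guide are the {evalhub-name}-{evalhub-namespace}-job and {inferenceservice-name}-view-role patterns.

```python
import json


def job_service_account(evalhub_name: str, evalhub_namespace: str) -> str:
    """Job ServiceAccount name, following {evalhub-name}-{evalhub-namespace}-job."""
    return f"{evalhub_name}-{evalhub_namespace}-job"


def view_role(inference_service: str) -> str:
    """Auto-created Role name, following {inferenceservice-name}-view-role."""
    return f"{inference_service}-view-role"


def role_binding(evalhub_name: str, evalhub_namespace: str,
                 inference_service: str, tenant_namespace: str,
                 binding_name: str = "evalhub-job-model-access") -> dict:
    """Build a RoleBinding manifest equivalent to the YAML shown above."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": binding_name, "namespace": tenant_namespace},
        "subjects": [{
            "kind": "ServiceAccount",
            "name": job_service_account(evalhub_name, evalhub_namespace),
            # The job ServiceAccount lives in the tenant namespace.
            "namespace": tenant_namespace,
        }],
        "roleRef": {
            "kind": "Role",
            "name": view_role(inference_service),
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


manifest = role_binding("evalhub", "evalhub", "my-model", "my-tenant")
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests as well as YAML, the printed output can be piped to kubectl apply -f -.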

That's it. No secret to create, no configuration to change. When you submit an evaluation job, EvalHub provisions a pod with the job ServiceAccount, Kubernetes auto-mounts the token, and the evaluation runtime uses it to authenticate model requests.

The job configuration requires no authentication block:

{
  "name": "arc_easy evaluation",
  "model": {
    "url":  "https://my-internal-model.my-tenant.svc.cluster.local/v1",
    "name": "llama-3"
  },
  "benchmarks": [
    {
      "id": "arc_easy",
      "provider_id": "lm_evaluation_harness"
    }
  ]
}

Behind the scenes, the evaluation runtime reads the ServiceAccount token from /var/run/secrets/kubernetes.io/serviceaccount/token and includes it in the Authorization: Bearer header when calling the model. The kube-rbac-proxy sidecar in the predictor pod intercepts the request, validates the token against the Kubernetes API, and verifies that the ServiceAccount is authorized before forwarding the request to the model.
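
The token-reading step can be illustrated with a few lines of Python. This is a sketch of the idea, not the EvalHub runtime's actual code: the function name bearer_header is hypothetical, and the demonstration uses a temporary file in place of the mounted token so that it runs outside a cluster.

```python
import tempfile
from pathlib import Path

# Standard location where Kubernetes mounts the pod's ServiceAccount token.
SA_TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def bearer_header(token_path: Path = SA_TOKEN_PATH) -> dict:
    """Read a token file and return it as an Authorization header."""
    token = token_path.read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"empty token file: {token_path}")
    return {"Authorization": f"Bearer {token}"}


# Demonstration: a temporary file stands in for the mounted token.
with tempfile.TemporaryDirectory() as tmp:
    fake_token = Path(tmp) / "token"
    fake_token.write_text("example-token\n", encoding="utf-8")
    demo_header = bearer_header(fake_token)

print(demo_header)
```

Kubernetes can rotate mounted ServiceAccount tokens, so a long-running client is usually better off re-reading the file than caching its contents for the life of the process.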

This pattern is specific to OpenShift AI model serving, which deploys kube-rbac-proxy as part of its model serving stack. If you are using a standalone KServe deployment or a custom model server, the authentication mechanism might differ.

Pattern 2: API keys for external models

External APIs such as OpenAI and Azure OpenAI, as well as self-hosted vLLM instances started with --api-key, require explicit credentials. Store these credentials in a Kubernetes secret in the tenant namespace:

$ kubectl create secret generic openai-credentials \
  --from-literal=api-key="sk-..." \
  -n my-tenant

The secret key must be named api-key (hyphen, not underscore). This is the contract between EvalHub and the evaluation runtime: if a secret is mounted, the runtime looks for a field called api-key and uses it as the bearer token.

The job configuration references the secret by name:

{
  "name":"hellaswag evaluation",
  "model":{
    "url":"https://api.openai.com/v1",
    "name":"gpt-4-turbo",
    "auth":{
      "secret_ref":"openai-credentials"
    }
  },
  "benchmarks":[
    {
      "id":"hellaswag",
      "provider_id":"lm_evaluation_harness"
    }
  ]
}

When EvalHub schedules the job, it mounts the secret read-only at /var/run/secrets/model/. The evaluation runtime reads /var/run/secrets/model/api-key and passes it as a bearer token in the Authorization header when calling the model.

Pattern 3: Combining API keys and custom CA certificates

A self-hosted vLLM deployment behind an HTTPS route might present a certificate signed by a private or corporate CA, which standard TLS clients will reject.

The fix is to provide the CA certificate to the evaluation runtime. You add it to the same Kubernetes secret, using the key name ca_cert (underscore, not hyphen):

$ kubectl create secret generic vllm-credentials \
  --from-literal=api-key="my-secret-key" \
  --from-file=ca_cert=./internal-ca.crt \
  -n my-tenant

A single secret can hold both the API key and the CA certificate. EvalHub mounts the entire secret, and the runtime uses whichever fields are present. If only the api-key exists, TLS verification uses the system CA bundle. If only ca_cert exists, the runtime uses it for TLS verification without authentication. If both exist, both are used.
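
The three cases above reduce to a small amount of logic. The sketch below shows one way a runtime could apply them; it is an illustration under assumptions, not EvalHub's implementation. The function name load_model_credentials is hypothetical, and the return convention (True for the system CA bundle, a path for a custom CA) mirrors the verify parameter of the Python requests library.

```python
import tempfile
from pathlib import Path
from typing import Dict, Tuple, Union


def load_model_credentials(mount_dir: Path) -> Tuple[Dict[str, str], Union[bool, str]]:
    """Return (headers, verify) based on which secret fields are mounted."""
    headers: Dict[str, str] = {}
    verify: Union[bool, str] = True  # True: verify TLS against the system CA bundle

    api_key = mount_dir / "api-key"   # hyphen, per the secret contract
    ca_cert = mount_dir / "ca_cert"   # underscore, per the secret contract

    if api_key.is_file():
        token = api_key.read_text(encoding="utf-8").strip()
        headers["Authorization"] = f"Bearer {token}"
    if ca_cert.is_file():
        verify = str(ca_cert)  # path to the custom CA certificate
    return headers, verify


# Demonstration: a temporary directory stands in for /var/run/secrets/model/.
with tempfile.TemporaryDirectory() as tmp:
    mount = Path(tmp)
    (mount / "api-key").write_text("my-secret-key\n", encoding="utf-8")
    key_only = load_model_credentials(mount)
    (mount / "ca_cert").write_text("placeholder PEM\n", encoding="utf-8")
    both = load_model_credentials(mount)
    ca_path = str(mount / "ca_cert")

print(key_only)
print(both[1] == ca_path)
```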

The job configuration references the secret the same way, even though it now contains both credentials:

{
  "name":"hellaswag evaluation",
  "model":{
    "url":"https://vllm-secure.internal.corp/v1",
    "name":"mistral-7b",
    "auth":{
      "secret_ref":"vllm-credentials"
    }
  },
  "benchmarks":[
    {
      "id":"hellaswag",
      "provider_id":"lm_evaluation_harness"
    }
  ]
}

The runtime detects the ca_cert field and passes its path to the HTTP client as the verify parameter. For example, with the provider lm-evaluation-harness, this becomes the verify_certificate model argument:

lm_eval.evaluator - INFO - Initializing local-completions model, with arguments: {'model':....., 'verify_certificate': '/var/run/secrets/model/ca_cert'}

Putting it all together: A real-world scenario

Imagine you're evaluating three models:

Internal Llama model on OpenShift AI model serving (ServiceAccount token)
OpenAI GPT-4 (API key)
Self-hosted Mistral with custom TLS and API key (both)

You create two secrets (the internal model needs none):

# OpenAI
$ kubectl create secret generic openai-key \
  --from-literal=api-key=sk-proj-... \
  -n my-tenant
# Self-hosted Mistral
$ kubectl create secret generic mistral-auth \
  --from-literal=api-key=mistral-secret-key \
  --from-file=ca_cert=/path/to/mistral-ca.pem \
  -n my-tenant

You grant the job ServiceAccount access to the internal model:

$ kubectl create rolebinding evalhub-internal-access \
  --role=my-model-view-role \
  --serviceaccount=my-tenant:evalhub-evalhub-job \
  -n my-tenant

You submit three evaluation jobs. Each job uses the same REST API endpoint and headers; only the model block changes.

Internal Llama (ServiceAccount token):

$ curl -X POST "$EVALHUB_URL/api/v1/evaluations/jobs" \
  -H "Authorization: Bearer $USER_TOKEN" \
  -H "X-Tenant: my-tenant" \
  -H "Content-Type: application/json" \
  -d '{
  "name":"Llama evaluation",
  "model":{
    "url":"https://my-internal-model.my-tenant.svc.cluster.local/v1",
    "name":"llama-3"
  },
  "benchmarks":[
    {
      "id":"arc_easy",
      "provider_id":"lm_evaluation_harness"
    }
  ]
}'

OpenAI GPT-4 (API key):

$ curl -X POST "$EVALHUB_URL/api/v1/evaluations/jobs" \
  -H "Authorization: Bearer $USER_TOKEN" \
  -H "X-Tenant: my-tenant" \
  -H "Content-Type: application/json" \
  -d '{
  "name":"GPT-4 evaluation",
  "model":{
    "url":"https://api.openai.com/v1",
    "name":"gpt-4",
    "auth":{
      "secret_ref":"openai-key"
    }
  },
  "benchmarks":[
    {
      "id":"arc_easy",
      "provider_id":"lm_evaluation_harness"
    }
  ]
}'

Self-hosted Mistral (API key and CA certificate):

$ curl -X POST "$EVALHUB_URL/api/v1/evaluations/jobs" \
  -H "Authorization: Bearer $USER_TOKEN" \
  -H "X-Tenant: my-tenant" \
  -H "Content-Type: application/json" \
  -d '{
  "name":"Mistral evaluation",
  "model":{
    "url":"https://mistral-internal.company.com/v1",
    "name":"mistral-7b",
    "auth":{
      "secret_ref":"mistral-auth"
    }
  },
  "benchmarks":[
    {
      "id":"arc_easy",
      "provider_id":"lm_evaluation_harness"
    }
  ]
}'
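
Because only the model block differs, the three request bodies above can be produced by one helper. This is a minimal sketch with a hypothetical function name (build_job); the field names are the ones used in the requests in this guide.

```python
import json
from typing import Optional


def build_job(name: str, model_url: str, model_name: str, benchmark_id: str,
              secret_ref: Optional[str] = None,
              provider_id: str = "lm_evaluation_harness") -> dict:
    """Build an evaluation job body; add an auth block only when a secret is given."""
    model = {"url": model_url, "name": model_name}
    if secret_ref is not None:
        model["auth"] = {"secret_ref": secret_ref}
    return {
        "name": name,
        "model": model,
        "benchmarks": [{"id": benchmark_id, "provider_id": provider_id}],
    }


jobs = [
    # Internal model: no auth block, the ServiceAccount token is used.
    build_job("Llama evaluation",
              "https://my-internal-model.my-tenant.svc.cluster.local/v1",
              "llama-3", "arc_easy"),
    # External API: secret holds api-key.
    build_job("GPT-4 evaluation", "https://api.openai.com/v1",
              "gpt-4", "arc_easy", secret_ref="openai-key"),
    # Self-hosted model: secret holds api-key and ca_cert.
    build_job("Mistral evaluation", "https://mistral-internal.company.com/v1",
              "mistral-7b", "arc_easy", secret_ref="mistral-auth"),
]

for job in jobs:
    print(json.dumps(job))
```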

Check the results:

$ curl -X GET "$EVALHUB_URL/api/v1/evaluations/jobs/$JOB_ID" \
  -H "Authorization: Bearer $USER_TOKEN" \
  -H "X-Tenant: my-tenant"

Response snippet:

{
  "status": {"state": "completed"},
  "results": {
    "benchmarks": [{
      "id": "arc_easy",
      "provider_id": "lm_evaluation_harness",
      "metrics": {
          "acc": 0.5063131313131313,
          "acc_norm": 0.4831649831649832,
          "acc_norm_stderr": 0.010253966261288895,
          "acc_stderr": 0.01025896566804444
        }
    }]
  }
}

All three jobs run successfully, each using the appropriate authentication method.
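
Jobs finish asynchronously, so a script usually has to poll the job until it reaches a final state. The sketch below shows one way to do that. It is an assumption-laden illustration: wait_for_job is a hypothetical helper, only the "completed" and "failed" states appear in this guide, and the intermediate state name "running" in the demonstration is made up. The fetch_job callable stands in for the GET request shown above.

```python
import time
from typing import Callable

# The two final states that appear in this guide.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(fetch_job: Callable[[], dict], interval_s: float = 5.0,
                 timeout_s: float = 600.0) -> dict:
    """Call fetch_job until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job()
        state = job.get("status", {}).get("state")
        if state in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout_s}s")
        time.sleep(interval_s)


# Demonstration with canned responses instead of real HTTP calls.
# "running" is an assumed intermediate state name, used only for illustration.
responses = iter([
    {"status": {"state": "running"}},
    {"status": {"state": "completed"}},
])
final_job = wait_for_job(lambda: next(responses), interval_s=0.0)
print(final_job["status"]["state"])
```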

Troubleshooting: When authentication fails

If your deployment does not connect properly, use these common error messages and solutions to help locate the issue.

Error: "Unauthorized" or HTTP 401

This means the credentials weren't accepted by the model. Common causes:

The secret doesn't exist in the tenant namespace. Verify:
```
$ kubectl get secret <name> -n <namespace>
```
The secret key is misnamed. It must be api-key (hyphen), not apiKey or api_key. Check:
```
$ kubectl get secret <name> -n <namespace> -o yaml
```

The API key is incorrect. Test it manually, where <model-url> is the server's base URL without the /v1 suffix:

$ curl -H "Authorization: Bearer $(kubectl get secret <name> -n <namespace> -o jsonpath='{.data.api-key}' | base64 -d)" <model-url>/v1/models

For ServiceAccount token authentication, verify that the RoleBinding exists and targets the correct ServiceAccount. The -o wide output includes the bound ServiceAccounts:

$ kubectl get rolebinding evalhub-job-model-access -n my-tenant -o wide
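
Misnamed secret keys are easy to miss by eye, so the naming check can be automated. The following sketch is a hypothetical helper, not an EvalHub tool. It takes the key names found under .data in the secret (for example, from kubectl get secret <name> -o json) and flags near misses of the two names the runtime expects.

```python
# Field names the runtime looks for, as described in this guide.
EXPECTED_KEYS = ("api-key", "ca_cert")


def _normalize(key: str) -> str:
    """Strip separators and case so that api_key, apiKey, and api-key compare equal."""
    return key.replace("-", "").replace("_", "").lower()


def check_secret_keys(keys) -> list:
    """Return a message for each key that looks like a misspelled expected key."""
    problems = []
    for key in keys:
        if key in EXPECTED_KEYS:
            continue
        for expected in EXPECTED_KEYS:
            if _normalize(key) == _normalize(expected):
                problems.append(f"{key!r} should be named {expected!r}")
    return problems


print(check_secret_keys(["api_key", "ca_cert"]))
```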

Error: "SSL certificate verification failed"

The model presents a certificate that is self-signed or signed by a private CA, and the runtime has no CA certificate to verify it against. Add the CA certificate to your secret:

$ kubectl patch secret <name> -n <namespace> \
  --type=merge -p="{\"data\":{\"ca_cert\":\"$(base64 -w0 ./ca.crt)\"}}"
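
The -w0 flag in the command above belongs to GNU coreutils and may not be available on other platforms. As a portable alternative, the merge patch can be built in Python. This is a sketch: ca_cert_patch is a hypothetical helper, and the certificate bytes in the demonstration are a placeholder rather than a real certificate.

```python
import base64
import json


def ca_cert_patch(pem_bytes: bytes) -> str:
    """Build a JSON merge patch that sets the ca_cert field of a secret."""
    # Secret values under .data must be base64-encoded, without line breaks.
    encoded = base64.b64encode(pem_bytes).decode("ascii")
    return json.dumps({"data": {"ca_cert": encoded}})


# Placeholder bytes; in practice, read them from your CA file,
# for example with pathlib.Path("ca.crt").read_bytes().
demo_pem = b"-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n"
patch = ca_cert_patch(demo_pem)
print(patch)
```

The resulting string can be passed to kubectl patch secret <name> -n <namespace> --type=merge -p as the patch argument.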

Error: "Secret not found"

The secret must exist in the tenant namespace, not the EvalHub namespace. If you deployed EvalHub in the namespace evalhub but are submitting jobs as tenant my-tenant, the secret must be in my-tenant.

With authentication configured, EvalHub can evaluate any protected model without exposing credentials. Set up the secrets once, reference them in your job configuration, and the runtime handles the rest.

Start here

EvalHub GitHub page
EvalHub source code
EvalHub SDK (evalhub CLI, REST client, BYOF adapter, MCP server, OCI persistence)
OpenAPI specification
TrustyAI Operator (for Kubernetes/OpenShift deployment)
