Connect EvalHub to protected production model servers

Authentication without compromise: A practical guide for platform engineers and ML teams evaluating production endpoints

June 23, 2026
Sobha Cheruku Prabhu Padashetty Narayan Sagar Gin Biak Naulak
Related topics:
AI inference, Artificial intelligence, Platform engineering, Security
Related products:
Red Hat AI

    Moving machine learning model evaluations from development to production means configuring your runtime to talk to tightly protected endpoints. This practical guide shows you how to connect your EvalHub runtime to internal or external model servers using service account tokens, API keys, or custom certificates.

    Series note

    This is part 9 in a series covering how to build a scalable, reproducible AI evaluation infrastructure using the EvalHub project and Red Hat AI. Catch up on the other parts in the series:

    • Part 1: How EvalHub manages two-layer Kubernetes control planes
    • Part 2: EvalHub: Because "looks good to me" isn't a benchmark
    • Part 3: Evaluation-driven development with EvalHub
    • Part 4: Understanding evaluation collections in EvalHub
    • Part 5: Bring your own evaluation framework to EvalHub
    • Part 6: Add automated AI evaluations to your CI/CD pipeline
    • Part 7: Store immutable AI evaluation records with EvalHub and OCI
    • Part 8: Manage LLM evaluation workloads at scale with EvalHub and Kueue

    The problem

    EvalHub runs each evaluation in a job pod whose runtime sends requests to your model’s inference endpoint. In development, this works without interruption because model endpoints are often open or easy to reach. In production, those same endpoints are protected: Red Hat OpenShift AI model serving can require a ServiceAccount token and SubjectAccessReview; external APIs like OpenAI may need bearer tokens; any endpoint might use TLS with custom CA certificates that aren't in the default trust bundle.

    When you submit an evaluation job against a protected endpoint, you might use something like:

    $ curl -X POST "$EVALHUB_URL/api/v1/evaluations/jobs" \
      -H "Authorization: Bearer $USER_TOKEN" \
      -H "X-Tenant: my-tenant" \
      -H "Content-Type: application/json" \
      -d '{
      "name":"arc_easy evaluation",
      "model":{
        "url":"https://flan-t5-prod.apps.example.com/v1",
        "name":"google/flan-t5-small"
      },
      "benchmarks":[
        {
          "id":"arc_easy",
          "provider_id":"lm_evaluation_harness"
        }
      ]
    }'

    The POST succeeds (the job is accepted). If the runtime cannot authenticate to the model, the failure shows up when you fetch the job:

    $ curl -X GET "$EVALHUB_URL/api/v1/evaluations/jobs/$JOB_ID" \
      -H "Authorization: Bearer $USER_TOKEN" \
      -H "X-Tenant: my-tenant"

    Response:

    {
      "resource": {
        "id": "951d2728-3a3c-4f5d-b717-7c38d798f3e0",
        "tenant": "dataplane",
        "created_at": "2026-05-04T13:52:34.534558Z",
        "updated_at": "2026-05-04T13:52:51.302725Z",
        "owner": "system:serviceaccount:my-tenant:evalhub-evalhub-job"
      },
      "status": {
        "state": "failed",
        "message": {
          "message": "Evaluation job is failed. \nBenchmark arc_easy failed with message: Evaluation failed: 401 Client Error: Unauthorized for url: https://flan-t5-prod.apps.example.com/v1/completions\n",
          "message_code": "evaluation_job_updated"
        },
        "benchmarks": [
          {
            "provider_id": "lm_evaluation_harness",
            "id": "arc_easy",
            "benchmark_index": 0,
            "status": "failed",
            "error_message": {
              "message": "Evaluation failed: 401 Client Error: Unauthorized for url: https://flan-t5-prod.apps.example.com/v1/completions",
              "message_code": "evaluation_failed"
            }
          }
        ]
      },
      "results": {
        "benchmarks": [
          {
            "id": "arc_easy",
            "provider_id": "lm_evaluation_harness",
            "benchmark_index": 0
          }
        ]
      },
      "name": "arc_easy evaluation",
      "model": {
        "url": "https://flan-t5-prod.apps.example.com/v1",
        "name": "google/flan-t5-small"
      },
      "benchmarks": [
        {
          "id": "arc_easy",
          "provider_id": "lm_evaluation_harness"
        }
      ]
    }

    You authenticated to EvalHub successfully. The runtime still needs credentials to reach the model: API keys for external endpoints, tokens for internal models, or CA certificates for custom TLS.
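    Because the failure only appears in the job record, it can help to pull the per-benchmark errors out of the response programmatically. The following is a minimal sketch, not part of EvalHub: the field names come from the response shown above, while the helper names and the 401 heuristic are assumptions made for illustration.

```python
from typing import Any


def failed_benchmarks(job: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (benchmark id, error message) pairs for every failed benchmark."""
    failures = []
    for bench in job.get("status", {}).get("benchmarks", []):
        if bench.get("status") == "failed":
            error = bench.get("error_message", {}).get("message", "")
            failures.append((bench.get("id", ""), error))
    return failures


def looks_like_auth_failure(message: str) -> bool:
    """Heuristic (an assumption): the error text mentions 401 or Unauthorized."""
    return "401" in message or "Unauthorized" in message


# Trimmed copy of the job response shown above.
job = {
    "status": {
        "state": "failed",
        "benchmarks": [
            {
                "provider_id": "lm_evaluation_harness",
                "id": "arc_easy",
                "benchmark_index": 0,
                "status": "failed",
                "error_message": {
                    "message": "Evaluation failed: 401 Client Error: Unauthorized for url: https://flan-t5-prod.apps.example.com/v1/completions",
                    "message_code": "evaluation_failed",
                },
            }
        ],
    }
}

for bench_id, message in failed_benchmarks(job):
    kind = "authentication" if looks_like_auth_failure(message) else "other"
    print(f"{bench_id}: {kind} failure: {message}")
```

    A check like this is useful in CI, where a 401 from the model should be reported as a configuration problem rather than as a poor evaluation result.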

    How you supply those credentials depends on whether your model is internal or external and on which credentials it requires. The following patterns cover each case.

    Note

    In the examples throughout this guide, my-tenant refers to both the Kubernetes namespace and the X-Tenant header value.

    How EvalHub handles authentication

    Depending on your model’s setup, EvalHub works in one of three modes:

    • No authentication: For internal models served over plain HTTP on a trusted network, point to the cluster-local URL—no configuration needed.
    • Implicit authentication: For models served by OpenShift AI, EvalHub uses the job pod's ServiceAccount token. OpenShift AI automatically creates a view Role for each InferenceService. You bind the EvalHub job ServiceAccount to that role via a RoleBinding, and EvalHub handles the rest. No secrets, no credentials; just RBAC.
    • Explicit authentication: For external APIs or models requiring API keys, you create a Kubernetes secret containing the credentials. EvalHub mounts this secret into the evaluation pod, and the runtime reads it. The secret can contain an API key, a CA certificate, or both. The credentials live in Kubernetes secrets, are mounted read-only into pods, and are accessed only by the evaluation runtime, not by EvalHub's control plane.

    Pattern 1: ServiceAccount tokens for internal models

    This pattern applies to models deployed on OpenShift AI model serving in the same cluster as EvalHub. OpenShift AI deploys a kube-rbac-proxy sidecar alongside each model container to enforce token-based authentication. When you create an InferenceService, OpenShift AI automatically creates a view Role scoped to that model. The Role name follows the pattern {inferenceservice-name}-view-role, and it grants access to the InferenceService resource. Requests that reach the protected endpoint without a valid token are rejected with HTTP 401.

    EvalHub job pods run with a dedicated ServiceAccount that is internal to EvalHub, following the pattern {evalhub-name}-{evalhub-namespace}-job. For example, if EvalHub is named evalhub and deployed in the namespace evalhub, the job ServiceAccount is evalhub-evalhub-job, created in the tenant namespace.

    To find the auto-created view Role for your model:

    $ kubectl get roles -n my-tenant | grep view-role
    my-model-view-role

    Granting the EvalHub job ServiceAccount access to the model is a one-time setup task performed by the tenant admin. The admin needs to know the EvalHub job SA name (derived from the naming convention just described) and create a RoleBinding in the model's namespace:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: evalhub-job-model-access
      namespace: my-tenant
    subjects:
    - kind: ServiceAccount
      name: evalhub-evalhub-job
      namespace: my-tenant
    roleRef:
      kind: Role
      name: my-model-view-role
      apiGroup: rbac.authorization.k8s.io

    Note

    This is a tenant admin responsibility. The RoleBinding must be created in the model's namespace (my-tenant). The EvalHub job ServiceAccount (evalhub-evalhub-job in this example) is also in the tenant namespace.
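    If you manage many tenants or models, the two naming conventions above can be turned into a small generator. This is a sketch under the conventions described in this section ({evalhub-name}-{evalhub-namespace}-job and {inferenceservice-name}-view-role); the function names and the RoleBinding name are illustrative and not part of EvalHub.

```python
import json


def job_service_account(evalhub_name: str, evalhub_namespace: str) -> str:
    """Job ServiceAccount name: {evalhub-name}-{evalhub-namespace}-job."""
    return f"{evalhub_name}-{evalhub_namespace}-job"


def view_role(inference_service: str) -> str:
    """Auto-created Role name: {inferenceservice-name}-view-role."""
    return f"{inference_service}-view-role"


def role_binding(tenant_namespace: str, inference_service: str,
                 evalhub_name: str, evalhub_namespace: str) -> dict:
    """Build a RoleBinding manifest equivalent to the YAML shown above."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {
            "name": "evalhub-job-model-access",
            "namespace": tenant_namespace,
        },
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": job_service_account(evalhub_name, evalhub_namespace),
                # The job ServiceAccount lives in the tenant namespace.
                "namespace": tenant_namespace,
            }
        ],
        "roleRef": {
            "kind": "Role",
            "name": view_role(inference_service),
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


manifest = role_binding("my-tenant", "my-model", "evalhub", "evalhub")
print(json.dumps(manifest, indent=2))
```

    kubectl accepts JSON manifests as well as YAML, so the printed output can be piped to kubectl apply -f - after review.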

    That's it. No secret to create, no configuration to change. When you submit an evaluation job, EvalHub provisions a pod with the job ServiceAccount, Kubernetes auto-mounts the token, and the evaluation runtime uses it to authenticate model requests.

    The job configuration requires no authentication block:

    {
      "name": "arc_easy evaluation",
      "model": {
        "url": "https://my-internal-model.my-tenant.svc.cluster.local/v1",
        "name": "llama-3"
      },
      "benchmarks": [
        {
          "id": "arc_easy",
          "provider_id": "lm_evaluation_harness"
        }
      ]
    }

    Behind the scenes, the evaluation runtime reads the ServiceAccount token from /var/run/secrets/kubernetes.io/serviceaccount/token and includes it in the Authorization: Bearer header when calling the model. The kube-rbac-proxy sidecar in the predictor pod intercepts the request, validates the token against the Kubernetes API, and verifies that the ServiceAccount is authorized before forwarding the request to the model.
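    To make the client side of that flow concrete, here is a minimal sketch of reading a mounted ServiceAccount token and turning it into a request header. It illustrates the mechanism only and is not EvalHub's actual runtime code; the function name is hypothetical, and the demo uses a temporary file in place of the real token path so it can run outside a pod.

```python
import tempfile
from pathlib import Path

# Standard location where Kubernetes mounts the pod's ServiceAccount token.
SA_TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def service_account_headers(token_path: Path = SA_TOKEN_PATH) -> dict[str, str]:
    """Read the token file and return an Authorization: Bearer header."""
    token = token_path.read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"ServiceAccount token at {token_path} is empty")
    return {"Authorization": f"Bearer {token}"}


# Demo with a stand-in token file (the value is a placeholder, not a real token).
with tempfile.TemporaryDirectory() as tmp:
    fake_token = Path(tmp) / "token"
    fake_token.write_text("example-token\n", encoding="utf-8")
    demo_headers = service_account_headers(fake_token)

print(demo_headers)
```

    Reading the file on each use rather than caching it at startup is a reasonable default, because Kubernetes rotates projected ServiceAccount tokens.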

    This pattern is specific to OpenShift AI model serving, which deploys kube-rbac-proxy as part of its model serving stack. If you are using a standalone KServe deployment or a custom model server, the authentication mechanism might differ.

    Pattern 2: API keys for external models

    External APIs—OpenAI, Azure OpenAI, or self-hosted vLLM instances with --api-key—require explicit credentials. To protect these credentials, create a Kubernetes secret in the tenant namespace:

    $ kubectl create secret generic openai-credentials \
      --from-literal=api-key="sk-..." \
      -n my-tenant

    The secret key must be named api-key (hyphen, not underscore). This is the contract between EvalHub and the evaluation runtime: if a secret is mounted, the runtime looks for a field called api-key and uses it as the bearer token.

    The job configuration references the secret by name:

    {
      "name":"hellaswag evaluation",
      "model":{
        "url":"https://api.openai.com/v1",
        "name":"gpt-4-turbo",
        "auth":{
          "secret_ref":"openai-credentials"
        }
      },
      "benchmarks":[
        {
          "id":"hellaswag",
          "provider_id":"lm_evaluation_harness"
        }
      ]
    }

    When EvalHub schedules the job, it mounts the secret read-only at /var/run/secrets/model/. The evaluation runtime reads /var/run/secrets/model/api-key and passes it as a bearer token in the Authorization header when calling the model.

    Pattern 3: Combining API keys and custom CA certificates

    A self-hosted vLLM deployment behind an HTTPS route might present a certificate signed by a private or corporate CA, which standard TLS clients will reject.

    The fix is to provide the CA certificate to the evaluation runtime. You add it to the same Kubernetes secret, using the key name ca_cert (underscore, not hyphen):

    $ kubectl create secret generic vllm-credentials \
      --from-literal=api-key="my-secret-key" \
      --from-file=ca_cert=./internal-ca.crt \
      -n my-tenant

    A single secret can hold both the API key and the CA certificate. EvalHub mounts the entire secret, and the runtime uses whichever fields are present. If only the api-key exists, TLS verification uses the system CA bundle. If only ca_cert exists, the runtime uses it for TLS verification without authentication. If both exist, both are used.
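    The three cases can be sketched as a small resolution function. This is an illustration of the rules just described, not EvalHub's implementation: the function name is hypothetical, and the return shape (headers plus a value that is either True or a CA file path) is an assumption modeled on the verify parameter of common Python HTTP clients. The default directory assumes the secret is mounted at /var/run/secrets/model/.

```python
import tempfile
from pathlib import Path
from typing import Union

# Assumed mount point for the model secret inside the evaluation pod.
SECRET_DIR = Path("/var/run/secrets/model")


def resolve_model_auth(
    secret_dir: Path = SECRET_DIR,
) -> tuple[dict[str, str], Union[bool, str]]:
    """Return (headers, verify) based on which secret fields are present."""
    headers: dict[str, str] = {}
    verify: Union[bool, str] = True  # True means: use the system CA bundle

    api_key_file = secret_dir / "api-key"   # hyphen
    ca_cert_file = secret_dir / "ca_cert"   # underscore

    if api_key_file.is_file():
        api_key = api_key_file.read_text(encoding="utf-8").strip()
        headers["Authorization"] = f"Bearer {api_key}"
    if ca_cert_file.is_file():
        verify = str(ca_cert_file)

    return headers, verify


# Demo with a temporary directory standing in for the mounted secret.
with tempfile.TemporaryDirectory() as tmp:
    mount = Path(tmp)
    (mount / "api-key").write_text("my-secret-key", encoding="utf-8")
    (mount / "ca_cert").write_text("placeholder CA content", encoding="utf-8")
    demo_headers, demo_verify = resolve_model_auth(mount)

print(demo_headers)
print(demo_verify)
```

    With an empty directory the function returns no headers and True, which corresponds to an unauthenticated call verified against the system CA bundle.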

    The job configuration references the secret the same way, even though it now contains both credentials:

    {
      "name":"hellaswag evaluation",
      "model":{
        "url":"https://vllm-secure.internal.corp/v1",
        "name":"mistral-7b",
        "auth":{
          "secret_ref":"vllm-credentials"
        }
      },
      "benchmarks":[
        {
          "id":"hellaswag",
          "provider_id":"lm_evaluation_harness"
        }
      ]
    }

    The runtime detects the ca_cert field and passes its path to the HTTP client as the verify parameter. For example, with the provider lm-evaluation-harness, this becomes the verify_certificate model argument:

    lm_eval.evaluator - INFO - Initializing local-completions model, with arguments: {'model':....., 'verify_certificate': '/var/run/secrets/model/ca_cert'}

    Putting it all together: A real-world scenario

    Imagine you're evaluating three models:

    • Internal Llama model on OpenShift AI model serving (ServiceAccount token)
    • OpenAI GPT-4 (API key)
    • Self-hosted Mistral with custom TLS and API key (both)

    You create two secrets (the internal model needs none):

    # OpenAI
    $ kubectl create secret generic openai-key \
      --from-literal=api-key=sk-proj-... \
      -n my-tenant
    # Self-hosted Mistral
    $ kubectl create secret generic mistral-auth \
      --from-literal=api-key=mistral-secret-key \
      --from-file=ca_cert=/path/to/mistral-ca.pem \
      -n my-tenant

    You grant the job ServiceAccount access to the internal model:

    $ kubectl create rolebinding evalhub-internal-access \
      --role=my-model-view-role \
      --serviceaccount=my-tenant:evalhub-evalhub-job \
      -n my-tenant

    You submit three evaluation jobs. Each job uses the same REST API endpoint and headers; only the model block changes.

    Internal Llama (ServiceAccount token):

    $ curl -X POST "$EVALHUB_URL/api/v1/evaluations/jobs" \
      -H "Authorization: Bearer $USER_TOKEN" \
      -H "X-Tenant: my-tenant" \
      -H "Content-Type: application/json" \
      -d '{
      "name":"Llama evaluation",
      "model":{
        "url":"https://my-internal-model.my-tenant.svc.cluster.local/v1",
        "name":"llama-3"
      },
      "benchmarks":[
        {
          "id":"arc_easy",
          "provider_id":"lm_evaluation_harness"
        }
      ]
    }'

    OpenAI GPT-4 (API key):

    $ curl -X POST "$EVALHUB_URL/api/v1/evaluations/jobs" \
      -H "Authorization: Bearer $USER_TOKEN" \
      -H "X-Tenant: my-tenant" \
      -H "Content-Type: application/json" \
      -d '{
      "name":"GPT-4 evaluation",
      "model":{
        "url":"https://api.openai.com/v1",
        "name":"gpt-4",
        "auth":{
          "secret_ref":"openai-key"
        }
      },
      "benchmarks":[
        {
          "id":"arc_easy",
          "provider_id":"lm_evaluation_harness"
        }
      ]
    }'

    Self-hosted Mistral (API key and CA certificate):

    $ curl -X POST "$EVALHUB_URL/api/v1/evaluations/jobs" \
      -H "Authorization: Bearer $USER_TOKEN" \
      -H "X-Tenant: my-tenant" \
      -H "Content-Type: application/json" \
      -d '{
      "name":"Mistral evaluation",
      "model":{
        "url":"https://mistral-internal.company.com/v1",
        "name":"mistral-7b",
        "auth":{
          "secret_ref":"mistral-auth"
        }
      },
      "benchmarks":[
        {
          "id":"arc_easy",
          "provider_id":"lm_evaluation_harness"
        }
      ]
    }'
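    Since the three requests differ only in the model block, the payloads can be generated from one helper. The sketch below builds the same JSON bodies as the curl commands above; the helper name is hypothetical, and the only rule it encodes is that the auth block is included when a secret is referenced and omitted otherwise.

```python
import json
from typing import Optional


def evaluation_job(name: str, model_url: str, model_name: str,
                   benchmark_id: str, secret_ref: Optional[str] = None,
                   provider_id: str = "lm_evaluation_harness") -> dict:
    """Build a job body; add model.auth only when a secret is referenced."""
    model = {"url": model_url, "name": model_name}
    if secret_ref is not None:
        model["auth"] = {"secret_ref": secret_ref}
    return {
        "name": name,
        "model": model,
        "benchmarks": [{"id": benchmark_id, "provider_id": provider_id}],
    }


jobs = [
    # Internal model: ServiceAccount token, so no auth block.
    evaluation_job("Llama evaluation",
                   "https://my-internal-model.my-tenant.svc.cluster.local/v1",
                   "llama-3", "arc_easy"),
    # External API: API key stored in the openai-key secret.
    evaluation_job("GPT-4 evaluation", "https://api.openai.com/v1",
                   "gpt-4", "arc_easy", secret_ref="openai-key"),
    # Self-hosted model: API key and CA certificate in one secret.
    evaluation_job("Mistral evaluation",
                   "https://mistral-internal.company.com/v1",
                   "mistral-7b", "arc_easy", secret_ref="mistral-auth"),
]

for job in jobs:
    print(json.dumps(job))
```

    Each printed line can be used as the -d body of the POST request shown above.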

    Check the results:

    $ curl -X GET "$EVALHUB_URL/api/v1/evaluations/jobs/$JOB_ID" \
      -H "Authorization: Bearer $USER_TOKEN" \
      -H "X-Tenant: my-tenant"

    Response snippet:

    {
      "status": {"state": "completed"},
      "results": {
        "benchmarks": [{
          "id": "arc_easy",
          "provider_id": "lm_evaluation_harness",
          "metrics": {
            "acc": 0.5063131313131313,
            "acc_norm": 0.4831649831649832,
            "acc_norm_stderr": 0.010253966261288895,
            "acc_stderr": 0.01025896566804444
          }
        }]
      }
    }

    All three jobs run successfully, each using the appropriate authentication method.
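    Jobs take time to finish, so a script usually polls the job until it reaches a final state. The sketch below shows one way to do that. It assumes that "completed" and "failed" (the two states that appear in this guide) are the only terminal states and treats anything else as still in progress; the fetch function is injected by the caller, and the "running" label in the demo is a made-up placeholder, not a documented EvalHub state.

```python
import time
from typing import Any, Callable

# Assumption: these are the only terminal states.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(fetch_job: Callable[[], dict[str, Any]],
                 interval_seconds: float = 10.0,
                 timeout_seconds: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict[str, Any]:
    """Call fetch_job until status.state is terminal or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = fetch_job()
        state = job.get("status", {}).get("state")
        if state in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after timeout")
        sleep(interval_seconds)


# Demo with canned responses instead of real HTTP calls.
responses = iter([
    {"status": {"state": "running"}},    # hypothetical in-progress state
    {"status": {"state": "completed"}},
])
final = wait_for_job(lambda: next(responses), sleep=lambda seconds: None)
print(final["status"]["state"])
```

    In real use, fetch_job would issue the GET request shown above and return the parsed JSON body.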

    Troubleshooting: When authentication fails

    If your deployment does not connect properly, use these common error messages and solutions to help locate the issue.

    Error: "Unauthorized" or HTTP 401

    This means the credentials weren't accepted by the model. Common causes:

    • The secret doesn't exist in the tenant namespace. Verify:

      $ kubectl get secret <name> -n <namespace>
    • The secret key is misnamed. It must be api-key (hyphen), not apiKey or api_key. Check:

      $ kubectl get secret <name> -n <namespace> -o yaml
    • The API key is incorrect. Test manually, where <model-url> is the url value from your job configuration (in the examples in this guide it already ends in /v1):

      $ curl -H "Authorization: Bearer $(kubectl get secret <name> -n <namespace> -o jsonpath='{.data.api-key}' | base64 -d)" <model-url>/models
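    The first two checks can also be automated. The sketch below inspects a Secret as returned by kubectl get secret <name> -o json and reports misnamed or empty keys. The expected key names (api-key and ca_cert) come from this guide; the helper name and the list of likely misspellings are assumptions for illustration.

```python
import base64

EXPECTED_KEYS = {"api-key", "ca_cert"}
# Likely misspellings mapped to the expected name (illustrative, not exhaustive).
COMMON_MISTAKES = {
    "api_key": "api-key",
    "apiKey": "api-key",
    "ca-cert": "ca_cert",
}


def check_secret(secret: dict) -> list[str]:
    """Return human-readable problems found in a Secret's data field."""
    problems = []
    data = secret.get("data") or {}
    for key in sorted(data):
        if key in COMMON_MISTAKES:
            problems.append(
                f"key '{key}' should be named '{COMMON_MISTAKES[key]}'")
    present = sorted(EXPECTED_KEYS & data.keys())
    if not present:
        problems.append("secret has neither 'api-key' nor 'ca_cert'")
    for key in present:
        # Secret values are base64-encoded in the API representation.
        if not base64.b64decode(data[key]).strip():
            problems.append(f"key '{key}' is empty")
    return problems


# Example: a secret created with an underscore instead of a hyphen.
bad_secret = {
    "kind": "Secret",
    "data": {"api_key": base64.b64encode(b"sk-test").decode("ascii")},
}
for problem in check_secret(bad_secret):
    print(problem)
```

    An empty result means the key names match what the runtime expects; it does not prove that the API key itself is valid, which is what the manual curl test is for.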

    For ServiceAccount token authentication, verify the RoleBinding exists and targets the correct ServiceAccount:

    $ kubectl get rolebinding evalhub-job-model-access -n my-tenant

    Error: "SSL certificate verification failed"

    The model presents a certificate that is self-signed or signed by a private CA, and the runtime has no CA certificate to verify it against. Add the CA certificate to your secret:

    $ kubectl patch secret <name> -n <namespace> \
      --type=merge -p="{\"data\":{\"ca_cert\":\"$(base64 -w0 ./ca.crt)\"}}"
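    The -w0 flag in the command above belongs to GNU base64 and is not available in every environment. As a portable alternative, the merge patch can be built in Python. This is a sketch: the function name is hypothetical, and the demo uses placeholder bytes rather than a real certificate.

```python
import base64
import json


def ca_cert_patch(pem: bytes) -> str:
    """Return a JSON merge patch that sets data.ca_cert to the encoded PEM."""
    # b64encode produces a single line, so no wrapping needs to be disabled.
    encoded = base64.b64encode(pem).decode("ascii")
    return json.dumps({"data": {"ca_cert": encoded}})


# Placeholder content; in practice read the bytes from your CA file.
placeholder_pem = b"-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n"
patch = ca_cert_patch(placeholder_pem)
print(patch)
```

    The printed string can be passed as the -p argument of kubectl patch secret with --type=merge.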

    Error: "Secret not found"

    The secret must exist in the tenant namespace, not the EvalHub namespace. If you deployed EvalHub in the namespace evalhub but are submitting jobs as tenant my-tenant, the secret must be in my-tenant.

    With authentication configured, EvalHub can evaluate any protected model without exposing credentials. Set up the secrets once, reference them in your job configuration, and the runtime handles the rest.

    Start here

    • EvalHub GitHub page
    • EvalHub source code
    • EvalHub SDK (evalhub CLI, REST client, BYOF adapter, MCP server, OCI persistence)
    • OpenAPI specification
    • TrustyAI Operator (for Kubernetes/OpenShift deployment)
