In this post, I share three critical lessons learned from building a Model Context Protocol (MCP)-powered AI agent that integrates with ServiceNow for automated laptop refresh requests. This agent is part of the it-self-service-agent AI quickstart. Whether you're a developer building AI integrations, an IT administrator managing ServiceNow instances, or an architect designing AI-driven enterprise workflows, these insights can help you avoid common pitfalls and build more reliable enterprise AI solutions.
Read this post to learn:
- How to structure testing environments for enterprise integrations without impacting production systems
- Best practices for implementing safeguards in AI-powered automation tools
- A phased approach to deploy enterprise AI integrations from development to production
About AI quickstarts
AI quickstarts are a catalog of ready-to-run industry-specific use cases for your Red Hat AI environment. They provide a fast, hands-on way to see how AI powers solutions on reliable, open source infrastructure. To learn more, read AI quickstarts: An easy and practical way to get started with Red Hat AI.
This is the seventh post in a series covering what we learned while developing the it-self-service-agent AI quickstart. Catch up on the previous parts in the series:
- Part 1: AI quickstart: Self-service agent for IT process automation
- Part 2: AI meets you where you are: Slack, email & ServiceNow
- Part 3: Prompt engineering: Big vs. small prompts for AI agents
- Part 4: Automate AI agents with the Responses API in Llama Stack
- Part 5: Eval-driven development: Build and evaluate reliable AI agents
- Part 6: Distributed tracing for agentic workflows with OpenTelemetry
ServiceNow APIs and integration overview
Before diving into the lessons learned, it's helpful to understand the ServiceNow APIs we used and how our MCP server connects to them.
ServiceNow REST API integration
Our MCP ServiceNow server (mcp-servers/snow/src/snow/servicenow/client.py) integrates with four primary ServiceNow REST API endpoints.
User management API
- Endpoint: `/api/now/table/sys_user`
- Purpose: Retrieves user information by email address for laptop assignment.
- Query pattern: Uses `sysparm_query=email={email}` to find users.
- Key fields: Returns `sys_id`, `name`, `email`, `user_name`, `location`, and `active` status.
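The lookup above can be sketched as follows. This is a minimal illustration that only builds the request rather than sending it; the base URL and the exact field list are assumptions, not the project's verbatim code:

```python
# Illustrative sketch of the sys_user lookup request (not sent over the wire).
def build_user_lookup(base_url: str, email: str) -> tuple[str, dict]:
    """Return the URL and query parameters for a sys_user lookup by email."""
    url = f"{base_url}/api/now/table/sys_user"
    params = {
        "sysparm_query": f"email={email}",
        # Limit the response to the fields the agent actually needs.
        "sysparm_fields": "sys_id,name,email,user_name,location,active",
    }
    return url, params

url, params = build_user_lookup(
    "https://dev183500.service-now.com", "alice.johnson@company.com"
)
```

Passing the result to an HTTP client (with the authentication headers described later) yields the JSON record the agent uses to resolve the requester's `sys_id`.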
Configuration management database API
- Endpoint: `/api/now/table/cmdb_ci_computer`
- Purpose: Retrieves computer/laptop assets assigned to specific users.
- Query pattern: Filters by the `assigned_to={user_sys_id}` field to find a user's assigned computers.
- Key fields: Returns asset details including `model_id`, `serial_number`, `purchase_date`, `warranty_expiration`, and assignment status.
Service catalog API
- Endpoint: `/api/sn_sc/servicecatalog/items/{laptop_refresh_id}/order_now`
- Purpose: Creates new laptop refresh requests through ServiceNow's service catalog.
- Request format: Uses structured JSON with `sysparm_quantity` and `variables` containing the laptop choice and requester information.
- Integration: Automatically links requests to users and populates required catalog item variables.
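A sketch of the `order_now` request shape might look like this. The variable names inside `variables` are illustrative assumptions; real catalog items define their own variable set:

```python
# Hypothetical sketch of an order_now payload; builds the request only.
def build_order_now(base_url: str, catalog_item_sys_id: str,
                    laptop_choice: str, requested_for: str) -> tuple[str, dict]:
    """Return the URL and JSON payload for ordering a catalog item."""
    url = f"{base_url}/api/sn_sc/servicecatalog/items/{catalog_item_sys_id}/order_now"
    payload = {
        "sysparm_quantity": "1",
        "variables": {
            "laptop_choices": laptop_choice,  # model the user picked (assumed name)
            "requested_for": requested_for,   # sys_id of the requester (assumed name)
        },
    }
    return url, payload

url, payload = build_order_now(
    "https://dev183500.service-now.com", "abc123", "ThinkPad X1 Carbon", "1001"
)
```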
Service request item API
- Endpoint: `/api/now/table/sc_req_item`
- Purpose: Queries existing open laptop refresh requests for duplicate detection.
- Query pattern: Uses complex queries filtering by user, request state, and catalog item.
- Safeguard function: Enables intelligent duplicate prevention and rate limiting.
Authentication and connection setup
The MCP server uses API key authentication:
- Authentication method: API key with a configurable header name (default: `x-sn-apikey`).
- Environment variables: `SERVICENOW_INSTANCE_URL`, `SERVICENOW_API_KEY_HEADER`, and the token passed via the request header.
- Configuration: Supports configurable timeout values and debug mode.
- Security: All API communications use HTTPS with proper request headers.
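The connection setup can be sketched like this. The defaults and the `SERVICENOW_API_KEY` variable name are illustrative assumptions, not the quickstart's exact configuration:

```python
import os

# Sketch of the connection setup; defaults here are illustrative.
def build_connection_settings() -> tuple[str, dict]:
    """Assemble the base URL and request headers from the environment."""
    base_url = os.environ.get(
        "SERVICENOW_INSTANCE_URL", "https://example.service-now.com"
    ).rstrip("/")
    header_name = os.environ.get("SERVICENOW_API_KEY_HEADER", "x-sn-apikey")
    api_key = os.environ.get("SERVICENOW_API_KEY", "")  # assumed variable name
    headers = {
        header_name: api_key,  # API key travels in a configurable header
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    return base_url, headers

base_url, headers = build_connection_settings()
```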
Data population and testing scripts
To support both development and production ServiceNow environments, we created bootstrap scripts in scripts/servicenow-bootstrap that automate the setup process.
Bootstrap script architecture
Our ServiceNow bootstrap system consists of multiple specialized scripts orchestrated by a main setup script.
Main orchestration (setup.py):
- Coordinates the ServiceNow Personal Developer Instance (PDI) setup process.
- Supports selective execution with skip flags for different components.
- Loads configuration from JSON files and overrides sensitive values with environment variables.
- Validates that a fresh instance has the required configuration.
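The configuration-loading step above can be sketched as follows. The schema and the override mapping are illustrative assumptions, not the quickstart's exact `setup.py` code:

```python
import json
import os
import tempfile

# Sketch of config loading: JSON values first, then sensitive keys
# overridden from environment variables. The schema is illustrative.
def load_config(path: str, env_overrides: dict) -> dict:
    """Load a JSON config file, then apply environment-variable overrides."""
    with open(path) as f:
        config = json.load(f)
    for key, env_name in env_overrides.items():
        value = os.environ.get(env_name)
        if value:  # only override when the variable is actually set
            config[key] = value
    return config

# Demo: the placeholder API key in the file is replaced by the env value.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"instance_url": "https://dev183500.service-now.com",
               "api_key": "placeholder"}, f)
    cfg_path = f.name
os.environ["SERVICENOW_API_KEY"] = "secret-from-env"
config = load_config(cfg_path, {"api_key": "SERVICENOW_API_KEY"})
```

Keeping secrets out of the JSON file and supplying them only via the environment is what lets the same config file be committed safely.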
```
Running ServiceNow bootstrap setup with config...
============================================================
ServiceNow PDI Setup Automation
============================================================
Loading configuration...
Configuration loaded and validated successfully!
Setup target: https://dev183500.service-now.com/
Admin user: admin
Agent user: mcp_agent
Catalog item: PC Refresh
Steps to execute:
  1. Create MCP Agent user
  2. Configure API keys and authentication
  3. Create PC Refresh catalog item
  4. Create evaluation users and test data
```

Lesson 1: Architect your testing strategy around external service mocking
Like many enterprise integration projects, our first iteration of the self-service AI agent used mocked data. This approach allowed us to develop and test the core functionality without making actual calls to ServiceNow systems that could impact production workflows or create test tickets.
Our initial MCP ServiceNow server exposed two essential tools:
- Asset retrieval tool: Fetches user laptop information from the ServiceNow configuration management database.
- Ticket creation tool: Opens laptop refresh requests through the ServiceNow incident management system.
The problem with embedded mocking
In our second iteration, we began the ServiceNow integration. This required our MCP server to support two operation modes: mock mode for CI/CD tests and development, and production mode for the live ServiceNow integration.
Initially, we embedded this logic into our MCP server code, but this created several challenges. Mixed code paths increased debugging complexity and made it difficult to isolate issues. It also added maintenance overhead, as changes to mock logic required updates to the core MCP server. Furthermore, the scattered conditional logic reduced code clarity and made it harder to understand the purpose of each section.
The solution: External mock services
A better solution came from treating the mock as a completely separate service. Instead of embedding mock logic in our MCP server, we created a lightweight web server that mimics ServiceNow's API responses. You can see the implementation in it-self-service-agent/mock-service-now/src/mock_servicenow/server.py.
This architecture provides four key benefits that streamline the development and testing process.
Unified code paths
The same MCP server code executes in both development and production environments. The only difference is the target URL specified in the SERVICENOW_URL environment variable.
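In practice, this means the client code never branches on environment. A minimal sketch, assuming the mock service's hostname shown here:

```python
import os

# The same lookup code runs everywhere; only SERVICENOW_URL changes.
# In CI it points at the mock service, in production at ServiceNow.
SERVICENOW_URL = os.environ.get("SERVICENOW_URL", "http://mock-servicenow:8080")

def user_lookup_url(email: str) -> str:
    """Build the sys_user lookup URL against whichever backend is configured."""
    return f"{SERVICENOW_URL}/api/now/table/sys_user?sysparm_query=email={email}"
```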
Simplified debugging
When issues arise, we can be confident they're not caused by mock-specific logic, since the same code runs in both environments.
Realistic testing
The mock service responds with actual ServiceNow API response formats, ensuring our integration code handles real-world data structures correctly.
Easy CI/CD integration
Our continuous integration pipeline can run the full test suite without requiring access to a ServiceNow instance or generating test tickets that need cleanup.
```
INFO: Started server process [2]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
{"email": "alice.johnson@company.com", "user_name": "Alice Johnson", "event": "Found user for email"}
INFO: 10.129.7.181:43606 - "GET /api/now/table/sys_user?sysparm_query=email=alice.johnson@company.com HTTP/1.1" 200 OK
{"computer_count": 1, "user_sys_id": "1001", "event": "Found computers for user"}
INFO: 10.129.7.181:43608 - "GET /api/now/table/cmdb_ci_computer?sysparm_query=assigned_to=1001 HTTP/1.1" 200 OK
```

Lesson 2: Implement safeguards for AI automation
AI agents can automate repetitive tasks at scale, but this capability introduces risks. When an agent creates ServiceNow tickets, modifies database records, or triggers workflows, you need robust safeguards to prevent unintended consequences and automation abuse.
In our case, the MCP tools exposed by our ServiceNow server perform write operations that impact business processes. Without proper controls, we could face scenarios like:
- Ticket flooding: An AI agent might create dozens of duplicate requests due to repeated user queries.
- Resource waste: IT teams can become overwhelmed with unnecessary tickets.
- System performance: Excessive API calls can slow ServiceNow instances.
- Compliance issues: Automated requests that skip procedures can clutter audit trails.
Building smart automation controls
We implemented two key safeguards as optional environment variables, allowing organizations to customize protection based on their needs.
Duplicate ticket prevention
When you enable SERVICENOW_LAPTOP_AVOID_DUPLICATES, the MCP tool performs a pre-check before creating a ticket.
How it works: When the safeguard is active, the tool first queries ServiceNow for open tickets that match the specific laptop model. If an existing ticket is found, the agent returns that information to the user instead of creating a duplicate. The system then logs the prevention action for audit purposes and explains to the user why a new ticket was not required.
Implementing this safeguard reduces the IT workload by preventing duplicate requests and maintains data quality within ServiceNow. It also improves the user experience by connecting users to relevant existing tickets rather than leaving them to wait for a new one.
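A hypothetical sketch of the model-matching helper this pre-check relies on; the field names `laptop_choices` and `number` are illustrative, not the real schema:

```python
# Illustrative helper: scan the user's open requests for one that
# already targets the same laptop model.
def has_existing_request_for_laptop_model(existing_requests, laptop_model):
    """Return (True, request) if an open request already targets this model."""
    for request in existing_requests:
        if request.get("laptop_choices") == laptop_model:
            return True, request
    return False, None

open_requests = [{"number": "RITM0010001", "laptop_choices": "ThinkPad X1 Carbon"}]
found, match = has_existing_request_for_laptop_model(open_requests, "ThinkPad X1 Carbon")
```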
```python
# Step 2: Check if there's already an open request for the same laptop model
current_laptop_model = params.laptop_choices
has_existing_model_request, existing_request = (
    self._has_existing_request_for_laptop_model(
        existing_requests, current_laptop_model
    )
)
```

Rate limiting per user
The `SERVICENOW_LAPTOP_REQUEST_LIMITS` safeguard prevents a single user from overwhelming the system.
How it works: This tool tracks the number of open tickets per user across all laptop models. Before creating a ticket, it checks if the user has reached their limit. If the limit is met, the agent provides an error message explaining the restriction and next steps. The limit resets automatically when tickets are closed.
Implementation considerations:
- User identification can be based on email, employee ID, or Active Directory integration.
- Limits should be configurable based on organization size and typical usage patterns.
- Consider different limits for different user roles (standard users versus IT administrators).
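The limit check itself can be sketched in a few lines; treating an unset limit as "safeguard disabled" is an assumption for illustration:

```python
# Hypothetical sketch of the per-user rate-limit check.
def would_exceed_request_limit(existing_requests, laptop_request_limits):
    """Refuse a new request once the open-request count reaches the limit."""
    if laptop_request_limits is None:
        return False  # safeguard disabled
    return len(existing_requests) >= laptop_request_limits
```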
```python
# Step 3: Check if adding a new request would exceed the limits
if self._would_exceed_request_limit(existing_requests):
    return {
        "success": False,
        "message": f"Cannot open new laptop request. User already has {len(existing_requests)} open request(s), which meets or exceeds the limit of {self.laptop_request_limits}.",
        "data": {
            "existing_requests": existing_requests,
            "limit": self.laptop_request_limits,
        },
    }
```

Configuration flexibility for enterprise needs
Both safeguards are disabled by default, which allows organizations to manage their own implementation:
```
# Enable duplicate prevention only
SERVICENOW_LAPTOP_AVOID_DUPLICATES=true

# Enable rate limiting with a custom threshold
SERVICENOW_LAPTOP_REQUEST_LIMITS=3

# Enable both safeguards
SERVICENOW_LAPTOP_AVOID_DUPLICATES=true
SERVICENOW_LAPTOP_REQUEST_LIMITS=5
```

This approach allows you to test the AI agent without restrictions during initial deployment and gradually enable safeguards as usage patterns emerge. You can customize limits based on your operational requirements and maintain flexibility as your business needs evolve.
Lesson 3: Design a phased implementation roadmap for enterprise AI adoption
Building reliable AI agents for enterprise systems requires more than connecting APIs. While you might want to move quickly from prototype to production, successful implementations depend on systematic planning and incremental deployment strategies. This approach balances speed with stability.
Many organizations struggle with AI projects because they move too fast, rushing to production without adequate testing and causing system outages or data quality issues. Others move too slowly, getting stuck in planning phases while business value remains unrealized. Often, these projects lack structure, resulting in point solutions that do not account for long-term maintenance or scalability.
The solution: Use a structured phased deployment
Our approach focused on three interconnected pillars:
- Architecture decisions that support development and production workflows.
- Proactive safeguards that prevent automation abuse while maintaining flexibility.
- Testing strategies that validate real-world scenarios without impacting production systems.
Recommended implementation approach
When building your own MCP integrations, consider the following phased approach.
Phase 1: Foundation (Weeks 1-2)
- Design your external mock services to match target system APIs.
- Implement basic MCP tools with minimal safeguards.
- Establish a CI/CD pipeline with mock-based testing.
Phase 2: Production integration (Weeks 3-4)
- Connect to production systems using environment variable configurations.
- Enable duplicate prevention and other safeguards appropriate to your use case.
- Monitor usage patterns and adjust rate limits accordingly.
Phase 3: Optimization (Week 5 and beyond)
- Fine-tune safeguards based on usage data.
- Add security measures as needed.
- Scale horizontally by adding MCP tools for other enterprise systems.
Next steps
Try it yourself! Run the AI quickstart (60-90 minutes) to deploy a working multi-agent system.
- Save time: You can have a working system in under 90 minutes rather than spending weeks building orchestration and evaluation frameworks from scratch. Start in testing mode to explore the system, then switch to production mode using Knative Eventing and Kafka when you are ready to scale.
- What you'll learn: Production patterns for AI agent systems that apply beyond IT automation, such as how to test non-deterministic systems, implement distributed tracing for asynchronous AI workflows, integrate LLMs with enterprise systems safely, and design for scale. These patterns transfer to any agentic AI project.
- Customization path: The laptop refresh agent is just one example. The same framework supports Privacy Impact Assessments, Request for Proposal (RFP) generation, access requests, software licensing, or your own custom IT processes. Swap the specialist agent, add your own MCP servers for different integrations, customize the knowledge base and define your own evaluation metrics.
Learn more
If this blog post sparked your interest in the IT self-service agent AI quickstart, here are additional resources.
- Browse the AI quickstarts catalog for other production-ready use cases, including fraud detection, document processing, and customer service automation.
- Questions? Open an issue in the it-self-service-agent GitHub repository.
- Learn more about the tech stack: