The evolution of agentic AI and text-to-SQL

Welcome back to our series on building a conversational analyst with Red Hat OpenShift AI and EnterpriseDB's (EDB's) PG Airman MCP server. In Integrate OpenShift AI and PG Airman MCP Server, we discussed the need for making relational databases accessible to non-technical users and successfully deployed our Data Governance Copilot AI quickstart. In this article, we will explore the exact meaning of agentic AI, how it differs from traditional retrieval-augmented generation (RAG), and how modern LLM-based agents address AI's decades-old text-to-SQL challenge.

Series note

This is part 2 in a series covering how to build a governance-aware, agentic conversational analyst for PostgreSQL using Red Hat OpenShift AI and EnterpriseDB's (EDB's) PG Airman MCP server. Read the other parts in the series:

Part 1: Integrate OpenShift AI and PG Airman MCP Server
Part 2: The evolution of agentic AI and text-to-SQL
Part 3: How to integrate OpenShift AI and PG Airman MCP Server
Part 4: Deploy secure agentic AI: Protocols and performance tuning

Agentic AI versus chat and RAG

People often toss around the terms agentic AI and agentic loop in AI discussions. You might have wondered about their exact meaning, how they differ from simply using a chat interface to an LLM or an application built using RAG, or whether there is a universal pattern for implementing agentic systems.

As far back as the origins of artificial intelligence, computer scientists sought to transition their programs from simple tools to assistants capable of agency or autonomy, endowing them with the ability to learn from past mistakes and map high-level requests and goals to concrete plans machines could execute.

Today, this distinction of agency is not considered monolithic or binary—systems achieve aspects of agency in varying degrees. In the context of LLMs, the key idea that separates an agentic conversational analyst from a human using an LLM-backed chat tool or a RAG application is the degree to which the LLM can autonomously interact with the target database to answer questions.

Direct chat with prompt engineering

In a typical chat-based setting, the user submits the query and any relevant information to the LLM at once and waits for an answer. Though users can apply multiple prompt engineering techniques that provide examples to the LLM (few-shot prompting) or step-by-step reasoning (chain of thought prompting), the interaction is fundamentally unchanged. The LLM takes the next turn in the conversation and provides an answer. If the answer is incorrect (that is, if the LLM-generated SQL statement is invalid) or the LLM needs more information about the database, it's the user's responsibility to provide this information and resubmit the query.

RAG applications

RAG-based systems elevate this state of affairs slightly by anticipating and providing the additional information the LLM might need through an automated augmentation pipeline; but the orchestration is pre-wired, bespoke, and largely dependent on semantic search.

Agentic AI

In the agentic setting, programmers abstract this orchestration much further by handing their toolbox to the LLM and letting it decide which tools to use, how to use them, and the sequence. In a conversational analyst, the LLM becomes responsible for learning the database schema, generating a sequence of SQL statements and other code to answer user queries, and processing the results of these statements in a feedback loop that provides it the opportunity to analyze the data and revise its approach toward the correct outcome. The core difference is that RAG follows a fixed retrieve-then-respond pipeline; whereas agentic systems let the LLM determine, execute, and evaluate a plan of approach autonomously.

Agentic systems are not a completely open act. An orchestrator and the MCP protocol mediate interaction between the LLM and backend tools to keep the LLM within its boundaries. The LLM is like a magician, talented and unscripted; but confined to a stage and handed only certain props (or tools) the show's proprietor (you) has deemed safe, always with a nearby Vaudeville hook ready to yank it off the stage if the show strays too far off-book.

These distinctions are not merely academic. They're about getting results. The authors of Spider 2, the much more difficult successor to the Spider benchmark widely used for evaluating the text-to-SQL task, show that agentic-based LLM systems (like our AI quickstart) result in a four-fold improvement in accuracy over single-turn, one-shot approaches.

A final key distinction is that when most RAG systems incorporate relational data, they typically use the data to augment predefined operational tasks, such as HR self-service or proposal writing. Conversational analytics systems, by contrast, focus purely on data analysis as their primary objective.

The modern agentic AI approach to text-to-SQL

The modern agentic AI approach to text-to-SQL: Is it a new variation on an old magic trick? Researchers and engineers have presented agentic solutions to the text-to-SQL task in AI for over four decades. To evaluate a new technology within your organization and make informed risk benefit decisions, you need balanced information that builds on what you already know with what's new and different. Hence, an important question this article series addresses is how modern agentic solutions to the text-to-SQL task compare with prior approaches, some of which you might have tried in your organization with potentially poor outcomes you don't want to repeat.

Some researchers have opted to treat modern agentic AI as an overlay on the existing language and terminology developed over its decades-long history. This approach begins by distilling all agentic applications as examples of systems that can sense their environment, reason, plan, and take actions toward some goal. To unify these systems with mathematical rigor, researchers typically formalize these ideas with conceptual frameworks such as belief-desire-intention (BDI), partially observable Markov decision processes (POMDP), and many others.

Hence, through this lens, modern LLM-based agentic systems are just another implementation variation on an old idea. For example, one or more LLMs orchestrated by a host application satisfies the reasoning component of agentic AI today. Whereas in 1982, researchers at SRI mapped this component to a Prolog program generated from a subset of English using a formal grammar.

Alternatively, some authors have argued that such overlay approaches are a type of conceptual retrofitting that impedes progress and adoption by confusing fundamentally different technologies with vastly different potential and risk profiles.

Used wisely, however, both of these mindsets provide valid insight to help you evaluate agentic AI within your organization. If you are an engineer or researcher with prior experience in earlier agentic AI projects, you have a head start understanding where technologies like LLMs, MCP, and Llama Stack support agentic applications today, but it's also important to recognize that modern LLM-based techniques address serious and fundamental shortcomings of earlier approaches.

The following table summarizes the evolution of agentic text-to-SQL research over the last four decades. This will help put things into context so we can better appreciate how LLMs alter the text-to-SQL landscape.

Approach	Year	Example	Characteristics	Benefits and Drawbacks
Grammar and logic based	1982	Chat-80	Used templates to map a subset of English (<150 words) to prolog statements	Limited expressivity and schema specific results
Inductive logic	1996	CHILL	Corpus driven technique that learns parser for natural language using positive/negative samples	Requires schema-specific samples, computationally expensive, brittle, with poor expressivity
Deep neural networks (LSTMs), graph neural networks (GNNs) and reinforcement learning	2017, 2019	Seq2SQL, Spider-schema-GNN	Applied natural language translation techniques to the small output space of SQL and database schema terms	Schema-specific learning, brittle with limited expressivity
Many hybrid neural network techniques that combined earlier seq-2-seq models with transformer encoders (pre-trained language models or PLMs), GNNs and LSTMs	2019-2022	SQLova, RAT-SQL	Used hybrid transformer-LSTM network to improve accuracy	Improved schema linking (that is, mapping between English word variants and table/column names), but performance still brittle. Effective schema linking still lacking sufficient background knowledge.
LLM-based agents	>2022	Survey	LLM-based approach with prompt-engineering / in-context learning	Provides state-of-the-art performance without fine-tuning; vastly improved schema-linking and incorporation of world knowledge.

Powering conversational analytics with LLM-based AI agents

Today's LLM-powered AI agents fundamentally shift how organizations approach conversational analytics in several key areas.

It improves schema linking. LLMs address a long-standing issue with schema-linking, the task of mapping a user's natural language terms to database tables, columns, operators, and functions. For example, given the query "How much did we make last year?", the system must understand that the query maps to the concept of revenue and that summation over the amt column in the cust_sales_2025 materialized view leads to the answer in 2026. Effective schema linking requires information outside the schema and found in background or world knowledge which LLMs excel at over earlier approaches.

It provides schema-independent results. While earlier techniques (and the benchmarks they used) were limited to the specific schema they were trained on, today's LLM-based agents perform well over novel schema without expensive schema-specific training using prompt engineering techniques, including MCP-assisted schema augmentation, few-shot learning, and chain-of-thought prompting. The behavior of LLMs to perform well in novel situations has transformed text-to-SQL research into a practical analytic tool accessible to any organization.

Aside from its role in supporting schema linking, the background knowledge LLMs encode are essential to answering queries that require factual information outside the scope of the target database. For example, to answer the query "List the capitals of the top 10 states in the U.S. Southeast by sales last year,” the system must know which states lie in the Southeast and their capitals, information which might be outside the scope of the target database. LLM-based systems can either provide this knowledge using their built-in training (parametric knowledge) or obtain it from external systems using MCP.

Perhaps surprisingly, LLM-based agents are simpler for IT departments to maintain and understand. Unlike prior agentic solutions that relied on brittle, and complex code bases, modern agentic systems encapsulate reasoning within the LLM. As LLMs improve, you can easily swap them without significant modifications to the remaining system.

Evaluation

Red Hat has built an AI quickstart with EDB to help you evaluate agentic conversational analytics within your organization using a sample database. You can also easily modify the AI quickstart to point to your Postgres database based on your actual user needs. This approach is often more effective than simply monitoring leaderboards and benchmarks since they often miscalculate the complexity of your environment.

For example, the models tested in the direct text-to-SQL task scored above 85% in SQL execution accuracy on the venerable Spider benchmark. To advance the field, however, they created the Spider 2 benchmark. Spider 2 evaluates agentic and baseline LLM models with complex data transformation and analysis tasks over 200 large databases, multiple SQL dialects, nested data types (fields that contain arrays), very complex regular expressions, and poorly designed schema.

While many enterprise analytic systems exhibit some degree of complexity as they evolved organically over time, many also exhibit strong localization properties–discrete sets of well-designed and simple views and tables that are sufficient to support analysis in specific subject areas. These systems might be excellent candidates for conversational analytics.

It's also important to recognize the arms race that typically exists between benchmarks and technology. For example, in November 2024, the best performing approach scored only 5.68% on the Spider 2.0-lite leaderboard. In April 2026, the best performing approach scored over 72%. You can review this information on GitHub. Benchmarks create a moving target that can fail to reflect the true complexity in your environment. It's best to gauge the potential of agentic conversational analytics against the actual needs and technical environment within your organization.

On the other hand, an agentic analyst might perform poorly on extremely complex schemas with poor documentation, in environments with real-time requirements or environments that rely on transactional semantics that use explicit row-level locks (for example, SELECT FOR SHARE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, and SELECT FOR KEY SHARE). The agentic loop which we discuss later often involves multiple rounds of SQL execution and evaluation between the host application and the LLM.

The resulting latency prevents the use of long-running, multi-statement transactions. While this is not an issue for many read-only analytic systems that rely on the default behavior of SELECT statements in Postgres and other multi-version concurrency control (MVCC) architectures, it might present challenges in environments where we use explicit locking clauses to prevent concurrent modification by other transactions during analysis. Organizations with these requirements should work with database architects and experts to assess the best approach for integrating agentic tooling into their operational model.

Next steps

As we've seen, shifting from a fixed RAG pipeline to an autonomous agentic loop significantly improves our ability to query and analyze relational data accurately. However, agentic AI requires a comprehensive orchestrator behind the scenes to integrate the LLM with external tools like EDB's PG Airman MCP server. In our next article, we will pull back the curtain on the copilot's architecture and examine how we handle LLM orchestration, message formatting, and tool-call parsing with models like Qwen3 and Nemotron.

Get a head start now to discover what conversational analytics can do for your organization. Visit Data Governance Copilot to download this AI quickstart and see how it aligns with your technical and business requirements.

Read part 3: How to integrate OpenShift AI and PG Airman MCP Server

Last updated: July 1, 2026

The evolution of agentic AI and text-to-SQL

Conversational analytics for Postgres

Series note

Agentic AI versus chat and RAG

Direct chat with prompt engineering

RAG applications

Agentic AI

The modern agentic AI approach to text-to-SQL

Powering conversational analytics with LLM-based AI agents

Evaluation

Next steps

Why is pytorch compile so fast?

The hidden cost of observability sprawl

Camel integration quarterly digest: Q2 2026

Optimize OpenShift workloads with software-defined memory

Why your AI agent needs two sandboxes: Benchmark data

Demystify RAG with OpenShift AI and Elasticsearch

Platforms

Build

Quicklinks

Communicate

RED HAT DEVELOPER

Red Hat legal and privacy links

Red Hat legal and privacy links