How AI agents should access enterprise data: tools, MCP, and the deterministic execution gap
The way AI agents reach enterprise data is changing fast. A year ago the question was how to stuff enough context into a prompt. Today the answer is increasingly the same across every major framework: don't stuff context, give the agent tools it can call.
The shift sounds technical, but the implications are organisational. Once an agent can call your data, the quality of that call layer decides whether the agent is a demo or a system you would let near a real customer record. This post walks the pattern, the protocol, the failure modes most current implementations carry into production, and what a tool actually has to do to be safe.
Key takeaways
- The agent-as-tool-user pattern is replacing prompt-stuffing and brittle RAG across LangChain, AutoGen, Microsoft Agent Framework, and Copilot Studio.
- A tool is a function with typed parameters that an agent can call. MCP is the connectivity layer that lets any agent talk to any tool.
- Most current data tools are unsafe in production: they wrap CRUD across all tables under the calling user, or generate SQL at runtime with no per-agent scope.
- A production-ready data tool needs per-agent scoping, named operations carrying business logic, deterministic execution, durable context, per-call audit, and multi-source reach behind one endpoint.
- In 2026, governance and accountability become the actual differentiator between agent platforms, not model size.
The agent-as-tool-user pattern is taking over
For a while, retrieval-augmented generation looked like the dominant architecture for enterprise AI. Pull the relevant documents, paste them into the prompt, ask the model to answer. It works for unstructured corpora. It struggles the moment the question needs an exact number from a specific table.
The pattern that is winning in 2026 looks different. The agent reasons about user intent. The agent does not fetch data itself. Instead, it calls a tool. The tool is the piece of software that knows how to talk to the data system. The agent decides what to ask; the tool runs the actual operation and returns the result.
This split shows up everywhere you look right now. LangChain has had a tool abstraction from the start. AutoGen orchestrates multi-agent conversations around shared tools. The Microsoft Agent Framework is built around the same split. Copilot Studio agents call connectors and MCP servers as tools. Custom orchestrators built directly on top of an LLM API converge on the same shape because the alternative, an agent that tries to be the database client itself, does not survive contact with a real schema.
These platforms argue about a lot. They agree on one thing: the agent reasons, the tool executes.
What "tool" actually means in this pattern
A tool in this context is not a UI button or a CLI command. It is a function with a name, a typed parameter list, and a return shape that an agent can discover and call. The model sees a catalogue of tools, picks one based on the user's request, supplies the parameters, and reads the result back.
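As a rough illustration of that definition, here is a minimal sketch of a tool catalogue in plain Python. It is not any framework's actual API: the `Tool` class, the `CATALOGUE` dictionary, `call_tool`, and the `get_account_balance` tool with its sample data are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    """A callable the agent can discover: name, typed parameters, handler."""
    name: str
    description: str
    parameters: dict[str, type]  # parameter name -> expected Python type
    handler: Callable[..., dict[str, Any]]


def get_account_balance(account_id: str) -> dict[str, Any]:
    # Hypothetical stand-in for a call into a real data system.
    balances = {"ACC-001": 1250.0}
    return {"account_id": account_id, "balance": balances[account_id]}


# The catalogue is what the model gets to see and choose from.
CATALOGUE = {
    "get_account_balance": Tool(
        name="get_account_balance",
        description="Return the current balance for one account.",
        parameters={"account_id": str},
        handler=get_account_balance,
    )
}


def call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Check the model-supplied arguments against the declared types, then run."""
    tool = CATALOGUE[name]
    if set(arguments) != set(tool.parameters):
        raise ValueError(f"{name} expects parameters {sorted(tool.parameters)}")
    for key, expected in tool.parameters.items():
        if not isinstance(arguments[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    return tool.handler(**arguments)


result = call_tool("get_account_balance", {"account_id": "ACC-001"})
```

The point of the sketch is the division of labour: the model only supplies a tool name and arguments, and everything that touches data lives behind `handler`.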
Until 2024, every agent framework defined its own tool protocol. LangChain tools, OpenAI function calls, Anthropic tool use messages, and Copilot Studio connectors were four different, mutually incompatible formats. Building a tool meant building it once per framework, or accepting that your tool worked with one client only.
Model Context Protocol (MCP) closes that fragmentation. It is an open standard that defines how a client (the agent) discovers and calls tools served by a server (the data or system endpoint). Any MCP-compliant agent can connect to any MCP-compliant tool. The protocol is the connectivity layer; what the tools actually do is up to whoever runs the server.
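To make discovery and calling concrete, the sketch below writes out the rough shape of the JSON-RPC messages involved as Python dictionaries. The method names `tools/list` and `tools/call` and the `inputSchema` field follow the MCP specification as I understand it; check the current spec before relying on exact field names. The `get_account_balance` tool, its data, and the `missing_required` helper are hypothetical.

```python
import json

# Client -> server: ask which tools the server offers.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Server -> client: each tool carries a name, a description, and a
# JSON Schema describing its inputs.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "get_account_balance",  # hypothetical tool
                "description": "Return the current balance for one account.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"account_id": {"type": "string"}},
                    "required": ["account_id"],
                },
            }
        ]
    },
}

# Client -> server: call one tool by name with arguments.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "get_account_balance",
        "arguments": {"account_id": "ACC-001"},
    },
}

# Server -> client: the result comes back as content items.
call_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "content": [
            {"type": "text", "text": json.dumps({"balance": 1250.0})}
        ],
        "isError": False,
    },
}


def missing_required(tool: dict, arguments: dict) -> list[str]:
    """Return the required input names that the caller did not supply."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


tool = list_response["result"]["tools"][0]
problems = missing_required(tool, call_request["params"]["arguments"])
```

Nothing in these messages says what the tool does with the data. That is the gap the rest of this post is about.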
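For a longer read on why this matters for data teams, see what MCP means for enterprise data teams. The short version: with N agent frameworks and M tools, framework-specific integrations need up to N times M adapters, while a shared protocol needs only N plus M implementations, one per framework and one per tool.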
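The arithmetic behind that claim is simple enough to write down. The figures below (4 frameworks, 10 tools) are made-up inputs chosen only to show the difference in growth, and both function names are hypothetical.

```python
def bespoke_integrations(frameworks: int, tools: int) -> int:
    """Every framework needs its own adapter for every tool."""
    return frameworks * tools


def shared_protocol_integrations(frameworks: int, tools: int) -> int:
    """Each framework and each tool implements the protocol once."""
    return frameworks + tools


# Illustrative numbers only, not measurements.
without_mcp = bespoke_integrations(4, 10)       # 40 adapters
with_mcp = shared_protocol_integrations(4, 10)  # 14 implementations
```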
Why most current data tools are unsafe for production
Connectivity is solved. Governance is not. A lot of the MCP servers and agent tools in circulation today are written for demos, then pushed into production with the same shape. Three patterns recur, and each one breaks something real.
Pattern one: CRUD across every table under the calling user. The tool wraps create, read, update, and delete verbs against the whole data layer. Whoever invokes the agent gives the agent their full permissions for that call. An admin runs the agent, and for the duration of that call the agent has admin rights. There is no way to say "this agent reads accounts but never writes them" without rebuilding the permission model from scratch.
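For contrast, a per-agent scope can be sketched as a small allow-list that is consulted instead of the invoking user's role. This is a minimal illustration under assumed names, not how any particular product implements it: `AgentScope`, `SCOPES`, `authorize`, and the `case-agent` identity are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScope:
    """What one agent may do, as (table, action) pairs."""
    agent_id: str
    allowed: frozenset[tuple[str, str]]

    def permits(self, table: str, action: str) -> bool:
        return (table, action) in self.allowed


# Hypothetical configuration: this agent reads accounts but never writes them.
SCOPES = {
    "case-agent": AgentScope(
        agent_id="case-agent",
        allowed=frozenset(
            {("accounts", "read"), ("cases", "read"), ("cases", "update")}
        ),
    )
}


def authorize(agent_id: str, table: str, action: str,
              user_is_admin: bool = False) -> bool:
    """Decide from the agent's scope alone.

    user_is_admin is accepted but deliberately ignored: the invoking
    user's rights do not widen what the agent may touch.
    """
    scope = SCOPES.get(agent_id)
    return scope is not None and scope.permits(table, action)


can_read = authorize("case-agent", "accounts", "read")
can_write = authorize("case-agent", "accounts", "update", user_is_admin=True)
```

Here `can_read` is `True` and `can_write` is `False`, even though the caller is an admin. An agent with no configured scope is denied by default.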
Pattern two: SQL generation at runtime. The tool exposes a single "query the database" operation and asks the model to write the SQL. This is the deterministic execution gap. The same question asked twice can return two different queries, two different shapes, two different answers. Worse, the model can write a join that is syntactically valid and semantically wrong, and nothing in the pipeline catches it. Read deterministic execution for the longer treatment; the short version is that a query layer the model writes is a query layer the model can hallucinate.
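The alternative to model-written SQL is a fixed, parameterised query that the model can only select by name. The sketch below uses Python's standard `sqlite3` module with an in-memory database purely for illustration. The `cases` table, its rows, the `TEMPLATES` dictionary, and `run_template` are assumptions made for the example.

```python
import sqlite3

# Each operation is a tested query with named placeholders. The model
# supplies only the operation name and parameter values, never SQL text.
TEMPLATES = {
    "open_cases_for_account": (
        "SELECT id, title FROM cases "
        "WHERE account_id = :account_id AND status = 'open' "
        "ORDER BY id"
    ),
}


def run_template(conn: sqlite3.Connection, name: str, params: dict) -> list:
    """Run a known template; unknown operation names raise KeyError."""
    sql = TEMPLATES[name]
    return conn.execute(sql, params).fetchall()


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases ("
    "id INTEGER PRIMARY KEY, account_id TEXT, title TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO cases (account_id, title, status) VALUES (?, ?, ?)",
    [
        ("ACC-001", "Billing query", "open"),
        ("ACC-001", "Old ticket", "closed"),
        ("ACC-002", "Login issue", "open"),
    ],
)

first = run_template(conn, "open_cases_for_account", {"account_id": "ACC-001"})
second = run_template(conn, "open_cases_for_account", {"account_id": "ACC-001"})
```

Asking twice runs the same statement twice, so `first` and `second` are identical for unchanged data. A request for an operation that is not in `TEMPLATES` fails loudly instead of producing a plausible but wrong query.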
Pattern three: no audit dimension separate from the user. Standard data-layer audit captures record-level changes against whoever invoked the call. If the agent is acting as the user, the audit says the user did it. There is no way after the fact to ask "what did the lease agent touch this week?" because the lease agent does not exist as a distinct actor in the log.
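One way to close that gap is to record the agent as its own field on every tool call, next to the invoking user. The following is a minimal in-memory sketch under assumed names (`AuditEntry`, `AUDIT_LOG`, `record_call`, `calls_by_agent`, and the sample agents and users are hypothetical). A real system would write to durable, tamper-resistant storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AuditEntry:
    timestamp: datetime
    agent_id: str   # the agent as a distinct actor
    user_id: str    # the person who invoked the agent
    tool: str
    arguments: dict[str, Any]
    outcome: str


AUDIT_LOG: list[AuditEntry] = []


def record_call(agent_id: str, user_id: str, tool: str,
                arguments: dict[str, Any], outcome: str) -> None:
    """Append one entry per tool call, keeping agent and user separate."""
    AUDIT_LOG.append(
        AuditEntry(datetime.now(timezone.utc), agent_id, user_id,
                   tool, dict(arguments), outcome)
    )


def calls_by_agent(agent_id: str) -> list[AuditEntry]:
    """Answer 'what did this agent touch?' regardless of who invoked it."""
    return [entry for entry in AUDIT_LOG if entry.agent_id == agent_id]


# Hypothetical activity: two users, two agents.
record_call("lease-agent", "u-17", "RenewLeaseContract", {"lease_id": "L-9"}, "ok")
record_call("case-agent", "u-17", "open_cases_for_account", {"account_id": "ACC-001"}, "ok")
record_call("lease-agent", "u-42", "RenewLeaseContract", {"lease_id": "L-3"}, "denied")

lease_calls = calls_by_agent("lease-agent")
```

With the agent recorded separately, `lease_calls` holds both lease-agent entries even though they were triggered by different users.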
Each of these is fine for a demo. None of them is fine for an agent that updates customer data, books revenue, or sends communications on the company's behalf.
What a production-ready data tool actually requires
If you are evaluating an MCP server, a Copilot Studio connector, a LangChain toolkit, or a homegrown agent tool, here is the list to run it against. Each criterion is a requirement of the tool itself, not a property of the agent or the model, and together they separate a generic tool from one fit for an enterprise AI agent.
- Per-agent, per-table, per-action scoping. One agent reads Accounts and updates Cases. Another agent has its own scope, defined separately. The tool's configuration, not the invoking user's role, decides what the agent can touch.
- Named operations carrying business logic, not just CRUD. Custom business actions and any platform-specific custom APIs should be registered as named tools. The agent calls "RenewLeaseContract" by name and the tested logic for that operation runs. The model is not asked to figure out the CRUD chain that approximates it.
- Deterministic execution against tested templates. The tool runs defined queries with parameters, not model-generated SQL. The same question always returns the same answer. Failures are query failures, not silent semantic drift. The underlying pattern is captured in data access templates: define the operation once, parameterise it, run it forever.
- Agent credentials separate from the invoking user. The agent authenticates with its own identity. The scope granted to the agent is separate from what the invoking user can see, so the same person can use different agents with different access without juggling role assignments.
- Row-level and field-level filters configurable per agent. "Only this department," "only active records," "only the user's own opportunities" are expressible without inventing new security roles in the underlying data system.
- Durable domain context. Domain concepts like "lease agreement" or "qualified opportunity" are defined once with their tables, relationships, and rules, then reused across agents. Every new agent does not start from a blank schema.
- Per-agent audit trail of tool calls. The log records what the agent did, with the call inputs and the result, separated from the user dimension. "The lease agent did it" is a sentence the audit can support.
- Multi-source aggregation behind a single endpoint. One server combines data from multiple systems behind one MCP endpoint. For the agent it looks like one source. The access rules and audit apply across all of them, not per connector.
Use the list to evaluate any agent platform. A tool that fails three or more of these is still useful for a prototype. It is not the tool you want behind an agent that ships to customers.
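The row-level and field-level filters from the checklist can be sketched as per-agent configuration that the tool applies before anything is returned to the model. This is an illustration under assumed names, not a description of any specific product: `ROW_FILTERS`, `VISIBLE_FIELDS`, `filtered_rows`, the `sales-agent` identity, and the sample opportunities are all hypothetical.

```python
from typing import Any, Callable

Row = dict[str, Any]

# Hypothetical per-agent rules: "only active records" and
# "only the user's own opportunities".
ROW_FILTERS: dict[str, Callable[[Row, str], bool]] = {
    "sales-agent": lambda row, user_id: row["active"] and row["owner"] == user_id,
}

# Hypothetical field-level allow-list: this agent never sees margin.
VISIBLE_FIELDS: dict[str, tuple[str, ...]] = {
    "sales-agent": ("id", "name"),
}


def filtered_rows(agent_id: str, user_id: str, rows: list[Row]) -> list[Row]:
    """Apply the agent's row filter, then project to its visible fields."""
    keep = ROW_FILTERS[agent_id]
    fields = VISIBLE_FIELDS[agent_id]
    return [
        {field: row[field] for field in fields}
        for row in rows
        if keep(row, user_id)
    ]


OPPORTUNITIES = [
    {"id": 1, "name": "Renewal A", "owner": "u-17", "active": True, "margin": 0.31},
    {"id": 2, "name": "Renewal B", "owner": "u-17", "active": False, "margin": 0.12},
    {"id": 3, "name": "New deal C", "owner": "u-42", "active": True, "margin": 0.25},
]

visible = filtered_rows("sales-agent", "u-17", OPPORTUNITIES)
```

Because the rules live in the tool's configuration, changing what an agent sees does not require a new security role in the underlying data system.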
Where the field is going in 2026
Three things are happening at once, and they are reinforcing each other.
The first is framework consolidation. The number of agent frameworks worth building against is shrinking, not growing. Microsoft Agent Framework, Copilot Studio, LangChain, AutoGen, and a handful of custom orchestrators cover most production workloads. The differences between them are getting smaller as they converge on similar primitives.
The second is MCP becoming the default. As of May 2026, every major agent platform either supports MCP natively or has a published path to it. The proprietary tool protocols are not going away yet, but new integrations default to MCP, and the cost of not supporting it is climbing fast.
The third is the one that matters most. Once the frameworks consolidate and the protocol is shared, the actual differentiator between agent platforms in production is governance. Not model size. Not context window. Whether a given agent platform can answer the questions an auditor will ask: who can this agent reach, what did it do, under what identity, with what business logic, and can you prove it.
The teams that win in production are the ones whose tools are governed. The teams that stay in pilot are the ones whose tools are clever.
Where dhino fits
dhino is one implementation of this pattern in the Microsoft ecosystem: an MCP server for Dataverse and adjacent sources, built around named operations, per-agent scoping, deterministic execution against tested templates, and a per-call audit trail. For a concrete capability comparison against Microsoft's standard Dataverse MCP server, see dhino's Dataverse MCP server vs Microsoft's. For the product page, dhino Trust.