The Agentic Loop Problem and the Memory Architecture We Built to Solve It

SwishX Engineering

There is a version of AI-native software that looks impressive in demos and fails quietly in production. It is the version where AI agents take actions without adequate grounding in organisational context, where retrieval-augmented generation gives the model a window of recent documents and calls it memory, and where the human-in-the-loop is a checkbox on a compliance deck rather than a genuine architectural constraint.

We built some of that version in our early prototypes. The failure modes were instructive.

This post covers two problems that turned out to be the same problem: how to build AI agents that can take real commercial actions safely, and how to give those agents the organisational memory required to take those actions well. At SwishX, these questions are not academic. Our agents generate hospital rate contracts, send promotional content to tens of thousands of doctors, trigger tender submissions, and make channel intervention recommendations that field teams act on. The cost of a wrong action is not a bad recommendation that gets ignored. It is a compliance exposure, a commercial relationship damaged, or a contract executed on incorrect terms.

Getting the architecture right for this class of problem required us to rethink several things we thought we understood about how AI systems should be designed.

The agentic loop problem

Most discussions of agentic AI focus on capability: what can the agent do? Fewer focus on the architectural question that actually determines whether an agentic system is safe to deploy in a production commercial environment: at what points in the action sequence should the agent pause, and what information should it have before it resumes?

The naive version of an agentic loop looks like this: the agent receives a goal, plans a sequence of actions to achieve it, executes those actions, observes the outcomes, and updates its plan. This works reasonably well for tasks where the cost of individual actions is low and reversible. It breaks down for tasks where individual actions are high-cost, irreversible, or carry regulatory consequences.

Generating a hospital rate contract and routing it for CXO signature is not the same class of action as querying a database. Distributing a promotional video reel to 6,00,000+ verified healthcare professionals (HCPs) is not the same class of action as generating a draft for human review. Submitting a tender bid on GEM with a committed price and delivery schedule is not the same class of action as populating a spreadsheet with bid parameters.

What distinguishes these actions is not their complexity. It is their consequence topology, our internal term for the graph of downstream effects an action triggers: who or what is affected, how reversible those effects are, and what the regulatory or commercial exposure is if the action turns out to have been wrong.

Actions with dense, irreversible consequence topologies need different treatment in the agentic loop than actions with sparse, reversible ones. This is the core insight that shaped SwishX's agent architecture.

Designing for consequence topology

When we mapped the action space across SwishX's five modules, three categories emerged that required distinct architectural treatment.

Class 1: Read and Recommend. Actions that observe state and generate outputs for human consumption without modifying any external system. Generating a tender eligibility report. Surfacing a ranked list of HCPs for an upcoming campaign. Identifying contract clauses that deviate from the company's preferred positions. These actions can run autonomously with full agent authority. The human is in the loop as a consumer of the output, not as an approver of the action.

Class 2: Modify with Review Gate. Actions that modify internal state and require a human review step before the modification becomes externally visible. Generating a hospital rate contract draft. Creating a promotional content asset from an approved monograph. Updating a distributor performance score based on new secondary sales data. These actions run with partial agent authority: the agent executes the generation step autonomously, but the output is held in a staging state until a human review gate is cleared. The agent can proceed with subsequent planning steps while the review is pending, but cannot execute any downstream Class 2 or Class 3 actions that depend on the staged output until the review gate opens.

Class 3: External Commit. Actions that commit the company to an external obligation or irreversibly change external state. Distributing content to HCPs. Executing a contract signature workflow. Submitting a tender bid. Triggering a supply chain order. These actions require explicit human authorisation at the action level, not just at the goal level. The authorisation is cryptographically logged with a timestamp, the identity of the authorising user, and a hash of the exact action being authorised. If the action parameters change after authorisation is granted, the authorisation is invalidated and must be re-obtained.
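The authorisation rule for Class 3 actions can be sketched in a few lines. This is a minimal illustration, not the production implementation: the function names, the choice of SHA-256, and the canonical-JSON encoding are all assumptions introduced here. It shows the property described above: an authorisation is bound to a hash of the exact action, so it stops being valid as soon as any parameter changes.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


def action_digest(action_type: str, params: dict) -> str:
    """Hash the exact action. Canonical JSON (sorted keys) keeps the digest
    stable when the same parameters arrive in a different key order."""
    canonical = json.dumps(
        {"type": action_type, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Authorisation:
    user_id: str        # identity of the authorising user
    authorised_at: str  # UTC timestamp, ISO 8601
    digest: str         # hash of the action as it was authorised


def authorise(user_id: str, action_type: str, params: dict) -> Authorisation:
    now = datetime.now(timezone.utc).isoformat()
    return Authorisation(user_id, now, action_digest(action_type, params))


def is_valid_for(auth: Authorisation, action_type: str, params: dict) -> bool:
    """True only if the action about to run is exactly the one authorised."""
    return auth.digest == action_digest(action_type, params)


# Hypothetical tender bid, used only for illustration.
bid = {"tender_id": "T-001", "price": 125000, "delivery_days": 30}
auth = authorise("user-42", "submit_tender_bid", bid)
unchanged_ok = is_valid_for(auth, "submit_tender_bid", bid)
repriced_ok = is_valid_for(auth, "submit_tender_bid", {**bid, "price": 120000})
```

Here `unchanged_ok` is true and `repriced_ok` is false: changing the price after approval forces a fresh authorisation. A real deployment would also need to sign or otherwise tamper-protect the stored record, which this sketch does not attempt.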

The boundary between Class 2 and Class 3 is the most important design decision in the system. We drew it at the point of external commitment: the moment when an action transitions from modifying internal state to creating an obligation or effect in the world outside the SwishX platform.

This boundary is not always obvious from the action description alone. Routing a contract for CXO signature looks like an internal action. But it is actually a Class 3 action because the routing initiates an external process that cannot be cleanly reversed without a commercial consequence. We learned this from experience.
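One way to picture the three classes is as a function of two properties of an action's consequence topology. The sketch below is a deliberate simplification under stated assumptions: the two boolean fields, `classify`, and `may_execute` are hypothetical names, and a real topology would carry far more detail (affected parties, reversibility, exposure). It shows only the ordering of the checks, with external commitment taking precedence over everything else.

```python
from dataclasses import dataclass
from enum import Enum


class ActionClass(Enum):
    READ_AND_RECOMMEND = 1
    MODIFY_WITH_REVIEW_GATE = 2
    EXTERNAL_COMMIT = 3


@dataclass(frozen=True)
class ConsequenceTopology:
    modifies_internal_state: bool
    creates_external_effect: bool  # obligation or effect outside the platform


def classify(topology: ConsequenceTopology) -> ActionClass:
    # External commitment is checked first: it outranks any internal effect.
    if topology.creates_external_effect:
        return ActionClass.EXTERNAL_COMMIT
    if topology.modifies_internal_state:
        return ActionClass.MODIFY_WITH_REVIEW_GATE
    return ActionClass.READ_AND_RECOMMEND


def may_execute(action_class: ActionClass,
                upstream_gates_cleared: bool,
                authorised: bool) -> bool:
    """Class 1 always runs. Class 2 waits for any review gates it depends on.
    Class 3 additionally needs explicit action-level authorisation."""
    if action_class is ActionClass.READ_AND_RECOMMEND:
        return True
    if action_class is ActionClass.MODIFY_WITH_REVIEW_GATE:
        return upstream_gates_cleared
    return upstream_gates_cleared and authorised


# Routing a contract for CXO signature modifies internal state, but it also
# starts an external process, so it lands in Class 3.
routing = classify(ConsequenceTopology(modifies_internal_state=True,
                                       creates_external_effect=True))
eligibility_report = classify(ConsequenceTopology(False, False))
contract_draft = classify(ConsequenceTopology(True, False))
```

The routing example is the case discussed above: described as an internal step, it would be misfiled as Class 2 unless the external effect is modelled explicitly.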

The human-in-the-loop is not a speed bump

A common failure mode in enterprise AI design is treating the human-in-the-loop as a compliance requirement to be minimised. The underlying assumption is that human review adds latency without adding value, and the goal is to compress that latency to near zero.

This assumption is wrong for commercial AI systems, and it leads to interfaces that present humans with the output of agent actions in a format that discourages genuine review. A busy CXO shown a 47-page hospital rate contract with a one-line summary and an approve button is not providing meaningful human oversight. They are providing the legal cover of having clicked approve.

We designed our review interfaces on a different assumption: the human reviewer adds genuine value when reviewing an agent output, and the interface should make that contribution efficient rather than minimise the time the reviewer spends before clicking approve.

For contract review, this means the interface surfaces the clauses that deviate from the company's preferred positions first, with the deviation magnitude and the rationale for the deviation, before presenting the full document. The reviewer is directed toward the decision points that require judgment rather than the standard provisions that require only confirmation.
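The contract-review ordering can be illustrated with a short sketch. The `Clause` fields and the idea of a single numeric deviation score are assumptions made for this example; the source does not say how deviation magnitude is measured. The sketch shows the ordering only: deviating clauses first, largest deviation first, then the standard provisions in their original document order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    clause_id: str
    deviation: float      # assumed score; 0.0 means "matches preferred position"
    rationale: str = ""   # why the draft deviates, shown next to the clause


def review_order(clauses):
    """Decision points first, confirmation-only provisions afterwards."""
    deviating = [c for c in clauses if c.deviation > 0]
    standard = [c for c in clauses if not c.deviation > 0]
    # sorted() is stable, so equally deviating clauses keep document order.
    deviating = sorted(deviating, key=lambda c: c.deviation, reverse=True)
    return deviating + standard


# Hypothetical contract, for illustration only.
contract = [
    Clause("definitions", 0.0),
    Clause("payment_terms", 0.7, "90-day credit requested instead of 45"),
    Clause("governing_law", 0.0),
    Clause("price_revision", 0.3, "annual cap lowered"),
]
ordered_ids = [c.clause_id for c in review_order(contract)]
```

With this input, `payment_terms` and `price_revision` are presented before `definitions` and `governing_law`, so the reviewer reaches the judgment calls before the boilerplate.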

For content distribution review, this means the interface shows a sample of the personalised content variants being sent to different HCP segments, with the underlying personalisation rationale, before presenting the aggregate distribution plan.

The design principle is that the review interface should make the reviewer smarter, not faster. Faster follows from smarter when the interface correctly identifies what actually needs to be reviewed.

Why RAG is not memory

Retrieval-augmented generation is a well-understood pattern and it works well for a specific class of problem: when you need an LLM to reason over a corpus of documents that is too large to fit in context, you retrieve the most semantically relevant chunks and include them in the prompt. The model reasons over the retrieved context. The output is grounded in your corpus rather than in the model's parametric knowledge. This is genuinely useful for a wide class of applications.

The failure mode of RAG in enterprise commercial workflows is not that it does not work. It is that it solves the wrong problem.

RAG retrieves relevant documents. What enterprise commercial AI needs is not relevant documents. It is organisational memory: a structured understanding of the company's commercial state that persists across agent interactions, updates continuously as new events occur, and can be queried by agents at multiple levels of abstraction.

The difference between document retrieval and organisational memory is the difference between a filing cabinet and a colleague who has been at the company for three years.

When a SwishX agent is generating a hospital rate contract, it does not just need to retrieve the template and the relevant pricing guidelines. It needs to understand the history of the relationship with this specific hospital: what previous contracts were offered and on what terms, whether there were any compliance incidents, what the hospital's procurement team's sensitivity to specific clause types has been in previous negotiations, and what the strategic priority of this account is relative to the company's broader institutional business objectives.

None of this is in any single document. It is distributed across contract history, CRM interaction records, supply performance data, and commercial strategy documents. RAG retrieval over these sources produces a bag of relevant chunks. What the agent needs is a synthesised understanding of the relationship.

The memory architecture we built

We organised SwishX's agent memory into four layers that operate at different time horizons and different levels of abstraction.

Layer 1: Episodic Memory. Individual events with their full context: this contract was generated on this date with these terms, this HCP received this content at this time and responded in this way, this tender was submitted at this price and the outcome was this result. Episodic memory is append-only and immutable. Events are never modified after they are written; corrections are new events that reference the original. This immutability is not just good engineering hygiene. It is a regulatory requirement in several of the frameworks we operate under. Episodic memory is indexed by entity, by time, and by event type. Agent queries against episodic memory are structured queries that return event sequences, not semantic similarity searches. The precision requirements for episodic memory retrieval are too high for embedding-based retrieval to be reliable.
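A toy version of the episodic layer makes the append-only and correction rules concrete. The class and field names below are hypothetical, and an in-memory list with a linear scan stands in for a durable store with real entity, time, and event-type indexes. What the sketch does preserve is the shape of the contract: events are frozen once written, a correction is a new event that references the original, and queries are exact structured filters that return an ordered event sequence.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)  # frozen: a written event cannot be mutated
class Event:
    event_id: int
    entity_id: str
    event_type: str
    occurred_at: datetime
    payload: tuple                  # immutable (key, value) pairs
    corrects: Optional[int] = None  # event_id of the event this one corrects


class EpisodicLog:
    """Append-only log. There is deliberately no update or delete method."""

    def __init__(self):
        self._events = []

    def append(self, entity_id, event_type, occurred_at, payload, corrects=None):
        if corrects is not None and not 0 <= corrects < len(self._events):
            raise ValueError("a correction must reference an existing event")
        event = Event(len(self._events), entity_id, event_type, occurred_at,
                      tuple(sorted(payload.items())), corrects)
        self._events.append(event)
        return event

    def query(self, entity_id, event_type=None, since=None):
        """Structured query: exact match on entity and type, ordered by time."""
        matches = [
            e for e in self._events
            if e.entity_id == entity_id
            and (event_type is None or e.event_type == event_type)
            and (since is None or e.occurred_at >= since)
        ]
        return sorted(matches, key=lambda e: e.occurred_at)


log = EpisodicLog()
first = log.append("hospital-17", "contract_generated",
                   datetime(2025, 1, 10), {"discount_pct": 12})
# The original stays untouched; the fix is a new event pointing back at it.
log.append("hospital-17", "contract_generated",
           datetime(2025, 1, 11), {"discount_pct": 10}, corrects=first.event_id)
history = log.query("hospital-17", event_type="contract_generated")
```

After this runs, `history` holds both events in time order, and the second one carries `corrects=0`. A reader of the log can always reconstruct what was believed at any point, which is the property an audit needs.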

Layer 2: Semantic Memory. Derived understanding synthesised from episodic events: this hospital tends to negotiate payment terms aggressively, this HCP segment responds strongly to mechanism of action content, this tender category has a historical L1 (lowest bid) distribution that clusters within a specific price band. Semantic memory is updated periodically by a background synthesis process that reads the episodic log and updates the derived understanding. This is the layer where we do use embedding-based retrieval, because semantic queries over synthesised understanding are genuinely well matched to semantic similarity search. The key architectural decision is to keep the synthesis process separate from the agent runtime rather than deriving semantic memory on demand. Agents consume pre-synthesised semantic memory rather than synthesising it themselves during task execution. This keeps agent latency predictable and prevents the model from performing lossy synthesis under time pressure.
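The read path for this layer can be sketched as a similarity search over entries that already exist before the agent starts. Everything concrete here is an assumption: the three-dimensional vectors are hand-made stand-ins for embeddings that a real system would obtain from an embedding model, and `SemanticEntry` and `retrieve` are hypothetical names. The point of the sketch is the division of labour: the agent only ranks pre-synthesised statements, it never derives them during a task.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticEntry:
    statement: str    # written by the background synthesis job, not the agent
    embedding: tuple  # toy vector; a real system would use a model's embedding


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, store, k=2):
    """Return the k entries most similar to the query, best first."""
    ranked = sorted(store,
                    key=lambda e: cosine(query_embedding, e.embedding),
                    reverse=True)
    return ranked[:k]


# Pre-synthesised entries, as the agent would find them at task start.
store = [
    SemanticEntry("Hospital 17 negotiates payment terms aggressively", (1.0, 0.0, 0.0)),
    SemanticEntry("Cardiology segment responds to mechanism of action content", (0.0, 1.0, 0.0)),
    SemanticEntry("Hospital 17 pushed back on credit period twice", (0.9, 0.1, 0.0)),
]
top = retrieve((1.0, 0.0, 0.0), store, k=2)
top_statements = [e.statement for e in top]
```

For a query vector pointing at the "payment terms" direction, the two hospital entries rank above the cardiology one. Contrast this with Layer 1, where the same approximate ranking would be unacceptable and exact filters are used instead.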

Layer 3: Working Memory. The context window the agent maintains during a specific task execution: the goal, the plan, the actions taken so far, the observations from those actions, and the pending decisions. Working memory is ephemeral and scoped to a single task execution. It does not persist after the task is complete, but its contents are written to episodic memory as events before it is discarded. The agent's working memory prompt is dynamically assembled from Layer 1 and Layer 2 at task initiation, with the assembly algorithm designed to maximise the relevance density of the context provided rather than simply filling the context window with retrieved chunks. We run ablation experiments on the context assembly algorithm regularly. The performance delta between a naive top-K retrieval assembly and our current relevance-density-optimised assembly is meaningful on the tasks that matter most: contract generation accuracy, HCP personalisation calibration, and tender bid pricing judgment.
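The source does not define the relevance-density algorithm, so the following is only one plausible reading, offered as an assumption: score each candidate by relevance per token and pack the context greedily under a token budget. `Candidate`, `assemble_top_k`, and `assemble_by_density` are hypothetical names, and the relevance scores and token counts are invented for the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    relevance: float  # assumed to come from an upstream scorer
    tokens: int       # must be positive


def assemble_top_k(candidates, k):
    """Naive baseline: the k most relevant items, regardless of their size."""
    return sorted(candidates, key=lambda c: c.relevance, reverse=True)[:k]


def assemble_by_density(candidates, token_budget):
    """Greedy packing by relevance per token, skipping items that do not fit."""
    chosen, used = [], 0
    by_density = sorted(candidates,
                        key=lambda c: c.relevance / c.tokens,
                        reverse=True)
    for c in by_density:
        if used + c.tokens <= token_budget:
            chosen.append(c)
            used += c.tokens
    return chosen


candidates = [
    Candidate("full prior contract text", relevance=0.9, tokens=900),
    Candidate("summary of last negotiation", relevance=0.5, tokens=100),
    Candidate("account priority note", relevance=0.4, tokens=100),
]
naive = [c.text for c in assemble_top_k(candidates, k=1)]
dense = [c.text for c in assemble_by_density(candidates, token_budget=500)]
```

With a 500-token budget, the density-based assembly selects the two short, information-rich items, while the naive top-1 choice is a 900-token document that would not fit at all. Greedy packing is not optimal in general (this is a knapsack problem), but it is enough to show why the two strategies diverge.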

Layer 4: Procedural Memory. Learned patterns about how to execute specific task types effectively, accumulated from historical task executions. Which section ordering in a hospital rate contract correlates with faster counterparty acceptance? Which personalisation parameters most reliably predict HCP engagement uplift for cardiologists in tier 2 cities? Which tender documentation patterns correlate with technical qualification success? Procedural memory is updated by a reinforcement signal derived from task outcomes: contract acceptance rates, HCP engagement metrics, tender win rates. The update frequency is deliberately slow: procedural memory is updated on a weekly batch cycle rather than in real time, because the reinforcement signal requires sufficient outcome data to be statistically meaningful and because rapid procedural memory updates create instability in agent behaviour that is difficult to diagnose.
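The two reasons given for the slow update cycle, statistical sufficiency and behavioural stability, map onto two parameters in a simple update rule. The sketch below is an illustration under assumptions: the minimum sample size, the damping factor, and the function name are invented, and a real reinforcement signal would be richer than a list of success flags.

```python
def weekly_update(current_score, outcomes, min_samples=30, learning_rate=0.2):
    """Update one procedural pattern's score from a week of task outcomes.

    outcomes: list of 1 (success) or 0 (failure) for tasks using the pattern.
    """
    # Too little evidence: keep the existing score rather than chase noise.
    if len(outcomes) < min_samples:
        return current_score
    observed = sum(outcomes) / len(outcomes)
    # Move only part of the way toward the observed rate, so one unusual
    # week cannot swing agent behaviour abruptly.
    return current_score + learning_rate * (observed - current_score)


# Hypothetical pattern with a current acceptance-rate estimate of 0.5.
thin_week = weekly_update(0.5, [1, 1, 1, 1, 1])        # 5 samples: ignored
full_week = weekly_update(0.5, [1] * 30 + [0] * 10)    # 40 samples, 75% success
```

`thin_week` stays at 0.5 because five outcomes fall below the sample threshold, while `full_week` moves a fifth of the way from 0.5 toward 0.75. Raising `min_samples` or lowering `learning_rate` trades responsiveness for stability, which is the trade-off the weekly cadence expresses.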

The grounding problem

There is a failure mode that cuts across both the agentic architecture and the memory architecture: hallucination under uncertainty. When an agent does not have adequate memory context to be confident in an action, it can fill the gap with plausible-sounding but incorrect inferences. In a consumer application, this is annoying. In a commercial AI system generating contracts or compliance communications, it is dangerous.

Our approach to the grounding problem is not to try to eliminate hallucination at the model level, which is not currently achievable with sufficient reliability for high-stakes commercial actions. It is to design the system so that agents operating on inadequate context are constrained to Class 1 actions until adequate context is retrieved or obtained.

When an agent's confidence in a required context element falls below a threshold, it generates a structured information request rather than proceeding with an uncertain inference. The information request is routed to the human interface as a clarification question, not as a blocking error. The agent continues planning and executing the portions of the task that do not depend on the uncertain context, so that human latency on the clarification does not block the entire task.
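The threshold behaviour can be sketched as a partition of the plan. The names `Step`, `InformationRequest`, and `plan_with_degradation` are hypothetical, as is the representation of confidence as one number per context element; how that confidence is actually estimated is outside this sketch. It shows the control flow only: uncertain elements become clarification questions, steps that depend on them are deferred, and everything else proceeds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    needs: tuple  # names of the context elements this step depends on


@dataclass(frozen=True)
class InformationRequest:
    element: str
    question: str


def plan_with_degradation(steps, confidence, threshold):
    """Split a plan into steps that can run now and steps that must wait.

    confidence: dict mapping context element name to a score in [0, 1].
    A missing element is treated as confidence 0.0.
    """
    needed = {name for step in steps for name in step.needs}
    uncertain = {n for n in needed if confidence.get(n, 0.0) < threshold}
    requests = [InformationRequest(n, f"Please confirm: {n}")
                for n in sorted(uncertain)]
    runnable = [s for s in steps if not uncertain.intersection(s.needs)]
    deferred = [s for s in steps if uncertain.intersection(s.needs)]
    return runnable, deferred, requests


# Hypothetical contract-drafting plan.
steps = [
    Step("draft_standard_clauses", ("template",)),
    Step("set_payment_terms", ("template", "hospital_payment_history")),
]
confidence = {"template": 0.95, "hospital_payment_history": 0.4}
runnable, deferred, requests = plan_with_degradation(steps, confidence, threshold=0.8)
```

In this example the standard clauses are drafted immediately, the payment-terms step waits, and a single clarification request is raised for the hospital's payment history. The task as a whole is never blocked, only the part that would otherwise rest on a guess.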

We call this pattern confident execution with graceful degradation, in contrast to confident execution with optimistic inference. Adopting it has been one of the most important architectural decisions in making the system safe to deploy for high-consequence commercial actions.

What this architecture is not

We want to be clear about what this is not, because the engineering literature on agentic AI includes a lot of architectures that sound similar but make different trade-offs.

This is not a system where agents operate with broad autonomy and are corrected by human feedback after the fact. The consequence topology classification is designed to prevent high-consequence actions from being executed on the basis of agent confidence alone, regardless of how high that confidence is.

This is not a system where the memory architecture is designed to minimise the context provided to the model in order to reduce token costs. The architecture is designed to maximise the quality of the context provided, with cost as a secondary optimisation. For the commercial actions we are supporting, a well-grounded wrong action costs less than a poorly grounded correct one, because an error made on good context is observable and correctable while a correct result reached on poor context cannot be relied on to repeat. A well-grounded correct action costs substantially less than either.

And this is not a finished architecture. The boundary between Class 2 and Class 3 actions is a judgment call that we revisit as we see new patterns in production. The memory layer boundaries shift as we learn more about what information agents actually need for which tasks. The grounding threshold is calibrated per task type, using outcome data that continues to accumulate.

The system is designed to be wrong in observable and correctable ways. That design choice, more than any specific technical decision, is what makes it deployable in a regulated commercial environment.
