The Memory Problem
Somewhere in a wealth management firm right now, an AI assistant is confidently citing a regulation that was updated four months ago. It doesn’t know the regulation changed. It doesn’t know that it doesn’t know. The analyst reading its output will either catch the mistake and lose another hour fact-checking the machine, or they won’t catch it and the mistake will travel downstream into a client recommendation, a compliance filing, or a product decision that shouldn’t have been made on stale information.
This is happening thousands of times a day across financial services, and the industry’s response so far has been to shrug and call it a hallucination problem. It’s not. Hallucination is a symptom. The actual disease runs much deeper.
Nearly every major AI deployment in financial services right now is built on the same basic architecture. You take a large language model, connect it to your data through retrieval-augmented generation (RAG), and let it answer questions. The retrieval pipeline splits your documents into chunks and converts those chunks into mathematical representations called vectors. When someone asks a question, it finds the chunks most similar to the question and passes them to the model as context. The model generates a response, the response sounds authoritative, and everyone moves on.
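To make the retrieval step above concrete, here is a minimal sketch in Python. It is an illustration only: the word-count “embedding” stands in for the learned vectors a real system would use, the example chunks are invented, and the function names (embed, cosine, retrieve) are hypothetical rather than taken from any particular library.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words count vector.
    Real RAG systems use learned dense vectors instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Invented example chunks, for illustration only.
chunks = [
    "The ECB raised its deposit rate at the latest meeting.",
    "Bank margins are sensitive to changes in the deposit rate.",
    "The quarterly earnings report showed higher revenue.",
]

top = retrieve("What did the ECB do to the deposit rate?", chunks, k=2)
for chunk in top:
    print(chunk)
```

Both retrieved chunks are selected because they share words with the question. Nothing in this step records that one of them describes a cause and the other an effect.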
For simple questions about static information, this works well enough. Ask it to summarise an earnings report and it will do a reasonable job. Ask it to define a regulatory term and it will pull the right paragraph. The problem arrives the moment you need it to think across time, across sources, or across cause and effect, which in financial services is almost immediately.
Consider what happens when the ECB announces a rate decision.
An analyst at a wealth management firm needs to understand not just what the decision was, but what it means. How does the new rate affect sovereign bond yields? What does that do to European bank margins? Which equities are exposed? How should client portfolios adjust? This is a chain of reasoning that connects monetary policy to fixed income to banking sector profitability to specific portfolio positions, and every link in that chain depends on understanding how these relationships have changed over time.
A junior analyst can trace that chain in their head because they’ve built a mental model of how these things connect. They remember the last rate decision. They know which banks are most sensitive to margin compression. They can reason about cause and effect because they understand the structure of the problem, not just the words that describe it.
The AI can’t do any of that. It retrieves paragraphs. Similar-looking paragraphs about ECB decisions, sovereign yields, bank stocks. But it has no concept of why these things are connected, when the connections were established, or how they’ve changed. It pattern-matches on semantic similarity, which is a sophisticated way of saying it finds words that look related and puts them next to each other. That’s retrieval, and retrieval is not reasoning.
The industry knows this is a problem. GraphRAG, the approach of building knowledge graphs and retrieving from them, was supposed to be the fix. Microsoft published influential research on it and the community has been running with the concept. Build a graph of entities and relationships, retrieve from the graph instead of from raw vectors, and you get structured context instead of loose paragraphs.
The concept is right. The execution, almost everywhere, falls short.
Building a knowledge graph that’s genuinely useful for financial reasoning means you have to understand the information you’re putting into it. You can’t just chunk documents into pieces and embed them as vectors with a different label. You need to extract the actual entities (the people, the companies, the regulators, the instruments), map the real relationships between them, identify the claims being made, tag when things happened, assess how confident you are in each source, and track what contradicts what.
That takes serious compute, serious time, and serious domain expertise. It doesn’t scale to indexing the entire internet, which is what most AI companies want to do because scale is the story that raises funding. So they skip the hard part. They build thin graphs with flat edges that tell you two things are connected and nothing more: an edge between “Federal Reserve” and “US Interest Rates” that carries no information about when the connection was established, what changed, who said so, or how reliable that source is. It’s barely more useful than the vector search it was meant to replace.
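The difference between a flat edge and an edge that carries its own context can be sketched as two data structures. This is a hedged illustration, not a description of any existing product: the class names, the fields, the 0-to-1 reliability scale, and the way confidence is combined (treating sources as independent) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class FlatEdge:
    """A thin edge: two things are connected, and nothing more."""
    source: str
    target: str


@dataclass(frozen=True)
class Evidence:
    """One source backing a relationship (hypothetical fields)."""
    source_name: str
    published: date
    reliability: float  # assumed scale: 0.0 (unreliable) to 1.0 (fully reliable)


@dataclass
class ContextualEdge:
    """An edge that records what the relationship is, when it was
    established, and which sources support it."""
    source: str
    target: str
    relation: str
    established: date
    evidence: list[Evidence] = field(default_factory=list)

    def confidence(self) -> float:
        """Probability that at least one source is right, assuming the
        sources are independent (a simplifying assumption)."""
        all_wrong = 1.0
        for item in self.evidence:
            all_wrong *= 1.0 - item.reliability
        return 1.0 - all_wrong


flat = FlatEdge("Federal Reserve", "US Interest Rates")

# Source names, dates, and reliability scores below are made up.
rich = ContextualEdge(
    source="Federal Reserve",
    target="US Interest Rates",
    relation="sets policy rate",
    established=date(2024, 1, 31),
    evidence=[
        Evidence("Example policy statement", date(2024, 1, 31), 0.5),
        Evidence("Example research note", date(2024, 2, 5), 0.5),
    ],
)
print(rich.confidence())
```

The flat edge can only answer “are these connected?”. The contextual edge can also answer “since when, according to whom, and how sure are we?”.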
And then there’s the time problem, which almost nobody is solving properly. Financial data is inherently temporal. A company’s credit rating in January is different from its credit rating in June. A regulation that was in force last quarter may have been amended this quarter. A forecast that was consensus three months ago may have been contradicted by new data two weeks ago. Most AI systems treat all of this as static. They store the latest version, overwrite the old one, and lose the history that makes the information meaningful.
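The gap between overwriting and keeping history is easy to show in a few lines. The sketch below is illustrative only: LatestOnlyStore and TemporalStore are hypothetical names, and the issuer, ratings, and dates are invented.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional


class LatestOnlyStore:
    """Keeps only the most recently written value; history is lost."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def set(self, key: str, value: str, effective: date) -> None:
        self._values[key] = value  # the effective date is discarded

    def get(self, key: str) -> Optional[str]:
        return self._values.get(key)


class TemporalStore:
    """Keeps every version with its effective date and answers
    'what was the value as of this date?'."""

    def __init__(self) -> None:
        self._history: dict[str, list[tuple[date, str]]] = {}

    def set(self, key: str, value: str, effective: date) -> None:
        versions = self._history.setdefault(key, [])
        versions.append((effective, value))
        versions.sort(key=lambda v: v[0])  # keep versions in date order

    def as_of(self, key: str, when: date) -> Optional[str]:
        versions = self._history.get(key, [])
        dates = [d for d, _ in versions]
        i = bisect_right(dates, when)  # count of versions effective on or before `when`
        return versions[i - 1][1] if i else None


# Invented issuer and ratings, for illustration only.
latest = LatestOnlyStore()
temporal = TemporalStore()
for store in (latest, temporal):
    store.set("ExampleCo credit rating", "A", date(2024, 1, 15))
    store.set("ExampleCo credit rating", "BBB", date(2024, 6, 10))

print(latest.get("ExampleCo credit rating"))
print(temporal.as_of("ExampleCo credit rating", date(2024, 3, 1)))
```

The first store can only report the June rating. The second can still say what the rating was in March, which is the question an auditor asks about a recommendation made in March.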
For a consumer chatbot, this doesn’t matter much. For financial advisory, where a recommendation needs to be traceable, auditable, and grounded in the specific data that was available at the time it was made, it’s a fundamental flaw. MiFID II requires that firms demonstrate the basis for their recommendations. The EU AI Act demands transparency and explainability. GDPR governs how personal data feeds into automated decisions. An AI system that can’t show its working, that can’t trace a recommendation back through the specific data points and confidence levels that produced it, isn’t just inaccurate. It’s a regulatory liability.
So where does that leave the industry?
There are hundreds of AI products in financial services right now and nearly all of them share the same structural limitations. They retrieve when they should reason. They treat information as static when it’s constantly evolving. They store connections without context, confidence, or history. They generate outputs that sound plausible but can’t be audited, and they operate in a regulatory environment that increasingly demands auditability.
The firms using these tools know the limitations. They work around them with human review layers, compliance checkpoints, and the quiet understanding that the AI is a starting point, not an answer. Which raises an uncomfortable question about what the AI is actually contributing. If every output needs to be fact-checked by an analyst, and every recommendation needs to be validated by a compliance officer, the efficiency gain starts to look more like an efficiency shuffle. The work didn’t disappear; it just changed shape.
What would a real solution look like? It would need to understand relationships, not just retrieve them. It would need to track how those relationships change over time, what caused the change, and whether newer information contradicts older information. Every connection between entities would need to carry its own context: which sources established it, how confident those sources are, when it was last validated. The system would need to reason across causal chains, not just find similar-looking text. And every output would need to be traceable, all the way back through the specific data points, confidence levels, and reasoning steps that produced it.
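As a rough illustration of what “traceable” could mean in practice, the sketch below walks a chain of cause-and-effect links and returns every step with its source and confidence. It is a toy under stated assumptions, not the architecture described here: the Link structure, the breadth-first search, the rule of multiplying confidences along the chain, and all source names and numbers are invented for the example.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    """One causal link with its provenance (hypothetical fields)."""
    cause: str
    effect: str
    source: str
    confidence: float  # assumed scale: 0.0 to 1.0


def trace(links: list[Link], start: str, goal: str) -> Optional[list[Link]]:
    """Breadth-first search for the shortest chain of links from start to goal.
    Returns the links in order, or None if no chain exists."""
    outgoing: dict[str, list[Link]] = {}
    for link in links:
        outgoing.setdefault(link.cause, []).append(link)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for link in outgoing.get(node, []):
            if link.effect not in seen:
                seen.add(link.effect)
                queue.append((link.effect, path + [link]))
    return None


def chain_confidence(path: list[Link]) -> float:
    """Combine confidences by multiplication (assumes independent links)."""
    return math.prod(link.confidence for link in path)


# Invented sources and confidence values, for illustration only.
links = [
    Link("ECB rate decision", "sovereign bond yields", "Example source A", 0.9),
    Link("sovereign bond yields", "European bank margins", "Example source B", 0.8),
    Link("European bank margins", "exposed equities", "Example source C", 0.5),
]

path = trace(links, "ECB rate decision", "exposed equities")
if path is not None:
    for step in path:
        print(f"{step.cause} -> {step.effect} ({step.source}, {step.confidence})")
    print(round(chain_confidence(path), 2))
```

The point of the example is the shape of the output: an answer arrives together with the ordered list of links, sources, and confidence levels behind it, so a reviewer can inspect each step instead of rechecking the whole conclusion.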
That’s a fundamentally different architecture from what exists today. Not an incremental improvement on RAG, not a thinner wrapper around a better model, but a ground-up rethinking of how AI stores, connects, and reasons about financial information.
We think we’ve built it. More on that soon.