So, you want to deploy GenAI to automate your customer communications. LLMs writing emails, chatbots responding, generating web pages. A beautiful scene.
To do that, you need a source: a knowledge base, product information, policy documents, or just your website (though using that last one is often a terrible idea).
Between that source and the output to the customer sits the LLM and also a prompt that steers what that LLM does exactly.
Check.
But then you notice the output is just not good enough. Not precise enough, not factually correct, not reliable enough for customer communication.
Your first reaction? Let’s tinker with the prompt. And honestly, sometimes that helps.
But in most cases, it’s not your prompt. It is just your source data that sucks. It’s poorly tagged and poorly modeled. You might tighten things up a bit with your prompt, but you can’t prompt your way out of a data issue.
So you look for other solutions.
The source
Your source usually consists of a mix of structured data (tables, databases, structured fields) and unstructured data (texts, documents, policy pieces). In practice, much of it falls somewhere in between those two. The binary split between structured and unstructured exists on slides, not in the daily roll-up-your-sleeves reality.
That last bit aside #sidequest, back to the data.
Okay, simple structured data like references, definitions, key-value pairs? That goes through your GenAI pipeline fairly easily without losing too much semantic integrity. Simple data can be kneaded and pulled through systems while keeping its meaning. Not because GenAI is explicitly good at this, but because structured data intrinsically has less relational vulnerability (it is structured, after all).
But as soon as you’re dealing with unstructured data OR structured data that contains relational meaning (policy rules with conditions, product catalogs with exceptions, regulations with references to other sections), you run into problems. Because the meaning of that data lives in the relationships between pieces of information, not just in the information itself.
Still with me? Because here’s where RAG takes the stage as a solution for your meaning/semantic problem.
RAG lends a hand
Retrieval-Augmented Generation (RAG) gets deployed to reduce hallucinations and to ground the output in your own organization’s knowledge.
With RAG, you give the LLM access to your documents, so it doesn’t just generate based on its training, but also based on your company-specific information.
It makes sense that RAG improves what the LLM spits out, now that you have limited what it can look at. For specific types of information like references, definitions, and stable facts, RAG usually works great. But everything where meaning is relational? That gets complicated.
Remind me: what does RAG do?
RAG looks roughly like this:
Your source documents get divided into smaller pieces. Usually paragraphs or sections of a few hundred words.
Those chunks get converted into embeddings (numerical representations of text that make it possible to calculate which pieces of text are semantically similar).
The system calculates which chunks best match what’s being asked and retrieves those. Only those most relevant pieces get shown to the LLM when generating.
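To make that concrete, here’s a minimal sketch of those three steps. It assumes nothing about your stack: scikit-learn’s TF-IDF vectors stand in for a real embedding model, a plain cosine-similarity lookup stands in for a vector database, and the documents, question, and top-k value are all invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# 1. Chunking: naively one chunk per paragraph (real pipelines use smarter splitters).
source = (
    "Our premium support plan includes 24/7 phone support.\n\n"
    "Response times for email tickets are two business days.\n\n"
    "Phone support is not available on public holidays."
)
chunks = [p.strip() for p in source.split("\n\n")]

# 2. Embedding: turn every chunk into a vector.
#    TF-IDF here; a real system would use an embedding model and a vector store.
vectorizer = TfidfVectorizer()
chunk_vectors = vectorizer.fit_transform(chunks)

# 3. Retrieval: embed the question the same way, score similarity, keep the top-k chunks.
question = "Can I call support at night?"
scores = cosine_similarity(vectorizer.transform([question]), chunk_vectors)[0]
top_k = 2
best = np.argsort(scores)[::-1][:top_k]
context = "\n\n".join(chunks[i] for i in best)

# Only this retrieved context reaches the LLM at generation time.
prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
print(prompt)
```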
Every step in this process is a transformation of what got pulled from your source. And with every transformation, the meaning of that piece, its semantic integrity, can be lost. The meaning can shift. Sometimes subtly, sometimes obviously.
This doesn’t happen because the data factually disappears or literally gets changed, but because the context needed to correctly apply that data disappears.
An example
Let’s walk through the steps:
Say you have the following two paragraphs somewhere in your documentation:
Paragraph 1:
“Customers with a Sustainable Home Loan automatically receive an interest rate discount of 0.3% starting in 2024. The discount applies to all existing loans that fall under the program.”
And somewhere else (maybe further down in this document, on that intranet page, whatever):
Paragraph 2:
“For customers who entered before 2022, the discount is only activated when the energy label has been re-registered in the past 18 months. Without re-registration, the discount automatically expires.”
On their own, both look fine. Paragraph 1 is clear. Paragraph 2 too. But the actual meaning of your policy only emerges when you read them together:
Yes, there’s an automatic interest rate discount. But: for part of the customers (entry before 2022), it only applies if an energy label has been recently re-registered.
So when a customer asks: “Am I eligible for that 0.3% interest discount?” the correct answer depends on both paragraphs together: entry date + energy label registration + program conditions.
And that’s where it goes wrong in a RAG pipeline.
The RAG pipeline does this
Let’s run our example through it:
Step 1: ingestion (chunking & embedding)
Chunking: we chop the data into pieces. Preferably smart pieces, but what is smart here? Data scientists rack their brains over this question and often come up with good solutions. But unstructured data has no predictable structure, which makes chunking vulnerable, regardless of which method (token-based, paragraph-based, semantic, or document-aware) you use.
Let’s assume it goes well in most cases, but that in 10% of cases (a low estimate, folks, I suspect higher) the relationship between two chunks gets “cut”.
The interest rate and the conditions are then no longer one single package and get stored as two separate vectors.
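As a toy illustration of that cut, here’s what a naive paragraph-based chunker does to the two example paragraphs from above. The splitting rule is deliberately dumb; real chunkers are smarter, but the failure mode is the same whenever the two paragraphs don’t land in one chunk.

```python
# Naive paragraph-based chunking of the two example policy paragraphs.
policy_text = (
    "Customers with a Sustainable Home Loan automatically receive an interest rate "
    "discount of 0.3% starting in 2024. The discount applies to all existing loans "
    "that fall under the program.\n\n"
    "For customers who entered before 2022, the discount is only activated when the "
    "energy label has been re-registered in the past 18 months. Without "
    "re-registration, the discount automatically expires."
)

chunks = [p.strip() for p in policy_text.split("\n\n")]

for i, chunk in enumerate(chunks, start=1):
    print(f"chunk {i} mentions 'Sustainable Home Loan': "
          f"{'Sustainable Home Loan' in chunk}")

# chunk 1 mentions 'Sustainable Home Loan': True
# chunk 2 mentions 'Sustainable Home Loan': False
#
# Chunk 2 carries the condition, but after chunking nothing inside it ties that
# condition to the product. That link only existed in the surrounding document.
```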
Step 2: retrieval
The separate packages sit there waiting until someone (you, a customer, my nephew, a system) asks a question. The retrieval system then searches these packages through semantic search: it looks at which pieces of text, as vectors, are most similar to the question.
A customer asks the chatbot:
“Do I get an interest discount on the Sustainable Home Loan?” (though chances are your customer asks it way more creatively, with many more or far fewer words, and that also has impact).
The retrieval step gets to work and searches the vectors for semantic similarity: sustainable + home + loan + interest + discount. Where does it find language patterns similar enough to hand to the LLM for its answer?
In paragraph 1, probably yes. The Sustainable Home Loan is literally mentioned there.
Paragraph 2, however, which through chunking no longer has a relationship with the Sustainable Home Loan, doesn’t light up in its search for relevant information. Sometimes such a second paragraph still gets retrieved, but that’s luck, not a guarantee. And without that guarantee, you don’t have reliable and compliant customer communication.
The LLM chatbot responds:
“Starting in 2024, you automatically receive an interest discount with us. Isn’t that awesome?”
Except that’s not true. The customer doesn’t receive that discount at all if they don’t meet the now-lost conditions. Oops.
But what if that customer asks: do I still have to meet conditions, in the same conversation? Then they’ll come up, right?
Nope. Not necessarily. If the relationship is broken (reminder: never go back to your ex), then the LLM doesn’t retrieve the conditions. Because it has no idea those conditions belong to that product. It knows nothing.
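Running the retrieval step over those two chunks makes the miss visible. Again, TF-IDF stands in for the embedding model and the top-k of 1 is an invented setting; the point is only that chunk 2 has to compete on similarity to the question without ever mentioning the product.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunk_1 = ("Customers with a Sustainable Home Loan automatically receive an interest "
           "rate discount of 0.3% starting in 2024. The discount applies to all "
           "existing loans that fall under the program.")
chunk_2 = ("For customers who entered before 2022, the discount is only activated when "
           "the energy label has been re-registered in the past 18 months. Without "
           "re-registration, the discount automatically expires.")
question = "Do I get an interest discount on the Sustainable Home Loan?"

vectorizer = TfidfVectorizer().fit([chunk_1, chunk_2])
scores = cosine_similarity(vectorizer.transform([question]),
                           vectorizer.transform([chunk_1, chunk_2]))[0]

top_k = 1  # invented setting: how many chunks the LLM gets to see
best = np.argsort(scores)[::-1][:top_k]
for i, score in enumerate(scores, start=1):
    handed_over = "goes to the LLM" if (i - 1) in best else "stays behind"
    print(f"chunk {i}: similarity {score:.2f} -> {handed_over}")

# chunk 1 shares 'Sustainable', 'Home', 'Loan', 'interest', 'discount' with the
# question and scores clearly higher; chunk 2 mostly shares generic words.
# With top_k = 1 only chunk 1 reaches the LLM, so the conditions never make it
# into the answer. A bigger top_k helps in this toy case, but with hundreds of
# chunks the conditions still have to out-score everything that merely sounds similar.
```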
And then we have extra complications
And to make things even better:
The LLM always answers. But now it’s guessing (based on what’s semantically close to the question in your data or based on its own training data, but that’s not Your data). Maybe it even guesses right. Maybe. Maybe not at all. Surprise answers. Compliance loves those.
Oh and got multiple products that look alike? 10 types of loans? Then you run extra risks when relationships between information disappear, because the language patterns are very similar. The LLM makes a nice guessing game out of it.
In short, even with RAG you’re playing with compliance and legal fire. Is that always bad? Probably not for every question, in every sector, for every product. But depending on the exact quality of your source data, how RAG is implemented, and what you do between retrieval and generation (there’s more, but that’s another article), there’s basically always some risk. The question isn’t whether it’s risk-free, but how much risk you’re willing to take.
TL;DR
What are you supposed to do with this information? Let me break it down:
RAG is often used to make inaccessible or dispersed knowledge more usable. But it doesn’t solve underlying data problems.
The more complex your source data (10 products that look alike, each with 3 buts and howevers and a few with stacked dependencies), the less RAG actually fixes. The more chaotic your source data, the more you need RAG to save it. But that’s exactly where it doesn’t work.
For the purists: yes, mitigations are absolutely possible, but the core remains: RAG doesn’t replace good data modeling.
The question you should be asking:
Should you use an LLM for tasks that require a precise (deterministic) answer, where the potential impact of errors is large? Because LLMs work based on language patterns in a probability field: what’s the most likely next token, based on patterns.
That this won’t go well, even Stevie Wonder can see. What exactly will go wrong is hard to tell. Because not only is it difficult to determine how often it goes wrong (the breadth), but you also don’t know how badly it goes wrong (the depth). Does the LLM give a wrong interest rate or does the LLM give a wrong interest discount and guarantee it under all circumstances?
Hypothetical decisions
I’m not a CEO, but in my hypothetical multinational I don’t use an LLM as a chatbot. I use a deterministic system with clean, well-organized data that delivers exact answers. Then I give those to an LLM so it can say it nicely.
And even that doesn’t absolve me from having to set it up well, calibrate it, execute it properly. But that setup gives me a lot more confidence.
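Sketched very roughly, and with every name in it invented for illustration, that hypothetical setup looks like this: a plain rules function produces the exact answer, and the language model is only allowed to rephrase that answer, not to decide it. The rephrase_with_llm call is a placeholder for whichever model API you would actually use.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Customer:
    entry_year: int
    label_reregistered_on: date | None  # None if the energy label was never re-registered

def discount_decision(customer: Customer, today: date) -> str:
    """Deterministic policy logic: the exact answer is decided here, not by the LLM."""
    if customer.entry_year >= 2022:
        return "Yes: the 0.3% interest rate discount applies automatically."
    if customer.label_reregistered_on is None:
        return ("No: customers who entered before 2022 only receive the 0.3% discount "
                "after re-registering their energy label.")
    months_ago = ((today.year - customer.label_reregistered_on.year) * 12
                  + (today.month - customer.label_reregistered_on.month))
    if months_ago <= 18:
        return "Yes: the 0.3% discount applies (energy label re-registered recently enough)."
    return ("No: the energy label was re-registered more than 18 months ago, "
            "so the discount has expired.")

def rephrase_with_llm(exact_answer: str) -> str:
    # Placeholder for a call to whichever LLM you use. The prompt forbids the model
    # from changing the facts; it may only adjust tone.
    prompt = (
        "Rewrite the following answer in a friendly tone for a customer. "
        "Do not add, remove, or change any facts, numbers, or conditions.\n\n"
        f"{exact_answer}"
    )
    return prompt  # in a real setup: return llm.generate(prompt)

answer = discount_decision(Customer(entry_year=2021, label_reregistered_on=None),
                           today=date(2025, 3, 1))
print(rephrase_with_llm(answer))
```

All the policy logic stays deterministic and testable; the model only touches tone.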
By the way: still want to deploy GenAI for your customer communication? Then preferably choose small, well-structured domains with limited dependencies. That helps a lot.
But that’s just me. Maybe you see it completely differently and clearer. I’d love to hear that so I can learn. Do drop your comment!
Don’t stress… it’s just me!
I’ve spent over 25 years working in content strategy and digital transformation, which means I’ve seen enough technology hype cycles to be skeptical and enough genuine innovation to stay curious.
Want to talk shop? Do get in touch!
I have a newsletter and it's bearable. Subscribe to read my (Gen)AI articles!



