RAG for Startups: A Practical Guide to Real Costs and Data Privacy

Introduction

You've been building a product for months. You have users, you have traction. Then someone tells you that with RAG (retrieval-augmented generation) you can automate searches, reduce support load, and give people direct access to information that previously required intermediaries.

It sounds great. And it probably is.

But before you write the first line of code, there are two questions that almost no founder asks. And the answers to both can determine whether RAG is a real advantage or a black hole of costs and legal complexity.

The first is about where your data goes. The second is about what it actually costs to implement.

These aren't minor technical questions. They're architectural decisions that, if you get them wrong, become expensive to fix once you're in production.

RAG has associated costs that must be carefully considered

The Question Nobody Asks: Where Does Your Data Actually Go

RAG takes your documents, chunks them, converts them into vectors, and stores them. Depending on how your system is built, those documents could be passing through external servers, third-party APIs, or cloud models you don't control.

This isn't a minor technical detail. It's a legal obligation.

GDPR requires that you know where your data is, who has access to it, and under what conditions. If you're processing customer data (contracts, invoices, personal information, transaction history) and you're sending it to an external API without proper safeguards, you're taking on a legal risk that can be very expensive.

But there's more. The problem isn't just that data passes through third parties. It's that when it flows through multiple services in a RAG chain, you lose visibility and control over how it's processed.

A Concrete Example

A residential property management company wants its tenants to check their account information: balance statements, pending repairs, approved expenses. Their documents contain financial data for dozens of different properties.

If you build a RAG system without strict filtering by property, one tenant could end up seeing data that doesn't belong to them. A user accesses the chatbot, asks about their balance, and the system retrieves documents containing information from other properties.

What happened? Retrieval from the vector database wasn't restricted to what that user is allowed to see. The architecture was insecure by design.

This isn't an AI problem. It's an architectural problem you need to solve before you write the first line of code.
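One way to picture the fix is to filter by access scope before ranking by similarity. The following is a minimal sketch, not a production design: the `Chunk` class, the `property_id` field, and the `retrieve` function are hypothetical names, the vectors are hand-made two-dimensional toys, and a real vector database would normally apply this kind of metadata filter on the server side.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    property_id: str  # hypothetical access key: which property owns this chunk
    vector: tuple


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector, chunks, allowed_property_ids, top_k=3):
    """Return the top_k most similar chunks the caller is allowed to see."""
    # Filter first, rank second: out-of-scope chunks are never candidates.
    visible = [c for c in chunks if c.property_id in allowed_property_ids]
    visible.sort(key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return visible[:top_k]


chunks = [
    Chunk("Balance for building A: ...", "building-a", (1.0, 0.0)),
    Chunk("Balance for building B: ...", "building-b", (0.9, 0.1)),
    Chunk("Pending repairs, building A: ...", "building-a", (0.2, 0.8)),
]

# A tenant of building A asks about their balance.
results = retrieve((1.0, 0.0), chunks, allowed_property_ids={"building-a"})
```

The building B chunk is very similar to the query, but it never reaches the prompt because it is removed before ranking. The set of allowed IDs should come from your authentication layer, never from the user's question.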

How to Prevent It

You need to answer these questions before you start:

Where are your documents processed? If you use an external API to generate embeddings, those documents are being processed on servers you don't control. Some providers will sign a Data Processing Agreement (DPA); others won't.

How are vectors stored? Vector databases can be cloud-hosted by a provider or on your own infrastructure. If they're in the cloud, you need to verify where and under what privacy conditions.

Is there access filtering on each retrieval? When the system retrieves documents, it needs to know who's asking the question and what they have permission to see. If that filtering doesn't exist, you have a security problem.

What's your data retention plan? Unless you remove them, documents stay in the vector database indefinitely. Do you know how you're going to delete them when a customer asks you to? (They will ask, and GDPR requires you to do it.)

The answer to each of these questions should be clear in your architecture before you spend a single dollar on development.
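The retention question gets much easier if every vector is tagged with its owner from day one. Here is a toy sketch of that idea, assuming an in-memory store: `VectorIndex`, `add`, and `delete_customer` are illustrative names, not the API of any particular vector database.

```python
from collections import defaultdict


class VectorIndex:
    """Toy in-memory index that remembers which vectors belong to which customer."""

    def __init__(self):
        self._vectors = {}                    # vector_id -> (customer_id, vector)
        self._by_customer = defaultdict(set)  # customer_id -> {vector_id, ...}

    def add(self, vector_id, customer_id, vector):
        # If the ID is being reused, detach it from its previous owner first.
        previous = self._vectors.get(vector_id)
        if previous is not None:
            self._by_customer[previous[0]].discard(vector_id)
        self._vectors[vector_id] = (customer_id, vector)
        self._by_customer[customer_id].add(vector_id)

    def delete_customer(self, customer_id):
        """Remove every vector for one customer; return how many were deleted."""
        ids = self._by_customer.pop(customer_id, set())
        for vector_id in ids:
            del self._vectors[vector_id]
        return len(ids)

    def __len__(self):
        return len(self._vectors)


index = VectorIndex()
index.add("v1", "customer-1", (0.1, 0.9))
index.add("v2", "customer-1", (0.4, 0.6))
index.add("v3", "customer-2", (0.7, 0.3))

deleted = index.delete_customer("customer-1")
```

In a hosted vector database the same idea usually takes the form of a customer identifier stored as metadata on each vector. Remember that a deletion request also covers the source documents, caches, and backups, not just the vectors.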

The Question That Costs Money: The Real Price Tag

A RAG system isn't a simple call to a generative model API. It's a chain of tasks, and each one has associated costs you need to know and calculate before it's too late.

Many founders see the cost of a ChatGPT call (a fraction of a cent) and think RAG will be cheap. That's a mistake.

How Costs Accumulate

First, you pay to generate embeddings. Converting your documents into vectors has a cost per token. If you have 20,000 documents and each one averages 2,000 words, that's 40 million words, or roughly 50 million tokens at a typical 1.2 to 1.4 tokens per word, processed just once.

With current models, that could cost between 50 and 200 euros. Just once.

But here's the catch: when a document changes, you have to process it again. If you add 100 new documents every week, those costs keep adding up.
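A common way to keep re-embedding costs down is to embed only what actually changed. The sketch below assumes you store a content hash per document. The function names and the `seen_hashes` dictionary are hypothetical, and the call to the embedding provider is deliberately left out.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def documents_to_embed(documents: dict, seen_hashes: dict) -> list:
    """Return the IDs of documents that are new or whose content changed."""
    changed = []
    for doc_id, text in documents.items():
        digest = content_hash(text)
        if seen_hashes.get(doc_id) != digest:
            changed.append(doc_id)
            seen_hashes[doc_id] = digest  # remember what is about to be embedded
    return changed


seen = {}
docs = {"contract-1": "first version", "invoice-7": "first version"}

first_run = documents_to_embed(docs, seen)   # everything is new
docs["contract-1"] = "second version"
second_run = documents_to_embed(docs, seen)  # only the edited document
```

With this in place, the recurring embedding bill tracks the documents that change, not the size of the whole corpus.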

Then you pay to store vectors in a vector database. Each vector takes up space. Vector databases charge for storage, for queries, sometimes for both. Depending on the provider and volume, this could range from 50 euros per month to hundreds.

Every time a user asks a question, the system retrieves relevant fragments and includes them as context in the generative model call. This increases the token count in that call. An embedding of the question, a retrieval, a prompt with the retrieved context, a generation: every step adds tokens.

If your chatbot receives 1,000 questions per month, and each generates 5,000 tokens of total processing, we're talking about 5 million tokens per month. That's real money.
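To make that arithmetic concrete, here is a small sketch of a per-query token budget. The split between the three steps is an assumption chosen only so the total matches the 5,000 tokens used above. Measure your own numbers before trusting any of it.

```python
from dataclasses import dataclass


@dataclass
class QueryTokenBudget:
    question_embedding: int   # tokens sent to the embedding model
    prompt_with_context: int  # instructions, question, and retrieved fragments
    generation: int           # tokens in the model's answer

    @property
    def total(self) -> int:
        return self.question_embedding + self.prompt_with_context + self.generation


# Hypothetical split that adds up to 5,000 tokens per question.
budget = QueryTokenBudget(question_embedding=50, prompt_with_context=4_450, generation=500)

monthly_questions = 1_000
monthly_tokens = monthly_questions * budget.total
```

In a split like this one, the retrieved context dominates, so the number of fragments you include per query is the first dial to look at.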

A Realistic Calculation

Let's say your startup has an MVP with RAG for customer inquiries.

  • 500 documents (contracts, invoices, product documentation)
  • 100 active users per month
  • 500 queries monthly

Initial costs:

  • Generate embeddings for 500 documents: 10 euros

Monthly costs:

  • Vector storage: 30 euros
  • Retrieval and queries (500 queries × 3,000 tokens average): 1.50 euros
  • New embeddings (20 new documents/month): 0.50 euros

Total monthly: 32 euros

Sounds cheap. Now scale:

  • 5,000 documents (10x)
  • 2,000 monthly queries (4x)

New monthly costs:

  • Vector storage: 100 euros
  • Queries (2,000 queries × 3,000 tokens average): 6 euros
  • New embeddings: 2 euros

Total monthly: 108 euros

Still manageable. But the problem is that these costs scale with your business, and if you haven't budgeted for them, they appear without warning.
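If you'd rather keep this estimate in code than in your head, the MVP scenario can be reproduced with a few lines. This is a rough sketch: the function name is made up, and the rate of 1 euro per million tokens is simply what the MVP figures imply (1.50 euros for 500 queries × 3,000 tokens), not a quote from any provider.

```python
def monthly_cost(storage_eur, queries, tokens_per_query,
                 eur_per_million_tokens, new_embeddings_eur):
    """Return a breakdown of the monthly bill, in euros."""
    query_cost = queries * tokens_per_query / 1_000_000 * eur_per_million_tokens
    return {
        "storage": storage_eur,
        "queries": query_cost,
        "new_embeddings": new_embeddings_eur,
        "total": storage_eur + query_cost + new_embeddings_eur,
    }


# MVP scenario: 500 queries a month at 3,000 tokens each.
mvp = monthly_cost(
    storage_eur=30.0,
    queries=500,
    tokens_per_query=3_000,
    eur_per_million_tokens=1.0,  # assumption implied by the MVP figures
    new_embeddings_eur=0.50,
)
```

Swap in your own storage price, query volume, and per-token rate, and rerun it whenever a provider changes pricing or your usage grows.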

What Most Founders Forget

They don't count the system prompt tokens. If you have a system prompt explaining to the model who your chatbot is, what it can do, and how it should behave, those tokens are sent with every query. A 500-word system prompt is 600-700 tokens. At 2,000 monthly queries, that's 1.2 to 1.4 million tokens a month just in system prompts.

They don't count retries. Sometimes the model fails or returns an incomplete response. You have to retry. Each retry costs money.

They don't count vector database maintenance. Occasionally you need to reindex, deduplicate, clean up obsolete documents. That requires work and costs money.

They don't count price changes from providers. AI API costs have dropped, but they can also increase.
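The first two items are easy to put numbers on. The sketch below is only illustrative: the 5% retry rate is a made-up assumption, the 3,000 tokens per query is borrowed from the MVP example, and whether a failed attempt is billed at all depends on your provider.

```python
def system_prompt_tokens(prompt_words, tokens_per_word, monthly_queries):
    """Tokens spent per month just resending the system prompt."""
    return round(prompt_words * tokens_per_word) * monthly_queries


def retry_tokens(monthly_queries, tokens_per_query, retry_rate):
    """Extra tokens per month if a fraction of queries has to be sent twice."""
    return round(monthly_queries * retry_rate) * tokens_per_query


# A 500-word system prompt at 2,000 queries a month.
prompt_low = system_prompt_tokens(500, 1.2, 2_000)
prompt_high = system_prompt_tokens(500, 1.4, 2_000)

# Hypothetical: 5% of queries need one retry at 3,000 tokens each.
retries = retry_tokens(2_000, 3_000, retry_rate=0.05)
```

Neither number is dramatic on its own, but both grow linearly with query volume, which is exactly when nobody is watching them.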

RAG is Still a Real Advantage

Despite all this, RAG isn't a bad idea. It automates repetitive queries, reduces support load, and gives people direct access to information that previously required intermediaries.

But like any technical decision with business impact, it deserves thorough evaluation before implementation.

That means sitting down with a spreadsheet and writing:

In the data column: What data will flow through RAG? Is there sensitive data? Where is it processed? Who has access?

In the costs column: How many documents do you have? How many active users? How many queries do you expect? What's the estimated monthly cost?

In the architecture column: How will I filter access by user? Where will I store the vectors? What's my data retention plan?

If you can answer those questions clearly and document them, RAG probably makes sense in your roadmap.

If you can't, you might not be ready to implement it yet.

Conclusion

RAG is everywhere. In blogs, in podcasts, in founder conversations. And it's easy to get caught up in the enthusiasm.

But the difference between RAG that generates value and RAG that becomes a black hole of costs and complexity is asking two questions before you start:

Where does my data go and am I certain I'm complying with the law?

What does this actually cost when I'm using it in production?

If you have doubts about either of these questions, you're probably not ready yet. And that's okay. It's better to recognize it early than to discover it in production with real money at stake.

If you want a detailed technical evaluation of whether RAG is the right solution for your product, we can talk with no strings attached. I have a guide that helps you evaluate this in less than a week.

© 2026 Fran Hurtado Portfolio