Data & trust 7 min read

Keeping your data out of the model

You can get real value from AI without handing your customer records to a vendor for training. A practical look at retention settings, data boundaries, and the questions to ask before anything sensitive goes near a prompt.

There’s a quiet worry behind a lot of AI hesitation in small businesses, and it’s a reasonable one: if we start using these tools, are we handing our customer data to some company to train its model? The marketing around AI does little to settle that question. But the honest answer is that you can get real value from AI without giving up control of your data, provided you set it up deliberately instead of pasting things into whatever tool is open.

This isn’t about becoming a security expert or reading every word of a vendor’s terms. It’s about understanding a few boundaries and asking a few questions before anything sensitive goes near a prompt. Here’s the practical version.

“Training on your data” isn’t the only thing to worry about

People tend to fixate on one fear — will my data be used to train the model? — and miss the broader, more useful question: where does my data go, who can see it, and how long does it stick around? Training is one piece of that. It’s not the whole picture.

When you send text to an AI tool, a few different things can happen to it. It might be used to generate a response and then discarded. It might be stored for a while so the vendor can monitor for abuse. It might be retained and used to improve future models. It might be visible to the vendor’s staff under certain conditions. These are separate settings, and they vary a lot between a consumer app and a business-grade service. Knowing the difference is most of the battle.

The consumer-app trap

The first practical rule: the free consumer version of an AI tool and its business tier are not the same product, and the defaults are not the same.

Many free or personal-tier tools reserve the right to use what you type to improve their systems. That’s a fine trade when you’re asking for a recipe. It’s the wrong trade when you’re pasting a customer list, a contract, or anything you wouldn’t want leaving the building. The fix is usually straightforward — business and enterprise tiers of the major providers typically commit not to train on your inputs and offer real controls over retention — but you have to actually be on that tier and check the setting, not assume.

So before sensitive data touches any tool, find out which version your team is actually using. A surprising number of “is our data safe” problems come down to someone using a personal account for company work because it was the one already logged in.

Three settings worth finding

You don’t need to audit everything. For any tool that will touch real business data, find the answers to three things:

Training. Does this tool use our inputs to train or improve its models? On business tiers the answer is often no by default, but confirm it, and confirm whether it’s a setting you can control.

Retention. How long is our data kept after we send it, and can we shorten or turn off that retention? “Used to generate the response and then deleted” is very different from “kept for thirty days” or “kept indefinitely.”

Access. Who at the vendor can see what we send, and under what circumstances? Reputable services are clear about this and limit it tightly.

Three questions. Most credible vendors answer them plainly in their documentation, and if a vendor won’t answer them clearly, that’s your answer about whether to send them anything sensitive.
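
If you are comparing several tools, it can help to write the answers down in one consistent shape. The Python sketch below is one hypothetical way to do that. The class, the field names, and the 30-day ceiling are assumptions made up for illustration, not any vendor’s terms or API. An unanswered question is stored as None and treated as a no.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VendorAnswers:
    """Answers to the three questions for one tool (illustrative field names)."""
    trains_on_inputs: Optional[bool]         # None = the vendor gave no clear answer
    retention_days: Optional[int]            # 0 = deleted after the response; None = unknown
    staff_access_documented: Optional[bool]  # None = the vendor gave no clear answer


def ok_for_sensitive_data(answers: VendorAnswers, max_retention_days: int = 30) -> bool:
    """Return True only when every answer is known and acceptable.

    max_retention_days is an arbitrary placeholder; choose your own limit.
    """
    if answers.trains_on_inputs is not False:
        return False  # trains on inputs, or no clear answer
    if answers.retention_days is None or answers.retention_days > max_retention_days:
        return False  # retention unknown or longer than the limit
    return answers.staff_access_documented is True
```

The point of the design is that silence counts against the vendor: a tool only passes when all three answers are on record.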

Draw the boundary before the prompt

The most reliable protection isn’t a setting at all — it’s deciding what should never leave your systems in the first place, and building the boundary in.

Not all data carries the same risk. A draft blog post is low stakes. A spreadsheet of customer names, payment details, or health information is not. A useful habit is to sort your data into “fine to send” and “keep inside,” and then make the safe path the easy path so people don’t have to make the call under pressure.
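
One way to make the safe path the easy path is an allowlist: name the fields that are fine to send, and keep everything else inside by default. The Python sketch below assumes customer records arrive as simple dictionaries. The field names and the sample record are invented for illustration.

```python
# Hypothetical field names; replace them with the columns your own records use.
FINE_TO_SEND = {"first_name", "order_status", "product", "message"}


def fields_safe_to_send(record: dict) -> dict:
    """Keep only allowlisted fields; anything not listed stays inside."""
    return {key: value for key, value in record.items() if key in FINE_TO_SEND}


# Made-up sample record for demonstration.
customer = {
    "first_name": "Dana",
    "card_number": "4111 1111 1111 1111",
    "order_status": "delayed",
    "message": "Where is my order?",
}

# Only first_name, order_status, and message survive; card_number is dropped.
print(fields_safe_to_send(customer))
```

An allowlist fails safe. A new column added to the spreadsheet next month is kept inside until someone decides otherwise.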

In practice that often means one of a few things: stripping the sensitive bits before anything goes to a model (the AI can draft a reply without needing the customer’s full account number), keeping sensitive processing on systems you control, or choosing tools that are built to run within your existing, governed environment rather than shipping data off to a black box. When we build for clients, we favor systems you own with access you can revoke, and we document what runs where — so the boundary is something you can see and change, not something you have to trust on faith.
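
For free text, stripping the sensitive bits can start as simply as swapping recognizable patterns for placeholders before the text reaches a prompt. The Python sketch below uses two illustrative regular expressions, one for email addresses and one for long digit runs such as account or card numbers. The patterns and placeholder names are assumptions, and they will miss some formats, so treat this as a starting point rather than a complete filter.

```python
import re

# Illustrative patterns only; this is not a complete filter for personal data.
PATTERNS = [
    # Email addresses such as name@example.com
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    # Runs of 8 to 19 digits, optionally separated by single spaces or hyphens
    (re.compile(r"\b\d(?:[ -]?\d){7,18}\b"), "[NUMBER]"),
]


def redact(text: str) -> str:
    """Replace email addresses and long digit runs with placeholders."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


note = "Account 1234-5678-9012 belongs to dana@example.com, order 42."
print(redact(note))
# prints: Account [NUMBER] belongs to [EMAIL], order 42.
```

The model can still draft a useful reply from the redacted note, and the full account number never leaves your systems.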

Sensitive doesn’t mean off-limits — it means deliberate

It’s worth saying plainly, because the fear can tip into avoiding AI entirely: sensitive data isn’t a reason to give up on the value. It’s a reason to be deliberate. Plenty of high-value workflows touch sensitive information, and they can be built responsibly — by minimizing what’s exposed, choosing the right tier and the right tools, and keeping the sensitive core on systems you control.

The mistake in both directions is the same: not thinking about it. Pasting customer records into a personal chatbot is careless. So is refusing to use AI for anything because of a vague worry you never actually examined. The middle path is to know where your data goes and decide on purpose.

The questions to ask before you start

If you remember nothing else, ask these five questions of any new AI tool that will touch real data:

Which version are we on — personal or business — and does it train on our inputs?
How long is our data retained, and can we control that?
Who at the vendor can access what we send?
Does this specific task even need the sensitive data, or can we strip it first?
Could this run on systems we control instead of shipping the data out?

None of these require a law degree or a security team. They require asking before, not after. Get in the habit of asking them and you can adopt AI with your eyes open — keeping the value, and keeping your customers’ trust, which in a small business is the thing you can least afford to spend.

This piece is general guidance, not legal advice — if you handle regulated data, check your specific obligations with a professional.

PineyWoods builds AI systems for small and medium businesses on infrastructure you own, with access you can revoke and a clear record of what runs where. Want to use AI without losing control of your data? Book a free call. Thirty minutes, plain answers, useful either way.
