Data sovereignty

The founding principle

CLEVYA is built on a simple rule: the known sensitive direct identifiers (names, amounts, IBANs) never leave your server. Before a request goes to a language model, these elements are replaced locally with consistent, typed tokens. The mapping from token to real value stays on your infrastructure; the reverse substitution, which puts the real values back into the model's response, is also done locally.

This is what sets CLEVYA apart from a direct call to a cloud assistant: with a classic assistant, your real text transits through the provider. With CLEVYA, what transits is an anonymized version in which the sensitive values have been substituted.
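The substitution and reverse substitution described above can be sketched as follows. This is a minimal illustration, not CLEVYA's actual implementation: the `Pseudonymizer` class, its method names and the `known_values` input are hypothetical, and real detection would be more than plain string matching.

```python
import re


class Pseudonymizer:
    """Hypothetical sketch: swaps known sensitive values for typed tokens.

    The token <-> value mapping lives only in this local object.
    """

    def __init__(self):
        self._token_by_value = {}  # (kind, real value) -> token
        self._value_by_token = {}  # token -> real value
        self._counters = {}        # kind -> last index used

    def _token_for(self, kind, value):
        key = (kind, value)
        if key not in self._token_by_value:
            self._counters[kind] = self._counters.get(kind, 0) + 1
            token = f"[{kind}_{self._counters[kind]}]"
            self._token_by_value[key] = token
            self._value_by_token[token] = value
        # The same real value always gets the same token (consistency).
        return self._token_by_value[key]

    def anonymize(self, text, known_values):
        # known_values: (kind, real value) pairs, assumed to come from
        # local detection or a reference list.
        # Longest values first, so a short value never splits a longer one.
        for kind, value in sorted(known_values, key=lambda kv: -len(kv[1])):
            text = text.replace(value, self._token_for(kind, value))
        return text

    def restore(self, text):
        # Reverse substitution, applied locally to the model's response.
        # Unknown tokens are left untouched.
        def lookup(match):
            return self._value_by_token.get(match.group(0), match.group(0))

        return re.sub(r"\[[A-Z]+_\d+\]", lookup, text)


p = Pseudonymizer()
sent = p.anonymize(
    "Sophie Marchand: salary 48000. Sophie Marchand confirms.",
    [("PERSON", "Sophie Marchand"), ("SALARY", "48000")],
)
restored = p.restore("[PERSON_1] currently earns [SALARY_1].")
```

Only `sent` would be handed to the provider; `restored` is computed after the response comes back, without the mapping ever leaving the process.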

The honest wording

We do not say “the real data NEVER leaves the server” in an absolute sense - that would be false, and a technical lead would take it apart in one question. The exact wording is:

The known sensitive direct identifiers (names, amounts, IBANs) never leave the server, and any known real value that remains is blocked before sending. For untypable secrets, a fully local mode (Ollama) keeps the data on site.
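The "blocked before sending" part of this wording can be pictured as a last check on the outgoing payload. The sketch below is an assumption about how such a guard could look, not a description of CLEVYA's code: `check_before_sending` and `EgressBlocked` are hypothetical names, and the comparison is a simple case-insensitive substring test.

```python
class EgressBlocked(Exception):
    """Raised when a known real value is still present in the payload."""


def check_before_sending(payload, known_real_values):
    # Case-insensitive comparison; empty entries are ignored because an
    # empty string would match every payload.
    normalized = payload.casefold()
    leaked = [v for v in known_real_values if v and v.casefold() in normalized]
    if leaked:
        # Report a count rather than the values, so the error message
        # does not itself carry the sensitive data.
        raise EgressBlocked(
            f"{len(leaked)} known real value(s) still present; request not sent"
        )
    return payload


known = ["Sophie Marchand", "48000"]
ok = check_before_sending("[PERSON_1] asks about salary [SALARY_1]", known)

try:
    check_before_sending("sophie marchand asks about salary [SALARY_1]", known)
    blocked = False
except EgressBlocked:
    blocked = True
```

Such a guard only catches values it already knows about, which is why the wording above is limited to known real values.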

What really goes to the model

For a typical HR note, the content sent to the model looks like this:

Stays local (never sent)	What goes to the model
Sophie Marchand	`[PERSON_1]`
salary 48000	salary `[SALARY_1]`
IBAN FR76 3000…	`[IBAN_1]`
Payroll manager	Payroll manager (in clear)
salary increase request	salary increase request (in clear)

What really goes out: the typed and consistent tokens, the relational structure (“[PERSON_1] manages the file of [PERSON_2]”) and the context deemed non-sensitive, left in clear (job title, nature of the request). What does not go out: any real name, real amount or real IBAN.

The limits to know

Two real limits, which we state ourselves rather than leaving you to discover them:

Re-identification by quasi-identifiers. The non-sensitive context left in clear (job title + location + rare event) can, in a small workforce, point to a single person. This is a lack of k-anonymity, not a leak of a raw value, but it is a real risk.
Sensitive data that is neither typed nor present in the reference list. An internal project name, an in-house product reference or a contract clause is recognized neither by the entity detection (trained on persons, places, organizations) nor by the reference-list check.
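To make the first limit concrete, k-anonymity can be measured by counting how many people share the same combination of quasi-identifiers. The sketch below is illustrative only: the `smallest_group` helper, the field names and the sample staff records are invented for the example and are not part of CLEVYA.

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Return k, the size of the smallest group of records that share
    the same values for the given quasi-identifiers (0 if no records)."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values(), default=0)


# Invented sample data: one payroll manager, two accountants, same site.
staff = [
    {"job_title": "Payroll manager", "location": "Lyon"},
    {"job_title": "Accountant", "location": "Lyon"},
    {"job_title": "Accountant", "location": "Lyon"},
]

k = smallest_group(staff, ["job_title", "location"])
```

Here `k` is 1: the job title and location alone single out the payroll manager, even though no name, amount or IBAN appears in the data.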

Product answer to both: the fully local mode (Ollama) for untypable secrets - the data then goes nowhere.

Going further

Local anonymization - how detection and substitution work.
Egress log - the verifiable proof of what went out.
BYOAK - why you keep control of the provider and the cost.