Data sovereignty
The founding principle
CLEVYA is built on a simple rule: the known sensitive direct identifiers (names, amounts, IBANs) never leave your server. Before a request goes to a language model, these elements are replaced locally with consistent, typed tokens. The mapping from token to real value stays on your infrastructure, and the tokens in the model's response are swapped back to real values locally as well.
This is what sets CLEVYA apart from a direct call to a cloud assistant. With a conventional assistant, your real text passes through the provider. With CLEVYA, what passes through is an anonymized version in which the sensitive values have been substituted.
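The substitution and the local restoration described above can be sketched as follows. This is a minimal illustration, not CLEVYA's implementation: the `LocalTokenizer` class, its method names and the exact-match lookup against a dictionary of known values are all assumptions made for the example. Real detection would be broader than a string lookup.

```python
class LocalTokenizer:
    """Hypothetical sketch: swaps known sensitive values for typed tokens.

    The token-to-value mapping lives only in this object, i.e. on the
    local machine; only the anonymized text would be sent out.
    """

    def __init__(self, known_values):
        # known_values: {"PERSON": ["Sophie Marchand"], "SALARY": ["48000"], ...}
        self.known_values = known_values
        self.value_to_token = {}
        self.token_to_value = {}
        self.counters = {}

    def _token_for(self, kind, value):
        # Reuse the same token for the same value, so tokens stay consistent.
        if value not in self.value_to_token:
            self.counters[kind] = self.counters.get(kind, 0) + 1
            token = f"[{kind}_{self.counters[kind]}]"
            self.value_to_token[value] = token
            self.token_to_value[token] = value
        return self.value_to_token[value]

    def anonymize(self, text):
        for kind, values in self.known_values.items():
            # Longest values first, so a value containing another is handled first.
            for value in sorted(values, key=len, reverse=True):
                if value in text:
                    text = text.replace(value, self._token_for(kind, value))
        return text

    def restore(self, text):
        # Runs locally on the model's response.
        for token, value in self.token_to_value.items():
            text = text.replace(token, value)
        return text


tokenizer = LocalTokenizer({"PERSON": ["Sophie Marchand"], "SALARY": ["48000"]})
outgoing = tokenizer.anonymize(
    "Sophie Marchand requests a raise. Current salary 48000. "
    "Sophie Marchand is Payroll manager."
)
restored = tokenizer.restore("[PERSON_1] should be told about [SALARY_1].")
print(outgoing)
print(restored)
```

Because the same value always maps to the same token, the model can still reason about who does what, while the dictionary needed to reverse the substitution never leaves the process.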
The honest wording
We do not say “the real data NEVER leaves the server” in an absolute sense - that would be false, and a technical lead would take it apart in one question. The exact wording is:
The known sensitive direct identifiers (names, amounts, IBANs) never leave the server, and any known real value that remains is blocked before sending. For untypable secrets, a fully local mode (Ollama) keeps the data on site.
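The "blocked before sending" part of this wording could be enforced by a last check on the outgoing payload. The sketch below is an assumption about how such a guard might look: `check_egress` and `EgressBlocked` are hypothetical names, and the case-insensitive substring match is a deliberately simple stand-in for the real check.

```python
class EgressBlocked(Exception):
    """Raised when a known real value is still present in an outgoing payload."""


def check_egress(payload, known_real_values):
    # Case-insensitive substring match against the list of known real values.
    folded = payload.casefold()
    leaked = [v for v in known_real_values if v.casefold() in folded]
    if leaked:
        # Report only the count, so the values themselves do not end up in logs.
        raise EgressBlocked(
            f"{len(leaked)} known real value(s) still present; request not sent"
        )
    return payload


known = ["Sophie Marchand", "48000"]
safe_payload = check_egress("[PERSON_1] earns [SALARY_1]", known)

try:
    check_egress("Please ask sophie marchand about the file", known)
    blocked = False
except EgressBlocked:
    blocked = True
```

The guard only covers values it has been told about, which is why the wording says "known real value" rather than promising that nothing real can ever go out.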
What really goes to the model
For a typical HR note, the content sent to the model looks like this:
| Stays local (never sent) | What goes to the model |
|---|---|
| Sophie Marchand | [PERSON_1] |
| salary 48000 | salary [SALARY_1] |
| IBAN FR76 3000… | [IBAN_1] |
| Payroll manager | Payroll manager (in clear) |
| salary increase request | salary increase request (in clear) |
What really goes out: the typed and consistent tokens, the relational structure
(“[PERSON_1] manages the file of [PERSON_2]”) and the context deemed non-sensitive, left in
clear (job title, nature of the request). What does not go out: no real name, no real amount, no
real IBAN.
The limits to know
Two real limits, which we state up front rather than leaving them to be discovered:
- Re-identification by quasi-identifiers. The non-sensitive context left in clear (job title + location + rare event) can, in a small workforce, point to a single person. This is a shortfall in anonymity (insufficient k-anonymity), not a leak of a raw value, but it is a real risk.
- Sensitive data that is neither typed nor in the reference list. An internal project name, an in-house product reference or a contract clause is recognized neither by the entity detection (trained on persons, places, organizations) nor by the reference-list check, so it is not substituted.
The product's answer to both: the fully local mode (Ollama) for untypable secrets. The data then goes nowhere.
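The quasi-identifier risk in the first limit can be estimated before deciding which context to leave in clear. The sketch below is an illustration, not a CLEVYA feature: the `small_groups` function, the field names and the sample workforce are all hypothetical. It counts how many people share each combination of clear-text attributes and reports the combinations shared by fewer than k people.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k):
    """Return the attribute combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical workforce: one payroll manager, three developers, same site.
workforce = [
    {"job_title": "Payroll manager", "location": "Lyon"},
    {"job_title": "Developer", "location": "Lyon"},
    {"job_title": "Developer", "location": "Lyon"},
    {"job_title": "Developer", "location": "Lyon"},
]

risky = small_groups(workforce, ["job_title", "location"], k=3)
print(risky)
```

In this sample the job title alone singles out one person, so a note that leaves "Payroll manager" in clear would identify them even though no real name is sent. That is the situation where the fully local mode is the safer choice.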
Going further
- Local anonymization - how detection and substitution work.
- Egress log - the verifiable proof of what went out.
- BYOAK - why you keep control of the provider and the cost.