Implement · Compliance

On-premise AI for Swiss companies

A Lenovo ThinkStation PGX at your premises. Powerful enough for serious LLM inference, small enough for any office. revDSG-aware. Your data does not leave the building.

  • Lenovo ThinkStation PGX
  • NVIDIA GB10 Grace-Blackwell, 128 GB
  • Ollama + Open-WebUI
  • Hardware sits with you
  • CHF 2,200 / day build & delivery

You are here because cloud AI is not an option for you. Maybe you process client data in a fiduciary practice. Maybe you have a codebase that must not land at OpenAI. Maybe your compliance department has said: "Data does not leave the building."

We build you a local LLM setup on a Lenovo ThinkStation PGX. A workstation with the NVIDIA GB10 Grace-Blackwell Superchip and 128 GB unified memory, sitting at your premises. The standard stack: Ollama as the engine, Open-WebUI as the frontend. We set it up, connect it to your tools and show your team how to operate it. Day rate CHF 2,200 for build and delivery. No cloud subscription. No vendor lock-in.

What you get

ComponentDeliveryNote
Hardware platformLenovo ThinkStation PGXNVIDIA GB10 Grace-Blackwell Superchip, 128 GB unified memory
Locationat the client's premisesNo cloud, no third country
LLM engineOllama (default)other engines possible
FrontendOpen-WebUI (default)chat UI, multi-user auth, model routing
Orchestration (optional)Xinity EngineVienna-based open-source layer (Apache 2.0) for an EU-sovereign stack
Model selectionOpen-source models (Llama, Qwen, Mistral, DeepSeek, etc.)Version choice in the architecture conversation
IntegrationEmail (IMAP/SMTP), DMS, n8n, Slack/Teams, your own APIsper use case
Maintenanceper day rate on demand or monthly retainer
Day rateCHF 2,200 / daybuild, delivery, maintenance

Instead of ChatGPT Enterprise / Copilot

Cloud solutions are often faster to set up and cheaper to start with. But: your data goes to OpenAI or Microsoft, often via US servers, with revDSG implications that are barely tenable in regulated industries. If cloud is enough, use it. If not, on-prem.

Instead of a Mac mini cluster or a self-built GPU server

Many technical teams try Ollama on a Mac mini, an old workstation or a self-configured GPU server. And then get stuck: too little memory for serious models, plus multi-user auth, model routing, logging, RAG connection, maintenance. A ThinkStation PGX with 128 GB unified memory solves the memory problem. We solve the software layer on top.

Partner: Xinity, Vienna sovereign-AI stack

Sometimes a client needs a fully EU-sovereign software stack. So not only hardware in Switzerland, but also an orchestration layer developed in the EU. Then we integrate the Xinity Engine. Xinity is a Vienna startup, founded by Alexander Zehetmaier and Jonas Vander. It delivers an OpenAI-compatible API, model routing across multiple GPUs, audit trails and cost control as open-source software (Apache 2.0). For clients with heightened requirements for EU AI Act readiness and sovereign-AI architecture, the most honest choice.

Use cases

  • Fiduciary SME: local LLM for mandate correspondence; RAG on the client DMS; data does not leave the business.
  • Software firm with IP: coding copilot local on the code repository, without code going to OpenAI/GitHub. → For tech teams
  • Medical practice / clinic office: dictation transcription and document processing locally.

Frequently asked questions

What does "on-premise AI" mean?

An AI setup where the language model runs on hardware that sits at your premises. Typically a workstation in an office or server room. Your data does not leave your administrative perimeter. Unlike cloud AI such as ChatGPT or Copilot, where requests run on US servers.

Do I need on-prem or is cloud AI enough?

Cloud AI is enough for many office workflows. Even under revDSG, provided the data processing agreement is cleanly arranged. On-prem becomes relevant with IP-sensitive data, regulated industries such as fiduciary, healthcare or finance, or with the internal rule: data stays in-house. The architecture guide helps you decide.

Which LLM models run on-prem?

Open-source models: Llama, Qwen, Mistral, DeepSeek and others. Which model fits concretely depends on the use case, hardware and language quality. We clarify that in the architecture conversation. Models age fast, which is why there are no version numbers here.

What does an on-prem setup cost?

Two components. First the hardware, one-off: a Lenovo ThinkStation PGX as the standard platform, current Lenovo Switzerland list price in the architecture conversation. Second the build and maintenance effort at the day rate of CHF 2,200. A typical initial build needs 3–8 days of effort, plus the hardware.

Is on-prem revDSG-compliant?

An on-prem architecture makes revDSG compliance considerably easier, because data processing to third countries falls away. "revDSG-compliant" is not a certificate, though. Compliance depends on the overall architecture and the data processed. For regulated industries we recommend additional legal guidance.

Which hardware do I need?

The standard platform is the Lenovo ThinkStation PGX with the NVIDIA GB10 Grace-Blackwell Superchip and 128 GB unified memory. This workstation covers teams of roughly 5–40 users, depending on model size and usage profile. For larger setups we link multiple stations or add GPU servers.

Can you host it in a Swiss data centre?

The standard delivery model is hardware at your premises. That is exactly the point. Swiss data-centre hosting is possible, but only sensible if you already run a data-centre setup. When in doubt: on-site.

Who maintains the setup?

Three options. You maintain it yourself, we guide you through it. Or we maintain on demand at the day rate. Or you take a monthly maintenance retainer.

Can we swap models?

Yes, the stack is open-source-based (Ollama as the engine). Model changes are days, often hours. If a new model runs better, we swap it in.

How long does the build take?

From the architecture conversation to production typically 2–6 weeks, depending on complexity, hardware lead time and integrations. A first working version often after just 1–2 weeks.

What happens if Waldsee is no longer available?

Hardware (Lenovo) and software (Ollama, Open-WebUI) are standard components. Any other AI engineering team can take over the architecture and carry it forward. No proprietary black boxes. Documentation is part of the delivery standard.

Can we combine on-prem with cloud?

Yes, hybrid architectures are often sensible: sensitive workflows on-prem, generic workflows in a revDSG-compliant cloud. This is planned per use case in the architecture conversation.

What is Xinity, and when do you use it?

Xinity is an open-source orchestration layer for on-prem LLMs, developed in Vienna: OpenAI-compatible API, model routing, audit trails. We integrate Xinity when a client wants not only Swiss hardware but also an EU-sovereign software stack. For pure single-team setups, Ollama + Open-WebUI is enough.

Data belongs in the business, not in the cloud.

Book a 60-minute architecture conversation. Free, qualifying, no sales pitch.