For tech companies
AI for tech companies in Switzerland
Coding copilots, RAG systems, and on-prem LLMs for software houses that do not want to send their codebase to OpenAI.
- On-prem on ThinkStation PGX (128 GB)
- Ollama + Open-WebUI
- Xinity Engine optional
- revDPA + IP protection
- CHF 2,200 / day
You lead a tech team. Your engineers use Cursor, Claude Code, and Copilot, some paid for by the company, some privately. Your codebase contains IP that must not end up in US cloud logs.
Marketing consultancies offer "AI workshops" that deliver nothing, because their consultants do not know the stack. We are different. We know Ollama, vLLM, llama.cpp, Open-WebUI, and n8n. We build local LLM setups on a Lenovo ThinkStation PGX at your site, integrate them into your engineering workflows, and show you a coding copilot that does not phone home with your code.
Three typical tech-team use cases
| Use case | Stack |
|---|---|
| Code-aware assistant without cloud egress | Ollama-hosted model + RAG over a repository index + IDE extension or chat UI |
| PR review bot for security-relevant code | Self-hosted LLM + Git webhook + custom prompt layer |
| Internal knowledge bot over Confluence/Notion/SharePoint | Ollama + Open-WebUI + RAG pipeline |
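As an illustration of the PR review row, here is a minimal Python sketch of the "custom prompt layer" talking to a self-hosted model. It assumes Ollama is reachable on its default local port and uses its `/api/generate` endpoint. The prompt text, the function names and the model name `your-code-model` are placeholders for illustration, not part of any delivered setup. The Git webhook that would supply the diff is left out.

```python
import json
import urllib.request

# Assumption: Ollama runs on the on-prem host and listens on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical prompt layer; a real one would be tuned per team and codebase.
REVIEW_PROMPT = (
    "You are a security-focused code reviewer. "
    "List concrete risks in the following diff, or answer 'no findings'.\n\n"
)


def build_review_request(diff: str, model: str = "your-code-model") -> dict:
    """Build the JSON body for a non-streaming Ollama generate call."""
    return {"model": model, "prompt": REVIEW_PROMPT + diff, "stream": False}


def review_diff(diff: str, model: str = "your-code-model") -> str:
    """Send the diff to the local model and return its review text."""
    body = json.dumps(build_review_request(diff, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    # The request only goes to localhost, so the diff never leaves the machine.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

In such a sketch, a webhook handler would call `review_diff()` with the diff text and post the returned string back to the pull request as a comment.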
Why Waldsee instead of a big-cloud consultancy
We understand why a software shop with IP-sensitive code cannot "just use OpenAI". We have set up the open-source stack ourselves, not just read about it. And we say honestly when cloud AI is enough. And when it is not.
Hardware: ThinkStation PGX in the tech team
The Lenovo ThinkStation PGX with the NVIDIA GB10 Grace Blackwell superchip and 128 GB of unified memory is the hardware base for serious LLM inference in the tech team. It covers typical engineering teams (5–40 devs) and runs with Ollama as the inference engine and Open-WebUI as the UI. On request we add the Xinity Engine as an orchestration layer if you want an EU-sovereign software stack.
Common questions
Which models run sensibly on a ThinkStation PGX?
Open-source models such as Llama, Qwen, Mistral, and the DeepSeek Coder family. We settle the specific version choice in the architecture conversation, because models age fast.
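As a rough guide to what fits into 128 GB of unified memory, the weight footprint of a model can be estimated from its parameter count and quantization. The sketch below is back-of-the-envelope arithmetic only. The function names and the 24 GB headroom for KV cache, RAG index and operating system are assumptions for illustration, not measured values.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights only, in GB (10^9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9
    return params_billion * bits_per_weight / 8


def fits(
    params_billion: float,
    bits_per_weight: int,
    total_gb: float = 128.0,    # unified memory of the machine
    headroom_gb: float = 24.0,  # assumed reserve for KV cache, RAG index, OS
) -> bool:
    """Check whether the weights fit into memory after reserving headroom."""
    return weight_memory_gb(params_billion, bits_per_weight) <= total_gb - headroom_gb


if __name__ == "__main__":
    for size, bits in [(8, 16), (32, 8), (70, 4), (70, 16)]:
        print(f"{size}B @ {bits}-bit: {weight_memory_gb(size, bits):.0f} GB, "
              f"fits={fits(size, bits)}")
```

By this estimate a 70B model quantized to 4 bits needs about 35 GB for its weights and fits comfortably, while the same model at 16 bits needs about 140 GB and does not.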
How does it integrate with our IDE?
Via IDE extensions (e.g. Continue.dev for VS Code), custom CLI wrappers, or Open-WebUI as a chat frontend for out-of-IDE sessions.
What does it cost?
Hardware: current Lenovo Switzerland list price (confirmed in the architecture conversation). Setup: 3–8 days of Waldsee effort at the day rate of CHF 2,200, i.e. CHF 6,600 to CHF 17,600.
Is a Mac mini with Ollama not enough?
For single-dev experiments, yes. For a team with serious memory needs (large models, RAG indexes, multi-user) the Mac mini quickly becomes a bottleneck.
Does this replace Cursor / GitHub Copilot?
Usually not. It complements them. Cursor for "normal" code, on-prem for IP-sensitive code. Hybrid setups are the rule, not the exception.