For tech companies

AI for tech companies in Switzerland

Coding copilots, RAG systems, on-prem LLMs for software houses that do not want to send their codebase to OpenAI.

  • On-prem on ThinkStation PGX (128 GB)
  • Ollama + Open-WebUI
  • Xinity Engine optional
  • revDPA + IP protection
  • CHF 2,200 / day

You lead a tech team. Your engineers use Cursor, Claude Code, Copilot. Some paid by the company, some privately. Your codebase contains IP that must not end up in US cloud logs.

Marketing consultancies offer "AI workshops" that bring nothing, because their consultants do not know the stack. We are different. We know Ollama, vLLM, llama.cpp, Open-WebUI, n8n. We build local LLM setups on a Lenovo ThinkStation PGX at your site, integrate them into your engineering workflows and show how a coding copilot works that does not phone home with your code.

Three typical tech-team use cases

Use caseStack
Code-aware assistant without cloud egressOllama host model + RAG on repository index + IDE extension or chat UI
PR review bot for security-relevant codeSelf-hosted LLM + Git webhook + custom prompt layer
Internal knowledge bot over Confluence/Notion/SharepointOllama + Open-WebUI + RAG pipeline

Why Waldsee instead of a big-cloud consultancy

We understand why a software shop with IP-sensitive code cannot "just take OpenAI". We have set up the open-source stack ourselves, not just read about it. And we say honestly when cloud AI is enough. And when it is not.

Hardware: ThinkStation PGX in the tech team

The Lenovo ThinkStation PGX with NVIDIA GB10 Grace Blackwell superchip and 128 GB unified memory is the hardware base for serious LLM inference in the tech team. It covers typical engineering teams (5–40 devs) and runs with Ollama as engine and Open-WebUI as UI. On request we add the Xinity Engine as an orchestration layer if you want an EU-sovereign software stack.

Common questions

Which models run sensibly on a ThinkStation PGX?

Open-source models like Llama, Qwen, Mistral, the DeepSeek Coder family. We clarify the specific version choice in the architecture conversation. Models age fast.

How does it integrate with our IDE?

Via IDE extensions (e.g. Continue.dev for VSCode), custom CLI wrappers, or Open-WebUI as a chat frontend for out-of-IDE sessions.

What does it cost?

Hardware: current Lenovo Switzerland list price (in the architecture conversation). Setup: 3–8 days of Waldsee effort at the day rate CHF 2,200.

Is a Mac mini with Ollama not enough?

For single-dev experiments, yes. For a team with serious memory needs (large models, RAG indexes, multi-user) the Mac mini quickly becomes a bottleneck.

Does this replace Cursor / GitHub Copilot?

No, often not. It complements them. Cursor for "normal" code, on-prem for IP-sensitive code. Hybrid setups are the rule, not the exception.

Let us talk concretely. 30 min, free.