PlatStone

Services

A private AI platform, built around your team

We stand up local models and a codebase-aware RAG server on your hardware, connect every developer's IDE, and keep the whole thing improving — without your code ever leaving the building.

Track 1

Private & Compliant by Design

For teams in regulated or IP-sensitive fields. We deploy entirely on-prem or air-gapped, with access controls, audit logging, and data governance that pass security review — so your developers get modern AI without a single byte leaving your network.

Track 2

One Brain for Every Developer

For orgs standardizing AI across teams. One shared, codebase-aware platform connected to every IDE — consistent answers, flat predictable cost, and measurable acceleration instead of a scatter of per-seat tools that don't know your code.

What's Included

Engagements we offer

Local Inference Server

A private, high-throughput inference server on your own hardware — the right open model, GPU sizing, and serving stack (vLLM, Ollama, or TGI) for your entire team.

  • Model & hardware sizing
  • vLLM / Ollama / TGI setup
  • OpenAI-compatible API
  • Throughput & latency tuning

Private Codebase RAG

A retrieval server indexed on your repos, internal docs, ADRs, and tickets — so every answer is grounded in how your team actually builds, not the public internet.

  • Codebase & docs ingestion
  • Embeddings & vector DB
  • Retrieval tuning & evals
  • Incremental re-indexing

IDE Integration & Rollout

Connect every developer to the platform from Cursor, VS Code (Continue), and JetBrains — completion, chat, and codebase Q&A — with a smooth org-wide rollout.

  • IDE plugin configuration
  • Team onboarding & docs
  • Usage analytics
  • Adoption playbook

Platform Acceleration (Managed)

An ongoing engagement that keeps your platform sharp — re-indexing new code, upgrading models, adding data sources, and tuning against evals so velocity keeps climbing.

  • Model upgrades
  • Retrieval & eval tuning
  • New data sources
  • Velocity reporting

How We Work

Stand it up fast — then keep making it faster

We get a private platform live quickly, then run a continuous loop that keeps accelerating your team. No guesswork, no black boxes.

01

Assess

A focused technical conversation about your stack, security constraints, hardware, and team workflows. We map where AI will save the most time — and what "private" has to mean for you.

02

Stand Up

We deploy a local inference server and RAG server on your hardware — on-prem or air-gapped. No code leaves your network, from day one.

03

Tailor

We index your codebase, docs, and standards, then tune models and retrieval to your domain so the answers feel like they came from your most senior engineer.

04

Connect & Measure

Every developer is wired in through Cursor, VS Code, and JetBrains. We baseline real metrics — acceptance rate, cycle time, review speed — so impact is visible, not vibes.

05

Improve — continuously

ongoing

As an optional managed engagement, we keep the platform sharp — re-indexing new code, upgrading to better models as they ship, adding data sources, and tuning retrieval against evals. Your team's AI gets smarter every week, not stale.

Platform Acceleration

Questions

Frequently asked

Does any of our code or data leave our network?

No. That is the entire point. The inference server and RAG server run on your hardware — on-prem or fully air-gapped. There are no third-party API calls and no telemetry. Your source code and data never leave your walls.

What hardware do we actually need?

Less than most teams expect. A single modern GPU server can comfortably serve a team with a strong open model, and quantized models run on surprisingly modest hardware. During the discovery call we size everything to your team and budget — and we can start on hardware you already have.

Which models and IDEs do you support?

We run leading open models such as Llama, Qwen, DeepSeek, and Mistral, and pick the best fit for your languages and domain. Developers connect from Cursor, VS Code (via Continue), and JetBrains IDEs — completion, chat, and codebase Q&A, all pointed at your private server.

How is this different from Copilot or cloud AI tools?

Cloud tools send your code to someone else and bill per seat or per token. Ours runs locally, knows your codebase through a private RAG index, and costs are flat and predictable. You get tailored, context-aware answers without the privacy and compliance headaches.

Can it run fully air-gapped for compliance?

Yes. We regularly deploy into air-gapped and regulated environments (finance, healthcare, legal, defense) with access controls, audit logging, and data governance designed to pass security review.

How do you keep the platform improving over time?

Through our optional Platform Acceleration engagement: we re-index new code, upgrade to better models as they ship, add data sources, and tune retrieval against evals — so your team’s AI keeps getting sharper instead of going stale.

How do we get started?

Book a discovery call. It is a focused technical conversation with no sales pitch and no commitment. We will discuss your stack, your constraints, and whether a private AI platform is the right move for your team.

Let's scope your platform

Tell us what you're building. We'll tell you honestly how we'd approach it — and whether we're the right team for the job.