AI that actually lives in your processes — not as a demo, but as a tool.
AI integration and workflow automation for mid-sized and larger companies, from Germany: LLMs and assistants placed where they take real work off the table — inside CRM and ERP, portals, inboxes, databases, and APIs. With RAG on your real content, clear permissions and logging, human-in-the-loop, and honest evaluations. No bot theatre, no buzzwords — engineering that stays in production. Founder-led, from Krefeld.
- OpenAI · Azure OpenAI · Anthropic · Mistral · open-weight models
- RAG · vector DBs · LangChain / LlamaIndex · our own eval setup
- Based in Krefeld · Germany & EU · GDPR-aware architecture
AI as a tool — not as an end in itself.
Most companies have learned two things in the last 24 months: AI is impressive in a demo — and disappointing more often than not in real operations. The gap is rarely the model; it is almost always the integration. p24.co builds AI into existing processes: grounded in your data (RAG), embedded in your systems (CRM, ERP, portals, email, databases), with clear permissions, logs, eval criteria, and human-in-the-loop where it belongs. What does not work gets sunset honestly, not patched with one more prompt-engineering band-aid. The result is AI that is still running six months later — and that you can operate yourself, without needing us.
Real use cases where AI actually carries weight today.
We build AI where it solves a clear, recurring bottleneck — not everywhere. These are the six scenarios we see most often in practice:
Internal assistant for staff
An assistant that answers questions from manuals, tickets, contracts, wiki, and intranet: with citations, with permissions, and with a clear “not in the sources” instead of a guess when the data does not cover the question. Cuts onboarding time, frees up senior people, and replaces the “let me just ask a colleague” pattern.
RAG knowledge search on your content
Semantic search across documents, PDFs, SharePoint, Confluence, tickets, and email. Answers with source links, not just a list of hits. Updates automatically when the source changes — and respects permissions (whatever a user must not see does not surface in the answer either).
Support and sales copilots
Assistants that draft ticket replies, formulate quotes, pull similar cases, summarise customer calls, or draft an email in the right tone. Always as a draft — the human keeps the last word.
Document processing & data extraction
Invoices, orders, contracts, delivery notes, technical datasheets: structured fields, classification, anomaly detection, handover into ERP or CRM. Instead of retyping, AI runs the standard cases automatically — and flags cleanly when a document is unclear.
Workflow and process automation
Automating multi-step flows: classify an incoming email, route it to the right team, open a ticket, draft a follow-up, create a CRM record. AI only fills the gaps where deterministic rules fail — the rest stays deterministic.
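To make the "rules first, AI only for the gaps" idea concrete, here is a minimal Python sketch. The rule patterns, team names, and the `classify_with_model` callback are hypothetical placeholders, not part of any real setup; the model call is passed in as a function so the deterministic path stays testable on its own.

```python
import re
from typing import Callable, Optional

# Hypothetical routing rules: (pattern, team). Real rules would come from your process.
RULES = [
    (re.compile(r"\binvoice\b|\bpayment\b", re.I), "accounting"),
    (re.compile(r"\bpassword\b|\blogin\b", re.I), "it-support"),
]


def route_by_rules(text: str) -> Optional[str]:
    """Return a team only if exactly one deterministic rule matches, else None."""
    teams = {team for pattern, team in RULES if pattern.search(text)}
    return teams.pop() if len(teams) == 1 else None


def route_email(text: str, classify_with_model: Callable[[str], str]) -> tuple:
    """Try the rules first; call the model only when the rules give no clear answer.

    Returns (team, decided_by) so the audit log shows which path made the decision.
    """
    team = route_by_rules(text)
    if team is not None:
        return team, "rule"
    return classify_with_model(text), "model"
```

Returning the deciding path alongside the team keeps it visible how much of the volume the rules already cover and how much actually needs a model.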
Data extraction, reporting & insights
Pulling regular insights from sales data, logs, support tickets, NPS feedback, and free text inside databases: topic clusters, anomalies, trends, automatic weekly and monthly reports. Instead of a dashboard nobody opens, an email with the three things that actually matter this week.
Why AI projects fail — and how we avoid it.
When we take over from another setup, we see the same six patterns nearly every time. We walk through them deliberately before the project — they decide whether the AI is still alive in six months:
A use case without a real bottleneck
“We need to do something with AI” is not a use case. We ask about volume, repetition rate, current manual effort, and the cost of getting it wrong. When those numbers are small, AI is rarely the honest answer — a better form, a script, or a clearer process tends to do more.
AI without a link to real data
An LLM without your data produces generalities and hallucinations. RAG (retrieval-augmented generation) grounds the model in your actual content: manuals, tickets, PDFs, ERP data, database rows. Answers carry citations, and the “I cannot say anything about that” zone is clearly marked.
Missing permissions, logs, and auditability
AI must not be the one system that quietly bypasses every other system’s permissions. We wire permission checks into the retrieval layer, log queries and answers, attribute conversations to users, and make sure an admin can reconstruct what was said and on what basis later.
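As an illustration of a permission check in the retrieval layer, the following Python sketch filters retrieved chunks against the user's groups before any text is placed into a prompt. The `Chunk` type, the `allowed_groups` field, and the group names are assumptions made for this example; in a real system the access list would be copied from the source system during ingestion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset  # assumed to mirror the source system's ACL


def filter_by_permission(chunks: list, user_groups: set) -> list:
    """Keep only chunks the user may read; runs before anything reaches the prompt."""
    return [c for c in chunks if c.allowed_groups & user_groups]


def build_context(chunks: list, user_groups: set) -> str:
    """Assemble the prompt context from permitted chunks only, with source ids."""
    visible = filter_by_permission(chunks, user_groups)
    return "\n\n".join(f"[{c.doc_id}] {c.text}" for c in visible)
```

Because the filter is plain code rather than a prompt instruction, it can be covered by ordinary unit tests.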
No human-in-the-loop where it would matter
Full autonomy sounds nice, but for quotes, legal text, customer email, and financial decisions it is the fastest path to an escalation spiral. We design workflows so AI prepares and suggests — and a human confirms, edits, or rejects. Where true autopilot is acceptable, we define it explicitly.
No honest evaluation
“It feels pretty good” is not a quality bar. We build an eval setup per use case with real examples from your work (positive, negative, tricky), measure hit, hallucination, and escalation rates, and compare models, prompts, and retrieval strategies on hard numbers instead of vibes.
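A minimal Python sketch of how hit, hallucination, and escalation rates could be computed from a labelled eval run. The `EvalResult` fields are assumptions for illustration; how each example gets its labels (human review, LLM-as-judge, exact match) is left open here.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    correct: bool       # answer matched the expected answer
    hallucinated: bool  # answer contained a claim not backed by a source
    escalated: bool     # assistant handed the case over to a human


def rates(results: list) -> dict:
    """Aggregate one eval run into the three rates, each between 0 and 1."""
    if not results:
        raise ValueError("eval set is empty")
    n = len(results)
    return {
        "hit_rate": sum(r.correct for r in results) / n,
        "hallucination_rate": sum(r.hallucinated for r in results) / n,
        "escalation_rate": sum(r.escalated for r in results) / n,
    }
```

Running the same function over each model, prompt, or retrieval variant yields directly comparable numbers.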
AI with no plan for rollout and change
An assistant nobody uses is technical debt. We plan pilot, training, clear communication (“here is what it will do, here is what it will not”), feedback channels, and versioning. AI is also a change project — and that belongs in the plan, not in a five-minute demo at the end.
Building blocks, stack, and architecture — what you actually get.
A p24.co AI integration is not “an API key plus a chat window”. Every block below is deliberately chosen, documented, and handed over:
1. AI discovery & use-case scan
A structured inventory: where is real manual effort today, where is the cost of a wrong answer high, where does the data sit? Output: a prioritised use-case list with effort estimate, value rationale, risk, and an honest “not yet” for use cases that are not ready.
2. Model & stack decision
Choosing between OpenAI, Azure OpenAI, Anthropic, Mistral, and open-weight models (Llama, Qwen), based on requirements, privacy, hosting (EU / Azure West Europe vs. public cloud), and cost. Vendor lock-in is avoided by abstracting the provider cleanly in the integration.
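One way to picture the provider abstraction is a small interface that the rest of the integration depends on. The Python sketch below is an assumption about shape, not a description of a specific SDK: `ChatProvider`, `EchoProvider`, and the registry keys are hypothetical, and a real adapter would call the vendor's client inside `complete`.

```python
from typing import Protocol


class ChatProvider(Protocol):
    """The only surface the application code is allowed to depend on."""

    def complete(self, system: str, user: str) -> str: ...


class EchoProvider:
    """Stand-in adapter for this sketch; a real one would wrap a vendor SDK call."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, system: str, user: str) -> str:
        return f"[{self.name}] {user}"


# Hypothetical registry: switching providers becomes a configuration change.
PROVIDERS = {
    "primary": EchoProvider("primary"),
    "fallback": EchoProvider("fallback"),
}


def answer(question: str, provider_name: str) -> str:
    provider: ChatProvider = PROVIDERS[provider_name]
    return provider.complete("Answer only from the given sources.", question)
```

Prompts, retrieval, and eval then call `answer` (or the interface) and never import a vendor library directly.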
3. RAG & data pipeline
Ingestion from SharePoint, Confluence, file shares, S3, databases, CRM/ERP, helpdesk. Chunking, embeddings, vector DB (pgvector, Qdrant, Azure AI Search), re-ranking, citation enrichment. Incremental updates instead of full re-indexing on every change.
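Incremental updates can be sketched with content hashes: only documents whose hash changed are re-chunked and re-embedded, and documents that vanished from the source are removed from the index. The function names and the in-memory dictionaries in this Python sketch are illustrative assumptions; a real pipeline would keep the hashes next to the vectors in the vector DB.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_update(current_docs: dict, indexed_hashes: dict) -> tuple:
    """Compare source documents with stored hashes.

    current_docs:   doc_id -> current text in the source system
    indexed_hashes: doc_id -> hash stored at the last indexing run
    Returns (to_embed, to_delete) as sorted lists of doc ids.
    """
    to_embed = sorted(
        doc_id
        for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )
    to_delete = sorted(doc_id for doc_id in indexed_hashes if doc_id not in current_docs)
    return to_embed, to_delete
```

Unchanged documents appear in neither list, so an indexing run costs roughly in proportion to what actually changed.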
4. Assistant application & UI
A chat or copilot surface that fits your world: a standalone portal, embedded in the intranet, in a CRM side-panel, in Outlook/Teams, or inside an existing helpdesk. With clean source display, feedback buttons, history, and a “please do not answer this” block list.
5. CRM, ERP, portal, email, and API integration
AI is only useful once it can read and write. We integrate with HubSpot, Salesforce, Microsoft Dynamics, SAP Business One, SAP S/4HANA interfaces, your own .NET/Node backends, Exchange/Microsoft 365, Postgres/MSSQL, and JSON APIs — with clear idempotency and retry rules.
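What "idempotency and retry rules" can look like for a write into a CRM or ERP, as a hedged Python sketch: the key is derived from the operation and payload, so a retried request carries the same key and the target system (or a thin layer in front of it) can drop duplicates. `TransientError`, the `send` callback, and the operation name are hypothetical; whether a given system accepts such a key has to be checked per integration.

```python
import hashlib
import json
import time


class TransientError(Exception):
    """Raised by the sender for failures worth retrying (timeouts, 5xx)."""


def idempotency_key(operation: str, payload: dict) -> str:
    """Same operation and payload always give the same key, whatever the key order."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{operation}:{canonical}".encode("utf-8")).hexdigest()


def write_with_retry(send, operation, payload, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send(operation, payload, key), retrying transient errors with backoff."""
    key = idempotency_key(operation, payload)
    for attempt in range(attempts):
        try:
            return send(operation, payload, key)
        except TransientError:
            if attempt == attempts - 1:
                raise  # retries exhausted; let the caller escalate
            sleep(base_delay * 2 ** attempt)  # exponential backoff between attempts
```

Only errors marked as transient are retried; anything else surfaces immediately instead of being repeated blindly.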
6. Permissions, logging, audit & GDPR
Identity (SSO/Entra ID), permission checks at the retrieval layer, clean logging of queries and answers, data-processing agreements with model providers, EU hosting where needed, a template DPIA. No personal data into uncontrolled models.
7. Human-in-the-loop & escalation workflows
Clear rules for when AI decides, when it only suggests, and when a human must confirm. With UI blocks for “confirm, edit, reject”, an audit trail for every decision, and configurable thresholds per use case.
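The three modes (AI decides, AI suggests, human must confirm) with a threshold per use case can be expressed as a small policy table. In this Python sketch the use-case names, threshold values, and the notion of a numeric `confidence` score are assumptions for illustration; where that score comes from depends on the use case.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    AUTO = "auto"        # AI acts on its own
    SUGGEST = "suggest"  # AI proposes, a human may pick it up
    CONFIRM = "confirm"  # a human must confirm, edit, or reject


@dataclass(frozen=True)
class Policy:
    auto_threshold: float  # at or above this confidence the AI may act alone
    allow_auto: bool       # False where a human must always confirm


# Hypothetical per-use-case configuration.
POLICIES = {
    "knowledge_search": Policy(auto_threshold=0.0, allow_auto=True),
    "ticket_routing": Policy(auto_threshold=0.9, allow_auto=True),
    "customer_email": Policy(auto_threshold=1.0, allow_auto=False),
}


def decide(use_case: str, confidence: float) -> Action:
    policy = POLICIES[use_case]
    if not policy.allow_auto:
        return Action.CONFIRM
    if confidence >= policy.auto_threshold:
        return Action.AUTO
    return Action.SUGGEST
```

Keeping the policy as data rather than scattered `if` statements makes it reviewable by the business side and easy to record in the audit trail.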
8. Evaluation, telemetry & guardrails
Test sets with real examples per use case, an eval pipeline (LLM-as-judge plus hard metrics), live telemetry (response times, escalation and error rates), prompt and model versioning. Guardrails against prompt injection, PII leakage, and hallucinations.
9. Documentation, handover & training
Architecture document, data-flow diagrams, prompt library, eval guide, operations runbooks. Training for power users, admins, and IT — so you can keep evolving the system with your own team. No lock-in, no secret sauce.
From use-case scan to production rollout — the process.
AI projects almost always die at the demo-to-production step. We plan exactly that step from the start: small pilot, honest eval, then expansion.
- 01
Discovery & use-case scan
Week 1–2
Workshops with business and IT, a tour of the data landscape, capturing volumes and effort, prioritising use cases. Output: a compact AI brief with 2–3 prioritised use cases, assumptions, risks, effort estimate, and an honest recommendation on where to start.
output → AI brief · use-case list
- 02
Architecture, model & data plan
Week 2–3
Decisions: model, hosting, RAG stack, vector DB, integration with existing systems, permissions, logging. Architecture sketch, data-flow diagrams, privacy assessment, and stack rationale, all written down and aligned with you.
output → Architecture doc · data flows
- 03
Prototype on real data
Week 3–6
A runnable prototype on your actual content (anonymised where needed). First eval round on real examples: what works, what hallucinates, which answers need a human in the middle. Expectations get calibrated here, not later.
output → Prototype · first eval
- 04
Pilot with real users
Week 6–10
A controlled pilot with a single team or department. Telemetry live, feedback channel defined, weekly iteration on prompts, retrieval, and UX. We measure usage, hit, escalation, and hallucination rates instead of relying on gut feel.
output → Pilot · eval report
- 05
Rollout, training & change
Week 10–14
Staged expansion to further teams, training for power users and admins, clear communication (“here is what it will do, here is what it will not”), feedback and escalation paths, prompt and model versioning.
output → Rollout · training · runbooks
- 06
Operations, eval loop & roadmap
From week 14
Go-live, telemetry, regular eval rounds, update paths for models and prompts, a clean way to migrate between providers. Then roadmap-based expansion to more use cases, based on real numbers, not more hype cycles.
output → Live operation · eval loop · roadmap
What “good” means for an AI integration, in practice.
You do not recognise a good AI integration in a live demo — you recognise it by what is still running six months later. This is the bar:
Answers carry sources
Every substantive answer points to a place in a document, ticket, or record. If somebody wants to verify, they can. Anything not covered by a source is flagged as such — not delivered as a confident hallucination.
Hallucinations are visible, not hidden
We measure hallucination rates per use case and per model and track them as a trend over time. A model update is not allowed to silently degrade behaviour: a regression has to show up in eval before it shows up in production.
Permissions are non-negotiable
Whatever a user cannot see in the source system must not surface in an AI answer. Permission checks sit at the retrieval layer, not inside the prompt — and they are tested, not hoped for.
A human has the last word where it matters
Quotes, legal text, customer email, financial decisions, HR topics: AI prepares, a human decides. Full autopilot exists only where risk and reversibility have been explicitly assessed.
Eval is a process, not a demo moment
We run eval pipelines on real examples, not marketing samples. When models or prompts change, eval runs automatically — and blocks the rollout if metrics drop.
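A rollout gate of this kind can be sketched as a comparison between the baseline metrics and the metrics of the candidate model or prompt. The Python below is a minimal sketch with assumed metric names and an assumed tolerance of one percentage point; actual thresholds would be agreed per use case.

```python
def regressions(baseline: dict, candidate: dict, higher_is_better: dict, tolerance: float = 0.01) -> list:
    """Return the metrics where the candidate is worse than baseline by more than tolerance."""
    failed = []
    for metric, better_high in higher_is_better.items():
        delta = candidate[metric] - baseline[metric]
        worse_by = -delta if better_high else delta
        if worse_by > tolerance:
            failed.append(metric)
    return failed


def may_roll_out(baseline: dict, candidate: dict, higher_is_better: dict, tolerance: float = 0.01) -> bool:
    """The gate: roll out only if no metric regressed beyond the tolerance."""
    return not regressions(baseline, candidate, higher_is_better, tolerance)
```

In a CI pipeline, a `False` result would fail the job, so the change never reaches production without someone looking at the listed metrics.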
AI lives with the business, not next to it
Assistants are wired into your systems so answers stay fresh: if a contract changes, the updated version is retrieved on the next call. Nothing sits six weeks stale in an index without anyone noticing.
Privacy, EU hosting, and founder accountability.
AI is a sensitive layer between your data and an outside world you do not fully control — models, providers, logs. p24.co is run by Dimitri Kronich from Krefeld, Germany. You get a direct technical counterpart with an EU base who really owns the architecture, the data flows, and the contracts with model providers — not someone who delegates them downwards.
- 01 Founder-level ownership: a direct line to the person who decides architecture, data flows, and provider choice.
- 02 Based in Krefeld · Germany & EU: GDPR-aware delivery, EU hosting preferred (Azure West Europe, Hetzner, AWS Frankfurt), clear processing grounds.
- 03 Data-processing agreements with model providers, clean separation of personal data, documented data flows, DPIA template included.
- 04 No “secret sauce”: architecture, prompts, and configuration are documented; handover to your team or another vendor is possible at any point.
- 05 Provider-agnostic architecture: OpenAI, Azure OpenAI, Anthropic, Mistral, or open-weight models are interchangeable; no lock-in as a pseudo-strategy.
Frequently asked questions about AI integration and workflow automation.
What does a serious AI integration actually cost?
A first production use case with RAG, integration into one or two systems, an eval setup, and a permission model typically sits in the mid- to upper five-figure euro range, plus ongoing model usage and hosting costs. Larger setups with several use cases, several data sources, workflow automation, and training reach six figures quickly. We make assumptions, model costs, and follow-up costs transparent — no “fixed price” on top of half-defined requirements.
Which models do you use — and does my data have to go to the US?
We choose per use case: OpenAI / Azure OpenAI (with EU regions and without using your data for training), Anthropic, Mistral, or open-weight models (Llama, Qwen) self-hosted in the EU. For sensitive use cases we use Azure OpenAI in West Europe or self-hosted open-weight models, so your data does not leave the EU.
What is RAG and do we really need it?
RAG (retrieval-augmented generation) grounds a language model in your real content: documents, databases, tickets. The model pulls relevant passages from your data on every request and answers based on them — with citations. Without RAG you get generalities and hallucinations; with RAG you get answers that match your reality and that you can verify.
How do you stop the AI from making things up (hallucinations)?
Three levers together: first, RAG with solid source grounding so the model works from your content instead of guessing. Second, prompt design with strict “only answer when the information is in the sources” rules. Third, an eval setup that actively measures hallucinations and warns us when a model or prompt change makes things worse. Hallucinations cannot be eliminated entirely, but they become measurable and manageable.
Do the AI models see confidential employee or customer data?
Only if it is intentionally part of the use case and properly contracted, and even then only via providers and regions with a real DPA and no training use. We wire permission checks into the retrieval layer: whatever a user must not see in the source system does not surface in the AI answer either. For especially sensitive areas we use self-hosted open-weight models in the EU.
How do you measure whether the AI is “good enough”?
For each use case we build an eval set from real examples in your business (positive, negative, tricky cases) and measure hit, hallucination, escalation, and response-time metrics. Before every rollout — and on every model or prompt change — this eval runs. “Good enough” is not a vibe, it is a number we agree on together.
Can AI work fully autonomously, or does a human always need to be in the loop?
Both — it varies per use case. For internal knowledge search, topic clustering, or data enrichment, AI usually runs autonomously. For customer email, quotes, legal text, financial decisions, or HR topics we design human-in-the-loop: AI prepares, a human confirms, edits, or rejects. Wherever true autopilot is acceptable, we define risk and reversibility explicitly.
Which systems can you integrate the AI with?
CRM (HubSpot, Salesforce, Microsoft Dynamics, Pipedrive), ERP (SAP Business One, SAP S/4HANA interfaces, Microsoft Dynamics 365 Business Central, Odoo), helpdesk (Zendesk, Freshdesk, Intercom), Microsoft 365 (Outlook, Teams, SharePoint), databases (Postgres, MSSQL, MySQL), file stores (SharePoint, S3, Azure Blob, on-prem shares), and any JSON/REST API. If a system has an API, we can integrate it; if it does not, we tell you up front.
Who runs the system after the project?
Us, your team, or a mix: your choice. We hand over architecture, data-flow diagrams, prompt library, eval guide, and runbooks so that your team can run the system alone. On request we keep operating it under a clearly defined scope (monitoring, eval loop, model updates, extensions). You are never locked to us; that is part of the architecture, not just a promise.
What if a model provider raises prices or sunsets a model?
That is exactly why we abstract the provider. Prompts, retrieval, workflow logic, and eval are provider-agnostic. Switching from OpenAI to Azure OpenAI, to Anthropic, to Mistral, or to a self-hosted open-weight model is a configuration and eval exercise, not a new project. Vendor lock-in is a design choice — we deliberately design against it.
Let’s price your first honest AI use case.
Tell us briefly where the effort sits today: a lot of near-identical emails, hard-to-find knowledge, repetitive quoting, document capture, or reporting. You will get an honest, technical view — including “AI is the right tool here” or “not really” — directly from the founder, without a sales layer and without buzzwords.