How We Built a Website Assistant on Vapr and Deployed It on Condense

Written by Sachin Kamath | AVP - Marketing & Design

Published on Jun 28, 2026

Condense Apps

AI Agent

TL;DR

We built an AI assistant that lives on zeliot.in and handles everything a first-time visitor or an active user might need: ask about Condense or Vapr, book a meeting, read a blog, download an eBook, or sign up to try Condense. It queries Zeliot's website and documentation in real time, maintains conversation context across turns, and can take actions directly inside the chat. Orchestration runs on Vapr, and the entire stack is deployed on Condense on Zeliot's own cloud. This post walks through the architecture and the decisions that shaped it.

Why We Built This

People land on zeliot.in at very different stages. Some are evaluating Condense for the first time and want to understand how BYOC works. Some are engineers looking for a specific integration in the docs. Some are ready to sign up and just need the right link. Some have a question at 11 PM when no one is online.

For all of these, the standard website experience (navigation menus, a search bar, a contact form) creates friction. Users have to figure out where to look, open multiple pages, and leave the site to book a meeting or find an eBook. A lot of intent gets lost in that process.

The assistant we built collapses that entirely. A user can type "how does BYOC work", "show me blogs on Kafka pricing", "I want to try Condense", or "book a demo", and get a direct, useful response without leaving the chat window.

What the Assistant Can Do

Before getting into how it's built, here's what it actually handles:

Product questions

Anything about Condense, Vapr, pricing, deployment models, connectors, certifications, or comparisons with other platforms. It pulls from zeliot.in and docs.zeliot.in in real time.

Blog discovery

A user can ask "show me articles on Kafka migration" or "what have you written about BYOC" and the assistant surfaces relevant posts directly.

eBook and resource lookup

Similar to blogs, but for downloadable guides and whitepapers from the resources section.

Meeting booking

The assistant checks availability and creates an appointment directly through Microsoft Graph APIs without redirecting the user to an external page.

Condense sign-up

Users who want to try the platform can be guided through the sign-up flow from the same conversation.

Each of these is implemented as a discrete tool that Vapr invokes based on what the user is asking. The distinction between "answering a question" and "completing an action" is important: both happen inside the same chat, and the user doesn't have to context-switch between them.

Architecture Overview

The system has three layers:

Orchestration

Orchestration is handled by Vapr, Zeliot's autonomous AI agent. Vapr determines what the user is asking, which tools to invoke, and how to combine retrieved information into a coherent response.

Tool Layer

A set of purpose-built tools Vapr can call: documentation and website search, blog retrieval, eBook lookup, availability checking, appointment creation, and sign-up flow initiation.

Knowledge Sources

zeliot.in, docs.zeliot.in, and structured data for blogs, eBooks, and resources. These are indexed and queried in real time.

Keeping orchestration, tool execution, and knowledge sources as separate components means each can be updated independently. Adding a new tool, say, one that surfaces podcast episodes, doesn't require touching the retrieval pipeline or the session layer.
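To make the separation concrete, here is a minimal sketch of a tool registry in Python. It is not Vapr's actual interface: the `Tool` and `ToolRegistry` classes, the tool names, and the stub handlers are all hypothetical, and they only illustrate how a tool layer can be extended without touching anything else.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    """A discrete capability the orchestrator can invoke (hypothetical shape)."""
    name: str
    description: str  # what the tool does and when it should be used
    handler: Callable[[dict], dict]


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def describe(self) -> list[dict]:
        """Definitions an orchestrator could use for tool selection."""
        return [
            {"name": t.name, "description": t.description}
            for t in self._tools.values()
        ]

    def invoke(self, name: str, arguments: dict) -> dict:
        return self._tools[name].handler(arguments)


# Stub handlers stand in for the real integrations.
registry = ToolRegistry()
registry.register(Tool(
    "search_docs",
    "Answer product questions from website and documentation content.",
    lambda args: {"results": [], "query": args["query"]},
))
registry.register(Tool(
    "find_blogs",
    "Surface blog posts on a topic.",
    lambda args: {"posts": [], "topic": args["topic"]},
))

# Adding a capability is one more registration; retrieval and sessions are untouched.
registry.register(Tool(
    "find_podcasts",
    "Surface podcast episodes on a topic.",
    lambda args: {"episodes": [], "topic": args["topic"]},
))
```

The descriptions matter as much as the handlers here: they are the only thing an orchestrator would see when deciding which tool fits a request.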

Building the Knowledge Layer

The assistant needs to answer questions about Condense and Vapr accurately. That means pulling from real content: product pages, documentation, comparison pages, customer stories, rather than from a model's training data, which goes stale.

Crawling and Extraction

The ingestion pipeline crawls zeliot.in and docs.zeliot.in recursively from configured base URLs. At each page, it strips navigation, sidebars, footers, and framework-generated elements before passing content downstream. Raw HTML is noisy: navigation menus, repeated CTAs, and GitBook artifacts all add content that hurts retrieval quality. Cleaning this before indexing made a visible difference in early testing.
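The cleaning step can be sketched with nothing but Python's standard library. This is an illustration under assumptions, not the pipeline's code: the set of tags to skip and the `extract_main_text` helper are hypothetical, and a real crawler would likely also need site-specific rules for GitBook artifacts.

```python
from html.parser import HTMLParser

# Assumed set of elements that carry site chrome rather than page content.
SKIP_TAGS = {"nav", "aside", "footer", "header", "script", "style"}
BLOCK_TAGS = {"p", "div", "section", "article", "li", "h1", "h2", "h3", "h4", "br"}


class ContentExtractor(HTMLParser):
    """Collects text while ignoring everything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self._parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self) -> str:
        # Normalize whitespace and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self._parts).splitlines())
        return "\n".join(line for line in lines if line)


def extract_main_text(html: str) -> str:
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Run against a page with a navigation bar and a footer CTA, only the heading and body text survive, which is the content worth indexing.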

Chunking

Content is split into semantically meaningful chunks rather than at arbitrary character limits. Arbitrary splits produce chunks that are syntactically complete but miss the point of what a section is about. Chunk size and overlap are configurable, which matters because documentation pages and blog posts have different structural patterns.
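As a rough illustration of boundary-aware chunking, the sketch below packs whole sentences into chunks and carries trailing sentences forward as overlap. The sentence splitter is deliberately naive, sizes are measured in characters rather than tokens, and the function names and defaults are assumptions rather than the pipeline's actual parameters.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace, and on blank lines.
    parts = re.split(r"(?<=[.!?])\s+|\n{2,}", text)
    return [p.strip() for p in parts if p and p.strip()]


def chunk_text(text: str, max_chars: int = 800, overlap_chars: int = 100) -> list[str]:
    """Pack whole sentences into chunks; never split in the middle of a sentence."""
    chunks: list[str] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward until the overlap budget is used up.
            carried: list[str] = []
            for prev in reversed(current):
                if len(" ".join([prev] + carried)) > overlap_chars:
                    break
                carried.insert(0, prev)
            current = carried
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Tuning per content type then comes down to calling `chunk_text` with different `max_chars` and `overlap_chars` values for documentation pages and blog posts.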

Embeddings

Chunks are embedded using Nomic's embedding model, which offers strong retrieval performance and long-context support and is fast enough to embed zeliot.in's full content without heavy infrastructure. The pipeline uses asymmetric retrieval patterns, embedding documents and queries differently for better search accuracy.
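One common way to get asymmetric retrieval with Nomic's text embedding models is to prepend a task prefix to each input, `search_document: ` for indexed content and `search_query: ` for user questions. Whether this pipeline uses exactly those prefixes is an assumption; the sketch below shows the pattern with the embedding call left as a pluggable function.

```python
from typing import Callable, Sequence

# Task prefixes as documented for Nomic's text embedding models;
# their use in this particular pipeline is assumed, not confirmed.
DOCUMENT_PREFIX = "search_document: "
QUERY_PREFIX = "search_query: "

# Any function that maps a batch of strings to a batch of vectors.
Embedder = Callable[[Sequence[str]], list[list[float]]]


def embed_documents(chunks: Sequence[str], embed: Embedder) -> list[list[float]]:
    """Embed content chunks on the document side of the asymmetric pair."""
    return embed([DOCUMENT_PREFIX + chunk for chunk in chunks])


def embed_query(query: str, embed: Embedder) -> list[float]:
    """Embed a user question on the query side of the asymmetric pair."""
    return embed([QUERY_PREFIX + query])[0]
```

Keeping the prefixes in one place avoids the easy mistake of indexing with one convention and querying with another, which quietly degrades search accuracy.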

Storage

Embedded chunks are stored in ChromaDB with an HNSW index. Each entry includes the chunk content, metadata, and source URL, which lets the assistant return source links with answers rather than generating responses without attribution.
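A sketch of what one stored entry might look like follows. The `ChunkRecord` shape, the metadata keys, and the ID scheme are hypothetical; the idea is only that deriving the ID from the source URL and chunk position keeps entries traceable to their page and lets a re-index overwrite rather than duplicate.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRecord:
    id: str
    document: str
    metadata: dict


def build_records(source_url: str, title: str, chunks: list[str]) -> list[ChunkRecord]:
    records = []
    for index, chunk in enumerate(chunks):
        # Stable ID from URL + position, so the same chunk slot always maps to the same entry.
        digest = hashlib.sha256(f"{source_url}#{index}".encode("utf-8")).hexdigest()[:16]
        records.append(ChunkRecord(
            id=digest,
            document=chunk,
            metadata={"source_url": source_url, "title": title, "chunk_index": index},
        ))
    return records
```

In ChromaDB, fields like these would map onto the `ids`, `documents`, and `metadatas` arguments of a collection upsert, alongside the embeddings, and the `source_url` in the metadata is what makes source links in answers possible.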

Two-Stage Retrieval

When a user asks a question, a single vector search isn't sufficient for a production assistant. Too much noise gets through.

User Query > Embedding Search > Top Candidate Documents

The retrieval pipeline runs in two stages. The first stage does a fast semantic similarity search against ChromaDB to get a candidate set; the goal here is recall, not precision. The second stage passes those candidates through a Cross Encoder reranker, which evaluates each query-document pair together instead of comparing vectors. This removes the noise, so the context passed to Vapr for response generation is more precise and more relevant.

Candidate Documents > Cross Encoder > Ranked Results

The difference in answer quality between vector-only retrieval and the two-stage approach was significant enough that we wouldn't have shipped without the reranker.
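The two stages can be sketched in a few lines. This is a toy version under stated assumptions: the in-memory `index` stands in for ChromaDB, and `rerank_score` is any function that scores a query and a document together, standing in for a real Cross Encoder model. The parameter names and defaults are illustrative.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query: str,
    query_vec: Sequence[float],
    index: Sequence[tuple[str, Sequence[float]]],  # (text, vector) pairs
    rerank_score: Callable[[str, str], float],
    recall_k: int = 20,
    final_k: int = 5,
) -> list[str]:
    # Stage 1: cheap vector similarity, tuned for recall.
    candidates = sorted(
        index, key=lambda item: cosine(query_vec, item[1]), reverse=True
    )[:recall_k]
    # Stage 2: score each (query, document) pair jointly, tuned for precision.
    scored = [(rerank_score(query, text), text) for text, _ in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:final_k]]


def overlap_score(query: str, text: str) -> float:
    """Toy stand-in for a Cross Encoder: counts shared words."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))
```

Even with the toy scorer, the shape of the trade-off is visible: stage one only needs to get the right document into the candidate set, and stage two decides what actually reaches the model.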

Conversational Memory

The assistant maintains context across turns so a conversation feels continuous rather than like a series of isolated Q&As. If a user starts by asking about Kafka connectors, then asks "which of these work with MQTT?", then says "can I book a call to discuss?", the whole thread stays coherent.

Session state is stored in SQLite: session identifiers, chat history, and cached responses. For long conversations, a compression mechanism periodically summarizes older turns while preserving the context that matters. Without this, long sessions become slow and expensive quickly.
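Here is a minimal sketch of session storage with history compression on top of SQLite. The table layout, the function names, and the character-based threshold are assumptions made for the example (a real implementation would more likely count tokens), and `summarize` is a placeholder for whatever produces the summary.

```python
import sqlite3
from typing import Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS turns ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
    )
    return conn


def add_turn(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO turns (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )


def load_turns(conn: sqlite3.Connection, session_id: str) -> list[tuple[str, str]]:
    rows = conn.execute(
        "SELECT role, content FROM turns WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return [(role, content) for role, content in rows]


def compress(
    conn: sqlite3.Connection,
    session_id: str,
    summarize: Callable[[list[tuple[str, str]]], str],
    max_chars: int = 2000,
    keep_recent: int = 4,  # must be at least 1
) -> None:
    """Replace older turns with one summary row; keep recent turns verbatim."""
    turns = load_turns(conn, session_id)
    if sum(len(c) for _, c in turns) <= max_chars or len(turns) <= keep_recent:
        return
    older, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = summarize(older)
    with conn:
        conn.execute("DELETE FROM turns WHERE session_id = ?", (session_id,))
        conn.executemany(
            "INSERT INTO turns (session_id, role, content) VALUES (?, ?, ?)",
            [(session_id, "summary", summary)] + [(session_id, r, c) for r, c in recent],
        )
```

Because the delete and re-insert happen in one transaction, a failure during compression leaves the original history intact.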

Streaming Responses

The assistant streams responses back using Server-Sent Events rather than waiting for a complete answer before showing anything. Users see output almost immediately, which matters more than it might seem: a response that starts appearing right away feels significantly faster than one that arrives all at once after a three-second wait, even if the total time is the same.

User Request > Orchestrator > Token Stream > SSE Channel > Browser

Streaming uses dedicated thread-isolated event loops to prevent blocking other application threads under concurrent load.
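On the wire, Server-Sent Events are plain text frames: optional `event:` lines, one `data:` line per payload line, and a blank line to end the frame. The sketch below shows that framing for a token stream. The event names and the JSON payload shape are assumptions for illustration, and the thread-isolated event loop is not modeled here.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Events frame."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # A multi-line payload needs one data: field per line.
    lines.extend(f"data: {line}" for line in data.split("\n"))
    return "\n".join(lines) + "\n\n"  # a blank line terminates the frame


def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one frame per token, then a final frame so the client can stop listening."""
    for token in tokens:
        yield format_sse(json.dumps({"token": token}), event="token")
    yield format_sse("{}", event="done")
```

A web framework would send these frames with a `text/event-stream` content type, and the browser's `EventSource` would receive each one as soon as it is flushed.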

Meeting Booking and Actions

The scheduling flow is worth describing in detail because it's one of the more useful things the assistant does.

When a user says something like "I'd like to book a demo" or "can I speak to someone about pricing", Vapr invokes the availability tool, which queries Microsoft Graph APIs to find open slots. The user picks a time, the appointment is created, and a confirmation comes back, all within the chat. The user never leaves zeliot.in to use an external booking tool.

The same pattern applies to Condense sign-ups. Rather than sending the user to a separate flow, the assistant can guide them through it from the conversation. Actions happen where the intent is expressed, not somewhere else.
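The part of the scheduling flow that turns calendar data into choices for the user can be sketched independently of the API. The example below assumes the availability query has already been reduced to a list of busy intervals; it does not model Microsoft Graph requests or responses, and the `open_slots` helper and its 30-minute default are hypothetical.

```python
from datetime import datetime, timedelta


def open_slots(
    day_start: datetime,
    day_end: datetime,
    busy: list[tuple[datetime, datetime]],
    slot: timedelta = timedelta(minutes=30),
) -> list[tuple[datetime, datetime]]:
    """Return fixed-length free slots between day_start and day_end."""
    slots = []
    cursor = day_start
    for busy_start, busy_end in sorted(busy):
        # Fill the gap before this busy interval with whole slots.
        while cursor + slot <= min(busy_start, day_end):
            slots.append((cursor, cursor + slot))
            cursor += slot
        # Jump past the busy interval (handles overlapping intervals too).
        cursor = max(cursor, busy_end)
    while cursor + slot <= day_end:
        slots.append((cursor, cursor + slot))
        cursor += slot
    return slots
```

Keeping this logic deterministic and outside the model is in line with treating actions as tools: the assistant offers only slots the function returned, and the appointment is created from the one the user picks.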

Deploying on Condense

The full stack runs on Condense, on Zeliot's own cloud. Condense handles containerization, scaling, environment variable management, and monitoring without requiring separate infrastructure decisions for each.

The deployment follows the Condense Applications workflow:

Step 1

Create a workspace inside Condense and click Create Custom to start a new application.

Step 2

Connect your GitHub, GitLab, or Bitbucket account. Select the repository and branch to deploy from.

Step 3

Under Publish As, select Output Connector, add a description, set an expiry period, and click Publish Application.

Step 4

Confirm the Dockerfile is at the root of the repository. The built-in VS Code interface lets you make and push changes directly from the browser without switching to a local terminal.

Step 5

Configure environment variables using + Configure Envs. Add variable names, configuration names, and values, then save.

Step 6

Choose HTTPS or TCP exposure depending on your application's requirements.

Step 7

Click Build Application, enter an image name and tag, and start the build. Logs stream in real time so you can catch failures immediately.

Step 8

Once the build completes, click Publish Application, select Custom from the Categories dropdown, adjust environment variables and resource settings if needed, and click Deploy Connector.

The assistant is now live. The deployed connector view gives you the ingress path, environment variable overrides, and application logs. Use the Start button in the Logs tab to confirm it's running correctly.
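On the application side, values set in Step 5 arrive as ordinary environment variables. A minimal sketch of reading them at startup is shown below; the variable names (`CHROMA_PATH`, `SQLITE_PATH`, `RECALL_K`) and the `Settings` shape are hypothetical and not taken from the actual deployment.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    chroma_path: str
    sqlite_path: str
    recall_k: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment and fail fast if something is missing."""
    missing = [key for key in ("CHROMA_PATH", "SQLITE_PATH") if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(
        chroma_path=env["CHROMA_PATH"],
        sqlite_path=env["SQLITE_PATH"],
        recall_k=int(env.get("RECALL_K", "20")),  # optional, with a default
    )
```

Failing at startup on a missing variable means a misconfigured deployment shows up in the build and application logs right away rather than on the first user request.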

Our Deployment

The assistant is live on zeliot.in. It queries content from zeliot.in and docs.zeliot.in, surfaces blogs and eBooks from the resources section, handles meeting booking through Microsoft Graph, and guides users through Condense sign-up, all from a single chat interface.

Vapr manages orchestration: deciding when to retrieve from the vector store, when to invoke the scheduling tool, when to surface a blog or eBook, and when to guide a sign-up. The entire stack runs on Condense on Zeliot's cloud, the same platform we're recommending to customers for their own streaming infrastructure.

What We Learned Building This

Content quality matters more than retrieval sophistication

Cleaning navigation, sidebars, and framework-generated content before indexing had a bigger impact on answer quality than any retrieval tuning. Bad input produces bad output regardless of how good the retrieval pipeline is.

Two-stage retrieval is non-negotiable for production

The Cross Encoder reranker eliminated the category of answers that were technically sourced from the right general area but missed the specific point the user was asking about. We wouldn't ship a customer-facing assistant without it.

Actions need to be tools, not reasoning

Having Vapr invoke discrete, deterministic tools for booking and sign-up produced far more reliable outcomes than prompting a model to reason through the same actions. For anything that creates a side effect, such as booking a calendar slot or initiating a sign-up, use a tool.

Streaming is a UX decision, not a technical nicety

The isolated event loop design adds implementation complexity, but the experience difference for users is real enough that it was worth it.

What It Can Do

The assistant can answer any question about Zeliot, Condense, Vapr, or Condense Apps, from pricing to compliance certifications to connector specifics. It surfaces blogs and eBooks on demand. It books meetings without leaving the chat. It guides users through Condense sign-up. And it does all of this while maintaining context across a full conversation.

More than anything, it removes the gap between "I have a question" and "I have an answer". That is what a website is supposed to do; the assistant simply does it more easily and quickly.

Frequently Asked Questions (FAQs)

Why ChromaDB over Pinecone or Weaviate?

ChromaDB with an HNSW index was the right fit for this use case: it runs embedded without a separate server, handles the document volumes zeliot.in produces without over-engineering, and integrates cleanly with the rest of the Python stack. For a larger corpus or multi-tenant retrieval, a managed vector database would make more sense.

Why Nomic over OpenAI embeddings?

Nomic offers strong retrieval performance, long-context support, and significantly faster embedding generation without depending on an external API call for every document chunk. For a pipeline that re-indexes on content changes, local embedding speed matters.

How does the Cross Encoder reranker actually work differently from vector search?

Vector search compares embeddings independently: the query vector against each document vector. A Cross Encoder takes the query and each candidate document together as a single input and scores their relevance jointly. This is more computationally expensive but produces significantly better precision. In our setup, vector search handles recall (finding candidates quickly), and the Cross Encoder handles precision (ranking those candidates correctly).

How is chunking implemented, and what determines chunk boundaries?

Chunks are created semantically rather than at fixed character limits. The pipeline uses configurable chunk size and overlap parameters, but the boundary logic respects sentence and section structure rather than splitting mid-thought. Documentation pages and blog posts have different structural patterns, so the parameters are tuned separately for each content type.

How does history compression work technically?

When conversation history exceeds a configured token threshold, the compressor summarises older turns into a condensed representation while keeping the most recent turns verbatim. The summary replaces the raw history for context injection, which keeps token consumption bounded without losing continuity. This is implemented as a separate service that runs between turns rather than inline with response generation.

Why Server-Sent Events over WebSockets for streaming?

SSE is unidirectional, server to client, which is exactly what streaming a response requires. WebSockets add bidirectional overhead that isn't needed here. SSE also reconnects automatically on network interruption, which matters for a customer-facing assistant on a website. The isolated event loop per session prevents one user's stream from blocking another's.

How does Vapr decide which tool to invoke?

Vapr evaluates the user's intent against the available tool definitions and selects the appropriate tool or combination of tools based on what the request requires. For ambiguous requests, it can ask a clarifying question before invoking a tool. The tool definitions include descriptions of what each tool does and when it should be used, which Vapr uses for selection without hard-coded routing logic.

How are environment variables and secrets managed in the Condense deployment?

Environment variables are configured directly in Condense's Applications IDE, where they are masked, named, and versioned per deployment. They're never stored in the repository or visible in build logs. Changing a variable value creates a new deployment without requiring a code change or rebuild.

What happens if a content source changes, and how does re-indexing work?

The ingestion pipeline can be triggered on a schedule or on content change events. When re-indexing runs, it crawls the configured URLs, extracts and chunks content, generates new embeddings, and updates ChromaDB. Existing entries are replaced rather than duplicated. For zeliot.in, we tie this to the publish pipeline so the knowledge base stays current with the live site.

Can this architecture be replicated for a different product or website using Condense?

Yes. The retrieval pipeline, session layer, and tool execution framework are all generic. Swapping in a different content source, a different tool set, or a different embedding model doesn't require changing the orchestration layer. The Condense deployment follows the same steps regardless of what the application does.
