Voice OS Context Window: What the AI Knows Per Turn (2026)


The context window of a voice OS is the prompt assembled before every conversation turn. It includes the system prompt, the persona, retrieved memories, current calendar and inbox state, the recent conversation, and the user's latest utterance. The art of voice OS engineering is keeping this prompt small and relevant. A bloated context window slows down the LLM, dilutes attention, and increases cost. A starved context window makes the AI forget what you just said. The right size is usually 3,000 to 6,000 tokens.

WHAT TO LOOK FOR

The three things that actually matter

System prompt

The base instructions that define what Lucy is, how it should behave, and what guardrails apply. The system prompt is roughly 800 tokens and is identical across all turns and users. It establishes the persona, the response style, and the rules of engagement.

Calendar injection

The next 8 to 12 calendar events from Google Calendar, formatted as a compact list. This is what lets Lucy answer 'what is on my plate today' without making any tool call. Calendar injection costs about 400 tokens per turn.
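As a rough illustration of what a "compact list" could look like, the sketch below renders events one per line, soonest first. The `Event` record, its field names, and the line format are assumptions made for the example, not Lucy's actual format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    # Hypothetical shape; a real calendar event carries many more fields.
    title: str
    start: datetime


def format_calendar(events: list[Event], limit: int = 12) -> str:
    """Render up to `limit` events as one short line each, soonest first."""
    upcoming = sorted(events, key=lambda e: e.start)[:limit]
    return "\n".join(f"- {e.start:%a %H:%M} {e.title}" for e in upcoming)


events = [
    Event("Dentist", datetime(2026, 3, 3, 15, 30)),
    Event("Standup", datetime(2026, 3, 3, 9, 0)),
]
print(format_calendar(events))
```

One line per event keeps the block cheap in tokens while still letting the model answer scheduling questions without a tool call.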

Inbox injection

The most recent 10 inbox subjects from Gmail, with sender and a one-line snippet. This is what lets Lucy answer 'did Sarah email me back yet' instantly. Inbox injection costs about 600 tokens per turn.
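A minimal sketch of the inbox block, assuming messages arrive newest first as plain dictionaries with `sender`, `subject`, and `snippet` keys (hypothetical names). It collapses each snippet to a single line and caps its length so one long email cannot crowd out the rest.

```python
def format_inbox(messages: list[dict], limit: int = 10, snippet_chars: int = 80) -> str:
    """Render up to `limit` messages as 'sender: subject | snippet' lines."""
    lines = []
    for m in messages[:limit]:
        # Collapse newlines and repeated whitespace into single spaces.
        snippet = " ".join(m["snippet"].split())
        if len(snippet) > snippet_chars:
            snippet = snippet[: snippet_chars - 3].rstrip() + "..."
        lines.append(f"- {m['sender']}: {m['subject']} | {snippet}")
    return "\n".join(lines)


inbox = [
    {"sender": "Sarah", "subject": "Re: Q3 plan", "snippet": "Hi,\n  sounds good to me."},
]
print(format_inbox(inbox))
```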

TLDR: Lucy OS1 builds the context window dynamically per turn. A scheduler runs a few milliseconds before each LLM call: it pulls your next 8 calendar events, the top 10 unread inbox subjects, your most recent 10 conversation turns, and 8 to 15 retrieved memories ranked by semantic relevance to what you just said. The total context typically lands at 4,000 to 5,000 tokens, which keeps time-to-first-token under 200 milliseconds while still giving Lucy real situational awareness for the answer.
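The per-turn assembly can be pictured as a small budgeting step. The sketch below is an assumption-laden illustration, not Lucy's implementation: the `Section` type, the priority scheme, and the four-characters-per-token estimate are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    text: str
    priority: int  # hypothetical scheme: lower number = more important


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)


def assemble_prompt(sections: list[Section], budget: int) -> str:
    """Pick sections by priority until the budget is spent, then emit the
    survivors in their original prompt order."""
    kept, used = set(), 0
    by_priority = sorted(range(len(sections)), key=lambda i: sections[i].priority)
    for i in by_priority:
        text = sections[i].text
        if not text:
            continue  # empty section: nothing to inject
        cost = estimate_tokens(text)
        if used + cost <= budget:
            kept.add(i)
            used += cost
    return "\n\n".join(
        f"[{s.name}]\n{s.text}" for i, s in enumerate(sections) if i in kept
    )


sections = [
    Section("system", "You are Lucy.", 0),
    Section("memories", "x" * 400, 2),
    Section("user", "What is on my plate today?", 0),
]
print(assemble_prompt(sections, budget=50))
```

Separating selection order (by priority) from emission order (as listed) lets the user's utterance stay at the end of the prompt while still being among the first things guaranteed a place in the budget.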

Why Lucy OS1

Memory injection

8 to 15 retrieved memories selected for semantic relevance to the current conversation. Memories are injected as a structured block: subject, fact, when stored. This is the heaviest dynamic component and typically costs 1,000 to 1,800 tokens.
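One common way to rank by semantic relevance is cosine similarity between embedding vectors. The sketch below assumes each memory already carries an embedding under a hypothetical `vec` key and that the query has been embedded by the same model; the two-dimensional vectors are toy values.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_memories(query_vec: list[float], memories: list[dict], k: int = 15) -> list[dict]:
    """Return the k memories most similar to the query, best first."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)
    return ranked[:k]


def format_memory(m: dict) -> str:
    # Structured block: subject, fact, when stored.
    return f"subject: {m['subject']} | fact: {m['fact']} | stored: {m['stored']}"


memories = [
    {"subject": "Sarah", "fact": "Leads the Q3 launch", "stored": "2026-01-12", "vec": [1.0, 0.0]},
    {"subject": "Gym", "fact": "Prefers morning sessions", "stored": "2025-11-02", "vec": [0.0, 1.0]},
    {"subject": "Q3 launch", "fact": "Deadline moved to July", "stored": "2026-02-01", "vec": [0.9, 0.1]},
]
for m in top_memories([1.0, 0.0], memories, k=2):
    print(format_memory(m))
```

A production system would likely use a vector index rather than sorting every memory, but the ranking criterion is the same.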

Recent conversation

The last 10 to 20 turns of the current conversation, included so the LLM can reference what was just discussed. Older turns get summarized into a single line to compress them while preserving the thread.
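A minimal sketch of that windowing, assuming turns are plain strings and that summarization is delegated to a caller-supplied function (in practice probably an LLM call; here a placeholder is used when none is given).

```python
from typing import Callable, Optional


def window_conversation(
    turns: list[str],
    keep: int = 20,
    summarize: Optional[Callable[[list[str]], str]] = None,
) -> list[str]:
    """Keep the last `keep` turns verbatim and collapse older ones to one line."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    if len(turns) <= keep:
        return list(turns)
    older, recent = turns[:-keep], turns[-keep:]
    if summarize is not None:
        summary = summarize(older)
    else:
        # Placeholder when no summarizer is supplied.
        summary = f"(summary of {len(older)} earlier turns)"
    return [f"Earlier: {summary}"] + recent


turns = [f"turn {i}" for i in range(25)]
print(window_conversation(turns)[0])
```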

Tool definitions

Function signatures for tools the LLM can call: send_email, create_calendar_event, web_search, set_reminder. These cost about 600 tokens per turn but are required for the LLM to know what actions are available.
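Tool definitions are commonly written as a name, a description, and a JSON Schema for the arguments; the exact envelope differs between LLM providers. The parameter names below (`text`, `when`) are hypothetical, and the token figure uses a crude four-characters-per-token assumption.

```python
import json

# Illustrative definition for one of the tools named above.
SET_REMINDER = {
    "name": "set_reminder",
    "description": "Create a reminder for the user at a given time.",
    "parameters": {
        "type": "object",
        "properties": {
            "text": {"type": "string", "description": "What to remind the user about."},
            "when": {"type": "string", "description": "ISO 8601 timestamp."},
        },
        "required": ["text", "when"],
    },
}

serialized = json.dumps(SET_REMINDER)
approx_tokens = len(serialized) // 4  # rough estimate, not a real tokenizer
print(approx_tokens)
```

Measuring each definition this way gives a quick sense of how much of the per-turn budget the tool list consumes.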

QUICK COMPARISON

Lucy OS1 vs most AI tools

Capability	Lucy OS1	Most AI tools
Memory across sessions	✓ Permanent, never resets	✗ Resets after every session
Voice quality	✓ Lucy OS1 Natural Voice (best-in-class)	✗ Basic STT, struggles with noise
Calendar awareness	✓ Reads Google Calendar in real time	✗ No calendar access
Available 24/7	Always on, any device	Available but stateless each time
Gets personal over time	✓ Builds your context continuously	✗ Starts from zero every session

Try Lucy OS1, setup takes 30 seconds

Voice-first AI with memory and calendar integration. Free to try.


Free tier available. No credit card required.

GET STARTED

How to use Lucy OS1

Create your free account

No credit card required. Sign in with your Google account and you're inside in under a minute.

Connect your Google Calendar

Lucy reads your upcoming events before every conversation, so it already knows your day before you say a word.

Start talking

Speak naturally. Lucy listens, responds by voice, and begins building context from your very first exchange. The more you use it, the better it gets.


Frequently Asked Questions

Why not just use a giant context window, since modern LLMs support 100,000 or more tokens?

Latency. Time to first token grows with prompt size, roughly in proportion, because the model has to process the whole prompt before it can generate anything. A 100,000-token prompt can add over a second of latency before the first generated token. For voice AI, that is unacceptable. The right move is curating the prompt aggressively, not enlarging it.
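A back-of-the-envelope model makes the trade-off concrete. It assumes time to first token is a fixed overhead plus prompt length divided by a prefill throughput; both default figures below are illustrative assumptions, not measurements of any particular model.

```python
def estimated_ttft(
    prompt_tokens: int,
    prefill_tokens_per_s: float = 50_000,  # assumed prefill throughput
    overhead_s: float = 0.05,              # assumed fixed network and queue overhead
) -> float:
    """Estimate seconds until the first generated token under a linear model."""
    return overhead_s + prompt_tokens / prefill_tokens_per_s


for size in (5_000, 100_000):
    print(size, round(estimated_ttft(size), 2))
```

Under these assumed numbers a 5,000-token prompt stays below 200 milliseconds while a 100,000-token prompt takes about two seconds, which is why curation wins over a bigger window for voice.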

How does the context window stay coherent across long conversations?

Older turns get summarized into one-line thread summaries instead of being kept verbatim. Important facts from earlier in the conversation get promoted to memories. The LLM never sees the raw conversation more than 20 turns deep.

Can the context window adapt to the user's question?

Yes. If the user asks about email, inbox injection expands. If the user asks about a project, memory injection prioritizes that project's memories. Dynamic context shaping is one of the most impactful optimizations available.
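As a sketch of dynamic shaping, the function below adjusts per-section item limits based on the utterance. Keyword matching stands in for whatever intent detection a real system would use, and the section names and limits are assumptions for the example.

```python
DEFAULT_LIMITS = {"calendar": 8, "inbox": 10, "memories": 10}


def shape_limits(utterance: str) -> dict[str, int]:
    """Return per-section item limits tuned to what the user just asked."""
    limits = dict(DEFAULT_LIMITS)  # copy so the defaults are never mutated
    text = utterance.lower()
    if any(word in text for word in ("email", "inbox", "reply")):
        limits["inbox"] = 25
    if any(word in text for word in ("meeting", "calendar", "schedule")):
        limits["calendar"] = 12
    return limits


print(shape_limits("Did Sarah email me back yet?"))
```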

What happens when the calendar or inbox is empty?

The injection sections are simply omitted to save tokens. The system prompt is written to handle absence gracefully, so the LLM does not invent calendar events or emails that do not exist.

How are sensitive memories or emails kept out of the context?

User-marked private memories and emails matching exclusion rules are filtered out before injection, so the LLM never sees them. This privacy boundary is enforced before prompt assembly, not by asking the LLM to ignore the content.
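A minimal sketch of such a pre-assembly filter, assuming memories carry a hypothetical `private` flag and that exclusion rules are a simple list of blocked senders. Real exclusion rules could be richer (labels, domains, keywords).

```python
def filter_private(
    memories: list[dict],
    emails: list[dict],
    excluded_senders: list[str],
) -> tuple[list[dict], list[dict]]:
    """Drop private memories and excluded emails before anything reaches the prompt."""
    visible_memories = [m for m in memories if not m.get("private", False)]
    blocked = {s.lower() for s in excluded_senders}
    visible_emails = [e for e in emails if e["sender"].lower() not in blocked]
    return visible_memories, visible_emails


mems = [{"fact": "Likes tea"}, {"fact": "Medical note", "private": True}]
mail = [{"sender": "Bank@example.com", "subject": "Statement"},
        {"sender": "sarah@example.com", "subject": "Re: plan"}]
print(filter_private(mems, mail, ["bank@example.com"]))
```

Because the filtered items never enter the prompt, there is nothing for the model to leak, which is the point of enforcing the boundary outside the LLM.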

Do tool definitions count against the context budget?

Yes. Every tool the LLM can call adds 50 to 200 tokens of definition. Production systems typically expose only the 5 to 10 tools needed for the current conversation, dynamically loaded per user, rather than every available tool.
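One simple way to load tools per user is to expose only those whose required integration is connected. The registry contents and integration names below are assumptions for illustration.

```python
from typing import Optional

# Hypothetical registry: tool name -> integration it needs (None = always available).
TOOL_REGISTRY: dict[str, Optional[str]] = {
    "send_email": "gmail",
    "create_calendar_event": "google_calendar",
    "web_search": None,
    "set_reminder": None,
}


def tools_for_user(connected: set[str], max_tools: int = 10) -> list[str]:
    """List the tools this user can actually use, capped at `max_tools`."""
    names = [
        name
        for name, needs in TOOL_REGISTRY.items()
        if needs is None or needs in connected
    ]
    return names[:max_tools]


print(tools_for_user({"gmail"}))
```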