I Built an AI Chief of Staff That Runs Entirely on My Laptop

How I created Vault, a locally-running AI assistant that manages my professional workflow without relying on cloud services—keeping my data private and my costs low.

I spend way too much time context-switching between client emails, project notes, meeting transcripts, and CRM data. Every morning, I’m asking myself the same questions: What’s urgent? Who needs a response? What deadlines am I forgetting?

So I built an AI assistant that answers all of this, and it runs 100% locally on my MacBook.

No API costs. No data leaving my machine. Just my documents, my AI, completely private.

I’m calling it Vault. Not a very creative name, but it works!

Introducing Vault

Vault is my AI Chief of Staff—a locally-running assistant designed to manage my professional workflow without sending a single byte to the cloud. It ingests my emails, documents, and notes, then answers natural-language queries about my priorities, projects, and commitments.

Why Local AI Matters

Two main reasons drove this decision:

Privacy: As a consultant, I handle sensitive client information daily. Sending that data to external APIs—even with enterprise agreements—always felt like an unnecessary risk.

Cost: API costs add up fast when you’re constantly querying an LLM. Running locally means I can ask as many questions as I want without watching my bill climb.

Enter Parallax

I’m using Parallax from Gradient Network for local inference. It’s a fully decentralized inference engine for local AI models, and the setup is dead simple:

git clone https://github.com/GradientHQ/parallax.git
cd parallax

# Create and activate a Python virtual environment
python3 -m venv ./venv
source ./venv/bin/activate

pip install -e '.[mac]'

The Architecture

Vault is built on a few key components:

  1. Parallax - A decentralized inference engine from Gradient Network that handles local model execution
  2. ChromaDB - Vector database for semantic search and document retrieval
  3. Gmail & Google Drive integrations - Automated syncing of my emails and documents
  4. RAG (Retrieval-Augmented Generation) - The system retrieves relevant context before generating responses
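
The retrieve-then-generate flow in step 4 can be sketched in a few lines. The helper names below (retrieve_chunks, build_prompt) are illustrative, not Vault's actual API, and plain word overlap stands in for the embedding search that ChromaDB performs:

```python
def retrieve_chunks(query, store, k=3):
    """Return the k chunks most similar to the query.
    (Vault uses embeddings + ChromaDB; keyword overlap stands in here.)"""
    words = set(query.lower().split())
    scored = sorted(store, key=lambda c: len(words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, chunks):
    """Assemble the retrieved context and the question into one LLM prompt."""
    context = "\n---\n".join(chunks)
    return f"Use the context below to answer.\n\nContext:\n{context}\n\nQuestion: {query}"

store = [
    "Project X kickoff is scheduled for Friday.",
    "Invoice #42 is overdue by two weeks.",
    "Lunch menu for the office party.",
]
query = "What is the status of Project X?"
prompt = build_prompt(query, retrieve_chunks(query, store, k=2))
# The model only ever sees the retrieved chunks, not the whole corpus
```

The key property is that the model's context stays small no matter how large the document store grows.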

Document Processing

The system accepts multiple formats: PDFs, Word documents, emails (.eml), CSVs, and JSON files. Each document gets chunked and embedded for semantic search.

The chunk_size and chunk_overlap parameters are crucial. Too large, and you waste context window space. Too small, and you lose important connections between ideas. I settled on 1000 tokens with 200 token overlap after some experimentation.
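
The sliding-window idea is simple enough to show directly. This is a minimal sketch (a plain token list stands in for real tokenizer output):

```python
def chunk(tokens, size=1000, overlap=200):
    """Split a token list into overlapping windows: each chunk shares
    `overlap` tokens with the previous one, so an idea that straddles a
    boundary survives intact in at least one chunk."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = list(range(2500))   # stand-in for a tokenized document
chunks = chunk(tokens)
# chunks[0] covers tokens 0-999, chunks[1] covers 800-1799, and so on
```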

Email metadata extraction captures sender, recipient, subject, and date—allowing the AI to understand not just what was said, but who said it and when.
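
Python's standard-library email module handles the parsing; a sketch of the extraction (the field names in the meta dict are my labels, not a fixed schema):

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# A tiny .eml file; Vault stores these fields alongside each chunk so
# queries can filter by who sent something and when.
raw = """\
From: Alice <alice@example.com>
To: me@example.com
Subject: Project X deadline
Date: Mon, 06 Jan 2025 09:30:00 +0000

The deliverable is due Friday.
"""

msg = message_from_string(raw)
meta = {
    "sender": parseaddr(msg["From"])[1],      # bare address, display name dropped
    "recipient": parseaddr(msg["To"])[1],
    "subject": msg["Subject"],
    "date": parsedate_to_datetime(msg["Date"]).isoformat(),
}
```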

What It Can Do

Vault functions as an executive assistant, answering queries like:

  • “What are my priorities today based on recent emails?”
  • “Summarize the status of Project X across all my notes”
  • “Who haven’t I responded to in the last 48 hours?”
  • “What deadlines are coming up this week?”

The conversational memory means I can ask follow-up questions without re-explaining context. It remembers what we were discussing.
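
At its core, that memory is just a per-session message history replayed on every turn. A bare-bones sketch (names are illustrative; Vault's real session layer also trims history to fit the model's context window):

```python
from collections import defaultdict

sessions = defaultdict(list)   # session_id -> list of chat messages

def ask(session_id, user_msg, generate=lambda history: f"({len(history)} msgs seen)"):
    """Append the user turn, let the model see the full history, record the reply.
    `generate` is a stand-in for the actual LLM call."""
    history = sessions[session_id]
    history.append({"role": "user", "content": user_msg})
    reply = generate(history)
    history.append({"role": "assistant", "content": reply})
    return reply

ask("s1", "Summarize Project X")
ask("s1", "And who owns the next step?")  # follow-up: no need to re-state "Project X"
```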

The Technical Details

The core of the system uses:

  • OAuth 2.0 for Gmail API integration
  • Cosine similarity with HNSW indexing for fast vector search
  • Server-sent events for streaming responses
  • Session management for multi-turn conversations
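
For intuition, here is the scoring that the HNSW index approximates. The brute-force scan below is a sketch, not Vault's code (ChromaDB does this search in sub-linear time):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of magnitudes.
    Two vectors pointing the same direction score 1.0; orthogonal ones 0.0."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Toy 3-dimensional embeddings; real ones have hundreds of dimensions.
query = [1.0, 0.0, 1.0]
docs = {"doc_a": [1.0, 0.1, 0.9], "doc_b": [0.0, 1.0, 0.0]}
best = max(docs, key=lambda d: cosine(query, docs[d]))
# doc_a wins: its vector points in nearly the same direction as the query
```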

What’s Next

I’m planning to scale this with additional nodes and experiment with larger language models while keeping everything local. The goal is an AI assistant that truly knows my work—without ever leaving my machine.

If there’s interest, I might open-source the core components. Let me know what you’d want to see.

Sid Bharath

Writing about AI development tools, technical content strategy, and developer experience.