Using AI as your personal knowledge base

May 28, 2026

You starred that repo two years ago. The one with the clever rate-limiter implementation. You remember it exists. You do not remember what it was called, who wrote it, or what search terms would surface it again.

So you open Google and start from scratch.

This is the loop most developers live in. You read a paper on consistent hashing in 2022. You bookmarked a thread about PostgreSQL index strategies. You downloaded a PDF on distributed tracing that you planned to read on a flight. That flight was three years ago. The PDF is in a folder called “Reading” alongside 47 others.

None of it is findable when you need it. Not because you forgot it, but because there’s no index over your own knowledge. You’ve been accumulating for years without building any retrieval mechanism.

Starring a GitHub repo takes two seconds. It’s a promise to future-you: this will be useful someday. The problem is future-you has no way to query that promise. GitHub’s star list is a flat scroll. Your browser bookmarks are organized by whatever folder made sense when you saved them. Your PDF downloads are named things like 2021-09-14-paper.pdf.

The graveyard fills up fast.

I’ve been building a personal knowledge base on top of Friday that fixes the retrieval side of this. Not a new note-taking app. Not another folder hierarchy. A workspace that ingests the things I’ve already saved and lets me ask questions against them.

“What do I know about rate limiting strategies?” Returns sources from starred repos, PDFs I uploaded, wiki pages I’ve clipped. Ranked by relevance. Cited back to the original.

“Did I ever find a good library for distributed tracing in Go?” Pulls from a repo I starred 14 months ago that I had completely forgotten about.

The Architecture of an AI Knowledge Base

The stack is two agents and a SQLite file.

Ingestion handles URLs and PDFs separately but stores them the same way. For a URL, the agent fetches the page, strips HTML, and extracts the title. For a PDF, it reads the bytes with PyMuPDF page by page, preserving page boundaries, so the extracted text stays coherent. Both paths feed into the same chunking function.
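Here is a minimal sketch of the two ingestion paths, assuming a standard-library HTML parser on the URL side. The names `extract_html` and `extract_pdf_pages` and the list of skipped tags are illustrative, not the workspace's actual code. The PDF helper imports PyMuPDF lazily, so the HTML path runs without it.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and the <title>, skipping script/style content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.parts = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title:
            self.title += data.strip()
        elif data.strip():
            # Simplification: every text node becomes its own paragraph.
            self.parts.append(data.strip())


def extract_html(html: str) -> tuple[str, str]:
    """Return (title, plain text) for an HTML page that was already fetched."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.title, "\n\n".join(parser.parts)


def extract_pdf_pages(pdf_bytes: bytes) -> list[str]:
    """Return one text string per page, so page boundaries are preserved."""
    import fitz  # PyMuPDF; imported here so the HTML path has no extra dependency

    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        return [page.get_text() for page in doc]
```

Both helpers return plain text, which is what lets a single chunking function serve both sources.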

Chunking splits on paragraph boundaries first, then on sentence boundaries within each paragraph. Each chunk targets 500 characters with a 50-character overlap at the boundary, so context doesn’t get cut mid-thought when a sentence straddles two chunks. It’s not ML-based splitting, no semantic similarity checks between adjacent sentences, but paragraph-then-sentence gives you natural document structure without the overhead.
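A rough sketch of paragraph-then-sentence chunking with the 500/50 numbers above. The function names and the regex-based sentence splitter are assumptions for illustration; the real agent may split differently.

```python
import re


def split_sentences(paragraph: str) -> list[str]:
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def chunk_text(text: str, target: int = 500, overlap: int = 50) -> list[str]:
    """Pack sentences into chunks of at most `target` characters.

    Each new chunk starts with the last `overlap` characters of the previous
    one. A single sentence longer than `target` is kept whole, so it can
    produce an oversized chunk.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in re.split(r"\n\s*\n", text):        # paragraph boundaries first
        for sentence in split_sentences(paragraph):     # then sentences within each
            if current and len(current) + 1 + len(sentence) > target:
                chunks.append(current)
                tail = current[-overlap:] if overlap else ""
                current = f"{tail} {sentence}" if tail else sentence
            else:
                current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

The overlap here is a raw character tail, so it can start mid-word. That is usually harmless for embedding purposes, but snapping the tail to a word boundary would be a small refinement.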

Each chunk gets embedded with BAAI/bge-large-en-v1.5 via sentence-transformers. BGE-large produces 1024-dimensional vectors and it runs locally, no API calls, no data leaving the machine. Embeddings are normalized before storage so cosine similarity becomes a dot product at query time.
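To make the normalization point concrete, here is a small sketch. The `embed` helper shows one plausible way to call sentence-transformers with normalization turned on; the other functions are plain-Python stand-ins, written only to show why a dot product is enough once vectors have unit length.

```python
import math

MODEL_NAME = "BAAI/bge-large-en-v1.5"


def embed(texts):
    """Embed texts locally; downloads the model on first use."""
    from sentence_transformers import SentenceTransformer  # heavy import, kept lazy

    model = SentenceTransformer(MODEL_NAME)
    # normalize_embeddings=True returns unit-length vectors
    return model.encode(texts, normalize_embeddings=True)


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
# Cosine similarity of the raw vectors equals the dot product of the normalized ones.
assert math.isclose(cosine(a, b), dot(l2_normalize(a), l2_normalize(b)))
```

Normalizing once at ingestion moves the division out of the query path, so each comparison at query time is a plain dot product.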

Vectors go into SQLite via sqlite-vec, a loadable extension that adds a vec0 virtual table type. The schema is three tables: documents for source metadata, chunk_metadata for the text and chunk index, and chunk_embeddings as the virtual vector table. The entire knowledge base is a single .db file on disk.
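The three-table layout might look something like the sketch below. The table names come from the description above; the column names, the `open_db` helper, and the convention of reusing `chunk_metadata.id` as the vector table's `rowid` are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id       INTEGER PRIMARY KEY,
    source   TEXT NOT NULL,            -- URL or file path
    title    TEXT,
    added_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS chunk_metadata (
    id          INTEGER PRIMARY KEY,   -- assumed to match chunk_embeddings.rowid
    document_id INTEGER NOT NULL REFERENCES documents(id),
    chunk_index INTEGER NOT NULL,
    text        TEXT NOT NULL
);
"""

# 1024 dimensions to match BGE-large.
VECTOR_TABLE = (
    "CREATE VIRTUAL TABLE IF NOT EXISTS chunk_embeddings "
    "USING vec0(embedding float[1024])"
)


def open_db(path: str = ":memory:", with_vectors: bool = True) -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    if with_vectors:
        import sqlite_vec  # pip install sqlite-vec

        # Requires a Python build whose sqlite3 allows loadable extensions.
        db.enable_load_extension(True)
        sqlite_vec.load(db)
        db.enable_load_extension(False)
        db.execute(VECTOR_TABLE)
    return db
```

Passing a file path instead of `:memory:` gives you the single `.db` file; copying that file is the whole backup story.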

Query embeds the question with the same model, runs a k-nearest-neighbor search against chunk_embeddings, joins back to chunk_metadata to retrieve the text, then hands the top-K chunks to an LLM with a prompt that forces it to cite sources. The prompt also tells the model to say so when the answer isn’t in the retrieved chunks, which cuts down on hallucinated answers.
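The retrieval step could be sketched roughly as follows. The SQL assumes hypothetical column names (`id`, `document_id`, `title`, `source`, `text`) and that `chunk_embeddings.rowid` matches `chunk_metadata.id`; the prompt wording is likewise illustrative, not the workspace's actual prompt.

```python
KNN_SQL = """
WITH knn AS (
    SELECT rowid, distance
    FROM chunk_embeddings
    WHERE embedding MATCH ? AND k = ?
)
SELECT m.text, d.title, d.source, knn.distance
FROM knn
JOIN chunk_metadata AS m ON m.id = knn.rowid
JOIN documents AS d ON d.id = m.document_id
ORDER BY knn.distance
"""


def search(db, query_vector, k=5):
    """Return the k closest chunks as (text, title, source, distance) rows."""
    import sqlite_vec  # only needed for packing the query vector

    blob = sqlite_vec.serialize_float32(list(query_vector))
    return db.execute(KNN_SQL, (blob, k)).fetchall()


def build_prompt(question, rows):
    """Number each retrieved chunk so the model can cite it as [n]."""
    sources = "\n\n".join(
        f"[{i}] {title} ({source})\n{text}"
        for i, (text, title, source, _distance) in enumerate(rows, start=1)
    )
    return (
        "Answer using only the sources below and cite them as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )
```

Numbering the chunks in the prompt is what makes the citations checkable: each [n] in the answer maps back to a row, and through it to the original URL or PDF.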

The interesting design constraint was deciding what not to build. No tagging interface. No manual categorization. No “inbox zero for bookmarks” UX. The whole point is that curation takes effort, and effort is why the graveyard exists in the first place.

The only interface is a question.

You can build this yourself in Friday, or import the workspace available here. The workspace includes the full agent code and the FSM job definitions, so you can inspect everything before you run it. Outside the standard library, the dependencies are sentence-transformers, sqlite-vec, and PyMuPDF for the PDF path, all installable with pip.

If you’ve spent ten years reading papers, clipping articles, starring repos, and downloading PDFs, you’ve built a knowledge base. You just haven’t built the query layer. That’s the gap this fills.

Download Friday from hellofriday.ai, import this space, and get it running in less than 10 minutes.

A guest post by

Michał Gryko

Since childhood, fascinated by electronics. Over the past 20 years, I've worked as what today would be called a systems engineer or infrastructure specialist, building and maintaining complex systems across a wide range of environments.

Friday AI
