Personal Knowledge Librarian

Overview

I save a lot of links (articles, Kaggle notebooks, YouTube tutorials) and almost never go back to them because I can’t remember what was in them. The Knowledge Librarian solves that by turning every saved URL into a structured summary with tags and key concepts, then letting me search across everything with a natural-language question.

The app uses Claude to extract and summarise content, and stores entries locally as a JSON index that can be queried by semantic similarity.
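
As a rough illustration of what querying by semantic similarity can look like, here is a minimal sketch. It assumes each entry carries a precomputed vector under a hypothetical `embedding` key and that the question has been turned into a vector the same way. How the vectors are produced is not covered here, and the function names are illustrative rather than taken from the app.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_entries(query_vec, index, top_k=3):
    """Return the top_k entries whose (assumed) 'embedding' is closest to query_vec."""
    scored = [(cosine_similarity(query_vec, item["embedding"]), item) for item in index]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_k]]


# Toy index with hand-written 2-d vectors, purely for illustration.
toy_index = [
    {"url": "a", "embedding": [1.0, 0.0]},
    {"url": "b", "embedding": [0.0, 1.0]},
    {"url": "c", "embedding": [0.7, 0.7]},
]
best = rank_entries([1.0, 0.0], toy_index, top_k=2)
```

A linear scan like this is usually enough for a personal index with a few hundred entries; a dedicated vector store only becomes worthwhile at a much larger scale.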

What I learned

How to use trafilatura for reliable web content extraction across different site types
Handling different source types (YouTube descriptions, Kaggle pages, general web) with a unified interface
Structuring a Streamlit app around a persistent local index
The gap between “works on my machine” and “works on Streamlit Cloud”, specifically around local file storage, which does not persist there and resets on redeploy
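
One way to get a unified interface over different source types is to classify the URL first and return the same record shape for every source. The sketch below is an assumption about how that could be structured, not the app's actual code: `classify_source`, `extract_text`, and `ingest` are hypothetical names, and only `trafilatura.fetch_url` and `trafilatura.extract` are real library calls.

```python
from typing import Optional
from urllib.parse import urlparse


def classify_source(url: str) -> str:
    """Map a URL to a coarse source type based on its hostname."""
    host = (urlparse(url).hostname or "").lower()
    if host in ("youtu.be", "youtube.com") or host.endswith(".youtube.com"):
        return "youtube"
    if host == "kaggle.com" or host.endswith(".kaggle.com"):
        return "kaggle"
    return "web"


def extract_text(url: str) -> Optional[str]:
    """Fetch a page and return its main text, or None if fetching or extraction fails."""
    # Imported lazily so classify_source works even without trafilatura installed.
    import trafilatura

    downloaded = trafilatura.fetch_url(url)
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded)


def ingest(url: str) -> dict:
    """Return the same record shape regardless of where the URL points."""
    return {
        "url": url,
        "source_type": classify_source(url),
        "text": extract_text(url),
    }
```

Source-specific handling (for example, pulling only the description from a YouTube page) would branch on `source_type` inside `ingest`, so callers never need to know which kind of page they saved.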

The biggest design lesson: storing knowledge as structured JSON (with summary, tags, key_concepts, domain) is far more useful than storing raw text, because it makes search and filtering much more powerful.
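
To make that concrete, here is a minimal sketch of a structured entry and the kind of filtering it enables. The field names `summary`, `tags`, `key_concepts`, and `domain` come from the description above; the `url` field, the `Entry` class, and the helper functions are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Entry:
    url: str
    summary: str
    domain: str
    tags: list = field(default_factory=list)
    key_concepts: list = field(default_factory=list)


def save_index(entries, path):
    """Write all entries to a single JSON file."""
    rows = [asdict(e) for e in entries]
    Path(path).write_text(json.dumps(rows, indent=2), encoding="utf-8")


def load_index(path):
    """Read entries back; a missing file is treated as an empty index."""
    p = Path(path)
    if not p.exists():
        return []
    return [Entry(**row) for row in json.loads(p.read_text(encoding="utf-8"))]


def filter_entries(entries, tag=None, domain=None):
    """Keep entries matching the given tag and/or domain (case-insensitive)."""
    result = []
    for e in entries:
        if tag is not None and tag.lower() not in [t.lower() for t in e.tags]:
            continue
        if domain is not None and e.domain.lower() != domain.lower():
            continue
        result.append(e)
    return result


# Made-up entries, purely for illustration.
entries = [
    Entry("https://example.com/a", "Intro to gradient boosting.", "machine learning",
          tags=["xgboost", "tabular"], key_concepts=["boosting"]),
    Entry("https://example.com/b", "Caching in Streamlit apps.", "web apps",
          tags=["streamlit"], key_concepts=["caching"]),
]
ml_only = filter_entries(entries, domain="Machine Learning")
```

With raw text, the same "show me everything tagged streamlit" query would need a full-text search and would miss pages that never use the word; with structured fields it is a simple comparison.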

Tech

Python
Streamlit
Claude API (Anthropic)
trafilatura (web content extraction)
JSON (local knowledge index)

Citation

For attribution, please cite this work as:

Wang, Ray. 2026. “Personal Knowledge Librarian.” April 10. https://changruiraywang.com/project/2026-04-10-personal-knowledge-librarian/.