Over the last few months, I have built and launched SemHub, a free semantic search tool for GitHub. In this blog post, I share what I've learned and why I failed, so that other builders can learn from my experience. The post runs long, so I have sign-posted each section and marked the ones I consider particularly insightful with an asterisk (*). I have also summarized my key lessons here:
- Default to pgvector, avoid premature optimization.
- You can probably get away with shorter embeddings if you're using Matryoshka embedding models.
- Filtering with vector search may be harder than you expect.
- If you love full-stack TypeScript and use AWS, you'll love SST. I hope that one day I can recommend Cloudflare in equally strong terms.
- Building is only half the battle. You have to solve a big enough problem and meet your users where they’re at.
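Matryoshka embedding models are trained so that a prefix of the full vector is itself a usable, lower-resolution embedding. That is what makes shorter embeddings viable. The sketch below is a minimal illustration, not SemHub's actual code. It assumes you already have an embedding as a plain list of floats from some Matryoshka-style model. It keeps the first `dims` components and re-normalizes them to unit length, so cosine similarity on the shortened vectors remains meaningful. The function names, dimensions, and vectors are hypothetical.

```python
import math


def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components of a Matryoshka-style embedding
    and rescale them to unit length (hypothetical helper)."""
    if dims <= 0 or dims > len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero-length prefix")
    return [x / norm for x in head]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Plain cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def raw_float32_bytes(dims: int) -> int:
    """Bytes needed for the raw float32 components alone (4 bytes each),
    ignoring any per-row or per-value overhead in the database."""
    return 4 * dims


if __name__ == "__main__":
    # Made-up 6-dimensional "embeddings", purely for illustration.
    query = [0.9, 0.3, 0.2, 0.05, -0.1, 0.02]
    doc = [0.8, 0.4, 0.1, -0.2, 0.3, 0.01]

    full = cosine_similarity(query, doc)
    short = cosine_similarity(truncate_embedding(query, 3), truncate_embedding(doc, 3))
    print(f"full-length similarity: {full:.3f}")
    print(f"3-dim prefix similarity: {short:.3f}")
    print(truncate_embedding([3.0, 4.0, 12.0], 2))  # [0.6, 0.8]
```

The storage arithmetic motivates the trade-off. Cutting a hypothetical 1,536-dimension float32 vector down to 256 dimensions shrinks its raw payload from 6,144 to 1,024 bytes, and any index built over those vectors shrinks with it. Whether retrieval quality holds up at a given length depends on the model, so it is worth measuring on your own queries before committing.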