When I last left off, I had far more content to embed and search than my laptop could keep in memory, which left me with no interactive tooling.
Research¶
My starting point for research is Simon Willison's post about sqlite-vec for working with embeddings in sqlite. There are many vector databases out there, but I've grown quite fond of sqlite and appreciate that it's available almost everywhere.
Sqlite also has the nice benefit that the database is managed as a single file, so it's easy to experiment and blow away unwanted directions, or checkpoint and restore if I'd like.
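As a rough illustration of the checkpoint-and-restore idea, here is a minimal sketch using the `Connection.backup` method from Python's standard `sqlite3` module. The helper name `copy_database` and the file names in the usage note are made up for this example; they are not part of the project.

```python
import sqlite3


def copy_database(src_path, dest_path):
    """Copy the whole database at src_path over the one at dest_path."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # backup() replaces whatever the destination currently holds
        src.backup(dest)
    finally:
        src.close()
        dest.close()
```

Calling `copy_database("memory.db", "memory.checkpoint.db")` would take a checkpoint, and swapping the two arguments would restore it. Copying the file directly also works when nothing has the database open.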
Process¶
I really did enjoy the fairly simple interface of Kagi's vectordb, so I'm going to stick to re-implementing that approach (and also borrowing the embeddings).
Testing¶
Sticking to the same interface also gives me the opportunity to write some fairly straightforward tests for the project by running each test against two cases: the oracle (the existing database) and the new version, backed by sqlite.
This leverages a nice feature of pytest called parametrized testing.
The process looks something like:
- Use the existing kagi Memory class as the model
- Implement a new class with the same interface
- Fill out the class implementation with sqlite-vec
- Assert I can reproduce the same results
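The steps above can be sketched roughly as follows. This is a toy under stated assumptions, not the real code: `ListMemory` stands in for the oracle, `SqliteMemory` stands in for the new class, and both take precomputed vectors instead of computing embeddings. The class names, the `save`/`search` signatures, and the `chunks` table are hypothetical, and the sqlite version does a brute-force scan in Python rather than using sqlite-vec.

```python
import json
import math
import sqlite3

import pytest


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ListMemory:
    """Hypothetical oracle: keeps every row in a Python list."""

    def __init__(self):
        self.rows = []

    def save(self, text, vector):
        self.rows.append((text, list(vector)))

    def search(self, vector, top_n=1):
        ranked = sorted(self.rows, key=lambda row: distance(row[1], vector))
        return [text for text, _ in ranked[:top_n]]


class SqliteMemory:
    """Hypothetical replacement: same interface, rows stored in sqlite."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS chunks (text TEXT, vector TEXT)"
        )

    def save(self, text, vector):
        # Vectors are stored as JSON text here purely for simplicity.
        self.db.execute(
            "INSERT INTO chunks VALUES (?, ?)", (text, json.dumps(list(vector)))
        )
        self.db.commit()

    def search(self, vector, top_n=1):
        rows = self.db.execute("SELECT text, vector FROM chunks").fetchall()
        ranked = sorted(rows, key=lambda row: distance(json.loads(row[1]), vector))
        return [text for text, _ in ranked[:top_n]]


# pytest runs this test once per class, so both must give the same answer.
@pytest.mark.parametrize("memory_cls", [ListMemory, SqliteMemory])
def test_search_returns_nearest(memory_cls):
    memory = memory_cls()
    memory.save("apple", [1.0, 0.0])
    memory.save("banana", [0.0, 1.0])
    memory.save("cherry", [0.9, 0.1])
    assert memory.search([1.0, 0.0], top_n=2) == ["apple", "cherry"]
```

Because the expected result is written once and shared by both parameters, any divergence between the oracle and the new implementation shows up as a failure in exactly one of the two test cases.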
My original testing used the pytest fixture tmp_path to allow for temporary memory files managed by the test runner; however, in the end I think it's easier to use sqlite's in-memory database (while still allowing me to run the tests as if they were backed by a sqlite file).
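One way to get both behaviours is to make the database path a parameter that defaults to sqlite's special `:memory:` name. The sketch below assumes a hypothetical `open_memory_db` helper and `chunks` table; neither comes from the actual project.

```python
import sqlite3


def open_memory_db(path=":memory:"):
    """Open a sqlite database; the default path keeps everything in RAM."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS chunks (id INTEGER PRIMARY KEY, text TEXT)"
    )
    return db
```

A test can call `open_memory_db()` for a throwaway database, or `open_memory_db(str(tmp_path / "memory.db"))` when it needs to check behaviour against a real file.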
Links¶
Content To Explore¶
... once I finish this implementation tangent to expand the knowledge base size I can manage
- https://arxiv.org/category_taxonomy
- https://arxiv.org/search/?query=compressive+sensing&searchtype=all&source=header
- https://arxiv.org/search/?query=simd&searchtype=all&source=header
VectorDB implementations¶
- https://github.com/kagisearch/vectordb/blob/main/vectordb/memory.py
- https://github.com/jina-ai/vectordb/ (alternative vector db)
sqlite-vec¶
- https://til.simonwillison.net/sqlite/sqlite-vec
- https://github.com/asg017/sqlite-vec
- https://github.com/asg017/sqlite-vec/blob/main/examples/simple-python/demo.py
- https://alexgarcia.xyz/sqlite-vec/features/vec0.html#metadata
- https://www.sqlite.org/vtab.html
- https://docs.python.org/3/library/sqlite3.html#sqlite3.Cursor.fetchone
Pytest Testing¶
- https://docs.pytest.org/en/stable/example/parametrize.html
- https://docs.pytest.org/en/6.2.x/tmpdir.html
- https://stackoverflow.com/questions/36070031/creating-a-temporary-directory-in-pytest
Python and Makefile Curiosities¶
- https://stackoverflow.com/questions/17330160/how-does-the-property-decorator-work-in-python
- https://stackoverflow.com/questions/3267145/makefile-execute-another-target