Added experimental support for SQLite (sqlite-vec)
ankane committed Oct 5, 2024
1 parent 4da3b49 commit 1974c00
Showing 19 changed files with 336 additions and 36 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,5 +1,6 @@
 ## 0.5.0 (unreleased)

+- Added experimental support for SQLite (sqlite-vec)
 - Changed `normalize` option to use Active Record normalization
 - Dropped support for Active Record < 7

2 changes: 2 additions & 0 deletions Gemfile
@@ -6,4 +6,6 @@ gem "rake"
 gem "minitest", ">= 5"
 gem "activerecord", "~> 7.2.0"
 gem "pg"
+gem "sqlite3"
+gem "sqlite-vec"
 gem "railties", require: false
50 changes: 47 additions & 3 deletions README.md
@@ -1,6 +1,11 @@
 # Neighbor

-Nearest neighbor search for Rails and Postgres
+Nearest neighbor search for Rails
+
+Supports:
+
+- Postgres (cube and pgvector)
+- SQLite (sqlite-vec, experimental, unreleased)

 [![Build Status](https://github.com/ankane/neighbor/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/neighbor/actions)

@@ -12,7 +17,7 @@ Add this line to your application’s Gemfile:
 gem "neighbor"
 ```

-## Choose An Extension
+### For Postgres

 Neighbor supports two extensions: [cube](https://www.postgresql.org/docs/current/cube.html) and [pgvector](https://github.com/pgvector/pgvector). cube ships with Postgres, while pgvector supports more dimensions and approximate nearest neighbor search.

@@ -30,16 +35,35 @@ rails generate neighbor:vector
 rails db:migrate
 ```

+### For SQLite
+
+Add this line to your application’s Gemfile:
+
+```ruby
+gem "sqlite-vec"
+```
+
+And run:
+
+```sh
+rails generate neighbor:sqlite
+```
+
 ## Getting Started

 Create a migration

 ```ruby
 class AddEmbeddingToItems < ActiveRecord::Migration[7.2]
   def change
+    # cube
     add_column :items, :embedding, :cube
-    # or
+
+    # pgvector
     add_column :items, :embedding, :vector, limit: 3 # dimensions
+
+    # sqlite-vec
+    add_column :items, :embedding, :blob
   end
 end
 ```
@@ -81,6 +105,7 @@ See the additional docs for:

 - [cube](#cube)
 - [pgvector](#pgvector)
+- [sqlite-vec](#sqlite-vec)

 Or check out some [examples](#examples)

@@ -241,6 +266,25 @@ embedding = Neighbor::SparseVector.new({0 => 0.9, 1 => 1.3, 2 => 1.1}, 3)
 Item.nearest_neighbors(:embedding, embedding, distance: "euclidean").first(5)
 ```

+## sqlite-vec
+
+### Distance
+
+Supported values are:
+
+- `euclidean`
+- `cosine`
+
+### Dimensions
+
+For sqlite-vec, it’s a good idea to specify the number of dimensions to ensure all records have the same number.
+
+```ruby
+class Item < ApplicationRecord
+  has_neighbors :embedding, dimensions: 3
+end
+```
+
 ## Examples

 - [Embeddings](#openai-embeddings) with OpenAI
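For sqlite-vec, the two supported distances correspond to the `vec_distance_L2` and `vec_distance_cosine` SQL functions selected in `lib/neighbor/model.rb`. The plain-Ruby sketch below illustrates what those distances measure. The helper names are hypothetical, this is not sqlite-vec's implementation, and it assumes cosine distance is defined as 1 minus cosine similarity. It also checks the identity behind the model's comment about L2-normalized vectors: for unit vectors, squared Euclidean distance equals twice the cosine distance.

```ruby
# Hypothetical helpers illustrating the two distances (not sqlite-vec's code)
def euclidean_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Assumes cosine distance = 1 - cosine similarity
def cosine_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

# Scale a vector to unit length
def l2_normalize(v)
  norm = Math.sqrt(v.sum { |x| x * x })
  v.map { |x| x / norm }
end

a = [1.0, 1.0, 1.0]
b = [2.0, 2.0, 3.0]
p euclidean_distance(a, b) # sqrt(1 + 1 + 4), roughly 2.449

# For unit vectors: ||a - b||^2 == 2 * (1 - cos(a, b))
ua = l2_normalize(a)
ub = l2_normalize(b)
gap = (euclidean_distance(ua, ub)**2 - 2 * cosine_distance(ua, ub)).abs
p gap < 1e-12 # => true
```

Because the identity only holds for unit-length vectors, ranking by Euclidean distance and ranking by cosine distance agree only when the stored embeddings are normalized.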
14 changes: 14 additions & 0 deletions Rakefile
@@ -1,6 +1,20 @@
 require "bundler/gem_tasks"
 require "rake/testtask"

+namespace :test do
+  Rake::TestTask.new(:postgresql) do |t|
+    t.description = "Run tests for Postgres"
+    t.libs << "test"
+    t.test_files = FileList["test/**/*_test.rb"].exclude("test/sqlite*_test.rb")
+  end
+
+  Rake::TestTask.new(:sqlite) do |t|
+    t.description = "Run tests for SQLite"
+    t.libs << "test"
+    t.test_files = FileList["test/**/sqlite*_test.rb"]
+  end
+end
+
 Rake::TestTask.new(:test) do |t|
   t.libs << "test"
   t.test_files = FileList["test/**/*_test.rb"]
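In the new Rakefile, `test:postgresql` takes every `*_test.rb` file except top-level `sqlite*_test.rb` ones, `test:sqlite` takes only `sqlite*_test.rb` files, and the unchanged `test` task still takes everything. The sketch below approximates that split with `Dir.glob` in a temporary directory: include patterns are globbed, and the exclude is approximated by subtracting a second glob rather than calling Rake's `FileList`. The three test file names are hypothetical.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical test file names, used only to show how the patterns split
files = %w[test/model_test.rb test/sqlite_test.rb test/sqlite_type_test.rb]

groups = Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    files.each do |path|
      FileUtils.mkdir_p(File.dirname(path))
      FileUtils.touch(path)
    end

    all_tests = Dir.glob("test/**/*_test.rb").sort
    {
      # test: the pre-existing task, unchanged
      test: all_tests,
      # test:postgresql: everything except top-level sqlite*_test.rb files
      postgresql: all_tests - Dir.glob("test/sqlite*_test.rb"),
      # test:sqlite: sqlite*_test.rb files at any depth
      sqlite: Dir.glob("test/**/sqlite*_test.rb").sort
    }
  end
end

p groups[:postgresql] # => ["test/model_test.rb"]
p groups[:sqlite]     # => ["test/sqlite_test.rb", "test/sqlite_type_test.rb"]
p groups[:test].size  # => 3
```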
2 changes: 2 additions & 0 deletions gemfiles/activerecord70.gemfile
@@ -6,4 +6,6 @@ gem "rake"
 gem "minitest", ">= 5"
 gem "activerecord", "~> 7.0.0"
 gem "pg"
+gem "sqlite3", "< 2"
+gem "sqlite-vec"
 gem "railties", require: false
2 changes: 2 additions & 0 deletions gemfiles/activerecord71.gemfile
@@ -6,4 +6,6 @@ gem "rake"
 gem "minitest", ">= 5"
 gem "activerecord", "~> 7.1.0"
 gem "pg"
+gem "sqlite3", "< 2"
+gem "sqlite-vec"
 gem "railties", require: false
2 changes: 2 additions & 0 deletions gemfiles/activerecord80.gemfile
@@ -6,4 +6,6 @@ gem "rake"
 gem "minitest", ">= 5"
 gem "activerecord", "~> 8.0.0.beta1"
 gem "pg"
+gem "sqlite3"
+gem "sqlite-vec"
 gem "railties", "~> 8.0.0.beta1", require: false
13 changes: 13 additions & 0 deletions lib/generators/neighbor/sqlite_generator.rb
@@ -0,0 +1,13 @@
require "rails/generators"

module Neighbor
  module Generators
    class SqliteGenerator < Rails::Generators::Base
      source_root File.join(__dir__, "templates")

      def copy_templates
        template "sqlite.rb", "config/initializers/neighbor.rb"
      end
    end
  end
end
2 changes: 2 additions & 0 deletions lib/generators/neighbor/templates/sqlite.rb.tt
@@ -0,0 +1,2 @@
# Load the sqlite-vec extension
Neighbor::SQLite.initialize!
3 changes: 3 additions & 0 deletions lib/neighbor.rb
@@ -4,6 +4,7 @@
 # modules
 require_relative "neighbor/reranking"
 require_relative "neighbor/sparse_vector"
+require_relative "neighbor/sqlite"
 require_relative "neighbor/utils"
 require_relative "neighbor/version"

@@ -31,11 +32,13 @@ def initialize_type_map(m = type_map)
 end

 ActiveSupport.on_load(:active_record) do
+  require_relative "neighbor/attribute"
   require_relative "neighbor/model"
   require_relative "neighbor/normalized_attribute"
   require_relative "neighbor/type/cube"
   require_relative "neighbor/type/halfvec"
   require_relative "neighbor/type/sparsevec"
+  require_relative "neighbor/type/sqlite_vector"
   require_relative "neighbor/type/vector"

   extend Neighbor::Model
31 changes: 31 additions & 0 deletions lib/neighbor/attribute.rb
@@ -0,0 +1,31 @@
module Neighbor
  class Attribute < ActiveRecord::Type::Value
    delegate :type, :serialize, :deserialize, :cast, to: :new_cast_type

    def initialize(cast_type:, model:)
      @cast_type = cast_type
      @model = model
    end

    private

    def cast_value(...)
      new_cast_type.send(:cast_value, ...)
    end

    def new_cast_type
      @new_cast_type ||= begin
        if @cast_type.is_a?(ActiveModel::Type::Value)
          case @model.connection_db_config.adapter
          when /sqlite/i
            Type::SqliteVector.new
          else
            @cast_type
          end
        else
          @cast_type
        end
      end
    end
  end
end
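`Neighbor::Attribute` wraps a column's original cast type and, on first use, substitutes `Type::SqliteVector` when the model's configured adapter matches `/sqlite/i`. The stdlib-only sketch below reproduces that lazy, memoized delegation, using `Forwardable` in place of Active Support's `delegate`. `BinaryType`, `Float32VectorType`, and `LazyVectorAttribute` are hypothetical stand-ins rather than Active Record or Neighbor classes, the adapter strings are example values, and the sketch omits the `is_a?(ActiveModel::Type::Value)` guard from the real code.

```ruby
require "forwardable"

# Hypothetical stand-in for the column's original cast type
class BinaryType
  def serialize(value)
    value
  end
end

# Hypothetical stand-in for Neighbor::Type::SqliteVector
class Float32VectorType
  def serialize(value)
    value.pack("f*")
  end
end

# Hypothetical wrapper with the same shape as Neighbor::Attribute:
# the concrete type is chosen once from the adapter name, then reused.
class LazyVectorAttribute
  extend Forwardable
  def_delegators :new_cast_type, :serialize

  def initialize(cast_type:, adapter:)
    @cast_type = cast_type
    @adapter = adapter
  end

  private

  def new_cast_type
    @new_cast_type ||=
      case @adapter
      when /sqlite/i then Float32VectorType.new
      else @cast_type
      end
  end
end

sqlite_attr = LazyVectorAttribute.new(cast_type: BinaryType.new, adapter: "sqlite3")
pg_attr = LazyVectorAttribute.new(cast_type: BinaryType.new, adapter: "postgresql")

p sqlite_attr.serialize([1.0, 2.0]).bytesize # => 8
p pg_attr.serialize("raw bytes")             # => "raw bytes"
```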
101 changes: 69 additions & 32 deletions lib/neighbor/model.rb
@@ -27,6 +27,18 @@ def self.neighbor_attributes
   @neighbor_attributes[attribute_name] = {dimensions: dimensions, normalize: normalize}
 end

+if ActiveRecord::VERSION::STRING.to_f >= 7.2
+  decorate_attributes(attribute_names) do |_name, cast_type|
+    Neighbor::Attribute.new(cast_type: cast_type, model: self)
+  end
+else
+  attribute_names.each do |attribute_name|
+    attribute attribute_name do |cast_type|
+      Neighbor::Attribute.new(cast_type: cast_type, model: self)
+    end
+  end
+end
+
 if normalize
   if ActiveRecord::VERSION::STRING.to_f >= 7.1
     attribute_names.each do |attribute_name|
@@ -76,39 +88,57 @@ def self.neighbor_attributes
 column_info = columns_hash[attribute_name.to_s]
 column_type = column_info&.type

+adapter =
+  case connection.adapter_name
+  when /sqlite/i
+    :sqlite
+  else
+    :postgresql
+  end
+
 operator =
-  case column_type
-  when :bit
+  case adapter
+  when :sqlite
     case distance
-    when "hamming"
-      "<~>"
-    when "jaccard"
-      "<%>"
-    when "hamming2"
-      "#"
-    end
-  when :vector, :halfvec, :sparsevec
-    case distance
-    when "inner_product"
-      "<#>"
-    when "cosine"
-      "<=>"
     when "euclidean"
-      "<->"
-    when "taxicab"
-      "<+>"
-    end
-  when :cube
-    case distance
-    when "taxicab"
-      "<#>"
-    when "chebyshev"
-      "<=>"
-    when "euclidean", "cosine"
-      "<->"
+      "vec_distance_L2"
+    when "cosine"
+      "vec_distance_cosine"
     end
   else
-    raise ArgumentError, "Unsupported type: #{column_type}"
+    case column_type
+    when :bit
+      case distance
+      when "hamming"
+        "<~>"
+      when "jaccard"
+        "<%>"
+      when "hamming2"
+        "#"
+      end
+    when :vector, :halfvec, :sparsevec
+      case distance
+      when "inner_product"
+        "<#>"
+      when "cosine"
+        "<=>"
+      when "euclidean"
+        "<->"
+      when "taxicab"
+        "<+>"
+      end
+    when :cube
+      case distance
+      when "taxicab"
+        "<#>"
+      when "chebyshev"
+        "<=>"
+      when "euclidean", "cosine"
+        "<->"
+      end
+    else
+      raise ArgumentError, "Unsupported type: #{column_type}"
+    end
   end

 raise ArgumentError, "Invalid distance: #{distance}" unless operator
@@ -140,10 +170,17 @@ def self.neighbor_attributes
   end
 end

-order = "#{quoted_attribute} #{operator} #{query}"
-if operator == "#"
-  order = "bit_count(#{order})"
-end
+order =
+  case adapter
+  when :sqlite
+    "#{operator}(#{quoted_attribute}, #{query})"
+  else
+    if operator == "#"
+      "bit_count(#{quoted_attribute} # #{query})"
+    else
+      "#{quoted_attribute} #{operator} #{query}"
+    end
+  end

 # https://stats.stackexchange.com/questions/146221/is-cosine-similarity-identical-to-l2-normalized-euclidean-distance
 # with normalized vectors:
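Taken together, the `model.rb` hunks make SQLite order by a function call while Postgres keeps pgvector's infix operators. The function below is a hypothetical, trimmed-down mirror of that branching, written only to show the SQL fragment each adapter ends up with. Cube columns are omitted, plain strings stand in for the quoted column and query vector, and `neighbor_order` is not part of Neighbor.

```ruby
# Hypothetical, trimmed-down mirror of the operator and ORDER BY logic
# in lib/neighbor/model.rb (cube columns omitted). Not Neighbor's API.
def neighbor_order(adapter_name, column_type, distance, quoted_attribute, query)
  adapter = adapter_name.match?(/sqlite/i) ? :sqlite : :postgresql

  operator =
    if adapter == :sqlite
      # sqlite-vec exposes distances as SQL functions
      { "euclidean" => "vec_distance_L2", "cosine" => "vec_distance_cosine" }[distance]
    else
      # Postgres uses infix operators
      case column_type
      when :bit
        { "hamming" => "<~>", "jaccard" => "<%>", "hamming2" => "#" }[distance]
      when :vector, :halfvec, :sparsevec
        { "inner_product" => "<#>", "cosine" => "<=>",
          "euclidean" => "<->", "taxicab" => "<+>" }[distance]
      else
        raise ArgumentError, "Unsupported type: #{column_type}"
      end
    end
  raise ArgumentError, "Invalid distance: #{distance}" unless operator

  if adapter == :sqlite
    "#{operator}(#{quoted_attribute}, #{query})" # function call
  elsif operator == "#"
    "bit_count(#{quoted_attribute} # #{query})" # XOR, then count set bits
  else
    "#{quoted_attribute} #{operator} #{query}" # infix operator
  end
end

puts neighbor_order("SQLite", :binary, "cosine", %q("items"."embedding"), "?")
# prints: vec_distance_cosine("items"."embedding", ?)
puts neighbor_order("PostgreSQL", :vector, "euclidean", %q("items"."embedding"), "?")
# prints: "items"."embedding" <-> ?
```

Distances that sqlite-vec is not mapped to in this commit, such as `inner_product`, fall through to the `Invalid distance` error on SQLite.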
20 changes: 20 additions & 0 deletions lib/neighbor/sqlite.rb
@@ -0,0 +1,20 @@
module Neighbor
  module SQLite
    def self.initialize!
      require "sqlite_vec"
      require "active_record/connection_adapters/sqlite3_adapter"

      ActiveRecord::ConnectionAdapters::SQLite3Adapter.prepend(InstanceMethods)
    end

    module InstanceMethods
      def configure_connection
        super
        db = ActiveRecord::VERSION::STRING.to_f >= 7.1 ? @raw_connection : @connection
        db.enable_load_extension(1)
        SqliteVec.load(db)
        db.enable_load_extension(0)
      end
    end
  end
end
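`Neighbor::SQLite.initialize!` relies on `Module#prepend`: because `InstanceMethods` is prepended to `SQLite3Adapter`, its `configure_connection` runs first, `super` reaches the adapter's own implementation, and only then is sqlite-vec loaded, with extension loading switched off again afterwards. The sketch below reproduces just that call order, using hypothetical stand-ins (`FakeAdapter`, `LoadVecExtension`) and a log array instead of a real SQLite connection.

```ruby
# Hypothetical stand-in for ActiveRecord::ConnectionAdapters::SQLite3Adapter:
# it configures its connection when constructed and records each step.
class FakeAdapter
  attr_reader :log

  def initialize
    @log = []
    configure_connection
  end

  def configure_connection
    @log << :adapter_configured
  end
end

# Same shape as Neighbor::SQLite::InstanceMethods; log entries stand in
# for the enable_load_extension / SqliteVec.load calls.
module LoadVecExtension
  def configure_connection
    super # FakeAdapter#configure_connection runs first
    @log << :enable_load_extension
    @log << :load_sqlite_vec
    @log << :disable_load_extension
  end
end

FakeAdapter.prepend(LoadVecExtension)

adapter = FakeAdapter.new
p adapter.log
# => [:adapter_configured, :enable_load_extension, :load_sqlite_vec, :disable_load_extension]
p FakeAdapter.ancestors.first # => LoadVecExtension
```

Compared with reopening the adapter class and redefining the method, prepending keeps the original method reachable through `super`, so the adapter's own connection setup still runs.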
29 changes: 29 additions & 0 deletions lib/neighbor/type/sqlite_vector.rb
@@ -0,0 +1,29 @@
module Neighbor
  module Type
    class SqliteVector < ActiveRecord::Type::Binary
      def serialize(value)
        if Utils.array?(value)
          value = value.to_a.pack("f*")
        end
        super(value)
      end

      def deserialize(value)
        value = super
        cast_value(value) unless value.nil?
      end

      private

      def cast_value(value)
        if value.is_a?(String)
          value.unpack("f*")
        elsif Utils.array?(value)
          value.to_a
        else
          raise "can't cast #{value.class.name} to vector"
        end
      end
    end
  end
end
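`Type::SqliteVector` stores a vector as raw single-precision floats: `serialize` packs an array with `pack("f*")`, and `cast_value` turns a blob back into an array with `unpack("f*")`. The snippet below walks through that round trip with plain arrays. Each dimension takes 4 bytes, and values are rounded to 32-bit floats, so a Ruby `Float` such as `0.1` does not come back bit-for-bit identical. The `f` directive uses the platform's native byte order (`e` would force little-endian).

```ruby
embedding = [1.0, 0.5, -2.25]

# Type::SqliteVector#serialize: array -> float32 blob
blob = embedding.pack("f*")
p blob.bytesize # => 12 (3 dimensions x 4 bytes)

# Type::SqliteVector#cast_value: blob -> array
restored = blob.unpack("f*")
p restored == embedding # => true (these values are exact in float32)

# Values not representable in float32 are rounded on the way in
approx = [0.1].pack("f*").unpack1("f")
p approx == 0.1 # => false
error = (approx - 0.1).abs
p error < 1e-7 # => true
```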