Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Exact search on a post title #428

Draft
wants to merge 3 commits into
base: develop
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion app/controllers/search_controller.rb
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ def search
.paginate(page: params[:page], per_page: 25)

if search_data[:search].present?
posts.search(search_data[:search]).user_sort({ term: params[:sort], default: :search_score },
posts.search(search_data[:search]).user_sort({ term: params[:sort] },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do with updating to re-add a default for the new weighted search, and to change the relevance sort to the same weighted search. Not sure whether our implementation of user_sort supports that via Arel.sql, though, so that's probably not one for this PR - needs investigation.

relevance: :search_score, score: :score, age: :created_at)
else
posts.user_sort({ term: params[:sort], default: :score },
Expand Down
15 changes: 11 additions & 4 deletions app/models/application_record.rb
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,11 @@ def self.fuzzy_search(term, **cols)

def self.match_search(term, **cols)
sanitized = sanitize_for_search term, **cols
select(Arel.sql("`#{table_name}`.*, #{sanitized} AS search_score")).where(sanitized)

mappedCols = sanitized.map { |val| "#{val} AS search_score_#{sanitized.find_index(val)}" }.join(', ')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ActiveSupport has a neat extension you can use to simplify this:

mapped_cols = sanitized.map.with_index { |val, idx| "#{val} AS search_score_#{idx}" }.join(', ')

whereClause = sanitized.map { |val| val.to_s }.join(' OR')
Comment on lines +12 to +13
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Ruby variables are named in snake_case, i.e. mapped_cols and where_clause


select(Arel.sql("`#{table_name}`.*, #{mappedCols}")).where(whereClause).order('search_score_0 * 2 + search_score_1 * 1 DESC')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap raw SQL statements in Arel.sql when you pass them to ActiveRecord:

.order(Arel.sql('search_score_0 * 2 + search_score_1 * 1 DESC'))

Copy link

@manassehkatz manassehkatz Feb 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "* 1" do? Seems unnecessary - i.e., is that any different from just "search_score_0 * 2 + search_score_1 DESC"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not - I just copied it out without really thinking about it :)

Copy link
Contributor Author

@BoxedFruits BoxedFruits Feb 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, does nothing really. Was just experimenting on how we would determine search weights

end

def self.sanitize_name(name)
Expand All @@ -26,13 +30,16 @@ def attributes_print
def self.sanitize_for_search(term, **cols)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original behaviour of this method was to take a search parameter and list of columns, and turn them into a search against a single fulltext index, i.e.

sanitize_for_search 'query', posts: :body
# => "MATCH (`posts`.`body`) AGAINST ('query' IN BOOLEAN MODE)"

Do I understand right that what you're trying to do is make this return multiple MATCH ... AGAINST clauses for multiple fulltext indexes? At the moment, I don't think this does what you want it to. If you pass in an array of columns, you get the array of clauses:

sanitize_for_search 'query', posts: [:body, :title]
# => [" MATCH `posts`.`body` AGAINST ('query' IN BOOLEAN MODE)", " MATCH `posts`.`title` AGAINST ('query' IN BOOLEAN MODE)"]

If you only pass a single column, you just get the column's name, no MATCH clause at all:

sanitize_for_search 'query', posts: :body
# => "`posts`.`body`"

What's probably an easier way to do it (if I understand what you want correctly) is:

  • keep the purpose of the cols = cols.map block the same, i.e. map unsanitized column names to sanitized column names; then
  • return a map of that, which turns it into the MATCH clauses.

Something like this:

cols = cols.map do |table_name, column_or_array|
  if column_or_array.is_a?(Array)
    column_or_array.map { |column_name| "#{sanitize_name table_name}.#{sanitize_name column_name}" }
  else
    "#{sanitize_name table_name}.#{sanitize_name column_or_array}"
  end
end.flatten

cols.map do |sanitized_name|
  ActiveRecord::Base.sanitize_sql(["MATCH (#{sanitized_name}) AGAINST (? IN BOOLEAN MODE)", term])
end

Copy link
Contributor Author

@BoxedFruits BoxedFruits Feb 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand right that what you're trying to do is make this return multiple MATCH ... AGAINST clauses for multiple fulltext indexes?

Yup exactly what I was trying to do. Ran your suggestion and it worked like a charm!

Would there be a case where we pass in a single column? The methods are first being called here https://github.com/codidact/qpixel/blob/develop/app/models/post.rb#L59 and it looks like it will only ever look at the columns that we tell it. If there is, then the way I am trying to do the search weight would break (not that it was a good way to do it anyways 😛) since it will try to look for a column that doesn't exist.

cols = cols.map do |k, v|
if v.is_a?(Array)
v.map { |vv| "#{sanitize_name k}.#{sanitize_name vv}" }.join(', ')
v.map do |vv|
#prob can do this just once at end of map
ActiveRecord::Base.sanitize_sql([" MATCH #{sanitize_name k}.#{sanitize_name vv} AGAINST (? IN BOOLEAN MODE)", term])
end
else
"#{sanitize_name k}.#{sanitize_name v}"
end
end.join(', ')
end

ActiveRecord::Base.send(:sanitize_sql_array, ["MATCH (#{cols}) AGAINST (? IN BOOLEAN MODE)", term])
cols[0]
end

def self.sanitize_sql_in(ary)
Expand Down
2 changes: 1 addition & 1 deletion app/models/post.rb
Original file line number Diff line number Diff line change
Expand Up @@ -56,7 +56,7 @@ class Post < ApplicationRecord
after_save :recalc_score

def self.search(term)
match_search term, posts: :body_markdown
match_search term, posts: [:title, :body_markdown]
end

# Double-define: initial definitions are less efficient, so if we have a record of the post type we'll
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
class AddFulltextIndexToPostsTitle < ActiveRecord::Migration[5.2]
def change
add_index :posts, :title, type: :fulltext
end
end
3 changes: 2 additions & 1 deletion db/schema.rb
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
#
# It's strongly recommended that you check this file into your version control system.

ActiveRecord::Schema.define(version: 2021_01_12_225651) do
ActiveRecord::Schema.define(version: 2021_01_23_211307) do

create_table "abilities", options: "ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci", force: :cascade do |t|
t.bigint "community_id"
Expand Down Expand Up @@ -376,6 +376,7 @@
t.index ["post_type_id"], name: "index_posts_on_post_type_id"
t.index ["score"], name: "index_posts_on_score"
t.index ["tags_cache"], name: "index_posts_on_tags_cache"
t.index ["title"], name: "index_posts_on_title", type: :fulltext
t.index ["upvote_count"], name: "index_posts_on_upvote_count"
t.index ["user_id"], name: "index_posts_on_user_id"
end
Expand Down