Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Exact search on a post title #428

Draft
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

BoxedFruits
Copy link
Contributor

Working on addressing #387.

Previously the search function was only using body_markdown. I changed it so that it also uses the title for the search. Because of this, the sql statement got a little more complex. I am trying to have the title's search score weigh more than body_markdown and was having a bit of trouble figuring out how to do that without hard coding it. Wondering if there is any way I could improve this before opening up the PR for review.

Copy link
Member

@ArtOfCode- ArtOfCode- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a solid start! I'm not sure it's doing exactly what you want it to right now - some comments below that will hopefully help.

@@ -26,13 +30,16 @@ def attributes_print
def self.sanitize_for_search(term, **cols)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original behaviour of this method was to take a search parameter and list of columns, and turn them into a search against a single fulltext index, i.e.

sanitize_for_search 'query', posts: :body
# => "MATCH (`posts`.`body`) AGAINST ('query' IN BOOLEAN MODE)"

Do I understand right that what you're trying to do is make this return multiple MATCH ... AGAINST clauses for multiple fulltext indexes? At the moment, I don't think this does what you want it to. If you pass in an array of columns, you get the array of clauses:

sanitize_for_search 'query', posts: [:body, :title]
# => [" MATCH `posts`.`body` AGAINST ('query' IN BOOLEAN MODE)", " MATCH `posts`.`title` AGAINST ('query' IN BOOLEAN MODE)"]

If you only pass a single column, you just get the column's name, no MATCH clause at all:

sanitize_for_search 'query', posts: :body
# => "`posts`.`body`"

What's probably an easier way to do it (if I understand what you want correctly) is:

  • keep the purpose of the cols = cols.map block the same, i.e. map unsanitized column names to sanitized column names; then
  • return a map of that, which turns it into the MATCH clauses.

Something like this:

cols = cols.map do |table_name, column_or_array|
  if column_or_array.is_a?(Array)
    column_or_array.map { |column_name| "#{sanitize_name table_name}.#{sanitize_name column_name}" }
  else
    "#{sanitize_name table_name}.#{sanitize_name column_or_array}"
  end
end.flatten

cols.map do |sanitized_name|
  ActiveRecord::Base.sanitize_sql(["MATCH (#{sanitized_name}) AGAINST (? IN BOOLEAN MODE)", term])
end

Copy link
Contributor Author

@BoxedFruits BoxedFruits Feb 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand right that what you're trying to do is make this return multiple MATCH ... AGAINST clauses for multiple fulltext indexes?

Yup exactly what I was trying to do. Ran your suggestion and it worked like a charm!

Would there be a case where we pass in a single column? The methods are first being called here https://github.com/codidact/qpixel/blob/develop/app/models/post.rb#L59 and it looks like it will only ever look at the columns that we tell it. If there is, then the way I am trying to do the search weight would break (not that it was a good way to do it anyways 😛) since it will try to look for a column that doesn't exist.

Comment on lines +12 to +13
mappedCols = sanitized.map { |val| "#{val} AS search_score_#{sanitized.find_index(val)}" }.join(', ')
whereClause = sanitized.map { |val| val.to_s }.join(' OR')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Ruby variables are named in snake_case, i.e. mapped_cols and where_clause

@@ -8,7 +8,11 @@ def self.fuzzy_search(term, **cols)

def self.match_search(term, **cols)
sanitized = sanitize_for_search term, **cols
select(Arel.sql("`#{table_name}`.*, #{sanitized} AS search_score")).where(sanitized)

mappedCols = sanitized.map { |val| "#{val} AS search_score_#{sanitized.find_index(val)}" }.join(', ')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ActiveSupport has a neat extension you can use to simplify this:

mapped_cols = sanitized.map.with_index { |val, idx| "#{val} AS search_score_#{idx}" }.join(', ')

mappedCols = sanitized.map { |val| "#{val} AS search_score_#{sanitized.find_index(val)}" }.join(', ')
whereClause = sanitized.map { |val| val.to_s }.join(' OR')

select(Arel.sql("`#{table_name}`.*, #{mappedCols}")).where(whereClause).order('search_score_0 * 2 + search_score_1 * 1 DESC')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap raw SQL statements in Arel.sql when you pass them to ActiveRecord:

.order(Arel.sql('search_score_0 * 2 + search_score_1 * 1 DESC'))

Copy link

@manassehkatz manassehkatz Feb 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "* 1" do? Seems unnecessary - i.e., is that any different from just "search_score_0 * 2 + search_score_1 DESC"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not - I just copied it out without really thinking about it :)

Copy link
Contributor Author

@BoxedFruits BoxedFruits Feb 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, does nothing really. Was just experimenting on how we would determine search weights

@@ -8,7 +8,7 @@ def search
.paginate(page: params[:page], per_page: 25)

if search_data[:search].present?
posts.search(search_data[:search]).user_sort({ term: params[:sort], default: :search_score },
posts.search(search_data[:search]).user_sort({ term: params[:sort] },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do with updating to re-add a default for the new weighted search, and to change the relevance sort to the same weighted search. Not sure whether our implementation of user_sort supports that via Arel.sql, though, so that's probably not one for this PR - needs investigation.

@cellio
Copy link
Member

cellio commented Sep 19, 2022

It looks like we lost track of this. Does anybody have any updates? There is a more recent effort looking into moving to elasticsearch, which interacts with searching titles, but it's also marked as a draft.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants