OR body search fails for last token adjacent to </a> and/or </p> #878

Open
tripleee opened this issue Sep 13, 2021 · 6 comments
Labels
area: search Post search on metasmoke

Comments

@tripleee
Member

I wanted to search for this word Ollie just watched:

"OR" search for "sexyloveomegle" in title, body, or username

Zero hits! But if I only search the body, I can find it:

Search for "sexyloveomegle" in body

(The hit is in post 324969.)

Looking for another example, I discovered the same to be true for this one:

"OR" search for "Mukteshwar" in title, body, or username

Again, if I only search for body hits, it's there:

Search for "Mukteshwar" in body

(The hit is post 325248.)

What these two seem to have in common is that the search word is the very last word in the body, and the markup includes adjacent closing HTML tags.

kik: sexyloveomegle</p>
...
<a href="https://spam.example.com/elided" rel="nofollow noreferrer">Camp in Mukteshwar</a></p>

By comparison, where the terminating close tags are preceded by whitespace, the search works. For example, "OR" search for "Burton" in title, body, or username finds post 324551, which has

...
<p> Harold Burton </p>

as the last line of the post.
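
To make the observed pattern concrete, here is a minimal sketch of one way such a failure could arise. It is purely illustrative: the function names are hypothetical, and this is not metasmoke's search code, whose actual cause has not been identified in this report. It assumes a matcher that requires whitespace (or the string boundary) on both sides of the term, and compares it with one that only forbids adjacent word characters.

```python
import re


def whitespace_delimited_match(term: str, body: str) -> bool:
    """Hypothetical matcher: the term must be surrounded by whitespace
    or sit at the very start/end of the body."""
    pattern = r"(?:^|\s)" + re.escape(term) + r"(?:\s|$)"
    return re.search(pattern, body) is not None


def word_boundary_match(term: str, body: str) -> bool:
    """Hypothetical alternative: the term must not touch another word
    character, so an adjacent '<' or '>' does not block the match."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, body) is not None


# Bodies taken from the examples in this report.
tight_p = "kik: sexyloveomegle</p>"
tight_a = ('<a href="https://spam.example.com/elided" '
           'rel="nofollow noreferrer">Camp in Mukteshwar</a></p>')
spaced = "<p> Harold Burton </p>"

for term, body in [("sexyloveomegle", tight_p),
                   ("Mukteshwar", tight_a),
                   ("Burton", spaced)]:
    print(term,
          whitespace_delimited_match(term, body),
          word_boundary_match(term, body))
```

Under this assumption, only the "Burton" body matches with the whitespace-delimited variant, which mirrors the three cases above; whether the real "OR" search behaves this way would need to be confirmed in the metasmoke source.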

(Tangentially, https://metasmoke.erwaysoftware.com/search?utf8=%E2%9C%93&title=sexyloveomegle&body=sexyloveomegle%3C%2Fp%3E&username=sexyloveomegle&or_search=1 gets me a traceback from metasmoke.)

@makyen
Contributor

makyen commented Sep 13, 2021

I'd note that using a regex search, which more accurately reflects what a watch or keyword blacklist would search for, works fine: sexyloveomegle and Mukteshwar. While that doesn't invalidate this as an issue, it does provide a workaround for most cases. Using a regex without the bookending done by the watchlist/keyword blacklist also works: sexyloveomegle and Mukteshwar.

@tripleee
Member Author

No doubt; but this requires the searcher to be a registered metasmoke user.

Not being able to share links to searches with users who don't have an account is a major blocker for many situations where I would otherwise much prefer to use regex search.

@makyen
Contributor

makyen commented Sep 13, 2021

Yes. It would be nice to be able to save a search in a way that cached the result and made it available through a short link which could be viewed by non-core users (or some other methodology of reasonably sharing a regex-based search with users who do not have the Core role). With the result cached, reusing the link within a reasonable period wouldn't consume significant resources, and that period would be automatically renewed each time the search was used. There have been multiple times when I would have used such an ability in flags, or even just when posting in chat.

@Undo1
Member

Undo1 commented Sep 13, 2021

What if MS kept a list of 'okay' regex queries run by core users, maybe meaning those that ran within a time limit? Then when a non-core user submits a regex query, if it's on that list it's fine.
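
A rough sketch of that idea, not metasmoke code: the class, method names, and the injectable `clock` parameter are all hypothetical, and the `search` callable stands in for whatever actually runs the regex against the database.

```python
import time
from typing import Callable, Dict, List


class ApprovedRegexQueries:
    """Hypothetical allowlist: a pattern is approved once a core user's
    run of it finished within max_seconds."""

    def __init__(self, max_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_seconds = max_seconds
        self.clock = clock
        self._approved: Dict[str, float] = {}  # pattern -> measured runtime

    def run_as_core(self, pattern: str,
                    search: Callable[[str], List[int]]) -> List[int]:
        """Run the search for a core user and record it if it was fast enough."""
        start = self.clock()
        results = search(pattern)
        elapsed = self.clock() - start
        if elapsed <= self.max_seconds:
            self._approved[pattern] = elapsed
        return results

    def allowed_for_non_core(self, pattern: str) -> bool:
        """A non-core user may only submit patterns already on the list."""
        return pattern in self._approved
```

A non-core request would call `allowed_for_non_core(pattern)` first and be refused if it returns `False`.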

@makyen
Contributor

makyen commented Sep 13, 2021

That would be reasonable; particularly with a time limit and limited to either searches which took < N seconds or for which the results are still in cache. If there is a time limit, it would probably be helpful if it was at least a couple/few days, in order to allow time for moderators to handle a flag which contains such a link.

IIRC, the primary reason we don't permit non-Core access to regex search is that regex searches can result in very substantial consumption of compute/database resources.

Maybe MS could keep a cache of regex search results, and anything which hit the cache could be served to anyone. There could be a link/button, similar to what's done with Blazer/the SQL Data Explorer, which allows a Core user to refresh the search results at any time. That would also save user time and compute when we're sharing searches in chat.
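
As a sketch of how that could behave (hypothetical names throughout; this is not the implementation from any metasmoke PR): cached results are served to anyone while they are fresh, only Core users can populate or refresh an entry, and the time limit is a constructor parameter so it can be set to a few days.

```python
import time
from typing import Callable, Dict, List, Tuple


class RegexSearchCache:
    """Hypothetical cache of regex search results with a time limit."""

    def __init__(self, ttl_seconds: float,
                 search: Callable[[str], List[int]],
                 clock: Callable[[], float] = time.time):
        self.ttl_seconds = ttl_seconds
        self.search = search  # stand-in for the expensive regex query
        self.clock = clock
        self._entries: Dict[str, Tuple[float, List[int]]] = {}

    def get(self, pattern: str, is_core: bool,
            refresh: bool = False) -> List[int]:
        now = self.clock()
        entry = self._entries.get(pattern)
        fresh = entry is not None and now - entry[0] <= self.ttl_seconds
        # Cache hit: serve to anyone, unless a Core user asked for a refresh.
        if fresh and not (refresh and is_core):
            return entry[1]
        # Cache miss or expired entry: only Core users may run the query.
        if not is_core:
            raise PermissionError("regex search is not cached; Core role required")
        results = self.search(pattern)
        self._entries[pattern] = (now, results)
        return results
```

A shared link would then resolve to `cache.get(pattern, is_core=False)`, which succeeds only while a Core user's earlier run is still within the time limit.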

@thesecretmaster
Member

I wrote this cache feature for search in #797. It probably could use a bit more polish, but it works pretty well IIRC.

@thesecretmaster thesecretmaster added the area: search Post search on metasmoke label Dec 23, 2021