Display of Accession in search results - duplicates? #30
Comments
@vanaukenk commented on Feb 29, 2016 Duplicate papers are still being returned with searches; one of the entries has all of the relevant IDs, the other only the PMID. The search scores are different for each entry. |
@goldturtle commented on Feb 29, 2016 This smells like the paper is in the PMCOA corpus as well as the C. M. |
@vanaukenk commented on Feb 29, 2016 Yes, that makes sense. When there are duplicates, though, which one should be returned? Note also that the PMID only papers display formatting when you click on the arrow to see the sentences: |
@vanaukenk commented on Feb 29, 2016 For testing purposes, this is the search that I performed to get these results: |
@vanaukenk commented on Jul 1, 2016 What is the current status of this issue wrt the C. elegans corpus? Screenshots of searches of the C. elegans corpus still display duplicate papers. Did we decide that we would go with the PMCOA version if it existed, and if not, then use the PDF version of the paper? |
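The rule asked about above (prefer the PMCOA version when it exists, otherwise fall back to the PDF version) could be sketched roughly as follows. This is only an illustration, not Textpresso's actual code: the `Hit` record, its field names, the corpus labels, and the choice of PMID as the duplicate key are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Hit:
    """Hypothetical search hit; field names are assumptions for this sketch."""
    pmid: str
    wbpaper_id: Optional[str]
    corpus: str  # assumed labels, e.g. "PMCOA" or "C. elegans"
    score: float


def dedupe_by_pmid(hits: List[Hit], preferred_corpus: str = "PMCOA") -> List[Hit]:
    """Keep one hit per PMID, preferring the hit from preferred_corpus."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        kept = best.get(hit.pmid)
        if kept is None:
            best[hit.pmid] = hit
            continue
        # Switch only when the new hit is from the preferred corpus
        # and the hit kept so far is not.
        prefer_new = hit.corpus == preferred_corpus and kept.corpus != preferred_corpus
        winner, loser = (hit, kept) if prefer_new else (kept, hit)
        # Carry over a WBPaper ID from the discarded entry if the winner lacks one,
        # so the surviving entry shows all known IDs.
        best[hit.pmid] = replace(winner, wbpaper_id=winner.wbpaper_id or loser.wbpaper_id)
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


# Example input modelled on the duplicate pair reported in this issue;
# which entry comes from which corpus is assumed here.
hits = [
    Hit("15653635", "WBPaper00024699", "C. elegans", 0.611),
    Hit("15653635", None, "PMCOA", 0.592),
]
deduped = dedupe_by_pmid(hits)
```

With this input, a single entry survives: the PMCOA hit, now carrying the WBPaper ID taken from the discarded duplicate.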
I was doing some searches of the C. elegans Textpresso site and it looks like the duplicate paper problem is becoming even more pervasive: I don't see this on the main Textpresso site, although the search results are very different there from the C. elegans site, as expected. Am I searching the correct site? |
The site has changed a bit, in the sense that it now includes three
literatures: C. elegans, C. elegans and Suppl, and C. elegans
Supplementals. If you search more than one literature, you could get
multiple entries. Also, if you search C. elegans Supplementals, you will
get multiple entries if your query finds matches in multiple Supplementals.
Michael.
…On 4/15/21 7:23 AM, vanaukenk wrote:
@goldturtle <https://github.com/goldturtle> @valearna
<https://github.com/valearna>
I was doing some searches of the C. elegans Textpresso site and it
looks like the duplicate paper problem is becoming even more pervasive:
image
<https://user-images.githubusercontent.com/1730534/114883468-f25ff500-9dd2-11eb-921f-05c26e5660c6.png>
I don't see this on the main Textpresso site, although the search
results are very different there from the C. elegans site, as
expected. Am I searching the correct site?
|
Okay, that makes more sense now. Thanks for pointing that out. Should the results for checking 'C. elegans' AND 'C. elegans supplementals' be the same, then, as just selecting 'C. elegans and Supplementals'? I didn't see that in the search that I'm doing, but maybe there is another reason for that? Perhaps the default literature setting should be to check just 'C. elegans and supplementals', and then users could narrow that to either category if they want to. We could see what people think on the Textpresso call. |
On 4/15/21 12:05 PM, vanaukenk wrote:
Should the results for checking 'C. elegans' AND 'C. elegans
supplementals' be the same, then, as just selecting 'C. elegans and
Supplementals'? I didn't see that in the search that I'm doing, but
maybe there is another reason for that?
In principle, yes. However, "C. elegans and Supplementals" is a
completely new document (it's a merged pdf), and the scoring might be
different, as the length of a document factors into the score.
Michael.
|
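To illustrate why a merged PDF could score differently for the same matches, here is a toy length-normalized score. The formula (match count divided by the square root of the document length) is an assumption chosen for illustration only; the thread says only that document length factors into the score, not how.

```python
import math


def length_normalized_score(match_count: int, doc_length: int) -> float:
    """Toy score: matches divided by sqrt of document length (assumed formula)."""
    if doc_length <= 0:
        raise ValueError("doc_length must be positive")
    return match_count / math.sqrt(doc_length)


# Same number of matches, but the merged paper-plus-supplement PDF is longer.
# The lengths below are made-up values for the example.
paper_only = length_normalized_score(match_count=4, doc_length=6400)
merged_pdf = length_normalized_score(match_count=4, doc_length=10000)
```

Under this assumed formula the longer merged document scores lower for the same matches, so the two selections need not rank a paper identically.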
Okay, got it. Thanks. |
The default literature is now C. elegans for people who are not logged in. |
@vanaukenk commented on Aug 17, 2015
What determines which paper accession is displayed in the search results?
A keyword search on 'mut-7' lists both WBPaper ID and PMIDs, even though the PMIDs have corresponding WBPaper IDs.
Sometimes the same sentence appears listed under each accession separately, but the sentence actually has a different score depending on the accession.
As an example, a search with 'MUT-7' and the 'mf enz activ assay' and 'mf enz activ verbs' categories lists, as the third and fourth entries, the same sentence with scores of 0.611 and 0.592, respectively, for WBPaper00024699 and PMID 15653635.
Thx.
--Kimberly