I came across a bug where MER fails to identify an entity when the entity is directly adjacent to a punctuation mark.
For example, Calcimycin is identified in the sentences "I like Calcimycin" and "I like Calcimycin it is a good aurelia aurita and Temefos is awesome! abate lowercase", but not in "I like Calcimycin, it is a good aurelia aurita and Temefos is awesome! abate lowercase", which has a comma right after the word Calcimycin.
The code that I used to create the lexicon (mesh_lex) is the following:
import merpy

# Load the MeSH name -> ID mapping (each line is split on '=')
with open('MeSH_name_id_mapping.txt', encoding='utf-8') as finput_terms:
    l_terms = finput_terms.readlines()

dict_terms = {}
for i in l_terms:
    aux = i.split('=')
    dict_terms[aux[0].strip()] = aux[1].replace('\n', '')

# Load the synonyms (each line is split on a tab: term, then comma-separated synonyms)
with open('mesh_terms_synonyms.txt', encoding='utf-8') as finput_terms:
    l_terms_syn = finput_terms.readlines()

dict_terms_synonyms = {}
for i in l_terms_syn:
    aux = i.split('\t')
    dict_terms_synonyms[aux[0]] = aux[1].replace('\n', '')

# Map every term and each of its synonyms to the ID of the main term
conv_dict = {}
for key, values in dict_terms_synonyms.items():
    l_synonyms = values.split(',')
    if key not in l_synonyms:
        l_synonyms.append(key)
    for i in l_synonyms:
        conv_dict[i.strip()] = dict_terms.get(key)

# Create and process the lexicon
merpy.create_lexicon(conv_dict.keys(), "mesh_lex")
merpy.create_mappings(conv_dict, "mesh_lex")
merpy.show_lexicons()
merpy.process_lexicon("mesh_lex")

# Examples
print(merpy.get_entities("I like abdominal injuries", "mesh_lex"))
print(merpy.get_entities("I like Calcimycin", "mesh_lex"))
print(merpy.get_entities("I like Calcimycin, it is a good aurelia aurita and Temefos is awesome! abate lowercase", "mesh_lex"))
print(merpy.get_entities("I like Calcimycin it is a good aurelia aurita and Temefos is awesome! abate lowercase", "mesh_lex"))
Here are the files with the entities:
MeSH_name_id_mapping.txt
mesh_terms_synonyms.txt
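In case it helps anyone hitting the same problem, here is a rough sketch of a possible workaround: replace each punctuation character with a single space before passing the text to MER. This is only an assumption on my side and I have not verified that it fixes the matching; `mask_punctuation` is a hypothetical helper, not part of merpy. Because each character is replaced one-for-one, the text length stays the same, so any character offsets reported for the masked text should still line up with the original sentence.

```python
import string


def mask_punctuation(text: str) -> str:
    """Replace every ASCII punctuation character with one space.

    Hypothetical helper (not part of merpy). The replacement is
    one-for-one, so the string length and all offsets are preserved.
    """
    table = str.maketrans({ch: " " for ch in string.punctuation})
    return text.translate(table)


sentence = ("I like Calcimycin, it is a good aurelia aurita "
            "and Temefos is awesome! abate lowercase")
masked = mask_punctuation(sentence)
print(masked)

# Untested assumption: the masked text could then be annotated with
# merpy.get_entities(masked, "mesh_lex")
```

One caveat: this also blanks out punctuation inside terms (hyphens, commas, parentheses), so lexicon entries that contain such characters would no longer match.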