uspt gender labels and experiments #87
Comments
Hi @Hamedloghmani, @edwinpaul121 and I started working on the gender mappings for uspt, and we were able to generate the gender.csv file.

    import pickle as pkl

    import pandas as pd

    mappings = {}
    with open("data/preprocessed/uspt/patent.tsv.filtered.mt75.ts3/teams.pkl", "rb") as f:
        with open("data/preprocessed/uspt/patent.tsv.filtered.mt75.ts3/indexes.pkl", "rb") as f_2:
            teams_pkl = pkl.load(f)
            indexes_pkl = pkl.load(f_2)
    # print(teams_pkl[0])

    # c2i is looked up with the key "<id>_<name>" and returns that member's index
    c2i = indexes_pkl['c2i']
    for patent in teams_pkl:
        for member in patent.members:
            ind = c2i[member.id + "_" + member.name]
            # keep the first gender value seen for each index
            if ind not in mappings:
                mappings[ind] = member.gender

    df = pd.DataFrame.from_dict(mappings, orient="index", columns=["gender"])
    df.to_csv("data/preprocessed/uspt/patent.tsv.filtered.mt75.ts3/gender.csv")

However, we have a few concerns about our code:
Thank you so much, @gabrielrueda and @edwinpaul121.
Hi @Hamedloghmani, I just wanted to let you know that I checked some of the gender values against those in the inventor.tsv file of the USPT dataset and can confirm that they are valid. I'll also upload the resulting gender.csv file to the Adila teams channel -> USPT Labelling Files.
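For anyone who wants to repeat this kind of spot check, here is a minimal sketch of one way it could be done with pandas. It is not the exact procedure used above. It assumes an index-to-key lookup (called `i2c` here, the inverse of `c2i`), that the inventor id contains no underscore so it can be split off the `<id>_<name>` key, and that the inventor table has `id` and `gender` columns. Those column names, the helper `spot_check`, and the toy data are all hypothetical.

```python
import pandas as pd


def spot_check(gender_df, inventor_df, i2c, id_col="id", gender_col="gender"):
    """Return rows whose generated label differs from (or is missing in) inventor_df."""
    # Assumption: keys look like "<id>_<name>" and the id itself has no underscore.
    ids = [i2c[i].split("_", 1)[0] for i in gender_df.index]
    generated = pd.DataFrame({id_col: ids, "generated": gender_df["gender"].to_numpy()})
    # Left merge keeps every generated label, even when the id is absent from inventor_df.
    merged = generated.merge(inventor_df[[id_col, gender_col]], on=id_col, how="left")
    return merged[merged["generated"] != merged[gender_col]]


# Toy stand-ins for gender.csv, the index lookup, and inventor.tsv (hypothetical values).
gender_df = pd.DataFrame({"gender": ["M", "F", "M"]}, index=[0, 1, 2])
i2c = {0: "a1_Ann Lee", 1: "b2_Bo Chen", 2: "c3_Cy Diaz"}
inventor_df = pd.DataFrame({"id": ["a1", "b2", "c3"], "gender": ["M", "F", "F"]})

mismatches = spot_check(gender_df, inventor_df, i2c)
print(mismatches)
```

An empty result would mean every generated label agrees with the inventor table. In the toy data the third inventor is deliberately mislabelled so that one row is reported.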
Hi @gabrielrueda. Thanks a lot for the update and confirmation.
Hi @edwinpaul121 and @gabrielrueda,
Please log the process for extracting gender labels for the uspt dataset on this issue page, and let me know if you have any questions.
Thank you.