Figure out how to get the database to properly recognise author papers #5

Open
edoyango opened this issue Jun 20, 2024 · 3 comments

Currently, the database doesn't seem to retrieve papers based on their author(s), e.g.

python query_data.py "Do you know of any papers authored by Edward Yang?"
Response:  Yes, based on the provided context, it is known that Edward Yang has 
co-authored a research paper titled "Numerical investigation of the mechanism of granular 
flow impact on rigid control structures". The year of submission and acceptance are 
provided as well.
Sources: ['data/A_Review_on_Ocular_Biomechanic_Models_for_Assessing_Visual_Fatigue_in_Virtual_Reality.pdf:16:1', 
'data/RCP-Projects.aspx.html:None:22', 
'data/A_Review_on_Ocular_Biomechanic_Models_for_Assessing_Visual_Fatigue_in_Virtual_Reality.pdf:15:12', 
'data/1-s2.0-S0266352X20300379-main-1.pdf:20:9', 'data/s11440-021-01162-4.pdf:0:0']

This is partially right, as it's one of the papers included in the dataset. Interestingly, data/1-s2.0-S0266352X20300379-main-1.pdf (my other paper included in the dataset) was ranked as more relevant, but was not mentioned by the LLM - probably because the database returned a chunk from later in the paper.
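To make the "chunk later in the paper" point easier to check, the source IDs can be split into their parts. This is only a sketch: it assumes the IDs follow a `source:page:chunk_index` layout, which is inferred from the output above and not confirmed by the code in the repo. `ChunkId` and `parse_chunk_id` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ChunkId(NamedTuple):
    source: str           # path of the file the chunk came from
    page: Optional[int]   # page number, or None for non-paginated files (e.g. HTML)
    index: int            # position of the chunk within that page/file


def parse_chunk_id(chunk_id: str) -> ChunkId:
    """Split an ID such as 'data/foo.pdf:20:9' into source, page and chunk index."""
    # Split from the right so that any colon inside the path is left alone.
    source, page, index = chunk_id.rsplit(":", 2)
    return ChunkId(source, None if page == "None" else int(page), int(index))


# IDs copied from the "Sources" list above.
for raw in [
    "data/RCP-Projects.aspx.html:None:22",
    "data/1-s2.0-S0266352X20300379-main-1.pdf:20:9",
    "data/s11440-021-01162-4.pdf:0:0",
]:
    print(parse_chunk_id(raw))
```

Under that assumption, the chunk returned from 1-s2.0-S0266352X20300379-main-1.pdf has page value 20, whereas the one from s11440-021-01162-4.pdf is the very first chunk of its file.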

Another example:

python query_data.py "Do you know of any papers authored by Michael Milton?"
Response:  No, there is no information in the provided context that indicates if Michael Milton 
has authored any papers or not. The term "Milton" refers to a high-performance computer 
(HPC) at WEHI, not an individual author.
Sources: ['data/RCP-Projects.aspx.html:None:22', 
'data/Milton-SLURM-2022-uplift.aspx.html:None:0', 
'data/What-is-Milton.aspx.html:None:0', 
'data/RCP-AnnualSummary.aspx.html:None:20', 
'data/RCP-AnnualReport.aspx.html:None:20']

Need to figure out how to get the database to return/recognise author information.


edoyango commented Jun 24, 2024

I tried prepending the article's title and authors to each chunk, but it didn't help at all.
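For reference, the prepending step looks roughly like the sketch below. The function name `add_provenance_header` and its signature are hypothetical (the actual code in the repo may differ); the header wording is copied verbatim from the chunks shown in this comment, including the UNKNOWN fallback.

```python
from typing import Optional, Sequence


def add_provenance_header(
    chunk_text: str,
    title: Optional[str] = None,
    authors: Optional[Sequence[str]] = None,
) -> str:
    """Prepend a one-line title/author header to a chunk before it is embedded."""
    # Fall back to UNKNOWN when a document has not been annotated.
    title_str = title if title else "UNKNOWN"
    authors_str = ", ".join(authors) if authors else "UNKNOWN"
    # Header wording kept exactly as it appears in the stored chunks.
    header = (
        f"This chunk was taken from the article: {title_str}, "
        f"who's authors are: {authors_str}"
    )
    return f"{header}\n\n{chunk_text}"


print(
    add_provenance_header(
        "Contents lists available at ScienceDirect",
        title="Using biomechanics to investigate the effect of VR on eye vergence system",
        authors=["Julie Iskander", "Mohammed Hossny", "Saeid Nahavandi"],
    )
)
```

Note that the header becomes part of the text that gets embedded, so it competes with the rest of the chunk when the similarity search runs.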

Detailed example: a few chunks (all three taken from the same paper authored by Julie Iskander):
This chunk was taken from the article: Using biomechanics to investigate the effect of VR on eye vergence system, who's authors are: Julie Iskander, Mohammed Hossny, Saeid Nahavandi

Contents lists available at ScienceDirect
Applied Ergonomics
journal homepage: www.elsevier.com/locate/apergo
Using biomechanics to investigate the e ffect of VR on eye vergence system
Julie Iskander*, Mohammed Hossny, Saeid Nahavandi
Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Australia
ARTICLE INFO
Keywords:
Virtual realityEye vergence movement
Eye tracking
Biomechanical simulationExtraocular musclesABSTRACT
Vergence-accommodation con flict (VAC) is the main contributor to visual fatigue during immersion in virtual
environments. Many studies have investigated the eff ects of VAC using 3D displays and expensive complex
apparatus and setup to create natural and con flicting viewing conditions. However, a limited number of studies
This chunk was taken from the article: Using biomechanics to investigate the effect of VR on eye vergence system, who's authors are: Julie Iskander, Mohammed Hossny, Saeid Nahavandi

targeted virtual environments simulated using modern consumer-grade VR headsets. Our main objective, in this
work, is to test how the modern VR headsets (VR simulated depth) could a ffect our vergence system, in addition
to investigating the eff ect of the simulated depth on the eye-gaze performance. The virtual scenario used in-
cluded a common virtual object (a cube) in a simple virtual environment with no constraints placed on the headand neck movement of the subjects. We used ocular biomechanics and eye tracking to compare between ver-gence angles in matching (ideal) and con flicting (real) viewing conditions. Real vergence angle during im-
mersion was signi ficantly higher than ideal vergence angle and exhibited higher variability which leads to
This chunk was taken from the article: Using biomechanics to investigate the effect of VR on eye vergence system, who's authors are: Julie Iskander, Mohammed Hossny, Saeid Nahavandi

incorrect depth cues that a ffects depth perception and also leads to visual fatigue for prolonged virtual ex-
periences. Additionally, we found that as the simulated depth increases, the ability of users to manipulate virtual
objects with their eyes decreases, thus, decreasing the possibilities of interaction through eye gaze. The bio-
mechanics model used here can be further extended to study muscular activity of eye muscles during immersion.It presents an efficient and flexible assessment tool for virtual environments.
1. Introduction
Virtual reality (VR) headsets have become more a ffordable and
accessible to a broader population that includes young adults and
children. In a few years, it turned from expensive devices that needed

But when querying:

python query_data.py "Can you give me the title of any papers authored by Julie Iskander please"

The corresponding prompt with the "relevant" chunks:

Human: 
Answer the question based only on the following context:

This chunk was taken from the article: UNKNOWN, who's authors are: UNKNOWN

vol. 36, no. 3, pp. 321–328, 2003.
[184] T. S. Buchanan, D. G. Lloyd, K. Manal, and T. F. Besier, ‘‘Estimation
of muscle forces and joint moments using a forward-inverse dynamics
model,’’ Med. Sci. Sports Exercise , vol. 37, no. 11, pp. 1911–1916, 2005.
JULIE ISKANDER received the B.Sc. and M.Sc.
degrees in electrical engineering from Alexandria
University, Egypt, in 2004 and 2009, respec-
tively. She is currently pursuing the Ph.D. degree
with the Institute for Intelligent Systems Research
and Innovation, Deakin University. She was with
the Information Technology Institute as a Teach-
ing Assistant, then a Software Development
Department Head, and then as a Branch Man-
ager. Her research interests include neuromuscular

---

This chunk was taken from the article: UNKNOWN, who's authors are: UNKNOWN

ager. Her research interests include neuromuscular
modeling and ocular motility biomechanics. In addition, she focuses on
analyzing and differentiating different mental states from eye tracking.
19360 VOLUME 6, 2018

---

This chunk was taken from the article: UNKNOWN, who's authors are: UNKNOWN

Figure 3: Fields of research in which respondents work. Multiple options could beselected. Topics included under ‘Other’ were: proteomics, venoms, mass spec omics data,epigenomics, conservation, metagenome, environment, ecology, molecular nutrition,computational biochemistry, transcriptomics, wastewater treatment, epigenetics,microbiology, food science
7

---

This chunk was taken from the article: UNKNOWN, who's authors are: UNKNOWN

(P.M.L.), phylogenetic analysis (P.M.L., C.G., and M.M.), protein structuralmodeling (R.G., P.M.L., and C.G.), gas chromatography measurements(P.M.L., E.T-M., and L.J.), H
2uptake kinetic characterization (P.M.L.),
archaeal survival assay (P.M.L.), biochemical characterization (J.P.L., P.M.L.,
R.G., and A.K.), shotgun proteomics (P.M.L., H.L., I.H., E.T., and R.B.S.), andecological theory (P.M.L., C.G., M.B.S., C.R.C. and H.A.P.). P.M.L., C.G., andR.G. analyzed data and wrote the manuscript with inputs from all authors.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary information The online version contains
supplementary material available at
https://doi.org/10.1038/s41467-024-47324-2 .

---

This chunk was taken from the article: UNKNOWN, who's authors are: UNKNOWN

Vice-Chancellor (Defence Technologies), the
Chair of Engineering, and the Director of the
Institute for Intelligent Systems Research and
Innovation with Deakin University. He has
authored or co-authored over 600 papers in
various international journals and conferences.
His research interests include the modeling of
complex systems, robotics, and haptics.
He is a fellow of Engineers Australia (FIEAust) and the Institution of
Engineering and Technology.
He is the Co-Editor-in-Chief of IEEE S YSTEMS JOURNAL , an Associate
Editor of IEEE/ASME T RANSACTIONS ON MECHATRONICS , an Associate Editor
of IEEE T RANSACTIONS ON SYSTEMS , M AN AND CYBERNETICS : SYSTEMS , and an
IEEE Access Editorial Board Member.
VOLUME 6, 2018 19361

---

Answer the question based on the above context: Can you give me the title of any papers authored by Julie Iskander please

And Ollama (Mistral) answers:

Response:  Unfortunately, the provided context does not contain information about the titles of papers authored by Julie Iskander.
Sources: ['data/A_Review_on_Ocular_Biomechanic_Models_for_Assessing_Visual_Fatigue_in_Virtual_Reality.pdf:15:12', 'data/A_Review_on_Ocular_Biomechanic_Models_for_Assessing_Visual_Fatigue_in_Virtual_Reality.pdf:15:13', 'data/1_Australian_bioinformatics_training_needs_survey_2021_22_Report.pdf:7:0', 'data/s41467-024-47324-2.pdf:16:4', 'data/A_Review_on_Ocular_Biomechanic_Models_for_Assessing_Visual_Fatigue_in_Virtual_Reality.pdf:16:1']

When querying the database for papers authored by Julie Iskander, the Chroma DB similarity search failed to notice that "Julie Iskander" was in the prepended author list.

Changing the surrounding text didn't help either, e.g. prepending a dict rather than a sentence made no real difference.

Alternatively, I could probably use metadata filtering instead: https://python.langchain.com/v0.2/docs/integrations/vectorstores/chroma/#filtering-on-metadata
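The idea behind metadata filtering, sketched in plain Python rather than with the Chroma API: first keep only chunks whose stored author metadata matches, then rank the survivors by similarity. Everything here is an assumption for illustration: the `Chunk` class, `search_by_author`, the toy 2-D embeddings, and the "Someone Else" author are all made up, and I haven't checked how Chroma would best store a list of authors per chunk.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Chunk:
    text: str
    embedding: Sequence[float]
    authors: List[str] = field(default_factory=list)  # empty if not annotated


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def search_by_author(
    chunks: Sequence[Chunk],
    query_embedding: Sequence[float],
    author: str,
    k: int = 5,
) -> List[Chunk]:
    # Step 1: metadata filter, an exact (case-insensitive) match on the author name.
    wanted = author.casefold()
    candidates = [c for c in chunks if any(a.casefold() == wanted for a in c.authors)]
    # Step 2: rank only the surviving chunks by similarity to the query.
    candidates.sort(key=lambda c: cosine(c.embedding, query_embedding), reverse=True)
    return candidates[:k]


chunks = [
    Chunk("biography paragraph", [1.0, 0.0]),  # most similar, but no author metadata
    Chunk("methods section", [0.0, 1.0],
          ["Julie Iskander", "Mohammed Hossny", "Saeid Nahavandi"]),
    Chunk("unrelated survey", [0.9, 0.1], ["Someone Else"]),
]
hits = search_by_author(chunks, [1.0, 0.0], "julie iskander")
print([c.text for c in hits])
```

The trade-off is that the author name has to be extracted from the question before the search, and only annotated documents can ever match.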


edoyango commented Jun 26, 2024

I wanted to try other models that might perform better with publication documents (e.g. specter2).

This led me to try langchain_huggingface.embeddings.HuggingFaceEmbeddings instead of langchain_community.embeddings.ollama.OllamaEmbeddings, because Ollama has pretty limited compatibility with models (ref), whereas HuggingFaceEmbeddings works with more models. These changes are on a separate branch (https://github.com/WEHI-ResearchComputing/rag/tree/ollama-to-hf).

Looking at the MTEB leaderboard, I tried Alibaba-NLP/gte-large-en-v1.5, which gave me better results.

Prompt:

Human: 
Answer the question based only on the following context:

This chunk was taken from the article: UNKNOWN, who's authors are: UNKNOWN

musculoskeletal modeling and simulation framework for in silico investi-
gations and exchange,’’ Procedia IUTAM , vol. 2, pp. 212–232, Jan. 2011.
[182] D. G. Thelen and F. C. Anderson, ‘‘Using computed muscle control to
generate forward dynamic simulations of human walking from experi-
mental data,’’ J. Biomech. , vol. 39, no. 6, pp. 1107–1115, 2006.
[183] D. G. Thelen, F. C. Anderson, and S. L. Delp, ‘‘Generating dynamic
simulations of movement using computed muscle control,’’ J. Biomech. ,
vol. 36, no. 3, pp. 321–328, 2003.
[184] T. S. Buchanan, D. G. Lloyd, K. Manal, and T. F. Besier, ‘‘Estimation
of muscle forces and joint moments using a forward-inverse dynamics
model,’’ Med. Sci. Sports Exercise , vol. 37, no. 11, pp. 1911–1916, 2005.
JULIE ISKANDER received the B.Sc. and M.Sc.
degrees in electrical engineering from Alexandria
University, Egypt, in 2004 and 2009, respec-

---

This chunk was taken from the article: UNKNOWN, who's authors are: UNKNOWN

University, Egypt, in 2004 and 2009, respec-
tively. She is currently pursuing the Ph.D. degree
with the Institute for Intelligent Systems Research
and Innovation, Deakin University. She was with
the Information Technology Institute as a Teach-
ing Assistant, then a Software Development
Department Head, and then as a Branch Man-
ager. Her research interests include neuromuscular
modeling and ocular motility biomechanics. In addition, she focuses on
analyzing and differentiating different mental states from eye tracking.
19360 VOLUME 6, 2018

---

This chunk was taken from the article: UNKNOWN, who's authors are: UNKNOWN

delivery method for reaching dispersed and distant trainees.PLOS Comput. Biol.17, e1008715 (2021).5.Unsworth, Kathrynet al.DReSA: Project team reflections.(2021)doi:10.5281/ZENODO.5712128.6.Beard, N.et al.TeSS: a platform for discoveringlife-science training opportunities.Bioinformatics36, 3290–3291 (2020).

---

This chunk was taken from the article: UNKNOWN, who's authors are: UNKNOWN

Institute for Intelligent Systems Research and
Innovation with Deakin University. He has
authored or co-authored over 600 papers in
various international journals and conferences.
His research interests include the modeling of
complex systems, robotics, and haptics.
He is a fellow of Engineers Australia (FIEAust) and the Institution of
Engineering and Technology.
He is the Co-Editor-in-Chief of IEEE S YSTEMS JOURNAL , an Associate
Editor of IEEE/ASME T RANSACTIONS ON MECHATRONICS , an Associate Editor
of IEEE T RANSACTIONS ON SYSTEMS , M AN AND CYBERNETICS : SYSTEMS , and an
IEEE Access Editorial Board Member.
VOLUME 6, 2018 19361

---

This chunk was taken from the article: Using biomechanics to investigate the effect of VR on eye vergence system, who's authors are: Julie Iskander, Mohammed Hossny, Saeid Nahavandi

eye and head tracking data.
The paper is organised as follows. Section 2describes the experi-
mental procedure, the participants and the apparatus used. The visualtask and the biomechanical simulation are also described here. Section
3presents the statistical analysis results of the di fferent tests performed.
Andfinally a discussion is presented in Section 4and a conclusion in
Section 5.
2. Methods
In this section, we describe the design of the experiment and the
processing and analysis done on the collected data.
2.1. Participants
Twenty six subjects participated in the study with no physical or
Fig. 2. (a) The VR scene used in the experiment with the eye gaze points overlain as green circles. The green circles were not shown to the user, to avoid distraction.

---

Answer the question based on the above context: can you give me papers authored by Julie Iskander?

Response:

Based on the provided context, the paper(s) authored by Julie Iskander are:

1. Using biomechanics to investigate the effect of VR on eye vergence system (co-authored with Mohammed Hossny and Saeid Nahavandi)

The paper can be found in Volume 6, 2018, but the specific page number is not provided in the context. You may need to search for the title or authors to find the exact location of this paper.
Sources: ['data/A_Review_on_Ocular_Biomechanic_Models_for_Assessing_Visual_Fatigue_in_Virtual_Reality.pdf:15:10', 'data/A_Review_on_Ocular_Biomechanic_Models_for_Assessing_Visual_Fatigue_in_Virtual_Reality.pdf:15:11', 'data/1_Australian_bioinformatics_training_needs_survey_2021_22_Report.pdf:16:1', 'data/A_Review_on_Ocular_Biomechanic_Models_for_Assessing_Visual_Fatigue_in_Virtual_Reality.pdf:16:1', 'data/1-s2.0-S0003687018302904-main.pdf:2:3']

Unlike with mxbai-embed-large-v1, a relevant chunk with the added author information was retrieved. But there were two papers that I annotated, so it was only half right (better than the previous models though).

Salesforce/SFR-Embedding-Mistral didn't do too well, despite being higher ranked and larger than Alibaba-NLP/gte-large-en-v1.5.
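To compare embedding models less by eye, one option is to score each run by how many of the expected source files show up in the returned chunk IDs. This is a sketch only: `source_recall` is a hypothetical helper, the IDs are assumed to follow a `source:page:chunk_index` layout, and `data/second_annotated_paper.pdf` is a placeholder because the second annotated paper isn't named in this thread.

```python
from typing import Iterable, Set


def source_recall(retrieved_ids: Iterable[str], expected_sources: Set[str]) -> float:
    """Fraction of the expected source files that appear among the retrieved chunk IDs."""
    if not expected_sources:
        raise ValueError("expected_sources must not be empty")
    # Strip the trailing ':page:chunk_index' part to recover the file path.
    found = {chunk_id.rsplit(":", 2)[0] for chunk_id in retrieved_ids}
    return len(found & expected_sources) / len(expected_sources)


# Chunk IDs returned with Alibaba-NLP/gte-large-en-v1.5 (copied from the output above).
retrieved = [
    "data/A_Review_on_Ocular_Biomechanic_Models_for_Assessing_Visual_Fatigue_in_Virtual_Reality.pdf:15:10",
    "data/A_Review_on_Ocular_Biomechanic_Models_for_Assessing_Visual_Fatigue_in_Virtual_Reality.pdf:15:11",
    "data/1_Australian_bioinformatics_training_needs_survey_2021_22_Report.pdf:16:1",
    "data/A_Review_on_Ocular_Biomechanic_Models_for_Assessing_Visual_Fatigue_in_Virtual_Reality.pdf:16:1",
    "data/1-s2.0-S0003687018302904-main.pdf:2:3",
]
expected = {
    "data/1-s2.0-S0003687018302904-main.pdf",
    "data/second_annotated_paper.pdf",  # placeholder name, not a real file
}
print(source_recall(retrieved, expected))
```

With the placeholder standing in for the second paper, this reproduces the "half right" result; running the same check per model and per query would give a number to compare.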

edoyango commented

OK, I now understand that to use HuggingFaceEmbeddings, the model has to have a sentence-transformers version available. If it doesn't, the model gets converted to a sentence-transformer on the fly, but that converted model would still need to be trained (i.e., as-is it will produce nonsense).
