- ---
- title: git_config_-global_credential.helper_store
- app_file: scripts/run_db_interface.py
- sdk: gradio
- sdk_version: 4.40.0
- ---
# Mapping the Data Landscape For Generalizable Scientific Models

This is a WIP that builds a knowledge base to store structured information extracted from scientific publications, datasets and articles using LLMs.
@@ -14,25 +8,49 @@ This tool helps us identify the gaps where current foundation models lack covera

We use the Llama-3-70B-Instruct model for structured information extraction.

- <div style="display: flex; justify-content: space-between; gap: 20px;">
- <figure style="margin: 0; width: 48%;">
- <img src="misc/eval_pipeline.png" alt="Fig 1" style="width: 100%; height: 300px; object-fit: contain;">
- <figcaption style="font-size: 0.9em; text-align: center; margin-top: 10px;">
+ <!-- <div style="display: flex; justify-content: space-between; gap: 20px;">
+ <figure style="margin: 10; width: 48%;">
+ <img src="misc/eval_pipeline.png" alt="Fig 1" style="width: 45%; height: 200px; object-fit: contain;">
+ <figcaption style="font-size: 0.5em; text-align: center; margin-top: 10px;">
Prompt optimization pipeline to maximize precision of the model annotated
predictions by running on manually annotated subset of scientific corpora.
The tagged outputs can be generated as JSON or in a readable format, and be
generated using temperature and nucleus sampling (sweep hyperparams).
</figcaption>
</figure>
- <figure style="margin: 0; width: 48%;">
- <img src="misc/pipeline.png" alt="Fig 2" style="width: 100%; height: 300px; object-fit: contain;">
- <figcaption style="font-size: 0.9em; text-align: center; margin-top: 10px;">
+ <figure style="margin: 10; width: 48%;">
+ <img src="misc/pipeline.png" alt="Fig 2" style="width: 45%; height: 200px; object-fit: contain;">
+ <figcaption style="font-size: 0.5em; text-align: center; margin-top: 10px;">
Illustration of the structured prediction pipeline on the full corpus of
scientific papers, which runs optimized prompts and stores the model's
outputs in a SQL db.
</figcaption>
</figure>
- </div>
+ </div> -->
+
+ <table>
+ <tr>
+ <td width="50%" valign="top">
+ <img src="misc/eval_pipeline.png" alt="Fig 1" width="100%">
+ <p align="center">
+ <em>Prompt optimization pipeline to maximize precision of the model annotated
+ predictions by running on manually annotated subset of scientific corpora. The
+ tagged outputs can be generated as JSON or in a readable format, and be
+ generated using temperature and nucleus sampling (sweep hyperparams).</em>
+ </p>
+ </td>
+ <td width="50%" valign="top">
+ <img src="misc/pipeline.png" alt="Fig 2" width="100%">
+ <p align="center">
+ <em>Illustration of the structured prediction pipeline on the full corpus of
+ scientific papers, which runs optimized prompts and stores the model's outputs in
+ a SQL db.</em>
+ </p>
+ </td>
+ </tr>
+ </table>
+
+

## Installation