Commit 3978ce9

update readme and env keys

1 parent add96bc commit 3978ce9

13 files changed, +25 -25 lines changed

.gitignore

+3 -3
@@ -165,7 +165,7 @@ requirements.txt
 wandb/
 slurm_logs/
 notebooks/
-misc/polymathic_data_files
-misc/notes
-misc/test.ipynb
+assets/polymathic_data_files
+assets/notes
+assets/test.ipynb
 Meta-Llama-3-70B-Instruct/

README.md

+20 -18
@@ -21,7 +21,7 @@ We use the Llama-3-70B-Instruct model with 2 A100 80GB GPUs for structured infor
 <table>
 <tr>
 <td width="50%" valign="top">
-<img src="misc/eval_pipeline.png" alt="Fig 1" width="100%">
+<img src="assets/eval_pipeline.png" alt="Fig 1" width="100%">
 <p align="center">
 <em>Fig 1: Prompt optimization pipeline to maximize precision of the model annotated
 predictions by running on manually annotated subset of scientific corpora. The
@@ -30,7 +30,7 @@ We use the Llama-3-70B-Instruct model with 2 A100 80GB GPUs for structured infor
 </p>
 </td>
 <td width="50%" valign="top">
-<img src="misc/pipeline.png" alt="Fig 2" width="100%">
+<img src="assets/pipeline.png" alt="Fig 2" width="100%">
 <p align="center">
 <em>Fig 2: Illustration of the structured prediction pipeline on the full corpus of
 scientific papers, which runs optimized prompts and stores the model's outputs in
@@ -59,10 +59,26 @@ Set up code formatting and pre-commit hooks:
 ```
 pre-commit install
 ```
+## Quickstart
 
-## Running the tool
+### Run an existing DB
 
-### On new data: Download raw data from arXiv
+To run an existing database in the `databases` directory:
+
+```
+sqlite3 databases/<table_name>
+```
+
+### Launch a Gradio interface for SQL query search over the created databases
+```
+gradio scripts/run_db_interface.py
+```
+The interface shows all the created databases in the `data/databases` directory which can be loaded and queried.
+
+
+## Running the tool on new data
+
+### Download raw data from arXiv
 
 Run `scripts/collect_data.py` to download papers for arXiv:
 ```
@@ -150,20 +166,6 @@ Options:
 
 All current databases are in the ```data/databases``` directory which can be downloaded and loaded with ```sqlite3``` to run queries on your own terminal. Refer to the [databases README](data/databases/README.md) for information on the tables that constitute each of the databases.
 
-## Run an existing DB
-
-To run an existing database in the `databases` directory:
-
-```
-sqlite3 databases/<table_name>
-```
-
-## Launch a Gradio interface for SQL query search over the created databases
-```
-gradio scripts/run_db_interface.py
-```
-The interface shows all the created databases in the `data/databases` directory which can be loaded and queried.
-
 
 ## Relevant Resources for Reference
 ### Tools
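
Note on the README's `sqlite3` step: the same inspection can be done from Python. The following is a minimal sketch, assuming only the standard-library `sqlite3` module; `list_tables` is a hypothetical helper for illustration and is not part of this repository.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in the SQLite database at db_path."""
    # Open read-only so a mistyped path raises an error
    # instead of silently creating a new, empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

It would be called with the path to one of the database files, for example `list_tables("data/databases/<db_name>")`, where `<db_name>` is a placeholder. The README text in this commit refers to both `databases/` and `data/databases/`, so the exact directory is left as an assumption here.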

access_keys.json

+1
@@ -0,0 +1 @@
+{"openai_api_key": "", "openai_org_id": "", "hf_token": "", "hf_token_write": ""}
File renamed without changes.
File renamed without changes.

misc/graph.mp4 → assets/graph.mp4

File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.

scripts/create_db.py

-2
@@ -198,8 +198,6 @@ def check_db_exists(db_path):
     "--force", is_flag=True, help="Force overwrite if database already exists"
 )
 def main(data_path, pred_path, db_name, force):
-    set_env_vars()
-
     ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
     tables_dir = os.path.join(ROOT, DEFAULT_TABLES_DIR)
     os.makedirs(tables_dir, exist_ok=True)

scripts/run_db_interface.py

-1
@@ -281,7 +281,6 @@ def submit_canned_query(query_description, limit, wrap):
 )
 
 if __name__ == "__main__":
-    set_env_vars()
     demo.launch(share=True)
 
 demo.launch()

src/utils/utils.py

+1 -1
@@ -94,7 +94,7 @@ def save_best_config(metrics, config):
         json.dump(best_config, f, indent=4)
 
 
-def set_env_vars(fname="../access_keys.json"):
+def set_env_vars(fname="access_keys.json"):
     with open(fname) as f:
         keys = json.load(f)
     for key in keys:
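
Note on the new default path: a relative `fname` such as `access_keys.json` is resolved against the current working directory, so it is found only when the script is started from the directory that holds the file. The following is a minimal sketch of a loader that takes an explicit base directory instead. `load_access_keys`, its `root` and `environ` parameters, and the choice to skip empty values are all assumptions for illustration; this is not the repository's `set_env_vars`, whose loop body is not shown in this diff.

```python
import json
import os
from pathlib import Path


def load_access_keys(fname="access_keys.json", root=None, environ=None):
    """Copy non-empty string values from a JSON key file into environ.

    Hypothetical helper: relative paths are resolved against root
    (default: the current working directory) rather than implicitly.
    Returns the list of key names that were set.
    """
    root = Path.cwd() if root is None else Path(root)
    environ = os.environ if environ is None else environ

    path = Path(fname)
    if not path.is_absolute():
        path = root / path

    with open(path) as f:
        keys = json.load(f)

    loaded = []
    for name, value in keys.items():
        # Assumption: empty strings are template placeholders,
        # so they are skipped instead of overwriting existing values.
        if isinstance(value, str) and value:
            environ[name] = value
            loaded.append(name)
    return loaded
```

A script under `scripts/` could then pass its repository root explicitly, for example `load_access_keys(root=Path(__file__).resolve().parent.parent)`, which would make the lookup independent of where the command is launched.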
