Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add Dockerfile to run the scraper in a container (#201)
* Dockerfile: Add initial Dockerfile for running ARD scraper as job.
* Dockerfile: Exclude data files from image, force CPU-only torch.
  Pulling CUDA into a small job without a GPU seems to be wasteful.
* GCP Credentials: Move to subdirectory, so we can mount them as secret.
* DB: Take full connection URI from env, rather than components.
  This allows us to use UNIX sockets to connect to the DB, which plays nice with some Google Cloud SQL Auth Proxy setups.
* Dockerfile: Include required data files, pandoc dependency.
* DB: Update README and GitHub workflows for taking full connection URI
* Revert "DB: Update README and GitHub workflows for taking full connection URI"
  This reverts commit a168894.
* DB: Allow specifying either full connection URI or components.
* GCP Credentials: Ignore credentials file in repository root.
  We now look for this in a subdirectory, but should ignore at old path.
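The commit message says the DB settings now accept either a full connection URI or its individual components, and that the full URI is what makes UNIX-socket connections (as used by some Cloud SQL Auth Proxy setups) possible. That code is not part of the diff shown here, so the following is only a minimal sketch of the precedence rule. The environment-variable names (`ARD_DB_CONNECTION_URI`, `ARD_DB_USER`, and so on), the defaults, and the `mysql+pymysql` scheme are all hypothetical, not taken from the repository.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote_plus


def get_db_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a DB connection URI, preferring a full URI over components.

    All variable names, defaults, and the URI scheme are hypothetical.
    """
    env = os.environ if env is None else env

    # A non-empty full URI takes precedence, so deployments can use forms the
    # components cannot express, e.g. a UNIX socket from a Cloud SQL Auth Proxy.
    full_uri = env.get("ARD_DB_CONNECTION_URI")
    if full_uri:
        return full_uri

    # Otherwise assemble the URI from individual components. quote_plus escapes
    # characters such as '@' or '/' that would otherwise break URI parsing.
    user = quote_plus(env.get("ARD_DB_USER", "user"))
    password = quote_plus(env.get("ARD_DB_PASSWORD", ""))
    host = env.get("ARD_DB_HOST", "127.0.0.1")
    port = env.get("ARD_DB_PORT", "3306")
    name = env.get("ARD_DB_NAME", "ard")
    return f"mysql+pymysql://{user}:{password}@{host}:{port}/{name}"


if __name__ == "__main__":
    # Full URI pointing at a UNIX socket (a query parameter PyMySQL understands).
    socket_uri = (
        "mysql+pymysql://user:secret@/ard"
        "?unix_socket=/cloudsql/PROJECT:REGION:INSTANCE"
    )
    print(get_db_uri({"ARD_DB_CONNECTION_URI": socket_uri}))

    # Components only; special characters in the password are percent-encoded.
    print(get_db_uri({"ARD_DB_USER": "ard", "ARD_DB_PASSWORD": "p@ss/word",
                      "ARD_DB_HOST": "db.internal", "ARD_DB_NAME": "ard"}))
```

Letting a complete URI override the components keeps the simple host/port setup working while allowing socket-style connections that a host/port pair cannot describe.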
Showing 4 changed files with 33 additions and 7 deletions.
@@ -122,5 +122,7 @@ carado.moe/
!requirements.txt
*.epub

secrets/
credentials.json

data/raw/
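The commit moves the GCP credentials file into a subdirectory so it can be mounted as a secret, while the `credentials.json` entry above keeps the old repository-root path ignored. The loader itself is not in this diff. The sketch below is a hypothetical lookup that assumes the file is named `credentials.json` inside `secrets/`. It also assumes a loader might fall back to the old root path for older checkouts. The commit message only says the file is now looked up in a subdirectory, so the fallback is purely illustrative.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical locations: the mounted-secret path mirrors the secrets/ ignore
# entry, and the legacy path is the old repository-root location.
SECRET_CREDENTIALS = Path("secrets") / "credentials.json"
LEGACY_CREDENTIALS = Path("credentials.json")


def find_credentials(base_dir: Path) -> Optional[Path]:
    """Return the first existing credentials file under base_dir, or None.

    The mounted-secret location is checked before the legacy root location.
    """
    for relative in (SECRET_CREDENTIALS, LEGACY_CREDENTIALS):
        candidate = base_dir / relative
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Build a throwaway layout with only the mounted-secret copy present.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "secrets").mkdir()
        (base / "secrets" / "credentials.json").write_text("{}")
        print(find_credentials(base))
```

Checking the secret mount first means a container with a mounted secret never silently picks up a stray file at the old path.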
@@ -0,0 +1,22 @@
FROM python:3.11-slim-bookworm

COPY align_data /source/align_data
COPY main.py /source/main.py
COPY requirements.txt /source/requirements.txt
COPY data/raw/agentmodels.org /source/data/raw/agentmodels.org
COPY data/raw/ai-alignment-papers.csv /source/data/raw/ai-alignment-papers.csv
COPY data/raw/alignment_newsletter.xlsx /source/data/raw/alignment_newsletter.xlsx
WORKDIR /source

RUN apt-get update
RUN apt-get -y install git pandoc

RUN useradd --create-home --shell /bin/bash ard
RUN chown ard:ard -R /source
USER ard:ard

RUN python -m pip install --upgrade pip
RUN pip3 install torch --index-url https://download.pytorch.org/whl/cpu
RUN pip install -r requirements.txt

CMD ["python", "main.py", "fetch-all"]