Commit 882f7d5

docs: update README to include S3 environment variable setup and clarify local configuration steps
1 parent 13f82bf commit 882f7d5

1 file changed

+31
-5
lines changed
README.md

+31-5
Original file line number | Diff line number | Diff line change
@@ -38,18 +38,44 @@ $ python3.11 -mpipenv install
3838
$ python3.11 -mpipenv shell
3939
```
4040

41-
Adapt `env.sample` to your needs and copy it to `.env`.
41+
For s3-based file processing, the following environment variables need to be set:
42+
43+
```sh
44+
SE_ACCESS_KEY=
45+
SE_SECRET_KEY=
46+
SE_HOST_URL=
47+
```
48+
49+
If your global environment does not contain these variables, you can set them in a local
50+
`.env` file. The `python-dotenv` package is used to read these variables.
51+
52+
```sh
53+
cp env.sample .env
54+
edit .env
55+
```
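
As a quick check before running S3-based processing, the following sketch sources a local `.env` file (if present) and reports which of the three variables are still unset. The helper name `missing_s3_vars` is hypothetical and not part of the repository. The sketch also assumes that `.env` contains only plain `KEY=value` lines that a POSIX shell can source, which is stricter than what `python-dotenv` accepts.

```sh
#!/bin/sh
# Hypothetical helper (not part of the repository): print the names of the
# required S3 variables that are unset or empty, separated by spaces.
missing_s3_vars() {
    missing=""
    for name in SE_ACCESS_KEY SE_SECRET_KEY SE_HOST_URL; do
        # Indirect lookup of the variable called $name; empty if unset.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # Strip the leading space left by the concatenation above.
    echo "${missing# }"
}

# Assumption: .env holds simple KEY=value lines. `set -a` exports every
# variable assigned while the file is being sourced.
if [ -f .env ]; then
    set -a
    . ./.env
    set +a
fi

m=$(missing_s3_vars)
if [ -n "$m" ]; then
    echo "Missing S3 settings: $m" >&2
fi
```

If the script prints nothing, all three variables are available either from the global environment or from `.env`.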
4256

4357
# Running the pipeline
4458

45-
Adapt the local paths for the input and output directories according in the
46-
`config.local.mk` (see `config.local.mk.sample` for an example).
47-
and run the following command:
59+
## Local configuration
60+
61+
Adapt the local paths for the input and output directories in the
62+
`config.local.mk` (see `config.local.mk.sample` for default settings.)
4863

4964
```sh
65+
cp config.local.mk.sample config.local.mk
66+
edit config.local.mk
67+
```
68+
69+
## Running the pipeline
70+
71+
The build process is controlled by the `Makefile`.
72+
73+
```sh
74+
make help # show available targets
75+
5076
make newspaper -j N # process specific newspaper/year pairs in parallel typically for testing
5177

52-
make each -j N # process all newspapers using parallel processing within newspaper/year pairs
78+
make collection MAKE_PARALLEL_OPTION=16 # process all newspapers using parallel processing within newspaper/year pairs
5379
```
5480

5581
## Command-Line Options for `spacy_linguistic_processing.py`
