Commit
1 parent 6191930, commit 115c0a0
Showing 1 changed file with 11 additions and 142 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
<a href="https://github.com/alexandrainst/foqa"><img src="https://github.com/alexandrainst/foqa/raw/main/gfx/alexandra_logo.png" width="239" height="175" align="right" /></a> | ||
# foqa | ||
# FoQA | ||
|
||
Faroese question-answering dataset. | ||
Faroese question-answering dataset, generated by GPT-4. | ||
|
||
______________________________________________________________________ | ||
[![Code Coverage](https://img.shields.io/badge/Coverage-100%25-brightgreen.svg)](https://github.com/alexandrainst/foqa/tree/main/tests) | ||
|
@@ -16,149 +16,18 @@ Developer(s): | |
- Dan Saattrup Nielsen ([email protected]) | ||
|
||
|
||
## Setup | ||
## Quickstart | ||
|
||
### Installation | ||
|
||
1. Run `make install`, which sets up a virtual environment and all Python dependencies therein. | ||
1. Run `make install`, which sets up a virtual environment and all Python dependencies | ||
therein. | ||
2. Run `source .venv/bin/activate` to activate the virtual environment. | ||
3. Run `python src/scripts/create_dataset.py` to create the dataset. | ||
|
||
### Adding and Removing Packages | ||
|
||
To install new PyPI packages, run: | ||
``` | ||
poetry add <package-name> | ||
``` | ||
|
||
To remove them again, run: | ||
``` | ||
poetry remove <package-name> | ||
``` | ||
|
||
To show all installed packages, run: | ||
``` | ||
poetry show | ||
``` | ||
|
||
|
||
## A Word on Modules and Scripts | ||
In the `src` directory there are two subdirectories, `foqa` | ||
and `scripts`. This is a brief explanation of the differences between the two. | ||
|
||
### Modules | ||
All Python files in the `foqa` directory are _modules_ | ||
internal to the project package. Examples here could be a general data loading script, | ||
a definition of a model, or a training function. Think of modules as all the building | ||
blocks of a project. | ||
|
||
When a module is importing functions/classes from other modules we use the _relative | ||
import_ notation - here's an example: | ||
|
||
``` | ||
from .other_module import some_function | ||
``` | ||
|
||
### Scripts | ||
Python files in the `scripts` folder are scripts, which are short code snippets that | ||
are _external_ to the project package, and which are meant to actually run the code. As | ||
such, _only_ scripts will be called from the terminal. An analogy here is that the | ||
internal `numpy` code consists of modules, but the Python code you write where you import | ||
some `numpy` functions and actually run them is a script. | ||
|
||
When importing module functions/classes when you're in a script, you do it like you | ||
would normally import from any other package: | ||
|
||
``` | ||
from foqa import some_function | ||
``` | ||
|
||
Note that this is also how we import functions/classes in tests, since each test Python | ||
file is also a Python script, rather than a module. | ||
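
The module/script distinction above can be tried out in isolation. The following is a minimal sketch, not part of the repository: it builds a throwaway package in a temporary directory, where `demo_pkg` and `run_me.py` are hypothetical stand-ins for `src/foqa` and a file in `src/scripts`. The package uses a relative import internally, while the script imports from the package by name.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_demo() -> str:
    """Build a throwaway package plus script and return the script's output."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "demo_pkg"  # hypothetical stand-in for `src/foqa`
        pkg.mkdir()

        # A module inside the package.
        (pkg / "other_module.py").write_text(
            "def some_function():\n    return 'hello from a module'\n"
        )
        # Inside the package, modules refer to each other with relative imports.
        (pkg / "__init__.py").write_text(
            "from .other_module import some_function\n"
        )
        # A script outside the package imports it like any other package.
        script = root / "run_me.py"  # hypothetical stand-in for a script
        script.write_text(
            "from demo_pkg import some_function\nprint(some_function())\n"
        )

        # Running the script puts its own directory on the import path,
        # which is why `demo_pkg` can be found without installation here.
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(run_demo())
```

In the actual project the package is installed into the virtual environment, so scripts and tests can use `from foqa import ...` regardless of where they live.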
|
||
|
||
## Features | ||
|
||
### Docker Setup | ||
|
||
A Dockerfile is included in the new repositories, which by default runs | ||
`src/scripts/your_script.py`. You can build the Docker image and run the Docker | ||
container by running `make docker`. | ||
|
||
### Automatic Documentation | ||
|
||
Run `make docs` to create the documentation in the `docs` folder, which is based on | ||
your docstrings in your code. You can view this by running `make view-docs`. | ||
|
||
### Automatic Test Coverage Calculation | ||
|
||
Run `make test` to test your code, which also updates the "coverage badge" in the | ||
README, showing how much of your code base is currently covered by tests. | ||
|
||
### Continuous Integration | ||
|
||
GitHub CI pipelines are included in the repo. They run all the tests in the `tests` | ||
directory and build the online documentation if GitHub Pages has been enabled | ||
for the repository (this can be done on GitHub in the repository settings). | ||
|
||
### Code Spaces | ||
The raw dataset will be stored in `data/raw` and will be updated continuously during | ||
creation, and the final dataset will appear in `data/final`. | ||
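
Since the raw data is written incrementally, it can be handy to check what has been produced so far. Below is a small sketch under stated assumptions: the helper name `list_dataset_files` is hypothetical and not part of the repository, and it makes no assumption about file formats; it only lists file names in `data/raw` and `data/final`, skipping `.gitkeep` placeholders.

```python
from pathlib import Path
from typing import Dict, List


def list_dataset_files(data_dir: str = "data") -> Dict[str, List[str]]:
    """Return the file names found in the `raw` and `final` subdirectories.

    Hypothetical helper: a missing directory yields an empty list, and
    `.gitkeep` placeholder files are ignored.
    """
    root = Path(data_dir)
    listing: Dict[str, List[str]] = {}
    for stage in ("raw", "final"):
        stage_dir = root / stage
        if stage_dir.is_dir():
            listing[stage] = sorted(
                path.name
                for path in stage_dir.iterdir()
                if path.is_file() and path.name != ".gitkeep"
            )
        else:
            listing[stage] = []
    return listing


if __name__ == "__main__":
    for stage, names in list_dataset_files().items():
        print(f"{stage}: {len(names)} file(s)", names)
```

Run from the repository root while `create_dataset.py` is working, this would show files appearing under `raw` first and under `final` once the run completes.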
|
||
Code Spaces is a new feature on GitHub that allows you to develop on a project | ||
completely in the cloud, without having to do any local setup at all. This repo | ||
includes a configuration file for running code spaces on GitHub. When the repo is hosted on | ||
`alexandrainst/foqa`, simply press the `<> Code` button | ||
and add a code space to get started, which will open a VSCode window directly in your | ||
browser. | ||
|
||
## Docker | ||
|
||
## Project structure | ||
``` | ||
. | ||
├── .devcontainer | ||
│ └── devcontainer.json | ||
├── .github | ||
│ └── workflows | ||
│ ├── ci.yaml | ||
│ └── docs.yaml | ||
├── .gitignore | ||
├── .pre-commit-config.yaml | ||
├── CODE_OF_CONDUCT.md | ||
├── CONTRIBUTING.md | ||
├── Dockerfile | ||
├── LICENSE | ||
├── README.md | ||
├── config | ||
│ ├── __init__.py | ||
│ ├── config.yaml | ||
│ └── hydra | ||
│ └── job_logging | ||
│ └── custom.yaml | ||
├── data | ||
│ ├── final | ||
│ │ └── .gitkeep | ||
│ ├── processed | ||
│ │ └── .gitkeep | ||
│ └── raw | ||
│ └── .gitkeep | ||
├── docs | ||
│ └── .gitkeep | ||
├── gfx | ||
│ ├── .gitkeep | ||
│ └── alexandra_logo.png | ||
├── makefile | ||
├── models | ||
│ └── .gitkeep | ||
├── notebooks | ||
│ └── .gitkeep | ||
├── poetry.toml | ||
├── pyproject.toml | ||
├── src | ||
│ ├── scripts | ||
│ │ ├── fix_dot_env_file.py | ||
│ │ └── your_script.py | ||
│ └── foqa | ||
│ ├── __init__.py | ||
│ └── your_module.py | ||
└── tests | ||
├── __init__.py | ||
└── test_dummy.py | ||
``` | ||
You can also build and run the `Dockerfile` directly, which creates the dataset without | ||
requiring you to set up a Python environment. |