Create OER Commons CSV from OCW (#1)
* Initial setup

---------

Co-authored-by: Ibrahim Javed <[email protected]>
1 parent ecdbd2e, commit 4183324
Showing 24 changed files with 2,236 additions and 1 deletion.
@@ -0,0 +1,26 @@
---
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-ast
      - id: check-docstring-first
      - id: check-merge-conflict
      - id: check-yaml
      - id: check-added-large-files
      - id: debug-statements
  - repo: https://github.com/adrienverge/yamllint.git
    rev: v1.33.0
    hooks:
      - id: yamllint
        args: [--format, parsable, -d, relaxed]
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: "v0.1.9"
    hooks:
      - id: ruff
        args: [ --fix ]
      - id: ruff-format
@@ -1 +1,98 @@
# ocw-oer-export
# OCW OER Export

This demonstration project shows how to use the MIT Open API. It extracts the metadata of MIT OpenCourseWare courses and creates a CSV file for export to OER Commons that follows their [requirements](https://help.oercommons.org/support/solutions/articles/42000046853-import-resources-with-the-bulk-import-template).

**SECTIONS**

1. [Initial Setup & Usage](#initial-setup--usage)
1. [Requirements](#requirements)
1. [Tests](#tests)
1. [Committing & Formatting](#committing--formatting)


## Initial Setup & Usage

The _ocw_oer_export_ package is available [on PyPI](link). To install:

```
pip install ocw_oer_export
```

### Usage as a Python Package

To use `ocw_oer_export` in your Python code:

```
from ocw_oer_export import create_csv
create_csv()
```

By default, the `create_csv` function uses `source="api"` and `output_file="ocw_oer_export.csv"`. Set `source="json"` instead if a local JSON file of course metadata is available. To generate the JSON file:

```
from ocw_oer_export import create_json
create_json()
```

Then, create the CSV from the JSON file:

```
create_csv(source="json")
```

### CLI Usage

`ocw_oer_export` also provides a Command Line Interface (CLI). After installation, you can use the following commands:

To create the CSV file:

```
ocw-oer-export --create_csv
```

To generate a JSON file:

```
ocw-oer-export --create_json
```

To create a CSV file from the local JSON file:

```
ocw-oer-export --create_csv --source=json
```

## File Output Directory

Whether you use the Python package or the CLI, the output files (CSV or JSON) are saved in the current working directory.

## Requirements

For successful execution and correct output, the [MIT Open API](https://mit-open-rc.odl.mit.edu/api/v1/courses/?platform=ocw) response must contain the following fields:

`title`, `url`, `description`, `topics`, `course_feature`, `runs: instructors`

Additionally, the `mapping_files` should be up to date. If new topics are added in OCW without corresponding mappings in `ocw_oer_export/mapping_files/ocw_topic_to_oer_subject.csv`, those topics will produce `null` entries in the `CR_SUBJECT` column of the CSV.
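
One way to catch missing mappings before they surface as `null` values is to compare the topics seen in the course metadata against the mapping file. The sketch below is illustrative only: the column names `ocw_topic` and `oer_subject`, the sample rows, and the helpers `load_mapping` and `find_unmapped_topics` are assumptions, not the package's actual schema or API.

```python
import csv
import io

# Hypothetical stand-in for ocw_topic_to_oer_subject.csv; the real column
# names and rows may differ.
SAMPLE_MAPPING_CSV = """ocw_topic,oer_subject
Mathematics,Mathematics
Physics,Physical Science
"""


def load_mapping(csv_text):
    """Return a dict of OCW topic -> OER subject parsed from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["ocw_topic"]: row["oer_subject"] for row in reader}


def find_unmapped_topics(topics, mapping):
    """Return the topics with no mapping entry, sorted, without duplicates."""
    return sorted({topic for topic in topics if topic not in mapping})


mapping = load_mapping(SAMPLE_MAPPING_CSV)
unmapped = find_unmapped_topics(["Physics", "Linguistics", "Mathematics"], mapping)
print(unmapped)
```

Any topic reported here would need a new row in the mapping file before the export is run.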

## Tests

To run unit tests:

```
python -m unittest discover
```

## Committing & Formatting

To check commits before they reach GitHub, first install [pre-commit](https://pre-commit.com/):

```
pip install pre-commit
pre-commit install
```

Run pre-commit to confirm that your changes are correctly formatted and safe to push to GitHub:

```
pre-commit run --all-files
```
@@ -0,0 +1,9 @@
__all__ = ["create_json", "create_csv", "main"]

import logging

from .create_csv import create_csv
from .create_json import create_json
from .cli import main

logging.root.setLevel(logging.INFO)
@@ -0,0 +1,38 @@
"""
Command-line interface (CLI) for the OCW OER Export Project.
This module provides a CLI to generate JSON or CSV files containing
MIT OpenCourseWare courses' metadata.
"""
import argparse
from .create_csv import create_csv
from .create_json import create_json


def main():
    """
    Parses command-line arguments and executes the appropriate function.
    """
    parser = argparse.ArgumentParser(description="OCW OER Export")

    parser.add_argument("--create_csv", action="store_true", help="Create CSV file")
    parser.add_argument("--create_json", action="store_true", help="Create JSON file")
    parser.add_argument(
        "--source",
        choices=["api", "json"],
        default="api",
        help="Specify data source for CSV creation (default: api)",
    )

    args = parser.parse_args()

    if args.create_csv:
        create_csv(source=args.source)
    elif args.create_json:
        create_json()
    else:
        parser.print_help()


if __name__ == "__main__":
    main()
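
The branch order in `main()` above means that `--create_csv` takes precedence when both flags are passed, and that `--source` only matters for CSV creation. A minimal sketch of that behavior follows, assuming a hypothetical `build_parser` helper that mirrors the arguments above and a hypothetical `chosen_action` function that returns a label instead of calling `create_csv` or `create_json`:

```python
import argparse


def build_parser():
    """Build a parser with the same arguments as main() (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="OCW OER Export")
    parser.add_argument("--create_csv", action="store_true", help="Create CSV file")
    parser.add_argument("--create_json", action="store_true", help="Create JSON file")
    parser.add_argument(
        "--source",
        choices=["api", "json"],
        default="api",
        help="Specify data source for CSV creation (default: api)",
    )
    return parser


def chosen_action(argv):
    """Return a label for the branch main() would take (illustration only)."""
    args = build_parser().parse_args(argv)
    if args.create_csv:
        return f"csv from {args.source}"
    if args.create_json:
        return "json"
    return "help"


print(chosen_action(["--create_csv", "--source=json"]))
print(chosen_action(["--create_json", "--create_csv"]))
print(chosen_action([]))
```

If the two actions should never be combined, `argparse` also offers `add_mutually_exclusive_group()`, which would reject the second call instead of silently preferring the CSV branch.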
@@ -0,0 +1,52 @@
"""
Module for interacting with the MIT OpenCourseWare API.
"""
import math
import logging
import requests
from retry import retry
from tqdm import tqdm

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger(__name__)


@retry(tries=3, delay=2, logger=logger)
def make_request(next_page, page_size):
    """
    Make a request to the API with retry logic.
    """
    return requests.get(next_page, params={"limit": page_size}, timeout=60)


def paginated_response(api_url, page_size=100):
    """
    Generate paginated responses from the API.
    """
    next_page = api_url
    while next_page:
        response = make_request(next_page, page_size)
        data = response.json()
        next_page = data.get("next")
        yield data


def extract_data_from_api(api_url):
    """Extract all data from the MIT OpenCourseWare API."""
    page_size = 100
    pages = paginated_response(api_url, page_size)

    first_page = next(pages)
    api_data = first_page.get("results", [])
    total_pages = math.ceil(first_page["count"] / page_size)

    # Remaining pages
    for page in tqdm(
        pages,
        desc="Loading data from MIT OCW API",
        total=total_pages - 1,
    ):
        page_results = page.get("results", [])
        api_data.extend(page_results)

    return api_data
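
The pagination pattern above follows each response's `next` link until it is empty and uses `count` to size the progress bar. The sketch below reproduces that flow offline. It assumes a hypothetical in-memory `FAKE_PAGES` table and a hypothetical `fetch_page` function in place of `make_request`; the page keys and records are invented for illustration, and the progress bar is omitted.

```python
import math

# Invented pages keyed by URL; stands in for the real API responses.
FAKE_PAGES = {
    "page-1": {"count": 5, "next": "page-2", "results": ["a", "b"]},
    "page-2": {"count": 5, "next": "page-3", "results": ["c", "d"]},
    "page-3": {"count": 5, "next": None, "results": ["e"]},
}


def fetch_page(url):
    """Hypothetical replacement for make_request(...).json()."""
    return FAKE_PAGES[url]


def paginated_response(start_url):
    """Yield each page, following the "next" link until it is missing."""
    next_page = start_url
    while next_page:
        data = fetch_page(next_page)
        next_page = data.get("next")
        yield data


def extract_all(start_url, page_size):
    """Collect results from every page and compute the expected page count."""
    pages = paginated_response(start_url)
    first_page = next(pages)
    results = list(first_page.get("results", []))
    total_pages = math.ceil(first_page["count"] / page_size)
    for page in pages:
        results.extend(page.get("results", []))
    return results, total_pages


results, total_pages = extract_all("page-1", page_size=2)
print(results, total_pages)
```

Because the generator is lazy, the first page is fetched only when `next(pages)` is called, which is what lets the total page count be known before the remaining pages are requested.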
@@ -0,0 +1,4 @@
"""
Module containing constants.
"""
API_URL = "https://mitopen.odl.mit.edu/api/v1/courses/?platform=ocw"