Create OER Commons CSV from OCW (#1)
* Initial setup

---------

Co-authored-by: Ibrahim Javed <[email protected]>
ibrahimjaved12 and Ibrahim Javed committed Jan 8, 2024
1 parent ecdbd2e commit 4183324
Showing 24 changed files with 2,236 additions and 1 deletion.
7 changes: 7 additions & 0 deletions .gitignore
@@ -158,3 +158,10 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# DS_Store
.DS_Store

# OER Export JSON and CSV files created
ocw_api_data.json
ocw_oer_export.csv
26 changes: 26 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,26 @@
---
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-ast
      - id: check-docstring-first
      - id: check-merge-conflict
      - id: check-yaml
      - id: check-added-large-files
      - id: debug-statements
  - repo: https://github.com/adrienverge/yamllint.git
    rev: v1.33.0
    hooks:
      - id: yamllint
        args: [--format, parsable, -d, relaxed]
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: "v0.1.9"
    hooks:
      - id: ruff
        args: [ --fix ]
      - id: ruff-format
99 changes: 98 additions & 1 deletion README.md
@@ -1 +1,98 @@
# ocw-oer-export
# OCW OER Export

This demonstration project shows how to use the MIT Open API. It extracts metadata for MIT OpenCourseWare courses and writes it to a CSV file that can be imported into OER Commons, following their bulk import [requirements](https://help.oercommons.org/support/solutions/articles/42000046853-import-resources-with-the-bulk-import-template).

**SECTIONS**

1. [Initial Setup & Usage](#initial-setup--usage)
1. [File Output Directory](#file-output-directory)
1. [Requirements](#requirements)
1. [Tests](#tests)
1. [Committing & Formatting](#committing--formatting)


## Initial Setup & Usage

The _ocw_oer_export_ package is available [on PyPI](link). To install:

```
pip install ocw_oer_export
```

### Usage as a Python Package

To use `ocw_oer_export` in your Python code:

```
from ocw_oer_export import create_csv
create_csv()
```

By default, `create_csv` uses `source="api"` and `output_file="ocw_oer_export.csv"`. If a local JSON file of the courses' metadata is available, set `source="json"` to build the CSV from that file instead of calling the API. To generate the JSON file:

```
from ocw_oer_export import create_json
create_json()
```

Then, create the CSV from the JSON file:

```
from ocw_oer_export import create_csv
create_csv(source="json")
```

### CLI Usage

`ocw_oer_export` also provides a Command Line Interface (CLI). After installation, you can use the following commands:

To create the CSV file:

```
ocw-oer-export --create_csv
```

To generate a JSON file:

```
ocw-oer-export --create_json
```

To create a CSV file from the local JSON file:

```
ocw-oer-export --create_csv --source=json
```

## File Output Directory

When using either the Python package or the CLI, the output files (CSV or JSON) are saved in the current working directory, that is, the directory from which the script or command is run.

## Requirements

For successful execution and correct output, ensure that the [MIT Open API](https://mit-open-rc.odl.mit.edu//api/v1/courses/?platform=ocw) returns the following fields for each course:

`title`, `url`, `description`, `topics`, `course_feature`, `runs: instructors`
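Before running a full export, it can help to spot-check a single course record for these fields. The following is a minimal sketch, not part of `ocw_oer_export`: the helper name `missing_fields` is hypothetical, and it assumes each course is a dict whose `runs` value is a list of dicts that each carry an `instructors` key.

```python
from typing import List

# Hypothetical helper (not part of ocw_oer_export): report which fields the
# CSV export relies on are missing from a single course record.
REQUIRED_TOP_LEVEL = ["title", "url", "description", "topics", "course_feature"]


def missing_fields(course: dict) -> List[str]:
    """Return the names of required fields absent from `course`."""
    missing = [field for field in REQUIRED_TOP_LEVEL if field not in course]
    runs = course.get("runs")
    # Assumption: `runs` is a list of dicts, each expected to have `instructors`.
    # A missing or empty `runs` list is also reported.
    if not runs or any("instructors" not in run for run in runs):
        missing.append("runs: instructors")
    return missing


if __name__ == "__main__":
    sample = {"title": "Intro", "url": "https://ocw.mit.edu/", "topics": [], "runs": [{}]}
    print(missing_fields(sample))
    # prints ['description', 'course_feature', 'runs: instructors']
```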

Additionally, the files in `mapping_files` must be kept up to date. If new topics are added to OCW without a corresponding mapping in `ocw_oer_export/mapping_files/ocw_topic_to_oer_subject.csv`, those topics produce `null` entries in the CSV's `CR_SUBJECT` column.
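The topic mapping behaves like a lookup table, which is why an unmapped topic ends up empty. The sketch below is illustrative only: it does not read the real `ocw_topic_to_oer_subject.csv`, whose column layout is not shown here, so the headers `OCW Topic` and `OER Subject` and the function names `load_mapping` and `map_topics` are assumptions. It uses `None` to stand in for the `null` entries.

```python
import csv
import io
from typing import Dict, List, Optional

# Assumed two-column layout; the real mapping file's headers may differ.
SAMPLE_MAPPING_CSV = """OCW Topic,OER Subject
Mathematics,Mathematics
Physics,Physics
"""


def load_mapping(csv_text: str) -> Dict[str, str]:
    """Build an OCW-topic -> OER-subject lookup from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["OCW Topic"]: row["OER Subject"] for row in reader}


def map_topics(topics: List[str], mapping: Dict[str, str]) -> List[Optional[str]]:
    """Map each topic; a topic with no mapping becomes None (null in the CSV)."""
    return [mapping.get(topic) for topic in topics]


if __name__ == "__main__":
    mapping = load_mapping(SAMPLE_MAPPING_CSV)
    print(map_topics(["Physics", "Quantum Computing"], mapping))
    # prints ['Physics', None]
```

Listing the topics that map to `None` after a run is a quick way to find rows missing from the mapping file.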

## Tests

To run unit tests:

```
python -m unittest discover
```

## Committing & Formatting

To check commits before they are pushed to GitHub, first install [pre-commit](https://pre-commit.com/) and register its git hooks:

```
pip install pre-commit
pre-commit install
```

To run every configured hook against all files in the repository, for example before pushing to GitHub:

```
pre-commit run --all-files
```
9 changes: 9 additions & 0 deletions ocw_oer_export/__init__.py
@@ -0,0 +1,9 @@
__all__ = ["create_json", "create_csv", "main"]

import logging

from .create_csv import create_csv
from .create_json import create_json
from .cli import main

logging.root.setLevel(logging.INFO)
38 changes: 38 additions & 0 deletions ocw_oer_export/cli.py
@@ -0,0 +1,38 @@
"""
Command-line interface (CLI) for the OCW OER Export Project.
This module provides a CLI to generate JSON or CSV files containing
MIT OpenCourseWare courses' metadata.
"""
import argparse
from .create_csv import create_csv
from .create_json import create_json


def main():
    """
    Parses command-line arguments and executes the appropriate function.
    """
    parser = argparse.ArgumentParser(description="OCW OER Export")

    parser.add_argument("--create_csv", action="store_true", help="Create CSV file")
    parser.add_argument("--create_json", action="store_true", help="Create JSON file")
    parser.add_argument(
        "--source",
        choices=["api", "json"],
        default="api",
        help="Specify data source for CSV creation (default: api)",
    )

    args = parser.parse_args()

    if args.create_csv:
        create_csv(source=args.source)
    elif args.create_json:
        create_json()
    else:
        parser.print_help()


if __name__ == "__main__":
    main()
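Because `--create_csv` and `--create_json` are independent `store_true` flags, passing both is accepted, and the `if`/`elif` chain in `main()` then runs only `create_csv`. The sketch below rebuilds the same parser in a hypothetical `build_parser()` helper (the module itself constructs it inline inside `main()`) and reports which branch would run, so the parsing can be checked without writing any files.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Recreate the argument parser defined in ocw_oer_export/cli.py's main()."""
    parser = argparse.ArgumentParser(description="OCW OER Export")
    parser.add_argument("--create_csv", action="store_true", help="Create CSV file")
    parser.add_argument("--create_json", action="store_true", help="Create JSON file")
    parser.add_argument(
        "--source",
        choices=["api", "json"],
        default="api",
        help="Specify data source for CSV creation (default: api)",
    )
    return parser


def chosen_action(argv):
    """Return a description of the branch main() would take for `argv`."""
    args = build_parser().parse_args(argv)
    if args.create_csv:
        return f"create_csv(source={args.source!r})"
    if args.create_json:
        return "create_json()"
    return "print_help()"


if __name__ == "__main__":
    print(chosen_action(["--create_csv", "--source=json"]))  # create_csv(source='json')
    print(chosen_action(["--create_csv", "--create_json"]))  # create_csv(source='api')
    print(chosen_action([]))  # print_help()
```

Note that `--source` only affects `--create_csv`; with `--create_json` it is parsed and validated but otherwise ignored.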
52 changes: 52 additions & 0 deletions ocw_oer_export/client.py
@@ -0,0 +1,52 @@
"""
Module for interacting with the MIT OpenCourseWare API.
"""
import math
import logging
import requests
from retry import retry
from tqdm import tqdm

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger(__name__)


@retry(tries=3, delay=2, logger=logger)
def make_request(next_page, page_size):
    """
    Make a request to the API with retry logic.
    """
    return requests.get(next_page, params={"limit": page_size}, timeout=60)


def paginated_response(api_url, page_size=100):
    """
    Generate paginated responses from the API.
    """
    next_page = api_url
    while next_page:
        response = make_request(next_page, page_size)
        data = response.json()
        next_page = data.get("next")
        yield data


def extract_data_from_api(api_url):
    """Extract all data from the MIT OpenCourseWare API."""
    page_size = 100
    pages = paginated_response(api_url, page_size)

    first_page = next(pages)
    api_data = first_page.get("results", [])
    total_pages = math.ceil(first_page["count"] / page_size)

    # Remaining pages
    for page in tqdm(
        pages,
        desc="Loading data from MIT OCW API",
        total=total_pages - 1,
    ):
        page_results = page.get("results", [])
        api_data.extend(page_results)

    return api_data
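The pagination in `client.py` follows the `next` link returned with each page until it comes back empty, and sizes the progress bar from the first page's `count`. The sketch below exercises the same loop against an in-memory stand-in for the API so it runs offline: `FAKE_PAGES` and `fake_request` are hypothetical, the page shape (`count`, `next`, `results`) is the one `client.py` reads, and `requests`, `retry` and `tqdm` are deliberately left out.

```python
import math

# Hypothetical in-memory API: each "URL" maps to the JSON one page would return.
FAKE_PAGES = {
    "page1": {"count": 5, "next": "page2", "results": [1, 2]},
    "page2": {"count": 5, "next": "page3", "results": [3, 4]},
    "page3": {"count": 5, "next": None, "results": [5]},
}


def fake_request(url):
    """Stand-in for make_request(...).json(): return the decoded page for `url`."""
    return FAKE_PAGES[url]


def paginated_response(api_url):
    """Yield pages, following each page's `next` link until it is empty."""
    next_page = api_url
    while next_page:
        data = fake_request(next_page)
        next_page = data.get("next")
        yield data


def extract_all(api_url, page_size=2):
    """Collect all results and the page total the progress bar would use.

    `page_size` only feeds the page-count arithmetic here; the fake pages
    already hold two results each.
    """
    pages = paginated_response(api_url)
    first_page = next(pages)
    results = list(first_page.get("results", []))
    total_pages = math.ceil(first_page["count"] / page_size)
    for page in pages:  # the remaining total_pages - 1 pages
        results.extend(page.get("results", []))
    return results, total_pages


if __name__ == "__main__":
    print(extract_all("page1"))  # ([1, 2, 3, 4, 5], 3)
```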
4 changes: 4 additions & 0 deletions ocw_oer_export/constants.py
@@ -0,0 +1,4 @@
"""
Module containing constants.
"""
API_URL = "https://mitopen.odl.mit.edu/api/v1/courses/?platform=ocw"