#5 create package structure and update output paths
andrewtavis committed Apr 6, 2022
1 parent 62e0563 commit 2916028
Showing 186 changed files with 651 additions and 140 deletions.
11 changes: 11 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -2,3 +2,14 @@
 ##########
 .DS_Store
 .vscode
+
+# Python files
+##############
+# setup.py working directory
+build
+# setup.py dist directory
+dist
+# Egg metadata
+*.egg-info
+# Caches
+__pycache__
31 changes: 31 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,31 @@
# Changelog

Scribe-Data tries to follow [semantic versioning](https://semver.org/), a MAJOR.MINOR.PATCH scheme in which we increment the:

- MAJOR version when we make incompatible API changes
- MINOR version when we add functionality in a backwards compatible manner
- PATCH version when we make backwards compatible bug fixes
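
As a minimal sketch of the increment rules listed above, the next version string could be computed as follows. The `bump_version` helper is hypothetical and not part of Scribe-Data; it only handles plain `MAJOR.MINOR.PATCH` strings without pre-release or build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version string.

    `part` names the component to increment: "major", "minor" or "patch".
    Components to the right of the incremented one are reset to zero.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part!r}")


# A backwards compatible feature added on top of 1.0.0:
print(bump_version("1.0.0", "minor"))
```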

Emojis for the following are chosen based on [gitmoji](https://gitmoji.dev/).

# Scribe-Data 1.0.0

### 🚀 Deployment

Releasing a Python package so that the code is accessible and the structure is set for future project iterations.

### ✨ Features

- Data updates are done via a single file that loads new formatted data into each Scribe application.

### 🗃️ Data

- Data extraction and formatting scripts for each of Scribe's current languages as well as those with significant data on Wikidata are included.

### 🐞 Bug Fixes

- The data update process has been fixed to work for all queries.

### ♻️ Code Refactoring

- The data update process now updates files in Android and Desktop directories if they're present.
4 changes: 4 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,4 @@
include CHANGELOG.* LICENSE.*
graft src
global-exclude *.py[cod]
global-exclude .DS_Store
6 changes: 4 additions & 2 deletions README.md
@@ -7,13 +7,15 @@
[![issues](https://img.shields.io/github/issues/scribe-org/Scribe-Data)](https://github.com/scribe-org/Scribe-Data/issues)
[![discussions](https://img.shields.io/github/discussions/scribe-org/Scribe-Data)](https://github.com/scribe-org/Scribe-Data/discussions)
[![language](https://img.shields.io/badge/Python-3-306998.svg?logo=python&logoColor=ffffff)](https://github.com/scribe-org/Scribe-Data/blob/main/CONTRIBUTING.md)
+[![pypi](https://img.shields.io/pypi/v/scribe-data.svg?color=4B8BBE)](https://pypi.org/project/scribe-data/)
+[![pypistatus](https://img.shields.io/pypi/status/scribe-data.svg)](https://pypi.org/project/scribe-data/)
 [![license](https://img.shields.io/github/license/scribe-org/Scribe-Data.svg)](https://github.com/scribe-org/Scribe-Data/blob/main/LICENSE.txt)
 [![coc](https://img.shields.io/badge/coc-Contributor%20Covenant-ff69b4.svg)](https://github.com/scribe-org/Scribe-Data/blob/main/.github/CODE_OF_CONDUCT.md)
 [![codestyle](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

 ### Data extraction and formatting for Scribe applications

-This repository contains the scripts for extracting and formatting data from [Wikidata](https://www.wikidata.org/) for Scribe applications. Updates to the language keyboard and interface data can be done using [data/update_data.py](https://github.com/scribe-org/Scribe-Data/tree/main/data/update_data.py).
+This repository contains the scripts for extracting and formatting data from [Wikidata](https://www.wikidata.org/) for Scribe applications. Updates to the language keyboard and interface data can be done using [scribe_data/load/update_data.py](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/load/update_data.py).

# **Contents**<a id="contents"></a>

@@ -24,7 +26,7 @@ This repository contains the scripts for extracting and formatting data from [Wi

 # Process [``](#contents) <a id="process"></a>

-[data/update_data.py](https://github.com/scribe-org/Scribe-Data/tree/main/data/update_data.py) is used to update all data for [Scribe-iOS](https://github.com/scribe-org/Scribe-iOS), with this functionality later being expanded to update [Scribe-Android](https://github.com/scribe-org/Scribe-Android) and [Scribe-Desktop](https://github.com/scribe-org/Scribe-Desktop) when they're active. The ultimate goal is that this repository will house language packs that are periodically updated with new [Wikidata](https://www.wikidata.org/) lexicographical data, with these packs then being available to download by users of Scribe applications.
+[scribe_data/load/update_data.py](https://github.com/scribe-org/Scribe-Data/tree/main/src/scribe_data/load/update_data.py) is used to update all data for [Scribe-iOS](https://github.com/scribe-org/Scribe-iOS), with this functionality later being expanded to update [Scribe-Android](https://github.com/scribe-org/Scribe-Android) and [Scribe-Desktop](https://github.com/scribe-org/Scribe-Desktop) when they're active. The ultimate goal is that this repository will house language packs that are periodically updated with new [Wikidata](https://www.wikidata.org/) lexicographical data, with these packs then being available to download by users of Scribe applications.

# Supported Languages [``](#contents) <a id="supported-languages"></a>

2 changes: 1 addition & 1 deletion environment.yml
@@ -4,8 +4,8 @@ channels:
   - defaults
 dependencies:
   - black>=19.10b0
-  - transformers>=4.12
   - sentencepiece>=0.1.95
+  - transformers>=4.12
   - pip:
     - wikidataintegrator>=0.9.22
     - python-dateutil>=2.8.2
7 changes: 7 additions & 0 deletions requirements.txt
@@ -0,0 +1,7 @@
black>=19.10b0
certifi>=2020.12.5
packaging>=20.9
sentencepiece>=0.1.95
transformers>=4.12
wikidataintegrator>=0.9.22
python-dateutil>=2.8.2
5 changes: 5 additions & 0 deletions setup.cfg
@@ -0,0 +1,5 @@
[metadata]
description-file = README.md

[options.data_files]
. = requirements.txt
51 changes: 51 additions & 0 deletions setup.py
@@ -0,0 +1,51 @@
import os

try:
    from setuptools import setup
except ImportError:
    from distutils.core import setup

from setuptools import find_packages

package_directory = os.path.abspath(os.path.dirname(__file__))
with open(os.path.join(package_directory, "README.md"), encoding="utf-8") as fh:
    long_description = fh.read()

with open(
    os.path.join(package_directory, "requirements.txt"), encoding="utf-8"
) as req_file:
    requirements = req_file.readlines()

on_rtd = os.environ.get("READTHEDOCS") == "True"
if on_rtd:
    requirements = []

setup_args = dict(
    name="scribe-data",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    version="1.0.0",
    author="Andrew Tavis McAllister",
    author_email="[email protected]",
    classifiers=[
        "Development Status :: 5 - Production/Stable",
        "Intended Audience :: Developers",
        "Intended Audience :: Education",
        "License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
        "Programming Language :: Python",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Operating System :: OS Independent",
    ],
    python_requires=">=3.6",
    install_requires=requirements,
    description="Data extraction and formatting for Scribe applications",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/scribe-org/Scribe-Data",
)

if __name__ == "__main__":
    setup(**setup_args)
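
`req_file.readlines()` keeps the trailing newline on each entry and would also pass along blank or comment lines if `requirements.txt` ever gained them. One possible alternative, sketched here under the assumption of a hypothetical `parse_requirements` helper that is not part of this commit, strips those before the list reaches `install_requires`:

```python
from typing import List


def parse_requirements(text: str) -> List[str]:
    """Return requirement specifiers from the text of a requirements file.

    Blank lines and full-line comments are dropped, and surrounding
    whitespace (including the trailing newline) is stripped.
    """
    requirements = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements


# Illustrative input only; the comment line is not in the real requirements.txt.
example = "black>=19.10b0\n\n# data processing\ntransformers>=4.12\n"
print(parse_requirements(example))
```

In `setup.py` this would replace the `readlines()` call with something like `parse_requirements(req_file.read())`.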
File renamed without changes.
@@ -5,13 +5,20 @@
 Formats the nouns queried from Wikidata using query_nouns.sparql.
 """

-# pylint: disable=invalid-name
+
+# pylint: disable=invalid-name, wrong-import-position

 import collections
 import json
+import os
 import sys

-from data.data_utils import (
+LANGUAGE = "French"
+PATH_TO_SCRIBE_ORG = os.path.dirname(sys.path[0]).split("Scribe-Data")[0]
+PATH_TO_SCRIBE_DATA_SRC = f"{PATH_TO_SCRIBE_ORG}Scribe-Data/src"
+sys.path.insert(0, PATH_TO_SCRIBE_DATA_SRC)
+
+from scribe_data.load.update_utils import (
     get_android_data_path,
     get_desktop_data_path,
     get_ios_data_path,
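
The `PATH_TO_SCRIBE_ORG` expression above takes the parent of the script's directory and keeps everything before the first occurrence of `Scribe-Data`. A minimal sketch of that string logic follows, using `posixpath` so the result does not depend on the host OS. The `scribe_org_root` helper and the example directory are hypothetical, chosen only to illustrate the split.

```python
import posixpath


def scribe_org_root(script_dir: str) -> str:
    """Return the prefix of script_dir's parent that precedes "Scribe-Data".

    Mirrors os.path.dirname(sys.path[0]).split("Scribe-Data")[0], with
    script_dir standing in for sys.path[0].
    """
    return posixpath.dirname(script_dir).split("Scribe-Data")[0]


# Hypothetical checkout location of a format script's directory.
example_dir = (
    "/home/user/scribe-org/Scribe-Data/src/scribe_data/"
    "extract_transform/French/nouns"
)
org_root = scribe_org_root(example_dir)
src_dir = f"{org_root}Scribe-Data/src"
print(org_root)
print(src_dir)
```

Because `str.split` cuts at the first match, a parent directory whose own name contains `Scribe-Data` would shorten the result, so this approach assumes the repository directory is the first path component with that name.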
@@ -22,30 +29,50 @@
 file_path = sys.argv[0]

 update_data_in_use = False # check if update_data.py is being used
-if "French/nouns/" not in file_path:
+if f"{LANGUAGE}/nouns/" not in file_path:
     with open("nouns_queried.json", encoding="utf-8") as f:
         nouns_list = json.load(f)
 else:
-    with open("./French/nouns/nouns_queried.json", encoding="utf-8") as f:
+    update_data_in_use = True
+    with open(
+        f"../extract_transform/{LANGUAGE}/nouns/nouns_queried.json", encoding="utf-8"
+    ) as f:
         nouns_list = json.load(f)
-    update_data_in_use = True

-path_from_file = get_path_from_format_file()
-path_from_update_data = get_path_from_update_data()
-ios_data_dir_from_org = get_ios_data_path("French", "nouns")
-android_data_dir_from_org = get_android_data_path("French", "nouns")
-desktop_data_dir_from_org = get_desktop_data_path("French", "nouns")
+# Get paths to load formatted data into.
+ios_data_dir_from_org = get_ios_data_path(LANGUAGE, "nouns")
+android_data_dir_from_org = get_android_data_path(LANGUAGE, "nouns")
+desktop_data_dir_from_org = get_desktop_data_path(LANGUAGE, "nouns")

+path_from_file = get_path_from_format_file()
 ios_output_path = f"{path_from_file}{ios_data_dir_from_org}"
 android_output_path = f"{path_from_file}{android_data_dir_from_org}"
 desktop_output_path = f"{path_from_file}{desktop_data_dir_from_org}"
 if update_data_in_use:
-    ios_output_path = f"{path_from_update_data}{ios_data_dir_from_org}"
-    android_output_path = f"{path_from_update_data}{android_data_dir_from_org}"
-    desktop_output_path = f"{path_from_update_data}{desktop_data_dir_from_org}"
+    path_from_file = get_path_from_update_data()
+    ios_output_path = f"{path_from_file}{ios_data_dir_from_org}"
+    android_output_path = f"{path_from_file}{android_data_dir_from_org}"
+    desktop_output_path = f"{path_from_file}{desktop_data_dir_from_org}"

 all_output_paths = [ios_output_path, android_output_path, desktop_output_path]

+# Check to make sure that Scribe application directories are present for data updates.
+if not os.path.isdir(f"{PATH_TO_SCRIBE_ORG}Scribe-iOS"):
+    all_output_paths = [p for p in all_output_paths if p != ios_output_path]
+
+if not os.path.isdir(f"{PATH_TO_SCRIBE_ORG}Scribe-Android"):
+    all_output_paths = [p for p in all_output_paths if p != android_output_path]
+
+if not os.path.isdir(f"{PATH_TO_SCRIBE_ORG}Scribe-Desktop"):
+    all_output_paths = [p for p in all_output_paths if p != desktop_output_path]
+
+if not all_output_paths:
+    raise OSError(
+        """No Scribe project directories have been found to update.
+        Scribe-Data should be in the same directory as applications that data should be updated for.
+        """
+    )
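
The three directory checks above repeat in every format script in this commit. As a hedged sketch of how they might be factored into one place, the hypothetical `select_output_paths` helper below (not part of Scribe-Data) takes a mapping from application directory name to output path. Its `is_dir` parameter is an assumption added here so the logic can be exercised without touching the file system.

```python
import os
from typing import Callable, Dict, List


def select_output_paths(
    org_root: str,
    output_paths: Dict[str, str],
    is_dir: Callable[[str], bool] = os.path.isdir,
) -> List[str]:
    """Keep only output paths whose application directory exists in org_root.

    output_paths maps an application directory name (e.g. "Scribe-iOS")
    to the path that formatted data would be written to for that app.
    Raises OSError if none of the application directories are present.
    """
    present = [
        path
        for app_dir, path in output_paths.items()
        if is_dir(os.path.join(org_root, app_dir))
    ]
    if not present:
        raise OSError(
            "No Scribe project directories have been found to update. "
            "Scribe-Data should be in the same directory as applications "
            "that data should be updated for."
        )
    return present


# Illustrative call with a stand-in directory check: only Scribe-iOS "exists".
paths = select_output_paths(
    "/home/user/scribe-org",
    {
        "Scribe-iOS": "ios/nouns.json",
        "Scribe-Android": "android/nouns.json",
        "Scribe-Desktop": "desktop/nouns.json",
    },
    is_dir=lambda candidate: candidate.endswith("Scribe-iOS"),
)
print(paths)
```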


def map_genders(wikidata_gender):
"""
File renamed without changes.
@@ -5,13 +5,19 @@
 Formats the verbs queried from Wikidata using query_verbs.sparql.
 """

-# pylint: disable=invalid-name
+# pylint: disable=invalid-name, wrong-import-position

 import collections
 import json
+import os
 import sys

-from data.data_utils import (
+LANGUAGE = "French"
+PATH_TO_SCRIBE_ORG = os.path.dirname(sys.path[0]).split("Scribe-Data")[0]
+PATH_TO_SCRIBE_DATA_SRC = f"{PATH_TO_SCRIBE_ORG}Scribe-Data/src"
+sys.path.insert(0, PATH_TO_SCRIBE_DATA_SRC)
+
+from scribe_data.load.update_utils import (
     get_android_data_path,
     get_desktop_data_path,
     get_ios_data_path,
@@ -22,19 +28,22 @@
 file_path = sys.argv[0]

 update_data_in_use = False # check if update_data.py is being used
-if "French/verbs/" not in file_path:
+if f"{LANGUAGE}/verbs/" not in file_path:
     with open("verbs_queried.json", encoding="utf-8") as f:
         verbs_list = json.load(f)
 else:
-    with open("./French/verbs/verbs_queried.json", encoding="utf-8") as f:
+    update_data_in_use = True
+    with open(
+        f"../extract_transform/{LANGUAGE}/verbs/verbs_queried.json", encoding="utf-8"
+    ) as f:
         verbs_list = json.load(f)
-    update_data_in_use = True

+# Get paths to load formatted data into.
 path_from_file = get_path_from_format_file()
 path_from_update_data = get_path_from_update_data()
-ios_data_dir_from_org = get_ios_data_path("French", "verbs")
-android_data_dir_from_org = get_android_data_path("French", "verbs")
-desktop_data_dir_from_org = get_desktop_data_path("French", "verbs")
+ios_data_dir_from_org = get_ios_data_path(LANGUAGE, "verbs")
+android_data_dir_from_org = get_android_data_path(LANGUAGE, "verbs")
+desktop_data_dir_from_org = get_desktop_data_path(LANGUAGE, "verbs")

 ios_output_path = f"{path_from_file}{ios_data_dir_from_org}"
 android_output_path = f"{path_from_file}{android_data_dir_from_org}"
@@ -46,6 +55,23 @@

 all_output_paths = [ios_output_path, android_output_path, desktop_output_path]

+# Check to make sure that Scribe application directories are present for data updates.
+if not os.path.isdir(f"{PATH_TO_SCRIBE_ORG}Scribe-iOS"):
+    all_output_paths = [p for p in all_output_paths if p != ios_output_path]
+
+if not os.path.isdir(f"{PATH_TO_SCRIBE_ORG}Scribe-Android"):
+    all_output_paths = [p for p in all_output_paths if p != android_output_path]
+
+if not os.path.isdir(f"{PATH_TO_SCRIBE_ORG}Scribe-Desktop"):
+    all_output_paths = [p for p in all_output_paths if p != desktop_output_path]
+
+if not all_output_paths:
+    raise OSError(
+        """No Scribe project directories have been found to update.
+        Scribe-Data should be in the same directory as applications that data should be updated for.
+        """
+    )
+
 verbs_formatted = {}

 all_keys = [
File renamed without changes.
@@ -5,13 +5,19 @@
 Formats the nouns queried from Wikidata using query_nouns.sparql.
 """

-# pylint: disable=invalid-name
+# pylint: disable=invalid-name, wrong-import-position

 import collections
 import json
+import os
 import sys

-from data.data_utils import (
+LANGUAGE = "German"
+PATH_TO_SCRIBE_ORG = os.path.dirname(sys.path[0]).split("Scribe-Data")[0]
+PATH_TO_SCRIBE_DATA_SRC = f"{PATH_TO_SCRIBE_ORG}Scribe-Data/src"
+sys.path.insert(0, PATH_TO_SCRIBE_DATA_SRC)
+
+from scribe_data.load.update_utils import (
     get_android_data_path,
     get_desktop_data_path,
     get_ios_data_path,
@@ -22,19 +28,22 @@
 file_path = sys.argv[0]

 update_data_in_use = False # check if update_data.py is being used
-if "German/nouns/" not in file_path:
+if f"{LANGUAGE}/nouns/" not in file_path:
     with open("nouns_queried.json", encoding="utf-8") as f:
         nouns_list = json.load(f)
 else:
-    with open("./German/nouns/nouns_queried.json", encoding="utf-8") as f:
+    update_data_in_use = True
+    with open(
+        f"../extract_transform/{LANGUAGE}/nouns/nouns_queried.json", encoding="utf-8"
+    ) as f:
         nouns_list = json.load(f)
-    update_data_in_use = True

+# Get paths to load formatted data into.
 path_from_file = get_path_from_format_file()
 path_from_update_data = get_path_from_update_data()
-ios_data_dir_from_org = get_ios_data_path("German", "nouns")
-android_data_dir_from_org = get_android_data_path("German", "nouns")
-desktop_data_dir_from_org = get_desktop_data_path("German", "nouns")
+ios_data_dir_from_org = get_ios_data_path(LANGUAGE, "nouns")
+android_data_dir_from_org = get_android_data_path(LANGUAGE, "nouns")
+desktop_data_dir_from_org = get_desktop_data_path(LANGUAGE, "nouns")

 ios_output_path = f"{path_from_file}{ios_data_dir_from_org}"
 android_output_path = f"{path_from_file}{android_data_dir_from_org}"
File renamed without changes.