Wikitongues Language Indexing

This is the web crawling tool that powers language indexing at Wikitongues. The tool works by visiting a number of online platforms and gathering links related to every language in the world. Wikitongues will provide these links as a resource to anyone who wants to learn, study, or revitalize their language.
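The tool itself crawls with Scrapy. As a rough, standard-library-only illustration of the "gather links related to a language" idea, the sketch below parses one already-downloaded HTML page and groups its links by the language names they mention. LinkCollector and links_by_language are hypothetical names, not part of this repository. The plain substring matching is a deliberate simplification, not how the real crawler decides relevance.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute URL, anchor text) pairs from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []    # list of (url, text) tuples
        self._href = None  # absolute href of the <a> tag currently open
        self._text = []    # text fragments seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as "/wiki/..." against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_by_language(html, base_url, language_names):
    """Group links whose URL or anchor text mentions a language name (hypothetical helper)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    found = {name: [] for name in language_names}
    for url, text in parser.links:
        haystack = (url + " " + text).lower()
        for name in language_names:
            # Naive case-insensitive substring match; skip duplicate URLs.
            if name.lower() in haystack and url not in found[name]:
                found[name].append(url)
    return found
```

For example, a page containing `<a href="/wiki/Quechua">Quechua videos</a>` fetched from https://example.org would yield https://example.org/wiki/Quechua under the "Quechua" key.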

Background

With nearly four billion people online, there has been an explosion of mother-tongue content in the form of memes, YouTube channels, public feeds on WhatsApp and Telegram, and other kinds of accessible media. In addition, there are two centuries of linguistic research gathering dust in university archives. How many of the world's 7,000 languages are already accessible online? With web crawling, we plan to find an answer.

Learn more about the project here.

Setup

Prerequisites:

To run this tool, Python 3 must be installed on your system.

Install locally:

Clone the repository and run from the root directory:

pip install .

Run

language-indexing

Develop

This project uses Scrapy, a web crawling framework.

Using a virtual environment for development is recommended.

Use a virtual environment:

Run this in the root directory to set up a virtual environment. You'll only need to do this once. This creates an env folder in the root directory, which will contain the project dependencies as well as the Python executable itself.

python -m venv env
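For scripted setups, the same environment can be created from Python with the standard-library venv module. This is a sketch that relies only on the standard library; create_env and env_python are illustrative names, not part of this repository. It also shows where the environment's own Python executable ends up on each platform.

```python
import os
import venv
from pathlib import Path


def env_python(path="env"):
    """Return the path of the environment's own Python executable."""
    if os.name == "nt":
        return Path(path) / "Scripts" / "python.exe"
    return Path(path) / "bin" / "python"


def create_env(path="env", with_pip=True):
    """Create a virtual environment, roughly like `python -m venv env`."""
    # `python -m venv` symlinks the interpreter everywhere except Windows.
    builder = venv.EnvBuilder(with_pip=with_pip, symlinks=(os.name != "nt"))
    builder.create(path)
    return env_python(path)
```

Calling create_env("env") from the root directory should produce the same env folder as the command above.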

Activate virtual environment:

Run this in the root directory to activate the virtual environment. Do this whenever you open the project in a new shell. This makes the installed dependencies available. The command line should indicate when "env" is active.

Mac/Unix/Linux:

source env/bin/activate

Windows:

env\Scripts\activate.bat
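If the prompt does not make it obvious, a quick check from Python can confirm that the environment's interpreter is the one being run. This relies on the standard sys.prefix and sys.base_prefix attributes and applies to environments created with venv; in_virtualenv is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python installation it was built from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("env active" if in_virtualenv() else "env NOT active")
```

Saved as a script and run with python after activation, it should report that the environment is active.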

Install in virtual environment

Run this after creating the virtual environment, and again whenever you change the code. This installs the project and its dependencies into the active environment. You'll then be able to run the tool with the language-indexing command.

pip install .
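After installing, the language-indexing command should resolve to a script inside the active environment. A small standard-library check using shutil.which can confirm this; command_available is an illustrative helper, not part of the project.

```python
import shutil
import sys


def command_available(name="language-indexing"):
    """Return the full path of a command found on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    path = command_available()
    if path is None:
        sys.exit("language-indexing not found on PATH; is the environment active?")
    print("language-indexing installed at", path)
```

If the check fails, activating the environment and rerunning pip install . is the first thing to try.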

Run style guide check

Install flake8 if it is not already on your system. Run from the root directory:

flake8

Contribute

We're looking for help developing this tool. We invite contributors of any skill level. Please contact Scott if you are interested!
