This repository hosts files and conversion scripts related to the St. Lawrence Island / Siberian Yupik Eskimo Dictionary compiled by Linda Womkon Badten (Aghnaghaghpik), Vera Oovi Keneshiro (Uqiitlek), Marie Oovi (Uvegtu), and Christopher Koonooka (Petuwaq); edited by Steven A. Jacobson; and published by the Alaska Native Language Center at the University of Alaska Fairbanks in 2008.
The code directory contains the following files:
The html2xml.py script requires Python 3.7 or later and the BeautifulSoup4 Python library, which can be installed in a local virtual environment as follows:
python3.7 -m venv venv
source venv/bin/activate
pip3 install beautifulsoup4
deactivate
The code can then be run as follows:
source venv/bin/activate
./code/html2xml.py data/foo.html
deactivate
where base or postbase is substituted for foo as appropriate.
This code cleans up the formatting tags and whitespace in the original HTML file and creates the XML files. The script should contain all of the tweaks and hacks needed to handle any oddities in the original source files. The original source files themselves should not be edited.
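As a rough illustration of the kind of cleanup described above, the following sketch collects the text of each paragraph while dropping inline formatting tags and collapsing runs of spaces and newlines. It is not the code in html2xml.py: it uses only the Python standard library instead of BeautifulSoup4, the names TextExtractor and clean_paragraphs are hypothetical, and the sample markup is an assumption about what the exported HTML might look like.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of each <p> element, ignoring inline formatting tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []  # cleaned text of each completed <p>
        self._chunks = None   # text pieces of the <p> being read, else None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._chunks = []

    def handle_data(self, data):
        # Text inside <span>, <b>, <i>, ... arrives here without its tags.
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._chunks is not None:
            text = "".join(self._chunks)
            # Collapse runs of spaces and newlines; tabs are left untouched.
            text = re.sub(r"[ \r\n]+", " ", text).strip(" ")
            self.paragraphs.append(text)
            self._chunks = None


def clean_paragraphs(html):
    """Return the cleaned text of every <p> element in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Hypothetical sample markup, not taken from the data directory.
sample = '<p class="p1"><span class="s1">aa </span>  <b>yes</b>\n</p>'
print(clean_paragraphs(sample))
```

Because the cleanup operates on a string read from the source file, the source file itself is never modified.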
The data directory contains the following files that were extracted from the original FileMaker Pro source files:
These files should not be edited.
The RTF files were produced using the following procedure:
- The relevant Filemaker Pro file was opened.
- Within Filemaker Pro, the Script Workspace was opened, and a new script was created. The scripting command Copy All Records/Requests was added to the script, and the script was saved and then run.
- At this point, the macOS clipboard contained the contents of the Filemaker Pro file in rich text format (RTF).
- The macOS application TextEdit was opened, and a new document was created. The contents of the clipboard were pasted into the new document, which was then saved.
Each line in the resulting RTF file contains one dictionary entry from the Filemaker Pro file. The fields within each line are delineated with a tab character.
The HTML files were produced using the following procedure:
- The relevant Filemaker Pro file was opened.
- Within Filemaker Pro, the Script Workspace was opened, and a new script was created. The scripting command Copy All Records/Requests was added to the script, and the script was saved and then run.
- At this point, the macOS clipboard contained the contents of the Filemaker Pro file in rich text format (RTF).
- The macOS application Terminal was opened, and the command
pbpaste -Prefer rtf | textutil -stdin -convert html -output foo.html
was run, where base or postbase was substituted for foo as appropriate.
The raw XML files were produced using the following procedure:
- The relevant Filemaker Pro file was opened.
- Within Filemaker Pro, Export Records... was selected from the File... menu, and XML was selected as the export type.
- FMPXMLRESULT was selected as the Grammar type, and no XSL style sheet was used.
- In Specify Field Order for Export, all fields were selected. The fields were listed in the Field export order in case-insensitive alphabetical order.
- The file was exported.
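A file exported with the FMPXMLRESULT grammar lists its field names in a METADATA element and its records as ROW elements inside RESULTSET, with one COL per field. The following minimal sketch shows one way to read such a file into dictionaries using only the Python standard library. The function name read_records is hypothetical, and the field names and values in the sample are placeholders, not fields from the dictionary files.

```python
import xml.etree.ElementTree as ET

# Namespace used by the FMPXMLRESULT grammar.
NS = {"fmp": "http://www.filemaker.com/fmpxmlresult"}


def read_records(xml_text):
    """Return one dict per ROW, keyed by the field names in METADATA."""
    root = ET.fromstring(xml_text)
    names = [f.get("NAME") for f in root.findall("fmp:METADATA/fmp:FIELD", NS)]
    records = []
    for row in root.findall("fmp:RESULTSET/fmp:ROW", NS):
        values = []
        for col in row.findall("fmp:COL", NS):
            # A COL may hold more than one DATA element; join them with newlines.
            data = [d.text or "" for d in col.findall("fmp:DATA", NS)]
            values.append("\n".join(data))
        records.append(dict(zip(names, values)))
    return records


# Placeholder sample; the field names "headword" and "gloss" are assumptions.
sample = """<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
  <ERRORCODE>0</ERRORCODE>
  <METADATA>
    <FIELD NAME="headword" TYPE="TEXT"/>
    <FIELD NAME="gloss" TYPE="TEXT"/>
  </METADATA>
  <RESULTSET FOUND="1">
    <ROW MODID="0" RECORDID="1">
      <COL><DATA>example-headword</DATA></COL>
      <COL><DATA>example gloss</DATA></COL>
    </ROW>
  </RESULTSET>
</FMPXMLRESULT>"""

print(read_records(sample))
```

Pairing the COL values with the METADATA field names by position relies on the export keeping both in the same order.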
The XLSX files were produced using the following procedure:
- The relevant Filemaker Pro file was opened.
- Within Filemaker Pro, the Save/Send Records As... Excel command was run.
Each line in the resulting XLSX file contains one dictionary entry from the Filemaker Pro file. NOTE: A few fields contain the newline character.
The ODS files were produced using the following procedure:
- The relevant Filemaker Pro file was opened.
- Within Filemaker Pro, the Save/Send Records As... Excel command was run.
- The newly created XLSX file was opened using Microsoft Excel for Mac version 16.38.
- Within Microsoft Excel, the Save As... command was run, selecting OpenDocument Spreadsheet (.ods) as the file format.
Each line in the resulting ODS file contains one dictionary entry from the Filemaker Pro file. NOTE: A few fields contain the newline character.
The TSV files were produced using the following procedure:
- The relevant Filemaker Pro file was opened.
- Within Filemaker Pro, the Save/Send Records As... Excel command was run.
- The newly created XLSX file was opened using Microsoft Excel for Mac version 16.38.
- Within Microsoft Excel, the Save As... command was run, selecting OpenDocument Spreadsheet (.ods) as the file format.
- The newly created ODS file was opened using LibreOffice version 6.0.6.2.
- Within LibreOffice, the Find & Replace command was chosen.
- Within the Find & Replace dialog box, Regular expressions was checked, the Find value was set to \n, the Replace value was set to four space characters, and Replace All was run.
- Within LibreOffice, the Save As... command was run, selecting Text CSV (.csv) as the file type. The Edit Filter Settings box was checked. During export, the character set was Unicode (UTF-8), the Field delimiter was {Tab}, the String delimiter was set to the empty string, and the remaining checkboxes were unchecked.
Each line in the resulting TSV file contains one dictionary entry from the Filemaker Pro file. NOTE: No fields in this file contain the newline character.
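Since the TSV files use a tab as the field delimiter, have no string delimiter, and contain no newlines inside fields, they can be read one line per entry. The sketch below shows one way to do this with Python's csv module. The function name read_tsv and the sample row are hypothetical, and whether the first line of a given file is a header row is not assumed here.

```python
import csv
import io


def read_tsv(handle):
    """Return a list of rows, each a list of field strings."""
    # QUOTE_NONE mirrors the empty string delimiter used during export:
    # quotation marks inside a field are treated as ordinary characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader]


# Hypothetical sample line; the four spaces stand in for a former newline.
sample = 'field one\tline a    line b\tsaid "hello"\n'
rows = read_tsv(io.StringIO(sample))
print(rows)

# For a real file, open it as UTF-8 with newline="" as the csv module expects:
# with open(path, encoding="utf-8", newline="") as handle:
#     rows = read_tsv(handle)
```

A field that originally contained newlines keeps its four-space separators as ordinary text, so they cannot be told apart from any other run of four spaces.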
This resource is part of the linguistic and cultural heritage of the St. Lawrence Island Yupik people. By accessing this resource, you agree to treat this resource, the St. Lawrence Island Yupik language, the St. Lawrence Island Yupik culture, and the St. Lawrence Island Yupik people with dignity and respect. If you do not agree to these conditions, you may not access this resource and you may not make copies of this resource.
If you agree to these conditions, you may access this resource under the terms of the Creative Commons Attribution-NonCommercial 4.0 International license (https://creativecommons.org/licenses/by-nc/4.0/).