Syntax trees, morphology, and linguistic annotations for the Hebrew Bible

Freely-Given-org/macula-hebrew

 
 

macula-hebrew (מכלה)

Syntax trees, morphology, and linguistic annotations for the Hebrew Bible

This repository contains the MACULA linguistic datasets for the Hebrew Bible, including data from:

  1. The text of the Westminster Leningrad Codex, released into the public domain by the Groves Center, and available at tanach.us.
  2. Morphology from the Open Scriptures Hebrew Bible, available on GitHub.
  3. Syntax trees developed by Clear Bible, Inc. together with the Groves Center. (Note: Clear was known as Global Bible Initiative from 2014 to 2020, and as Asia Bible Society before that.) Recently, the Groves Center graciously released Westminster Hebrew Syntax without Morphology under a Creative Commons CC BY 4.0 license.
  4. Word sense data from the United Bible Societies MARBLE project, based on the Semantic Dictionary of Biblical Hebrew.
  5. Cherith Glosses for the Hebrew Old Testament, by Andi Wu, Copyright (C) 2022 by Cherith Analytics, is licensed under a Creative Commons Attribution 4.0 International License ("CC BY 4.0").
  6. Semantic roles: Who does what to whom? (Agent, Verb, Patient …)
  7. Participant referents: Who is “he,” “she,” or “it” in this sentence?

We are adding further datasets, one at a time.

This data has been combined into a single set of trees. There are three variants of this data, found in the following directories:

  1. WLC/nodes contains this data in a set of nested Node elements suitable for many NLP systems and other systems that use recursive algorithms.
  2. WLC/lowfat contains the same data in a form more suitable for some kinds of query systems and some kinds of display.
  3. WLC/tsv contains the word-level data in a TSV table, without syntactic tree structure. This is simpler for many programs that do not need the complexity of graph structures.
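
To illustrate the difference between the nested and the flat variants, here is a minimal Python sketch. It does not read the real files: the sample XML, the attribute name `Cat`, and the TSV column names `id`, `text`, and `pos` are all assumptions made up for illustration, so check the actual files in WLC/nodes and WLC/tsv for the real schema before adapting it.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample in the spirit of the nested "nodes" variant.
# The attribute name "Cat" and its values are illustrative assumptions.
SAMPLE_NODES = """
<Node Cat="S">
  <Node Cat="vp">
    <Node Cat="verb">ברא</Node>
  </Node>
  <Node Cat="np">
    <Node Cat="noun">אלהים</Node>
  </Node>
</Node>
"""

# Hypothetical sample in the spirit of the flat TSV variant.
# The column names are assumptions, not the real header.
SAMPLE_TSV = "id\ttext\tpos\nw1\tברא\tverb\nw2\tאלהים\tnoun\n"


def leaf_words(node):
    """Recursively collect the text of leaf elements, in document order."""
    children = list(node)
    if not children:
        text = (node.text or "").strip()
        return [text] if text else []
    words = []
    for child in children:
        words.extend(leaf_words(child))
    return words


def read_word_rows(tsv_text):
    """Read a TSV table into one dict per word; no tree structure involved."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


tree_words = leaf_words(ET.fromstring(SAMPLE_NODES.strip()))
rows = read_word_rows(SAMPLE_TSV)
table_words = [row["text"] for row in rows]
```

The nested form needs a recursive walk to reach the words, but it keeps the phrase structure available at every level. The table form yields the same words with a single pass and no recursion, which is the trade-off the TSV variant is meant to offer.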

Copyright statements for the individual sources can be found in the MACULA Hebrew license.
