This is a compilation of the text of Mahabharata from the following sources
- Complete translation by K. M. Ganguli: the complete text, taken from this GitHub repository
- Laura Gibbs' Tiny Tales: a retelling of the Mahabharata in two hundred episodes of 100 words each
- Kaggle data repo by Tilak: all 18 parvas of the Mahabharata in .txt format for NLP
- Wikipedia Parva Summaries
The text was copied from the sources listed above.
Python notebooks for parsing the data into CSV files.
Notebooks for processing the data. This directory contains the NER notebook for computing named entities for the text chunks. I am using the following model for Named Entity Recognition:
2rtl3/mn-xlm-roberta-base-named-entity, via the Hugging Face transformers library.
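A minimal sketch of how the NER tagging step might look; the helper names and output fields here are illustrative assumptions, not the notebook's actual code:

```python
def build_ner(model_name="2rtl3/mn-xlm-roberta-base-named-entity"):
    """Load the Hugging Face NER pipeline (downloads the model on first use)."""
    from transformers import pipeline  # heavy import kept local

    return pipeline("ner", model=model_name, aggregation_strategy="simple")


def tag_chunk(ner, chunk_id, text):
    """Run NER over one text chunk and key each entity by the chunk's UUID."""
    return [
        {
            "chunk_id": chunk_id,
            "entity": ent["word"],
            "label": ent["entity_group"],
            "score": float(ent["score"]),
        }
        for ent in ner(text)
    ]
```

`tag_chunk` takes the pipeline as a parameter, so the same keying logic can be reused (or unit-tested) with any callable that returns entity dictionaries.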
Contains the final output of the data-parsing notebooks: pandas dataframes saved as |-delimited CSV files. All the metadata, including the source, chapter, section, etc., is maintained as columns in the CSV. Each CSV has a text column containing a text chunk of 100 to 500 tokens. Each row also has a chunk_id, which is a UUID; this chunk_id is used to index the named entities in the named-entities dataframes.
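Loading one of these CSVs with pandas might look like the following (the column names and sample row are illustrative; check the actual header of each file):

```python
import io
import uuid

import pandas as pd

# Stand-in for one of the |-delimited data CSVs (columns are illustrative).
raw = (
    "chunk_id|source|parva|section|text\n"
    f"{uuid.uuid4()}|ganguli|1|2|Dhritarashtra said: O Sanjaya, tell me of the battle.\n"
)

# The pipe delimiter means commas inside the text column need no quoting.
df = pd.read_csv(io.StringIO(raw), sep="|")
print(df.columns.tolist())
```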
Each data CSV also has a corresponding named-entities CSV. The chunk_id is used as an index for tagging named entities to their corresponding chunks.
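Because chunk_id is shared between the two files, entities can be joined back to their chunks with a standard merge. A sketch with made-up rows:

```python
import pandas as pd

chunks = pd.DataFrame({
    "chunk_id": ["id-1", "id-2"],
    "text": ["Arjuna strung his bow.", "Bhima roared."],
})
entities = pd.DataFrame({
    "chunk_id": ["id-1", "id-2"],
    "entity": ["Arjuna", "Bhima"],
    "label": ["PER", "PER"],
})

# Left-join so chunks with no tagged entities are still kept.
joined = chunks.merge(entities, on="chunk_id", how="left")
```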
Note: If you regenerate the data CSV files, you must also regenerate the named entities; otherwise the chunk_id values in the named-entity dataframes will not correspond to the regenerated CSV rows.
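One way to catch stale chunk_ids after a partial regeneration is a simple set check (a sketch; the dataframe names and rows are placeholders):

```python
import pandas as pd

def stale_chunk_ids(data_df, ner_df):
    """Return chunk_ids referenced by the named-entities frame but absent
    from the data frame -- a sign the data CSVs were regenerated alone."""
    return set(ner_df["chunk_id"]) - set(data_df["chunk_id"])

# "old-1" was issued before regeneration, so it no longer matches any row.
data_df = pd.DataFrame({"chunk_id": ["new-1", "new-2"]})
ner_df = pd.DataFrame({"chunk_id": ["old-1", "new-2"]})
```

An empty result means every tagged entity still resolves to a chunk.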