Learn Like An LLM is a project designed to help users understand and engage with language models by simulating the experience of guessing masked words in sentences. The initial implementation uses a BERT-based model to predict probable words for each mask and the all-MiniLM-L6-v2 model to compute the cosine similarity between the user's guess and the original text. Future expansions aim to incorporate more sophisticated models, improve the scoring and difficulty systems, and provide a user-friendly frontend interface.
Learn Like An LLM offers an interactive way to engage with language models by allowing users to guess masked words in sentences. The project demonstrates how language models can be used to learn a new language by providing more contextual feedback than just "correct" or "incorrect." This is a fun and educational tool for those interested in natural language processing and machine learning.
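The snippet below is a minimal sketch of how this kind of contextual feedback can be produced with the two models named above. It is illustrative only, not the project's actual code; the example sentence, the guessed word, and the variable names are assumptions.

```python
# Illustrative only: scoring a guess with a BERT fill-mask model and
# all-MiniLM-L6-v2 sentence embeddings. Not the project's actual code.
from transformers import pipeline
from sentence_transformers import SentenceTransformer, util

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
embedder = SentenceTransformer("all-MiniLM-L6-v2")

original_word = "mat"                              # the word that was masked
guess = "rug"                                      # the user's guess
masked_sentence = "The cat sat on the [MASK]."

# Probability BERT assigns to the guessed word in the masked slot.
prediction = fill_mask(masked_sentence, targets=[guess])[0]
fitness = prediction["score"]

# Cosine similarity between the guess and the original word.
vectors = embedder.encode([original_word, guess])
similarity = util.cos_sim(vectors[0], vectors[1]).item()

print(f"fitness={fitness:.3f}  similarity={similarity:.3f}")
```

A close-but-not-exact guess like "rug" would typically earn a nontrivial fitness and a high similarity, which is richer feedback than a simple "incorrect."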
To run this project, you'll need Python 3.11 or higher and the required dependencies. Follow these steps to set up the project:
- Clone the repository:

  ```bash
  git clone https://github.com/your-username/learn-like-an-llm.git
  cd learn-like-an-llm
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download the NLTK data required for tokenization:

  ```python
  import nltk
  nltk.download('punkt')
  ```
To start the word masking and guessing game, run the `main.py` script:

```bash
python src/main.py
```
You will be prompted to choose a language (e.g., 'english', 'spanish', 'french'). The program will then load the corresponding text corpus from `data/corpus/<language>` and begin the interactive guessing game. You will guess the missing word in masked sentences and receive feedback on each guess, including a 'fitness' score that measures how well the language model thinks your word fits the context of the sentence. The higher the fitness score (ranging from 0 to 1), the better the model (in this case: you) performs.
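For illustration, the sketch below shows one way a sentence could be pulled from the corpus directory and masked. It is an assumption about the mechanics rather than the project's implementation: the function name `pick_masked_sentence` is hypothetical, and it assumes the corpus is stored as plain `.txt` files.

```python
# Illustrative sketch only: how a corpus sentence might be chosen and masked.
# The directory layout follows the README; the function name is hypothetical.
import random
from pathlib import Path

import nltk


def pick_masked_sentence(language: str, mask_token: str = "[MASK]") -> tuple[str, str]:
    """Return a sentence with one word replaced by the mask token, plus the answer."""
    corpus_dir = Path("data/corpus") / language
    text = " ".join(path.read_text(encoding="utf-8") for path in corpus_dir.glob("*.txt"))
    sentences = [
        s for s in nltk.sent_tokenize(text)
        if any(w.isalpha() for w in nltk.word_tokenize(s))
    ]
    words = nltk.word_tokenize(random.choice(sentences))
    # Mask only alphabetic tokens so punctuation is never the hidden word.
    target = random.choice([i for i, w in enumerate(words) if w.isalpha()])
    answer = words[target]
    words[target] = mask_token
    return " ".join(words), answer


masked, answer = pick_masked_sentence("english")
print(masked)
```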
- Language Models: Use BERT and MiniLM models for word prediction and similarity analysis.
- Interactive Game: Engage with the language model by guessing masked words in sentences.
- Custom Corpora: Load and preprocess text corpora for analysis.
- User-Uploaded Corpora: Play with your own corpora by adding text files to the `data/corpus/<language>` directory.
- Automatic Translation: The original and user-input text are automatically translated to English after the masked word is guessed (see the sketch after this list).
- Frontend Interface: Develop a user-friendly frontend for easier interaction.
- Performance Metrics: Provide detailed performance metrics and analysis.
- Improved Difficulty Levels: Improve the way the masked words and sentences are chosen to provide a more challenging learning experience.
- Improved Scoring System: Develop a more sophisticated scoring system based on user performance.
- Progress Tracking: Track user progress and provide personalized feedback.
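As a rough illustration of the translation step, the following sketch uses a MarianMT translation pipeline from Hugging Face `transformers`. This is an assumption: the project may rely on a different translation library or service, and the model name and language pair here are chosen only for the example.

```python
# Illustrative assumption: translating a Spanish sentence to English with a
# MarianMT model via the transformers pipeline. The project may translate differently.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")

original = "El gato se sentó en la alfombra."
result = translator(original)[0]["translation_text"]
print(result)
```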
I welcome contributions from the community! If you'd like to contribute, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Commit your changes and push your branch to your fork.
- Submit a pull request with a detailed description of your changes.
This project is licensed under the GPL-3.0 License. See the LICENSE file for more details.