llm-datasets

Here are 17 public repositories matching this topic...

neo4j-labs / text2cypher

Collection of text2cypher datasets, evaluations, and fine-tuning instructions

neo4j graph cypher cypher-query-language llm llms llm-training llm-datasets text2cypher

Updated Jun 13, 2024
Jupyter Notebook

dsdanielpark / open-llm-datasets

Repository for organizing datasets and papers used in Open LLM.

natural-language-processing datasets large-language-models llm llm-training llm-datasets

Updated Jul 6, 2023

discus-labs / discus

A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ

python openai gpt synthetic-data fine-tuning synthetic-dataset-generation ner-data huggingface-transformers gpt-4 large-language-models llms llm-training llm-datasets fine-tuning-llm

Updated Nov 20, 2023
Python

asimsinan / LLM-Research

A collection of LLM-related papers, theses, tools, datasets, courses, open-source models, and benchmarks

arxiv-papers large-language-models llm llms llm-datasets llm-tools buyuk-dil-modelleri llm-research llm-theses llm-benchmarking llm-frameworks

Updated Oct 8, 2024
Python

altunenes / rustysozluk

Efficiently fetch entries from eksisozluk.com and run sentiment analysis on them (Turkish only), using Rust

rust scraper sentiment-analysis turkish eksisozluk rust-lang webscraping eksi-sozluk reqwest duyguanalizi rust-scraping llm-training llm-datasets

Updated Feb 8, 2024
Rust

amao0o0 / awesome-AI-Math-Datasets

A collection of recent open-source math datasets for training and evaluating Math LLMs

math mathematics llm ai4math llm-datasets math-llm

Updated Apr 15, 2025

A framework to analyze how AGI/ASI might emerge from decentralized, adaptive systems, rather than as the fruit of a single model deployment. It also aims to present orientation as a dynamic and self-evolving Magna Carta, helping to guide the emergence of such phenomena.

machine-learning agi dataset artificial-general-intelligence machine-learning-library datasets quantum-field-theory machine-learning-projects artificial-gene-regulatory-networks llm llms rlhf llm-datasets quantum-fields llm-framework llms-benchmarking llm-benchmarking artificial-general-super-intelligence agi-development

Updated Apr 30, 2025

neuralwork / audio2chat

Convert multi-speaker audio files to structured chat data for LLMs

chat transcription whisper speaker-diarization llm llm-datasets

Updated Jan 29, 2025
Python

DefinetlyNotAI / LLM_Data

Source code from a number of very well-known Python repositories, collected in this repo as plain localdocs for training code AI

c data cpp cuda jupyter-notebook python3 code-examples llm llm-datasets data-dum programming-data programming-data-sets llm-code

Updated Dec 12, 2024
Python

tiddly-gittly / TiddlyWiki-LLM-dataset

WikiText syntax dataset generation pipeline and open dataset for auto UI generation in TiddlyWiki. (WIP)

dataset tiddlywiki wikitext llm llm-training llm-datasets

Updated Nov 20, 2024
TypeScript

arian-askari / SOLID

Synthetically Generating Intent-Aware Information-Seeking Dialogues! Useful for various tasks such as training and evaluating User Intent Predictors, with the option of training and evaluating on real human dialogues. The backbone LLM of SOLID is Zephyr-7b-beta.

solid dataset-generation conversational-ai intent-classification llm-training llm-inference llm-datasets llm-dialogs llm-conversations zephyr-7b-beta intent-aware-conversation-generation solid-rl

Updated Aug 18, 2024
Python

dmeldrum6 / LLMDatasetBuilder

LLM-Powered Dataset Creation Tool

llm llm-training llm-datasets

Updated Mar 13, 2025
HTML

redblock-ai / parrot-python

PARROT (Performance Assessment of Reasoning and Responses On Trivia) is a novel benchmarking framework designed to evaluate Large Language Models (LLMs) on real-world, complex, and ambiguous QA tasks.

benchmarking-framework llm-inference llm-datasets llm-qa-document llm-benchmarking