weave

🚀 Transform your data with AI-powered synthetic generation and augmentation

Weave is a Python framework for creating synthetic datasets with large language models (LLMs). Whether you're training ML models, testing applications, or augmenting existing datasets, Weave helps you generate diverse, realistic data at scale.

✨ Why Weave?

🎯 Production-Ready Data Generation: Create synthetic datasets that mirror real-world complexity and edge cases
🔄 Smart Data Augmentation: Enhance your training data with intelligent noise and variations
🎨 Style Transfer & Persona Simulation: Generate content in different writing styles and personas
🌍 Multi-Language Support: Work with content across different languages and domains
🔍 Context-Aware Transformations: Maintain coherence and relevance in your synthetic data
📊 Advanced Dataset Management: Seamlessly merge and manage synthetic and real datasets

🚀 Quick Start

pip install git+https://github.com/ashikshafi08/weave.git

from weave.noisers import StyleTransferNoiser
from weave.llms import OpenAILLM

# Initialize with your favorite LLM
llm = OpenAILLM(model="gpt-4o-mini")

# Configure a noiser that rewrites text in a technical-documentation style
noiser = StyleTransferNoiser(
    model_connector=llm,
    style_config={"style": "technical_documentation"}
)

# Transform casual text into technical documentation
casual_text = "This code helps you make fake data that looks real"
technical_doc = noiser.augment(casual_text)
print(technical_doc)
# Example output (LLM responses vary between runs):
# "This framework facilitates the generation of synthetic data
#  that accurately simulates real-world characteristics..."

🎯 Use Cases

Data Augmentation

from weave.datasets import DatasetLoader, DatasetMerger

# Load your existing dataset
loader = DatasetLoader()
real_data = loader.load("path/to/data.csv")

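# Generate complementary synthetic data.
# NOTE: generate_synthetic_samples is a placeholder for your own generation
# step; it is not imported or defined in this snippet.
synthetic_data = generate_synthetic_samples(real_data)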

# Intelligently merge real and synthetic data
merger = DatasetMerger()
enhanced_dataset = merger.merge(
    real_data, 
    synthetic_data,
    strategy="mix",
    ratio=0.3  # 30% synthetic data
)
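
To make the ratio argument concrete, here is a minimal plain-Python sketch of one possible reading of a "mix" strategy, in which ratio is the share of synthetic rows in the final dataset. The helper mix_datasets and the "source" field are hypothetical and are not Weave's implementation; DatasetMerger may interpret ratio differently.

```python
import random


def mix_datasets(real, synthetic, ratio=0.3, seed=0):
    """Hypothetical helper: keep all real rows, add enough synthetic rows
    so that they make up roughly `ratio` of the result, then shuffle."""
    if not 0 <= ratio < 1:
        raise ValueError("ratio must be in [0, 1)")
    # Solve n_syn / (n_real + n_syn) = ratio for n_syn.
    n_syn = round(len(real) * ratio / (1 - ratio))
    n_syn = min(n_syn, len(synthetic))  # cannot use more rows than we have
    rng = random.Random(seed)
    mixed = list(real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mixed)
    return mixed


real_rows = [{"text": f"real {i}", "source": "real"} for i in range(70)]
synthetic_rows = [
    {"text": f"synthetic {i}", "source": "synthetic"} for i in range(100)
]
mixed = mix_datasets(real_rows, synthetic_rows, ratio=0.3)
share = sum(row["source"] == "synthetic" for row in mixed) / len(mixed)
print(len(mixed), round(share, 2))
```

Keeping every real row and sampling only the synthetic side means the real data is never discarded; a merger that instead fixes the total size would need to subsample both sides.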

Multi-Style Content Generation

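from weave.llms import OpenAILLM
from weave.noisers import LanguageNoiser, SentimentNoiser

# Same LLM connector as in the Quick Start
llm = OpenAILLM(model="gpt-4o-mini")

# Create content variations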
lang_noiser = LanguageNoiser(
    model_connector=llm,
    language_config={
        "language": "en",
        "locale": "UK"
    }
)

sentiment_noiser = SentimentNoiser(
    model_connector=llm,
    sentiment_config={
        "target_sentiment": "positive",
        "intensity": 0.8
    }
)

# Transform content
uk_text = lang_noiser.augment("Color the background blue")
# Example output (may vary): "Colour the background blue"

positive_review = sentiment_noiser.augment("The service was okay")
# Example output (may vary): "The service exceeded my expectations!"

📚 Documentation

Check out the detailed guides in the docs directory and the runnable samples in the examples directory to see Weave in action.

🛠️ Features

Advanced Noisers

Style Transfer: Transform content between different writing styles
Language Adaptation: Handle language-specific nuances and variations
Domain-Specific Errors: Simulate realistic mistakes and edge cases
Sentiment Transformation: Adjust content tone and emotional impact
Context-Aware Noising: Maintain coherence across transformations
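
The noisers above are LLM-backed, but the idea behind simulated mistakes can be illustrated without a model. The following is a minimal rule-based sketch; the class AdjacentSwapNoiser is hypothetical and not part of Weave. It only mirrors the augment(text) call shape used in the Quick Start.

```python
import random


class AdjacentSwapNoiser:
    """Hypothetical, LLM-free noiser: swaps adjacent letters inside words
    to imitate typing mistakes. Not part of Weave's API."""

    def __init__(self, error_rate=0.1, seed=None):
        self.error_rate = error_rate
        self.rng = random.Random(seed)

    def augment(self, text):
        chars = list(text)
        i = 0
        while i < len(chars) - 1:
            both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
            if both_letters and self.rng.random() < self.error_rate:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                i += 2  # skip the swapped pair so it is not swapped back
            else:
                i += 1
        return "".join(chars)


noiser = AdjacentSwapNoiser(error_rate=0.2, seed=42)
original = "Please confirm the delivery address"
noisy = noiser.augment(original)
print(noisy)
```

Because swaps only happen between two letters, word boundaries and punctuation stay in place, which keeps the noisy text tokenizable in the same way as the original.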

Dataset Tools

Smart Merging: Intelligently combine synthetic and real data
Quality Validation: Ensure synthetic data meets quality standards
Format Support: Work with CSV, JSON, JSONL, and streaming data
HuggingFace Integration: Direct access to public datasets
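
As a rough illustration of what a quality check on JSONL data could look like, here is a standard-library sketch. The functions read_jsonl and validate_records, the "text" field name, and the minimum-length threshold are all assumptions made for this example; they do not describe Weave's own validation rules.

```python
import io
import json


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def validate_records(records, text_key="text", min_chars=10):
    """Hypothetical quality gate: drop records whose text is missing or too
    short, and drop case-insensitive exact duplicates."""
    seen = set()
    kept = []
    for record in records:
        text = str(record.get(text_key, "")).strip()
        normalized = text.lower()
        if len(text) < min_chars or normalized in seen:
            continue
        seen.add(normalized)
        kept.append(record)
    return kept


# In-memory stand-in for a JSONL file of generated samples.
raw = io.StringIO(
    '{"text": "The service exceeded my expectations!"}\n'
    '{"text": "the service exceeded my expectations!"}\n'
    '{"text": "ok"}\n'
    "\n"
    '{"label": "missing text field"}\n'
)
clean = validate_records(read_jsonl(raw))
print(len(clean))
```

Reading line by line through a generator means the same check works on a large file or a stream without loading everything into memory first.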

🤝 Contributing

We welcome contributions! Check out our contribution guidelines to get started.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Built with ❤️ by the Weave team
