    Repositories list

    • HTML
      0000 Updated Dec 13, 2024
    • Python
      MIT License
      0100 Updated Nov 6, 2024
    • 1000 Updated Oct 31, 2024
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      4.9k000 Updated Oct 26, 2024
    • Python
      MIT License
      1000 Updated Jul 18, 2024
    • qlora

      Public
      QLoRA: Efficient Finetuning of Quantized LLMs
      Jupyter Notebook
      MIT License
      825000 Updated Nov 20, 2023
    • llm-awq

      Public
      AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
      Python
      MIT License
      215100 Updated Nov 20, 2023
    • OmniQuant

      Public
      OmniQuant is a simple and powerful quantization technique for LLMs.
      Python
      56000 Updated Nov 8, 2023
    • rulm

      Public
      Language modeling and instruction tuning for Russian
      Jupyter Notebook
      Apache License 2.0
      50000 Updated Oct 18, 2023
    • AutoAWQ

      Public
      AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
      C++
      MIT License
      222000 Updated Oct 16, 2023
    • [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
      Python
      MIT License
      151000 Updated Oct 13, 2023
    • peft

      Public
      🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
      Python
      Apache License 2.0
      1.7k000 Updated Sep 25, 2023
    • Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) aims to provide unified APIs for network compression techniques, such as low-precision quantization, sparsity, pruning, and knowledge distillation, across different deep learning frameworks to pursue optimal inference performance.
      Python
      Apache License 2.0
      258000 Updated Aug 16, 2023