    Repositories list

    • velox

      Public
      A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
      C++
      Apache License 2.0
      1.2k21819 Updated Dec 4, 2024
    • raydp

      Public
      RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.
      Python
      Apache License 2.0
      703183611 Updated Nov 21, 2024
    • oap-mllib

      Public
      Optimized Spark package to accelerate machine learning algorithms in Apache Spark MLlib.
      Scala
      Apache License 2.0
      1220366 Updated Nov 14, 2024
    • .github

      Public
      Other
      0000 Updated Aug 19, 2024
    • vllm-fork

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      4.8k000 Updated Jul 23, 2024
    • text2sql-gluten

      Public archive
      Python
      3500 Updated Jul 11, 2024
    • English SDK for Apache Spark
      Python
      Apache License 2.0
      128101 Updated Jul 9, 2024
    • libhdfs3

      Public
      HDFS file read access for ClickHouse
      C++
      Apache License 2.0
      56200 Updated Jul 5, 2024
    • oap-tools

      Public archive
      Tools for building and packaging OAP, and for OAP public cloud integrations such as AWS EMR, Google Dataproc and K8S.
      Jupyter Notebook
      Apache License 2.0
      131692 Updated Mar 27, 2024
    • Gluten: Plugin to Double SparkSQL's Performance
      Scala
      Apache License 2.0
      440000 Updated Mar 26, 2024
    • Spark* shuffle plugin that supports shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local disks.
      Scala
      Apache License 2.0
      132131 Updated Mar 15, 2024
    • protobuf

      Public
      An Intel-customized version of Protocol Buffers, Google's data interchange format
      C++
      Other
      16k001 Updated Nov 21, 2023
    • Gluten-Trino

      Public archive
      Gluten: Plugin to Boost Trino's Performance
      Java
      Apache License 2.0
      167061 Updated Oct 25, 2023
    • cloudtik

      Public archive
      Cloud Scale Platform for Distributed Analytics and AI
      Python
      Apache License 2.0
      72311 Updated Oct 12, 2023
    • pmem-shuffle

      Public archive
      Spark* shuffle plugin that supports shuffling through remote persistent memory over fabrics. It leverages the RDMA network and remote persistent memory (for reads) to provide an extremely high-performance, low-latency shuffle solution for Spark*.
      C++
      Apache License 2.0
      914151 Updated Sep 18, 2023
    • recdp

      Public archive
      Python
      Apache License 2.0
      4210 Updated Sep 18, 2023
    • oap-project.github.io

      Public archive
      The OAP project web site
      HTML
      Apache License 2.0
      4000 Updated Sep 5, 2023
    • arrow

      Public
      Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication…
      C++
      Apache License 2.0
      3.6k6021 Updated May 18, 2023
    • gazelle_plugin

      Public archive
      Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
      Scala
      Apache License 2.0
      7525619124 Updated Feb 21, 2023
    • solution-navigator

      Public archive
      Example solutions or code for using OAP features.
      Jupyter Notebook
      Apache License 2.0
      3000 Updated Jan 25, 2023
    • sql-ds-cache

      Public archive
      Spark* plug-in for accelerating Spark* SQL performance by using caching and indexing at the SQL data source layer.
      Scala
      Apache License 2.0
      2537154 Updated Jan 3, 2023
    • libhdfs3-downstream

      Public archive
      A native C/C++ HDFS client (downstream fork from apache-hawq)
      C++
      Apache License 2.0
      54000 Updated Jan 3, 2023
    • arrow-data-source

      Public archive
      Spark DataSource plugin for reading files in various formats, such as Parquet, into Arrow-compatible columnar vectors.
      Scala
      Apache License 2.0
      10630 Updated Jan 3, 2023
    • pmem-spill

      Public archive
      Spark plug-in package for accelerating Spark runtime spill functions using PMem, such as the RDD cache PMem extension.
      Scala
      Apache License 2.0
      57111 Updated Dec 15, 2021
    • pmem-common

      Public archive
      Common library for accessing PMEM native library functions, including memkind, vmemcache and others.
      Java
      Apache License 2.0
      7331 Updated Dec 14, 2021