Poppy

Poppy is a dataframe library for Java that provides common SQL operations (e.g. select, from, where, group by, order by, distinct) for processing data in Java.

Unlike other dataframe libraries, which keep all the data in memory, Poppy processes data in a streaming manner. In that respect it is closer to the Java 8 Stream library, but in a relational form.

Here is a simple example. Suppose we have a Student class:

public class Student {
    private int studentId;
    private String name;
    private int grade;
    private int room;
    private int height;
    private int weight;
    ...
}

In SQL, the query would look like this:

select 
    grade, 
    room, 
    avg(weight) as weight, 
    avg(height) as height
from Student
group by grade, room
order by grade, room

Here is Poppy's version:

List<Student> students = ...;

DataFrame
.from(students, Student.class)
.groupby("grade", "room")
.aggregate(
    avgLong("weight").as("weight"),
    avgLong("height").as("height"))
.sort("grade", "room")
.print();
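
For comparison, here is a rough sketch of the same query written with plain Java 8 Streams. It uses only java.util.stream and none of Poppy's API. The cut-down Student stand-in (with assumed getters), the Row class and the averagesByGradeAndRoom method are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class StreamGroupByExample {

    // Cut-down stand-in for the Student class above; the getters are assumed.
    static class Student {
        private final int grade, room, height, weight;

        Student(int grade, int room, int height, int weight) {
            this.grade = grade;
            this.room = room;
            this.height = height;
            this.weight = weight;
        }

        int getGrade() { return grade; }
        int getRoom() { return room; }
        int getHeight() { return height; }
        int getWeight() { return weight; }
    }

    // One output row: the group key plus the two averages.
    static class Row {
        final int grade, room;
        final double weight, height;

        Row(int grade, int room, double weight, double height) {
            this.grade = grade;
            this.room = room;
            this.weight = weight;
            this.height = height;
        }
    }

    static List<Row> averagesByGradeAndRoom(List<Student> students) {
        // group by grade, room: every group is collected into an in-memory List
        Map<List<Integer>, List<Student>> groups = students.stream()
            .collect(Collectors.groupingBy(
                (Student s) -> Arrays.asList(s.getGrade(), s.getRoom())));

        // avg(weight), avg(height), then order by grade, room
        return groups.entrySet().stream()
            .map(e -> new Row(
                e.getKey().get(0),
                e.getKey().get(1),
                e.getValue().stream().mapToInt(Student::getWeight).average().orElse(0),
                e.getValue().stream().mapToInt(Student::getHeight).average().orElse(0)))
            .sorted(Comparator.comparingInt((Row r) -> r.grade)
                .thenComparingInt(r -> r.room))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(
            new Student(1, 2, 170, 60),
            new Student(1, 1, 150, 40),
            new Student(1, 1, 160, 50));
        for (Row r : averagesByGradeAndRoom(students)) {
            System.out.println(r.grade + " " + r.room + " " + r.weight + " " + r.height);
        }
    }
}
```

The grouping key, the result type and the sort all have to be written by hand here, which is the boilerplate a relational API such as the one above is meant to remove.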

Getting Started

Requirement

Java 8 or higher

Dependency

Poppy's package is hosted in the JCenter repository.

Maven

<dependency>
  <groupId>io.tenmax</groupId>
  <artifactId>poppy</artifactId>
  <version>0.1.8</version>
  <type>pom</type>
</dependency>

Gradle

compile 'io.tenmax:poppy:0.1.8'

Features

Supports the most common SQL operations, e.g. select, from, where, group by, order by, distinct.
Supports the most common SQL aggregation functions, e.g. avg(), sum(), count(), min(), max().
Custom aggregation functions via java.util.stream.Collector.
Partition support. A partition is the unit of parallelism; multiple partitions let you process data concurrently.
Multi-threaded support. For CPU-bound jobs, it uses all available CPU resources for better performance; for IO-bound jobs, it reduces waiting time by running operations concurrently.
Suitable for both batch and streaming scenarios.
Lightweight. Compared to the Spark DataFrame API, it is much lighter to embed in your application.
Stream-based design. Unlike joinery, which keeps the whole dataset in memory, Poppy's streaming behaviour lets it process huge volumes of data with limited memory.
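
As an illustration of the custom aggregation point above, here is a minimal sketch of a java.util.stream.Collector that computes the range (max minus min) of integer values. This README does not show how a Collector is passed to Poppy, so the sketch only covers the Collector itself and exercises it with a plain Java stream; the class name RangeCollectorExample and the method range() are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

class RangeCollectorExample {

    // Hypothetical custom aggregation: max - min of the collected values.
    // The accumulator is a two-element array holding {min, max}.
    static Collector<Integer, int[], Integer> range() {
        return Collector.of(
            // supplier: start with an "empty" min/max pair
            () -> new int[] {Integer.MAX_VALUE, Integer.MIN_VALUE},
            // accumulator: fold one value into the pair
            (acc, v) -> {
                acc[0] = Math.min(acc[0], v);
                acc[1] = Math.max(acc[1], v);
            },
            // combiner: merge two partial results (used for parallel evaluation)
            (a, b) -> {
                a[0] = Math.min(a[0], b[0]);
                a[1] = Math.max(a[1], b[1]);
                return a;
            },
            // finisher: return 0 when no values were seen
            acc -> acc[0] > acc[1] ? 0 : acc[1] - acc[0]);
    }

    public static void main(String[] args) {
        List<Integer> heights = Arrays.asList(150, 172, 161);
        System.out.println(heights.stream().collect(range())); // prints 22
    }
}
```

Because a Collector carries its own combiner, the same definition can merge partial results computed on separate chunks of data.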

Documentation

Contribution

Please fork this project and send me a pull request. Any comments would be appreciated!
