feat (format): Introduce buf (#519)
* feat(spark): Refactoring datasources (#514)

### Reason for this PR
Moving the datasources under `org.apache.spark.sql` gives us access to the private Spark API. The lack of that access was a blocker the last time I tried to fully migrate the datasources to V2. The detailed motivation is in #493.

### What changes are included in this PR?
Mostly refactoring.

### Are these changes tested?
Unit tests pass.

I manually checked the generated JARs:
![image](https://github.com/apache/incubator-graphar/assets/29755009/1b094516-88b1-490a-a2ea-8dcd092a3b1d)

### Are there any user-facing changes?
Mostly not because `GarDataSource` was left under the same package.


Close #493

* feat(dev): Add release and verify scripts (#507)

Reason for this PR
Add scripts so that a developer or release manager can easily release or verify a version.

What changes are included in this PR?
Add release and verify scripts
The related documentation is updated on the website; see "Update the release and verify document, and add development document" (incubator-graphar-website#18).
Are these changes tested?
yes

Are there any user-facing changes?
no
---------

Signed-off-by: acezen <[email protected]>

* chore: Bump to version v0.12.0 (Round 1) (#517)


Signed-off-by: acezen <[email protected]>

* chore: Add CHANGELOG.md (#513)


Signed-off-by: acezen <[email protected]>

* Introduce buf

- v2
- buf.gen
- buf

 On branch format-definition-dev
 Your branch is up to date with 'origin/format-definition-dev'.

 Changes to be committed:
	new file:   buf.gen.yaml
	new file:   buf.yaml
	modified:   format/adjacent_list.proto
	modified:   format/edge_info.proto
	modified:   format/graph_info.proto
	modified:   format/property_group.proto
	modified:   format/types.proto
	modified:   format/vertex_info.proto

---------

Signed-off-by: acezen <[email protected]>
Co-authored-by: Weibin Zeng <[email protected]>
SemyonSinchenko and acezen authored Jun 13, 2024
1 parent cbc7a6c commit 22aa41f
Showing 59 changed files with 963 additions and 149 deletions.
6 changes: 6 additions & 0 deletions .devcontainer/graphar-dev.Dockerfile
@@ -40,6 +40,12 @@ RUN git clone --branch v1.8.3 https://github.com/google/benchmark.git /tmp/bench
&& make install \
&& rm -rf /tmp/benchmark

RUN git clone --branch v3.6.0 https://github.com/catchorg/Catch2.git /tmp/catch2 --depth 1 \
&& cd /tmp/catch2 \
&& cmake -Bbuild -H. -DBUILD_TESTING=OFF \
&& cmake --build build/ --target install \
&& rm -rf /tmp/catch2

ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib:/usr/local/lib64
ENV JAVA_HOME=/usr/lib/jvm/default-java

360 changes: 360 additions & 0 deletions CHANGELOG.md

Large diffs are not rendered by default.

13 changes: 8 additions & 5 deletions CONTRIBUTING.md
@@ -97,8 +97,7 @@ For small or first-time contributions, we recommend the dev container method. An
### Using a dev container environment

GraphAr provides a pre-configured [dev container](https://containers.dev/)
that could be used in [GitHub Codespaces](https://github.com/features/codespaces),
[VSCode](https://code.visualstudio.com/docs/devcontainers/containers), [JetBrains](https://www.jetbrains.com/remote-development/gateway/),
that could be used in [VSCode](https://code.visualstudio.com/docs/devcontainers/containers), [JetBrains](https://www.jetbrains.com/remote-development/gateway/),
[JupyterLab](https://jupyterlab.readthedocs.io/en/stable/).
Please pick up your favorite runtime environment.

@@ -107,6 +106,10 @@ Please pick up your favorite runtime environment.
Different components of GraphAr may require different setup steps. Please refer to their respective `README` documentation for more details.

- [C++ Library](cpp/README.md)
- [Java Library](java/README.md)
- [Spark Library](spark/README.md)
- [PySpark Library](pyspark/README.md)
- [Scala with Spark Library](spark/README.md)
- [Python with PySpark Library](pyspark/README.md) (under development)
- [Java Library](java/README.md) (under development)

----

This doc refer from [Apache OpenDAL](https://opendal.apache.org/)
16 changes: 13 additions & 3 deletions LICENSE
@@ -212,7 +212,7 @@ Apache-2.0 licenses
The following components are provided under the Apache-2.0 License. See project link for details.
The text of each license is the standard Apache 2.0 license.

* spark 3.1.1 and 3.3.4 (https://github.com/apache/spark)
* Apache Spark 3.1.1 and 3.3.4 (https://github.com/apache/spark)
Files:
maven-projects/spark/datasourcs-32/src/main/scala/org/apache/graphar/datasources/GarCommitProtocol.scala
maven-projects/spark/datasourcs-32/src/main/scala/org/apache/graphar/datasources/GarDataSource.scala
@@ -234,9 +234,13 @@ The text of each license is the standard Apache 2.0 license.
maven-projects/spark/datasourcs-33/src/main/scala/org/apache/graphar/datasources/orc/ORCOutputWriter.scala
maven-projects/spark/datasourcs-33/src/main/scala/org/apache/graphar/datasources/orc/ORCWriteBuilder.scala
maven-projects/spark/datasourcs-33/src/main/scala/org/apache/graphar/datasources/parquet/ParquetWriteBuilder.scala
are modified from spark.
are modified from Apache Spark.

* Apache Arrow 12.0.0 (https://github.com/apache/arrow)
Files:
dev/release/setup-ubuntu.sh
are modified from Apache Arrow.

* arrow 12.0.0 (https://github.com/apache/arrow)
* fastFFI v0.1.2 (https://github.com/alibaba/fastFFI)
Files:
maven-projects/java/src/main/java/org/apache/graphar/stdcxx/StdString.java
@@ -251,6 +255,12 @@ The text of each license is the standard Apache 2.0 license.
maven-projects/java/src/main/java/org/apache/graphar/stdcxx/StdUnorderedMap.java
are modified from GraphScope.

* Apache OpenDAL v0.45.1 (https://github.com/apache/opendal)
Files:
dev/release/release.py
dev/release/verify.py
are modified from OpenDAL.

================================================================
MIT licenses
================================================================
8 changes: 8 additions & 0 deletions NOTICE
@@ -31,3 +31,11 @@ which includes the following in its NOTICE file:

fastFFI
Copyright 1999-2021 Alibaba Group Holding Ltd.

--------------------------------------------------------------------------------

This product includes code from Apache OpenDAL, which includes the following in
its NOTICE file:

Apache OpenDAL
Copyright 2022 and onwards The Apache Software Foundation.
19 changes: 12 additions & 7 deletions README.md
@@ -207,24 +207,29 @@ See [GraphAr C++
Library](./cpp) for
details about the building of the C++ library.


### The Scala with Spark Library

See [GraphAr Spark
Library](./maven-projects/spark)
for details about the Scala with Spark library.

### The Java Library

The Java library is under development.

The GraphAr Java library is created with bindings to the C++ library
(currently at version v0.10.0), utilizing
[Alibaba-FastFFI](https://github.com/alibaba/fastFFI) for
implementation. See [GraphAr Java
Library](./maven-projects/java) for
details about the building of the Java library.

### The Spark Library

See [GraphAr Spark
Library](./maven-projects/spark)
for details about the Spark library.
### The Python with PySpark Library

### The PySpark Library
The Python with PySpark library is under development.

The GraphAr PySpark library is developed as bindings to the GraphAr
The PySpark library is developed as bindings to the GraphAr
Spark library. See [GraphAr PySpark
Library](./pyspark)
for details about the PySpark library.
18 changes: 18 additions & 0 deletions buf.gen.yaml
@@ -0,0 +1,18 @@
version: v2
managed:
  enabled: true
  disable:
    - file_option: java_package
plugins:
  # Python classes
  - remote: buf.build/protocolbuffers/python:v27.1
    out: pyspark/graphar_pyspark/proto/
  # Python headers for IDEs and MyPy
  - remote: buf.build/protocolbuffers/pyi
    out: pyspark/graphar_pyspark/proto/
  # Cpp
  - remote: buf.build/protocolbuffers/cpp:v27.1
    out: cpp/src/proto
  # Java
  - remote: buf.build/protocolbuffers/java:v27.1
    out: maven-projects/info/src/main/java/
3 changes: 3 additions & 0 deletions buf.yaml
@@ -0,0 +1,3 @@
version: v2
modules:
  - path: format
4 changes: 2 additions & 2 deletions cpp/CMakeLists.txt
@@ -32,8 +32,8 @@ if (CMAKE_VERSION VERSION_GREATER_EQUAL "3.24.0")
endif()

set(GRAPHAR_MAJOR_VERSION 0)
set(GRAPHAR_MINOR_VERSION 11)
set(GRAPHAR_PATCH_VERSION 4)
set(GRAPHAR_MINOR_VERSION 12)
set(GRAPHAR_PATCH_VERSION 0)
set(GREAPHAR_VERSION ${GRAPHAR_MAJOR_VERSION}.${GRAPHAR_MINOR_VERSION}.${GRAPHAR_PATCH_VERSION})
project(graphar-cpp LANGUAGES C CXX VERSION ${GREAPHAR_VERSION})
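
The `dev/release/release.py` script in this commit reads the three `set(GRAPHAR_*_VERSION ...)` lines above to build the release version string. A minimal sketch of that parsing step, assuming the same regular-expression shape the script uses; `CMAKE_SNIPPET` and `parse_version` are hypothetical names introduced only for illustration:

```python
import re

# Hypothetical sample input mirroring the new set(...) lines in cpp/CMakeLists.txt.
CMAKE_SNIPPET = """\
set(GRAPHAR_MAJOR_VERSION 0)
set(GRAPHAR_MINOR_VERSION 12)
set(GRAPHAR_PATCH_VERSION 0)
"""


def parse_version(cmake_text):
    """Return 'major.minor.patch', or None if any component is missing."""
    parts = []
    for name in ("MAJOR", "MINOR", "PATCH"):
        pattern = re.compile(
            r"set\s*\(\s*GRAPHAR_" + name + r"_VERSION\s+(\d+)\s*\)",
            re.IGNORECASE,
        )
        match = pattern.search(cmake_text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


print(parse_version(CMAKE_SNIPPET))  # prints 0.12.0
```

Returning `None` when a component is missing lets the caller fail loudly instead of archiving a package with a partial version.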

4 changes: 1 addition & 3 deletions cpp/README.md
@@ -67,9 +67,7 @@ repository and navigated to the ``cpp`` subdirectory with:

```bash
$ git clone https://github.com/apache/graphar.git
$ cd graphar
$ git submodule update --init
$ cd cpp
$ cd graphar/cpp
```

Release build:
3 changes: 1 addition & 2 deletions cpp/test/test_arrow_chunk_reader.cc
@@ -158,8 +158,7 @@ TEST_CASE_METHOD(GlobalFixture, "ArrowChunkReader") {
<< '\n';
std::cout << "Column Nums: " << table->num_columns() << "\n";
std::cout << "Column Names: ";
for (int i = 0;
i < table->num_columns() && i < expected_cols.size(); i++) {
for (int i = 0; i < table->num_columns(); i++) {
REQUIRE(table->ColumnNames()[i] == expected_cols[i]);
std::cout << "`" << table->ColumnNames()[i] << "` ";
}
32 changes: 32 additions & 0 deletions dev/download_test_data.sh
@@ -0,0 +1,32 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# A script to download test data for GraphAr

if [ -n "${GAR_TEST_DATA}" ]; then
    if [[ ! -d "$GAR_TEST_DATA" ]]; then
        echo "GAR_TEST_DATA is set but the directory does not exist, cloning the test data to $GAR_TEST_DATA"
        git clone https://github.com/apache/incubator-graphar-testing.git "$GAR_TEST_DATA" --depth 1 || true
    fi
else
    echo "GAR_TEST_DATA is not set, cloning the test data to /tmp/graphar-testing"
    git clone https://github.com/apache/incubator-graphar-testing.git /tmp/graphar-testing --depth 1 || true
    echo "Test data has been cloned to /tmp/graphar-testing, please run"
    echo "    export GAR_TEST_DATA=/tmp/graphar-testing"
fi
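
The branching in the script above can be restated as a small decision function, which may help when wiring the same behavior into a test harness. This is only a sketch: `plan_clone` and `DEFAULT_DIR` are hypothetical names, and the function decides where to clone without running `git`.

```python
import os

# Same fallback path the shell script uses when GAR_TEST_DATA is unset.
DEFAULT_DIR = "/tmp/graphar-testing"


def plan_clone(env, dir_exists=os.path.isdir):
    """Return the directory to clone the test data into, or None if no clone is needed."""
    target = env.get("GAR_TEST_DATA")
    if target:
        # Variable is set: clone only when the directory is missing.
        return None if dir_exists(target) else target
    # Variable is unset or empty: the script always attempts the default clone.
    return DEFAULT_DIR
```

Calling `plan_clone(os.environ)` mirrors the script's behavior for the current environment; the `dir_exists` parameter exists only so the logic can be exercised without touching the filesystem.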
22 changes: 22 additions & 0 deletions dev/release/conda_env_cpp.txt
@@ -0,0 +1,22 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

cmake
conda-forge::arrow-cpp=13.0.0
make
clangxx_linux-64
conda-forge::catch2=3.6.0
19 changes: 19 additions & 0 deletions dev/release/conda_env_scala.txt
@@ -0,0 +1,19 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

maven
openjdk=11.0.13
119 changes: 119 additions & 0 deletions dev/release/release.py
@@ -0,0 +1,119 @@
#!/usr/bin/env python3
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Derived from Apache OpenDAL v0.45.1
# https://github.com/apache/opendal/blob/5079125/scripts/release.py

import re
import subprocess
from pathlib import Path

ROOT_DIR = Path(__file__).parent.parent.parent

def get_package_version():
    major_version = None
    minor_version = None
    patch_version = None
    major_pattern = re.compile(r'set\s*\(\s*GRAPHAR_MAJOR_VERSION\s+(\d+)\s*\)', re.IGNORECASE)
    minor_pattern = re.compile(r'set\s*\(\s*GRAPHAR_MINOR_VERSION\s+(\d+)\s*\)', re.IGNORECASE)
    patch_pattern = re.compile(r'set\s*\(\s*GRAPHAR_PATCH_VERSION\s+(\d+)\s*\)', re.IGNORECASE)

    file_path = ROOT_DIR / "cpp/CMakeLists.txt"
    with open(file_path, 'r') as file:
        for line in file:
            major_match = major_pattern.search(line)
            minor_match = minor_pattern.search(line)
            patch_match = patch_pattern.search(line)

            if major_match:
                major_version = major_match.group(1)
            if minor_match:
                minor_version = minor_match.group(1)
            if patch_match:
                patch_version = patch_match.group(1)

    if major_version and minor_version and patch_version:
        return f"{major_version}.{minor_version}.{patch_version}"
    else:
        return None

def archive_source_package():
    print(f"Archive source package started")

    version = get_package_version()
    assert version, "Failed to get the package version"
    name = f"apache-graphar-{version}-incubating-src"

    archive_command = [
        "git",
        "archive",
        "--prefix",
        f"apache-graphar-{version}-incubating-src/",
        "-o",
        f"{ROOT_DIR}/dist/{name}.tar.gz",
        "HEAD",
    ]
    subprocess.run(
        archive_command,
        cwd=ROOT_DIR,
        check=True,
    )

    print(f"Archive source package to dist/{name}.tar.gz")


def generate_signature():
    for i in Path(ROOT_DIR / "dist").glob("*.tar.gz"):
        print(f"Generate signature for {i}")
        subprocess.run(
            ["gpg", "--yes", "--armor", "--output", f"{i}.asc", "--detach-sig", str(i)],
            cwd=ROOT_DIR / "dist",
            check=True,
        )

    for i in Path(ROOT_DIR / "dist").glob("*.tar.gz"):
        print(f"Check signature for {i}")
        subprocess.run(
            ["gpg", "--verify", f"{i}.asc", str(i)], cwd=ROOT_DIR / "dist", check=True
        )


def generate_checksum():
    for i in Path(ROOT_DIR / "dist").glob("*.tar.gz"):
        print(f"Generate checksum for {i}")
        subprocess.run(
            ["sha512sum", str(i.relative_to(ROOT_DIR / "dist"))],
            stdout=open(f"{i}.sha512", "w"),
            cwd=ROOT_DIR / "dist",
            check=True,
        )

    for i in Path(ROOT_DIR / "dist").glob("*.tar.gz"):
        print(f"Check checksum for {i}")
        subprocess.run(
            ["sha512sum", "--check", f"{str(i.relative_to(ROOT_DIR / 'dist'))}.sha512"],
            cwd=ROOT_DIR / "dist",
            check=True,
        )


if __name__ == "__main__":
    (ROOT_DIR / "dist").mkdir(exist_ok=True)
    archive_source_package()
    generate_signature()
    generate_checksum()
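
The checksum step in the script above shells out to `sha512sum`, which may not be installed on every platform. As a hedged alternative, not part of this commit, the same `<digest>  <filename>` file can be produced and checked with the standard-library `hashlib` module. `write_checksum` and `verify_checksum` are hypothetical helper names, and the sketch assumes the `.sha512` file sits next to the archive:

```python
import hashlib
from pathlib import Path


def sha512_hex(path):
    """Hash the file in 1 MiB chunks so large archives are not loaded into memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(path):
    """Write '<digest>  <filename>' next to the archive, in sha512sum's text-mode layout."""
    path = Path(path)
    checksum_file = Path(f"{path}.sha512")
    checksum_file.write_text(f"{sha512_hex(path)}  {path.name}\n")
    return checksum_file


def verify_checksum(path):
    """Compare the recorded digest (first whitespace-separated field) with a fresh one."""
    recorded = Path(f"{path}.sha512").read_text().split()[0]
    return recorded == sha512_hex(path)
```

Comparing only the first field keeps the check tolerant of differences in how the file name was recorded.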