-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve GADBC API to be GObject Introspection friendly #1280
Comments
I don't want to add hard Apache Arrow GLib (that depends on Apache Arrow C++) dependency to ADBC GLib. Apache Arrow C data/stream interfaces ( https://arrow.apache.org/docs/format/CDataInterface.html / https://arrow.apache.org/docs/format/CStreamInterface.html ) are interfaces to avoid the hard dependency. Can we provide a convenient wrapper in Vala layer instead like we did for the Ruby bindings? |
Vala produce C code, so if we wraps but use symbols from other libraries, they depends on them; in this case Ruby use Then GADBC requires to add objects to handle data returned by queries, streams and data structures, but that is the job of Arrow I think; so, make depends GADBC to Arrow should be a natural step. I see GADBC as the bridge between Data Sources and Data Structures and Streams to handle them, where the data types and streams are provided by Apache Arrow, while the data source driver translate his own format to Arrow. |
I'm writting a library called libvda in Vala, it will use GADBC to access Databases using its drivers. This could be the wrapper we are talking about. Internally libvda could wrap GADBC API and Arrow to access the data in the way more convenient for Vala programmers. Please help me to merge my PR #1152 so any one can use Vala API as is currently, including libvda. |
It's OK that the Ruby bindings and the Vala API depends on Apache Arrow GLib because Apache Arrow GLib is the de fact library to process Apache Arrow data in Ruby and Vala. But I want to avoid that ADBC GLib depends on Apache Arrow GLib because there are many options to process Apache Arrow data for C users. Can I push some changes to your branch directly to merge #1152 as soon as possible? |
Maybe I can use Metadata to generate the required fixes on Vala API, but if VAPI is generated is possible that the development package will depend on Arrow development package, to provide de GIR/VAPI required by this bindings. Is this acceptable to you?
Yes please! Feel free to do any thing you consider is better for GADBC, thanks in advance. |
So as for other languages, having its own directory, I can add a folder to create a pure GObject/Vala bindings library, with interfaces and classes to make easy to use ADBC with GObject (C)/Vala and any other language using GObject-Introspection, is it that good for you? @kou The new library could be called |
Does this mean that you want to implement a new GObject based library with Apache Arrow GLib dependency from scratch? |
No really. It means I will reuse the existing API and wrap it to one providing GObject one but GObject Introspection friendly. It will depend on ADBC and Arrow to provide Database access and data handling. I don’t plan to add new, for example, Connection object but a derived one providing the above API, just for the ones that use I think that your current API, if is GLib/GObject and depending on GObject Introspection, should be GObject Introspection friendly. |
Also, if some generic is required, may ADBC API should use a |
I see. Then we don't need to create a new top-level directory. We can just create a new sub-directory under diff --git a/glib/example/meson.build b/glib/example/meson.build
index ed34fd02..41c9ab23 100644
--- a/glib/example/meson.build
+++ b/glib/example/meson.build
@@ -18,7 +18,6 @@
# under the License.
if build_example
- arrow_glib = dependency('arrow-glib')
executable('sqlite', 'sqlite.c',
dependencies: [adbc_glib, arrow_glib],
link_language: 'c')
diff --git a/glib/meson.build b/glib/meson.build
index 0deee035..a7b716c5 100644
--- a/glib/meson.build
+++ b/glib/meson.build
@@ -78,6 +78,13 @@ if generate_vapi
endif
subdir('adbc-glib')
+build_arrow_integration = get_option('arrow')
+if build_arrow_integration or build_example
+ arrow_glib = dependency('arrow-glib')
+endif
+if build_arrow_integration
+ subdir('adbc-arrow-glib')
+endif
subdir('example')
install_data('../LICENSE.txt',
diff --git a/glib/meson_options.txt b/glib/meson_options.txt
index 8603b783..39a318c5 100644
--- a/glib/meson_options.txt
+++ b/glib/meson_options.txt
@@ -22,6 +22,11 @@ option('adbc_build_dir',
value: '',
description: 'Use this option to build with not installed ADBC')
+option('arrow',
+ type: 'boolean',
+ value: false,
+ description: 'Enable Apache Arrow GLib integration')
+
option('example',
type: 'boolean',
value: false, If we provide a Apache Arrow GLib integration library, we should use |
Checking out
Great! Now which should be the name space of this new library?
Just extend the API by adding new GObject Introspectable methods to the existing objects, means different libraries, with the same API and same namespace, but using GArrow and different name. Current is called We can use Above method, will allow to reuse the same code and create a new library, but just adding a MACRO definition. Is this good for you? |
How about I don't want to use the |
I see. On using This is how var statement = new GADBC.Statement (connection);
string sql = "SELECT sqlite_version() AS version";
statement.set_sql_query (sql);
int64 n_rows_affected = 0;
GArrow.RecordBatchReader reader = statement.execute_stream (out n_rows_affected);
var table = reader.read_all ();
stdout.printf ("Result:\n%s", table.to_string ()); Above code is introspectable, but This is how var statement = new GADBC.Statement (connection);
string sql = "SELECT sqlite_version() AS version";
statement.set_sql_query (sql);
void *c_abi_array_stream = null;
int64 n_rows_affected;
statement.execute (true, out c_abi_array_stream, out n_rows_affected);
var reader = GArrow.RecordBatchReader.import (c_abi_array_stream);
var table = reader.read_all ();
stdout.printf ("Result:\n%s", table.to_string ()); Above code works currently, but is not completely introspectable. This is how var statement = new GADBC.Statement (connection);
string sql = "SELECT sqlite_version() AS version";
statement.set_sql_query (sql);
var stm = GADBCArrow.Statement (statement);
GArrow.RecordBatchReader reader = stm.execute (true, out n_rows_affected);
var table = reader.read_all ();
stdout.printf ("Result:\n%s", table.to_string ()); In above code we use There is a way, where we wrap all API from This is how var statement = new GADBCArrow.Statement (connection);
string sql = "SELECT sqlite_version() AS version";
statement.set_sql_query (sql);
GArrow.RecordBatchReader reader = statement.execute (true, out n_rows_affected);
var table = reader.read_all ();
stdout.printf ("Result:\n%s", table.to_string ()); Above code all objects including I consider the wrapping solution is harder to implement and maintain; while the use of Consider that Hope all above convince you to create a C API far from GLIb/GObject and allow that |
How about just providing #define GADBC_ARROW_TYPE_STATEMENT (gadbc_arrow_statement_get_type())
G_DECLARE_DERIVABLE_TYPE(GADBCArrowStatement, gadbc_arrow_statement, GADBCArrow, STATEMENT, GADBCStatement)
struct _GADBCArrowStatementClass {
GADBCStatementClass parent_class;
};
GADBCArrowStatement* gadbc_arrow_statement_new(GADBCConnection* connection, GError** error);
gboolean
gadbc_arrow_statement_bind(GADBCArrowStatement* statement, GArrowArray *array, GArrowSchema *schema, GError ** error);
gboolean gadbc_arrow_statement_bind_stream(GADBCArrowStatement* statement,
GArrowRecordBatchReader *reader, GError** error);
GArrowRecordBatchReader *gadbc_arrow_statement_execute(GADBCArrowStatement* statement, gboolean need_result,
gint64* n_rows_affected,
GError** error); Why do we need to wrap all APIs in |
Because you can use just one namespace to access all objects and its API. If leave just parts of the API , you have to always add the namespace in order to distinguish between types, when using on Vala, Python or other languages, making weir to use this API. Also, above code requires a |
I'm trying to create a functional API, but found I haven't enough knowledge to handle memory issues when C++ is in the middle. There is an example for Vala using correctly the API when This method calls This is the method I'm using to call from Vala: GArrowRecordBatchReader*
gadbc_statement_execute_stream(GADBCStatement* statement,
gint64* n_rows_affected,
GError** error) {
gpointer stream;
if (!gadbc_statement_execute_internal(statement, TRUE,
&stream, n_rows_affected, error)) {
return NULL;
}
GArrowRecordBatchReader *reader = garrow_record_batch_reader_import(stream, error);
//g_free(stream);
return reader;
} This is the code in Vala: // Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
int main (string[] args) {
var exit_code = Posix.EXIT_FAILURE;
try {
var database = new GADBC.Database ();
database.set_option ("driver", "adbc_driver_sqlite");
database.set_option ("uri", ":memory:");
database.init ();
try {
var connection = new GADBC.Connection ();
connection.init (database);
try {
var statement = new GADBC.Statement (connection);
string sql = "SELECT sqlite_version() AS version";
statement.set_sql_query (sql);
try {
int64 n_rows_affected = 0;
GArrow.RecordBatchReader reader = statement.execute_stream (out n_rows_affected);
try {
var table = reader.read_all ();
stdout.printf ("Result:\n%s", table.to_string ());
} catch (GLib.Error error) {
GLib.error ("Failed to read table: %s", error.message);
}
exit_code = Posix.EXIT_SUCCESS;
} catch (GLib.Error error) {
GLib.error ("Failed to execute a statement: %s", error.message);
}
}
catch (GLib.Error error) {
GLib.error ("Failed to create a statement: %s", error.message);
}
}
catch (GLib.Error error) {
GLib.error ("Failed to create a connection: %s", error.message);
}
}
catch (GLib.Error error) {
GLib.error ("Failed to create a database: %s", error.message);
}
return exit_code;
} Fails exactly on |
Fixes apache#1280. We can use `GArrowRecordBatch`/`GArrowSchema`/`GArrowRecordBatchReader` instead of C data API with this library.
Fixes apache#1280. We can use `GArrowRecordBatch`/`GArrowSchema`/`GArrowRecordBatchReader` instead of C data API with this library.
Fixes apache#1280. We can use `GArrowRecordBatch`/`GArrowSchema`/`GArrowRecordBatchReader` instead of C data API with this library.
Fixes apache#1280. We can use `GArrowRecordBatch`/`GArrowSchema`/`GArrowRecordBatchReader` instead of C data API with this library.
Fixes apache#1280. We can use `GArrowRecordBatch`/`GArrowSchema`/`GArrowRecordBatchReader` instead of C data API with this library.
Fixes apache#1280. We can use `GArrowRecordBatch`/`GArrowSchema`/`GArrowRecordBatchReader` instead of C data API with this library.
Fixes apache#1280. We can use `GArrowRecordBatch`/`GArrowSchema`/`GArrowRecordBatchReader` instead of C data API with this library.
Fixes apache#1280. We can use `GArrowRecordBatch`/`GArrowSchema`/`GArrowRecordBatchReader` instead of C data API with this library.
Fixes apache#1280. We can use `GArrowRecordBatch`/`GArrowSchema`/`GArrowRecordBatchReader` instead of C data API with this library.
Fixes apache#1280. We can use `GArrowRecordBatch`/`GArrowSchema`/`GArrowRecordBatchReader` instead of C data API with this library.
Fixes apache#1280. We can use `GArrowRecordBatch`/`GArrowSchema`/`GArrowRecordBatchReader` instead of C data API with this library.
Fixes apache#1280. We can use `GArrowRecordBatch`/`GArrowSchema`/`GArrowRecordBatchReader` instead of C data API with this library.
Fixes #1280. We can use `GArrowRecordBatch`/`GArrowSchema`/`GArrowRecordBatchReader` instead of C data API with this library.
Fixes apache#1280. We can use `GArrowRecordBatch`/`GArrowSchema`/`GArrowRecordBatchReader` instead of C data API with this library.
The C API could be correct and useful as is, but when switch to Vala using
void*
doesn't feet Vala's automatic memory management, as we need to tell it if the pointer should or not be freed. In other wordsvoid*
means manual memory management, and that push back the advantage of Vala for GObject.For
GADBC.Statement
, currently there are this API:The C comments declares
gpointer c_abi_array
andgpointer c_abi_array_stream
asout
parameters, but the pointer in not a pointer to pointer (void**
), so should not be declared as it.But when looking to a better GObject API, the C API should be:
Internally, the above statements should use
*_get_raw()
methods to get the nativeArrow
structures and fill out them with the provided data.I'm not quite sure if
c_abi_array_stream
is aGArrow.InputStream
or anGArrow.OutputStream
, so please guide me here, because there is not a genericGArrow.Stream
to use here.This issue is affecting #1152 because the current API forces to manual memory management, that is not a good idea, both in C and in Vala.
The text was updated successfully, but these errors were encountered: