Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Result#columns to store field types #124

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ipc103
Copy link

@ipc103 ipc103 commented Oct 3, 2023

Closes #99

This PR adds a new Result#columns method which returns an array of Result::Column objects to represent column information such as name, length, and type. At some point this could replaced the fields method, but as of now was added as a non-breaking change.

We added a new Columns class as well. When instantiating a new Result in the C extension, we assign a new instance of our Columns class which carries a pointer to a struct. After populating the properties of the result, we cache the column_info in a struct on the Columns Value and add the information in (including casting some integers into symbol values). We can then lazily-load the data out of the struct and populate the ruby array when calling #columns

Full disclosure: I've never written a C extension (or any C for that matter) before. Any and all feedback welcome - I added a few inline comments in the PR.

CC @composerinteralia @ywenc

#undef XX
};

static char *downcase(const char *str)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to convert the charset/type name into a lowercase symbol. I'm sure there is a better way to do this.

Trilogy_Result_Column, rb_intern("new"), 6, trilogy_result_columns_ctx->column_info[i].name,
rb_id2sym(rb_intern(downcase(trilogy_type_names[trilogy_result_columns_ctx->column_info[i].type]))),
rb_int_new(trilogy_result_columns_ctx->column_info[i].len), rb_int_new(trilogy_result_columns_ctx->column_info[i].flags),
rb_id2sym(rb_intern(downcase(trilogy_charset_names[trilogy_result_columns_ctx->column_info[i].charset]))),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapping rb_id2sym(rb_intern(...)) seemed to be a way to get the actual string-like symbol value i.e. :long. There is probably a better way to do this but I couldn't find one 🤷

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this leak memory? Afaik rb_intern doesn't take ownership of the allocated memory but ends up copying it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - given the fact that this already felt sub-optimal, I wonder if it makes more sense to just set an int here and then cast to the symbol on the Ruby class.



class Columns
include Enumerable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this to be Enumerable? It doesn't look like we return a Columns object anywhere (it's an intermediate so we have a place to hold the raw column data)—rather we call all on it and return an Array.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call - you are correct that we don't need this. Removed and updated the previous commit

(As an aside, not sure what the preference is for commit history here. Happy to squash these three down to a single commit if we prefer)

ipc103 and others added 3 commits October 23, 2023 11:10
Towards trilogy-libraries#99, this exposes additional field data on the query result
via a `Result#columns` method. We were already using this data in the
C extension to determine how to cast the column values. We now add a pointer
to the `column_info` struct onto the result and then lazily add the values
to the `Result` instance when called.

This method could eventually replace the `#fields` method which at the moment only
includes the column name.

Co-authored-by: Daniel Colson <[email protected]>
Co-authored-by: Charlotte Wen <[email protected]>
We want to set some of these values as symbols instead of ints
on the Column to make it easier to work with. In this commit, we
created loopups for int to string using macros, then changed the
type to a symbol via `rb_intern` + `rb_id2sym`. It's possible there's
a single function to do this, but we couldn't find one.

I also added a downcase function to make sure we get the values in
lower instead of uppercase. I'm sure there is a better way to do this.
I have no idea how to write C.

Co-authored-by: Daniel Colson <[email protected]>
Previously, we had added a pointer to our column_info struct onto
our `Result` for lazy loading the data. This resulted in changing the
type of `Result` to `T_DATA` which could have negative performance
implications. Instead, we can create a separate `Columns` class to
store the pointer to our struct. A reference to `columns` now gets
eagerly loaded onto the result, but we only populate the actual information
for the columns when calling `Result#columns`. This seems like a good
trade off for lazy loading (vs. putting the pointer directly on `Result`).

Having said that, it's probably worth asking if the lazy loading is worth this
extra complexity at all, or if we shoudl simply populate `@columns` with an
already loaded array for the `Result` immediately.
Comment on lines +37 to +40
struct trilogy_result_columns_ctx {
struct column_info *column_info;
uint64_t column_count;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
struct trilogy_result_columns_ctx {
struct column_info *column_info;
uint64_t column_count;
};
struct trilogy_result_columns_ctx {
uint64_t column_count;
struct column_info column_info[];
};

You could use a flexible array member to allocate the whole list at once and avoid pointer chasing and malloc churn.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No access to field types for a query result
4 participants