a simple example for C# #51
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# C\# Arrow Flight Client Application Example | ||
|
||
This simple example C# client application connects to the Dremio Arrow Flight server endpoint. Developers can authenticate with admin or regular user credentials, and can query any dataset in Dremio that the provided user can access. By default, the hostname is `localhost` and the port is `32010`. Developers can override these defaults by passing the hostname and port as arguments when running the client. | ||
|
||
Note: This example uses Microsoft.Data.Analysis as an example library for working with the data; it is similar to Python pandas. However, the pandas DataFrame is more mature and supports more data types. | ||
|
||
### Prerequisites | ||
- .NET 7 [SDK](https://dotnet.microsoft.com/en-us/download/dotnet/7.0) | ||
- Dremio 21 or later | ||
|
||
NOTE: This code was tested on macOS x64 against a local Dremio instance running in Docker: | ||
- `docker run -p 9047:9047 -p 31010:31010 -p 32010:32010 dremio/dremio-oss:latest` | ||
- For quick setup, log in to your local Docker instance at http://localhost:9047 in a browser and add the 'dremio/dremio123' user as ADMIN | ||
|
||
### Build the C\# sample application | ||
- Clone this repository. | ||
- `git clone https://github.com/dremio-hub/arrow-flight-client-examples` | ||
- Navigate to arrow-flight-client-examples/csharp/example. | ||
- Build the sample application on the command line with: | ||
- `dotnet build` | ||
|
||
### Instructions on using this C\# sample application | ||
- By default, the hostname is `localhost` and the port is `32010`, with user `dremio` and password `dremio123`. There is also a default query against the Samples data source. | ||
- `dotnet run` | ||
- NOTE: To use the default query, first add the Samples data source in Dremio, then "Format" the zips.json file in Dremio. | ||
- Run the dotnet sample application with command line args: | ||
- `dotnet run -query <QUERY> -host <DREMIO_HOSTNAME> -port <DREMIO_PORT> -user <DREMIO_USER> -pass <DREMIO_PASSWORD>` | ||
- `dotnet run -host localhost -user dremio -pass dremio123 -port 32010 -query "SELECT job_id, status, queue_name, query from sys.jobs"` | ||
|
||
### Usage | ||
``` | ||
usage: dotnet run <ARGUMENTS> | ||
|
||
Arguments: | ||
-port | ||
Dremio flight server port. | ||
Defaults to 32010. | ||
-host | ||
Dremio coordinator hostname. | ||
Defaults to "localhost". | ||
-pass | ||
Dremio password. | ||
Defaults to "dremio123". | ||
-query | ||
SQL query to test. | ||
-user | ||
Dremio username. | ||
Defaults to "dremio". | ||
``` |
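The `-flag value` convention described in the usage text can be sketched as a small, stricter parser. This is a minimal sketch and not the code in this PR: `ClientOptions` and `ArgParser` are hypothetical names, the defaults are copied from the README above, and rejecting unknown flags or a flag without a value is an assumption about the desired behaviour. The `-protocol` flag is included because the sample's source reads it, although the usage text does not list it.

```csharp
using System;

// Hypothetical settings holder; defaults mirror the README above.
// An empty Query means "use the sample's built-in default query".
public sealed record ClientOptions(
    string Host = "localhost",
    string Port = "32010",
    string User = "dremio",
    string Pass = "dremio123",
    string Query = "",
    string Protocol = "http");

public static class ArgParser
{
    // Parses "-flag value" pairs and overrides the defaults.
    // Throws ArgumentException on an unknown flag or a flag with no value.
    public static ClientOptions Parse(string[] args)
    {
        var options = new ClientOptions();
        for (int i = 0; i < args.Length; i += 2)
        {
            string key = args[i];
            if (!key.StartsWith("-"))
                throw new ArgumentException($"Expected a flag but found '{key}'.");
            if (i + 1 >= args.Length)
                throw new ArgumentException($"Missing value for '{key}'.");

            string value = args[i + 1];
            // TrimStart also accepts "--host" style flags.
            options = key.TrimStart('-') switch
            {
                "host" => options with { Host = value },
                "port" => options with { Port = value },
                "user" => options with { User = value },
                "pass" => options with { Pass = value },
                "query" => options with { Query = value },
                "protocol" => options with { Protocol = value },
                _ => throw new ArgumentException($"Unknown flag '{key}'."),
            };
        }
        return options;
    }
}
```

With this sketch, `ArgParser.Parse(new[] { "-host", "dremio.example", "-port", "32010" })` would return the defaults with only the host and port replaced, and a single settings object can then be passed to other methods.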
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
/* | ||
* Copyright (C) 2023 Dremio Corporation | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
using Microsoft.Data.Analysis; | ||
using Grpc.Net.Client; | ||
using Grpc.Core; | ||
using Apache.Arrow.Flight.Client; | ||
using Apache.Arrow.Flight; | ||
using Apache.Arrow; | ||
|
||
namespace FlightClientExample | ||
{ | ||
public class Program | ||
{ | ||
public static async Task Main(string[] args) | ||
I believe splitting Main into several smaller methods that are each called from Main would make the code more modular and readable. |
||
{ | ||
string host = "localhost"; | ||
Could these magic numbers be placed outside of the Main function like below? Would a class also be a good way to modularise this information? That way, only one argument would be needed when passing these settings into methods. |
||
string port = "32010"; | ||
string user = "dremio"; | ||
string pass = "dremio123"; | ||
// For default query, add the Samples source and Format the zips.json in Dremio UI | ||
string query = "SELECT city, loc[0] AS longitude, loc[1] AS latitude, pop, state, _id FROM Samples.\"samples.dremio.com\".\"zips.json\" LIMIT 100"; | ||
string protocol = "http"; | ||
|
||
// Parse command-line arguments to override the defaults when set | ||
Parsing the command line arguments should be moved into a private static method. |
||
for (int i = 0; i < args.Length-1; i++) | ||
{ | ||
if (i % 2 == 0 && args[i].StartsWith("-")) | ||
This logic assumes that the user will use the command flags correctly. I think a more robust algorithm would be to:
|
||
{ | ||
var key = args[i]; | ||
var value = args[i+1]; | ||
if (key.EndsWith("host")) host = value; | ||
if (key.EndsWith("port")) port = value; | ||
if (key.EndsWith("user")) user = value; | ||
if (key.EndsWith("pass")) pass = value; | ||
if (key.EndsWith("query")) query = value; | ||
if (key.EndsWith("protocol")) protocol = value; | ||
} | ||
} | ||
|
||
// Basic auth using username and password | ||
Creating a separate method for this would modularise the code and make it more legible. |
||
string encoded = System.Convert.ToBase64String(System.Text.Encoding.GetEncoding("ISO-8859-1").GetBytes(user + ":" + pass)); | ||
Console.WriteLine($"The encoded credentials: {encoded}"); | ||
|
||
// Create client | ||
Creating a separate method for this would modularise the code and make it more legible. |
||
var address = $"{protocol}://{host}:{port}"; | ||
Console.WriteLine($"Connecting to: {address}"); | ||
|
||
var handler = new HttpClientHandler(); | ||
// For localhost https (TLS) endpoint testing, uncomment the following to avoid a cert error | ||
Creating a separate option to disable certificate validation would be more user friendly; the control structure for the arguments list would have to be modified slightly to accept a flag with no value. |
||
// handler.ServerCertificateCustomValidationCallback = HttpClientHandler.DangerousAcceptAnyServerCertificateValidator; | ||
|
||
var httpClient = new HttpClient(handler); | ||
httpClient.DefaultRequestHeaders.Add("Authorization", "Basic " + encoded); | ||
|
||
// An example header for token authentication instead of Basic auth | ||
// httpClient.DefaultRequestHeaders.Add("Authorization", "Bearer " + "X4NxSDN5...H11kUqYU/vWmzA=="); | ||
|
||
var channel = GrpcChannel.ForAddress(address, new GrpcChannelOptions | ||
{ | ||
HttpClient = httpClient | ||
}); | ||
|
||
FlightClient client = new FlightClient(channel); | ||
|
||
// Pass the query text as the Command Descriptor | ||
Creating a separate method for this would modularise the code and make it more legible. |
||
Console.WriteLine($"Query: \n {query}"); | ||
var descriptor = FlightDescriptor.CreateCommandDescriptor(query); | ||
var schema = await client.GetSchema(descriptor).ResponseAsync; | ||
|
||
foreach(var schema_item in schema.Fields) | ||
{ | ||
Console.WriteLine($"Schema Item: {schema_item.Key} - {schema_item.Value.DataType.Name}"); | ||
// The following is advance warning of an upcoming Exception due to specific data types not supported by Microsoft.Data.Analysis | ||
// TODO: There may be a better alternative to Microsoft.Data.Analysis library | ||
if (schema_item.Value.DataType.Name == "list" || schema_item.Value.DataType.Name == "timestamp") | ||
{ | ||
// The fix would be to create a VDS that converts this column to a string instead (or a VDS that does not include this column) | ||
Console.WriteLine($"ERROR: Found column of type '{schema_item.Value.DataType.Name}'. This is not supported by Microsoft.Data.Analysis DataFrame conversion"); | ||
} | ||
} | ||
|
||
var info = await client.GetInfo(descriptor).ResponseAsync; | ||
|
||
Console.WriteLine("-----BEGIN-----"); | ||
// Download data using existing channel | ||
await foreach (var batch in StreamRecordBatches(info, channel)) | ||
{ | ||
// Microsoft.Data.Analysis behaves similarly to Python pandas, but has limited support for data types | ||
var df = DataFrame.FromArrowRecordBatch(batch); | ||
|
||
for (long index = 0; index < df.Rows.Count; index++) | ||
{ | ||
DataFrameRow row = df.Rows[index]; | ||
Console.WriteLine(row); | ||
} | ||
} | ||
Console.WriteLine("-----END-----"); | ||
} | ||
|
||
public static async IAsyncEnumerable<RecordBatch> StreamRecordBatches( | ||
FlightInfo info, | ||
GrpcChannel channel | ||
) | ||
{ | ||
// Assuming one endpoint for example | ||
var endpoint = info.Endpoints[0]; | ||
// Console.WriteLine($"endpoint.Ticket.GetHashCode: {endpoint.Ticket.GetHashCode()}"); | ||
// Console.WriteLine($"endpoint locations uri: \n {endpoint.Locations.First().Uri}"); | ||
|
||
var download_client = new FlightClient(channel); | ||
var stream = download_client.GetStream(endpoint.Ticket); | ||
|
||
while (await stream.ResponseStream.MoveNext()) | ||
{ | ||
yield return stream.ResponseStream.Current; | ||
} | ||
} | ||
} | ||
} | ||
|
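Several review comments above ask for Main to be split into smaller methods. The following is a minimal sketch of what such helpers might look like, assuming .NET 5 or later for `Encoding.Latin1`. The class name `FlightClientHelpers` and its method names are hypothetical and not part of this PR. `FindUnsupportedColumns` takes plain name and type-name pairs, which is an assumption made so that the sketch compiles without the Arrow packages.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FlightClientHelpers
{
    // Builds the value of the HTTP Authorization header for Basic auth.
    // Uses ISO-8859-1 (Latin-1), the same encoding as the example above.
    public static string BuildBasicAuthHeader(string user, string pass)
    {
        byte[] raw = Encoding.Latin1.GetBytes($"{user}:{pass}");
        return "Basic " + Convert.ToBase64String(raw);
    }

    // Builds the gRPC channel address from its parts.
    public static string BuildAddress(string protocol, string host, string port)
        => $"{protocol}://{host}:{port}";

    // Returns the names of columns whose Arrow type name the example flags
    // as unsupported by the DataFrame conversion ("list" and "timestamp").
    public static List<string> FindUnsupportedColumns(
        IEnumerable<KeyValuePair<string, string>> columnTypes)
    {
        var unsupported = new List<string>();
        foreach (var (name, typeName) in columnTypes)
        {
            if (typeName == "list" || typeName == "timestamp")
                unsupported.Add(name);
        }
        return unsupported;
    }
}
```

Main could then call `BuildAddress` and `BuildBasicAuthHeader` before creating the channel, and print a warning for each name returned by `FindUnsupportedColumns`. Keeping these helpers free of I/O also makes them easy to unit test, and avoids the need to print the encoded credentials to the console.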
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
<Project Sdk="Microsoft.NET.Sdk"> | ||
|
||
<PropertyGroup> | ||
<OutputType>Exe</OutputType> | ||
<TargetFramework>net7.0</TargetFramework> | ||
<RootNamespace>dremio_arrow_test</RootNamespace> | ||
<ImplicitUsings>enable</ImplicitUsings> | ||
<Nullable>enable</Nullable> | ||
</PropertyGroup> | ||
|
||
<ItemGroup> | ||
<PackageReference Include="Apache.Arrow" Version="11.0.0" /> | ||
<PackageReference Include="Apache.Arrow.Flight" Version="11.0.0" /> | ||
<PackageReference Include="Grpc.Core" Version="2.46.6" /> | ||
<PackageReference Include="Grpc.Net.Client" Version="2.51.0" /> | ||
<PackageReference Include="Microsoft.Data.Analysis" Version="0.20.1" /> | ||
</ItemGroup> | ||
|
||
</Project> |
Out of curiosity, is there a reason why a namespace is needed here?