a simple example for C# #51

Open: wants to merge 6 commits into base `main` (changes shown from 2 commits)
49 changes: 49 additions & 0 deletions csharp/README.md
@@ -0,0 +1,49 @@
# C\# Arrow Flight Client Application Example

This simple C# client application connects to the Dremio Arrow Flight server endpoint. Developers can authenticate with admin or regular user credentials, and can query any dataset in Dremio that the provided Dremio user can access. By default, the hostname is `localhost` and the port is `32010`; developers can override these defaults by passing the hostname and port as arguments when running the client.

Note: This example uses Microsoft.Data.Analysis to work with the data. Its `DataFrame` is similar to the Python pandas DataFrame, although pandas is more mature and supports more data types.

### Prerequisites
- .NET 7 [SDK](https://dotnet.microsoft.com/en-us/download/dotnet/7.0)
- Dremio 21 or later

NOTE: This code was tested on macOS x64 with Dremio running locally in Docker:
- `docker run -p 9047:9047 -p 31010:31010 -p 32010:32010 dremio/dremio-oss:latest`
- For a quick setup, open http://localhost:9047 in a browser to log in to your local Docker instance, and create an ADMIN user with username `dremio` and password `dremio123`.

### Build the C\# sample application
- Clone this repository:
  - `git clone https://github.com/dremio-hub/arrow-flight-client-examples`
- Navigate to `arrow-flight-client-examples/csharp/example`.
- Build the sample application on the command line with:
  - `dotnet build`

### Instructions on using this C\# sample application
- By default, the hostname is `localhost`, the port is `32010`, the user is `dremio`, and the password is `dremio123`. A default query against the Samples data source is also provided:
  - `dotnet run`
  - NOTE: To use the default query, first add the Samples data source in Dremio, then format the `zips.json` file as a dataset in the Dremio UI.
- Run the sample application with command-line arguments:
  - `dotnet run -query <QUERY> -host <DREMIO_HOSTNAME> -port <DREMIO_PORT> -user <DREMIO_USER> -pass <DREMIO_PASSWORD>`
  - `dotnet run -host localhost -user dremio -pass dremio123 -port 32010 -query "SELECT job_id, status, queue_name, query from sys.jobs"`

### Usage
```
usage: dotnet run <ARGUMENTS>

Arguments:
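  -port
    Dremio flight server port.
    Defaults to 32010.
  -host
    Dremio coordinator hostname.
    Defaults to "localhost".
  -pass
    Dremio password.
    Defaults to "dremio123".
  -protocol
    Connection protocol ("http" or "https").
    Defaults to "http".
  -query
    SQL query to test.
    Defaults to a query on the Samples "zips.json" dataset.
  -user
    Dremio username.
    Defaults to "dremio".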
```
134 changes: 134 additions & 0 deletions csharp/example/Program.cs
@@ -0,0 +1,134 @@
/*
* Copyright (C) 2023 Dremio Corporation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

using Microsoft.Data.Analysis;
using Grpc.Net.Client;
using Grpc.Core;
using Apache.Arrow.Flight.Client;
using Apache.Arrow.Flight;
using Apache.Arrow;

namespace FlightClientExample
Contributor review comment: Out of curiosity, is there a reason why a namespace is needed here?

{
    public class Program
    {
        public static async Task Main(string[] args)
Contributor review comment: I believe splitting `Main` into several smaller methods, each called from `Main`, would make the code more modular and easier to read.

        {
            string host = "localhost";
Contributor review comment: Could these magic values be placed outside of the `Main` method, like below?

    const string DEFAULT_HOST = "localhost";

This would make them easier to find and change.

Would a class also be a good way to modularise this information? Methods would then take a single argument instead of one argument per setting.
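A minimal sketch of the reviewer's suggestion, assuming a hypothetical `FlightClientOptions` class that is not part of this PR. It keeps the defaults from `Main` as constants (using PascalCase names, the usual .NET convention, rather than `DEFAULT_HOST`) and exposes them as settable properties:

```csharp
// Hypothetical options type (not part of this PR) that gathers the example's
// connection settings in one place. Defaults mirror the values hard-coded in Main.
public sealed class FlightClientOptions
{
    public const string DefaultHost = "localhost";
    public const string DefaultPort = "32010";
    public const string DefaultUser = "dremio";
    public const string DefaultPass = "dremio123";
    public const string DefaultProtocol = "http";
    public const string DefaultQuery =
        "SELECT city, loc[0] AS longitude, loc[1] AS latitude, pop, state, _id " +
        "FROM Samples.\"samples.dremio.com\".\"zips.json\" LIMIT 100";

    public string Host { get; set; } = DefaultHost;
    public string Port { get; set; } = DefaultPort;
    public string User { get; set; } = DefaultUser;
    public string Pass { get; set; } = DefaultPass;
    public string Protocol { get; set; } = DefaultProtocol;
    public string Query { get; set; } = DefaultQuery;

    // Address in the "{protocol}://{host}:{port}" form that Main passes to GrpcChannel.ForAddress.
    public string Address => $"{Protocol}://{Host}:{Port}";
}
```

With this shape, `Main` could create one `FlightClientOptions` instance, let the argument parsing assign properties such as `options.Host = value`, and pass the single object to helper methods instead of six separate locals.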

            string port = "32010";
            string user = "dremio";
            string pass = "dremio123";
            // For the default query, add the Samples source and format zips.json as a dataset in the Dremio UI
            string query = "SELECT city, loc[0] AS longitude, loc[1] AS latitude, pop, state, _id FROM Samples.\"samples.dremio.com\".\"zips.json\" LIMIT 100";
            string protocol = "http";

            // Parse command-line arguments to override the defaults when set
Contributor review comment: Parsing the command-line arguments should be moved into a private static method.

            for (int i = 0; i < args.Length - 1; i++)
            {
                if (i % 2 == 0 && args[i].StartsWith("-"))
Contributor review comment: This logic assumes that the user will use the command flags correctly. I think a more robust algorithm would be to:

  1. Have the method for parsing the command-line arguments return a `Dictionary<string, string>` whose keys are the command-line flags and whose values are the flag values.
  2. When creating the dictionary, set the initial values to the flags' default values.
  3. Loop through the argument list one argument at a time.
  4. Keep a nullable string `lastFlag` to track the previous command flag.
  5. Apply the following control structure to each argument:

      // Skip null or whitespace arguments
      if (string.IsNullOrWhiteSpace(argument))
      {
          continue;
      }
      // Remember the flag if the argument starts with "-"
      if (argument.StartsWith("-", StringComparison.Ordinal))
      {
          lastFlag = argument;
      }
      // If lastFlag is a known flag, save this argument as its value
      else if (lastFlag != null && optionsDic.ContainsKey(lastFlag))
      {
          optionsDic[lastFlag] = argument;
          lastFlag = null;
      }

  6. Return the dictionary.
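The steps above could look roughly like the following self-contained sketch. `ArgumentParser` and `Parse` are hypothetical names. The defaults are passed in so the returned dictionary always contains every known flag, and values are assigned with the indexer rather than `Add`, because `Dictionary.Add` throws when the key (a pre-seeded default) already exists:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser (not part of this PR) following the steps in the review comment.
public static class ArgumentParser
{
    // Returns a flag -> value map seeded with the defaults and overridden by args.
    public static Dictionary<string, string> Parse(
        string[] args, IReadOnlyDictionary<string, string> defaults)
    {
        var options = new Dictionary<string, string>(defaults, StringComparer.Ordinal);
        string? lastFlag = null;

        foreach (var argument in args)
        {
            // Skip empty or whitespace-only arguments.
            if (string.IsNullOrWhiteSpace(argument))
            {
                continue;
            }

            // Remember the flag; its value is expected in a following argument.
            if (argument.StartsWith("-", StringComparison.Ordinal))
            {
                lastFlag = argument;
            }
            // Store the value only for known flags; unknown flags are ignored.
            else if (lastFlag != null && options.ContainsKey(lastFlag))
            {
                options[lastFlag] = argument;
                lastFlag = null;
            }
        }

        return options;
    }
}
```

For example, with defaults for `-host` (`localhost`) and `-port` (`32010`), parsing `-port 32011` keeps `-host` at `localhost` and sets `-port` to `32011`. One limitation of this design: a value that itself starts with `-` is treated as a flag.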

                {
                    var key = args[i];
                    var value = args[i + 1];
                    if (key.EndsWith("host")) host = value;
                    if (key.EndsWith("port")) port = value;
                    if (key.EndsWith("user")) user = value;
                    if (key.EndsWith("pass")) pass = value;
                    if (key.EndsWith("query")) query = value;
                    if (key.EndsWith("protocol")) protocol = value;
                }
            }

            // Basic auth using username and password
Contributor review comment: Creating a separate method for this would modularise the code and make it more legible.
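A sketch of such a method, assuming the hypothetical name `BasicAuth.CreateHeaderValue` (not part of this PR). It builds the same header value as the inline code in `Main` (ISO-8859-1 bytes of `user:pass`, base64-encoded), using `Encoding.Latin1`, which is available from .NET 5 onwards:

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of this PR) that builds the Basic auth header value.
public static class BasicAuth
{
    // Returns "Basic " + base64(ISO-8859-1 bytes of "user:pass"),
    // matching the encoding used inline in Main.
    public static string CreateHeaderValue(string user, string pass)
    {
        byte[] credentials = Encoding.Latin1.GetBytes($"{user}:{pass}");
        return "Basic " + Convert.ToBase64String(credentials);
    }
}
```

The call site would then become `httpClient.DefaultRequestHeaders.Add("Authorization", BasicAuth.CreateHeaderValue(user, pass));`. Base64 is an encoding, not encryption, so the returned value reveals the password and is best kept out of console output and logs.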

            string encoded = System.Convert.ToBase64String(System.Text.Encoding.GetEncoding("ISO-8859-1").GetBytes(user + ":" + pass));
            Console.WriteLine($"The encoded credentials: {encoded}");

            // Create client
Contributor review comment: Creating a separate method for this would modularise the code and make it more legible.

            var address = $"{protocol}://{host}:{port}";
            Console.WriteLine($"Connecting to: {address}");

            var handler = new HttpClientHandler();
            // For localhost HTTPS (TLS) endpoint testing, uncomment the following line to avoid a certificate error
Contributor review comment: A separate option to disable certificate validation would be more user-friendly. The control structure for parsing the arguments would need a small change to accept a flag with no value.
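One possible shape for this, sketched with hypothetical names: `HandlerFactory` and a value-less `-insecure` flag, neither of which this PR defines. `HasFlag` only checks whether the flag is present anywhere in `args`; note that the existing loop, which reads flags at even indexes, would also need adjusting, since a value-less flag shifts the positions of the `-flag value` pairs that follow it:

```csharp
using System;
using System.Net.Http;

// Hypothetical helpers (not part of this PR) for an opt-in "skip TLS certificate checks" switch.
public static class HandlerFactory
{
    // True if the value-less flag (e.g. "-insecure") appears anywhere in args.
    public static bool HasFlag(string[] args, string flag) =>
        Array.Exists(args, a => string.Equals(a, flag, StringComparison.Ordinal));

    // Creates the handler for the gRPC HttpClient. When skipCertValidation is true,
    // any server certificate is accepted, which is only appropriate for local testing.
    public static HttpClientHandler Create(bool skipCertValidation)
    {
        var handler = new HttpClientHandler();
        if (skipCertValidation)
        {
            handler.ServerCertificateCustomValidationCallback =
                HttpClientHandler.DangerousAcceptAnyServerCertificateValidator;
        }
        return handler;
    }
}
```

In `Main`, this would replace the commented-out line with something like `var handler = HandlerFactory.Create(HandlerFactory.HasFlag(args, "-insecure"));`, so certificate checks stay on unless the user explicitly opts out.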

            // handler.ServerCertificateCustomValidationCallback = HttpClientHandler.DangerousAcceptAnyServerCertificateValidator;

            var httpClient = new HttpClient(handler);
            httpClient.DefaultRequestHeaders.Add("Authorization", "Basic " + encoded);

            // An example header for token authentication instead of Basic auth
            // httpClient.DefaultRequestHeaders.Add("Authorization", "Bearer " + "X4NxSDN5...H11kUqYU/vWmzA==");

            var channel = GrpcChannel.ForAddress(address, new GrpcChannelOptions
            {
                HttpClient = httpClient
            });

            FlightClient client = new FlightClient(channel);

            // Pass the query text as the Command Descriptor
Contributor review comment: Creating a separate method for this would modularise the code and make it more legible.

            Console.WriteLine($"Query: \n {query}");
            var descriptor = FlightDescriptor.CreateCommandDescriptor(query);
            var schema = await client.GetSchema(descriptor).ResponseAsync;

            foreach (var schema_item in schema.Fields)
            {
                Console.WriteLine($"Schema Item: {schema_item.Key} - {schema_item.Value.DataType.Name}");
                // Warn in advance about column types that Microsoft.Data.Analysis does not support; the DataFrame conversion below throws an exception for them
                // TODO: There may be a better alternative to the Microsoft.Data.Analysis library
                if (schema_item.Value.DataType.Name == "list" || schema_item.Value.DataType.Name == "timestamp")
                {
                    // The fix would be to create a VDS that converts this column to a string instead (or a VDS that does not include this column)
                    Console.WriteLine($"ERROR: Found column of type '{schema_item.Value.DataType.Name}'. This is not supported by Microsoft.Data.Analysis DataFrame conversion");
                }
            }

            var info = await client.GetInfo(descriptor).ResponseAsync;

            Console.WriteLine("-----BEGIN-----");
            // Download data using the existing channel
            await foreach (var batch in StreamRecordBatches(info, channel))
            {
                // Microsoft.Data.Analysis behaves similarly to Python pandas but supports fewer data types
                var df = DataFrame.FromArrowRecordBatch(batch);

                for (long index = 0; index < df.Rows.Count; index++)
                {
                    DataFrameRow row = df.Rows[index];
                    Console.WriteLine(row);
                }
            }
            Console.WriteLine("-----END-----");
        }

        public static async IAsyncEnumerable<RecordBatch> StreamRecordBatches(
            FlightInfo info,
            GrpcChannel channel)
        {
            // For simplicity, this example assumes a single endpoint
            var endpoint = info.Endpoints[0];
            // Console.WriteLine($"endpoint.Ticket.GetHashCode: {endpoint.Ticket.GetHashCode()}");
            // Console.WriteLine($"endpoint locations uri: \n {endpoint.Locations.First().Uri}");

            var download_client = new FlightClient(channel);
            var stream = download_client.GetStream(endpoint.Ticket);

            while (await stream.ResponseStream.MoveNext())
            {
                yield return stream.ResponseStream.Current;
            }
        }
    }
}

19 changes: 19 additions & 0 deletions csharp/example/dremio_arrow_test.csproj
@@ -0,0 +1,19 @@
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <RootNamespace>dremio_arrow_test</RootNamespace>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Apache.Arrow" Version="11.0.0" />
    <PackageReference Include="Apache.Arrow.Flight" Version="11.0.0" />
    <PackageReference Include="Grpc.Core" Version="2.46.6" />
    <PackageReference Include="Grpc.Net.Client" Version="2.51.0" />
    <PackageReference Include="Microsoft.Data.Analysis" Version="0.20.1" />
  </ItemGroup>

</Project>