A PySpark API for Gaffer #2

Open
m316257 opened this issue Sep 20, 2018 · 2 comments
m316257 (Member) commented Sep 20, 2018

Gaffer has a Spark library with Scala and Java APIs for accessing data using Spark, generating RDDs and Spark DataFrames from Gaffer graphs.

Gaffer also has a Python shell with implementations of the standard Gaffer operations, which can be executed on the graph through Gaffer's REST service.

Extending the Python API to support Spark operations (producing RDDs and DataFrames) would open Gaffer up to many useful Python and Spark data science and machine learning libraries.
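As a rough illustration of the kind of bridge this would need, the sketch below flattens Gaffer-style element JSON (as a REST call might return it) into uniform rows that Spark could turn into a DataFrame. It is not Gaffer's Python API: the helper names `element_to_row` and `elements_to_rows`, the `prop_` column prefix, and the sample groups and properties are all hypothetical, and property values are assumed to be plain JSON scalars.

```python
from typing import Any, Dict, Iterable, List


def element_to_row(element: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one Gaffer-style element (Entity or Edge JSON) into a flat dict."""
    row = {
        "group": element.get("group"),
        "vertex": element.get("vertex"),            # set for entities only
        "source": element.get("source"),            # set for edges only
        "destination": element.get("destination"),  # set for edges only
        "directed": element.get("directed"),        # set for edges only
    }
    # Prefix property names so they cannot clash with the fixed columns.
    for name, value in element.get("properties", {}).items():
        row["prop_" + name] = value
    return row


def elements_to_rows(elements: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Give every row the same columns, filling missing values with None."""
    rows = [element_to_row(e) for e in elements]
    columns = sorted({column for row in rows for column in row})
    return [{column: row.get(column) for column in columns} for row in rows]


# Hypothetical sample data: one entity and one edge.
sample_elements = [
    {
        "class": "uk.gov.gchq.gaffer.data.element.Entity",
        "group": "Person",
        "vertex": "alice",
        "properties": {"age": 30},
    },
    {
        "class": "uk.gov.gchq.gaffer.data.element.Edge",
        "group": "Knows",
        "source": "alice",
        "destination": "bob",
        "directed": True,
        "properties": {"count": 2},
    },
]

rows = elements_to_rows(sample_elements)
for row in rows:
    print(row)
```

With PySpark installed, rows like these could be passed to `SparkSession.createDataFrame`, ideally with an explicit schema so that columns which are empty for some element types still get a type.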

m316257 self-assigned this Sep 20, 2018
JSWard referenced this issue in JSWard/gaffer-tools Mar 29, 2019
gh-595-pyspark-api
Add to python-api README
Include data auths in the python user
Change PythonSerialiserConfig to look for the nested json object "serialisers" within the python config json
JSWard referenced this issue in JSWard/gaffer-tools Mar 29, 2019
m316257 referenced this issue in gchq/gaffer-tools Apr 1, 2019
gh-595-pyspark-api
Add to python-api README
Include data auths in the python user
Change PythonSerialiserConfig to look for the nested json object "serialisers" within the python config json
GCHQ-83497 referenced this issue in gchq/gaffer-tools Apr 3, 2019
m316257 referenced this issue in gchq/gaffer-tools Apr 4, 2019
n3101 commented Mar 31, 2021

@m316257 @GCHQ-83497 Hello, could you please tell me the status of this issue? FYI, we are considering the alternative "fishbowl" shell as our way forward, and would be interested to know whether anything you have here is complete and compatible enough to lift and reuse.

GCHQ-83497 (Member) commented Apr 6, 2021

@m316257 correct me if I am wrong, as it has been a long time since I worked on this. @n3101 the idea was to be able to interact directly with Gaffer across a network, so currently this can run most, if not all, queries from Python and get the results back. We also had jaffer, which was a Java version of this. The same approach was used when adding PySpark, so I believe that also runs in a sort of remote mode (sorry, it's been nearly 2 years!). The last time I worked on this, I had added some features so that you could hook into authentication and policy handling; I cannot for the life of me remember whether that works. I also think this included a first-draft attempt at containerising Gaffer.

t92549 transferred this issue from gchq/gaffer-tools Oct 6, 2023