FITS DatasourceV2 #89

mayurdb · 2019-12-24T18:35:43Z

With Spark moving towards the v2 data source API for both reads and writes, the FITS data source will eventually need to move to v2 as well.
This PR adds support for reading FITS data in Spark through the v2 APIs by specifying the format as fitsv2.
v2 takes a somewhat different approach from v1 to how the data is partitioned across multiple tasks. The new approach has simplified the code a bit.

There are still a few ToDos:
~~1. Add unit tests~~
2. Add examples
3. Check that it works on a larger dataset
~~4. Test corner cases (covered in the UTs I guess)~~
5. Currently, the user has to give the format as fitsv2. Ideally we would toggle between v1 and v2 based on some conf, so we need to check whether that is possible.
6. Some of the v1 code is replicated in utils for use in v2; refactor the v1 code to use the code in utils.
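
A minimal sketch of how the conf-based toggle between v1 and v2 might look, assuming a boolean setting decides which format name is used. The object name `FitsFormatToggle`, the key `spark.fits.useV2`, and the use of `fits` as the v1 format name are assumptions for illustration and are not part of this PR; only `fitsv2` comes from the description above.

```scala
// Hypothetical sketch: pick the FITS format name from a configuration map.
object FitsFormatToggle {
  // Assumed configuration key; not defined anywhere in this PR.
  val UseV2Key = "spark.fits.useV2"

  // "fitsv2" is the name given in this PR; "fits" is assumed for v1.
  val V1Format = "fits"
  val V2Format = "fitsv2"

  /** Returns the format name to hand to spark.read.format(...).
    * Falls back to v1 when the key is missing or not "true". */
  def chooseFormat(conf: Map[String, String]): String = {
    val useV2 = conf.get(UseV2Key).exists(_.trim.equalsIgnoreCase("true"))
    if (useV2) V2Format else V1Format
  }
}
```

With a SparkSession in scope, the result could be passed along as `spark.read.format(FitsFormatToggle.chooseFormat(conf)).load(path)`. Whether v2 accepts the same reader options as v1 is left open here.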

mayurdb · 2020-01-13T08:41:21Z

With Scala 2.11 the build will fail. This is something we should consider: these changes are compiled against Spark 3.0, which requires Scala 2.12. I don't think we can compile this against Spark 2.4.3 because, as far as I know, the V2 APIs differ between those two versions. I'll give it a try; if the changes have been ported back to 2.4.3, it might work.

JulienPeloton · 2020-01-13T12:02:31Z

Good point - we are at a moment of transition where we need to decide what to support. The future being Spark 3 with the associated DSv2, we should primarily focus on this combination for this PR. We can always keep a branch open to deal with Spark 2.4 support. If there is a chance of backporting, it should be secondary (although important!).

mayurdb · 2020-01-16T12:26:45Z

@JulienPeloton Yup, checked it. As expected, it does not compile with Spark 2.4.3.

JulienPeloton · 2020-01-16T13:56:50Z

Thanks - good to know for the future. I will review the current code ASAP.

JulienPeloton · 2020-02-26T07:32:59Z

Hi @mayurdb - I haven't forgotten this important one, I am just overwhelmed these days...

mayurdb added 16 commits December 1, 2019 22:21

Initial commit

d088edc

Added the V2 skeleton. Still a long way to go!!!

ab61755

Baby steps towards Fits V2

9477838

Added per job level setup; ToDo: Implement per record level code

1033083

Have deviated quite a bit from V1 way of reading; Testing time now

d407a53

Resolved compilation failures

8337df5

Lets not hard code

fc20760

Moved the step closer to first V2 read; There seems to be an issue when the Catalyst type is not same as the scala type, String type for example is converted internally to UTF8String; Lets seee

4c7bf14

Was able to hack this to work, got the first output (o_o)

38aeb28

Changed v2 source name, Code clean-up

2fd825e

Added license headers

6b7002e

Further cleanup

ad2efe5

Code cleanup

db7a93a

Added UTs and fixes along the way

deadd76

Code cleanup

10ab444

Dummy commit

6bfbd70
