Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FITS DatasourceV2 #89

Open
wants to merge 16 commits into
base: master
Choose a base branch
from
Open

Conversation

mayurdb
Copy link

@mayurdb mayurdb commented Dec 24, 2019

  • With Spark moving towards the v2 for both reads as well write, eventually Fits Datasource also needs to move to v2.
  • This PR adds support to read Fits data in Spark using v2 APIs by specifying the format as, fitsv2.
  • v2 takes a somewhat different approach compared to v1 on how the data is partitioned across multiple tasks. The new way has simplified the code a bit

There are still few ToDos -
1. Add unit tests
2. Add examples
3. Check the working on a larger dataset
4. Test corner cases (covered in the UTs I guess)
4. Currently, the user will have to give format as fitsv2. We will have to check based on some conf if we can toggle between v1 and v2, as that would be more ideal.
5. Some of the v1 code is replicated in utils for use in v2, refactor the v1 code to use the code in utils

@mayurdb
Copy link
Author

mayurdb commented Jan 13, 2020

Scala 2.11 the build will fail. This is something we should consider, these changes are compiled against Spark 3.0 which requires scala 2.12. I don't think we can compile this against Spark 2.4.3 because as far as I know the V2 APIs are different between those two versions. I'll give it a try, if they have ported the changes back to 2.4.3 it might work

@JulienPeloton
Copy link
Member

Good point - we are at this moment of transition where we need to take a decision on what to support. The future being spark v3 with the associated DSv2, we should primarily focus on this combination for this PR. We can always let a branch open to deal with spark 2.4 support. If there is a chance of backporting, it should be secondary (although important!).

@mayurdb
Copy link
Author

mayurdb commented Jan 16, 2020

@JulienPeloton Yup checked it. As expected does not compile with Spark 2.4.3

@JulienPeloton
Copy link
Member

Thanks - good to know for the future. I will make the review of the current code asap.

@JulienPeloton
Copy link
Member

Hi @mayurdb - I haven't forgotten this important one, I am just overwhelmed these times...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants