[TASK][EASY] MaxScanStrategy supports DSv2 #6315

Closed
2 of 3 tasks
pan3793 opened this issue Apr 17, 2024 · 0 comments

Comments

Member

pan3793 commented Apr 17, 2024

What's the level of this task?

EASY

Code of Conduct

Search before creating

  • I have searched in the task list and found no similar tasks.

Mentor

  • I have sufficient expertise on this task, and I volunteer to be a mentor of this task to guide contributors through the task.

Skill requirements

Spark, DSv2

Background and Goals

Currently, MaxScanStrategy can limit the maximum scan file size for some data sources, such as Hive. We would like to enhance MaxScanStrategy so that it also supports DataSource V2 (DSv2).

Implementation steps

#5852

Additional context

Introduction of 2024H1 Kyuubi Code Contribution Program

pan3793 pushed a commit that referenced this issue Apr 17, 2024
# 🔍 Description
## Issue References 🔗

Currently, MaxScanStrategy can limit the maximum scan file size for some data sources, such as Hive. We would like to enhance MaxScanStrategy so that it also supports DataSource V2 (DSv2).
## Describe Your Solution 🔧

Get the statistics about the scanned files through the DataSource V2 API.
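
As a rough illustration of the idea (not the code from this pull request), the sketch below shows how a size limit could be checked against statistics reported by a scan. The nested `Statistics` and `ReportsStatistics` interfaces are simplified stand-ins modeled on Spark's `org.apache.spark.sql.connector.read.Statistics` and `SupportsReportStatistics`, so that the example runs without a Spark dependency. The names `MaxScanSketch`, `exceedsLimit`, and `scanOfSize` are hypothetical.

```java
import java.util.OptionalLong;

final class MaxScanSketch {

    // Stand-in for Spark's DSv2 Statistics interface (assumption: only sizeInBytes is modeled).
    interface Statistics {
        OptionalLong sizeInBytes();
    }

    // Stand-in for Spark's SupportsReportStatistics interface.
    interface ReportsStatistics {
        Statistics estimateStatistics();
    }

    // Returns true only when the scan reports a size and that size is above the limit.
    // A scan that does not report a size is not rejected in this sketch.
    static boolean exceedsLimit(ReportsStatistics scan, long maxFileSizeBytes) {
        OptionalLong size = scan.estimateStatistics().sizeInBytes();
        return size.isPresent() && size.getAsLong() > maxFileSizeBytes;
    }

    // Hypothetical helper: builds a scan that reports a fixed size in bytes.
    static ReportsStatistics scanOfSize(long bytes) {
        Statistics stats = () -> OptionalLong.of(bytes);
        return () -> stats;
    }

    // Hypothetical helper: builds a scan that reports no size at all.
    static ReportsStatistics scanOfUnknownSize() {
        Statistics stats = OptionalLong::empty;
        return () -> stats;
    }

    public static void main(String[] args) {
        long limit = 100L;
        System.out.println(exceedsLimit(scanOfSize(200L), limit));
        System.out.println(exceedsLimit(scanOfSize(50L), limit));
        System.out.println(exceedsLimit(scanOfUnknownSize(), limit));
    }
}
```

Treating an unreported size as "within the limit" is a design choice of this sketch; a stricter rule could reject such scans instead.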

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklists
## 📝 Author Self Checklist

- [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project
- [x] I have performed a self-review
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

## 📝 Committer Pre-Merge Checklist

- [x] Pull request title is okay.
- [x] No license issues.
- [x] Milestone correctly set?
- [x] Test coverage is ok
- [x] Assignees are selected.
- [ ] Minimum number of approvals
- [ ] No changes are requested

**Be nice. Be informative.**

Closes #5852 from zhaohehuhu/dev-1213.

Closes #6315

3c5b0c2 [hezhao2] reformat
fb113d6 [hezhao2] disable the rule that checks the maxPartitions for dsv2
acc3587 [hezhao2] disable the rule that checks the maxPartitions for dsv2
c8399a0 [hezhao2] fix header
70c845b [hezhao2] add UTs
3a07396 [hezhao2] add ut
4d26ce1 [hezhao2] reformat
f87cb07 [hezhao2] reformat
b307022 [hezhao2] move code to Spark 3.5
73258c2 [hezhao2] fix unused import
cf893a0 [hezhao2] drop reflection for loading iceberg class
dc128bc [hezhao2] refactor code
661834c [hezhao2] revert code
6061f42 [hezhao2] delete IcebergSparkPlanHelper
5f1c3c0 [hezhao2] fix
b15652f [hezhao2] remove iceberg dependency
fe620ca [hezhao2] enable MaxScanStrategy when accessing iceberg datasource

Authored-by: hezhao2 <[email protected]>
Signed-off-by: Cheng Pan <[email protected]>
(cherry picked from commit 8edcb00)
Signed-off-by: Cheng Pan <[email protected]>