Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process non-nullable scala type before udf #1471

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

wycccccc
Copy link
Collaborator

resolved #1286
統一處理爲string type不太好,因此換了一種做法。如果檢測到column 爲null提前設置好null就行。
順便修了一些bug。
grep 不會匹配 . 因此在腳本中會把註解也匹配到。

@wycccccc wycccccc requested a review from chia7712 January 31, 2023 19:42
@wycccccc wycccccc marked this pull request as ready for review January 31, 2023 19:42
#Spark checkpoint path
checkpoint =
#Spark checkpoint
checkpoint.path =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

請問為何加上.path? 如果是要統一命名的話,Metadata裡面用的變數名稱也要跟著改

Copy link
Collaborator Author

@wycccccc wycccccc Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

主要是shell 如果按照checkpoint去搜索會把上方的註解也一併識別,因此乾脆改一個統一的名字。

@@ -89,7 +89,7 @@ function runContainer() {

if [[ "$master" == "spark:"* ]] || [[ "$master" == "local"* ]]; then
docker run -d --init \
--name "csv-kafka-${source_name}" \
--name "csv-kafka${source_name}" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

這邊拿掉-是有目的的嗎?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

沒有,我在查上面那個bug時不小心刪掉的,已恢復。

@@ -171,10 +178,6 @@ object DataFrameProcessor {

private def schema(columns: Seq[DataColumn]): StructType =
StructType(columns.map { col =>
if (col.dataType != DataType.StringType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

現在支援非string 型別了嗎?

Copy link
Collaborator Author

@wycccccc wycccccc Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

沒錯,目前我測試下來已支援。因爲在column時是能夠處理null的,但如果放在udf中轉換回scala中的某些type就不支持null處理了。

cols.flatMap(c =>
List(
lit(c.name),
when(col(c.name).isNotNull, col(c.name)).otherwise(lit(null))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

或許我們可以直接把 null 的欄位取消掉,因為當null的時候就代表沒有該值,直接過濾掉可能還可以提升一點效能

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

這是我能想到的將null欄位取消掉的寫法,看上去沒有很優雅,但我也找不到其他的了。有優雅的我再修改。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

java.lang.NullPointerException: Null value appeared in non-nullable field
2 participants