-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Process non-nullable scala type before udf #1471
base: main
Are you sure you want to change the base?
Conversation
#Spark checkpoint path | ||
checkpoint = | ||
#Spark checkpoint | ||
checkpoint.path = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
請問為何加上.path
? 如果是要統一命名的話,Metadata
裡面用的變數名稱也要跟著改
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
主要是shell 如果按照checkpoint去搜索會把上方的註解也一併識別,因此乾脆改一個統一的名字。
docker/start_etl.sh
Outdated
@@ -89,7 +89,7 @@ function runContainer() { | |||
|
|||
if [[ "$master" == "spark:"* ]] || [[ "$master" == "local"* ]]; then | |||
docker run -d --init \ | |||
--name "csv-kafka-${source_name}" \ | |||
--name "csv-kafka${source_name}" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
這邊拿掉-
是有目的的嗎?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
沒有,我在查上面那個bug時不小心刪掉的,已恢復。
@@ -171,10 +178,6 @@ object DataFrameProcessor { | |||
|
|||
private def schema(columns: Seq[DataColumn]): StructType = | |||
StructType(columns.map { col => | |||
if (col.dataType != DataType.StringType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
現在支援非string 型別了嗎?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
沒錯,目前我測試下來已支援。因爲在column時是能夠處理null的,但如果放在udf中轉換回scala中的某些type就不支持null處理了。
cols.flatMap(c => | ||
List( | ||
lit(c.name), | ||
when(col(c.name).isNotNull, col(c.name)).otherwise(lit(null)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
或許我們可以直接把 null 的欄位取消掉,因為當null的時候就代表沒有該值,直接過濾掉可能還可以提升一點效能
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
這是我能想到的將null欄位取消掉的寫法,看上去沒有很優雅,但我也找不到其他的了。有優雅的我再修改。
resolved #1286
統一處理爲string type不太好,因此換了一種做法。如果檢測到column 爲null提前設置好null就行。
順便修了一些bug。
grep 不會匹配 . 因此在腳本中會把註解也匹配到。