Process non-nullable scala type before udf #1471

wycccccc · 2023-01-31T19:40:43Z

resolved #1286
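Handling every column uniformly as a string type is not ideal, so I switched to a different approach: if a column is detected to be null, set it to null up front, before it reaches the udf.
I also fixed a few bugs along the way.
The `grep` in the script does not match on `.`, so it was also matching the comment line.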

chia7712 · 2023-02-02T12:31:12Z

config/spark2kafka.properties

-#Spark checkpoint path
-checkpoint =
+#Spark checkpoint
+checkpoint.path =


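Why add `.path`? If the goal is consistent naming, the variable name used in Metadata needs to change as well.

wycccccc (Feb 22, 2023): Mainly because when the shell script searches for `checkpoint`, it also picks up the comment line above, so I simply switched to one consistent name.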
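The matching problem described above can be reproduced in isolation. The following is a minimal sketch, assuming the script looks a key up with a plain `grep` (the actual lookup in `docker/start_etl.sh` is not shown in this thread). The temporary file, the sample value `/tmp/checkpoint`, and the variable names are hypothetical. It also shows an anchored pattern as an alternative to renaming the key; this is not the approach the PR takes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical properties file that mirrors the old layout:
# a comment that mentions the key, followed by the key itself.
props="$(mktemp)"
trap 'rm -f "$props"' EXIT

cat > "$props" <<'EOF'
#Spark checkpoint path
checkpoint = /tmp/checkpoint
EOF

# Unanchored search: the comment line also contains "checkpoint",
# so both lines are counted as matches.
unanchored_count="$(grep -c "checkpoint" "$props")"

# Anchored search: only a line that starts with the key, followed by
# optional whitespace and "=", is accepted. The value is everything
# after the first "=", with whitespace stripped.
anchored_value="$(grep -E '^checkpoint[[:space:]]*=' "$props" | cut -d'=' -f2- | tr -d '[:space:]')"

echo "unanchored matches: $unanchored_count"
echo "anchored value: $anchored_value"
```

Run as written, the unanchored search reports 2 matching lines, while the anchored search yields only the value `/tmp/checkpoint`. Renaming the key to `checkpoint.path`, as this PR does, avoids the collision without touching the search pattern, at the cost of having to update every place that reads the old name.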

chia7712 · 2023-02-02T12:31:32Z

docker/start_etl.sh

@@ -89,7 +89,7 @@ function runContainer() {

  if [[ "$master" == "spark:"* ]] || [[ "$master" == "local"* ]]; then
    docker run -d --init \
-      --name "csv-kafka-${source_name}" \
+      --name "csv-kafka${source_name}" \


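Was removing the `-` here intentional?

wycccccc (Feb 22, 2023): No, I deleted it by accident while investigating the bug above. It has been restored.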

chia7712 · 2023-02-02T12:31:57Z

etl/src/main/scala/org/astraea/etl/DataFrameProcessor.scala

@@ -171,10 +178,6 @@ object DataFrameProcessor {

    private def schema(columns: Seq[DataColumn]): StructType =
      StructType(columns.map { col =>
-        if (col.dataType != DataType.StringType)


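Are non-string types supported now?

wycccccc (Feb 22, 2023): Yes, in my testing so far they are supported. Null can be handled while the value is still a Column, but once it is passed into a udf and converted back to certain Scala types, null is no longer supported.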

chia7712 · 2023-02-02T12:32:54Z

etl/src/main/scala/org/astraea/etl/DataFrameProcessor.scala

+              cols.flatMap(c =>
+                List(
+                  lit(c.name),
+                  when(col(c.name).isNotNull, col(c.name)).otherwise(lit(null))


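Perhaps we could simply drop the null fields, since null means the value is absent. Filtering them out directly might also improve performance a little.

wycccccc (Feb 22, 2023): This is the way I could think of to drop the null fields. It does not look very elegant, but I could not find another approach. I will revise it if a more elegant one turns up.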

wycccccc added 2 commits February 1, 2023 03:31

Graceful handling of null field

1bbab10

spotless

3aec0ab

wycccccc requested a review from chia7712 January 31, 2023 19:42

wycccccc marked this pull request as ready for review January 31, 2023 19:42

chia7712 reviewed Feb 2, 2023

wycccccc added 3 commits February 20, 2023 01:31

Merge branch 'main' of https://github.com/skiptests/astraea into null…

8b9d1e5

…ableSparkType

filter null value

74516b1

remove otherwies

7dca5ba
