unmatched in NDS query18 #146

Open
johnnyzhon opened this issue Apr 18, 2023 · 3 comments

Comments
@johnnyzhon
Collaborator

johnnyzhon commented Apr 18, 2023

Parameters: sf3k
Data path: s3a://spark-data/nds2/parquet_sf3k_decimal

plugin:
revision=faf94e423c55f020d68da77745e2748c93eb89f3
branch=HEAD
date=2023-04-16T11:50:34Z
url=https://github.com/NVIDIA/spark-rapids.git

Detailed message:
Collected 100 rows in 0.18825602531433105 seconds
Row 99:
['AAAAAAAAAABBCAAA', None, None, None, Decimal('38.500000'), Decimal('195.280000'), Decimal('2456.060000'), Decimal('120.755000'), Decimal('-987.980000'), Decimal('1939.500000'), Decimal('1.000000')]
['AAAAAAAAAABBCAAA', None, None, None, Decimal('45.190476'), Decimal('75.922381'), Decimal('520.519048'), Decimal('45.438571'), Decimal('169.410952'), Decimal('1964.857143'), Decimal('2.952381')]

=== Unmatch Queries: ['query18'] ===
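
The two rows printed above for "Row 99" appear to be the same output row from the two runs being compared, so it helps to diff them column by column. A minimal sketch of that comparison (a standalone illustration, not the benchmark harness's own validation code):

```python
from decimal import Decimal

# The two printed rows for "Row 99", copied from the report above.
row_a = ['AAAAAAAAAABBCAAA', None, None, None, Decimal('38.500000'),
         Decimal('195.280000'), Decimal('2456.060000'), Decimal('120.755000'),
         Decimal('-987.980000'), Decimal('1939.500000'), Decimal('1.000000')]
row_b = ['AAAAAAAAAABBCAAA', None, None, None, Decimal('45.190476'),
         Decimal('75.922381'), Decimal('520.519048'), Decimal('45.438571'),
         Decimal('169.410952'), Decimal('1964.857143'), Decimal('2.952381')]

# Print every column index where the two rows disagree.
for i, (a, b) in enumerate(zip(row_a, row_b)):
    if a != b:
        print(f"column {i}: {a} != {b}")
```

Only the leading key columns match; every numeric column differs by a large margin, which looks more like different underlying rows landing in position 99 (e.g. an ordering or grouping difference) than small rounding drift.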

Query 18 in stream:
-- start query 18 in stream 2 using template query75.tpl
WITH all_sales AS (
SELECT d_year
,i_brand_id
,i_class_id
,i_category_id
,i_manufact_id
,SUM(sales_cnt) AS sales_cnt
,SUM(sales_amt) AS sales_amt
FROM (SELECT d_year
,i_brand_id
,i_class_id
,i_category_id
,i_manufact_id
,cs_quantity - COALESCE(cr_return_quantity,0) AS sales_cnt
,cs_ext_sales_price - COALESCE(cr_return_amount,0.0) AS sales_amt
FROM catalog_sales JOIN item ON i_item_sk=cs_item_sk
JOIN date_dim ON d_date_sk=cs_sold_date_sk
LEFT JOIN catalog_returns ON (cs_order_number=cr_order_number
AND cs_item_sk=cr_item_sk)
WHERE i_category='Electronics'
UNION
SELECT d_year
,i_brand_id
,i_class_id
,i_category_id
,i_manufact_id
,ss_quantity - COALESCE(sr_return_quantity,0) AS sales_cnt
,ss_ext_sales_price - COALESCE(sr_return_amt,0.0) AS sales_amt
FROM store_sales JOIN item ON i_item_sk=ss_item_sk
JOIN date_dim ON d_date_sk=ss_sold_date_sk
LEFT JOIN store_returns ON (ss_ticket_number=sr_ticket_number
AND ss_item_sk=sr_item_sk)
WHERE i_category='Electronics'
UNION
SELECT d_year
,i_brand_id
,i_class_id
,i_category_id
,i_manufact_id
,ws_quantity - COALESCE(wr_return_quantity,0) AS sales_cnt
,ws_ext_sales_price - COALESCE(wr_return_amt,0.0) AS sales_amt
FROM web_sales JOIN item ON i_item_sk=ws_item_sk
JOIN date_dim ON d_date_sk=ws_sold_date_sk
LEFT JOIN web_returns ON (ws_order_number=wr_order_number
AND ws_item_sk=wr_item_sk)
WHERE i_category='Electronics') sales_detail
GROUP BY d_year, i_brand_id, i_class_id, i_category_id, i_manufact_id)
SELECT prev_yr.d_year AS prev_year
,curr_yr.d_year AS year
,curr_yr.i_brand_id
,curr_yr.i_class_id
,curr_yr.i_category_id
,curr_yr.i_manufact_id
,prev_yr.sales_cnt AS prev_yr_cnt
,curr_yr.sales_cnt AS curr_yr_cnt
,curr_yr.sales_cnt-prev_yr.sales_cnt AS sales_cnt_diff
,curr_yr.sales_amt-prev_yr.sales_amt AS sales_amt_diff
FROM all_sales curr_yr, all_sales prev_yr
WHERE curr_yr.i_brand_id=prev_yr.i_brand_id
AND curr_yr.i_class_id=prev_yr.i_class_id
AND curr_yr.i_category_id=prev_yr.i_category_id
AND curr_yr.i_manufact_id=prev_yr.i_manufact_id
AND curr_yr.d_year=2002
AND prev_yr.d_year=2002-1
AND CAST(curr_yr.sales_cnt AS DECIMAL(17,2))/CAST(prev_yr.sales_cnt AS DECIMAL(17,2))<0.9
ORDER BY sales_cnt_diff,sales_amt_diff
LIMIT 100;

-- end query 18 in stream 2 using template query75.tpl
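
To take the stream runner out of the picture, one way to reproduce just this query is to register the tables it touches and execute the SQL text directly. A rough sketch, assuming each table is stored as Parquet in a sub-directory named after the table under the benchmark data path (the file name query_18.sql is hypothetical):

```python
from pyspark.sql import SparkSession

DATA = "s3a://spark-data/nds2/parquet_sf3k_decimal"
TABLES = ["catalog_sales", "catalog_returns", "store_sales", "store_returns",
          "web_sales", "web_returns", "item", "date_dim"]

spark = SparkSession.builder.appName("query18-repro").getOrCreate()

# Register each Parquet table under its TPC-DS name so the query text runs as-is.
for t in TABLES:
    spark.read.parquet(f"{DATA}/{t}").createOrReplaceTempView(t)

# The query text shown above, saved to a local file.
query = open("query_18.sql").read()
rows = spark.sql(query).collect()

# Run once with the RAPIDS plugin enabled and once with
# spark.rapids.sql.enabled=false, then compare the two collected result sets.
```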

@gerashegalov
Collaborator

@johnnyzhon Can you add a repro command and a high-level description? Was it a single-query run?

@johnnyzhon
Collaborator Author

This failure occurred during NDS2 power-run testing on NGC. The command and configuration are pasted below.

  • /spark-3.2.1-bin-hadoop3.2/bin/spark-submit --master spark://127.0.0.1:7077 --driver-memory 20G --conf spark.memory.storageFraction=0.2 --conf spark.sql.broadcastTimeout=7200 --conf spark.shuffle.file.buffer=96k --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.kryo.registrator=com.nvidia.spark.rapids.GpuKryoRegistrator --conf spark.kryoserializer.buffer.max=512M --conf spark.network.timeout=200s --conf spark.sql.shuffle.partitions=200 --conf spark.driver.maxResultSize=2GB --conf spark.rapids.memory.host.spillStorageSize=32G --conf spark.rapids.sql.concurrentGpuTasks=4 --conf spark.sql.files.maxPartitionBytes=2g --conf spark.rapids.memory.pinnedPool.size=8g --conf spark.rapids.sql.enabled=true --conf spark.plugins=com.nvidia.spark.SQLPlugin --conf spark.rapids.sql.castFloatToString.enabled=true --conf spark.rapids.sql.castStringToFloat.enabled=true --conf spark.rapids.sql.castStringToInteger.enabled=true --conf spark.rapids.sql.incompatibleOps.enabled=true --conf spark.rapids.sql.variableFloatAgg.enabled=true --conf spark.rapids.sql.csv.read.date.enabled=true --conf spark.rapids.sql.csv.read.integer.enabled=true --conf spark.rapids.sql.csv.read.decimal.enabled=true --conf spark.sql.legacy.parquet.datetimeRebaseModeInWrite=CORRECTED --conf spark.rapids.sql.castFloatToDecimal.enabled=true --conf spark.rapids.shuffle.multiThreaded.writer.threads=32 --conf spark.rapids.shuffle.multiThreaded.reader.threads=32 --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 --conf spark.hadoop.fs.s3a.threads.max=25 --conf spark.hadoop.fs.s3a.connection.maximum=2000 --conf spark.hadoop.fs.s3a.experimental.input.fadvise=random --conf spark.hadoop.fs.s3a.max.total.tasks=2048 --conf spark.hadoop.fs.s3a.socket.recv.buffer=65536 --conf spark.hadoop.fs.s3a.socket.send.buffer=65536 --conf spark.sql.warehouse.dir=/raid/tmp/spark-warehouse --jars /tmp/ds/deploy/plugin/rapids-4-spark_2.12-23.04.0-SNAPSHOT-cuda11.jar,/tmp/ds/deploy/temp/spark-rapids-benchmarks/nds/jvm_listener/target/nds-benchmark-listener-1.0-SNAPSHOT.jar /tmp/ds/deploy/temp/spark-rapids-benchmarks/nds/nds_power.py s3a://spark-data/nds2/parquet_sf3k_decimal /tmp/ds/deploy/temp/query_2.sql /tmp/ds/logs/3.2.1-cuda11.5.1-ubuntu20.04/nds_metric-gpu-power-run-1681785188 --property_file '' --input_format parquet --output_prefix /raid/tmp/power_output_gpu --output_format parquet --json_summary_folder /tmp/ds/deploy/temp/spark-rapids-benchmarks/nds/tpcds-gen/json_summary
    23/04/18 02:33:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    23/04/18 02:33:12 INFO SparkContext: Running Spark version 3.2.1
    23/04/18 02:33:12 WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
    23/04/18 02:33:12 INFO DriverLogger: Added a local log appender at: /raid/tmp/spark-dfd136cf-113b-4eab-b1f0-ae9b90aa9daa/driver_logs/driver.log
    23/04/18 02:33:12 INFO ResourceUtils: ==============================================================
    23/04/18 02:33:12 INFO ResourceUtils: No custom resources configured for spark.driver.
    23/04/18 02:33:12 INFO ResourceUtils: ==============================================================
    23/04/18 02:33:12 INFO SparkContext: Submitted application: NDS - Power Run
    23/04/18 02:33:12 INFO SparkContext: Spark configuration:
    spark.app.name=NDS - Power Run
    spark.app.startTime=1681785192027
    spark.cores.max=64
    spark.driver.extraJavaOptions=-javaagent:/root/.m2/repository/org/jacoco/org.jacoco.agent/0.8.7/org.jacoco.agent-0.8.7-runtime.jar=destfile=/tmp/ds/target/jacoco_driver.exec,append=true,includes=ai.rapids.cudf.:com.nvidia.spark.:org.apache.spark.sql.rapids.,classdumpdir=/tmp/ds/target/classes_loaded -Duser.timezone=UTC
    spark.driver.log.dfsDir=/raid/tmp/spark-events
    spark.driver.log.persistToDfs.enabled=true
    spark.driver.maxResultSize=2GB
    spark.driver.memory=20G
    spark.driver.supervise=true
    spark.eventLog.dir=/raid/tmp/spark-events
    spark.eventLog.enabled=true
    spark.executor.cores=8
    spark.executor.extraJavaOptions=-javaagent:/root/.m2/repository/org/jacoco/org.jacoco.agent/0.8.7/org.jacoco.agent-0.8.7-runtime.jar=destfile=/tmp/ds/target/jacoco_executor.exec,append=true,includes=ai.rapids.cudf.:com.nvidia.spark.:org.apache.spark.sql.rapids.,classdumpdir=/tmp/ds/target/classes_loaded -Duser.timezone=UTC -Dai.rapids.cudf.prefer-pinned=true
    spark.executor.memory=16g
    spark.executor.memoryOverhead=1G
    spark.executor.resource.gpu.amount=1
    spark.executor.resource.gpu.discoveryScript=/opt/sparkRapidsPlugin/getGpusResources.sh
    spark.hadoop.fs.s3a.access.key=*********(redacted)
    spark.hadoop.fs.s3a.aws.credentials.provider=
    spark.hadoop.fs.s3a.connection.maximum=2000
    spark.hadoop.fs.s3a.endpoint=pbss.s8k.io
    spark.hadoop.fs.s3a.experimental.input.fadvise=random
    spark.hadoop.fs.s3a.max.total.tasks=2048
    spark.hadoop.fs.s3a.path.style.access=true
    spark.hadoop.fs.s3a.socket.recv.buffer=65536
    spark.hadoop.fs.s3a.socket.send.buffer=65536
    spark.hadoop.fs.s3a.threads.max=25
    spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
    spark.history.fs.logDirectory=/raid/tmp/spark-events
    spark.jars=file:///tmp/ds/deploy/plugin/rapids-4-spark_2.12-23.04.0-SNAPSHOT-cuda11.jar,file:///tmp/ds/deploy/temp/spark-rapids-benchmarks/nds/jvm_listener/target/nds-benchmark-listener-1.0-SNAPSHOT.jar
    spark.kryo.registrator=com.nvidia.spark.rapids.GpuKryoRegistrator
    spark.kryoserializer.buffer.max=512M
    spark.local.dir=/raid/tmp
    spark.logConf=true
    spark.master=spark://127.0.0.1:7077
    spark.memory.storageFraction=0.2
    spark.network.timeout=200s
    spark.plugins=com.nvidia.spark.SQLPlugin
    spark.pyspark.python=/usr/bin/python3
    spark.rapids.memory.host.spillStorageSize=32G
    spark.rapids.memory.pinnedPool.size=8g
    spark.rapids.shuffle.multiThreaded.reader.threads=32
    spark.rapids.shuffle.multiThreaded.writer.threads=32
    spark.rapids.sql.castFloatToDecimal.enabled=true
    spark.rapids.sql.castFloatToString.enabled=true
    spark.rapids.sql.castStringToFloat.enabled=true
    spark.rapids.sql.castStringToInteger.enabled=true
    spark.rapids.sql.concurrentGpuTasks=4
    spark.rapids.sql.csv.read.date.enabled=true
    spark.rapids.sql.csv.read.decimal.enabled=true
    spark.rapids.sql.csv.read.integer.enabled=true
    spark.rapids.sql.enabled=true
    spark.rapids.sql.explain=ALL
    spark.rapids.sql.incompatibleOps.enabled=true
    spark.rapids.sql.variableFloatAgg.enabled=true
    spark.rdd.compress=True
    spark.repl.local.jars=file:///tmp/ds/deploy/plugin/rapids-4-spark_2.12-23.04.0-SNAPSHOT-cuda11.jar,file:///tmp/ds/deploy/temp/spark-rapids-benchmarks/nds/jvm_listener/target/nds-benchmark-listener-1.0-SNAPSHOT.jar
    spark.serializer=org.apache.spark.serializer.KryoSerializer
    spark.serializer.objectStreamReset=100
    spark.shuffle.file.buffer=96k
    spark.sql.broadcastTimeout=7200
    spark.sql.cache.serializer=com.nvidia.spark.ParquetCachedBatchSerializer
    spark.sql.files.maxPartitionBytes=2g
    spark.sql.legacy.parquet.datetimeRebaseModeInWrite=CORRECTED
    spark.sql.shuffle.partitions=200
    spark.sql.warehouse.dir=/raid/tmp/spark-warehouse
    spark.submit.deployMode=client
    spark.submit.pyFiles=
    spark.task.cpus=1
    spark.task.resource.gpu.amount=0.125

@johnnyzhon
Collaborator Author

NGC instance: dgx1v.16g.8.norm, jdk8 | cuda11.5.1 | ubuntu20.04

I am not sure whether this issue reproduces every time.
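
One way to check that, as a sketch: run the same query several times in one session and compare each result against the first run, to separate run-to-run non-determinism on the GPU from a mismatch that only shows up against the CPU baseline. This reuses the spark session and query text from the snippet in the first comment:

```python
# Reproducibility check (sketch only, not the benchmark harness itself).
baseline = spark.sql(query).collect()
for attempt in range(5):
    rows = spark.sql(query).collect()
    diffs = sum(1 for a, b in zip(baseline, rows) if a != b)
    print(f"attempt {attempt}: {diffs} differing rows out of {len(baseline)}")
```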
