Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very slow fetching string rows #458

Open
ganshinm opened this issue Oct 19, 2024 · 7 comments
Open

Very slow fetching string rows #458

ganshinm opened this issue Oct 19, 2024 · 7 comments
Labels

Comments

@ganshinm
Copy link

ClickHouse version 24.9.2.42

create table test_int (
  Id UInt64 DEFAULT generateSnowflakeID(), Field1 UInt64, Field2 UInt64, Field3 UInt64, Field4 UInt64, Field5 UInt64, Field6 UInt64, Field7 UInt64, Field8 UInt64, Field9 UInt64, Field10 UInt64, Field11 UInt64, Field12 UInt64, Field13 UInt64, Field14 UInt64, Field15 UInt64, Field16 UInt64, Field17 UInt64, Field18 UInt64, Field19 UInt64, Field20 UInt64, Field31 UInt64, Field32 UInt64, Field33 UInt64, Field34 UInt64, Field35 UInt64, Field36 UInt64, Field37 UInt64, Field38 UInt64, Field39 UInt64, Field40 UInt64)
  ENGINE = MergeTree() order by Id;

create table test_str (
  Id UInt64 DEFAULT generateSnowflakeID(), Field1 UInt64, Field2 String, Field3 String, Field4 String, Field5 String, Field6 String, Field7 String, Field8 String, Field9 String, Field10 String, Field11 String, Field12 String, Field13 String, Field14 String, Field15 String, Field16 String, Field17 String, Field18 String, Field19 String, Field20 String, Field31 String, Field32 String, Field33 String, Field34 String, Field35 String, Field36 String, Field37 String, Field38 String, Field39 String, Field40 String)
  ENGINE = MergeTree() order by Id;


insert into test_int (Field1, Field2, Field3, Field4, Field5, Field6, Field7, Field8, Field9, Field10, Field11, Field12, Field13, Field14, Field15, Field16, Field17, Field18, Field19, Field20, Field31, Field32, Field33, Field34, Field35, Field36, Field37, Field38, Field39, Field40) 
select generate_series Field1, rand() Field2, rand() Field3 , rand() Field4 , rand() Field5 , rand() Field6 , rand() Field7 , rand() Field8 , rand() Field9 , rand() Field10, rand() Field11, rand() Field12, rand() Field13, rand() Field14, rand() Field15, rand() Field16, rand() Field17, rand() Field18, rand() Field19, rand() Field20, rand() Field31, rand() Field32, rand() Field33, rand() Field34, rand() Field35, rand() Field36, rand() Field37, rand() Field38, rand() Field39, rand() Field40 
from generate_series(1,50000);

insert into test_str (Field1, Field2, Field3, Field4, Field5, Field6, Field7, Field8, Field9, Field10, Field11, Field12, Field13, Field14, Field15, Field16, Field17, Field18, Field19, Field20, Field31, Field32, Field33, Field34, Field35, Field36, Field37, Field38, Field39, Field40) 
select generate_series Field1, rand() Field2, rand() Field3 , rand() Field4 , rand() Field5 , rand() Field6 , rand() Field7 , rand() Field8 , rand() Field9 , rand() Field10, rand() Field11, rand() Field12, rand() Field13, rand() Field14, rand() Field15, rand() Field16, rand() Field17, rand() Field18, rand() Field19, rand() Field20, rand() Field31, rand() Field32, rand() Field33, rand() Field34, rand() Field35, rand() Field36, rand() Field37, rand() Field38, rand() Field39, rand() Field40 
from generate_series(1,50000);

ODBC version 1.03.00.55930 for Windows

I'm run queries by ODBC
select * from test_int - fetching all rows very quickly (1 sec)
select * from test_str - fetching all rows very slowly (15 sec)

For example, vbs script

Option Explicit

Sub ProcessDatabase_ClickHouse (ServerName, DatabaseName, UserName, Password)
	Dim ADOConnection, RS, start, i
	Set ADOConnection=CreateObject("ADODB.Connection")
	ADOConnection.ConnectionString = "driver={ClickHouse ODBC Driver (Unicode)};server=" & ServerName & ";Database=" & DatabaseName & ";uid=" & UserName & ";Pwd=" & Password & ";Timeout=300000"
	ADOConnection.CursorLocation = 3
	ADOConnection.ConnectionTimeout = 120
	ADOConnection.CommandTimeout = 600
	On Error Resume Next
	ADOConnection.Open()
        If Err.Number <> 0 Then
		WScript.Echo CStr(Now)&vbTab&"Code: "&CStr(Err.Number)&vbTab&"Description: "+Err.Description,0,"Error"
                Wscript.quit 1
        End If
	On Error Goto 0
	start = now(): i = 0
	Set RS = ADOConnection.Execute("select * from test_int")
	While not RS.EOF
		i = i + 1
		RS.MoveNext()
	Wend
	RS.Close()
	Wscript.Echo("ClickHouse (test_int): " & DateDiff("s",start,now()) & " sec, " & i & " rows")

	start = now(): i = 0
	Set RS = ADOConnection.Execute("select * from test_str")
	While not RS.EOF
		i = i + 1
		RS.MoveNext()
	Wend
	RS.Close()
	Wscript.Echo("ClickHouse (test_str): " & DateDiff("s",start,now()) & " sec, " & i & " rows")

	ADOConnection.Close()
	Set ADOConnection = Nothing
End Sub

Call ProcessDatabase_ClickHouse ("clickhouse","tj","","")

ClickHouse (test_int): 1 sec, 50000 rows
ClickHouse (test_str): 15 sec, 50000 rows

@ganshinm ganshinm added the bug label Oct 19, 2024
@ganshinm
Copy link
Author

DBeaver (use JDBC) fetching ALL rows from test_int and test_str - very quickly (0.2-0.3 sec)

@ganshinm
Copy link
Author

Выяснилось, что задержка происходит при вызове SQLGetData,, если параметр BufferLength передается 0.
Взял APIMonitor (http://www.rohitab.com/apimonitor), настроил трассировку функций ODBC32.dll, запустил свой скрипт.

image

@ganshinm
Copy link
Author

На скрине видно, что если при вызове SQLGetData предпоследний параметр BufferLength передан 0, то осуществляется значительно дольше, чем случае, когда передано ненулевое значение.

Прототип SQLGetData() выглядит так

SQLRETURN SQLGetData(  
      SQLHSTMT       StatementHandle,  
      SQLUSMALLINT   Col_or_Param_Num,  
      SQLSMALLINT    TargetType,  
      SQLPOINTER     TargetValuePtr,  
      SQLLEN         BufferLength,  
      SQLLEN *       StrLen_or_IndPtr);  

@ganshinm
Copy link
Author

ganshinm commented Oct 20, 2024

Далее взял результаты трассировки APIMonitor сохранил в текстовый файл, залил в базу, в табличку APIMonitor и сгруппировал вызовы по методу и отсортировал по длительности.

Получил такую картину

localhost :) select API,sum(cast(Duration as numeric(32,10))), count(*) from APIMonitor group by API order by 2  desc;

SELECT
    API,
    sum(CAST(Duration, 'numeric(32, 10)')),
    count(*)
FROM APIMonitor
GROUP BY API
ORDER BY 2 DESC

Query id: 9ae7c92d-0f4c-4b73-b48c-870f33a174e3

     ┌─API────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─sum(CAST(Duration, 'numeric(32, 10)'))─┬─count()─┐
  1. │ SQLGetData ( 0x0000021a96751fd0, 3, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                         │                              0.6981301 │   50000 │
  2. │ SQLGetData ( 0x0000021a96751fd0, 20, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6565419 │   50000 │
  3. │ SQLGetData ( 0x0000021a96751fd0, 25, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6535786 │   50000 │
  4. │ SQLGetData ( 0x0000021a96751fd0, 22, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6520546 │   50000 │
  5. │ SQLGetData ( 0x0000021a96751fd0, 26, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6465253 │   50000 │
  6. │ SQLGetData ( 0x0000021a96751fd0, 29, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6460461 │   50000 │
  7. │ SQLGetData ( 0x0000021a96751fd0, 21, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6453299 │   50000 │
  8. │ SQLGetData ( 0x0000021a96751fd0, 17, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6440748 │   50000 │
  9. │ SQLGetData ( 0x0000021a96751fd0, 27, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6426719 │   50000 │
 10. │ SQLGetData ( 0x0000021a96751fd0, 23, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6424047 │   50000 │
 11. │ SQLGetData ( 0x0000021a96751fd0, 18, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6409147 │   50000 │
 12. │ SQLGetData ( 0x0000021a96751fd0, 7, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                         │                              0.6401205 │   50000 │
 13. │ SQLGetData ( 0x0000021a96751fd0, 16, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6400322 │   50000 │
 14. │ SQLGetData ( 0x0000021a96751fd0, 24, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6371668 │   50000 │
 15. │ SQLGetData ( 0x0000021a96751fd0, 11, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6370988 │   50000 │
 16. │ SQLGetData ( 0x0000021a96751fd0, 13, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6366859 │   50000 │
 17. │ SQLGetData ( 0x0000021a96751fd0, 30, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6365716 │   50000 │
 18. │ SQLGetData ( 0x0000021a96751fd0, 14, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6362751 │   50000 │
 19. │ SQLGetData ( 0x0000021a96751fd0, 4, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                         │                              0.6352587 │   50000 │
 20. │ SQLGetData ( 0x0000021a96751fd0, 19, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6352128 │   50000 │
 21. │ SQLGetData ( 0x0000021a96751fd0, 28, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6343851 │   50000 │
 22. │ SQLGetData ( 0x0000021a96751fd0, 15, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6338472 │   50000 │
 23. │ SQLGetData ( 0x0000021a96751fd0, 5, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                         │                              0.6336093 │   50000 │
 24. │ SQLGetData ( 0x0000021a96751fd0, 9, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                         │                              0.6334756 │   50000 │
 25. │ SQLGetData ( 0x0000021a96751fd0, 12, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6333269 │   50000 │
 26. │ SQLGetData ( 0x0000021a96751fd0, 10, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6309791 │   50000 │
 27. │ SQLGetData ( 0x0000021a96751fd0, 6, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                         │                              0.6300248 │   50000 │
 28. │ SQLGetData ( 0x0000021a96751fd0, 8, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                         │                              0.6287487 │   50000 │
 29. │ SQLGetData ( 0x0000021a96751fd0, 31, 1, 0x00000040ccd6c368, 0, 0x00000040ccd6c3c8 )                                                        │                              0.6267672 │   50000 │
 30. │ SQLFetch ( 0x0000021a96751fd0 )                                                                                                            │                              0.4320346 │   50001 │

Тут видно, что задержки достаточно большие (50000 вызовов SQLGetData отрабатывали 0.6 сек)

@ganshinm
Copy link
Author

К примеру такой скрипт на Python не вызывает задержек

#!/usr/local/bin/python
# -*- coding: utf-8 -*-

import sys
import pyodbc
import datetime

def process_database(servername,dataasebase,username,password):
    init_string="DRIVER={{ClickHouse ODBC Driver (Unicode)}}; server={0}; database={1}; uid={2}; pwd={3}".format(servername,dataasebase,username,password)
    conn = pyodbc.connect(init_string)
    conn.autocommit = True
    cursor = conn.cursor()

    start = datetime.datetime.now()
    i = 0
    for row in cursor.execute("select * from test_int").fetchall():
        i+=1
    print("ClickHouse (test_int): {0} sec, {1} rows".format((datetime.datetime.now()-start).total_seconds(),i))

    start = datetime.datetime.now()
    i = 0
    for row in cursor.execute("select * from test_str").fetchall():
        i+=1
    print("ClickHouse (test_str): {0} sec, {1} rows".format((datetime.datetime.now()-start).total_seconds(),i))

if __name__ == "__main__":
	process_database("clickhouse","tj","","")

ClickHouse (test_int): 1.560003 sec, 50000 rows
ClickHouse (test_str): 2.168404 sec, 50000 rows

Это происходит потому, что библиотека pyodbc вызывает SQLGetData всегда с ненулевым значением параметра BufferLength (это удалось подтвердить через вышеупомянутый APIMonitor)

@ganshinm
Copy link
Author

В первоначальном скрипте используется ADODB поверх ODBC,

Set ADOConnection=CreateObject("ADODB.Connection")
ADOConnection.ConnectionString = "driver={ClickHouse ODBC Driver (Unicode)};server=" & ServerName & ";Database=" & DatabaseName & ";uid=" & UserName & ";Pwd=" & Password & ";Timeout=300000"

ADODB - это высокоуровневый интерфейс в т.ч. вокруг ODBC (используется в случаях, когда клиентское приложение может использовать только высокоуровнеый COM IDispatch.
Именно ADODB формирует большое количество вызовов SQLGetData с параметром BufferLength=0 (что и приводит к задержкам)
Повлиять мы на это не можем. Использовать чистый ODBC тоже нет возможности. Просьба поправить задержки внутри драйвера ClickHouse ODBC

@ganshinm
Copy link
Author

ganshinm commented Oct 23, 2024

Задержка происходит из-за проброса исключений

    if (in_value_length > out_value_max_length)
        throw SqlException("String data, right truncated", "01004", SQL_SUCCESS_WITH_INFO);

в методе fillOutputBuffer (driver\utils\type_info.h)
и

    if ((converted_length_in_symbols + 1) > out_value_max_length_in_symbols) // +1 for null terminating character
        throw SqlException("String data, right truncated", "01004", SQL_SUCCESS_WITH_INFO);

в методе fillOutputString (driver\utils\type_info.h)

Чуть выше исключение перехватывается

try {
    return doCall(callable);
}
catch (const SqlException & ex) {
    LOG(ex.getSQLState() << " (" << ex.what() << ")" << "[rc: " << ex.getReturnCode() << "]");
    return ex.getReturnCode();
}

и в вызывающий код передается SQL_SUCCESS_WITH_INFO, а не SQL_SUCCESS, как обычно (это кстати, видно в APIMonitor)

Вызов исключения происходит часто, если у меня 30 полей и 50000 строк - то вызов 1.5 млн исключений съедает 15 секунд

Закомментил эти строки (throw SqlException... и условия перед ними), пересобрал, задержка ушла.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant