
Slow query to AD #1105

Open · santosshen opened this issue Nov 1, 2023 · 3 comments

@santosshen

from datetime import datetime

from ldap3 import Server, Connection, set_config_parameter, SUBTREE, NONE

from ad.test.info import AD_SERVER_INFO

set_config_parameter('RESTARTABLE_SLEEPTIME', 0.5)
set_config_parameter('RESPONSE_WAITING_TIMEOUT', 2)

LDAP_BASE = AD_SERVER_INFO['LDAP_BASE']
AD_SERVER_IP = AD_SERVER_INFO['AD_SERVER_IP']
AD_USER = AD_SERVER_INFO['AD_USER']
AD_PASSWORD = AD_SERVER_INFO['AD_PASSWORD']

LDAP_SERVER = Server(host=AD_SERVER_IP, port=389, get_info=NONE, use_ssl=False)
LDAP_CONN = Connection(server=LDAP_SERVER, user=AD_USER, password=AD_PASSWORD, auto_bind=True,
                       client_strategy='RESTARTABLE', receive_timeout=2, check_names=False)


def paged_search_object():
    start_time = datetime.now()
    object_list = []
    response = LDAP_CONN.extend.standard.paged_search(
        search_base=LDAP_BASE,
        search_filter='(objectCategory=*)',
        search_scope=SUBTREE,
        attributes=['distinguishedName', 'ou', 'cn']
    )
    for res in response:
        if res.get('attributes'):
            object_list.append(dict(res['attributes']))
    end_time = datetime.now()
    time_elapsed = end_time - start_time
    seconds_elapsed = time_elapsed.total_seconds()
    print(len(object_list))
    print(f'{seconds_elapsed} s')
    return object_list


paged_search_object()
paged_search_object()

Output:

5829
3.489604 s
5829
3.40749 s

Using Java's javax.naming.ldap, searching the same 5829 objects takes about 900 ms; with Python's ldap3 it takes about 3.4 s.

Is there a problem with my query?

@santosshen (Author)

Total time: 5.2746 s
Function: paged_search_object at line 21

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    21                                           @func_line_time
    22                                           def paged_search_object():
    23         1         39.0     39.0      0.0      start_time = datetime.now()
    24         1          7.0      7.0      0.0      object_list = []
    25         2         76.0     38.0      0.0      response = LDAP_CONN.extend.standard.paged_search(
    26         1          3.0      3.0      0.0          search_base=LDAP_BASE,
    27         1          4.0      4.0      0.0          search_filter='(objectCategory=*)',
    28         1          3.0      3.0      0.0          search_scope=SUBTREE,
    29         1          5.0      5.0      0.0          attributes=['distinguishedName', 'ou', 'cn']
    30                                               )
    31      5830   52466375.0   8999.4     99.5      for res in response:
    32      5829      46411.0      8.0      0.1          if res.get('attributes'):
    33      5829     232829.0     39.9      0.4              object_list.append(dict(res['attributes']))
    34         1         39.0     39.0      0.0      end_time = datetime.now()
    35         1         17.0     17.0      0.0      time_elapsed = end_time - start_time
    36         1         16.0     16.0      0.0      seconds_elapsed = time_elapsed.total_seconds()
    37         1        142.0    142.0      0.0      print(len(object_list))
    38         1         68.0     68.0      0.0      print(f'{seconds_elapsed} s')
    39         1          4.0      4.0      0.0      return object_list
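Nearly all of the time (99.5%) is spent iterating the generator, i.e., in the network round trips that fetch each page. paged_search defaults to 100 entries per page, so 5829 objects take roughly 59 round trips. One way to test whether round-trip latency dominates is to request larger pages via the paged_size argument. A minimal sketch reusing the connection above (1000 is arbitrary here, but it matches AD's usual MaxPageSize cap):

response = LDAP_CONN.extend.standard.paged_search(
    search_base=LDAP_BASE,
    search_filter='(objectCategory=*)',
    search_scope=SUBTREE,
    attributes=['distinguishedName', 'ou', 'cn'],
    paged_size=1000  # default is 100 -> ~59 round trips for 5829 entries
)

If latency is the bottleneck, wall time should shrink roughly in proportion to the number of pages.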

@Gu-f commented Apr 18, 2024

I think this is a problem in the ldap3 library code: it issues one request per page, so N pages mean N sequential round trips to the AD service rather than one bulk pull.
The loop is inside this method:

# /ldap3/extend/standard/PagedSearch.py
    while cookie:
        result = connection.search(search_base,
                                   search_filter,
                                   search_scope,
                                   dereference_aliases,
                                   attributes,
                                   size_limit,
                                   time_limit,
                                   types_only,
                                   get_operational_attributes,
                                   controls,
                                   paged_size,
                                   paged_criticality,
                                   None if cookie is True else cookie)

        if not connection.strategy.sync:
            response, result = connection.get_response(result)
        else:
            if connection.strategy.thread_safe:
                _, result, response, _ = result
            else:
                response = connection.response
                result = connection.result

This design seems better suited to lazily iterating pages than to pulling everything at once. While looking into it I also ran into another problem; details in #1141.
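For comparison, the same pagination can be driven by hand with the Simple Paged Results control (RFC 2696), which makes the per-page round trip explicit and lets you choose the page size. A sketch, assuming the LDAP_CONN, LDAP_BASE and SUBTREE from the original report and a server that honors the control (AD does):

# Manual pagination: each search() call is one round trip;
# the server's cookie links consecutive pages (RFC 2696).
PAGED_CTRL = '1.2.840.113556.1.4.319'  # Simple Paged Results control OID

objects = []
cookie = None
while True:
    LDAP_CONN.search(search_base=LDAP_BASE,
                     search_filter='(objectCategory=*)',
                     search_scope=SUBTREE,
                     attributes=['distinguishedName', 'ou', 'cn'],
                     paged_size=1000,        # larger pages -> fewer round trips
                     paged_cookie=cookie)
    objects.extend(dict(e['attributes']) for e in LDAP_CONN.response
                   if e.get('attributes'))
    cookie = LDAP_CONN.result['controls'][PAGED_CTRL]['value']['cookie']
    if not cookie:                           # empty cookie: no more pages
        break

print(len(objects))

With paged_size=1000, the 5829 entries above should arrive in six round trips instead of roughly sixty.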

@santosshen (Author)

#1147
