Timeout while iterating over a chunk #1862

Open
Erivan3000 opened this issue Jul 24, 2024 · 3 comments
@Erivan3000

Erivan3000 commented Jul 24, 2024

System information (please complete the following information):

  • OS: Windows 11 Pro 22H2 (Build 22621.3880) x64
  • Python Version: 3.12.4
  • SDK Version: 7.54.8

Describe the bug
The token expires after just over one hour, while I am still iterating over chunks.

A fictitious example, but one that reproduces my problem, is the following code:

for event_list in client.events(chunk_size = 500_000, data_set_ids = ids_sites):
    events = event_list.to_pandas()

Error message after just over an hour: CogniteAPIError: Unauthorized | code: 401 | X-Request-ID: 9c8702fc-59bf-9f7a-be47-4876d6b433f3

In fact, if I iterate in another way (for example, filtering by date), I can run the code for more than 8 hours. Actually, I can run it for as long as I want, because I re-authenticate between iterations, which prevents the token from expiring. However, I don't want to filter by date, because it doesn't give a consistent data volume per iteration the way iterating by chunks does.

Apparently, I can't re-authenticate between iterations in the example above, since the chunk iterator seems to keep using the initial authentication, which does make sense at first glance. Does anyone know how to solve this?

To Reproduce
Runnable code reproducing the error.

import polars as pl
from cognite.client import CogniteClient

client = CogniteClient()

# DataFrame that accumulates the results.
# events_columns, events_column_types and ids_sites are defined elsewhere (not shown).
all_events = pl.DataFrame({col: pl.Series([], dtype=dt) for col, dt in zip(events_columns, events_column_types)})

# Iterate over the events returned by the client, one chunk at a time
for event_list in client.events(chunk_size = 250_000, data_set_ids = ids_sites):
    events = event_list.to_pandas()  # Convert the chunk to a pandas DataFrame
    events = pl.from_pandas(events)  # Then convert it to a polars DataFrame

    events = events.select(events_columns)

    all_events = pl.concat([all_events, events])  # Append to the accumulated DataFrame
    print(len(all_events))


Expected behavior
I expected it to go through all the chunks without the token expiring.

Screenshots
[Three screenshots attached]


@Erivan3000 Erivan3000 added the bug label Jul 24, 2024
@Erivan3000 Erivan3000 changed the title timeout with iterate over a chunk Timeout while iterating over a chunk Jul 24, 2024
@haakonvt
Contributor

Hi @Erivan3000, and thanks for the bug report. Could you share a code snippet showing how authentication is set up for your CogniteClient?

@Erivan3000
Author

> Hi @Erivan3000, and thanks for the bug report. Could you share a code snippet showing how authentication is set up for your CogniteClient?

Of course, I have added it now.

@haakonvt
Contributor

@Erivan3000 From what I can tell, you pass in a single token as a string, which will eventually expire, as you observed. You need to pass in a function that returns a valid token or, better, use one of the CredentialProviders that ship with the SDK for simplicity. These refresh the token automatically in the background for you.

For instance, check out OAuthInteractive:

>>> from cognite.client.credentials import OAuthInteractive
>>> oauth_provider = OAuthInteractive(
...     authority_url="https://login.microsoftonline.com/xyz",
...     client_id="abcd",
...     scopes=["https://greenfield.cognitedata.com/.default"],
... )
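
To illustrate the "function that returns a valid token" option mentioned above, here is a minimal sketch in plain Python. It is not part of the SDK: `RefreshingToken` and the `fetch` callback are hypothetical names, and `fetch` is assumed to be a user-supplied function that asks the identity provider for a new access token and returns it together with its lifetime in seconds.

```python
import time
from typing import Callable, Optional, Tuple


class RefreshingToken:
    """Callable that returns a cached token and refetches it shortly before expiry.

    Hypothetical helper for illustration only; it is not an SDK class.
    """

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        leeway_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # `fetch` is assumed to return (access_token, lifetime_in_seconds).
        self._fetch = fetch
        # Refresh this many seconds before the token would actually expire.
        self._leeway = leeway_seconds
        # The clock is injectable so the refresh logic can be tested without waiting.
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def __call__(self) -> str:
        now = self._clock()
        # Fetch on first use, or when the cached token is about to expire.
        if self._token is None or now >= self._expires_at - self._leeway:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token
```

Because every call checks the expiry time, a long-running chunk iteration that asks the callable for a token on each request would pick up a fresh one instead of reusing the original. Assuming the SDK's token-based credentials accept a callable in place of a fixed string, as the reply suggests, an object like this could be passed there; the built-in providers such as OAuthInteractive already do the equivalent for you.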
