As previously addressed in e.g. #2417, #2996, #3003, #3208, and #3211, SQLite can't handle multiple concurrent writes. Our SQLite IO Manager has worked around this for the PUDL DB, but we seem to be hitting a new limit of some kind with Dagster's event logging DB, which still uses SQLite locally by default.
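(For reference, the usual single-writer mitigations look something like the sketch below: a lock timeout plus WAL journaling, applied when each connection is created. This is a hedged illustration of the general technique, not necessarily what our SQLite IO Manager actually does, and the database path is a placeholder.)

```python
# Sketch of standard SQLite concurrency mitigations on a SQLAlchemy engine.
# The path and timeout are placeholders, not our actual settings.
import sqlalchemy as sa
from sqlalchemy import event

engine = sa.create_engine(
    "sqlite:///pudl.sqlite",
    # Wait up to 30s for a write lock instead of failing immediately:
    connect_args={"timeout": 30},
)

@event.listens_for(engine, "connect")
def set_sqlite_pragmas(dbapi_connection, connection_record):
    cursor = dbapi_connection.cursor()
    # WAL mode lets a single writer coexist with concurrent readers:
    cursor.execute("PRAGMA journal_mode=WAL")
    cursor.close()
```

None of that helps here, though, because the failing writes are going to Dagster's own event log DB, not ours.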
@cmgosnell recently encountered the issue while attempting to debug some integration test failures through the Dagster UI, where it failed basically 100% of the time. I have had the problem on and off, with maybe a 50-75% failure rate on the full ETL.
For me, the failure always seems to happen right after the execution of the `raw_eia860m__all_dfs` asset.
Full stack trace here:
```
2024-09-25 21:43:12 -0600 - dagster - DEBUG - etl_full - 3162fc18-e15b-4284-81dd-b46c063d8d35 - 8873 - raw_eia860m__all_dfs - STEP_SUCCESS - Finished execution of step "raw_eia860m__all_dfs" in 2m35s.
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: unable to open database file

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/dagster/_grpc/impl.py", line 266, in start_run_in_subprocess
    _run_in_subprocess(
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/dagster/_grpc/impl.py", line 250, in _run_in_subprocess
    instance.report_engine_event(
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/dagster/_core/instance/__init__.py", line 2497, in report_engine_event
    self.report_dagster_event(dagster_event, run_id=run_id, log_level=log_level)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/dagster/_core/instance/__init__.py", line 2519, in report_dagster_event
    self.handle_new_event(event_record)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/dagster/_core/instance/__init__.py", line 2417, in handle_new_event
    self._event_storage.store_event(events[0])
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/dagster/_core/storage/event_log/sqlite/sqlite_event_log.py", line 242, in store_event
    conn.execute(insert_event_statement)
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1418, in execute
    return meth(
           ^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection
    return connection._execute_clauseelement(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1640, in _execute_clauseelement
    ret = self._execute_context(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
    return self._exec_single_context(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1986, in _exec_single_context
    self._handle_dbapi_exception(
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 2355, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
    self.dialect.do_execute(
  File "/Users/zane/miniforge3/envs/pudl-dev/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file
[SQL: INSERT INTO event_logs (run_id, event, dagster_event_type, timestamp, step_key, asset_key, partition) VALUES (?, ?, ?, ?, ?, ?, ?)]
[parameters: ('3162fc18-e15b-4284-81dd-b46c063d8d35', '{"__class__": "EventLogEntry", "dagster_event": {"__class__": "DagsterEvent", "event_specific_data": {"__class__": "EngineEventData", "error": null, ... (350 characters truncated) ... "pipeline_name": "etl_full", "run_id": "3162fc18-e15b-4284-81dd-b46c063d8d35", "step_key": null, "timestamp": 1727322130.5879638, "user_message": ""}', 'ENGINE_EVENT', '2024-09-26 03:42:10.587964', None, None, None)]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
```
This "unable to open database file" error seems somewhat different than the locked DB error that we were getting before due to attempted concurrent writes.
So, some open questions:

- Is there a new workaround to squeeze some more life out of SQLite? Or, what is the easiest way to use Postgres locally for development? @jdangerx suggested OrbStack as a containerized Postgres solution for local use that doesn't run something like 10x slower on macOS/arm64. (See the configuration sketch after this list.)
- Is there some way to make Dagster log fewer events? Can we turn off DEBUG level logging? I don't think we really look at it!
- How does this intersect with our development environment setup, which is already challenging for new contributors? Is there an "easy" way to use dev containers and integrate this Postgres setup into them?
- If we're messing around with containers anyway, could we also stop making our tests use the in-process (single-threaded) executor?
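On the Postgres question: if I'm reading the Dagster docs right, instance storage can be pointed at Postgres via dagster.yaml in $DAGSTER_HOME, using the dagster-postgres package. A sketch with placeholder credentials, assuming a Postgres server in a local container:

```yaml
# dagster.yaml (in $DAGSTER_HOME) -- all values here are placeholders.
# Assumes the dagster-postgres package and a Postgres server, e.g. one
# running in a local OrbStack/Docker container on the default port.
storage:
  postgres:
    postgres_db:
      username: dagster
      password: dagster_dev_password  # don't commit real credentials
      hostname: localhost
      db_name: dagster
      port: 5432

# Possibly relevant to the logging question too: captured Python logs can
# be throttled here instead of being left at DEBUG.
python_logs:
  python_log_level: INFO
```

That would move run, event log, and schedule storage off SQLite in one shot.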
I may just be a superstitious pigeon on this, but it seems like sometimes I get this failure 100% reliably, and other times it's maybe just a 50% chance, so if it fails and I run the ETL again, it might work. But I don't know what influences the probability. Last night I tried to run it 10 times in a row, and they all failed within 2.5 minutes.
Some things I've explored in the past without success:
- Does the probability vary with the Python environment? That is, is there some combination of packages that don't play well together that makes the problem more likely? Sometimes updating the pudl-dev environment seems to fix things, but it's not clear whether that's real or just me imagining things.
- Are there too many files & sockets open at once? Using lsof I can see that Dagster often has thousands of "files" open at once, and my system as a whole might have 20,000 open while the ETL is running. (A quick diagnostic sketch follows.)
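Related to the open-files hypothesis, here's a quick diagnostic sketch for checking and raising the per-process limit (POSIX-only; the 10,000 target is an arbitrary placeholder):

```python
# Hypothetical diagnostic: inspect (and optionally raise) the per-process
# open-file limit, which "unable to open database file" can run into.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# macOS soft limits are often low (e.g. 256); raise it toward the hard cap.
resource.setrlimit(resource.RLIMIT_NOFILE, (min(10_000, hard), hard))
```

If the soft limit turns out to be tiny relative to what lsof reports, that would at least make this hypothesis testable.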