Lost connection to PostgreSQL database #7061

Open
jekeam opened this issue Jul 16, 2024 · 10 comments

Comments

@jekeam

jekeam commented Jul 16, 2024

Issue Summary

Some time after launching Redash, the connection to the database is lost.

If I open the data source and run the connection test, the request just hangs and never returns an answer.

The database itself cannot be the problem: I checked it directly (see the sketch below), and after restarting the Redash service the connection is re-established and works again for a while.
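A minimal sketch of that direct check, using psycopg2 against the same data source (host and credentials below are placeholders, not the real ones); it returns immediately, while the Redash connection test hangs:

import psycopg2

# Placeholder connection details; use the same values configured in the Redash data source.
conn = psycopg2.connect(
    host="db.example.com",
    dbname="mydb",
    user="redash_reader",
    password="secret",
    connect_timeout=5,   # fail fast instead of hanging
)
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())   # (1,) means the database accepts connections
conn.close()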

Steps to Reproduce

Right after launch:
Screenshot_73

After 15 minutes:
Screenshot_74

Also, the table schema (PostgreSQL 12) never loads:
(screenshot)

I don't see any errors in the logs.

Technical details:

  • Redash Version: 10.1.0
  • Browser/OS: Ubuntu 20.04.6 LTS
  • How did you install Redash: setup.sh
@justinclift
Member

Interesting. Is this a brand new Redash installation that you could recreate using the latest preview image instead?

@jekeam
Author

jekeam commented Jul 16, 2024

Interesting. Is this a brand new Redash installation that you could recreate using the latest preview image instead?

Yes, everything is the same on the preview version.

(screenshot)

@jekeam
Author

jekeam commented Jul 16, 2024

Could this be related to the characteristics of the system (I am using a plain VPS)? Where can I find logs for this problem?

@justinclift
Member

Yeah, it sounds like something on the VPS setup itself is dropping the connection.

Which hosting company is the VPS with?

@justinclift
Member

Where can I see logs for this problem?

Good question. My first thought is it would probably be one of the worker docker containers showing the problem. It's also weird (to me) that it's not automatically re-establishing the dropped PostgreSQL connection(s) either. 😦
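If it helps with digging, a quick probe run inside the worker container (docker compose exec worker python) along these lines should show whether the connections themselves are being dropped. This is only a sketch and assumes the standard REDASH_REDIS_URL / REDASH_DATABASE_URL variables from the setup.sh compose file are in the container's environment:

import os
import psycopg2
import redis

# Assumes the standard Redash environment variables are set in the container.
r = redis.Redis.from_url(os.environ["REDASH_REDIS_URL"])
print("redis ping:", r.ping())

conn = psycopg2.connect(os.environ["REDASH_DATABASE_URL"], connect_timeout=5)
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print("postgres:", cur.fetchone())
conn.close()

If both of those succeed while the Redash UI still hangs, the problem is more likely inside the workers than on the VPS network.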

@jekeam
Author

jekeam commented Jul 16, 2024

The VPS is with reg.ru:

  • Number of vCPU cores: 6
  • RAM: 6 GB (up to 48 GB available when selecting more vCPU cores)
  • Disk capacity: 60 GB

I have now found an error in the worker log; this is probably the problem:

worker-1 | [2024-07-16 16:34:16,109][PID:419][INFO][rq.worker] periodic: Job OK (305c0cae0a196ae96915fd2b6f81001c435aad65)
worker-1 | [2024-07-16 16:34:16,110][PID:419][INFO][rq.worker] Result is kept for 600 seconds
worker-1 | [2024-07-16 16:34:16,121][PID:8][INFO][rq.worker] periodic: 59a84668e68687338646f735965213c58e814b32
worker-1 | [2024-07-16 16: 34: 16,154] [PID: 420] [RQ.Worker] [Job 59A846868733864646464646F735965213C58E8142]: Excection Raidion While Execution (Redash.TASS. queries.maintenance.remove_ghost_locks)
worker-1 | Traceback (most recent call last):
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/worker.py", line 1431, in perform_job
worker-1 | rv = job.perform()
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 1280, in perform
worker-1 | self._result = self._execute()
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 1317, in _execute
worker-1 | result = self.func(*self.args, **self.kwargs)
worker-1 | File "/app/redash/tasks/queries/maintenance.py", line 148, in remove_ghost_locks
worker-1 | jobs = list(rq_job_ids())
worker-1 | File "/app/redash/monitor.py", line 64, in rq_job_ids
worker-1 | started_jobs = [StartedJobRegistry(queue=q).get_job_ids() for q in queues]
worker-1 | File "/app/redash/monitor.py", line 64, in <listcomp>
worker-1 | started_jobs = [StartedJobRegistry(queue=q).get_job_ids() for q in queues]
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/registry.py", line 152, in get_job_ids
worker-1 | self.cleanup()
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/registry.py", line 234, in cleanup
worker-1 | job = self.job_class.fetch(job_id, connection=self.connection, serializer=self.serializer)
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 591, in fetch
worker-1 | job.refresh()
worker-1 | File "/usr/local/lib/python3.10/site-packages/rq/job.py", line 991, in refresh
worker-1 | data = self.connection.hgetall(self.key)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/commands/core.py", line 4911, in hgetall
worker-1 | return self.execute_command("HGETALL", name)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1269, in execute_command
worker-1 | return conn.retry.call_with_retry(
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/retry.py", line 46, in call_with_retry
worker-1 | return do()
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1270, in <lambda>
worker-1 | lambda: self._send_command_parse_response(
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1246, in _send_command_parse_response
worker-1 | return self.parse_response(conn, command_name, **options)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/client.py", line 1286, in parse_response
worker-1 | response = connection.read_response()
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 882, in read_response
worker-1 | response = self._parser.read_response(disable_decoding=disable_decoding)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 349, in read_response
worker-1 | result = self._read_response(disable_decoding=disable_decoding)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 393, in _read_response
worker-1 | response = [
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 394, in <listcomp>
worker-1 | self._read_response(disable_decoding=disable_decoding)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 401, in _read_response
worker-1 | response = self.encoder.decode(response)
worker-1 | File "/usr/local/lib/python3.10/site-packages/redis/connection.py", line 130, in decode
worker-1 | value = value.decode(self.encoding, self.encoding_errors)
worker-1 | UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

@justinclift
Member

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Yeah, that seems weird. 😦
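For context on where that error can come from: RQ stores the pickled job payload as raw bytes in the job's Redis hash, and a pickle stream starts with byte 0x80, so any Redis client created with decode_responses=True will fail on HGETALL of a job key with exactly this error. A minimal sketch of that failure mode (the URL and key name are placeholders; not necessarily what's misconfigured here):

import pickle
import redis

raw = redis.Redis.from_url("redis://localhost:6379/0")                # binary-safe client
decoding = redis.Redis.from_url("redis://localhost:6379/0",
                                decode_responses=True)                # decodes every reply as UTF-8

# Roughly what RQ does when it enqueues a job: the payload is pickled bytes.
raw.hset("rq:job:example", "data", pickle.dumps({"args": ()}))

print(raw.hgetall("rq:job:example"))   # fine: values come back as bytes
decoding.hgetall("rq:job:example")     # UnicodeDecodeError: invalid start byte 0x80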

@justinclift
Member

Excection Raidion

That's a weird error message too. Haven't seen that before either.

@jekeam
Author

jekeam commented Jul 16, 2024

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Yeah, that seems weird. 😦

As far as I understand, this error is not related to my attempt to connect to the database.

@jekeam
Author

jekeam commented Jul 16, 2024

Here is the log from after I press the connection test button:

worker-1  | [2024-07-16 17:44:57,772][PID:7][INFO][rq.worker] default: ba60d510-d236-4ed0-9ad1-eb0695431875
worker-1  | [2024-07-16 17:44:57,832][PID:316][INFO][rq.worker] default: Job OK (ba60d510-d236-4ed0-9ad1-eb0695431875)
worker-1  | [2024-07-16 17:44:57,832][PID:316][INFO][rq.worker] Result is kept for 500 seconds
worker-1  | [2024-07-16 17:44:59,163][PID:7][INFO][rq.worker] default: e25614f4-7cb8-4ee3-843f-1e06856d5db4
worker-1  | [2024-07-16 17:44:59,233][PID:317][INFO][rq.worker] default: Job OK (e25614f4-7cb8-4ee3-843f-1e06856d5db4)
worker-1  | [2024-07-16 17:44:59,234][PID:317][INFO][rq.worker] Result is kept for 500 seconds
worker-1  | 2024/07/16 17:45:01 [worker_healthcheck] Received TICK_60 event from supervisor
worker-1  | 2024/07/16 17:45:01 [worker_healthcheck] Performing `RQ Worker Healthcheck` check for process name worker-0
worker-1  | 2024/07/16 17:45:01 [worker_healthcheck] WorkerHealthcheck: Worker rq:worker:f2ca6d09926b45d4b88ca2708f114801 healthcheck: Is busy? False. Seen lately? True (1 seconds ago). Has nothing to do? True (0 jobs in watched queues). ==> Is healthy? True
worker-1  | 2024/07/16 17:45:01 [worker_healthcheck] `RQ Worker Healthcheck` check succeeded for process worker-0
worker-1  | RESULT 2
worker-1  | OKREADY
worker-1  | [2024-07-16 17:45:08,380][PID:7][INFO][rq.worker] periodic: 305c0cae0a196ae96915fd2b6f81001c435aad65
worker-1  | [2024-07-16 17:45:08,394][PID:318][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.refresh_queries job.id=305c0cae0a196ae96915fd2b6f81001c435aad65 Refreshing queries...
worker-1  | [2024-07-16 17:45:08,462][PID:318][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.refresh_queries job.id=305c0cae0a196ae96915fd2b6f81001c435aad65 Done refreshing queries: {'started_at': 1721151908.3942385, 'outdated_queries_count': 0, 'last_refresh_at': 1721151908.4607894, 'query_ids': '[]'}
worker-1  | [2024-07-16 17:45:08,465][PID:318][INFO][rq.worker] periodic: Job OK (305c0cae0a196ae96915fd2b6f81001c435aad65)
worker-1  | [2024-07-16 17:45:08,466][PID:318][INFO][rq.worker] Result is kept for 600 seconds
worker-1  | [2024-07-16 17:45:30,711][PID:7][INFO][rq.worker] default: d51faf22-0da7-427b-8628-ef1410d1f329
worker-1  | [2024-07-16 17:45:30,774][PID:319][INFO][rq.worker] default: Job OK (d51faf22-0da7-427b-8628-ef1410d1f329)
worker-1  | [2024-07-16 17:45:30,774][PID:319][INFO][rq.worker] Result is kept for 500 seconds
