Dragonfly keeps restarting #4250

Open
kissingtiger opened this issue Dec 4, 2024 · 3 comments
@kissingtiger

Dragonfly version: dragonfly df-v1.20.1-501b7f7b4fb049de2a8a5fff15d945cd7da1046a
OS: 5.14.0-467.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Jun 19 12:08:12 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
dragonfly.conf:
--pidfile=/data/dragonfly/run/dragonfly.pid
--log_dir=/data/dragonfly/logs
--dir=/data/dragonfly/data
--version_check=true
--cache_mode=true
--dbnum=16
--bind=0.0.0.0
--port=6379
--maxmemory=500g
--keys_output_limit=12288
--requirepass=xxxx
--masterauth=xxxx
--logtostderr=false

logs:
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: F20241204 03:10:37.370811 2542862 epoll_socket.cc:260] Check failed: write_context_ == NULL
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: *** Check failure stack trace: ***
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c309e30343 google::LogMessage::SendToLog()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c309e28b07 google::LogMessage::Flush()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c309e2a48f google::LogMessageFatal::~LogMessageFatal()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c309c35e5f util::fb2::EpollSocket::WriteSome()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c309c34279 util::fb2::EpollSocket::AsyncWriteSome()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c309c393a4 io::AsyncSink::AsyncWrite()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c3097147f2 dfly::JournalStreamer::Write()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c309714912 _ZNSt17_Function_handlerIFvRKN4dfly7journal11JournalItemEbEZNS0_15JournalStreamer5StartEPN4util15FiberSocketBaseEbEUlS4_bE_E9_M_invokeERKSt9_Any_dataS4_Ob
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c3096fdcb0 dfly::journal::JournalSlice::AddLogRecord()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c3096fc88d dfly::journal::Journal::RecordEntry()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c30970e7de dfly::RecordExpiry()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c3096ccda0 dfly::DbSlice::ExpireIfNeeded()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c3096ceb9c _ZN4dfly9DashTableINS_10CompactObjENS_12ExpirePeriodENS_6detail17ExpireTablePolicyEE8TraverseIRZNS_7DbSlice17DeleteExpiredStepERKNS_9DbContextEjEUlNS5_8IteratorILb0ELb0EEEE_EENS3_10DashCursorESF_OT_
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c3096ceea3 dfly::DbSlice::DeleteExpiredStep()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c30962db0f dfly::EngineShard::Heartbeat()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c30962f180 dfly::EngineShard::RunPeriodic()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c30962f7ed _ZN5boost7context6detail11fiber_entryINS1_12fiber_recordINS0_5fiberEN4util3fb219FixedStackAllocatorEZNS6_6detail15WorkerFiberImplIZN4dfly11EngineShard18StartPeriodicFiberEPNS6_12ProactorBaseEEUlvE_JEEC4IS7_EESt17basic_string_viewIcSt11char_traitsIcEERKNS0_12preallocatedEOT_OSE_EUlOS4_E_EEEEvNS1_10transfer_tE
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c309c3e1af make_fcontext
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: *** SIGABRT received at time=1733253037 on cpu 78 ***
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: PC: @ 0x7fec1da8b94c (unknown) __pthread_kill_implementation
Dec 4 03:10:37 VM-16-191-centos systemd[1]: Started Process Core Dump (PID 2580405/UID 0).
Dec 4 03:10:37 VM-16-191-centos systemd-coredump[2580406]: Removed old coredump core.dragonfly.0.cd995e46fe58401c838fc28a21cc6e5d.2439892.1733230611000000.zst.
Dec 4 03:10:50 VM-16-191-centos systemd-coredump[2580406]: Process 2542781 (dragonfly) of user 0 dumped core.#12#012Stack trace of thread 2542862:#12#0 0x00007fec1da8b94c n/a (n/a + 0x0)
Dec 4 03:10:51 VM-16-191-centos systemd[1]: [email protected]: Deactivated successfully.
Dec 4 03:10:51 VM-16-191-centos systemd[1]: [email protected]: Consumed 8.654s CPU time.
Dec 4 03:11:00 VM-16-191-centos systemd[1]: dragonfly.service: Main process exited, code=dumped, status=6/ABRT
Dec 4 03:11:00 VM-16-191-centos systemd[1]: dragonfly.service: Failed with result 'core-dump'.
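
Several frames in the trace above are still C++-mangled (the ones starting with `_Z`), which makes the call chain harder to read. The following is a minimal sketch, not part of Dragonfly, for pulling the frames out of such a journald excerpt and demangling them. The helper names (`parse_frames`, `is_mangled`, `demangle`) are made up for illustration, and it assumes the binutils `c++filt` tool is installed; when it is not, symbols are returned unchanged.

```python
import re
import shutil
import subprocess

# Matches glog stack-frame lines such as
#   "... dragonfly[2542781]: @ 0x55c309c35e5f util::fb2::EpollSocket::WriteSome()"
# The "PC: @ ..." line of the SIGABRT report is intentionally not matched.
FRAME_RE = re.compile(r"\]: @ (0x[0-9a-f]+) (\S+)\s*$")

# Three frames copied from the log in this issue.
SAMPLE = """\
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c309c35e5f util::fb2::EpollSocket::WriteSome()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c3097147f2 dfly::JournalStreamer::Write()
Dec 4 03:10:37 VM-16-191-centos dragonfly[2542781]: @ 0x55c309714912 _ZNSt17_Function_handlerIFvRKN4dfly7journal11JournalItemEbEZNS0_15JournalStreamer5StartEPN4util15FiberSocketBaseEbEUlS4_bE_E9_M_invokeERKSt9_Any_dataS4_Ob
"""


def parse_frames(log_text):
    """Return (address, symbol) pairs for every stack-frame line in log_text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group(1), match.group(2)))
    return frames


def is_mangled(symbol):
    """Itanium C++ ABI mangled names start with '_Z'."""
    return symbol.startswith("_Z")


def demangle(symbol):
    """Demangle with c++filt if available; otherwise return the symbol as-is."""
    tool = shutil.which("c++filt")
    if tool is None or not is_mangled(symbol):
        return symbol
    result = subprocess.run(
        [tool, symbol], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or symbol


if __name__ == "__main__":
    for address, symbol in parse_frames(SAMPLE):
        print(address, demangle(symbol))
```

To run it against the full trace, save the journald excerpt to a file and pass its contents to `parse_frames` instead of `SAMPLE`.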

@kissingtiger kissingtiger added the bug Something isn't working label Dec 4, 2024
@adiholden
Collaborator

Hi @kissingtiger. Thank you for reporting this.
Are there any additional logs before the stack trace? They might give me a hint as to whether this happened during the replica reconnect stage or somewhere else. Can you tell how frequently it happens?
Also, could you try the latest Dragonfly version and see whether this crash still appears?

@adiholden adiholden self-assigned this Dec 4, 2024
@kissingtiger
Author

The crash occurs approximately every 4 hours and causes the dragonfly process to restart. No maintenance operations were carried out beforehand, and I have not yet upgraded to the latest version to verify.

@romange
Collaborator

romange commented Dec 11, 2024

@kissingtiger Have you had a chance to update the version? We won't be able to help with fixing v1.20.1.
