-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Understanding FoundationDB Log Files
Type - The name of the trace event.
LogGroup - The value of the --loggroup
parameter passed to fdbserver
. This is useful when trace logs from multiple databases are ingested into the same system.
Severity:
- 10 - SevInfo - General information events
- 20 - SevWarn - Indicates a generally rare event, but not indicate a actual problem
- 30 - SevWarnAlways - Indicates an event that could be causing performance problems in a cluster
- 40 - SevError - Indicates one of the assumptions of the database has been violated. Generally caused by hardware failures.
ID - The ID of the role which produced the event. For instance, messages logged by the master will contain the master's ID.
Machine - The IP address and port of the process which produced the event. For client trace logs the pid of the process is logged instead of the port.
Time - The time at which the event happened.
Roles - A comma separated list of two letter codes which correspond to all of the roles running on the process at the time the message was logged.
- StorageServer - SS
- TLog - TL
- MasterProxyServer - MP
- MasterServer - MS
- Resolver - RV
- ClusterController - CC
- Tester - TS
- LogRouter - LR
- DataDistributor - DD
- Ratekeeper - RK
- Coordinator - CD
Error - Events which are logged because of an error include the associated error code.
Backtrace - Some high severity events include a backtrace which provides the call stack of the system at the time the event was generated.
TrackLatestType - The trace files generated by FoundationDB will automatically roll after a period of time. Some events are re-logged every time the trace file is rolled, so we will always have a history of the last time the message was called. If the TrackLatestType is Original
that means the event just happened. If the TrackLatestType is Rolled
it means the event happened sometime in the past. For Rolled
trace events, the OriginalTime
field is the time at which the original event happened.
SuppressedEventCount - Many events have a maximum cap on the number of times they can be logged in a given time interval. Events that have this time of suppression include the number of events that were not logged in the next message which does get written.
Counters - Fields which have values that are three space separated numeric values are called counters. The first number is the the amount the value has increases per second since the field was last logged. The second number is a roughness, where larger numbers represent that the increases to this field are very bursty, and smaller values represent that the value has increase at a very regular interval. The third number is the total amount this value has increased since the start of the role logging the event.
TLogMetrics
The difference between BytesInput
and BytesDurable
is the amount of memory the TLog is using. Multiple TLogs can be recruited on the same process, but only one of them can be active. SharedBytesInput
and SharedBytesDurable
represent the combined memory used by all TLogs on the same process. Once this difference reaches 1.1GB batch priority transactions will be throttled. Once it reaches 1.5GB the TLog will start take writing mutations from memory to disk. Future reads for these mutations will need to read them from disk. Once this difference reaches 2.0GB Ratekeeper will start slowing down client traffic, with the goal of ensuring TLog memory does not exceed 2.4GB. SharedOverheadBytesInput
and SharedOverheadBytesDurable
show the amount of memory being used on overhead, and should generally be much smaller than the total amount of memory in use.
- Version
- QueueCommittedVersion
- PersistentDataVersion
- PersistentDataDurableVersion
- KnownCommittedVersion
- MinPoppedTagVersion
- MinPoppedTagLocality
- MinPoppedTagId
- KvstoreBytesUsed
- KvstoreBytesFree
- KvstoreBytesAvailable
- KvstoreBytesTotal
- QueueDiskBytesUsed
- QueueDiskBytesFree
- QueueDiskBytesAvailable
- QueueDiskBytesTotal
- PeekMemoryReserved
- PeekMemoryRequestsStalled
ProxyMetrics
- TxnStartIn
- TxnStartOut
- TxnStartBatch
- TxnSystemPriorityStartIn
- TxnSystemPriorityStartOut
- TxnBatchPriorityStartIn
- TxnBatchPriorityStartOut
- TxnDefaultPriorityStartIn
- TxnDefaultPriorityStartOut
- TxnCommitIn
- TxnCommitVersionAssigned
- TxnCommitResolving
- TxnCommitResolved
- TxnCommitOut
- TxnCommitOutSuccess
- TxnConflicts
- CommitBatchIn
- CommitBatchOut
- MutationBytes
- Mutations
- ConflictRanges
- KeyServerLocationRequests
- LastAssignedCommitVersion
- Version
- CommittedVersion
- CommitBatchesMemBytesCount
StorageMetrics
- GetKeyQueries
- GetValueQueries
- GetRangeQueries
- QueryQueue
- FinishedQueries
- RowsQueried
- BytesQueried
- WatchQueries
- EmptyQueries
- BytesInput
- BytesDurable
- BytesFetched
- MutationBytes
- SampledBytesCleared
- Mutations
- SetMutations
- ClearRangeMutations
- AtomicMutations
- UpdateBatches
- UpdateVersions
- Loops
- FetchWaitingMS
- FetchWaitingCount
- FetchExecutingMS
- FetchExecutingCount
- ReadsRejected
- LastTLogVersion
- Version
- StorageVersion
- DurableVersion
- DesiredOldestVersion
- VersionLag
- LocalRate
- FetchKeysFetchActive
- FetchKeysWaiting
- QueryQueueMax
- BytesStored
- ActiveWatches
- WatchBytes
- KvstoreBytesUsed
- KvstoreBytesFree
- KvstoreBytesAvailable
- KvstoreBytesTotal
ProcessMetrics
MachineMetrics
NetworkMetrics
MemoryMetrics
LogRouterMetrics
ClusterControllerMetrics
MovingData
TotalDataInFlight
DDTrackerStats
RkUpdate
RkUpdateBatch
DatacenterVersionDifference
SpringCleaningMetrics
SlowTask
Net2SlowTaskTrace
FailureDetectionStatus
ConnectingTo
ConnectionClosed
ConnectionTimedOut
ConnectionTimeout
ProgramStart
Role
TLogDegraded
LargeTransaction
ProxyCommitBatchMemoryThresholdExceeded
TooManyNotifications
TooManyStatusRequests
LargePacketSent
LargePacketReceived
HugeArenaSample
GetMagazineSample
SlowKAIOTruncate
SlowKAIOLaunch
KVCommit10sSample
StorageServerUpdateLag
TLogUpdateLag
TraceEventThrottle_*
TLS*
ServerTag
ServerTagRemove
MasterRecoveryState
MasterRecovering
MasterRecoveredConfig
MasterRegistrationReceived
MasterRecoveryDuration
MasterTerminated
BetterMasterExists
CCWDB
TLogRestorePersistentState
TLogRestorePersistentStateDone
TLogRecover
TLogReady
TLogPoppedTag
TLogLockStarted
TLogLocked
TLogStop
GetDurableResult
GetDurableResultWaiting
Rollback
ReadingDurableState
RestoringDurableState
SSTimeRestoreDurableState
StorageServerStartingCore
StatusMapChange
TeamHealthChanged
BuildTeamsBegin
ServerTeamRemover
MachineTeamRemover
DDInitTakingMoveKeysLock
DDInitTookMoveKeysLock
NominatingLeader
ReplacedAsLeader
LeaderTrueHeartbeat
LeaderFalseHeartbeat
LeaderNoHeartbeat
ReleasingLeadership
ClientStart
TransactionMetrics