Simple scripts for working with Apache Cassandra logs.
These are scripts I created to make it simpler to read/summarise/parse the system.log.
They are deliberately kept simple, both for readability and for portability, i.e. they can run on any machine with a Bourne shell or Perl without having to download additional modules or plugins.
Extracts the compaction throughput from a Cassandra system.log.
Use this script to get a feel for the compaction throughput in MB/s.
Usage: compaction_rate.sh <system_log> [min_data_size_bytes]
Sample output:
$ compaction_rate.sh system.log 20000000
Throughput: 16.228230MB/s. | Data size: 38798031 (37 MB) | SSTables count: 2
Throughput: 15.939397MB/s. | Data size: 41834507 (39 MB) | SSTables count: 2
Throughput: 15.885001MB/s. | Data size: 45639407 (43 MB) | SSTables count: 2
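If you want to see where the figures come from, the extraction can be sketched as a one-liner. This is an approximation only: it assumes the Cassandra 2.x CompactionTask wording, where the throughput appears after an "=" sign (e.g. "... in 2,280ms = 16.228230MB/s. ..."), and it skips the data-size filtering the script does:

grep 'Compacted' system.log | sed -n 's!.* = \([0-9][0-9.]*MB/s\)\..*!Throughput: \1!p'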
Counts the occurrences of a string for each hour in the Cassandra system.log.
Use this script to work out whether load has increased during a particular period.
Useful query strings:
- "ParNew" - shows distribution of GC pauses
- "Compacted" - shows distribution of compaction activity
- "flush of Memtable" - shows flushing activity, indicates traffic
- "ConcurrentMarkSweep" - indicates GC pressure
- "Started hinted handoff" - indicates existence of unresponsive nodes
Sample output:
$ count_entries_per_hour.sh ParNew system.log
2015-04-28 12:00 - 8
2015-04-28 16:00 - 108
2015-04-28 20:00 - 202
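The underlying idea is simply "keep the date and hour of every matching line, then count duplicates". A minimal sketch, assuming the usual "YYYY-MM-DD HH:MM:SS,mmm" timestamps and a grep that supports -o; the script formats the result differently:

# count "ParNew" entries per hour
grep 'ParNew' system.log | grep -o '[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}' | sort | uniq -c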
Displays entries in Cassandra logs relating to compaction of large rows.
Useful for showing rows larger than a given size, e.g. 100MB.
Usage: display_large_rows.sh <system_log> [min_row_size_bytes]
Sample output:
$ display_large_rows.sh system.log 1000000000
INFO [CompactionExecutor:73] 2015-01-14 19:11:45,959 CompactionController.java (line 192) Compacting large row myKS/myCF:8e6fb0b72937 (1407625692 bytes) incrementally | 1342 MB
INFO [CompactionExecutor:73] 2015-01-14 19:13:09,901 CompactionController.java (line 192) Compacting large row myKS/myCF:2eec906de37b (1410187132 bytes) incrementally | 1344 MB
INFO [CompactionExecutor:73] 2015-01-14 19:14:34,765 CompactionController.java (line 192) Compacting large row myKS/myCF:ce49043461ce (2871138316 bytes) incrementally | 2738 MB
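Similar output can be approximated with awk. This is only a sketch, under the assumption that the row size always appears as "(N bytes)" in the "Compacting large row" message, as in the lines above; the size threshold is hard-coded here rather than taken as an argument:

# print "Compacting large row" entries whose byte count is at least 'min'
awk -v min=1000000000 '/Compacting large row/ {
    if (match($0, /\([0-9]+ bytes\)/)) {
        size = substr($0, RSTART + 1, RLENGTH - 8)   # digits inside "(N bytes)"
        if (size + 0 >= min) print
    }
}' system.log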
A script I use to do a quick assessment of a Cassandra node. It highlights restarts, GC activity, dropped mutations, large rows, errors, etc.
Usage: initial_log_assess.sh system.log
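The script is essentially a bundle of greps. The following is not the script itself, just the flavour of the checks it performs; the exact message strings vary between Cassandra versions, so treat these patterns as assumptions:

grep -c 'Cassandra version'          system.log   # restarts
grep -c 'GC for ConcurrentMarkSweep' system.log   # CMS GC activity
grep -c 'messages dropped'           system.log   # dropped mutations/requests
grep -c 'Compacting large row'       system.log   # large rows
grep -c '^ERROR'                     system.log   # errors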
The directory names for nodes in the OpsCenter Diagnostic tarball are quite long, so I wrote this script to rename them to just each node's IP address.
Usage: cd <diagnostic_dir>/nodes && rename_node_dirs.sh
Sample output:
$ rename_node_dirs.sh
Renaming [opsc-2015-04-29-11-54-21-UTC-172.31.3.100] to [172.31.3.100]... OK
Renaming [opsc-2015-04-29-11-54-21-UTC-172.31.36.54] to [172.31.36.54]... OK
Renaming [opsc-2015-04-29-11-54-21-UTC-172.31.43.125] to [172.31.43.125]... OK
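The renaming step can be sketched in a few lines of Bourne shell, assuming the directory names end in an IPv4 address as in the sample above (the real script also reports success/failure for each directory):

for d in opsc-*; do
    ip=`echo "$d" | sed 's/.*-\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)$/\1/'`
    [ "$ip" != "$d" ] && mv "$d" "$ip"
done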
Summarises a Cassandra log file into line counts by log level, thread name, class and de-personalised message.
Usage: summarise_log_cass.pl -f system.log
Sample output:
===== Summarising log file [system_678571075.log] =====
===== Count of message entries by log level =====
15708 --- INFO
311 --- WARN
===== Count of message entries by thread =====
9871 --- [ScheduledTasks]
1512 --- [FlushWriter]
1263 --- [MemoryMeter]
1068 --- [RequestResponseStage]
814 --- [CompactionExecutor]
305 --- [HANDSHAKE#.#.#.#/#.#.#.#]
279 --- [WRITE-/#.#.#.#]
228 --- [main]
219 --- [HintedHandoff]
...
===== Count of message entries by class =====
9239 --- StatusLogger.java
2775 --- Memtable.java
1233 --- Gossiper.java
808 --- ColumnFamilyStore.java
593 --- CompactionTask.java
305 --- OutboundTcpConnection.java
280 --- SSLFactory.java
202 --- HintedHandOffManager.java
118 --- GCInspector.java
...
===== Count of message entries =====
9239 --- INFO [ScheduledTasks] StatusLogger.java ...
1234 --- INFO [MemoryMeter] Memtable.java CFS(Keyspace='...') liveRatio is #.# (just-counted was #.#). calculation took #ms for # cells
1067 --- INFO [RequestResponseStage] Gossiper.java InetAddress /#.#.#.# is now UP
756 --- INFO [FlushWriter] Memtable.java Writing Memtable-<table>@#(#/# serialized/live bytes, # ops)
743 --- INFO [FlushWriter] Memtable.java Completed flushing <db_file> (# bytes) for commitlog position ReplayPosition(segmentId=#, position=#)
500 --- INFO [ScheduledTasks] ColumnFamilyStore.java Enqueuing flush of Memtable-<table>
298 --- INFO [CompactionExecutor] CompactionTask.java Compacting [SSTableReader(path=<sstable_files>), ...]
295 --- INFO [CompactionExecutor] CompactionTask.java Compacted # to [<sstable_files>]...
279 --- INFO [HANDSHAKE#.#.#.#/#.#.#.#] OutboundTcpConnection.java Handshaking version with #.#.#.#/#.#.#.#
279 --- WARN [WRITE-/#.#.#.#] SSLFactory.java Filtering out TLS_RSA_WITH_AES_#_CBC_SHA,TLS_DHE_RSA_WITH_AES_#_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_#_CBC_SHA as it isnt supported by the socket
155 --- INFO [GossipTasks] Gossiper.java InetAddress /#.#.#.# is now DOWN
106 --- INFO [ScheduledTasks] GCInspector.java GC for ConcurrentMarkSweep: # ms for # collections, # used; max is #
100 --- INFO [HintedHandoff] HintedHandOffManager.java Started hinted handoff for host: UUID with IP: /#.#.#.#
...
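The de-personalised counts in the last section come from collapsing anything that looks like a number or address into "#" so that identical message shapes group together. A very rough shell equivalent of that idea (the Perl script's substitution rules are more involved and also strip table names, file paths, UUIDs, etc.):

# replace every run of digits with '#', then count identical lines
sed 's/[0-9][0-9]*/#/g' system.log | sort | uniq -c | sort -rn | head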
This script is also run from the nodes directory. It looks for "system.log" files and grabs the first and last lines containing a date string, to show the user the time range each log covers.
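A minimal sketch of that idea, assuming timestamped lines of the form "YYYY-MM-DD ..." and ignoring the script's actual output formatting:

for f in `find . -name system.log`; do
    echo "== $f =="
    grep '20[0-9][0-9]-[0-9][0-9]-[0-9][0-9]' "$f" | head -1   # first timestamped line
    grep '20[0-9][0-9]-[0-9][0-9]-[0-9][0-9]' "$f" | tail -1   # last timestamped line
done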
This script takes input from nodetool info and prints the node's uptime in a more 'friendly' format.
Useful to see how long the node has been up when looking at other nodetool commands.
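The conversion itself is just arithmetic on the "Uptime (seconds)" line. A sketch only, assuming that label is present in your nodetool info output (it varies slightly between versions); the script's exact formatting differs:

nodetool info | awk -F: '/Uptime/ {
    s = $2 + 0
    printf "%d days, %d hours, %d minutes\n", s/86400, (s%86400)/3600, (s%3600)/60
}'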