Tegrastats -> Telegraf -> Influxdb -> Grafana

I'm developing a series of AI devices based on the Nvidia Jetson.

To monitor them remotely I use InfluxDB: Telegraf sends the metrics, and Grafana visualizes them.

For the most part I use this beautiful dashboard, but for the more specific parts, such as temperature, GPU, and hardware encoders, I use the Nvidia tool tegrastats.

So the basic workflow is:

1. Generate the logs:

tegrastats --interval 10000 --logfile /var/log/tegrastat

2. Parse the logs with the Telegraf inputs.tail plugin:

Excerpt from /etc/telegraf/telegraf.conf:

[[inputs.tail]]
  ## file(s) to tail:
  files = ["/var/log/tegrastat"]
  from_beginning = false
  data_format = "grok"

  ## name of the metric (the one I eventually want to see in Grafana)
  name_override = "tegrastat"

  grok_patterns = ["%{CUSTOM_LOGS}"]

  grok_custom_patterns = '''
CUSTOM_LOGS %{NUMBER:ramused:int}/%{NUMBER:ramtotal:int}MB \(lfb %{NUMBER:pages:int}x4MB\) CPU \[%{NUMBER:cpu1percentaje:int}\%@%{NUMBER:cpu1freq:int},%{NUMBER:cpu2percentaje:int}\%@%{NUMBER:cpu2freq:int},%{NUMBER:cpu3percentaje:int}\%@%{NUMBER:cpu3freq:int},%{NUMBER:cpu4percentaje:int}\%@%{NUMBER:cpu4freq:int},%{NUMBER:cpu5percentaje:int}\%@%{NUMBER:cpu5freq:int},%{NUMBER:cpu6percentaje:int}\%@%{NUMBER:cpu6freq:int},%{NUMBER:cpu7percentaje:int}\%@%{NUMBER:cpu7freq:int},%{NUMBER:cpu8percentaje:int}\%@%{NUMBER:cpu8freq:int}\] EMC_FREQ %{NUMBER:emcfreqpercentaje:int}\%@%{NUMBER:emcfreq:int} GR3D_FREQ %{NUMBER:gr3dfreqpercentaje:int}\%@%{NUMBER:gr3dfreq:int} NVENC %{NUMBER:nvencfreq:int} NVENC1 %{NUMBER:nvenc1freq:int} APE %{NUMBER:ape:int} MTS fg %{NUMBER:mts_fg:int}\% bg %{NUMBER:mts_bg:int}\% AO@%{NUMBER:ao_temp:float}C GPU@%{NUMBER:gpu_temp:float}C Tboard@%{NUMBER:tboard_temp:float}C Tdiode@%{NUMBER:tdiode_temp:float}C AUX@%{NUMBER:aux_temp:float}C CPU@%{NUMBER:cpu_temp:float}C thermal@%{NUMBER:thermal_temp:float}C PMIC@100C GPU %{NUMBER:gpupowercur:int}/%{NUMBER:gpupoweravg:int} CPU %{NUMBER:cpupowercur:int}/%{NUMBER:cpupoweravg:int} SOC %{NUMBER:socpowercur:int}/%{NUMBER:socpoweravg:int} CV %{NUMBER:cvpowercur:int}/%{NUMBER:cvpoweravg:int} VDDRQ %{NUMBER:vddrqpowercur:int}/%{NUMBER:vddrqpoweravg:int} SYS5V %{NUMBER:sys5vpowercur:int}/%{NUMBER:sys5vpoweravg:int}
'''

In my case, an [[outputs.influxdb]] output plugin sends the data to my InfluxDB instance.
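The CPU section of the CUSTOM_LOGS pattern in step 2 repeats the same two fields for each of the 8 cores. As a rough sketch, that section can be generated instead of typed by hand. The helper name `cpu_grok_fragment` is hypothetical, and using it for a core count other than 8 is an assumption that has not been checked against other Jetson models. The field names keep the spelling used in the config above so existing Grafana queries continue to match.

```python
def cpu_grok_fragment(core_count: int) -> str:
    """Build the 'CPU [..]' part of the grok pattern for core_count cores.

    Hypothetical helper: it only reproduces the naming scheme used in the
    CUSTOM_LOGS pattern (cpuNpercentaje / cpuNfreq).
    """
    cores = [
        rf"%{{NUMBER:cpu{i}percentaje:int}}\%@%{{NUMBER:cpu{i}freq:int}}"
        for i in range(1, core_count + 1)
    ]
    # The brackets are escaped because grok treats them as regex syntax.
    return r"CPU \[" + ",".join(cores) + r"\]"


if __name__ == "__main__":
    # Print the fragment for 8 cores, ready to paste into grok_custom_patterns.
    print(cpu_grok_fragment(8))
```

The printed text is meant to replace only the `CPU \[...\]` portion of the pattern; the rest of CUSTOM_LOGS stays as written.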

3. Then you can graph the metrics, set alarms, etc. as usual.

Jetson Xavier Temperatures

4. See my attached Grafana panel configuration:

Nvidia Jetson Grafana Dashboard

Notes:

I didn't know grok before this; it is a nice line parser à la regex. Here are a couple of tools that I used to assemble the custom pattern:

https://grokdebug.herokuapp.com/

http://grokconstructor.appspot.com/do/match#result
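If those online tools are unavailable, a fragment of the pattern can also be checked locally. The sketch below is not a grok implementation: it only translates `%{NUMBER:name:type}` tokens into named regex groups, and its `NUMBER` expression is a simplified stand-in for grok's own definition. The function names are hypothetical, and the sample line is made up to fit the fragment; it is not captured tegrastats output.

```python
import re

# Simplified stand-in for grok's NUMBER pattern (assumption: the real one is stricter).
NUMBER = r"[+-]?\d+(?:\.\d+)?"
CASTS = {"int": int, "float": float}


def grok_to_regex(pattern: str):
    """Translate %{NUMBER:name:type} tokens into named regex groups."""
    types = {}

    def replace(match):
        name, cast = match.group(1), match.group(2)
        types[name] = CASTS[cast]
        return f"(?P<{name}>{NUMBER})"

    regex = re.sub(r"%\{NUMBER:(\w+):(\w+)\}", replace, pattern)
    return re.compile(regex), types


def parse(pattern: str, line: str):
    """Return the typed fields found in line, or None if the pattern does not match."""
    regex, types = grok_to_regex(pattern)
    found = regex.search(line)
    if found is None:
        return None
    return {name: types[name](value) for name, value in found.groupdict().items()}


# First fragment of CUSTOM_LOGS from the Telegraf config.
FRAGMENT = r"%{NUMBER:ramused:int}/%{NUMBER:ramtotal:int}MB \(lfb %{NUMBER:pages:int}x4MB\)"
# Made-up line shaped like the fragment, NOT real tegrastats output.
SAMPLE = "RAM 1000/2000MB (lfb 3x4MB)"

if __name__ == "__main__":
    print(parse(FRAGMENT, SAMPLE))
```

Growing `FRAGMENT` one field at a time against a real line from /var/log/tegrastat shows where a long pattern stops matching, which is harder to see when the whole pattern fails at once.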