I Kytos Workshop BluePrint

Diego Rabatone Oliveira edited this page Feb 14, 2017 · 2 revisions

TITLE

Statistics Gathering, Storing and Exporting

STATUS

Under construction http://i.imgur.com/IRbYnFH.gif

ABSTRACT

(the problem) Supporting the Operations and Management (O&M) of a production SDN network within a defined Service Level Agreement (SLA) requires the permanent / long-term collection, storage and statistical analysis of counters and flow data. What key information is required to support the O&M of a production SDN network under an SLA?

DESCRIPTION

(problem detailed description) The O&M of a production network requires a baseline of operational information and KPIs (Key Performance Indicators) that allows analysis, troubleshooting and recovery in the shortest possible time, in order to fulfill the SLAs (Service Level Agreements) with its clients and users.

REQUIREMENTS

  • Flow Statistics
    • List of Flow Entries per device every X time units
    • Dump of the Flow Table of each device on a given date in the past
    • Number of simultaneous flows per device at a given point in time (device == data_path_id)
    • Number of simultaneous flows per network at a given point in time (sum of all devices' flows)
    • Flow duration (to identify flapping flows)
    • Flow counters
    • Path trace
  • Packet Statistics
    • Rate of FlowMods per interval (per switch?)
    • Rate of PacketOuts per interval (per switch?)
    • Rate of StatsRequest per interval (per switch?)
  • Keep a histogram of sensible header info, e.g. which OF header types were used, etc.
    • Also keep a histogram of (control) message order (a certain sequence of control messages might freeze the device, though that may not be the case if it has happened before)
  • Switch's CPU and Memory Utilization (what about the controller? does it make sense to monitor controller resource usage? - yes!!)
  • Port Status (and possible status changes) for each device
  • Device's flow version
  • Classification of statistics according to their domain (controller) (what are the possible domains to be evaluated?)
  • Persistent back-end customization (time series)
    • Possibility to use relational (MariaDB, Oracle, etc) or non-relational (MongoDB, plain text) storage
    • Scalable back-end to support thousands of entries
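
As an illustration of how some of the requirements above could fit together, here is a minimal sketch of a statistics store. It is only an assumption about one possible shape: the class `InMemoryStatsStore` and all of its method names are hypothetical and are not part of Kytos. It keeps everything in memory, whereas a real back-end would persist to one of the storage options listed above.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


class InMemoryStatsStore:
    """Hypothetical store for flow table dumps and control message events."""

    def __init__(self):
        # dpid -> list of (timestamp, flow_entries), kept sorted by timestamp
        self._flow_tables = defaultdict(list)
        # dpid -> list of (timestamp, message_type)
        self._messages = defaultdict(list)

    def save_flow_table(self, dpid, timestamp, flow_entries):
        """Store a dump of a device's flow table taken at `timestamp`."""
        self._flow_tables[dpid].append((timestamp, list(flow_entries)))
        self._flow_tables[dpid].sort(key=lambda item: item[0])

    def flow_table_at(self, dpid, timestamp):
        """Return the latest dump taken at or before `timestamp`."""
        dumps = self._flow_tables.get(dpid, [])
        times = [taken_at for taken_at, _ in dumps]
        index = bisect_right(times, timestamp)
        return dumps[index - 1][1] if index else []

    def flows_per_device(self, dpid, timestamp):
        """Number of simultaneous flows on one device at a point in time."""
        return len(self.flow_table_at(dpid, timestamp))

    def flows_in_network(self, timestamp):
        """Sum of all devices' flows at a point in time."""
        return sum(self.flows_per_device(dpid, timestamp)
                   for dpid in list(self._flow_tables))

    def record_message(self, dpid, message_type, timestamp):
        """Record one control message (e.g. FlowMod, PacketOut)."""
        self._messages[dpid].append((timestamp, message_type))

    def message_rate(self, dpid, message_type, start, end):
        """Messages per time unit in the half-open interval [start, end)."""
        count = sum(1 for taken_at, kind in self._messages.get(dpid, [])
                    if kind == message_type and start <= taken_at < end)
        return count / (end - start)

    def message_histogram(self, dpid):
        """Count of each message type seen for one device."""
        return Counter(kind for _, kind in self._messages.get(dpid, []))


store = InMemoryStatsStore()
store.save_flow_table("00:01", 10, ["flow-a", "flow-b"])
store.save_flow_table("00:01", 20, ["flow-a"])
store.save_flow_table("00:02", 15, ["flow-c", "flow-d", "flow-e"])
for moment in (1, 2, 3, 11):
    store.record_message("00:01", "FlowMod", moment)
store.record_message("00:01", "PacketOut", 4)

print(store.flows_per_device("00:01", 12))
print(store.flows_in_network(17))
print(store.message_rate("00:01", "FlowMod", 0, 10))
print(store.message_histogram("00:01"))
```

Keeping the query methods separate from the storage details is one way to leave the persistent back-end swappable, as the last requirement asks.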

SOLUTION

(solution proposal for the problem)

WHITEBOARD

  • Will it process a large number of messages (big data)? - A compressed text storage can be useful for Hadoop, Spark, Elasticsearch, etc.
  • Need to define the time interval to avoid CPU spikes and keep a reasonable back-end size
  • Is the information required for Operations the same as that required for Management?
  • Define an API (at least basic) to gather statistics
  • A snapshot of the network information may be important for analysis and replay of historical data (which OpenFlow protocol version was in use, which firmware version, etc.)
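
The compressed text storage and the snapshot ideas above could be combined roughly as follows. This is a sketch under assumptions, not a proposed design: the function names `append_snapshot` and `replay_snapshots` and the snapshot fields (`timestamp`, `dpid`, `openflow_version`, `firmware_version`, `flows`) are all hypothetical. It writes one JSON document per line into a gzip file, a format that line-oriented big data tools can usually ingest.

```python
import gzip
import json


def append_snapshot(path, snapshot):
    """Append one snapshot as a JSON line to a gzip-compressed text file."""
    # Mode "at" appends a new gzip member; the gzip module reads
    # multi-member files back transparently.
    with gzip.open(path, "at", encoding="utf-8") as handle:
        handle.write(json.dumps(snapshot, sort_keys=True) + "\n")


def replay_snapshots(path, start=None, end=None):
    """Yield stored snapshots in file order, optionally filtered by time."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            snapshot = json.loads(line)
            taken_at = snapshot["timestamp"]  # hypothetical field name
            if start is not None and taken_at < start:
                continue
            if end is not None and taken_at > end:
                continue
            yield snapshot
```

A possible usage, with made-up values, is `append_snapshot("stats.jsonl.gz", {"timestamp": 100, "dpid": "00:01", "openflow_version": "1.3", "firmware_version": "x.y", "flows": 42})` followed by `list(replay_snapshots("stats.jsonl.gz", start=50))`.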

DEPENDENCY TREE

(Related blueprints, if any)
