Alarm Types

alarm-type
    cdb-offload-threshold-too-low
    certificate-expiration
    ha-alarm
        ha-node-down-alarm
            ha-primary-down
            ha-secondary-down
    ncs-cluster-alarm
        cluster-subscriber-failure
    ncs-dev-manager-alarm
        abort-error
        bad-user-input
        commit-through-queue-blocked
        commit-through-queue-failed
        commit-through-queue-failed-transiently
        commit-through-queue-rollback-failed
        configuration-error
        connection-failure
        final-commit-error
        missing-transaction-id
        ned-live-tree-connection-failure
        out-of-sync
        revision-error
    ncs-package-alarm
        package-load-failure
        package-operation-failure
    ncs-service-manager-alarm
        service-activation-failure
    ncs-snmp-notification-receiver-alarm
        receiver-configuration-error
    time-violation-alarm
        transaction-lock-time-violation
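
The hierarchy above can be modelled as a nested mapping, which makes it easy to check whether a concrete alarm type belongs to one of the base types. The following is a minimal sketch in Python, not part of NCS itself; the names `ALARM_TYPE_TREE`, `leaves`, `path_to`, and `is_derived_from_or_self` are hypothetical helpers introduced only for illustration.

```python
def leaves(*names):
    """Build a mapping of alarm types that have no sub-identities."""
    return {name: {} for name in names}


# Nested mapping mirroring the alarm-type tree listed above.
ALARM_TYPE_TREE = {
    "alarm-type": {
        **leaves("cdb-offload-threshold-too-low", "certificate-expiration"),
        "ha-alarm": {
            "ha-node-down-alarm": leaves("ha-primary-down", "ha-secondary-down"),
        },
        "ncs-cluster-alarm": leaves("cluster-subscriber-failure"),
        "ncs-dev-manager-alarm": leaves(
            "abort-error",
            "bad-user-input",
            "commit-through-queue-blocked",
            "commit-through-queue-failed",
            "commit-through-queue-failed-transiently",
            "commit-through-queue-rollback-failed",
            "configuration-error",
            "connection-failure",
            "final-commit-error",
            "missing-transaction-id",
            "ned-live-tree-connection-failure",
            "out-of-sync",
            "revision-error",
        ),
        "ncs-package-alarm": leaves(
            "package-load-failure", "package-operation-failure"
        ),
        "ncs-service-manager-alarm": leaves("service-activation-failure"),
        "ncs-snmp-notification-receiver-alarm": leaves(
            "receiver-configuration-error"
        ),
        "time-violation-alarm": leaves("transaction-lock-time-violation"),
    }
}


def path_to(alarm_type, tree=ALARM_TYPE_TREE, prefix=()):
    """Return the identities from the root down to alarm_type, or None."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == alarm_type:
            return path
        found = path_to(alarm_type, children, path)
        if found is not None:
            return found
    return None


def is_derived_from_or_self(alarm_type, base):
    """True if base is alarm_type itself or one of its ancestors."""
    path = path_to(alarm_type)
    return path is not None and base in path
```

For example, `is_derived_from_or_self("out-of-sync", "ncs-dev-manager-alarm")` holds, so a filter on the device manager base type would also select out-of-sync alarms.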

Alarm Type Descriptions

abort-error
  • Initial Perceived Severity
    major
  • Description
    An error occurred while aborting or reverting a transaction. The device's configuration is likely to be inconsistent with the NCS CDB.
  • Recommended Action
    Inspect the configuration difference with compare-config and, if there are conflicts, resolve them with sync-from or sync-to.
  • Clear Condition(s)
    If NCS achieves sync with the device, or receives a transaction id for a NETCONF session towards the device, the alarm is cleared.
  • Alarm Message(s)
    • Device {dev} is locked
    • Device {dev} is southbound locked
    • abort error
alarm-type
  • Description
    Base identity for alarm types. A unique identification of the fault, not including the managed object. Alarm types are used to identify if alarms indicate the same problem or not, for lookup into external alarm documentation, etc. Different managed object types and instances can share alarm types. If the same managed object reports the same alarm type, it is to be considered to be the same alarm. The alarm type is a simplification of the different X.733 and 3GPP alarm IRP alarm correlation mechanisms and it allows for hierarchical extensions.
    A 'specific-problem' can be used in addition to the alarm type in order to have different alarm types based on information not known at design-time, such as values in textual SNMP Notification varbinds.
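
The identity rule described above (same managed object plus same alarm type means same alarm, optionally refined by a specific problem) can be illustrated with a small lookup keyed on those fields. This is a sketch under assumptions: `AlarmKey` and `record` are hypothetical names, and the managed-object strings in the usage note are made up for illustration rather than taken from NCS.

```python
from typing import NamedTuple


class AlarmKey(NamedTuple):
    """Fields that together identify one alarm, per the description above."""
    managed_object: str
    alarm_type: str
    specific_problem: str = ""


def record(alarms, managed_object, alarm_type, text, specific_problem=""):
    """Store a report under its alarm key and return that key.

    Reports with the same key are treated as state changes of the same
    alarm, so their texts are appended to one entry.
    """
    key = AlarmKey(managed_object, alarm_type, specific_problem)
    alarms.setdefault(key, []).append(text)
    return key
```

Calling `record` twice with the hypothetical managed object `"device ce0"` and alarm type `connection-failure` yields a single entry with two texts, while the same alarm type on `"device ce1"` creates a second, separate entry.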
bad-user-input
  • Initial Perceived Severity
    critical
  • Description
    Invalid input from the user. NCS cannot recognize the parameters needed to connect to the device.
  • Recommended Action
    Verify that the user-supplied input is correct.
  • Clear Condition(s)
    This alarm is not cleared.
  • Alarm Message(s)
    • Resource {resource} doesn't exist
cdb-offload-threshold-too-low
  • Description
    The CDB offload threshold is configured too low, so the CDB memory footprint reaches the threshold even though no offloadable data is present in memory. The severity is warning.
  • Recommended Action
    If system memory is sufficient, increase the threshold value; otherwise, increase the system memory capacity.
  • Clear Condition(s)
    This alarm is cleared when CDB offload can take the CDB memory footprint lower than the configured threshold value.
  • Alarm Message(s)
    • Too low /config/cdb/persistence/offload/threshold value.
certificate-expiration
  • Description
    The certificate is nearing its expiry or has already expired. The severity depends on the time left until expiry and ranges from warning to critical.
  • Recommended Action
    Replace the certificate.
  • Clear Condition(s)
    This alarm is cleared when the certificate is no longer loaded.
  • Alarm Message(s)
    • Certificate expires in less than {days} day(s)/Certificate has expired.
cluster-subscriber-failure
  • Initial Perceived Severity
    critical
  • Description
    Failure to establish a notification subscription towards a remote node.
  • Recommended Action
    Verify IP connectivity between cluster nodes.
  • Clear Condition(s)
    This alarm is cleared if NCS succeeds in establishing a subscription towards the remote node, or when the subscription is explicitly stopped.
  • Alarm Message(s)
    • Failed to establish netconf notification subscription to node ~s, stream ~s
    • Commit queue items with remote nodes will not receive required event notifications.
commit-through-queue-blocked
  • Initial Perceived Severity
    warning
  • Description
    A commit was queued behind a queue item that is waiting to connect to one of its devices. This is potentially dangerous, since a single unreachable device can fill up the commit queue indefinitely.
  • Clear Condition(s)
    An alarm raised due to a transient error will be cleared when NCS is able to reconnect to the device.
  • Alarm Message(s)
    • Commit queue item ~p is blocked because item ~p cannot connect to ~s
commit-through-queue-failed
  • Initial Perceived Severity
    critical
  • Description
    A queued commit failed.
  • Recommended Action
    Resolve with rollback if possible.
  • Clear Condition(s)
    This alarm is not cleared.
  • Alarm Message(s)
    • Failed to authenticate towards device {device}: {reason}
    • Device {dev} is locked
    • {Reason}
    • Device {dev} is southbound locked
    • Commit queue item {CqId} rollback invoked
    • Commit queue item {CqId} has failed: Operation failed because: inconsistent database
    • Remote commit queue item ~p cannot be unlocked: cluster node not configured correctly
commit-through-queue-failed-transiently
  • Initial Perceived Severity
    critical
  • Description
    A queued commit failed as it exhausted its retry attempts on transient errors.
  • Recommended Action
    Resolve with rollback if possible.
  • Clear Condition(s)
    This alarm is not cleared.
  • Alarm Message(s)
    • Failed to connect to device {dev}: {reason}
    • Connection to {dev} timed out
    • Failed to authenticate towards device {device}: {reason}
    • The configuration database is locked for device {dev}: {reason}
    • the configuration database is locked by session {id} {identification}
    • {Dev}: Device is locked in a {Op} operation by session {session-id}
    • resource denied
    • Commit queue item {CqId} rollback invoked
    • Commit queue item {CqId} has failed: Operation failed because: inconsistent database
    • Remote commit queue item ~p cannot be unlocked: cluster node not configured correctly
commit-through-queue-rollback-failed
  • Initial Perceived Severity
    critical
  • Description
    Rollback of a commit-queue item failed.
  • Recommended Action
    Investigate the status of the device and resolve the situation by issuing the appropriate action, i.e., service redeploy or a sync operation.
  • Clear Condition(s)
    This alarm is not cleared.
  • Alarm Message(s)
    • {Reason}
configuration-error
  • Initial Perceived Severity
    critical
  • Description
    Invalid configuration of an NCS-managed device. NCS cannot recognize the parameters needed to connect to the device.
  • Recommended Action
    Verify that the configuration parameters defined in the tailf-ncs-devices.yang submodule are consistent for this device.
  • Clear Condition(s)
    The alarm is cleared when NCS reads the configuration parameters for the device, and is raised again if the parameters are invalid.
  • Alarm Message(s)
    • Failed to resolve IP address for {dev}
    • the configuration database is locked by session {id} {identification}
    • {Reason}
    • Resource {resource} doesn't exist
connection-failure
  • Initial Perceived Severity
    major
  • Description
    NCS failed to connect to a managed device before the timeout expired.
  • Recommended Action
    Verify the address, port, and authentication, and check that the device is up and running. If the error occurs intermittently, increase connect-timeout.
  • Clear Condition(s)
    If NCS successfully reconnects to the device, the alarm is cleared.
  • Alarm Message(s)
    • The connection to {dev} was closed
    • Failed to connect to device {dev}: {reason}
final-commit-error
  • Initial Perceived Severity
    critical
  • Description
    A managed device validated a configuration change, but failed to commit. When this happens, NCS and the device are out of sync.
  • Recommended Action
    Reconcile by comparing the configurations and then invoking sync-from or sync-to.
  • Clear Condition(s)
    If NCS achieves sync with a device, the alarm is cleared.
  • Alarm Message(s)
    • The connection to {dev} was closed
    • External error in the NED implementation for device {dev}: {reason}
    • Internal error in the NED NCS framework affecting device {dev}: {reason}
ha-alarm
  • Description
    Base type for all alarms related to high availability. This is never reported, sub-identities for the specific high availability alarms are used in the alarms.
ha-node-down-alarm
  • Description
    Base type for all alarms related to nodes going down in high availability. This is never reported, sub-identities for the specific node down alarms are used in the alarms.
ha-primary-down
  • Initial Perceived Severity
    critical
  • Description
    The node lost the connection to the primary node.
  • Recommended Action
    Make sure the HA cluster is operational, investigate why the primary went down and bring it up again.
  • Clear Condition(s)
    This alarm is never automatically cleared and has to be cleared manually when the HA cluster has been restored.
  • Alarm Message(s)
    • Lost connection to primary due to: Primary closed connection
    • Lost connection to primary due to: Tick timeout
    • Lost connection to primary due to: code {Code}
ha-secondary-down
  • Initial Perceived Severity
    critical
  • Description
    The node lost the connection to a secondary node.
  • Recommended Action
    Investigate why the secondary node went down, fix the connectivity issue and reconnect the secondary to the HA cluster.
  • Clear Condition(s)
    This alarm is cleared when the secondary node is reconnected to the HA cluster.
  • Alarm Message(s)
    • Lost connection to secondary
missing-transaction-id
  • Initial Perceived Severity
    warning
  • Description
    A device announced in its NETCONF hello message that it supports the transaction-id as defined in http://tail-f.com/yang/netconf-monitoring. However, when NCS tries to read the transaction-id, no data is returned. The NCS check-sync feature will not work. This is usually caused by misconfigured NACM rules on the managed device.
  • Recommended Action
    Verify NACM rules on the concerned device.
  • Clear Condition(s)
    If NCS successfully reads a transaction id for which it had previously failed to do so, the alarm is cleared.
  • Alarm Message(s)
    • {Reason}
ncs-cluster-alarm
  • Description
    Base type for all alarms related to cluster. This is never reported, sub-identities for the specific cluster alarms are used in the alarms.
ncs-dev-manager-alarm
  • Description
    Base type for all alarms related to the device manager. This is never reported, sub-identities for the specific device alarms are used in the alarms.
ncs-package-alarm
  • Description
    Base type for all alarms related to packages. This is never reported, sub-identities for the specific package alarms are used in the alarms.
ncs-service-manager-alarm
  • Description
    Base type for all alarms related to the service manager. This is never reported, sub-identities for the specific service alarms are used in the alarms.
ncs-snmp-notification-receiver-alarm
  • Description
    Base type for SNMP notification receiver alarms. This is never reported, sub-identities for specific SNMP notification receiver alarms are used in the alarms.
ned-live-tree-connection-failure
  • Initial Perceived Severity
    major
  • Description
    NCS failed to connect to a managed device using one of the optional live-status-protocol NEDs.
  • Recommended Action
    Verify the configuration of the optional NEDs. If the error occurs intermittently, increase connect-timeout.
  • Clear Condition(s)
    If NCS successfully reconnects to the managed device, the alarm is cleared.
  • Alarm Message(s)
    • The connection to {dev} was closed
    • Failed to connect to device {dev}: {reason}
out-of-sync
  • Initial Perceived Severity
    major
  • Description
    A managed device is out of sync with NCS. This usually means that the device has been configured out of band from the NCS point of view.
  • Recommended Action
    Inspect the difference with compare-config, then reconcile by invoking sync-from or sync-to.
  • Clear Condition(s)
    If NCS achieves sync with a device, the alarm is cleared.
  • Alarm Message(s)
    • Device {dev} is out of sync
    • Out of sync due to no-networking or failed commit-queue commits.
    • got: ~s expected: ~s.
package-load-failure
  • Initial Perceived Severity
    critical
  • Description
    NCS failed to load a package.
  • Recommended Action
    Check the package for the reason.
  • Clear Condition(s)
    If NCS successfully loads a package for which an alarm was previously raised, the alarm is cleared.
  • Alarm Message(s)
    • failed to open file {file}: {str}
    • Specific to the concerned package.
package-operation-failure
  • Initial Perceived Severity
    critical
  • Description
    A package encountered a problem during its operation.
  • Recommended Action
    Check the package for the reason.
  • Clear Condition(s)
    This alarm is not cleared.
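
Several entries in this document state that the alarm is not cleared, or is never cleared automatically. A monitoring script might want to treat those alarm types differently from the ones that clear on their own. The sketch below collects them from the Clear Condition(s) fields in this document; the names `NOT_AUTO_CLEARED` and `needs_manual_follow_up` are hypothetical, and the set should be re-checked against the entries for the NCS version in use.

```python
# Alarm types whose Clear Condition(s) entry in this document says the alarm
# is not cleared, or has to be cleared manually (ha-primary-down).
NOT_AUTO_CLEARED = frozenset({
    "bad-user-input",
    "commit-through-queue-failed",
    "commit-through-queue-failed-transiently",
    "commit-through-queue-rollback-failed",
    "ha-primary-down",
    "package-operation-failure",
})


def needs_manual_follow_up(alarm_type):
    """True if this document lists no automatic clear condition for the type."""
    return alarm_type in NOT_AUTO_CLEARED


def split_by_clearing(alarm_types):
    """Partition alarm types into (manual follow-up, clears automatically)."""
    manual = [t for t in alarm_types if needs_manual_follow_up(t)]
    automatic = [t for t in alarm_types if not needs_manual_follow_up(t)]
    return manual, automatic
```

For instance, `split_by_clearing(["out-of-sync", "ha-primary-down"])` places ha-primary-down in the manual group and out-of-sync in the automatic group.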
receiver-configuration-error
  • Initial Perceived Severity
    major
  • Description
    The snmp-notification-receiver could not set up its configuration, either at startup or when reconfigured. SNMP notifications will now be missed.
  • Recommended Action
    Check the error-message and change the configuration.
  • Clear Condition(s)
    This alarm is cleared when NCS is configured to successfully receive SNMP notifications.
  • Alarm Message(s)
    • Configuration has errors.
revision-error
  • Initial Perceived Severity
    major
  • Description
    A managed device arrived with a known module, but with a revision that is too new.
  • Recommended Action
    Upgrade the device NED using the new YANG revision in order to use the new features in the device.
  • Clear Condition(s)
    If all device YANG modules are supported by NCS, the alarm is cleared.
  • Alarm Message(s)
    • The device has YANG module revisions not supported by NCS. Use the /devices/device/check-yang-modules action to check which modules that are not compatible.
service-activation-failure
  • Initial Perceived Severity
    critical
  • Description
    A service failed during re-deploy.
  • Recommended Action
    Take corrective action and then re-deploy the service again.
  • Clear Condition(s)
    If the service is successfully redeployed, the alarm is cleared.
  • Alarm Message(s)
    • Multiple device errors: {str}
time-violation-alarm
  • Description
    Base type for all alarms related to time violations. This is never reported, sub-identities for the specific time violation alarms are used in the alarms.
transaction-lock-time-violation
  • Initial Perceived Severity
    warning
  • Description
    The transaction lock time exceeded its threshold and might be stuck in the critical section. This threshold is configured in /ncs-config/transaction-lock-time-violation-alarm/timeout.
  • Recommended Action
    Investigate whether the transaction is stuck and, if necessary, interrupt it by closing the user session to which it is attached.
  • Clear Condition(s)
    This alarm is cleared when the transaction has finished.
  • Alarm Message(s)
    • Transaction lock time exceeded threshold.
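
The Alarm Message(s) entries above are templates: placeholders such as {dev} and {reason}, and the ~s and ~p markers, stand for values filled in when the alarm is raised. When processing alarm texts externally, it can be useful to map a concrete text back to its template. The following is a minimal sketch of that idea; `TEMPLATES`, `template_to_regex`, and `match_message` are hypothetical names, only a few templates are copied in, and the exact text NCS emits may differ from these templates in detail.

```python
import re

# Matches one placeholder: either {name} or the ~s / ~p markers.
PLACEHOLDER = r"\{[^{}]+\}|~[sp]"

# A few templates copied from the Alarm Message(s) entries above.
TEMPLATES = {
    "connection-failure": [
        "The connection to {dev} was closed",
        "Failed to connect to device {dev}: {reason}",
    ],
    "out-of-sync": [
        "Device {dev} is out of sync",
        "got: ~s expected: ~s.",
    ],
}


def template_to_regex(template):
    """Turn a message template into a compiled regular expression.

    Literal text is escaped; each placeholder becomes a lazy capture group.
    """
    pattern = ""
    for part in re.split(f"({PLACEHOLDER})", template):
        if re.fullmatch(PLACEHOLDER, part):
            pattern += "(.+?)"
        else:
            pattern += re.escape(part)
    return re.compile(pattern)


def match_message(alarm_type, text):
    """Return (template, captured values) for the first matching template."""
    for template in TEMPLATES.get(alarm_type, []):
        match = template_to_regex(template).fullmatch(text)
        if match:
            return template, match.groups()
    return None
```

With these templates, the text "Failed to connect to device ce0: connection refused" (a made-up example) matches the second connection-failure template and yields the values `ce0` and `connection refused`. Templates that consist of a single placeholder, such as {Reason}, would match any text, so they are best checked last.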