
Commit

Post London workshop updates
- add initial introduction about statuses vs alarms
- add more examples
- update connectionStatus list to cover late/lost packets
- change statusReportingDelay default to 3s
- add section about deactivating a receiver
- add paragraph about autoResetPacketCounters
- Add definition for Receiver activation
- Add paragraph about syncSource changes, how they are reflected in the status and how a new sync source can be approved
- Add paragraph about how statusReportingDelay is applied for statuses
cristian-recoseanu committed Jul 16, 2024
1 parent ed74ed1 commit 543dde5
Showing 1 changed file with 71 additions and 10 deletions.
81 changes: 71 additions & 10 deletions docs/Overview.md
@@ -10,7 +10,11 @@ _(c) AMWA 2021, CC Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0)_

## Introduction

The aim of this BCP document is to describe the expectations, behaviour and conformance requirements for Devices with stream Receivers in terms of status monitoring.
Alarms are context- and workflow-specific, and are generally determined by a higher-level monitoring system, with different calculations for different users. For example, a hardware error status (such as link down) from a device not actively being used would not cause an alarm for a live workflow operator, but the same status condition would raise an alarm for a maintenance engineer who needs to prepare that device for future operational use.

This BCP document does not attempt to define alarms; instead, it describes the expectations, behavior and conformance requirements for Devices with stream Receivers in terms of status monitoring.

The [overall status](#receiver-overall-status) concepts defined in this document are intended to make it easy to calculate a typical operator alarm condition. In simple systems with no higher level monitoring system, the `overallStatus` can be used directly as a simple pre-defined non-configurable operator alarm condition, without in any way limiting a monitoring system's ability to take the same status values and calculate one or more different alarm conditions appropriate to other desired workflows or users.

This document relies on previous familiarity with the following existing documents:

@@ -36,6 +40,8 @@ and "OPTIONAL" in this document are to be interpreted as described in [RFC-2119]

The NMOS terms 'Controller', 'Node', 'Source', 'Flow', 'Sender', 'Receiver' are used as defined in the [NMOS Glossary](https://specs.amwa.tv/nmos/main/docs/Glossary.html).

Receiver activation - An [IS-05 activation](https://specs.amwa.tv/is-05/latest/docs/Interoperability_-_IS-04.html#identifying-active-connections) which results in the Receiver having the required transport parameters and a `master_enable` status of `true`. This can happen for an idle receiver but also when the receiver is already activated and a client is applying new transport parameters.

## Prerequisites

Devices in conformance to this BCP MUST use [NMOS Control Framework](https://specs.amwa.tv/ms-05-02/) for generating device models.
@@ -58,7 +64,9 @@ Devices MUST follow the rules listed below when mapping specific domain statuses
* When the Receiver is Inactive the overall status uses the Inactive option
* When the Receiver is Active the overall status takes the worst state across the different domains (if one status is PartiallyHealthy (or equivalent) and another is Unhealthy (or equivalent) then the overall status would be Unhealthy)
* The overall status is Healthy only when all domain statuses are either Healthy or a neutral state (e.g. Not used)
* When activating a Receiver, it is expected for devices to go through a period of instability when connecting to the new stream. The overall status transitions immediately to a Healthy state and delays the reporting of errors for a configurable amount of time (devices use 5s as the default overall status reporting delay) after which it can transition to PartiallyHealthy or Unhealthy by taking the worst state across the different domains.
* When activating a Receiver, devices are expected to go through a period of instability while connecting to the new stream. The overall status transitions immediately to a Healthy state and delays the reporting of errors for a configurable amount of time (see `statusReportingDelay`), after which it can transition to PartiallyHealthy or Unhealthy by taking the worst state across the different domains.
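
The "worst state across the different domains" rule above can be sketched as follows. This is a minimal illustration, not part of the specification: the `Status` enum, its numeric ordering and the `overall_status` function are hypothetical names chosen for the example, and neutral domain states (e.g. Not used) are represented as `None`.

```python
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical ordering for this sketch only: a higher value is a worse state.
    # These are NOT the enum values of the NMOS control model.
    INACTIVE = 0
    HEALTHY = 1
    PARTIALLY_HEALTHY = 2
    UNHEALTHY = 3


def overall_status(receiver_active, domain_statuses):
    """Combine domain statuses; None stands for a neutral state (e.g. Not used)."""
    if not receiver_active:
        return Status.INACTIVE
    # Neutral states do not influence the result.
    relevant = [s for s in domain_statuses
                if s is not None and s is not Status.INACTIVE]
    if not relevant:
        return Status.HEALTHY
    # The worst (highest) state across the domains wins.
    return max(relevant)
```

The sketch deliberately leaves out the reporting delay applied after activation.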

The `statusReportingDelay` property allows clients to customize the reporting delay used by devices to report statuses. Devices MUST use 3s as the default value. All status reporting properties MUST delay the transition to a more healthy state by the configured `statusReportingDelay` value and MUST only make the transition if the healthier state is maintained for the duration (this does not apply when starting from neutral values like Inactive or NotUsed where devices MUST make an immediate transition). All status reporting properties MUST make a transition to a less healthy state as soon as possible.
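
One possible reading of the `statusReportingDelay` rules is sketched below. The `DelayedStatus` class and its `update` method are hypothetical names used only for illustration; the caller supplies the current time in seconds so the logic does not depend on a real clock. Treating a transition into a neutral state as immediate is an assumption of this sketch, and the grace period that follows a Receiver activation is not modelled.

```python
class DelayedStatus:
    """Hypothetical tracker for a single status reporting property."""

    SEVERITY = {"Healthy": 0, "PartiallyHealthy": 1, "Unhealthy": 2}
    NEUTRAL = {"Inactive", "NotUsed"}

    def __init__(self, initial="Inactive", delay_s=3.0):
        self.reported = initial       # value currently exposed to clients
        self.delay_s = delay_s        # statusReportingDelay, 3 s by default
        self._pending = None          # healthier candidate waiting out the delay
        self._pending_since = None

    def update(self, observed, now):
        """Feed the status observed at time `now` (seconds); return the reported status."""
        if (self.reported in self.NEUTRAL or observed in self.NEUTRAL
                or self.SEVERITY[observed] >= self.SEVERITY[self.reported]):
            # Immediate: leaving or entering a neutral state (entering is an
            # assumption), moving to a less healthy state, or no change.
            self.reported = observed
            self._pending = None
            return self.reported
        # Healthier than what is reported: it must be held for delay_s first.
        if observed != self._pending:
            self._pending, self._pending_since = observed, now
        elif now - self._pending_since >= self.delay_s:
            self.reported = observed
            self._pending = None
        return self.reported
```

In this sketch, a healthier observation that is interrupted by a return to the reported state restarts the delay from scratch.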

The proposed models are minimal and they can be implemented as is or derived in [vendor specific variants](https://specs.amwa.tv/ms-05-02/latest/docs/Introduction.html) which can add more statuses, properties and methods.

@@ -76,6 +84,7 @@ This includes the following specific items which cover the connectivity domain:
* linkStatusMessage
* connectionStatus
* connectionStatusMessage
* autoResetPacketCounters
* Methods
* GetLostPackets
* GetLatePackets
@@ -95,7 +104,15 @@ Devices specify if:
* Some of the interfaces are Down (equivalent to a PartiallyHealthy state)
* All of the interfaces are Up (equivalent to a Healthy state)

The link status message is an optional nullable property where devices can offer the reason and further details as to why the current status value was chosen.
The link status message is a nullable property where devices can offer the reason and further details as to why the current status value was chosen.

Devices are recommended to publish information about which interfaces are down in the link status message.

Example:

```log
NIC1, NIC2 are down
```

### Connection status monitoring

@@ -106,11 +123,11 @@ Connection status monitoring allows devices to expose the health of the receiver
Devices specify:

* When the receiver is Inactive (is a neutral state)
* Healthy when the receiver is Active and receiving packets without using any form of loss recovery
* PartiallyHealthy when the receiver is Active and is receiving packets but some form of loss recovery is being used (e.g. redundant leg recovery or some form of FEC)
* Unhealthy when the receiver is Active and is either not receiving any packets or receiving packets but has unrecoverable errors
* Healthy when the receiver is Active and receiving all required packets without using any form of loss recovery
* PartiallyHealthy when the receiver is Active and is receiving all required packets but some form of loss recovery is being used (e.g. redundant leg recovery or some form of FEC)
* Unhealthy when the receiver is Active and is either not receiving any packets or receiving packets but has unrecoverable errors (such as late or lost packets)
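
The connection status rules above can be read as a simple decision order. The sketch below is illustrative only; the function name and its boolean parameters are hypothetical and do not appear in the device model.

```python
def connection_status(active, receiving_packets, using_loss_recovery,
                      unrecoverable_errors):
    """Map hypothetical receiver observations to a connection status name."""
    if not active:
        return "Inactive"             # neutral state
    if not receiving_packets or unrecoverable_errors:
        return "Unhealthy"            # no packets, or late/lost packets
    if using_loss_recovery:
        return "PartiallyHealthy"     # e.g. redundant leg recovery or FEC
    return "Healthy"
```

Checking the Unhealthy condition before loss recovery means that a receiver which is using recovery but still has unrecoverable errors is reported as Unhealthy.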

The connection status message is an optional nullable property where devices can offer the reason and further details as to why the current status value was chosen.
The connection status message is a nullable property where devices can offer the reason and further details as to why the current status value was chosen.

### Late and lost packets

@@ -122,6 +139,8 @@ The feature is expressed with the following methods:
* GetLatePacketCounters - returns a collection of counters which hold the name and numeric value of the counter (this allows more capable devices to report late packets across different interfaces).
* ResetPacketCounters - allows a client application to reset both the Lost and Late packet counters to 0.

The `autoResetPacketCounters` property allows clients to configure whether the packet counters automatically reset with each Receiver activation (by default devices MUST have this enabled). If this is enabled, receivers MUST reset all packet counters to 0 after each activation.
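
A minimal sketch of how `autoResetPacketCounters` might interact with activation is shown below. The `PacketCounters` class, its per-interface dictionaries and the `on_activation` hook are hypothetical; they only illustrate the behaviour described above and are not the model's API.

```python
class PacketCounters:
    """Hypothetical holder for a receiver's lost and late packet counters."""

    def __init__(self, interfaces, auto_reset_packet_counters=True):
        # Enabled by default, as required above.
        self.auto_reset_packet_counters = auto_reset_packet_counters
        self.lost = {name: 0 for name in interfaces}
        self.late = {name: 0 for name in interfaces}

    def reset_packet_counters(self):
        """Reset both the lost and late counters to 0."""
        for counters in (self.lost, self.late):
            for name in counters:
                counters[name] = 0

    def on_activation(self):
        """Hook assumed to be called after each Receiver activation."""
        if self.auto_reset_packet_counters:
            self.reset_packet_counters()
```

With the property disabled, the counters keep accumulating across activations until a client resets them explicitly.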

## Receiver synchronization

The technical model describing the monitoring requirements for a receiver is [NcReceiverMonitor](https://specs.amwa.tv/nmos-control-feature-sets/branches/publish-status-reporting/monitoring/#ncreceivermonitor).
@@ -131,6 +150,8 @@ This includes the following specific items which cover the synchronization domai
* synchronizationStatus
* synchronizationStatusMessage
* synchronizationSourceId
* Methods
* ApproveCurrentSynchronizationSource

| ![Receiver synchronization](images/receiver-model-synchronization.png) |
|:--:|
@@ -144,15 +165,35 @@ Devices specify:

* When the receiver is not using external synchronization (is a neutral state)
* When the receiver synchronization status is healthy (locked to a synchronization source)
* When the receiver synchronization status is partially healthy (partially locked to a synchronization source)
* When the receiver synchronization status is partially healthy (partially locked to a synchronization source or has undergone a synchronization source change)
* When the receiver synchronization status is unhealthy (not locked to a synchronization source)

The synchronization status message is an optional nullable property where devices can offer the reason and further details as to why the current status value was chosen.
The synchronization status message is a nullable property where devices can offer the reason and further details as to why the current status value was chosen.

Devices are recommended to publish, in the synchronization status message, the previous synchronization source and the interface it was retrieved from, as well as the current synchronization source and the interface it was retrieved from.

Example:

```log
previousSync: baseband from SDI1, currentSync: 0x00:0c:ec:ff:fe:0a:2b:a1 from NIC1
```

or

```log
previousSync: 0x70:35:09:ff:fe:c7:da:00 from NIC1, currentSync: 0x00:0c:ec:ff:fe:0a:2b:a1 from NIC2
```

### Synchronization source change

When devices are configured to use synchronization they MUST publish the synchronization source id currently being used and update the property whenever it changes, using `null` if a synchronization source cannot be discovered. For devices which are not using synchronization this property MUST be set to `null`.

When devices undergo a synchronization source change, the synchronizationStatus property MUST transition to a `PartiallyHealthy` state and stay there unless one of the following conditions applies:

* The device returns to using the previous sync source and therefore can transition synchronizationStatus to `Healthy` if there are no other synchronization issues
* A client invokes the `ApproveCurrentSynchronizationSource` method and therefore the receiver can transition synchronizationStatus to `Healthy` if there are no other synchronization issues
* The receiver is Activated and therefore MUST approve the current sync source and transition synchronizationStatus to `Healthy` if there are no other synchronization issues
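
The three exit conditions above can be sketched by tracking the last approved synchronization source alongside the current one. The `SyncMonitor` class and its method names are hypothetical; the `lock_status` argument stands in for "other synchronization issues" and is an assumption of this sketch. Devices not using synchronization (the neutral state) are not modelled.

```python
class SyncMonitor:
    """Hypothetical tracker for synchronization source changes."""

    def __init__(self, source_id):
        self.source_id = source_id            # synchronization source in use
        self.approved_source_id = source_id   # last approved source

    def on_source_changed(self, new_source_id):
        """The device switched to a different (or back to the previous) source."""
        self.source_id = new_source_id

    def approve_current_synchronization_source(self):
        """A client approves the source currently in use."""
        self.approved_source_id = self.source_id

    def on_receiver_activated(self):
        """Activation approves the current source."""
        self.approve_current_synchronization_source()

    def status(self, lock_status="Healthy"):
        """lock_status reflects any other synchronization issues (assumed input)."""
        if lock_status == "Healthy" and self.source_id != self.approved_source_id:
            return "PartiallyHealthy"
        return lock_status
```

Because only the approved source is remembered, returning to the previous source clears the condition without any client action.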

## Receiver stream validation

The technical model describing the monitoring requirements for a receiver is [NcReceiverMonitor](https://specs.amwa.tv/nmos-control-feature-sets/branches/publish-status-reporting/monitoring/#ncreceivermonitor).
@@ -177,4 +218,24 @@ Devices specify:
* PartiallyHealthy when the receiver is Active and can decode the incoming stream but there are inconsistencies in the stream with what the device is expecting
* Unhealthy when the receiver is active and cannot decode the incoming stream

The stream status message is an optional nullable property where devices can offer the reason and further details as to why the current status value was chosen.
The stream status message is a nullable property where devices can offer the reason and further details as to why the current status value was chosen.

Examples:

```log
Unexpected stream format
```

```log
Parameter X does not match expectations
```

### Deactivating a receiver

A Receiver is deactivated when an [IS-05 activation](https://specs.amwa.tv/is-05/latest/docs/Interoperability_-_IS-04.html#identifying-active-connections) results in the Receiver's `master_enable` becoming `false`.

When a receiver is being deactivated it MUST cleanly disconnect from the current stream, without generating intermediate unhealthy states (PartiallyHealthy or Unhealthy), and transition directly to `Inactive` for the following statuses:

* overallStatus
* connectionStatus
* streamStatus
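
As a rough illustration of a clean deactivation, the sketch below moves each of the listed statuses straight to `Inactive` and emits a single notification per status. The `deactivate` function, the `statuses` dictionary and the `notify` callback are hypothetical and exist only for this example.

```python
def deactivate(statuses, notify):
    """Move the affected statuses directly to Inactive, one notification each."""
    for name in ("overallStatus", "connectionStatus", "streamStatus"):
        if statuses.get(name) != "Inactive":
            statuses[name] = "Inactive"
            # No intermediate PartiallyHealthy or Unhealthy value is ever emitted.
            notify(name, "Inactive")
```

Statuses outside the list, such as the synchronization status, are left untouched by this sketch.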
