STaaS Troubleshooting #4105

Open
wants to merge 8 commits into
base: main
Changes from 6 commits
Binary file added images/STaaS_vs_Cassandra.png
Binary file added images/SystemCenter_DB_per_cluster.png
Binary file added images/SystemCenter_STaaS.png
Original file line number Diff line number Diff line change
@@ -283,7 +283,7 @@ In addition, the following **other limitations** currently apply:

## Troubleshooting

For troubleshooting information related to STaaS, see [Troubleshooting – STaaS](xref:Troubleshooting_STaaS_Issues).
For troubleshooting information related to STaaS, see [Troubleshooting – STaaS](xref:Troubleshooting_STaaS).

> [!NOTE]
> If you experience any issues during setup or while using Storage as a Service, and you cannot resolve these using the available troubleshooting information, contact <[email protected]>.
@@ -1,10 +1,8 @@
---
uid: Troubleshooting_STaaS_Issues
uid: STaaS_Error_messages
---

# Troubleshooting – STaaS

This page provides solutions to common issues that you may encounter while using STaaS. It covers problems related to dataminer.services connectivity, registration, token expiration, and service reachability.
# STaaS error messages

## The DMS is not connected to dataminer.services

265 changes: 265 additions & 0 deletions user-guide/Troubleshooting/Procedures/STaaS_Troubleshooting.md
@@ -0,0 +1,265 @@
---
uid: Troubleshooting_STaaS
---

# Troubleshooting – STaaS

[Storage as a Service (STaaS)](xref:STaaS) is a cloud-native data storage architecture that allows you to securely store your data without the need to maintain databases yourself.

One of the key advantages of STaaS is its ability to replace storage solutions like Cassandra and Elastic, while providing its own backup mechanism through Microsoft Azure.

> [!IMPORTANT]
> Communication between DataMiner Agents and STaaS occurs over the internet. This means all DataMiner Agents must:
>
> - Have internet access.
> - Be able to reach the STaaS endpoints.

> [!NOTE]
> Every interaction with the cloud has a cost. As with any storage system, the number of interactions should be reduced to a minimum. Using STaaS will highlight any inefficiencies because of their direct impact on cost. How and when to optimize this is specific to the integration.

## Architecture

The diagram below provides an overview of two clusters using STaaS versus Cassandra.

![STaaS vs Cassandra](~/images/STaaS_vs_Cassandra.png)
Member Author

This image appears quite complex and could benefit from additional context. What is the primary message or focus you want to convey here, @robin-devos-skyline? Perhaps we can clarify this by adding a brief explanation or key takeaway below the image.

Member

I hope this is brief enough 😇
DMS A and B provide a compact visualization of the interaction with STaaS.
SLDataGateway sends data to and receives data from STaaS. However, each STaaS-connected agent requires a secure HTTPS channel towards the STaaS DB (i.e. a Microsoft Azure endpoint, either West Europe or UK South). This is achieved via port 443 and by using an access token provided by the Cloud Gateway (you can see this token as the key to unlock the door to this secure channel).
DMS A has a Cloud Gateway installed on both agents, as opposed to DMS B which only has 1 Cloud Gateway in the cluster (which is the minimum per cluster).

DMS C and D represent the visualization for Cassandra related databases.
DMS D uses local Cassandra nodes and an external Open/ElasticSearch node, whereas DMS C points towards a Cassandra (and Open/ElasticSearch) Cluster.
The key takeaway here is that all DMS <> DB communication goes via SLDataGateway.

Member Author

Perfect, thanks for the extra info! Added in commit 65e35b0.


## Investigation

### Verify your setup is using STaaS

You can verify whether your setup is using STaaS in the following ways:

- **In DataMiner Cube**:

1. Navigate to *System Center* > *Database* > *General*.

1. Check if "STaaS" is entered in the *Database* field.

![System Center - Database set to STaaS](~/images/SystemCenter_STaaS.png)

> [!NOTE]
> The type of database (i.e. *Database per cluster* or *Database per Agent*) is not relevant, as all data from the cluster will be stored the same way.
>
> For example:
>
> ![System Center - Database per cluster](~/images/SystemCenter_DB_per_cluster.png)

- **In the *DB.xml* file**:

1. Open *C:\Skyline DataMiner\DB.xml*.

1. Verify that the `type` attribute of the `<DataBase>` element is set to `CloudStorage`.

If the `type` attribute is set to something other than `CloudStorage`, the system is not configured for STaaS.

Example of *DB.xml* file with a non-STaaS setup:

```xml
<DataBase search="false" active="True" local="true" type="CassandraCluster">
...
</DataBase>
```

- **Only applicable to Skyline employees**: Navigate to the *CDMR Agent* element and check whether the *DB Engine type* on the *Database* page is set to "CloudStorage".
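
The *DB.xml* check described above can also be scripted. The following is a minimal Python sketch, not an official tool. It assumes the file is well-formed XML and that the element is called `DataBase` (matched case-insensitively and regardless of any XML namespace, since the exact casing and namespace are assumptions here). The function names `database_types` and `uses_staas` are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import List, Union


def database_types(db_xml: Union[str, bytes]) -> List[str]:
    """Return the `type` attribute of every DataBase element in the XML."""
    root = ET.fromstring(db_xml)
    types = []
    for element in root.iter():
        # Drop a possible "{namespace}" prefix and ignore casing, because the
        # exact element casing and namespace are assumptions of this sketch.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name.lower() == "database":
            types.append(element.get("type", ""))
    return types


def uses_staas(db_xml: Union[str, bytes]) -> bool:
    """True if at least one DataBase element has type="CloudStorage"."""
    return "CloudStorage" in database_types(db_xml)


# Non-STaaS sample taken from the example above, wrapped in a hypothetical root.
SAMPLE = (
    '<DataBases>'
    '<DataBase search="false" active="True" local="true" type="CassandraCluster">'
    '</DataBase>'
    '</DataBases>'
)
# uses_staas(SAMPLE) evaluates to False for this non-STaaS sample.
```

To check an actual DataMiner Agent, you could pass the raw file content, for example `uses_staas(open(r"C:\Skyline DataMiner\DB.xml", "rb").read())`.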

### Check if the prerequisites are met

The following prerequisites must be met for a successful STaaS setup:

- DataMiner version 10.4.0 or higher.

- [CloudGateway DxM](#install-or-upgrade-the-cloudgateway-dxm) version 2.8.0 or higher, deployed on at least one DataMiner Agent.

- A DataMiner System [connected to dataminer.services](#verify-your-dms-is-connected-to-dataminerservices).

- A [working internet connection](#verify-your-dma-has-a-working-internet-connection).
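
When comparing installed versions against the minimum versions listed above, compare them numerically per part rather than as plain text (as text, "10.10.0" would sort before "10.4.0"). The sketch below illustrates this in Python. It assumes purely numeric, dot-separated version strings, so suffixes such as build numbers or CU labels would need to be stripped first. The helper names are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "10.4.6" into (10, 4, 6)."""
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two dotted versions numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Minimum versions as stated in the prerequisites above.
MIN_DATAMINER = "10.4.0"
MIN_CLOUDGATEWAY = "2.8.0"
```

For example, `meets_minimum("2.10.1", MIN_CLOUDGATEWAY)` is true, while `meets_minimum("10.3.12", MIN_DATAMINER)` is false.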

#### Install or upgrade the CloudGateway DxM

To install the *CloudGateway* module:

1. In the Admin app, check whether the correct organization is mentioned in the header bar.

> [!TIP]
> See also: [Accessing the Admin app](xref:Accessing_the_Admin_app)

1. If a different organization should be selected, click the organization selector ![Organization selector](~/user-guide/images/Cloud_Admin_Selector_icon.png) in the top-right corner and select the organization in the list.

1. In the pane on the left, under *DataMiner Systems*, select your DataMiner System and select the *DxMs* page.

1. Locate the relevant node (i.e. the DMA).

1. Next to the *CloudGateway* module, click *Deploy* to start the automatic installation process.

If the *CloudGateway* module is installed on your DMA already, **verify that it is version 2.8.0 or higher**:

- **Using the Admin app**:

1. On the *DxMs* page, locate the relevant node (i.e. the DMA).

1. Verify that the current version of the *CloudGateway* DxM is 2.8.0 or higher.

1. If necessary, click the *Upgrade* button to upgrade the module to a more recent version.

- **Using SLLogCollector**:

1. Run the [SLLogCollector tool](xref:SLLogCollector).

1. Open the resulting package and navigate to *Logs* > *DxM* > *DataMiner CloudGateway* > *DataMiner CloudGateway.exe_version.txt*.

1. Confirm the CloudGateway DxM version under *Product Version*.

- **Using the Skyline Admin app** (for Skyline Technical Support only):

1. In the [Skyline Admin app](https://skyline-admin.dataminer.services/organization), go to the *Organizations* tab and enter your organization in the search box to filter the results.

1. Click the eye icon next to your organization to access an overview of all DataMiner Systems created under it.

1. Locate your DMS and click the eye icon next to it.

1. Select the node (i.e. the DMA) on which the *CloudGateway* module is installed to expand the overview of deployed DxMs. This includes details such as DxM version, DxM data timestamp, and extra data.

1. Find the *DataMiner CloudGateway* DxM in the overview and verify that its version is 2.8.0 or higher.

#### Verify your DMS is connected to dataminer.services

> [!TIP]
> See also: [Connecting your DataMiner System to dataminer.services](xref:Connecting_your_DataMiner_System_to_the_cloud)

There are two ways to verify whether your DMS is connected to dataminer.services:

- **Using DataMiner Cube**:

1. Go to *System Center* > *Cloud*, and select *Open dataminer.services*.

1. On this page, check the connection status between your system and the dataminer.services platform.

If a green icon and a green bar are displayed next to the DMS information, your DMS is connected to dataminer.services.

> [!TIP]
> For details about other connection states, see [The dataminer.services home page](xref:dataminer_services_home_page#connection-states).

- **Using the SLCloudStorage log file**:

1. Navigate to the *C:\Skyline DataMiner\Logging* folder of your DataMiner Agent, and open the *SLCloudStorage.txt* log file.

1. Check whether you can find the following error message in the log file:

```txt
CloudSettings could not be retrieved from the cloud. Retrying in 00:00:05. Exception: SLCloudStorageConnection.Repositories.Exceptions.CloudSettingsRepositoryException: Failed to do GetCloudAccessTokenRequest. Received the following error messages: { "message": "This DMS is not Cloud Registered." }
```

This error message will be present if your system is trying to use STaaS but is not connected to dataminer.services.
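
If the log file is large, searching for this message can be automated. The following Python sketch only looks for the literal text `This DMS is not Cloud Registered.` quoted in the error message above. The function name and the way the file is read are assumptions made for illustration.

```python
from typing import List

# Literal text taken from the error message quoted above.
NOT_REGISTERED_MARKER = "This DMS is not Cloud Registered."


def find_not_registered_lines(log_text: str) -> List[str]:
    """Return every log line that contains the 'not Cloud Registered' message."""
    return [
        line for line in log_text.splitlines()
        if NOT_REGISTERED_MARKER in line
    ]
```

A possible way to use this on an Agent is to read the log with `open(path, encoding="utf-8", errors="replace").read()` (the encoding of the log file is an assumption) and pass the result to `find_not_registered_lines`. An empty result means the message was not found in that file. It does not by itself prove that the DMS is connected.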

#### Verify your DMA has a working internet connection

Ensure your STaaS-connected DMA has a working internet connection.

Verify the following:

- Confirm that the firewall allows traffic through port 443.

- Verify that the following endpoints are reachable:

- STaaS West Europe: 20.76.71.123

- STaaS UK South: 20.162.131.128
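
As a quick first check, you can test whether a TCP connection to these endpoints can be opened on port 443. The Python sketch below does only that: a successful connection shows that the port is reachable from the machine running the script, but it does not validate TLS, proxies, or the access token. The IP addresses are the ones listed above. The function names and the 3-second timeout are assumptions.

```python
import socket
from typing import Dict

# Endpoint addresses as listed above.
STAAS_ENDPOINTS = {
    "STaaS West Europe": "20.76.71.123",
    "STaaS UK South": "20.162.131.128",
}


def can_connect(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_endpoints(endpoints: Dict[str, str], port: int = 443) -> Dict[str, bool]:
    """Map each endpoint name to whether it accepted a TCP connection."""
    return {name: can_connect(address, port) for name, address in endpoints.items()}
```

Running `check_endpoints(STAAS_ENDPOINTS)` on the DMA returns a dictionary with one true/false result per endpoint. A false result for the endpoint your system uses suggests checking the firewall rules for outbound traffic on port 443.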

## Common pitfalls

### Cloud session token expiration

Description:

In the *SLCloudStorage.txt* log file, you may encounter entries similar to the examples under [The session token has expired](xref:STaaS_Error_messages#the-session-token-has-expired).

Impact:

- DataMiner will not be able to start up.

- Data cannot be retrieved or stored.

- Modules requiring an indexing database, such as SRM and Process Automation, cannot load.

Reason:

- The *CloudGateway* module is unable to refresh the cloud session automatically.

Actions:

- Use the SLNetClientTest tool to attempt to resolve the issue, as outlined under [The session token has expired](xref:STaaS_Error_messages#the-session-token-has-expired).

### DataMiner does not start up after registration

Description:

In the *SLError.txt* log file, you may find the error message detailed under [DataMiner is unable to start up after registration](xref:STaaS_Error_messages#dataminer-is-unable-to-start-up-after-registration).

Impact:

- DataMiner will not be able to start up.

- DataMiner startup gets stuck at 99%.

Reason:

- The DMA was previously registered under a different cloud organization.

- The *CloudGateway* module is not running.

Actions:

1. Make sure the CloudGateway DxM is running:

1. Open Windows Task Manager and check whether the process called *DataMiner CloudGateway* is running.

1. Alternatively, check the *Services* tab to see if the service with the same name has the *Running* status.

1. Manually remove the file *C:\ProgramData\Skyline Communications\DxMs Shared\Data\NodeId.txt*.

1. Restart the DMA.

### Slow interaction/performance

Impact:

- Overall system slowness.

- Inability to retrieve trend graphs, alarm history, etc.

Reason:

- High volume of interaction with dataminer.services.

- The upload/download bandwidth of the local internet connection is insufficient to handle the high load, resulting in queueing.

- dataminer.services cannot handle the high load, resulting in queueing.

- (Poor) integration design.

Actions:

- Navigate to the *C:\Skyline DataMiner\Logging* folder of your DataMiner Agent, open the *SLCloudStorage.txt* log file, and look for throttling warnings.

- Make sure you have the latest improvements to InterApp read calls, with the most recent updates introduced in DataMiner 10.4.6.

> [!TIP]
> For more information, refer to [Skyline DataMiner Core InterAppCalls Range 1.0.1.1](xref:Skyline_DataMiner_Core_InterAppCalls_Range_1.0#1011).

DataMiner Systems may experience performance issues because of the high number of interactions when using SLNet Subscriptions for specific actions. In DataMiner 10.3.12, improvements were made to increase the efficiency of these interactions, with further enhancements added in DataMiner 10.4.6.
Member Author

I'm not entirely sure if I interpreted the section above correctly. It was structured and worded differently in the original document, but I made some adjustments for the sake of clarity. Could you double-check if the content is still accurate, @robin-devos-skyline?


- Only applicable to Skyline employees: Check whether dataminer.services is queueing with [Microsoft Azure](https://portal.azure.com/#view/AppInsightsExtension/WorkbookViewerBlade/ComponentId/azure%20monitor/ConfigurationId/%2Fsubscriptions%2Fc1a16bf4-039a-4778-8053-72e813c52ca4%2Fresourcegroups%2Frg-workbooks%2Fproviders%2Fmicrosoft.insights%2Fworkbooks%2Fd36c92a8-ef00-4c26-bf09-13962d3b705d/WorkbookTemplateName/Shared%20Cloud%20Storage).

- EventHub: Throttled requests by EventHub.

- Events: Event queue time. Check the instance that is used by the DMA:

- weu = West Europe.

- uks = UK South.

Member Author

This section could benefit from further explanation, @robin-devos-skyline. Could we clarify these points?

Member

OK, when there appears to be a delay in cloud activity (in our STaaS context, e.g. a trend graph that takes ages to load), we can check the provided Azure link to see if there are delays on the Azure side.

When opening the link, 2 sections give a good visual on these delays, i.e.:

  • the graph "Throttled requests by EventHub" under the "EventHub" tab page
  • the graph "Event queue time" under the "Event" tab page

In both cases, you can filter on the TimeRange and the Instance (i.e. endpoint used). => so "Check the instance that is used by ...", might be best put on a different line.

A screenshot might clarify things further
image

- Ask the customer's IT department to verify the upload and download bandwidth usage.

## Adding a new DMA to a DMS running STaaS

When adding a new DataMiner Agent to a DMS that is using STaaS, some additional steps are required compared to the instructions for [adding a regular DataMiner Agent](xref:Adding_a_regular_DataMiner_Agent). For a detailed guide, see [Adding a DataMiner Agent to a DMS running STaaS](xref:Adding_a_DMA_to_a_DMS_running_STaaS).
@@ -22,7 +22,7 @@ For troubleshooting related to specific topics:

- [Troubleshooting - SLScripting](xref:TroubleshootingSLScriptingFinalizerException)

- [Troubleshooting – STaaS](xref:Troubleshooting_STaaS_Issues)
- [Troubleshooting – STaaS](xref:Troubleshooting_STaaS)

- [Troubleshooting – web](xref:Investigating_Web_Issues)

5 changes: 4 additions & 1 deletion user-guide/Troubleshooting/toc.yml
@@ -214,7 +214,10 @@ items:
- name: Investigating StackOverflowException occurrences
topicUid: TroubleshootingSLScriptingStackOverflowException
- name: Troubleshooting – STaaS
topicUid: Troubleshooting_STaaS_Issues
topicUid: Troubleshooting_STaaS
items:
- name: STaaS error messages
topicUid: STaaS_Error_messages
- name: Troubleshooting – web
topicUid: Investigating_Web_Issues
- name: Troubleshooting – Miscellaneous