STaaS Troubleshooting #4105

Open
wants to merge 6 commits into base: main

EdithLansens
Member

No description provided.

EdithLansens requested a review from a team as a code owner on January 6, 2025, 09:43
Member Author

EdithLansens left a comment

@robin-devos-skyline, as per your request via email, we’ve created a new "Troubleshooting - STaaS" page based on the Word document you provided. Could you please carefully review this pull request and go over my comments?


The diagram below provides an overview of two clusters using STaaS versus Cassandra.

![STaaS vs Cassandra](~/images/STaaS_vs_Cassandra.png)
Member Author

This image appears quite complex and could benefit from additional context. What is the primary message or focus you want to convey here, @robin-devos-skyline? Perhaps we can clarify this by adding a brief explanation or key takeaway below the image.

Member

I hope this is brief enough 😇

DMS A and B give a compact view of how a DMS interacts with STaaS. SLDataGateway sends data to and receives data from STaaS. However, each STaaS-connected Agent requires a secure HTTPS channel to the STaaS database (i.e. a Microsoft Azure endpoint, either West Europe or UK South). This channel uses port 443 and an access token provided by the Cloud Gateway (you can think of this token as the key that unlocks the door to this secure channel). DMS A has a Cloud Gateway installed on both Agents, whereas DMS B has only one Cloud Gateway in the cluster, which is the minimum per cluster.

DMS C and D show the setup for Cassandra-related databases. DMS D uses local Cassandra nodes and an external OpenSearch/Elasticsearch node, whereas DMS C points to a Cassandra (and OpenSearch/Elasticsearch) cluster.

The key takeaway is that all communication between the DMS and the database goes via SLDataGateway.
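
To make the "minimum of one Cloud Gateway per cluster" rule concrete, here is a minimal sketch. It assumes a cluster can be described as a mapping of Agent names to a flag indicating whether the CloudGateway module is installed there. The Agent names (`agent-1`, `agent-2`) and the helper `clusters_missing_cloud_gateway` are hypothetical and are not part of any DataMiner API; they only illustrate the rule described for DMS A and DMS B.

```python
from typing import Dict, List

# Hypothetical description of the two STaaS clusters from the diagram:
# cluster name -> {agent name -> True if CloudGateway is installed on that agent}.
CLUSTERS: Dict[str, Dict[str, bool]] = {
    "DMS A": {"agent-1": True, "agent-2": True},   # Cloud Gateway on both Agents
    "DMS B": {"agent-1": True, "agent-2": False},  # one Cloud Gateway (the minimum)
}


def clusters_missing_cloud_gateway(clusters: Dict[str, Dict[str, bool]]) -> List[str]:
    """Return the names of clusters in which no Agent hosts a Cloud Gateway."""
    return [name for name, agents in clusters.items() if not any(agents.values())]
```

Both example clusters satisfy the rule, so the helper returns an empty list for `CLUSTERS`; a cluster where every flag is `False` would be reported by name.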


If the *CloudGateway* module is installed on your DMA already, **verify that it is version 2.8.0 or higher**:

- **Using the Admin app**:
Member Author

@robin-devos-skyline, I’ve added this "Using the Admin app" section as an additional way to check the DxM version since it’s the most straightforward method. The SLLogCollector method seems more complex, so I’m unsure why we’d include it unless there’s a specific use case or advantage to it. In what scenarios would users prefer to use this method?

Member

Users with access can use the Admin app. However, not everyone (at Skyline) has access, and those without it will need to rely on the data retrieved via the SLLogCollector tool.
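
Whichever method is used to read the version, the "2.8.0 or higher" check itself can be sketched as follows. This is a minimal illustration, assuming the version is available as a plain dotted numeric string such as `2.8.0`; the helper names `parse_version` and `meets_minimum` are hypothetical, and how the string is obtained from the Admin app or from SLLogCollector output is not covered here.

```python
from typing import Tuple

# Minimum taken from the page: "version 2.8.0 or higher".
MINIMUM_CLOUD_GATEWAY_VERSION: Tuple[int, ...] = (2, 8, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '2.10.1' into (2, 10, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(version: Tuple[int, ...], width: int) -> Tuple[int, ...]:
    """Right-pad a version tuple with zeros so '2.8' compares equal to '2.8.0'."""
    return version + (0,) * (width - len(version))


def meets_minimum(
    installed: str,
    minimum: Tuple[int, ...] = MINIMUM_CLOUD_GATEWAY_VERSION,
) -> bool:
    """Compare numerically, part by part, so 2.10.0 counts as newer than 2.8.0."""
    parts = parse_version(installed)
    width = max(len(parts), len(minimum))
    return _pad(parts, width) >= _pad(minimum, width)
```

The comparison is numeric rather than textual because a plain string comparison would rank `2.10.0` below `2.8.0`. A version string with a non-numeric suffix would raise a `ValueError` in this sketch.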


- **Using DataMiner Cube**:

1. Go to the System Center > *Cloud* page, and select *Open dataminer.services*.
Member Author

This differs slightly from the steps in your original document. When I tested it, I couldn’t find any mention of "Session is active." Instead, I included an alternative way to verify the cloud connection on the same page.

Member

I checked again and indeed, I wasn't able to find "Session is active" either. Thanks for providing an alternative 🙂

> [!TIP]
> For more information, refer to [Skyline DataMiner Core InterAppCalls Range 1.0.1.1](xref:Skyline_DataMiner_Core_InterAppCalls_Range_1.0#1011).

DataMiner Systems may experience performance issues because of the high number of interactions when using SLNet Subscriptions for specific actions. In DataMiner 10.3.12, improvements were made to increase the efficiency of these interactions, with further enhancements added in DataMiner 10.4.6.
Member Author

I'm not entirely sure if I interpreted the section above correctly. It was structured and worded differently in the original document, but I made some adjustments for the sake of clarity. Could you double-check if the content is still accurate, @robin-devos-skyline?

- weu = West Europe.

- uks = UK South.

Member Author

This section could benefit from further explanation, @robin-devos-skyline. Could we clarify these points?

Member

OK. When there appears to be a delay in cloud activity (in our STaaS context, e.g. a trend graph that takes ages to load), we can check the provided Azure link to see if there are delays on the Azure side.

When you open the link, two sections give a good visual indication of these delays:

  • the graph "Throttled requests by EventHub" on the "EventHub" tab page
  • the graph "Event queue time" on the "Event" tab page

In both cases, you can filter on the TimeRange and the Instance (i.e. the endpoint used). So "Check the instance that is used by ..." might be best put on a separate line.

A screenshot might clarify things further:

[image]
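
For the Instance filter, the two abbreviations listed on the page can be captured in a small lookup. This is only a sketch, assuming the instance is identified by one of these two abbreviations; the helper name `region_name` is hypothetical.

```python
from typing import Optional

# Region abbreviations as listed on the page.
AZURE_REGIONS = {
    "weu": "West Europe",
    "uks": "UK South",
}


def region_name(code: str) -> Optional[str]:
    """Return the region for an abbreviation, or None if it is not one of the two listed."""
    return AZURE_REGIONS.get(code.strip().lower())
```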


- uks = UK South.

- Ask the customer's IT department to verify the upload and download bandwidth usage.
Member Author

EdithLansens, Jan 6, 2025

Suggested change
Original: - Ask the customer's IT department to verify the upload and download bandwidth usage.
Suggested: - Ask your IT department to verify the upload and download bandwidth usage.

I find it a bit strange we're suddenly referring to "the customer". Is the reader supposed to be "the customer", @robin-devos-skyline? If so, I’d suggest changing it to "you," as we’ve done elsewhere on this page for consistency.

Member

The target audience is anyone who needs to provide support when something goes wrong. This is mainly Skyline tech support for our own customers, but it could also be partners who need to support their own customers.
We wrote this to aid tech support members, and we want anyone interested to benefit from knowing what we do and check.
