Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when using Rand() in DCR #221

Open
jwikman opened this issue Nov 10, 2023 · 10 comments
Open

Error when using Rand() in DCR #221

jwikman opened this issue Nov 10, 2023 · 10 comments

Comments

@jwikman
Copy link

jwikman commented Nov 10, 2023

I'm trying to create a Data Collection Rule with the samples from samples/AppInsights/KQL/Queries/DataCollectionRules/FilterOnEvents.kql

When using this sample:

source
| where (
    Properties.eventId == "RT0008" // incoming web service calls 
    and 
    rand(1) < 0.1 // adjust as needed   
  ) or 
  Properties.eventId <> "RT0008"

I get this error when hitting the Run button:

Error occurred while compiling query in query: SemanticError:0x00000009 at 5:4 : Runtime scalar function provider not found for function: rand
If the issue persists, please open a support ticket. Request id: 12570843-1101-4eb1-aea3-4435ec21589d

Before opening a support ticket for this, has anyone actually been able to use this sample?
Does it still work if you try it out today?

We are flooded with RT0004 events, and I wanted to use this approach of sampling for that particular event. Right now, I'm forced to filter out all RT0004 events until this has been resolved.

@KennieNP
Copy link
Contributor

KennieNP commented Nov 10, 2023 via email

@jwikman
Copy link
Author

jwikman commented Nov 10, 2023

Ok, thanks for input.

I've created a support ticket for LogAnalytics. (req no. 2311100050001695, if anyone needs that)

@jwikman
Copy link
Author

jwikman commented Nov 15, 2023

Hi @KennieNP,

I just received an answer from MS support regarding using rand in DCR, where they say that the rand function is not supported for DCR:

I wanted to provide you with an update on this case, after checking internally on your query that, the 'rand' scalar function is not on the list of supported functions:
Since the transformation is applied to each record individually, it can't use any KQL operators that act on multiple records. Only operators that take a single row as input and return no more than one row are supported. For example, summarize isn't supported since it summarizes multiple records. See Supported KQL features for a complete list of supported features.

I kindly request you to refer the documentation guide for reference, which explains an overview of the KQL limitations: https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/data-collection-transformations-structure#kql-limitations

Did you successfully use the above sample in a DCR at some point in time?

@KennieNP
Copy link
Contributor

KennieNP commented Nov 15, 2023 via email

@jwikman
Copy link
Author

jwikman commented Nov 22, 2023

Got the final word from the support request, they checked with the Product Manager and got back that "Rand function is not supported with the DCR"

Any other clues on how to do sampling on selected EventIds?

As is now, we need to skip some EventIds completely (RT0004 represented 50% of all our telemetry, and simply cost too much compared to what it gives)

@KennieNP
Copy link
Contributor

KennieNP commented Nov 22, 2023 via email

@jwikman
Copy link
Author

jwikman commented Dec 5, 2023

I just created a suggestion in the Azure Monitor Community forum, please upvote if you think it makes sense! (and spread the word to others to upvote as well 😉)
You'll find it at https://feedback.azure.com/d365community/idea/eaaf14b5-b493-ee11-a81c-6045bd7fe045

I cannot understand the arguments, that I got from the support case, on why rand() is not supported: Since the transformation is applied to each record individually, it can't use any KQL operators that act on multiple records. Only operators that take a single row as input and return no more than one row are supported. I can't see that the rand() function works on multiple records...

Maybe use a custom endpoint to do the filtering on RT0004?

Wouldn't that end up with double ingestion costs for the events that we forward? First for the Azure Function (all data) and then for the rest when received at Log Analytics?

@vpshibin
Copy link

vpshibin commented Mar 27, 2024

I am facing the same issue. Upvoted it.

Meanwhile used this trick for my DCR sampling and it worked well. (May be too late for you, but would be helpful for someone)
//obtained the Millisec value from a timestamp I had (previously extracted from RawData and looks like "21:09:35.549+11:00" and will get 549 out this string). Just used the Millisec part as that's the most random one from the timestamp.
| extend TimestampMs=extract("\d+:\d+:\d+.(\d+)\+\d+:\d+", 1, TimeStamp)
// below will get the modulus of the Ms, which will be a number between 0 and 9
| extend TsRandom = toint(TimestampMs) % 10
| where TsRandom < 1 // only ones with modulo 0 are taken, 1-9 are dropped. . This will sample only 10% rows
//then you can add you other conditions like
or Properties.eventId <> "RT0008"

@jwikman
Copy link
Author

jwikman commented Mar 27, 2024

Thanks @vpshibin, that was a very creative workaround. Thanks for sharing! 👍

@jwikman
Copy link
Author

jwikman commented Apr 26, 2024

This is what we use today as AppTraces transformation:

source
| extend TimestampHundredths=tolong(extract(@"\d+-\d+-\d+T\d+:\d+:\d+\.(\d{2})", 1, tostring(TimeGenerated)))
| where ((TimestampHundredths< 5) and (tostring(Properties.eventId) in ("AL0000CTE", "AL0000E24", "AL0000E25", "AL0000E26", "AL0000GDP", "AL0000H7M", "AL0000H7N", "AL0000KZV", "AL0000LB0", "AL0000LB1", "AL0000LB2", "LC0040", "LC0041", "LC0042", "LC0043", "RT0003", "RT0004", "RT0008", "RT0019", "RT0035", "RT0038")))
  or (tostring(Properties.eventId) !in ("AL0000CTE", "AL0000E24", "AL0000E25", "AL0000E26", "AL0000GDP", "AL0000H7M", "AL0000H7N", "AL0000KZV", "AL0000LB0", "AL0000LB1", "AL0000LB2", "LC0040", "LC0041", "LC0042", "LC0043", "RT0003", "RT0004", "RT0008", "RT0019", "RT0035", "RT0038"))
| project-away TimestampHundredths

This cut our costs to 1/5 of earlier.

We identified above events by first running this query to get the events that produces the most data (= highest cost):

traces 
| where timestamp > ago(10d)
| extend Size = estimate_data_size(*)
| summarize  AvgSize = avg(Size), Count=count() by eventId = tostring(customDimensions.eventId)
| extend SizeMb = round(AvgSize*Count / (1024*1024),1)
| project eventId, Count, SizeMb
| sort by SizeMb

We looked through top 30 of those events, kept all events for events that we felt crucial for troubleshooting (performance related, mainly, but also some other).
For simplicity we decided to only keep 5% of all the events we identified.

This was accomplished by using the fraction of the second of the timestamp, as suggested by @vpshibin.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants