Error when using Rand() in DCR #221

jwikman · 2023-11-10T10:07:00Z

I'm trying to create a Data Collection Rule with the samples from samples/AppInsights/KQL/Queries/DataCollectionRules/FilterOnEvents.kql

When using this sample:

source
| where (
    Properties.eventId == "RT0008" // incoming web service calls 
    and 
    rand(1) < 0.1 // adjust as needed   
  ) or 
  Properties.eventId <> "RT0008"

I get this error when hitting the Run button:

Error occurred while compiling query in query: SemanticError:0x00000009 at 5:4 : Runtime scalar function provider not found for function: rand
If the issue persists, please open a support ticket. Request id: 12570843-1101-4eb1-aea3-4435ec21589d

Before opening a support ticket for this, has anyone actually been able to use this sample?
Does it still work if you try it out today?

We are flooded with RT0004 events, and I wanted to use this approach of sampling for that particular event. Right now, I'm forced to filter out all RT0004 events until this has been resolved.

KennieNP · 2023-11-10T10:12:52Z

I believe this is a bug on the Log Analytics team's side.

jwikman · 2023-11-10T10:27:40Z

OK, thanks for the input.

I've created a support ticket for Log Analytics (request no. 2311100050001695, if anyone needs it).

jwikman · 2023-11-15T16:14:51Z

Hi @KennieNP,

I just received an answer from MS support regarding using rand in DCR, where they say that the rand function is not supported for DCR:

I wanted to provide you with an update on this case, after checking internally on your query that, the 'rand' scalar function is not on the list of supported functions:
Since the transformation is applied to each record individually, it can't use any KQL operators that act on multiple records. Only operators that take a single row as input and return no more than one row are supported. For example, summarize isn't supported since it summarizes multiple records. See Supported KQL features for a complete list of supported features.

I kindly request you to refer the documentation guide for reference, which explains an overview of the KQL limitations: https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/data-collection-transformations-structure#kql-limitations

Did you successfully use the above sample in a DCR at some point in time?

KennieNP · 2023-11-15T19:51:51Z

Thanks, Johannes. Well, too bad that did not work. Sampling on the ingestion endpoint seemed like a good idea.

jwikman · 2023-11-22T16:52:37Z

I got the final word from the support request: they checked with the Product Manager and reported back that "Rand function is not supported with the DCR".

Any other clues on how to do sampling on selected EventIds?

As it stands, we need to skip some EventIds completely (RT0004 represented 50% of all our telemetry and simply cost too much compared to the value it gives).

KennieNP · 2023-11-22T18:27:43Z

Maybe use a custom endpoint to do the filtering on RT0004?

jwikman · 2023-12-05T21:34:32Z

I just created a suggestion in the Azure Monitor Community forum, please upvote if you think it makes sense! (and spread the word to others to upvote as well 😉)
You'll find it at https://feedback.azure.com/d365community/idea/eaaf14b5-b493-ee11-a81c-6045bd7fe045

I cannot understand the argument I got from the support case for why rand() is not supported: "Since the transformation is applied to each record individually, it can't use any KQL operators that act on multiple records. Only operators that take a single row as input and return no more than one row are supported." I can't see how the rand() function acts on multiple records...

@KennieNP wrote: "Maybe use a custom endpoint to do the filtering on RT0004?"

Wouldn't that end up with double ingestion costs for the events that we forward? First for the Azure Function (all data) and then for the rest when received at Log Analytics?

vpshibin · 2024-03-27T10:04:21Z

I am facing the same issue. Upvoted it.

Meanwhile, I used this trick for my DCR sampling and it worked well. (It may be too late for you, but it might be helpful for someone else.)
// TimeStamp was previously extracted from RawData and looks like "21:09:35.549+11:00".
// Only the millisecond part (549 in this string) is used, as that's the most random part of the timestamp.
| extend TimestampMs = extract(@"\d+:\d+:\d+\.(\d+)\+\d+:\d+", 1, TimeStamp)
// The modulus of the milliseconds is a number between 0 and 9.
| extend TsRandom = toint(TimestampMs) % 10
// Only rows with modulus 0 are kept; 1-9 are dropped. This samples about 10% of the rows.
// Other conditions can then be added, for example:
| where TsRandom < 1
    or Properties.eventId <> "RT0008"
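
The arithmetic behind this trick can be checked offline. What follows is a minimal Python sketch, not the DCR transformation itself: it mirrors the regular expression and the modulo step above. The name `keep_row` is hypothetical, and the sketch assumes the millisecond values are spread roughly evenly.

```python
import re

# Same pattern as the KQL extract() call above, with the dot escaped.
TS_PATTERN = re.compile(r"\d+:\d+:\d+\.(\d+)\+\d+:\d+")


def keep_row(timestamp: str, buckets: int = 10) -> bool:
    """Return True when the millisecond part falls in bucket 0 (hypothetical helper)."""
    match = TS_PATTERN.search(timestamp)
    if match is None:
        # No fractional seconds found: treat the row as not sampled.
        return False
    return int(match.group(1)) % buckets == 0


# Simulate every possible millisecond value within one second.
timestamps = [f"21:09:35.{ms:03d}+11:00" for ms in range(1000)]
kept = sum(keep_row(ts) for ts in timestamps)
print(f"kept {kept} of {len(timestamps)} rows")  # kept 100 of 1000 rows
```

If real timestamps cluster on particular millisecond values (for example, events logged on a timer), the effective rate could differ from 10%.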

jwikman · 2024-03-27T12:35:22Z

Thanks @vpshibin, that was a very creative workaround. Thanks for sharing! 👍

jwikman · 2024-04-26T15:01:43Z

This is what we use today as AppTraces transformation:

source
| extend TimestampHundredths = tolong(extract(@"\d+-\d+-\d+T\d+:\d+:\d+\.(\d{2})", 1, tostring(TimeGenerated)))
| where ((TimestampHundredths < 5) and (tostring(Properties.eventId) in ("AL0000CTE", "AL0000E24", "AL0000E25", "AL0000E26", "AL0000GDP", "AL0000H7M", "AL0000H7N", "AL0000KZV", "AL0000LB0", "AL0000LB1", "AL0000LB2", "LC0040", "LC0041", "LC0042", "LC0043", "RT0003", "RT0004", "RT0008", "RT0019", "RT0035", "RT0038")))
  or (tostring(Properties.eventId) !in ("AL0000CTE", "AL0000E24", "AL0000E25", "AL0000E26", "AL0000GDP", "AL0000H7M", "AL0000H7N", "AL0000KZV", "AL0000LB0", "AL0000LB1", "AL0000LB2", "LC0040", "LC0041", "LC0042", "LC0043", "RT0003", "RT0004", "RT0008", "RT0019", "RT0035", "RT0038"))
| project-away TimestampHundredths

This cut our costs to one fifth of what they were.

We identified the events above by first running this query to find the events that produce the most data (= highest cost):

traces 
| where timestamp > ago(10d)
| extend Size = estimate_data_size(*)
| summarize  AvgSize = avg(Size), Count=count() by eventId = tostring(customDimensions.eventId)
| extend SizeMb = round(AvgSize*Count / (1024*1024),1)
| project eventId, Count, SizeMb
| sort by SizeMb
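
For readers who want the arithmetic in that query spelled out, here is a small Python sketch of the same ranking: average size times count, converted to megabytes and sorted with the largest first. The input rows and the name `rank_by_size` are made up for illustration; they are not real telemetry.

```python
from collections import defaultdict


def rank_by_size(rows):
    """Rank event IDs by estimated total size in MB (hypothetical helper).

    rows: iterable of (event_id, size_in_bytes) pairs.
    """
    total_bytes = defaultdict(int)
    counts = defaultdict(int)
    for event_id, size in rows:
        total_bytes[event_id] += size
        counts[event_id] += 1

    ranked = []
    for event_id, total in total_bytes.items():
        avg_size = total / counts[event_id]
        # Same formula as the query: AvgSize * Count / (1024 * 1024), one decimal.
        size_mb = round(avg_size * counts[event_id] / (1024 * 1024), 1)
        ranked.append((event_id, counts[event_id], size_mb))

    # Largest estimated volume first, like `sort by SizeMb` (descending by default).
    return sorted(ranked, key=lambda r: r[2], reverse=True)


# Made-up sample rows, not real telemetry.
sample = [("RT0004", 2 * 1024 * 1024)] * 3 + [("RT0008", 1024 * 1024)] * 2
for event_id, count, size_mb in rank_by_size(sample):
    print(event_id, count, size_mb)
```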

We looked through the top 30 of those events and kept every occurrence of the ones we felt were crucial for troubleshooting (mainly performance-related, but also some others).
For simplicity, we decided to keep only 5% of all the events we identified for sampling.

This was accomplished by using the fractional-second part of the timestamp, as suggested by @vpshibin.
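
To sanity-check the 5% figure, the keep-or-drop rule of the transformation can be simulated outside Azure. This is a rough Python sketch under assumptions: `SAMPLED_EVENT_IDS` is shortened to three IDs from the list above, `keep_record` is a hypothetical name, "XX0000" is a placeholder for any event ID not on the list, and timestamps are assumed to be evenly spread over the hundredths of a second.

```python
import re

# Shortened for readability; the transformation above lists 21 event IDs.
SAMPLED_EVENT_IDS = {"RT0003", "RT0004", "RT0008"}

# First two fractional-second digits, as in the extract() call above.
HUNDREDTHS = re.compile(r"\d+-\d+-\d+T\d+:\d+:\d+\.(\d{2})")


def keep_record(time_generated: str, event_id: str, percent: int = 5) -> bool:
    """Keep all unlisted events; keep listed ones only in the first `percent` hundredths."""
    if event_id not in SAMPLED_EVENT_IDS:
        return True
    match = HUNDREDTHS.search(time_generated)
    # A timestamp without fractional seconds is dropped in this sketch.
    return match is not None and int(match.group(1)) < percent


# One timestamp for each possible hundredth of a second.
stamps = [f"2024-04-26T15:01:43.{h:02d}00000Z" for h in range(100)]
kept_sampled = sum(keep_record(ts, "RT0004") for ts in stamps)
kept_other = sum(keep_record(ts, "XX0000") for ts in stamps)  # placeholder ID
print(kept_sampled, kept_other)  # 5 100
```

Raising or lowering `percent` in the sketch corresponds to changing the `< 5` threshold in the transformation.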
