Index -1 out of bounds for length #44
Comments
I think this is related to the path_filter. The error happens when trying to list all the blobs, which is also where the path filtering happens with File::FNM_PATHNAME and File::FNM_EXTGLOB. I don't know whether /NSG-SIEM-POC//*.json will be able to find the files. In a couple of days I can test with a debug program to see what files the filter would find, but for now I recommend removing the path_filter. If you have more files in the blob container, you can use the simple configuration option "prefix" as a filter.
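For reference, here is a minimal Ruby sketch (the blob name and patterns below are made up, not taken from this issue) of how a glob behaves under File::FNM_PATHNAME and File::FNM_EXTGLOB, the flags mentioned above for path_filter:

```ruby
# Sketch only: illustrates Ruby's File.fnmatch with the flags used for path filtering.
flags = File::FNM_PATHNAME | File::FNM_EXTGLOB

# Hypothetical, shortened blob name for illustration.
blob = "NSG-SIEM-POC/y=2023/m=10/d=27/PT1H.json"

# With FNM_PATHNAME a single '*' does not cross '/' boundaries,
# so a flat pattern misses blobs nested in the date folders.
p File.fnmatch("NSG-SIEM-POC/*.json", blob, flags)     # => false

# '**/' matches directories recursively, so this one does match.
p File.fnmatch("NSG-SIEM-POC/**/*.json", blob, flags)  # => true

# FNM_EXTGLOB enables {a,b} alternatives in the pattern.
p File.fnmatch("**/{PT1H,PT5M}.json", blob, flags)     # => true
```

If the configured pattern evaluates to false for the blobs you expect, the plugin finds nothing, which is consistent with the advice to drop path_filter and rely on prefix instead.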
Thank you @janmg. I made some changes to the config to avoid some of the errors, like the registry error. I also removed path_filter.
Below are the errors. At some point it just throws an error without a message.
I don't have much time to set up a test environment, but in your case the plugin doesn't seem to be able to read the blobs, and I think you either filter too much or the access key is not healthy. prefix is an option that is passed directly to the Ruby BlobClient to list the blobs. You don't need to set it if you use the storage account only for NSG flowlogs, and you don't have to set a full path; it's enough to set prefix => "resourceId=/", or a longer prefix if you have multiple resource groups. The plugin reads nsgflowlogs as JSON, and the learning is used to figure out what the first block and the last block contain, so that it can read the other blocks as valid JSON, but also read partial blocks as valid JSON. For example, the plugin can read a time:1 record as valid JSON because the file starts with {"records":[. If a time:2 record later gets appended to the same file, the plugin does a partial read, adds the head and tail of the JSON, and the result is still valid. Your head '{' and tail '}' will not result in valid JSON; set them to '{"records":[' and ']}' for NSG flowlogs. I do this by default if the logtype is nsgflowlog, but you can override it by setting file_head and file_tail if it is different. The plugin will still try to read a blob to check, and you can skip that by setting skip_learning => true.
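As an illustration of the idea described above (this is a sketch, not the plugin's actual code; the record contents and offsets are made up), here is how a partial read plus the '{"records":[' head and ']}' tail still yields valid JSON in Ruby:

```ruby
require 'json'

# Made-up NSG flowlog-style blob content for illustration.
file_head = '{"records":['
file_tail = ']}'
record_1  = '{"time":1,"flows":[]}'
record_2  = '{"time":2,"flows":[]}'

# First read: the whole (small) blob is valid JSON as-is.
blob_v1 = file_head + record_1 + file_tail
JSON.parse(blob_v1)

# The blob grows: a new record is appended before the tail.
blob_v2 = file_head + record_1 + ',' + record_2 + file_tail

# Partial read: only the delta since the last known length, minus the old tail.
offset = blob_v1.length - file_tail.length
delta  = blob_v2[offset..-1]

# Re-wrapping the delta with the head (and stripping the leading comma)
# gives a standalone, valid JSON document again.
partial = file_head + delta.sub(/\A,/, '')
p JSON.parse(partial)['records'].map { |r| r['time'] }   # => [2]
```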
These are the logs I am seeing, and this is what I am using.
This is my Dockerfile.
I don't really understand where the Index -1 out of bounds comes from; at least the file has a length. The plugin tries to list all the files in the blob storage with their file length so it can read them one at a time, and if it detects that a file has grown, it reads the delta. I don't understand the second error either, " while trying to list blobs", because there was supposed to be an exception message, but instead there are just spaces. Is there something special about these accounts? I created a fresh account for testing and didn't see these errors. In the repo there is a blob_debug.rb that can iterate through the account. Also, a look with Storage Explorer may indicate what is so special about these storage accounts?
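For anyone who wants to reproduce the listing outside of Logstash, below is a minimal Ruby sketch in the spirit of blob_debug.rb (not the actual script; the account name, access key and container are placeholders) that lists blob names and lengths with the azure-storage-blob gem:

```ruby
require 'azure/storage/blob'

# Placeholders: fill in your own account, access key and container.
account   = 'examplestorageaccount'
accesskey = 'base64-access-key'
container = 'insights-logs-networksecuritygroupflowevent'

client = Azure::Storage::Blob::BlobService.create(
  storage_account_name: account,
  storage_access_key:   accesskey
)

# List blobs page by page; NSG flowlogs can also be narrowed with prefix:.
marker = nil
loop do
  blobs = client.list_blobs(container, marker: marker, prefix: 'resourceId=/')
  blobs.each do |blob|
    # The plugin keeps a name => length registry so a later run
    # can read only the delta of a blob that has grown.
    puts "#{blob.name} #{blob.properties[:content_length]}"
  end
  marker = blobs.continuation_token
  break if marker.nil? || marker.empty?
end
```

If a plain listing like this also raises the index out of bounds, that would point at azure-storage-ruby or its dependencies rather than at the plugin itself.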
Here is another trace. It says [2023-11-10T22:05:19,386][INFO ][logstash.javapipeline ][main] Pipeline started {"pipeline.id"=>"main"}. Is it coming from here? https://github.com/janmg/logstash-input-azure_blob_storage/blob/master/lib/logstash/inputs/azure_blob_storage.rb#L435 I am not fluent in Ruby, but I could not find where the reference for path is coming from.
I tried
I'm dealing with the same error. [2023-11-11T12:15:16,857][ERROR][logstash.inputs.azureblobstorage][main][cd330872858e585d21c29db5cdbdd5d28094025289d883cf2088cbbb878b43f0] caught: while trying to list blobs I've investigated all the points made, such as changing the paths, changing the file head and tail, and verifying the storage account permissions. I think in all cases the key is not the problem, as it has sufficient permissions, but as stated by @janmg, there could be a minor inconsistency in the configuration. I would assume it's due to the fact that when an NSG Flow Log is created, it forces you to create a new storage account. However, looking in Storage Explorer, no settings seem off about the connected account. @janmg do you mind specifying the steps you went through to create your storage account and configure the logs? Did you create the storage account separately and then link it to an NSG Flow Log, or create them both together? Here are the storage account settings: Here are the blob settings: This is what is linked with the plugin. Nothing seems special per se. Can you share the settings you have so we can compare?
Connecting to the storage account itself is going fine, but I don't know why a blob list returns an index out of bounds. The registry keeps a list of files and their sizes. My test storage account is really small because I only set up one VM for 6 hours and let it attract some unwanted traffic to test my Logstash pipeline, and it works. If this is happening more often, I will release a version which prints out more debugging information to put my finger on the problem. Below is my test pipeline.
I also tried upgrading my storage account to v2 just to rule out that possibility. No luck.
I have been playing with
I have modified the plugin to add more debugging. I now also receive the index out of bounds, and also on my blob_debug.rb, which only does a blob list. This points to a problem in the dependency azure-storage-ruby, which hasn't changed in the last 2 years, which in turn must point to a problem in its dependencies faraday or nokogiri. I haven't determined the exact reason, but I'm no longer clueless.
Thanks. If you are able to rework the plugin so it works, that would be great.
I think I now understand that somehow azure-storage-ruby, which uses faraday to connect to the storage account, is not working anymore. Upgrading to faraday 2 isn't really possible because of the version Logstash uses. I don't see a quick fix, and I also don't see an easy alternative. I'm most tempted to rewrite the file handling in Go, to make it available to any log system out there, but it's an awful lot of coding and I don't have much free time to spare.
@janmg Would it be possible to use this https://github.com/honeyankit/azure-storage-ruby instead of the one managed by MS? I tried doing it locally, but it just makes my head spin.
I cloned the version from muxcmux for common and blob and pushed it to RubyGems as version 3.0.0; my head is spinning too. I don't know which update killed the list_blobs, and if I figure it out, it should be possible to fix somehow; however, Ruby is not my strongest programming language and I choke up on the dependencies. I'm looking into migrating to Java, but it wouldn't improve performance. I have previously considered making a Fluentd plugin, but it's also in Ruby. I also studied using Filebeat, but I don't see how to easily use it for nsgflowlogs. Yesterday I started looking at storage-blobs-go-quickstart to see if I can split the Azure file handling out of the Logstash plugin event queue. That way a golang application would connect to the storage account and deal with listing and retrieving the files, while the Logstash plugin would only have to pick them up and process them further. That seems more future proof, but it will cost me some weekends to get it into working shape. Any other suggestions are very welcome.
I was just looking into using this plugin. I'm not of much help here, as I don't know these languages, but I can at least test for you. I just did a fresh install and have the exact same error.
The blobs are not easily accessible from a Logstash plugin anymore because of conflicting Faraday dependencies. I thought moving the file handling to a golang helper program would simplify the flow, but at least API keys don't seem to be supported. When I looked at my storage account, I saw a feature named "blob change feed" grayed out; it looks like it's intended for Apache Spark. I always felt that writing nsgflowlogs to a blob for something else to read felt wrong, but if it's the only way to get the nsgflowlogs, it's what had to be done. Now I feel more strongly that we should just turn to Microsoft Azure and politely ask for the flowlogs to be sent to an Event Hub instead; then we can just do a Kafka read with whatever analytics tool we please. I'll continue the golang route to see if it's viable, but pretty please Microsoft, provide an alternative flow.
Hello ThreatLentes & janmg, on Ubuntu I have faced the same issue when I tried to install the 8.x versions. Then I found out it works fine with version 7.10.1. I have faced this only on Ubuntu; on Windows all 8.x versions are working.
Thanks for the update. I think up to Logstash 8.9 the plugin should work on Ubuntu, but I can't put my finger on why it started failing. I have started a golang version to push the events into a queue; a proof of concept is working, but it will take some time to finish it with proper file listing and partial reading.
I'm giving the plugin a try, but I have no idea why I'm getting the error below.
Running Logstash 8.10.4 fresh install
Logstash Config:
For reference, the full location of the json in the storage account is below:
resourceId=/SUBSCRIPTIONS/7123871293721379/RESOURCEGROUPS/SIEM-POC/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/NSG-SIEM-POC/y=2023/m=10/d=27/h=15/m=00/macAddress=7812738HD/PT1H.json
Getting the following error: