
Unexpected behaviour when resampling with granularity = end - start #997

Closed

alakae opened this issue Oct 16, 2018 · 7 comments

alakae commented Oct 16, 2018

I am currently working with gnocchi version 4.1.4, as we are unable to upgrade to the latest version for now due to OpenStack cross-dependencies.

Our use case is to identify the most resource-consuming instances. Therefore, I started to work out a solution based on the example posted here:

There @sileht wrote:

Note that start - stop is exactly one week and we resample series to 1w too. This will ensure each resulting series has only one point.

which is exactly what I want to achieve as well. However, if I run the following minimal working example:

from datetime import datetime

start = datetime(2018, 10, 15, 1, 0)
end = datetime(2018, 10, 15, 23, 0)
delta = end - start  # 22 hours; note delta.seconds is only correct for deltas under one day

raw_result = gnocchi_client.aggregates.fetch(
    resource_type='instance',
    operations=[
        'resample',
        'max',
        "{}s".format(delta.seconds),  # resample granularity = end - start
        [
            'metric',
            'cpu_util',
            'max'
        ]
    ],
    start=start,
    stop=end,
    groupby=['original_resource_id'],
    search=get_range_query(end, start)
)

the resulting entries contain two elements in the max list, e.g.:

{
  'group':{
    'original_resource_id':'0ab6bc1b-d67c-4c84-a73a-7595f8afca7b'
  },
  'measures':{
    'measures':{
      '0ab6bc1b-d67c-4c84-a73a-7595f8afca7b':{
        'cpu_util':{
          'max':[
            (datetime.datetime(2018,10,14,4,0,tzinfo=datetime.timezone(datetime.timedelta(0),'+00:00')),79200.0,1.7204191457),
            (datetime.datetime(2018,10,15,2,0,tzinfo=datetime.timezone(datetime.timedelta(0),'+00:00')),79200.0,3.7629098805)
          ]
        }
      }
    }
  }
}

However, when I set start = datetime(2018, 10, 15, 13, 0) and end = datetime(2018, 10, 15, 14, 0), I get only one entry, as anticipated. Therefore, I have the feeling that this could be a bug, but it could also be a misunderstanding on my side.

Thanks

chungg (Member) commented Oct 21, 2018

just curious but what is get_range_query?

i'd have to look into this. it's possible it's because time is aggregated relative to epoch, so maybe 22hrs splits differently? i'm completely guessing.

alakae (Author) commented Oct 22, 2018

just curious but what is get_range_query?

It returns a search expression to find instances that exist within the given time window:

def get_range_query(filter_end, filter_start):
    # Match instances whose lifetime overlaps the window: they started
    # before the window ends, and either ended after the window starts
    # or are still running (ended_at is None).
    return {
        "and": [
            {
                'le': {
                    "started_at": filter_end
                }
            },
            {
                "or": [
                    {
                        'ge': {
                            "ended_at": filter_start
                        }
                    },
                    {
                        '==': {
                            "ended_at": None
                        }
                    }
                ]
            },
        ]
    }

i'd have to look into this. it's possible it's because time is aggregated relative to epoch, so maybe 22hrs splits differently? i'm completely guessing.

I think something like that could be a likely explanation. I investigated a bit more and made the following observations (a sketch after the list reproduces the pattern):

duration 1h returns a single value for intervals [0:00,1:00],[1:00,2:00],[2:00,3:00], etc.
duration 2h returns a single value for intervals [0:00,2:00],[2:00,4:00],[4:00,6:00], etc.
duration 3h returns a single value for intervals [0:00,3:00],[3:00,6:00],[6:00,9:00], etc.
duration 4h returns a single value for intervals [0:00,4:00],[4:00,8:00],[8:00,12:00], etc.
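
This pattern is what epoch-aligned bucketing would produce: every duration divides the timeline into fixed buckets counted from the Unix epoch, independent of the query's start. A minimal sketch (bucket_start is a hypothetical helper written for illustration, not gnocchi code) reproduces the bucket starts seen in the output above:

from datetime import datetime, timezone

def bucket_start(ts, granularity_seconds):
    # Align a UTC timestamp to the start of the epoch-relative bucket
    # that contains it.
    seconds = ts.timestamp()
    return datetime.fromtimestamp(
        seconds - (seconds % granularity_seconds), tz=timezone.utc)

start = datetime(2018, 10, 15, 1, 0, tzinfo=timezone.utc)
end = datetime(2018, 10, 15, 23, 0, tzinfo=timezone.utc)

# The 22h (79200 s) window straddles one epoch-aligned boundary:
print(bucket_start(start, 79200))  # 2018-10-14 04:00:00+00:00
print(bucket_start(end, 79200))    # 2018-10-15 02:00:00+00:00

# Hourly buckets align to :00 from epoch, matching the 1h observation:
print(bucket_start(start, 3600))   # 2018-10-15 01:00:00+00:00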

chungg (Member) commented Oct 23, 2018

hmmm maybe resample with count instead of max... you could see what the grouping is like.

not sure if this is the reason, but i don't think get_range_query should be ge and le. i would think it should not be inclusive on one end.

chungg (Member) commented Oct 23, 2018

also, could you add what the archive policy for the metrics is? are there multiple granularities?

alakae (Author) commented Oct 24, 2018

Thank you for your help and suggestions.
I tried all combinations of open and closed intervals to make sure that this is not the cause of the issue. The current archive policy stores one value every five minutes. We haven't configured multiple granularities.

I replaced 'max' with 'count', and here is the output corresponding to the example from my original post:

{
  'measures':{
    'measures':{
      '0ab6bc1b-d67c-4c84-a73a-7595f8afca7b':{
        'cpu_util':{
          'count':[
            (datetime.datetime(2018,10,14,4,0,tzinfo=datetime.timezone(datetime.timedelta(0),'+00:00')),79200.0,12.0),
            (datetime.datetime(2018,10,15,2,0,tzinfo=datetime.timezone(datetime.timedelta(0),'+00:00')),79200.0,252.0)
          ]
        }
      }
    }
  },
  'group':{
    'original_resource_id':'0ab6bc1b-d67c-4c84-a73a-7595f8afca7b'
  }
}

My interpretation is that we have count=12 for the first hour from (2018, 10, 15, 1, 0) to (2018, 10, 15, 2, 0), and count=252 for the remaining 21 hours from (2018, 10, 15, 2, 0) to (2018, 10, 15, 23, 0), which corresponds to 12 samples per hour.
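
The arithmetic checks out against an epoch-aligned 79200 s boundary at 02:00 (a quick sketch, assuming the 5-minute archive policy above):

from datetime import datetime, timezone

GRANULARITY = 300  # archive policy: one sample every 5 minutes
start = datetime(2018, 10, 15, 1, 0, tzinfo=timezone.utc)
boundary = datetime(2018, 10, 15, 2, 0, tzinfo=timezone.utc)  # epoch-aligned
end = datetime(2018, 10, 15, 23, 0, tzinfo=timezone.utc)

print((boundary - start).total_seconds() / GRANULARITY)  # 12.0  -> count=12
print((end - boundary).total_seconds() / GRANULARITY)    # 252.0 -> count=252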

chungg (Member) commented Oct 25, 2018

yeah, i don't know why i was asking about search... i just realised that isn't related to series data at all. apologies.

so i think i remember how this all works now :) and it does relate to epoch. when you resample, it resamples relative to epoch, so even though you give it a range of 22 hours, those 22 hours of data do not necessarily all fall into the same 22-hour group counted from epoch. i think pandas behaves differently and groups based on the first index.
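
for illustration, modern pandas makes this exact choice configurable via the origin keyword on resample (assuming pandas >= 1.1; the synthetic series below mimics one point every 5 minutes):

import pandas as pd

idx = pd.date_range("2018-10-15 01:00", "2018-10-15 22:55",
                    freq="5min", tz="UTC")
series = pd.Series(1.0, index=idx)

# epoch-relative grouping (the gnocchi behaviour described above):
print(series.resample("22h", origin="epoch").count())
# -> 2018-10-14 04:00 (12 points) and 2018-10-15 02:00 (252 points)

# grouping from the first index instead:
print(series.resample("22h", origin="start").count())
# -> a single bucket at 2018-10-15 01:00 holding all 264 points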

i'd need to think about whether there's actually a way to get the max of each group over a specific 22-hour span. if exactly 22hrs is not required, you could use round_timestamp from the code to compute a start time that would return you one point from the end time.

or you could add the ability to group off the first index rather than epoch. a sketch of the first suggestion follows.
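
a sketch of the rounding approach (round_timestamp here is a standalone stand-in written for illustration, not the actual gnocchi helper): round the end time down to an epoch-aligned boundary and query exactly one 22h bucket.

from datetime import datetime, timedelta, timezone

def round_timestamp(ts, granularity_seconds):
    # round down to the nearest epoch-aligned bucket boundary
    # (illustrative stand-in for gnocchi's rounding helper).
    seconds = ts.timestamp()
    return datetime.fromtimestamp(
        seconds - (seconds % granularity_seconds), tz=timezone.utc)

granularity = 22 * 3600  # 79200 s
end = datetime(2018, 10, 15, 23, 0, tzinfo=timezone.utc)

stop = round_timestamp(end, granularity)       # 2018-10-15 02:00:00+00:00
start = stop - timedelta(seconds=granularity)  # 2018-10-14 04:00:00+00:00
# querying [start, stop) with a 79200 s resample now yields exactly one
# point per series, because the window covers a single epoch bucket.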

tobias-urdin (Contributor) commented

Closing because of no new information; please update and we can reopen.
