Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance and reduce allocations when parsing RedisResults #33

Open
shaunsales opened this issue Sep 3, 2020 · 6 comments
Open
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@shaunsales
Copy link
Contributor

Currently when a RedisResult[] is returned from a TS.RANGE or similar query, we enumerate the entire result and allocate it to a new IReadOnlyCollection. This feels unnecessary, and iteration should be left up to the user.

private static TimeStamp ParseTimeStamp(RedisResult result)
{
	if (result.Type == ResultType.None) return default;
	return new TimeStamp((long)result);
}

private static TimeSeriesTuple ParseTimeSeriesTuple(RedisResult result)
{
	RedisResult[] redisResults = (RedisResult[])result;
	if (redisResults.Length == 0) return null;
	return new TimeSeriesTuple(ParseTimeStamp(redisResults[0]), (double)redisResults[1]);
}

private static IReadOnlyList<TimeSeriesTuple> ParseTimeSeriesTupleArray(RedisResult result)
{
	RedisResult[] redisResults = (RedisResult[])result;
	var list = new List<TimeSeriesTuple>(redisResults.Length);
	if (redisResults.Length == 0) return list;
	Array.ForEach(redisResults, tuple => list.Add(ParseTimeSeriesTuple(tuple)));
	return list;
}

I propose we wrap up the RedisResult into a TsTimeSeriesCollection object, something like;

public class TsTimeSeriesCollection
{
	private RedisResult[] _redisResults;

	public TimeSeriesCollection(RedisResult redisResult)
	{
		_redisResults = (RedisResult[])redisResult;
	}

	public (long TimeStamp, double Value) this[int index] => ((long)((RedisResult[]) _redisResults[index])[0], (double) ((RedisResult[]) _redisResults[index])[1]);

	public int Count => _redisResults.Length;
}

The above will only cast the samples RedisResult when it's accessed, which should help with performance when large sample sets are returned. I haven't extended this to implement IEnumerable<(long,double)> or started optimizing, but the general idea is to create a wrapper around the RedisResult[] and make time series arrays less allocatey and easier to work with.

Since this is a breaking change, it might make sense to do prior to the next release. Feedback welcome!

@DvirDukhan
Copy link
Collaborator

DvirDukhan commented Sep 3, 2020

@shaunsales great input, as always.

Let's iterate over the design as I would like to have the IEnumerable also implemented. This will cause it to be an implementation of IReadOnlyList
for an example

using System;
using System.Collections;
using System.Collections.Generic;

namespace NRedisTimeSeries.DataTypes
{
    public class TimeSeriesCollection : IReadOnlyList<(long, double)>
    {
        public TimeSeriesCollection()
        {
        }

        public (long, double) this[int index] => throw new NotImplementedException();

        public int Count => throw new NotImplementedException();

        public IEnumerator<(long, double)> GetEnumerator()
        {
            throw new NotImplementedException();
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            throw new NotImplementedException();
        }
    }
}

What do you think?
Thanks for your inputs

@DvirDukhan DvirDukhan added enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed labels Sep 3, 2020
@shaunsales
Copy link
Contributor Author

Here's a first cut of a timeseries collection class that I think would meet our requirements;


public class TsCollection : IEnumerator<(long TimeStamp, double Value)>
{
	private int _index = -1;

	private RedisResult[] _redisResults;

	public TsCollection(RedisResult redisResult)
	{
		_redisResults = (RedisResult[])redisResult;
	}

	public (long TimeStamp, double Value) this[int index]
	{
		get
		{
			var item = (RedisResult[])_redisResults[index];
			return ((long)item[0], (double)item[1]);
		}
	}
	
	public int Count => _redisResults.Length;

	public IEnumerator<(long TimeStamp, double Value)> GetEnumerator() => this;

	public (long TimeStamp, double Value) Current => (_index > -1 && _index <= _redisResults.Length) ? this[_index] : throw new IndexOutOfRangeException();

	object IEnumerator.Current => Current;

	public bool MoveNext() => ++_index < _redisResults.Length;

	public void Dispose() => Reset();

	public void Reset() => _index = -1;
}

I've not implemented it as a readonly or immutable collection as that adds quite a bit of overhead that I didn't feel was necessary. If there's a good reason to make the collection immutable, I suggest we use ImmutableArray<T> or ImmutableList<T> but given the underlying RedisResult[] is not immutable or readonly it seems better to follow the same pattern.

@DvirDukhan
Copy link
Collaborator

@shaunsales
Let's split the logic between the enumerator and the direct access functionality
something like

using System;
using System.Collections;
using System.Collections.Generic;
using StackExchange.Redis;

namespace NRedisTimeSeries.DataTypes
{

    public class TsCollection : IReadOnlyList<(long TimeStamp, double Value)>
    {

        private static (long, double) ResultAsTuple(RedisResult result)
        {
            var item = (RedisResult[])result;
            return ((long)item[0], (double)item[1]);
        }

        private class TsCollectionEnumerator : IEnumerator<(long TimeStamp, double Value)>
        {
            private int _index = -1;
            RedisResult[] _results;
            public TsCollectionEnumerator(TsCollection collection)
            {
                _results = collection._redisResults;
            }

            public (long TimeStamp, double Value) Current => (_index > -1 && _index <= _results.Length) ? ResultAsTuple(_results[_index]) : throw new IndexOutOfRangeException();

            object IEnumerator.Current => Current;

            public bool MoveNext() => ++_index < _results.Length;

            public void Reset() => _index = -1;

            public void Dispose() => Reset();

        }

        private RedisResult[] _redisResults;

        public TsCollection(RedisResult redisResult)
        {
            _redisResults = (RedisResult[])redisResult;
        }

        public (long, double) this[int index]
        {
            get => ResultAsTuple(_redisResults[index]);
 
        }

        public int Count => _redisResults.Length;

        public IEnumerator<(long, double)> GetEnumerator() => new TsCollectionEnumerator(this);

        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
    }
}

WDYT?

@shaunsales
Copy link
Contributor Author

Looking good. I've made a few (quick and dirty) updates to include MRANGE as a collection of ranges with Name and Label support.

public class TsMRangeResults : IReadOnlyList<TsMRangeResult>
{
	private class TsMRangeEnumerator : IEnumerator<TsMRangeResult>
	{
		private int _index = -1;

		private IReadOnlyList<TsMRangeResult> _tsMRangeResults;

		public TsMRangeEnumerator(TsMRangeResults collection) => _tsMRangeResults = collection._tsMRangeResults;

		public TsMRangeResult Current => (_index > -1 && _index <= _tsMRangeResults.Count) ? _tsMRangeResults[_index] : throw new IndexOutOfRangeException();

		object IEnumerator.Current => Current;

		public bool MoveNext() => ++_index < _tsMRangeResults.Count;

		public void Reset() => _index = -1;

		public void Dispose() => Reset();
	}
	
	private IReadOnlyList<TsMRangeResult> _tsMRangeResults;

	public int Count { get; }

	public TsMRangeResults(RedisResult redisResult)
	{
		var redisResults = (RedisResult[])redisResult;

		Count = redisResults.Length;

		if (redisResults.Length > 0)
		{
			var list = new List<UserQuery.TsMRangeResult>(redisResults.Length);
			for (int i = 0; i < redisResults.Length; i++)
			{
				list.Add(new TsMRangeResult(redisResults[i]));
			}
			_tsMRangeResults = list;
		}
	}

	public TsMRangeResult this[int index] => _tsMRangeResults[index];

	public IEnumerator<TsMRangeResult> GetEnumerator() => new TsMRangeEnumerator(this);

	IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public class TsMRangeResult : TsRangeResult
{
	public string Name { get; }

	public IList<(string, string)> Labels { get; }

	public TsMRangeResult(RedisResult redisResult) : base((RedisResult[])redisResult)
	{
		var redisResults = (RedisResult[])redisResult;

		Name = (string)redisResults[0];

		var labels = (RedisResult[])redisResults[1];

		if (labels.Length > 0)
		{
			Labels = new List<(string, string)>(labels.Length);

			for (int i = 0; i < labels.Length; i++)
			{
				var labelValue = (RedisResult[])labels[i];
				Labels.Add(((string)labelValue[0], (string)labelValue[1]));
			}
		}
	}
}

public class TsRangeResult : IReadOnlyList<(long TimeStamp, double Value)>
{
	private static (long TimeStamp, double Value) ResultAsTuple(RedisResult result)
	{
		var item = (RedisResult[])result;
		return ((long)item[0], (double)item[1]);
	}

	private class TsRangeResultEnumerator : IEnumerator<(long TimeStamp, double Value)>
	{
		private int _index = -1;

		RedisResult[] _redisResults;

		public TsRangeResultEnumerator(TsRangeResult collection) => _redisResults = collection._redisResults;

		public (long TimeStamp, double Value) Current => (_index > -1 && _index <= _redisResults.Length) ? ResultAsTuple(_redisResults[_index]) : throw new IndexOutOfRangeException();

		object IEnumerator.Current => Current;

		public bool MoveNext() => ++_index < _redisResults.Length;

		public void Reset() => _index = -1;

		public void Dispose() => Reset();
	}

	private RedisResult[] _redisResults;

	public int Count { get; }

	public TsRangeResult(RedisResult redisResult)
	{
		// TODO: Add some data shape checks
		_redisResults = (RedisResult[])redisResult;
		
		Count = _redisResults.Length;
	}

	protected TsRangeResult(RedisResult[] redisResults)
	{
		_redisResults = (RedisResult[])redisResults[2];
	}

	public (long TimeStamp, double Value) this[int index] => ResultAsTuple(_redisResults[index]);

	public IEnumerator<(long TimeStamp, double Value)> GetEnumerator() => new TsRangeResultEnumerator(this);

	IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

This will allow users to access the timeseries results with fairly simply syntax. Here's a couple of examples;

var tsMRangeResults = new TsMRangeResults(db.Execute("TS.MRANGE", new[] { "-", "+", "WITHLABELS", "FILTER", "lbl=abc" }));

foreach (var mRange in tsMRangeResults)
{
	foreach (var item in mRange)
	{
		$"{mRange.Name} {item.TimeStamp}:{item.Value} {mRange.Labels.Count}".Dump();
	}
}

var tsRange = new TsRangeResult(db.Execute("TS.RANGE", new[] { key, "-", "+" }));

foreach (var item in tsRange)
{
	$"{item.TimeStamp}:{item.Value}".Dump();
}

@DvirDukhan
Copy link
Collaborator

Nice approach
I think that you can apply the iterator approach everywhere here so you will not have to allocate lists at all
Let's first finish with PR #35 and continue with this

@shaunsales
Copy link
Contributor Author

Nice approach

I think that you can apply the iterator approach everywhere here so you will not have to allocate lists at all

Let's first finish with PR #35 and continue with this

Agreed - the labels implementation was a bit quick and dirty. I think we can optimise and improve this once we turn it into a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants