
Performance issues from Exceptions #224

Open
ShadowMarker789 opened this issue Dec 13, 2021 · 46 comments

@ShadowMarker789

When dealing with large numbers of tags over shaky network conditions, the constant throwing and unwinding of LibPlcTagException is drowning me in CPU overhead.

Is there a way to deal with these tags in an "exceptionless" context where we can just check whether the operation was successful or not using the Status of the tag instead of these exceptions?

Currently when the network goes down thousands of exceptions are being raised as we have thousands of tags we're monitoring.

When this happens the CLR proceeds to have a very bad time from the exceptions being raised and CPU usage spikes up uncontrollably.

I'd very much rather set a static boolean value to suppress exceptions being raised from Read, ReadAsync, Write, WriteAsync, Initialize and check the Status of the tag instead.

@jkoplo
Member

jkoplo commented Dec 13, 2021

I actually saw something similar in a co-worker's code base for an exception he was suppressing - still huge performance overhead because of all the stack trace/context capture that happens. I can definitely see the desire to not throw thousands of exceptions, even if they're ignored. Is there any way to suspend your reads at the first exception and not trigger thousands of exceptions?

I'll think about an option to suppress exceptions - or maybe research what other open source libraries have done here. Seems like it should be a problem others have run into.

@ShadowMarker789
Author

I have a setup in place where there's one core tag that's always monitored, and the "other" tags are monitored only if that one came back clean on the last read.

But if the network goes down afterwards then all those thousands of tags that were previously all reading fine are now throwing me thousands of exceptions, which causes issues for the CLR.

Over the long term we're looking to use non-CIP protocols over networks that are shaky, but long term is not now.

Right now we're just eating the exceptions on network failure and being sad about it.

@timyhac
Collaborator

timyhac commented Dec 15, 2021

There is some commentary from Jon Skeet on this: https://jonskeet.uk/csharp/exceptions.html

One option he talks about is having TryXYZ versions of methods, i.e. TryRead(...), which is common in other APIs.

@jkoplo
Member

jkoplo commented Dec 15, 2021

After reading Skeet's article (and the updated version, and some other referenced ones) I have to wonder if the exceptions are actually the root issue. It's worth asking - do you see the same performance issue in Release builds as Debug builds?

From your description, it sounds like you probably do something like this (pseudocode):

await SingleTag.ReadAsync();
await Task.WhenAll(BatchOfTags.Select(x => x.ReadAsync()));

In that example, if the first tag is successful, but we get a failure somewhere in the midst of the batch of tags, I think there would still be decent overhead in the rest of the tags that are failing. The async wrapper in C# is wrapped around the callback in the base libplctag library, but I think libplctag (base) would still be trying/timing out on some of the tags in the batch. After all, we can't actually read 1000+ items in parallel from the PLC - the reads are getting packed into multiple messages that are sent sequentially and failing sequentially.

Does something like this maybe seem possible?

@timyhac
Collaborator

timyhac commented Dec 15, 2021

Jody, that's a good point. I was thinking about whether they could inject a cancellation token for all reads/writes that cancels upon a failure of any tag, but I didn't suggest it because cancellation still throws an exception. The point you raised about the sequential nature of libplctag means there could still be a benefit to doing that, though.
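
A rough sketch of that idea (assuming ReadAsync accepts a CancellationToken; as noted, cancellation itself still surfaces as exceptions):

var cts = new CancellationTokenSource();
await Task.WhenAll(BatchOfTags.Select(async tag =>
{
    try
    {
        await tag.ReadAsync(cts.Token);
    }
    catch
    {
        cts.Cancel(); // the first failure cancels the remaining reads
        throw;
    }
}));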

@kyle-github
Member

It seems like there is a fundamental speed bump being exposed here. Please excuse the long info dump.

TL;DR: I stopped using exceptions completely (the lib is in C so it never had them internally) and use other means to determine whether data is valid/stale or not. Networks and PLCs are too flaky to use exceptions.

In the project that provided the push for me to develop the C DLL, I used Java as the wrapper. I initially was throwing exceptions when something went wrong. I ended up dropping that completely and went with timestamps on the data. I decoupled my tags from the data such that when tags were updated, I pushed updated values (and timestamps) into a hash map. Tag read errors simply result in no update and so the data got stale. I stored the last known read/write status in the metadata as well to report on it in the GUI.

This decoupled the data from the flaky network and every consumer of the data could make a local decision (based on timestamps) of how stale was too stale by only reading/writing from the hash map. This approach of adding metadata to the values ended up solving a bunch of other issues I had and made it a lot easier to handle cases when one or more machines was offline. I almost never used exceptions after that. It was not a performance issue, it was a control flow issue.
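
For what it's worth, a minimal C# sketch of that idea (my own illustration, not the original Java code) could look like this:

using System;
using System.Collections.Concurrent;
using libplctag;

// Value + timestamp + last status, decoupled from the tag itself.
public record TagSample(object Value, DateTime ReadAtUtc, Status LastStatus);

public class TagValueCache
{
    private readonly ConcurrentDictionary<string, TagSample> _samples = new();

    // Called by the polling side after a successful read; a failed read simply
    // skips the update, so the entry goes stale instead of anything throwing.
    public void Update(string tagName, object value, Status status) =>
        _samples[tagName] = new TagSample(value, DateTime.UtcNow, status);

    // Consumers decide locally what "too stale" means for them.
    public bool TryGetFresh(string tagName, TimeSpan maxAge, out TagSample? sample) =>
        _samples.TryGetValue(tagName, out sample) &&
        DateTime.UtcNow - sample.ReadAtUtc <= maxAge;
}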

The scale of the system (>30 PLCs) was such that there was almost always something offline for maintenance. Due to the nature of the network (there were wave guide wireless segments and some other "interesting" technology involved) it was not uncommon to have one read on one PLC fail but the next one succeed. The entire reason for the initial project was due to the flakiness of the network (that turned out to be for different reasons but that's a different discussion).

IMHO, exceptions are for truly exceptional conditions and with a flaky network dropping the connection is not an exceptional condition. It is normal. Annoying but normal. It just means your data is sometimes stale. I am waving my hands about how I handled writes.

Within the core library, any given tag could be handled by one of three (and only three) sets of threads:

  1. the application thread via calls to the API functions. That's your thread, not mine :-) There might be more than one thread here. Some of the example test programs exercise this by having multiple threads that all call operations on a single tag handle. Every API call is protected by a mutex except plc_tag_destroy() which removes the tag handle from the lookup table and then decrements a ref count. If some other API call is ongoing at that time the tag will clean up when that call drops the reference.
  2. a library-wide tickler thread that handles cases where the application thread is blocking or otherwise not calling into the library. This handles the case where responses come back asynchronously in the background and cases such as triggering new reads to handle partial read cases. Automatic reads and writes are triggered here.
  3. a PLC-specific thread. Each PLC TCP connection gets its own thread and that processes all requests and responses on that TCP channel. This is the lowest impact on CPU time with the highest overall performance for network IO. All IO is completely async inside the library. Almost. DNS lookups are still blocking.

The last two threads will have the effect of serializing a lot of stuff, at least as is visible outside the library. However, the batching of requests for CIP will give the impression that many requests (sometimes upwards of 200) happen all at the same time. Inside the PLC, at least as far as I can tell, there is a lot of serialization going on as well. I have never seen requests be processed out of the order I send them in.

The reworked Modbus code only really uses the first and last of those threads and as I tune more of the CIP code, I will remove the second thread completely. It made some things easier early on but just gets in the way now. In the Modbus code the PLC thread runs the same function that the second thread runs on other PLC types, so the same work gets done.

I hope that background was useful.

Some things that might help:

  • I could expose the status of the PLC connection. plc_tag_get_int_attribute(tag_handle, "plc_connection_status", 0). I can probably cobble together something useful for this status or make new status up that is more relevant than the standard status codes. Probably new status.
  • I could dust off the ideas that I've had over the years to support tag grouping/batching within the library itself. None of these really seemed to help a lot but now that we have a lot more experience with async IO and different wrappers, maybe it needs to be revisited. This could be an API that lets you set a tag group ID on a tag and then call read/write and wait for operation termination on all the tags in the group (this is not thought through at all):
int32_t tag_handle = plc_tag_create("protocol=...&tag_group_id=42&...", 500);
/* or set it after the fact */
int rc = plc_tag_set_int_attribute(tag_handle, "tag_group_id", 42);

/* various calls to trigger IO and wait for completion.  */
int plc_tag_group_read(int32_t tag_group_id, int timeout_ms);
int plc_tag_group_write(int32_t tag_group_id, int timeout_ms);
int plc_tag_group_wait_for_completion(int32_t tag_group_id, int timeout_ms);

It might be nice to be able to trigger different operations on subsets of tags and then wait for them all at once. Not sure that this is really that useful of a use case though.

Maybe we'd need callbacks on tag groups? That would go in a different direction where you would create the group separately and then add tags to it rather than the other way around above.

Definitely more thought is needed to make that usable.

@jkoplo
Member

jkoplo commented Dec 16, 2021

Nice info! My mental model of the core library was correct (a little simplified, but correct).

I really think this can be handled in a performant way without tweaking the base library. I'm not even sure exceptions are a problem - Skeet's article benchmarked throwing/catching 21 exceptions per millisecond in .NET 2.0.

I might attempt a local setup on a PLC with 1000+ tags to test some things out.

In the meantime, @ShadowMarker789 do you have a simplified example you could share?

@ShadowMarker789
Author

ShadowMarker789 commented Dec 16, 2021

I had to redact a bunch of stuff to comply with the NDA, which is annoying.

I am considering that perhaps I should be doing a Parallel.ForEach on the set of things as opposed to a while-true loop that sleeps a lot.

Would there be any changes you'd make to this based on what you see here?

But with a Parallel.ForEach I'd also have to lock the list of tags, which delays modifying the list of tags until a read/write operation has completed.

// this gets executed for each known tag. All tags are unfortunately atomic. 
// We don't get any guarantees for UDT compliance across the set of equipment, so we must read from their atomic structures 
// I would much rather query a known chunk and strip it for the data we want, in the same way that I would rather win the lottery lol 
// also like, 1/10th of the tags don't even exist on the target device because the control systems engineers are too busy to fix them lol aaaa
// this is an asynchronous Task 

while (true)
{
	while (!isConnected) // whether the "key monitored tag" is good or not
	{
		// because the default C# implementation of random isn't thread safe
		await Task.Delay(ThreadSafeRandom.Next(1000, 5000));
	}

	try
	{
		// because reading and writing are mutually exclusive 
		// tagLock is a SemaphoreSlim 
		await tagLock.WaitAsync();
		LastReadChangedState = false;
		object newValue = null;
		bool isGood = false;
		try
		{
			await Tag.ReadAsync();
			isGood = true;
			// GetValue will call any of the GetXYZ32/64 etc functions depending on the client requested TypeCode 
			newValue = GetValue(Tag, TypeCode);

			if (!Equals(newValue, lastValue))
			{
				lastValue = newValue;
				LastReadChangedState = true;
			}
		}
		// because we enjoy leaking memory lol 
		catch (OutOfMemoryException oome)
		{
			client.log.Fatal(oome.Message);
			await Task.Delay(5000);
			Environment.FailFast(oome.Message);
		}
		// if you keep hitting exceptions, wait longer okay?
		catch (Exception)
		{
			if (targetTime < maxTargetTime)
			{
				targetTime++;
			}
		}
		finally
		{
			// if status changed notify client even if value hasn't changed
			var status = Tag.GetStatus();
			if (status != LastStatus)
			{
				LastStatus = status;
				LastReadChangedState = true;
			}

			if (LastReadChangedState)
			{
				if (targetTime > minTargetTime)
				{
					targetTime--;
				}
				try
				{
					// NOTIFY ALL SUBSCRIBED 'CLIENTS' HERE
					// CODE REDACTED TO COMPLY WITH NDA REQUIREMENTS
				}
				catch (OutOfMemoryException oome)
				{
					client.log.Fatal(oome.Message);
					await Task.Delay(5000);
					Environment.FailFast(oome.Message);
				}
				catch (Exception) { };
			}
			else
			{
				if (targetTime < maxTargetTime)
				{
					targetTime++;
				}
			}
		}
	}
	finally
	{
		tagLock.Release();
	}

	sleepTime = TimeSpan.FromMilliseconds(targetTime) - sw.Elapsed;
	if (sleepTime > minSleepTime)
	{
		//System.Diagnostics.Trace.WriteLine($"Tag {Name} sleep for {sleepTime.TotalMilliseconds} ms");
		await Task.Delay(sleepTime);
	}

	// wait extra long if we're timing out from network issues
	if (Tag.GetStatus() == libplctag.Status.ErrorTimeout)
		await Task.Delay(ThreadSafeRandom.Next(30000, 60000));

	sw.Restart();
}

@jkoplo
Member

jkoplo commented Dec 16, 2021

So from my quick read-through, you've got a loop that's monitoring a bunch of tags and sending notifications for each tag if its value or status changed? And it's implied there's another loop somewhere that's receiving requests to write tag values?

What I'm not seeing is the tag list - the example code only shows one tag. Are you doing a foreach on a List of tags somewhere?

I've actually written something similar a couple times. Let me see if I can write something up as an example in the next couple days. There are definitely some optimizations that can happen here on the c# side.

@ShadowMarker789
Author

When each Tag-association is created, it spawns and runs the task, so we have a long-running task per tag.

I'll see if I can refactor my code to Parallel.ForEach and see if that improves performance.

@ShadowMarker789
Author

I am having better performance with Parallel.For.

Parallel.For(0, lastTagCount, po, async (i) => { await tagList[i].PollAsync(); });

Where po is a ParallelOptions instance with MaxDegreeOfParallelism set to ProcessorCount / 2 (but never less than 1).

Where PollAsync is an asynchronous method that also checks whether the 'primary tag' is healthy; if it isn't, it returns immediately. I also have it set up so that if any tag encounters an ErrorTimeout status it sets that same flag to unhealthy, so that as few tags as possible attempt a read after we've hit the ErrorTimeout.
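
For reference, that setup looks roughly like this (a sketch; the Math.Max clamp is the "ProcessorCount / 2 but not less than 1" rule, and note that Parallel.For does not await async lambdas, so these polls are effectively fire-and-forget):

var po = new ParallelOptions
{
    MaxDegreeOfParallelism = Math.Max(1, Environment.ProcessorCount / 2)
};

Parallel.For(0, lastTagCount, po, async (i) => { await tagList[i].PollAsync(); });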

I've tested yanking out the cable and plugging it in a few times and do get some CPU spikes, but it's manageable now, as opposed to locking up the program for several seconds.

My preference would still be to have exceptionless handling of the tags, as we have the great Status enum for determining what the result of the action requested was.

@timyhac
Collaborator

timyhac commented Dec 21, 2021

There is some more guidance here that is pretty relevant: https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/exceptions-and-performance
The guidance is debated on StackOverflow though.

My thoughts on solutions:

  1. TryXYZ variants of each read/write/initialize API.
    • PRO - there seems to be advice from authority to use this API form.
    • CON - it would introduce 6 new methods to Tag and Tag<T,M>.
    • CON - Async timeout / cancellationToken cancellation still throws an exception (performance gain nullified), but I guess we could use some other method for timeout to provide a uniform API.
  2. Global switch to turn on/off exception throwing.
    • PRO - usage would be straightforward.
    • CON - there would be a hidden difference in how each API method works.
    • CON - it isn't possible to throw exceptions for some method calls and not others (not sure if this would ever be used?)
  3. Ask the libplctag.NET consumer to check Status after read/write/initialize.
    • PRO - application development would be the same as libplctag core, and could therefore use examples and other guidance from the libplctag ecosystem.
    • PRO - many developers check Status manually anyway (in addition to handling exceptions).
    • CON - I haven't come across a precedent in the .NET ecosystem
    • CON - there does seem to be advice from authority to avoid this API form.

TBH I haven't digested the above comments. My preference would be for 1 or 3.
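
To make the comparison concrete, here is a rough sketch of what the two API shapes might look like - purely hypothetical, none of this exists in libplctag.NET today:

using System.Threading;
using System.Threading.Tasks;
using libplctag;

public interface IExceptionlessTag
{
    // Option 1: TryXYZ form - boolean result, failure detail via an out Status.
    bool TryRead(out Status status);
    Task<bool> TryReadAsync(CancellationToken cancellationToken = default);

    // Option 3: core-library style - operations never throw, the caller checks Status afterwards.
    void Read();
    Status GetStatus();
}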

@jkoplo
Member

jkoplo commented Dec 22, 2021

@timyhac I'm not entirely convinced we need any of those three solutions yet. I want to do some testing with a bunch of tags before I'm sure, but I think that this might be better handled with some example/recommended structure in C#.

If I wrote a client app that was grabbing 1000 different rows from a database, I wouldn't do that on 1000 separate threads/tasks. Partially for performance, and partially because they have a shared constraint around the database connection and so any delays or errors would happen 1000-fold. Having a ton of parallel tasks that share a lock/synchronization object is probably a sign that the parallel tasks aren't necessary...

I think the question is "what is a good, simple structure for a c# program that needs to stay in sync with many tags from different parts of the program?". We're kind of seeing the same thing come up on #230.

Anyway, I'll try to do some testing and report back.

@timyhac
Collaborator

timyhac commented Dec 22, 2021

Yeah, that's a good point, mate.

@jkoplo
Member

jkoplo commented Dec 22, 2021

I wrote a very simple test last night to try to determine the highest performance way of reading a bunch of tags from an AB PLC. For a setup, I pushed code to an AB L16ER that included 2000 DINTs that are defined at the controller scope (not an array, separate tags). I connected to the controller directly from a Windows PC and tried a couple different Read scenarios in .NET 6.

Here's the relevant code:

        internal static void ReadForEach(List<TagDint> allTags)
        {
            var stopwatch = new Stopwatch();

            stopwatch.Restart();
            foreach (var item in allTags)
            {
                item.Read();
            }
            stopwatch.Stop();
            Console.WriteLine($"Tag ReadForEach = {stopwatch.ElapsedMilliseconds} msec");
        }

        internal static async Task ReadWhenAllAsync(List<TagDint> allTags)
        {
            var stopwatch = new Stopwatch();

            stopwatch.Restart();
            await Task.WhenAll(allTags.Select(x => x.ReadAsync()));
            stopwatch.Stop();
            Console.WriteLine($"Tag ReadWhenAllAsync = {stopwatch.ElapsedMilliseconds} msec");
        }

        internal static async Task ReadForEachAsync(List<TagDint> allTags)
        {
            var stopwatch = new Stopwatch();

            stopwatch.Restart();
            foreach (var item in allTags)
            {
                await item.ReadAsync();
            }
            stopwatch.Stop();
            Console.WriteLine($"Tag ReadForEachAsync = {stopwatch.ElapsedMilliseconds} msec");
        }


        internal static void ReadParallelForEach(List<TagDint> allTags)
        {
            var stopwatch = new Stopwatch();

            ParallelOptions parallelOptions = new()
            {
                MaxDegreeOfParallelism = 3
            };


            stopwatch.Restart();
            Parallel.ForEach(allTags, parallelOptions, x => x.Read());
            stopwatch.Stop();
            Console.WriteLine($"Tag ReadParallelForEach = {stopwatch.ElapsedMilliseconds} msec");
        }

        internal static async Task ReadAsyncParallelForEachAsync(List<TagDint> allTags)
        {
            var stopwatch = new Stopwatch();
            ParallelOptions parallelOptions = new()
            {
                //Using a more reasonable number (3) is much slower
                MaxDegreeOfParallelism = allTags.Count(),
            };

            stopwatch.Restart();

            await Parallel.ForEachAsync(allTags, parallelOptions, async (tag, token)  => await tag.ReadAsync(token));
            stopwatch.Stop();
            Console.WriteLine($"Tag ReadAsyncParallelForEachAsync = {stopwatch.ElapsedMilliseconds} msec");
        }

And here are the results:

Tag Instantiation = 20 msec
Tag InitWhenAllAsync = 1261 msec
Tag ReadForEach = 30784 msec
Tag ReadForEachAsync = 31113 msec
Tag ReadWhenAllAsync = 1020 msec
Tag ReadParallelForEach = 12221 msec
Tag ReadAsyncParallelForEachAsync = 1120 msec

Running it in Release vs Debug didn't make a noticeable difference. I also tried mixing Parallel.ForEach with ReadAsync but that results in fire-and-forget... don't do that.

Takeaways:

  • When you have a large group of tags, WhenAll will give you the lowest latency
  • Using WhenAll, however, does create a bunch of Tasks (not threads), and there's CPU overhead in both this creation and in the code we use to synchronize the C callbacks to C# async. In a practical sense, this means a CPU spike when the Read is called this way.
  • ReadAsyncParallelForEachAsync doesn't give you great performance unless you let it create a ton of tasks and at that point it's equivalent to WhenAll. You could use this to tune down the CPU spike at the expense of latency (probably).

I'll modify my test to include some exceptions and see how that impacts things.

@jkoplo
Member

jkoplo commented Dec 22, 2021

Oh this is cool - VS 2022 lets me run in WSL straight from the IDE. Running the same code on Linux (Ubuntu 18.04) improves performance in some of the scenarios - notably the synchronous calls:

Tag Instantiation = 11 msec
Tag InitWhenAllAsync = 1015 msec
Tag ReadForEach = 10825 msec
Tag ReadForEachAsync = 10829 msec
Tag ReadWhenAllAsync = 965 msec
Tag ReadParallelForEach = 7242 msec
Tag ReadAsyncParallelForEachAsync = 993 msec

@kyle-github
Member

Awesome work, Jody! A 30x perf difference between the options is a big deal!

Interesting that WSL is that much faster. Is that WSL2 or WSL1? I would expect WSL1 to be slower as it is not really a full VM but a translation layer running on top of the NT kernel. WSL2 is a full Hyper-V VM with some special tweaks for speed and so should be fairly speedy. At least that is what I understand.

The numbers also show me that I should probably focus more on Windows performance. I am still using, more or less, POSIX-style network and event primitives. That works well enough for Linux/macOS/*BSD but clearly may not be the best for Windows. That said, I really have no basis for comparison since it could just be a Windows issue. Still, your numbers show about a factor of 3 difference on some of these. That's huge. The fully async cases are a lot closer.

I bought a Mac that should be arriving soon, so my native dev platform is going to change to macOS. I am also looking for a decent ~$500 Windows laptop that I can use for "native" Windows development. Working in a VM on Linux is fine for functionality but terrible for performance. I think 2022 is going to be about performance. If you have any recommendations for Windows laptops I would love to hear them.

These perf numbers would make a great wiki or Performance.md page for libplctag.NET. People are always wondering how to get the best performance. Since no one (no one sane, at any rate) uses the raw C API, it makes sense to have this stuff be wrapper-specific.

What about other tests like creating 2000 threads, each with one tag? Or 2000 fibers? Is that close to Tasks? The threading one is close to the thread stress test I run. It just pounds away without delays between reads. The async stress test is maybe close to your ... not sure which test that is closest to.

Amazing work here. Really nice data!

@jkoplo
Member

jkoplo commented Dec 22, 2021

So, surprisingly, that's on WSL1. My personal laptop is a Dell XPS 15 9550 - essentially the consumer version of the Dell Precisions I use at work. My Dell won't be getting Windows 11, which might factor into your buying decision, but otherwise an older XPS would be a good test mule.

One thing I'm a bit unclear about with libplctag base is how/when it decides to pack requests. Is there a slight delay like "let me see if I get another request within 5 msec that I can pack into this one" or something else?

The WSL performance increase is what's unexpected - reading 2000 tags sequentially, one after another, and waiting for each read to complete (the ForEach cases) makes sense at 30 seconds - that's 15 msec per read of a single tag, which seems reasonable. I don't know why that would be faster on Linux without packing (and why it doesn't affect the other cases).

Yeah I could do some other tests. I really think having tons of threads waiting on network IO is bad - that's basically why Async/Await exists in C# - to handle non-CPU bound work in 'parallel'.

@kyle-github
Member

The latest release, 2.4.8, has some updates that let me support at least 100k tags using only the automatic read/write system. The changes are not big but they did make it work better on my systems. It was taking up about 300% of the CPU and now takes up about 125% (multicore). That's on macOS which is... uh... not the greatest at networking speed. Of course, it is the M1 CPU and that thing is crazy fast. Blows the doors off of my old Dell XPS 13.

The test program even does a binary search lookup to find the tag entry, indexed by the tag handle/ID. It does that on every callback for every tag.

@kyle-github
Member

@jkoplo Currently, there is a queue of requests (one of the things I want to change as it could be a source of open-ended memory allocation in some circumstances). The thread handling IO over a connection to a CIP-based PLC takes all the requests that fit and packs them into a single packet. It sends that packet and waits for a response. It unpacks the response to each of the requesting tags. Then it does it again. The stage where it waits for a response is where additional requests can queue up. It does not wait. Any requests that fit when it looks get packed together.

This is partly why using a thread per tag or a large number of threads is less performant than using a single thread and lots of tags in async mode. The single thread rips through creating the requests and thus tends to keep the queue full. Threads will have different timing. For instance, suppose you have ten threads each managing one tag. If four of them queue requests before the IO thread looks for work, then those four will get packed into a request to the PLC. The other six will come in later and may end up in a separate packed request.

In practice, various other factors tend to perform some level of synchronization. For instance the OS tends to cause threads to start on specific time boundaries (c.f. the recent changes to make Windows perform well where it used increments of 15ms or so as the base "tick" for the context switch). In turn that tends to group work together at specific times. In real-world situations this is usually not a problem but I can see 20-30% changes in performance between a "good" pattern and a "bad" pattern when doing stress testing.

@jkoplo
Member

jkoplo commented Dec 28, 2021

Interesting, but I have no way of managing the queue, right?

Say I want to read 2000 tags at a time interval (every 5 sec for example sake). The request queue is empty. I start my read requests. I'm guessing that the base library is waiting on an empty queue and will build a CIP packet as soon as it sees some requests. Probably that first request packet goes out much smaller than the max/ideal size. Then the other requests queue up while the base library waits for a response. The second request will be fully packed since the queue has a bunch of items. And maybe the third just includes the "leftovers".

Is that about right? Is it worth pursuing an optimization where a send doesn't occur until either 1) a full packet of data is in the queue or 2) no queue items have been added for X milliseconds, where X could be zero or greater?

Or does that just come out in the wash since it's only the first request?

@kyle-github
Member

If you had that many requests, it would just be the first request. After that, they'd all get packed, and probably maximally. I've done a fair amount of testing and found that you have to try fairly hard (my stress test programs are designed to do this) to make it not work well. And even those programs tend to perform quite well. I can easily trash the PLC to the point I need to power cycle it. With a Raspberry PI.

Is it 100% perfect? Nope. That said, everything else I tried to make things more efficient ended up being large amounts of complicated code and/or have really bad edge cases that were easy to hit with real world situations. I am sure that someone smarter than me could figure out something that was simple and slightly more efficient. But only slightly.

With the automated sync system, I do have a delay on writes. Once you change a value, I mark the tag as "dirty" and then wait some period of time to collect more changes to the tag. Then the library writes out the value and resets the flag. That is to allow you to change all the elements of an array (or a string) before the write happens. Even that is best effort.

I did some testing by setting a similar delay for all requests to batch things up. In practice I saw no real difference in efficiency at high work loads. If you have so little work to do that you need to wait 100ms or so for a full packet, then you are not stressing the PLC at all. If you are doing so much that you are stressing the PLC, then you are likely maximally packing requests already.

I still might do it for situations where the packet count is a problem. But I need to move everything over to using the method of queuing I used for Modbus first. There each tag with an outstanding request is queued, not an individual (and individually allocated) request separate from the tag. I have found that PLCs tend to do better with a steady flow of small requests than a bursty flow of large ones. The load stays lower overall (at least for my ENBTs) and I see fewer signs of delayed clean up on connection drops. So I am actually moving a little bit in the opposite direction by intentionally spreading out automatic read requests to avoid bunching. I think I may have seen a slight pattern of impact of large requests on scan times too, but never was able to measure it.

Still at some point it should probably be a choice of the user, not me.

@jkoplo
Member

jkoplo commented Dec 28, 2021

So that sounds like what I started to realize - any performance loss here is probably completely academic and in the single- to low-double-digit millisecond range - which is probably what I would set the timeout to anyway.

Nothing to worry about - at least until I dig out a 56k modem and try to dial into a PLC... (joking).

I've hijacked this thread into a performance discussion. We should get back to @ShadowMarker789's actual issue at hand.

@kyle-github
Member

Where are we on this?

@timyhac
Collaborator

timyhac commented Jan 19, 2022

My view is that TryXYZ versions of methods should be developed.

@ShadowMarker789
Author

For any TryXYZ methods, my personal preference would be for them to return the LibPlcTag.Status enum if they are synchronous, or Task<LibPlcTag.Status> if they are asynchronous.

This way you can easily write if-checks to execute code on success or failure.

That said, the TryXYZ methods would only be relevant for Initialize/Write/Read and their async variants, right?

As far as I can tell, I've not had any exceptions thrown when setting bits or getting information from the tag, so long as the bounds and types are okay.

@timyhac
Collaborator

timyhac commented Jan 20, 2022

Is there a reason you would want to use the status to differentiate between the failure modes? The TryXYZ methods I've seen have all returned a boolean for success/failure.

@jkoplo
Member

jkoplo commented Jan 24, 2022

So I picked up my benchmark code (from above) and played with it a little more. Focusing on the most performant ReadWhenAllAsync I set it up to read 2000 DINT tags in a loop. This takes about 1000 msec per loop iteration.

A little explanation of what's happening here: We're creating 2000 dotnet Tasks. A Task can be a thread, but generally it's an async/await construct that yields its thread back to the thread pool while it waits on non-CPU-bound work - disk IO, network IO, etc. In this case, we create a waithandle that gets tied to the libplctag callback. This means that when I throw a breakpoint in and check my diagnostics, I have 2000+ tasks, but "only" 22 threads. There's definitely room for improvement between dotnet and the base library here (but that's a topic for another day).

While it's reading in a loop, if I pull the network cable I see the timeout exception get caught. I don't see any CPU or memory spike (outside of the spikes we get from creating waithandles [cough cough]) and the exception is processed quickly.

All this is to say I'm not against the TryXYZ pattern, but I don't think it's the exceptions that are causing your performance issues...

@ShadowMarker789
Author

Is there a reason you would want to use the status to differentiate between the failure modes? The TryXYZ methods I've seen have all returned a boolean for success/failure.

Yes. Some of the devices are actually misconfigured, so we'll often get ErrorNotFound, ErrorBadParam, ErrorOutOfBounds, etc.

Which makes me sad, but the different failure modes help us diagnose what exactly is wrong.

@jkoplo
Member

jkoplo commented May 28, 2022

@timyhac what are we thinking on the TryXYZ pattern? Worth implementing?

@ShadowMarker789 are you working past this issue?

@ShadowMarker789
Author

@timyhac what are we thinking on the TryXYZ pattern? Worth implementing?

@ShadowMarker789 are you working past this issue?

We're working well enough for now.

The TryXYZ pattern would be better for us in terms of code structure for our use case, where if any operation fails we just set a few additional properties and don't take any further action. The catch handling is making the code messier than I like, but it does work.

There's the argument I oft hear that "Exceptions should be thrown in exceptional circumstances" which raises questions about whether an operation failing because of a timeout is exceptional or not.

I've done a fair amount of Win32 C# usage by now and am used to continuously checking HRESULT of DX calls and similar so I would have no complaints about the (3) option of above.

That said, given that the Read, Write, and Initialize calls don't return anything (they're void methods), it shouldn't be too much effort to return a libplctag Status enum from the TryXYZ variants of those?

@timyhac
Collaborator

timyhac commented Jun 21, 2022

@jkoplo - I think so. I did some work a few months ago on this which is in branch #224.

The way it is written now is non-standard for .NET libraries - i.e. returning a StatusCode is not the norm for TryXYZ - it should return a bool instead. The change would be straightforward but I just haven't gotten to it.

Tim

@popew

popew commented Jul 18, 2022

@timyhac: Just don't call it TryXYZ if you want to return codes, but don't want to break the convention.

Alternatives:

  1. Provide an out argument: bool TryXYZ(out Status status);
  2. How about using a result object?
public class Result
{
    public bool IsSuccess { get; }
    public bool IsFailure => !IsSuccess;
    public Status Status { get; }

    public Result(Status status)
    {
        Status = status;
        IsSuccess = status == Status.Ok; // e.g. map an Ok status to success
    }
}

@ShadowMarker789
Author

@timyhac: Just don't call it TryXYZ if you want to return codes, but don't want to break the convention.

Alternatives:

  1. Provide an out argument: bool TryXYZ(out Status status);
  2. How about using a result object?
public class Result
{
    public bool IsSuccess { get; }
    public bool IsFailure => !IsSuccess;
    public Status Status { get; }

    public Result(Status status)
    {
        Status = status;
        IsSuccess = status == Status.Ok; // e.g. map an Ok status to success
    }
}

Resurrecting this: I'd rather not use a dedicated Result object if it's a class, because that's another allocation per operation and we're already getting smashed by the GC in our current application.

After a performance review, we are figuratively melting a few CPUs here and there, but our application is stable after incorporating the various improvement ideas everyone here suggested, which were very helpful.

I like your idea of bool TryXYZ(out Status status) as that's very convenient for making a choice about what one should do when the operation does or does not succeed.

@timyhac
Collaborator

timyhac commented Oct 27, 2022

@ShadowMarker789 - if you want to give it a go/review it - #224 contains an implementation for the TryXYZ pattern.
Another option: we could look at multi-targeting and using ValueTask for the async methods.

It's quite interesting that exceptions and garbage collection are contributing such a large amount of overhead. Do you know why it is so significant?

@ShadowMarker789
Author

It's quite interesting that exceptions and garbage collection are contributing such a large amount of overhead. Do you know why it is so significant?

There's a lot of pre-existing code that was written by people unfamiliar with managed languages. Completely unrelated to libplctag.Net. We have always had a lot of GC pressure, but I am not allotted the time or resources to alleviate any of it, so instead I'm trying very hard to make this scenario not any worse than it has to be.


I may see if I can run some tests on the branch. Your above comment links to this issue instead of branch 224.

@airwedge1

I think creating a method ReadMultipleTags(ITag[] tags) that does not throw an exception and returns a collection of "Statuses" would resolve the problem. Only one Task is created, and it abstracts away the fact that multiple calls are being made to the C library.

Existing code is not touched and no breaking changes are required.

It would be "nicest" if the C library also accepted a batch, so that the "first call" would batch many tags instead of only one in the first round trip, but we're probably only talking about a millisecond or two of benefit, so it probably isn't worth it.

@timyhac
Collaborator

timyhac commented Jan 27, 2023

Adding API methods would be a minor version increment. The proposal also just adds new APIs.

In the proposal you could do something like this, which would be equivalent:

await Task.WhenAll(tags.Select(tag => tag.TryReadAsync()));
var statuses = tags.Select(tag => tag.GetStatus());

Although I guess you'd probably want to keep track of which status came from which Tag.
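
One way to do that pairing (still assuming the proposed TryReadAsync):

await Task.WhenAll(tags.Select(tag => tag.TryReadAsync()));
var statusByTag = tags.ToDictionary(tag => tag, tag => tag.GetStatus());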

There is an idea of a "Tag Group" which has come up from time to time which you may be interested in - see here.

Note that the core library will batch requests if it can (i.e. when performing concurrent requests).

@jkoplo
Member

jkoplo commented Feb 6, 2023

I have a branch that would do something like this - a 'batch read' call. I like it because:

  1. There are some optimizations that can be done behind the scenes if you know you're waiting on a bunch of tags to raise an event (which is what the base library does). For example, you only need one TaskCompletionSource which saves allocations since they don't get re-used easily.
  2. It makes things easier for people. Reading multiple tags and waiting for a value back from all of them is a common operation and one that we've seen several issues around.

That said, I need to prove out the implementation and benchmark it against current to make sure it's an actual savings. It's also kind of hard to implement with our current abstractions/layers. I had to do some re-org to not change the public API.

If there's interest that would probably motivate me to do this sooner rather than later.

@airwedge1

I do like the idea of a separate batch read call / tag group, but the points that timyhac made with the TryReadAsync pattern will solve all my pain points.

It does seem "weird / non-performant" to need to create 100 tasks when you want to batch-read a bunch of tags, but from my testing and what I've read, there will be very little performance gain from reducing it to one task. The value is that it would just make me/us feel better. The leaders in here would need to agree on what the pattern should look like, to avoid spending time on something just to have it rejected from getting merged.

@kyle-github
Member

Is this a case where I should provide a base function in the C library?

@airwedge1

@kyle-github I do think providing a batch-read-specific call in the base library would "feel the best" and would also, I think, avoid the more complicated code that @jkoplo was referring to.

To be clear, though: in my opinion I have doubts that this will make any statistical difference, and I'm not sure the value is there to justify the effort. It depends on how much work you think is necessary to make the change.

@kyle-github
Member

I've had to reimplement that functionality in every async C example, poorly. Doing it in the library would be somewhat reasonable. The only challenge is what to do if people do things like start a batch of async operations in one thread and in another thread dispose of some of the tags. And then where do I store the callback info? Right now that is on each tag.

@airwedge1

I figured you would keep the tag objects and the callbacks per tag. I envisioned it as just a wrapper function that waits to receive the callbacks for all tags and then returns all the results at the same time. Less important would be adding something to tell the queue to wait so that the first packet can be "full" in batch scenarios instead of containing only one tag; that would also save a couple more milliseconds.

Not sure about your threading question. If what you are describing could cause a problem, it seems like the same thing would be possible now. People just shouldn't do that, and they would get an exception.

@kyle-github
Member

There are at least two operations that make things tricky. The first is having the application destroy a tag handle while a group async operation is pending. That is solvable. The second concerns ownership of the group itself. Assuming that the API looks something like this:

int plc_tag_group_read(int32_t *tags, int num_tags, int timeout);

Then who owns the tags array? If the application owns it, then the application could free it and cause the entire app to crash. I try very hard to make that impossible currently. If I copy the array, then I am adding more allocation to the internal operations of the library, which I am trying to eliminate.

All this is solvable, but involves tradeoffs that are not always understood at first. For instance, the above looks simple but doesn't have a callback into the app. It also doesn't have a way to give the individual tag operation statuses, though you could just read the statuses of the individual tags.

@jkoplo
Member

jkoplo commented Feb 9, 2023

So I actually think the current API is reasonable and writing a wrapper around it makes sense.

In dotnet:

  • Take in IEnumerable<Tag>
  • Create a hashset/dictionary for all tags
  • Subscribe to tag read callback
  • Start async read on each tag
  • When a callback is received, remove the tag from the hashset; if the hashset is empty, set/return the task completion source

That's the basics. Obviously there are some timeouts and corner cases, but I think that strategy should work.
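
A rough sketch of that wrapper, assuming the Tag class exposes a ReadCompleted event (EventHandler<TagEventArgs>) alongside ReadAsync - names from memory, so verify against the actual API, and real code would also need to observe timeout/Aborted paths:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using libplctag;

public static class TagBatchReader
{
    // Not part of the library - just an illustration of the strategy above.
    public static Task ReadAllAsync(IEnumerable<Tag> tags)
    {
        var tagList = tags.ToList();
        var pending = new HashSet<Tag>(tagList);
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

        void OnReadCompleted(object sender, TagEventArgs e)
        {
            var tag = (Tag)sender;
            tag.ReadCompleted -= OnReadCompleted;

            bool allDone;
            lock (pending)
            {
                pending.Remove(tag);
                allDone = pending.Count == 0;
            }
            if (allDone) tcs.TrySetResult(true);
        }

        foreach (var tag in tagList)
        {
            tag.ReadCompleted += OnReadCompleted;
            // Kick off the read; completion is observed through the shared handler,
            // and per-tag results can be checked afterwards with tag.GetStatus().
            _ = tag.ReadAsync();
        }

        return tcs.Task;
    }
}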

Performance-wise, trying to move this into the C library probably won't get us a ton. It would save a bunch of events getting marshaled between C and dotnet, but I doubt those are much of an impact.
