MessageQueue is always "Available" problem #77

Open
aflteam opened this issue Feb 13, 2020 · 5 comments
aflteam commented Feb 13, 2020

I have a problem with the MessageQueue in the CS104 simple server mode.

The CS104_Slave_enqueueASDU() function adds new ASDUs to the message queue and increments self->entryCounter.

After an ASDU has been sent and confirmed (MessageQueue_markAsduAsConfirmed), the MessageQueue should release the message and decrement entryCounter, but that step is missing.
Because of that, connectionHandlingThread always acts as if isAsduWaiting were true.

Even when the MessageQueue_getNextWaitingASDU() function returns NULL, which means there is no data left to send, MessageQueue_isAsduAvailable() still returns true. The same behaviour shows up in the debug print below:
ASDUs in FIFO: 9 (new(size=28/12): 0x39440, first: 0x39360, last: 0x39440 lastInBuf: 0x39440)

The number of ASDUs in the FIFO constantly increases even after messages are sent, and the first queue pointer stays fixed and is never released.
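For context, a minimal sketch of how a slave fills that queue, assuming the usual lib60870-C slave API; the port, IOA, and measurement value are placeholders:

```c
#include "cs104_slave.h"
#include "hal_thread.h"

int main(void)
{
    /* queue capacity: 100 low-priority and 10 high-priority ASDUs */
    CS104_Slave slave = CS104_Slave_create(100, 10);

    CS104_Slave_setLocalPort(slave, 2404);
    CS104_Slave_start(slave);

    CS101_AppLayerParameters alParams = CS104_Slave_getAppLayerParameters(slave);

    /* build one periodic measurement ASDU (IOA 110 and the value are placeholders) */
    CS101_ASDU asdu = CS101_ASDU_create(alParams, false, CS101_COT_PERIODIC, 0, 1, false, false);

    InformationObject io = (InformationObject)
        MeasuredValueScaled_create(NULL, 110, 1234, IEC60870_QUALITY_GOOD);

    CS101_ASDU_addInformationObject(asdu, io);
    InformationObject_destroy(io);

    /* this call puts the ASDU into the message queue and increments its entry counter */
    CS104_Slave_enqueueASDU(slave, asdu);

    CS101_ASDU_destroy(asdu);

    Thread_sleep(1000);

    CS104_Slave_stop(slave);
    CS104_Slave_destroy(slave);

    return 0;
}
```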

mzillgith added a commit that referenced this issue Feb 18, 2020
…see #77)

- CS104 master: confirm all received I messages before sending STOPDT ACT or closing the connection
- CS104 master: add additional semaphore to protect write to connection socket
aflteam commented Feb 24, 2020

Hey @mzillgith,

I am testing the latest commit, and the ASDU FIFO is now decremented after confirmation.
Thanks for the improvement.

I created a test case that works as a redundancy server (examples/cs104_redundancy_server).
Two different connections are established, each with its own redundancy group, and the clients' IP addresses are added as allowed IPs (see the configuration sketch below).

While both communications are ongoing, they receive the same amount of ASDU data.
I unplug the Ethernet cable of one of the clients, wait a little while, and then plug it in again. During that time the ASDU FIFO of the unplugged connection grows while the other one is still consumed down to zero. After I re-plug the cable and communication recovers, I expect the accumulated ASDUs to be released and both received-ASDU counters to end up equal. But that is not the case: most of the ASDUs are recovered, but some of them are lost in the process.

@mzillgith

Hi @aflteam

Thanks for your feedback.

In redundancy group mode there are separate queues for the different redundancy groups. Depending on the storage capacity of the queue, it can overflow when the connection is interrupted for some time. In this case messages can be lost.
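If queue capacity is the limiting factor, it is set by the sizes passed at slave creation; a minimal sketch, assuming those sizes also apply to the per-group queues (the numbers are placeholders):

```c
#include "cs104_slave.h"

int main(void)
{
    /* larger low-priority queue: buffer up to 200 low-priority and 50
       high-priority ASDUs before the queue overflows and messages can be lost */
    CS104_Slave slave = CS104_Slave_create(200, 50);

    /* ... add redundancy groups and start the slave as in the example above ... */

    CS104_Slave_destroy(slave);
    return 0;
}
```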

Can this be the reason for your observed ASDU loss?

aflteam commented Mar 13, 2020

Hey @mzillgith,

I checked the test case again. The queue is not filled with many messages. Assume that both connections (redundancy groups) are working properly at the beginning and the queues are empty. Then the cable disruption causes a connection timeout. After the timeout the queue starts to grow; when it reaches only about 50 ASDUs, I plug the cable back in. Comparing the working line with the disrupted one, the total numbers of ASDUs are not the same. I think the timeout mechanism causes this.

If the same test is done the following way, I see no problem: one connection keeps working normally while the other one is intentionally closed from the master side, so its ASDU queue grows. After the connection is re-established, the received ASDU counts are the same (see the sketch below for what such an intentional close looks like on the master).

The difference is a connection timeout (communication error) versus an intentionally closed connection (communication disconnect).
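For clarity, a minimal sketch of that intentional close on the master side, assuming the lib60870-C master API (the address and timings are placeholders):

```c
#include "cs104_connection.h"
#include "hal_thread.h"

int main(void)
{
    /* connect to the slave (address and port are placeholders) */
    CS104_Connection con = CS104_Connection_create("192.168.1.1", 2404);

    if (CS104_Connection_connect(con)) {
        CS104_Connection_sendStartDT(con);

        Thread_sleep(5000); /* receive data for a while */

        /* orderly shutdown: STOPDT ACT, then close the socket, so the
           slave sees a clean disconnect instead of running into a timeout */
        CS104_Connection_sendStopDT(con);
        CS104_Connection_close(con);
    }

    CS104_Connection_destroy(con);
    return 0;
}
```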

@nicolatimeus

Hi all,

I've tested 651691a as well and it seems to be a good improvement: it reduces CPU usage a bit and it prevents message losses on reconnection.

@aflteam, I've performed similar tests and I've seen that with 651691a the library will likely transmit some messages that appear as duplicates to the master.
This is caused by the fact that if the connection is dropped for some reason, the queue will likely still contain messages that have been sent by the slave, and possibly also received by the master but not yet confirmed by it with an S frame (or the S frame was sent by the master but lost due to the connection drop). At reconnection the slave will retransmit the messages that have not been confirmed.
The number of retransmitted messages should be less than or equal to the value of the k parameter.
I did not observe loss of messages so far.
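To illustrate where that bound comes from, a minimal sketch of reading and adjusting k on the slave, assuming CS104_Slave_getConnectionParameters returns the APCI parameters as in the lib60870-C API:

```c
#include <stdio.h>
#include "cs104_slave.h"

int main(void)
{
    CS104_Slave slave = CS104_Slave_create(100, 100);

    /* k = maximum number of unconfirmed I frames, which also bounds how many
       messages can be retransmitted (and thus duplicated) after a reconnect */
    CS104_APCIParameters apciParams = CS104_Slave_getConnectionParameters(slave);

    printf("k = %i, w = %i\n", apciParams->k, apciParams->w);

    apciParams->k = 12; /* 12 is the standard default; adjust as needed */

    CS104_Slave_destroy(slave);
    return 0;
}
```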

For this reason I think that, with the current implementation, if you fill the queue with a given sequence of messages and then drain it with or without connection drops, the sequence of messages received by the master in the two cases will likely be different. Can this explain the behaviour you are observing?

@mzillgith can you please confirm this, and also that the retransmission mechanism is compliant with the spec? Do you know when the next release with that change will be available?

Thanks

@mzillgith

@nicolatimeus Your explanation seems correct to me. The retransmission mechanism is designed to avoid lost messages. But it cannot avoid duplicate messages. When the connection is lost after the message is sent but before the confirmation is received, there is no way for the slave to know if the message was processed by the client and therefore the message remains in the queue and will be resent later when the connection is established again. This is my understanding of the standard.
