MessageQueue is always "Available" problem #77

Open
aflteam opened this issue Feb 13, 2020 · 5 comments
aflteam commented Feb 13, 2020

I have a problem with the MessageQueue in the CS104 simple server mode.

The CS104_Slave_enqueueASDU() function adds new ASDUs to the message queue and increments self->entryCounter.

After an ASDU has been sent and confirmed (MessageQueue_markAsduAsConfirmed), the MessageQueue should release the message and decrement entryCounter, but that step is missing.
Because of that, connectionHandlingThread always acts as if isAsduWaiting were true.

Even when the MessageQueue_getNextWaitingASDU() function returns NULL, which means there is no data left to send, MessageQueue_isAsduAvailable() still returns true. The same behaviour shows up in the debug print below:
ASDUs in FIFO: 9 (new(size=28/12): 0x39440, first: 0x39360, last: 0x39440 lastInBuf: 0x39440)

The number of ASDUs in the FIFO constantly increases even after messages are sent, and the first queue pointer stays fixed and is never released.
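For context, a minimal sketch of how a slave fills that queue, assuming the usual lib60870-C slave API; the port, IOA, and measurement value are placeholders:

```c
#include "cs104_slave.h"
#include "hal_thread.h"

int main(void)
{
    /* queue capacity: 100 low-priority and 10 high-priority ASDUs */
    CS104_Slave slave = CS104_Slave_create(100, 10);

    CS104_Slave_setLocalPort(slave, 2404);
    CS104_Slave_start(slave);

    CS101_AppLayerParameters alParams = CS104_Slave_getAppLayerParameters(slave);

    /* build one periodic measurement ASDU (IOA 110 and the value are placeholders) */
    CS101_ASDU asdu = CS101_ASDU_create(alParams, false, CS101_COT_PERIODIC, 0, 1, false, false);

    InformationObject io = (InformationObject)
        MeasuredValueScaled_create(NULL, 110, 1234, IEC60870_QUALITY_GOOD);

    CS101_ASDU_addInformationObject(asdu, io);
    InformationObject_destroy(io);

    /* this call puts the ASDU into the message queue and increments its entry counter */
    CS104_Slave_enqueueASDU(slave, asdu);

    CS101_ASDU_destroy(asdu);

    Thread_sleep(1000);

    CS104_Slave_stop(slave);
    CS104_Slave_destroy(slave);

    return 0;
}
```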

mzillgith added a commit that referenced this issue Feb 18, 2020
…see #77)

- CS104 master: confirm all received I messages before sending STOPDT ACT or closing the connection
- CS104 master: add additional semaphore to protect write to connection socket
aflteam commented Feb 24, 2020

Hey @mzillgith,

I am testing the latest commit, and the ASDU FIFO is now decremented after confirmation.
Thanks for the improvement.

I created a test case that works as a redundancy server (examples/cs104_redundancy_server).
Two different connections are established, each with its own redundancy group, and the clients' IP addresses are added as allowed IPs (see the configuration sketch below).

While both communications are ongoing, they receive the same amount of ASDU data.
I unplug the Ethernet cable of one of the clients, wait a little while, and then plug it in again. During that time the ASDU FIFO of the unplugged connection grows while the other one is still consumed down to zero. After I re-plug the cable and communication recovers, I expect the accumulated ASDUs to be released and both received-ASDU counters to end up equal. But that is not the case: most of the ASDUs are recovered, but some of them are lost in the process.

@mzillgith

Hi @aflteam

Thanks for your feedback.

In redundancy group mode there are separate queues for the different redundancy groups. Depending on the storage capacity of the queue, it can overflow when the connection is interrupted for some time. In this case messages can be lost.
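If queue capacity is the limiting factor, it is set by the sizes passed at slave creation; a minimal sketch, assuming those sizes also apply to the per-group queues (the numbers are placeholders):

```c
#include "cs104_slave.h"

int main(void)
{
    /* larger low-priority queue: buffer up to 200 low-priority and 50
       high-priority ASDUs before the queue overflows and messages can be lost */
    CS104_Slave slave = CS104_Slave_create(200, 50);

    /* ... add redundancy groups and start the slave as in the example above ... */

    CS104_Slave_destroy(slave);
    return 0;
}
```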

Can this be the reason for your observed ASDU loss?

aflteam commented Mar 13, 2020

Hey @mzillgith,

I checked the test case again. The queue is not filled with many messages. Assume that both connections (redundancy groups) are working properly at the beginning and the queues are empty. Then the cable disruption causes a connection timeout. After the timeout the queue starts to grow; when it reaches only about 50 ASDUs, I plug the cable back in. Comparing the working line with the disrupted one, the total numbers of ASDUs are not the same. I think the timeout mechanism causes this.

If the same test is done the following way, I see no problem: one connection keeps working normally while the other one is intentionally closed from the master side, so its ASDU queue grows. After the connection is re-established, the received ASDU counts are the same (see the sketch below for what such an intentional close looks like on the master).

The difference is a connection timeout (communication error) versus an intentionally closed connection (communication disconnect).
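For clarity, a minimal sketch of that intentional close on the master side, assuming the lib60870-C master API (the address and timings are placeholders):

```c
#include "cs104_connection.h"
#include "hal_thread.h"

int main(void)
{
    /* connect to the slave (address and port are placeholders) */
    CS104_Connection con = CS104_Connection_create("192.168.1.1", 2404);

    if (CS104_Connection_connect(con)) {
        CS104_Connection_sendStartDT(con);

        Thread_sleep(5000); /* receive data for a while */

        /* orderly shutdown: STOPDT ACT, then close the socket, so the
           slave sees a clean disconnect instead of running into a timeout */
        CS104_Connection_sendStopDT(con);
        CS104_Connection_close(con);
    }

    CS104_Connection_destroy(con);
    return 0;
}
```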

@nicolatimeus

Hi all,

I've tested 651691a as well and it seems to be a good improvement: it reduces CPU usage a bit and it prevents message losses on reconnection.

@aflteam, I've performed similar tests and I've seen that with 651691a the library will likely transmit some messages that appear as duplicates to the master.
This is caused by the fact that if the connection is dropped for some reason, the queue will likely still contain messages that have been sent by the slave, and possibly also received by the master but not yet confirmed by it with an S frame (or the S frame was sent by the master but lost due to the connection drop). At reconnection the slave will retransmit the messages that have not been confirmed.
The number of retransmitted messages should be less than or equal to the value of the k parameter.
I did not observe loss of messages so far.
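To illustrate where that bound comes from, a minimal sketch of reading and adjusting k on the slave, assuming CS104_Slave_getConnectionParameters returns the APCI parameters as in the lib60870-C API:

```c
#include <stdio.h>
#include "cs104_slave.h"

int main(void)
{
    CS104_Slave slave = CS104_Slave_create(100, 100);

    /* k = maximum number of unconfirmed I frames, which also bounds how many
       messages can be retransmitted (and thus duplicated) after a reconnect */
    CS104_APCIParameters apciParams = CS104_Slave_getConnectionParameters(slave);

    printf("k = %i, w = %i\n", apciParams->k, apciParams->w);

    apciParams->k = 12; /* 12 is the standard default; adjust as needed */

    CS104_Slave_destroy(slave);
    return 0;
}
```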

For this reason I think that, with the current implementation, if you fill the queue with a given sequence of messages and then drain it with or without connection drops, the sequence of messages received by the master in the two cases will likely be different. Can this explain the behaviour you are observing?

@mzillgith can you please confirm this, and also that the retransmission mechanism is compliant with the spec? Do you know when the next release with that change will be available?

Thanks

@mzillgith

@nicolatimeus Your explanation seems correct to me. The retransmission mechanism is designed to avoid lost messages. But it cannot avoid duplicate messages. When the connection is lost after the message is sent but before the confirmation is received, there is no way for the slave to know if the message was processed by the client and therefore the message remains in the queue and will be resent later when the connection is established again. This is my understanding of the standard.
