
Using the default built-in transport in reliable mode, the last packet or the last several packets cannot be received by the subscriber #5280

Closed
1 task done
yuzu-ogura opened this issue Sep 30, 2024 · 14 comments
Labels
need more info Issue that requires more info from contributor

Comments

@yuzu-ogura

Is there an already existing issue for this?

  • I have searched the existing issues

Expected behavior

In reliable mode, all packets should be received by the subscriber.

Current behavior

The last packet, or the last several packets, are not received by the subscriber.

Steps to reproduce

// High-frequency send loop
int i = 0;
while (i < 1000) {
    datawriter->write(&userdata);
    ++i;
}

Fast DDS version/commit

v2.6.9

Platform/Architecture

Ubuntu Focal 20.04 amd64

Transport layer

Default configuration, UDPv4 & SHM

Additional context

No response

XML configuration file

No response

Relevant log output

No response

Network traffic capture

No response

@yuzu-ogura yuzu-ogura added the triage Issue pending classification label Sep 30, 2024
@JesusPoderoso
Contributor

JesusPoderoso commented Sep 30, 2024

Hi @yuzu-ogura,
can you provide a reproducer to determine the root cause of the issue?
Which QoS configuration are you applying? Have you considered using the TRANSIENT_LOCAL durability QoS, alongside the RELIABLE reliability QoS, to ensure reliability?
Is the HistoryQos configured as KEEP_LAST 1? In that case, you should consider increasing the depth.

@JesusPoderoso JesusPoderoso added need more info Issue that requires more info from contributor and removed triage Issue pending classification labels Sep 30, 2024
@yuzu-ogura
Author

Of course, @JesusPoderoso.
The DataWriterQos is as follows:

    DataWriterQos writerQos(DATAWRITER_QOS_DEFAULT);
    writerQos.resource_limits().max_samples = 1000;
    writerQos.history().kind = KEEP_ALL_HISTORY_QOS;
    writerQos.publish_mode().kind = ASYNCHRONOUS_PUBLISH_MODE;
    writerQos.reliability().kind = RELIABLE_RELIABILITY_QOS;
    writerQos.data_sharing().off();
    writerQos.endpoint().history_memory_policy = eprosima::fastrtps::rtps::PREALLOCATED_WITH_REALLOC_MEMORY_MODE;

The DataReaderQos is as follows:

    eprosima::fastdds::dds::DataReaderQos readerQos(DATAREADER_QOS_DEFAULT);
    readerQos.resource_limits().max_samples = 1000;
    readerQos.history().kind = KEEP_ALL_HISTORY_QOS;
    readerQos.endpoint().history_memory_policy = eprosima::fastrtps::rtps::PREALLOCATED_WITH_REALLOC_MEMORY_MODE;
    readerQos.data_sharing().off();
    readerQos.reliability().kind = RELIABLE_RELIABILITY_QOS;
    readerQos.durability().kind = TRANSIENT_LOCAL_DURABILITY_QOS;

All other QoS policies use the default values.
The send loop in main.cpp is as follows:

    while (index < 999) {
        HelloWorld hello;
        hello.index(++index);
        datawriter->write(&hello);
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }

The IDL is as follows:

struct HelloWorld
{
    unsigned long index;
    sequence<octet> points;
};

@JesusPoderoso
Contributor

@yuzu-ogura are the endpoints running in the same process?
In that case, do you wait for the subscriber to receive all samples before killing the publisher and ending the app execution?

@JesusPoderoso
Copy link
Contributor

Using the wait_for_acknowledgments method forces the publisher to wait until the sample has been acknowledged. You may consider calling it on the last sample to enforce that, @yuzu-ogura.

You may also consider checking the write call's return value, in case the write is not being performed (due to a timeout).
To address that, you can increase the max_blocking_time field of the reliability QoS.
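As an illustration of that suggestion, a minimal sketch assuming the Fast DDS 2.x API (the 5-second value is an arbitrary example):

```cpp
#include <fastdds/dds/publisher/qos/DataWriterQos.hpp>

using namespace eprosima::fastdds::dds;

// Sketch: enlarge max_blocking_time so a RELIABLE write() blocks
// longer waiting for history space instead of timing out and
// returning false.
DataWriterQos writer_qos = DATAWRITER_QOS_DEFAULT;
writer_qos.reliability().kind = RELIABLE_RELIABILITY_QOS;
writer_qos.reliability().max_blocking_time = eprosima::fastrtps::Duration_t(5, 0);  // 5 s (example)
```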

@yuzu-ogura
Author

@yuzu-ogura are the endpoints running in the same process? In that case, do you wait for the subscriber to receive all samples before killing the publisher and ending the app execution?

Yes. The pub and sub run separately on different hosts. The pub only exits after all the data has been sent, and the sub always sleeps in its main thread to avoid exiting.

@yuzu-ogura
Author

You may also consider checking the write call's return value, in case the write is not being performed (due to a timeout).
To address that, you can increase the max_blocking_time field of the reliability QoS.

So, if bool write() returns true, doesn't that mean the data was sent successfully?

@JesusPoderoso
Contributor

So, if bool write() returns true, doesn't that mean the data was sent successfully?

The write return value was not checked in the code snippet provided. write() returns true if the sample is successfully sent, but it can also return false; in those cases, your code was not detecting that the write call had failed.

@JesusPoderoso
Contributor

pub will only exit when all the data had been sent

In such a reliable case scenario, you should wait until all data has been received too.

@yuzu-ogura
Author

The write return value was not checked in the code snippet provided. write() returns true if the sample is successfully sent, but it can also return false; in those cases, your code was not detecting that the write call had failed.

I removed other non-essential code, including the return-value check, from the code snippet. Here are the pub and sub logs:
pub:
[publisher log screenshot]

sub:
[subscriber log screenshot]

@yuzu-ogura
Author

pub will only exit when all the data had been sent

In such a reliable case scenario, you should wait until all data has been received too.

This approach will block until the sub actually receives the data or a timeout expires. However, it may reduce the sending rate?

@yuzu-ogura
Author

@JesusPoderoso Sorry, I realized it was my fault. Thanks for your helpful replies. Just now I found another scenario: calling the write method before the pub and sub have matched still returns true. How can I solve that?

@JesusPoderoso
Contributor

This approach will block until the sub actually receives the data or a timeout expires.

Yes

However, it may reduce the sending rate?

No, because you only wait for the acknowledgment of the sent samples after all samples have been sent:

    while (index < 999) {
        HelloWorld hello;
        hello.index(++index);
        if (!datawriter->write(&hello)) {
            std::cout << "Could not send message with index " << index << std::endl;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    datawriter->wait_for_acknowledgments(eprosima::fastrtps::Duration_t(1, 0));  // wait up to 1 sec

@JesusPoderoso
Contributor

Just now I found another scenario: calling the write method before the pub and sub have matched still returns true. How can I solve that?

I would recommend, @yuzu-ogura, that you follow the best-practices structure of the refactored examples in Fast DDS v3, in particular the mechanism to determine that the endpoints have already matched before starting to send messages.
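That wait-for-match mechanism can be sketched with a DataWriterListener and a condition variable. This is a minimal illustration assuming the Fast DDS 2.x DDS-layer API; the class and method names (`MatchListener`, `wait_for_match`) are hypothetical, not part of the library:

```cpp
#include <condition_variable>
#include <mutex>

#include <fastdds/dds/publisher/DataWriter.hpp>
#include <fastdds/dds/publisher/DataWriterListener.hpp>

// Sketch: a listener that signals once at least one reader has matched,
// so the application can block before entering its send loop.
class MatchListener : public eprosima::fastdds::dds::DataWriterListener
{
public:
    void on_publication_matched(
            eprosima::fastdds::dds::DataWriter* /*writer*/,
            const eprosima::fastdds::dds::PublicationMatchedStatus& status) override
    {
        std::lock_guard<std::mutex> lock(mtx_);
        matched_ = status.current_count;
        cv_.notify_all();
    }

    void wait_for_match()
    {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return matched_ > 0; });
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    int matched_ = 0;
};
```

Usage would be to pass the listener when creating the writer (e.g. `publisher->create_datawriter(topic, writerQos, &listener)`) and call `listener.wait_for_match()` before the first write().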

@yuzu-ogura
Author

OK, thanks for your helpful replies.
