Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace current c-style sessions to c++ #240

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

tszumski
Copy link
Collaborator

No description provided.


#include "utils.h"

class session
Copy link
Collaborator

@Sakoram Sakoram Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's capitalize every class name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean CamelCase or Camel_Snake ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to change snake_case, but personally I'm used to class names starting with capital letters, so for classes it would be Snake_case_started_with_capital_letter (however it is called) ^^

Copy link
Collaborator Author

@tszumski tszumski Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest JustForClassNames, to be consistent with gRPC/TCP class names

#include <thread>
#include <stdexcept>

using namespace std; // for atomic_bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first you write using namespace std; and after that sometimes in code there is std:: used. Let's unify approach.

I would explicitly use std::, it makes obvious that given function is standard one.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, then this change also apply for atomic_bool

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I hope it is possible to do without breaking C code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can do that


#include <mcm_dp.h>

#include "shm_memif.h" /* share memory */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include "shm_memif.h" /* share memory */
#include "shm_memif.h" /* shared memory */

int shm_deinit();

protected:
atomic_bool shm_ready;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you split public section into fields and methods, but in protected section you have both fields and methods. It is a little inconsistent.

Honestly, in my very limited experience I rarely encountered splitting the sections, but I'm ok with it as far as it is consistent.


#include "session-base.h"

#define MTL_ZERO_COPY
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This flag is added by build script, please remove it from the code.

Copy link
Collaborator Author

@tszumski tszumski Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous approach was wrong. Macro is defined in media_proxy subdirectory.
When you add #include "session-mtl.h" inside Unit Tests macro is alwas OFF, while compilation of "media_proxy" submodule happen with macro ON. This lead to smaller object allocation size compared to what library expected, and overall read/write outside of allocation happen

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO:
Remove ZERO_COPY macro from media_proxy/CMakeLists.txt

@@ -134,63 +136,9 @@ void* msg_loop(void* ptr)
if (it->id == session_id) {
/* return memif parameters. */
memif_conn_param param = { };
if (it->type == TX) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beautiful 🎉 I was devastated every time I looked at that spaghetti. good job

@@ -134,63 +136,9 @@ void* msg_loop(void* ptr)
if (it->id == session_id) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please change Failed to start MTL session. to Failed to start a session. a few lines above. ^^^^
Thanks. :)

int err;
memif_socket_handle_t memif_socket = memif_conn_args.socket;

do {
Copy link
Collaborator

@Sakoram Sakoram Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should implement graceful exit here. Currently we use pthread_cancel(), which could potentially create some problems as this thread handles events and calls functions like on_receive_cb(). Those functions lock mutexes, and interact with libfabric and MTL. A lot of funny stuff can happen when this is interrupted in random places.

We could set a little shorter timeout then infinity and check some new stop flag on every loop.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

int ret = 0;

/* Set application name */
strncpy(memif_socket_args.app_name, memif_ops->app_name,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
strncpy(memif_socket_args.app_name, memif_ops->app_name,
strlcpy(memif_socket_args.app_name, memif_ops->app_name,

to be sure that the string is NULL terminated. Should be done everywhere.

void rx_st20_mtl_session::cleanup()
{
if (frame_thread_handle) {
frame_thread_handle->join();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it won't cancel the thread. It would wait indefinitely. The stop variable has to be set for this thread to stop.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same issue exists in different places

Copy link
Collaborator Author

@tszumski tszumski Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stop variable is set in session_stop(), but i think i should completely remove that function and move everything to destructor

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but session_stop() is called from proxy_context.cc, and this join() will freeze the thread that called rx_st20_mtl_session(), which is the thread from proxy_context.cc which would call session_stop() otherwise.

Anyway, you are right. We need to leverage RAII as much as possible, we should move to destructors every cleanup that can be moved there.

Copy link
Collaborator

@Sakoram Sakoram Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, to cleanup when constructor fails and throws an exception we could:

  1. Use smart pointers whenever you allocate memory
  2. Do something like this when smart pointers aren't enough:
class ResourceWrapper {
    // Your resource management logic
public:
    ResourceWrapper () {
        // Acquire the resource
    }
    ~ResourceWrapper () {
        // Release the resource
    }
};

class Session{
    ResourceWrapper resource;  // RAII type
public:
    Session() {
        // If an exception is thrown here, `resource` will still be cleaned up
        if (/* some condition */) {
            throw std::runtime_error("Initialization failed");
        }
    }

    ~Session() {
        // `resource ` will automatically clean itself up
    }
};

@Sakoram
Copy link
Collaborator

Sakoram commented Oct 23, 2024

I'm thinking what to do with error handling.

  • In some places there is simply return -1 used. I don't like it.
  • Elsewhere, some semi-useful error codes ( like return -EINVAL ) are returned. That's better, the best what can be done in C, maybe only defining more descriptive err codes could improve the situation in C
  • But this is c++ and sometimes throw is used, which seems to be the most elegant way. maybe slightly less performant, but not much, I guess it is the best everywhere besides critical paths.

Any thoughts? Are there any drawback of using try-catch and throw everywhere?

return -1;
}

err = memif_buffer_alloc_timeout(memif_conn, qid, shm_bufs, 1, &rx_buf_num, frame_size, 10);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized a problem that I myself had introduced. This code is invoked from MTL's "scheduler" thread. This function has to be very quick and non blocking. I'm not even sure if normal memif_buffer_alloc can be used in it, but I'm sure that using memif_buffer_alloc_timeout is a very bad idea as this function could wait for slot in memif's ring and this will starve every session in MTL's scheduler that made call to this function.

I guess we have to use memif_buffer_alloc() instead of memif_buffer_alloc_timeout() and just drop frames if no memory is available in memif ring

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

@Sakoram
Copy link
Collaborator

Sakoram commented Oct 23, 2024

What do you think about splitting every class to a separate file? It could improve readability

session_ptr->type = RX;
mDpCtx.push_back(session_ptr);

return session_ptr->id;
}

int ProxyContext::RxStart_mtl(const mcm_conn_param *request)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Break this function into a few smaller ones. for example move device init into first function, move the case switch to second, call the two of them and set variables in this one.
  • change imtl_init_preparing to some real lock
  • the same applies to TxStart_mtl
  • please add a lock similar to imtl_init_preparing (but actually working) also to RxStart_rdma and TxStart_rdma

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "case switch part" could be even a separate class as it is basically a factory pattern. Calling it explicitly a factory could help readability.
(if we want to be very crazy, we could even use abstract factory with concrete factories for RX and TX. I guess this could also work, but this is a lot of abstraction)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any bigger changes to proxy_context.cc and api_server_tcp.cc should be done in the next PR to not to touch too many topics at once

Copy link
Collaborator Author

@tszumski tszumski Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Break this function into a few smaller ones. for example move device init into first function, move the case switch to second, call the two of them and set variables in this one.
  • change imtl_init_preparing to some real lock
  • the same applies to TxStart_mtl
  • please add a lock similar to imtl_init_preparing (but actually working) also to RxStart_rdma and TxStart_rdma

I think the best way is to move MTL/RDMA device creation before we accept any connection(both of them at once), and close those devices after we stop accepting connection. This way we avoid unnecessary locks and imtl_init_preparing will disappear. Thanks to that only mDpCtx have to be locked

Copy link
Collaborator

@Sakoram Sakoram Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we can define some arguments to media_proxy to specify what backend will be initialized at startup. maybe everything currently available should be initialized by default.


session *session_ptr = *ctx;
session_ptr->session_stop();
delete session_ptr;
mDpCtx.erase(ctx);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that you know that, but just a reminder to lock mDpCtx every time it is changed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any bigger changes to proxy_context.cc and api_server_tcp.cc should be done in the next PR to not to touch too many topics at once

@tszumski
Copy link
Collaborator Author

I'm thinking what to do with error handling.

  • In some places there is simply return -1 used. I don't like it.
  • Elsewhere, some semi-useful error codes ( like return -EINVAL ) are returned. That's better, the best what can be done in C, maybe only defining more descriptive err codes could improve the situation in C
  • But this is c++ and sometimes throw is used, which seems to be the most elegant way. maybe slightly less performant, but not much, I guess it is the best everywhere besides critical paths.

Any thoughts? Are there any drawback of using try-catch and throw everywhere?

  • We cannot use "throw" functionality in function that are called through callback by MTL/memif. We can only return error codes or print error messages
  • Generally for proper error(and messages/logs) handling we have to implement one consistent approach across entire MCM repository. I think this is quite large topic and should added as another PR

@tszumski
Copy link
Collaborator Author

What do you think about splitting every class to a separate file? It could improve readability

Good idea, only if we move source files into multiple subdirectories. Otherwise having too many files in single directory will reduce readability
My proposition:

media_proxy
    app
        media_proxy.cc
    servers
        tcp
            <tcp-related sources>
        grpc
            <grpc-related sources>
    sessions
        rdma
            <rdma-related sources>
        mtl
            <mtl-related sources>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants