Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why dash_sai_create_dash_acl_rule create multiple SAI_OBJECT_TYPE_DASH_ACL_RULE objects? #436

Open
kcudnik opened this issue Sep 9, 2023 · 32 comments
Labels
question Further information is requested

Comments

@kcudnik
Copy link
Collaborator

kcudnik commented Sep 9, 2023

When sai implementation is generated, functions is added:

 176 static sai_status_t dash_sai_create_dash_acl_rule(
 177         _Out_ sai_object_id_t *dash_acl_rule_id,
 178         _In_ sai_object_id_t switch_id,
 179         _In_ uint32_t attr_count,
 180         _In_ const sai_attribute_t *attr_list)

in generated file: dash-pipeline/SAI/lib/saidashacl.cpp

but internally i see 5 times, so 5 times object is created;

objId = dashSai->getNextObjectId((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE);

and then inserted:

if (false == dashSai->insertInTable(matchActionEntry, objId)) {

but insertion is with the 5 different objects,

that generated code for that functions process in loop several attributes which are duplicated in each process line

can anyone explain my why this is happening ? this function should create only 1 single SAI objects, perhaps multiple bmv2 objects internally, but from SAI perspective only one, i see this as a bug, please explain

@kcudnik kcudnik added the question Further information is requested label Sep 9, 2023
@chrispsommers
Copy link
Collaborator

@marian-pritsak can best explain this, it's a byproduct of some implementation choices, not a "bug."

@kcudnik
Copy link
Collaborator Author

kcudnik commented Sep 9, 2023

further more if we grep actionId:

  24     pi_p4_id_t actionId = 0;
  45     actionId = 25655048; // SAI_DASH_ACL_GROUP_ACTION_SET_ACL_GROUP_ATTRS
  48     action->set_action_id(actionId);
 193     pi_p4_id_t actionId = 0;
 283                     actionId = 32161567;
 289                     actionId = 20706700;
 295                     actionId = 28146588;
 301                     actionId = 31424218;
 314     action->set_action_id(actionId);
 423                     actionId = 32161567;
 429                     actionId = 20706700;
 435                     actionId = 28146588;
 441                     actionId = 31424218;
 454     action->set_action_id(actionId);
 563                     actionId = 32161567;
 569                     actionId = 20706700;
 575                     actionId = 28146588;
 581                     actionId = 31424218;
 594     action->set_action_id(actionId);
 703                     actionId = 18858683;
 709                     actionId = 24263137;
 715                     actionId = 29962337;
 721                     actionId = 26077229;
 734     action->set_action_id(actionId);
 843                     actionId = 18858683;
 849                     actionId = 24263137;
 855                     actionId = 29962337;
 861                     actionId = 26077229;
 874     action->set_action_id(actionId);
 983                     actionId = 18858683;
 989                     actionId = 24263137;
 995                     actionId = 29962337;
1001                     actionId = 26077229;
1014     action->set_action_id(actionId);

we can see that those actions repeat for each object, i would like to have some explanation on this, i would really like to stick in creating single SAI object here

@kcudnik
Copy link
Collaborator Author

kcudnik commented Sep 9, 2023

@marian-pritsak can best explain this, it's a byproduct of some implementation choices, not a "bug."

@marian-pritsak
im not sure, i dont understand entire code yet, but in this create function multiple objects are created, and when you call remove, only 1 objects i removed, unless remove is somehow called multiple times outside SAI which make no sense here
also i noticed that insert table and remove are into multi map, but i don;t know how this make sense ether, if each object have unique ID, so there should be no duplicated objects in the multimap

@saad-mzhr saad-mzhr reopened this Sep 10, 2023
@kcudnik
Copy link
Collaborator Author

kcudnik commented Sep 20, 2023

Any update on this ?

@Pterosaur
Copy link
Collaborator

I remember this is by design. Because we have 5 stages for each ACL direction, we always manage the ACL by ACL group instead of STAGE.
So, in the current design, we insert the ACL rules to every stage and use the group name as the table key to differentiate the stage.
If we want to insert an ACL rule only into one stage, that will increase the complexity of hardware/software design.

@marian-pritsak Please correct me if anything is misunderstood.

@kcudnik
Copy link
Collaborator Author

kcudnik commented Sep 20, 2023

one SAI api calls should not create 5 different SAI objects

@KrisNey-MSFT
Copy link
Collaborator

Adding @prsunny and @r12f in case they have the bandwidth or knowledge

@chrispsommers
Copy link
Collaborator

@marian-pritsak to review again

@KrisNey-MSFT
Copy link
Collaborator

@marian-pritsak responded during the week, and plans to review, per 9/27 Community Call

@KrisNey-MSFT
Copy link
Collaborator

Looks like a bug, need to find time to look for a solution. If anyone else in the Community could help or provide a pointer, please do :)

@KrisNey-MSFT
Copy link
Collaborator

Using SAI to express the interface, the backend is the issue here.

@KrisNey-MSFT
Copy link
Collaborator

@AmithGspn offered to take a look, assigning to him

@KrisNey-MSFT
Copy link
Collaborator

hi @AmithGspn - just curious whether you and Meyappan had been able to check this out? TY!

@meyappank
Copy link
Collaborator

meyappank commented Nov 17, 2023

ubuntu@ip-172-31-30-182:~/DASH_/dash-pipeline/SAI/lib$ egrep -rn "acl_stage|SAI_OBJECT_TYPE_DASH_ACL_RULE" saidashacl.cpp
211: // For stage inbound_acl_stage1
218: objId = dashSai->getNextObjectId((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE);
222: DASH_LOG_ERROR("getNextObjectId failed for SAI_OBJECT_TYPE_DASH_ACL_RULE");
233: auto *md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id);
342: auto *md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id);
363: // For stage inbound_acl_stage2
370: objId = dashSai->getNextObjectId((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE);
374: DASH_LOG_ERROR("getNextObjectId failed for SAI_OBJECT_TYPE_DASH_ACL_RULE");
385: auto *md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id);
494: auto *md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id);
515: // For stage inbound_acl_stage3
522: objId = dashSai->getNextObjectId((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE);
526: DASH_LOG_ERROR("getNextObjectId failed for SAI_OBJECT_TYPE_DASH_ACL_RULE");
537: auto *md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id);
646: auto *md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id);
667: // For stage outbound_acl_stage1
674: objId = dashSai->getNextObjectId((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE);
678: DASH_LOG_ERROR("getNextObjectId failed for SAI_OBJECT_TYPE_DASH_ACL_RULE");
689: auto *md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id);
798: auto *md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id);
819: // For stage outbound_acl_stage2
826: objId = dashSai->getNextObjectId((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE);
830: DASH_LOG_ERROR("getNextObjectId failed for SAI_OBJECT_TYPE_DASH_ACL_RULE");
841: auto *md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id);
950: auto *md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id);
971: // For stage outbound_acl_stage3
978: objId = dashSai->getNextObjectId((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE);
982: DASH_LOG_ERROR("getNextObjectId failed for SAI_OBJECT_TYPE_DASH_ACL_RULE");
993: auto *md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id);
1102: auto *md = sai_metadata_get_attr_metadata((sai_object_type_t)SAI_OBJECT_TYPE_DASH_ACL_RULE, attr_list[i].id);

Explanation - Objects are created at stage levels. I see 3 inbound stages and 3 outbound stages. Total 6 stages. So, 6 objects are created. Guess this is as per design.

@kcudnik
Copy link
Collaborator Author

kcudnik commented Nov 17, 2023

Ok maybe this is required by bmv2 but this is still single SAI API call which should return single OID when create, and since 6 are created internally this is not they way it should go, since which object should be returned ? Similar question arises when doing later set/get/remove on such objects where internally are 6 created

tagging @marian-pritsak to take a look please

@meyappank
Copy link
Collaborator

meyappank commented Nov 29, 2023

Here is my comments... lets wait for @marian-pritsak comments to conclude..

Yes single API passing number of OIDs(groups/stages) to be created. In this case, it is asking 6 OIDs(group/stage) to be created.

static sai_status_t dash_sai_create_dash_acl_rules(
In sai_object_id_t switch_id,
In uint32_t object_count, <<<< Application requesting number of objects/stages to be created internally
In const uint32_t *attr_count,
In const sai_attribute_t **attr_list,
In sai_bulk_op_error_mode_t mode,
Out sai_object_id_t *object_id,
Out sai_status_t *object_statuses)

Base pointer to OID(OID[0] shall be returned to Application. Application can use the base OID + group id (OID[groupid]) to index the respective stage to set/get.

@kcudnik
Copy link
Collaborator Author

kcudnik commented Nov 29, 2023

Here is my comments... lets wait for @marian-pritsak comments to conclude..

Yes single API passing number of OIDs(groups/stages) to be created. In this case, it is asking 6 OIDs(group/stage) to be created.

static sai_status_t dash_sai_create_dash_acl_rules( In sai_object_id_t switch_id, In uint32_t object_count, <<<< Application requesting number of objects/stages to be created internally In const uint32_t *attr_count, In const sai_attribute_t **attr_list, In sai_bulk_op_error_mode_t mode, Out sai_object_id_t *object_id, Out sai_status_t *object_statuses)

Base pointer to OID(OID[0] shall be returned to Application. Application can use the base OID + group id (OID[groupid]) to index the respective stage to set/get.

but api im reffering to is singe create not bulk:

176 static sai_status_t dash_sai_create_dash_acl_rule(
 177         _Out_ sai_object_id_t *dash_acl_rule_id,
 178         _In_ sai_object_id_t switch_id,
 179         _In_ uint32_t attr_count,
 180         _In_ const sai_attribute_t *attr_list)

that single api inside is creating 6 oids internally

@meyappank
Copy link
Collaborator

Agreed. Based on the logs/evidences, I believe trigger is bulk create and not single create. We need to check why bulk create is triggered ?

@kcudnik
Copy link
Collaborator Author

kcudnik commented Nov 29, 2023

i dont think this is mistaken with bulk api, since number of object does not depend on any counter, since there is no counter in input params, this seems to me like limiatation of bmv2 or some other reason that im not aware, or maybe it was easier to implement ?

@KrisNey-MSFT
Copy link
Collaborator

ping to @r12f and @marian-pritsak to take a look

@kcudnik
Copy link
Collaborator Author

kcudnik commented Nov 29, 2023

also take a look at this part: (there are 5 or 6 of them in that function)

1136     if (false == dashSai->insertInTable(matchActionEntry, objId)) {
1137         goto ErrRet;
1138     }

so actually each of the generated objId is inserted into Table with matchActionEntry, and that table in dash SAI is actually used as

std::unordered_multimap<sai_object_id_t, std::shared_ptr<p4::v1::TableEntry> > m_tableEntryMap;

so multiple objects can be present at the same key, so i believe this was intended action to do this way, i still dont have enough knowledge to understand, whether this could be done in other way, but my intention suggest me that there should be only one objectId per SAI call generated, and if different internal indexes should be used for required bmv2 stages, then this should be handled in different manner

@kcudnik
Copy link
Collaborator Author

kcudnik commented Nov 29, 2023

and there is only one line (and should be only one):

1141     *dash_acl_rule_id = objId;

which uses last object id as output to the create function

@r12f
Copy link
Collaborator

r12f commented Nov 29, 2023

I have also looked at this issue, and I believe what Kamil said above is true. This looks to be intentional. And here is the detailed explanation in my understanding.

In short, this is caused by the combination result of 5 stages of ACLs and ACL group atomic swap requirement:

  1. First of all, all ACL rules are hold by ACL groups, while each ACL group has a unique ID to represent it.
  2. Each stage of ACL works as a dedicated match action table and each ENI can associate each table to any ACL group by updating its ID via it attributes.

Because of these 2 things, if we want to implement ACL rules directly using P4, then all tables will have to hold all ACL groups with all ACL rules in order to make the ENI ACL group attribute update work as expected. However, because 5 ACL rules will be created, the id of the last one is used as return result.

You might find this loop doesn't apply for ACL group, this is because an extra mostly no-op table is created for generating the ACL group with only the the first stage involved. From what I am seeing, this extra table is almost used for generating SAI API only, but not really populating the ACL group id into metatdata.

    @name("dash_acl_group|dash_acl")
    table acl_group {
        key = {
            meta.stage1_dash_acl_group_id : exact @name("meta.stage1_dash_acl_group_id:dash_acl_group_id");
        }
        actions = {
            set_acl_group_attrs();
        }
    }

I agree with Kamil, supposedly, only 1 entry should be inserted while only 1 id should be added. So the question is - can we associate a ACL group with a stage? E.g., adding an attribute in the ACL group to specify which stage it should go. Then we will know exactly which table it should go in the first place, hence only 1 entry will be created.

This might not only apply to BMv2, but also applies to real implementation. Otherwise when a new ACL group followed by ACL rules are added, we cannot really program the ASIC and have to save all the information in memory. The ASIC programming will has to be delayed until the ACL group ID on the ENI is changed, then all ACL rules can be sent to the right table. Considering the scale, this leads to unnecessary persistent memory usage (SAI implementation has to hold the ACL group in memory, even through it is programmed into ASIC). And I am not seeing use cases where 2 stages sharing the same ACL group.... (Why running the same ACL twice back to back?)

I will let @marian-pritsak and @kcudnik to take a look on my comment and see how reasonable it is. And then, we can move on from there.

@meyappank
Copy link
Collaborator

Reference - Previous discussion on the same issue(https://www.youtube.com/watch?v=bxitnRrQlhY).

My understanding from the recordings:
Fixed number of ACL table( 6 - 3 inbound and 3 outbound) are predefined for BMv2. All 6 tables have the same ACL rules only ACL group id differs. Meaning if you invoke "dash_sai_create_dash_acl_rule()", it will create the rule on all 6 ACL tables - this is applicable only for BMv2 and real implementation will have unique ACL tables and unique sai api. I guess this approach was taken in BMv2 for ease implementation.

@kcudnik
Copy link
Collaborator Author

kcudnik commented Nov 30, 2023

@r12f This sounds good, but as i mentioned, i have little to nothing knowledge about bmv2, and im not able to determine whether this is correct bahaviour or not, just my assumptions hot it should go

@KrisNey-MSFT
Copy link
Collaborator

KrisNey-MSFT commented Dec 6, 2023

When ACL group is created, can we say it is bound to a stage? @r12f @sharmasushant
Call out in the HLD that we bind 1 group to 1 stage
Group->Create Rules->ENI->Stage

Original SDN document
https://github.com/sonic-net/DASH/blob/main/documentation/general/sdn-pipeline-basic-elements.md#acl-levels

SONiC-DASH HLD document to possibly be updated
https://github.com/sonic-net/DASH/blob/main/documentation/general/dash-sonic-hld.md#325-acl

@sharmasushant
Copy link
Collaborator

Objects stored in SAI need to have 1:1 mapping with what is provided via DASH proto.
And SAI needs to give those objects as it is to sailib and vendor code.
There should be no logic in SAI. It should just be a proxy to forward dash objects to vendor datapath implementation.
So, forking one acl_group into multiple does not seem right to me.
This needs to be fixed.

@KrisNey-MSFT
Copy link
Collaborator

@r12f indicated this is not easy to fix, and is related to the leaf/spine design

@KrisNey-MSFT
Copy link
Collaborator

Just checking back - @marian-pritsak and @r12f - so we are ok with the multiple entries for now? @devenjagasia and @kcudnik for visibility

Could you guys read through @r12f 's comment above in the thread please? Riff Comment

@KrisNey-MSFT
Copy link
Collaborator

Dedicated meeting w/ @r12f and @kcudnik per Marian

@KrisNey-MSFT
Copy link
Collaborator

Per @r12f - we plan to meet on this separately in December 2024

@KrisNey-MSFT
Copy link
Collaborator

KrisNey-MSFT commented Dec 2, 2024

Need some refactoring in the first stage.
Each action is using a different index, and these are not handled now.
Per-action P4 indexing is a problem.
Currently all actions use the same index (for PL, ST, CA/PA mapping for example).
We have no SET/GET support - this needs to be enabled and placed into CI.
This is in current libsai, and are things that need to be fixed.
The libsai j2 template is too complicated to maintain.
There is a lot of duplication.
RE: the libsai generation in DASH.
Need to split this into multiple issues.
Simplify J2 template and libsai generation. @r12f did this for headers, not yet for libsai.
With a cleaner template we can solve Issues 1 by 1.
Of the 5 Stage ACL, maybe pick one ACL as 'representer' to track it.
Add Object ID to a list of Object ID mappings (multi-stage object ID)
How to place CI and build image pipeline?

@r12f @marian-pritsak @kcudnik @jimmyzhai for viz and/or help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests