Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vendor custom attributes after SAI_*_ATTR_END #2099

Open
kcudnik opened this issue Nov 11, 2024 · 4 comments
Open

Vendor custom attributes after SAI_*_ATTR_END #2099

kcudnik opened this issue Nov 11, 2024 · 4 comments
Assignees

Comments

@kcudnik
Copy link
Collaborator

kcudnik commented Nov 11, 2024

This issue is related to recently discovered problem: sonic-net/sonic-buildimage#20725

During syncd boot sequence we are performing discovery operation on the device iterating GET via all OID attributes to figure out what objects are existing on the device.

Recent problem is that in syncd we use latest headers that for example added SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST attribute which was introduced in SAI version v1.15.0, but SAI version used in sonic was v1.14.0. This should not be and issue, since from version v1.10.0 SAI is binary backward compatible.

Unfortunately there are 2 problems here.

  1. we moved DASH extensions to fixed range [meta][sai] Move extensions SAI and API to 0x20000000 range #2028, which previously on v1.14.0 dash attributes were glued after SAI_*_ATTR_END to have continuous range of attributes for given object, and from version v1.15.0 all dash attributes are moved to fixed range 0x20000000, this is breaking change and not backward compatible, but right now in sonic this is not and issue.
  2. in above case where SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST caused issues, vendor had some custom attributes implemented after SAI_PORT_ATTR_END. Problem is that SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST was not existing in v1.14.0 but the same enum ID was used for vendor custom attribute which GET operation returned success, and returned some internal value, not expected by syncd used SAI v1.15.0, this also potentially could lead to crash if custom attribute would want to populate array pointer. In this case discovery process expected valid OID but got some unknown value field which didn't correspond to OID and returned invalid object type, and caused to syncd crash. (this is actually a feature to find issues like this in vendor SAI)

Having custom attributes after ATTR_END is not backward compatible, it may be handy for vendors to keep enum id small value instead of 0x10000000 where custom attr range starts, but it causes issue describe

NOTE: Vendors should always use SAI_*_ATTR_CUSTOM_RANGE_START to implement internal attributes in their SAI.

To mitigate partially this issue i'm working on enhancing SAI metadata to include SAI version for each attribute, and query only attributes that are up to the version of SAI that vendor is returning. This solution is only partially solving this issue since there are other 2 cases here:

  1. in syncd we are often using SAI headers from master (to test new features) which master can have new attributes defined that are not in official release yet, so for example current official release is v1.15.0 but after that we have more commits that contain new attributes which will become part of new release like v1.15.1 or v1.16.0 so skipping this attribute during discovery is also not good solution, but potential conflict with custom vendor attributes after ATTR_END.
  2. create/set operation with new attributes, for example if syncd is using headers from v1.15.0 but vendor lib sai i v1.14.0 then during SET/CREATE operation a new attribute can be specified by OA that is only present in v1.15.0 and it potentially can overlap with custom vendor attribute after ATTR_END causing to crash because of invalid and unknown data type passed by this attribute

I think permanent solution is to make sure all vendors will have not attributes after ATTR_END so they will be backward compatible. And current partial solution can be checking attributes version a gains plugged SAI version

Please give your thoughts.

Visual example:

// v1.14.0
...
    SAI_PORT_ATTR_ARS_PORT_LOAD_PAST_WEIGHT,
    SAI_PORT_ATTR_ARS_PORT_LOAD_FUTURE_WEIGHT,
    SAI_PORT_ATTR_POE_PORT_ID,
    SAI_PORT_ATTR_END,

// v1.15.0
...
    SAI_PORT_ATTR_ARS_PORT_LOAD_PAST_WEIGHT,
    SAI_PORT_ATTR_ARS_PORT_LOAD_FUTURE_WEIGHT,
    SAI_PORT_ATTR_POE_PORT_ID,
    SAI_PORT_ATTR_JSON_FORMATTED_DEBUG_DATA_SIZE,
    SAI_PORT_ATTR_UNRELIABLE_LOS,
    SAI_PORT_ATTR_ERROR_STATUS,
    SAI_PORT_ATTR_STATS_COUNT_MODE,
    SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST,
    SAI_PORT_ATTR_END,

// v1.14.0 - vendor implementation
    SAI_PORT_ATTR_ARS_PORT_LOAD_PAST_WEIGHT,
    SAI_PORT_ATTR_ARS_PORT_LOAD_FUTURE_WEIGHT,
    SAI_PORT_ATTR_POE_PORT_ID,
    SAI_PORT_ATTR_END,
    SAI_PORT_ATTR_CUSTOM_VENDOR_1 = SAI_PORT_ATTR_END,
    SAI_PORT_ATTR_CUSTOM_VENDOR_2,
    SAI_PORT_ATTR_CUSTOM_VENDOR_3,
    SAI_PORT_ATTR_CUSTOM_VENDOR_4,
    SAI_PORT_ATTR_CUSTOM_VENDOR_5,
     ...

So you can see, when syncd is doing query SAI_PORT_ATTR_SELECTIVE_COUNTER_LIST it's actually getting SAI_PORT_ATTR_CUSTOM_VENDOR_5, which caused issue at the first place.

@kcudnik
Copy link
Collaborator Author

kcudnik commented Nov 11, 2024

I created attribute API metadata PR here: #2100

@kcudnik
Copy link
Collaborator Author

kcudnik commented Nov 20, 2024

plus @JaiOCP for visibility

kcudnik added a commit to kcudnik/sonic-sairedis that referenced this issue Nov 21, 2024
By default disabled. If enabled, it will use SAI metadata and libsai
version returned by sai_api_version_query to verify if given attribute
during SAI discovery process was introduced in the current libsai
version or below that version.

This feature is partial solution to this problem:
sonic-net/sonic-buildimage#20725,
more here: opencomputeproject/SAI#2099
@bandaru-viswanath
Copy link
Contributor

bandaru-viswanath commented Nov 22, 2024

Few questions, please clarify -

  • For moving the extensions to CUSTOM_RANGE, do the extensions need to be placed in the community header file or can they continue to exist in the experimental header file, and just use the CUSTOM_RANGE_START? if we were to place the extension attributes in the experimental headers, then CUSTOM_RANGE_START would be an issue, since the spec has the CUSTOM_RANGE_END immediately following that. If this CUSTOM_RANGE_END were to serve any purpose, the extension attributes must be within the original header file between the START and END markers.

  • Looking at the saiport.h, there is SAI_PORT_ATTR_CUSTOM_RANGE_START and also SAI_PORT_ATTR_EXTENSIONS_RANGE_BASE. What should be the start for vendor extension attributes ? Such an EXTENSIONS_BASE doesn't appear in all headers. Please advise.

  • CUSTOM_RANGE exists only for attributes. What about enums such as sai_object_type_t and sai_api_t ? These have EXTENSIONS_RANGE_BASE but not CUSTOM_RANGE. Again it is confusing since the recommendation is stating CUSTOM_RANGE.

Thanks

@kcudnik
Copy link
Collaborator Author

kcudnik commented Nov 22, 2024

First of all please don't confuse extensions with custom attributes.

  • custom attributes are internal vendor attributes which are not intended to be visible for public, or included in official SAI headers, those are intended for internal usage, ranges on enum starting from 0x10000000
  • extension attributes are extensions and experimental attributes which can be modified and removed at any time, not backward compatible, starting from range 0x20000000
  • there should be not attributes implemented after any ATTR_END marker for backward and future compatibility of SAI

Addressing your questions:

  1. there is no moving extensions to custom range, as above stated what is extensions and custom range, i assume that you are referring to internal/custom vendor attributes that were implemented after ATTR_END marker. if vendor have custom attributes to talk to SAI, vendor should have 2 sets of attributes one for public release, and 2nd for internal use, all internal attributes should be implemented in CUSTOM_RANGE enums. CUSTOM_RANGE_END attribute is just a marker in public headers, in internal headers it can be moved to any position after adding internal attributes, CUSTOM_RANGE_END in public SAI headers serves just information role. since each vendor may have different number of custom attributes CUSTOM_RANGE_END will have different value on each vendor SAI
  2. again, all internal attributes that are not meant for public, should be implemented in CUSTOM_RANGE_START at any enum. Extensions that are implemented in EXTENSIONS_BASE are in SAI/extensions directory and are for public visible and can be removed changed at any time. EXTENSIONS_BASE don't appear at all enums since we didn't added them in enums that don't have extensions, to just not pullute headers.
  3. CUSTOM_RANGE is not added to all enums since not all enums have custom ranges in vendors, we didn't added thatm to not pollute headers, if you wish to add them to any enum, sure, please create PR to mark them to enums you will want to use. by default we are not planning to add them to all enums besides attributes. internal vendor custom attributes/enums are not recommendations, it's requirement to not break SAI, i will add custom range for api and for objet_type for you here:Add custom range base enums for api and object_type #2112

kcudnik added a commit to sonic-net/sonic-sairedis that referenced this issue Nov 22, 2024
By default disabled. If enabled, it will use SAI metadata and libsai
version returned by sai_api_version_query to verify if given attribute
during SAI discovery process was introduced in the current libsai
version or below that version.

This feature is partial solution to this problem:
sonic-net/sonic-buildimage#20725,
more here: opencomputeproject/SAI#2099
shiraez pushed a commit to Marvell-switching/sonic-sairedis that referenced this issue Dec 12, 2024
By default disabled. If enabled, it will use SAI metadata and libsai
version returned by sai_api_version_query to verify if given attribute
during SAI discovery process was introduced in the current libsai
version or below that version.

This feature is partial solution to this problem:
sonic-net/sonic-buildimage#20725,
more here: opencomputeproject/SAI#2099
shiraez pushed a commit to Marvell-switching/sonic-sairedis that referenced this issue Dec 12, 2024
By default disabled. If enabled, it will use SAI metadata and libsai
version returned by sai_api_version_query to verify if given attribute
during SAI discovery process was introduced in the current libsai
version or below that version.

This feature is partial solution to this problem:
sonic-net/sonic-buildimage#20725,
more here: opencomputeproject/SAI#2099
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants