Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need a hook to limit scanning of the entire PCIe space for pre-silicon effectiveness #418

Open
cdwilde opened this issue Jan 13, 2025 · 10 comments

Comments

@cdwilde
Copy link

cdwilde commented Jan 13, 2025

In pre-silicon, the S/BSA setup and tests loop over all Bus devices and functions in multiple parts of the code:
BSA:
test_pool/exerciser/operating_system/test_os_e015.c
test_pool/pcie/operating_system/test_os_p002.c
test_pool/pcie/operating_system/test_os_p004.c
test_pool/pcie/operating_system/test_os_p005.c
val/common/src/acs_pcie.c
val/common/src/acs_peripherals.c
SBSA:
test_pool/pcie/operating_system/test_p003.c
test_pool/pmu/operating_system/test_pmu007.c

This is extremely costly in a pre-silicon environment. Changing the upper MAX value for each bus, device, and function and not always helpful either, especially if the active devices are non-contiguous. I would like to ask that we add a PAL hook to allow the user to cut down this space.

As an example (for acs_pcie.c):
/* Derive ecam specific information */
seg_num = (uint32_t)val_pcie_get_info(PCIE_INFO_SEGMENT, ecam_index);
start_bus = (uint32_t)val_pcie_get_info(PCIE_INFO_START_BUS, ecam_index);
end_bus = (uint32_t)val_pcie_get_info(PCIE_INFO_END_BUS, ecam_index);

  /* Iterate over all buses, devices and functions in this ecam */
  for (bus_index = start_bus; bus_index <= end_bus; bus_index++)
  {
      for (dev_index = 0; dev_index < PCIE_MAX_DEV; dev_index++)
      {
          for (func_index = 0; func_index < PCIE_MAX_FUNC; func_index++)
          {
              /* Form bdf using seg, bus, device, function numbers */
              bdf = PCIE_CREATE_BDF(seg_num, bus_index, dev_index, func_index);

Add these lines here to allow the user to skip the rest of this loop iteration:
/* Skip non-sequential BDFs to save time */
if (pal_pcie_check_device_valid(bdf))
continue;

Hopefully, pal_pcie_check_device_valid is a good function to use. As far as I can tell, it is only called in acs_pcie.c and that later call would no longer be needed. I would put buses, devices, and functions that I want skipped in this function and it does the job of not wasting inordinate amounts of time in pre-silicon.

@Sujana-M
Copy link
Contributor

Hi @cdwilde,

For running ACS at Pre-silicon, we have optimisations to ensure the user can change the values of the maximum bus, device and function according to their requirement.

  1. To modify the values of the max bus, device and function for pre-silicon, request you to modify the defines at lines 254-256 in the platform config file at https://github.com/ARM-software/bsa-acs/blob/main/pal/baremetal/target/RDN2/common/include/platform_override_fvp.h#L254. On modifying thee parameters, the respective max, device and function will be taken in the test
  2. In the above tests and files mentioned, the max bus number is taken from the pcie info table created. The details of which are taken from the ECAM space config details configured at https://github.com/ARM-software/bsa-acs/blob/main/pal/baremetal/target/RDN2/common/include/platform_override_fvp.h#L237.

Please let us know if you need more information.

Thanks,
ACS Team.

@cdwilde
Copy link
Author

cdwilde commented Jan 17, 2025

This is insufficient. We may have a space where the Buses are not consecutive (Think 31, 63, 91, ... up to 255). Setting the MAX BUS is insufficient as we would need the entire range of 255, but only every 32 or so. The way you have this implemented requires that you go through the entire bus range for only a handful of valid buses. What I am proposing will allow the user to have control instead of relying on max values. In this case, we might say in a pal function:
if bus % 32:
return VALID
else:
return INVALID

@Sujana-M
Copy link
Contributor

Hi @cdwilde,

In scenarios where the buses are not consecutive, you can modify the number of ECAM to more than 1, with the same ECAM base address and make the corresponding changes of start and end bus. Please find the reference changes at https://github.com/ARM-software/bsa-acs/blob/main/pal/baremetal/target/RDN2/common/include/platform_override_fvp.h#L269. The corresponding change to also be done at https://github.com/ARM-software/bsa-acs/blob/main/pal/baremetal/target/RDN2/common/src/platform_cfg_fvp.c#L256 and https://github.com/ARM-software/bsa-acs/blob/main/pal/baremetal/target/RDN2/common/src/platform_cfg_fvp.c#L273.

This will ensure reduced time for non-consecutive bus ranges.

In addition, the tests mentioned

  1. test_pool/pcie/operating_system/test_os_p002.c and test_pool/pcie/operating_system/test_p003.c
    : For complete coverage of the rules, the entire ECAM space of all devices within the start to end bus of the ECAM need to be verified.
  2. test_pool/pcie/operating_system/test_os_p004.c and test_pool/pcie/operating_system/test_os_p005.c: These tests traverse through the devices and function of the bus range only between the secondary and subordinate bus. As part of the enumeration, the secondary and subordinate buses programmed on the RP will only be until the busses connected to the RP.
  3. val/common/src/acs_pcie.c and val/common/src/acs_peripherals.c: By making the changes mentioned above in the platform config file, the time taken for this will be reduced.

Please let us know if this helps.

Thanks,
ACS Team

@cdwilde
Copy link
Author

cdwilde commented Jan 31, 2025

Thanks for this idea. I am out of the office until mid next week but will give it a try when I get back.

@cdwilde
Copy link
Author

cdwilde commented Feb 5, 2025

I'm back. The other issue I run into is that modifying MAX_FUNC and MAX_DEV causes address calculation problems since these values are used all over the place for address calculation. For example, in acs_pcie.c:
/* There are 8 functions / device, 32 devices / Bus and each has a 4KB config space */
cfg_addr = (bus * PCIE_MAX_DEV * PCIE_MAX_FUNC * 4096) +
(dev * PCIE_MAX_FUNC * 4096) + (func * 4096);

I would need a way to tell the system that MAX_BUS, MAX_DEV, and MAX_FUNC are the full values for address calculations. I would then need a way to divide up the space to only iterate over certain addresses. I don't see a way to do this with the current implementation. Am I missing something here?

@Sujana-M
Copy link
Contributor

Sujana-M commented Feb 7, 2025

Hi @cdwilde,

Hoping that the scenario of enumeration for non-consecutive bus is taken care by modifying the number of ECAM to more than 1 as mentioned in the previous comment. This modification would have reduced the enumeration/BDF scan time considerably.

For PCIE_MAX_DEV and PCIE_MAX_FUNC, These macros used in the tests, apart from configuration address calculation, are part of the test verification, where the entire Device and function space need to be scanned. Request you to please let us know how much time is it taking for scanning upto PCIE_MAX_DEV and PCIE_MAX_FUNC, given that the non-consecutive bus ranges are taken care.

If the BSA test "test_pool/pcie/operating_system/test_os_p002.c" and SBSA test "test_pool/pcie/operating_system/test_p003.c" are taking long time, [as the entire BDF range to be tested for complete coverage], these tests can be commented out and run, if the tests have passed in the first run.

Thanks,
ACS Team

@cdwilde
Copy link
Author

cdwilde commented Feb 7, 2025

When I went to modify PCIE_MAX_DEV and PCIE_MAX_FUNC, my address calculations started failing all over the place. I can try to reduce only the buses and see if that runs in a reasonable time.

@cdwilde
Copy link
Author

cdwilde commented Feb 10, 2025

Alright, I attempted the multiple ECAM region as requested. I ran into a couple of issues.

  1. In pal_pcie.c, I had to change from local ECAM start and stop to use the full ecam range:
    if ((sec_bus >= platform_pcie_cfg.block[ecam_index].start_bus_num) &&
  •        (sub_bus <= platform_pcie_cfg.block[ecam_index].end_bus_num) &&
    
  •    if ((sec_bus >= 0) &&
    
  •        (sub_bus <= PLATFORM_OVERRIDE_PCIE_MAX_BUS) &&
    

If I didn't do this, it started flagging some valid spaces as invalid. This is likely due to splitting up a valid range into smaller ones.
2) The big problem is that PCIe test 808 started failing. This is because it also relies on segments to calculate Max bus value:
/* Get the maximum bus value from PCIe info table */
end_bus = val_pcie_get_info(PCIE_INFO_END_BUS, ecam_index);
segment = val_pcie_get_info(PCIE_INFO_SEGMENT, ecam_index);
val_set_status(pe_index, RESULT_SKIP(TEST_NUM, 1));

  /* Get the highest BDF value for that segment */
  bdf = get_max_bdf(segment, end_bus);

  /* Get the maximum subordinate bus for that segment */
  sub_bus = get_max_bus(segment, end_bus);

  /* Get the least highest of max bus number */
  bus_index = (PCIE_EXTRACT_BDF_BUS(bdf) < end_bus) ? sub_bus:end_bus;
  bus_index += 1;
  val_print(ACS_PRINT_INFO, "\n       Maximum bus value is 0x%x", bus_index);

Since we divided this continuous ECAM space into smaller ones, the Maximum bus value started having issues:
Previous passing:
[11356891415 ps] Maximum sub bus 20
[11357742082 ps] Maximum bus value is 0x21

Now failing:
[13029858814 ps] Maximum sub bus 20
[13030699371 ps] Maximum bus value is 0xfa

Values from platform_override_bifrost.h:
#define PLATFORM_OVERRIDE_PCIE_ECAM_BASE_ADDR_0 PLATFORM_ECAM_BASE
#define PLATFORM_OVERRIDE_PCIE_SEGMENT_GRP_NUM_0 0x0
#define PLATFORM_OVERRIDE_PCIE_START_BUS_NUM_0 0x0
#define PLATFORM_OVERRIDE_PCIE_END_BUS_NUM_0 0x1
#define PLATFORM_OVERRIDE_PCIE_ECAM_BASE_ADDR_1 PLATFORM_ECAM_BASE
#define PLATFORM_OVERRIDE_PCIE_SEGMENT_GRP_NUM_1 0x0
#define PLATFORM_OVERRIDE_PCIE_START_BUS_NUM_1 0x1F
#define PLATFORM_OVERRIDE_PCIE_END_BUS_NUM_1 0x20
#define PLATFORM_OVERRIDE_PCIE_ECAM_BASE_ADDR_2 PLATFORM_ECAM_BASE
#define PLATFORM_OVERRIDE_PCIE_SEGMENT_GRP_NUM_2 0x0
#define PLATFORM_OVERRIDE_PCIE_START_BUS_NUM_2 0xF8
#define PLATFORM_OVERRIDE_PCIE_END_BUS_NUM_2 0xF9

Has this method of overriding busses into smaller segments been done before? I'm still liking giving the user a hook to control this through the pal interface. It allows the user to control the regions in a way that doesn't break other tests. :)

@Sujana-M
Copy link
Contributor

Hi @cdwilde,

Please make the following changes in the hierarchy file to ensure the inconsistent Bus number is taken into consideration.

PLATFORM_OVERRIDE_NUM_ECAM will be 1, PLATFORM_OVERRIDE_PCIE_ECAM0_HB_COUNT will be 3 as you have 3 inconsistent bus ranges. The macros code will be as below
#define PLATFORM_OVERRIDE_PCIE_ECAM_BASE_ADDR_0 PLATFORM_ECAM_BASE
#define PLATFORM_OVERRIDE_PCIE_SEGMENT_GRP_NUM_0 0x0
#define PLATFORM_OVERRIDE_PCIE_START_BUS_NUM_0 0x0
#define PLATFORM_OVERRIDE_PCIE_END_BUS_NUM_0 0xF9
#define PLATFORM_OVERRIDE_PCIE_ECAM_BASE_ADDR_1 PLATFORM_ECAM_BASE

#define PLATFORM_OVERRIDE_PCIE_ECAM0_SEG_NUM 0x0
#define PLATFORM_OVERRIDE_PCIE_ECAM0_START_BUS_NUM 0x0
#define PLATFORM_OVERRIDE_PCIE_ECAM0_END_BUS_NUM 0x1
#define PLATFORM_OVERRIDE_PCIE_ECAM0_EP_BAR64 0xXXXXX
#define PLATFORM_OVERRIDE_PCIE_ECAM0_RP_BAR64 0xXXXXX
#define PLATFORM_OVERRIDE_PCIE_ECAM0_EP_NPBAR32 0xXXXXX
#define PLATFORM_OVERRIDE_PCIE_ECAM0_EP_PBAR32 0xXXXXX
#define PLATFORM_OVERRIDE_PCIE_ECAM0_RP_BAR32 0xXXXXX

#define PLATFORM_OVERRIDE_PCIE_ECAM1_SEG_NUM 0x0
#define PLATFORM_OVERRIDE_PCIE_ECAM1_START_BUS_NUM 0x1F
#define PLATFORM_OVERRIDE_PCIE_ECAM1_END_BUS_NUM 0x20
#define PLATFORM_OVERRIDE_PCIE_ECAM1_EP_BAR64 0xXXXXX
#define PLATFORM_OVERRIDE_PCIE_ECAM1_RP_BAR64 0xXXXXX
#define PLATFORM_OVERRIDE_PCIE_ECAM1_EP_NPBAR32 0xXXXXX
#define PLATFORM_OVERRIDE_PCIE_ECAM1_EP_PBAR32 0xXXXXX
#define PLATFORM_OVERRIDE_PCIE_ECAM1_RP_BAR32 0xXXXXX

#define PLATFORM_OVERRIDE_PCIE_ECAM2_SEG_NUM 0x0
#define PLATFORM_OVERRIDE_PCIE_ECAM2_START_BUS_NUM 0xF8
#define PLATFORM_OVERRIDE_PCIE_ECAM2_END_BUS_NUM 0xF9
#define PLATFORM_OVERRIDE_PCIE_ECAM2_EP_BAR64 0xXXXXX
#define PLATFORM_OVERRIDE_PCIE_ECAM2_RP_BAR64 0xXXXXX
#define PLATFORM_OVERRIDE_PCIE_ECAM2_EP_NPBAR32 0xXXXXX
#define PLATFORM_OVERRIDE_PCIE_ECAM2_EP_PBAR32 0xXXXXX
#define PLATFORM_OVERRIDE_PCIE_ECAM2_RP_BAR32 0xXXXXX

The same changes to be done in the platform_cfg_fvp.c file as well.

Please let us know if this works.

Thanks,
ACS Team

@cdwilde
Copy link
Author

cdwilde commented Feb 18, 2025

Hi,

Looking into your examples in pal/baremetal, these regions are really only used in the pal_pcie_enumeration.c file, which the user owns. We are already taking care of enumeration on our end. The problem of the tests going through the entire PCIe space would remain since the for loops wouldn't take these into account. Would you like me to set up a meeting with you to go through this issue?

Thanks,
Cameron

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants