Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: bmc fw update, passwd and reset roles #2

Merged
merged 1 commit into from
Feb 20, 2024

Conversation

glimchb
Copy link
Member

@glimchb glimchb commented Feb 6, 2024

Signed-off-by: Boris Glimcher [email protected]

@glimchb glimchb force-pushed the main branch 4 times, most recently from f31e900 to 28eb2fa Compare February 8, 2024 00:36
playbooks/firmware.yml Outdated Show resolved Hide resolved
roles/bmc_fw_update/tasks/main.yml Show resolved Hide resolved
roles/bmc_fw_update/tasks/main.yml Show resolved Hide resolved
roles/bmc_fw_update/tasks/main.yml Show resolved Hide resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vars defined here cannot be given as a role input, So only role vars that are used in the role as a constant can go here.

update_image_file: "/tmp/{{ firmware.bmc }}"
register: result_update_task

- name: Print TASK id for tracking

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Printing the values will not help the user to get this at later stage, good to add under the facts ex: role_name_facts.Taskid , so that in case the job is not finished even after the retries/delay. user has an hold of the task id to take up next action.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sachin-apa see the code above I just added ansible.builtin.set_fact . Is that what you had in mind ?

ansible.builtin.debug:
msg: "{{ fw_inventory_after.redfish_facts.firmware.entries[0].Version }}"

- ansible.builtin.assert: { that: "fw_inventory_before.redfish_facts.firmware.entries[0].Version != fw_inventory_after.redfish_facts.firmware.entries[0].Version" }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add changed_when and failed_when condition for users to understand if the firmware was updated. it helps to set the meta fields changed, failed etc that can be used to check later in the playbook.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Define defaults for the roles if applicable. the default vars are not mandatory fields that can be changed as part of the role input.
  2. Add meta/argument_specs.yml to that helps in validation, documentation, changelog of a role.
  3. Add meta/main.yml with the details of the author and other data.
  4. Lets add the lint, sanity checks as it becomes hectic to fix it.
  5. Role vars, register and fact vars need to follow naming convention that prefix role_name_

@glimchb glimchb force-pushed the main branch 7 times, most recently from 0e8c65e to 8ef7968 Compare February 14, 2024 02:20
@@ -1,2 +1,7 @@
---
# vars file for bmc_fw_update

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be in vars...

@glimchb glimchb force-pushed the main branch 2 times, most recently from 9673f54 to 4f1004f Compare February 14, 2024 02:37
Dockerfile Outdated
@@ -1,2 +1,2 @@
FROM python:3.11
RUN pip install --no-cache-dir ansible
RUN pip install --no-cache-dir ansible==9.2

Check warning

Code scanning / Scorecard

Pinned-Dependencies Medium

score is 0: pipCommand not pinned by hash
Click Remediation section below to solve this issue
@glimchb glimchb force-pushed the main branch 2 times, most recently from 12da53f to e35b625 Compare February 14, 2024 02:51
@glimchb glimchb requested a review from Gal-Zaidman February 14, 2024 18:40
@glimchb glimchb changed the title feat: fill bmc_fw_update role feat: bmc_fw_update and reset roles Feb 14, 2024
@glimchb glimchb changed the title feat: bmc_fw_update and reset roles feat: bmc fw update, passwd and reset roles Feb 14, 2024
@jcastanos2
Copy link
Contributor

so, I took a cursory look at the pull request. Just a few comments to get started:

  • I do not think that downloading the file from a repository into /tmp should be part of this task, but something it should be done separately. The update_bmc_fw should receive a filename/location of the image to install and that's it. Think you're updating a lot of dpus; you want to download it once. The datacenter might not have external connectivity, and will probably have a repository where all the valid fw are located, signed, etc. We should point to that location
  • we're delegating to localhost to run those commands. This should probably a {{ delegate_to }} so we can have multiple nodes pushing images
  • we're hardcoding MultipartHTTPPushUpdate. We do not support that yet in Bluefield. We might also want SimpleUpdate
  • retries/delay should also be variables. Different vendors might have different requirements
  • I see you have the cec in vars. In BF, a fw update implies posting two files: the bmc fw and the cec (eROT) and then reboot. This second step is missing in the pull request (it is also a same post with a different file), but it is also something very particular to us. Maybe the update/wait can be done on a list of files?

@jcastanos2
Copy link
Contributor

One more comment: before I run any command on the DPU BMC, I need to check the server is up and running, by calling a redfish command on the server BMC. Otherwise, all my DPU BMCs will fail as the DPU has no power. I tend to leave all these external dependencies out the individual roles, rather than including them in my roles.


- name: Get Firmware Inventory
ansible.builtin.include_role:
name: get_bmc_facts
Copy link
Member

@Gal-Zaidman Gal-Zaidman Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_bmc_facts sounds general, not specific to FW (UpdateService/FirmwareInventory/BMC_Firmware), I assume that later it will just grow which will make it retrieve unnecessary information and would result in more and more redfish calls.
I would either:

  1. add options to it, specifying what we want to retrieve
  2. not separate it to a different role and call FW directly in this role

I prefer [2] since it this is not something that will change in the future we care about a specific FW version

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • we should separate if we want to re-use it
  • we can rename to fw facts to be more specific

ansible.builtin.debug:
msg: "{{ fw_inventory_before.redfish_facts.firmware.entries[0].Version }}"

- name: Download firmware image {{ firmware.bmc }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the role needs to recive a firmware location as a parameter, if it is a valid url then download it, if it is a file then we expect it to be present on the machine running (this makes more sense when installing a large cluster)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, I can open an issue on this, so somebody can also contribute that part ...
or if i have more time I can also do it myself...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened #16

Signed-off-by: Boris Glimcher <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants