Sync meeting 2024 10 08
Caspar van Leeuwen edited this page Nov 11, 2024
- Monthly, every 2nd Tuesday of the month at 10:00 CE(S)T
- Notes of previous meetings at https://github.com/multixscale/meetings/wiki
- Tue 12 Nov 2024 10:00 CET (prep for special project review?)
- Tue 10 Dec 2024 10:00 CET (post-mortem of special project review?)
attending:
- Kenneth (HPC-UGent)
- Caspar, Xin, Maksim, Satish (SURF)
- Nadia, Susana, Pedro Fernandez, Eli (HPCNow!)
- Neja (NIC)
- Alan (CECAM)
- Pedro Santos Neves, Bob (RUG)
- Julian (BSC)
- Richard (UiB)
- 2024Q3 quarterly report
- Deadline 25th Oct
- Every institute: please fill in the hours and bullet-point work done tables
- w.r.t. project review
- important to show that we're "on target" w.r.t. PM effort in 2024
- each partner should report PM efforts for Jan-Sept 2024 to Neja (incl. breakdown per WP) by 20 Oct
- still missing from BSC, HPCNow, UB, RUG
- Alan will know on Thu for UB
- Upcoming deliverables (M24):
- D1.3, M24: Report on stable, shared software stack (UB)
- who: Kenneth (UGent), Caspar (SURF), Pedro (RUG), UiB (Richard/Thomas), UB (Alan)
- TODO:
- GPU support -> see tiger team
- monitoring -> see ongoing effort by RUG
- compare the performance against on-premise software stacks to identify potential performance limitations
- mostly on Snellius @ SURF, since that's easier
- stability -> zero reports so far of EESSI "network" being down
- @Alan: ask UK guy for quote "this is a game changer for small sites" (feedback during EESSI intro on 4 Oct'24)
- D6.2, M24: Training Activity Technical Support Infrastructure (UiB)
- dev.eessi.io should be covered here (more than in D1.3)
- => @Thomas/Richard to pull this
- D7.2, M24: Intermediate report on Dissemination, Communication and Exploitation (HPCNow)
- D8.5, M24: Project Data Management Plan - final (NIC)
- Upcoming milestones (M24):
- Milestone 4, M21: First training event supported by the Ambassador Program. [WP1/WP5/WP6] (UB)
- Oct 4th training in Vienna: https://events.vsc.ac.at/event/141
- 2nd Ambassador event: MultiXscale hackathon in Slovenia (Dec'24)
- Milestone 5, M24: WP4 Pre-exascale workload executed on EuroHPC architecture. [WP2/WP3/WP4] (NIC)
- probably via ESPResSo (Jean-Noël?), cf. ongoing development
- do we need to request a reservation for this? (Vega?)
- may be useful to iterate quickly
- doesn't necessarily need to be on a EuroHPC system
- WP status updates
- [SURF] WP1 Developing a Central Platform for Scientific Software on Emerging Exascale Technologies
- [UGent] T1.1 Stable (EESSI) - D1.3 due M24 (Dec'24)
- dev.eessi.io: Tiger team is making very good progress. See meeting notes
- Key results:
- Building specific commit of ESPResSo works, see test PR #1
- Key TODOs:
- get ingestion of builds into dev.eessi.io CernVM-FS repository to work [Pedro+Bob]
- documentation
- let Jean-Noël/Rudolph play with it (test setup)
- also make it work for GROMACS, see test PR #2
- What do we REALLY need from this before the project review?
- ESPResSo development builds set up in dev.eessi.io?
- NVIDIA GPU support Tiger team making really good progress. See meeting notes
- Key results:
- Bot uses accel:nvidia/cc80 arguments to install in correct prefix.
- First builds in accel prefix: CUDA, UCX-CUDA, UCC-CUDA, NCCL, CUDA-Samples, OSU-Microbenchmarks, LAMMPS w/ CUDA, ESPResSo w/ CUDA
- Check for missing installations in accel builds in CI
- TODO:
- Actual GPU nodes in build cluster (now cross-compiling, not running GPU tests)
- Adapt bot to accept arguments to allocate/build on GPU nodes
- cuDNN (strip non-redistributable files + support local installation in host_injections)
- Decide on and expand combinations of CPU & GPU architectures
- enhance script(s) in software-layer repo
- auto-detect GPU model/architecture (enhance archdetect)
- pick up accel directive from the bot and change software installation prefix accordingly
- install GPU software in proper location: ESPResSo (?), LAMMPS, MetalWalls (?), TensorFlow, PyTorch, ...
- proper NVIDIA GPU support is due by M24 (deliverable D1.3)
- => we shouldn't wait for dev.eessi.io being operational
- we need to plan who will actively contribute, and how [Kenneth,Lara]
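As an illustration of the "pick up accel directive and change software installation prefix" item above, here is a minimal sketch of how an argument like accel:nvidia/cc80 could be mapped to an installation prefix. The function names, the example base path, and the accel/<vendor>/<target> layout below the CPU-specific prefix are assumptions made for this sketch; they are not taken from the bot or the software-layer scripts.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Accelerator:
    vendor: str  # e.g. "nvidia"
    target: str  # e.g. "cc80"


def parse_accel_directive(arg: str) -> Accelerator:
    """Parse an argument of the (assumed) form 'accel:<vendor>/<target>'."""
    key, sep, value = arg.partition(":")
    if key != "accel" or not sep:
        raise ValueError(f"not an accel directive: {arg!r}")
    vendor, slash, target = value.partition("/")
    if not slash or not vendor or not target:
        raise ValueError(f"expected 'accel:<vendor>/<target>', got {arg!r}")
    return Accelerator(vendor, target)


def accel_prefix(cpu_prefix: str, accel: Accelerator) -> str:
    """Build the installation prefix for GPU builds (hypothetical layout)."""
    return str(PurePosixPath(cpu_prefix) / "accel" / accel.vendor / accel.target)


# Hypothetical CPU-specific prefix, used only for this example.
acc = parse_accel_directive("accel:nvidia/cc80")
print(accel_prefix("/path/to/software/linux/x86_64/amd/zen2", acc))
# prints /path/to/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
```

Keeping the parsing separate from the path construction would let the same directive also drive other decisions, such as which GPU nodes to allocate, without touching the prefix logic.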
- need to review description of Task 1.1, make sure all subtasks are covered
- => need to update project planning (Caspar, Kenneth)
- "we will benchmark software from the shared software stack and compare the performance against on-premise software stacks to identify potential performance limitations, ..."
- ESPResSo + LAMMPS + OpenFOAM + ALL(?) (MultiXscale), GROMACS (BioExcel)
- Who does what, and on which system?
- "increase stability of the shared software stack ... pro-actively by developing monitoring tools"
- proper monitoring for CVMFS network (S0 + S1s)
- active work-in-progress by RUG, see also meeting notes
- [RUG] T1.2 Extending support - D1.4 due M30 (June'25)
- zen4 almost on par with the rest.
- Need to symlink everything for 2022b from zen3. (@Lara?)
- Then merge https://github.com/EESSI/software-layer/pull/766
- optimized installations for AMD Genoa Zen4 (~64% done) + A64FX (~23% done) are still a work-in-progress
- Intel Sapphire Rapids & NVIDIA Grace (for JUPITER) to start
- Who, When, Where?
- AMD ROCm (see planning issue #31 + support issue #71)
- effort led by Pedro/Bob (RUG)
- Any progress to mention?
- [SURF] T1.3 Test suite - D1.5 due M30 (June'25)
- V0.4.0 released.
- New tests: CP2K, LAMMPS, PyTorch
- Tutorial: mpi4py, also in docs
- WIP: test for MetalWalls
- WIP: use an eessi_mixin class to make test development for the EESSI test suite easier, as implemented in this PR. Many of the steps in the docs are mandatory, and could be inherited from a mixin class.
- [BSC] T1.4 RISC-V (due M48, D1.6)
- Julian is working on getting CernVM-FS deployed natively on the RISC-V hardware they have at BSC => Progress?
- ... Other updates?
- [SURF] T1.5 Consolidation (starts M25 - Jan'25)
- continuation of effort on EESSI (T1.1, etc.) (not started yet)
- [UGent] WP5 Building, Supporting and Maintaining a Central Shared Stack of Optimized Scientific Software Installations
- [SURF] T5.2 Monitoring/testing, D5.3 due M30 (June'25)
- dashboard to present test results is work-in-progress @ SURF
- Ingestion script for ingesting test suite results into ElasticSearch instance in private repo
- Test suite runs on Vega, Karolina, AWS, Azure, ... now ingested on daily basis using that script
- Deployment of dashboard:
- Long term: Could be in SURF HPCV 'internal' services, but this is still being set up. Would support authentication, but requires a backend change in the dashboard. Should not be too invasive.
- Short term: can we deploy somewhere in a VM and whitelist?
- Alternative: if we can share all the data, things are much easier and we can deploy anywhere, in any VM, tomorrow (well, figure of speech)
- [UGent] T5.4 support/maintenance - D5.4 due M48 (Dec'26)
- Changed meeting format a bit: based on a board
- Total: 86 issues (28 open, 58 closed)
- [UB] WP6 Community outreach, education, and training
- deliverables due: D6.2 (M24 - Dec'24), D6.3 (M30 - June'25)
- HPCWire nomination in category "Best HPC Programming Tool or Technology"
- overview of systems in EESSI docs
- [Alan] invited speaker for Nordic Industry Days (early Sept'24, Copenhagen)
- ... How did it go?
- [Thomas] presentation @ CernVM workshop on EESSI (16-18 Sept 2024, Geneva)
- ... How did it go?
- [Richard] public webinar Introduction to EESSI (3 Oct 2024)
- ... How did it go?
- [Alan] First ambassador event: "Introduction to EESSI" on 4 Oct 2024 (see news post on MultiXscale website)
- ... How did it go?
- CECAM webinar (17 Oct 2024)
- https://www.cecam.org/webinar-details/multixscale-cecam-webinar-supporting-development-multiscale-methods-european-environment-scientific-software-installations-eessi
- Mix of EESSI + Scientific WPs (ESPResSo, LAMMPS, KOKKOS and ALL, waLBerla, pystencils)
- EuroHPC User Days (22-23 Oct 2024, Amsterdam)
- link to agenda
- attending: Kenneth/Lara (UGent), Thomas/Richard (UiB), Bob?/Pedro? (RUG), Caspar (SURF)
- paper submitted to get a talk slot
- Tue 22 Oct 14:00-15:30 (IJ LOUNGE)
- in touch with organisation w.r.t. participation in CoE session
- "Walk-in networking sessions focusing on specific EuroHPC user needs: provide your feedback and get some advice"
- Wed 23 Oct 10:30-12:00
- EESSI demos, printed handouts
- Raspberry Pi prize
- "How much faster is Vega than a cluster of Raspberry Pis?"
- bring your MultiXscale T-shirt!
- Caspar will bring a monitor (from home :)) for EESSI demo
- Kenneth will print flyers at UGent.
- One flyer on MultiXscale => Kenneth / Lara will make them
- One on EESSI itself => Kenneth / Lara will make them
- One on each of the scientific use cases => Kenneth will contact the scientific WPs
- Netherlands eScience Center (Dutch national center of expertise for research software, ~60 RSEs) got in touch with Bob to give a talk (31 Oct'24, Amsterdam)
- unclear if that's a public event, but can do a write-up afterwards
- Caspar: one of my colleagues also got in touch with them (unrelated to the event) to see if EESSI could be interesting to them
- [Eli/HPCNow!] EESSI Birds-of-a-Feather session accepted at Supercomputing'24 (Atlanta, US)
- can reuse material from BoF session @ ISC'24 in Hamburg
- [Pedro] submitted talk for SURF Advanced Computing Days (12 Dec'24, Utrecht)
- talk not accepted yet
- [Eli?] EESSI tutorial at HiPEAC 2025 accepted (20-22 Jan'25)
- we need to start promoting this
- [Jean-Noël] ESPResSo summer school
- 2025?
- [HPCNow] WP7 Dissemination, Exploitation & Communication
- podcast interview for EuroHPC podcast
- Kenneth can ask Jothi @ NCC Belgium to be interviewee
- Susana will ask Apostolos what the turnaround time is once recording is provided
- sound recording tutorial on Mon 14 Oct 2024 (Communications coffee break)
- T7.1 Scientific applications provisioned on demand (lead: HPCNow) (started M13, finished M48)
- Updates ... (Pedro, HPCNow)?
- Task 7.2 - Dissemination and communication activities (lead: NIC)
- Updates ... ?
- Task 7.3 - Sustainability (lead: NIC, started M18, due M42)
- Updates ... ?
- Task 7.4 - Industry-oriented training activities (lead: HPCNow)
- Updates ... ?
- [NIC] WP8 (Management and Coordination)
- Amendment (@Neja / @Alan, can you summarize the key points of what was submitted?)
- Submitted 10th of September, EuroHPC has 45 days to respond
- Travel budget: move part of PMs to travel budget for some partners
- CI/CD
- added to task in WP1
- effort was reallocated for this from WP5
- Tweaks on descriptions on the Scientific WPs
- Removed OpenFOAM from the table of applications
- Added ALL to the table of applications (going into LAMMPS)
- Clearer picture that Kokkos will be the key library to achieve scalability
- next General Assembly meeting
- 23-24 Jan'25 in Barcelona/Sitges
- coupled to HiPEAC'25 (20-22 Jan 2025)
- We need to promote the workshop at HiPEAC more!
- registration is quite pricey, so we'll need to limit who actually attends?
- Project review
- Discuss the plan, TODOs, etc.
- agenda is available on shared drive
- (internal) deadline for presentations: 30 Oct
- internal review round/meeting shortly after
- presentations have to be submitted to project reviewers ~1 week before review
- CI/CD call for EuroHPC
- is 100% funded (not 50/50 EU/countries)
- not published yet
- request for success story by CASTIEL2
- status: rounds of editing going on, should be published soon [Neja,Alan,Caspar]
- @Neja: do you know if this has been published by now?
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2024-05-14
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2024-04-09
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2024-03-12
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2024-02-13
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2024-01-09
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-12-12
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-11-14
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-10-10
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-09-12
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-08-08
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-07-11
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-06-13
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-05-09
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-04-11
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-03-14
- https://github.com/multixscale/meetings/wiki/Sync-meeting-2023-02-14
- https://github.com/multixscale/meetings/wiki/sync-meeting-2023-01-10