
7/8/2022 Coffea-Casa Meeting Minutes

Mat Adamec edited this page Jul 8, 2022 · 3 revisions

7/8/2022 Coffea-Casa Meeting

General Positives!

  • More people seem to be using the facility, and they appear to be running real analyses; Andrew helped one user troubleshoot theirs.

Minutes

Summaries of discussions are laid out chronologically.

Holly

Working on benchmarking coffea by applying the asv (airspeed velocity) package to the ADL benchmarks.

Updates
  • Still in the early stages; expect updates in the coming weeks.
Action Items
  • Mat can help answer any questions about coffea/benchmark implementation.

Durbar

Continuing to work on coffea-casa metrics.

Updates
  • Figured out how to configure Jupyter/Dask metrics. Some have already been implemented (possibly only locally?).
  • Durbar thinks custom metrics shouldn't be too difficult to implement; the Jupyter/Dask metrics are just a starting point.
Action Items
  • We need to give more input on what sorts of metrics we want, since the Dask/Jupyter ones are mostly systematic. Brian has many ideas, and Mat has created a GitHub discussion to keep track of them. Everybody is encouraged to contribute thoughts or suggestions. Revisit next meeting.
  • In the meantime, Durbar will continue implementing the default metrics on coffea-casa.

Mat

Prepping for talk next Thursday at 10:40 AM CST, 5:40 PM CERN.

Updates
  • Mat is modernizing the tHq analysis. Post-processing with the new hist package has been troublesome.
Action Items
  • Mat will attempt to make ServiceX work nicely with the analysis.
  • Mat will test the analysis on flat iron to help validate that the new hardware works.
  • Mat will (belatedly) look at the dependency management plugin Andrew created, so he is up to date on the issue for the talk.
  • Carl or Oksana should look into this issue, ideally before the talk.

Andrew

Trying to get Workqueue plugged into coffea-casa.

Updates
  • Andrew ran into issues with Condor: jobs land in the queue, start immediately, then die. The logs suggest a job tries to open a port before dying. Carl has some ideas.
    • There was some discussion about "incoming"/"outgoing" traffic. Brian thought Workqueue should be outgoing only and thus this problem shouldn't arise.
Action Items
  • Carl or Andrew should ask Ben about the incoming/outgoing issue and figure out whether we can avoid the Condor problems.
  • Carl or Andrew will investigate the Condor issue (if we need to fix it). Brian mentioned that the Coffea-Casa Condor Cluster(?) might have a solution to the port opening issue. We've done a similar procedure setting up Dask.

Open Discussion about Coffea-Casa Philosophy

Brian left around here. There was an open discussion about Workqueue, coffea-casa's philosophy, and our priorities given limited humanpower. I tried to document it as best I could:

Topic: Why are we pushing for Workqueue so hard when we have Dask?

Pros
  • Carl suggested Workqueue has built-in dependency management; Andrew agreed.
  • Andrew suggested that having an alternative to Dask is useful if coffea-casa is meant to be a template for other analysis facilities, since not all of them will want to use Dask. Under a template model, it should be easy to use any scale-out technology with coffea-casa.
  • Ken mentioned the coffea principle is "scale-out without specific implementation of scale-out"; the coffea-casa template model aligns with this.
Cons
  • Primary concern: is it worth the effort, given the limited number of people working on this project?
  • Carl mentioned that spinning up more instances is causing admin headaches. We already have one for opendata and one for CMS data. Would we now need two more to cover opendata and CMS for Workqueue?
Action Items
  • Oksana and Brian should weigh in on this. It might be worth revisiting next meeting.

Topic: Is Condor central to coffea-casa workers? Would it be a good idea to move more towards Kubernetes? Carl considers this a critical question about coffea-casa's design philosophy.

Condor Pros
  • Carl has high praise for how beautifully coffea and Condor work together out of the box.
  • Carl noted that using Condor reuses old infrastructure, which is an ideal he holds.
Condor Cons
  • As more features get added to coffea-casa, Condor seems to cause more complications.
  • Maybe the old infrastructure isn't all that great.
Kubernetes Pros
  • Kubernetes is becoming more broadly used. It naturally handles dynamic sites like coffea-casa, where user activity can vary with circumstances.
Kubernetes Cons
  • Again, humanpower is limited, and switching to Kubernetes requires development work!
Action Items
  • Oksana and Brian should weigh in on this. It might be worth revisiting next meeting.
  • Carl will try to get Kubernetes scale-out working properly.

Lastly, Carl mentioned that something still needs to be done with ServiceX before the flat iron hardware can be deployed more broadly. This might be an action item for Oksana.

To-Do

Compiling the list of action items here.

Holly
  • N/A (still in the learning stages).
Durbar
  • Continue to make progress to implement the default metrics on coffea-casa.
Mat
  • Answer any questions that come up as Holly learns about coffea.
  • Finish the tHq analysis update, add ServiceX capabilities, look at the dependency management plugin, and give the talk next week.
  • Start testing flat iron hardware for Carl.
Andrew
  • Ask Ben about the "incoming" issue with Workqueue (see section above for more details).
  • Keep troubleshooting Condor (see section above for suggestions).
Oksana
  • Look at the two topic discussions above.
  • Ask Carl about the ServiceX/flat iron issue that still needs to get addressed.
  • Take a look at this issue, ideally before next Thursday.
Brian
  • Look at the two topic discussions above.
Carl
  • Ask Ben about the "incoming" issue with Workqueue (see section above for more details).
  • Keep troubleshooting Condor (see section above for suggestions).
  • Work on Kubernetes scale-out.
  • Take a look at this issue, ideally before next Thursday.
Everyone
  • Contribute ideas for metrics on our discussion page!

For Next Meeting

Compiling the list of things to revisit here.

  • Look at the metrics discussion on GitHub and figure out which ones we should prioritize. This should clarify what Durbar should be focusing on.
  • Potentially revisit the two topics discussed this week when more people are present.