andychu edited this page Dec 7, 2020 · 33 revisions

Old: Project Goals

Notes on gg (Ad Hoc Multi-Cloud Distribution with Lambdas)

  • Great intro blog post, concentrating mostly on the C++ build use case, which indeed has some unique elements: https://buttondown.email/nelhage/archive/papers-i-love-gg/
    • reaction: distcc pump is another solution to the preprocessor problem, although neither model substitution nor distcc pump is fully general
  • Great Usenix ATC '19 Video: https://www.youtube.com/watch?v=Cc_MVldSijA&ab_channel=USENIX
    • I really like the framing: low latency (which is why I use shell in the first place), warm vs. cold clusters
    • IR is extremely similar to Blaze/Forge (and described with a tiny set of protobufs!)
  • HN comments from July 2019: https://news.ycombinator.com/item?id=20433315
    • Lambda still has some limitations for huge packages. Good experience report here (although it sounds like the commenter could benefit from "proper" declared dependencies)
    • What about state in lambdas?
  • My initial reaction: https://lobste.rs/s/virbxa/papers_i_love_gg#c_nbmnod
  • Concepts
    • Model Substitution
    • Tail Calls
    • Dynamic dependencies, not static (how does it relate to Shake?)
    • Lambdas can talk to each other (via NAT traversal?) Solves a well known performance issue.
  • Citations
    • UCop
    • Ciel
  • My sense on limitations
    • It's not a fully general shell parallelizer, because it's mainly about small data and big compute. Some problems are big data and small compute, like analytics (joins, etc.)
  • Their Notes on Limitations / Future Work
    • Worker communication (didn't understand the NAT traversal bit)
    • They want to schedule thunks onto GPUs
    • A gg DSL! They have a C++ and Python SDK. They say they want "parallel map", "fold", etc. What does this look like?
  • Questions
    • Where does the scheduler run? (on a lambda? Or does the client need to be connected the whole time)
    • How does the worker-to-worker communication work?
    • What would the DSL look like?
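
To make the "Tail Calls" and "Dynamic dependencies" concepts above concrete, here is a minimal sketch in Python. It is not gg's actual IR or SDK: the names `STORE`, `put`, `Thunk`, `force`, `concat`, and `resolve_includes` are all hypothetical, and the in-memory dict stands in for a real content-addressed object store. The idea it tries to show is that a thunk names its inputs by hash, and that a thunk may return another thunk, so dependencies can be discovered at run time rather than declared up front.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union

# Hypothetical in-memory content-addressed store: sha256 hex digest -> blob.
STORE: Dict[str, bytes] = {}


def put(data: bytes) -> str:
    """Store a blob and return its content hash."""
    h = hashlib.sha256(data).hexdigest()
    STORE[h] = data
    return h


@dataclass(frozen=True)
class Thunk:
    """A deferred computation: a function plus the hashes of its inputs."""
    func: Callable[..., Union[bytes, "Thunk"]]
    inputs: Tuple[str, ...]


def force(thunk: Thunk) -> str:
    """Reduce a thunk to the hash of its final output.

    If a thunk returns another thunk (a "tail call"), keep reducing.
    """
    result: Union[bytes, Thunk] = thunk
    while isinstance(result, Thunk):
        args = [STORE[h] for h in result.inputs]
        result = result.func(*args)
    return put(result)


def concat(*blobs: bytes) -> bytes:
    return b"".join(blobs)


def resolve_includes(manifest: bytes) -> Thunk:
    # The real dependencies are only known after reading the manifest,
    # so this step returns a new thunk instead of a finished value.
    hashes = tuple(manifest.decode().split())
    return Thunk(concat, hashes)


a = put(b"hello ")
b = put(b"world")
manifest = put(f"{a}\n{b}".encode())
out = force(Thunk(resolve_includes, (manifest,)))
print(STORE[out])
```

A real system would also memoize on the hash of the thunk itself and ship thunks to remote workers; this sketch leaves both out.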

Project Ideas

  • Well first, try gg to see how well it works...
  • Really basic:
    • Oil can create CLI descriptions for "model substitution"
  • Second: an Oil front end (on top of model sub, "scripting", Python, C++). Does that make sense? (That's in their future work -- a DSL)
  • Run Toil on gg! For better continuous builds
  • Does it make sense to augment gg with streams? For shell pipelines?
    • dgsh uses Unix domain sockets to implement pipelines
  • Big project: write an executor that addresses the object distribution problem with differential compression / affinity (e.g. OSTree/casync)
  • Is there some sort of command line wrapper style that specifies inputs / outputs unambiguously that can be used to wrap every command? Then you don't need model substitution?
  • Could Oil be a local executor for the gg runtime? What does the file system look like?
    • you need a component to set up the file system, I guess a user space chroot / bind tool?
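
One way to picture the "command line wrapper style" idea above is a wrapper whose flags declare inputs and outputs before the real command. The sketch below is only an illustration in Python: the `--in` / `--out` / `--` convention and the function `parse_wrapped` are made up for this example and are not part of gg or Oil.

```python
from typing import Dict, List


def parse_wrapped(argv: List[str]) -> Dict[str, List[str]]:
    """Parse a hypothetical wrapper invocation of the form:

        --in FILE ... --out FILE ... -- COMMAND ARGS...

    Everything before '--' declares files; everything after is the command.
    """
    if "--" not in argv:
        raise ValueError("expected '--' before the wrapped command")
    split = argv.index("--")
    decls, command = argv[:split], argv[split + 1:]
    if len(decls) % 2 != 0:
        raise ValueError("each --in/--out flag needs exactly one path")

    inputs: List[str] = []
    outputs: List[str] = []
    for flag, path in zip(decls[0::2], decls[1::2]):
        if flag == "--in":
            inputs.append(path)
        elif flag == "--out":
            outputs.append(path)
        else:
            raise ValueError(f"unknown declaration flag: {flag}")
    return {"inputs": inputs, "outputs": outputs, "command": command}


spec = parse_wrapped(
    ["--in", "foo.c", "--out", "foo.o", "--", "gcc", "-c", "foo.c", "-o", "foo.o"]
)
print(spec)
```

With declarations like these, an executor would not need a per-tool model to learn what a command reads and writes, at the cost of the user (or a generator) having to state it for every invocation.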