This repository has been archived by the owner on Jan 3, 2018. It is now read-only.

pipeline automation using the doit library for python #419

Merged
merged 30 commits into from
May 3, 2014

rbeagrie
Contributor

@rbeagrie rbeagrie commented Apr 5, 2014

A lesson on using doit for automating data analysis pipelines. The aim would be to eventually cover the material which we currently have for make.

This was meant to be a 10 minute lesson for Instructor round 8.5, but it was far too long so I split it into three lessons of roughly 10 minutes each.

I'm not sure whether the comments on this pull request are the best place to discuss the relative merits of teaching doit vs. make (possibly that ought to be an issue rather than a PR). Just for some context though, here is a short summary of my pitch:

Advantages:

  • doit is much more verbose than make, which I think makes the files more readable if you know neither make nor doit
  • doit can be combined with argparse to make a pipeline with a nice usage page and sensible, verbose parameters. I think this is a big plus for maintainability
  • A build tool written in python might impose less cognitive load if learners are covering python in the same lesson

Disadvantages:

  • Introduces an additional dependency to install
  • Far fewer SWC instructors are likely to be familiar with doit than with make
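
For readers who have not seen a doit file, here is a minimal sketch of the verbose style mentioned under Advantages. The file names and the plot_counts.py script are hypothetical and are not taken from the lesson; the dictionary keys (file_dep, targets, actions) are the ones doit uses for task definitions.

```python
# dodo.py: a hypothetical two-step pipeline (file and script names are made up)


def task_count_words():
    """Count the words in a raw text file."""
    return {
        "file_dep": ["book.txt"],      # inputs: the task reruns if these change
        "targets": ["counts.txt"],     # outputs produced by the actions
        "actions": ["wc -w book.txt > counts.txt"],
    }


def task_plot_counts():
    """Plot the counts produced by the previous task."""
    return {
        "file_dep": ["counts.txt", "plot_counts.py"],
        "targets": ["counts.png"],
        "actions": ["python plot_counts.py counts.txt counts.png"],
    }
```

Running doit in the directory containing this file should execute both tasks, and skip any task whose dependencies have not changed since the last run.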

@durden

durden commented Apr 7, 2014

Is there a good way to see the rendered version of this new lesson proposal, or do I need to get this PR locally and build it?

Maybe it would be nice to link to a temporary location with the new lesson. It's easier to review by walking through the fully rendered lesson than the diff.

@rbeagrie
Contributor Author

rbeagrie commented Apr 7, 2014

I don't know if there is a way to see the files fully rendered as HTML, but you could try looking at the markdown files here: https://github.com/rbeagrie/bc/tree/intermediate_doit_lesson/intermediate/doit (or you can do the same thing by clicking the view button in the top right hand corner of any diff)

@gvwilson
Contributor

gvwilson commented Apr 7, 2014

If you pull Rob's PR into a branch in your repository and run 'make',
you should find the generated HTML in the output _site directory.
To pull his branch in:

  1. git remote add rbeagrie git@github.com:rbeagrie/bc
  2. git checkout master # if you're not already there
  3. git checkout -b rbeagrie-doit
  4. git pull rbeagrie intermediate_doit_lesson
  5. make site

It works for me...

@durden

durden commented Apr 7, 2014

@gvwilson Thanks for the instructions. Everything seems to work. Now on to the real work of reviewing...

@durden

durden commented Apr 7, 2014

Good job. The lesson looks good at a quick glance, and pretty thorough. I think showing the quick examples in the notebook is a great way to demonstrate how lightweight doit really is.

Maybe the doit lesson should include a little bit about how this is a simple Python approach to make.

There shouldn't be a big section on pros/cons between the two systems, but it might be useful to tell students that the two systems aim to solve some of the same problems. This might keep any students who are devout make enthusiasts at ease during the lesson.

I'd be more interested in discussing the merits of teaching doit instead of or in addition to make. I know it was mentioned that such a discussion should go into an issue, but creating an issue for that doesn't make sense until this PR is accepted.

I like the concept behind teaching doit. It's easier to use than Make, but Make is so prevalent. In my opinion, deciding whether or not to teach doit instead of Make comes down to what the goals of teaching this concept to students are.

Is the goal to give students easy ways to automate their own work, or to teach them how to get packages off the internet, installed, etc.?

@AnneTheAgile

I get:
jekyll -t build -d _site
make: jekyll: No such file or directory
make: *** [_site/index.html] Error 1

which I guess means I have to have already installed jekyll?

@AnneTheAgile

1. I think this is the tool, but I don't see any URL for it in the diffs?
http://pydoit.org/
http://schettino72.wordpress.com/2008/04/14/doit-a-build-tool-tale/
2. For SWC people, would a comparison be in order? (Fabric is pretty popular, and I saw this in the comments for doit.)
"DoIT looks and sounds nice, how does it compare to fabric( http://www.nongnu.org/fab/ ). It missed commands that allow SSH commands."
3. It's a great idea to add more automation info to the bc.

@durden

durden commented Apr 8, 2014

Fabric is very popular, but it's mostly used for running commands over ssh, by web developers, etc. So I wouldn't consider it a competitor or alternative to make or doit. However, fabric would definitely be the tool to teach if there's ever a need to teach basic system administration that requires going over ssh for communication.

@gvwilson
Contributor

gvwilson commented Apr 9, 2014

On 2014-04-08 5:11 PM, AnneTheAgile wrote:

I get:
jekyll -t build -d _site
make: jekyll: No such file or directory
make: *** [_site/index.html] Error 1

which I guess means I have to have already installed jekyll?

Yup - there are instructions in the README.md file in the repository,
which are displayed on http://github.com/swcarpentry/bc (scroll down
below the file/directory listing).

@rbeagrie
Contributor Author

rbeagrie commented Apr 9, 2014

@AnneTheAgile Thanks for noticing I was missing a link! I've now added a link to the library and to the documentation in the first lesson.

@rbeagrie
Contributor Author

rbeagrie commented Apr 9, 2014

@durden The goal is very much to help students automate their own work. My feeling is that if you wanted to teach installation of source code, there is so much implicit knowledge required that being able to understand Makefiles is not in the first five things you would teach.

My aim when writing this was not so much to teach doit explicitly, but to use doit to introduce some of the concepts behind build-tools, and best practices for pipeline automation. I didn't cover the advantages of doit vs. Make, or the conceptual differences, because I was assuming that the audience was made up of learners who didn't know Make (so such a discussion could be confusing). If someone is already using Make to automate workflows, and is happy with it, I would say stick with what works for you!

On the other hand, I accept that this is probably an unrealistic starting point. In an intermediate bootcamp, it's probably reasonably likely that some learners will have at least a passing familiarity with Make. I think it's probably a good idea to have a resource available for these learners which explains in some detail the relative differences. I've added a new page, which is now linked to from the lesson index - you can find it at https://github.com/rbeagrie/bc/blob/intermediate_doit_lesson/intermediate/doit/make-vs-doit.md

I think this page could also provide a starting point for further discussion in an issue. @gvwilson please let me know if you think it would be useful for me to open a separate issue about this.

Final note, adding the make-vs-doit page required me to make some alterations to the Makefile - just wanted to flag that up for whoever merges this (if it gets merged, of course).

@durden

durden commented Apr 9, 2014

@rbeagrie Thanks for the background. I think the idea is great. I got sidetracked because I'm an old unix guy who uses Make, but not happily.. :)

The idea of pipeline automation is interesting and definitely something students can benefit from. Also, you bring up a good point that doit might be a great introduction into some of the more complicated tooling. For example, I can imagine a student knowing doit and then later seeing Make and realizing they have a decent grasp of the concept already without the complexity of Make itself.

Good job on the lessons.

@rbeagrie
Contributor Author

rbeagrie commented Apr 9, 2014

On closer inspection, perhaps this would be better as a comment in #375 than a separate issue.

@DamienIrving
Contributor

@rbeagrie Great work! I've been using make but after reading over your lesson I'm now switching to doit - much more user friendly and the ability to combine it with argparse is a huge plus.

One comment is that I use the '-n' option quite a lot when playing around with make (it does a dry run and just prints the tasks that would have been executed to the screen). Is there a way to do this with doit? If so, this would probably be worth mentioning somewhere in your lesson.

@rbeagrie
Contributor Author

@DamienIrving Doit doesn't have a -n or "dry run" option unfortunately. I think this is partially because doit checks the actual md5 hash of a file to see whether it has been changed or not. This means that if C depends on B, which depends on A, and A is changed, we don't know whether C will be run until after B has been regenerated: it depends on whether B's contents change.
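
The A, B, C argument can be illustrated with a toy simulation. This is only a sketch of the idea of content-hash checking, not doit's actual implementation; the function names and the "upper-case" build step are made up for illustration.

```python
import hashlib


def md5_of(content):
    """Return the MD5 hex digest of a string standing in for a file's contents."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def needs_rerun(dependency_content, saved_hash):
    """A task is out of date only if its dependency's content hash has changed."""
    return md5_of(dependency_content) != saved_hash


def build_b(a_content):
    """Toy build step for the chain A -> B -> C: B is A converted to upper case."""
    return a_content.upper()


# State recorded after the last successful run.
a_old = "hello world"
b_old = build_b(a_old)
hash_a, hash_b = md5_of(a_old), md5_of(b_old)

# A is edited, but only in letter case, so B regenerates to identical content.
a_new = "Hello World"
b_must_rerun = needs_rerun(a_new, hash_a)   # A's content changed
b_new = build_b(a_new)
c_must_rerun = needs_rerun(b_new, hash_b)   # only known once B has been rebuilt
```

Here b_must_rerun is True but c_must_rerun is False, and the second answer could not have been computed without actually rebuilding B, which is why a dry run cannot reliably predict it.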

I don't cover using doit with argparse in the lessons because it is a little advanced. Essentially, behind the scenes doit does:

sys.exit(DoitMain(ModuleTaskLoader(globals())).run(sys.argv[1:]))

If you're using argparse you'll have extra command line options that doit won't recognise, so you have to invoke doit manually in your script, removing the sys.argv[1:] part:

sys.exit(DoitMain(ModuleTaskLoader(globals())).run([]))
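
Putting the pieces together, a script combining argparse with doit might look roughly like the sketch below. The option names, default file names and the word-count task are hypothetical; the imports assume that DoitMain lives in doit.doit_cmd and ModuleTaskLoader in doit.cmd_base, which may differ between doit versions.

```python
import argparse


def parse_args(argv):
    """Parse the pipeline's own options (hypothetical names, for illustration)."""
    parser = argparse.ArgumentParser(description="Count the words in a text file.")
    parser.add_argument("--input-file", default="book.txt",
                        help="text file to analyse")
    parser.add_argument("--output-file", default="counts.txt",
                        help="where to write the word count")
    return parser.parse_args(argv)


# Defaults for now; main() replaces this before doit loads the tasks.
OPTIONS = parse_args([])


def task_count_words():
    """doit task definition built from the parsed options."""
    command = "wc -w {} > {}".format(OPTIONS.input_file, OPTIONS.output_file)
    return {
        "file_dep": [OPTIONS.input_file],
        "targets": [OPTIONS.output_file],
        "actions": [command],
    }


def main(argv):
    """Parse our own options, then hand control to doit."""
    global OPTIONS
    OPTIONS = parse_args(argv)
    # Imported here so the option parsing can be tried without doit installed.
    from doit.cmd_base import ModuleTaskLoader
    from doit.doit_cmd import DoitMain
    # argparse has consumed the command line, so doit gets an empty argument list.
    return DoitMain(ModuleTaskLoader(globals())).run([])
```

To use it as a script, finish the file with a call such as sys.exit(main(sys.argv[1:])) (after importing sys); running it with --help should then show the argparse usage page rather than doit's own.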

If you have any issues I'll be happy to help debug - just send me a message on twitter or something!

@gvwilson gvwilson merged commit 183c5b0 into swcarpentry:master May 3, 2014
@gvwilson
Contributor

gvwilson commented May 3, 2014

Superseded by #477.


5 participants