Merge pull request #3 from csiro-data-school/spr-0124
Updates to chapters 4-9 ahead of CSIRO Data School Jan '24 inc. new page on Agile methodology
spriggsy83 authored Jan 29, 2024
2 parents a36ff42 + 5630112 commit 1e0ab3d
Showing 6 changed files with 74 additions and 28 deletions.
1 change: 1 addition & 0 deletions config.yaml
@@ -53,6 +53,7 @@ episodes:
- 06-track_changes.md
- 07-manuscripts.md
- 08-what_next.md
- 09-agile.md

# Information for Learners
learners:
2 changes: 1 addition & 1 deletion episodes/04-collaboration.md
@@ -198,7 +198,7 @@ tasks in various ways.
![](fig/ms-tasks-list-view.png){alt="An example of Teams Tasks list view"}

- [Jira](https://jira.csiro.au/) is another tool supported and deployed in CSIRO. Developed by
Australian software company [Atlassian](https://www.atlassian.com/software/jira, it allows
Australian software company [Atlassian](https://www.atlassian.com/software/jira), it allows
tracking of to-do tasks/issues and sub-tasks, lets you assign tasks to people, and lets
you track and view tasks in the context of workflows, timelines, and "board" visualisations,
such as the "Kanban board". Jira can directly integrate with both BitBucket
26 changes: 19 additions & 7 deletions episodes/05-project_organization.md
@@ -113,14 +113,14 @@ files that perform the core analysis of the research, such as data
cleaning or statistical analyses. These files can be thought of as
the "scientific guts" of the project.

The second type of file in `src` is controller or driver scripts
that contains all the analysis steps for the entire project
Another type of file that might go in `src` is controller/driver/workflow scripts
that contain all the analysis steps of a project
from start to finish, with particular parameters and data
input/output commands. A controller script for a simple project, for
example, may read a raw data table, import and apply several cleanup
and analysis functions from the other files in this directory, and
create and save a numeric result. For a small project with one main
output, a single controller script should be placed in the main
output, a single controller script could be placed in the main
`src` directory and distinguished clearly by a name such as
"runall". The short example below is typical of
scripts of this kind; note how it uses one variable, `TEMP_DIR`, to
@@ -140,9 +140,21 @@ avoid repeating the name of a particular directory four times.
rm -rf $(TEMP_DIR)
```
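
As a rough illustration of the same controller-script idea in a general-purpose language, a minimal Python sketch follows. The `count` column name, the `results/mean_count.txt` output path, and the helper names `read_rows`, `clean_counts` and `run_all` are assumptions made for this example; they are not part of the lesson.

```python
import csv
import statistics
from pathlib import Path


def read_rows(path):
    """Read a raw CSV table into a list of dictionaries, one per row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def clean_counts(rows):
    """Keep only rows whose 'count' field (an assumed column name) is a whole number."""
    counts = []
    for row in rows:
        value = (row.get("count") or "").strip()
        if value.isdigit():
            counts.append(int(value))
    return counts


def run_all(raw_path, results_path):
    """Run every step from raw data to a saved numeric result."""
    counts = clean_counts(read_rows(raw_path))
    mean_count = statistics.mean(counts)  # raises StatisticsError if no valid rows
    results_path.parent.mkdir(parents=True, exist_ok=True)
    results_path.write_text(f"{mean_count}\n")
    return mean_count


if __name__ == "__main__":
    # Relative paths, assuming the script is started from the project root.
    raw = Path("data/birds_count_table.csv")
    if raw.exists():
        print(run_all(raw, Path("results/mean_count.txt")))
    else:
        print(f"{raw} not found; run this script from the project root")
```

In a real project the cleanup and analysis functions would more likely be imported from the other files in `src` rather than defined in the controller itself.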

::::::::::::::::::::::::::::::::::::::::: callout

**Important note:** Don't place information specific to your own computer/system
or self in these types of files, especially if they are being Git-tracked. Use
relative paths instead of full paths where possible (e.g. input as `../data/` rather
than `/home/xyz123/project/data`). Don't include any passwords or keys.
If personal or system-specific information is required for your workflow, then make
use of locally set environment variables and/or git-ignored files and then document
how to set up these inputs again for anyone (or future self) re-using your work.

::::::::::::::::::::::::::::::::::::::::::::::::::
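
One possible way to follow this advice in a Python script is sketched here. The environment variable names `PROJECT_DATA_DIR` and `EXAMPLE_API_TOKEN` and the two helper functions are hypothetical, chosen only for illustration.

```python
import os
from pathlib import Path


def resolve_data_dir(default="../data"):
    """Return the data directory.

    Uses the locally set PROJECT_DATA_DIR environment variable (a hypothetical
    name) when present, otherwise falls back to a relative path.
    """
    return Path(os.environ.get("PROJECT_DATA_DIR", default))


def require_setting(name):
    """Read a secret or system-specific value from the environment.

    Fails with a message that points the user to the setup notes, so the
    value itself never needs to appear in a Git-tracked file.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; see the README for setup."
        )
    return value


if __name__ == "__main__":
    print(resolve_data_dir())
```

A collaborator would then set the variables in their own shell or in a git-ignored file, following whatever setup steps the project documents.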

## Put compiled programs in the `bin` directory

`bin` contains
A directory named `bin` is usually used to contain
executable programs compiled from code in the `src` directory.
Projects that do not have any will not require `bin`.

@@ -193,9 +205,9 @@ simple project might be organized following these recommendations:

```
.
|-- CITATION
|-- README
|-- LICENSE
|-- CITATION.cff
|-- README.md
|-- LICENSE.md
|-- requirements.txt
|-- data
| -- birds_count_table.csv
13 changes: 4 additions & 9 deletions episodes/06-track_changes.md
Expand Up @@ -217,12 +217,7 @@ approach—the one we use in our own projects–don't just accelerate the
manual process: they also automate some steps while enforcing others,
and thereby require less self-discipline for more reliable results.

1. ***Use a version control
system***, to manage changes to a
project.

Box 2 briefly explains how version control systems work. It's hard to
know what version control tool is most widely used in research today,
It's hard to know what version control tool is most widely used in research today,
but the one that's most talked about is undoubtedly Git. This is largely because of
GitHub, a popular hosting site that combines the technical infrastructure for collaboration via Git with a
modern web interface. GitHub is free for public and open source projects
@@ -231,11 +226,11 @@ GitLab is a well-regarded alternative
that some prefer, because the GitLab platform itself is free and open
source. Bitbucket provides free hosting
for both Git and Mercurial repositories, but does not have nearly as
many scientific users.
many scientific users. CSIRO hosts its own instance of BitBucket for employee use.

::::::::::::::::::::::::::::::::::::::::: callout

## Box 2: How Version Control Systems Work
## How Version Control Systems Work

A version control system stores snapshots of a project's files in a
repository. Users modify their working copy of the project, and then
@@ -244,7 +239,7 @@ and/or share their work with colleagues. The version control system
automatically records when the change was made and by whom along with
the changes themselves.

Crucially, if several people have edited files simultaneously, the
Crucially for collaboration, if several people have edited files simultaneously, the
version control system will detect the collision and require them to
resolve any conflicts before recording the changes. Modern version
control systems also allow repositories to be synchronized with each
14 changes: 3 additions & 11 deletions episodes/07-manuscripts.md
@@ -117,15 +117,9 @@ Our first alternative will already be familiar to many researchers:
With the document online, everyone's changes are in one place, and
hence don't need to be merged manually.

We realize that in many cases, even this solution is asking too much
from collaborators who see no reason to move forward from desktop GUI
tools. To satisfy them, the manuscript can be converted to a desktop
editor file format (e.g., Microsoft Word `.docx` or LibreOffice
`.odt`) after major changes, then downloaded and saved in the `doc`
folder. Unfortunately, this means merging some changes and suggestions
manually, as existing tools cannot always do this automatically when
switching from a desktop file format to text and back (although
[Pandoc](https://pandoc.org/) can go a long way).
This is easy under our current Microsoft Office organisational setup,
where Word documents (and others) may be converted to shared online
documents automatically when sharing through Outlook or Teams.

## Text-based Documents Under Version Control

@@ -193,8 +187,6 @@ In groups, discuss:

## Getting started writing text-based version control

[Version Control with Git](https://swcarpentry.github.io/git-novice/) Carpentries lesson introduces text-based version control, that you could use for a collaborative manuscript.

[Manubot](https://manubot.org) is an open-source system for writing scholarly manuscripts via GitHub, with tutorials.


46 changes: 46 additions & 0 deletions episodes/09-agile.md
@@ -0,0 +1,46 @@
---
title: 'Agile'
teaching: 60
exercises: 0
---

::::::::::::::::::::::::::::::::::::::: objectives

- Learn some basic concepts of the 'Agile' methodology

::::::::::::::::::::::::::::::::::::::::::::::::::

## What is 'Agile'

'Agile' is a project management methodology, particularly for software development,
built around a 4 point philosophical [manifesto](https://agilemanifesto.org/)
and a 12 point set of [principles](https://agilemanifesto.org/principles.html).

Agile is typified by small teams that self-organise ('scrum') on how they will
address a backlog of requested work, in short cycles ('sprints'), by breaking
problems into small tasks, with frequent feedback and result delivery. It is a
highly iterative approach to planning, that allows for high flexibility and less
forward planning. A sprint may last 1-4 weeks, in which time an entire cycle of
planning, designing, implementing, testing and delivering takes place, with small
tasks hopefully addressed to completion, followed by a review and retrospective
that may or may not end up influencing the next sprint cycle.

[Framework at a glance diagram](https://www.planview.com/resources/guide/agile-methodologies-a-beginners-guide/basics-benefits-agile-method/)

[Contrast to waterfall model](https://www.guru99.com/agile-methodology-in-software-testing.html)

[Roles and user stories](https://www.tutorialspoint.com/agile/agile_primer.htm)

[Atlassian on scrums, Kanban and Jira visualisations](https://www.atlassian.com/agile/project-management)


:::::::::::::::::::::::::::::::::::::::: keypoints

- The Agile approach is to break problems into smaller tasks and fully address them
in short, iterative work cycles (sprints), with each cycle ending in review and discussion
before planning the next cycle.
- Aspects of this approach may be useful in data science work.

::::::::::::::::::::::::::::::::::::::::::::::::::

