Commit 05c0ae4

Merge pull request #10 from broadinstitute/ccds
Create new workflow standard + resources
2 parents 13084ff + 1498016 commit 05c0ae4

9 files changed: +736 -131 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+handbook.md

.markdownlint.yaml

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+MD007:
+  indent: 4 # List indent
+MD013: false # Line length
+MD024: false # Multiple headers with same content
+MD029:
+  style: ordered # Ordered list style
+MD033: false # Inline HTML
+MD046: false # Code block style

.pre-commit-config.yaml

Lines changed: 16 additions & 6 deletions
@@ -3,13 +3,23 @@ repos:
     rev: v5.0.0
     hooks:
       - id: trailing-whitespace
+        exclude: \.dvc$
       - id: check-added-large-files
-        args: ['--maxkb=10240']
+        args: [--maxkb=10240] # 10MB limit
+      - id: check-yaml
+      - id: end-of-file-fixer
+        exclude: \.dvc$
+
   - repo: https://github.com/astral-sh/ruff-pre-commit
     rev: v0.12.7
     hooks:
-      - id: ruff
-        types_or: [ python, pyi ]
-        args: [ --fix ]
-      - id: ruff-format
-        types_or: [ python, pyi ]
+      - id: ruff
+        args: [--fix]
+      - id: ruff-format
+
+  - repo: https://github.com/igorshubovych/markdownlint-cli
+    rev: v0.45.0
+    hooks:
+      - id: markdownlint
+        args: [--fix]
+        files: \.(md|markdown)$
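
Reviewer note on the `exclude` and `files` keys in this config: pre-commit treats both values as Python regular expressions and matches them against each file path with `re.search`. The sketch below is only an illustration of which paths the two patterns from this diff would select. The file names are hypothetical, and the helper functions `is_excluded` and `is_markdown` are not part of pre-commit.

```python
import re

# Patterns copied verbatim from the hook configuration in this diff.
DVC_EXCLUDE = re.compile(r"\.dvc$")
MARKDOWN_FILES = re.compile(r"\.(md|markdown)$")


def is_excluded(path: str) -> bool:
    # True if a hook configured with the .dvc exclude pattern would skip this path.
    return DVC_EXCLUDE.search(path) is not None


def is_markdown(path: str) -> bool:
    # True if the markdownlint hook's files pattern would select this path.
    return MARKDOWN_FILES.search(path) is not None


# Hypothetical paths, chosen only for illustration.
paths = [
    "data/raw/plate1.csv.dvc",
    "dvc.yaml",
    "workflows.md",
    "notes.markdown",
    "src/model.py",
]

for path in paths:
    print(f"{path}: excluded={is_excluded(path)} markdown={is_markdown(path)}")
```

Because both patterns end in `$`, they only match on the file extension: `dvc.yaml` is still checked by `trailing-whitespace` and `end-of-file-fixer`, while any path ending in `.dvc` is skipped.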

.ruff.toml

Lines changed: 0 additions & 1 deletion
This file was deleted.

01-lab-notebook.md

Lines changed: 0 additions & 123 deletions
This file was deleted.

CLAUDE.md

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
+# CLAUDE.md
+
+Guidance for Claude Code when working in this repository.
+
+## Repository Context
+
+Documentation repository for Carpenter-Singh Lab at the Broad Institute. Maintains lab standards for data science workflows using Cookiecutter Data Science (CCDS) principles.
+
+## Key Tasks When Working Here
+
+### 1. Documentation Improvements
+
+- Check for consistency in command examples across documents
+- Ensure all code blocks are properly formatted and executable
+- Verify that file paths and project names use consistent placeholder syntax (e.g., `{PROJECT_NAME}`)
+- Look for outdated dependencies or version numbers that may need updating
+
+### 2. Common Updates
+
+When asked to update documentation:
+
+- Follow the Documentation Guidelines below
+- Focus on executable commands over explanations
+
+### 3. Adding New Sections
+
+When adding new documentation:
+
+- Follow the existing markdown structure and heading hierarchy
+- Place new content in the appropriate file (workflows.md for process-related content)
+- Use real-world examples from the referenced repositories when possible
+
+### 4. Quality Checks
+
+Before finalizing any changes:
+
+- Ensure all bash commands use proper escaping and line continuations
+- Verify that Python/YAML/TOML code blocks have correct syntax
+- Check that any new commands follow the established patterns in the document
+
+## Repository Structure
+
+```text
+carpenter-singh-lab-standards/
+├── README.md          # Main entry point, links to other docs
+├── workflows.md       # Comprehensive CCDS workflow guide
+└── CLAUDE.md          # This file
+```
+
+## When Making Changes
+
+- workflows.md is the authoritative source for lab procedures
+- New documentation must complement, not duplicate, existing content
+- Make processes clear and executable, not flexible
+
+## Documentation Guidelines
+
+When updating documentation in this repository, follow these principles:
+
+### Target Audience
+
+- **Expert data scientists** who are new to the lab
+- They will be instructed to read documentation thoroughly
+- They want to understand THE way this lab operates, not explore options
+- Assume technical competence - skip basic explanations
+
+### Writing Style
+
+- **Prescriptive over descriptive**: "Do X" not "You might consider X"
+- **Dense over verbose**: One clear statement beats three explanatory paragraphs
+- **Executable over abstract**: Every process needs exact, working commands
+- **Real over generic**: Use actual filenames from real projects, not placeholders
+
+### Content Organization
+
+- **Role-based sections**: Clearly separate what analysts vs maintainers need
+- **Most-used first**: Daily workflow before one-time setup
+- **Progressive disclosure**: Core workflow in main text, edge cases in appendix
+- **Decision trees**: "If X → do Y" for any branching logic
+
+### Critical Elements to Include
+
+- **Coordination requirements**: Bold warnings when team sync is needed
+- **Order dependencies**: Explicit when sequence matters (e.g., hook installation)
+- **Known gotchas**: Document issues with specific workarounds
+- **The "why" sparingly**: Only when it affects implementation choices
+
+### What to Avoid
+
+- Beginner explanations ("Git is a version control system...")
+- Multiple options without clear guidance
+- Philosophical discussions about methodology
+- Untested command sequences
+
+Remember: Documentation should be the authoritative reference that turns an expert data scientist into an expert lab member.

README.md

Lines changed: 16 additions & 1 deletion
@@ -1,2 +1,17 @@
-# Carpenter-Singh Lab Standards
+# Carpenter-Singh Lab Guide
 
+Resources and standards for the Carpenter-Singh Lab at the Broad Institute.
+
+## Documentation
+
+### Project Organization
+
+- **[Project Workflows](workflows.md)** - How we organize and run data science projects using Cookiecutter Data Science principles
+
+### Learning Resources
+
+- **[Resources](resources.md)** - Scientific literature, tools, and educational materials for image-based profiling
+
+## Contributing
+
+This is a living document. Please contribute additional guides, patterns, and standards as we develop them.

resources.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+# Resources
+
+## Educational Resources
+
+### Computing Skills
+
+- [The Missing Semester of Your CS Education](https://missing.csail.mit.edu/)
+- [The Unix Workbench](https://www.coursera.org/learn/unix)
+- [Data Science Cheatsheet](https://github.com/aaronwangy/Data-Science-Cheatsheet)
+
+### Textbooks
+
+- [Data Analysis for the Life Sciences](https://leanpub.com/dataanalysisforthelifesciences)
+
+## Scientific Literature
+
+### Image-based Profiling Essentials
+
+- [Image-based profiling: due for a machine-learning upgrade?](https://www.nature.com/articles/s41573-020-00117-w) - 2020 review of applications in image-based profiling
+- [Data-analysis strategies for image-based cell profiling](https://www.nature.com/articles/nmeth.4397) - Introduces key analysis steps
+- [Applications in image-based profiling of perturbations](https://www.sciencedirect.com/science/article/pii/S0958166916301112) - Describes common applications
+- [Cell Painting: a decade of discovery and innovation in cellular imaging](https://www.nature.com/articles/s41592-024-02528-8) - Systematic review of Cell Painting
+- [Cell Painting protocol](https://www.nature.com/articles/s41596-023-00840-9) - Detailed protocol
+
+## Analysis Tools & Libraries
+
+### Profiling Libraries
+
+- **pycytominer**: Core library for image-based profiling ([GitHub](https://github.com/cytomining/pycytominer))
+- **copairs**: Evaluation metrics for profiling ([GitHub](https://github.com/cytomining/copairs))
+- More tools at [https://github.com/cytomining/](https://github.com/cytomining/)
+
+### Deep Learning
+
+- **DeepProfiler**: Deep learning for image-based profiling ([GitHub](https://github.com/cytomining/DeepProfiler))
+
+### Learning Resources
+
+- [Profiling Handbook](https://cytomining.github.io/profiling-handbook/) - Step-by-step instructions for producing image-based profiles from raw images
+- [LINCS Workflow](https://github.com/broadinstitute/lincs-cell-painting/tree/e9737c3e4e4443eb03c2c278a145f12efe255756/profiles#workflow) - Workflow diagram of image-based profiling
+
+## Development Environment Setup
+
+### Package Management
+
+- Install applications using [brew](https://brew.sh/)
+- Use [uv](https://github.com/astral-sh/uv) for Python environment management
+- Install R following [these instructions](https://github.com/broadinstitute/imaging-configs/blob/master/R-instructions.md)
+
+### Git Configuration
+
+```bash
+# Configure shared repositories
+git config --global core.sharedRepository true
+```
+
+**Best practice**: Avoid [amending](https://stackoverflow.com/questions/253055/how-do-i-push-amended-commit-to-the-remote-git-repository) pushed commits
+
+## General Principles
+
+### Troubleshooting
+
+Follow the 30-minute rule: Try to solve issues independently for ~30 minutes before asking for help.
+
+1. Search existing resources (documentation, issues, archives)
+2. Consult official documentation for relevant tools/libraries
+3. Consider using AI tools (ChatGPT, Claude) with appropriate caution
+4. Google error messages or problem descriptions
+5. Post in relevant channels with details of what you've already tried
+6. Document your troubleshooting process for future reference
