Commit 89f488c

Idate96 committed
Reorganize homepage to avoid duplication with complete guide
- Transform homepage into a quick reference guide
- Remove detailed sections that exist in complete-guide.md
- Keep only essential quick-start information and commands
- Add clear navigation to detailed guides
- Focus on practical examples and common workflows
- Maintain quick reference tables for easy lookup
1 parent 31ac49a commit 89f488c

File tree: 1 file changed

docs/index.md: 94 additions, 218 deletions
@@ -5,7 +5,7 @@ Welcome to the comprehensive guide for using the Euler HPC cluster at ETH Zurich
 ## 🚀 Quick Navigation
 
 !!! tip "Getting Started"
-    New to Euler? Start with our [Complete Guide](complete-guide/) for detailed instructions on accessing and using the cluster.
+    New to Euler? Start with our [Complete Guide](complete-guide/) for detailed step-by-step instructions on accessing and using the cluster.
 
 !!! example "Container Workflows"
     Learn how to build, deploy, and run [containerized applications](container-workflow/) using Docker and Singularity on Euler.
@@ -18,280 +18,156 @@ Welcome to the comprehensive guide for using the Euler HPC cluster at ETH Zurich
 
 ---
 
-## 📋 Table of Contents
+## 🎯 Quick Start
 
-1. [Access Requirements](#access-requirements)
-2. [Quick Start SSH Setup](#quick-start-ssh-setup)
-3. [Storage Overview](#storage-overview)
-4. [Basic SLURM Commands](#basic-slurm-commands)
-5. [Container Workflow Summary](#container-workflow-summary)
-6. [Interactive Sessions](#interactive-sessions)
-7. [Support & Resources](#support-resources)
-
----
-
-## ✅ Access Requirements
-
-To get access to the Euler cluster:
-
-1. **Fill out the access form**: [RSL Cluster Access Form](https://forms.gle/UsiGkXUmo9YyNHsH8)
-2. **RSL members**: Directly message Manthan Patel for faster processing
-3. **Access approval**: Twice weekly (Tuesdays and Fridays)
-
-**Prerequisites:**
-- Valid nethz username and password (ETH Zurich credentials)
-- Terminal access (Linux/macOS or Git Bash on Windows)
-- Membership in RSL group (es_hutter)
-
----
-
-## 🔐 Quick Start SSH Setup
-
-### Basic Connection
+### First Time Setup
 ```bash
+# 1. SSH into Euler
 ssh <your_nethz_username>@euler.ethz.ch
-```
-
-### SSH Key Setup (Recommended)
-```bash
-# Generate SSH key
-ssh-keygen -t ed25519 -C "[email protected]"
-
-# Copy to Euler
-ssh-copy-id <your_nethz_username>@euler.ethz.ch
-
-# Create SSH config (~/.ssh/config)
-cat >> ~/.ssh/config << EOF
-Host euler
-    HostName euler.ethz.ch
-    User <your_nethz_username>
-    Compression yes
-    ForwardX11 yes
-EOF
-
-# Now connect simply with:
-ssh euler
-```
 
-### Verify Your Access
-```bash
-# Check group membership
+# 2. Verify RSL group membership
 my_share_info
 # Should show: "You are a member of the es_hutter shareholder group"
 
-# Create your directories
+# 3. Create your directories
 mkdir -p /cluster/project/rsl/$USER
 mkdir -p /cluster/work/rsl/$USER
 ```
 
----
-
-## 💾 Storage Overview
-
-| Location | Quota | Files | Purpose | Persistence |
-|----------|-------|-------|---------|-------------|
-| **Home** `/cluster/home/$USER` | 45 GB | 450K | Code, configs | Permanent |
-| **Scratch** `/cluster/scratch/$USER` | 2.5 TB | 1M | Datasets, temp files | Auto-deleted after 15 days |
-| **Project** `/cluster/project/rsl/$USER` | 75 GB | 300K | Conda envs, software | Permanent |
-| **Work** `/cluster/work/rsl/$USER` | 150 GB | 30K | Results, containers | Permanent |
-| **Local** `$TMPDIR` | 800 GB | High | Job runtime data | Deleted after job |
-
-### Check Your Usage
+### Submit Your First Job
 ```bash
-# Home and Scratch
-lquota
+# Create a simple test script
+cat > test_job.sh << 'EOF'
+#!/bin/bash
+#SBATCH --job-name=test
+#SBATCH --time=00:10:00
+#SBATCH --mem=4G
+
+echo "Hello from $(hostname)"
+echo "Job ID: $SLURM_JOB_ID"
+EOF
 
-# Project and Work
-(head -n 5 && grep -w $USER) < /cluster/work/rsl/.rsl_user_data_usage.txt
-(head -n 5 && grep -w $USER) < /cluster/project/rsl/.rsl_user_data_usage.txt
+# Submit it
+sbatch test_job.sh
 ```
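Since the added test script sets no `--output`, SLURM writes its stdout to the default `slurm-<jobid>.out` file in the submission directory. A quick post-submission check, as a sketch:

```bash
# Watch the job move from PENDING to RUNNING to done
squeue -u $USER

# Once it finishes, read the output (default file name: slurm-<jobid>.out)
cat slurm-*.out
```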
 
 ---
 
-## 🖥️ Basic SLURM Commands
+## 📊 Quick Reference
 
-### Submit a Job
+### Storage Locations
+| Location | Quota | Purpose |
+|----------|-------|---------|
+| `/cluster/home/$USER` | 45 GB | Code, configs |
+| `/cluster/scratch/$USER` | 2.5 TB | Datasets (auto-deleted after 15 days) |
+| `/cluster/project/rsl/$USER` | 75 GB | Conda environments |
+| `/cluster/work/rsl/$USER` | 150 GB | Results, containers |
+| `$TMPDIR` | 800 GB | Fast local scratch (per job) |
+
+### Essential Commands
 ```bash
-# Basic job submission
-sbatch my_job.sh
+# Job Management
+sbatch script.sh            # Submit job
+squeue -u $USER             # Check your jobs
+scancel <job_id>            # Cancel job
 
-# Interactive session (2 hours, 8 CPUs, 32GB RAM)
-srun --time=2:00:00 --cpus-per-task=8 --mem=32G --pty bash
+# Interactive Sessions
+srun --pty bash             # Basic session
+srun --gpus=1 --pty bash    # GPU session
 
-# GPU interactive session (4 hours, 1 GPU)
-srun --time=4:00:00 --gpus=1 --mem=32G --pty bash
+# Storage Check
+lquota                      # Check home/scratch usage
 ```
 
-### Monitor Jobs
+### GPU Resources
 ```bash
-# Check your jobs
-squeue -u $USER
-
-# Job details
-scontrol show job <job_id>
+# Request specific GPU types
+#SBATCH --gpus=1                            # Any available GPU
+#SBATCH --gpus=nvidia_geforce_rtx_4090:1    # RTX 4090 (24GB)
+#SBATCH --gpus=nvidia_a100_80gb_pcie:1      # A100 (80GB)
+```
```
13289

133-
# Cancel a job
134-
scancel <job_id>
90+
---
13591

136-
# Job efficiency (after completion)
137-
seff <job_id>
138-
```
92+
## 🚀 Common Workflows
13993

140-
### Sample GPU Job Script
94+
### GPU Training Job
14195
```bash
14296
#!/bin/bash
143-
#SBATCH --job-name=gpu-test
144-
#SBATCH --output=logs/%j.out
145-
#SBATCH --error=logs/%j.err
146-
#SBATCH --time=04:00:00
97+
#SBATCH --job-name=training
14798
#SBATCH --gpus=1
14899
#SBATCH --cpus-per-task=8
149100
#SBATCH --mem=32G
150-
#SBATCH --tmp=50G
101+
#SBATCH --time=24:00:00
102+
#SBATCH --tmp=100G
151103

152104
module load eth_proxy
153-
154-
# Your GPU code here
155105
python train.py
156106
```
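The training job above requests `--tmp=100G` but runs `train.py` in place; that request only pays off when data is staged through `$TMPDIR`. A minimal stage-in/stage-out sketch (the tarball path and the `--data`/`--out` flags are illustrative, not part of the original script):

```bash
# Stage the dataset onto node-local scratch (fast I/O, wiped after the job)
tar -xf /cluster/scratch/$USER/dataset.tar -C $TMPDIR

# Train against the local copy
python train.py --data $TMPDIR/dataset --out $TMPDIR/results

# Copy results to permanent storage before the job ends
cp -r $TMPDIR/results /cluster/work/rsl/$USER/
```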
 
----
-
-## 📦 Container Workflow Summary
-
-### 1. Build Docker Image
-```bash
-# Create Dockerfile
-cat > Dockerfile << EOF
-FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
-RUN apt-get update && apt-get install -y python3-pip
-RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
-COPY . /app
-WORKDIR /app
-CMD ["python3", "train.py"]
-EOF
-
-# Build image
-docker build -t my-ml-app:latest .
-```
-
-### 2. Convert to Singularity
-```bash
-# Convert Docker to Singularity
-apptainer build --sandbox my-ml-app.sif docker-daemon://my-ml-app:latest
-
-# Create tar for transfer
-tar -cf my-ml-app.tar my-ml-app.sif
-```
-
-### 3. Transfer to Euler
+### Container Workflow
 ```bash
-scp my-ml-app.tar euler:/cluster/work/rsl/$USER/containers/
+# 1. Build locally
+docker build -t myapp:latest .
+
+# 2. Convert to Singularity
+apptainer build --sandbox myapp.sif docker-daemon://myapp:latest
+tar -cf myapp.tar myapp.sif
+
+# 3. Transfer & run on Euler
+scp myapp.tar euler:/cluster/work/rsl/$USER/
+# Then use in job script:
+tar -xf /cluster/work/rsl/$USER/myapp.tar -C $TMPDIR
+singularity exec --nv $TMPDIR/myapp.sif python app.py
 ```
 
-### 4. Run on Euler
+### Interactive Development
 ```bash
-#!/bin/bash
-#SBATCH --job-name=container-job
-#SBATCH --gpus=1
-#SBATCH --tmp=100G
-
-# Extract to local scratch (fast!)
-tar -xf /cluster/work/rsl/$USER/containers/my-ml-app.tar -C $TMPDIR
-
-# Run with GPU support
-singularity exec --nv $TMPDIR/my-ml-app.sif python3 /app/train.py
+# JupyterHub: https://jupyter.euler.hpc.ethz.ch
+# Or command line:
+srun --gpus=1 --mem=32G --time=2:00:00 --pty bash
 ```
 
-[→ Full Container Workflow Guide](container-workflow/)
-
 ---
 
-## 🔧 Interactive Sessions
-
-### JupyterHub Access
-- **URL**: [https://jupyter.euler.hpc.ethz.ch](https://jupyter.euler.hpc.ethz.ch)
-- **Login**: Use your nethz credentials
-- **Features**: GPU support, VSCode option, pre-installed libraries
-
-### Quick Interactive Commands
-```bash
-# Basic interactive session
-srun --pty bash
-
-# Development session with GPU
-srun --gpus=1 --mem=32G --time=2:00:00 --pty bash
-
-# High memory session
-srun --mem=128G --time=1:00:00 --pty bash
+## 📚 Documentation Structure
 
-# With local scratch
-srun --tmp=100G --mem=32G --pty bash
-```
+- **[Complete Guide](complete-guide/)** - Comprehensive setup and detailed instructions
+- **[Container Workflow](container-workflow/)** - Full Docker/Singularity workflow with examples
+- **[Scripts Library](scripts/)** - Ready-to-use job scripts and templates
+- **[Troubleshooting](troubleshooting/)** - Solutions to common problems
 
 ---
 
-## 🐍 Python Environment Setup
+## 🆘 Getting Help
 
-### Miniconda Installation
-```bash
-# Install in project directory (more space)
-mkdir -p /cluster/project/rsl/$USER/miniconda3
-wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
-bash Miniconda3-latest-Linux-x86_64.sh -b -p /cluster/project/rsl/$USER/miniconda3
-rm Miniconda3-latest-Linux-x86_64.sh
-
-# Initialize
-/cluster/project/rsl/$USER/miniconda3/bin/conda init bash
-conda config --set auto_activate_base false
-```
+### Quick Links
+- **Access Form**: [RSL Cluster Access](https://forms.gle/UsiGkXUmo9YyNHsH8)
+- **RSL Contact**: Manthan Patel ([email protected])
+- **ETH IT Support**: [ServiceDesk](https://ethz.ch/services/en/it-services/help.html)
+- **Official Docs**: [Euler Wiki](https://scicomp.ethz.ch/wiki/Euler)
 
-### Create Environment
-```bash
-conda create -n ml_env python=3.10
-conda activate ml_env
-conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
-```
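The removed environment recipe installs a CUDA 11.8 build of PyTorch; whether it can actually reach a GPU is worth verifying from inside a GPU allocation rather than on a login node. A one-line sanity check, assuming the `ml_env` environment from the block above:

```bash
# Inside a GPU job (e.g. srun --gpus=1 --pty bash)
conda activate ml_env
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```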
+### Prerequisites
+✅ Valid nethz account
+✅ RSL group membership (es_hutter)
+✅ Terminal access
+✅ Basic Linux/SLURM knowledge
 
 ---
 
-## 🛠️ Quick Tips
+## 🎓 Tips for Success
 
-!!! success "Best Practices"
-    - **Use local scratch** (`$TMPDIR`) for I/O intensive operations
-    - **Request only needed resources** to reduce queue time
-    - **Save work frequently** - interactive sessions can timeout
-    - **Use job arrays** for parameter sweeps
-    - **Load `eth_proxy` module** for internet access
+!!! success "Do's"
+    - Use `$TMPDIR` for I/O intensive operations
+    - Request only the resources you need
+    - Use containers for reproducible environments
+    - Save important results to `/cluster/work/rsl/$USER`
 
-!!! warning "Common Pitfalls"
-    - Don't install conda in home directory (limited inodes)
-    - Don't run jobs on login nodes
+!!! warning "Don'ts"
+    - Don't run computations on login nodes
     - Don't exceed storage quotas
-    - Remember scratch data is auto-deleted after 15 days
-
----
-
-## 📞 Support & Resources
-
-### Getting Help
-- **Cluster Issues**: ETH IT ServiceDesk
-- **RSL Access**: Contact Manthan Patel ([email protected])
-- **Guide Issues**: [GitHub Issues](https://github.com/leggedrobotics/euler-cluster-guide/issues)
-
-### Useful Links
-- [Official Euler Documentation](https://scicomp.ethz.ch/wiki/Euler)
-- [Getting Started with GPUs](https://scicomp.ethz.ch/wiki/Getting_started_with_GPUs)
-- [JupyterHub Access](https://jupyter.euler.hpc.ethz.ch)
-- [RSL Lab Homepage](https://rsl.ethz.ch)
-
-### Tested Configuration
-| Component | Version |
-|-----------|---------|
-| **Docker** | 24.0.7 |
-| **Apptainer** | 1.2.5 |
-| **Cluster** | Euler (ETH Zurich) |
-| **Group** | es_hutter (RSL) |
+    - Don't leave interactive sessions idle
+    - Don't store data only in scratch (auto-deleted!)
 
 ---
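One item from the removed Best Practices list, job arrays for parameter sweeps, never had an example in either version of the page. A minimal sketch using SLURM's standard `--array` option (the script name and parameter wiring are illustrative):

```bash
#!/bin/bash
#SBATCH --job-name=sweep
#SBATCH --array=0-9          # ten tasks, indices 0..9
#SBATCH --time=01:00:00
#SBATCH --mem=4G

# Each array task receives its own index
python train.py --seed $SLURM_ARRAY_TASK_ID
```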
