<a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons Licence" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.
# Intro
Teaching material used during the High Performance Computing session of the EMBL Software Carpentry course.
# Commands
Below are most of the commands used during the practical so they can be copied and pasted, but I highly recommend typing along if you can.
## Login to the frontend node
```
ssh <username>@login.cluster.embl.de
```
## Clone this git repository
```
git clone https://git.embl.de/msmith/embl_hpc.git
```
## Identifying our computer
```
hostname
```
## Our first SLURM job
```
srun hostname
```
## Exploring our example program (don't run!)
```
cd $HOME/embl_hpc/exercises
./hpc_example -t 10 -m 100
```
## Running the example program on the cluster
```
srun ./hpc_example -t 10 -m 100
```
## Using our reserved training space
```
srun --reservation=training ./hpc_example -t 10 -m 100
```
## Running in the background
```
sbatch --reservation=training ./batch_job.sh
```
## Redirecting output
```
sbatch --output=output.txt --reservation=training ./batch_job.sh
```
## Creating a larger list
You will need to edit batch_job.sh so that it accepts arguments; one possible version is sketched after the command below.
```
sbatch --output=output.txt --reservation=training ./batch_job.sh 20 ???
```
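One way to make the script accept arguments is simply to forward the two positional parameters to `hpc_example`; a minimal sketch (essentially the same as the `batch_job.sh` included with this material) might look like:
```
#!/bin/bash
#SBATCH --output=output.txt
#SBATCH --open-mode=append
## forward the two command-line arguments to the example program
srun ./hpc_example -t $1 -m $2
```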
## Displaying details of our cluster queue
```
scontrol show partition
```
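For a more compact overview of the same partitions and the state of their nodes, the standard `sinfo` command (not specific to this course) can be used:
```
## summary of partitions, time limits and node states
sinfo
```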
## Requesting more resources
```
sbatch --mem=8200 --reservation=training ./batch_job.sh 30 8000
```
## Requesting a lot more resources
```
sbatch --mem=100G --reservation=training ./batch_job.sh 30 5000
```
## Cancel jobs
```
scancel <jobID>
scancel -u <username>
```
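If you don't have the job ID to hand, `squeue` will show it; restricting the output to your own jobs keeps it readable:
```
## list your queued and running jobs, including their job IDs
squeue -u <username>
```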
## Defining time limits
```
sbatch --time=00-00:00:30 \
       --reservation=training \
       batch_job.sh 60 500
```
## Job efficiency statistics
```
seff <jobID>
```
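If you want more detail than `seff` provides, and job accounting is enabled on the cluster, `sacct` can report individual fields; the field names below are standard SLURM accounting fields:
```
## elapsed time, peak memory use and final state for a finished job
sacct -j <jobID> --format=JobID,Elapsed,MaxRSS,State
```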
## Emailing output
```
sbatch --mail-user=<first.last>@embl.de \
       --reservation=training \
       ./batch_job.sh 20 500
```
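By default SLURM may only mail you about some events; the standard `--mail-type` option (not something course-specific) controls which ones, for example mailing only when the job ends or fails:
```
sbatch --mail-user=<first.last>@embl.de \
       --mail-type=END,FAIL \
       --reservation=training \
       ./batch_job.sh 20 500
```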
## Finding and using software
```
module avail
module spider samtools
module load BWA
```
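After loading something, `module list` shows what is currently active and `module purge` clears it again; both are standard module commands rather than anything EMBL-specific:
```
## show the modules currently loaded in this session
module list
## unload everything again
module purge
```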
## BWA example
```
nano bwa/bwa_batch.sh
sbatch --reservation=training bwa/bwa_batch.sh
```
## Running interactive jobs
```
srun --pty bash
```
## Interactive job with more memory
```
srun --mem=250 --pty bash
```
## Using `sbatch` instead
```
sbatch batch_job.sh
```
## Using job dependencies to build pipelines
```
jid=$(sbatch --parsable batch_job.sh)
sbatch --dependency=afterok:$jid batch_job.sh
```
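The same pattern extends to longer pipelines: capture each job ID with `--parsable` and make the next step depend on it. A sketch of a three-step chain using the same example script:
```
## each step only starts if the previous one finished successfully (afterok)
jid1=$(sbatch --parsable batch_job.sh)
jid2=$(sbatch --parsable --dependency=afterok:$jid1 batch_job.sh)
sbatch --dependency=afterok:$jid2 batch_job.sh
```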
# Intro
This folder contains instructions and files for setting up the example cluster used during the course. It's included here mostly for my benefit next time I want to redo the setup, but it might be useful to others.
# Cluster infrastructure
The cluster we're using is running on the Heidelberg installation of the [de.NBI cloud](https://www.denbi.de/cloud-overview/cloud-hd). The current design is a 4-node cluster (1 controller, 3 compute nodes) with different hardware specifications on each node, so we can demonstrate resource management.
Job scheduling is done using [SLURM](https://slurm.schedmd.com/), since it is (a) free and (b) mirrors the infrastructure we're currently using at EMBL.
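Once the nodes are up, the intentionally different hardware can be confirmed from the master using standard `sinfo` format options (nothing here is specific to this setup):
```
## one line per compute node, showing its name, CPU count and memory (MB)
sinfo -N -o "%N %c %m"
```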
## Generate _ubuntu_ user SSH keys
We only need to do this on the master, since the home directory will be shared with the compute nodes.
```
ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
```
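It can be worth checking that the key really allows passwordless logins once a compute node is up and mounting the shared home directory; a quick test, assuming the first node is reachable as `node1` (as in the slurm.conf below):
```
## should print the compute node's hostname without asking for a password
ssh -o BatchMode=yes node1 hostname
```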
## Install NFS
### Master
```
sudo apt-get update
sudo apt-get install nfs-kernel-server
sudo bash -c "echo '/home 10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)' >> /etc/exports"
sudo service nfs-kernel-server start
```
### Node
```
sudo apt-get update
sudo apt-get install nfs-common
## add a line to automatically mount the shared home directory
sudo bash -c "echo '10.0.0.8:/home /home nfs auto,noatime,nolock,bg,nfsvers=4,intr,tcp,actimeo=1800 0 0' >> /etc/fstab"
## restart the machine
sudo shutdown -r now
```
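After the reboot it's worth confirming that the shared home directory really is mounted over NFS; these are just standard checks, nothing specific to this setup:
```
## check that /home is mounted from the master over NFS
mount | grep '/home'
df -h /home
```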
## Install SLURM
### Master
```
sudo apt-get install slurm-wlm
## enable use of cgroups for process tracking and resource management
sudo bash -c 'echo CgroupAutomount=yes >> /etc/slurm-llnl/cgroup.conf'
sudo chown slurm:slurm /etc/slurm-llnl/cgroup.conf
sudo sed -i 's/GRUB_CMDLINE_LINUX=""/GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"/g' /etc/default/grub
sudo update-grub
## put munge key in home directory so we can share it with the nodes
sudo cp /etc/munge/munge.key $HOME/
## download slurm.conf file (may require some editing of IP addresses etc)
sudo wget https://raw.githubusercontent.com/grimbough/embl_swc_hpc/oct2017/cluster_setup/slurm.conf -O /etc/slurm-llnl/slurm.conf -o /dev/null
sudo chown slurm:slurm /etc/slurm-llnl/slurm.conf
```
### Node
```
## install slurm worker daemon
sudo apt-get install slurmd
## enable use of cgroups for process tracking and resource management
sudo bash -c 'echo CgroupAutomount=yes >> /etc/slurm-llnl/cgroup.conf'
sudo chown slurm:slurm /etc/slurm-llnl/cgroup.conf
sudo sed -i 's/GRUB_CMDLINE_LINUX=""/GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"/g' /etc/default/grub
sudo update-grub
## copy the shared munge key and restart the service to start using it
sudo cp /home/ubuntu/munge.key /etc/munge/munge.key
sudo service munge restart
```
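With munge and slurmd running on the nodes (and slurmctld on the master), it's worth checking that all three nodes have registered with the controller; a couple of standard SLURM checks run from the master:
```
## each node should be listed and eventually report state 'idle'
scontrol show nodes | grep -E 'NodeName|State'
## run a trivial job on all three nodes to confirm scheduling works
srun -N3 hostname
```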
ControlMachine=master
ControlAddr=10.0.0.8
#
MpiDefault=none
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/filetxt
AccountingStorageLoc=/var/log/slurm-llnl/accounting
ClusterName=cluster
JobAcctGatherType=jobacct_gather/linux
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
AuthType=auth/munge
JobAcctGatherFrequency=task=5
# MEMORY LIMITS
MemLimitEnforce=yes
KillWait=0
# COMPUTE NODES
NodeName=node1 NodeAddr=10.0.0.15 CPUs=2 ThreadsPerCore=1 RealMemory=3951 TmpDisk=19788
NodeName=node2 NodeAddr=10.0.0.16 CPUs=4 ThreadsPerCore=1 RealMemory=7983 TmpDisk=19788
NodeName=node3 NodeAddr=10.0.0.11 CPUs=8 ThreadsPerCore=1 RealMemory=16046 TmpDisk=19788
# PARTITIONS
PartitionName=swc Nodes=node1,node2,node3 Default=YES DefaultTime=1 MaxTime=5 DefMemPerCPU=100 MaxMemPerCPU=16000 State=UP
#PartitionName=swc-long Nodes=node1,node2,node3 Default=NO DefaultTime=5 MaxTime=10 DefMemPerCPU=100 MaxMemPerCPU=300 State=UP
#!/bin/bash
## remove users when I've messed up the configuration
n=40
for i in $(seq -w 1 ${n})
do
    echo ${i}
    userdel -rf user${i}
done
#!/bin/bash
## script to create 40 users called userXX with a default password
## and set up ssh logins without asking for passwords & host checking
n=40
for i in $(seq -w 1 ${n})
do
    echo ${i}
    ## create a new user called userXX and set a default password
    adduser --gecos "" --disabled-password user${i}
    echo user${i}:SoftwareC | chpasswd
    ## create somewhere to store ssh configuration
    mkdir -p /home/user${i}/.ssh
    printf "Host *\n StrictHostKeyChecking no\n ForwardX11 yes\n" > /home/user${i}/.ssh/config
    ## generate an ssh key & copy it to the list of authorized keys
    ssh-keygen -f /home/user${i}/.ssh/id_rsa -t rsa -N ''
    cp /home/user${i}/.ssh/id_rsa.pub /home/user${i}/.ssh/authorized_keys
    ## set the new user as owner
    chown -R user${i}:user${i} /home/user${i}/.ssh
    chmod 600 /home/user${i}/.ssh/config
done
#!/bin/bash
## Run this on each compute node
## script creates 40 users called userXX with a default password
## (ssh configuration comes from the shared home directory)
n=40
for i in $(seq -w 1 ${n})
do
    echo ${i}
    ## create a new user called userXX and set a default password
    adduser --gecos "" --disabled-password user${i}
    echo user${i}:SoftwareC | chpasswd
done
#!/bin/bash
srun ./hpc_example -t 30 -m 200
module load SAMtools BWA
## copy data to /tmp and change directory to /tmp
cp /g/its/home/pecar/benchmarks/msmith_bwa/Ecoli_genome.fa.gz $TMPDIR
cp /g/its/home/pecar/benchmarks/msmith_bwa/reads_*.fq.gz $TMPDIR
cd $TMPDIR
## create an index
bwa index -p ecoli Ecoli_genome.fa.gz
## perform alignment
bwa mem -t $SLURM_CPUS_PER_TASK ecoli reads_1.fq.gz reads_2.fq.gz > aligned.sam
## create a compressed BAM file
samtools view -b aligned.sam > aligned.bam
#!/bin/bash
#SBATCH --output=output.txt
#SBATCH --open-mode=append
srun ./hpc_example -t $1 -m $2