# Scripts usage

The scripts are listed below in the order in which they are meant to be run:

### Get sequence databases and build multiple sequence alignments for _Spongilla_

* `databases.sh`: download the UniRef30 and ColabFold EnvDB sequence databases and build their
  indices. Modified from ColabFold.
* `databases_pdb.sh`: download the PDB sequence database and build its index.
* `align.sh`: build multiple sequence alignments for the _Spongilla_ proteome. A wrapper around
  `colabfold_search`.
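
As a rough illustration of the alignment step, the sketch below prints the kind of `colabfold_search` call that `align.sh` wraps (query FASTA, database directory, output directory). The file and directory names are hypothetical placeholders, not the paths used in this repository, and the real script may pass additional options.

```shell
#!/usr/bin/env bash
# Minimal sketch, not the actual align.sh: all paths below are hypothetical.
set -euo pipefail

QUERY="spongilla.fasta"   # hypothetical proteome FASTA
DB_DIR="databases"        # hypothetical directory populated by databases.sh
OUT_DIR="msas"            # hypothetical output directory for the alignments

search_cmd() {
    # Build the command line: colabfold_search <query> <database dir> <output dir>
    printf 'colabfold_search %s %s %s\n' "$1" "$2" "$3"
}

# Dry run: print the command instead of executing it
search_cmd "$QUERY" "$DB_DIR" "$OUT_DIR"
```

Piping the printed line to `bash` (or replacing `printf` with a direct call) would run the search, provided ColabFold and the databases are installed.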

### Predict structures for _Spongilla_

* `fasta-splitter.pl`: a utility written by Kirill Kryukov that partitions FASTA files into pieces.
  Used with `--n-parts 32` to split the _Spongilla_ proteome into 32 batches.
* `predict_structures.sh`: a wrapper around `colabfold_batch` to submit jobs to the EMBL cluster.
* `submit_colab.sh`: a simple for loop to handle all 32 batches.
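
The batch loop can be pictured roughly as follows. This is a sketch under assumptions: the batch file naming (`spongilla.part-NN.fasta`) and the idea that `predict_structures.sh` takes a single batch file as its argument are both hypothetical, and the real `submit_colab.sh` may differ.

```shell
#!/usr/bin/env bash
# Minimal sketch, not the actual submit_colab.sh.
# Assumed: batch file naming and the argument accepted by predict_structures.sh.
set -euo pipefail

N_PARTS=32            # matches the --n-parts 32 split described above
PREFIX="spongilla"    # hypothetical base name of the proteome FASTA

batch_name() {
    # Zero-padded batch file name, e.g. 7 -> spongilla.part-07.fasta (assumed pattern)
    printf '%s.part-%02d.fasta' "$PREFIX" "$1"
}

for i in $(seq 1 "$N_PARTS"); do
    # echo keeps this a dry run; remove it to actually launch the wrapper
    echo bash predict_structures.sh "$(batch_name "$i")"
done
```

Keeping the loop separate from the wrapper means a single failed batch can be resubmitted by hand without touching the other 31.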

### Use FoldSeek to search against available structures

* `fs_query.sh`: build a FoldSeek structural database from the predicted _Spongilla_ structures.
* `fs_afdb.sh`, `fs_pdb.sh`, `fs_sp.sh`: download the precomputed FoldSeek databases for AFDB, PDB,
  and SwissProt, respectively. Split into three scripts so that the downloads can run in parallel.
* `fs_search_afdb.sh`, `fs_search_pdb.sh`, `fs_search_swissprot.sh`: search the _Spongilla_
  structure database against the three target databases (AFDB, PDB, and SwissProt, respectively).
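
The three FoldSeek steps above (build the query database, download the targets, search) map onto the `foldseek createdb`, `foldseek databases`, and `foldseek search` subcommands. The sketch below prints one plausible sequence of calls. The directory and database names are hypothetical, and the choice of `Alphafold/UniProt` for AFDB is an assumption, since the scripts may use a different AFDB variant or extra options.

```shell
#!/usr/bin/env bash
# Minimal sketch, not the actual fs_*.sh scripts: paths and names are hypothetical.
set -euo pipefail

STRUCT_DIR="predicted_structures"   # hypothetical directory of predicted structures
QUERY_DB="spongilla_db"             # hypothetical name for the query database
TMP_DIR="tmp"                       # scratch directory required by foldseek

target_name() {
    # Map a short label to a name accepted by `foldseek databases`
    case "$1" in
        afdb)      echo "Alphafold/UniProt" ;;     # assumed AFDB variant
        pdb)       echo "PDB" ;;
        swissprot) echo "Alphafold/Swiss-Prot" ;;
        *)         return 1 ;;
    esac
}

# Dry run: each command is printed rather than executed
echo foldseek createdb "$STRUCT_DIR" "$QUERY_DB"
for t in afdb pdb swissprot; do
    echo foldseek databases "$(target_name "$t")" "$t" "$TMP_DIR"
    echo foldseek search "$QUERY_DB" "$t" "result_$t" "$TMP_DIR"
done
```

Because each target has its own download and search command, the three pairs are independent and can be submitted as separate jobs.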