# scripts usage

Here we list the scripts in their logical order of use:

### Get sequence databases and obtain multiple sequence alignments for _Spongilla_

* `databases.sh`: download and build indices for the sequence databases UniRef30 and ColabFold EnvDB. Modified from ColabFold.
* `databases_pdb.sh`: download the (sequence) PDB and build its index.
* `align.sh`: build multiple sequence alignments for the _Spongilla_ proteome. A wrapper around `colabfold_search`.

### Predict structures for _Spongilla_

* `fasta-splitter.pl`: written by Kirill Kryukov. A utility to partition FASTA files into pieces. Used with `--n-parts 32` to split the _Spongilla_ proteome into batches.
* `predict_structures.sh`: a wrapper around `colabfold_batch` to submit jobs to the EMBL cluster.
* `submit_colab.sh`: a simple for loop that submits all 32 batches.

### Use FoldSeek to search against available structures

* `fs_query.sh`: build a structural database from the _Spongilla_ predicted structures.
* `fs_afdb.sh`, `fs_pdb.sh`, `fs_sp.sh`: download the precomputed FoldSeek databases for AFDB, PDB, and SwissProt, respectively. They are split into three scripts so they can run in parallel.
* `fs_search_afdb.sh`, `fs_search_pdb.sh`, `fs_search_swissprot.sh`: search the _Spongilla_ structure database against the three target databases: AFDB, PDB, and SwissProt.
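The FoldSeek steps listed above can be sketched roughly as follows. This is a minimal sketch, not the repository's `fs_*.sh` scripts. It assumes `foldseek` is on `PATH`, that the predicted structures sit in a hypothetical directory `spongilla_structures/`, and that the targets are the `Alphafold/UniProt`, `PDB`, and `Alphafold/Swiss-Prot` databases offered by `foldseek databases`. Which AFDB variant the original scripts fetch is an assumption, and all output names (`spongillaDB`, `afdb`, `aln_*`) are hypothetical. By default it only prints the commands (`DRY_RUN=1`); set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch of the FoldSeek steps; NOT the repository's fs_*.sh scripts.
# Assumes foldseek is on PATH. Directory and database names are hypothetical.
# Default is a dry run that only prints commands; use DRY_RUN=0 to execute.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
QUERY_DIR="spongilla_structures"  # hypothetical folder of predicted PDB files
TMP="tmp"                         # scratch directory for foldseek

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

fs_pipeline() {
  # Build the query structure database (role of fs_query.sh).
  run foldseek createdb "$QUERY_DIR" spongillaDB

  # Fetch precomputed targets (role of fs_afdb.sh, fs_pdb.sh, fs_sp.sh).
  # Which AFDB variant the original scripts download is an assumption.
  run foldseek databases Alphafold/UniProt afdb "$TMP"
  run foldseek databases PDB pdb "$TMP"
  run foldseek databases Alphafold/Swiss-Prot swissprot "$TMP"

  # Search each target, then export hits as a tab-separated .m8 file
  # (role of the fs_search_*.sh scripts).
  for target in afdb pdb swissprot; do
    run foldseek search spongillaDB "$target" "aln_${target}" "$TMP"
    run foldseek convertalis spongillaDB "$target" "aln_${target}" "aln_${target}.m8"
  done
}

fs_pipeline
```

Running the three downloads one after another, as here, is the simplest form; the repository instead splits them into separate scripts so they can run in parallel.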