Script usage

The scripts are listed below in the order in which they are meant to be run:

Get sequence databases and obtain multiple sequence alignments for Spongilla.

  • databases.sh: download and build indices for the sequence databases UniRef30 and ColabFold EnvDB. Modified from ColabFold.
  • databases_pdb.sh: download the PDB sequence database and build its index.
  • align.sh: build multiple sequence alignments for the Spongilla proteome. A wrapper around colabfold_search.
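
As a rough illustration of the alignment step, the sketch below shows how a wrapper such as align.sh might call colabfold_search, which takes a query FASTA file, a database directory, and an output directory. The file and directory names are hypothetical placeholders, not the ones used in this repository, and the DRY_RUN switch is an addition for illustration: by default the function only prints the command.

```shell
#!/bin/sh
# Sketch only: every path below is a hypothetical placeholder.
QUERY_FASTA="${QUERY_FASTA:-spongilla_proteome.fasta}"  # assumed proteome file name
DB_DIR="${DB_DIR:-./databases}"                         # assumed location of the databases built by databases.sh
MSA_DIR="${MSA_DIR:-./msas}"                            # assumed output directory for the alignments
DRY_RUN="${DRY_RUN:-1}"                                 # 1 = print the command, 0 = run it

run_align() {
    # colabfold_search positional arguments: <query> <database dir> <output dir>
    if [ "$DRY_RUN" = "1" ]; then
        echo "colabfold_search $QUERY_FASTA $DB_DIR $MSA_DIR"
    else
        colabfold_search "$QUERY_FASTA" "$DB_DIR" "$MSA_DIR"
    fi
}

run_align
```

Setting DRY_RUN=0 runs the search for real, which requires colabfold_search on the PATH and the databases from the previous step.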

Predict structures for Spongilla

  • fasta-splitter.pl: written by Kirill Kryukov. A utility to partition FASTA files into pieces. Used with --n-parts 32 to split the Spongilla proteome into 32 batches.
  • predict_structures.sh: a wrapper around colabfold_batch to submit jobs to the EMBL cluster.
  • submit_colab.sh: a simple for loop to handle all 32 batches.
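
The batch loop can be pictured with the sketch below. It is not the content of submit_colab.sh: the base name of the split files, the zero-padded part-NN naming, and the way predict_structures.sh receives its input are all assumptions made for illustration, and the loop only prints the commands it would issue.

```shell
#!/bin/sh
# Sketch only: file names and the predict_structures.sh argument are assumptions.
N_PARTS=32
PREFIX="${PREFIX:-spongilla_proteome}"  # hypothetical base name of the split FASTA files

# Print the expected batch file names, assuming zero-padded part numbers
# (e.g. spongilla_proteome.part-01.fasta); adjust to the actual splitter output.
list_batches() {
    i=1
    while [ "$i" -le "$N_PARTS" ]; do
        printf '%s.part-%02d.fasta\n' "$PREFIX" "$i"
        i=$((i + 1))
    done
}

# Dry run: show one submission command per batch instead of executing it.
list_batches | while read -r batch; do
    echo "sh predict_structures.sh $batch"
done
```

Replacing the echo with the real submission command would turn the dry run into an actual loop over all 32 batches.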

Use FoldSeek to search against available structures

  • fs_query.sh: build a FoldSeek structure database from the predicted Spongilla structures.
  • fs_afdb.sh, fs_pdb.sh, fs_sp.sh: download the precomputed FoldSeek databases for AFDB, PDB, and SwissProt, respectively. Split into three scripts so that the downloads can run in parallel.
  • fs_search_afdb.sh, fs_search_pdb.sh, fs_search_swissprot.sh: search with the Spongilla structure database against the three target databases, AFDB, PDB, and SwissProt.
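
The FoldSeek steps above map onto four subcommands: createdb, databases, search, and convertalis. The sketch below prints one plausible command sequence per target database. The local database names, the input directory, the temporary directory, and the choice of the Alphafold/UniProt variant for AFDB are assumptions; the actual scripts may use different names and options.

```shell
#!/bin/sh
# Sketch only: database names, directories, and the AFDB variant are assumptions.
QUERY_DB="${QUERY_DB:-spongilla_db}"  # hypothetical name of the query structure database
TMP_DIR="${TMP_DIR:-tmp}"             # scratch directory required by FoldSeek

# Print the FoldSeek commands for one target database (dry run).
# $1 = name in FoldSeek's "databases" catalogue, $2 = local database name
foldseek_plan() {
    echo "foldseek databases $1 $2 $TMP_DIR"
    echo "foldseek search $QUERY_DB $2 ${QUERY_DB}_vs_$2 $TMP_DIR"
    echo "foldseek convertalis $QUERY_DB $2 ${QUERY_DB}_vs_$2 ${QUERY_DB}_vs_$2.m8"
}

# Build the query database once, then plan the three searches.
echo "foldseek createdb predicted_structures/ $QUERY_DB"
foldseek_plan Alphafold/UniProt afdb
foldseek_plan PDB pdb
foldseek_plan Alphafold/Swiss-Prot swissprot
```

Because each target database has its own download and search commands, the three plans are independent and can be submitted as separate jobs.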