Niko Papadopoulos authored
- moved to scripts folder - wrote README - removed obsolete - renamed properly
scripts usage
The scripts are listed below in the order in which they are meant to be run:
Get sequence databases and obtain multiple sequence alignments for Spongilla.
- databases.sh: download and build indices for the sequence databases UniRef30 and ColabFold EnvDB. Modified from ColabFold.
- databases_pdb.sh: download (sequence) PDB and build index.
- align.sh: build multiple sequence alignments for the Spongilla proteome. A wrapper around colabfold_search.
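As a rough illustration of the alignment step, the sketch below prints the kind of colabfold_search call that align.sh wraps. The positional order (query FASTA, database directory, output directory) follows the ColabFold command-line tool; the file and directory names are placeholders, not the paths used in this repository.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print a colabfold_search command instead of running it.
# All paths below are hypothetical placeholders.

msa_cmd() {
    # $1: query FASTA, $2: directory holding the built databases,
    # $3: output directory for the alignments
    echo colabfold_search "$1" "$2" "$3"
}

msa_cmd spongilla_proteome.fasta /path/to/databases msas
```

Printing the command first makes it easy to check the paths before committing to a long-running search.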
Predict structures for Spongilla
- fasta-splitter.pl: written by Kirill Kryukov. A utility to partition FASTA files into pieces. Used with --n-parts 32 to split the Spongilla proteome in batches.
- predict_structures.sh: a wrapper around colabfold_batch to submit jobs to the EMBL cluster.
- submit_colab.sh: a simple for loop to handle all 32 batches.
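The batch loop can be sketched roughly as follows. This is a dry run that only prints the submission commands. Several details are assumptions rather than facts from the scripts: the proteome base name, the zero-padded `.part-NN.fasta` naming of the split files, the use of `sbatch` as the submission command, and passing the batch file as an argument to predict_structures.sh.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one submission command per FASTA batch.
# Assumed (hypothetical): base name, batch file naming, sbatch as scheduler.

n_parts=32
base="spongilla_proteome"

batch_file() {
    # Build a zero-padded batch name, e.g. spongilla_proteome.part-07.fasta
    printf '%s.part-%02d.fasta' "$base" "$1"
}

submit_all() {
    local i
    for i in $(seq 1 "$n_parts"); do
        # echo rather than execute, so the loop can be inspected first
        echo sbatch predict_structures.sh "$(batch_file "$i")"
    done
}

submit_all
```

Removing the `echo` would turn the dry run into real submissions, provided the assumed names match the actual files.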
Use FoldSeek to search against available structures
- fs_query.sh: build structural database for Spongilla predicted structures.
- fs_afdb.sh, fs_pdb.sh, fs_sp.sh: download the precomputed FoldSeek databases for AFDB, PDB, and SwissProt, respectively. Split in three so we could run them in parallel.
- fs_search_afdb.sh, fs_search_pdb.sh, fs_search_swissprot.sh: search with the Spongilla structure database against the three target databases, AFDB, PDB, and SwissProt.
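The FoldSeek steps above might look roughly like the sketch below, written as a dry run that prints each command. It uses the `foldseek createdb`, `foldseek search`, and `foldseek convertalis` subcommands. The database, directory, and result names are hypothetical, and the actual scripts may pass additional options.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the FoldSeek workflow; commands are printed, not run.
# All names below are hypothetical placeholders.

run() { echo "$@"; }   # swap for: run() { "$@"; } to actually execute

structures_dir="predicted_structures"
query_db="spongilla_db"

build_query() {
    # Build a structure database from a directory of predicted structures
    run foldseek createdb "$structures_dir" "$query_db"
}

search_target() {
    local target="$1"
    # Search the query database against one target database
    run foldseek search "$query_db" "$target" "result_${target}" tmp
    # Convert the alignment database into a tab-separated table
    run foldseek convertalis "$query_db" "$target" "result_${target}" "result_${target}.m8"
}

build_query
for target in afdb pdb swissprot; do
    search_target "$target"
done
```

Keeping one function per target mirrors the three separate search scripts, so each search can still be submitted as its own job.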