diff --git a/README.md b/README.md
index 7c26698506c082ecd0520f40dd122ef1cbd033df..c5ed03baec846e3efd926faee7db94023dd68146 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,13 @@ The question of the threshold turns out to be an important one. Burkhardt Rost f
 However, structure is more conserved than sequence. In theory, predicted structures can be compared against known structures that are otherwise annotated, allowing for the transfer of functional annotations (albeit less specific than sequence-based ones, since we will be detecting very remote homology at best). This is of particular interest for non-model organisms, especially ones outside the well-studied taxonomic groups (e.g. vertebrates or ecdysozoans).
+## Follow-up ideas
+Having a phylome, scRNAseq/cell type annotation, functional proteomic data, and predicted protein structures for a non-bilaterian metazoan is a unique combination that allows us to ask many fundamental questions. Potential follow-up analyses include:
+
+- Correlation between AF prediction accuracy (overall, domain-specific, etc.) and the sequence identity/similarity or bitscore of the best FoldSeek hit, i.e. "Does higher sequence identity mean better prediction accuracy?"
+- Relationship between homologs identified through sequence search (OrthoFinder, eggNOG-mapper, BLAST, phylome) and the best FoldSeek hits for single sponge proteins, i.e. "Do the best AF hits also include proteins identified as homologs in the phylome? Is there a sequence identity threshold for that?"
+- Is there biological meaning to the best FoldSeek hits of unannotated, highly expressed genes in the scRNAseq dataset, or of differentially regulated hits in the functional proteomic datasets? I.e. "Can we functionally annotate previously unannotated hits in the scRNAseq and proteomics data and, most importantly, do these hits make sense in light of prior knowledge?"
+
 ## Usage
 (eventually a tutorial on what order to use the scripts in, if we don't have a master script).
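
A note on the first follow-up idea in this hunk: the question "does higher sequence identity mean better prediction accuracy?" could be approached with a rank correlation. The sketch below is only an illustration, not something taken from the repository's scripts. It assumes two hypothetical dictionaries keyed by protein ID: `mean_plddt` (mean pLDDT per predicted structure, used here as a stand-in for accuracy, although pLDDT is a confidence score rather than a measured accuracy) and `best_hit_identity` (sequence identity of the best FoldSeek hit). All function and variable names are made up for this example.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


def identity_vs_confidence(mean_plddt, best_hit_identity):
    """Correlate identity with mean pLDDT over proteins present in both inputs.

    Both arguments are hypothetical dicts mapping protein ID -> float.
    Returns (rho, number_of_proteins_used).
    """
    shared = sorted(set(mean_plddt) & set(best_hit_identity))
    x = [best_hit_identity[p] for p in shared]
    y = [mean_plddt[p] for p in shared]
    return spearman(x, y), len(shared)
```

A rank correlation is used here because the relationship between identity and confidence need not be linear; proteins without a FoldSeek hit are simply left out, which is a design choice that would need revisiting for a real analysis.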