From 9ea3b8ae1d7a2a7c987716f3a1ffae2359e8fa9f Mon Sep 17 00:00:00 2001
From: Niko Papadopoulos <nikolaos.papadopoulos@embl.de>
Date: Thu, 12 Jan 2023 10:53:41 +0100
Subject: [PATCH] freshened up README

---
 README.md | 42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index 8cfc6b7..e57792b 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,22 @@
-# CoFFE - a pipeline for structure-based annotation transfer
-
-Sufficient sequence similarity is used to consider an unknown protein an ortholog of a well-annotated one, and transfer structural and functional information to it. In the genomics era sequencing far outpaces functional experiments as well as experimental protein structure determination, making sequence-based annotation transfer a critical component of working with biological data.
-
-Experimentally determined protein structures and functions usually come from one of few model species (mostly human, mouse, fly, worm, yeast). For organisms that are phyletically distant, the usefulness of sequence-based annotation transfer is severely limited.
-
-Something that is conserved better across long evolutionary distances are protein structures, which have a more direct link to protein function. This is something that is long recognised (hence efforts like SCOP and CATH), but since there was no easy way to obtain lots of structures it was never practical to use protein structure similarity to assess homology in the same manner as sequences. Protein structure determination is hard and expensive, and protein structure prediction only worked well enough in homology modelling, where it was possible to find a template structure based on sequence similarity.
+# MorF - a pipeline for structure-based annotation transfer
+
+Sufficient sequence similarity is taken as evidence that an unknown protein is an ortholog of a
+well-annotated one, so that structural and functional information can be transferred to it. In the
+genomics era, sequencing far outpaces both functional experiments and experimental protein
+structure determination, making sequence-based annotation transfer a critical component of working
+with biological data.
+
+Experimentally determined protein structures and functions usually come from one of a few model
+species (mostly human, mouse, fly, worm, and yeast). For organisms that are phylogenetically
+distant from these, the usefulness of sequence-based annotation transfer is severely limited.
+
+Protein structures, which have a more direct link to protein function, are conserved better than
+sequences across long evolutionary distances. This has long been recognised (hence efforts like
+SCOP and CATH), but because there was no easy way to obtain structures at scale, it was never
+practical to use structural similarity to assess homology in the same manner as sequence
+similarity. Protein structure determination is hard and expensive, and protein structure
+prediction only worked well enough in homology modelling, where a template structure could be
+found based on sequence similarity.
 
 [AlphaFold](https://www.nature.com/articles/s41586-021-03819-2) changed how we think about protein
 structures. By leveraging deep learning, multiple sequence alignments, and the ever-expanding
@@ -12,15 +24,23 @@ library of solved protein structures, AlphaFold is able to predict three-dimensi
 structures at resolutions that rival solved crystal structures, and has immediately found use in
 large parts of biological research.
 
-We used AlphaFold to predict structures for the proteome of _Spongilla lacustris_, a freshwater sponge, and annotated them via structural similarity to all available protein structures. Please consider the [manuscript](https://www.biorxiv.org/content/10.1101/2022.07.05.498892v2) for more details, or peruse the notebooks to see our analysis.
+We used [ColabFold](https://github.com/sokrypton/ColabFold) to predict structures for the proteome
+of _Spongilla lacustris_, a freshwater sponge, and annotated them via structural similarity to all
+available protein structures. Please see the
+[manuscript](https://www.biorxiv.org/content/10.1101/2022.07.05.498892) for more details, or peruse
+the notebooks to see our analysis.
 
 ## Authors and contributions
 
 - Niko Papadopoulos and Fabian Ruperti conceived the project.
 - Niko Papadopoulos, Fabian Ruperti, and Jacob Musser designed the project.
-- Niko Papadopoulos and Fabian Ruperti performed the main analysis
-- Milot Mirdita consulted on ColabFold usage, performed additional analysis on novel fold candidates.
-- Martin Steinegger consulted on ColabFold usage, performed additional analysis on HGT candidates.
+- Niko Papadopoulos and Fabian Ruperti performed the main analysis.
+- Niko Papadopoulos and Fabian Ruperti performed the additional analysis requested during manuscript
+  revision.
+- Milot Mirdita consulted on ColabFold usage, performed additional analysis on novel fold
+  candidates, and advised during the revision process.
+- Martin Steinegger consulted on project design and ColabFold usage, and performed additional
+  analysis on HGT candidates.
 - Jacob Musser and Alexandros Pittis consulted on gene naming and phylogenetic assignment.
 - Niko Papadopoulos, Fabian Ruperti, Jacob Musser, and Detlev Arendt wrote the manuscript.
 
-- 
GitLab