"How good is protein structure at finding conserved function? To find out, we tried to compare enzyme function between distantly related species, to see how often function was correctly annotated by structural similarity. In particular, we were interested in cases where orthology via sequence similarity was no longer detectable but the annotated protein function was still similar. Enzymes are an interesting test case, since they have an easily accessible function description (EC number).\n",
"\n",
"The detailed strategy I am going to pursue here:\n",
"\n",
"1. take the proteome of a species distantly related to human (yeast, arabidopsis)\n",
"2. foldseek against Alphafold/proteomes\n",
"3. keep only significant human hits\n",
"4. remove query/target pairs that clearly share evolutionary history (same root orthogroup/same most specific orthogroup)\n",
"5. for all remaining cases: only keep enzymes (valid EC is available)\n",
"6. calculate how many digits of the EC overlap\n",
"/var/folders/md/d6lwwbv97xb6g6ddypntnprh0000gp/T/ipykernel_488/2997960917.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.\n",
"we don't want to get into quibbly territory, so we'll proceed with the IDs that are in both lists and will not wonder why they don't match to 100%."
]
},
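Step 6 of the strategy (counting how many EC digits overlap) can be sketched as below. This is a minimal illustration, not necessarily the implementation used later in this notebook. The function name `ec_shared_digits` is hypothetical, and it assumes EC numbers are written as four dot-separated fields, with `-` marking an unassigned level that should never count as a match.

```python
def ec_shared_digits(ec_a: str, ec_b: str) -> int:
    """Count how many leading EC levels two EC numbers share (0 to 4)."""
    shared = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        # Stop at the first level that differs. A dash marks an unassigned
        # level, so it is treated as a mismatch (an assumption of this sketch).
        if a != b or a == "-":
            break
        shared += 1
    return shared


print(ec_shared_digits("2.7.11.1", "2.7.11.22"))  # 3
print(ec_shared_digits("2.7.11.1", "3.1.3.16"))   # 0
print(ec_shared_digits("1.1.1.-", "1.1.1.1"))     # 3
```

Comparison stops at the first mismatch because EC numbers are hierarchical: agreement at the fourth level carries no meaning if an earlier level already differs.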
{
"cell_type": "code",
"execution_count": 7,
"id": "f57728f2-f505-4195-9df1-a606ec65f40d",
"metadata": {},
"outputs": [],
"source": [
"keep = yeast_annot[\"id\"].isin(c)\n",
"yeast_annot = yeast_annot[keep]\n",
"\n",
"keep = yeast_foldseek[\"query\"].isin(c)\n",
"yeast_foldseek = yeast_foldseek[keep]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "d4a421da-3d5d-47f3-90eb-f77441d929d1",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/var/folders/md/d6lwwbv97xb6g6ddypntnprh0000gp/T/ipykernel_488/2273340934.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.\n",
"/var/folders/md/d6lwwbv97xb6g6ddypntnprh0000gp/T/ipykernel_488/2842444695.py:1: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.\n",