Commit acd49035 authored by Toby Hodges

changed file names, added protein sequence file

parent 338f1aef
@@ -24,12 +24,12 @@ think about might include
#### First Application
-Now run the script on `exampleSequences1.fasta`
+Now run the script on `example_sequences1.fasta`
- Has this made you notice any more improvements that could be made?
#### Exposure to The Real World
-What about if you run the script on `exampleSequences2.fasta`? Does this help you
+What about if you run the script on `example_sequences2.fasta`? Does this help you
to notice any additional improvements that could be made?
#### Improving the Script
@@ -51,7 +51,8 @@ this exercise.)
If you have time, try to further adapt the script to expand its functionality
such that, given a file of protein sequences instead, it will produce counts of
-the different amino acids.
+the different amino acids. You can use the file `protein_sequences.fasta` to
+test your script.
#### *FASTA Format
>sequence_1
ATCGATTGATCGCCCATAAATCGTTTCCCCCTAGCAGTAACCTCGCTCAGCCTTAAGCC
>sequence_2
GTCCATGATCCGCTAGAACAGATCGATAGACAGATCGCATAG
>sequence_3
CTCACTAGACATCCCTAGATACAACACGTAGACAGGTTTTACCTAGACAAAAC
>sequence_4
GTCCGATACAGATCGACAGATCGACAGATCGACAAGTCTCGCTAGACAAGCCCTCAAACC
>sequence_5
TTATAGAAACAATAGAACATAGACAATAGACAATATAGACAACCCAAATATAA
>sequence_1
ATCGATTGATCGCCCATAAATCGTTTCCCCCTAGCAGTAACCTCGCTCAGCCTTAAGCC
>sequence_2
ACGACGACGACGACGACGACGACGACGACGACGACGACG
>sequence_3
AGTCGATAAGATCGCTCCMATCGATAACAHCACTAGCDKAGCTAGCTA
>sequence_4
AGTCGATCCGATCCGAATAGCTCGATCGCCTAGCTAAGCTAGCTAGCTAGGATCGCTAGT
GTCGATAAGTCTCCGCTCTCATTATAAGTCTCCGATAATCTCGATATCGTTTACGCTTCG
GTACAGCGACAAAGTCCCTAGATAACTAGATACTTTAGCTAGATCTAGATCTGACAGTCG
>sequence_5
agctagatagctccgatcgaatcgatcgctcgatcgatccgatcgatatcgat
\ No newline at end of file
>sp|P05480|SRC_MOUSE Neuronal proto-oncogene tyrosine-protein kinase Src OS=Mus musculus GN=Src PE=1 SV=4
MGSNKSKPKDASQRRRSLEPSENVHGAGGAFPASQTPSKPASADGHRGPSAAFVPPAAEP
KLFGGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTRKVD
VREGDWWLAHSLSTGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTF
LVRESETTKGAYCLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSK
HADGLCHRLTTVCPTSKPQTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRV
AIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMNKGSLLDFLKG
ETGKYLRLPQLVDMSAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIE
DNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVL
DQVERGYRMPCPPECPESLHDLMCQCWRKEPEERPTFEYLQAFLEDYFTSTEPQYQPGEN
L
>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNAT
YCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGF
GGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPMASCDFSIRTYTYADTPDD
FQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQP
GDIYHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIA
RDLGPTLANSTHHNVRLLMLDDQRLLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAK
ATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHSIITNLLYHVVGWTDW
NLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQK
NDLDAVALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
>sp|P12931|SRC_HUMAN Proto-oncogene tyrosine-protein kinase Src OS=Homo sapiens GN=SRC PE=1 SV=3
MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAE
PKLFGGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTEGD
WWLAHSLSTGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTFLVRES
ETTKGAYCLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGL
CHRLTTVCPTSKPQTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTL
KPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLKGETGKY
LRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYT
ARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVER
GYRMPCPPECPESLHDLMCQCWRKEPEERPTFEYLQAFLEDYFTSTEPQYQPGENL
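
The exercise text in this commit asks for a script that counts residues per sequence and, as an extension, counts amino acids when given a protein file. The script being improved is not part of this diff, so the following is only a minimal sketch of one possible approach, assuming Python. The names `read_fasta` and `count_residues` are hypothetical. Folding sequences to upper case is an assumption made here so that records like the lower-case `sequence_5` in `example_sequences2.fasta` are counted together with upper-case ones. Ambiguity codes such as `M`, `H`, `D` and `K` are counted as they appear rather than rejected.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Sequences that span several lines are joined into one string.
    Lines that appear before the first '>' header are ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Assumption: case is not meaningful, so fold to upper case.
            chunks.append(line.upper())
    if header is not None:
        # Emit the last record, even if the file has no trailing newline.
        yield header, "".join(chunks)


def count_residues(lines):
    """Map each header to a Counter of the characters in its sequence.

    The same code serves nucleotide and protein files, because it counts
    whatever symbols are present instead of a fixed alphabet.
    Records that share a header overwrite each other in the result.
    """
    return {header: Counter(seq) for header, seq in read_fasta(lines)}
```

An open file handle can be passed directly, for example `count_residues(open("protein_sequences.fasta"))`, since iterating over a file yields its lines. Counting whatever symbols occur, rather than only `A`, `C`, `G` and `T`, is a design choice that lets one function cover both the DNA files and the protein file.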