Commit acd49035 authored by Toby Hodges

changed file names, added protein sequence file

parent 338f1aef
@@ -24,12 +24,12 @@ think about might include
 #### First Application
-Now run the script on `exampleSequences1.fasta`
+Now run the script on `example_sequences1.fasta`
 - Has this made you notice any more improvements that could be made?
 #### Exposure to The Real World
-What about if you run the script on `exampleSequences2.fasta`? Does this help you
+What about if you run the script on `example_sequences2.fasta`? Does this help you
 to notice any additional improvements that could be made?
 #### Improving the Script
@@ -51,7 +51,8 @@ this exercise.)
 If you have time, try to further adapt the script to expand its functionality
 such that, given a file of protein sequences instead, it will produce counts of
-the different amino acids.
+the different amino acids. You can use the file `protein_sequences.fasta` to
+test your script.
 #### *FASTA Format
......
>sequence_1
ATCGATTGATCGCCCATAAATCGTTTCCCCCTAGCAGTAACCTCGCTCAGCCTTAAGCC
>sequence_2
GTCCATGATCCGCTAGAACAGATCGATAGACAGATCGCATAG
>sequence_3
CTCACTAGACATCCCTAGATACAACACGTAGACAGGTTTTACCTAGACAAAAC
>sequence_4
GTCCGATACAGATCGACAGATCGACAGATCGACAAGTCTCGCTAGACAAGCCCTCAAACC
>sequence_5
TTATAGAAACAATAGAACATAGACAATAGACAATATAGACAACCCAAATATAA
>sequence_1
ATCGATTGATCGCCCATAAATCGTTTCCCCCTAGCAGTAACCTCGCTCAGCCTTAAGCC
>sequence_2
ACGACGACGACGACGACGACGACGACGACGACGACGACG
>sequence_3
AGTCGATAAGATCGCTCCMATCGATAACAHCACTAGCDKAGCTAGCTA
>sequence_4
AGTCGATCCGATCCGAATAGCTCGATCGCCTAGCTAAGCTAGCTAGCTAGGATCGCTAGT
GTCGATAAGTCTCCGCTCTCATTATAAGTCTCCGATAATCTCGATATCGTTTACGCTTCG
GTACAGCGACAAAGTCCCTAGATAACTAGATACTTTAGCTAGATCTAGATCTGACAGTCG
>sequence_5
agctagatagctccgatcgaatcgatcgctcgatcgatccgatcgatatcgat
\ No newline at end of file
>sp|P05480|SRC_MOUSE Neuronal proto-oncogene tyrosine-protein kinase Src OS=Mus musculus GN=Src PE=1 SV=4
MGSNKSKPKDASQRRRSLEPSENVHGAGGAFPASQTPSKPASADGHRGPSAAFVPPAAEP
KLFGGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTRKVD
VREGDWWLAHSLSTGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTF
LVRESETTKGAYCLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSK
HADGLCHRLTTVCPTSKPQTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRV
AIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMNKGSLLDFLKG
ETGKYLRLPQLVDMSAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIE
DNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVL
DQVERGYRMPCPPECPESLHDLMCQCWRKEPEERPTFEYLQAFLEDYFTSTEPQYQPGEN
L
>sp|P04062|GLCM_HUMAN Glucosylceramidase OS=Homo sapiens GN=GBA PE=1 SV=3
MEFSSPSREECPKPLSRVSIMAGSLTGLLLLQAVSWASGARPCIPKSFGYSSVVCVCNAT
YCDSFDPPTFPALGTFSRYESTRSGRRMELSMGPIQANHTGTGLLLTLQPEQKFQKVKGF
GGAMTDAAALNILALSPPAQNLLLKSYFSEEGIGYNIIRVPMASCDFSIRTYTYADTPDD
FQLHNFSLPEEDTKLKIPLIHRALQLAQRPVSLLASPWTSPTWLKTNGAVNGKGSLKGQP
GDIYHQTWARYFVKFLDAYAEHKLQFWAVTAENEPSAGLLSGYPFQCLGFTPEHQRDFIA
RDLGPTLANSTHHNVRLLMLDDQRLLLPHWAKVVLTDPEAAKYVHGIAVHWYLDFLAPAK
ATLGETHRLFPNTMLFASEACVGSKFWEQSVRLGSWDRGMQYSHSIITNLLYHVVGWTDW
NLALNPEGGPNWVRNFVDSPIIVDITKDTFYKQPMFYHLGHFSKFIPEGSQRVGLVASQK
NDLDAVALMHPDGSAVVVVLNRSSKDVPLTIKDPAVGFLETISPGYSIHTYLWRRQ
>sp|P12931|SRC_HUMAN Proto-oncogene tyrosine-protein kinase Src OS=Homo sapiens GN=SRC PE=1 SV=3
MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAE
PKLFGGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTEGD
WWLAHSLSTGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTFLVRES
ETTKGAYCLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGL
CHRLTTVCPTSKPQTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTL
KPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMSKGSLLDFLKGETGKY
LRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYT
ARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVER
GYRMPCPPECPESLHDLMCQCWRKEPEERPTFEYLQAFLEDYFTSTEPQYQPGENL
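
The README change in this commit asks for the script to be extended to count amino acids in `protein_sequences.fasta`. The script itself is not part of this diff, so the following is only a minimal sketch of one way such counting might look, assuming Python. The names `read_fasta`, `count_residues`, and `count_fasta_file` are hypothetical and do not come from the exercise. The sketch joins sequences that wrap over several lines and upper-cases residues before counting, since both situations occur in the example files above.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Sequence lines that follow a header are joined, so records that
    wrap over several lines are handled as one sequence.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_residues(sequence):
    """Count each character in a sequence, ignoring case.

    Works for nucleotides and amino acids alike, because it makes no
    assumption about which letters may appear.
    """
    return Counter(sequence.upper())


def count_fasta_file(path):
    """Return a list of (header, Counter) pairs for a FASTA file.

    A list is used rather than a dict so that records sharing a
    header are not silently merged.
    """
    with open(path) as handle:
        return [(header, count_residues(seq)) for header, seq in read_fasta(handle)]
```

Calling `count_fasta_file("protein_sequences.fasta")` would then give one counter per record. Because the counter accepts any letter, unexpected characters show up in the counts rather than being dropped, which leaves the decision about how to treat them to the caller.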