Commit 158f168c authored by Marc Gouw's avatar Marc Gouw

Merge branch 'tabular_data' into 'master'

Chapter 4 Updates

See merge request grp-bio-it/ITPP!17
parents 298e35d3 35202470
This source diff could not be displayed because it is too large. You can view the blob instead.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to Python Programming"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Walkthrough: Exercise 4.7 - Plotting with Bokeh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First things first, we need to import the functions that we need from the `bokeh` library. `Bar` to create a bar plot, and `show` to render it. We use `output_notebook` here, to allow the plot to be displayed in this Jupyter Notebook - you might want to use `output_file` instead, which will result in the figure being rendered in an HTML file instead. (See below.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from bokeh.charts import Bar, output_notebook, show\n",
"from bokeh.io import gridplot"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we need to read the data from the file into Python. Start by initialising an empty dictionary, which will be populated with all of the data, and an empty list that will be used to store the names of the taxa as we encounter them for the first time. \n",
"Next, read through the file line-by-line. Each time we encounter a line starting with 'Site', we extract the name of the site and initialise another empty dictionary - keyed by that site name - in the dictionary that we created before. This will be used to store the data for this site. \n",
"If the line doesn't start with 'Site', we know that it contains data collected from the site name that we saw most recently, so we need to store it in the dictionary that we created for that site. To do that, first we split the line into the two pieces of information it contains: the taxon id and the recorded count. Then, we store these as a key-value pair in the dictionary for the current site."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"inputFile = 'speciesDistribution.txt'\n",
"handle = open(inputFile, 'r')\n",
"\n",
"sites = {}\n",
"taxa = []\n",
"\n",
"for line in handle.readlines():\n",
" line = line.strip()\n",
" if line.startswith('Site'):\n",
" siteName = line[6:]\n",
" sites[siteName] = {}\n",
" else:\n",
" taxonID, count = line.split('\\t')\n",
" if taxonID not in taxa:\n",
" taxa.append(taxonID)\n",
" sites[siteName][taxonID] = int(count)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the end, we want to output the plots for the sites in alphabetical order, and have the taxon ids sorted alphabetically in each plot too. The names of the sites are the keys of our `sites` dictionary, so we can access them by using the `.keys()` method on that dictionary. Then, this list and the list of taxon ids can be sorted - below we demonstrate two ways in which you can achieve this - the built-in `sorted()` function, and the `.sort()` method of a list object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"siteNames = sites.keys()\n",
"siteNames = sorted(siteNames)\n",
"taxa.sort()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we need to go through each site and add zeroes into the dictionaries, wherever one of our taxa wasn't observed at a particular site. For each site, we need to loop through each taxon id and check whether it doesn't have a count in the site's dictionary. If it doesn't, we should add an entry for this taxon id, with a count of 0. \n",
"_Note: we could use the `.get()` or `.setdefault()` dictionary methods later, when accessing the values, to use zeroes where we find missing values. You can read more about those, and other dictionary methods, [here](https://docs.python.org/3.5/library/stdtypes.html#mapping-types-dict)._"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"for site in siteNames:\n",
" for taxonID in taxa:\n",
" if taxonID not in sites[site]:\n",
" sites[site][taxonID] = 0\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `Bar` function, and other plotting functions from `bokeh`, expect an iterable data type to plot from. Here, we will create another dictionary, keyed by site name as before but only containing the counts in a list (ordered by taxon id as before). This is slightly clumsy - given that we have already gone to so much trouble to create our dictionary of dictionaries above - but it is good to have a single structure that holds all of our data, and a separate one that contains only the data for plotting. If you'd like to learn more advanced ways of handling and processing data in Python, I recommend reading about [pandas](http://pandas.pydata.org) and [object class definitions](https://docs.python.org/3.5/tutorial/classes.html#a-first-look-at-classes)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we use `output_notebook()` to make sure that our plots can be displayed in this Notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"output_notebook()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_Note: to save the plot to a file instead, use:_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```Python\n",
"output_file('my_amazing_plot.html') # give files descriptive names so you can easily identify them in future\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can start creating the indivual bar plots, one for each site. We want to display these plots all together, so we should start by storing them in a list. You can read more about how to use the `Bar` function [here](http://bokeh.pydata.org/en/latest/docs/user_guide/charts.html#bar-charts)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"plots = []\n",
"for site in siteNames:\n",
" data = {}\n",
" data[site] = []\n",
" data['taxa'] = taxa\n",
" for taxon in taxa:\n",
" data[site].append(sites[site][taxon])\n",
" plot = Bar(data, 'taxa', values=site, color='taxa', title=site, xlabel='taxon', ylabel='abundance', tools='',width=400)\n",
" plots.append(plot)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The final step is to lay out and display the plots. Here, we will use the `gridplot` function from `bokeh.io`, but there are plenty of other options. Check out the documentation [here](http://bokeh.pydata.org/en/latest/docs/user_guide/layout.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"layout = gridplot([[plots[0],plots[1]],\\\n",
" [plots[2],plots[3]],\\\n",
" [plots[4],plots[5]],\\\n",
" [plots[6]]])\n",
"show(layout)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
This source diff could not be displayed because it is too large. You can view the blob instead.
taxonID Grimston Wood Hagg Wood Hetchell Wood N Hetchell Wood S Scoreby Wood Sutton Wood Wheldrake Wood
A 123 2039 12983 9380 920 883 0
B 1340 9394 8493 13928 3928 293 91
C 11984 19380 948 0 0 893 22649
D 0 9102 9384 949 9301 18990 2949
E 9389 932 4942 19023 19384 0 901
F 4320 0 0 9384 12949 3910 0
G 1283 893 9834 948 0 930 9204
H 0 5839 0 9284 3892 1738 2040
I 0 0 1293 0 9192 819 8173
J 8193 9302 9348 1093 0 0 0
K 193 0 0 3029 912 0 0
L 0 984 0 0 0 0 6781
M 0 0 0 0 0 9934 9184
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment