Merge branch 'initial_setup' into 'master'

Initial setup See merge request !1

2186877a · Renato Alves · f3057983 · b740a91f · f3057983 · 2186877a
Commit 2186877a authored 4 years ago by Renato Alves
--- a/_episodes/01-intro.md
+++ b/_episodes/01-intro.md
---
-title: Python Fundamentals
-teaching: 20
-exercises: 10
-questions:
- "What basic data types can I work with in Python?"
- "How can I create a new variable in Python?"
- "Can I change the value associated with a variable after I create it?"
-objectives:
- "Assign values to variables."
-keypoints:
- "Basic data types in Python include integers, strings, and floating-point numbers."
- "Use `variable = value` to assign a value to a variable in order to record it in memory."
- "Variables are created on demand whenever a value is assigned to them."
- "Use `print(something)` to display the value of `something`."
---
-## Variables
-Any Python interpreter can be used as a calculator:
-~~~
-3 + 5 * 4
-~~~
-{: .language-python}
-~~~
-23
-~~~
-{: .output}
-This is great but not very interesting.
-To do anything useful with data, we need to assign its value to a _variable_.
-In Python, we can [assign]({{ page.root }}/reference/#assign) a value to a
-[variable]({{ page.root }}/reference/#variable), using the equals sign `=`.
-For example, to assign value `60` to a variable `weight_kg`, we would execute:
-~~~
-weight_kg = 60
-~~~
-{: .language-python}
-From now on, whenever we use `weight_kg`, Python will substitute the value we assigned to
-it. In layman's terms, **a variable is a name for a value**.
-In Python, variable names:
- - can include letters, digits, and underscores
- - cannot start with a digit
- - are [case sensitive]({{ page.root }}/reference/#case-sensitive).
-This means that, for example:
- - `weight0` is a valid variable name, whereas `0weight` is not
- - `weight` and `Weight` are different variables
-## Types of data
-Python knows various types of data. Three common ones are:
-* integer numbers
-* floating point numbers, and
-* strings.
-In the example above, variable `weight_kg` has an integer value of `60`.
-To create a variable with a floating point value, we can execute:
-~~~
-weight_kg = 60.0
-~~~
-{: .language-python}
-And to create a string, we add single or double quotes around some text, for example:
-~~~
-weight_kg_text = 'weight in kilograms:'
-~~~
-{: .language-python}
-## Using Variables in Python
-To display the value of a variable to the screen in Python, we can use the `print` function:
-~~~
-print(weight_kg)
-~~~
-{: .language-python}
-~~~
-60.0
-~~~
-{: .output}
-We can display multiple things at once using only one `print` command:
-~~~
-print(weight_kg_text, weight_kg)
-~~~
-{: .language-python}
-~~~
-weight in kilograms: 60.0
-~~~
-{: .output}
-Moreover, we can do arithmetic with variables right inside the `print` function:
-~~~
-print('weight in pounds:', 2.2 * weight_kg)
-~~~
-{: .language-python}
-~~~
-weight in pounds: 132.0
-~~~
-{: .output}
-The above command, however, did not change the value of `weight_kg`:
-~~~
-print(weight_kg)
-~~~
-{: .language-python}
-~~~
-60.0
-~~~
-{: .output}
-To change the value of the `weight_kg` variable, we have to
-**assign** `weight_kg` a new value using the equals `=` sign:
-~~~
-weight_kg = 65.0
-print('weight in kilograms is now:', weight_kg)
-~~~
-{: .language-python}
-~~~
-weight in kilograms is now: 65.0
-~~~
-{: .output}
-> ## Variables as Sticky Notes
->
-> A variable is analogous to a sticky note with a name written on it:
-> assigning a value to a variable is like putting that sticky note on a particular value.
->
-> ![Value of 65.0 with weight_kg label stuck on it](../fig/python-sticky-note-variables-01.svg)
->
-> This means that assigning a value to one variable does **not** change
-> values of other variables.
-> For example, let's store the subject's weight in pounds in its own variable:
->
-> ~~~
-> # There are 2.2 pounds per kilogram
-> weight_lb = 2.2 * weight_kg
-> print(weight_kg_text, weight_kg, 'and in pounds:', weight_lb)
-> ~~~
-> {: .language-python}
->
-> ~~~
-> weight in kilograms: 65.0 and in pounds: 143.0
-> ~~~
-> {: .output}
->
-> ![Value of 65.0 with weight_kg label stuck on it, and value of 143.0 with weight_lb label stuck on it](../fig/python-sticky-note-variables-02.svg)
->
-> Let's now change `weight_kg`:
->
-> ~~~
-> weight_kg = 100.0
-> print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)
-> ~~~
-> {: .language-python}
->
-> ~~~
-> weight in kilograms is now: 100.0 and weight in pounds is still: 143.0
-> ~~~
-> {: .output}
->
-> ![Value of 100.0 with label weight_kg stuck on it, and value of 143.0 with label weight_lb stuck on it](../fig/python-sticky-note-variables-03.svg)
->
-> Since `weight_lb` doesn't "remember" where its value comes from,
-> it is not updated when we change `weight_kg`.
-{: .callout}
-> ## Check Your Understanding
->
-> What values do the variables `mass` and `age` have after each statement in the following program?
-> Test your answers by executing the commands.
->
-> ~~~
-> mass = 47.5
-> age = 122
-> mass = mass * 2.0
-> age = age - 20
-> print(mass, age)
-> ~~~
-> {: .language-python}
->
-> > ## Solution
-> > ~~~
-> > 95.0 102
-> > ~~~
-> > {: .output}
-> {: .solution}
-{: .challenge}
-> ## Sorting Out References
->
-> What does the following program print out?
->
-> ~~~
-> first, second = 'Grace', 'Hopper'
-> third, fourth = second, first
-> print(third, fourth)
-> ~~~
-> {: .language-python}
->
-> > ## Solution
-> > ~~~
-> > Hopper Grace
-> > ~~~
-> > {: .output}
-> {: .solution}
-{: .challenge}
-{% include links.md %}
--- a/_episodes/01-syntax.md
+++ b/_episodes/01-syntax.md
+---
+title: Syntax Elements & Powerful Functions
+teaching: 20
+exercises: 10
+questions:
+- "What elements of Python syntax might I see in other people's code?"
+- "How can I use these additional features of Python to take my code to the next level?"
+- "What built-in functions and standard library modules are recommended to improve my code?"
+objectives:
+- "write comprehensions to improve code readability and efficiency."
+- "call functions designed to make common tasks easier and faster."
+- "recognise all elements of modern Python syntax and explain their purpose."
+keypoints:
+- "Use comprehensions to efficiently create new iterables with fewer lines of code."
+- "Sets can be extremely useful when comparing collections of objects, and can significantly speed up your code."
+- "The `itertools` module includes many helpful functions for working with iterables."
+- "A decorator is a function that does something to the output of another function."
+---
+## plan
+- Renato currently scheduled to lead this session
+- comprehensions (list, dictionary, generators)
+  - `yield`
+- sets - `{1,2,3}`
+- function argument passing: `*`, `**`, `/`
+  - packing/unpacking and the catchall pattern `first, *rest, last = mylist`
+- multi-dimension slicing (numpy/pandas)
+- `dot` - i.e. `this.that`
+- `_`, `__` - single, double underscore (meaning)
+- context managers (`with`) - [contextlib](https://docs.python.org/3/library/contextlib.html#contextlib.contextmanager)
+- things you might see, including new features
+  - `@not_twitter` - decorators
+  - `import typing` - type annotations / hints
+  - `:=` - "walrus" operator
+  - `async`/`await` - `yield from`
+- commonly used functions
+  - `zip()`
+  - `set()`
+  - `enumerate()`
+  - `itertools.*`
+  - (?) `functools.partial` - needs a good realistic example
+- (?) honorable mentions - useful modules
+  - `plotnine`
+## Notes on how to use this lesson template
+See the [Lesson Example][lesson-example]
+and [The Carpentries Curriculum Development Handbook][cdh]
+for full details.
+Below should be all the things you need to know right now...
+### Creating pages
+- Write material with [Markdown][markdown-cheatsheet]
+  - markdown files will be rendered as HTML pages and included in the built site
+- `.md` files in the `_episodes` folder will be added to the Episodes dropdown, page navigation, etc
+- Markdown files must include _front matter_: metadata specified in a YAML header bounded by `---`
+- At minimum, this must include a `title` field
+~~~
+---
+title: The Title of the Section
+---
+~~~
+{: .source}
+- but really your episodes (lesson sections) should include:
+  - an estimate of time required for teaching & exercises
+  - main questions answered in the section
+  - learning objectives
+  - key points to summarise what's covered in the section (these are added at the end of the lesson section)
+- as an example, below is the front matter for this page
+~~~
+---
+title: Syntax Elements & Powerful Functions
+teaching: 20
+exercises: 10
+questions:
+- "What elements of Python syntax might I see in other people's code?"
+- "How can I use these additional features of Python to take my code to the next level?"
+- "What built-in functions and standard library modules are recommended to improve my code?"
+objectives:
+- "write comprehensions to improve code readability and efficiency."
+- "call functions designed to make common tasks easier and faster."
+- "recognise all elements of modern Python syntax and explain their purpose."
+keypoints:
+- "Use comprehensions to efficiently create new iterables with fewer lines of code."
+- "Sets can be extremely useful when comparing collections of objects, and can significantly speed up your code."
+- "The `itertools` module includes many helpful functions for working with iterables."
+- "A decorator is a function that does something to the output of another function."
+---
+~~~
+{: .source}
+## Code blocks
+code snippets written like this
+{% raw %}
+    ~~~
+    print(weight_kg)
+    ~~~
+    {: .language-python}
+    ~~~
+    60.0
+    ~~~
+    {: .output}
+{% endraw %}
+will produce formatted blocks like this:
+~~~
+print(weight_kg)
+~~~
+{: .language-python}
+~~~
+60.0
+~~~
+{: .output}
+## Special blockquotes
+- The lesson template also includes a range of styled boxes
+  - examples for exercises and callouts below
+  - see [this section][lesson-example-blockquotes] of The Carpentries Lesson Example for the full list
+A callout block written like this
+~~~
+> ## Callout block example
+>
+> Write callout blocks as blockquotes,
+> with a styling tag (the technical term is a _class identifier_) at the end.
+>
+> ~~~
+> # you can still include code blocks in the callout
+> weight_lb = 2.2 * weight_kg
+> print(weight_kg_text, weight_kg, 'and in pounds:', weight_lb)
+> ~~~
+> {: .language-python}
+>
+> Use callouts for asides and comments -
+> anything that provides additional detail to the core of your material
+{: .callout}
+~~~
+{: .source}
+will be rendered like this:
+> ## Callout block example
+>
+> Write callout blocks as blockquotes,
+> with a styling tag (the technical term is a _class identifier_) at the end.
+>
+> ~~~
+> # you can still include code blocks in the callout
+> weight_lb = 2.2 * weight_kg
+> print(weight_kg_text, weight_kg, 'and in pounds:', weight_lb)
+> ~~~
+> {: .language-python}
+>
+> Use callouts for asides and comments -
+> anything that provides additional detail to the core of your material
+{: .callout}
+Similarly, exercises written like this
+~~~
+> ## Sorting Out References
+>
+> What does the following program print out?
+>
+> ~~~
+> first, second = 'Grace', 'Hopper'
+> third, fourth = second, first
+> print(third, fourth)
+> ~~~
+> {: .language-python}
+>
+> > ## Solution
+> >
+> > This text will only be visible if the solution is expanded
+> > ~~~
+> > Hopper Grace
+> > ~~~
+> > {: .output}
+> {: .solution}
+{: .challenge}
+~~~
+{: .source}
+will be rendered like this (note the expandable box containing the solution):
+> ## Sorting Out References
+>
+> What does the following program print out?
+>
+> ~~~
+> first, second = 'Grace', 'Hopper'
+> third, fourth = second, first
+> print(third, fourth)
+> ~~~
+> {: .language-python}
+>
+> > ## Solution
+> >
+> > This text will only be visible if the solution is expanded
+> > ~~~
+> > Hopper Grace
+> > ~~~
+> > {: .output}
+> {: .solution}
+{: .challenge}
+## Shared link references
+- Lastly, the last line of every page's `.md` file should be
+{% raw %}
+`{% include links.md %}`
+{% endraw %}
+- This allows us to share link references across the entire site, which makes the links much more maintainable.
+  - link URLs should be put in the `_includes/links.md` file (ideally, arranged alphabetically by reference)
+  - you can then write Markdown links "reference-style" i.e. `[link text to be displayed][reference-id]`, with `[reference-id]: https://link.to.page` in `_includes/links.md`
+{% include links.md %}
--- a/_episodes/02-data.md
+++ b/_episodes/02-data.md
+---
+title: Working with Data
+teaching: 20
+exercises: 10
+questions:
+- "How should I work with numeric data in Python?"
+- "What's the recommended way to handle and analyse tabular data?"
+- "How can I import tabular data for analysis in Python and export the results?"
+objectives:
+- "handle and summarise numeric data with Numpy."
+- "filter values in their data based on a range of conditions."
+- "load tabular data into a Pandas dataframe object."
+- "describe what is meant by the data type of an array/series, and the impact this has on how the data is handled."
+- "add and remove columns from a dataframe."
+- "select, aggregate, and visualise data in a dataframe."
+keypoints:
+- "Specialised third-party libraries such as Numpy and Pandas provide powerful objects and functions that can help us analyse our data."
+- "Pandas dataframe objects allow us to efficiently load and handle large tabular data."
+- "Use the `pandas.read_csv` function and the `DataFrame.to_csv` method to read and write tabular data."
+---
+## plan
+- Toby currently scheduled to lead this session
+- Numpy
+  - arrays
+  - masking
+  - aside about data types and potential hazards
+  - reading data from a file (with note that more will come later on this topic)
+  - link to existing image analysis material
+- Pandas
+  - when an array just isn't enough
+  - DataFrames - re-use material from [Software Carpentry][swc-python-gapminder]?
+    - ideally with a more relevant example dataset... [maybe a COVID one](https://data.europa.eu/euodp/en/data/dataset/covid-19-coronavirus-data/resource/260bbbde-2316-40eb-aec3-7cd7bfc2f590)
+    - include an aside about I/O - reading/writing files (pandas (the `.to_*()` methods and highlight some: `csv`, `json`, `feather`, `hdf`), numpy, `open()`, (?) bytes vs strings, (?) encoding)
+  - Finish with example of `df.plot()` to set the scene for plotting section
+{% include links.md %}
--- a/_episodes/02-numpy.md
+++ b/_episodes/02-numpy.md
--- a/_episodes/03-matplotlib.md
+++ b/_episodes/03-matplotlib.md
---
-title: Visualizing Tabular Data
-teaching: 30
-exercises: 20
-questions:
- "How can I visualize tabular data in Python?"
- "How can I group several plots together?"
-objectives:
- "Plot simple graphs from data."
- "Group several graphs in a single figure."
-keypoints:
- "Use the `pyplot` module from the `matplotlib` library for creating simple visualizations."
---
-## Visualizing data
-The mathematician Richard Hamming once said, "The purpose of computing is insight, not numbers," and
-the best way to develop insight is often to visualize data.  Visualization deserves an entire
-lecture of its own, but we can explore a few features of Python's `matplotlib` library here.  While
-there is no official plotting library, `matplotlib` is the _de facto_ standard.  First, we will
-import the `pyplot` module from `matplotlib` and use two of its functions to create and display a
-heat map of our data:
-~~~
-import matplotlib.pyplot
-image = matplotlib.pyplot.imshow(data)
-matplotlib.pyplot.show()
-~~~
-{: .language-python}
-![Heatmap of the Data](../fig/inflammation-01-imshow.svg)
-Blue pixels in this heat map represent low values, while yellow pixels represent high values.  As we
-can see, inflammation rises and falls over a 40-day period.  Let's take a look at the average inflammation over time:
-~~~
-ave_inflammation = numpy.mean(data, axis=0)
-ave_plot = matplotlib.pyplot.plot(ave_inflammation)
-matplotlib.pyplot.show()
-~~~
-{: .language-python}
-![Average Inflammation Over Time](../fig/inflammation-01-average.svg)
-Here, we have put the average inflammation per day across all patients in the variable `ave_inflammation`, then
-asked `matplotlib.pyplot` to create and display a line graph of those values.  The result is a
-roughly linear rise and fall, which is suspicious: we might instead expect a sharper rise and slower
-fall.  Let's have a look at two other statistics:
-~~~
-max_plot = matplotlib.pyplot.plot(numpy.max(data, axis=0))
-matplotlib.pyplot.show()
-~~~
-{: .language-python}
-![Maximum Value Along The First Axis](../fig/inflammation-01-maximum.svg)
-~~~
-min_plot = matplotlib.pyplot.plot(numpy.min(data, axis=0))
-matplotlib.pyplot.show()
-~~~
-{: .language-python}
-![Minimum Value Along The First Axis](../fig/inflammation-01-minimum.svg)
-The maximum value rises and falls smoothly, while the minimum seems to be a step function.  Neither
-trend seems particularly likely, so either there's a mistake in our calculations or something is
-wrong with our data.  This insight would have been difficult to reach by examining the numbers
-themselves without visualization tools.
-### Grouping plots
-You can group similar plots in a single figure using subplots.
-This script below uses a number of new commands. The function `matplotlib.pyplot.figure()`
-creates a space into which we will place all of our plots. The parameter `figsize`
-tells Python how big to make this space. Each subplot is placed into the figure using
-its `add_subplot` [method]({{ page.root }}/reference/#method). The `add_subplot` method takes 3
-parameters. The first denotes how many total rows of subplots there are, the second parameter
-refers to the total number of subplot columns, and the final parameter denotes which subplot
-your variable is referencing (left-to-right, top-to-bottom). Each subplot is stored in a
-different variable (`axes1`, `axes2`, `axes3`). Once a subplot is created, the axes can
-be titled using the `set_xlabel()` command (or `set_ylabel()`).
-Here are our three plots side by side:
-~~~
-import numpy
-import matplotlib.pyplot
-data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
-fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
-axes1 = fig.add_subplot(1, 3, 1)
-axes2 = fig.add_subplot(1, 3, 2)
-axes3 = fig.add_subplot(1, 3, 3)
-axes1.set_ylabel('average')
-axes1.plot(numpy.mean(data, axis=0))
-axes2.set_ylabel('max')
-axes2.plot(numpy.max(data, axis=0))
-axes3.set_ylabel('min')
-axes3.plot(numpy.min(data, axis=0))
-fig.tight_layout()
-matplotlib.pyplot.show()
-~~~
-{: .language-python}
-![The Previous Plots as Subplots](../fig/inflammation-01-group-plot.svg)
-The [call]({{ page.root }}/reference/#function-call) to `loadtxt` reads our data,
-and the rest of the program tells the plotting library
-how large we want the figure to be,
-that we're creating three subplots,
-what to draw for each one,
-and that we want a tight layout.
-(If we leave out that call to `fig.tight_layout()`,
-the graphs will actually be squeezed together more closely.)
-> ## Plot Scaling
->
-> Why do all of our plots stop just short of the upper end of our graph?
->
-> > ## Solution
-> > Because matplotlib normally sets x and y axes limits to the min and max of our data
-> > (depending on data range)
-> {: .solution}
->
-> If we want to change this, we can use the `set_ylim(min, max)` method of each 'axes',
-> for example:
->
-> ~~~
-> axes3.set_ylim(0,6)
-> ~~~
-> {: .language-python}
->
-> Update your plotting code to automatically set a more appropriate scale.
-> (Hint: you can make use of the `max` and `min` methods to help.)
->
-> > ## Solution
-> > ~~~
-> > # One method
-> > axes3.set_ylabel('min')
-> > axes3.plot(numpy.min(data, axis=0))
-> > axes3.set_ylim(0,6)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
->
-> > ## Solution
-> > ~~~
-> > # A more automated approach
-> > min_data = numpy.min(data, axis=0)
-> > axes3.set_ylabel('min')
-> > axes3.plot(min_data)
-> > axes3.set_ylim(numpy.min(min_data), numpy.max(min_data) * 1.1)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Drawing Straight Lines
->
-> In the center and right subplots above, we expect all lines to look like step functions because
-> non-integer value are not realistic for the minimum and maximum values. However, you can see
-> that the lines are not always vertical or horizontal, and in particular the step function
-> in the subplot on the right looks slanted. Why is this?
->
-> > ## Solution
-> > Because matplotlib interpolates (draws a straight line) between the points.
-> > One way to do avoid this is to use the Matplotlib `drawstyle` option:
-> >
-> > ~~~
-> > import numpy
-> > import matplotlib.pyplot
-> >
-> > data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
-> >
-> > fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
-> >
-> > axes1 = fig.add_subplot(1, 3, 1)
-> > axes2 = fig.add_subplot(1, 3, 2)
-> > axes3 = fig.add_subplot(1, 3, 3)
-> >
-> > axes1.set_ylabel('average')
-> > axes1.plot(numpy.mean(data, axis=0), drawstyle='steps-mid')
-> >
-> > axes2.set_ylabel('max')
-> > axes2.plot(numpy.max(data, axis=0), drawstyle='steps-mid')
-> >
-> > axes3.set_ylabel('min')
-> > axes3.plot(numpy.min(data, axis=0), drawstyle='steps-mid')
-> >
-> > fig.tight_layout()
-> >
-> > matplotlib.pyplot.show()
-> > ~~~
-> > {: .language-python}
-> ![Plot with step lines](../fig/inflammation-01-line-styles.svg)
-> {: .solution}
-{: .challenge}
-> ## Make Your Own Plot
->
-> Create a plot showing the standard deviation (`numpy.std`)
-> of the inflammation data for each day across all patients.
->
-> > ## Solution
-> > ~~~
-> > std_plot = matplotlib.pyplot.plot(numpy.std(data, axis=0))
-> > matplotlib.pyplot.show()
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Moving Plots Around
->
-> Modify the program to display the three plots on top of one another
-> instead of side by side.
->
-> > ## Solution
-> > ~~~
-> > import numpy
-> > import matplotlib.pyplot
-> >
-> > data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
-> >
-> > # change figsize (swap width and height)
-> > fig = matplotlib.pyplot.figure(figsize=(3.0, 10.0))
-> >
-> > # change add_subplot (swap first two parameters)
-> > axes1 = fig.add_subplot(3, 1, 1)
-> > axes2 = fig.add_subplot(3, 1, 2)
-> > axes3 = fig.add_subplot(3, 1, 3)
-> >
-> > axes1.set_ylabel('average')
-> > axes1.plot(numpy.mean(data, axis=0))
-> >
-> > axes2.set_ylabel('max')
-> > axes2.plot(numpy.max(data, axis=0))
-> >
-> > axes3.set_ylabel('min')
-> > axes3.plot(numpy.min(data, axis=0))
-> >
-> > fig.tight_layout()
-> >
-> > matplotlib.pyplot.show()
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-{% include links.md %}
--- a/_episodes/03-plotting.md
+++ b/_episodes/03-plotting.md
+---
+title: Plotting Data
+teaching: 20
+exercises: 10
+questions:
+- "How can I create publication-ready figures with Python?"
+objectives:
+- "plot data in a Matplotlib figure."
+- "create multi-panelled figures."
+- "export figures in a variety of image formats."
+- "use interactive features of Jupyter to make it easier to fine-tune a plot."
+keypoints:
+- "Matplotlib is a powerful plotting library for Python."
+- "It can also be annoyingly fiddly. Jupyter can help with this."
+---
+## plan
+- Renato currently scheduled to lead this session
+- Matplotlib
+  - The multiple matplotlib interfaces: `pyplot` vs `OO API` vs obsolete `pylab`. See [this](https://matplotlib.org/api/index.html#usage-patterns)
+  - Concepts:
+    - [Artists & containers](https://matplotlib.org/tutorials/intermediate/artists.html#artist-tutorial)
+        - Figure
+        - Axes
+        - Axis & Ticks
+        - ...
+    - Sub-plots
+    - Labels
+    - Annotations
+    - Saving formats
+    - Jupyter integration(?)
+    - 3D plotting (?)
+    - Interactive plots (?)
+- plotting from pandas
+- altair? & plotnine?
+{% include links.md %}
--- a/_episodes/04-argparse.md
+++ b/_episodes/04-argparse.md
+---
+title: Parsing Command Line Arguments
+teaching: 20
+exercises: 10
+questions:
+- "How can I access arguments passed to my Python script at runtime?"
+- "How can I create a sophisticated command line interface for my script?"
+- "How can I provide the user with more information about how to run my code?"
+objectives:
+- "access command line arguments with `sys.argv`."
+- "parse and use arguments and options with `argparse`."
+- "create a comprehensive usage statement for their script."
+keypoints:
+- "Positional command line arguments can be accessed from inside a script through the `sys.argv` object."
+- "The `argparse` module allows us to create extensive and powerful command line interfaces for our scripts."
+- "`argparse` also constructs a standardised usage statement according to the parser's configuration."
+---
+## plan
+- Toby currently scheduled to lead this session
+- `sys.argv`
+- `argparse`
+  - positional arguments
+  - options
+  - capturing multiple items in a single argument
+  - usage statements
+  - help text
+- [`docopt`](http://docopt.org/) (?)
+- [`click`](https://click.palletsprojects.com/en/7.x/) (?)
+- [comparison of argparse, docopt and click](https://realpython.com/comparing-python-command-line-parsing-libraries-argparse-docopt-click/)
+{% include links.md %}
--- a/_episodes/04-loop.md
+++ b/_episodes/04-loop.md
---
-title: Repeating Actions with Loops
-teaching: 30
-exercises: 0
-questions:
- "How can I do the same operations on many different values?"
-objectives:
- "Explain what a `for` loop does."
- "Correctly write `for` loops to repeat simple calculations."
- "Trace changes to a loop variable as the loop runs."
- "Trace changes to other variables as they are updated by a `for` loop."
-keypoints:
- "Use `for variable in sequence` to process the elements of a sequence one at a time."
- "The body of a `for` loop must be indented."
- "Use `len(thing)` to determine the length of something that contains other values."
---
-In the last episode, we wrote Python code that plots values of interest from our first
-inflammation dataset (`inflammation-01.csv`), which revealed some suspicious features in it.
-![Analysis of inflammation-01.csv](../fig/03-loop_2_0.png)
-We have a dozen data sets right now, though, and more on the way.
-We want to create plots for all of our data sets with a single statement.
-To do that, we'll have to teach the computer how to repeat things.
-An example task that we might want to repeat is printing each character in a
-word on a line of its own.
-~~~
-word = 'lead'
-~~~
-{: .language-python}
-In Python, a string is basically an ordered collection of characters, and every
-character has a unique number associated with it -- its index. This means that
-we can access characters in a string using their indices.
-For example, we can get the first character of the word `'lead'`, by using
-`word[0]`. One way to print each character is to use four `print` statements:
-~~~
-print(word[0])
-print(word[1])
-print(word[2])
-print(word[3])
-~~~
-{: .language-python}
-~~~
-l
-e
-a
-d
-~~~
-{: .output}
-This is a bad approach for three reasons:
-1.  **Not scalable**. Imagine you need to print characters of a string that is hundreds
-    of letters long.  It might be easier to type them in manually.
-2.  **Difficult to maintain**. If we want to decorate each printed character with an
-    asterisk or any other character, we would have to change four lines of code. While
-    this might not be a problem for short strings, it would definitely be a problem for
-    longer ones.
-3.  **Fragile**. If we use it with a word that has more characters than what we initially
-    envisioned, it will only display part of the word's characters. A shorter string, on
-    the other hand, will cause an error because it will be trying to display part of the
-    string that doesn't exist.
-~~~
-word = 'tin'
-print(word[0])
-print(word[1])
-print(word[2])
-print(word[3])
-~~~
-{: .language-python}
-~~~
-t
-i
-n
-~~~
-{: .output}
-~~~
---------------------------------------------------------------------------
-IndexError                                Traceback (most recent call last)
-<ipython-input-3-7974b6cdaf14> in <module>()
-      3 print(word[1])
-      4 print(word[2])
----> 5 print(word[3])
-IndexError: string index out of range
-~~~
-{: .error}
-Here's a better approach:
-~~~
-word = 'lead'
-for char in word:
-    print(char)
-~~~
-{: .language-python}
-~~~
-l
-e
-a
-d
-~~~
-{: .output}
-This is shorter --- certainly shorter than something that prints every character in a
-hundred-letter string --- and more robust as well:
-~~~
-word = 'oxygen'
-for char in word:
-    print(char)
-~~~
-{: .language-python}
-~~~
-o
-x
-y
-g
-e
-n
-~~~
-{: .output}
-The improved version uses a [for loop]({{ page.root }}/reference/#for-loop)
-to repeat an operation --- in this case, printing --- once for each thing in a sequence.
-The general form of a loop is:
-~~~
-for variable in collection:
-    # do things using variable, such as print
-~~~
-{: .language-python}
-Using the oxygen example above, the loop might look like this:
-![loop_image](../fig/loops_image.png)
-where each character (`char`) in the variable `word` is looped through and printed one character
-after another. The numbers in the diagram denote which loop cycle the character was printed in (1
-being the first loop, and 6 being the final loop).
-We can call the [loop variable]({{ page.root }}/reference/#loop-variable) anything we like, but
-there must be a colon at the end of the line starting the loop, and we must indent anything we
-want to run inside the loop. Unlike many other languages, there is no command to signify the end
-of the loop body (e.g. `end for`); what is indented after the `for` statement belongs to the loop.
-> ## What's in a name?
->
->
-> In the example above, the loop variable was given the name `char` as a mnemonic;
-> it is short for 'character'.  We can choose any name we want for variables.
-> We can even call our loop variable `banana`, as long as we use this name consistently:
->
-> ~~~
-> word = 'oxygen'
-> for banana in word:
->     print(banana)
-> ~~~
-> {: .language-python}
->
-> ~~~
-> o
-> x
-> y
-> g
-> e
-> n
-> ~~~
-> {: .output}
->
-> It is a good idea to choose variable names that are meaningful, otherwise it would be more
-> difficult to understand what the loop is doing.
-{: .callout}
-Here's another loop that repeatedly updates a variable:
-~~~
-length = 0
-for vowel in 'aeiou':
-    length = length + 1
-print('There are', length, 'vowels')
-~~~
-{: .language-python}
-~~~
-There are 5 vowels
-~~~
-{: .output}
-It's worth tracing the execution of this little program step by step.
-Since there are five characters in `'aeiou'`,
-the statement on line 3 will be executed five times.
-The first time around,
-`length` is zero (the value assigned to it on line 1)
-and `vowel` is `'a'`.
-The statement adds 1 to the old value of `length`,
-producing 1,
-and updates `length` to refer to that new value.
-The next time around,
-`vowel` is `'e'` and `length` is 1,
-so `length` is updated to be 2.
-After three more updates,
-`length` is 5;
-since there is nothing left in `'aeiou'` for Python to process,
-the loop finishes
-and the `print` statement on line 4 tells us our final answer.
-Note that a loop variable is a variable that's being used to record progress in a loop.
-It still exists after the loop is over,
-and we can re-use variables previously defined as loop variables as well:
-~~~
-letter = 'z'
-for letter in 'abc':
-    print(letter)
-print('after the loop, letter is', letter)
-~~~
-{: .language-python}
-~~~
-a
-b
-c
-after the loop, letter is c
-~~~
-{: .output}
-Note also that finding the length of a string is such a common operation
-that Python actually has a built-in function to do it called `len`:
-~~~
-print(len('aeiou'))
-~~~
-{: .language-python}
-~~~
-5
-~~~
-{: .output}
-`len` is much faster than any function we could write ourselves,
-and much easier to read than a two-line loop;
-it will also give us the length of many other things that we haven't met yet,
-so we should always use it when we can.
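As a quick check of the claim above, the sketch below compares the counting loop with `len` and then applies `len` to two other kinds of sequence. The variable names are illustrative only, and lists are introduced properly in a later episode.

```python
word = 'aeiou'

# Count the characters by hand, as in the loop shown earlier.
count = 0
for vowel in word:
    count = count + 1

# len gives the same answer in a single call.
print(count, len(word))

# len also works on other sequences, such as lists and ranges.
list_length = len([1, 3, 5, 7])
range_length = len(range(10))
print(list_length, range_length)
```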
-> ## From 1 to N
->
-> Python has a built-in function called `range` that generates a sequence of numbers. `range` can
-> accept 1, 2, or 3 parameters.
->
-> * If one parameter is given, `range` generates a sequence of that length,
->   starting at zero and incrementing by 1.
->   For example, `range(3)` produces the numbers `0, 1, 2`.
-> * If two parameters are given, `range` starts at
->   the first and ends just before the second, incrementing by one.
->   For example, `range(2, 5)` produces `2, 3, 4`.
-> * If `range` is given 3 parameters,
->   it starts at the first one, ends just before the second one, and increments by the third one.
->   For example, `range(3, 10, 2)` produces `3, 5, 7, 9`.
->
-> Using `range`,
-> write a loop that prints the first 3 natural numbers:
->
-> ~~~
-> 1
-> 2
-> 3
-> ~~~
-> {: .output}
->
-> > ## Solution
-> > ~~~
-> > for number in range(1, 4):
-> >     print(number)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
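The three forms of `range` described in the challenge above can be checked directly. This is a minimal sketch: wrapping each call in `list` is only there to make the generated numbers visible, and the variable names are illustrative.

```python
# range produces its numbers on demand, so convert it to a list to see them.
one_arg = list(range(3))             # start at 0, stop before 3
two_args = list(range(2, 5))         # start at 2, stop before 5
three_args = list(range(3, 10, 2))   # start at 3, stop before 10, step by 2

print(one_arg)
print(two_args)
print(three_args)
```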
-> ## Understanding the loops
->
-> Given the following loop:
-> ~~~
-> word = 'oxygen'
-> for char in word:
->     print(char)
-> ~~~
-> {: .language-python}
->
-> How many times is the body of the loop executed?
->
-> * 3 times
-> * 4 times
-> * 5 times
-> * 6 times
->
-> > ## Solution
-> >
-> > The body of the loop is executed 6 times.
-> >
-> {: .solution}
-{: .challenge}
-> ## Computing Powers With Loops
->
-> Exponentiation is built into Python:
->
-> ~~~
-> print(5 ** 3)
-> ~~~
-> {: .language-python}
->
-> ~~~
-> 125
-> ~~~
-> {: .output}
->
-> Write a loop that calculates the same result as `5 ** 3` using
-> multiplication (and without exponentiation).
->
-> > ## Solution
-> > ~~~
-> > result = 1
-> > for number in range(0, 3):
-> >     result = result * 5
-> > print(result)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Reverse a String
->
-> Knowing that two strings can be concatenated using the `+` operator,
-> write a loop that takes a string
-> and produces a new string with the characters in reverse order,
-> so `'Newton'` becomes `'notweN'`.
->
-> > ## Solution
-> > ~~~
-> > newstring = ''
-> > oldstring = 'Newton'
-> > for char in oldstring:
-> >     newstring = char + newstring
-> > print(newstring)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Computing the Value of a Polynomial
->
-> The built-in function `enumerate` takes a sequence (e.g. a list) and generates a
-> new sequence of the same length. Each element of the new sequence is a pair composed of the index
-> (0, 1, 2,...) and the value from the original sequence:
->
-> ~~~
-> for idx, val in enumerate(a_list):
->     # Do something using idx and val
-> ~~~
-> {: .language-python}
->
-> The code above loops through `a_list`, assigning the index to `idx` and the value to `val`.
->
-> Suppose you have encoded a polynomial as a list of coefficients in
-> the following way: the first element is the constant term, the
-> second element is the coefficient of the linear term, the third is the
-> coefficient of the quadratic term, etc.
->
-> ~~~
-> x = 5
-> coefs = [2, 4, 3]
-> y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2
-> print(y)
-> ~~~
-> {: .language-python}
->
-> ~~~
-> 97
-> ~~~
-> {: .output}
->
-> Write a loop using `enumerate(coefs)` which computes the value `y` of any
-> polynomial, given `x` and `coefs`.
->
-> > ## Solution
-> > ~~~
-> > y = 0
-> > for idx, coef in enumerate(coefs):
-> >     y = y + coef * x**idx
-> > print(y)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-{% include links.md %}
--- a/_episodes/05-lists.md
+++ b/_episodes/05-lists.md
---
-title: Storing Multiple Values in Lists
-teaching: 30
-exercises: 15
-questions:
- "How can I store many values together?"
-objectives:
- "Explain what a list is."
- "Create and index lists of simple values."
- "Change the values of individual elements"
- "Append values to an existing list"
- "Reorder and slice list elements"
- "Create and manipulate nested lists"
-keypoints:
- "`[value1, value2, value3, ...]` creates a list."
- "Lists can contain any Python object, including lists (i.e., list of lists)."
- "Lists are indexed and sliced with square brackets (e.g., list[0] and
-list[2:9]), in the same way as strings and arrays."
- "Lists are mutable (i.e., their values can be changed in place)."
- "Strings are immutable (i.e., the characters in them cannot be changed)."
---
-Similar to a string that can contain many characters, a list is a container that can store many values.
-Unlike NumPy arrays,
-lists are built into the language (so we don't have to load a library
-to use them).
-We create a list by putting values inside square brackets and separating the values with commas:
-~~~
-odds = [1, 3, 5, 7]
-print('odds are:', odds)
-~~~
-{: .language-python}
-~~~
-odds are: [1, 3, 5, 7]
-~~~
-{: .output}
-We can access elements of a list using indices -- numbered positions of elements in the list.
-These positions are numbered starting at 0, so the first element has an index of 0.
-~~~
-print('first element:', odds[0])
-print('last element:', odds[3])
-print('"-1" element:', odds[-1])
-~~~
-{: .language-python}
-~~~
-first element: 1
-last element: 7
-"-1" element: 7
-~~~
-{: .output}
-Yes, we can use negative numbers as indices in Python. When we do so, the index `-1` gives us the
-last element in the list, `-2` the second to last, and so on.
-Because of this, `odds[3]` and `odds[-1]` point to the same element here.
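The relationship between negative and non-negative indices can be sketched as follows: for a list of length `n`, index `-k` refers to the same element as index `n - k`. The loop below is only an illustration of that rule.

```python
odds = [1, 3, 5, 7]

# A negative index i refers to the same element as len(odds) + i.
for i in [-1, -2, -3, -4]:
    print(i, odds[i], odds[len(odds) + i])
```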
-If we loop over a list, the loop variable is assigned to its elements one at a time:
-~~~
-for number in odds:
-    print(number)
-~~~
-{: .language-python}
-~~~
-1
-3
-5
-7
-~~~
-{: .output}
-There is one important difference between lists and strings:
-we can change the values in a list,
-but we cannot change individual characters in a string.
-For example:
-~~~
-names = ['Curie', 'Darwing', 'Turing']  # typo in Darwin's name
-print('names is originally:', names)
-names[1] = 'Darwin'  # correct the name
-print('final value of names:', names)
-~~~
-{: .language-python}
-~~~
-names is originally: ['Curie', 'Darwing', 'Turing']
-final value of names: ['Curie', 'Darwin', 'Turing']
-~~~
-{: .output}
-works, but:
-~~~
-name = 'Darwin'
-name[0] = 'd'
-~~~
-{: .language-python}
-~~~
---------------------------------------------------------------------------
-TypeError                                 Traceback (most recent call last)
-<ipython-input-8-220df48aeb2e> in <module>()
-      1 name = 'Darwin'
----> 2 name[0] = 'd'
-TypeError: 'str' object does not support item assignment
-~~~
-{: .error}
-does not.
-> ## Ch-Ch-Ch-Ch-Changes
->
-> Data which can be modified in place is called [mutable]({{ page.root }}/reference/#mutable),
-> while data which cannot be modified is called [immutable]({{ page.root }}/reference/#immutable).
-> Strings and numbers are immutable. This does not mean that variables with string or number values
-> are constants, but when we want to change the value of a string or number variable, we can only
-> replace the old value with a completely new value.
->
-> Lists and arrays, on the other hand, are mutable: we can modify them after they have been
-> created. We can change individual elements, append new elements, or reorder the whole list. For
-> some operations, like sorting, we can choose whether to use a function that modifies the data
-> in-place or a function that returns a modified copy and leaves the original unchanged.
->
-> Be careful when modifying data in-place. If two variables refer to the same list, and you modify
-> the list value, it will change for both variables!
->
-> ~~~
-> salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
-> my_salsa = salsa        # <-- my_salsa and salsa point to the *same* list data in memory
-> salsa[0] = 'hot peppers'
-> print('Ingredients in my salsa:', my_salsa)
-> ~~~
-> {: .language-python}
->
-> ~~~
-> Ingredients in my salsa: ['hot peppers', 'onions', 'cilantro', 'tomatoes']
-> ~~~
-> {: .output}
->
-> If you want variables with mutable values to be independent, you
-> must make a copy of the value when you assign it.
->
-> ~~~
-> salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
-> my_salsa = list(salsa)        # <-- makes a *copy* of the list
-> salsa[0] = 'hot peppers'
-> print('Ingredients in my salsa:', my_salsa)
-> ~~~
-> {: .language-python}
->
-> ~~~
-> Ingredients in my salsa: ['peppers', 'onions', 'cilantro', 'tomatoes']
-> ~~~
-> {: .output}
->
-> Because of pitfalls like this, code which modifies data in place can be more difficult to
-> understand. However, it is often far more efficient to modify a large data structure in place
-> than to create a modified copy for every small change. You should consider both of these aspects
-> when writing your code.
-{: .callout}
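Sorting is a convenient way to see the two options mentioned in the callout above. In this minimal sketch, the built-in `sorted` function returns a new list, while the list method `sort` reorders the existing list in place. The variable names `alphabetical` and `result` are illustrative.

```python
salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']

# sorted() returns a new, sorted list and leaves the original unchanged.
alphabetical = sorted(salsa)
print('sorted copy:', alphabetical)
print('original:', salsa)

# list.sort() reorders the list in place and returns None.
result = salsa.sort()
print('after sort():', salsa)
print('return value:', result)
```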
-> ## Nested Lists
-> Since a list can contain any Python object, it can even contain other lists.
->
-> For example, we could represent the products in the shelves of a small grocery shop:
->
-> ~~~
-> x = [['pepper', 'zucchini', 'onion'],
->      ['cabbage', 'lettuce', 'garlic'],
->      ['apple', 'pear', 'banana']]
-> ~~~
-> {: .language-python}
->
-> Here is a visual example of how indexing a list of lists `x` works:
->
-> [![x is represented as a pepper shaker containing several packets of pepper. [x[0]] is represented
-> as a pepper shaker containing a single packet of pepper. x[0] is represented as a single packet of
-> pepper. x[0][0] is represented as single grain of pepper.  Adapted 
-> from @hadleywickham.](../fig/indexing_lists_python.png)][hadleywickham-tweet]
->
-> Using the previously declared list `x`, these would be the results of the
-> index operations shown in the image:
->
-> ~~~
-> print([x[0]])
-> ~~~
-> {: .language-python}
->
-> ~~~
-> [['pepper', 'zucchini', 'onion']]
-> ~~~
-> {: .output}
->
-> ~~~
-> print(x[0])
-> ~~~
-> {: .language-python}
->
-> ~~~
-> ['pepper', 'zucchini', 'onion']
-> ~~~
-> {: .output}
->
-> ~~~
-> print(x[0][0])
-> ~~~
-> {: .language-python}
->
-> ~~~
-> pepper
-> ~~~
-> {: .output}
->
-> Thanks to [Hadley Wickham][hadleywickham-tweet]
-> for the image above.
-{: .callout}
-> ## Heterogeneous Lists
-> Lists in Python can contain elements of different types. Example:
-> ~~~
-> sample_ages = [10, 12.5, 'Unknown']
-> ~~~
-> {: .language-python}
-{: .callout}
-There are many ways to change the contents of lists besides assigning new values to
-individual elements:
-~~~
-odds.append(11)
-print('odds after adding a value:', odds)
-~~~
-{: .language-python}
-~~~
-odds after adding a value: [1, 3, 5, 7, 11]
-~~~
-{: .output}
-~~~
-removed_element = odds.pop(0)
-print('odds after removing the first element:', odds)
-print('removed_element:', removed_element)
-~~~
-{: .language-python}
-~~~
-odds after removing the first element: [3, 5, 7, 11]
-removed_element: 1
-~~~
-{: .output}
-~~~
-odds.reverse()
-print('odds after reversing:', odds)
-~~~
-{: .language-python}
-~~~
-odds after reversing: [11, 7, 5, 3]
-~~~
-{: .output}
-While modifying in place, it is useful to remember that Python treats lists in a slightly
-counter-intuitive way.
-As we saw earlier with the `salsa` list, assigning a list to a second variable does not copy it: both names refer to the same list, so a change made through one name is visible through the other. The same applies when we modify the list using the methods above:
-~~~
-odds = [1, 3, 5, 7]
-primes = odds
-primes.append(2)
-print('primes:', primes)
-print('odds:', odds)
-~~~
-{: .language-python}
-~~~
-primes: [1, 3, 5, 7, 2]
-odds: [1, 3, 5, 7, 2]
-~~~
-{: .output}
-This is because Python stores a list in memory, and then can use multiple names to refer to the
-same list. If all we want to do is copy a (simple) list, we can again use the `list` function, so we do
-not modify a list we did not mean to:
-~~~
-odds = [1, 3, 5, 7]
-primes = list(odds)
-primes.append(2)
-print('primes:', primes)
-print('odds:', odds)
-~~~
-{: .language-python}
-~~~
-primes: [1, 3, 5, 7, 2]
-odds: [1, 3, 5, 7]
-~~~
-{: .output}
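The word "simple" matters here. `list(...)` copies only the outer list, so nested lists are still shared between the original and the copy. The sketch below contrasts this with `copy.deepcopy` from the standard library, which also copies the inner lists. The `shelves` data is made up for illustration.

```python
import copy

shelves = [['pepper', 'onion'], ['apple', 'pear']]

shallow = list(shelves)          # new outer list, but the same inner lists
deep = copy.deepcopy(shelves)    # new outer list and new inner lists

# Modify an inner list through the original name.
shelves[0][0] = 'chili'

print('shallow copy:', shallow)  # sees the change
print('deep copy:', deep)        # does not see the change
```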
-> ## Turn a String Into a List
->
-> Use a for-loop to convert the string "hello" into a list of letters:
->
-> ~~~
-> ["h", "e", "l", "l", "o"]
-> ~~~
-> {: .language-python}
->
-> Hint: You can create an empty list like this:
->
-> ~~~
-> my_list = []
-> ~~~
-> {: .language-python}
->
-> > ## Solution
-> > ~~~
-> > my_list = []
-> > for char in "hello":
-> >     my_list.append(char)
-> > print(my_list)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-Subsets of lists and strings can be accessed by specifying ranges of values in brackets,
-similar to how we accessed ranges of positions in a NumPy array.
-This is commonly referred to as "slicing" the list/string.
-~~~
-binomial_name = "Drosophila melanogaster"
-group = binomial_name[0:10]
-print("group:", group)
-species = binomial_name[11:23]
-print("species:", species)
-chromosomes = ["X", "Y", "2", "3", "4"]
-autosomes = chromosomes[2:5]
-print("autosomes:", autosomes)
-last = chromosomes[-1]
-print("last:", last)
-~~~
-{: .language-python}
-~~~
-group: Drosophila
-species: melanogaster
-autosomes: ['2', '3', '4']
-last: 4
-~~~
-{: .output}
-> ## Slicing From the End
->
-> Use slicing to access only the last four characters of a string or entries of a list.
->
-> ~~~
-> string_for_slicing = "Observation date: 02-Feb-2013"
-> list_for_slicing = [["fluorine", "F"],
->                     ["chlorine", "Cl"],
->                     ["bromine", "Br"],
->                     ["iodine", "I"],
->                     ["astatine", "At"]]
-> ~~~
-> {: .language-python}
->
-> ~~~
-> '2013'
-> [['chlorine', 'Cl'], ['bromine', 'Br'], ['iodine', 'I'], ['astatine', 'At']]
-> ~~~
-> {: .output}
->
-> Would your solution work regardless of whether you knew beforehand
-> the length of the string or list
-> (e.g. if you wanted to apply the solution to a set of lists of different lengths)?
-> If not, try to change your approach to make it more robust.
->
-> Hint: Remember that indices can be negative as well as positive.
->
-> > ## Solution
-> > Use negative indices to count elements from the end of a container (such as a list or string):
-> >
-> > ~~~
-> > string_for_slicing[-4:]
-> > list_for_slicing[-4:]
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Non-Continuous Slices
->
-> So far we've seen how to use slicing to take single blocks
-> of successive entries from a sequence.
-> But what if we want to take a subset of entries
-> that aren't next to each other in the sequence?
->
-> You can achieve this by providing a third argument
-> to the range within the brackets, called the _step size_.
-> The example below shows how you can take every third entry in a list:
->
-> ~~~
-> primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
-> subset = primes[0:12:3]
-> print("subset", subset)
-> ~~~
-> {: .language-python}
->
-> ~~~
-> subset [2, 7, 17, 29]
-> ~~~
-> {: .output}
->
-> Notice that the slice taken begins with the first entry in the range,
-> followed by entries taken at equally-spaced intervals (the steps) thereafter.
-> If you wanted to begin the subset with the third entry,
-> you would need to specify that as the starting point of the sliced range:
->
-> ~~~
-> primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
-> subset = primes[2:12:3]
-> print("subset", subset)
-> ~~~
-> {: .language-python}
->
-> ~~~
-> subset [5, 13, 23, 37]
-> ~~~
-> {: .output}
->
-> Use the step size argument to create a new string
-> that contains only every other character in the string
-> "In an octopus's garden in the shade"
->
-> ~~~
-> beatles = "In an octopus's garden in the shade"
-> ~~~
-> {: .language-python}
->
-> ~~~
-> I notpssgre ntesae
-> ~~~
-> {: .output}
->
-> > ## Solution
-> > To obtain every other character you need to provide a slice with the step
-> > size of 2:
-> >
-> > ~~~
-> > beatles[0:35:2]
-> > ~~~
-> > {: .language-python}
-> >
-> > You can also leave out the beginning and end of the slice to take the whole string
-> > and provide only the step argument to take every second
-> > element:
-> >
-> > ~~~
-> > beatles[::2]
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-If you want to take a slice from the beginning of a sequence, you can omit the first index in the
-range:
-~~~
-date = "Monday 4 January 2016"
-day = date[0:6]
-print("Using 0 to begin range:", day)
-day = date[:6]
-print("Omitting beginning index:", day)
-~~~
-{: .language-python}
-~~~
-Using 0 to begin range: Monday
-Omitting beginning index: Monday
-~~~
-{: .output}
-And similarly, you can omit the ending index in the range to take a slice to the very end of the
-sequence:
-~~~
-months = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]
-sond = months[8:12]
-print("With known last position:", sond)
-sond = months[8:len(months)]
-print("Using len() to get last entry:", sond)
-sond = months[8:]
-print("Omitting ending index:", sond)
-~~~
-{: .language-python}
-~~~
-With known last position: ['sep', 'oct', 'nov', 'dec']
-Using len() to get last entry: ['sep', 'oct', 'nov', 'dec']
-Omitting ending index: ['sep', 'oct', 'nov', 'dec']
-~~~
-{: .output}
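Omitting an index also makes it easy to split a sequence at a given position without knowing its length. The sketch below uses a shortened, made-up `months` list and shows that omitting both indices yields a copy of the whole list.

```python
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun']

first_half = months[:3]    # from the beginning up to, but not including, index 3
second_half = months[3:]   # from index 3 to the very end
whole_copy = months[:]     # both indices omitted: a copy of the whole list

print(first_half)
print(second_half)
print(first_half + second_half == months)
```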
-> ## Overloading
->
-> `+` usually means addition, but when used on strings or lists, it means "concatenate".
-> Given that, what do you think the multiplication operator `*` does on lists?
-> In particular, what will be the output of the following code?
->
-> ~~~
-> counts = [2, 4, 6, 8, 10]
-> repeats = counts * 2
-> print(repeats)
-> ~~~
-> {: .language-python}
->
-> 1.  `[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]`
-> 2.  `[4, 8, 12, 16, 20]`
-> 3.  `[[2, 4, 6, 8, 10],[2, 4, 6, 8, 10]]`
-> 4.  `[2, 4, 6, 8, 10, 4, 8, 12, 16, 20]`
->
-> The technical term for this is *operator overloading*:
-> a single operator, like `+` or `*`,
-> can do different things depending on what it's applied to.
->
-> > ## Solution
-> >
-> > The multiplication operator `*` used on a list replicates elements of the list and concatenates
-> > them together:
-> >
-> > ~~~
-> > [2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
-> > ~~~
-> > {: .output}
-> >
-> > It's equivalent to:
-> >
-> > ~~~
-> > counts + counts
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-[hadleywickham-tweet]: https://twitter.com/hadleywickham/status/643381054758363136
-{% include links.md %}
--- a/_episodes/05-style.md
+++ b/_episodes/05-style.md
+---
+title: Coding Style
+teaching: 20
+exercises: 10
+questions:
+- "How should I organise my code?"
+- "What are some practical steps I can take to improve the quality and readability of my scripts?"
+- "What tools exist to help me follow good coding style?"
+objectives:
+- "Write and adjust code to follow standards of style and organisation."
+- "Use a linter to check and modify code to follow PEP8."
+- "Provide sufficient documentation for functions and scripts."
+keypoints:
+- "It is easier to read and maintain scripts and Jupyter notebooks that are well organised."
+- "The most commonly-used style guide for Python is detailed in PEP8."
+- "Linters such as `flake8` and `black` can help us follow style standards."
+- "The rules and standards should be followed within reason, but exceptions can be made according to your best judgement."
+---
+## plan
+- Toby currently scheduled to lead this session
+- can base a lot of this on https://merely-useful.github.io/py-rse/py-rse-style.html
+- something on project structure and file organization (?)
+  - especially relevant if planning to make a python package
+  - code organisation & jargon (packages, modules, files, classes, functions)
+    - a word about avoiding circular imports (?)
+- PEP8
+  - `pycodestyle`/`pylint` - only warn, don't modify code - [see also this comparison](https://books.agiliq.com/projects/essential-python-tools/en/latest/linters.html)
+  - `black` - modifies code - note still [**beta**](https://github.com/psf/black#note-this-is-a-beta-product)
+- documentation
+  - docstrings
+  - `sphinx`?
+- include tips for good Jupyter hygiene
+  - name the notebook before you do anything else!
+  - be careful with cell order
+  - clear output before saving
+{% include links.md %}
--- a/_episodes/06-challenges.md
+++ b/_episodes/06-challenges.md
+---
+title: Coding Challenges
+teaching: 20
+exercises: 10
+questions:
+- "How can I practice the new skills I've learned?"
+objectives:
+- "Apply Python skills to solve more extensive challenges."
+keypoints:
+- "There are many coding challenges to be found online, which can be used to exercise your Python skills."
+---
+## plan
+- advent of code
+- rosalind
+- must recommend/suggest challenges that use what we've covered in the previous material
+{% include links.md %}
--- a/_episodes/06-files.md
+++ b/_episodes/06-files.md
---
-title: Analyzing Data from Multiple Files
-teaching: 20
-exercises: 0
-questions:
- "How can I do the same operations on many different files?"
-objectives:
- "Use a library function to get a list of filenames that match a wildcard pattern."
- "Write a `for` loop to process multiple files."
-keypoints:
- "Use `glob.glob(pattern)` to create a list of files whose names match a pattern."
- "Use `*` in a pattern to match zero or more characters, and `?` to match any single character."
---
-We now have almost everything we need to process all our data files.
-The only thing that's missing is a library with a rather unpleasant name:
-~~~
-import glob
-~~~
-{: .language-python}
-The `glob` library contains a function, also called `glob`,
-that finds files and directories whose names match a pattern.
-We provide those patterns as strings:
-the character `*` matches zero or more characters,
-while `?` matches any one character.
-We can use this to get the names of all the CSV files in the current directory:
-~~~
-print(glob.glob('inflammation*.csv'))
-~~~
-{: .language-python}
-~~~
-['inflammation-05.csv', 'inflammation-11.csv', 'inflammation-12.csv', 'inflammation-08.csv',
-'inflammation-03.csv', 'inflammation-06.csv', 'inflammation-09.csv', 'inflammation-07.csv',
-'inflammation-10.csv', 'inflammation-02.csv', 'inflammation-04.csv', 'inflammation-01.csv']
-~~~
-{: .output}
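The difference between `*` and `?` is easier to see with file names of different lengths. The sketch below does not use the inflammation data: it creates a temporary directory containing a few empty files with hypothetical names and compares the two wildcards. `sorted` is applied because `glob.glob` returns paths in arbitrary order.

```python
import glob
import os
import tempfile

# Create a throwaway directory with a few empty, hypothetical data files.
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ['small-1.csv', 'small-2.csv', 'small-10.csv', 'notes.txt']:
        open(os.path.join(tmpdir, name), 'w').close()

    # '*' matches zero or more characters, so all three CSV files match.
    star_paths = glob.glob(os.path.join(tmpdir, 'small-*.csv'))
    star_matches = sorted(os.path.basename(path) for path in star_paths)

    # '?' matches exactly one character, so 'small-10.csv' is excluded.
    question_paths = glob.glob(os.path.join(tmpdir, 'small-?.csv'))
    question_matches = sorted(os.path.basename(path) for path in question_paths)

print(star_matches)
print(question_matches)
```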
-As these examples show,
-`glob.glob`'s result is a list of file and directory paths in arbitrary order.
-This means we can loop over it
-to do something with each filename in turn.
-In our case,
-the "something" we want to do is generate a set of plots for each file in our inflammation dataset.
-If we want to start by analyzing just the first three files in alphabetical order, we can use the
-`sorted` built-in function to generate a new sorted list from the `glob.glob` output:
-~~~
-import glob
-import numpy
-import matplotlib.pyplot
-filenames = sorted(glob.glob('inflammation*.csv'))
-filenames = filenames[0:3]
-for filename in filenames:
-    print(filename)
-    data = numpy.loadtxt(fname=filename, delimiter=',')
-    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
-    axes1 = fig.add_subplot(1, 3, 1)
-    axes2 = fig.add_subplot(1, 3, 2)
-    axes3 = fig.add_subplot(1, 3, 3)
-    axes1.set_ylabel('average')
-    axes1.plot(numpy.mean(data, axis=0))
-    axes2.set_ylabel('max')
-    axes2.plot(numpy.max(data, axis=0))
-    axes3.set_ylabel('min')
-    axes3.plot(numpy.min(data, axis=0))
-    fig.tight_layout()
-    matplotlib.pyplot.show()
-~~~
-{: .language-python}
-~~~
-inflammation-01.csv
-~~~
-{: .output}
-![Analysis of inflammation-01.csv](../fig/03-loop_49_1.png)
-~~~
-inflammation-02.csv
-~~~
-{: .output}
-![Analysis of inflammation-02.csv](../fig/03-loop_49_3.png)
-~~~
-inflammation-03.csv
-~~~
-{: .output}
-![Analysis of inflammation-03.csv](../fig/03-loop_49_5.png)
-Sure enough,
-the maxima of the first two datasets both show the same ramp,
-and their minima show the same staircase structure;
-the third dataset reveals a different situation,
-where the maxima are a bit less regular, but the minima are consistently zero.
-> ## Plotting Differences
->
-> Plot the difference between the average inflammations reported in the first and second datasets
-> (stored in `inflammation-01.csv` and `inflammation-02.csv`, respectively),
-> i.e., the difference between the leftmost plots of the first two figures.
->
-> > ## Solution
-> > ~~~
-> > import glob
-> > import numpy
-> > import matplotlib.pyplot
-> >
-> > filenames = sorted(glob.glob('inflammation*.csv'))
-> >
-> > data0 = numpy.loadtxt(fname=filenames[0], delimiter=',')
-> > data1 = numpy.loadtxt(fname=filenames[1], delimiter=',')
-> >
-> > fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
-> >
-> > matplotlib.pyplot.ylabel('Difference in average')
-> > matplotlib.pyplot.plot(numpy.mean(data0, axis=0) - numpy.mean(data1, axis=0))
-> >
-> > fig.tight_layout()
-> > matplotlib.pyplot.show()
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Generate Composite Statistics
->
-> Use each of the files once to generate a dataset containing values averaged over all patients:
->
-> ~~~
-> filenames = glob.glob('inflammation*.csv')
-> composite_data = numpy.zeros((60,40))
-> for filename in filenames:
->     # sum each new file's data into composite_data as it's read
->     #
-> # and then divide the composite_data by number of samples
-> composite_data = composite_data / len(filenames)
-> ~~~
-> {: .language-python}
->
-> Then use pyplot to generate average, max, and min for all patients.
->
-> > ## Solution
-> > ~~~
-> > import glob
-> > import numpy
-> > import matplotlib.pyplot
-> >
-> > filenames = glob.glob('inflammation*.csv')
-> > composite_data = numpy.zeros((60,40))
-> >
-> > for filename in filenames:
-> >     data = numpy.loadtxt(fname = filename, delimiter=',')
-> >     composite_data = composite_data + data
-> >
-> > composite_data = composite_data / len(filenames)
-> >
-> > fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
-> >
-> > axes1 = fig.add_subplot(1, 3, 1)
-> > axes2 = fig.add_subplot(1, 3, 2)
-> > axes3 = fig.add_subplot(1, 3, 3)
-> >
-> > axes1.set_ylabel('average')
-> > axes1.plot(numpy.mean(composite_data, axis=0))
-> >
-> > axes2.set_ylabel('max')
-> > axes2.plot(numpy.max(composite_data, axis=0))
-> >
-> > axes3.set_ylabel('min')
-> > axes3.plot(numpy.min(composite_data, axis=0))
-> >
-> > fig.tight_layout()
-> >
-> > matplotlib.pyplot.show()
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-{% include links.md %}
--- a/_episodes/07-cond.md
+++ b/_episodes/07-cond.md
---
-title: Making Choices
-teaching: 30
-exercises: 0
-questions:
- "How can my programs do different things based on data values?"
-objectives:
- "Write conditional statements including `if`, `elif`, and `else` branches."
- "Correctly evaluate expressions containing `and` and `or`."
-keypoints:
- "Use `if condition` to start a conditional statement, `elif condition` to
-   provide additional tests, and `else` to provide a default."
- "The bodies of the branches of conditional statements must be indented."
- "Use `==` to test for equality."
- "`X and Y` is only true if both `X` and `Y` are true."
- "`X or Y` is true if either `X` or `Y`, or both, are true."
- "Zero, the empty string, and the empty list are considered false;
-   all other numbers, strings, and lists are considered true."
- "`True` and `False` represent truth values."
---
-In our last lesson, we discovered something suspicious was going on
-in our inflammation data by drawing some plots.
-How can we use Python to automatically recognize the different features we saw,
-and take a different action for each? In this lesson, we'll learn how to write code that
-runs only when certain conditions are true.
-## Conditionals
-We can ask Python to take different actions, depending on a condition, with an `if` statement:
-~~~
-num = 37
-if num > 100:
-    print('greater')
-else:
-    print('not greater')
-print('done')
-~~~
-{: .language-python}
-~~~
-not greater
-done
-~~~
-{: .output}
-The second line of this code uses the keyword `if` to tell Python that we want to make a choice.
-If the test that follows the `if` statement is true,
-the body of the `if`
-(i.e., the set of lines indented underneath it) is executed, and "greater" is printed.
-If the test is false,
-the body of the `else` is executed instead, and "not greater" is printed.
-Only one or the other is ever executed before continuing on with program execution to print "done":
-![A flowchart diagram of the if-else construct that tests if variable num is greater than 100](../fig/python-flowchart-conditional.png)
-Conditional statements don't have to include an `else`.
-If there isn't one,
-Python simply does nothing if the test is false:
-~~~
-num = 53
-print('before conditional...')
-if num > 100:
-    print(num, 'is greater than 100')
-print('...after conditional')
-~~~
-{: .language-python}
-~~~
-before conditional...
-...after conditional
-~~~
-{: .output}
-We can also chain several tests together using `elif`,
-which is short for "else if".
-The following Python code uses `elif` to print the sign of a number.
-~~~
-num = -3
-if num > 0:
-    print(num, 'is positive')
-elif num == 0:
-    print(num, 'is zero')
-else:
-    print(num, 'is negative')
-~~~
-{: .language-python}
-~~~
--3 is negative
-~~~
-{: .output}
-Note that to test for equality we use a double equals sign `==`
-rather than a single equals sign `=` which is used to assign values.
-We can also combine tests using `and` and `or`.
-`and` is only true if both parts are true:
-~~~
-if (1 > 0) and (-1 > 0):
-    print('both parts are true')
-else:
-    print('at least one part is false')
-~~~
-{: .language-python}
-~~~
-at least one part is false
-~~~
-{: .output}
-while `or` is true if at least one part is true:
-~~~
-if (1 < 0) or (-1 < 0):
-    print('at least one test is true')
-~~~
-{: .language-python}
-~~~
-at least one test is true
-~~~
-{: .output}
-> ## `True` and `False`
-> `True` and `False` are special words in Python called `booleans`,
-> which represent truth values. An expression such as `1 < 0` evaluates to
-> the value `False`, while `-1 < 0` evaluates to the value `True`.
-{: .callout}
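Because comparisons evaluate to booleans, their results can be stored in variables and combined with `and`, `or`, and `not`. A minimal sketch, with illustrative variable names:

```python
is_less = 1 < 0        # False
is_negative = -1 < 0   # True

print(is_less, type(is_less))
print(is_negative, type(is_negative))

# Stored booleans can be combined like any other test.
both = is_less and is_negative
either = is_less or is_negative
print(both, either, not is_less)
```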
-## Checking our Data
-Now that we've seen how conditionals work,
-we can use them to check for the suspicious features we saw in our inflammation data.
-We are about to use functions provided by the `numpy` module again.
-Therefore, if you're working in a new Python session, make sure to load the
-module with:
-~~~
-import numpy
-~~~
-{: .language-python}
-From the first couple of plots, we saw that maximum daily inflammation behaves
-strangely, rising by one unit each day.
-Wouldn't it be a good idea to detect such behavior and report it as suspicious?
-Let's do that!
-However, instead of checking every single day of the study, let's merely check
-if maximum inflammation in the beginning (day 0) and in the middle (day 20) of
-the study are equal to the corresponding day numbers.
-~~~
-max_inflammation_0 = numpy.max(data, axis=0)[0]
-max_inflammation_20 = numpy.max(data, axis=0)[20]
-if max_inflammation_0 == 0 and max_inflammation_20 == 20:
-    print('Suspicious looking maxima!')
-~~~
-{: .language-python}
-We also saw a different problem in the third dataset;
-the minima per day were all zero (looks like a healthy person snuck into our study).
-We can also check for this with an `elif` condition:
-~~~
-elif numpy.sum(numpy.min(data, axis=0)) == 0:
-    print('Minima add up to zero!')
-~~~
-{: .language-python}
-And if neither of these conditions is true, we can use `else` to give the all-clear:
-~~~
-else:
-    print('Seems OK!')
-~~~
-{: .language-python}
-Let's test that out:
-~~~
-data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
-max_inflammation_0 = numpy.max(data, axis=0)[0]
-max_inflammation_20 = numpy.max(data, axis=0)[20]
-if max_inflammation_0 == 0 and max_inflammation_20 == 20:
-    print('Suspicious looking maxima!')
-elif numpy.sum(numpy.min(data, axis=0)) == 0:
-    print('Minima add up to zero!')
-else:
-    print('Seems OK!')
-~~~
-{: .language-python}
-~~~
-Suspicious looking maxima!
-~~~
-{: .output}
-~~~
-data = numpy.loadtxt(fname='inflammation-03.csv', delimiter=',')
-max_inflammation_0 = numpy.max(data, axis=0)[0]
-max_inflammation_20 = numpy.max(data, axis=0)[20]
-if max_inflammation_0 == 0 and max_inflammation_20 == 20:
-    print('Suspicious looking maxima!')
-elif numpy.sum(numpy.min(data, axis=0)) == 0:
-    print('Minima add up to zero!')
-else:
-    print('Seems OK!')
-~~~
-{: .language-python}
-~~~
-Minima add up to zero!
-~~~
-{: .output}
-In this way,
-we have asked Python to do something different depending on the condition of our data.
-Here we printed messages in all cases,
-but we could also imagine not using the `else` catch-all
-so that messages are only printed when something is wrong,
-freeing us from having to manually examine every plot for features we've seen before.
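One way to picture the version without the `else` catch-all is sketched below. To keep it self-contained it uses plain Python lists instead of `numpy` and the CSV files; the function name `check_data`, the `middle_day` parameter, and the three tiny tables are assumptions made for illustration only.

```python
def check_data(data, middle_day):
    """Return a warning message for suspicious data, or None if nothing stands out."""
    # zip(*data) turns rows (patients) into columns (days).
    daily_max = [max(column) for column in zip(*data)]
    daily_min = [min(column) for column in zip(*data)]

    if daily_max[0] == 0 and daily_max[middle_day] == middle_day:
        return 'Suspicious looking maxima!'
    elif sum(daily_min) == 0:
        return 'Minima add up to zero!'
    # No `else` branch: falling through means nothing suspicious was found.
    return None


# Hypothetical toy tables: three patients (rows) over three days (columns).
rising_maxima = [[0, 1, 2], [0, 0, 1], [0, 1, 0]]
healthy_person = [[1, 2, 3], [0, 0, 0], [2, 1, 4]]
nothing_odd = [[1, 2, 3], [1, 3, 2], [2, 1, 4]]

for table in [rising_maxima, healthy_person, nothing_odd]:
    message = check_data(table, middle_day=2)
    if message is not None:
        print(message)
```

With this arrangement only the two problematic tables produce output; the third is passed over silently.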
-> ## How Many Paths?
->
-> Consider this code:
->
-> ~~~
-> if 4 > 5:
->     print('A')
-> elif 4 == 5:
->     print('B')
-> elif 4 < 5:
->     print('C')
-> ~~~
-> {: .language-python}
->
-> Which of the following would be printed if you were to run this code?
-> Why did you pick this answer?
->
-> 1.  A
-> 2.  B
-> 3.  C
-> 4.  B and C
->
-> > ## Solution
-> > C gets printed because the first two conditions, `4 > 5` and `4 == 5`, are not true,
-> > but `4 < 5` is true.
-> {: .solution}
-{: .challenge}
-> ## What Is Truth?
->
-> `True` and `False` booleans are not the only values in Python that are true and false.
-> In fact, *any* value can be used in an `if` or `elif`.
-> After reading and running the code below,
-> explain what the rule is for which values are considered true and which are considered false.
->
-> ~~~
-> if '':
->     print('empty string is true')
-> if 'word':
->     print('word is true')
-> if []:
->     print('empty list is true')
-> if [1, 2, 3]:
->     print('non-empty list is true')
-> if 0:
->     print('zero is true')
-> if 1:
->     print('one is true')
-> ~~~
-> {: .language-python}
-{: .challenge}
-> ## That's Not Not What I Meant
->
-> Sometimes it is useful to check whether some condition is not true.
-> The Boolean operator `not` can do this explicitly.
-> After reading and running the code below,
-> write some `if` statements that use `not` to test the rule
-> that you formulated in the previous challenge.
->
-> ~~~
-> if not '':
->     print('empty string is not true')
-> if not 'word':
->     print('word is not true')
-> if not not True:
->     print('not not True is true')
-> ~~~
-> {: .language-python}
-{: .challenge}
-> ## Close Enough
->
-> Write some conditions that print `True` if the variable `a` is within 10% of the variable `b`
-> and `False` otherwise.
-> Compare your implementation with your partner's:
-> do you get the same answer for all possible pairs of numbers?
->
-> > ## Solution 1
-> > ~~~
-> > a = 5
-> > b = 5.1
-> >
-> > if abs(a - b) < 0.1 * abs(b):
-> >     print('True')
-> > else:
-> >     print('False')
-> > ~~~
-> > {: .language-python}
-> {: .solution}
->
-> > ## Solution 2
-> > ~~~
-> > print(abs(a - b) < 0.1 * abs(b))
-> > ~~~
-> > {: .language-python}
-> >
-> > This works because the Booleans `True` and `False`
-> > have string representations which can be printed.
-> {: .solution}
-{: .challenge}
-> ## In-Place Operators
->
-> Python (and most other languages in the C family) provides
-> [in-place operators]({{ page.root }}/reference/#in-place-operators)
-> that work like this:
->
-> ~~~
-> x = 1  # original value
-> x += 1 # add one to x, assigning result back to x
-> x *= 3 # multiply x by 3
-> print(x)
-> ~~~
-> {: .language-python}
->
-> ~~~
-> 6
-> ~~~
-> {: .output}
->
-> Write some code that sums the positive and negative numbers in a list separately,
-> using in-place operators.
-> Do you think the result is more or less readable
-> than writing the same without in-place operators?
->
-> > ## Solution
-> > ~~~
-> > positive_sum = 0
-> > negative_sum = 0
-> > test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
-> > for num in test_list:
-> >     if num > 0:
-> >         positive_sum += num
-> >     elif num == 0:
-> >         pass
-> >     else:
-> >         negative_sum += num
-> > print(positive_sum, negative_sum)
-> > ~~~
-> > {: .language-python}
-> >
-> > Here `pass` means "don't do anything".
-> > In this particular case, it's not actually needed, since if `num == 0` neither
-> > sum needs to change, but it illustrates the use of `elif` and `pass`.
-> {: .solution}
-{: .challenge}
-> ## Sorting a List Into Buckets
->
-> In our `data` folder, large data sets are stored in files whose names start with
-> "inflammation-", and small data sets are stored in files whose names start with "small-". We
-> also have some other files that we do not care about at this point. We'd like to break all
-> these files into three lists called `large_files`, `small_files`, and `other_files`,
-> respectively.
->
-> Add code to the template below to do this. Note that the string method
-> [`startswith`](https://docs.python.org/3/library/stdtypes.html#str.startswith)
-> returns `True` if and only if the string it is called on starts with the string
-> passed as an argument, that is:
->
-> ~~~
-> "String".startswith("Str")
-> ~~~
-> {: .language-python}
-> ~~~
-> True
-> ~~~
-> {: .output}
-> But
-> ~~~
-> "String".startswith("str")
-> ~~~
-> {: .language-python}
-> ~~~
-> False
-> ~~~
-> {: .output}
-> Use the following Python code as your starting point:
-> ~~~
-> filenames = ['inflammation-01.csv',
->          'myscript.py',
->          'inflammation-02.csv',
->          'small-01.csv',
->          'small-02.csv']
-> large_files = []
-> small_files = []
-> other_files = []
-> ~~~
-> {: .language-python}
->
-> Your solution should:
->
-> 1.  loop over the names of the files
-> 2.  figure out which group each filename belongs in
-> 3.  append the filename to that list
->
-> In the end the three lists should be:
->
-> ~~~
-> large_files = ['inflammation-01.csv', 'inflammation-02.csv']
-> small_files = ['small-01.csv', 'small-02.csv']
-> other_files = ['myscript.py']
-> ~~~
-> {: .language-python}
->
-> > ## Solution
-> > ~~~
-> > for filename in filenames:
-> >     if filename.startswith('inflammation-'):
-> >         large_files.append(filename)
-> >     elif filename.startswith('small-'):
-> >         small_files.append(filename)
-> >     else:
-> >         other_files.append(filename)
-> >
-> > print('large_files:', large_files)
-> > print('small_files:', small_files)
-> > print('other_files:', other_files)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Counting Vowels
->
-> 1. Write a loop that counts the number of vowels in a character string.
-> 2. Test it on a few individual words and full sentences.
-> 3. Once you are done, compare your solution to your neighbor's.
->    Did you make the same decisions about how to handle the letter 'y'
->    (which some people think is a vowel, and some do not)?
->
-> > ## Solution
-> > ~~~
-> > vowels = 'aeiouAEIOU'
-> > sentence = 'Mary had a little lamb.'
-> > count = 0
-> > for char in sentence:
-> >     if char in vowels:
-> >         count += 1
-> >
-> > print("The number of vowels in this string is " + str(count))
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-{% include links.md %}
--- a/_episodes/08-func.md
+++ b/_episodes/08-func.md
--- a/_episodes/09-errors.md
+++ b/_episodes/09-errors.md
---
-title: Errors and Exceptions
-teaching: 30
-exercises: 0
-questions:
- "How does Python report errors?"
- "How can I handle errors in Python programs?"
-objectives:
- "To be able to read a traceback, and determine where the error took place and what type it is."
- "To be able to describe the types of situations in which syntax errors,
-   indentation errors, name errors, index errors, and missing file errors occur."
-keypoints:
- "Tracebacks can look intimidating, but they give us a lot of useful information about
-   what went wrong in our program, including where the error occurred and
-   what type of error it was."
- "An error having to do with the 'grammar' or syntax of the program is called a `SyntaxError`.
-   If the issue has to do with how the code is indented,
-   then it will be called an `IndentationError`."
- "A `NameError` will occur when trying to use a variable that does not exist. Possible causes are
-  that a variable definition is missing, a variable reference differs from its definition
-  in spelling or capitalization, or the code contains a string that is missing quotes around it."
- "Containers like lists and strings will generate errors if you try to access items
-   in them that do not exist. This type of error is called an `IndexError`."
 - "Trying to read a file that does not exist will give you a `FileNotFoundError`.
-   Trying to read a file that is open for writing, or writing to a file that is open for reading,
-   will give you an `UnsupportedOperation` error, which is a kind of `IOError`."
---
-Every programmer encounters errors,
-both those who are just beginning,
-and those who have been programming for years.
-Encountering errors and exceptions can be very frustrating at times,
-and can make coding feel like a hopeless endeavour.
-However,
-understanding what the different types of errors are
-and when you are likely to encounter them can help a lot.
-Once you know *why* you get certain types of errors,
-they become much easier to fix.
-Errors in Python have a very specific form,
-called a [traceback]({{ page.root }}/reference/#traceback).
-Let's examine one:
-~~~
-# This code has an intentional error. You can type it directly or
-# use it for reference to understand the error message below.
-def favorite_ice_cream():
-    ice_creams = [
-        "chocolate",
-        "vanilla",
-        "strawberry"
-    ]
-    print(ice_creams[3])
-favorite_ice_cream()
-~~~
-{: .language-python}
-~~~
---------------------------------------------------------------------------
-IndexError                                Traceback (most recent call last)
-<ipython-input-1-70bd89baa4df> in <module>()
-      6     print(ice_creams[3])
-      7
----> 8 favorite_ice_cream()
-<ipython-input-1-70bd89baa4df> in favorite_ice_cream()
-      4         "vanilla", "strawberry"
-      5     ]
----> 6     print(ice_creams[3])
-      7
-      8 favorite_ice_cream()
-IndexError: list index out of range
-~~~
-{: .error}
-This particular traceback has two levels.
-You can determine the number of levels by looking for the number of arrows on the left-hand side.
-In this case:
-1.  The first shows code from the cell above,
-    with an arrow pointing to Line 8 (which is `favorite_ice_cream()`).
-2.  The second shows some code in the function `favorite_ice_cream`,
-    with an arrow pointing to Line 6 (which is `print(ice_creams[3])`).
-The last level is the actual place where the error occurred.
-The other level(s) show what function the program executed to get to the next level down.
-So, in this case, the program first performed a
-[function call]({{ page.root }}/reference/#function-call) to the function `favorite_ice_cream`.
-Inside this function,
-the program encountered an error on Line 6, when it tried to run the code `print(ice_creams[3])`.
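The same information can also be inspected from within Python. The sketch below is only an illustration: it assumes the `try`/`except` construct and the standard-library `traceback` module, neither of which this episode has introduced, and the variable names are invented.

```python
import traceback


def favorite_ice_cream():
    ice_creams = ["chocolate", "vanilla", "strawberry"]
    print(ice_creams[3])  # intentional error: valid indices are 0, 1 and 2


try:
    favorite_ice_cream()
except IndexError as error:
    # One entry per level of the traceback, outermost first.
    frames = traceback.extract_tb(error.__traceback__)
    levels = [frame.name for frame in frames]
    error_type = type(error).__name__
    message = str(error)

print("Number of levels:", len(levels))
print("Error occurred in:", levels[-1])
print("Type and message:", error_type, "-", message)
```

The last entry in `levels` corresponds to the bottom-most level of the printed traceback, which is where the error actually occurred.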
-> ## Long Tracebacks
->
-> Sometimes, you might see a traceback that is very long
-> -- sometimes they might even be 20 levels deep!
-> This can make it seem like something horrible happened,
-> but the length of the error message does not reflect severity; rather,
-> it indicates that your program called many functions before it encountered the error.
-> Most of the time, the actual place where the error occurred is at the bottom-most level,
-> so you can skip down the traceback to the bottom.
-{: .callout}
-So what error did the program actually encounter?
-In the last line of the traceback,
-Python helpfully tells us the category or type of error (in this case, it is an `IndexError`)
-and a more detailed error message (in this case, it says "list index out of range").
-If you encounter an error and don't know what it means,
-it is still important to read the traceback closely.
-That way,
-if you fix the error,
-but encounter a new one,
-you can tell that the error changed.
-Additionally,
-sometimes knowing *where* the error occurred is enough to fix it,
-even if you don't entirely understand the message.
-If you do encounter an error you don't recognize,
-try looking at the
-[official documentation on errors](http://docs.python.org/3/library/exceptions.html).
-However,
-note that you may not always be able to find the error there,
-as it is possible to create custom errors.
-In that case,
-hopefully the custom error message is informative enough to help you figure out what went wrong.
-## Syntax Errors
-When you forget a colon at the end of a line,
-accidentally add one space too many when indenting under an `if` statement,
-or forget a parenthesis,
-you will encounter a [syntax error]({{ page.root }}/reference/#syntax-error).
-This means that Python couldn't figure out how to read your program.
-This is similar to forgetting punctuation in English:
-for example,
-this text is difficult to read there is no punctuation there is also no capitalization
-why is this hard because you have to figure out where each sentence ends
-you also have to figure out where each sentence begins
-to some extent it might be ambiguous if there should be a sentence break or not
-People can typically figure out what is meant by text with no punctuation,
-but people are much smarter than computers.
-If Python doesn't know how to read the program,
-it will give up and inform you with an error.
-For example:
-~~~
-def some_function()
-    msg = "hello, world!"
-    print(msg)
-     return msg
-~~~
-{: .language-python}
-~~~
-  File "<ipython-input-3-6bb841ea1423>", line 1
-    def some_function()
-                       ^
-SyntaxError: invalid syntax
-~~~
-{: .error}
-Here, Python tells us that there is a `SyntaxError` on line 1,
-and even puts a little arrow in the place where there is an issue.
-In this case the problem is that the function definition is missing a colon at the end.
-Actually, the function above has *two* issues with syntax.
-If we fix the problem with the colon,
-we see that there is *also* an `IndentationError`,
-which means that the lines in the function definition do not all have the same indentation:
-~~~
-def some_function():
-    msg = "hello, world!"
-    print(msg)
-     return msg
-~~~
-{: .language-python}
-~~~
-  File "<ipython-input-4-ae290e7659cb>", line 4
-    return msg
-    ^
-IndentationError: unexpected indent
-~~~
-{: .error}
-Both `SyntaxError` and `IndentationError` indicate a problem with the syntax of your program,
-but an `IndentationError` is more specific:
-it *always* means that there is a problem with how your code is indented.
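The relationship between the two error types can be checked without running the broken function. The sketch below is an illustration under assumptions: it passes the source as a string to the built-in `compile` function, and the helper name `syntax_problem` is hypothetical.

```python
# Source with a missing colon after the function name.
missing_colon = (
    'def some_function()\n'
    '    msg = "hello, world!"\n'
    '    return msg\n'
)

# Source with the colon fixed but one space too many before `return`.
extra_indent = (
    'def some_function():\n'
    '    msg = "hello, world!"\n'
    '     return msg\n'
)


def syntax_problem(source):
    """Return the name of the syntax-related error in `source`, or None."""
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError as error:
        # IndentationError is a subclass of SyntaxError, so it is caught here too.
        return type(error).__name__
    return None


print(syntax_problem(missing_colon))
print(syntax_problem(extra_indent))
print(issubclass(IndentationError, SyntaxError))
```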
-> ## Tabs and Spaces
->
-> Some indentation errors are harder to spot than others.
-> In particular, mixing spaces and tabs can be difficult to spot
-> because they are both [whitespace]({{ page.root }}/reference/#whitespace).
-> In the example below, the first two lines in the body of the function
-> `some_function` are indented with tabs, while the third line is indented with spaces.
-> If you're working in a Jupyter notebook, be sure to copy and paste this example
-> rather than trying to type it in manually because Jupyter automatically replaces
-> tabs with spaces.
->
-> ~~~
-> def some_function():
-> 	msg = "hello, world!"
-> 	print(msg)
->         return msg
-> ~~~
-> {: .language-python}
->
-> Visually it is impossible to spot the error.
-> Fortunately, Python does not allow you to mix tabs and spaces.
->
-> ~~~
->   File "<ipython-input-5-653b36fbcd41>", line 4
->     return msg
->               ^
-> TabError: inconsistent use of tabs and spaces in indentation
-> ~~~
-> {: .error}
-{: .callout}
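Because editors often convert tabs silently, one way to reproduce the callout above reliably is to spell the tabs out as `\t` in a string and hand it to the built-in `compile` function. This is a sketch under that assumption, not part of the original lesson; the variable names are made up.

```python
# The first two body lines start with a tab (\t); the third with eight spaces.
mixed_source = (
    'def some_function():\n'
    '\tmsg = "hello, world!"\n'
    '\tprint(msg)\n'
    '        return msg\n'
)

try:
    compile(mixed_source, '<tabs-and-spaces>', 'exec')
    error_name = None
except TabError as error:
    error_name = type(error).__name__

print(error_name)
# TabError is itself a more specific kind of IndentationError.
print(issubclass(TabError, IndentationError))
```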
-## Variable Name Errors
-Another very common type of error is called a `NameError`,
-and occurs when you try to use a variable that does not exist.
-For example:
-~~~
-print(a)
-~~~
-{: .language-python}
-~~~
---------------------------------------------------------------------------
-NameError                                 Traceback (most recent call last)
-<ipython-input-7-9d7b17ad5387> in <module>()
----> 1 print(a)
-NameError: name 'a' is not defined
-~~~
-{: .error}
-Variable name errors come with some of the most informative error messages,
-which are usually of the form "name 'the_variable_name' is not defined".
-Why does this error message occur?
-That's a harder question to answer,
-because it depends on what your code is supposed to do.
-However,
-there are a few very common reasons why you might have an undefined variable.
-The first is that you meant to use a
-[string]({{ page.root }}/reference/#string), but forgot to put quotes around it:
-~~~
-print(hello)
-~~~
-{: .language-python}
-~~~
---------------------------------------------------------------------------
-NameError                                 Traceback (most recent call last)
-<ipython-input-8-9553ee03b645> in <module>()
----> 1 print(hello)
-NameError: name 'hello' is not defined
-~~~
-{: .error}
-The second reason is that you might be trying to use a variable that does not yet exist.
-In the following example,
-`count` should have been defined (e.g., with `count = 0`) before the for loop:
-~~~
-for number in range(10):
-    count = count + number
-print("The count is:", count)
-~~~
-{: .language-python}
-~~~
---------------------------------------------------------------------------
-NameError                                 Traceback (most recent call last)
-<ipython-input-9-dd6a12d7ca5c> in <module>()
-      1 for number in range(10):
----> 2     count = count + number
-      3 print("The count is:", count)
-NameError: name 'count' is not defined
-~~~
-{: .error}
-Finally, the third possibility is that you made a typo when you were writing your code.
-Let's say we fixed the error above by adding the line `Count = 0` before the for loop.
-Frustratingly, this actually does not fix the error.
-Remember that variables are [case-sensitive]({{ page.root }}/reference/#case-sensitive),
-so the variable `count` is different from `Count`. We still get the same error,
-because we still have not defined `count`:
-~~~
-Count = 0
-for number in range(10):
-    count = count + number
-print("The count is:", count)
-~~~
-{: .language-python}
-~~~
---------------------------------------------------------------------------
-NameError                                 Traceback (most recent call last)
-<ipython-input-10-d77d40059aea> in <module>()
-      1 Count = 0
-      2 for number in range(10):
----> 3     count = count + number
-      4 print("The count is:", count)
-NameError: name 'count' is not defined
-~~~
-{: .error}
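For comparison, a version that avoids the `NameError` defines `count`, with matching capitalization, before the loop uses it:

```python
count = 0  # defined before the loop, spelled exactly as it is used below
for number in range(10):
    count = count + number
print("The count is:", count)
```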
-## Index Errors
-Next up are errors having to do with containers (like lists and strings) and the items within them.
-If you try to access an item in a list or a string that does not exist,
-then you will get an error.
-This makes sense:
-if you asked someone what day they would like to get coffee,
-and they answered "caturday",
-you might be a bit annoyed.
-Python gets similarly annoyed if you try to ask it for an item that doesn't exist:
-~~~
-letters = ['a', 'b', 'c']
-print("Letter #1 is", letters[0])
-print("Letter #2 is", letters[1])
-print("Letter #3 is", letters[2])
-print("Letter #4 is", letters[3])
-~~~
-{: .language-python}
-~~~
-Letter #1 is a
-Letter #2 is b
-Letter #3 is c
-~~~
-{: .output}
-~~~
---------------------------------------------------------------------------
-IndexError                                Traceback (most recent call last)
-<ipython-input-11-d817f55b7d6c> in <module>()
-      3 print("Letter #2 is", letters[1])
-      4 print("Letter #3 is", letters[2])
----> 5 print("Letter #4 is", letters[3])
-IndexError: list index out of range
-~~~
-{: .error}
-Here,
-Python is telling us that there is an `IndexError` in our code,
-meaning we tried to access a list index that did not exist.
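A minimal sketch of how to stay within the valid range: the indices of a list run from `0` to `len(letters) - 1`, so looping over `range(len(letters))` never asks for an item that does not exist. The helper `letter_or_none` is hypothetical and only illustrates checking an index before using it.

```python
letters = ['a', 'b', 'c']

# Valid positions run from 0 up to len(letters) - 1.
last_index = len(letters) - 1

for index in range(len(letters)):
    print("Letter #" + str(index + 1) + " is", letters[index])


def letter_or_none(items, index):
    """Return items[index] if that position exists, otherwise None."""
    if -len(items) <= index < len(items):
        return items[index]
    return None


print(letter_or_none(letters, 3))
```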
-## File Errors
-The last type of error we'll cover today
-is the kind associated with reading and writing files, such as `FileNotFoundError`.
-If you try to read a file that does not exist,
-you will receive a `FileNotFoundError` telling you so.
-If you attempt to write to a file that was opened read-only, Python 3
-raises an `UnsupportedOperation` error.
-More generally, problems with input and output manifest as
-`IOError`s or `OSError`s, depending on the version of Python you use.
-~~~
-file_handle = open('myfile.txt', 'r')
-~~~
-{: .language-python}
-~~~
---------------------------------------------------------------------------
-FileNotFoundError                         Traceback (most recent call last)
-<ipython-input-14-f6e1ac4aee96> in <module>()
----> 1 file_handle = open('myfile.txt', 'r')
-FileNotFoundError: [Errno 2] No such file or directory: 'myfile.txt'
-~~~
-{: .error}
-One reason for receiving this error is that you specified an incorrect path to the file.
-For example,
-if I am currently in a folder called `myproject`,
-and I have a file in `myproject/writing/myfile.txt`,
-but I try to open `myfile.txt`,
-this will fail.
-The correct path would be `writing/myfile.txt`.
-It is also possible that the file name or its path contains a typo.
-A related issue can occur if you mix up the "read" and "write" flags.
-Python will not give you an error if you try to open a file for writing
-when the file does not exist.
-However,
-if you meant to open a file for reading,
-but accidentally opened it for writing,
-and then try to read from it,
-you will get an `UnsupportedOperation` error
-telling you that the file was not opened for reading:
-~~~
-file_handle = open('myfile.txt', 'w')
-file_handle.read()
-~~~
-{: .language-python}
-~~~
---------------------------------------------------------------------------
-UnsupportedOperation                      Traceback (most recent call last)
-<ipython-input-15-b846479bc61f> in <module>()
-      1 file_handle = open('myfile.txt', 'w')
----> 2 file_handle.read()
-UnsupportedOperation: not readable
-~~~
-{: .error}
-These are the most common errors with files,
-though many others exist.
-If you get an error that you've never seen before,
-searching the Internet for that error type
-often reveals common reasons why you might get that error.
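Both file errors described above can be reproduced safely in a throwaway folder. The sketch below assumes the standard-library `tempfile`, `os`, and `io` modules and the `try`/`except` construct, none of which this episode has introduced; the variable names are illustrative.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'myfile.txt')

    # Reading a file that does not exist yet.
    try:
        open(path, 'r')
    except FileNotFoundError as error:
        missing_file_error = type(error).__name__

    # Opening for writing creates the file, but it cannot be read from.
    with open(path, 'w') as file_handle:
        try:
            file_handle.read()
        except io.UnsupportedOperation as error:
            wrong_mode_message = str(error)

print(missing_file_error)
print(wrong_mode_message)
```

Note that opening an existing file with the `'w'` flag also empties it, so mixing up the flags can lose data as well as raise an error.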
-> ## Reading Error Messages
->
-> Read the Python code and the resulting traceback below, and answer the following questions:
->
-> 1.  How many levels does the traceback have?
-> 2.  What is the function name where the error occurred?
-> 3.  On which line number in this function did the error occur?
-> 4.  What is the type of error?
-> 5.  What is the error message?
->
-> ~~~
-> # This code has an intentional error. Do not type it directly;
-> # use it for reference to understand the error message below.
-> def print_message(day):
->     messages = {
->         "monday": "Hello, world!",
->         "tuesday": "Today is Tuesday!",
->         "wednesday": "It is the middle of the week.",
->         "thursday": "Today is Donnerstag in German!",
->         "friday": "Last day of the week!",
->         "saturday": "Hooray for the weekend!",
->         "sunday": "Aw, the weekend is almost over."
->     }
->     print(messages[day])
->
-> def print_friday_message():
->     print_message("Friday")
->
-> print_friday_message()
-> ~~~
-> {: .language-python}
->
-> ~~~
-> ---------------------------------------------------------------------------
-> KeyError                                  Traceback (most recent call last)
-> <ipython-input-1-4be1945adbe2> in <module>()
->      14     print_message("Friday")
->      15
-> ---> 16 print_friday_message()
->
-> <ipython-input-1-4be1945adbe2> in print_friday_message()
->      12
->      13 def print_friday_message():
-> ---> 14     print_message("Friday")
->      15
->      16 print_friday_message()
->
-> <ipython-input-1-4be1945adbe2> in print_message(day)
->       9         "sunday": "Aw, the weekend is almost over."
->      10     }
-> ---> 11     print(messages[day])
->      12
->      13 def print_friday_message():
->
-> KeyError: 'Friday'
-> ~~~
-> {: .error}
->
-> > ## Solution
-> > 1. 3 levels
-> > 2. `print_message`
-> > 3. 11
-> > 4. `KeyError`
-> > 5. There isn't really a message; you're supposed to infer that `Friday` is not a key in `messages`.
-> {: .solution}
-{: .challenge}
-> ## Identifying Syntax Errors
->
-> 1. Read the code below, and (without running it) try to identify what the errors are.
-> 2. Run the code, and read the error message. Is it a `SyntaxError` or an `IndentationError`?
-> 3. Fix the error.
-> 4. Repeat steps 2 and 3, until you have fixed all the errors.
->
-> ~~~
-> def another_function
->   print("Syntax errors are annoying.")
->    print("But at least Python tells us about them!")
->   print("So they are usually not too hard to fix.")
-> ~~~
-> {: .language-python}
->
-> > ## Solution
-> > `SyntaxError` for missing `():` at end of first line,
-> > `IndentationError` for mismatch between second and third lines.
-> > A fixed version is:
-> >
-> > ~~~
-> > def another_function():
-> >     print("Syntax errors are annoying.")
-> >     print("But at least Python tells us about them!")
-> >     print("So they are usually not too hard to fix.")
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Identifying Variable Name Errors
->
-> 1. Read the code below, and (without running it) try to identify what the errors are.
-> 2. Run the code, and read the error message.
->    What type of `NameError` do you think this is?
->    In other words, is it a string with no quotes,
->    a misspelled variable,
->    or a variable that should have been defined but was not?
-> 3. Fix the error.
-> 4. Repeat steps 2 and 3, until you have fixed all the errors.
->
-> ~~~
-> for number in range(10):
->     # use a if the number is a multiple of 3, otherwise use b
->     if (Number % 3) == 0:
->         message = message + a
->     else:
->         message = message + "b"
-> print(message)
-> ~~~
-> {: .language-python}
->
-> > ## Solution
-> > 3 `NameError`s for `number` being misspelled, for `message` not defined,
-> > and for `a` not being in quotes.
-> >
-> > Fixed version:
-> >
-> > ~~~
-> > message = ""
-> > for number in range(10):
-> >     # use a if the number is a multiple of 3, otherwise use b
-> >     if (number % 3) == 0:
-> >         message = message + "a"
-> >     else:
-> >         message = message + "b"
-> > print(message)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Identifying Index Errors
->
-> 1. Read the code below, and (without running it) try to identify what the errors are.
-> 2. Run the code, and read the error message. What type of error is it?
-> 3. Fix the error.
->
-> ~~~
-> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
-> print('My favorite season is ', seasons[4])
-> ~~~
-> {: .language-python}
->
-> > ## Solution
-> > `IndexError`; the last entry is `seasons[3]`, so `seasons[4]` doesn't make sense.
-> > A fixed version is:
-> >
-> > ~~~
-> > seasons = ['Spring', 'Summer', 'Fall', 'Winter']
-> > print('My favorite season is ', seasons[-1])
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-{% include links.md %}
--- a/_episodes/10-defensive.md
+++ b/_episodes/10-defensive.md
--- a/_episodes/11-debugging.md
+++ b/_episodes/11-debugging.md
--- a/_episodes/12-cmdline.md
+++ b/_episodes/12-cmdline.md
--- a/_includes/links.md
+++ b/_includes/links.md
 {% include base_path.html %}
 [cc-by-human]: https://creativecommons.org/licenses/by/4.0/
 [cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode
+[cdh]: https://cdh.carpentries.org/
 [ci]: http://communityin.org/
 [coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html
 [coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html
@@ -24,10 +25,12 @@
 [lesson-aio]: {{ relative_root_path }}{% link aio.md %}
 [lesson-coc]: {{ relative_root_path }}{% link CODE_OF_CONDUCT.md %}
 [lesson-example]: https://carpentries.github.io/lesson-example/
+[lesson-example-blockquotes]: https://carpentries.github.io/lesson-example/04-formatting/index.html#special-blockquotes
 [lesson-license]: {{ relative_root_path }}{% link LICENSE.md %}
 [lesson-mainpage]: {{ relative_root_path }}{% link index.md %}
 [lesson-reference]: {{ relative_root_path }}{% link reference.md %}
 [lesson-setup]: {{ relative_root_path }}{% link setup.md %}
+[markdown-cheatsheet]: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
 [mit-license]: https://opensource.org/licenses/mit-license.html
 [morea]: https://morea-framework.github.io/
 [numfocus]: https://numfocus.org/
@@ -43,6 +46,7 @@
 [rubygems]: https://rubygems.org/pages/download/
 [styles]: https://github.com/carpentries/styles/
 [swc-lessons]: https://software-carpentry.org/lessons/
+[swc-python-gapminder]: http://swcarpentry.github.io/python-novice-gapminder/
 [swc-releases]: https://github.com/swcarpentry/swc-releases
 [training]: https://carpentries.github.io/instructor-training/
 [workshop-repo]: {{ site.workshop_repo }}