-title: Python Fundamentals
-teaching: 20
-exercises: 10
-- "What basic data types can I work with in Python?"
-- "How can I create a new variable in Python?"
-- "Can I change the value associated with a variable after I create it?"
-- "Assign values to variables."
-- "Basic data types in Python include integers, strings, and floating-point numbers."
-- "Use `variable = value` to assign a value to a variable in order to record it in memory."
-- "Variables are created on demand whenever a value is assigned to them."
-- "Use `print(something)` to display the value of `something`."
-## Variables
-Any Python interpreter can be used as a calculator:
-3 + 5 * 4
-{: .language-python}
-23
-{: .output}
-This is great but not very interesting.
-To do anything useful with data, we need to assign its value to a _variable_.
-In Python, we can [assign]({{ page.root }}/reference/#assign) a value to a
-[variable]({{ page.root }}/reference/#variable), using the equals sign `=`.
-For example, to assign value `60` to a variable `weight_kg`, we would execute:
-weight_kg = 60
-{: .language-python}
-From now on, whenever we use `weight_kg`, Python will substitute the value we assigned to
-it. In layman's terms, **a variable is a name for a value**.
-In Python, variable names:
- - can include letters, digits, and underscores
- - cannot start with a digit
- - are [case sensitive]({{ page.root }}/reference/#case-sensitive).
-This means that, for example:
- - `weight0` is a valid variable name, whereas `0weight` is not
- - `weight` and `Weight` are different variables
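The naming rules above can be checked directly. This is a minimal sketch: `patient_weight_kg` is a hypothetical name chosen for illustration, and `compile()` is used only so that the invalid name can be tested without stopping the script.

```python
# Names may mix letters, digits, and underscores.
weight0 = 60
patient_weight_kg = 72.5  # hypothetical name, used only for illustration

# Names are case sensitive, so these are two separate variables.
weight = 60
Weight = 61
print(weight, Weight)

# A name starting with a digit is rejected as a syntax error.
# compile() lets us check this without stopping the rest of the script.
try:
    compile('0weight = 60', '<example>', 'exec')
    starts_with_digit_ok = True
except SyntaxError:
    starts_with_digit_ok = False
print('0weight accepted?', starts_with_digit_ok)
```

Typing `0weight = 60` directly at the interpreter would produce the same `SyntaxError`.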
-## Types of data
-Python knows various types of data. Three common ones are:
-* integer numbers
-* floating point numbers, and
-* strings.
-In the example above, variable `weight_kg` has an integer value of `60`.
-To create a variable with a floating point value, we can execute:
-weight_kg = 60.0
-{: .language-python}
-And to create a string, we add single or double quotes around some text, for example:
-weight_kg_text = 'weight in kilograms:'
-{: .language-python}
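One way to confirm which type a value has is the built-in `type` function. The sketch below reuses the variables from this section; `patient_count` is a hypothetical name added so that all three types appear side by side.

```python
patient_count = 60                       # integer (hypothetical variable)
weight_kg = 60.0                         # floating point number
weight_kg_text = 'weight in kilograms:'  # string

# type() reports the kind of value each name currently refers to.
print(type(patient_count))   # <class 'int'>
print(type(weight_kg))       # <class 'float'>
print(type(weight_kg_text))  # <class 'str'>
```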
-## Using Variables in Python
-To display the value of a variable to the screen in Python, we can use the `print` function:
-print(weight_kg)
-{: .language-python}
-60.0
-{: .output}
-We can display multiple things at once using only one `print` command:
-print(weight_kg_text, weight_kg)
-{: .language-python}
-weight in kilograms: 60.0
-{: .output}
-Moreover, we can do arithmetic with variables right inside the `print` function:
-print('weight in pounds:', 2.2 * weight_kg)
-{: .language-python}
-weight in pounds: 132.0
-{: .output}
-The above command, however, did not change the value of `weight_kg`:
-print(weight_kg)
-{: .language-python}
-60.0
-{: .output}
-To change the value of the `weight_kg` variable, we have to
-**assign** `weight_kg` a new value using the equals `=` sign:
-weight_kg = 65.0
-print('weight in kilograms is now:', weight_kg)
-{: .language-python}
-weight in kilograms is now: 65.0
-{: .output}
-> ## Variables as Sticky Notes
-> A variable is analogous to a sticky note with a name written on it:
-> assigning a value to a variable is like putting that sticky note on a particular value.
-> ![Value of 65.0 with weight_kg label stuck on it](../fig/python-sticky-note-variables-01.svg)
-> This means that assigning a value to one variable does **not** change
-> values of other variables.
-> For example, let's store the subject's weight in pounds in its own variable:
-> ~~~
-> # There are 2.2 pounds per kilogram
-> weight_lb = 2.2 * weight_kg
-> print(weight_kg_text, weight_kg, 'and in pounds:', weight_lb)
-> ~~~
-> {: .language-python}
-> ~~~
-> weight in kilograms: 65.0 and in pounds: 143.0
-> ~~~
-> {: .output}
-> ![Value of 65.0 with weight_kg label stuck on it, and value of 143.0 with weight_lb label stuck on it](../fig/python-sticky-note-variables-02.svg)
-> Let's now change `weight_kg`:
-> ~~~
-> weight_kg = 100.0
-> print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)
-> ~~~
-> {: .language-python}
-> ~~~
-> weight in kilograms is now: 100.0 and weight in pounds is still: 143.0
-> ~~~
-> {: .output}
-> ![Value of 100.0 with label weight_kg stuck on it, and value of 143.0 with label weight_lb stuck on it](../fig/python-sticky-note-variables-03.svg)
-> Since `weight_lb` doesn't "remember" where its value comes from,
-> it is not updated when we change `weight_kg`.
-{: .callout}
-> ## Check Your Understanding
-> What values do the variables `mass` and `age` have after each statement in the following program?
-> Test your answers by executing the commands.
-> ~~~
-> mass = 47.5
-> age = 122
-> mass = mass * 2.0
-> age = age - 20
-> print(mass, age)
-> ~~~
-> {: .language-python}
-> > ## Solution
-> > ~~~
-> > 95.0 102
-> > ~~~
-> > {: .output}
-> {: .solution}
-{: .challenge}
-> ## Sorting Out References
-> What does the following program print out?
-> ~~~
-> first, second = 'Grace', 'Hopper'
-> third, fourth = second, first
-> print(third, fourth)
-> ~~~
-> {: .language-python}
-> > ## Solution
-> > ~~~
-> > Hopper Grace
-> > ~~~
-> > {: .output}
-> {: .solution}
-{: .challenge}
-{% include links.md %}
+title: Syntax Elements & Powerful Functions
+teaching: 20
+exercises: 10
+- "What elements of Python syntax might I see in other people's code?"
+- "How can I use these additional features of Python to take my code to the next level?"
+- "What built-in functions and standard library modules are recommended to improve my code?"
+- "write comprehensions to improve code readability and efficiency."
+- "call functions designed to make common tasks easier and faster."
+- "recognise all elements of modern Python syntax and explain their purpose."
+- "Use comprehensions to efficiently create new iterables with fewer lines of code."
+- "Sets can be extremely useful when comparing collections of objects, and can significantly speed up your code."
+- "The `itertools` module includes many helpful functions for working with iterables."
+- "A decorator is a function that wraps another function to modify or extend its behaviour."
+## plan
+- Renato currently scheduled to lead this session
+- comprehensions (list, dictionary, generators)
+  - `yield`
+- sets - `{1,2,3}`
+- function argument passing: `*`, `**`, `/`
+  - packing/unpacking and the catchall pattern `first, *rest, last = mylist`
+- multi-dimension slicing (numpy/pandas)
+- `dot` - i.e. `this.that`
+- `_`, `__` - single, double underscore (meaning)
+- context managers (`with`) - [contextlib](https://docs.python.org/3/library/contextlib.html#contextlib.contextmanager)
+- things you might see, including new features
+  - `@not_twitter` - decorators
+  - `import typing` - type annotations / hints
+  - `:=` - "walrus" operator
+  - `async`/`await` - `yield from`
+- commonly used functions
+  - `zip()`
+  - `set()`
+  - `enumerate()`
+  - `itertools.*`
+  - (?) `functools.partial` - needs a good realistic example
+- (?) honorable mentions - useful modules
+  - `plotnine`
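As a rough preview of several items in the plan above, the sketch below uses only the standard library. All names in it (`squares`, `countdown`, `shout`, `greet`, and so on) are illustrative assumptions and not part of the planned lesson material.

```python
import functools

# List and dictionary comprehensions build new collections in one expression.
squares = [n * n for n in range(5)]
square_of = {n: n * n for n in range(3)}

# A generator function produces its values lazily with `yield`.
def countdown(start):
    while start > 0:
        yield start
        start -= 1

# Sets make comparisons between collections concise.
shared = {1, 2, 3} & {2, 3, 4}

# Starred unpacking: `*rest` catches everything between first and last.
mylist = ['a', 'b', 'c', 'd']
first, *rest, last = mylist

# zip() pairs items up; enumerate() adds a running index.
pairs = list(zip('xy', [1, 2]))
indexed = list(enumerate(['red', 'green']))

# A decorator takes a function and returns a wrapped version of it.
def shout(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name):
    return 'hello ' + name

print(squares)              # [0, 1, 4, 9, 16]
print(square_of)            # {0: 0, 1: 1, 2: 4}
print(list(countdown(3)))   # [3, 2, 1]
print(sorted(shared))       # [2, 3]
print(first, rest, last)    # a ['b', 'c'] d
print(pairs)                # [('x', 1), ('y', 2)]
print(indexed)              # [(0, 'red'), (1, 'green')]
print(greet('world'))       # HELLO WORLD
```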
+## Notes on how to use this lesson template
+See the [Lesson Example][lesson-example]
+and [The Carpentries Curriculum Development Handbook][cdh]
+for full details.
+Below should be all the things you need to know right now...
+### Creating pages
+- Write material with [Markdown][markdown-cheatsheet]
+  - markdown files will be rendered as HTML pages and included in the built site
+- `.md` files in the `_episodes` folder will be added to the Episodes dropdown, page navigation, etc
+- Markdown files must include _front matter_: metadata specified in a YAML header bounded by `---`
+- At minimum, this must include a `title` field
+title: The Title of the Section
+{: .source}
+- but really your episodes (lesson sections) should include:
+  - an estimate of time required for teaching & exercises
+  - main questions answered in the section
+  - learning objectives
+  - key points to summarise what's covered in the section (these are added at the end of the lesson section)
+- as an example, below is the front matter for this page
+title: Syntax Elements & Powerful Functions
+teaching: 20
+exercises: 10
+- "What elements of Python syntax might I see in other people's code?"
+- "How can I use these additional features of Python to take my code to the next level?"
+- "What built-in functions and standard library modules are recommended to improve my code?"
+- "write comprehensions to improve code readability and efficiency."
+- "call functions designed to make common tasks easier and faster."
+- "recognise all elements of modern Python syntax and explain their purpose."
+- "Use comprehensions to efficiently create new iterables with fewer lines of code."
+- "Sets can be extremely useful when comparing collections of objects, and can significantly speed up your code."
+- "The `itertools` module includes many helpful functions for working with iterables."
+- "A decorator is a function that wraps another function to modify or extend its behaviour."
+{: .source}
+## Code blocks
+Code snippets written like this
+{% raw %}
+    ~~~
+    print(weight_kg)
+    ~~~
+    {: .language-python}
+    ~~~
+    60.0
+    ~~~
+    {: .output}
+{% endraw %}
+will produce formatted blocks like this:
+{: .language-python}
+{: .output}
+## Special blockquotes
+- The lesson template also includes a range of styled boxes
+  - examples for exercises and callouts below
+  - see [this section][lesson-example-blockquotes] of The Carpentries Lesson Example for the full list
+A callout block written like this
+> ## Callout block example
+> Write callout blocks as blockquotes,
+> with a styling tag (technical term is a _class identifier_) at the end.
+> ~~~
+> # you can still include code blocks in the callout
+> weight_lb = 2.2 * weight_kg
+> print(weight_kg_text, weight_kg, 'and in pounds:', weight_lb)
+> ~~~
+> {: .language-python}
+> Use callouts for asides and comments -
+> anything that provides additional detail to the core of your material
+{: .callout}
+{: .source}
+will be rendered like this:
+> ## Callout block example
+> Write callout blocks as blockquotes,
+> with a styling tag (technical term is a _class identifier_) at the end.
+> ~~~
+> # you can still include code blocks in the callout
+> weight_lb = 2.2 * weight_kg
+> print(weight_kg_text, weight_kg, 'and in pounds:', weight_lb)
+> ~~~
+> {: .language-python}
+> Use callouts for asides and comments -
+> anything that provides additional detail to the core of your material
+{: .callout}
+Similarly, exercises written like this
+> ## Sorting Out References
+> What does the following program print out?
+> ~~~
+> first, second = 'Grace', 'Hopper'
+> third, fourth = second, first
+> print(third, fourth)
+> ~~~
+> {: .language-python}
+> > ## Solution
+> >
+> > This text will only be visible if the solution is expanded
+> > ~~~
+> > Hopper Grace
+> > ~~~
+> > {: .output}
+> {: .solution}
+{: .challenge}
+{: .source}
+will be rendered like this (note the expandable box containing the solution):
+> ## Sorting Out References
+> What does the following program print out?
+> ~~~
+> first, second = 'Grace', 'Hopper'
+> third, fourth = second, first
+> print(third, fourth)
+> ~~~
+> {: .language-python}
+> > ## Solution
+> >
+> > This text will only be visible if the solution is expanded
+> > ~~~
+> > Hopper Grace
+> > ~~~
+> > {: .output}
+> {: .solution}
+{: .challenge}
+## Shared link references
+- Finally, the last line of the `.md` file for every page should be
+{% raw %}
+`{% include links.md %}`
+{% endraw %}
+- This allows us to share link references across the entire site, which makes the links much more maintainable.
+  - link URLs should be put in the `_includes/links.md` file (ideally, arranged alphabetically by reference)
+  - you can then write Markdown links "reference-style" i.e. `[link text to be displayed][reference-id]`, with `[reference-id]: https://link.to.page` in `_includes/links.md`
+{% include links.md %}
+title: Working with Data
+teaching: 20
+exercises: 10
+- "How should I work with numeric data in Python?"
+- "What's the recommended way to handle and analyse tabular data?"
+- "How can I import tabular data for analysis in Python and export the results?"
+- "handle and summarise numeric data with Numpy."
+- "filter values in their data based on a range of conditions."
+- "load tabular data into a Pandas dataframe object."
+- "describe what is meant by the data type of an array/series, and the impact this has on how the data is handled."
+- "add and remove columns from a dataframe."
+- "select, aggregate, and visualise data in a dataframe."
+- "Specialised third-party libraries such as Numpy and Pandas provide powerful objects and functions that can help us analyse our data."
+- "Pandas dataframe objects allow us to efficiently load and handle large tabular data."
+- "Use the `pandas.read_csv` function and the `DataFrame.to_csv` method to read and write tabular data."
+## plan
+- Toby currently scheduled to lead this session
+- Numpy
+  - arrays
+  - masking
+  - aside about data types and potential hazards
+  - reading data from a file (with note that more will come later on this topic)
+  - link to existing image analysis material
+- Pandas
+  - when an array just isn't enough
+  - DataFrames - re-use material from [Software Carpentry][swc-python-gapminder]?
+    - ideally with a more relevant example dataset... [maybe a COVID one](https://data.europa.eu/euodp/en/data/dataset/covid-19-coronavirus-data/resource/260bbbde-2316-40eb-aec3-7cd7bfc2f590)
+    - include an aside about I/O - reading/writing files (pandas (the `.to_*()` methods and highlight some: `csv`, `json`, `feather`, `hdf`), numpy, `open()`, (?) bytes vs strings, (?) encoding)
+  - Finish with example of `df.plot()` to set the scene for plotting section
+{% include links.md %}
-title: Analyzing Patient Data
-teaching: 40
-exercises: 20
-- "How can I process tabular data files in Python?"
-- "Explain what a library is and what libraries are used for."
-- "Import a Python library and use the functions it contains."
-- "Read tabular data from a file into a program."
-- "Select individual values and subsections from data."
-- "Perform operations on arrays of data."
-- "Import a library into a program using `import libraryname`."
-- "Use the `numpy` library to work with arrays in Python."
-- "The expression `array.shape` gives the shape of an array."
-- "Use `array[x, y]` to select a single element from a 2D array."
-- "Array indices start at 0, not 1."
-- "Use `low:high` to specify a `slice` that includes the indices from `low` to `high-1`."
-- "Use `# some kind of explanation` to add comments to programs."
-- "Use `numpy.mean(array)`, `numpy.max(array)`, and `numpy.min(array)` to calculate simple statistics."
-- "Use `numpy.mean(array, axis=0)` or `numpy.mean(array, axis=1)` to calculate statistics across the specified axis."
-Words are useful, but what's more useful are the sentences and stories we build with them.
-Similarly, while a lot of powerful, general tools are built into Python,
-specialized tools built up from these basic units live in
-[libraries]({{ page.root }}/reference/#library)
-that can be called upon when needed.
-## Loading data into Python
-To begin processing inflammation data, we need to load it into Python.
-We can do that using a library called
-[NumPy](http://docs.scipy.org/doc/numpy/ "NumPy Documentation"), which stands for Numerical Python.
-In general, you should use this library when you want to do fancy things with lots of numbers,
-especially if you have matrices or arrays. To tell Python that we'd like to start using NumPy,
-we need to [import]({{ page.root }}/reference/#import) it:
-import numpy
-{: .language-python}
-Importing a library is like getting a piece of lab equipment out of a storage locker and setting it
-up on the bench. Libraries provide additional functionality to the basic Python package, much like
-a new piece of equipment adds functionality to a lab space. Just like in the lab, importing too
-many libraries can sometimes complicate and slow down your programs - so we only import what we
-need for each program.
-Once we've imported the library, we can ask the library to read our data file for us:
-numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
-{: .language-python}
-array([[ 0.,  0.,  1., ...,  3.,  0.,  0.],
-       [ 0.,  1.,  2., ...,  1.,  0.,  1.],
-       [ 0.,  1.,  1., ...,  2.,  1.,  1.],
-       ...,
-       [ 0.,  1.,  1., ...,  1.,  1.,  1.],
-       [ 0.,  0.,  0., ...,  0.,  2.,  0.],
-       [ 0.,  0.,  1., ...,  1.,  1.,  0.]])
-{: .output}
-The expression `numpy.loadtxt(...)` is a [function call]({{ page.root }}/reference/#function-call)
-that asks Python to run the [function]({{ page.root }}/reference/#function) `loadtxt` which
-belongs to the `numpy` library. This [dotted notation]({{ page.root }}/reference/#dotted-notation)
-is used everywhere in Python: the thing that appears before the dot contains the thing that
-appears after.
-As an example, John Smith is the John that belongs to the Smith family.
-We could use the dot notation to write his name `smith.john`,
-just as `loadtxt` is a function that belongs to the `numpy` library.
-`numpy.loadtxt` has two [parameters]({{ page.root }}/reference/#parameter): the name of the file
-we want to read and the [delimiter]({{ page.root }}/reference/#delimiter) that separates values on
-a line. These both need to be character strings (or [strings]({{ page.root }}/reference/#string)
-for short), so we put them in quotes.
-Since we haven't told it to do anything else with the function's output,
-the [notebook]({{ page.root }}/reference/#notebook) displays it.
-In this case,
-that output is the data we just loaded.
-By default,
-only a few rows and columns are shown
-(with `...` to omit elements when displaying big arrays).
-Note that, to save space when displaying NumPy arrays, Python does not show us trailing zeros, so `1.0` becomes `1.`.
-> ## Importing libraries with shortcuts
-> In this lesson we use the `import numpy` [syntax]({{ page.root }}/reference/#syntax) to import NumPy.
-> However, shortcuts such as `import numpy as np` are frequently used.  Importing NumPy this way means that after the
-> initial import, rather than writing `numpy.loadtxt(...)`, you can now write `np.loadtxt(...)`. Some
-> people prefer this as it is quicker to type and results in shorter lines of code, especially for libraries
-> with long names! You will frequently see Python code online that calls NumPy functions through `np`
-> because its authors used this shortcut. It makes no difference which approach you choose, but you must be
-> consistent: if you use `import numpy as np`, then `numpy.loadtxt(...)` will not work and you must use
-> `np.loadtxt(...)` instead. Because of this, when working with other people it is important to agree on how
-> libraries are imported.
-{: .callout}
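The effect of an import shortcut can be seen with any library. The sketch below uses the standard-library `statistics` module and the alias `stats` purely as stand-ins (both are assumptions for illustration); NumPy and `np` behave the same way.

```python
import statistics as stats  # only the name `stats` is created by this import

values = [1, 2, 3]
print(stats.mean(values))

# The full module name was never bound, so using it raises a NameError.
try:
    statistics.mean(values)
    full_name_works = True
except NameError:
    full_name_works = False
print('full name works?', full_name_works)
```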
-Our call to `numpy.loadtxt` read our file
-but didn't save the data in memory.
-To do that,
-we need to assign the array to a variable. In a similar manner to how we assign a single
-value to a variable, we can also assign an array of values to a variable using the same syntax.
-Let's re-run `numpy.loadtxt` and save the returned data:
-data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
-{: .language-python}
-This statement doesn't produce any output because we've assigned the output to the variable `data`.
-If we want to check that the data have been loaded,
-we can print the variable's value:
-print(data)
-{: .language-python}
-[[ 0.  0.  1. ...,  3.  0.  0.]
- [ 0.  1.  2. ...,  1.  0.  1.]
- [ 0.  1.  1. ...,  2.  1.  1.]
- ...,
- [ 0.  1.  1. ...,  1.  1.  1.]
- [ 0.  0.  0. ...,  0.  2.  0.]
- [ 0.  0.  1. ...,  1.  1.  0.]]
-{: .output}
-Now that the data are in memory,
-we can manipulate them.
-Let's ask what [type]({{ page.root }}/reference/#type) of thing `data` refers to:
-print(type(data))
-{: .language-python}
-<class 'numpy.ndarray'>
-{: .output}
-The output tells us that `data` currently refers to
-an N-dimensional array, the functionality for which is provided by the NumPy library.
-These data correspond to arthritis patients' inflammation.
-The rows are the individual patients, and the columns
-are their daily inflammation measurements.
-> ## Data Type
-> A NumPy array contains one or more elements
-> of the same type. The `type` function will only tell you that
-> a variable is a NumPy array but won't tell you the type of
-> thing inside the array.
-> We can find out the type
-> of the data contained in the NumPy array.
-> ~~~
-> print(data.dtype)
-> ~~~
-> {: .language-python}
-> ~~~
-> float64
-> ~~~
-> {: .output}
-> This tells us that the NumPy array's elements are
-> [floating-point numbers]({{ page.root }}/reference/#floating-point-number).
-{: .callout}
-With the following command, we can see the array's [shape]({{ page.root }}/reference/#shape):
-print(data.shape)
-{: .language-python}
-(60, 40)
-{: .output}
-The output tells us that the `data` array variable contains 60 rows and 40 columns. When we
-created the variable `data` to store our arthritis data, we did not only create the array; we also
-created information about the array, called [members]({{ page.root }}/reference/#member) or
-attributes. This extra information describes `data` in the same way an adjective describes a noun.
-`data.shape` is an attribute of `data` which describes the dimensions of `data`. We use the same
-dotted notation for the attributes of variables that we use for the functions in libraries because
-they have the same part-and-whole relationship.
-If we want to get a single number from the array, we must provide an
-[index]({{ page.root }}/reference/#index) in square brackets after the variable name, just as we
-do in math when referring to an element of a matrix.  Our inflammation data has two dimensions, so
-we will need to use two indices to refer to one specific value:
-print('first value in data:', data[0, 0])
-{: .language-python}
-first value in data: 0.0
-{: .output}
-print('middle value in data:', data[30, 20])
-{: .language-python}
-middle value in data: 13.0
-{: .output}
-The expression `data[30, 20]` accesses the element at row 30, column 20. While this expression may
-not surprise you,
- `data[0, 0]` might.
-Programming languages like Fortran, MATLAB and R start counting at 1
-because that's what human beings have done for thousands of years.
-Languages in the C family (including C++, Java, Perl, and Python) count from 0
-because it represents an offset from the first value in the array (the second
-value is offset by one index from the first value). This is closer to the way
-that computers represent arrays (if you are interested in the historical
-reasons behind counting indices from zero, you can read
-[Mike Hoye's blog post](http://exple.tive.org/blarg/2013/10/22/citation-needed/)).
-As a result,
-if we have an M×N array in Python,
-its indices go from 0 to M-1 on the first axis
-and 0 to N-1 on the second.
-It takes a bit of getting used to,
-but one way to remember the rule is that
-the index is how many steps we have to take from the start to get the item we want.
-![Zero Index](../fig/python-zero-index.png)
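Zero-based counting works the same way for ordinary Python lists. The sketch below uses hypothetical lists `days` and `grid`. A nested list is indexed as `grid[row][column]`, whereas a NumPy array also accepts the `data[row, column]` form used in this lesson.

```python
days = ['Mon', 'Tue', 'Wed', 'Thu']  # hypothetical example data

# The index is the number of steps from the start of the list.
print(days[0])               # Mon (zero steps from the start)
print(days[len(days) - 1])   # Thu (the last valid index is length - 1)

# A 2 x 3 nested list: row indices run 0..1, column indices run 0..2.
grid = [[1, 2, 3],
        [4, 5, 6]]
print(grid[1][2])            # 6 (second row, third column)
```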
-> ## In the Corner
-> What may also surprise you is that when Python displays an array,
-> it shows the element with index `[0, 0]` in the upper left corner
-> rather than the lower left.
-> This is consistent with the way mathematicians draw matrices
-> but different from the Cartesian coordinates.
-> The indices are (row, column) instead of (column, row) for the same reason,
-> which can be confusing when plotting data.
-{: .callout}
-## Slicing data
-An index like `[30, 20]` selects a single element of an array,
-but we can select whole sections as well.
-For example,
-we can select the first ten days (columns) of values
-for the first four patients (rows) like this:
-print(data[0:4, 0:10])
-{: .language-python}
-[[ 0.  0.  1.  3.  1.  2.  4.  7.  8.  3.]
- [ 0.  1.  2.  1.  2.  1.  3.  2.  2.  6.]
- [ 0.  1.  1.  3.  3.  2.  6.  2.  5.  9.]
- [ 0.  0.  2.  0.  4.  2.  2.  1.  6.  7.]]
-{: .output}
-The [slice]({{ page.root }}/reference/#slice) `0:4` means, "Start at index 0 and go up to, but not
-including, index 4." Again, the up-to-but-not-including takes a bit of getting used to, but the
-rule is that the difference between the upper and lower bounds is the number of values in the slice.
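The same rule can be checked on a plain Python list. This is a small sketch with hypothetical names `values`, `low`, and `high`.

```python
values = list(range(10, 20))   # [10, 11, ..., 19]
low, high = 2, 6

# The slice starts at index `low` and stops just before index `high`.
chunk = values[low:high]
print(chunk)                      # [12, 13, 14, 15]

# The number of items equals the difference between the bounds.
print(len(chunk) == high - low)   # True
```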
-We don't have to start slices at 0:
-print(data[5:10, 0:10])
-{: .language-python}
-[[ 0.  0.  1.  2.  2.  4.  2.  1.  6.  4.]
- [ 0.  0.  2.  2.  4.  2.  2.  5.  5.  8.]
- [ 0.  0.  1.  2.  3.  1.  2.  3.  5.  3.]
- [ 0.  0.  0.  3.  1.  5.  6.  5.  5.  8.]
- [ 0.  1.  1.  2.  1.  3.  5.  3.  5.  8.]]
-{: .output}
-We also don't have to include the upper and lower bound on the slice.  If we don't include the lower
-bound, Python uses 0 by default; if we don't include the upper, the slice runs to the end of the
-axis, and if we don't include either (i.e., if we use ':' on its own), the slice includes
-everything:
-small = data[:3, 36:]
-print('small is:')
-print(small)
-{: .language-python}
-The above example selects rows 0 through 2 and columns 36 through to the end of the array.
-small is:
-[[ 2.  3.  0.  0.]
- [ 1.  1.  0.  1.]
- [ 2.  2.  1.  1.]]
-{: .output}
-## Analyzing data
-NumPy has several useful functions that take an array as input to perform operations on its values.
-If we want to find the average inflammation for all patients on
-all days, for example, we can ask NumPy to compute `data`'s mean value:
-print(numpy.mean(data))
-{: .language-python}
-{: .output}
-`mean` is a [function]({{ page.root }}/reference/#function) that takes
-an array as an [argument]({{ page.root }}/reference/#argument).
-> ## Not All Functions Have Input
-> Generally, a function uses inputs to produce outputs.
-> However, some functions produce outputs without
-> needing any input. For example, checking the current time
-> doesn't require any input.
-> ~~~
-> import time
-> print(time.ctime())
-> ~~~
-> {: .language-python}
-> ~~~
-> Sat Mar 26 13:07:33 2016
-> ~~~
-> {: .output}
-> For functions that don't take in any arguments,
-> we still need parentheses (`()`)
-> to tell Python to go and do something for us.
-{: .callout}
-Let's use three other NumPy functions to get some descriptive values about the dataset.
-We'll also use multiple assignment,
-a convenient Python feature that will enable us to do this all in one line.
-maxval, minval, stdval = numpy.max(data), numpy.min(data), numpy.std(data)
-print('maximum inflammation:', maxval)
-print('minimum inflammation:', minval)
-print('standard deviation:', stdval)
-{: .language-python}
-Here we've assigned the return value from `numpy.max(data)` to the variable `maxval`, the value
-from `numpy.min(data)` to `minval`, and so on.
-maximum inflammation: 20.0
-minimum inflammation: 0.0
-standard deviation: 4.61383319712
-{: .output}
-> ## Mystery Functions in IPython
-> How did we know what functions NumPy has and how to use them?
-> If you are working in IPython or in a Jupyter Notebook, there is an easy way to find out.
-> If you type the name of something followed by a dot, then you can use tab completion
-> (e.g. type `numpy.` and then press tab)
-> to see a list of all functions and attributes that you can use. After selecting one, you
-> can also add a question mark (e.g. `numpy.cumprod?`), and IPython will return an
-> explanation of the method! This is the same as doing `help(numpy.cumprod)`.
-> Similarly, if you are using the "plain vanilla" Python interpreter, you can type `numpy.`
-> and press the <kbd>Tab</kbd> key twice for a listing of what is available. You can then use the
-> `help()` function to see an explanation of the function you're interested in,
-> for example: `help(numpy.cumprod)`.
-{: .callout}
-When analyzing data, though,
-we often want to look at variations in statistical values,
-such as the maximum inflammation per patient
-or the average inflammation per day.
-One way to do this is to create a new temporary array of the data we want,
-then ask it to do the calculation:
-patient_0 = data[0, :] # 0 on the first axis (rows), everything on the second (columns)
-print('maximum inflammation for patient 0:', numpy.max(patient_0))
-{: .language-python}
-maximum inflammation for patient 0: 18.0
-{: .output}
-Everything in a line of code following the '#' symbol is a
-[comment]({{ page.root }}/reference/#comment) that is ignored by Python.
-Comments allow programmers to leave explanatory notes for other
-programmers or their future selves.
-We don't actually need to store the row in a variable of its own.
-Instead, we can combine the selection and the function call:
-print('maximum inflammation for patient 2:', numpy.max(data[2, :]))
-{: .language-python}
-maximum inflammation for patient 2: 19.0
-{: .output}
-What if we need the maximum inflammation for each patient over all days (as in the
-next diagram on the left) or the average for each day (as in the
-diagram on the right)? As the diagram below shows, we want to perform the
-operation across an axis:
-![Per-patient maximum inflammation is computed row-wise across all columns using numpy.max(data, axis=1).
-Per-day average inflammation is computed column-wise across all rows using numpy.mean(data, axis=0).](../fig/python-operations-across-axes.png)
-To support this functionality,
-most array functions allow us to specify the axis we want to work on.
-If we ask for the average across axis 0 (rows in our 2D example),
-we get:
-print(numpy.mean(data, axis=0))
-{: .language-python}
-[  0.           0.45         1.11666667   1.75         2.43333333   3.15
-   3.8          3.88333333   5.23333333   5.51666667   5.95         5.9
-   8.35         7.73333333   8.36666667   9.5          9.58333333
-  10.63333333  11.56666667  12.35        13.25        11.96666667
-  11.03333333  10.16666667  10.           8.66666667   9.15         7.25
-   7.33333333   6.58333333   6.06666667   5.95         5.11666667   3.6
-   3.3          3.56666667   2.48333333   1.5          1.13333333
-   0.56666667]
-{: .output}
-As a quick check,
-we can ask this array what its shape is:
-print(numpy.mean(data, axis=0).shape)
-{: .language-python}
-(40,)
-{: .output}
-The expression `(40,)` tells us we have a one-dimensional array of 40 values, one for each column,
-so this is the average inflammation per day for all patients.
-If we average across axis 1 (columns in our 2D example), we get:
-print(numpy.mean(data, axis=1))
-{: .language-python}
-[ 5.45   5.425  6.1    5.9    5.55   6.225  5.975  6.65   6.625  6.525
-  6.775  5.8    6.225  5.75   5.225  6.3    6.55   5.7    5.85   6.55
-  5.775  5.825  6.175  6.1    5.8    6.425  6.05   6.025  6.175  6.55
-  6.175  6.35   6.725  6.125  7.075  5.725  5.925  6.15   6.075  5.75
-  5.975  5.725  6.3    5.9    6.75   5.925  7.225  6.15   5.95   6.275  5.7
-  6.1    6.825  5.975  6.725  5.7    6.25   6.4    7.05   5.9  ]
-{: .output}
-which is the average inflammation per patient across all days.
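To see what the two axes mean without NumPy, here is a plain-Python sketch on a tiny hypothetical table of 2 patients by 3 days. It only illustrates the idea and is not how NumPy computes the result internally.

```python
from statistics import mean

# Rows are patients, columns are days (hypothetical values).
table = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]

# Like axis=0: collapse the rows, leaving one average per day (column).
per_day = [mean(column) for column in zip(*table)]

# Like axis=1: collapse the columns, leaving one average per patient (row).
per_patient = [mean(row) for row in table]

print(per_day)       # [2.5, 3.5, 4.5]
print(per_patient)   # [2.0, 5.0]
```

The result for axis 0 has as many entries as there are columns, and the result for axis 1 has as many entries as there are rows, which matches the shapes seen above.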
-> ## Slicing Strings
-> A section of an array is called a [slice]({{ page.root }}/reference/#slice).
-> We can take slices of character strings as well:
-> ~~~
-> element = 'oxygen'
-> print('first three characters:', element[0:3])
-> print('last three characters:', element[3:6])
-> ~~~
-> {: .language-python}
-> ~~~
-> first three characters: oxy
-> last three characters: gen
-> ~~~
-> {: .output}
-> What is the value of `element[:4]`?
-> What about `element[4:]`?
-> Or `element[:]`?
-> > ## Solution
-> > ~~~
-> > oxyg
-> > en
-> > oxygen
-> > ~~~
-> > {: .output}
-> {: .solution}
-> What is `element[-1]`?
-> What is `element[-2]`?
-> > ## Solution
-> > ~~~
-> > n
-> > e
-> > ~~~
-> > {: .output}
-> {: .solution}
-> Given those answers,
-> explain what `element[1:-1]` does.
-> > ## Solution
-> > Creates a substring from index 1 up to (not including) the final index,
-> > effectively removing the first and last letters from 'oxygen'
-> {: .solution}
-> How can we rewrite the slice for getting the last three characters of `element`,
-> so that it works even if we assign a different string to `element`?
-> Test your solution with the following strings: `carpentry`, `clone`, `hi`. 
-> > ## Solution
-> > ~~~
-> > element = 'oxygen'
-> > print('last three characters:', element[-3:])
-> > element = 'carpentry'
-> > print('last three characters:', element[-3:])
-> > element = 'clone'
-> > print('last three characters:', element[-3:])
-> > element = 'hi'
-> > print('last three characters:', element[-3:])
-> > ~~~
-> > {: .language-python}
-> > ~~~
-> > last three characters: gen
-> > last three characters: try
-> > last three characters: one
-> > last three characters: hi
-> > ~~~
-> > {: .output}
-> {: .solution}
-{: .challenge}
-> ## Thin Slices
-> The expression `element[3:3]` produces an [empty string]({{ page.root }}/reference/#empty-string),
-> i.e., a string that contains no characters.
-> If `data` holds our array of patient data,
-> what does `data[3:3, 4:4]` produce?
-> What about `data[3:3, :]`?
-> > ## Solution
-> > ~~~
-> > array([], shape=(0, 0), dtype=float64)
-> > array([], shape=(0, 40), dtype=float64)
-> > ~~~
-> > {: .output}
-> {: .solution}
-{: .challenge}
-> ## Stacking Arrays
-> Arrays can be concatenated and stacked on top of one another,
-> using NumPy's `vstack` and `hstack` functions for vertical and horizontal stacking, respectively.
-> ~~~
-> import numpy
-> A = numpy.array([[1,2,3], [4,5,6], [7, 8, 9]])
-> print('A = ')
-> print(A)
-> B = numpy.hstack([A, A])
-> print('B = ')
-> print(B)
-> C = numpy.vstack([A, A])
-> print('C = ')
-> print(C)
-> ~~~
-> {: .language-python}
-> ~~~
-> A =
-> [[1 2 3]
->  [4 5 6]
->  [7 8 9]]
-> B =
-> [[1 2 3 1 2 3]
->  [4 5 6 4 5 6]
->  [7 8 9 7 8 9]]
-> C =
-> [[1 2 3]
->  [4 5 6]
->  [7 8 9]
->  [1 2 3]
->  [4 5 6]
->  [7 8 9]]
-> ~~~
-> {: .output}
-> Write some additional code that slices the first and last columns of `A`,
-> and stacks them into a 3x2 array.
-> Make sure to `print` the results to verify your solution.
-> > ## Solution
-> >
-> > A 'gotcha' with array indexing is that singleton dimensions
-> > are dropped by default. That means `A[:, 0]` is a one dimensional
-> > array, which won't stack as desired. To preserve singleton dimensions,
-> > the index itself can be a slice or array. For example, `A[:, :1]` returns
-> > a two dimensional array with one singleton dimension (i.e. a column
-> > vector).
-> >
-> > ~~~
-> > D = numpy.hstack((A[:, :1], A[:, -1:]))
-> > print('D = ')
-> > print(D)
-> > ~~~
-> > {: .language-python}
-> >
-> > ~~~
-> > D =
-> > [[1 3]
-> >  [4 6]
-> >  [7 9]]
-> > ~~~
-> > {: .output}
-> {: .solution}
-> > ## Solution
-> >
-> > An alternative way to achieve the same result is to use NumPy's
-> > `delete` function to remove the second column of `A`.
-> >
-> > ~~~
-> > D = numpy.delete(A, 1, 1)
-> > print('D = ')
-> > print(D)
-> > ~~~
-> > {: .language-python}
-> >
-> > ~~~
-> > D =
-> > [[1 3]
-> >  [4 6]
-> >  [7 9]]
-> > ~~~
-> > {: .output}
-> {: .solution}
-{: .challenge}
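The point about singleton dimensions in the first solution can be checked directly by comparing shapes. The sketch below reuses the array `A` from the challenge; the names `first_1d`, `first_2d`, and `flat` are illustrative assumptions.

```python
import numpy

A = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_1d = A[:, 0]   # integer index: the column axis is dropped
first_2d = A[:, :1]  # slice index: the column axis is kept with length 1

print(first_1d.shape)
print(first_2d.shape)

# Stacking two one-dimensional columns horizontally joins them end to end
# into a single longer one-dimensional array, not a 3x2 array.
flat = numpy.hstack([A[:, 0], A[:, -1]])
print(flat)
```

This is why the solution indexes with `:1` and `-1:` instead of `0` and `-1`.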
-> ## Change In Inflammation
-> The patient data is _longitudinal_ in the sense that each row represents a
-> series of observations relating to one individual.  This means that
-> the change in inflammation over time is a meaningful concept.
-> Let's find out how to calculate changes in the data contained in an array
-> with NumPy.
-> The `numpy.diff()` function takes an array and returns the differences
-> between two successive values. Let's use it to examine the changes
-> each day across the first week of patient 3 from our inflammation dataset.
-> ~~~
-> patient3_week1 = data[3, :7]
-> print(patient3_week1)
-> ~~~
-> {: .language-python}
-> ~~~
->  [0. 0. 2. 0. 4. 2. 2.]
-> ~~~
-> {: .output}
-> Calling `numpy.diff(patient3_week1)` would do the following calculations:
-> ~~~
-> [ 0 - 0, 2 - 0, 0 - 2, 4 - 0, 2 - 4, 2 - 2 ]
-> ~~~
-> {: .language-python}
-> and return the 6 difference values in a new array.
-> ~~~
-> numpy.diff(patient3_week1)
-> ~~~
-> {: .language-python}
-> ~~~
-> array([ 0.,  2., -2.,  4., -2.,  0.])
-> ~~~
-> {: .output}
-> Note that the array of differences is shorter by one element (length 6).
-> When calling `numpy.diff` with a multi-dimensional array, an `axis` argument may
-> be passed to the function to specify which axis to process. When applying 
-> `numpy.diff` to our 2D inflammation array `data`, which axis would we specify?
-> > ## Solution
-> > Since the row axis (0) is patients, it does not make sense to get the
-> > difference between two arbitrary patients. The column axis (1) is in
-> > days, so the difference is the change in inflammation -- a meaningful
-> > concept.
-> >
-> > ~~~
-> > numpy.diff(data, axis=1)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-> If the shape of an individual data file is `(60, 40)` (60 rows and 40
-> columns), what would the shape of the array be after you run the `diff()`
-> function and why?
-> > ## Solution
-> > The shape will be `(60, 39)` because there is one fewer difference between
-> > columns than there are columns in the data.
-> {: .solution}
-> How would you find the largest change in inflammation for each patient? Does
-> it matter if the change in inflammation is an increase or a decrease?
-> > ## Solution
-> > By using the `numpy.max()` function after you apply the `numpy.diff()`
-> > function, you will get the largest difference between days.
-> >
-> > ~~~
-> > numpy.max(numpy.diff(data, axis=1), axis=1)
-> > ~~~
-> > {: .language-python}
-> >
-> > ~~~
-> > array([  7.,  12.,  11.,  10.,  11.,  13.,  10.,   8.,  10.,  10.,   7.,
-> >          7.,  13.,   7.,  10.,  10.,   8.,  10.,   9.,  10.,  13.,   7.,
-> >         12.,   9.,  12.,  11.,  10.,  10.,   7.,  10.,  11.,  10.,   8.,
-> >         11.,  12.,  10.,   9.,  10.,  13.,  10.,   7.,   7.,  10.,  13.,
-> >         12.,   8.,   8.,  10.,  10.,   9.,   8.,  13.,  10.,   7.,  10.,
-> >          8.,  12.,  10.,   7.,  12.])
-> > ~~~
-> > {: .output}
-> >
-> > If inflammation values *decrease* along an axis, then the difference from
-> > one element to the next will be negative. If
-> > you are interested in the **magnitude** of the change and not the
-> > direction, the `numpy.absolute()` function will provide that.
-> >
-> > Notice the difference if you get the largest _absolute_ difference
-> > between readings.
-> >
-> > ~~~
-> > numpy.max(numpy.absolute(numpy.diff(data, axis=1)), axis=1)
-> > ~~~
-> > {: .language-python}
-> >
-> > ~~~
-> > array([ 12.,  14.,  11.,  13.,  11.,  13.,  10.,  12.,  10.,  10.,  10.,
-> >         12.,  13.,  10.,  11.,  10.,  12.,  13.,   9.,  10.,  13.,   9.,
-> >         12.,   9.,  12.,  11.,  10.,  13.,   9.,  13.,  11.,  11.,   8.,
-> >         11.,  12.,  13.,   9.,  10.,  13.,  11.,  11.,  13.,  11.,  13.,
-> >         13.,  10.,   9.,  10.,  10.,   9.,   9.,  13.,  10.,   9.,  10.,
-> >         11.,  13.,  10.,  10.,  12.])
-> > ~~~
-> > {: .output}
-> >
-> {: .solution}
-{: .challenge}
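The shape and sign behaviour discussed in this challenge can be verified on a small array where the arithmetic is easy to follow by hand. The array `toy` below is made-up data (two hypothetical patients over four days), not a slice of the inflammation file.

```python
import numpy

# Hypothetical readings: 2 patients (rows) x 4 days (columns).
toy = numpy.array([[0.0, 3.0, 1.0, 1.0],
                   [2.0, 2.0, 6.0, 0.0]])

# Day-to-day change for each patient: one fewer column than the input.
changes = numpy.diff(toy, axis=1)
print(changes.shape)
print(changes)

# Largest increase per patient, then largest change in either direction.
largest_rise = numpy.max(changes, axis=1)
largest_swing = numpy.max(numpy.absolute(changes), axis=1)
print(largest_rise)
print(largest_swing)
```

For the second patient the largest increase is 4 but the largest swing is 6, because the biggest change is a drop from 6 to 0.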
-{% include links.md %}
-title: Visualizing Tabular Data
-teaching: 30
-exercises: 20
-- "How can I visualize tabular data in Python?"
-- "How can I group several plots together?"
-- "Plot simple graphs from data."
-- "Group several graphs in a single figure."
-- "Use the `pyplot` module from the `matplotlib` library for creating simple visualizations."
-## Visualizing data
-The mathematician Richard Hamming once said, "The purpose of computing is insight, not numbers," and
-the best way to develop insight is often to visualize data.  Visualization deserves an entire
-lecture of its own, but we can explore a few features of Python's `matplotlib` library here.  While
-there is no official plotting library, `matplotlib` is the _de facto_ standard.  First, we will
-import the `pyplot` module from `matplotlib` and use two of its functions to create and display a
-heat map of our data:
-import matplotlib.pyplot
-image = matplotlib.pyplot.imshow(data)
-{: .language-python}
-![Heatmap of the Data](../fig/inflammation-01-imshow.svg)
-Blue pixels in this heat map represent low values, while yellow pixels represent high values.  As we
-can see, inflammation rises and falls over a 40-day period.  Let's take a look at the average inflammation over time:
-ave_inflammation = numpy.mean(data, axis=0)
-ave_plot = matplotlib.pyplot.plot(ave_inflammation)
-{: .language-python}
-![Average Inflammation Over Time](../fig/inflammation-01-average.svg)
-Here, we have put the average inflammation per day across all patients in the variable `ave_inflammation`, then
-asked `matplotlib.pyplot` to create and display a line graph of those values.  The result is a
-roughly linear rise and fall, which is suspicious: we might instead expect a sharper rise and slower
-fall.  Let's have a look at two other statistics:
-max_plot = matplotlib.pyplot.plot(numpy.max(data, axis=0))
-{: .language-python}
-![Maximum Value Along The First Axis](../fig/inflammation-01-maximum.svg)
-min_plot = matplotlib.pyplot.plot(numpy.min(data, axis=0))
-{: .language-python}
-![Minimum Value Along The First Axis](../fig/inflammation-01-minimum.svg)
-The maximum value rises and falls smoothly, while the minimum seems to be a step function.  Neither
-trend seems particularly likely, so either there's a mistake in our calculations or something is
-wrong with our data.  This insight would have been difficult to reach by examining the numbers
-themselves without visualization tools.
-### Grouping plots
-You can group similar plots in a single figure using subplots.
-The script below uses a number of new commands. The function `matplotlib.pyplot.figure()`
-creates a space into which we will place all of our plots. The parameter `figsize`
-tells Python how big to make this space. Each subplot is placed into the figure using
-its `add_subplot` [method]({{ page.root }}/reference/#method). The `add_subplot` method takes 3
-parameters. The first denotes how many total rows of subplots there are, the second parameter
-refers to the total number of subplot columns, and the final parameter denotes which subplot
-your variable is referencing (left-to-right, top-to-bottom). Each subplot is stored in a
-different variable (`axes1`, `axes2`, `axes3`). Once a subplot is created, its axes can
-be labeled using the `set_xlabel()` method (or `set_ylabel()`).
-Here are our three plots side by side:
-import numpy
-import matplotlib.pyplot
-data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
-fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
-axes1 = fig.add_subplot(1, 3, 1)
-axes2 = fig.add_subplot(1, 3, 2)
-axes3 = fig.add_subplot(1, 3, 3)
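-axes1.set_ylabel('average')
-axes1.plot(numpy.mean(data, axis=0))
-axes2.set_ylabel('max')
-axes2.plot(numpy.max(data, axis=0))
-axes3.set_ylabel('min')
-axes3.plot(numpy.min(data, axis=0))
-fig.tight_layout()
-matplotlib.pyplot.show()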
-{: .language-python}
-![The Previous Plots as Subplots](../fig/inflammation-01-group-plot.svg)
-The [call]({{ page.root }}/reference/#function-call) to `loadtxt` reads our data,
-and the rest of the program tells the plotting library
-how large we want the figure to be,
-that we're creating three subplots,
-what to draw for each one,
-and that we want a tight layout.
-(If we leave out that call to `fig.tight_layout()`,
-the graphs will actually be squeezed together more closely.)
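The numbering used by the third argument of `add_subplot` can be illustrated without drawing anything. The sketch below is plain Python and not part of Matplotlib; the helper name `subplot_position` is hypothetical and introduced only for this example. It converts a 1-based subplot number into a zero-based (row, column) pair, assuming the left-to-right, top-to-bottom order described earlier.

```python
def subplot_position(nrows, ncols, index):
    """Return the zero-based (row, column) of a 1-based subplot number."""
    if not 1 <= index <= nrows * ncols:
        raise ValueError('index must be between 1 and nrows * ncols')
    # Numbers run along each row first, then move down to the next row.
    return divmod(index - 1, ncols)

# The three side-by-side plots: one row, three columns.
for index in (1, 2, 3):
    print(index, subplot_position(1, 3, index))

# A hypothetical 2 x 3 grid: subplot 5 sits in the second row, middle column.
print(subplot_position(2, 3, 5))
```

Swapping the first two arguments, as in `add_subplot(3, 1, 2)`, moves the same subplot number from a position along a row to a position down a column.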
-> ## Plot Scaling
-> Why do all of our plots stop just short of the upper end of our graph?
-> > ## Solution
-> > Because matplotlib normally sets x and y axes limits to the min and max of our data
-> > (depending on data range)
-> {: .solution}
-> If we want to change this, we can use the `set_ylim(min, max)` method of each 'axes',
-> for example:
-> ~~~
-> axes3.set_ylim(0,6)
-> ~~~
-> {: .language-python}
-> Update your plotting code to automatically set a more appropriate scale.
-> (Hint: you can make use of the `max` and `min` methods to help.)
-> > ## Solution
-> > ~~~
-> > # One method
-> > axes3.set_ylabel('min')
-> > axes3.plot(numpy.min(data, axis=0))
-> > axes3.set_ylim(0,6)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-> > ## Solution
-> > ~~~
-> > # A more automated approach
-> > min_data = numpy.min(data, axis=0)
-> > axes3.set_ylabel('min')
-> > axes3.plot(min_data)
-> > axes3.set_ylim(numpy.min(min_data), numpy.max(min_data) * 1.1)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Drawing Straight Lines
-> In the center and right subplots above, we expect all lines to look like step functions because
-> non-integer values are not realistic for the minimum and maximum values. However, you can see
-> that the lines are not always vertical or horizontal, and in particular the step function
-> in the subplot on the right looks slanted. Why is this?
-> > ## Solution
-> > Because matplotlib interpolates (draws a straight line) between the points.
-> > One way to avoid this is to use the Matplotlib `drawstyle` option:
-> >
-> > ~~~
-> > import numpy
-> > import matplotlib.pyplot
-> >
-> > data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
-> >
-> > fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
-> >
-> > axes1 = fig.add_subplot(1, 3, 1)
-> > axes2 = fig.add_subplot(1, 3, 2)
-> > axes3 = fig.add_subplot(1, 3, 3)
-> >
-> > axes1.set_ylabel('average')
-> > axes1.plot(numpy.mean(data, axis=0), drawstyle='steps-mid')
-> >
-> > axes2.set_ylabel('max')
-> > axes2.plot(numpy.max(data, axis=0), drawstyle='steps-mid')
-> >
-> > axes3.set_ylabel('min')
-> > axes3.plot(numpy.min(data, axis=0), drawstyle='steps-mid')
-> >
-> > fig.tight_layout()
-> >
-> > matplotlib.pyplot.show()
-> > ~~~
-> > {: .language-python}
-> ![Plot with step lines](../fig/inflammation-01-line-styles.svg)
-> {: .solution}
-{: .challenge}
-> ## Make Your Own Plot
-> Create a plot showing the standard deviation (`numpy.std`)
-> of the inflammation data for each day across all patients.
-> > ## Solution
-> > ~~~
-> > std_plot = matplotlib.pyplot.plot(numpy.std(data, axis=0))
-> > matplotlib.pyplot.show()
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Moving Plots Around
-> Modify the program to display the three plots on top of one another
-> instead of side by side.
-> > ## Solution
-> > ~~~
-> > import numpy
-> > import matplotlib.pyplot
-> >
-> > data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
-> >
-> > # change figsize (swap width and height)
-> > fig = matplotlib.pyplot.figure(figsize=(3.0, 10.0))
-> >
-> > # change add_subplot (swap first two parameters)
-> > axes1 = fig.add_subplot(3, 1, 1)
-> > axes2 = fig.add_subplot(3, 1, 2)
-> > axes3 = fig.add_subplot(3, 1, 3)
-> >
-> > axes1.set_ylabel('average')
-> > axes1.plot(numpy.mean(data, axis=0))
-> >
-> > axes2.set_ylabel('max')
-> > axes2.plot(numpy.max(data, axis=0))
-> >
-> > axes3.set_ylabel('min')
-> > axes3.plot(numpy.min(data, axis=0))
-> >
-> > fig.tight_layout()
-> >
-> > matplotlib.pyplot.show()
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-{% include links.md %}
+title: Plotting Data
+teaching: 20
+exercises: 10
+- "How can I create publication-ready figures with Python?"
+- "plot data in a Matplotlib figure."
+- "create multi-panelled figures."
+- "export figures in a variety of image formats."
+- "use interactive features of Jupyter to make it easier to fine-tune a plot."
+- "Matplotlib is a powerful plotting library for Python."
+- "It can also be annoyingly fiddly. Jupyter can help with this."
+## plan
+- Renato currently scheduled to lead this session
+- Matplotlib
+  - The multiple matplotlib interfaces: `pyplot` vs `OO API` vs obsolete `pylab`. See [this](https://matplotlib.org/api/index.html#usage-patterns)
+  - Concepts:
+    - [Artists & containers](https://matplotlib.org/tutorials/intermediate/artists.html#artist-tutorial)
+        - Figure
+        - Axes
+        - Axis & Ticks
+        - ...
+    - Sub-plots
+    - Labels
+    - Annotations
+    - Saving formats
+    - Jupyter integration(?)
+    - 3D plotting (?)
+    - Interactive plots (?)
+- plotting from pandas
+- altair? & plotnine?
+{% include links.md %}
+title: Parsing Command Line Arguments
+teaching: 20
+exercises: 10
+- "How can I access arguments passed to my Python script at runtime?"
+- "How can I create a sophisticated command line interface for my script?"
+- "How can I provide the user with more information about how to run my code?"
+- "access command line arguments with `sys.argv`."
+- "parse and use arguments and options with `argparse`."
+- "create a comprehensive usage statement for their script."
+- "Positional command line arguments can be accessed from inside a script through the `sys.argv` object."
+- "The `argparse` module allows us to create extensive and powerful command line interfaces for our scripts."
+- "`argparse` also constructs a standardised usage statement according to the parser's configuration."
+## plan
+- Toby currently scheduled to lead this session
+- `sys.argv`
+- `argparse`
+  - positional arguments
+  - options
+  - capturing multiple items in a single argument
+  - usage statements
+  - help text
+- [`docopt`](http://docopt.org/) (?)
+- [`click`](https://click.palletsprojects.com/en/7.x/) (?)
+- [comparison of argparse, docopt and click](https://realpython.com/comparing-python-command-line-parsing-libraries-argparse-docopt-click/)
+{% include links.md %}
-title: Repeating Actions with Loops
-teaching: 30
-exercises: 0
-- "How can I do the same operations on many different values?"
-- "Explain what a `for` loop does."
-- "Correctly write `for` loops to repeat simple calculations."
-- "Trace changes to a loop variable as the loop runs."
-- "Trace changes to other variables as they are updated by a `for` loop."
-- "Use `for variable in sequence` to process the elements of a sequence one at a time."
-- "The body of a `for` loop must be indented."
-- "Use `len(thing)` to determine the length of something that contains other values."
-In the last episode, we wrote Python code that plots values of interest from our first
-inflammation dataset (`inflammation-01.csv`), which revealed some suspicious features in it.
-![Analysis of inflammation-01.csv](../fig/03-loop_2_0.png)
-We have a dozen data sets right now, though, and more on the way.
-We want to create plots for all of our data sets with a single statement.
-To do that, we'll have to teach the computer how to repeat things.
-An example task that we might want to repeat is printing each character in a
-word on a line of its own.
-word = 'lead'
-{: .language-python}
-In Python, a string is basically an ordered collection of characters, and every
-character has a unique number associated with it -- its index. This means that
-we can access characters in a string using their indices.
-For example, we can get the first character of the word `'lead'`, by using
-`word[0]`. One way to print each character is to use four `print` statements:
-print(word[0])
-print(word[1])
-print(word[2])
-print(word[3])
-{: .language-python}
-l
-e
-a
-d
-{: .output}
-This is a bad approach for three reasons:
-1.  **Not scalable**. Imagine you need to print characters of a string that is hundreds
-    of letters long.  It might be easier to type them in manually.
-2.  **Difficult to maintain**. If we want to decorate each printed character with an
-    asterisk or any other character, we would have to change four lines of code. While
-    this might not be a problem for short strings, it would definitely be a problem for
-    longer ones.
-3.  **Fragile**. If we use it with a word that has more characters than what we initially
-    envisioned, it will only display part of the word's characters. A shorter string, on
-    the other hand, will cause an error because it will be trying to display part of the
-    string that doesn't exist.
-word = 'tin'
-{: .language-python}
-{: .output}
-IndexError                                Traceback (most recent call last)
-<ipython-input-3-7974b6cdaf14> in <module>()
-      3 print(word[1])
-      4 print(word[2])
-----> 5 print(word[3])
-IndexError: string index out of range
-{: .error}
-Here's a better approach:
-word = 'lead'
-for char in word:
-    print(char)
-{: .language-python}
-{: .output}
-This is shorter --- certainly shorter than something that prints every character in a
-hundred-letter string --- and more robust as well:
-word = 'oxygen'
-for char in word:
-    print(char)
-{: .language-python}
-{: .output}
-The improved version uses a [for loop]({{ page.root }}/reference/#for-loop)
-to repeat an operation --- in this case, printing --- once for each thing in a sequence.
-The general form of a loop is:
-for variable in collection:
-    # do things using variable, such as print
-{: .language-python}
-Using the oxygen example above, the loop might look like this:
-where each character (`char`) in the variable `word` is looped through and printed one character
-after another. The numbers in the diagram denote which loop cycle the character was printed in (1
-being the first loop, and 6 being the final loop).
-We can call the [loop variable]({{ page.root }}/reference/#loop-variable) anything we like, but
-there must be a colon at the end of the line starting the loop, and we must indent anything we
-want to run inside the loop. Unlike many other languages, there is no command to signify the end
-of the loop body (e.g. `end for`); what is indented after the `for` statement belongs to the loop.
-> ## What's in a name?
-> In the example above, the loop variable was given the name `char` as a mnemonic;
-> it is short for 'character'.  We can choose any name we want for variables.
-> We can even call our loop variable `banana`, as long as we use this name consistently:
-> ~~~
-> word = 'oxygen'
-> for banana in word:
->     print(banana)
-> ~~~
-> {: .language-python}
-> ~~~
-> o
-> x
-> y
-> g
-> e
-> n
-> ~~~
-> {: .output}
-> It is a good idea to choose variable names that are meaningful, otherwise it would be more
-> difficult to understand what the loop is doing.
-{: .callout}
-Here's another loop that repeatedly updates a variable:
-length = 0
-for vowel in 'aeiou':
-    length = length + 1
-print('There are', length, 'vowels')
-{: .language-python}
-There are 5 vowels
-{: .output}
-It's worth tracing the execution of this little program step by step.
-Since there are five characters in `'aeiou'`,
-the statement on line 3 will be executed five times.
-The first time around,
-`length` is zero (the value assigned to it on line 1)
-and `vowel` is `'a'`.
-The statement adds 1 to the old value of `length`,
-producing 1,
-and updates `length` to refer to that new value.
-The next time around,
-`vowel` is `'e'` and `length` is 1,
-so `length` is updated to be 2.
-After three more updates,
-`length` is 5;
-since there is nothing left in `'aeiou'` for Python to process,
-the loop finishes
-and the `print` statement on line 4 tells us our final answer.
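The trace described above can be reproduced by letting the loop report its own progress. This is only a sketch: the extra `print` inside the loop and the `trace` list are additions made here for illustration, so the line numbers mentioned in the text refer to the original four-line program and not to this variant.

```python
length = 0
trace = []  # hypothetical helper list that records each step of the loop

for vowel in 'aeiou':
    length = length + 1
    trace.append((vowel, length))
    print('vowel is', vowel, 'and length is now', length)

print('There are', length, 'vowels')
```

Each line printed inside the loop corresponds to one pass through the loop body, which makes it easy to confirm that the body runs five times.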
-Note that a loop variable is a variable that's being used to record progress in a loop.
-It still exists after the loop is over,
-and we can re-use variables previously defined as loop variables as well:
-letter = 'z'
-for letter in 'abc':
-    print(letter)
-print('after the loop, letter is', letter)
-{: .language-python}
-after the loop, letter is c
-{: .output}
-Note also that finding the length of a string is such a common operation
-that Python actually has a built-in function to do it called `len`:
-{: .language-python}
-{: .output}
-`len` is much faster than any function we could write ourselves,
-and much easier to read than a two-line loop;
-it will also give us the length of many other things that we haven't met yet,
-so we should always use it when we can.
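As a quick check that the two approaches agree, the sketch below counts the characters of a string with a loop and then with `len`. The variable names `vowels` and `count` are illustrative choices.

```python
vowels = 'aeiou'

# Counting by hand, as in the earlier loop.
count = 0
for vowel in vowels:
    count = count + 1

print(count)
print(len(vowels))

# len also handles the empty string, where the loop body would never run.
print(len(''))
```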
-> ## From 1 to N
-> Python has a built-in function called `range` that generates a sequence of numbers. `range` can
-> accept 1, 2, or 3 parameters.
-> * If one parameter is given, `range` generates a sequence of that length,
->   starting at zero and incrementing by 1.
->   For example, `range(3)` produces the numbers `0, 1, 2`.
-> * If two parameters are given, `range` starts at
->   the first and ends just before the second, incrementing by one.
->   For example, `range(2, 5)` produces `2, 3, 4`.
-> * If `range` is given 3 parameters,
->   it starts at the first one, ends just before the second one, and increments by the third one.
->   For example, `range(3, 10, 2)` produces `3, 5, 7, 9`.
-> Using `range`,
-> write a loop that prints the first 3 natural numbers:
-> ~~~
-> 1
-> 2
-> 3
-> ~~~
-> {: .language-python}
-> > ## Solution
-> > ~~~
-> > for number in range(1, 4):
-> >     print(number)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
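The three forms of `range` listed in this challenge can be compared side by side. In this sketch, wrapping each call in `list(...)` is only a convenience for displaying all of the generated numbers at once; the names `one_arg`, `two_args`, and `three_args` are illustrative.

```python
one_arg = list(range(3))            # start at 0, stop before 3
two_args = list(range(2, 5))        # start at 2, stop before 5
three_args = list(range(3, 10, 2))  # start at 3, stop before 10, step by 2

print(one_arg)     # [0, 1, 2]
print(two_args)    # [2, 3, 4]
print(three_args)  # [3, 5, 7, 9]
```

In every form the stop value itself is excluded, which is why the solution uses `range(1, 4)` to reach 3.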
-> ## Understanding the loops
-> Given the following loop:
-> ~~~
-> word = 'oxygen'
-> for char in word:
->     print(char)
-> ~~~
-> {: .language-python}
-> How many times is the body of the loop executed?
-> * 3 times
-> * 4 times
-> * 5 times
-> * 6 times
-> > ## Solution
-> >
-> > The body of the loop is executed 6 times.
-> >
-> {: .solution}
-{: .challenge}
-> ## Computing Powers With Loops
-> Exponentiation is built into Python:
-> ~~~
-> print(5 ** 3)
-> ~~~
-> {: .language-python}
-> ~~~
-> 125
-> ~~~
-> {: .output}
-> Write a loop that calculates the same result as `5 ** 3` using
-> multiplication (and without exponentiation).
-> > ## Solution
-> > ~~~
-> > result = 1
-> > for number in range(0, 3):
-> >     result = result * 5
-> > print(result)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Reverse a String
-> Knowing that two strings can be concatenated using the `+` operator,
-> write a loop that takes a string
-> and produces a new string with the characters in reverse order,
-> so `'Newton'` becomes `'notweN'`.
-> > ## Solution
-> > ~~~
-> > newstring = ''
-> > oldstring = 'Newton'
-> > for char in oldstring:
-> >     newstring = char + newstring
-> > print(newstring)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Computing the Value of a Polynomial
-> The built-in function `enumerate` takes a sequence (e.g. a list) and generates a
-> new sequence of the same length. Each element of the new sequence is a pair composed of the index
-> (0, 1, 2,...) and the value from the original sequence:
-> ~~~
-> for idx, val in enumerate(a_list):
->     # Do something using idx and val
-> ~~~
-> {: .language-python}
-> The code above loops through `a_list`, assigning the index to `idx` and the value to `val`.
-> Suppose you have encoded a polynomial as a list of coefficients in
-> the following way: the first element is the constant term, the
-> second element is the coefficient of the linear term, the third is the
-> coefficient of the quadratic term, etc.
-> ~~~
-> x = 5
-> coefs = [2, 4, 3]
-> y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2
-> print(y)
-> ~~~
-> {: .language-python}
-> ~~~
-> 97
-> ~~~
-> {: .output}
-> Write a loop using `enumerate(coefs)` which computes the value `y` of any
-> polynomial, given `x` and `coefs`.
-> > ## Solution
-> > ~~~
-> > y = 0
-> > for idx, coef in enumerate(coefs):
-> >     y = y + coef * x**idx
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
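To see what `enumerate` hands to the loop in this challenge, it can help to display the index and value pairs before using them. The sketch below reuses `x` and `coefs` from the challenge; the name `pairs` is an illustrative assumption, and `list(...)` is used only to make the pairs visible.

```python
x = 5
coefs = [2, 4, 3]  # represents 2 + 4*x + 3*x**2

# Each pair is (position in the list, coefficient at that position).
pairs = list(enumerate(coefs))
print(pairs)

y = 0
for idx, coef in pairs:
    y = y + coef * x**idx  # the index doubles as the power of x
print(y)
```

The result matches the value computed term by term in the challenge text.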
-{% include links.md %}
-title: Storing Multiple Values in Lists
-teaching: 30
-exercises: 15
-- "How can I store many values together?"
-- "Explain what a list is."
-- "Create and index lists of simple values."
-- "Change the values of individual elements"
-- "Append values to an existing list"
-- "Reorder and slice list elements"
-- "Create and manipulate nested lists"
-- "`[value1, value2, value3, ...]` creates a list."
-- "Lists can contain any Python object, including lists (i.e., list of lists)."
-- "Lists are indexed and sliced with square brackets (e.g., list[0] and
-list[2:9]), in the same way as strings and arrays."
-- "Lists are mutable (i.e., their values can be changed in place)."
-- "Strings are immutable (i.e., the characters in them cannot be changed)."
-Similar to a string that can contain many characters, a list is a container that can store many values.
-Unlike NumPy arrays,
-lists are built into the language (so we don't have to load a library
-to use them).
-We create a list by putting values inside square brackets and separating the values with commas:
-odds = [1, 3, 5, 7]
-print('odds are:', odds)
-{: .language-python}
-odds are: [1, 3, 5, 7]
-{: .output}
-We can access elements of a list using indices -- numbered positions of elements in the list.
-These positions are numbered starting at 0, so the first element has an index of 0.
-print('first element:', odds[0])
-print('last element:', odds[3])
-print('"-1" element:', odds[-1])
-{: .language-python}
-first element: 1
-last element: 7
-"-1" element: 7
-{: .output}
-Yes, we can use negative numbers as indices in Python. When we do so, the index `-1` gives us the
-last element in the list, `-2` the second to last, and so on.
-Because of this, `odds[3]` and `odds[-1]` point to the same element here.
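One way to read a negative index is as a position counted back from the length of the list. The sketch below checks that reading on the `odds` list; the variable `last_position` is an illustrative name.

```python
odds = [1, 3, 5, 7]

last_position = len(odds) - 1  # 3 for a four-element list
print(odds[-1], odds[last_position])

# The same relationship holds for the other negative indices.
print(odds[-2], odds[len(odds) - 2])
print(odds[-4], odds[0])
```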
-If we loop over a list, the loop variable is assigned to its elements one at a time:
-for number in odds:
-    print(number)
-{: .language-python}
-{: .output}
-There is one important difference between lists and strings:
-we can change the values in a list,
-but we cannot change individual characters in a string.
-For example:
-names = ['Curie', 'Darwing', 'Turing']  # typo in Darwin's name
-print('names is originally:', names)
-names[1] = 'Darwin'  # correct the name
-print('final value of names:', names)
-{: .language-python}
-names is originally: ['Curie', 'Darwing', 'Turing']
-final value of names: ['Curie', 'Darwin', 'Turing']
-{: .output}
-works, but:
-name = 'Darwin'
-name[0] = 'd'
-{: .language-python}
-TypeError                                 Traceback (most recent call last)
-<ipython-input-8-220df48aeb2e> in <module>()
-      1 name = 'Darwin'
-----> 2 name[0] = 'd'
-TypeError: 'str' object does not support item assignment
-{: .error}
-does not.
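Although a character in a string cannot be replaced in place, we can build a new string from pieces of the old one and assign it to the same variable. This is a minimal sketch of that idea; the variable `original` is introduced here only to show that the first string is left untouched.

```python
original = 'Darwin'
name = original

# Concatenate a new first letter with a slice holding the remaining characters.
name = 'd' + name[1:]

print(name)
print(original)
```

The variable `name` now refers to a different string, while `'Darwin'` itself was never modified.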
-> ## Ch-Ch-Ch-Ch-Changes
-> Data which can be modified in place is called [mutable]({{ page.root }}/reference/#mutable),
-> while data which cannot be modified is called [immutable]({{ page.root }}/reference/#immutable).
-> Strings and numbers are immutable. This does not mean that variables with string or number values
-> are constants, but when we want to change the value of a string or number variable, we can only
-> replace the old value with a completely new value.
-> Lists and arrays, on the other hand, are mutable: we can modify them after they have been
-> created. We can change individual elements, append new elements, or reorder the whole list. For
-> some operations, like sorting, we can choose whether to use a function that modifies the data
-> in-place or a function that returns a modified copy and leaves the original unchanged.
-> Be careful when modifying data in-place. If two variables refer to the same list, and you modify
-> the list value, it will change for both variables!
-> ~~~
-> salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
-> my_salsa = salsa        # <-- my_salsa and salsa point to the *same* list data in memory
-> salsa[0] = 'hot peppers'
-> print('Ingredients in my salsa:', my_salsa)
-> ~~~
-> {: .language-python}
-> ~~~
-> Ingredients in my salsa: ['hot peppers', 'onions', 'cilantro', 'tomatoes']
-> ~~~
-> {: .output}
-> If you want variables with mutable values to be independent, you
-> must make a copy of the value when you assign it.
-> ~~~
-> salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
-> my_salsa = list(salsa)        # <-- makes a *copy* of the list
-> salsa[0] = 'hot peppers'
-> print('Ingredients in my salsa:', my_salsa)
-> ~~~
-> {: .language-python}
-> ~~~
-> Ingredients in my salsa: ['peppers', 'onions', 'cilantro', 'tomatoes']
-> ~~~
-> {: .output}
-> Because of pitfalls like this, code which modifies data in place can be more difficult to
-> understand. However, it is often far more efficient to modify a large data structure in place
-> than to create a modified copy for every small change. You should consider both of these aspects
-> when writing your code.
-{: .callout}
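Sorting is a convenient example of the choice mentioned above between modifying data in place and working on a copy. In this sketch, `scores` is made-up data; the built-in function `sorted` returns a new list, while the list method `sort` reorders the existing one.

```python
scores = [3, 1, 2]  # hypothetical values

# sorted() leaves the original list alone and returns a sorted copy.
ordered_copy = sorted(scores)
print(scores, ordered_copy)

# list.sort() reorders the list in place and returns None.
result = scores.sort()
print(scores, result)
```

Any other variable that refers to the same list as `scores` would also see the new order after the call to `sort`.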
-> ## Nested Lists
-> Since a list can contain any Python object, it can even contain other lists.
-> For example, we could represent the products in the shelves of a small grocery shop:
-> ~~~
-> x = [['pepper', 'zucchini', 'onion'],
->      ['cabbage', 'lettuce', 'garlic'],
->      ['apple', 'pear', 'banana']]
-> ~~~
-> {: .language-python}
-> Here is a visual example of how indexing a list of lists `x` works:
-> [![x is represented as a pepper shaker containing several packets of pepper. [x[0]] is represented
-> as a pepper shaker containing a single packet of pepper. x[0] is represented as a single packet of
-> pepper. x[0][0] is represented as a single grain of pepper. Adapted
-> from @hadleywickham.](../fig/indexing_lists_python.png)][hadleywickham-tweet]
-> Using the previously declared list `x`, these would be the results of the
-> index operations shown in the image:
-> ~~~
-> print([x[0]])
-> ~~~
-> {: .language-python}
-> ~~~
-> [['pepper', 'zucchini', 'onion']]
-> ~~~
-> {: .output}
-> ~~~
-> print(x[0])
-> ~~~
-> {: .language-python}
-> ~~~
-> ['pepper', 'zucchini', 'onion']
-> ~~~
-> {: .output}
-> ~~~
-> print(x[0][0])
-> ~~~
-> {: .language-python}
-> ~~~
-> pepper
-> ~~~
-> {: .output}
-> Thanks to [Hadley Wickham][hadleywickham-tweet]
-> for the image above.
-{: .callout}
-> ## Heterogeneous Lists
-> Lists in Python can contain elements of different types. Example:
-> ~~~
-> sample_ages = [10, 12.5, 'Unknown']
-> ~~~
-> {: .language-python}
-{: .callout}
-There are many ways to change the contents of lists besides assigning new values to
-individual elements:
-odds.append(11)
-print('odds after adding a value:', odds)
-{: .language-python}
-odds after adding a value: [1, 3, 5, 7, 11]
-{: .output}
-removed_element = odds.pop(0)
-print('odds after removing the first element:', odds)
-print('removed_element:', removed_element)
-{: .language-python}
-odds after removing the first element: [3, 5, 7, 11]
-removed_element: 1
-{: .output}
-odds.reverse()
-print('odds after reversing:', odds)
-{: .language-python}
-odds after reversing: [11, 7, 5, 3]
-{: .output}
-When modifying lists in place, it is useful to remember that Python treats lists in a slightly
-counter-intuitive way.
-As we saw earlier with the `salsa` list, if we make a list, (attempt to) copy it by assigning it to a new name, and then modify it, both names see the change. The same applies when we modify the list using the methods above:
-odds = [1, 3, 5, 7]
-primes = odds
-primes.append(2)
-print('primes:', primes)
-print('odds:', odds)
-{: .language-python}
-primes: [1, 3, 5, 7, 2]
-odds: [1, 3, 5, 7, 2]
-{: .output}
-This is because Python stores a list in memory, and then can use multiple names to refer to the
-same list. If all we want to do is copy a (simple) list, we can again use the `list` function, so we do
-not modify a list we did not mean to:
-odds = [1, 3, 5, 7]
-primes = list(odds)
-primes.append(2)
-print('primes:', primes)
-print('odds:', odds)
-{: .language-python}
-primes: [1, 3, 5, 7, 2]
-odds: [1, 3, 5, 7]
-{: .output}
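The word "simple" matters here. `list` copies only the outer list, so a nested list still shares its inner lists with the original. A minimal sketch, using `copy.deepcopy` from the standard library and made-up shelf data:

```python
import copy

shelves = [['pepper', 'onion'], ['apple', 'pear']]

shallow = list(shelves)            # new outer list, same inner lists
deep = copy.deepcopy(shelves)      # new outer list and new inner lists

shelves[0][0] = 'hot pepper'       # modify an inner list in place

print('original:', shelves)
print('shallow copy:', shallow)    # the change shows up here as well
print('deep copy:', deep)          # the deep copy is unaffected
```

For flat lists of numbers or strings, `list(...)` is enough; a deep copy is only worth its extra cost when the inner values are themselves mutable.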
-> ## Turn a String Into a List
-> Use a for-loop to convert the string "hello" into a list of letters:
-> ~~~
-> ["h", "e", "l", "l", "o"]
-> ~~~
-> {: .language-python}
-> Hint: You can create an empty list like this:
-> ~~~
-> my_list = []
-> ~~~
-> {: .language-python}
-> > ## Solution
-> > ~~~
-> > my_list = []
-> > for char in "hello":
-> >     my_list.append(char)
-> > print(my_list)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-Subsets of lists and strings can be accessed by specifying ranges of values in brackets,
-similar to how we accessed ranges of positions in a NumPy array.
-This is commonly referred to as "slicing" the list/string.
-binomial_name = "Drosophila melanogaster"
-group = binomial_name[0:10]
-print("group:", group)
-species = binomial_name[11:23]
-print("species:", species)
-chromosomes = ["X", "Y", "2", "3", "4"]
-autosomes = chromosomes[2:5]
-print("autosomes:", autosomes)
-last = chromosomes[-1]
-print("last:", last)
-{: .language-python}
-group: Drosophila
-species: melanogaster
-autosomes: ['2', '3', '4']
-last: 4
-{: .output}
-> ## Slicing From the End
-> Use slicing to access only the last four characters of a string or entries of a list.
-> ~~~
-> string_for_slicing = "Observation date: 02-Feb-2013"
-> list_for_slicing = [["fluorine", "F"],
->                     ["chlorine", "Cl"],
->                     ["bromine", "Br"],
->                     ["iodine", "I"],
->                     ["astatine", "At"]]
-> ~~~
-> {: .language-python}
-> ~~~
-> "2013"
-> [["chlorine", "Cl"], ["bromine", "Br"], ["iodine", "I"], ["astatine", "At"]]
-> ~~~
-> {: .output}
-> Would your solution work regardless of whether you knew beforehand
-> the length of the string or list
-> (e.g. if you wanted to apply the solution to a set of lists of different lengths)?
-> If not, try to change your approach to make it more robust.
-> Hint: Remember that indices can be negative as well as positive.
-> > ## Solution
-> > Use negative indices to count elements from the end of a container (such as a list or string):
-> >
-> > ~~~
-> > string_for_slicing[-4:]
-> > list_for_slicing[-4:]
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Non-Continuous Slices
-> So far we've seen how to use slicing to take single blocks
-> of successive entries from a sequence.
-> But what if we want to take a subset of entries
-> that aren't next to each other in the sequence?
-> You can achieve this by providing a third argument
-> to the range within the brackets, called the _step size_.
-> The example below shows how you can take every third entry in a list:
-> ~~~
-> primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
-> subset = primes[0:12:3]
-> print("subset", subset)
-> ~~~
-> {: .language-python}
-> ~~~
-> subset [2, 7, 17, 29]
-> ~~~
-> {: .output}
-> Notice that the slice taken begins with the first entry in the range,
-> followed by entries taken at equally-spaced intervals (the steps) thereafter.
-> If you wanted to begin the subset with the third entry,
-> you would need to specify that as the starting point of the sliced range:
-> ~~~
-> primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
-> subset = primes[2:12:3]
-> print("subset", subset)
-> ~~~
-> {: .language-python}
-> ~~~
-> subset [5, 13, 23, 37]
-> ~~~
-> {: .output}
-> Use the step size argument to create a new string
-> that contains only every other character in the string
-> "In an octopus's garden in the shade"
-> ~~~
-> beatles = "In an octopus's garden in the shade"
-> ~~~
-> {: .language-python}
-> ~~~
-> I notpssgre ntesae
-> ~~~
-> {: .output}
-> > ## Solution
-> > To obtain every other character you need to provide a slice with the step
-> > size of 2:
-> >
-> > ~~~
-> > beatles[0:35:2]
-> > ~~~
-> > {: .language-python}
-> >
-> > You can also leave out the beginning and end of the slice to take the whole string,
-> > providing only the step argument to take every second
-> > element:
-> >
-> > ~~~
-> > beatles[::2]
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-If you want to take a slice from the beginning of a sequence, you can omit the first index in the
-range:
-date = "Monday 4 January 2016"
-day = date[0:6]
-print("Using 0 to begin range:", day)
-day = date[:6]
-print("Omitting beginning index:", day)
-{: .language-python}
-Using 0 to begin range: Monday
-Omitting beginning index: Monday
-{: .output}
-And similarly, you can omit the ending index in the range to take a slice to the very end of the
-list:
-months = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]
-sond = months[8:12]
-print("With known last position:", sond)
-sond = months[8:len(months)]
-print("Using len() to get last entry:", sond)
-sond = months[8:]
-print("Omitting ending index:", sond)
-{: .language-python}
-With known last position: ['sep', 'oct', 'nov', 'dec']
-Using len() to get last entry: ['sep', 'oct', 'nov', 'dec']
-Omitting ending index: ['sep', 'oct', 'nov', 'dec']
-{: .output}
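Both shortcuts can be combined with negative indices and a step. The sketch below uses a shortened `months` list; the other variable names are only for illustration.

```python
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun']

first_two = months[:2]       # from the start up to (not including) index 2
last_two = months[-2:]       # from the second-to-last entry to the end
whole_copy = months[:]       # omitting both indices copies the whole list
backwards = months[::-1]     # a step of -1 walks the list in reverse

print('first two:', first_two)
print('last two:', last_two)
print('copy is a different list:', whole_copy is not months)
print('backwards:', backwards)
```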
-> ## Overloading
-> `+` usually means addition, but when used on strings or lists, it means "concatenate".
-> Given that, what do you think the multiplication operator `*` does on lists?
-> In particular, what will be the output of the following code?
-> ~~~
-> counts = [2, 4, 6, 8, 10]
-> repeats = counts * 2
-> print(repeats)
-> ~~~
-> {: .language-python}
-> 1.  `[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]`
-> 2.  `[4, 8, 12, 16, 20]`
-> 3.  `[[2, 4, 6, 8, 10],[2, 4, 6, 8, 10]]`
-> 4.  `[2, 4, 6, 8, 10, 4, 8, 12, 16, 20]`
-> The technical term for this is *operator overloading*:
-> a single operator, like `+` or `*`,
-> can do different things depending on what it's applied to.
-> > ## Solution
-> >
-> > The multiplication operator `*` used on a list replicates elements of the list and concatenates
-> > them together:
-> >
-> > ~~~
-> > [2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
-> > ~~~
-> > {: .output}
-> >
-> > It's equivalent to:
-> >
-> > ~~~
-> > counts + counts
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-[hadleywickham-tweet]: https://twitter.com/hadleywickham/status/643381054758363136
-{% include links.md %}
+title: Coding Style
+teaching: 20
+exercises: 10
+- "How should I organise my code?"
+- "What are some practical steps I can take to improve the quality and readability of my scripts?"
+- "What tools exist to help me follow good coding style?"
+- "Write and adjust code to follow standards of style and organisation."
+- "Use a linter to check and modify code to follow PEP8."
+- "Provide sufficient documentation for functions and scripts."
+- "It is easier to read and maintain scripts and Jupyter notebooks that are well organised."
+- "The most commonly-used style guide for Python is detailed in PEP8."
+- "Tools such as the linter `flake8` and the formatter `black` can help us follow style standards."
+- "The rules and standards should be followed within reason, but exceptions can be made according to your best judgement."
+## plan
+- Toby currently scheduled to lead this session
+- can base a lot of this on https://merely-useful.github.io/py-rse/py-rse-style.html
+- something on project structure and file organization (?)
+  - especially relevant if planning to make a python package
+  - code organisation & jargon (packages, modules, files, classes, functions)
+    - a word about avoiding circular imports (?)
+- PEP8
+  - `pycodestyle`/`pylint` - only warn, don't modify code - [see also this comparison](https://books.agiliq.com/projects/essential-python-tools/en/latest/linters.html)
+  - `black` - modifies code - note still [**beta**](https://github.com/psf/black#note-this-is-a-beta-product)
+- documentation
+  - docstrings
+  - `sphinx`?
+- include tips for good Jupyter hygiene
+  - name the notebook before you do anything else!
+  - be careful with cell order
+  - clear output before saving
+{% include links.md %}
+title: Coding Style
+teaching: 20
+exercises: 10
+- "How can I practice the new skills I've learned?"
+- "Apply Python skills to solve more extensive challenges."
+- "There are many coding challenges to be found online, which can be used to exercise your Python skills."
+## plan
+- advent of code
+- rosalind
+- must recommend/suggest challenges that use what we've covered in the previous material
+{% include links.md %}
-title: Analyzing Data from Multiple Files
-teaching: 20
-exercises: 0
-- "How can I do the same operations on many different files?"
-- "Use a library function to get a list of filenames that match a wildcard pattern."
-- "Write a `for` loop to process multiple files."
-- "Use `glob.glob(pattern)` to create a list of files whose names match a pattern."
-- "Use `*` in a pattern to match zero or more characters, and `?` to match any single character."
-We now have almost everything we need to process all our data files.
-The only thing that's missing is a library with a rather unpleasant name:
-import glob
-{: .language-python}
-The `glob` library contains a function, also called `glob`,
-that finds files and directories whose names match a pattern.
-We provide those patterns as strings:
-the character `*` matches zero or more characters,
-while `?` matches any one character.
-We can use this to get the names of all the CSV files in the current directory:
-print(glob.glob('inflammation*.csv'))
-{: .language-python}
-['inflammation-05.csv', 'inflammation-11.csv', 'inflammation-12.csv', 'inflammation-08.csv',
-'inflammation-03.csv', 'inflammation-06.csv', 'inflammation-09.csv', 'inflammation-07.csv',
-'inflammation-10.csv', 'inflammation-02.csv', 'inflammation-04.csv', 'inflammation-01.csv']
-{: .output}
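To see how `*` and `?` differ without needing any files on disk, we can try the patterns on a plain list of names. This sketch uses `fnmatch.filter` from the standard library, which applies the same shell-style wildcard rules that `glob` relies on; the file names are invented for the example.

```python
import fnmatch

names = ['inflammation-01.csv', 'inflammation-02.csv',
         'inflammation-10.csv', 'small-01.csv', 'notes.txt']

# '*' matches zero or more characters
all_csv = fnmatch.filter(names, '*.csv')

# '?' matches exactly one character, so only -01 and -02 fit this pattern
single_digit = fnmatch.filter(names, 'inflammation-0?.csv')

print('all CSV files:', all_csv)
print('files 01 to 09:', single_digit)
```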
-As these examples show,
-`glob.glob`'s result is a list of file and directory paths in arbitrary order.
-This means we can loop over it
-to do something with each filename in turn.
-In our case,
-the "something" we want to do is generate a set of plots for each file in our inflammation dataset.
-If we want to start by analyzing just the first three files in alphabetical order, we can use the
-`sorted` built-in function to generate a new sorted list from the `glob.glob` output:
-import glob
-import numpy
-import matplotlib.pyplot
-filenames = sorted(glob.glob('inflammation*.csv'))
-filenames = filenames[0:3]
-for filename in filenames:
-    print(filename)
-    data = numpy.loadtxt(fname=filename, delimiter=',')
-    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
-    axes1 = fig.add_subplot(1, 3, 1)
-    axes2 = fig.add_subplot(1, 3, 2)
-    axes3 = fig.add_subplot(1, 3, 3)
-    axes1.set_ylabel('average')
-    axes1.plot(numpy.mean(data, axis=0))
-    axes2.set_ylabel('max')
-    axes2.plot(numpy.max(data, axis=0))
-    axes3.set_ylabel('min')
-    axes3.plot(numpy.min(data, axis=0))
-    fig.tight_layout()
-    matplotlib.pyplot.show()
-{: .language-python}
-{: .output}
-![Analysis of inflammation-01.csv](../fig/03-loop_49_1.png)
-{: .output}
-![Analysis of inflammation-02.csv](../fig/03-loop_49_3.png)
-{: .output}
-![Analysis of inflammation-03.csv](../fig/03-loop_49_5.png)
-Sure enough,
-the maxima of the first two data sets show exactly the same ramp we saw earlier,
-and their minima show the same staircase structure;
-a different situation has been revealed in the third dataset,
-where the maxima are a bit less regular, but the minima are consistently zero.
-> ## Plotting Differences
-> Plot the difference between the average inflammations reported in the first and second datasets
-> (stored in `inflammation-01.csv` and `inflammation-02.csv`, correspondingly),
-> i.e., the difference between the leftmost plots of the first two figures.
-> > ## Solution
-> > ~~~
-> > import glob
-> > import numpy
-> > import matplotlib.pyplot
-> >
-> > filenames = sorted(glob.glob('inflammation*.csv'))
-> >
-> > data0 = numpy.loadtxt(fname=filenames[0], delimiter=',')
-> > data1 = numpy.loadtxt(fname=filenames[1], delimiter=',')
-> >
-> > fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
-> >
-> > matplotlib.pyplot.ylabel('Difference in average')
-> > matplotlib.pyplot.plot(numpy.mean(data0, axis=0) - numpy.mean(data1, axis=0))
-> >
-> > fig.tight_layout()
-> > matplotlib.pyplot.show()
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Generate Composite Statistics
-> Use each of the files once to generate a dataset containing values averaged over all patients:
-> ~~~
-> filenames = glob.glob('inflammation*.csv')
-> composite_data = numpy.zeros((60,40))
-> for filename in filenames:
->     # sum each new file's data into composite_data as it's read
->     #
-> # and then divide the composite_data by number of samples
-> composite_data = composite_data / len(filenames)
-> ~~~
-> {: .language-python}
-> Then use pyplot to generate average, max, and min for all patients.
-> > ## Solution
-> > ~~~
-> > import glob
-> > import numpy
-> > import matplotlib.pyplot
-> >
-> > filenames = glob.glob('inflammation*.csv')
-> > composite_data = numpy.zeros((60,40))
-> >
-> > for filename in filenames:
-> >     data = numpy.loadtxt(fname=filename, delimiter=',')
-> >     composite_data = composite_data + data
-> >
-> > composite_data = composite_data / len(filenames)
-> >
-> > fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
-> >
-> > axes1 = fig.add_subplot(1, 3, 1)
-> > axes2 = fig.add_subplot(1, 3, 2)
-> > axes3 = fig.add_subplot(1, 3, 3)
-> >
-> > axes1.set_ylabel('average')
-> > axes1.plot(numpy.mean(composite_data, axis=0))
-> >
-> > axes2.set_ylabel('max')
-> > axes2.plot(numpy.max(composite_data, axis=0))
-> >
-> > axes3.set_ylabel('min')
-> > axes3.plot(numpy.min(composite_data, axis=0))
-> >
-> > fig.tight_layout()
-> >
-> > matplotlib.pyplot.show()
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-{% include links.md %}
-title: Making Choices
-teaching: 30
-exercises: 0
-- "How can my programs do different things based on data values?"
-- "Write conditional statements including `if`, `elif`, and `else` branches."
-- "Correctly evaluate expressions containing `and` and `or`."
-- "Use `if condition` to start a conditional statement, `elif condition` to
-   provide additional tests, and `else` to provide a default."
-- "The bodies of the branches of conditional statements must be indented."
-- "Use `==` to test for equality."
-- "`X and Y` is only true if both `X` and `Y` are true."
-- "`X or Y` is true if either `X` or `Y`, or both, are true."
-- "Zero, the empty string, and the empty list are considered false;
-   all other numbers, strings, and lists are considered true."
-- "`True` and `False` represent truth values."
-In our last lesson, we discovered something suspicious was going on
-in our inflammation data by drawing some plots.
-How can we use Python to automatically recognize the different features we saw,
-and take a different action for each? In this lesson, we'll learn how to write code that
-runs only when certain conditions are true.
-## Conditionals
-We can ask Python to take different actions, depending on a condition, with an `if` statement:
-num = 37
-if num > 100:
-    print('greater')
-else:
-    print('not greater')
-print('done')
-{: .language-python}
-not greater
-done
-{: .output}
-The second line of this code uses the keyword `if` to tell Python that we want to make a choice.
-If the test that follows the `if` statement is true,
-the body of the `if`
-(i.e., the set of lines indented underneath it) is executed, and "greater" is printed.
-If the test is false,
-the body of the `else` is executed instead, and "not greater" is printed.
-Only one or the other is ever executed before continuing on with program execution to print "done":
-![A flowchart diagram of the if-else construct that tests if variable num is greater than 100](../fig/python-flowchart-conditional.png)
-Conditional statements don't have to include an `else`.
-If there isn't one,
-Python simply does nothing if the test is false:
-num = 53
-print('before conditional...')
-if num > 100:
-    print(num, 'is greater than 100')
-print('...after conditional')
-{: .language-python}
-before conditional...
-...after conditional
-{: .output}
-We can also chain several tests together using `elif`,
-which is short for "else if".
-The following Python code uses `elif` to print the sign of a number.
-num = -3
-if num > 0:
-    print(num, 'is positive')
-elif num == 0:
-    print(num, 'is zero')
-else:
-    print(num, 'is negative')
-{: .language-python}
--3 is negative
-{: .output}
-Note that to test for equality we use a double equals sign `==`
-rather than a single equals sign `=` which is used to assign values.
-We can also combine tests using `and` and `or`.
-`and` is only true if both parts are true:
-if (1 > 0) and (-1 > 0):
-    print('both parts are true')
-else:
-    print('at least one part is false')
-{: .language-python}
-at least one part is false
-{: .output}
-while `or` is true if at least one part is true:
-if (1 < 0) or (-1 < 0):
-    print('at least one test is true')
-{: .language-python}
-at least one test is true
-{: .output}
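The two rules can be checked for every combination of truth values at once. This is only a small sketch; the name `rows` and the loop variables are illustrative.

```python
rows = []
for left in (True, False):
    for right in (True, False):
        # record both inputs, then the result of `and` and of `or`
        rows.append((left, right, left and right, left or right))

for left, right, both, either in rows:
    print(left, 'and', right, '->', both, '|', left, 'or', right, '->', either)
```

Only the first row gives `True` for `and`, and only the last row gives `False` for `or`.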
-> ## `True` and `False`
-> `True` and `False` are special words in Python called `booleans`,
-> which represent truth values. A statement such as `1 < 0` returns
-> the value `False`, while `-1 < 0` returns the value `True`.
-{: .callout}
-## Checking our Data
-Now that we've seen how conditionals work,
-we can use them to check for the suspicious features we saw in our inflammation data.
-We are about to use functions provided by the `numpy` module again.
-Therefore, if you're working in a new Python session, make sure to load the
-module with:
-import numpy
-{: .language-python}
-From the first couple of plots, we saw that maximum daily inflammation exhibits
-a strange behavior: it rises by one unit a day.
-Wouldn't it be a good idea to detect such behavior and report it as suspicious?
-Let's do that!
-However, instead of checking every single day of the study, let's merely check
-if maximum inflammation in the beginning (day 0) and in the middle (day 20) of
-the study are equal to the corresponding day numbers.
-max_inflammation_0 = numpy.max(data, axis=0)[0]
-max_inflammation_20 = numpy.max(data, axis=0)[20]
-if max_inflammation_0 == 0 and max_inflammation_20 == 20:
-    print('Suspicious looking maxima!')
-{: .language-python}
-We also saw a different problem in the third dataset;
-the minima per day were all zero (looks like a healthy person snuck into our study).
-We can also check for this with an `elif` condition:
-elif numpy.sum(numpy.min(data, axis=0)) == 0:
-    print('Minima add up to zero!')
-{: .language-python}
-And if neither of these conditions is true, we can use `else` to give the all-clear:
-else:
-    print('Seems OK!')
-{: .language-python}
-Let's test that out:
-data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
-max_inflammation_0 = numpy.max(data, axis=0)[0]
-max_inflammation_20 = numpy.max(data, axis=0)[20]
-if max_inflammation_0 == 0 and max_inflammation_20 == 20:
-    print('Suspicious looking maxima!')
-elif numpy.sum(numpy.min(data, axis=0)) == 0:
-    print('Minima add up to zero!')
-else:
-    print('Seems OK!')
-{: .language-python}
-Suspicious looking maxima!
-{: .output}
-data = numpy.loadtxt(fname='inflammation-03.csv', delimiter=',')
-max_inflammation_0 = numpy.max(data, axis=0)[0]
-max_inflammation_20 = numpy.max(data, axis=0)[20]
-if max_inflammation_0 == 0 and max_inflammation_20 == 20:
-    print('Suspicious looking maxima!')
-elif numpy.sum(numpy.min(data, axis=0)) == 0:
-    print('Minima add up to zero!')
-else:
-    print('Seems OK!')
-{: .language-python}
-Minima add up to zero!
-{: .output}
-In this way,
-we have asked Python to do something different depending on the condition of our data.
-Here we printed messages in all cases,
-but we could also imagine not using the `else` catch-all
-so that messages are only printed when something is wrong,
-freeing us from having to manually examine every plot for features we've seen before.
-> ## How Many Paths?
-> Consider this code:
-> ~~~
-> if 4 > 5:
->     print('A')
-> elif 4 == 5:
->     print('B')
-> elif 4 < 5:
->     print('C')
-> ~~~
-> {: .language-python}
-> Which of the following would be printed if you were to run this code?
-> Why did you pick this answer?
-> 1.  A
-> 2.  B
-> 3.  C
-> 4.  B and C
-> > ## Solution
-> > C gets printed because the first two conditions, `4 > 5` and `4 == 5`, are not true,
-> > but `4 < 5` is true.
-> {: .solution}
-{: .challenge}
-> ## What Is Truth?
-> `True` and `False` booleans are not the only values in Python that are true and false.
-> In fact, *any* value can be used in an `if` or `elif`.
-> After reading and running the code below,
-> explain what the rule is for which values are considered true and which are considered false.
-> ~~~
-> if '':
->     print('empty string is true')
-> if 'word':
->     print('word is true')
-> if []:
->     print('empty list is true')
-> if [1, 2, 3]:
->     print('non-empty list is true')
-> if 0:
->     print('zero is true')
-> if 1:
->     print('one is true')
-> ~~~
-> {: .language-python}
-{: .challenge}
-> ## That's Not Not What I Meant
-> Sometimes it is useful to check whether some condition is not true.
-> The Boolean operator `not` can do this explicitly.
-> After reading and running the code below,
-> write some `if` statements that use `not` to test the rule
-> that you formulated in the previous challenge.
-> ~~~
-> if not '':
->     print('empty string is not true')
-> if not 'word':
->     print('word is not true')
-> if not not True:
->     print('not not True is true')
-> ~~~
-> {: .language-python}
-{: .challenge}
-> ## Close Enough
-> Write some conditions that print `True` if the variable `a` is within 10% of the variable `b`
-> and `False` otherwise.
-> Compare your implementation with your partner's:
-> do you get the same answer for all possible pairs of numbers?
-> > ## Solution 1
-> > ~~~
-> > a = 5
-> > b = 5.1
-> >
-> > if abs(a - b) < 0.1 * abs(b):
-> >     print('True')
-> > else:
-> >     print('False')
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-> > ## Solution 2
-> > ~~~
-> > print(abs(a - b) < 0.1 * abs(b))
-> > ~~~
-> > {: .language-python}
-> >
-> > This works because the Booleans `True` and `False`
-> > have string representations which can be printed.
-> {: .solution}
-{: .challenge}
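One reason partners may disagree is that "within 10%" can be measured relative to `a`, to `b`, or to the larger of the two. As a point of comparison, `math.isclose` from the standard library measures its relative tolerance against the larger absolute value, so its answer does not depend on argument order. A small sketch, with values chosen to show the difference:

```python
import math

a = 10
b = 11.05

relative_to_b = abs(a - b) < 0.1 * abs(b)      # tolerance is 10% of b
relative_to_a = abs(a - b) < 0.1 * abs(a)      # tolerance is 10% of a
symmetric = math.isclose(a, b, rel_tol=0.1)    # tolerance is 10% of the larger value

print('relative to b:', relative_to_b)
print('relative to a:', relative_to_a)
print('math.isclose:', symmetric)
```

Here the first and third checks pass while the second fails, because the gap of about 1.05 is smaller than 10% of `b` but larger than 10% of `a`.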
-> ## In-Place Operators
-> Python (and most other languages in the C family) provides
-> [in-place operators]({{ page.root }}/reference/#in-place-operators)
-> that work like this:
-> ~~~
-> x = 1  # original value
-> x += 1 # add one to x, assigning result back to x
-> x *= 3 # multiply x by 3
-> print(x)
-> ~~~
-> {: .language-python}
-> ~~~
-> 6
-> ~~~
-> {: .output}
-> Write some code that sums the positive and negative numbers in a list separately,
-> using in-place operators.
-> Do you think the result is more or less readable
-> than writing the same without in-place operators?
-> > ## Solution
-> > ~~~
-> > positive_sum = 0
-> > negative_sum = 0
-> > test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
-> > for num in test_list:
-> >     if num > 0:
-> >         positive_sum += num
-> >     elif num == 0:
-> >         pass
-> >     else:
-> >         negative_sum += num
-> > print(positive_sum, negative_sum)
-> > ~~~
-> > {: .language-python}
-> >
-> > Here `pass` means "don't do anything".
-> > In this particular case, it's not actually needed, since if `num == 0` neither
-> > sum needs to change, but it illustrates the use of `elif` and `pass`.
-> {: .solution}
-{: .challenge}
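For numbers, `x += 1` and `x = x + 1` give the same result. For lists the two differ in a way that connects back to mutability: `+=` extends the existing list in place, while `+` builds a new list. A brief sketch with illustrative names:

```python
first = [1, 2]
first_alias = first
first += [3]               # extends the existing list in place
print('alias after +=:', first_alias)

second = [1, 2]
second_alias = second
second = second + [3]      # builds a new list and rebinds the name
print('alias after +:', second_alias)
```

The first alias sees the new element; the second still refers to the old, unchanged list.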
-> ## Sorting a List Into Buckets
-> In our `data` folder, large data sets are stored in files whose names start with
-> "inflammation-" and small data sets are stored in files whose names start with "small-". We
-> also have some other files that we do not care about at this point. We'd like to break all
-> these files into three lists called `large_files`, `small_files`, and `other_files`,
-> respectively.
-> Add code to the template below to do this. Note that the string method
-> [`startswith`](https://docs.python.org/3/library/stdtypes.html#str.startswith)
-> returns `True` if and only if the string it is called on starts with the string
-> passed as an argument, that is:
-> ~~~
-> "String".startswith("Str")
-> ~~~
-> {: .language-python}
-> ~~~
-> True
-> ~~~
-> {: .output}
-> But
-> ~~~
-> "String".startswith("str")
-> ~~~
-> {: .language-python}
-> ~~~
-> False
-> ~~~
-> {: .output}
-> Use the following Python code as your starting point:
-> ~~~
-> filenames = ['inflammation-01.csv',
->          'myscript.py',
->          'inflammation-02.csv',
->          'small-01.csv',
->          'small-02.csv']
-> large_files = []
-> small_files = []
-> other_files = []
-> ~~~
-> {: .language-python}
-> Your solution should:
-> 1.  loop over the names of the files
-> 2.  figure out which group each filename belongs in
-> 3.  append the filename to that list
-> In the end the three lists should be:
-> ~~~
-> large_files = ['inflammation-01.csv', 'inflammation-02.csv']
-> small_files = ['small-01.csv', 'small-02.csv']
-> other_files = ['myscript.py']
-> ~~~
-> {: .language-python}
-> > ## Solution
-> > ~~~
-> > for filename in filenames:
-> >     if filename.startswith('inflammation-'):
-> >         large_files.append(filename)
-> >     elif filename.startswith('small-'):
-> >         small_files.append(filename)
-> >     else:
-> >         other_files.append(filename)
-> >
-> > print('large_files:', large_files)
-> > print('small_files:', small_files)
-> > print('other_files:', other_files)
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
-> ## Counting Vowels
-> 1. Write a loop that counts the number of vowels in a character string.
-> 2. Test it on a few individual words and full sentences.
-> 3. Once you are done, compare your solution to your neighbor's.
->    Did you make the same decisions about how to handle the letter 'y'
->    (which some people think is a vowel, and some do not)?
-> > ## Solution
-> > ~~~
-> > vowels = 'aeiouAEIOU'
-> > sentence = 'Mary had a little lamb.'
-> > count = 0
-> > for char in sentence:
-> >     if char in vowels:
-> >         count += 1
-> >
-> > print("The number of vowels in this string is " + str(count))
-> > ~~~
-> > {: .language-python}
-> {: .solution}
-{: .challenge}
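The same count can be written more compactly. This is only an alternative sketch, not part of the exercise: it lowercases each character so the vowel string can be shorter, and uses the built-in `sum` over a generator expression.

```python
vowels = 'aeiou'
sentence = 'Mary had a little lamb.'

# add 1 for every character whose lowercase form is a vowel
count = sum(1 for char in sentence if char.lower() in vowels)

print('The number of vowels in this string is', count)
```

Whether this reads better than the explicit loop is a matter of taste; the loop makes each step visible, which can be easier to debug.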
-{% include links.md %}
-title: Creating Functions
-teaching: 30
-exercises: 0
-- "How can I define new functions?"
-- "What's the difference between defining and calling a function?"
-- "What happens when I call a function?"
-- "Define a function that takes parameters."
-- "Return a value from a function."
-- "Test and debug a function."
-- "Set default values for function parameters."
-- "Explain why we should divide programs into small, single-purpose functions."
-- "Define a function using `def function_name(parameter)`."
-- "The body of a function must be indented."
-- "Call a function using `function_name(value)`."
-- "Numbers are stored as integers or floating-point numbers."
-- "Variables defined within a function can only be seen and used within the body of the function."
-- "If a variable is not defined within the function in which it is used,
-   Python looks for a definition before the function call."
-- "Use `help(thing)` to view help for something."
-- "Put docstrings in functions to provide help for that function."
-- "Specify default values for parameters when defining a function using `name=value`
