Commit 2186877a authored by Renato Alves

Merge branch 'initial_setup' into 'master'

Initial setup

See merge request !1
parents f3057983 b740a91f
Pipeline #15583 passed
400 additions and 6320 deletions
---
title: Python Fundamentals
teaching: 20
exercises: 10
questions:
- "What basic data types can I work with in Python?"
- "How can I create a new variable in Python?"
- "Can I change the value associated with a variable after I create it?"
objectives:
- "Assign values to variables."
keypoints:
- "Basic data types in Python include integers, strings, and floating-point numbers."
- "Use `variable = value` to assign a value to a variable in order to record it in memory."
- "Variables are created on demand whenever a value is assigned to them."
- "Use `print(something)` to display the value of `something`."
---
## Variables
Any Python interpreter can be used as a calculator:
~~~
3 + 5 * 4
~~~
{: .language-python}
~~~
23
~~~
{: .output}
This is great but not very interesting.
To do anything useful with data, we need to assign its value to a _variable_.
In Python, we can [assign]({{ page.root }}/reference/#assign) a value to a
[variable]({{ page.root }}/reference/#variable), using the equals sign `=`.
For example, to assign value `60` to a variable `weight_kg`, we would execute:
~~~
weight_kg = 60
~~~
{: .language-python}
From now on, whenever we use `weight_kg`, Python will substitute the value we assigned to
it. In layman's terms, **a variable is a name for a value**.
In Python, variable names:
- can include letters, digits, and underscores
- cannot start with a digit
- are [case sensitive]({{ page.root }}/reference/#case-sensitive).
This means that, for example:
- `weight0` is a valid variable name, whereas `0weight` is not
- `weight` and `Weight` are different variables
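
These rules can be checked without running an assignment. Strings have an `isidentifier()` method that reports whether the text would be a valid name. This is a small sketch. `isidentifier()` does not reject reserved keywords such as `for`; the standard `keyword` module checks those separately.

~~~
import keyword

# str.isidentifier() applies the naming rules listed above
print('weight0'.isidentifier())    # True: the digit is not at the start
print('0weight'.isidentifier())    # False: starts with a digit
print('weight_kg'.isidentifier())  # True: underscores are allowed

# Reserved words pass isidentifier() but still cannot be used as names
print('for'.isidentifier(), keyword.iskeyword('for'))  # True True

# Case sensitivity: these are two separate variables
weight = 60
Weight = 65
print(weight, Weight)  # 60 65
~~~
{: .language-python}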
## Types of data
Python knows various types of data. Three common ones are:
* integer numbers
* floating point numbers, and
* strings.
In the example above, variable `weight_kg` has an integer value of `60`.
To create a variable with a floating point value, we can execute:
~~~
weight_kg = 60.0
~~~
{: .language-python}
And to create a string, we add single or double quotes around some text, for example:
~~~
weight_kg_text = 'weight in kilograms:'
~~~
{: .language-python}
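
The built-in `type` function reports which of these three types a value has. The sketch below reuses the values from above, so `weight_kg` ends up holding the floating point value `60.0` again:

~~~
weight_kg = 60
print(type(weight_kg))       # <class 'int'>

weight_kg = 60.0
print(type(weight_kg))       # <class 'float'>

weight_kg_text = 'weight in kilograms:'
print(type(weight_kg_text))  # <class 'str'>

# Single and double quotes create exactly the same kind of value
print('kg' == "kg")          # True
~~~
{: .language-python}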
## Using Variables in Python
To display the value of a variable to the screen in Python, we can use the `print` function:
~~~
print(weight_kg)
~~~
{: .language-python}
~~~
60.0
~~~
{: .output}
We can display multiple things at once using only one `print` command:
~~~
print(weight_kg_text, weight_kg)
~~~
{: .language-python}
~~~
weight in kilograms: 60.0
~~~
{: .output}
Moreover, we can do arithmetic with variables right inside the `print` function:
~~~
print('weight in pounds:', 2.2 * weight_kg)
~~~
{: .language-python}
~~~
weight in pounds: 132.0
~~~
{: .output}
The above command, however, did not change the value of `weight_kg`:
~~~
print(weight_kg)
~~~
{: .language-python}
~~~
60.0
~~~
{: .output}
To change the value of the `weight_kg` variable, we have to
**assign** `weight_kg` a new value using the equals `=` sign:
~~~
weight_kg = 65.0
print('weight in kilograms is now:', weight_kg)
~~~
{: .language-python}
~~~
weight in kilograms is now: 65.0
~~~
{: .output}
> ## Variables as Sticky Notes
>
> A variable is analogous to a sticky note with a name written on it:
> assigning a value to a variable is like putting that sticky note on a particular value.
>
> ![Value of 65.0 with weight_kg label stuck on it](../fig/python-sticky-note-variables-01.svg)
>
> This means that assigning a value to one variable does **not** change
> values of other variables.
> For example, let's store the subject's weight in pounds in its own variable:
>
> ~~~
> # There are 2.2 pounds per kilogram
> weight_lb = 2.2 * weight_kg
> print(weight_kg_text, weight_kg, 'and in pounds:', weight_lb)
> ~~~
> {: .language-python}
>
> ~~~
> weight in kilograms: 65.0 and in pounds: 143.0
> ~~~
> {: .output}
>
> ![Value of 65.0 with weight_kg label stuck on it, and value of 143.0 with weight_lb label stuck on it](../fig/python-sticky-note-variables-02.svg)
>
> Let's now change `weight_kg`:
>
> ~~~
> weight_kg = 100.0
> print('weight in kilograms is now:', weight_kg, 'and weight in pounds is still:', weight_lb)
> ~~~
> {: .language-python}
>
> ~~~
> weight in kilograms is now: 100.0 and weight in pounds is still: 143.0
> ~~~
> {: .output}
>
> ![Value of 100.0 with label weight_kg stuck on it, and value of 143.0 with label weight_lb stuck on it](../fig/python-sticky-note-variables-03.svg)
>
> Since `weight_lb` doesn't "remember" where its value comes from,
> it is not updated when we change `weight_kg`.
{: .callout}
> ## Check Your Understanding
>
> What values do the variables `mass` and `age` have after each statement in the following program?
> Test your answers by executing the commands.
>
> ~~~
> mass = 47.5
> age = 122
> mass = mass * 2.0
> age = age - 20
> print(mass, age)
> ~~~
> {: .language-python}
>
> > ## Solution
> > ~~~
> > 95.0 102
> > ~~~
> > {: .output}
> {: .solution}
{: .challenge}
> ## Sorting Out References
>
> What does the following program print out?
>
> ~~~
> first, second = 'Grace', 'Hopper'
> third, fourth = second, first
> print(third, fourth)
> ~~~
> {: .language-python}
>
> > ## Solution
> > ~~~
> > Hopper Grace
> > ~~~
> > {: .output}
> {: .solution}
{: .challenge}
{% include links.md %}
---
title: Syntax Elements & Powerful Functions
teaching: 20
exercises: 10
questions:
- "What elements of Python syntax might I see in other people's code?"
- "How can I use these additional features of Python to take my code to the next level?"
- "What built-in functions and standard library modules are recommended to improve my code?"
objectives:
- "write comprehensions to improve code readability and efficiency."
- "call functions designed to make common tasks easier and faster."
- "recognise all elements of modern Python syntax and explain their purpose."
keypoints:
- "Use comprehensions to efficiently create new iterables with fewer lines of code."
- "Sets can be extremely useful when comparing collections of objects, and can significantly speed up your code."
- "The `itertools` module includes many helpful functions for working with iterables."
- "A decorator is a function that does something to the output of another function."
---
## plan
- Renato currently scheduled to lead this session
- comprehensions (list, dictionary, generators)
- `yield`
- sets - `{1,2,3}`
- function argument passing: `*`, `**`, `/`
- packing/unpacking and the catchall pattern `first, *_, last = mylist`
- multi-dimension slicing (numpy/pandas)
- `dot` - i.e. `this.that`
- `_`, `__` - single, double underscore (meaning)
- context managers (`with`) - [contextlib](https://docs.python.org/3/library/contextlib.html#contextlib.contextmanager)
- things you might see, including new features
- `@not_twitter` - decorators
- `import typing` - type annotations / hints
- `:=` - "walrus" operator
- `async`/`await` - `yield from`
- commonly used functions
- `zip()`
- `set()`
- `enumerate()`
- `itertools.*`
- (?) `functools.partial` - needs a good realistic example
- (?) honorable mentions - useful modules
- `plotnine`
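
This quick sketch covers several planned items and may help when drafting the episode. The list `mylist`, the decorator `shout`, the function `greet` and the name `parse_binary` are hypothetical examples. They are not part of the lesson material.

~~~
from functools import partial, wraps

mylist = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical example data

# Catch-all unpacking: the starred name collects everything in between
first, *middle, last = mylist
print(first, middle, last)  # 3 [1, 4, 1, 5, 9, 2] 6
# By convention, *_ signals that the middle values are not needed
first, *_, last = mylist

# List and dictionary comprehensions, a set comprehension, a generator expression
squares = [n ** 2 for n in mylist]
lengths = {word: len(word) for word in ['apple', 'fig']}
unique = {n for n in mylist}             # a set: duplicates are dropped
total = sum(n for n in mylist if n > 2)  # no intermediate list is built

# Set operations compare collections directly
print(unique & {1, 2, 3})  # intersection: {1, 2, 3}

# enumerate() pairs each item with its index; zip() walks iterables in step
for index, (letter, number) in enumerate(zip('abc', [10, 20, 30])):
    print(index, letter, number)

# A decorator takes a function and returns a new function that wraps it
def shout(func):
    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name):
    return 'hello ' + name

print(greet('ada'))  # HELLO ADA

# functools.partial pre-fills arguments of an existing function
parse_binary = partial(int, base=2)
print(parse_binary('101'))  # 5
~~~
{: .language-python}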
## Notes on how to use this lesson template
See the [Lesson Example][lesson-example]
and [The Carpentries Curriculum Development Handbook][cdh]
for full details.
Below should be all the things you need to know right now...
### Creating pages
- Write material with [Markdown][markdown-cheatsheet]
- markdown files will be rendered as HTML pages and included in the built site
- `.md` files in the `_episodes` folder will be added to the Episodes dropdown, page navigation, etc.
- Markdown files must include _front matter_: metadata specified in a YAML header bounded by `---`
- At minimum, this must include a `title` field
~~~
---
title: The Title of the Section
---
~~~
{: .source}
- but really your episodes (lesson sections) should include:
  - an estimate of time required for teaching & exercises
  - main questions answered in the section
  - learning objectives
  - key points to summarise what's covered in the section (these are added at the end of the lesson section)
- as an example, below is the front matter for this page
~~~
---
title: Syntax Elements & Powerful Functions
teaching: 20
exercises: 10
questions:
- "What elements of Python syntax might I see in other people's code?"
- "How can I use these additional features of Python to take my code to the next level?"
- "What built-in functions and standard library modules are recommended to improve my code?"
objectives:
- "write comprehensions to improve code readability and efficiency."
- "call functions designed to make common tasks easier and faster."
- "recognise all elements of modern Python syntax and explain their purpose."
keypoints:
- "Use comprehensions to efficiently create new iterables with fewer lines of code."
- "Sets can be extremely useful when comparing collections of objects, and can significantly speed up your code."
- "The `itertools` module includes many helpful functions for working with iterables."
- "A decorator is a function that does something to the output of another function."
---
~~~
{: .source}
## Code blocks
code snippets written like this
{% raw %}
~~~
print(weight_kg)
~~~
{: .language-python}
~~~
60.0
~~~
{: .output}
{% endraw %}
will produce formatted blocks like this:
~~~
print(weight_kg)
~~~
{: .language-python}
~~~
60.0
~~~
{: .output}
## Special blockquotes
- The lesson template also includes a range of styled boxes
- examples for exercises and callouts below
- see [this section][lesson-example-blockquotes] of The Carpentries Lesson Example for the full list
A callout block written like this
~~~
> ## Callout block example
>
> Write callout blocks as blockquotes,
> with a styling tag (technical term is a _class identifier_) at the end.
>
> ~~~
> # you can still include code blocks in the callout
> weight_lb = 2.2 * weight_kg
> print(weight_kg_text, weight_kg, 'and in pounds:', weight_lb)
> ~~~
> {: .language-python}
>
> Use callouts for asides and comments -
> anything that provides additional detail to the core of your material
{: .callout}
~~~
{: .source}
will be rendered like this:
> ## Callout block example
>
> Write callout blocks as blockquotes,
> with a styling tag (technical term is a _class identifier_) at the end.
>
> ~~~
> # you can still include code blocks in the callout
> weight_lb = 2.2 * weight_kg
> print(weight_kg_text, weight_kg, 'and in pounds:', weight_lb)
> ~~~
> {: .language-python}
>
> Use callouts for asides and comments -
> anything that provides additional detail to the core of your material
{: .callout}
Similarly, exercises written like this
~~~
> ## Sorting Out References
>
> What does the following program print out?
>
> ~~~
> first, second = 'Grace', 'Hopper'
> third, fourth = second, first
> print(third, fourth)
> ~~~
> {: .language-python}
>
> > ## Solution
> >
> > This text will only be visible if the solution is expanded
> > ~~~
> > Hopper Grace
> > ~~~
> > {: .output}
> {: .solution}
{: .challenge}
~~~
{: .source}
will be rendered like this (note the expandable box containing the solution):
> ## Sorting Out References
>
> What does the following program print out?
>
> ~~~
> first, second = 'Grace', 'Hopper'
> third, fourth = second, first
> print(third, fourth)
> ~~~
> {: .language-python}
>
> > ## Solution
> >
> > This text will only be visible if the solution is expanded
> > ~~~
> > Hopper Grace
> > ~~~
> > {: .output}
> {: .solution}
{: .challenge}
## Shared link references
- Finally, the last line of every page's `.md` file should be
{% raw %}
`{% include links.md %}`
{% endraw %}
- This allows us to share link references across the entire site, which makes the links much more maintainable.
- link URLs should be put in the `_includes/links.md` file (ideally, arranged alphabetically by reference)
- you can then write Markdown links "reference-style" i.e. `[link text to be displayed][reference-id]`, with `[reference-id]: https://link.to.page` in `_includes/links.md`
{% include links.md %}
---
title: Working with Data
teaching: 20
exercises: 10
questions:
- "How should I work with numeric data in Python?"
- "What's the recommended way to handle and analyse tabular data?"
- "How can I import tabular data for analysis in Python and export the results?"
objectives:
- "handle and summarise numeric data with Numpy."
- "filter values in their data based on a range of conditions."
- "load tabular data into a Pandas dataframe object."
- "describe what is meant by the data type of an array/series, and the impact this has on how the data is handled."
- "add and remove columns from a dataframe."
- "select, aggregate, and visualise data in a dataframe."
keypoints:
- "Specialised third-party libraries such as Numpy and Pandas provide powerful objects and functions that can help us analyse our data."
- "Pandas dataframe objects allow us to efficiently load and handle large tabular data."
- "Use the `pandas.read_csv` function and the `DataFrame.to_csv` method to read and write tabular data."
---
## plan
- Toby currently scheduled to lead this session
- Numpy
- arrays
- masking
- aside about data types and potential hazards
- reading data from a file (with note that more will come later on this topic)
- link to existing image analysis material
- Pandas
- when an array just isn't enough
- DataFrames - re-use material from [Software Carpentry][swc-python-gapminder]?
- ideally with a more relevant example dataset... [maybe a COVID one](https://data.europa.eu/euodp/en/data/dataset/covid-19-coronavirus-data/resource/260bbbde-2316-40eb-aec3-7cd7bfc2f590)
- include an aside about I/O - reading/writing files (pandas (the `.to_*()` methods and highlight some: `csv`, `json`, `feather`, `hdf`), numpy, `open()`, (?) bytes vs strings, (?) encoding)
- Finish with example of `df.plot()` to set the scene for plotting section
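
This minimal sketch covers the masking, data-type and I/O items in this plan. It assumes NumPy and pandas are installed. The arrays, the small in-memory CSV and its column names are made up for illustration.

~~~
import io

import numpy as np
import pandas as pd

# Masking: a boolean array selects the matching elements
values = np.array([3, 8, 1, 9, 4])
mask = values > 3
print(mask)          # [False  True False  True  True]
print(values[mask])  # [8 9 4]

# Data-type hazard: fixed-width integers silently wrap around
small = np.array([200], dtype=np.uint8)
print(small + 100)   # [44], not [300]

# pandas: read a CSV, add a column, then write it back out
csv_text = "country,cases\nA,10\nB,25\n"
df = pd.read_csv(io.StringIO(csv_text))
df['cases_doubled'] = df['cases'] * 2
print(df.dtypes)

out = io.StringIO()
df.to_csv(out, index=False)  # DataFrame.to_csv, not a pandas.write_csv function
print(out.getvalue())
~~~
{: .language-python}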
{% include links.md %}
---
title: Visualizing Tabular Data
teaching: 30
exercises: 20
questions:
- "How can I visualize tabular data in Python?"
- "How can I group several plots together?"
objectives:
- "Plot simple graphs from data."
- "Group several graphs in a single figure."
keypoints:
- "Use the `pyplot` module from the `matplotlib` library for creating simple visualizations."
---
## Visualizing data
The mathematician Richard Hamming once said, "The purpose of computing is insight, not numbers," and
the best way to develop insight is often to visualize data. Visualization deserves an entire
lecture of its own, but we can explore a few features of Python's `matplotlib` library here. While
there is no official plotting library, `matplotlib` is the _de facto_ standard. First, we will
import the `pyplot` module from `matplotlib` and use two of its functions to create and display a
heat map of our data:
~~~
import matplotlib.pyplot
image = matplotlib.pyplot.imshow(data)
matplotlib.pyplot.show()
~~~
{: .language-python}
![Heatmap of the Data](../fig/inflammation-01-imshow.svg)
Blue pixels in this heat map represent low values, while yellow pixels represent high values. As we
can see, inflammation rises and falls over a 40-day period. Let's take a look at the average inflammation over time:
~~~
ave_inflammation = numpy.mean(data, axis=0)
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
matplotlib.pyplot.show()
~~~
{: .language-python}
![Average Inflammation Over Time](../fig/inflammation-01-average.svg)
Here, we have put the average inflammation per day across all patients in the variable `ave_inflammation`, then
asked `matplotlib.pyplot` to create and display a line graph of those values. The result is a
roughly linear rise and fall, which is suspicious: we might instead expect a sharper rise and slower
fall. Let's have a look at two other statistics:
~~~
max_plot = matplotlib.pyplot.plot(numpy.max(data, axis=0))
matplotlib.pyplot.show()
~~~
{: .language-python}
![Maximum Value Along The First Axis](../fig/inflammation-01-maximum.svg)
~~~
min_plot = matplotlib.pyplot.plot(numpy.min(data, axis=0))
matplotlib.pyplot.show()
~~~
{: .language-python}
![Minimum Value Along The First Axis](../fig/inflammation-01-minimum.svg)
The maximum value rises and falls smoothly, while the minimum seems to be a step function. Neither
trend seems particularly likely, so either there's a mistake in our calculations or something is
wrong with our data. This insight would have been difficult to reach by examining the numbers
themselves without visualization tools.
### Grouping plots
You can group similar plots in a single figure using subplots.
The script below uses a number of new commands. The function `matplotlib.pyplot.figure()`
creates a space into which we will place all of our plots. The parameter `figsize`
tells Python how big to make this space (width and height, in inches). Each subplot is placed into the figure using
its `add_subplot` [method]({{ page.root }}/reference/#method). The `add_subplot` method takes 3
parameters. The first denotes the total number of rows of subplots, the second
the total number of subplot columns, and the final parameter denotes which subplot
your variable is referencing (numbered from 1, left-to-right, top-to-bottom). Each subplot is stored in a
different variable (`axes1`, `axes2`, `axes3`). Once a subplot is created, its axes can
be labelled using the `set_xlabel()` and `set_ylabel()` methods.
Here are our three plots side by side:
~~~
import numpy
import matplotlib.pyplot
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)
axes1.set_ylabel('average')
axes1.plot(numpy.mean(data, axis=0))
axes2.set_ylabel('max')
axes2.plot(numpy.max(data, axis=0))
axes3.set_ylabel('min')
axes3.plot(numpy.min(data, axis=0))
fig.tight_layout()
matplotlib.pyplot.show()
~~~
{: .language-python}
![The Previous Plots as Subplots](../fig/inflammation-01-group-plot.svg)
The [call]({{ page.root }}/reference/#function-call) to `loadtxt` reads our data,
and the rest of the program tells the plotting library
how large we want the figure to be,
that we're creating three subplots,
what to draw for each one,
and that we want a tight layout.
(If we leave out that call to `fig.tight_layout()`,
the graphs will actually be squeezed together more closely.)
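
For comparison, `matplotlib.pyplot.subplots` can create the figure and all three axes in a single call. It returns the figure and an array of axes, and each axes is indexed from 0. This sketch uses randomly generated numbers in place of `inflammation-01.csv`, which is an assumption made so it runs without the data file. It also selects the non-interactive `Agg` backend so it works without a display. Remove that line to get a window.

~~~
import matplotlib
matplotlib.use('Agg')  # assumption: no display available; remove to show a window
import matplotlib.pyplot
import numpy

# Stand-in data: 60 patients x 40 days of random integers from 0 to 19
rng = numpy.random.default_rng(seed=0)
data = rng.integers(0, 20, size=(60, 40))

# One call returns the figure and an array holding the three axes
fig, axes = matplotlib.pyplot.subplots(1, 3, figsize=(10.0, 3.0))

axes[0].set_ylabel('average')
axes[0].plot(numpy.mean(data, axis=0))

axes[1].set_ylabel('max')
axes[1].plot(numpy.max(data, axis=0))

axes[2].set_ylabel('min')
axes[2].plot(numpy.min(data, axis=0))

fig.tight_layout()
matplotlib.pyplot.show()
~~~
{: .language-python}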
> ## Plot Scaling
>
> Why do all of our plots stop just short of the upper end of our graph?
>
> > ## Solution
> > Because matplotlib normally sets x and y axes limits to the min and max of our data
> > (depending on data range)
> {: .solution}
>
> If we want to change this, we can use the `set_ylim(min, max)` method of each 'axes',
> for example:
>
> ~~~
> axes3.set_ylim(0,6)
> ~~~
> {: .language-python}
>
> Update your plotting code to automatically set a more appropriate scale.
> (Hint: you can make use of the `max` and `min` methods to help.)
>
> > ## Solution
> > ~~~
> > # One method
> > axes3.set_ylabel('min')
> > axes3.plot(numpy.min(data, axis=0))
> > axes3.set_ylim(0,6)
> > ~~~
> > {: .language-python}
> {: .solution}
>
> > ## Solution
> > ~~~
> > # A more automated approach
> > min_data = numpy.min(data, axis=0)
> > axes3.set_ylabel('min')
> > axes3.plot(min_data)
> > axes3.set_ylim(numpy.min(min_data), numpy.max(min_data) * 1.1)
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
> ## Drawing Straight Lines
>
> In the center and right subplots above, we expect all lines to look like step functions because
> non-integer values are not realistic for the minimum and maximum values. However, you can see
> that the lines are not always vertical or horizontal, and in particular the step function
> in the subplot on the right looks slanted. Why is this?
>
> > ## Solution
> > Because matplotlib interpolates (draws a straight line) between the points.
> > One way to avoid this is to use the Matplotlib `drawstyle` option:
> >
> > ~~~
> > import numpy
> > import matplotlib.pyplot
> >
> > data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
> >
> > fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
> >
> > axes1 = fig.add_subplot(1, 3, 1)
> > axes2 = fig.add_subplot(1, 3, 2)
> > axes3 = fig.add_subplot(1, 3, 3)
> >
> > axes1.set_ylabel('average')
> > axes1.plot(numpy.mean(data, axis=0), drawstyle='steps-mid')
> >
> > axes2.set_ylabel('max')
> > axes2.plot(numpy.max(data, axis=0), drawstyle='steps-mid')
> >
> > axes3.set_ylabel('min')
> > axes3.plot(numpy.min(data, axis=0), drawstyle='steps-mid')
> >
> > fig.tight_layout()
> >
> > matplotlib.pyplot.show()
> > ~~~
> > {: .language-python}
> > ![Plot with step lines](../fig/inflammation-01-line-styles.svg)
> {: .solution}
{: .challenge}
> ## Make Your Own Plot
>
> Create a plot showing the standard deviation (`numpy.std`)
> of the inflammation data for each day across all patients.
>
> > ## Solution
> > ~~~
> > std_plot = matplotlib.pyplot.plot(numpy.std(data, axis=0))
> > matplotlib.pyplot.show()
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
> ## Moving Plots Around
>
> Modify the program to display the three plots on top of one another
> instead of side by side.
>
> > ## Solution
> > ~~~
> > import numpy
> > import matplotlib.pyplot
> >
> > data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
> >
> > # change figsize (swap width and height)
> > fig = matplotlib.pyplot.figure(figsize=(3.0, 10.0))
> >
> > # change add_subplot (swap first two parameters)
> > axes1 = fig.add_subplot(3, 1, 1)
> > axes2 = fig.add_subplot(3, 1, 2)
> > axes3 = fig.add_subplot(3, 1, 3)
> >
> > axes1.set_ylabel('average')
> > axes1.plot(numpy.mean(data, axis=0))
> >
> > axes2.set_ylabel('max')
> > axes2.plot(numpy.max(data, axis=0))
> >
> > axes3.set_ylabel('min')
> > axes3.plot(numpy.min(data, axis=0))
> >
> > fig.tight_layout()
> >
> > matplotlib.pyplot.show()
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
{% include links.md %}
---
title: Plotting Data
teaching: 20
exercises: 10
questions:
- "How can I create publication-ready figures with Python?"
objectives:
- "plot data in a Matplotlib figure."
- "create multi-panelled figures."
- "export figures in a variety of image formats."
- "use interactive features of Jupyter to make it easier to fine-tune a plot."
keypoints:
- "Matplotlib is a powerful plotting library for Python."
- "It can also be annoyingly fiddly. Jupyter can help with this."
---
## plan
- Renato currently scheduled to lead this session
- Matplotlib
- The multiple matplotlib interfaces: `pyplot` vs `OO API` vs obsolete `pylab`. See [this](https://matplotlib.org/api/index.html#usage-patterns)
- Concepts:
- [Artists & containers](https://matplotlib.org/tutorials/intermediate/artists.html#artist-tutorial)
- Figure
- Axes
- Axis & Ticks
- ...
- Sub-plots
- Labels
- Annotations
- Saving formats
- Jupyter integration(?)
- 3D plotting (?)
- Interactive plots (?)
- plotting from pandas
- altair? & plotnine?
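
This minimal sketch uses the object-oriented (OO) interface. It covers sub-plots, labels, an annotation and saving to several formats. The data, labels and file names are illustrative assumptions. The `Agg` backend is selected so the script runs without a display.

~~~
import matplotlib
matplotlib.use('Agg')  # assumption: no display available
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# pyplot creates the Figure and Axes; everything after uses their methods (OO API)
fig, (ax_sin, ax_cos) = plt.subplots(2, 1, figsize=(6, 6), sharex=True)

ax_sin.plot(x, np.sin(x))
ax_sin.set_title('sine')
ax_sin.set_ylabel('amplitude')

# Annotate the maximum of the sine curve with an arrow
ax_sin.annotate('peak', xy=(np.pi / 2, 1.0), xytext=(2.5, 0.5),
                arrowprops={'arrowstyle': '->'})

ax_cos.plot(x, np.cos(x), color='tab:orange')
ax_cos.set_title('cosine')
ax_cos.set_xlabel('x (radians)')
ax_cos.set_ylabel('amplitude')

fig.tight_layout()

# The output format is inferred from the file extension
for extension in ['png', 'svg', 'pdf']:
    fig.savefig(f'waves.{extension}', dpi=150)
~~~
{: .language-python}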
{% include links.md %}
---
title: Parsing Command Line Arguments
teaching: 20
exercises: 10
questions:
- "How can I access arguments passed to my Python script at runtime?"
- "How can I create a sophisticated command line interface for my script?"
- "How can I provide the user with more information about how to run my code?"
objectives:
- "access command line arguments with `sys.argv`."
- "parse and use arguments and options with `argparse`."
- "create a comprehensive usage statement for their script."
keypoints:
- "Positional command line arguments can be accessed from inside a script through the `sys.argv` object."
- "The `argparse` module allows us to create extensive and powerful command line interfaces for our scripts."
- "`argparse` also constructs a standardised usage statement according to the parser's configuration."
---
## plan
- Toby currently scheduled to lead this session
- `sys.argv`
- `argparse`
- positional arguments
- options
- capturing multiple items in a single argument
- usage statements
- help text
- [`docopt`](http://docopt.org/) (?)
- [`click`](https://click.palletsprojects.com/en/7.x/) (?)
- [comparison of argparse, docopt and click](https://realpython.com/comparing-python-command-line-parsing-libraries-argparse-docopt-click/)
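
This minimal sketch contrasts `sys.argv` with `argparse` and could serve as a starting point for this episode. The script's description and its arguments (`filenames`, `--action`) are hypothetical. Here `parse_args` receives an explicit list so the example runs without a command line. In a real script, `parser.parse_args()` with no arguments reads `sys.argv[1:]`.

~~~
import argparse
import sys

# sys.argv is a plain list of strings: the script name, then each argument
print('script name:', sys.argv[0])
print('arguments:', sys.argv[1:])

parser = argparse.ArgumentParser(
    description='Summarise inflammation data files.')  # hypothetical description
# Positional argument; nargs='+' captures one or more items in a single argument
parser.add_argument('filenames', nargs='+', help='CSV files to process')
# Optional argument with a restricted set of choices and a default value
parser.add_argument('-a', '--action', choices=['min', 'mean', 'max'],
                    default='mean', help='statistic to compute (default: mean)')

# Explicit list for illustration; a real script would call parser.parse_args()
args = parser.parse_args(['inflammation-01.csv', 'inflammation-02.csv',
                          '--action', 'max'])
print(args.filenames)  # ['inflammation-01.csv', 'inflammation-02.csv']
print(args.action)     # max

# argparse builds the usage statement and help text from its configuration
parser.print_help()
~~~
{: .language-python}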
{% include links.md %}
---
title: Repeating Actions with Loops
teaching: 30
exercises: 0
questions:
- "How can I do the same operations on many different values?"
objectives:
- "Explain what a `for` loop does."
- "Correctly write `for` loops to repeat simple calculations."
- "Trace changes to a loop variable as the loop runs."
- "Trace changes to other variables as they are updated by a `for` loop."
keypoints:
- "Use `for variable in sequence` to process the elements of a sequence one at a time."
- "The body of a `for` loop must be indented."
- "Use `len(thing)` to determine the length of something that contains other values."
---
In the last episode, we wrote Python code that plots values of interest from our first
inflammation dataset (`inflammation-01.csv`), which revealed some suspicious features in it.
![Analysis of inflammation-01.csv](../fig/03-loop_2_0.png)
We have a dozen data sets right now, though, and more on the way.
We want to create plots for all of our data sets with a single statement.
To do that, we'll have to teach the computer how to repeat things.
An example task that we might want to repeat is printing each character in a
word on a line of its own.
~~~
word = 'lead'
~~~
{: .language-python}
In Python, a string is basically an ordered collection of characters, and every
character has a unique number associated with it -- its index. This means that
we can access characters in a string using their indices.
For example, we can get the first character of the word `'lead'`, by using
`word[0]`. One way to print each character is to use four `print` statements:
~~~
print(word[0])
print(word[1])
print(word[2])
print(word[3])
~~~
{: .language-python}
~~~
l
e
a
d
~~~
{: .output}
This is a bad approach for three reasons:
1. **Not scalable**. Imagine you need to print characters of a string that is hundreds
of letters long. It might be easier to type them in manually.
2. **Difficult to maintain**. If we want to decorate each printed character with an
asterisk or any other character, we would have to change four lines of code. While
this might not be a problem for short strings, it would definitely be a problem for
longer ones.
3. **Fragile**. If we use it with a word that has more characters than what we initially
envisioned, it will only display part of the word's characters. A shorter string, on
the other hand, will cause an error because it will be trying to display part of the
string that doesn't exist.
~~~
word = 'tin'
print(word[0])
print(word[1])
print(word[2])
print(word[3])
~~~
{: .language-python}
~~~
t
i
n
~~~
{: .output}
~~~
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-7974b6cdaf14> in <module>()
      3 print(word[1])
      4 print(word[2])
----> 5 print(word[3])

IndexError: string index out of range
~~~
{: .error}
Here's a better approach:
~~~
word = 'lead'
for char in word:
    print(char)
~~~
{: .language-python}
~~~
l
e
a
d
~~~
{: .output}
This is shorter --- certainly shorter than something that prints every character in a
hundred-letter string --- and more robust as well:
~~~
word = 'oxygen'
for char in word:
    print(char)
~~~
{: .language-python}
~~~
o
x
y
g
e
n
~~~
{: .output}
The improved version uses a [for loop]({{ page.root }}/reference/#for-loop)
to repeat an operation --- in this case, printing --- once for each thing in a sequence.
The general form of a loop is:
~~~
for variable in collection:
    # do things using variable, such as print
~~~
{: .language-python}
Using the oxygen example above, the loop might look like this:
![loop_image](../fig/loops_image.png)
where each character (`char`) in the variable `word` is looped through and printed one character
after another. The numbers in the diagram denote which loop cycle the character was printed in (1
being the first loop, and 6 being the final loop).
We can call the [loop variable]({{ page.root }}/reference/#loop-variable) anything we like, but
there must be a colon at the end of the line starting the loop, and we must indent anything we
want to run inside the loop. Unlike many other languages, there is no command to signify the end
of the loop body (e.g. `end for`); what is indented after the `for` statement belongs to the loop.
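
Compare the indented and unindented lines in this short sketch to see which statements belong to the loop. The counter `body_runs` exists only for illustration:

~~~
word = 'lead'
body_runs = 0
for char in word:
    # Indented: part of the loop body, so this runs once per character
    body_runs = body_runs + 1
    print(char)
# Not indented: outside the loop, so this runs once, after the loop ends
print('the loop body ran', body_runs, 'times')  # the loop body ran 4 times
~~~
{: .language-python}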
> ## What's in a name?
>
>
> In the example above, the loop variable was given the name `char` as a mnemonic;
> it is short for 'character'. We can choose any name we want for variables.
> We can even call our loop variable `banana`, as long as we use this name consistently:
>
> ~~~
> word = 'oxygen'
> for banana in word:
>     print(banana)
> ~~~
> {: .language-python}
>
> ~~~
> o
> x
> y
> g
> e
> n
> ~~~
> {: .output}
>
> It is a good idea to choose meaningful variable names;
> otherwise, it is harder to understand what the loop is doing.
{: .callout}
Here's another loop that repeatedly updates a variable:
~~~
length = 0
for vowel in 'aeiou':
    length = length + 1
print('There are', length, 'vowels')
~~~
{: .language-python}
~~~
There are 5 vowels
~~~
{: .output}
It's worth tracing the execution of this little program step by step.
Since there are five characters in `'aeiou'`,
the statement on line 3 will be executed five times.
The first time around,
`length` is zero (the value assigned to it on line 1)
and `vowel` is `'a'`.
The statement adds 1 to the old value of `length`,
producing 1,
and updates `length` to refer to that new value.
The next time around,
`vowel` is `'e'` and `length` is 1,
so `length` is updated to be 2.
After three more updates,
`length` is 5;
since there is nothing left in `'aeiou'` for Python to process,
the loop finishes
and the `print` statement on line 4 tells us our final answer.
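
We can let Python perform this trace for us. The sketch below prints the values of both variables at the end of each pass through the loop body:

~~~
length = 0
for vowel in 'aeiou':
    length = length + 1
    # Show the state of both variables at the end of each pass
    print('vowel is', vowel, 'and length is now', length)
print('There are', length, 'vowels')
~~~
{: .language-python}

~~~
vowel is a and length is now 1
vowel is e and length is now 2
vowel is i and length is now 3
vowel is o and length is now 4
vowel is u and length is now 5
There are 5 vowels
~~~
{: .output}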
Note that a loop variable is a variable that's being used to record progress in a loop.
It still exists after the loop is over,
and we can re-use variables previously defined as loop variables as well:
~~~
letter = 'z'
for letter in 'abc':
    print(letter)
print('after the loop, letter is', letter)
~~~
{: .language-python}
~~~
a
b
c
after the loop, letter is c
~~~
{: .output}
Note also that finding the length of a string is such a common operation
that Python actually has a built-in function to do it called `len`:
~~~
print(len('aeiou'))
~~~
{: .language-python}
~~~
5
~~~
{: .output}
`len` is much faster than any function we could write ourselves,
and much easier to read than a two-line loop;
it will also give us the length of many other things that we haven't met yet,
so we should always use it when we can.
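
A brief sketch of `len` on a few values. The list of words previews a collection type covered later, and the variable name `words` is just an example:

~~~
# len works on strings...
print(len('oxygen'))  # 6

# ...and on other collections, such as a list of words
words = ['lead', 'tin', 'oxygen']
print(len(words))     # 3

# An empty string has length zero
print(len(''))        # 0
~~~
{: .language-python}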
> ## From 1 to N
>
> Python has a built-in function called `range` that generates a sequence of numbers. `range` can
> accept 1, 2, or 3 parameters.
>
> * If one parameter is given, `range` generates a sequence of that length,
> starting at zero and incrementing by 1.
> For example, `range(3)` produces the numbers `0, 1, 2`.
> * If two parameters are given, `range` starts at
> the first and ends just before the second, incrementing by one.
> For example, `range(2, 5)` produces `2, 3, 4`.
> * If `range` is given 3 parameters,
> it starts at the first one, ends just before the second one, and increments by the third one.
> For example, `range(3, 10, 2)` produces `3, 5, 7, 9`.
>
> Using `range`, write a loop that prints the first 3 natural numbers:
>
> ~~~
> 1
> 2
> 3
> ~~~
> {: .output}
>
> > ## Solution
> > ~~~
> > for number in range(1, 4):
> >     print(number)
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
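
To see exactly which numbers each form of `range` produces, wrap it in `list()`. The last line adds a negative step, which is not described above but follows the same "stop is excluded" rule:

~~~
# list() collects the numbers that range generates so we can see them
print(list(range(3)))         # [0, 1, 2]
print(list(range(2, 5)))      # [2, 3, 4]
print(list(range(3, 10, 2)))  # [3, 5, 7, 9]

# A negative step counts down; the stop value is still excluded
print(list(range(5, 0, -1)))  # [5, 4, 3, 2, 1]
~~~
{: .language-python}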
> ## Understanding the loops
>
> Given the following loop:
> ~~~
> word = 'oxygen'
> for char in word:
>     print(char)
> ~~~
> {: .language-python}
>
> How many times is the body of the loop executed?
>
> * 3 times
> * 4 times
> * 5 times
> * 6 times
>
> > ## Solution
> >
> > The body of the loop is executed 6 times.
> >
> {: .solution}
{: .challenge}
> ## Computing Powers With Loops
>
> Exponentiation is built into Python:
>
> ~~~
> print(5 ** 3)
> ~~~
> {: .language-python}
>
> ~~~
> 125
> ~~~
> {: .output}
>
> Write a loop that calculates the same result as `5 ** 3` using
> multiplication (and without exponentiation).
>
> > ## Solution
> > ~~~
> > result = 1
> > for number in range(0, 3):
> >     result = result * 5
> > print(result)
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
> ## Reverse a String
>
> Knowing that two strings can be concatenated using the `+` operator,
> write a loop that takes a string
> and produces a new string with the characters in reverse order,
> so `'Newton'` becomes `'notweN'`.
>
> > ## Solution
> > ~~~
> > newstring = ''
> > oldstring = 'Newton'
> > for char in oldstring:
> >     newstring = char + newstring
> > print(newstring)
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
> ## Computing the Value of a Polynomial
>
> The built-in function `enumerate` takes a sequence (e.g. a list) and generates a
> new sequence of the same length. Each element of the new sequence is a pair composed of the index
> (0, 1, 2,...) and the value from the original sequence:
>
> ~~~
> a_list = ['a', 'b', 'c']  # hypothetical example list
> for idx, val in enumerate(a_list):
>     # do something using idx and val; here we just print them
>     print(idx, val)
> ~~~
> {: .language-python}
>
> The code above loops through `a_list`, assigning the index to `idx` and the value to `val`.
>
> Suppose you have encoded a polynomial as a list of coefficients in
> the following way: the first element is the constant term, the
> second element is the coefficient of the linear term, the third is the
> coefficient of the quadratic term, etc.
>
> ~~~
> x = 5
> coefs = [2, 4, 3]
> y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2
> print(y)
> ~~~
> {: .language-python}
>
> ~~~
> 97
> ~~~
> {: .output}
>
> Write a loop using `enumerate(coefs)` which computes the value `y` of any
> polynomial, given `x` and `coefs`.
>
> > ## Solution
> > ~~~
> > y = 0
> > for idx, coef in enumerate(coefs):
> >     y = y + coef * x**idx
> > print(y)
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
{% include links.md %}
---
title: Storing Multiple Values in Lists
teaching: 30
exercises: 15
questions:
- "How can I store many values together?"
objectives:
- "Explain what a list is."
- "Create and index lists of simple values."
- "Change the values of individual elements."
- "Append values to an existing list."
- "Reorder and slice list elements."
- "Create and manipulate nested lists."
keypoints:
- "`[value1, value2, value3, ...]` creates a list."
- "Lists can contain any Python object, including lists (i.e., list of lists)."
- "Lists are indexed and sliced with square brackets (e.g., `list[0]` and
`list[2:9]`), in the same way as strings and arrays."
- "Lists are mutable (i.e., their values can be changed in place)."
- "Strings are immutable (i.e., the characters in them cannot be changed)."
---
Similar to a string that can contain many characters, a list is a container that can store many values.
Unlike NumPy arrays,
lists are built into the language (so we don't have to load a library
to use them).
We create a list by putting values inside square brackets and separating the values with commas:
~~~
odds = [1, 3, 5, 7]
print('odds are:', odds)
~~~
{: .language-python}
~~~
odds are: [1, 3, 5, 7]
~~~
{: .output}
We can access elements of a list using indices -- numbered positions of elements in the list.
These positions are numbered starting at 0, so the first element has an index of 0.
~~~
print('first element:', odds[0])
print('last element:', odds[3])
print('"-1" element:', odds[-1])
~~~
{: .language-python}
~~~
first element: 1
last element: 7
"-1" element: 7
~~~
{: .output}
Yes, we can use negative numbers as indices in Python. When we do so, the index `-1` gives us the
last element in the list, `-2` the second to last, and so on.
Because of this, `odds[3]` and `odds[-1]` point to the same element here.
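One way to convince yourself of this rule is to compare each negative index `-k` with the positive index `len(odds) - k`. The sketch below prints both side by side for every position in `odds`:

~~~
odds = [1, 3, 5, 7]
# odds[-k] counts k positions back from the end,
# so it refers to the same element as odds[len(odds) - k]
for k in range(1, len(odds) + 1):
    print(-k, odds[-k], odds[len(odds) - k])
~~~
{: .language-python}

~~~
-1 7 7
-2 5 5
-3 3 3
-4 1 1
~~~
{: .output}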
If we loop over a list, the loop variable is assigned to its elements one at a time:
~~~
for number in odds:
    print(number)
~~~
{: .language-python}
~~~
1
3
5
7
~~~
{: .output}
There is one important difference between lists and strings:
we can change the values in a list,
but we cannot change individual characters in a string.
For example:
~~~
names = ['Curie', 'Darwing', 'Turing'] # typo in Darwin's name
print('names is originally:', names)
names[1] = 'Darwin' # correct the name
print('final value of names:', names)
~~~
{: .language-python}
~~~
names is originally: ['Curie', 'Darwing', 'Turing']
final value of names: ['Curie', 'Darwin', 'Turing']
~~~
{: .output}
works, but:
~~~
name = 'Darwin'
name[0] = 'd'
~~~
{: .language-python}
~~~
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-220df48aeb2e> in <module>()
      1 name = 'Darwin'
----> 2 name[0] = 'd'
TypeError: 'str' object does not support item assignment
~~~
{: .error}
does not.
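Because a string cannot be modified in place, string methods that seem to "change" a string actually return a new one. Here is a minimal sketch using the built-in `str.replace` method:

~~~
name = 'Darwin'
# replace() leaves the original string untouched and returns a new string
new_name = name.replace('D', 'd')
print(name)
print(new_name)
# to "change" the variable, we re-assign the name to the new string
name = new_name
print(name)
~~~
{: .language-python}

~~~
Darwin
darwin
darwin
~~~
{: .output}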
> ## Ch-Ch-Ch-Ch-Changes
>
> Data which can be modified in place is called [mutable]({{ page.root }}/reference/#mutable),
> while data which cannot be modified is called [immutable]({{ page.root }}/reference/#immutable).
> Strings and numbers are immutable. This does not mean that variables with string or number values
> are constants, but when we want to change the value of a string or number variable, we can only
> replace the old value with a completely new value.
>
> Lists and arrays, on the other hand, are mutable: we can modify them after they have been
> created. We can change individual elements, append new elements, or reorder the whole list. For
> some operations, like sorting, we can choose whether to use a function that modifies the data
> in-place or a function that returns a modified copy and leaves the original unchanged.
>
> Be careful when modifying data in-place. If two variables refer to the same list, and you modify
> the list value, it will change for both variables!
>
> ~~~
> salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
> my_salsa = salsa # <-- my_salsa and salsa point to the *same* list data in memory
> salsa[0] = 'hot peppers'
> print('Ingredients in my salsa:', my_salsa)
> ~~~
> {: .language-python}
>
> ~~~
> Ingredients in my salsa: ['hot peppers', 'onions', 'cilantro', 'tomatoes']
> ~~~
> {: .output}
>
> If you want variables with mutable values to be independent, you
> must make a copy of the value when you assign it.
>
> ~~~
> salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
> my_salsa = list(salsa) # <-- makes a *copy* of the list
> salsa[0] = 'hot peppers'
> print('Ingredients in my salsa:', my_salsa)
> ~~~
> {: .language-python}
>
> ~~~
> Ingredients in my salsa: ['peppers', 'onions', 'cilantro', 'tomatoes']
> ~~~
> {: .output}
>
> Because of pitfalls like this, code which modifies data in place can be more difficult to
> understand. However, it is often far more efficient to modify a large data structure in place
> than to create a modified copy for every small change. You should consider both of these aspects
> when writing your code.
{: .callout}
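Sorting is a good example of the choice mentioned in the callout above. As a sketch, the built-in function `sorted` returns a new sorted list and leaves the original alone. The list method `sort` instead reorders the list in place and returns `None`:

~~~
salsa = ['peppers', 'onions', 'cilantro', 'tomatoes']
ordered = sorted(salsa)    # new list; salsa itself is not changed
print('sorted copy:', ordered)
print('original:', salsa)
result = salsa.sort()      # sorts salsa in place
print('after sort():', salsa)
print('sort() returned:', result)
~~~
{: .language-python}

~~~
sorted copy: ['cilantro', 'onions', 'peppers', 'tomatoes']
original: ['peppers', 'onions', 'cilantro', 'tomatoes']
after sort(): ['cilantro', 'onions', 'peppers', 'tomatoes']
sort() returned: None
~~~
{: .output}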
> ## Nested Lists
> Since a list can contain any Python object, it can even contain other lists.
>
> For example, we could represent the products in the shelves of a small grocery shop:
>
> ~~~
> x = [['pepper', 'zucchini', 'onion'],
>      ['cabbage', 'lettuce', 'garlic'],
>      ['apple', 'pear', 'banana']]
> ~~~
> {: .language-python}
>
> Here is a visual example of how indexing a list of lists `x` works:
>
> [![x is represented as a pepper shaker containing several packets of pepper. [x[0]] is represented
> as a pepper shaker containing a single packet of pepper. x[0] is represented as a single packet of
> pepper. x[0][0] is represented as single grain of pepper. Adapted
> from @hadleywickham.](../fig/indexing_lists_python.png)][hadleywickham-tweet]
>
> Using the previously declared list `x`, these would be the results of the
> index operations shown in the image:
>
> ~~~
> print([x[0]])
> ~~~
> {: .language-python}
>
> ~~~
> [['pepper', 'zucchini', 'onion']]
> ~~~
> {: .output}
>
> ~~~
> print(x[0])
> ~~~
> {: .language-python}
>
> ~~~
> ['pepper', 'zucchini', 'onion']
> ~~~
> {: .output}
>
> ~~~
> print(x[0][0])
> ~~~
> {: .language-python}
>
> ~~~
> pepper
> ~~~
> {: .output}
>
> Thanks to [Hadley Wickham][hadleywickham-tweet]
> for the image above.
{: .callout}
> ## Heterogeneous Lists
> Lists in Python can contain elements of different types. Example:
> ~~~
> sample_ages = [10, 12.5, 'Unknown']
> ~~~
> {: .language-python}
{: .callout}
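Each element keeps its own type. The built-in `type` function makes this visible. The sketch below prints every element of the example list next to its type:

~~~
sample_ages = [10, 12.5, 'Unknown']
for value in sample_ages:
    print(value, type(value))
~~~
{: .language-python}

~~~
10 <class 'int'>
12.5 <class 'float'>
Unknown <class 'str'>
~~~
{: .output}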
There are many ways to change the contents of lists besides assigning new values to
individual elements:
~~~
odds.append(11)
print('odds after adding a value:', odds)
~~~
{: .language-python}
~~~
odds after adding a value: [1, 3, 5, 7, 11]
~~~
{: .output}
~~~
removed_element = odds.pop(0)
print('odds after removing the first element:', odds)
print('removed_element:', removed_element)
~~~
{: .language-python}
~~~
odds after removing the first element: [3, 5, 7, 11]
removed_element: 1
~~~
{: .output}
~~~
odds.reverse()
print('odds after reversing:', odds)
~~~
{: .language-python}
~~~
odds after reversing: [11, 7, 5, 3]
~~~
{: .output}
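A few other built-in list methods also modify a list in place. The following is a sketch that starts from the value `odds` has at this point in the episode:

~~~
odds = [11, 7, 5, 3]   # value of odds after the steps above
odds.insert(0, 13)     # insert(index, value): put 13 before position 0
print(odds)
odds.remove(7)         # remove(value): delete the first element equal to 7
print(odds)
odds.extend([1, 9])    # extend(sequence): append every element of another list
print(odds)
~~~
{: .language-python}

~~~
[13, 11, 7, 5, 3]
[13, 11, 5, 3]
[13, 11, 5, 3, 1, 9]
~~~
{: .output}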
While modifying in place, it is useful to remember that Python treats lists in a slightly
counter-intuitive way.
As we saw earlier with the `salsa` list, if we make a list, (attempt to) copy it by assigning it to a new name, and then modify the "copy", we can cause all sorts of trouble. The same applies when we modify a list using the methods above:
~~~
odds = [1, 3, 5, 7]
primes = odds
primes.append(2)
print('primes:', primes)
print('odds:', odds)
~~~
{: .language-python}
~~~
primes: [1, 3, 5, 7, 2]
odds: [1, 3, 5, 7, 2]
~~~
{: .output}
This is because Python stores a list in memory, and then can use multiple names to refer to the
same list. If all we want to do is copy a (simple) list, we can again use the `list` function, so we do
not modify a list we did not mean to:
~~~
odds = [1, 3, 5, 7]
primes = list(odds)
primes.append(2)
print('primes:', primes)
print('odds:', odds)
~~~
{: .language-python}
~~~
primes: [1, 3, 5, 7, 2]
odds: [1, 3, 5, 7]
~~~
{: .output}
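The word "simple" above matters. `list(...)` makes a *shallow* copy, so in a list of lists the inner lists are still shared between the original and the copy. The minimal sketch below uses the `is` operator, which checks whether two names refer to the very same object, and `deepcopy` from the standard-library `copy` module:

~~~
import copy

odds = [1, 3, 5, 7]
alias = odds              # a second name for the same list
copied = list(odds)       # a new list containing the same elements
print(alias is odds)      # True
print(copied is odds)     # False

shelves = [['pepper', 'onion'], ['apple', 'pear']]
shallow = list(shelves)          # new outer list, same inner lists
deep = copy.deepcopy(shelves)    # inner lists are copied as well
shelves[0][0] = 'hot pepper'
print(shallow[0][0])      # hot pepper: the inner list is shared
print(deep[0][0])         # pepper: the deep copy is independent
~~~
{: .language-python}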
> ## Turn a String Into a List
>
> Use a for-loop to convert the string "hello" into a list of letters:
>
> ~~~
> ["h", "e", "l", "l", "o"]
> ~~~
> {: .language-python}
>
> Hint: You can create an empty list like this:
>
> ~~~
> my_list = []
> ~~~
> {: .language-python}
>
> > ## Solution
> > ~~~
> > my_list = []
> > for char in "hello":
> >     my_list.append(char)
> > print(my_list)
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
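Once you have written the loop yourself, you may find it useful to know that the built-in `list` function performs the same conversion in a single call. It builds a list from any sequence it is given:

~~~
letters = list("hello")
print(letters)
~~~
{: .language-python}

~~~
['h', 'e', 'l', 'l', 'o']
~~~
{: .output}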
Subsets of lists and strings can be accessed by specifying ranges of values in brackets,
similar to how we accessed ranges of positions in a NumPy array.
This is commonly referred to as "slicing" the list/string.
A slice `[start:stop]` includes the element at index `start` and stops just before index `stop`.
~~~
binomial_name = "Drosophila melanogaster"
group = binomial_name[0:10]
print("group:", group)
species = binomial_name[11:23]
print("species:", species)
chromosomes = ["X", "Y", "2", "3", "4"]
autosomes = chromosomes[2:5]
print("autosomes:", autosomes)
last = chromosomes[-1]
print("last:", last)
~~~
{: .language-python}
~~~
group: Drosophila
species: melanogaster
autosomes: ['2', '3', '4']
last: 4
~~~
{: .output}
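A slice includes its start index but not its stop index. As a result, a slice `[start:stop]` whose indices both lie within the sequence has exactly `stop - start` elements, and two slices that meet at the same index fit together with nothing lost or duplicated. Here is a short check, reusing the string from the example above:

~~~
binomial_name = "Drosophila melanogaster"
start = 0
stop = 10
group = binomial_name[start:stop]
print(len(group) == stop - start)  # True: 10 characters
# splitting at index 11 and joining the halves gives back the whole string
k = 11
rejoined = binomial_name[0:k] + binomial_name[k:len(binomial_name)]
print(rejoined == binomial_name)   # True
~~~
{: .language-python}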
> ## Slicing From the End
>
> Use slicing to access only the last four characters of a string or entries of a list.
>
> ~~~
> string_for_slicing = "Observation date: 02-Feb-2013"
> list_for_slicing = [["fluorine", "F"],
>                     ["chlorine", "Cl"],
>                     ["bromine", "Br"],
>                     ["iodine", "I"],
>                     ["astatine", "At"]]
> ~~~
> {: .language-python}
>
> ~~~
> '2013'
> [['chlorine', 'Cl'], ['bromine', 'Br'], ['iodine', 'I'], ['astatine', 'At']]
> ~~~
> {: .output}
>
> Would your solution work regardless of whether you knew beforehand
> the length of the string or list
> (e.g. if you wanted to apply the solution to a set of lists of different lengths)?
> If not, try to change your approach to make it more robust.
>
> Hint: Remember that indices can be negative as well as positive.
>
> > ## Solution
> > Use negative indices to count elements from the end of a container (such as list or string):
> >
> > ~~~
> > string_for_slicing[-4:]
> > list_for_slicing[-4:]
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
> ## Non-Continuous Slices
>
> So far we've seen how to use slicing to take single blocks
> of successive entries from a sequence.
> But what if we want to take a subset of entries
> that aren't next to each other in the sequence?
>
> You can achieve this by providing a third argument
> to the range within the brackets, called the _step size_.
> The example below shows how you can take every third entry in a list:
>
> ~~~
> primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
> subset = primes[0:12:3]
> print("subset", subset)
> ~~~
> {: .language-python}
>
> ~~~
> subset [2, 7, 17, 29]
> ~~~
> {: .output}
>
> Notice that the slice taken begins with the first entry in the range,
> followed by entries taken at equally-spaced intervals (the steps) thereafter.
> If you wanted to begin the subset with the third entry,
> you would need to specify that as the starting point of the sliced range:
>
> ~~~
> primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
> subset = primes[2:12:3]
> print("subset", subset)
> ~~~
> {: .language-python}
>
> ~~~
> subset [5, 13, 23, 37]
> ~~~
> {: .output}
>
> Use the step size argument to create a new string
> that contains only every other character in the string
> "In an octopus's garden in the shade"
>
> ~~~
> beatles = "In an octopus's garden in the shade"
> ~~~
> {: .language-python}
>
> ~~~
> I notpssgre ntesae
> ~~~
> {: .output}
>
> > ## Solution
> > To obtain every other character you need to provide a slice with the step
> > size of 2:
> >
> > ~~~
> > beatles[0:35:2]
> > ~~~
> > {: .language-python}
> >
> > You can also leave out the beginning and end of the slice to take the whole string
> > and provide only the step argument to go every second
> > element:
> >
> > ~~~
> > beatles[::2]
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
If you want to take a slice from the beginning of a sequence, you can omit the first index in the
range:
~~~
date = "Monday 4 January 2016"
day = date[0:6]
print("Using 0 to begin range:", day)
day = date[:6]
print("Omitting beginning index:", day)
~~~
{: .language-python}
~~~
Using 0 to begin range: Monday
Omitting beginning index: Monday
~~~
{: .output}
And similarly, you can omit the ending index in the range to take a slice to the very end of the
sequence:
~~~
months = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]
sond = months[8:12]
print("With known last position:", sond)
sond = months[8:len(months)]
print("Using len() to get last entry:", sond)
sond = months[8:]
print("Omitting ending index:", sond)
~~~
{: .language-python}
~~~
With known last position: ['sep', 'oct', 'nov', 'dec']
Using len() to get last entry: ['sep', 'oct', 'nov', 'dec']
Omitting ending index: ['sep', 'oct', 'nov', 'dec']
~~~
{: .output}
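Omitting *both* indices gives a slice of the whole sequence. For a list, `[:]` is therefore another common way to make a (shallow) copy, with the same effect as calling `list(...)`:

~~~
months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
months_copy = months[:]       # a new list with the same elements
months_copy[0] = "JAN"        # changing the copy...
print(months[0])              # ...leaves the original alone: jan
print(months_copy is months)  # False
~~~
{: .language-python}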
> ## Overloading
>
> `+` usually means addition, but when used on strings or lists, it means "concatenate".
> Given that, what do you think the multiplication operator `*` does on lists?
> In particular, what will be the output of the following code?
>
> ~~~
> counts = [2, 4, 6, 8, 10]
> repeats = counts * 2
> print(repeats)
> ~~~
> {: .language-python}
>
> 1. `[2, 4, 6, 8, 10, 2, 4, 6, 8, 10]`
> 2. `[4, 8, 12, 16, 20]`
> 3. `[[2, 4, 6, 8, 10],[2, 4, 6, 8, 10]]`
> 4. `[2, 4, 6, 8, 10, 4, 8, 12, 16, 20]`
>
> The technical term for this is *operator overloading*:
> a single operator, like `+` or `*`,
> can do different things depending on what it's applied to.
>
> > ## Solution
> >
> > The multiplication operator `*` used on a list replicates elements of the list and concatenates
> > them together:
> >
> > ~~~
> > [2, 4, 6, 8, 10, 2, 4, 6, 8, 10]
> > ~~~
> > {: .output}
> >
> > It's equivalent to:
> >
> > ~~~
> > counts + counts
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
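The same two operators are overloaded for strings as well. In the quick sketch below, `*` repeats and `+` concatenates for both strings and lists:

~~~
repeated_text = 'ab' * 3     # * repeats a string
zeros = [0] * 4              # * repeats a list; handy for a list of zeros
joined_text = 'ab' + 'cd'    # + concatenates strings
joined_list = [1, 2] + [3]   # + concatenates lists
print(repeated_text, zeros, joined_text, joined_list)
~~~
{: .language-python}

~~~
ababab [0, 0, 0, 0] abcd [1, 2, 3]
~~~
{: .output}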
[hadleywickham-tweet]: https://twitter.com/hadleywickham/status/643381054758363136
{% include links.md %}
---
title: Coding Style
teaching: 20
exercises: 10
questions:
- "How should I organise my code?"
- "What are some practical steps I can take to improve the quality and readability of my scripts?"
- "What tools exist to help me follow good coding style?"
objectives:
- "Write and adjust code to follow standards of style and organisation."
- "Use a linter to check their code and modify it to follow PEP8."
- "Provide sufficient documentation for their functions and scripts."
keypoints:
- "It is easier to read and maintain scripts and Jupyter notebooks that are well organised."
- "The most commonly-used style guide for Python is detailed in PEP8."
- "Linters such as `flake8` can check code against style standards, and formatters such as `black` can rewrite code to follow them."
- "The rules and standards should be followed within reason, but exceptions can be made according to your best judgement."
---
## plan
- Toby currently scheduled to lead this session
- can base a lot of this on https://merely-useful.github.io/py-rse/py-rse-style.html
- something on project structure and file organization (?)
  - especially relevant if planning to make a Python package
- code organisation & jargon (packages, modules, files, classes, functions)
  - a word about avoiding circular imports (?)
- PEP8
  - `pycodestyle`/`pylint` - only warn, don't modify code - [see also this comparison](https://books.agiliq.com/projects/essential-python-tools/en/latest/linters.html)
  - `black` - modifies code - note still [**beta**](https://github.com/psf/black#note-this-is-a-beta-product)
- documentation
  - docstrings
  - `sphinx`?
- include tips for good Jupyter hygiene
  - name the notebook before you do anything else!
  - be careful with cell order
  - clear output before saving
{% include links.md %}
---
title: Coding Challenges
teaching: 20
exercises: 10
questions:
- "How can I practice the new skills I've learned?"
objectives:
- "Apply their Python skills to solve more extensive challenges."
keypoints:
- "There are many coding challenges to be found online, which can be used to exercise your Python skills."
---
## plan
- Advent of Code
- Rosalind
- must recommend/suggest challenges that use what we've covered in the previous material
{% include links.md %}
---
title: Analyzing Data from Multiple Files
teaching: 20
exercises: 0
questions:
- "How can I do the same operations on many different files?"
objectives:
- "Use a library function to get a list of filenames that match a wildcard pattern."
- "Write a `for` loop to process multiple files."
keypoints:
- "Use `glob.glob(pattern)` to create a list of files whose names match a pattern."
- "Use `*` in a pattern to match zero or more characters, and `?` to match any single character."
---
We now have almost everything we need to process all our data files.
The only thing that's missing is a library with a rather unpleasant name:
~~~
import glob
~~~
{: .language-python}
The `glob` library contains a function, also called `glob`,
that finds files and directories whose names match a pattern.
We provide those patterns as strings:
the character `*` matches zero or more characters,
while `?` matches any one character.
We can use this to get the names of all the CSV files in the current directory:
~~~
print(glob.glob('inflammation*.csv'))
~~~
{: .language-python}
~~~
['inflammation-05.csv', 'inflammation-11.csv', 'inflammation-12.csv', 'inflammation-08.csv',
'inflammation-03.csv', 'inflammation-06.csv', 'inflammation-09.csv', 'inflammation-07.csv',
'inflammation-10.csv', 'inflammation-02.csv', 'inflammation-04.csv', 'inflammation-01.csv']
~~~
{: .output}
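The two wildcards behave differently, and the order of the results is not guaranteed. The sketch below lets you compare them without needing the lesson's data. It creates a temporary directory holding a few empty files (the file names are made up for illustration) and matches them with `*` and with `?`. It also uses `sorted` for a predictable order and `glob.escape` to stop the temporary directory's own path from being read as a pattern:

~~~
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # create some empty files with made-up names
    for name in ['inflammation-01.csv', 'inflammation-02.csv',
                 'inflammation-10.csv', 'small-01.csv', 'notes.txt']:
        open(os.path.join(tmpdir, name), 'w').close()

    folder = glob.escape(tmpdir)
    # * matches zero or more characters
    star = sorted(glob.glob(os.path.join(folder, 'inflammation*.csv')))
    # ? matches exactly one character: '0?' matches 01 and 02, but not 10
    question = sorted(glob.glob(os.path.join(folder, 'inflammation-0?.csv')))

    # keep just the file names, dropping the temporary directory part
    star_names = [os.path.basename(path) for path in star]
    question_names = [os.path.basename(path) for path in question]

print(star_names)
print(question_names)
~~~
{: .language-python}

~~~
['inflammation-01.csv', 'inflammation-02.csv', 'inflammation-10.csv']
['inflammation-01.csv', 'inflammation-02.csv']
~~~
{: .output}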
As this example shows,
`glob.glob`'s result is a list of file and directory paths in arbitrary order.
This means we can loop over it
to do something with each filename in turn.
In our case,
the "something" we want to do is generate a set of plots for each file in our inflammation dataset.
If we want to start by analyzing just the first three files in alphabetical order, we can use the
`sorted` built-in function to generate a new sorted list from the `glob.glob` output:
~~~
import glob
import numpy
import matplotlib.pyplot
filenames = sorted(glob.glob('inflammation*.csv'))
filenames = filenames[0:3]
for filename in filenames:
    print(filename)
    data = numpy.loadtxt(fname=filename, delimiter=',')
    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)
    axes1.set_ylabel('average')
    axes1.plot(numpy.mean(data, axis=0))
    axes2.set_ylabel('max')
    axes2.plot(numpy.max(data, axis=0))
    axes3.set_ylabel('min')
    axes3.plot(numpy.min(data, axis=0))
    fig.tight_layout()
    matplotlib.pyplot.show()
~~~
{: .language-python}
~~~
inflammation-01.csv
~~~
{: .output}
![Analysis of inflammation-01.csv](../fig/03-loop_49_1.png)
~~~
inflammation-02.csv
~~~
{: .output}
![Analysis of inflammation-02.csv](../fig/03-loop_49_3.png)
~~~
inflammation-03.csv
~~~
{: .output}
![Analysis of inflammation-03.csv](../fig/03-loop_49_5.png)
Sure enough,
the maxima of the first two datasets show exactly the same ramp,
and their minima show the same staircase structure.
The third dataset reveals a different situation:
its maxima are a bit less regular, but its minima are consistently zero.
> ## Plotting Differences
>
> Plot the difference between the average inflammations reported in the first and second datasets
> (stored in `inflammation-01.csv` and `inflammation-02.csv`, respectively),
> i.e., the difference between the leftmost plots of the first two figures.
>
> > ## Solution
> > ~~~
> > import glob
> > import numpy
> > import matplotlib.pyplot
> >
> > filenames = sorted(glob.glob('inflammation*.csv'))
> >
> > data0 = numpy.loadtxt(fname=filenames[0], delimiter=',')
> > data1 = numpy.loadtxt(fname=filenames[1], delimiter=',')
> >
> > fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
> >
> > matplotlib.pyplot.ylabel('Difference in average')
> > matplotlib.pyplot.plot(numpy.mean(data0, axis=0) - numpy.mean(data1, axis=0))
> >
> > fig.tight_layout()
> > matplotlib.pyplot.show()
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
> ## Generate Composite Statistics
>
> Use each of the files once to generate a dataset containing values averaged over all patients:
>
> ~~~
> import glob
> import numpy
>
> filenames = glob.glob('inflammation*.csv')
> composite_data = numpy.zeros((60, 40))
> for filename in filenames:
>     # sum each new file's data into composite_data as it's read
>     pass  # replace this line with your code
> # and then divide the composite_data by number of samples
> composite_data = composite_data / len(filenames)
> ~~~
> {: .language-python}
>
> Then use `pyplot` to plot the average, max, and min for all patients.
>
> > ## Solution
> > ~~~
> > import glob
> > import numpy
> > import matplotlib.pyplot
> >
> > filenames = glob.glob('inflammation*.csv')
> > composite_data = numpy.zeros((60, 40))
> >
> > for filename in filenames:
> >     data = numpy.loadtxt(fname=filename, delimiter=',')
> >     composite_data = composite_data + data
> >
> > composite_data = composite_data / len(filenames)
> >
> > fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))
> >
> > axes1 = fig.add_subplot(1, 3, 1)
> > axes2 = fig.add_subplot(1, 3, 2)
> > axes3 = fig.add_subplot(1, 3, 3)
> >
> > axes1.set_ylabel('average')
> > axes1.plot(numpy.mean(composite_data, axis=0))
> >
> > axes2.set_ylabel('max')
> > axes2.plot(numpy.max(composite_data, axis=0))
> >
> > axes3.set_ylabel('min')
> > axes3.plot(numpy.min(composite_data, axis=0))
> >
> > fig.tight_layout()
> >
> > matplotlib.pyplot.show()
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
{% include links.md %}
---
title: Making Choices
teaching: 30
exercises: 0
questions:
- "How can my programs do different things based on data values?"
objectives:
- "Write conditional statements including `if`, `elif`, and `else` branches."
- "Correctly evaluate expressions containing `and` and `or`."
keypoints:
- "Use `if condition` to start a conditional statement, `elif condition` to
provide additional tests, and `else` to provide a default."
- "The bodies of the branches of conditional statements must be indented."
- "Use `==` to test for equality."
- "`X and Y` is only true if both `X` and `Y` are true."
- "`X or Y` is true if either `X` or `Y`, or both, are true."
- "Zero, the empty string, and the empty list are considered false;
all other numbers, strings, and lists are considered true."
- "`True` and `False` represent truth values."
---
In our last lesson, we discovered something suspicious was going on
in our inflammation data by drawing some plots.
How can we use Python to automatically recognize the different features we saw,
and take a different action for each? In this lesson, we'll learn how to write code that
runs only when certain conditions are true.
## Conditionals
We can ask Python to take different actions, depending on a condition, with an `if` statement:
~~~
num = 37
if num > 100:
    print('greater')
else:
    print('not greater')
print('done')
~~~
{: .language-python}
~~~
not greater
done
~~~
{: .output}
The second line of this code uses the keyword `if` to tell Python that we want to make a choice.
If the test that follows the `if` statement is true,
the body of the `if`
(i.e., the set of lines indented underneath it) is executed, and "greater" is printed.
If the test is false,
the body of the `else` is executed instead, and "not greater" is printed.
Only one or the other is ever executed before continuing on with program execution to print "done":
![A flowchart diagram of the if-else construct that tests if variable num is greater than 100](../fig/python-flowchart-conditional.png)
Conditional statements don't have to include an `else`.
If there isn't one,
Python simply does nothing if the test is false:
~~~
num = 53
print('before conditional...')
if num > 100:
    print(num, 'is greater than 100')
print('...after conditional')
~~~
{: .language-python}
~~~
before conditional...
...after conditional
~~~
{: .output}
We can also chain several tests together using `elif`,
which is short for "else if".
The following Python code uses `elif` to print the sign of a number.
~~~
num = -3
if num > 0:
    print(num, 'is positive')
elif num == 0:
    print(num, 'is zero')
else:
    print(num, 'is negative')
~~~
{: .language-python}
~~~
-3 is negative
~~~
{: .output}
Note that to test for equality we use a double equals sign `==`
rather than a single equals sign `=` which is used to assign values.
We can also combine tests using `and` and `or`.
`and` is only true if both parts are true:
~~~
if (1 > 0) and (-1 > 0):
    print('both parts are true')
else:
    print('at least one part is false')
~~~
{: .language-python}
~~~
at least one part is false
~~~
{: .output}
while `or` is true if at least one part is true:
~~~
if (1 < 0) or (-1 < 0):
    print('at least one test is true')
~~~
{: .language-python}
~~~
at least one test is true
~~~
{: .output}
> ## `True` and `False`
> `True` and `False` are special values in Python called *Booleans*,
> which represent truth values. An expression such as `1 < 0` evaluates to
> `False`, while `-1 < 0` evaluates to `True`.
{: .callout}
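Because comparisons produce Boolean values, their results can be stored in variables like any other value. The sketch below also prints all four combinations of two truth values under `and` and `or`, so you can check the rules described above:

~~~
num = -3               # a single = assigns a value
is_negative = num < 0  # a comparison evaluates to True or False
print(is_negative)
print(num == 0)        # a double == compares two values

for x in [True, False]:
    for y in [True, False]:
        print(x, y, 'and:', x and y, 'or:', x or y)
~~~
{: .language-python}

~~~
True
False
True True and: True or: True
True False and: False or: True
False True and: False or: True
False False and: False or: False
~~~
{: .output}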
## Checking our Data
Now that we've seen how conditionals work,
we can use them to check for the suspicious features we saw in our inflammation data.
We are about to use functions provided by the `numpy` module again.
Therefore, if you're working in a new Python session, make sure to load the
module with:
~~~
import numpy
~~~
{: .language-python}
From the first couple of plots, we saw that maximum daily inflammation behaves
strangely: it rises by one unit a day.
Wouldn't it be a good idea to detect such behavior and report it as suspicious?
Let's do that!
However, instead of checking every single day of the study, let's merely check
if maximum inflammation in the beginning (day 0) and in the middle (day 20) of
the study are equal to the corresponding day numbers.
~~~
max_inflammation_0 = numpy.max(data, axis=0)[0]
max_inflammation_20 = numpy.max(data, axis=0)[20]
if max_inflammation_0 == 0 and max_inflammation_20 == 20:
    print('Suspicious looking maxima!')
~~~
{: .language-python}
We also saw a different problem in the third dataset;
the minima per day were all zero (looks like a healthy person snuck into our study).
We can also check for this with an `elif` condition:
~~~
elif numpy.sum(numpy.min(data, axis=0)) == 0:
    print('Minima add up to zero!')
~~~
{: .language-python}
And if neither of these conditions is true, we can use `else` to give the all-clear:
~~~
else:
    print('Seems OK!')
~~~
{: .language-python}
Let's test that out:
~~~
data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')
max_inflammation_0 = numpy.max(data, axis=0)[0]
max_inflammation_20 = numpy.max(data, axis=0)[20]
if max_inflammation_0 == 0 and max_inflammation_20 == 20:
    print('Suspicious looking maxima!')
elif numpy.sum(numpy.min(data, axis=0)) == 0:
    print('Minima add up to zero!')
else:
    print('Seems OK!')
~~~
{: .language-python}
~~~
Suspicious looking maxima!
~~~
{: .output}
~~~
data = numpy.loadtxt(fname='inflammation-03.csv', delimiter=',')
max_inflammation_0 = numpy.max(data, axis=0)[0]
max_inflammation_20 = numpy.max(data, axis=0)[20]
if max_inflammation_0 == 0 and max_inflammation_20 == 20:
    print('Suspicious looking maxima!')
elif numpy.sum(numpy.min(data, axis=0)) == 0:
    print('Minima add up to zero!')
else:
    print('Seems OK!')
~~~
{: .language-python}
~~~
Minima add up to zero!
~~~
{: .output}
In this way,
we have asked Python to do something different depending on the condition of our data.
Here we printed messages in all cases,
but we could also imagine not using the `else` catch-all
so that messages are only printed when something is wrong,
freeing us from having to manually examine every plot for features we've seen before.
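Here is a sketch of that idea, assuming NumPy is installed. The three small arrays are made-up stand-ins for data files; a real analysis would instead load each file with `numpy.loadtxt` inside a loop over `glob.glob` results, as in the previous episode. The loop applies the same two checks as above but has no `else`, so only the problem datasets are reported:

~~~
import numpy

# made-up datasets: 2 "patients" x 21 "days" each
days = numpy.arange(21)
ramp = numpy.array([days, days // 2])               # daily maximum equals the day number
zero_min = numpy.array([days % 3 + 1, 0 * days])    # daily minimum is always zero
looks_ok = numpy.array([days % 3 + 1, days % 2 + 1])

problems = []
for name, data in [('ramp', ramp), ('zero_min', zero_min), ('looks_ok', looks_ok)]:
    max_inflammation_0 = numpy.max(data, axis=0)[0]
    max_inflammation_20 = numpy.max(data, axis=0)[20]
    if max_inflammation_0 == 0 and max_inflammation_20 == 20:
        print(name + ': Suspicious looking maxima!')
        problems.append(name)
    elif numpy.sum(numpy.min(data, axis=0)) == 0:
        print(name + ': Minima add up to zero!')
        problems.append(name)
    # no else branch: datasets that pass both checks print nothing
~~~
{: .language-python}

~~~
ramp: Suspicious looking maxima!
zero_min: Minima add up to zero!
~~~
{: .output}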
> ## How Many Paths?
>
> Consider this code:
>
> ~~~
> if 4 > 5:
>     print('A')
> elif 4 == 5:
>     print('B')
> elif 4 < 5:
>     print('C')
> ~~~
> {: .language-python}
>
> Which of the following would be printed if you were to run this code?
> Why did you pick this answer?
>
> 1. A
> 2. B
> 3. C
> 4. B and C
>
> > ## Solution
> > C gets printed because the first two conditions, `4 > 5` and `4 == 5`, are not true,
> > but `4 < 5` is true.
> {: .solution}
{: .challenge}
> ## What Is Truth?
>
> `True` and `False` booleans are not the only values in Python that are true and false.
> In fact, *any* value can be used in an `if` or `elif`.
> After reading and running the code below,
> explain what the rule is for which values are considered true and which are considered false.
>
> ~~~
> if '':
>     print('empty string is true')
> if 'word':
>     print('word is true')
> if []:
>     print('empty list is true')
> if [1, 2, 3]:
>     print('non-empty list is true')
> if 0:
>     print('zero is true')
> if 1:
>     print('one is true')
> ~~~
> {: .language-python}
{: .challenge}
> ## That's Not Not What I Meant
>
> Sometimes it is useful to check whether some condition is not true.
> The Boolean operator `not` can do this explicitly.
> After reading and running the code below,
> write some `if` statements that use `not` to test the rule
> that you formulated in the previous challenge.
>
> ~~~
> if not '':
>     print('empty string is not true')
> if not 'word':
>     print('word is not true')
> if not not True:
>     print('not not True is true')
> ~~~
> {: .language-python}
{: .challenge}
> ## Close Enough
>
> Write some conditions that print `True` if the variable `a` is within 10% of the variable `b`
> and `False` otherwise.
> Compare your implementation with your partner's:
> do you get the same answer for all possible pairs of numbers?
>
> > ## Solution 1
> > ~~~
> > a = 5
> > b = 5.1
> >
> > if abs(a - b) < 0.1 * abs(b):
> >     print('True')
> > else:
> >     print('False')
> > ~~~
> > {: .language-python}
> {: .solution}
>
> > ## Solution 2
> > ~~~
> > print(abs(a - b) < 0.1 * abs(b))
> > ~~~
> > {: .language-python}
> >
> > This works because the Booleans `True` and `False`
> > have string representations which can be printed.
> {: .solution}
{: .challenge}
> ## In-Place Operators
>
> Python (and most other languages in the C family) provides
> [in-place operators]({{ page.root }}/reference/#in-place-operators)
> that work like this:
>
> ~~~
> x = 1 # original value
> x += 1 # add one to x, assigning result back to x
> x *= 3 # multiply x by 3
> print(x)
> ~~~
> {: .language-python}
>
> ~~~
> 6
> ~~~
> {: .output}
>
> Write some code that sums the positive and negative numbers in a list separately,
> using in-place operators.
> Do you think the result is more or less readable
> than writing the same without in-place operators?
>
> > ## Solution
> > ~~~
> > positive_sum = 0
> > negative_sum = 0
> > test_list = [3, 4, 6, 1, -1, -5, 0, 7, -8]
> > for num in test_list:
> >     if num > 0:
> >         positive_sum += num
> >     elif num == 0:
> >         pass
> >     else:
> >         negative_sum += num
> > print(positive_sum, negative_sum)
> > ~~~
> > {: .language-python}
> >
> > Here `pass` means "don't do anything".
> > In this particular case, it's not actually needed, since if `num == 0` neither
> > sum needs to change, but it illustrates the use of `elif` and `pass`.
> {: .solution}
{: .challenge}
> ## Sorting a List Into Buckets
>
> In our `data` folder, large data sets are stored in files whose names start with
> "inflammation-" and small data sets in files whose names start with "small-". We
> also have some other files that we do not care about at this point. We'd like to break all
> these files into three lists called `large_files`, `small_files`, and `other_files`,
> respectively.
>
> Add code to the template below to do this. Note that the string method
> [`startswith`](https://docs.python.org/3/library/stdtypes.html#str.startswith)
> returns `True` if and only if the string it is called on starts with the string
> passed as an argument, that is:
>
> ~~~
> "String".startswith("Str")
> ~~~
> {: .language-python}
> ~~~
> True
> ~~~
> {: .output}
> But
> ~~~
> "String".startswith("str")
> ~~~
> {: .language-python}
> ~~~
> False
> ~~~
> {: .output}
> Use the following Python code as your starting point:
> ~~~
> filenames = ['inflammation-01.csv',
>              'myscript.py',
>              'inflammation-02.csv',
>              'small-01.csv',
>              'small-02.csv']
> large_files = []
> small_files = []
> other_files = []
> ~~~
> {: .language-python}
>
> Your solution should:
>
> 1. loop over the names of the files
> 2. figure out which group each filename belongs in
> 3. append the filename to that list
>
> In the end the three lists should be:
>
> ~~~
> large_files = ['inflammation-01.csv', 'inflammation-02.csv']
> small_files = ['small-01.csv', 'small-02.csv']
> other_files = ['myscript.py']
> ~~~
> {: .language-python}
>
> > ## Solution
> > ~~~
> > for filename in filenames:
> >     if filename.startswith('inflammation-'):
> >         large_files.append(filename)
> >     elif filename.startswith('small-'):
> >         small_files.append(filename)
> >     else:
> >         other_files.append(filename)
> >
> > print('large_files:', large_files)
> > print('small_files:', small_files)
> > print('other_files:', other_files)
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
> ## Counting Vowels
>
> 1. Write a loop that counts the number of vowels in a character string.
> 2. Test it on a few individual words and full sentences.
> 3. Once you are done, compare your solution to your neighbor's.
> Did you make the same decisions about how to handle the letter 'y'
> (which some people think is a vowel, and some do not)?
>
> > ## Solution
> > ~~~
> > vowels = 'aeiouAEIOU'
> > sentence = 'Mary had a little lamb.'
> > count = 0
> > for char in sentence:
> >     if char in vowels:
> >         count += 1
> >
> > print("The number of vowels in this string is " + str(count))
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
{% include links.md %}
---
title: Errors and Exceptions
teaching: 30
exercises: 0
questions:
- "How does Python report errors?"
- "How can I handle errors in Python programs?"
objectives:
- "To be able to read a traceback, and determine where the error took place and what type it is."
- "To be able to describe the types of situations in which syntax errors,
indentation errors, name errors, index errors, and missing file errors occur."
keypoints:
- "Tracebacks can look intimidating, but they give us a lot of useful information about
what went wrong in our program, including where the error occurred and
what type of error it was."
- "An error having to do with the 'grammar' or syntax of the program is called a `SyntaxError`.
If the issue has to do with how the code is indented,
then it will be called an `IndentationError`."
- "A `NameError` will occur when trying to use a variable that does not exist. Possible causes are
that a variable definition is missing, a variable reference differs from its definition
in spelling or capitalization, or the code contains a string that is missing quotes around it."
- "Containers like lists and strings will generate errors if you try to access items
in them that do not exist. This type of error is called an `IndexError`."
- "Trying to read a file that does not exist will give you a `FileNotFoundError`.
Trying to read a file that is open for writing, or writing to a file that is open for reading,
will give you an `UnsupportedOperation` error."
---
Every programmer encounters errors,
both those who are just beginning,
and those who have been programming for years.
Encountering errors and exceptions can be very frustrating at times,
and can make coding feel like a hopeless endeavour.
However,
understanding what the different types of errors are
and when you are likely to encounter them can help a lot.
Once you know *why* you get certain types of errors,
they become much easier to fix.
Errors in Python have a very specific form,
called a [traceback]({{ page.root }}/reference/#traceback).
Let's examine one:
~~~
# This code has an intentional error. You can type it directly or
# use it for reference to understand the error message below.
def favorite_ice_cream():
    ice_creams = [
        "chocolate",
        "vanilla",
        "strawberry"
    ]
    print(ice_creams[3])

favorite_ice_cream()
~~~
{: .language-python}
~~~
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-1-70bd89baa4df> in <module>()
6 print(ice_creams[3])
7
----> 8 favorite_ice_cream()
<ipython-input-1-70bd89baa4df> in favorite_ice_cream()
4 "vanilla", "strawberry"
5 ]
----> 6 print(ice_creams[3])
7
8 favorite_ice_cream()
IndexError: list index out of range
~~~
{: .error}
This particular traceback has two levels.
You can determine the number of levels by looking for the number of arrows on the left-hand side.
In this case:
1. The first shows code from the cell above,
with an arrow pointing to Line 8 (which is `favorite_ice_cream()`).
2. The second shows some code in the function `favorite_ice_cream`,
with an arrow pointing to Line 6 (which is `print(ice_creams[3])`).
The last level is the actual place where the error occurred.
The other level(s) show what function the program executed to get to the next level down.
So, in this case, the program first performed a
[function call]({{ page.root }}/reference/#function-call) to the function `favorite_ice_cream`.
Inside this function,
the program encountered an error on Line 6, when it tried to run the code `print(ice_creams[3])`.
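If you want to see these levels without reading the printed traceback, Python's standard `traceback` module can list them. This sketch reuses the ice cream function; the `try`/`except` is our addition so the script keeps running. `traceback.extract_tb` returns one entry per level, outermost call first, so the last entry is where the error occurred.

~~~
import traceback


def favorite_ice_cream():
    ice_creams = [
        "chocolate",
        "vanilla",
        "strawberry"
    ]
    print(ice_creams[3])


try:
    favorite_ice_cream()
except IndexError as err:
    # One FrameSummary per traceback level, outermost call first.
    levels = traceback.extract_tb(err.__traceback__)
    for level in levels:
        print(level.name, "line", level.lineno)
~~~
{: .language-python}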
> ## Long Tracebacks
>
> Sometimes, you might see a traceback that is very long:
> it might even be 20 levels deep!
> This can make it seem like something horrible happened,
> but the length of the error message does not reflect severity; rather,
> it indicates that your program called many functions before it encountered the error.
> Most of the time, the actual place where the error occurred is at the bottom-most level,
> so you can skip down the traceback to the bottom.
{: .callout}
So what error did the program actually encounter?
In the last line of the traceback,
Python helpfully tells us the category or type of error (in this case, it is an `IndexError`)
and a more detailed error message (in this case, it says "list index out of range").
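The last line of a traceback has the form `ErrorType: message`. As a small sketch (the helper name `describe_error` is our own, not part of Python), you can rebuild that line from a caught exception with `type(err).__name__` and `str(err)`:

~~~
def describe_error(func):
    """Run func and return the 'Type: message' summary of any error it raises."""
    try:
        func()
    except Exception as err:
        return f"{type(err).__name__}: {err}"
    return "no error"


def favorite_ice_cream():
    ice_creams = ["chocolate", "vanilla", "strawberry"]
    print(ice_creams[3])


print(describe_error(favorite_ice_cream))
# IndexError: list index out of range
~~~
{: .language-python}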
If you encounter an error and don't know what it means,
it is still important to read the traceback closely.
That way,
if you fix the error,
but encounter a new one,
you can tell that the error changed.
Additionally,
sometimes knowing *where* the error occurred is enough to fix it,
even if you don't entirely understand the message.
If you do encounter an error you don't recognize,
try looking at the
[official documentation on errors](http://docs.python.org/3/library/exceptions.html).
However,
note that you may not always be able to find the error there,
as it is possible to create custom errors.
In that case,
hopefully the custom error message is informative enough to help you figure out what went wrong.
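For illustration, here is how such a custom error can be defined and raised. `NegativeWeightError` and `check_weight` are hypothetical names; a custom error is simply a class that inherits from an existing exception type (here `ValueError`), and its traceback ends with the class name and whatever message the author chose.

~~~
class NegativeWeightError(ValueError):
    """Hypothetical custom error for weights below zero."""


def check_weight(weight_kg):
    if weight_kg < 0:
        raise NegativeWeightError(f"weight_kg must not be negative, got {weight_kg}")
    return weight_kg


try:
    check_weight(-60)
except NegativeWeightError as err:
    custom_message = f"{type(err).__name__}: {err}"
    print(custom_message)
    # NegativeWeightError: weight_kg must not be negative, got -60
~~~
{: .language-python}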
## Syntax Errors
When you forget a colon at the end of a line,
accidentally add one space too many when indenting under an `if` statement,
or forget a parenthesis,
you will encounter a [syntax error]({{ page.root }}/reference/#syntax-error).
This means that Python couldn't figure out how to read your program.
This is similar to forgetting punctuation in English:
for example,
this text is difficult to read there is no punctuation there is also no capitalization
why is this hard because you have to figure out where each sentence ends
you also have to figure out where each sentence begins
to some extent it might be ambiguous if there should be a sentence break or not
People can typically figure out what is meant by text with no punctuation,
but people are much smarter than computers.
If Python doesn't know how to read the program,
it will give up and inform you with an error.
For example:
~~~
def some_function()
    msg = "hello, world!"
    print(msg)
     return msg
~~~
{: .language-python}
~~~
File "<ipython-input-3-6bb841ea1423>", line 1
def some_function()
^
SyntaxError: invalid syntax
~~~
{: .error}
Here, Python tells us that there is a `SyntaxError` on line 1,
and even puts a little arrow in the place where there is an issue.
In this case the problem is that the function definition is missing a colon at the end.
Actually, the function above has *two* issues with syntax.
If we fix the problem with the colon,
we see that there is *also* an `IndentationError`,
which means that the lines in the function definition do not all have the same indentation:
~~~
def some_function():
    msg = "hello, world!"
    print(msg)
     return msg
~~~
{: .language-python}
~~~
File "<ipython-input-4-ae290e7659cb>", line 4
return msg
^
IndentationError: unexpected indent
~~~
{: .error}
Both `SyntaxError` and `IndentationError` indicate a problem with the syntax of your program,
but an `IndentationError` is more specific:
it *always* means that there is a problem with how your code is indented.
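One way to check this relationship is a minimal sketch using the built-in `compile` function, which parses source code without running it. The two strings below are the `some_function` examples from this section: the first raises a plain `SyntaxError`, the second an `IndentationError`, and `IndentationError` is a subclass of `SyntaxError`. The exact wording of the messages differs between Python versions, so the sketch prints only the error type and line number.

~~~
missing_colon = (
    "def some_function()\n"
    "    msg = 'hello, world!'\n"
    "    print(msg)\n"
    "    return msg\n"
)
extra_indent = (
    "def some_function():\n"
    "    msg = 'hello, world!'\n"
    "    print(msg)\n"
    "     return msg\n"
)

caught = {}
for label, source in [("missing colon", missing_colon), ("extra indent", extra_indent)]:
    try:
        compile(source, "<example>", "exec")  # parse only; nothing is executed
    except SyntaxError as err:  # also catches IndentationError, a subclass
        caught[label] = err
        print(label, "->", type(err).__name__, "on line", err.lineno)

print(issubclass(IndentationError, SyntaxError))  # True
~~~
{: .language-python}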
> ## Tabs and Spaces
>
> Some indentation errors are harder to spot than others.
> In particular, mixing spaces and tabs can be difficult to spot
> because they are both [whitespace]({{ page.root }}/reference/#whitespace).
> In the example below, the first two lines in the body of the function
> `some_function` are indented with tabs, while the third line is indented with spaces.
> If you're working in a Jupyter notebook, be sure to copy and paste this example
> rather than trying to type it in manually because Jupyter automatically replaces
> tabs with spaces.
>
> ~~~
> def some_function():
> 	msg = "hello, world!"
> 	print(msg)
>         return msg
> ~~~
> {: .language-python}
>
> Visually it is impossible to spot the error.
> Fortunately, Python does not allow you to mix tabs and spaces.
>
> ~~~
> File "<ipython-input-5-653b36fbcd41>", line 4
> return msg
> ^
> TabError: inconsistent use of tabs and spaces in indentation
> ~~~
> {: .error}
{: .callout}
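Because tabs are invisible, writing them as the escape sequence `\t` makes the example above explicit. In this sketch the first two body lines start with a tab and the last with eight spaces; `compile` rejects the source with a `TabError`, which is itself a kind of `IndentationError`.

~~~
mixed = (
    "def some_function():\n"
    "\tmsg = 'hello, world!'\n"  # indented with a tab
    "\tprint(msg)\n"             # indented with a tab
    "        return msg\n"       # indented with eight spaces
)

error = None
try:
    compile(mixed, "<tabs-and-spaces>", "exec")  # parse only; nothing is executed
except TabError as err:
    error = err
    print(type(err).__name__, "on line", err.lineno)

print(issubclass(TabError, IndentationError))  # True
~~~
{: .language-python}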
## Variable Name Errors
Another very common type of error is called a `NameError`,
and occurs when you try to use a variable that does not exist.
For example:
~~~
print(a)
~~~
{: .language-python}
~~~
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-7-9d7b17ad5387> in <module>()
----> 1 print(a)
NameError: name 'a' is not defined
~~~
{: .error}
Variable name errors come with some of the most informative error messages,
which are usually of the form "name 'the_variable_name' is not defined".
Why does this error message occur?
That's a harder question to answer,
because it depends on what your code is supposed to do.
However,
there are a few very common reasons why you might have an undefined variable.
The first is that you meant to use a
[string]({{ page.root }}/reference/#string), but forgot to put quotes around it:
~~~
print(hello)
~~~
{: .language-python}
~~~
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-8-9553ee03b645> in <module>()
----> 1 print(hello)
NameError: name 'hello' is not defined
~~~
{: .error}
The second reason is that you might be trying to use a variable that does not yet exist.
In the following example,
`count` should have been defined (e.g., with `count = 0`) before the for loop:
~~~
for number in range(10):
    count = count + number
print("The count is:", count)
~~~
{: .language-python}
~~~
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-9-dd6a12d7ca5c> in <module>()
1 for number in range(10):
----> 2 count = count + number
3 print("The count is:", count)
NameError: name 'count' is not defined
~~~
{: .error}
Finally, the third possibility is that you made a typo when you were writing your code.
Let's say we fixed the error above by adding the line `Count = 0` before the for loop.
Frustratingly, this actually does not fix the error.
Remember that variables are [case-sensitive]({{ page.root }}/reference/#case-sensitive),
so the variable `count` is different from `Count`. We still get the same error,
because we still have not defined `count`:
~~~
Count = 0
for number in range(10):
    count = count + number
print("The count is:", count)
~~~
{: .language-python}
~~~
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-10-d77d40059aea> in <module>()
1 Count = 0
2 for number in range(10):
----> 3 count = count + number
4 print("The count is:", count)
NameError: name 'count' is not defined
~~~
{: .error}
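For completeness, here is the loop with `count` defined, under exactly the name the loop uses, before the loop starts. Once the variable exists, the code runs and adds up the numbers 0 through 9:

~~~
count = 0  # define the variable (same spelling and case) before the loop uses it
for number in range(10):
    count = count + number
print("The count is:", count)  # The count is: 45
~~~
{: .language-python}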
## Index Errors
Next up are errors having to do with containers (like lists and strings) and the items within them.
If you try to access an item in a list or a string that does not exist,
then you will get an error.
This makes sense:
if you asked someone what day they would like to get coffee,
and they answered "caturday",
you might be a bit annoyed.
Python gets similarly annoyed if you try to ask it for an item that doesn't exist:
~~~
letters = ['a', 'b', 'c']
print("Letter #1 is", letters[0])
print("Letter #2 is", letters[1])
print("Letter #3 is", letters[2])
print("Letter #4 is", letters[3])
~~~
{: .language-python}
~~~
Letter #1 is a
Letter #2 is b
Letter #3 is c
~~~
{: .output}
~~~
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-11-d817f55b7d6c> in <module>()
3 print("Letter #2 is", letters[1])
4 print("Letter #3 is", letters[2])
----> 5 print("Letter #4 is", letters[3])
IndexError: list index out of range
~~~
{: .error}
Here,
Python is telling us that there is an `IndexError` in our code,
meaning we tried to access a list index that did not exist.
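A list of length `n` has valid indices `0` through `n - 1` (and `-1` through `-n` when counting from the end). This sketch loops over exactly those indices with `range(len(letters))`, so it never asks for `letters[3]`:

~~~
letters = ['a', 'b', 'c']
for index in range(len(letters)):  # index takes the values 0, 1, 2
    print("Letter #" + str(index + 1) + " is", letters[index])

print("Last letter is", letters[-1])  # negative indices count from the end
~~~
{: .language-python}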
## File Errors
The last type of errors we'll cover today
are those associated with reading and writing files.
If you try to read a file that does not exist,
you will receive a `FileNotFoundError` telling you so.
If you attempt to write to a file that was opened read-only, Python 3
raises an `UnsupportedOperation` error.
More generally, problems with input and output manifest as
`OSError`s (in Python 3, `IOError` is simply another name for `OSError`).
~~~
file_handle = open('myfile.txt', 'r')
~~~
{: .language-python}
~~~
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-14-f6e1ac4aee96> in <module>()
----> 1 file_handle = open('myfile.txt', 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'myfile.txt'
~~~
{: .error}
One reason for receiving this error is that you specified an incorrect path to the file.
For example,
if I am currently in a folder called `myproject`,
and I have a file in `myproject/writing/myfile.txt`,
but I try to open `myfile.txt`,
this will fail.
The correct path would be `writing/myfile.txt`.
It is also possible that the file name or its path contains a typo.
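The sketch below recreates that situation in a temporary folder standing in for `myproject` (created with the standard `tempfile` module, so nothing on your disk is touched) and uses `os.path.exists` to check both paths before opening anything:

~~~
import os
import tempfile

with tempfile.TemporaryDirectory() as myproject:
    # Build myproject/writing/myfile.txt inside the temporary folder.
    os.makedirs(os.path.join(myproject, "writing"))
    correct_path = os.path.join(myproject, "writing", "myfile.txt")
    with open(correct_path, "w") as file_handle:
        file_handle.write("some text\n")

    wrong_path = os.path.join(myproject, "myfile.txt")
    wrong_exists = os.path.exists(wrong_path)      # False: the file is not here
    correct_exists = os.path.exists(correct_path)  # True
    print(wrong_exists, correct_exists)
~~~
{: .language-python}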
A related issue can occur if you use the "write" flag instead of the "read" flag.
Python will not give you an error if you try to open a file for writing
when the file does not exist.
However,
if you meant to open a file for reading,
but accidentally opened it for writing,
and then try to read from it,
you will get an `UnsupportedOperation` error
telling you that the file was not opened for reading:
~~~
file_handle = open('myfile.txt', 'w')
file_handle.read()
~~~
{: .language-python}
~~~
---------------------------------------------------------------------------
UnsupportedOperation Traceback (most recent call last)
<ipython-input-15-b846479bc61f> in <module>()
1 file_handle = open('myfile.txt', 'w')
----> 2 file_handle.read()
UnsupportedOperation: not readable
~~~
{: .error}
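Both errors from this section can be caught by type. In this sketch (again using a temporary folder so no real `myfile.txt` is touched), `FileNotFoundError` comes from reading a missing file and `io.UnsupportedOperation` from reading a file opened with `'w'`. Both are subclasses of `OSError`.

~~~
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "myfile.txt")

    try:
        open(path, "r")  # the file does not exist yet
    except FileNotFoundError as err:
        missing_error = err
        print(type(err).__name__, err.strerror)

    with open(path, "w") as file_handle:  # "w" creates the file
        try:
            file_handle.read()  # but it was not opened for reading
        except io.UnsupportedOperation as err:
            read_error = err
            print(type(err).__name__, err)

print(issubclass(FileNotFoundError, OSError), issubclass(io.UnsupportedOperation, OSError))
print(IOError is OSError)  # True in Python 3
~~~
{: .language-python}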
These are the most common errors with files,
though many others exist.
If you get an error that you've never seen before,
searching the Internet for that error type
often reveals common reasons why you might get that error.
> ## Reading Error Messages
>
> Read the Python code and the resulting traceback below, and answer the following questions:
>
> 1. How many levels does the traceback have?
> 2. What is the function name where the error occurred?
> 3. On which line number in this function did the error occur?
> 4. What is the type of error?
> 5. What is the error message?
>
> ~~~
> # This code has an intentional error. Do not type it directly;
> # use it for reference to understand the error message below.
> def print_message(day):
>     messages = {
>         "monday": "Hello, world!",
>         "tuesday": "Today is Tuesday!",
>         "wednesday": "It is the middle of the week.",
>         "thursday": "Today is Donnerstag in German!",
>         "friday": "Last day of the week!",
>         "saturday": "Hooray for the weekend!",
>         "sunday": "Aw, the weekend is almost over."
>     }
>     print(messages[day])
>
> def print_friday_message():
>     print_message("Friday")
>
> print_friday_message()
> ~~~
> {: .language-python}
>
> ~~~
> ---------------------------------------------------------------------------
> KeyError Traceback (most recent call last)
> <ipython-input-1-4be1945adbe2> in <module>()
> 14 print_message("Friday")
> 15
> ---> 16 print_friday_message()
>
> <ipython-input-1-4be1945adbe2> in print_friday_message()
> 12
> 13 def print_friday_message():
> ---> 14 print_message("Friday")
> 15
> 16 print_friday_message()
>
> <ipython-input-1-4be1945adbe2> in print_message(day)
> 9 "sunday": "Aw, the weekend is almost over."
> 10 }
> ---> 11 print(messages[day])
> 12
> 13 def print_friday_message():
>
> KeyError: 'Friday'
> ~~~
> {: .error}
>
> > ## Solution
> > 1. 3 levels
> > 2. `print_message`
> > 3. 11
> > 4. `KeyError`
> > 5. The message is just the missing key, `'Friday'`; you're supposed to infer that `Friday` is not a key in `messages` (its keys are all lowercase).
> {: .solution}
{: .challenge}
> ## Identifying Syntax Errors
>
> 1. Read the code below, and (without running it) try to identify what the errors are.
> 2. Run the code, and read the error message. Is it a `SyntaxError` or an `IndentationError`?
> 3. Fix the error.
> 4. Repeat steps 2 and 3, until you have fixed all the errors.
>
> ~~~
> def another_function
>   print("Syntax errors are annoying.")
>    print("But at least Python tells us about them!")
>   print("So they are usually not too hard to fix.")
> ~~~
> {: .language-python}
>
> > ## Solution
> > `SyntaxError` for missing `():` at end of first line,
> > `IndentationError` for mismatch between second and third lines.
> > A fixed version is:
> >
> > ~~~
> > def another_function():
> >     print("Syntax errors are annoying.")
> >     print("But at least Python tells us about them!")
> >     print("So they are usually not too hard to fix.")
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
> ## Identifying Variable Name Errors
>
> 1. Read the code below, and (without running it) try to identify what the errors are.
> 2. Run the code, and read the error message.
> What type of `NameError` do you think this is?
> In other words, is it a string with no quotes,
> a misspelled variable,
> or a variable that should have been defined but was not?
> 3. Fix the error.
> 4. Repeat steps 2 and 3, until you have fixed all the errors.
>
> ~~~
> for number in range(10):
>     # use a if the number is a multiple of 3, otherwise use b
>     if (Number % 3) == 0:
>         message = message + a
>     else:
>         message = message + "b"
> print(message)
> ~~~
> {: .language-python}
>
> > ## Solution
> > 3 `NameError`s for `Number` being misspelled (it should be `number`), for `message` not defined,
> > and for `a` not being in quotes.
> >
> > Fixed version:
> >
> > ~~~
> > message = ""
> > for number in range(10):
> >     # use a if the number is a multiple of 3, otherwise use b
> >     if (number % 3) == 0:
> >         message = message + "a"
> >     else:
> >         message = message + "b"
> > print(message)
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
> ## Identifying Index Errors
>
> 1. Read the code below, and (without running it) try to identify what the errors are.
> 2. Run the code, and read the error message. What type of error is it?
> 3. Fix the error.
>
> ~~~
> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
> print('My favorite season is ', seasons[4])
> ~~~
> {: .language-python}
>
> > ## Solution
> > `IndexError`; the last entry is `seasons[3]`, so `seasons[4]` doesn't make sense.
> > A fixed version is:
> >
> > ~~~
> > seasons = ['Spring', 'Summer', 'Fall', 'Winter']
> > print('My favorite season is ', seasons[-1])
> > ~~~
> > {: .language-python}
> {: .solution}
{: .challenge}
{% include links.md %}
{% include base_path.html %}
[cc-by-human]: https://creativecommons.org/licenses/by/4.0/
[cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode
[cdh]: https://cdh.carpentries.org/
[ci]: http://communityin.org/
[coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html
[coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html
[lesson-aio]: {{ relative_root_path }}{% link aio.md %}
[lesson-coc]: {{ relative_root_path }}{% link CODE_OF_CONDUCT.md %}
[lesson-example]: https://carpentries.github.io/lesson-example/
[lesson-example-blockquotes]: https://carpentries.github.io/lesson-example/04-formatting/index.html#special-blockquotes
[lesson-license]: {{ relative_root_path }}{% link LICENSE.md %}
[lesson-mainpage]: {{ relative_root_path }}{% link index.md %}
[lesson-reference]: {{ relative_root_path }}{% link reference.md %}
[lesson-setup]: {{ relative_root_path }}{% link setup.md %}
[markdown-cheatsheet]: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
[mit-license]: https://opensource.org/licenses/mit-license.html
[morea]: https://morea-framework.github.io/
[numfocus]: https://numfocus.org/
[rubygems]: https://rubygems.org/pages/download/
[styles]: https://github.com/carpentries/styles/
[swc-lessons]: https://software-carpentry.org/lessons/
[swc-python-gapminder]: http://swcarpentry.github.io/python-novice-gapminder/
[swc-releases]: https://github.com/swcarpentry/swc-releases
[training]: https://carpentries.github.io/instructor-training/
[workshop-repo]: {{ site.workshop_repo }}