Verified commit 8302c3ba authored by Toby Hodges, committed by Renato Alves

add TOC to episodes

parent e4ee747e
......@@ -60,7 +60,23 @@ keypoints:
In this session we will cover additional syntactic elements providing examples of their use along the way.
-### Many keywords
+- [Keywords](#many-keywords)
+- [`del`](#delete-what-is-no-longer-needed)
+- [`while`](#for-good-measure-while-we-are-here)
+- [Sets](#setting-things-straight)
+- [String formatting](#string-formatting-variants)
+- [Tracebacks and exceptions](#expecting-the-unexpected)
+- [Advanced function definition](#advanced-function-definition)
+- [Argument expansion outside functions](#argument-expansion-outside-functions)
+- [Generators](#generators)
+- [Comprehensions](#comprehensions)
+- [File handling and `with`](#file-handling-and-with)
+- [Useful standard library modules](#useful-standard-library-modules)
+- [`glob`](#globbing-patterns)
+- [`collections`](#convenient-collections)
+- [Python code in the wild](#python-code-in-the-wild)
+## Many keywords
When using a text editor that is capable of coloring the code,
referred to as *syntax highlighting*,
......@@ -90,7 +106,7 @@ which in Python 3.7 prints:
~~~
{: .language-python}
-#### `del`ete what is no longer needed
+### `del`ete what is no longer needed
The `del` keyword is used to delete elements from containers such as `list` and `dict`
or to delete variables and the associated data from memory.
......@@ -136,7 +152,7 @@ del shopping_list[1]
> {: .solution}
{: .challenge}
-#### `for` good measure `while` we are here
+### `for` good measure `while` we are here
When trying to repeat actions in your code, such as applying a mathematical operation to a list of numbers,
you typically resort to using a loop.
......@@ -275,7 +291,7 @@ We are done with the loop
> {: .solution}
{: .challenge}
-#### `Set`ting things straight
+## `Set`ting things straight
When working with collections of objects, finding common patterns or building a [Venn diagram](https://en.wikipedia.org/wiki/Venn_diagram#Overview),
you may feel tempted to calculate *union* and *intersection* using `list` and `for` loops,
......@@ -381,6 +397,7 @@ Whenever you want to find out if a value exists in a collection, but don't care
about _where_ in that collection it exists,
your code will run much faster if you're looking up those values
in a `set` (or `dict`ionary), instead of a `list`.
## String formatting variants
When dealing with or producing text using content stored in different variables
......@@ -1075,7 +1092,7 @@ Positional arguments: (4, 3, 3, 5) - Keyword arguments: {'name': 'John', 'age':
> {: .solution}
{: .challenge}
-#### Argument expansion outside functions
+### Argument expansion outside functions
The `*` and `**` syntax can also be used outside functions to expand values:
~~~
......@@ -1103,7 +1120,7 @@ print(together)
~~~
{: .output}
-#### Generators
+### Generators
You are now fully empowered (pun intended) to scale up your Python analysis.
Perhaps you will be using `numpy` or `scikit-learn` to analyse images.
......@@ -1482,7 +1499,7 @@ OMG! I have received A MESSAGE!!!
## Useful standard library modules
-#### Globbing patterns
+### Globbing patterns
When working with files, it is also often useful to pattern match files based on their
filename, extension or a combination of both.
......
......@@ -73,6 +73,26 @@ It provides the kind of functionality -
data selection, filtering, aggregation, and (simple) visualisation -
required for modern, reproducible data analysis.
+- [NumPy](#numpy)
+- [What data to use with NumPy?](#what-data-to-use-with-numpy)
+- [Reading data to a NumPy array](#reading-data-to-a-numpy-array)
+- [The basic features of NumPy arrays](#the-basic-features-of-numpy-arrays)
+- [Indexing arrays](#indexing-arrays)
+- [Boolean indexing](#boolean-indexing)
+- [The power of vectorisation](#the-power-of-vectorisation)
+- [NumPy data types](#numpy-data-types)
+- [pandas](#pandas)
+- [Loading data](#loading-data)
+- [Working with dataframes](#working-with-dataframes)
+- [Selecting data](#selecting-data-in-a-dataframe)
+- [Filtering data](#filtering-data-in-a-dataframe)
+- [Accidentally omitted data](#accidentally-omitted-data)
+- [Advanced-filtering](#advanced-filtering)
+- [Combining dataframes](#combining-dataframes)
+- [Working with datetime columns](#working-with-datetime-columns)
+- [Groupby & split-apply-combine](#groupby--split-apply-combine)
+- [Conclusion](#conclusion)
## NumPy
The central feature of the NumPy library, is an object known as the `ndarray`
......@@ -82,7 +102,7 @@ way. The `ndarray` is:
* Homogeneous = all elements must be of the same type e.g. all integers
* Vectorised = allows us to do fast operations on the whole array, without needing loops (we'll come back to this later!)
-### What data to use with NumPy?
+### What Data to Use with NumPy?
NumPy can be useful for working with all kinds of numeric data that fulfill the criteria above
(i.e. homogeneous and multidimensional). One of the most common applications though, is image data.
Images are essentially arrays of numbers that represent the brightness of each pixel. For example, take the simple
......@@ -98,7 +118,7 @@ The entire nuclei image may appear black when viewed in a web browser.
This is nothing to worry about.
(The code examples assume that you save these files in a folder called `data`.)
-### Reading data to a NumPy array
+### Reading Data to a NumPy Array
We'll use the popular image analysis package scikit-image,
to read two example images into NumPy arrays (if you want to learn more about image analysis with
tools like scikit-image - check out our existing [image analysis course] [image-analysis-course]).
......@@ -119,7 +139,7 @@ plt.imshow(nuclei)
The 'raw' image is an electron microscopy image of cells from the marine worm
*Platynereis dumerilii*. The 'nuclei' image is a segmentation of the nuclei of these same cells.
-### The basic features of NumPy arrays
+### The Basic Features of NumPy Arrays
Once we have our data in a NumPy array, we want to explore it a bit
e.g. how many dimensions does our array have, and of what size?
This is represented by the `shape` of an array, which denotes the length of the array in each dimension.
......@@ -171,7 +191,7 @@ print(np.std(raw))
> {: .solution }
{: .challenge }
-### Indexing arrays
+### Indexing Arrays
Now we have a general idea of what our array contains, we want to start manipulating particular regions
of it.
......@@ -208,7 +228,7 @@ array([[156, 173, 156, 161],
~~~
{: .output }
-> ## 2.2. Subsetting a NumPy array
+> ## 2.2. Subsetting a NumPy Array
>
> Crop the 'raw' image, by removing a border of 500 pixels on all sides.
>
......@@ -222,7 +242,7 @@ array([[156, 173, 156, 161],
> {: .solution }
{: .challenge }
-### Boolean indexing
+### Boolean Indexing
Sometimes we want to access certain parts of an array not based on position, but instead on some criterion.
e.g. selecting values that are above some threshold.
......@@ -271,7 +291,7 @@ print(raw[criteria])
~~~
{: .output }
-> ## 2.3. Masking arrays
+> ## 2.3. Masking Arrays
>
> The nuclei image contains a binary segmentation i.e.:
>
......@@ -296,7 +316,7 @@ print(raw[criteria])
> {: .solution }
{: .challenge }
-### The power of vectorisation
+### The Power of Vectorisation
One of the big advantages of NumPy is that operations are **vectorised**. This means that operations can be
applied to the whole array very quickly, without the need for loops. Many of the operations we've used
......@@ -330,7 +350,7 @@ np.exp(2)
~~~
{: .language-python }
-### NumPy data types
+### NumPy Data Types
As we touched on briefly earlier, each `ndarray` has a particular data type (`dtype`) assigned to it.
This defines what kind of values (and what range of values) can be placed in the array.
......@@ -360,7 +380,7 @@ bitsize allows you to store a wider range of values, but will take up more space
always a trade-off between the space it takes up in your computer's memory, and the size of the numbers you want to store.
Note that the size of the values stored in the array has little effect on the memory it takes up i.e. an array of small values but with a large bitsize will still take up a lot of memory.
-> ## 2.4. Working with data types
+> ## 2.4. Working with Data Types
>
> 1. Increase the brightness of the image by 100
> 2. Why does the result look so bizarre? What is going wrong here?
......@@ -419,7 +439,7 @@ to create our first `DataFrame` object.
The code examples assume that you save these files in a folder called `data`.)
~~~
-import pandas as pd # this is how pandas is traditionally installed
+import pandas as pd # this is how pandas is traditionally imported
covid_cases = pd.read_csv("{{page.cases_data}}")
~~~
{: .language-python }
......@@ -470,7 +490,7 @@ From this output, we can already get a feeling for the data we've loaded:
- the data in these columns include **integers** (day-of-month, number of cases, etc), **floating point numbers** (population of country in 2019), and **non-numeric data** (dates, continent and country names, etc)
- the rows seem to be **indexed numerically** starting from 0
- the dataframe includes data from **at least two countries** (Afghanistan and Zimbabwe) **and continents** (Asia and Africa).
-- based on the appearance of Afghanistan at the top of the dataframe, and Zimbabwe at the bottom, we may assume that counries are ordered alphabetically though we aren't yet able to understand all the details of that ordering
+- based on the appearance of Afghanistan at the top of the dataframe, and Zimbabwe at the bottom, we may assume that countries are ordered alphabetically though we aren't yet able to understand all the details of that ordering
> ## Jupyter 🧡 pandas
>
......@@ -492,8 +512,7 @@ From this output, we can already get a feeling for the data we've loaded:
### Working with Dataframes
All those `...` in the output from `print` above indicate that some lines
-(in this case 25507!) were skipped to display
-a truncated view of the dataframe.
+were skipped to display a truncated view of the dataframe.
For such a large dataset, it's unhelpful to view the entire thing at once.
For convenience, `DataFrame` objects are equipped with
`head` and `tail` methods that allow us to view only the first or last
......
......@@ -35,6 +35,15 @@ In this section, we'll explore two options available in the standard library
for handling information provided by the user to our program
as part of the command line used to execute the script.
+- [`sys.argv`](#simple-command-line-argument-access-with-sysargv)
+- [`argparse`](#handling-options-and-arguments-with-argparse)
+- [Positional arguments](#positional-arguments)
+- [Options](#options)
+- [Restricting input values](#restricting-input-values)
+- [Default values](#default-values)
+- [Capturing multiple values](#capturing-multiple-values)
+- [Mutually exclusive arguments](#mutually-exclusive-arguments)
## Simple Command Line Argument Access with `sys.argv`
The `argv` object within the [`sys`][sys-module] module
......
......@@ -30,6 +30,19 @@ you write more powerful programs with Python.
Now we're going to spend a little time discussing how you can
**write better code** with Python.
+- [Why is style important?](#why-is-style-important)
+- [PEP 8](#pep-8)
+- [Layout](#layout)
+- [Whitespace](#whitespace)
+- [Trailing commas](#trailing-commas)
+- [Comments](#comments)
+- [Code checkers](#code-checkers)
+- [`pycodestyle`](#pycodestyle)
+- [`pyflakes`](#pyflakes)
+- [`pylint`](#pylint)
+- [Documentation](#documentation)
+- [Good Jupyter hygiene](#good-jupyter-hygiene)
## Why is Style Important?
At some stage,
......@@ -354,7 +367,7 @@ is a very good idea.
Useful comments should tell the reader _why_ the code is written this way,
and sometimes _what_ the code is doing.
For example, it might be useful to add a comment
-explaining what that number 10 that appeared out of nowhere is.
+explaining what that number 10 is that appeared out of nowhere.
The _how_ should usually be something you get by reading the code.
Comments should begin with a single space after the `#`.
......