more intro for Counter & defaultdict

parent 17098cdf
......@@ -1435,7 +1435,7 @@ OMG! I have received A MESSAGE!!!
> However, consider the situation where `y` is `0` and a `ZeroDivisionError` exception happens.
> If unhandled, the `with` will ensure the file is `closed()` but you will be left with a half-written (or corrupted) file.
>
> To avoid this situation your task is to create a better version of `open()` that we will call `safe_write()`.
> `safe_write()` should do the same as `open()` in `wt` mode,
> but in addition should delete the file if an error occurs.
>
......@@ -1479,6 +1479,8 @@ OMG! I have received A MESSAGE!!!
> {: .solution}
{: .challenge}
## Useful standard library modules
#### Globbing patterns
When working with files, it is also often useful to pattern match files based on their
......@@ -1501,12 +1503,89 @@ for filename in iglob("*.csv"):
{: .language-python}
which creates a file `new_<filename>.csv` for every `<filename>.csv` file in the current directory.
### Convenient collections
Another module in the standard library, [`collections`][collections-module],
contains a number of specialised objects designed to handle common tasks when
working with collections of values.
It provides [_ordered dictionaries_][collections-ordered-dict],
which remember the order in which items were added
(less useful [since Python version 3.6][py-36-dictionaries]),
[_named tuples_][collections-named-tuple],
which allow lookup of named attribute values (e.g. `beehive.queen`)
without the hassle of [defining a whole new class][classes],
and [`deque`s][collections-deque],
which support fast appends and pops at both ends and work well for containers of a pre-defined size
whose members are expected to frequently change and rotate positions.
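
As a brief illustration of the three classes just mentioned, here is a minimal sketch. The `Beehive` tuple, its field names, and all of the values are made up for this example.

```python
from collections import OrderedDict, deque, namedtuple

# A named tuple gives attribute access without writing a full class.
# `Beehive` and its fields are hypothetical, chosen to match the text above.
Beehive = namedtuple('Beehive', ['queen', 'workers'])
beehive = Beehive(queen='Beatrice', workers=20000)
print(beehive.queen)  # Beatrice

# A deque with maxlen keeps only the most recent items:
# appending to a full deque drops the item at the opposite end.
recent = deque([1, 2, 3], maxlen=3)
recent.append(4)
print(recent)  # deque([2, 3, 4], maxlen=3)

# rotate(1) shifts every member one position to the right, wrapping around.
recent.rotate(1)
print(recent)  # deque([4, 2, 3], maxlen=3)

# An OrderedDict remembers insertion order and can also reorder its keys.
ordered = OrderedDict(first=1, second=2)
ordered.move_to_end('first')
print(list(ordered))  # ['second', 'first']
```
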
Here, we'll focus on two more classes from `collections` that we use most often:
`Counter` and `defaultdict`.
`Counter` provides an efficient way to count occurrences of values
within a collection.
Once created, a `Counter` object can be treated similarly to a dictionary:
~~~
from collections import Counter
nucleotide_frequencies = Counter('ACGUGUCGAACUAACGCC')
print(nucleotide_frequencies['C'])
long_string = """
This is the tale of a tiny snail, and a great big, grey blue, humpback whale. This is a rock, as black as soot, and this is a snail with an itchy foot.
The sea snail slithered all over the rock, and gazed at the sea and the ships in the dock.
"""
# strip commas and full stops, then split the lowercased text into words
word_counts = Counter(long_string.replace(',','').replace('.','').lower().split())
print(word_counts['this'])
~~~
{: .language-python }
~~~
6
3
~~~
{: .output }
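
A `Counter` also offers a few conveniences that a plain dictionary lacks. The sketch below reuses the nucleotide string from the example above. The extra `'GGG'` sequence is made up purely to show `update()`.

```python
from collections import Counter

nucleotide_frequencies = Counter('ACGUGUCGAACUAACGCC')

# Values that were never seen have a count of 0 instead of raising a KeyError.
print(nucleotide_frequencies['T'])  # 0

# most_common(n) returns the n most frequent values as (value, count) pairs.
print(nucleotide_frequencies.most_common(2))  # [('C', 6), ('A', 5)]

# update() adds the counts from another iterable to the existing totals.
nucleotide_frequencies.update('GGG')
print(nucleotide_frequencies['G'])  # 7
```
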
`defaultdict` can save you some time if you know in advance the kind of data
you expect to collect as values in a dictionary.
When iteratively populating a native `dict` dictionary -
sometimes adding to/adjusting entries already present in the dictionary,
sometimes creating new entries -
it is necessary to separately specify what should happen when
a key is being used for the first time.
`defaultdict` allows us to define a function that will be used to initialise the
default value when a new key is used
to access the `defaultdict` object for the first time:
~~~
from collections import defaultdict
input_data = """human eyes
canary wings
human teeth
canary beak
canary eyes
platypus beak"""
features = defaultdict(set)
for line in input_data.split('\n'):
organism, feature = line.split()
features[organism].add(feature)
print(features['human'])
~~~
{: .language-python }
~~~
{'eyes', 'teeth'}
~~~
{: .output }
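
For comparison, here is a sketch of the same grouping written with a native `dict`, where the first use of each key has to be handled explicitly. It uses a shortened list of pairs (the `pairs` name is introduced only for this example) instead of parsing the multi-line string above.

```python
from collections import defaultdict

pairs = [('human', 'eyes'), ('canary', 'wings'), ('human', 'teeth')]

# With a plain dict, we must create the empty set ourselves
# the first time each organism is seen.
features_dict = {}
for organism, feature in pairs:
    if organism not in features_dict:
        features_dict[organism] = set()
    features_dict[organism].add(feature)

# With a defaultdict, set() is called automatically for each new key.
features_default = defaultdict(set)
for organism, feature in pairs:
    features_default[organism].add(feature)

# Both approaches build the same mapping.
print(features_dict == features_default)  # True
```

Note that merely reading a missing key from a `defaultdict` also creates an entry for it, so use `in` when you only want to test whether a key is present.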
> ## More useful standard modules
>
> - [`os.*`][os-module] and [`sys`][sys-module] - functions to work with the file and operating system
> - [`itertools.*`][itertools-module] - a collection of functions that implement efficient algorithms on top of iterators/generators for good resource management
> - [`functools.*`][functools-module] - a collection of functions that take as inputs other functions.
{: .callout }
> ## 1.12. Maintaining order
>
......@@ -1641,15 +1720,15 @@ Furthermore, and as you'll see in the next chapter, once you go beyond the stand
the Python ecosystem is brimming with useful libraries for all kinds of purposes.
> ## Additional syntax & latest features
>
> There are some more elements of Python syntax that we haven't covered here, which we briefly describe below. Follow the links for recommended resources to learn more about each one.
>
> - `_`, `__` - single, double underscore - usually as prefix to variable names, used to represent private or internal variables
> - [`if __name__ == "__main__":`][if-name-main] - present at the bottom of modules - specifies code that should run when the script is executed but not when it's `import`ed.
> - [`class`es][classes] - an extremely powerful construct - you have probably already used them without realizing it.
>
> The features below were added in more recent versions of Python, up to version 3.8 (the most recent major release at time of writing):
>
> - [`yield from`][generator-delegate] - syntax to delegate to sub-generators
> - [`typing` module][typing-module] - type annotations / hints - see also [mypy][mypy]
> - [`:=`][walrus] - *walrus* operator
......
......@@ -21,7 +21,10 @@
[click]: https://click.palletsprojects.com/en/7.x/
[coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html
[coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html
[collections-deque]: https://docs.python.org/2/library/collections.html#collections.deque
[collections-module]: https://docs.python.org/3/library/collections.html
[collections-named-tuple]: https://docs.python.org/2/library/collections.html#collections.namedtuple
[collections-ordered-dict]: https://docs.python.org/2/library/collections.html#collections.OrderedDict
[concept-maps]: https://carpentries.github.io/instructor-training/05-memory/
[contextlib]: https://docs.python.org/3/library/contextlib.html
[contrib-covenant]: https://contributor-covenant.org/
......@@ -102,6 +105,7 @@
[pep-8-recommendations]: https://www.python.org/dev/peps/pep-0008/#programming-recommendations
[pep-8-whitespace]: https://www.python.org/dev/peps/pep-0008/#whitespace-in-expressions-and-statements
[power-operator]: https://www.educative.io/edpresso/power-operator-in-python
[py-36-dictionaries]: https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6
[py-rse-coc]: https://merely-useful.github.io/py-rse/teams.html#teams-coc
[py-rse-config]: https://merely-useful.github.io/py-rse/configuration.html
[py-rse-style]: https://merely-useful.github.io/py-rse/py-rse-style.html
......