Commit 43e065fc authored by Sudeep Sahadevan

session 12 update

parent f19c5538
%% Cell type:markdown id:a9fc6c12 tags:
## Functional programming
![fp diagram](fp.png "Functional programming")
Image taken from: [So You Want to be a Functional Programmer (Part 1)](https://cscalfani.medium.com/so-you-want-to-be-a-functional-programmer-part-1-1f15e387e536)
>**Functional programming** is a programming paradigm where programs are constructed by applying and composing functions
[Wikipedia](https://en.wikipedia.org/wiki/Functional_programming)
* **Declarative** $\rightarrow$ What result do I want?
* **Imperative** $\rightarrow$ What steps do I take to solve the problem? (Python)
#### Non-comprehensive list of functional programming concepts:
* Immutability
* Pure functions
* Higher order functions
#### Why ?
* Code becomes concise and simple to understand
* Supports lazy evaluation
* Easier debugging and testing
* Easier parallelism/concurrency
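As a rough illustration of the imperative versus declarative contrast above, the sketch below computes the sum of squares of the even numbers in two ways. The names `numbers`, `total_imperative` and `total_declarative` are made up for this example.

```python
numbers = [1, 2, 3, 4, 5, 6]

# imperative: spell out every step and update a running total
total_imperative = 0
for n in numbers:
    if n % 2 == 0:
        total_imperative += n ** 2

# declarative/functional: describe the result as one composed expression
total_declarative = sum(n ** 2 for n in numbers if n % 2 == 0)

print(total_imperative, total_declarative)
```

Both versions give the same total; the second one has no intermediate state to keep track of.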
%% Cell type:markdown id:b8068921 tags:
### Immutability
Data cannot be changed once it is generated
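Since immutable data cannot be modified in place, a "change" is usually expressed by building a new object. A minimal sketch, assuming a made-up tuple named `point`:

```python
point = (1, 2, 3)

# tuples cannot be modified in place, so build a new tuple instead
moved = (0,) + point[1:]

print(point)  # the original tuple is untouched
print(moved)  # a new tuple with a different first element
```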
%% Cell type:code id:4e30c928 tags:
``` python
num_list = [1,2,3,4,5,6]
num_tup = (1,2,3,4,5,6)
```
%% Cell type:code id:7528bc37 tags:
``` python
# first element in list
print('First element in list :',num_list[0])
num_list[0] = 0
print('Changed first element : ',num_list[0])
# first element in tuple
print('First element in tuple : ',num_tup[0])
num_tup[0] = 0
print('Changed first element : ',num_tup[0])
```
%%%% Output: stream
First element in list : 1
Changed first element : 0
First element in tuple : 1
%%%% Output: error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-a2a0e40ac0b7> in <module>
5 # first element in tuple
6 print('First element in tuple : ',num_tup[0])
----> 7 num_tup[0] = 0
8 print('Changed first element : ',num_tup[0])
TypeError: 'tuple' object does not support item assignment
%% Cell type:markdown id:d4b152b5 tags:
### Pure functions
* Given the same input always produce the same output
* Do not change value of inputs or any variable that exists outside the scope of the function
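One possible way to satisfy both rules is to pass every dependency in as an argument and to return new values instead of updating globals. The sketch below uses hypothetical names (`power_pure`, `add_and_count`) that are not part of the original notebook.

```python
def power_pure(base, x):
    # all inputs are explicit arguments, so no global state is involved
    return base ** x

def add_and_count(x, y, count):
    # return the updated count instead of modifying a global variable
    return x + y, count + 1

print(power_pure(2, 5))
total, count = add_and_count(18, 23, 0)
print(total, count)
```

Calling either function twice with the same arguments gives the same result, which is what makes such functions easy to test.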
%% Cell type:code id:214bcf3c tags:
``` python
def add(x,y):
    return x+y

counter = 0
def add_counter(x,y):
    global counter
    counter +=1
    return x+y

p = 1
def power(x):
    return p**x
```
%% Cell type:code id:b57bfc8b tags:
``` python
# add --> pure
print('adding : ',add(18,23))
# add_counter --> impure (modifies the global variable counter)
print('adding counter (first call): ',add_counter(18,23),counter)
print('adding counter (second call): ',add_counter(18,23),counter)
# power --> impure (result depends on the global variable p)
print('power (p = 1):',power(5))
p = 2
print('power (p = 2):',power(5))
```
%%%% Output: stream
adding : 41
adding counter (first call): 41 1
adding counter (second call): 41 2
power (p = 1): 1
power (p = 2): 32
%% Cell type:markdown id:a340ce34 tags:
### Higher order functions
A higher order function accepts functions as input, returns a function as output, or both.
Examples from [Higher Order Functions in Python](https://www.geeksforgeeks.org/higher-order-functions-in-python/)
#### Functions accepting functions as input
%% Cell type:code id:a15d07ad tags:
``` python
def shout(text):
    return text.upper()

def whisper(text):
    return text.lower()

def greet(func):
    greeting = func('Hello World!')
    print(func.__name__,' : ',greeting)

greet(shout)
greet(whisper)
```
%%%% Output: stream
shout : HELLO WORLD!
whisper : hello world!
%% Cell type:markdown id:21d47da1 tags:
#### Functions returning functions as output
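Before the larger parser example, here is a minimal sketch of the idea: the outer function builds an inner function and returns it. The names `make_multiplier`, `double` and `triple` are illustrative only.

```python
def make_multiplier(factor):
    # the inner function remembers `factor` from the enclosing call (a closure)
    def multiply(x):
        return x * factor
    return multiply

double = make_multiplier(2)
triple = make_multiplier(3)
print(double(5), triple(5))
```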
**GTF** and **GFF** attribute column parser
%% Cell type:code id:46349b39 tags:
``` python
import re
def attrib_parser(fname):
    # gff parser
    def gff(attrib_str):
        return re.findall(r'(\w+)\=([^;]+)', attrib_str , re.IGNORECASE)
    # gtf parser
    def gtf(attrib_str):
        return re.findall(r'(\w+)\s{1,}"([^"]+)"', attrib_str ,re.IGNORECASE)
    # return a parser based on file name extension
    if re.match('^.*gff.*', fname, re.IGNORECASE): # gff formatted input
        return gff
    elif re.match('.*gtf.*', fname, re.IGNORECASE): # gtf formatted input
        return gtf
    else:
        raise NotImplementedError('Cannot detect {} file format!'.format(fname))
```
%% Cell type:code id:da4efbfa tags:
``` python
# gff attribute column
attrib_gff = 'ID=ENSMUST00000193812.1;Parent=ENSMUSG00000102693.1;gene_id=ENSMUSG00000102693.1;transcript_id=ENSMUST00000193812.1;gene_type=TEC;gene_name=RP23-271O17.1;transcript_type=TEC;transcript_name=RP23-271O17.1-001;level=2;transcript_support_level=NA;tag=basic;havana_gene=OTTMUSG00000049935.1;havana_transcript=OTTMUST00000127109.1'
# parsing gff formatted file
file_name = 'some_file_name.gff'
parser = attrib_parser(file_name)
print('GFF formatted attributes')
for key_value in parser(attrib_gff):
    print(key_value)
```
%%%% Output: stream
GFF formatted attributes
('ID', 'ENSMUST00000193812.1')
('Parent', 'ENSMUSG00000102693.1')
('gene_id', 'ENSMUSG00000102693.1')
('transcript_id', 'ENSMUST00000193812.1')
('gene_type', 'TEC')
('gene_name', 'RP23-271O17.1')
('transcript_type', 'TEC')
('transcript_name', 'RP23-271O17.1-001')
('level', '2')
('transcript_support_level', 'NA')
('tag', 'basic')
('havana_gene', 'OTTMUSG00000049935.1')
('havana_transcript', 'OTTMUST00000127109.1')
%% Cell type:code id:0aa49f71 tags:
``` python
# gtf attribute column
attrib_gtf = 'gene_id "ENSMUSG00000102693.1"; transcript_id "ENSMUST00000193812.1"; gene_type "TEC"; gene_name "RP23-271O17.1"; transcript_type "TEC"; transcript_name "RP23-271O17.1-001"; level 2; transcript_support_level "NA"; tag "basic"; havana_gene "OTTMUSG00000049935.1"; havana_transcript "OTTMUST00000127109.1";'
# parsing gtf formatted file
file_name = 'some_file_name.gtf'
parser = attrib_parser(file_name)
print('GTF formatted attributes ')
for key_value in parser(attrib_gtf):
    print(key_value)
```
%%%% Output: stream
GTF formatted attributes
('gene_id', 'ENSMUSG00000102693.1')
('transcript_id', 'ENSMUST00000193812.1')
('gene_type', 'TEC')
('gene_name', 'RP23-271O17.1')
('transcript_type', 'TEC')
('transcript_name', 'RP23-271O17.1-001')
('transcript_support_level', 'NA')
('tag', 'basic')
('havana_gene', 'OTTMUSG00000049935.1')
('havana_transcript', 'OTTMUST00000127109.1')
%% Cell type:markdown id:f560944f tags:
### Built-in higher order functions: `map`, `filter`, `reduce` (and `lambda` too)
%% Cell type:code id:085a88c1 tags:
``` python
def square(x):
    return x ** 2

def iseven(n):
    return n % 2 == 0

def multiple3(n):
    return n % 3 == 0

def text_filter(x):
    return len(x) > 3
```
%% Cell type:code id:717f703d tags:
``` python
data = [1,2,3,4,5,6,7,8,9,10]
text = ['this', 'is', 'Some', 'TEXT']
```
%% Cell type:markdown id:77c11d58 tags:
Broadcasting operations using `map` and `filter`
#### map
%% Cell type:code id:74da56c1 tags:
``` python
list(map(square, data))
```
%%%% Output: execute_result
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
%% Cell type:code id:f3f60f05 tags:
``` python
[square(x) for x in data]
```
%%%% Output: execute_result
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
%% Cell type:code id:e82d677a tags:
``` python
list(map(str.title, text))
```
%%%% Output: execute_result
['This', 'Is', 'Some', 'Text']
%% Cell type:markdown id:ec7e0b99 tags:
#### filter
%% Cell type:code id:a9a99457 tags:
``` python
list(filter(iseven, data))
```
%%%% Output: execute_result
[2, 4, 6, 8, 10]
%% Cell type:code id:8c8bf214 tags:
``` python
[ x for x in data if iseven(x) ]
```
%%%% Output: execute_result
[2, 4, 6, 8, 10]
%% Cell type:code id:6d35fb18 tags:
``` python
list(filter(text_filter, text))
```
%%%% Output: execute_result
['this', 'Some', 'TEXT']
%% Cell type:markdown id:a928fa23 tags:
#### lambda
A `lambda` function is a single-line function declared without a name. It can take any number of arguments but can contain only one expression.
```python
>>> lambda [parameter_list] : expression
```
Rewriting the `square` function using `lambda`:
%% Cell type:code id:ebb6f7ef tags:
``` python
lambda x: x**2
```
%%%% Output: execute_result
<function __main__.<lambda>(x)>
%% Cell type:code id:5203c823 tags:
``` python
(lambda x: x**2)(2)
```
%%%% Output: execute_result
4
%% Cell type:markdown id:283f8587 tags:
##### `map` and `filter` using `lambda`
%% Cell type:code id:0cc74352 tags:
``` python
list(map(lambda x: x**2, data))
```
%%%% Output: execute_result
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
%% Cell type:code id:2360f9ad tags:
``` python
list(filter(lambda x: len(x) >3, text))
```
%%%% Output: execute_result
['this', 'Some', 'TEXT']
%% Cell type:markdown id:429d14f6 tags:
#### reduce
In Python 2, `reduce()` was a built-in function; in Python 3 it was moved to the `functools` module.
> I received an email from a compatriot lamenting the planned demise of `reduce()` and `lambda` in **Python 3000**. After a few exchanges I think even he agreed that they can go. Here's a summary, including my reasons for dropping `lambda`, `map()` and `filter()`. I expect tons of disagreement in the feedback, all from ex-Lisp-or-Scheme folks. :-)
[Guido van Rossum](https://www.artima.com/weblogs/viewpost.jsp?thread=98196), _March 12, 2005_
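What `reduce` does can be approximated with an explicit loop that folds the items into a single value from left to right. The helper `reduce_sketch` below is a hypothetical stand-in written for illustration, not the `functools` implementation, and it always requires an initial value.

```python
from functools import reduce

def reduce_sketch(func, iterable, initial):
    # fold the items into a single value, left to right
    result = initial
    for item in iterable:
        result = func(result, item)
    return result

values = [47, 11, 42, 13]
print(reduce_sketch(lambda a, b: a + b, values, 0))
print(reduce(lambda a, b: a + b, values, 0))
```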
%% Cell type:code id:fd96efc0 tags:
``` python
from functools import reduce
```
%% Cell type:code id:f3e79aed tags:
``` python
reduce(add,[47, 11, 42, 13])
```
%%%% Output: execute_result
113
%% Cell type:markdown id:894e8c0e tags:
![reduce diagram](reduce_diagram.png "Reduce")
Image taken from: [Reducing a list](https://www.python-course.eu/lambda.php)
### Additional materials
* [Python functional programming how to](https://docs.python.org/3/howto/functional.html)
* [Joel Grus: Learning Data Science Using Functional Python](https://www.youtube.com/watch?v=ThS4juptJjQ)
%% Cell type:markdown id:53a406e0 tags:
## Toolz - functional Python
Toolz is a set of utility functions for iterators, functions and dictionaries. It contains several functions (originally from the `itertoolz` and `functoolz` libraries) that are absent from the standard [`itertools`](https://docs.python.org/3/library/itertools.html) and [`functools`](https://docs.python.org/3/library/functools.html) libraries.
[Toolz](https://github.com/pytoolz/toolz) functions have the following properties:
* composability: interoperable due to use of core data structures
* functional purity: do not rely on external state or change input
* laziness: lazy evaluation
```bash
$ pip install toolz
```
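The laziness property listed above can be illustrated with the standard library alone. In this sketch (the function `noisy_square` is made up for the example), `map` is applied to an infinite counter, yet only the items that are actually requested get evaluated.

```python
import itertools

def noisy_square(x):
    # the print shows when an item is actually evaluated
    print('squaring', x)
    return x ** 2

# map over an infinite counter: nothing is computed yet
squares = map(noisy_square, itertools.count(1))

# only the first three items are ever evaluated
first_three = list(itertools.islice(squares, 3))
print(first_three)
```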
%% Cell type:code id:cd348c7b tags:
``` python
import itertools
```
%% Cell type:markdown id:8b07be5b tags:
### A sample set of `itertoolz` functions
%% Cell type:code id:e252174a tags:
``` python
from toolz.itertoolz import take, drop, take_nth
import itertools
```
%% Cell type:markdown id:062c6ce7 tags:
#### Operations on `n` elements of a sequence
wrapper for `itertools.islice`
take first `n` elements
%% Cell type:code id:80222b27 tags:
``` python
take3 = list(take(3,[1,2,3,4,5,6,7]))
print('take : ',take3)
# similar to itertools.islice(iterable,stop)
islice3 = list(itertools.islice([1,2,3,4,5,6,7],3))
print('islice:',islice3)
```
%%%% Output: stream
take : [1, 2, 3]
islice: [1, 2, 3]
%% Cell type:markdown id:0528bf37 tags:
drop first `n` elements
%% Cell type:code id:f352e1c3 tags:
``` python
drop3 = list(drop(3,[1,2,3,4,5,6,7]))
print('drop : ', drop3)
# similar to itertools.islice(iterable,start,None)
islice_drop_3 = list(itertools.islice([1,2,3,4,5,6,7],3,None))
print('islice:', islice_drop_3)
```
%%%% Output: stream
drop : [4, 5, 6, 7]
islice: [4, 5, 6, 7]
%% Cell type:markdown id:1cba4f65 tags:
take every `n`th element
%% Cell type:code id:ef9c95fc tags:
``` python
ntake = list(take_nth(3,[1,2,3,4,5,6,7]))
print('take_nth:', ntake)
# similar to itertools.islice(iterable,0,None,step)
itertools_ntake = list(itertools.islice([1,2,3,4,5,6,7],0,None,3))
print('islice :', itertools_ntake)
```
%%%% Output: stream
take_nth: [1, 4, 7]
islice : [1, 4, 7]
%% Cell type:markdown id:f6932baa tags:
### Toolz complement
Convert a function into its logical complement
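To make the idea concrete, here is a plain-Python sketch of what such a helper might look like. `complement_sketch` and `is_even` are hypothetical names for this example; this is not the toolz source code.

```python
def complement_sketch(func):
    # return a new function that negates the truth value of func's result
    def negated(*args, **kwargs):
        return not func(*args, **kwargs)
    return negated

def is_even(n):
    return n % 2 == 0

is_odd = complement_sketch(is_even)
print(list(filter(is_odd, range(1, 11))))
```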
%% Cell type:code id:0aa9e0f7 tags:
``` python
from toolz.functoolz import complement
```
%% Cell type:code id:ef5df16a tags:
``` python
isodd = complement(iseven)
list(filter(isodd,data))
```
%%%% Output: execute_result
[1, 3, 5, 7, 9]
%% Cell type:markdown id:5d3c647c tags:
### Itertoolz groupby
Group a collection by a key (function)
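Grouping by a key function can be sketched in plain Python with a dictionary of lists. The helper `group_by_sketch` and the list `words` are assumptions made for this illustration and are not taken from toolz. Note that the standard `itertools.groupby` only groups consecutive items, so it behaves differently on unsorted input.

```python
from collections import defaultdict

def group_by_sketch(key, seq):
    # collect items into lists keyed by the result of key(item)
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)
    return dict(groups)

words = ['this', 'is', 'Some', 'TEXT', 'ok']
print(group_by_sketch(len, words))
```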
%% Cell type:code id:f814f0b5 tags:
``` python
from toolz.itertoolz import groupby
```
%% Cell type:code id:005781ab tags:
``` python
list(filter(iseven, data))
```
%%%% Output: execute_result
[2, 4, 6, 8, 10]
%% Cell type:markdown id:a4077325 tags:
**toolz groupby**
%% Cell type:code id:4baefdd1 tags:
``` python
print(groupby(iseven,data))
```