"Let's say we need a lot of integers for later iteration, solving this with lists will reserve all the space needed in RAM immediately:\n",
"\n",
"(*An integer is 64bit (=8 byte) in Python, hence one million integers are around 8MB..*)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"N = int(1e6)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pointless_list = [i for i in range(N)]\n",
"print(f'we have {sys.getsizeof(pointless_list) / 1e6}MB')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Akin to list comprehensions, generator expressions can be constructed *inline*:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pointless_generator = (i for i in range(N)) # note the round brackets '()' instead of '[]'\n",
"print(f'we have {sys.getsizeof(pointless_generator) / 1e6}MB')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generators have a tiny Memory Footprint\n",
"\n",
"Note the huuuge size difference (and try different values of `N`)!\n",
"\n",
"*(Note that the actual memory consumption on your machine can still be quite different as to what `sys.getsizeof` reports, yet it will never be smaller.. please don't crash your browser)*\n",
"\n",
"Let's iterate over both objects and sum up all the values:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sum(pointless_list) == sum(pointless_generator)"
]
},
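{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(Aside on the caveat above: for the list, `sys.getsizeof` only counts its internal pointer array, not the integer objects it points to. A minimal sketch summing up the element sizes as well:)*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# the list's reported size only covers its pointer array..\n",
"print(f'list object: {sys.getsizeof(pointless_list) / 1e6}MB')\n",
"# ..the pointed-to int objects need their own memory on top\n",
"print(f'int objects: {sum(sys.getsizeof(i) for i in pointless_list) / 1e6}MB')"
]
},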
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generators are lazy\n",
"So both objects obviously encode the same set of integers, but where does this huge difference in RAM consumption come from? Generators **merely instantiate the recipe** how to generate its elements, there is nothing evaluated/exectuted yet. This is often called **lazy execution**. Maybe best to show this by using a function:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def expensive_routine(num):\n",
" print(f'Heavy RAM/CPU usage {num}')\n",
" return num"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"list_of_results = [expensive_routine(i) for i in range(5)]\n",
"print(list_of_results)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"generator_of_results = (expensive_routine(i) for i in range(5))\n",
"print(generator_of_results)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see no output after constructing the generator, indicating that indeed nothing was yet exectuted! To actually trigger the execution of the function we need to iterate over the generator:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for result in generator_of_results:\n",
" print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Syntactic Overkill\n",
"When supplied as function arguments, we can leave out the enclosing `()`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sum(x**2 for x in range(3))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is how the joblib example of the EPUG session 5 actually worked syntactically:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def my_job_processor(n_jobs):\n",
" # not sure how they pass the generator around to the workers explicitly..\n",
" def queue(jobs):\n",
" for result in jobs:\n",
" print(result)\n",
" return queue\n",
"\n",
"my_job_processor(n_jobs=3)(expensive_routine(i) for i in range(3))\n",
"# compare to:\n",
"# Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generators are exhaustable!\n",
"As the elements encoded in the generator are produced one-by-one, there is no way to *go back*:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"square_gen = (x**2 for x in range(5))\n",
"for num in square_gen:\n",
" print(num)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for num in square_gen:\n",
" print(num)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After iterating over the generator once it is *exhausted*, meaning no more elements can be produced from it! Note that iterating over an exhausted generator will produce no errors (just no output), more on this in the following section."
]
},
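{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the exhaustion explicit, we can call `next()` on the used-up generator directly; this raises the `StopIteration` exception that a for loop would normally swallow (the error below is expected):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"next(square_gen) # raises StopIteration -> the generator is exhausted"
]
},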
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generators are not indexible\n",
"\n",
"This follows from the exhaustability property:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"my_gen = (i for i in range(10))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"my_gen[3]"
]
},
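{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you really need the *n*-th element of a generator, one workaround is `itertools.islice`, which lazily skips ahead. A small sketch on a fresh throwaway generator (so `my_gen` stays untouched); note it still consumes every element up to the requested one:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from itertools import islice\n",
"\n",
"fresh_gen = (i for i in range(10)) # throwaway example generator\n",
"# islice(gen, 3, None) lazily skips the first 3 elements..\n",
"print( next(islice(fresh_gen, 3, None)) ) # ..so this yields element number 3"
]
},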
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This works, but will exhaust a part of the generator, so use with utmost care (or even an anti-pattern?):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"7 in my_gen"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for num in my_gen:\n",
" print(num)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mini Excursus: Iterators and Iterables\n",
"\n",
"These are actually fundamental building blocks of the Python programming language, they are everywhere yet somewhat hidden *under the hood*. \n",
"\n",
"Formally an **iterator object** must have two methods: `__iter__()` and `__next__()`, the first one returns the iterator instance itself and the latter one yields the next element during iteration .\n",
"\n",
"An **iterable** is an object which supports the iterator protocol, meaning we can construct an iterator from it using `my_iterator = iter(iterable)` and grab the next element using `next(my_iterator)`. This works on all *container-like* objects (lists, tuples, strings, dictionaries,...) which have an `__iter__()` method to construct the iterator. Let's try it out:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"some_string = 'ABC'\n",
"# let's get the iterator from this iterable:\n",
"my_iter = iter(some_string)\n",
"# now we can manually iterate over it using next()\n",
"print( next(my_iter) )\n",
"print( next(my_iter) )\n",
"print( next(my_iter) )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When we iterate too far, the `StopIteration` exception is raised:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print( next(my_iter) )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### for loops\n",
"This is how `for loops` actually work: they 1st construct the iterator from the object to be iterated over (the iterable) via `iter()`, then call `next()` until the `StopIteration` exception is silently raised and caught. Of course you can also supply an iterator directly to a for loop, as `iter()` then just returns that very iterator:"
"# but we can also first construct an iterator \n",
"my_dict_iter = iter(some_dict)\n",
"for key in my_dict_iter:\n",
" print(key, some_dict[key])"
]
},
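{
"cell_type": "markdown",
"metadata": {},
"source": [
"Under the hood, the for loop does roughly the following (a sketch using a fresh iterator, so `my_dict_iter` is left alone):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# a rough sketch of what a for loop does internally:\n",
"it = iter(some_dict) # first: construct the iterator from the iterable\n",
"while True:\n",
"    try:\n",
"        key = next(it) # then: grab element after element\n",
"    except StopIteration:\n",
"        break # the exception is caught silently and ends the loop\n",
"    print(key, some_dict[key])"
]
},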
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that iterators also can be exhausted, using the same iterator again will produce nothing as we silently run into the `StopIteration` exception:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for key in my_dict_iter:\n",
" print(key, some_dict[key])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Yet iterating over the same dictionary again of course works because a new iterator is constructed *under the hood* by the for loop:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for key in some_dict: # here a new iterator is created\n",
" print(key, some_dict[key])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's see how this works for files:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open('foo.bar', 'w') as Output:\n",
" Output.write('Line 1\\n')\n",
" Output.write('Line 2\\n')\n",
" \n",
"with open('foo.bar', 'r') as Input:\n",
" file_iter = iter(Input)\n",
" print( next(file_iter) )\n",
" print( next(file_iter) )\n",
" print( next(file_iter) )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is of course the same as doing:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open('foo.bar', 'r') as Input:\n",
" for line in Input: # here the file iterator gets created\n",
" print(line)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, the `StopIteration` exception was silently cought by the for loop. This is why it's super elegant and efficient to loop over a file *line-by-line* as the iterator will only read one line at a time into memory. Calling `Input.readlines()` will put all the file contents into your RAM at once!"
]
},
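{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, the eager variant (fine for our two-line `foo.bar`, but think twice for a multi-gigabyte file):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open('foo.bar', 'r') as Input:\n",
"    all_lines = Input.readlines() # reads the whole file into a list at once\n",
"print(all_lines)"
]
},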
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Maybe you have guessed it by now: **Generators are Iterators!** Iterators are more general as every class which implements `_iter_()` and `_next_()` is an iterator, whereas generators are a bit like syntactic sugar by implementing these methods for us. We go into more details here in the next section.\n",
"\n",
"So the take home message maybe here is:\n",
"\n",
"**You never iterate over iterables (the 'data') directly, there's always first an iterator constructed yielding element by element during iteration. This allows to abstract the 'data' from the iteration process, making things like lazy execution possible**. "
]
},
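{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make this concrete, here is a minimal sketch of a hand-written iterator class implementing `__iter__()` and `__next__()` ourselves (the `Countdown` class is a made-up example, not from the session material):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class Countdown:\n",
"    \"\"\"A hand-rolled iterator counting down from start to 1.\"\"\"\n",
"    def __init__(self, start):\n",
"        self.current = start\n",
"\n",
"    def __iter__(self):\n",
"        return self # an iterator returns itself\n",
"\n",
"    def __next__(self):\n",
"        if self.current < 1:\n",
"            raise StopIteration # signal the end of iteration\n",
"        self.current -= 1\n",
"        return self.current + 1\n",
"\n",
"for num in Countdown(3): # works in a for loop like any generator\n",
"    print(num)"
]
},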
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building Generators\n",
"\n",
"Besides writing inline expressions using `()` we can be more explicit and expressive with Pythons `yield` statement to build *generator functions* returning generators: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def fibonacci_numbers(nums):\n",
" x, y = 0, 1\n",
" for _ in range(nums):\n",
" x, y = y, x+y\n",
" yield y\n",
"\n",
"# get the first 10 fibonacci numbers:\n",
"fib10 = fibonacci_numbers(10)\n",
"for number in fib10:\n",
" print(number)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"They `yield` statement pauses the internal for loop and **yields** the results element-by-element. Note that the internal variables keep their respective values in between iteration steps: the state gets remembered!\n",
"\n",
"Is what is returned also really an iterator though? Let's find out:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fib5 = fibonacci_numbers(5) # the old one was exhausted anyways..\n",
"next(fib5) # and we can call next() on it -> it's an iterator allright :)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So our generator fullfills all required iterator properties, as opposed to say a simple list:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"some_list = [1,2,3]\n",
"iter(some_list) == some_list # a list is an iterable, not an iterator"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Just for completeness, it's possible to mix `yield` and `return` statements in a generator function to have more control about when to raise the `StopIteration` exception:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def fibonacci_numbers(smaller_than):\n",
" x, y = 0, 1\n",
" while True:\n",
" x, y = y, x+y\n",
" \n",
" if y > smaller_than:\n",
" return # this magically raises the correct exception of the iterator protocol\n",
" \n",
" yield y\n",
" \n",
"\n",
"\n",
"list(fibonacci_numbers(smaller_than=1000)) # exhausts the generator by iterating till the end"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Infinite Stream Processing\n",
"\n",
"Generators can be used to process theoretically infinite amounts of data, due to their *lazyness*. With a slight modification we can get all Fibonacci numbers (at least one by one):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def all_fibonacci_numbers():\n",
" x, y = 0, 1\n",
" while True:\n",
" x, y = y, x+y\n",
" yield y\n",
" \n",
" \n",
"all_fibs = all_fibonacci_numbers()\n",
"# get the first 25 fibonacci numbers:\n",
"for _ in range(25):\n",
" print( next(all_fibs) )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This generator will never raise the `StopIteration` exception, which does not violate the iterator definition. However, a direct for loop will never terminate! As the state is remembered, we can just ask for the 26th fibonacci number:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"next(all_fibs) # and so onto infinity.."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pipelining Generators\n",
"\n",
"Generators can be chained together, this allows for seamless stream processing. Let's say we want to find the first 10 Fibonacci numbers which are divisable by a certain, yet variable, number. We can't know in advance how many Fibonacci numbers we would have to generate for each candidate, but we can still chain it with another generator to process this potentially infinite stream: "
"So with this we can find all the fibonacci numbers divisable by our candidate **without needing to know beforehand** how many we have to scan for, and hence potentially saving a lot of resources:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"all_fibs = all_fibonacci_numbers() # the new generator yielding potentially all Fibonacci numbers\n",
"div_by = find_divisables(all_fibs, divisor=23) # nothing was executed yet.."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"next(div_by)"
]
},
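{
"cell_type": "markdown",
"metadata": {},
"source": [
"And to actually collect the first 10 matches as promised above, we can again lean on `itertools.islice` to lazily take a fixed number of elements from the pipeline (a sketch with fresh generators, since `div_by` above has already been partially consumed):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from itertools import islice\n",
"\n",
"all_fibs = all_fibonacci_numbers() # fresh infinite stream\n",
"div_by = find_divisables(all_fibs, divisor=23)\n",
"print( list(islice(div_by, 10)) ) # lazily take exactly 10 matches"
]
},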
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"- Generators are a subclass of the ubiquitous and very *pythonic* Iterators\n",
"- Can be created by either inline expressions `()` or with generator functions sporting the `yield` statement\n",
"- They allow for on-demand aka *lazy* execution ↔ only load into RAM what you really need at the moment\n",
"- Infinite stream processing capabilities\n",
"- Allows for much clearer and more readable code as compared to overly throwing around `while` and `break` and so on.."