# Generators over arrays
*Yield better results by iterating over generators*

July 21, 2021 — by Doeke Norg

---

I love ❤️ generators in PHP. They are like supercharged arrays that can save memory when used correctly. I've been
type-hinting with `iterable` instead of `array` ever since I learned about them.

## Generators are callback iterators

Generators are simple functions. But where a regular function will `return` a single value or nothing at all (`void`),
a generator can produce *multiple* results. The only thing you have to do to change a function into a generator is to
replace `return` with `yield`; calling that function then gives you a `Generator` object.

A generator is an `iterable`, meaning you have to loop over it in order to retrieve the results. You can
simply `foreach` over a generator, and it will hand you every value it yields.

```php
function exampleGenerator() {
  yield 1;
  yield 2;
  yield 3;
}

$generator = exampleGenerator();
foreach ($generator as $value) {
  echo $value;
}
// will echo out: 123
```

Notice that we actually call the function to return the generator. In this example it's pretty obvious we need to do
that, but consider an anonymous function that is stored in the `$generator` variable. You might accidentally try to
iterate over that.

```php
$generator = function() {
  yield 1;
  yield 2;
  yield 3;
};

// Incorrect: $generator is still an uncalled function (a Closure).
foreach ($generator as $value) { /* ... */ }

// Correct: $generator() returns a `Generator` object.
foreach ($generator() as $value) { /* ... */ }
```
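
If you are ever unsure which of the two you are holding, `instanceof` will tell you. This is just a small sketch; the
`$factory` name is made up for the example.

```php
$factory = function () {
    yield 1;
    yield 2;
};

// The closure itself is not a generator; calling it creates one.
var_dump($factory instanceof Closure);     // bool(true)
var_dump($factory instanceof Generator);   // bool(false)
var_dump($factory() instanceof Generator); // bool(true)
```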

## Advantages of generators over arrays

While creating a function that yields 1, 2, 3 is very impressive, it's not really practical. So let's look at some
reasons why you might consider using generators.

### Generators are only executed when you start iterating

This might not seem like a big deal, but it actually is. Say you have a `ChoiceField` object that
has `array $options`, and you have to retrieve these options from a database. When the field is rendered, it obviously
needs to show those options. But even when the field is never rendered in a request, the database call is still
performed just to instantiate the field.

However, if you change `array $options` into `iterable $options` and provide the options via a generator, the database
call will only ever be executed when you `foreach` over those options.

```php
$options = function() {
  foreach (DB::query('retrieve the options') as $option) {
    yield $option;
  }
};

$field = new ChoiceField($options());
```

So calling the function will return the generator, but it will not execute until you start iterating.
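
To see this laziness for yourself, here is a minimal sketch. The `$fetchOptions` closure is a hypothetical stand-in
for the database call (it is not part of any library); it only records that it ran.

```php
// Hypothetical stand-in for the database call; it logs when it is executed.
$log = [];
$fetchOptions = function () use (&$log) {
    $log[] = 'query executed';

    return ['red', 'green', 'blue'];
};

$options = (function () use ($fetchOptions) {
    foreach ($fetchOptions() as $option) {
        yield $option;
    }
})();

// The generator exists, but its body has not started yet.
$runsBeforeIteration = count($log); // 0

foreach ($options as $option) {
    // The first iteration runs the body up to the first `yield`.
}

$runsAfterIteration = count($log); // 1
```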

> **Tip:** If you already have an iterable result set, like an `array` or any other `iterable`, you can use `yield from $results`. This will in essence `foreach` over all the results and `yield` every one of them.

```php
// Use `yield from` instead of looping the results.
$options = (function() {
    yield from DB::query('retrieve the options');
})(); // Notice we called the function directly to return the generator.

// Or shorthand
$options = (fn() => yield from DB::query('retrieve the options'))();
```

### Generators preserve memory

Besides not performing any work until you iterate, a generator only yields one result at a time, meaning it only needs
to hold a single result in memory at any time.

```php
$options = (function() { 
    $results = DB::query('retrieve the options'); 
    foreach ($results as $result) {
        // This way there is only one `Option` in memory at all times.
        yield Option::createFromResult($result);
    }
    unset($results);
})();
```

In this example we retrieve a simple result set from a database query. Only when we `yield` a result do we build the
`Option` model that represents it. The raw result set is still loaded, but the heavier `Option` objects exist one at a
time, which can save a lot of memory.
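
A rough way to convince yourself that the models are built one at a time is to count constructor calls. The
`TrackedOption` class and the `$rows` array below are assumptions made up for this sketch; they stand in for the real
model and the database result.

```php
// Hypothetical model; the static counter tracks how many instances were built.
final class TrackedOption
{
    public static $created = 0;
    public $label;

    public function __construct(string $label)
    {
        $this->label = $label;
        self::$created++;
    }
}

$rows = ['red', 'green', 'blue']; // stand-in for the database result set

$options = (function () use ($rows) {
    foreach ($rows as $row) {
        yield new TrackedOption($row);
    }
})();

$first = $options->current();                 // runs the generator up to its first `yield`
$createdAfterFirst = TrackedOption::$created; // 1: the other two rows are still untouched

foreach ($options as $option) {
    // Each model can be freed once nothing references it anymore.
}

$createdAfterLoop = TrackedOption::$created;  // 3
```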

### Code can be executed after returning the results

You might have noticed that we casually called `unset($results)` after we yielded the results. This works because
the generator keeps running after every `yield` until its body ends, unlike a `return` statement where the function
ends immediately. That's pretty awesome. This way you can even clean up some leftover memory consumption after your
generator finishes.
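
Here is a small sketch of that order of execution. The `$events` array is only there to record what happens when.

```php
$events = [];

$numbers = (function () use (&$events) {
    $events[] = 'start';
    yield 1;
    yield 2;
    // Runs only when the consumer asks for a value after the last `yield`.
    $events[] = 'cleanup';
})();

foreach ($numbers as $number) {
    $events[] = "got $number";
}

// $events is now ['start', 'got 1', 'got 2', 'cleanup']
```

Keep in mind that the cleanup line is only reached when the consumer iterates all the way to the end; if the loop
stops early with a `break`, the generator stays suspended at its last `yield`.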

### Keys can be reused

When you `yield` a result, it implicitly gets a numeric, 0-based key. You can however yield both a
key *and* a value by adding the `=>` arrow.

```php
// Without keys.
function fruits() {
  yield 'apple';
  yield 'banana';
  yield 'peach';
}

foreach (fruits() as $key => $fruit) {
  // Here $key will be 0, 1, 2
}

// With keys (named differently, because PHP cannot declare fruits() twice).
function keyedFruits() {
  yield 'zero' => 'apple';
  yield 'one' => 'banana';
  yield 'two' => 'peach';
  yield 'two' => 'lime';
}

foreach (keyedFruits() as $key => $fruit) {
  // Here $key will be 'zero', 'one', 'two', 'two'
}
```

Notice how we yielded the same key twice? Unlike in an array, this is no problem during the iteration. However, if you
were to turn the generator back into an array with `iterator_to_array()`, the key would be there only once,
holding the last result for that key.
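
A short sketch of both outcomes, using a throwaway `duplicateKeyFruits()` function that exists only for this example.
The second argument of `iterator_to_array()` controls whether keys are preserved.

```php
function duplicateKeyFruits() {
    yield 'one' => 'banana';
    yield 'two' => 'peach';
    yield 'two' => 'lime';
}

// Keys are preserved by default, so the second 'two' overwrites the first.
$keyed = iterator_to_array(duplicateKeyFruits());
// ['one' => 'banana', 'two' => 'lime']

// Passing false as the second argument discards the keys and keeps every value.
$list = iterator_to_array(duplicateKeyFruits(), false);
// ['banana', 'peach', 'lime']
```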

## Things to consider when using generators over arrays

While generators behave very similarly to arrays, they are not of the `array` type. This means you can run into these
caveats.

### Array functions will not work with generators

PHP's `array_` functions all require an actual array. So you cannot, for example, simply call `array_map()` with your
generator. To remedy this, you can use `iterator_to_array()` to turn your generator into an array. This will however
reintroduce the memory usage of arrays.
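
If you want to stay lazy, one option is to write the mapping as a generator yourself. The `lazyMap()` helper below is
hypothetical (it is not a PHP built-in), and `firstThree()` is just sample input; treat it as a sketch of the idea.

```php
// Hypothetical helper, not a PHP built-in: a lazy counterpart to array_map().
function lazyMap(callable $callback, iterable $items): Generator {
    foreach ($items as $key => $item) {
        yield $key => $callback($item);
    }
}

function firstThree() {
    yield 1;
    yield 2;
    yield 3;
}

// Nothing is computed yet; each value is doubled only when it is requested.
$doubled = lazyMap(fn ($n) => $n * 2, firstThree());

foreach ($doubled as $value) {
    echo $value, PHP_EOL; // prints 2, 4 and 6 on separate lines
}
```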

> **Tip:** You might use `iterator_apply` to perform a callback on the yielded result, but this is not recommended as
> this function does not return an iterator itself or any of the results. It only performs a callback for every
> iteration, but the callback doesn't receive the result. You have to provide the iterator as an argument, and you can
> then retrieve the `current()` iteration. It's not worth it.
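
For the curious, this is roughly what that dance looks like. The `applyDemoNumbers()` function and the `$collected`
array are made up for the sketch.

```php
function applyDemoNumbers() {
    yield 10;
    yield 20;
    yield 30;
}

$generator = applyDemoNumbers();
$collected = [];

$iterations = iterator_apply(
    $generator,
    function (Iterator $iterator) use (&$collected) {
        // The callback gets no value; read it from the iterator passed in via the third argument.
        $collected[] = $iterator->current();

        return true; // returning true keeps the iteration going
    },
    [$generator]
);

// $iterations is 3 and $collected is [10, 20, 30]
```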

### The `count` of a generator is not predefined

Since we can `yield` as many results as we want, and the generator only has one reference in memory at a time, it's not
possible to count the results without traversing them. To ease this process you can use `iterator_count()`. This will
loop over every result and return the actual count.
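
A minimal sketch, with a throwaway `countDemoLetters()` generator as input:

```php
function countDemoLetters() {
    yield 'a';
    yield 'b';
    yield 'c';
}

// iterator_count() walks the whole generator to produce the number.
$total = iterator_count(countDemoLetters()); // 3
```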

### A Generator instance can only be traversed once

When a generator finishes, it closes itself. Once this happens, you can't traverse it again. When you try to do so, you
will run into this exception: `Cannot traverse an already closed generator`.

A solution to this could be to call the generator function again. However, you should probably refactor your code to
prevent this.
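
The sketch below shows both the failure and the workaround. The `closedDemoNumbers()` function is only an example
name.

```php
function closedDemoNumbers() {
    yield 1;
    yield 2;
}

$generator = closedDemoNumbers();
foreach ($generator as $number) {
    // The first pass consumes the generator.
}

$message = null;
try {
    foreach ($generator as $number) {
        // Never reached.
    }
} catch (Exception $exception) {
    $message = $exception->getMessage(); // 'Cannot traverse an already closed generator'
}

// Calling the function again returns a fresh, independent Generator.
$secondPass = iterator_to_array(closedDemoNumbers()); // [1, 2]
```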

**Note:** `iterator_count()` also runs the generator to completion, so you can't do a count and then loop. You should
probably just keep a record of the count while iterating.
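
Keeping that record can be as simple as a counter inside the loop. The `tallyDemoRows()` generator is a made-up
example.

```php
function tallyDemoRows() {
    yield 'first';
    yield 'second';
    yield 'third';
}

$count = 0;
foreach (tallyDemoRows() as $row) {
    $count++;
    // ... handle $row here ...
}

// $count is 3 once the loop ends, with no second traversal needed.
```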

## In conclusion

Obviously arrays have their time and place. I'd never use a generator to create a simple list. But whenever I'm working
with objects or entity models, I like to use generators to limit the memory usage.

Learned anything new? Don't keep it to yourself, but share it on social media! And if you have any questions or
remarks, let me know via [Twitter](https://twitter.com/doekenorg).
