This post was updated in April 2018.

A couple of days ago I was working on some Ruby code when I found myself wishing I had Python’s yield keyword. (If you’re a user of both languages, you may be aware that yield is quite different in Ruby and Python, though not unrelated.) This led me to try to characterize the relationship between the two yields.

In a nutshell, Ruby’s yield is used when a function operates over a block, and is used to pass values into the block. Python’s yield actually yields control (and a value) to the caller while saving the state to be resumed the next time we enter the function. This is why iterators written with yield in Ruby are sometimes called internal iterators, while Python’s iterators are external iterators.

It is interesting to consider the following question: how can we approximate Ruby-style internal iterators in Python, and vice versa? Furthermore, can we write functions in each language to take a “natural” iterator and convert it to its equivalent approximated version?

Ruby Iterators

In Ruby, yield is used to pass function arguments into a block. It is normally used in concert with the block syntax for anonymous functions.

For example, Arrays have an each method which performs computations sequentially on each element:

[1,2,3].each { |x| puts x } # prints 1, 2, and 3 on separate lines

This could be implemented with yield as follows:

class Array
  def each2
    i = 0
    while i < self.size
      yield self[i]
      i += 1
    end
  end
end

In this example, yield passes in the elements of the array as arguments to the block one at a time. If we wanted to implement something like this yield, we would need an implicit anonymous function as some kind of argument to the iterator. Then yield would translate to a call to that function. Consider the following low-sugar translation:

class Array
  def each2(&block)
    i = 0
    while i < self.size
      block.call(self[i])
      i += 1
    end
  end
end

Python Iterators

Python’s yield is also a core part of idiomatic Python iteration, because it defines a generator. A generator exhibits behavior such as the following:

def my_generator():
  yield 1
  yield 2
  yield 3

g = my_generator()
print(next(g)) # prints "1"
print(next(g)) # prints "2"
print(next(g)) # prints "3"
print(next(g)) # throws StopIteration

As this example shows, yield suspends the function and returns to the caller with the value it was passed. The next time we enter the generator, execution resumes where it left off. This behavior is more complex than the internal iteration in Ruby. Suspending the execution state will make this more difficult to emulate.

(I’m ignoring the fact that yield is an expression in Python and, as such, allows the implementation of arbitrary coroutines. See David Beazley’s A Curious Course on Coroutines and Concurrency for a look at the sorts of things you can do with coroutines in Python.)

Keep in mind that generators are normally used with Python’s for statement and other looping constructs:

g = my_generator()
for value in g:
  print(value) # prints "1", "2", and "3" on separate lines

Bridging the Gap

Now, how would we do the translation between these types of iterators? As a disclaimer, I should warn you that the rest of the code in the post is Bad Form™: Ruby idioms are best used in Ruby and Python-isms are best used in Python.

Let’s start with the easier case: translating an (outer) Python iterator (a generator) into a Ruby-style iterator. To formalize our goal for this approximation, let’s say that we want a function internalize which takes a generator g and returns a function—the internal iterator—which accepts another function f as an argument and calls f with each value generated by g. Here’s an example usage:

def simple_generator():
  arr = ["a", "b", "c"]
  for letter in arr:
    yield letter

external = simple_generator()
internal = internalize(external)
internal(print) # Prints "a", "b", and "c" on separate lines

(I’m using Python 3 here.)

To write internalize, we need a function which calls the input function with all of its generated values:

def internalize(g):
  def internal(f):
    for v in g:
      f(v)
  return internal

Unfortunately, Python’s lambda expressions are a weak form of anonymous functions and aren’t as general as Ruby blocks, so this result won’t be very usable.

How do we go the other way? This time, let’s try to write a Ruby function externalize which takes a regular iterator method and creates a function which sequentially produces the iterator’s values as we call it. We could use this function as follows:

internal = ["a", "b", "c"].method(:each)
external = externalize(internal)

p external.call # "a"
p external.call # "b"
p external.call # "c"
p external.call # throws StopIteration

The trick to implementing this is how we save the state in the middle of iterating such that the next call to the outer function will jump back to that state. We can save the state by storing a continuation before we return, so this will be a job for callcc. In fact, my solution is somewhat convoluted and involves a pair of continuations:

def externalize(iter)
  callcc do |@continuation1|
    return Proc.new do
      callcc { |@continuation2| @continuation1.call }
    end
  end

  iter.call do |v|
    callcc { |@continuation1| @continuation2.call(v) }
  end

  throw "StopIteration"
end

If you’ve never used it before, callcc creates a continuation, passes it to its body, and returns the value with which the continuation is called. Hence, each entry into the Proc is bounced into the current level of iteration of iter. Each iteration causes the value to be returned through both continuations and out of the Proc, and also updates both continuations such that the next call to the Proc resumes at the same place. Try tracing the execution—it’s fairly tricky. I’d be interested to know if there’s an easier way of accomplishing this.

Improvements

Both of these examples are just rough demonstrations of the concepts, but we can make them a bit nicer. The iterators should be able to support processing multiple values at once, so our translations should preserve that feature. The Python case is simple: add a star for array expansion. Also, our conversion function is begging to be turned into a decorator:

def internalize(generator_fn):
  g = generator_fn()
  def internal(f):
    for v in g:
      f(*v)
  return internal

@internalize
def internal():
  arr = [("a", "z"), ("b", "y"), ("c", "x")]
  for letter in arr:
    yield letter

# Prints "a z", "b y", and "c x" on separate lines
internal(lambda v1, v2: print("%s %s" % (v1, v2)))

In Ruby, it’s not quite as pretty: we can do similar array expansion when we call the iterator so that we can at least transform iterators with multiple parameters, but then the result will yield single-element arrays if there is only one block parameter. Ruby folks seem to enjoy monkey-patching, so let’s add the transformation directly to the Method class.

class Method
  def externalize
    callcc do |@continuation1|
      return Proc.new do
        callcc { |@continuation2| @continuation1.call }
      end
    end

    self.call do |v|
      callcc { |@continuation1| @continuation2.call(*v) }
    end

    throw "StopIteration"
  end
end

h = { "a" => "z", "b" => "y", "c" => "x" }
external = h.method(:each).externalize

p external.call # ["a", "z"]
p external.call # ["b", "y"]
p external.call # ["c", "x"]
p external.call # throws StopIteration

Although internal iterators aren’t all that useful in Python, there are legitimate reasons to use generators in Ruby and there is an implementation for creating generators in the Ruby Generator class.

Since I wrote this post in 2010, external iteration has gained first-class support in Ruby with the addition of Enumerators. These form the basis of modern Ruby’s lazy iteration features.