This post was updated in April 2018.
A couple of days ago I was working on some Ruby code when I found myself wishing
I had Python’s yield keyword. (If you’re a user of both languages, you may be
aware that yield is quite different in Ruby and Python, though not unrelated.)
This led me to try to characterize the relationship between the two yields.
In a nutshell, Ruby’s yield is used inside a method that takes a block: it
calls the block, passing values into it. Python’s yield actually yields control
(and a value) to the caller while saving the function’s state so that execution
can resume the next time we enter the function. This is why iterators written
with yield in Ruby are sometimes called internal iterators (the iterator drives
the loop), while Python’s iterators are external iterators (the caller drives
the loop).
It is interesting to consider the following question: how can we approximate
Ruby-style internal iterators in Python, and vice versa? Furthermore, can we
write functions in each language to take a “natural” iterator and convert it to
its equivalent approximated version?
Ruby Iterators
In Ruby, yield is used to pass function arguments into a block. It is normally
used in concert with the block syntax for anonymous functions.
For example, Arrays have an each method which performs computations
sequentially on each element:
[1,2,3].each { |x| puts x } # prints 1, 2, and 3 on separate lines
This could be implemented with yield as follows:
class Array
  def each2
    i = 0
    while i < self.size
      yield self[i]
      i += 1
    end
  end
end
In this example, yield passes in the elements of the array as arguments to the
block one at a time. If we wanted to implement something like this yield
ourselves, the iterator would need to accept the anonymous function as an
explicit argument, and yield would translate to a call to that function.
Consider the following low-sugar translation:
class Array
  def each2(&block)
    i = 0
    while i < self.size
      block.call(self[i])
      i += 1
    end
  end
end
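For comparison, here is a rough Python sketch of the same low-sugar idea. The each2 function and its block parameter are names I made up for illustration; the point is only that an internal iterator runs the loop itself and calls whatever function it is handed.

```python
def each2(items, block):
    """Internal iterator: runs the loop itself and calls `block` per element."""
    i = 0
    while i < len(items):
        block(items[i])  # plays the role of Ruby's `yield self[i]`
        i += 1


collected = []
each2([1, 2, 3], collected.append)  # any callable can stand in for the block
print(collected)  # prints "[1, 2, 3]"
```

The caller never sees the loop; it only supplies the function to run on each element.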
Python Iterators
Python’s yield is also a core part of idiomatic Python iteration, because it
defines a generator. A generator exhibits behavior such as the following:
def my_generator():
  yield 1
  yield 2
  yield 3
g = my_generator()
print(next(g)) # prints "1"
print(next(g)) # prints "2"
print(next(g)) # prints "3"
print(next(g)) # raises StopIteration
As this example shows, yield suspends the function and returns to the caller
with the value it was passed. The next time we enter the generator, execution
resumes where it left off. This behavior is more complex than the internal
iteration in Ruby. Suspending the execution state will make this more difficult
to emulate.
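To make the suspended state concrete, here is a small sketch (countdown is a made-up example, not part of the original code). The local variable n keeps its value between calls to next, which is exactly the state an emulation would have to save somewhere.

```python
def countdown(n):
    # `n` is an ordinary local variable; its value survives between next() calls
    while n > 0:
        yield n
        n -= 1


g = countdown(3)
first = next(g)   # runs the body until the first yield
second = next(g)  # resumes after the yield, decrements n, and yields again
rest = list(g)    # drains whatever is left
print(first, second, rest)  # prints "3 2 [1]"
```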
(I’m ignoring the fact that yield is an expression in Python and, as such,
allows the implementation of arbitrary coroutines. See David Beazley’s A Curious
Course on Coroutines and Concurrency for a look at the sorts of things you can
do with coroutines in Python.)
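For readers who want a glimpse of the expression form set aside here, this is a minimal sketch using the generator's send method. The running_total name is hypothetical; the example only shows that yield can receive a value as well as produce one.

```python
def running_total():
    total = 0
    while True:
        # yield hands `total` to the caller and evaluates to the value sent in
        received = yield total
        total += received


acc = running_total()
next(acc)        # prime the coroutine: advance it to the first yield
a = acc.send(5)  # resumes with received == 5
b = acc.send(10) # resumes with received == 10
print(a, b)      # prints "5 15"
```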
Keep in mind that generators are normally used with Python’s for statement and
other looping constructs:
g = my_generator()
for value in g:
  print(value) # prints "1", "2", and "3" on separate lines
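Under the hood, the for statement is roughly shorthand for calling next until StopIteration is raised. The sketch below spells that out by hand; it is an approximation of what the loop does, not the interpreter's actual implementation.

```python
def my_generator():
    yield 1
    yield 2
    yield 3


it = iter(my_generator())  # a generator is its own iterator
seen = []
while True:
    try:
        value = next(it)
    except StopIteration:
        break              # the for statement swallows this exception for us
    seen.append(value)
print(seen)  # prints "[1, 2, 3]"
```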
Bridging the Gap
Now, how would we do the translation between these types of iterators? As a
disclaimer, I should warn you that the rest of the code in the post is Bad
Form™: Ruby idioms are best used in Ruby and Python-isms are best used in
Python.
Let’s start with the easier case: translating an external Python iterator (a
generator) into a Ruby-style iterator. To formalize our goal for this
approximation, let’s say that we want a function internalize which takes a
generator g and returns a function—the internal iterator—which accepts
another function f as an argument and calls f with each value generated by
g. Here’s an example usage:
def simple_generator():
  arr = ["a", "b", "c"]
  for letter in arr:
    yield letter
external = simple_generator()
internal = internalize(external)
internal(print) # Prints "a", "b", and "c" on separate lines
(I’m using Python 3 here.)
To write internalize, we need a function which calls the input function with
all of its generated values:
def internalize(g):
  def internal(f):
    for v in g:
      f(v)
  return internal
Unfortunately, Python’s lambda expressions are a weak form of anonymous
functions: they are limited to a single expression and aren’t as general as
Ruby blocks, so this result won’t be very usable.
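One workaround, sketched below, is to pass a named function wherever the body needs more than a single expression. The letters generator and the record function are hypothetical names used only for this illustration.

```python
def internalize(g):
    def internal(f):
        for v in g:
            f(v)
    return internal


def letters():
    yield from ["a", "b", "c"]


positions = {}


def record(letter):
    # Two statements, including an assignment, which a lambda cannot contain
    upper = letter.upper()
    positions[upper] = len(positions)


internalize(letters())(record)
print(positions)  # prints "{'A': 0, 'B': 1, 'C': 2}"
```

This works, but the "block" now has to be defined and named before the call, which is a good part of why the pattern feels clumsy in Python.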
How do we go the other way? This time, let’s try to write a Ruby function
externalize which takes a regular iterator method and creates a function
which sequentially produces the iterator’s values as we call it. We could use
this function as follows:
internal = ["a", "b", "c"].method(:each)
external = externalize(internal)
p external.call # "a"
p external.call # "b"
p external.call # "c"
p external.call # throws StopIteration
The tricky part of implementing this is saving the state in the middle of
iterating so that the next call to the outer function jumps back to that
state. We can save the state by storing a continuation before we return, so
this will be a job for callcc. In fact, my solution is somewhat convoluted and
involves a pair of continuations:
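require 'continuation' # callcc lives in this library on Ruby 1.9 and later

def externalize(iter)
  # continuation1 marks where the iteration should resume
  callcc do |c1|
    @continuation1 = c1
    return Proc.new do
      # continuation2 marks where the caller of the Proc is waiting
      callcc { |c2| @continuation2 = c2; @continuation1.call }
    end
  end
  iter.call do |v|
    callcc { |c1| @continuation1 = c1; @continuation2.call(v) }
  end
  raise StopIteration
end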
If you’ve never used it before, callcc creates a continuation, passes it to
its body, and returns the value with which the continuation is called. Hence,
each entry into the Proc is bounced into the current level of iteration of
iter. Each iteration causes the value to be returned through both
continuations and out of the Proc, and also updates both continuations such
that the next call to the Proc resumes at the same place. Try tracing the
execution—it’s fairly tricky. I’d be interested to know if there’s an easier
way of accomplishing this.
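Python has no callcc, but the same conversion can be approximated there with a different technique: run the internal iterator on a helper thread and hand values across a queue. The sketch below is only that, a sketch. The externalize and each functions and the _DONE sentinel are names I chose for illustration; the producer may run slightly ahead of the consumer, exceptions raised inside the internal iterator are not propagated, and an abandoned generator leaves its helper thread blocked.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of iteration


def externalize(internal):
    """Turn an internal iterator (a function taking a callback) into a generator."""
    handoff = queue.Queue(maxsize=1)

    def produce():
        internal(handoff.put)  # every value the iterator passes out goes on the queue
        handoff.put(_DONE)

    # The thread starts on the first next() call, because generator bodies run lazily
    threading.Thread(target=produce, daemon=True).start()
    while True:
        item = handoff.get()
        if item is _DONE:
            return
        yield item


def each(f):
    # A Ruby-style internal iterator written in Python
    for letter in ["a", "b", "c"]:
        f(letter)


external = externalize(each)
first = next(external)
rest = list(external)
print(first, rest)  # prints "a ['b', 'c']"
```

The thread's stack plays the role that the saved continuation plays in the Ruby version: it is where the half-finished iteration lives between calls.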
Improvements
Both of these examples are just rough demonstrations of the concepts, but we can
make them a bit nicer. The iterators should be able to support processing
multiple values at once, so our translations should preserve that feature. The
Python case is simple: add a star to unpack each yielded tuple into separate arguments. Also, our conversion
function is begging to be turned into a decorator:
def internalize(generator_fn):
  g = generator_fn()
  def internal(f):
    for v in g:
      f(*v)
  return internal
@internalize
def internal():
  arr = [("a", "z"), ("b", "y"), ("c", "x")]
  for letter in arr:
    yield letter
# Prints "a z", "b y", and "c x" on separate lines
internal(lambda v1, v2: print("%s %s" % (v1, v2)))
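One caveat with this decorator: the generator is created once, at decoration time, so the resulting internal iterator can only be run once. The sketch below demonstrates that and shows a hypothetical variant, internalize_fresh, which builds a new generator on every call. The pairs generator is also just an illustration.

```python
def internalize(generator_fn):
    g = generator_fn()  # created once, so it can be exhausted
    def internal(f):
        for v in g:
            f(*v)
    return internal


def internalize_fresh(generator_fn):
    # Hypothetical variant: build a new generator each time we iterate
    def internal(f):
        for v in generator_fn():
            f(*v)
    return internal


def pairs():
    yield ("a", "z")
    yield ("b", "y")


once = internalize(pairs)
fresh = internalize_fresh(pairs)
first_run, second_run, fresh_runs = [], [], []
once(lambda k, v: first_run.append(k + v))
once(lambda k, v: second_run.append(k + v))  # generator is already exhausted
fresh(lambda k, v: fresh_runs.append(k + v))
fresh(lambda k, v: fresh_runs.append(k + v))
print(first_run, second_run, fresh_runs)
# prints "['az', 'by'] [] ['az', 'by', 'az', 'by']"
```

The fresh variant behaves more like Ruby's each, which can be called on the same collection as many times as you like.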
In Ruby, it’s not quite as pretty: we can do similar array expansion when we
call the iterator so that we can at least transform iterators with multiple
parameters, but then the result will yield single-element arrays if there is
only one block parameter. Ruby folks seem to enjoy monkey-patching, so let’s add
the transformation directly to the Method class.
require 'continuation' # callcc lives in this library on Ruby 1.9 and later

class Method
  def externalize
    callcc do |c1|
      @continuation1 = c1
      return Proc.new do
        callcc { |c2| @continuation2 = c2; @continuation1.call }
      end
    end
    self.call do |v|
      callcc { |c1| @continuation1 = c1; @continuation2.call(*v) }
    end
    raise StopIteration
  end
end
h = { "a" => "z", "b" => "y", "c" => "x" }
external = h.method(:each).externalize
p external.call # ["a", "z"]
p external.call # ["b", "y"]
p external.call # ["c", "x"]
p external.call # throws StopIteration
Although internal iterators aren’t all that useful in Python, there are
legitimate reasons to use generators in Ruby, and there is an implementation
for creating generators in the Ruby Generator class.
Since I wrote this post in 2010, external iteration has gained first-class
support in Ruby with the addition of Enumerators. These form the basis of
modern Ruby’s lazy iteration features.