Typesafe Activator

Play Iteratees

Alejandro Pedraza
June 5, 2014
scala play iteratees scaladays2014

How to use Play Iteratees to build a custom body parser, using as an example an mp3 file metadata parser.

How to get "Play Iteratees" on your computer

There are several ways to get this template.

Option 1: Choose play-iteratees in the Typesafe Activator UI.

If you already have Typesafe Activator, launch the UI and search for play-iteratees in the list of templates.

Option 2: Download the play-iteratees project as a zip archive

If you haven't installed Activator, you can get the code by downloading the template bundle for play-iteratees.

  1. Download the Template Bundle for "Play Iteratees"
  2. Extract the downloaded zip file to your system
  3. The bundle includes a small bootstrap script that can start Activator. To start Typesafe Activator's UI:

    In your File Explorer, navigate into the directory that the template was extracted to, right-click on the file named "activator.bat", then select "Open", and if prompted with a warning, click to continue:

    Or from a command line:

     C:\Users\typesafe\play-iteratees> activator ui 
    This will start Typesafe Activator and open this template in your browser.

Option 3: Create a play-iteratees project from the command line

If you have Typesafe Activator, use its command line mode to create a new project from this template. Type activator new PROJECTNAME play-iteratees on the command line.

Option 4: View the template source

The creator of this template maintains it at https://github.com/alpeb/play-iteratees#master.

Option 5: Preview the tutorial below

We've included the text of this template's tutorial below, but it may work better if you view it inside Activator on your computer. Activator tutorials are often designed to be interactive.

Preview the tutorial

Reactive Stream Processing

Throughout this tutorial we will explore how to use Iteratees, a nifty abstraction provided in the Play framework to handle streams reactively.

According to the manual:

"Progressive Stream Processing and manipulation is an important task in modern Web Programming, starting from chunked upload/download to Live Data Streams consumption, creation, composition and publishing through different technologies including Comet and WebSockets."

Inside the Play framework, Iteratees are used, among other things, to build body parsers. A body parser consumes the raw incoming data and transforms it so that it reaches an Action wrapped in a Request. The Request's body is typed according to the transformed data and can contain, for example, JSON, XML, multipart form data or a URL-encoded form.

You can build your own custom body parsers to pre-process incoming data before delivering it to an Action, and that's exactly what we're gonna do here.

An Efficient MP3 Metadata Parser

This example will allow you to upload an MP3 file and see as a result its metadata (title, author and album).

This information is encoded inside the file following the ID3 specification. In this example we'll take care of the ID3v2.2 and ID3v2.3 versions of the spec, which are the most common.

This metadata (also referred to as a "tag") is located at the beginning of the file. Our reactive approach through Iteratees fits this scenario particularly well, since we can consume just enough bytes to build the metadata and discard the rest of the file, stopping the file upload in its tracks.
So even with very large files we can show the metadata very quickly without having to wait for the full file to be uploaded and stored in memory, unlike with traditional non-reactive web frameworks.

Iteratee, Enumeratee, Enumerator

This app declares just one controller in Application.scala.
The index Action just displays the initial form. The upload Action processes the file upload using the mp3MetadataParser custom body parser, which is defined above it in that file.

Building a custom body parser requires using the BodyParser object, passing a function that returns an instance of:

      Iteratee[Array[Byte], Either[Result, T]]
    
We'll explain Iteratees in more detail in the next section. For the moment, understand that these types imply the Iteratee consumes arrays of bytes as its incoming data stream (the file upload coming from the browser). It produces either a Result instance if something goes wrong (usually a BadRequest) or an instance of type T with the outcome of the computation (in our case the MP3 metadata), which is then available inside the Action that uses this body parser.

All the Iteratees we use in this tutorial are typed Iteratee[Byte, A] (where A varies according to each case). Note the consumed stream is of type Byte instead of Array[Byte], because for this example it was easier to reason about one byte at a time than to deal with the buffering and intermediate state handling that the Array[Byte] approach would imply.

So how can we adapt a stream of Array[Byte] to a stream of Byte? This is exactly what Enumeratee[From, To] is for. When used with an Iteratee (the proper term is to "transform", with an alias operator &>>), it will transform the stream of type From into a stream of type To.
We need to build an Enumeratee[Array[Byte], Byte], as shown in the toBytes val, with mapInputFlatten. This Enumeratee method allows us to declare how to massage the stream chunks that are expressed as Input subclasses (more about that later) to produce the desired output to be consumed by an Iteratee. This method requires that we return new instances of Enumerator. An Enumerator is simply a way to encapsulate data to be consumed by Iteratees.
The gist of our transformation happens when we call:

      Enumerator[Byte](arr: _*)
    
Here the array of bytes received from the stream is fed as a list of arguments to the Enumerator's apply method, thus "flattening" the array and producing the desired stream of just bytes.
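As a rough illustration of the idea, here is a minimal sketch of such an Enumeratee, assuming the Play 2.3 iteratee library (play.api.libs.iteratee) is on the classpath. It is not copied from the template: the object name ToBytesSketch and the flatten helper are made up for this example, and the handling of Input.Empty and Input.EOF (simply passing them through with Enumerator.enumInput) is an assumption about what a reasonable implementation does.

```scala
import play.api.libs.iteratee._
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ToBytesSketch {
  // Turns every Array[Byte] chunk into a stream of single bytes.
  // Empty and EOF carry no data, so they are forwarded unchanged (assumption).
  val toBytes: Enumeratee[Array[Byte], Byte] =
    Enumeratee.mapInputFlatten[Array[Byte]] {
      case Input.El(arr) => Enumerator[Byte](arr: _*)
      case Input.Empty   => Enumerator.enumInput[Byte](Input.Empty)
      case Input.EOF     => Enumerator.enumInput[Byte](Input.EOF)
    }

  // Hypothetical helper: pushes some chunks through toBytes and collects
  // the individual bytes that come out the other side.
  def flatten(chunks: Array[Byte]*): Future[List[Byte]] =
    Enumerator(chunks: _*) |>>> (toBytes &>> Iteratee.getChunks[Byte])

  def main(args: Array[String]): Unit = {
    val bytes = flatten(Array[Byte](1, 2), Array[Byte](3))
    println(Await.result(bytes, 5.seconds))
  }
}
```

Forwarding Input.EOF matters in this sketch: if the end-of-stream signal were swallowed, the inner Iteratee would never learn that the upload has finished.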

In Application.scala:19 we see how our toBytes Enumeratee transforms the Mp3File.tagParser Iteratee. This is our main workhorse: it takes care of building the MP3 metadata, and we'll see it in detail in a following section. To keep it general, Mp3File.tagParser returns an instance of Iteratee[Byte, Metadata]. That explains why we need to call map(Right(_)) on it, to transform its product into the required Either[Result, A].
Also notice there is no error handling here in the controller. We'll see Mp3File.tagParser handles errors by producing an Error Iteratee. When the Play framework runtime runs into that, it throws a RuntimeException that the user would see in their browser through the usual Play error template. There are ways to handle this more appropriately and have the Iteratee produce a Left instance that we can react to in the controller, but for the sake of keeping this example focused we've left the exception alone.
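To make the wiring concrete, the following is a sketch of how the pieces described in this section could fit together in a Play 2.3 controller. It is an approximation, not the template's code: the Metadata fields, the BodyParserSketch object and the trivial tagParser (which produces a fixed value without reading anything) are hypothetical stand-ins, and toBytes repeats the assumption that Empty and EOF are passed through unchanged.

```scala
import play.api.libs.iteratee._
import play.api.mvc._
import scala.concurrent.ExecutionContext.Implicits.global

// Hypothetical stand-in for the template's metadata type.
case class Metadata(title: String, author: String, album: String)

object BodyParserSketch extends Controller {

  // Stand-in for Mp3File.tagParser: finishes immediately with a fixed value.
  val tagParser: Iteratee[Byte, Metadata] =
    Done[Byte, Metadata](Metadata("unknown", "unknown", "unknown"))

  // Adapts the Array[Byte] chunks Play delivers into single bytes.
  val toBytes: Enumeratee[Array[Byte], Byte] =
    Enumeratee.mapInputFlatten[Array[Byte]] {
      case Input.El(arr) => Enumerator[Byte](arr: _*)
      case Input.Empty   => Enumerator.enumInput[Byte](Input.Empty)
      case Input.EOF     => Enumerator.enumInput[Byte](Input.EOF)
    }

  // A BodyParser wraps a function from the request header to an
  // Iteratee[Array[Byte], Either[Result, T]]; here T is Metadata.
  val mp3MetadataParser: BodyParser[Metadata] =
    BodyParser("mp3 metadata sketch") { _ =>
      toBytes &>> tagParser map (Right(_))
    }

  // Inside the Action, request.body is already the parsed Metadata.
  def upload = Action(mp3MetadataParser) { request =>
    val m = request.body
    Ok(s"${m.title} / ${m.author} / ${m.album}")
  }
}
```

Because the parser only ever produces a Right, any Error Iteratee raised by the real tag parser would surface as an exception, which matches the behaviour described above.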

Low Level Iteratees

Iteratee is a state machine that processes data by going through intermediary states. The scaladoc explains it clearly:

"At a high level, an Iteratee is just a function that takes a piece of input and returns either a final result or a new function that takes another piece of input. To represent this, an Iteratee can be in one of three states (see the Step trait): Done, which means it contains a result and potentially some unconsumed part of the stream; Cont, which means it contains a function to be invoked to generate a new Iteratee from the next piece of input; Error, which means it contains an error message and potentially some unconsumed part of the stream. One would expect to transform an Iteratee through the Cont state N times, eventually arriving at either the Done or Error state."

HelperIteratees.scala contains low level Iteratees for either moving forward along the stream without producing anything (forwardAfter) or producing a string or an array of bytes given some conditions (getUntil and take).
These Iteratees are built using the Cont object which creates an Iteratee in the "cont" state. Inside, we only need to describe what happens when input is received.
There are three types of input: Input.Empty, Input.EOF and Input.El. For each of these cases we must return a new Iteratee that will handle the following step.

  • Input.Empty means the stream didn’t provide any data, so we usually want to carry on, by recursively calling the same Iteratee.
  • Input.EOF means the stream reached its end, so we return a new Iteratee in the "done" state, with any data that was built in the process (Unit in the case of forwardAfter).
  • Input.El is for when we receive data, so we do whatever we need with it, and either carry on with the next step or finish by returning a "done" state, depending on the logic.
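The three cases above can be sketched with two small helpers written against the Play 2.3 iteratee API. These are illustrative re-creations, not the contents of HelperIteratees.scala: skipPast is a hypothetical name, and this take (with its accumulator parameter) is only assumed to behave similarly to the template's helper of the same name.

```scala
import play.api.libs.iteratee._
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object HelperSketch {
  // Moves forward until `marker` has been consumed; produces nothing (Unit).
  def skipPast(marker: Byte): Iteratee[Byte, Unit] = Cont[Byte, Unit] {
    case Input.El(b) if b == marker => Done[Byte, Unit](())      // found it: finish
    case Input.El(_)                => skipPast(marker)          // keep looking
    case Input.Empty                => skipPast(marker)          // no data: carry on
    case Input.EOF                  => Done[Byte, Unit]((), Input.EOF)
  }

  // Collects up to n bytes; on EOF it finishes with whatever was gathered.
  def take(n: Int, acc: Vector[Byte] = Vector.empty): Iteratee[Byte, Array[Byte]] =
    if (n <= 0) Done[Byte, Array[Byte]](acc.toArray)
    else Cont[Byte, Array[Byte]] {
      case Input.El(b) => take(n - 1, acc :+ b)
      case Input.Empty => take(n, acc)
      case Input.EOF   => Done[Byte, Array[Byte]](acc.toArray, Input.EOF)
    }

  // Feeds a fixed sequence of bytes to `take` and returns the collected bytes.
  def firstBytes(n: Int, data: Byte*): Future[List[Byte]] =
    Enumerator(data: _*) |>>> take(n).map(_.toList)

  def main(args: Array[String]): Unit =
    println(Await.result(firstBytes(3, 1, 2, 3, 4, 5), 5.seconds))
}
```

Each branch returns a new Iteratee instead of mutating state, which is what lets the runtime suspend between chunks without blocking a thread.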

Composing Iteratees

We can build higher level Iteratees by composing smaller ones, as illustrated in Mp3File.scala.

Iteratee follows a monadic structure, which means we can build a non-blocking pipeline tying together different Iteratees, each one specializing in a smaller chunk of the stream. As a first approach we could call Iteratee.flatMap repeatedly to chain each Iteratee. Fortunately, Scala also lets us express monadic computations through for-comprehensions, which results in very clean code, as shown in tagParser, ID3v2_2framesParser and ID3v2_3framesParser. Each of the steps (a.k.a. generators) inside these comprehensions follows this pattern: to the right of the "<-" we must have an Iteratee[E, A], and to the left of the "<-" the value produced by that Iteratee (a variable of type A).

The generators in these methods show how the ID3 spec is expressed using Iteratees in different ways:

  • Just advancing the stream, discarding its data, like in the forwardAfter calls, where the variable is left as "_"
  • Generating a value through a simple call to an Iteratee helper like getUntil, and optionally performing some transformation on the produced value through a map call, like when calling take(3) map bytes2PlainString
  • Not consuming data from the stream and just using the values produced in previous steps to perform some logic and advance to the next step just by returning a Done[Byte, Unit] Iteratee
  • Delegating a step of the parsing to a more specialized sub-parser. For example tagParser takes care of parsing the ID3 tag header, and it delegates the frames parsing to the appropriate subparser depending on the ID3 version found in the header.
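The four kinds of generators listed above can be illustrated with a compact for-comprehension. This is a simplified sketch under stated assumptions, not the template's tagParser: it relies only on the fact that an ID3v2 tag starts with a 10-byte header (the 3-byte identifier "ID3", 2 version bytes, 1 flags byte and 4 size bytes), the take helper is re-created here, and the two sub-parsers are hypothetical placeholders that return a label instead of parsing frames.

```scala
import play.api.libs.iteratee._
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object ComposeSketch {
  // Collects up to n bytes (same idea as the low-level helpers).
  def take(n: Int, acc: Vector[Byte] = Vector.empty): Iteratee[Byte, Array[Byte]] =
    if (n <= 0) Done[Byte, Array[Byte]](acc.toArray)
    else Cont[Byte, Array[Byte]] {
      case Input.El(b) => take(n - 1, acc :+ b)
      case Input.Empty => take(n, acc)
      case Input.EOF   => Done[Byte, Array[Byte]](acc.toArray, Input.EOF)
    }

  // Hypothetical sub-parsers; real ones would read frames from the stream.
  val v22Frames: Iteratee[Byte, String] = Done[Byte, String]("ID3v2.2 frames")
  val v23Frames: Iteratee[Byte, String] = Done[Byte, String]("ID3v2.3 frames")

  val tagParser: Iteratee[Byte, String] = for {
    // Produce a value and transform it with map.
    id      <- take(3) map (bytes => new String(bytes, "ISO-8859-1"))
    version <- take(2)
    // Advance the stream and discard the data (flags and size bytes).
    _       <- take(5)
    // Consume nothing: only validate what earlier steps produced.
    _       <- if (id == "ID3" && version.length == 2) Done[Byte, Unit](())
               else Error[Byte]("No ID3v2 tag found", Input.Empty)
    // Delegate to a sub-parser chosen from the major version byte.
    frames  <- version(0).toInt match {
                 case 2 => v22Frames
                 case 3 => v23Frames
                 case v => Error[Byte](s"Unsupported ID3v2 major version: $v", Input.Empty)
               }
  } yield frames

  def main(args: Array[String]): Unit = {
    val header = "ID3".getBytes("ISO-8859-1") ++ Array[Byte](3, 0, 0, 0, 0, 0, 0)
    println(Await.result(Enumerator(header: _*) |>>> tagParser, 5.seconds))
  }
}
```

If the stream does not start with "ID3", the comprehension short-circuits into the Error state, and running it yields a failed Future rather than a value.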

Other Uses

We've seen how to use Iteratees to build custom body parsers that will hopefully help you manage the complexity that usually arises when following particular data format specs.

The approach followed here is also a good example of how the reactive paradigm allows for better performance and smart use of resources. As mentioned above, a traditional approach to data parsing would unnecessarily load the entire file into memory, which means a higher response time and wasted RAM.

We encourage you to explore other uses of Iteratees besides data parsing. As of version 2.3 the Play framework allows handling WebSockets through actors, which are a good abstraction for handling discrete messages. For handling streams, however, you can still use Iteratees.
