
August 27, 2014

Akka Streams at Elder Research: Q&A with Simeon Fitch

At Typesafe, we’re incredibly excited about the work being done around the Reactive Streams and Akka Streams projects. The Akka team is addressing major challenges we’ve been seeing from developers working with streaming data, and they’re making real headway on back-pressure issues.

It’s always nice to hear about real-life implementations of these projects, so we’d like to share a recent interview with Simeon Fitch, Director of Software Architecture at Elder Research, Inc. Simeon used Akka Streams to handle I/O bottlenecks in his text processing engine. In our Q&A, Simeon outlines how he ported over his existing actor system to akka-streams, hurdles he encountered along the way, and, best of all, results!

--

Typesafe: Tell us a little bit about yourself and what you work on at Elder Research.

Simeon: I’ve been a professional software engineer for a little over 20 years. I started my career in the mid-'90s writing satellite communication systems in C++ (on Sun Solaris). I was sufficiently conditioned by those experiences with C++ to immediately appreciate the productivity potential of the Java platform when it was announced in 1995. With the exception of a few side projects in Python, Ruby and C++, I’ve been on the JVM ever since.

Fast forward 19 years and I’m now at Elder Research, Inc. (ERI), a data mining and predictive analytics consulting firm in Charlottesville, VA, where I work with our software engineering and data science teams on software architectures for predictive model development and deployment. While we have some areas of specialization, generally speaking ERI has a very broad client base, with each engagement having disparate needs and expectations. We work very hard at meeting our clients where they stand, partnering with them to find the shortest path to ROI on their analytics investment. Having a software development platform that supports those goals, with flexibility and stability, is a critical component of our strategy.

Typesafe: What is the role of Scala at ERI?

Simeon: Due to the number of long-standing systems we continue to enhance, we’re still predominantly a Java shop. However, I’ve been aggressively evangelizing Scala since taking the “Functional Programming Principles in Scala”[1] course in the spring of 2013. Our commit logs show building momentum, and all the projects I personally initiate are in Scala. I’m finding Scala adoption is enabling us to better address clients’ challenges with modern methods and domain-focused levels of abstraction, while continuing to benefit from the flexibility and robustness of the JVM platform (not to mention the vast number of existing libraries). There is indeed a learning curve for someone without an FP background, but my personal time investment in Scala has enabled me to write some of the most elegant and innovative software of my whole career. (ERI feeding me interesting problems deserves credit too!)

Typesafe: Can you tell us more about your personal path to Scala adoption?

Simeon: Although my past has been primarily as a C++ and Java developer, I’ve maintained awareness that languages have a lifecycle, and longevity concerns must be considered in order to remain professionally viable. Every few years I turn on the technology radar and evaluate the platform/language landscape. When lambdas didn’t make it into Java 7, I started one of those periodic platform surveys, as a hedge. I believed the fundamentals behind the JVM remained highly pertinent, but I needed higher levels of expressiveness in my code and less boilerplate.

Having some experience with Python helped me hone my evaluation criteria. I’ve been using Python on and off since the pre-1.0 days (early '90s) for a variety of one-off utilities, test scripts, and small monitoring UIs in Tk or Qt. Python is a fantastic language with a fertile ecosystem, and has significant promise in data science. However, when I compare my experiences in developing software systems of an “interesting” size, I see that one’s descriptive vocabulary is significantly hampered when a language lacks static types. With that in mind, I made static typing a primary criterion, alongside cross-platform support.

I checked out several languages, including C# (insofar as the cross-platform Mono CLR supported it) and even considered going back to C++ via the excellent Qt framework. I finally worked my way around to Scala, and after overcoming some initial intimidation invoked by the sbt operators (thank you <++=!), I took the Coursera class and had my faith in the JVM’s future restored.

Typesafe: What were the events that led up to you choosing to use Akka Streams?

Simeon: Just as my early professional experience with C++ enabled me to appreciate the Java platform, it was writing my first actor system with basic Akka actors that allowed me to appreciate the productivity potential of Akka Streams. Don’t get me wrong: Akka actors and futures are a significant productivity gain over vanilla Java concurrency! It’s just that those productivity gains quickly shifted the focus to new problems that were closer to the heart of my application. A good thing in my book!

Specifically, one of my projects is a semi-structured and unstructured text processing engine that makes use of visual and linguistic semantics for identifying data targets. We went from a version with significant I/O bottlenecks to one—first using core Akka—that could saturate all of my CPU cores. That is, it would until it ran out of memory!

Developers seasoned in concurrent systems will recognize this scenario: the back-pressure problem. Up until that point I’d never really had to worry about it; I/O bottlenecks or threading resources had been the focus, and the in-core processing had been artificially held at bay. Akka gave me the compute performance needed, but shed light on a new set of problems—good ones!—I now had to contend with.
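The dynamic Simeon describes, a producer happily outrunning its consumer until memory is exhausted unless demand flows upstream, can be sketched in plain Scala. This is a toy pull model with illustrative names, not the Akka Streams API: the consumer signals a fixed demand, so the number of in-flight elements stays bounded no matter how fast the producer could go.

```scala
object BackPressureSketch {
  // A producer that only emits what downstream has asked for.
  final class Producer(private var data: List[Int]) {
    def request(n: Int): List[Int] = {
      val (batch, rest) = data.splitAt(n)
      data = rest
      batch
    }
  }

  // Pull-based loop: re-signal demand only after processing a batch.
  // Returns (elements processed, maximum elements in flight at once).
  def run(total: Int, demand: Int): (Int, Int) = {
    val producer = new Producer((1 to total).toList)
    var processed = 0
    var maxInFlight = 0
    var batch = producer.request(demand) // initial demand
    while (batch.nonEmpty) {
      maxInFlight = math.max(maxInFlight, batch.size)
      processed += batch.size            // the "slow" consumption step
      batch = producer.request(demand)   // ask for more only when ready
    }
    (processed, maxInFlight)
  }

  def main(args: Array[String]): Unit =
    println(run(1000, 8)) // all 1000 processed, never more than 8 buffered
}
```

With a push model and no demand signal, the buffer between the two sides grows without bound; this is exactly the out-of-memory failure mode described above.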

To do so I read several articles on Akka load balancing and back-pressure management[2]. Fortunately the timing was such that the Scala Days 2014 presentations had just been posted, and I watched the one by Kuhn and Klang on Akka Streams[3]. After a sigh of relief (I wasn’t relishing the idea of writing the load-balancing infrastructure myself), I gave Akka Streams a try. Within a day I had hacked a proof-of-concept together that eliminated the back-pressure problem. I was sold, and with enthusiasm I spent the next week or so ripping core components out of my actors and into the Transformer[T] API, or as vanilla functions for use in the Duct[T] combinators.
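The refactoring pattern described here, pulling core logic out of actors into plain functions and wiring them together as combinators, can be illustrated with a small plain-Scala sketch. The `Pipe` type and stage names below are hypothetical stand-ins, not the Transformer[T] or Duct[T] API:

```scala
object PipelineSketch {
  // A pipeline stage is just a function over a stream of elements,
  // so stages compose with ordinary function composition.
  type Pipe[A, B] = Iterator[A] => Iterator[B]

  def stage[A, B](f: A => B): Pipe[A, B] = _.map(f)
  def keep[A](p: A => Boolean): Pipe[A, A] = _.filter(p)

  // Core logic lives in plain, independently testable functions;
  // the "actor" concerns (dispatch, mailboxes) are gone entirely.
  def run(lines: List[String]): List[String] = {
    val tokenize: Pipe[String, String]   = _.flatMap(_.split("\\s+").iterator)
    val nonTrivial: Pipe[String, String] = keep(_.length > 2)
    val lowered: Pipe[String, String]    = stage(_.toLowerCase)
    val pipeline = tokenize.andThen(nonTrivial).andThen(lowered)
    pipeline(lines.iterator).toList
  }

  def main(args: Array[String]): Unit =
    println(run(List("Akka Streams at ERI", "a to")))
}
```

The payoff is the same one described below: each stage is a small pure function, and the pipeline's intent is readable at a glance.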

Typesafe: Tell us about your experience porting your existing actor system to akka-streams. Were there any significant hurdles or surprises along the way?

Simeon: Keep in mind that I started with akka-streams-experimental, version 0.4, and my expectations for instant gratification were nil. I knew what I was getting into, including digging into the source code when necessary. However, after that first day of prototyping, I knew that whatever hurdles I ran into using this “experimental” library—written by industry luminaries—would be overshadowed by significant productivity gains.

I’m using Spray.io’s client library[4] for remote data access, which does a great job of abstracting the Akka-based asynchronous network I/O complexity. One hurdle was figuring out how to take the Future[HttpResponse] returned from Spray.io and map it into an already materialized Flow[HttpResponse]. Help from the Akka users’ group was instrumental in getting a partial solution, and I understand there’s ongoing work in that area by the Akka team to make this even easier to handle.
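The akka-streams 0.4 details have since changed, but the shape of this hurdle (turning an async client's Future result into stream elements) can be sketched with standard-library Futures. `Response` and `fetch` below are hypothetical stand-ins for Spray's types, not its API:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object FutureToStreamSketch {
  // Hypothetical response type standing in for Spray's HttpResponse.
  final case class Response(body: String)

  // Hypothetical async client call, standing in for the Spray pipeline.
  def fetch(url: String): Future[Response] =
    Future(Response(s"payload-from-$url"))

  // The core problem: a stream of URLs, one Future per element, and we
  // want a stream of responses back in order. Future.traverse sequences
  // the futures here; Akka Streams' mapAsync-style combinators do the
  // same job inside a flow, with bounded parallelism and back-pressure.
  def responses(urls: List[String]): List[Response] = {
    val all: Future[List[Response]] = Future.traverse(urls)(fetch)
    Await.result(all, 5.seconds)
  }

  def main(args: Array[String]): Unit =
    println(responses(List("a", "b")).map(_.body))
}
```

The blocking Await here is for the sketch only; inside a real stream the Future would be flattened into the flow rather than awaited.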

Another hurdle was figuring out the semantics of some of the more advanced Duct[T] combinators, particularly the ones that work with other materialized endpoints (e.g., I still can’t grok Duct[T].splitWhen). That said, for an experimental library, the documentation is impressive. In my opinion, Akka ranks at the top with ScalaTest in having some of the best documentation in the whole Scala ecosystem. I’m therefore confident that the use cases and semantics for each of the Akka Streams combinators will be refined as the library approaches production.
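For readers puzzling over the same combinator: splitWhen starts a new sub-stream at every element for which a predicate holds. A toy list-based model of that behavior (my sketch, not the Akka implementation, which emits live sub-streams rather than lists):

```scala
object SplitWhenSketch {
  // Walk the input left to right; an element satisfying the predicate
  // opens a new group, and every other element joins the current group.
  def splitWhen[A](xs: List[A])(p: A => Boolean): List[List[A]] =
    xs.foldLeft(List.empty[List[A]]) {
      case (acc, x) if p(x) || acc.isEmpty => List(x) :: acc
      case (group :: rest, x)              => (group :+ x) :: rest
    }.reverse

  def main(args: Array[String]): Unit =
    // A new group begins at each header line (here, lines starting with '#').
    println(splitWhen(List("#a", "1", "2", "#b", "3"))(_.startsWith("#")))
}
```

In stream terms, each inner list corresponds to one sub-stream, which is why splitWhen pairs naturally with per-record processing of multi-line input.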

Typesafe: What benefits have you seen so far?

Simeon: First, back-pressure is now under control!

More importantly, I can finally reason about the behavior of the system. Before Akka Streams, I had resorted to embedding PlantUML[5] state diagrams in my ScalaDoc to keep track of my architecture, something I knew indicated a serious maintenance liability. Now with Akka Streams I’m working at a much higher level of abstraction, one that’s highly composable and where the intent of an algorithm is clearer to the reader. As an additional benefit, my code base shrank significantly, by a factor of 3 by my estimate. That’s code I don’t have to maintain!

Benefits so far: fewer lines of code that are more readable and reusable, and an implementation that is better performing and deterministic in behavior. How much more could one ask of a single library?

Typesafe: What other initiatives would you like to see from Typesafe and the Akka team?

Simeon: For the Akka team overall, I say keep doing what you’re already doing. From the outside it looks like a dream team that should keep following its intuition on moving the platform forward. I’m especially impressed with the organization of the Reactive Streams initiative and the way it aims to play nicely with other streaming APIs.

The collaboration with the Spray.io team on Akka HTTP also deserves a highlight (on both sides). I hope that collaborative effort is successful and serves as an example to other Scala infrastructure teams of how to create a whole that’s greater than the sum of its parts.

--

Simeon, thanks so much for sharing your experience with us and keep us in the loop with your Akka Streams project!
