
August 18, 2014

What Happened to my Travis CI?

@dickwall
scala
sbt

The Issues

A couple of weeks ago, while I was teaching our Scala training course, one of the students who was having some trouble cleared out his Ivy cache and then ran sbt again to test the exercises.

sbt failed with a message that the Scala 2.10.2 jars could not be downloaded. Since we run into this kind of thing a lot during training, I did the usual network checks. Everything seemed to be working, so we tried again: no dice. Eventually we got it working (I think we grabbed the jars from another machine in the end; either way, we found a quick workaround) and I paid it no more mind, at least not until later that day.

It seems we were not alone, and we had nowhere near the level of trouble that many others in the Scala community saw. Open source projects using Travis CI for their builds were particularly affected; in many cases their builds could not run at all.

Since there still seems to be a fair amount of confusion surrounding this outage, I hope that a brief summary of the facts will clear some of it up, and also open a discussion about improving communication so that we can prevent a similar outage, or at least reduce its impact, in the future.

This is not a blame or excuse blog either. I have spent the last couple of days gathering as much information as I can, and while there are a few mistakes scattered around, I don’t think any party acted incorrectly, and certainly not intentionally. Here are the facts I have pieced together; you be the judge.

For those in a hurry to get back to coding, here is a short summary of “just the facts” behind this outage, listed in order of priority to fix (which is also roughly the order in which fixes are likely to arrive):

  1. sbt still downloads Scala 2.10.2 artifacts even though they are not required for building.
    Solution/workaround: in project/plugins.sbt add a dependency:
    libraryDependencies += "org.scala-lang" % "scala-compiler" % scalaVersion.value
    (see https://github.com/sbt/sbt/issues/1439 for more details)

  2. Travis CI starting from a clean VM image for each build means that all dependencies must be downloaded before the build can proceed.
    Solution/workaround: download a tarball containing a pre-populated Ivy cache from a hosting service such as Dropbox. Alternatively, install and configure a repository proxy such as Nexus or Artifactory in front of Maven Central.

  3. Many copies of VoidsWrath (a Minecraft mod) already out in the wild download Scala 2.10.2 artifacts on each run, and retry if the download fails, creating massive demand on the Sonatype distribution servers. Fixing this will take longer.

For those interested, here are more details and background on each of these items:

Around a year ago, Sonatype contacted people on the sbt project to ask whether anything had changed in the way sbt resolved and requested dependencies. They had noticed a large spike in demand for the Scala 2.10.2 jars from a Java 1.6 user agent, so high that it accounted for the majority of download traffic from Maven Central!

Some investigation turned up that a community application called Minecraft Forge had a plugin for writing mods in Scala (and why wouldn’t you?), but that it used a fairly simplistic approach to obtaining the necessary Scala support: it downloaded the jars every time it was started. It shouldn’t surprise anyone that the Minecraft modding community dwarfs the Scala community right now (and probably most other developer communities too); indeed, the download numbers from Maven Central would tend to indicate that.

The application was identified when Sonatype blocked downloads for requests from the Java 1.6 user agent and reports started popping up that Minecraft Forge would no longer start. The Minecraft Forge community was responsive and helpful, and a fix was implemented in cooperation with Sonatype.

What no one knew at the time was that another project, Voids Wrath, based on Minecraft Forge, had forked the code causing the original download problem and continued to use it. When the Scala downloads stopped working for that project, someone on the project found a workaround: supplying a different user agent for the Scala downloads, which of course allowed the jars to be downloaded just fine. This, together with more subtle tactics to retrieve the Scala 2.10.2 compiler and library jars, led Central to redirect such requests to a slower CDN provider.

Fast forward about six months. Sonatype was working on additional delivery features, and a configuration change was made that did not take this redirect into account: instead of being redirected, clients received 404 responses for the Scala 2.10.2 library and compiler jars.

It was a simple configuration mistake. If you haven’t made one of those, I applaud you; I have certainly made my share. No party intended to stop serving the Scala 2.10.2 jars.

It took a couple of days from the first observation of the interruption to find, coordinate and fix the configuration problem, at which point the Scala 2.10.2 jar files were once again available, albeit from the slower CDN that Maven Central uses to offload some of the demand. The slower downloads are still causing some timeouts in Travis CI builds.

The problem with Travis CI is that it starts from a fresh VM image for every build, so the Ivy cache is lost and everything in it has to be downloaded again for each build. Due to an inefficiency in Ivy, even though sbt now uses Scala 2.10.4, the old 2.10.2 artifacts are still downloaded by default on every build.

That is a summary of the facts I have been able to dig up so far. The current status: Scala 2.10.2 is once again available from Maven Central and your sbt builds should work, but the downloads may be slower, which can still cause problems for Travis CI builds.

What We Can Do Right Now

Now let’s look at what we can do from here, starting with short-term solutions and then discussing better longer-term approaches that we will keep working towards. I’ll refer back to the three issues listed at the top of this post.

Tackling #1 (sbt still downloads Scala 2.10.2 artifacts even though it doesn’t use them): there is a relatively simple workaround that anyone using Travis CI to build their project can implement right now.

Thanks to Paul Phillips (@extempore2) and Sam Halliday (@fommil) for the initial approach, and to Eugene Yokota (@eed3si9n) for the refinement: adding the following line to your project/plugins.sbt (or the equivalent to your sbt .scala build files) will prevent the download of Scala 2.10.2 artifacts during the sbt build:

libraryDependencies += "org.scala-lang" % "scala-compiler" % scalaVersion.value

For more background on this fix, see sbt issue 1439 on GitHub (https://github.com/sbt/sbt/issues/1439).
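To confirm that the workaround is doing its job, one option is to clear your Ivy cache, run a build, and then check whether any Scala 2.10.2 jars came back. The Scala sketch below does that check. It assumes the default Ivy cache location (~/.ivy2/cache) and Ivy's default cache layout of <organisation>/<module>/jars/<module>-<revision>.jar; the object name FindScalaJars is hypothetical, so adjust the path if your cache lives elsewhere.

```scala
import java.nio.file.{Files, Path, Paths}

object FindScalaJars {
  // Scala modules whose 2.10.2 jars were affected by the outage.
  val modules: Seq[String] = Seq("scala-library", "scala-compiler", "scala-reflect")

  // Assumes Ivy's default cache layout: <cache>/<organisation>/<module>/jars/<module>-<revision>.jar
  def cachedJars(ivyCache: Path, version: String): Seq[Path] =
    modules
      .map(m => ivyCache.resolve("org.scala-lang").resolve(m).resolve("jars").resolve(s"$m-$version.jar"))
      .filter(p => Files.isRegularFile(p))

  def main(args: Array[String]): Unit = {
    // Assumes the default cache location unless a directory is passed as the first argument.
    val cache =
      if (args.nonEmpty) Paths.get(args(0))
      else Paths.get(sys.props("user.home"), ".ivy2", "cache")
    val found = cachedJars(cache, "2.10.2")
    if (found.isEmpty) println(s"No Scala 2.10.2 jars found under $cache")
    else found.foreach(p => println(s"Found: $p"))
  }
}
```

An empty result after a clean build suggests the 2.10.2 artifacts are no longer being fetched; on a shared development machine, other projects that genuinely use 2.10.2 may still put them there.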

This first fix, while effective, is a triage measure rather than a permanent solution. It specifically targets the download of Scala 2.10.2 artifacts; the larger issue of having to download everything in your Ivy cache for each build is not addressed by this workaround. For that we need to tackle issue #2, and that will take longer.

Tackling #2 (Travis CI starts with a fresh VM image and requires downloading the entire Ivy cache on each build) means finding a way to avoid downloading all of the Ivy artifacts for each build in Travis CI. If you can implement this right now, it will decrease your Travis build times and increase the reliability of your builds, but the easy solutions available at present carry some risks. We are continuing to consider solutions to this problem, many of them targeting the empty Ivy cache at the start of each build.

There are already Travis plugins available for downloading and unpacking tarballs from a suitable download location, for example a Dropbox public directory. However, any free hosting solution you find is likely to have download bandwidth limits. For Dropbox, for example, the advertised limit is 10 GB per day, and if you exceed it, your Travis CI builds will start failing again.

For smaller Scala projects without too many daily builds, this limit is likely fine, but larger projects with many daily builds are likely to exceed it and get cut off, so please keep that in mind if you decide to try this approach. Hosting on S3 should get you all the storage and bandwidth you want for a (hopefully reasonable) price, and as a bonus, transfer times into your Travis CI VM should be very fast. Open source projects often lack big monetary resources, though, so if you go this route, keep an eye on your costs.
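To get a feel for where a daily cap like Dropbox’s starts to bite, here is a small Scala calculation. The 150 MB tarball size and the three-job build matrix are hypothetical numbers chosen for illustration, and the 10 GB limit is read as decimal gigabytes; substitute the real size of your cached Ivy tarball. Because every job in a Travis build matrix runs in its own fresh VM, each job downloads the tarball separately.

```scala
object CacheBandwidthBudget {
  // Decimal units, an assumption about how the advertised limit is counted.
  val BytesPerMB: Long = 1000L * 1000L
  val BytesPerGB: Long = 1000L * BytesPerMB

  /** How many complete downloads of the tarball fit into the daily transfer cap. */
  def downloadsPerDay(dailyCapBytes: Long, tarballBytes: Long): Long = {
    require(tarballBytes > 0, "tarball size must be positive")
    dailyCapBytes / tarballBytes
  }

  /** Each job in a build matrix runs in a fresh VM, so each job downloads the tarball once. */
  def buildsPerDay(dailyCapBytes: Long, tarballBytes: Long, jobsPerBuild: Int): Long = {
    require(jobsPerBuild > 0, "a build has at least one job")
    downloadsPerDay(dailyCapBytes, tarballBytes) / jobsPerBuild
  }

  def main(args: Array[String]): Unit = {
    val cap = 10 * BytesPerGB      // the advertised 10 GB/day limit
    val tarball = 150 * BytesPerMB // hypothetical size of a pre-populated Ivy cache tarball
    println(s"downloads per day: ${downloadsPerDay(cap, tarball)}")
    println(s"builds per day with a 3-job matrix: ${buildsPerDay(cap, tarball, 3)}")
  }
}
```

With these assumed numbers the cap allows 66 downloads per day, or 22 builds of a three-job matrix, which a busy project with a steady stream of pull requests could exceed well before the end of the day.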

We have a number of other ideas for tackling this problem, but we are still working out which are likely to yield the best results, so watch out for follow-up articles.

A second angle of attack is to run your own Maven proxy (for example Nexus or Artifactory). Sonatype deserves our thanks for graciously hosting Scala and so many other open source projects free of charge for many years, but that doesn’t mean we can’t help them, and ourselves, by using proxies to reduce the load on Maven Central and speed up our own builds at the same time.

If you were lucky (or organized) enough to be running an artifact proxy for your project last week, chances are you didn’t even notice the outage. One of the issues we will be tackling is that there is currently little information or help on how to set up or configure these proxies. That will soon be addressed in another technical blog post (we want to make sure the information is complete and correct; if you are eager to get going, though, there is no reason you can’t experiment in the meantime).

This solution is further complicated by the fact that the Nexus proxy default settings are known not to work with sbt, so simply downloading Nexus is not enough: extra configuration is required, and we are working with Sonatype to provide that information. Artifactory information will come sooner, as it is simpler to configure for sbt.
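If you want to experiment before that post arrives, a minimal sbt 0.13 sketch for pointing a project at a proxy might look like the following. The resolver name and URL are placeholders, the real URL depends on how your Nexus or Artifactory instance is set up, and Nexus in particular still needs the extra configuration mentioned above. This only changes how your project’s own dependencies are resolved: plugins declared in project/plugins.sbt are resolved by the meta-build, and sbt’s own jars come through the launcher’s repository configuration (~/.sbt/repositories).

```scala
// build.sbt (sketch, sbt 0.13): route dependency resolution through a proxy.
// "In-house proxy" and the URL are placeholders for your own repository.
resolvers += "In-house proxy" at "http://repo.example.com/artifactory/repo/"

// Keep the local Ivy repository and the resolvers above, but drop the direct
// Maven Central fallback so that remote requests go through the proxy.
externalResolvers := Resolver.withDefaultResolvers(resolvers.value, mavenCentral = false)
```

Dropping the Maven Central fallback is a deliberate choice here: with it left in place, a misconfigured proxy can go unnoticed because sbt quietly falls back to Central.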

The Longer Term Outlook

Like many in the Scala community, I had never heard about this issue before. I suspect that if I had read a blog post about a game mod downloading Scala every time it ran and causing a spike in Maven Central’s traffic, I would not have thought much more about it. Perhaps a reader might have realized that a project forked from that code would continue to exhibit the problem and ultimately lead to an outage of Scala 2.10.2; certainly in hindsight it’s a lot easier to draw that conclusion.

Regardless of whether it might have been spotted in time, it is clear that more communication would be a good thing (it’s part of my job now too). The community has also said repeatedly since last week that more transparency in matters like this would be welcome. In response, we will start publishing more developer-centric content on the blog.

We don’t want to simply flood the regular blog with this new content, since much of it is likely to be dry and useful to only a small percentage of readers, but we intend to publish a lot more operational information, including the status of nightlies, technical milestone and release updates, summaries of engineering work taken from internal communications, and items such as notifications about extra traffic on Maven Central or anything else that we think might affect the community.

We are also very interested in what you would like to see included. If something would be valuable to you, please let us know, and provided it’s not too high-maintenance, we will try to get it in there (just please give us some time to get things established). No doubt we will miss some things and maybe overshare others, but we can fine-tune as time goes on. One idea that sprang to mind was tracking SIPs as they progress; I for one would find that interesting.

We will have a developer tag on the blog and a separate RSS feed for these updates as soon as we can set them up. Don’t expect a lot of entertainment on this channel; it’s likely to be quite dry. We will try to capture items like this lurking issue with download spikes, partly in the hope that many eyes will help us understand their significance and potential risks before they turn into full-blown problems.

We also still need to follow up with Voids Wrath to tackle problem #3 (the VoidsWrath application downloads Scala 2.10.2 artifacts on every run) and make sure their implementation no longer requests the Scala 2.10.2 jars on every run. Trying to identify other forks of the original Minecraft Forge code as a preventative measure also seems like a good idea, and we’ll reach out to Sonatype for help with that.
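For illustration, here is one way a client could avoid asking for the same jars on every run: check a local cache first, and when a download really is needed, retry a bounded number of times with an increasing delay rather than hammering the server. This is a hypothetical Scala sketch, not the actual Voids Wrath or Minecraft Forge code; the object and method names are made up, and the download itself is passed in as a function so the caching and retry logic stand on their own.

```scala
import java.nio.file.{Files, Path}
import scala.util.{Failure, Success, Try}

object CachedFetch {
  /** Returns `target` if it is already cached. Otherwise calls `download` up to
    * `maxAttempts` times, doubling the delay after each failure, and writes the
    * first successful result to `target`. */
  def fetchOnce(target: Path, maxAttempts: Int, initialDelayMs: Long)(download: () => Array[Byte]): Try[Path] = {
    def attempt(n: Int, delayMs: Long): Try[Path] =
      Try(download()) match {
        case Success(bytes) =>
          Try {
            Option(target.getParent).foreach(dir => Files.createDirectories(dir))
            Files.write(target, bytes) // cache the jar so later runs skip the network
          }
        case Failure(_) if n < maxAttempts =>
          Thread.sleep(delayMs) // back off before trying again
          attempt(n + 1, delayMs * 2)
        case Failure(e) =>
          Failure(e) // give up after maxAttempts instead of retrying forever
      }

    if (Files.isRegularFile(target)) Success(target) // already cached: no request at all
    else attempt(1, initialDelayMs)
  }
}
```

The important property is the first check: once the jar is on disk, later runs make no request at all, which is exactly what downloading the jars every time the application starts does not give you.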

Hopefully this article has explained what happened, provided some short-term solutions, and offered longer-term reassurance that we know the problem is not yet closed. Stay tuned for more information as we proceed, and thanks for your patience.

 
