RubyGems.org Vulnerability Explained
After evaluating Gemfury’s processing of RubyGems, we feel it is important to share our understanding and bring awareness to possible security issues when parsing untrusted YAML input.
On January 30, 2013, the community package server RubyGems.org was compromised through a remote code execution vulnerability. The all-volunteer team sprang into action and, over the following 53 hours, yanked the exploit, patched the vulnerability, verified all the existing gems, and migrated the service to AWS. As of today, the service has been restored and deemed safe for use.
Important: This vulnerability came from misuse of a standard YAML library, so it is likely not limited to RubyGems.org. Many applications depend on this library and are potentially vulnerable to a similar exploit if they are exposed to untrusted YAML input. Please take this opportunity to audit and secure your own applications.
Quick review of RubyGem structure
RubyGems are used to encapsulate, package, and share Ruby code. A Gem is nothing more than a tar archive. Listing the contents of a packaged Gem shows two entries:

    $ tar -tf rails-3.2.11.gem
    data.tar.gz
    metadata.gz
The data.tar.gz archive contains all the files that the author has chosen to distribute. The list of these files is specified in the original gemspec.
The metadata.gz file is a gzip-compressed YAML.dump serialization of the Gem::Specification object defined by the above-mentioned gemspec. This specification contains the name, version, author, file list, dependencies, and other important information about the Gem.
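To make the metadata.gz format concrete, here is a minimal sketch that produces one by hand. It assumes a recent Ruby with RubyGems loaded. The gem name, version, and author are made-up values, and the sketch relies on Gem::Specification#to_yaml, which is what current RubyGems writes into metadata.gz.

```ruby
require "rubygems"
require "yaml"
require "zlib"

# A minimal specification, as a gemspec would define it (illustrative values).
spec = Gem::Specification.new do |s|
  s.name    = "example"
  s.version = "0.1.0"
  s.summary = "An illustrative gem"
  s.authors = ["Jane Doe"]
  s.files   = ["lib/example.rb"]
end

# metadata.gz is the YAML serialization of the spec, gzip-compressed.
metadata_gz = Zlib.gzip(spec.to_yaml)

# Decompressing it recovers YAML whose root node is tagged with a Ruby class name.
yaml = Zlib.gunzip(metadata_gz)
puts yaml.lines.first # prints "--- !ruby/object:Gem::Specification"
```

The class tag on the first line is what makes the file more than plain data: whoever loads it is asked to instantiate a Ruby class.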
Uploading to RubyGems.org
When a Gem is uploaded to RubyGems.org or Gemfury, the server extracts the contents of metadata.gz and uses them to index the Gem. The extracted data is shown on the Gem information page and, more importantly, stored in the backend indexes queried by gem install and Bundler when a developer installs that Gem.
Before the discovery of this exploit, RubyGems.org loaded the content of metadata.gz by calling YAML.load, which is part of Ruby's standard library.
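Here is a server-side sketch of that pre-2013 pattern. The helper extract_metadata_yaml is hypothetical and is not RubyGems.org's code. It uses Gem::Package::TarReader to find metadata.gz inside the .gem archive, decompresses it, and then performs a full, class-instantiating YAML load. Psych 4 (Ruby 3.1 and later) made YAML.load restricted by default, so the sketch calls YAML.unsafe_load when it exists to reproduce the 2013 behavior. Do not use this pattern on untrusted input.

```ruby
require "rubygems/package"
require "stringio"
require "yaml"
require "zlib"

# Hypothetical helper: find metadata.gz inside a .gem (a plain tar archive)
# and return the decompressed YAML text.
def extract_metadata_yaml(gem_io)
  Gem::Package::TarReader.new(gem_io).each do |entry|
    next unless entry.full_name == "metadata.gz"
    return Zlib.gunzip(entry.read).force_encoding(Encoding::UTF_8)
  end
  raise ArgumentError, "metadata.gz not found"
end

# Build a tiny in-memory .gem that holds only metadata.gz (illustrative values).
spec = Gem::Specification.new do |s|
  s.name    = "example"
  s.version = "0.1.0"
  s.summary = "An illustrative gem"
  s.authors = ["Jane Doe"]
end
metadata_gz = Zlib.gzip(spec.to_yaml)

gem_io = StringIO.new("".b)
tar = Gem::Package::TarWriter.new(gem_io)
tar.add_file_simple("metadata.gz", 0o444, metadata_gz.bytesize) do |io|
  io.write(metadata_gz)
end
tar.close
gem_io.rewind

# The pre-2013 pattern: an unrestricted load that instantiates whatever
# classes the YAML names. Never do this with untrusted input.
yaml = extract_metadata_yaml(gem_io)
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
loaded = YAML.public_send(loader, yaml)
puts loaded.class # Gem::Specification
```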
A powerful feature of the Ruby YAML library is its ability to serialize arbitrary Ruby objects and recreate them on load. For example, when YAML.load was called on the Gem metadata, the returned object was a Gem::Specification instance rather than one of the basic types (strings, numbers, arrays, and hashes).
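The same mechanism works for any Ruby class. In this sketch, Point is a made-up class. YAML.dump records the class name in a !ruby/object tag, and an unrestricted load allocates a new Point and restores its instance variables directly, without calling initialize. As above, the sketch uses YAML.unsafe_load on Psych 4 and later, where plain YAML.load is restricted.

```ruby
require "yaml"

# Any plain Ruby class: Psych serializes its instance variables.
class Point
  attr_accessor :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

dumped = YAML.dump(Point.new(1, 2))
puts dumped # the root node carries a !ruby/object:Point tag

# An unrestricted load re-creates a Point from the tag alone.
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
restored = YAML.public_send(loader, dumped)
p restored.class # => Point
```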
This feature was used to compromise RubyGems.org: the exploit was an uploaded gem with a carefully crafted metadata.gz file that instantiated an object that could, and did, execute arbitrary Ruby code.
YAML has a number of ways to deserialize Ruby objects, and one of them is designed specifically for subclasses of Hash. It takes the following form in the YAML file:
    --- !ruby/hash:MyHashClass
    Hello: World
    Foo: Bar
When the parser encounters this input, it creates a new instance of MyHashClass and calls its []= method for each listed key/value pair. It does so without verifying whether MyHashClass is actually a subclass of Hash.
So, to execute arbitrary code, an attacker only has to find an existing class whose []= method passes either of its arguments to eval. Unfortunately, the class used in this exploit is included in every Ruby on Rails application as part of Action Pack's routing.
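The []= behavior described above can be reproduced harmlessly. Recorder is a hypothetical class that is deliberately not a Hash subclass and only records its []= calls. The sketch assumes current Psych, whose handling of !ruby/hash: tags still allocates the named class and assigns each pair through []= during an unrestricted load.

```ruby
require "yaml"

# Deliberately NOT a Hash subclass: it only records the []= calls it receives.
class Recorder
  def []=(key, value)
    (@calls ||= []) << [key, value]
  end

  def calls
    @calls || []
  end
end

payload = <<~YAML
  --- !ruby/hash:Recorder
  Hello: World
  Foo: Bar
YAML

loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
obj = YAML.public_send(loader, payload)
p obj.class # => Recorder
p obj.calls # => [["Hello", "World"], ["Foo", "Bar"]]
```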
If you trace the []= method of NamedRouteCollection, you will find that it inserts the content of its first argument into a module_eval block, thus executing the rogue code.
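To see why a []= that evaluates its argument is fatal, here is a simplified, hypothetical stand-in for the pattern. HypotheticalRouteCollection is not the Rails NamedRouteCollection. It only mimics the idea of interpolating a route name into module_eval. The payload is harmless and merely sets a global variable, but any Ruby code could take its place.

```ruby
require "yaml"

# HYPOTHETICAL, simplified stand-in: a []= that turns its key into Ruby source.
# This is not the actual Rails code, only an illustration of the flaw.
class HypotheticalRouteCollection
  def []=(name, path)
    # The key is interpolated into source code and evaluated.
    self.class.module_eval <<~RUBY
      def #{name}_path
        #{path.inspect}
      end
    RUBY
  end
end

$pwned = false

# The key closes the generated method early, runs its own statement,
# and then reopens a method definition so the source still parses.
payload = <<~'YAML'
  --- !ruby/hash:HypotheticalRouteCollection
  "foo; end; $pwned = true; def bar": "/x"
YAML

loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
YAML.public_send(loader, payload)
p $pwned # => true
```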
Please check whether any of your applications load YAML input from an untrusted source. A good way to catch this is to stub the YAML.load method after all your configuration files are loaded and then re-run your test suite.
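One way to implement that stub is a tripwire installed after boot. This sketch uses plain Ruby rather than a mocking library, and UnexpectedYAMLLoad, YAMLLoadTripwire, and install_yaml_tripwire! are hypothetical names. It prepends a module to YAML's singleton class so that any later YAML.load or YAML.unsafe_load raises an error that names its caller. In current Psych, YAML.load_file delegates to YAML.load, so it is caught as well.

```ruby
require "yaml"

# Hypothetical error raised when YAML is loaded after boot.
class UnexpectedYAMLLoad < StandardError; end

# Prepended to YAML's singleton class, these methods shadow YAML.load and
# YAML.unsafe_load without modifying Psych itself.
module YAMLLoadTripwire
  def load(*_args, **_kwargs)
    raise UnexpectedYAMLLoad, "YAML.load called from #{caller(1, 1).first}"
  end

  def unsafe_load(*_args, **_kwargs)
    raise UnexpectedYAMLLoad, "YAML.unsafe_load called from #{caller(1, 1).first}"
  end
end

# Hypothetical helper: call it only after configuration files are loaded.
def install_yaml_tripwire!
  YAML.singleton_class.prepend(YAMLLoadTripwire)
end

config = YAML.load("env: test") # boot-time configuration still loads
install_yaml_tripwire!

begin
  YAML.load("anything: 1")
rescue UnexpectedYAMLLoad => e
  warn e.message
end
```

In a test suite, you would call install_yaml_tripwire! from the test helper once configuration is loaded. Any test that then fails points at a YAML.load call worth auditing.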
If your application is supposed to process untrusted YAML input, I recommend two possible solutions:
If your input is only expected to contain basic types without any Ruby objects, I recommend looking at safe_yaml, which disables non-basic types for both the Syck and Psych parsers.
Using only basic types should be the standard approach to serializing data to YAML. It is not good practice to expose internal details of your application (like class names) outside of a trusted environment.
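safe_yaml is a separate gem. As a stand-in that needs no extra dependency, this sketch uses Psych.safe_load from Ruby's standard library instead. It is a swapped-in technique, not safe_yaml's API. Input made only of basic types loads normally, while a tag that names a Ruby class is refused with Psych::DisallowedClass before any object is created.

```ruby
require "yaml"

# Basic types only (strings, numbers, booleans, arrays, hashes): loads normally.
config = Psych.safe_load(<<~YAML)
  name: example
  versions: [1, 2, 3]
  enabled: true
YAML
p config

# A tag that names a Ruby class is rejected by the restricted loader.
rejected = begin
  Psych.safe_load("--- !ruby/hash:MyHashClass\nHello: World\n")
  false
rescue Psych::DisallowedClass
  true
end
p rejected # => true
```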
However, if your input is expected to contain certain Ruby classes, as RubyGems.org's does, you should customize the behavior of Psych so that it instantiates only a whitelisted set of classes. Also, audit and/or stub the following methods on each whitelisted class:
    def []=(k, v) end
    def init_with(v) end
    def yaml_initialize(k, v) end
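Here is a minimal whitelist sketch using the permitted_classes keyword of Psych.safe_load, which is available in Psych 3.1 and later. It is not the RubyGems.org patch itself. PackageInfo is a hypothetical whitelisted class. Psych will call its init_with hook with attacker-controlled data, so the hook checks the shape of the input before assigning anything. Classes outside the whitelist are still refused.

```ruby
require "yaml"

# Hypothetical whitelisted class with a defensively written YAML hook.
class PackageInfo
  attr_reader :name, :version

  # Psych calls init_with (when defined) instead of setting ivars directly,
  # so this is the place to validate untrusted fields.
  def init_with(coder)
    map = coder.map
    unless map.keys.sort == %w[name version] && map.values.all? { |v| v.is_a?(String) }
      raise ArgumentError, "unexpected PackageInfo fields"
    end
    @name = map["name"]
    @version = map["version"]
  end
end

input = <<~YAML
  --- !ruby/object:PackageInfo
  name: rails
  version: "3.2.11"
YAML

info = Psych.safe_load(input, permitted_classes: [PackageInfo])
p [info.name, info.version] # => ["rails", "3.2.11"]

# Anything not on the whitelist is still refused.
refused = begin
  Psych.safe_load("--- !ruby/hash:MyHashClass\nx: 1\n", permitted_classes: [PackageInfo])
  false
rescue Psych::DisallowedClass
  true
end
p refused # => true
```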
- RubyGems.org 1/30/13 incident status
- RubyGems.org data verification status
- RubyGems.org class-whitelist YAML patch
- safe_yaml project
- YAML for Ruby
If you have any comments or corrections, please email me or reply on Twitter.