Towards Faster Ruby Hash Tables

Hash tables are an important part of dynamic programming languages. They are widely used because of their flexibility, and their performance is important for the overall performance of numerous programs. Ruby is no exception. In brief, Ruby hash tables provide the following API (a rough sketch of the corresponding C-level interface follows the list):

  • insert an element with a given key if it is not yet in the table, or update the element’s value if it is
  • delete an element with a given key from the table
  • get the value of an element with a given key if it is in the table
  • the shift operation (remove the earliest element inserted into the table)
  • traverse elements in their inclusion order, calling a given function for each one and, depending on its return value, stop traversing, or delete the current element and continue traversing
  • get the first N or all keys or values of elements in the table as an array
  • copy the table
  • clear the table
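
In MRI these operations live in the C file st.c; the sketch below is only an illustration of the shape of such an interface, using hypothetical names and signatures rather than the exact MRI prototypes.

    /* Hypothetical C-level interface for the operations above; names and
       signatures are illustrative, not the exact MRI st.c prototypes. */
    typedef unsigned long ht_data_t;      /* a key or value word */
    typedef struct ht_table ht_table;     /* opaque table type   */

    int  ht_insert(ht_table *t, ht_data_t key, ht_data_t value);  /* add or update    */
    int  ht_delete(ht_table *t, ht_data_t key, ht_data_t *value); /* remove by key    */
    int  ht_lookup(ht_table *t, ht_data_t key, ht_data_t *value); /* get value by key */
    int  ht_shift(ht_table *t, ht_data_t *key, ht_data_t *value); /* pop oldest entry */
    /* Traverse in insertion order; the callback's return value decides whether
       to continue, stop, or delete the current entry and continue. */
    int  ht_foreach(ht_table *t,
                    int (*fn)(ht_data_t key, ht_data_t value, ht_data_t arg),
                    ht_data_t arg);
    long ht_keys(ht_table *t, ht_data_t *buf, long n);    /* first n keys   */
    long ht_values(ht_table *t, ht_data_t *buf, long n);  /* first n values */
    ht_table *ht_copy(ht_table *t);
    void      ht_clear(ht_table *t);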

Since Ruby 1.9, traversing hash tables and the shift operation must be done in the order in which elements were included. In other words, elements included earlier must be traversed first. The current implementation of Ruby hash tables is based on an old package written by Peter Moore, with the modern murmurhash and siphash algorithms used for hashing Ruby values.

Moore’s original implementation has been changed a lot, and the current Ruby hash tables have the following organization:

The typical Ruby hash table (very small hash tables have a different organization) has an array of bins, each bin containing a pointer to a list of table elements (entries, in Ruby hash table terminology). The calculated hash is translated into a table bin by a modulo operation, which is actually a mask operation, since the size of the bin array is a power of 2.
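As a minimal illustration of that mapping (a sketch; the helper name is mine and the hash is assumed to be 64-bit):

    /* Map a hash value to a bin index when the number of bins is a power of two.
       hash % n_bins and hash & (n_bins - 1) are equivalent in that case, but the
       mask needs only a single cheap instruction. */
    #include <stdint.h>

    static inline uint64_t bin_index(uint64_t hash, uint64_t n_bins)
    {
        return hash & (n_bins - 1);   /* n_bins must be a power of two */
    }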

As you can see, entries have 3 pointers: two for a doubly linked list of entries in their order of inclusion and one for a list of entries corresponding to the same bin (in other words, for colliding entries). Using a singly linked list for collisions saves memory but makes deleting arbitrary entries during traversal a non-trivial task. So the actual Ruby hash table API is even more complicated than what I described.

When a Ruby hash table’s load factor becomes too big, a new, larger bin array is created.

Modern processors have several levels of cache. Usually, the CPU reads one or a few cache lines at a time from memory (or from another cache level), so the CPU is much faster at reading data stored close together. The current implementation of Ruby hash tables does not fit well with modern processor cache organization, which requires good data locality for fast program execution.

So, how can we improve data locality?

First, we can remove the collision lists by switching to an open addressing hash table, where each bin corresponds to at most one entry. Although open addressing needs more bins than the original separate chaining hash tables, we can afford this because switching to open addressing removes one pointer per entry. As a result of this switch, we improve data locality when processing entries by their key.

The tendency has been to move from chaining hash tables to open addressing hash tables due to their better fit with modern CPU memory organization. CPython recently made this switch. GCC has widely used such hash tables internally for more than 15 years.

To improve data locality when processing entries in their inclusion order, we can remove the doubly linked list and put the entries into an array. This removes two more pointers, and the new entry becomes exactly half the size of the original one. That also decreases the entry footprint and increases data locality. Removing the lists lets us avoid pointer chasing, a common source of bad data locality.
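The size difference can be seen from the entry layouts; the structs below are an illustration of the idea (field names are approximate, not the exact MRI definitions):

    /* Old-style entry: hash, key, value, plus three list pointers.
       On a 64-bit machine this is 6 words (48 bytes). */
    struct old_entry {
        unsigned long hash;
        unsigned long key, record;        /* key and value                 */
        struct old_entry *next;           /* next entry in the same bin    */
        struct old_entry *fore, *back;    /* doubly linked insertion order */
    };

    /* New-style entry: just hash, key, and value, kept in a plain array.
       3 words (24 bytes), exactly half the old size. */
    struct new_entry {
        unsigned long hash;
        unsigned long key, record;
    };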

So our proposed Ruby hash table has the following organization:

The entry array contains table entries in the same order as they were inserted. When the first entry is deleted, a variable containing the index of the current first entry (start) is incremented. In all other cases of deletion, we just mark the entry as deleted by using a reserved hash value.

Bins provide access to entries by their keys. The key hash is mapped to a bin containing the index of the corresponding entry in the entry array (using an index instead of a pointer saves memory when we use 8-, 16-, or 32-bit indices for reasonably sized tables on 64-bit machines).
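A simplified view of that layout, as a sketch under the assumptions above (not the exact MRI struct):

    /* Sketch of the proposed table layout: entries sit in one array in
       insertion order, and bins hold small integer indices into it. */
    struct entry { unsigned long hash, key, record; };

    struct table {
        unsigned long num_bins;        /* always a power of two                 */
        unsigned long entries_start;   /* index of the oldest live entry        */
        unsigned long entries_bound;   /* index just past the newest entry      */
        struct entry *entries;         /* insertion-ordered entry array         */
        void *bins;                    /* array of 8-, 16-, 32- or 64-bit entry
                                          indices; the width is chosen from the
                                          table size                            */
    };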

The size of the bin array is always a power of two, which makes the mapping very fast: we just use the corresponding lower bits of the hash. Generally, it is not a good idea to ignore part of the hash, but the alternative approach is worse. For example, we could use a modulo operation for the mapping and a prime number for the size of the bin array. Unfortunately, the modulo operation for large 64-bit numbers is extremely slow (it takes more than 100 cycles on modern Intel CPUs).

Still, the remaining bits of the hash value can be used when the mapping results in a collision. That is also an improvement in comparison with the original tables. We use a secondary hash value function analogous to the one in the new CPython hash table implementation. The secondary hash value function is a function of the collision bin index and the original hash value. This choice of function guarantees that we can traverse all bins and eventually find the corresponding bin, because after several iterations the function becomes X_next = (5 * X_prev + 1) mod 2^k, which is a full-cycle linear congruential generator, since it satisfies the requirements of the Hull-Dobell theorem. By the way, I suspect that the function lookdict in CPython might cycle in extremely rare cases, as X_prev in that function can be bigger than the size of the table.
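A sketch of such a probe step in the CPython style referred to above (the shift constant and the names are assumptions, not the exact MRI code):

    #include <stdint.h>

    #define PERTURB_SHIFT 11   /* how quickly the extra hash bits are consumed;
                                  the exact constant here is an assumption */

    /* Compute the next bin index after a collision.  The remaining high bits
       of the hash ("perturb") are mixed in first; once they are exhausted the
       recurrence degenerates to i = (5*i + 1) mod 2^k, a full-cycle linear
       congruential generator, so every bin is eventually visited. */
    static inline uint64_t next_bin(uint64_t i, uint64_t *perturb, uint64_t mask)
    {
        *perturb >>= PERTURB_SHIFT;
        return (5 * i + *perturb + 1) & mask;
    }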

When an entry is removed from the table, besides marking its slot as deleted using a special hash value, we also mark its bin with a special value so that entries that collided with the removed entry can still be found.

There are two reserved values for bins. One denotes an empty bin, while the other denotes a bin whose entry has been deleted.
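Putting the pieces together, a key lookup might look roughly like this (a sketch under the assumptions above; the reserved markers and the probe constant are illustrative, and real code would call the table-specific key comparison function instead of ==):

    #include <stdint.h>

    #define EMPTY_BIN   ((uint64_t) -1)   /* reserved: bin never used          */
    #define DELETED_BIN ((uint64_t) -2)   /* reserved: bin's entry was removed */

    struct entry { uint64_t hash, key, record; };

    /* Return the index of the entry with the given hash and key, or EMPTY_BIN
       if it is absent.  A DELETED_BIN does not stop the search; it only marks
       that a colliding entry used to live in this bin. */
    static uint64_t find_entry(const uint64_t *bins, uint64_t mask,
                               const struct entry *entries,
                               uint64_t hash, uint64_t key)
    {
        uint64_t perturb = hash;
        uint64_t i = hash & mask;

        for (;;) {
            uint64_t ind = bins[i];
            if (ind == EMPTY_BIN)
                return EMPTY_BIN;                    /* not in the table */
            if (ind != DELETED_BIN
                && entries[ind].hash == hash && entries[ind].key == key)
                return ind;                          /* found            */
            perturb >>= 11;                          /* assumed constant */
            i = (5 * i + perturb + 1) & mask;        /* next probe       */
        }
    }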

The length of the bin array is always twice the entry array length, which maintains a healthy table load factor. To improve the table memory footprint, we use 8-, 16-, or 32-bit indices on a 64-bit system for smaller tables. Since the bins are a smaller part of the whole table footprint for small tables, we could even double the bin array again to decrease the load factor and the number of collisions even further, but currently this approach is not used.
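Choosing the index width could look like the sketch below (illustrative only; the real code must also leave room for the two reserved bin values):

    #include <stdint.h>

    /* Pick the narrowest index width (in bytes) able to address every entry
       slot.  Illustrative: a real implementation must also reserve codes for
       the "empty" and "deleted" bin markers. */
    static inline unsigned bin_index_width(uint64_t max_entries)
    {
        if (max_entries < UINT8_MAX)  return 1;   /* 8-bit indices  */
        if (max_entries < UINT16_MAX) return 2;   /* 16-bit indices */
        if (max_entries < UINT32_MAX) return 4;   /* 32-bit indices */
        return 8;                                 /* 64-bit indices */
    }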

The table is rebuilt whenever we cannot insert an entry at the end of the entry array. As opposed to the original hash tables, the new arrays can be smaller after rebuilding. This can happen when a lot of entries have been removed.

Table rebuilding is done by creating new entry and bin arrays of the appropriate size. We could try to reuse the arrays in some cases by moving non-deleted entries to the start of the entry array, but this makes little sense, as the most expensive part is moving the entries, and such an approach just complicates the implementation.

Analysis of hash table usage in Ruby on Rails done by Koichi Sasada shows that the majority of hash tables are small. For very small tables (fewer than 5 entries) we do not allocate bins at all and use a linear search instead. This saves memory and makes access to entries by key faster for such small tables, since there is no bin data to maintain.
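The small-table path is then just a scan of the entry array; a minimal sketch (in real code, entries marked as deleted by the reserved hash value would also have to be skipped):

    /* Linear search used instead of bins for very small tables.  Returns the
       index of the matching entry or -1 if the key is absent. */
    struct entry { unsigned long hash, key, record; };

    static long small_table_find(const struct entry *entries, long start, long bound,
                                 unsigned long hash, unsigned long key)
    {
        for (long i = start; i < bound; i++)
            if (entries[i].hash == hash && entries[i].key == key)
                return i;
        return -1;
    }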

The original Ruby hash tables use the same approach for tables of fewer than 7 entries, but the implementation is more complicated because it needs a different table representation.

Here is the performance comparison of the original and proposed hash tables on 27 MRI hash table benchmarks. We use two major targets for this: x86-64 and ARM.

The average performance improvement on Intel Haswell for the proposed hash tables is about 46%. On ARM, it is even bigger: about 59%. The proposed implementation also consumes less memory on average than the old one. The graph below shows memory consumption in bytes on x86-64 depending on the number of entries in the hash table.

The proposed implementation is also more straightforward. It will permit simplifying the Ruby hash table API in the future by avoiding the original hash tables’ problems with deleting entries during traversal.

Although the new design seems simple, achieving acceptable results required a lot of testing of different variants of the implementation. That is how performance work is usually done.

As an example of such work, one variant of the new hash table implementation had a faster solution to the problem of possible denial-of-service attacks based on hash collisions:

  • When a new table entry was inserted, we simply counted collisions with entries having the same hash (more precisely, the same part of the hash) but different keys, and when some threshold was reached, we rebuilt the table and started to use a crypto-level hash function. In practice, the crypto-level function would never be used unless someone actually attempted such a denial-of-service attack (see the sketch after this list).
  • Such an approach permitted the use of faster non-crypto-level hash functions in the majority of cases. It would also permit easily switching to other, slower crypto-level functions (without losing speed in real-world scenarios), e.g. SHA-2 or SHA-3. The cryptographic function siphash currently used in MRI is a fairly new function and is not yet as time-tested as older ones.
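
A sketch of that rejected variant’s collision counting (names and the threshold value are made up for illustration):

    #define COLLISION_THRESHOLD 32   /* made-up value for illustration */

    struct ddos_guard {
        unsigned long same_hash_collisions;  /* same hash bits, different keys */
        int use_strong_hash;                 /* 0: fast hash, 1: crypto hash   */
    };

    /* Called when an inserted key collides with an existing entry that has the
       same (partial) hash but a different key.  Past the threshold, the table
       is rebuilt using a crypto-level hash function. */
    static void note_hash_collision(struct ddos_guard *g)
    {
        if (!g->use_strong_hash && ++g->same_hash_collisions > COLLISION_THRESHOLD) {
            g->use_strong_hash = 1;
            /* ...rebuild the table, rehashing every key with the strong
               (crypto-level) hash function... */
        }
    }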

Although it is not documented, Ruby’s hash method should be the same as the hash function used inside the hash tables, and its value should always be the same for a given object during one MRI run. Therefore, this approach did not survive into the final variant of the hash tables.

It took 8 months from submitting the initial code to committing it into the MRI repository (please see issue 12142). The new hash table implementation is now part of MRI 2.4.

The hash table implementation discussion contains more than 170 posts. Reading the posts can be quite educational and is a good case study of open source development aspects and issues.

On a side note, I’d like to thank my managers for permission to invest part of my time to work on Ruby performance in addition to my major responsibilities.


 

Developing for IoT? Take this Eclipse survey

Calling all IoT developers: your learnings can benefit your peers who are at different stages of IoT adoption. And the more information you share about your development approaches and programming preferences, the better communities and companies can understand your requirements.

The third annual IoT Developer Survey, hosted by Eclipse IoT, is your chance to share your experience. Please take the time to complete the 5-8 minute survey.
https://www.surveymonkey.com/r/iotsurvey2017redhat

The survey runs until March 17, 2017.

You can take a look at last year’s results here: https://ianskerrett.wordpress.com/2016/04/14/profile-of-an-iot-developer-results-of-the-iot-developer-survey/

OpenShift for Developers: Set Up a Full Cluster in Under 30 Minutes

One of the common questions I get asked by developers is how they can use OpenShift locally for their own development. Luckily, we have a lot of different options and selecting one depends on the specific development environment that you prefer to work with.

For example, if you prefer to have things working in a virtual machine without having to worry too much about the installation, the all-in-one or official CDK is probably what you are after. These two options utilize Vagrant and VirtualBox with a major difference being that the all-in-one uses the open source Origin project and the CDK uses the enterprise version called OpenShift Container Platform.

One of my favorite ways of using OpenShift locally is to use oc cluster up. This is a fantastic tool that I use on a daily basis but I suggest you also take a look at the oc cluster wrapper project that my team codes and supports. The oc cluster wrapper project was created to help developers out a bit further by automating a lot of tasks such as profile management and persistent volumes.

After you play around with OpenShift locally, you will come to the realization that you would enjoy having a 24/7 install of OpenShift on which you can publicly host your projects. This is where a lot of developers stumble because they aren’t system administrators. For that reason, I took some time to create a video that shows how to install OpenShift Origin 1.4 from start to finish. This means that I create a bare virtual machine, install the operating system, install dependencies (like Docker), and then use Ansible to install OpenShift. After the install, I then show how to set up wildcard DNS for a public hostname. All in under 30 minutes.

I hope you enjoy the video and using OpenShift Origin 1.4!

How to Build a Stratum 1 NTP Server Using A Raspberry Pi

The Raspberry Pi Model B was released in 2012 and, since then, a number of useful applications for this device have ensued. However, one particular application that is often overlooked is the Raspberry Pi’s ability to be used as a Stratum 1 NTP server, allowing you to synchronize clocks across networks like the Internet. For me, this useful trick has actually made my entire office far more efficient.

Continue reading “How to Build a Stratum 1 NTP Server Using A Raspberry Pi”

Memory Error Detection Using GCC

Introduction

GCC has a rich set of features designed to help detect many kinds of programming errors. Of particular interest are those that corrupt the memory of a running program and, in some cases, make it vulnerable to security threats. Since 2006, GCC has provided a solution to detect and prevent a subset of buffer overflows in C and C++ programs. Although it is based on compiler technology, it’s best known under the name Fortify Source, derived from the synonymous GNU C Library macro that controls the feature: _FORTIFY_SOURCE. GCC has changed and improved considerably since its 4.1 release in 2006, and so has its ability to detect these sorts of errors. GCC 7, in particular, contains a number of enhancements that help detect several new kinds of programming errors in this area. This article provides a brief overview of these new features. For a comprehensive list of all major improvements in GCC 7, please see the GCC 7 Changes document.
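As a small taste of the _FORTIFY_SOURCE feature mentioned above, an overflow like the one below is caught when the program is built with optimization and the macro defined (compiled with something like gcc -O2 -D_FORTIFY_SOURCE=2): because the destination size is known at compile time, a checked variant of strcpy is used and the overflow is diagnosed instead of silently corrupting memory.

    #include <string.h>

    int main(void)
    {
        char buf[8];
        /* The destination is only 8 bytes; with _FORTIFY_SOURCE enabled the
           checked strcpy reports the overflow rather than corrupting memory. */
        strcpy(buf, "this string is clearly longer than eight bytes");
        return 0;
    }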

Continue reading “Memory Error Detection Using GCC”

Announcing Fuse for agile integration on the cloud – FIS 2.0 release

Today, I am very pleased to announce the GA of Fuse Integration Service 2.0. This release makes integration applications more portable and flexible, and allows agile developers to react faster to business needs by supporting microservice architectures. Developers will now be able to realize the benefits of microservices within integration projects and to leverage integration patterns while breaking up monolithic applications and reducing the size of services pushed onto older ESB technology.

With FIS 2.0, developers can now choose a more suitable technology for the composition and integration of microservices. A more lightweight runtime provides faster deployment, simplifies packaging, and ensures a smoother process from development to production, while also allowing management of the distributed application and taking care of fault tolerance, all at the same time.

The list goes on. The best thing about Fuse Integration Service 2.0 is that it can be used as a best-practice foundation for developers to focus on building business value in microservices without worrying about how they need to solve every problem on the list. And here is why…

Superior pattern-based solution for building and composing microservices – The new FIS 2.0 comes with Apache Camel 2.18; with 150+ built-in components and data transformation, it fits perfectly with the microservice principle of building smart endpoints. Developers can simply configure connectors to various systems and services.

Enterprise Integration Patterns – a best practice in the concept of agile integration; developers can compose microservices with ease (e.g. a simple pipeline) and simply reuse a pattern without reinventing it each time.

Excellent developer experience with Fuse Tooling – From getting started to real-world production deployments, Fuse provides a comprehensive set of tools to help developers through the complete application life cycle. Developers can choose between traditional Java programming styles or leverage drag-and-drop features from the tooling. Debugging and unit testing can also be done in the IDE with testing suite libraries. Maven is included for dependency and build management. For getting started, Fuse also provides a set of quick-start examples that ease the learning curve but are also great for experienced developers wanting to rapidly prototype new projects.


Containerized applications – FIS 2.0 offers a repeatable and declarative environment, allowing developers to quickly package an integration application into a container and simplifying the use of the same image in development, QA, and production environments. Fuse provides a pre-defined base image for the docker-like container, allowing developers to use it as a base for their application logic, after which images are generated using the provided tooling.


Support for Spring Boot and Karaf runtimes – FIS now officially supports Spring Boot, a widely adopted environment for microservices. Spring Boot’s “autowire” capability and its ability to create lightweight stand-alone applications have made it a natural fit as a microservice runtime. Karaf, as an OSGi runtime, is also supported for existing Fuse developers.

 

API Support and Service Resiliency – With the REST DSL, developers can now define a REST endpoint within minutes and automatically export the API documentation (Swagger). When connecting APIs, it is important to maintain service resiliency. Fuse Integration Service adopts Kubernetes as its orchestration layer for containers, which detects any failure of a service and recovers by spinning up another running instance. By supporting Hystrix in Camel, Fuse makes sure a failure is isolated without affecting other instances.

Complete CI/CD cycle – FIS 2.0 provides a great user experience for continuous integration, with features in the IDE, Maven, and source control with Git and other SCM applications; the source-to-image plugin helps developers build images either to test locally or to deploy onto an actual cloud platform. The out-of-the-box pipeline support helps developers create a complete cycle for continuous delivery.

Container Orchestration – Last but not least, as the number of microservices grows, developers need a way to manage and orchestrate all containers and services. Kubernetes in FIS 2.0 helps to automatically discover services, load balance incoming requests, handle clustering, and dynamically configure services when they come alive. Operations can scale services up and down on demand and manage resources as needed.

Of course, don’t worry if your application cannot immediately jump to microservices; Fuse also supports traditional ESB approaches. TRY IT TODAY and get started with the agile integration experience!

 

 


An Incremental Path to Microservices

As a consultant for Red Hat, I have the privilege of seeing many customers. Some of them are working to find ways to split their applications into smaller chunks to implement the microservices architecture. I’m sure this trend is widespread even outside my own group of customers.

There is undoubtedly hype around microservices. Some organizations are moving toward microservices because it’s a trend, rather than to achieve a clear and measurable objective.

In the process, these organizations are missing a few key points, and when we all wake from this microservices “hype”, some of these organizations will discover that they now have to take care of ten projects when before they had one, without any tangible gain.

To understand what it takes to reap real benefits from microservices, let’s look at how this neologism came into being.

Continue reading “An Incremental Path to Microservices”

Unlock your Red Hat JBoss Data Grid data with Red Hat JBoss Data Virtualization

Welcome to another episode of the series: “Unlock your Red Hat JBoss Data Grid (JDG) data with Red Hat JBoss Data Virtualization (JDV).”

This post will guide you through an example of connecting to a Red Hat JBoss Data Grid data source using Teiid Designer. In this example, we will demonstrate connecting to a local JDG data source. We’re using JDG 6.6.1, but you can connect to any local or remote JDG source of that version, if you wish, using the same steps.

Continue reading “Unlock your Red Hat JBoss Data Grid data with Red Hat JBoss Data Virtualization”

Five-Day Sprint Process meets Raleigh Innovators Program – Part 5 of 5

My Experience

When I heard that HR would be exploring changes to our employee review process, I made a mental note to follow up on that later in the year. I’d only ever been an end user of this process. I didn’t know what an annual review should look like, but I could see the same room for improvement that Red Hatters voiced on internal mailing lists and at the water cooler. I work in IT, not HR, but am passionate about building up people, morale, and community in my workspace. I’d even poked around HR and considered transferring departments. So, you can imagine how excited I felt when I was offered a spot on this project team out of the blue. I got to step into the world of HR full-time for three months and have an impact on the 11,000+ people of this company on a larger scale than I ever imagined.

I had a lot to learn

We all did. That was the beauty of it for all of us. It was like a sudden baptism into a completely new field. I didn’t know a 9-box from a shoebox. We researched best practices from trusted resources like Gallup, Forbes, Harvard Business Review, etc. We interviewed people from other companies known as pioneers in this space. Trends have shifted from one daunting annual performance review to a series of ongoing conversations.  Rather than a two-way channel between the boss and employee, companies are moving to a 360 feedback model with input from peers and customers.

The underground was busy – with best practices

What’s awesome is that this is already happening at Red Hat. Throughout our real-life crash course in HR, we were talking with Red Hatters. Many had enhanced our existing process with additional conversation points and documentation. Peer feedback loops were already in place. One interviewee told us he wouldn’t need our prototype because his peers are already fully committed to their peer feedback system. What an awesome challenge for my project team! We learned that groups across the company have already bought into industry best practices and that we just needed to align them with a process that supports that energy.

Our motley crew rocked

Our team came from three different departments, and all of us had very different roles. We each brought a unique perspective to our conversations and decisions. My specialty was UX.  Others were software development, marketing, and employment branding. Whether it was corporate impact or reporting, our backgrounds were apparent during the discussion. The group was stronger because of the different lenses we brought into our analysis of each decision. We learned how engineers commonly like to receive feedback. We learned what managers do behind the scenes to use our reviews in their strategic workforce planning. We learned about prototype software. This kind of cross-discipline approach to a challenge is rare but wonderful. It’s an effective way to approach well-rounded problem-solving. Had any one of these departments solely owned this Innovators Project, the product would have been limited to the vision of our disciplines. By coming together with open minds, we understood the people of our company on a completely new level and made a better solution as a result.

It is clear why each of us was chosen for this project. We have been through the current process and experienced its challenges. We know Red Hat’s employees and culture.  Each person chosen for this team was insightful, hard working, and cared deeply about the problem we were solving. We are the kind of people who lose sleep over a merely passable job; we have to do a great job. We differed personally and professionally. From hedgehogs to trendsetters to toddlers, we probably wouldn’t have bonded if we’d met as strangers at a party. Yet, we quickly became a tight group where we were free to be our individual selves.

I digress

Our part of the project has ended (ideation and socialization). Efforts are underway in HR to bring our new process to life. I can’t wait to see the official implementation of this work and know that I made a difference in the well-being of people across Red Hat.

This project was an amazing opportunity for me. I am grateful to have been chosen to work at the corporate level to have such a direct impact on every person at this company. The Google Ventures Five-Day Sprint Process was revolutionary to me. I see how this approach generates depth and creative thinking that we can’t replicate on our own.

