Early in my time learning Ruby at Flatiron School, I came across a lab that was quite a challenge: grouping items from an enumerable and returning a hash whose keys and values were arranged in an ordered, semantic way. After a hefty number of Google searches, I learned that this problem is all too common when working with collections such as arrays or hashes.

Boromir complains about iterating array of arrays

Well, Boromir, just like you, I let the influence of Isildur’s Bane cloud my judgement and succumbed to despair. Well, not quite that dramatically, but it was indeed a daunting task.

My online research showed that this scenario comes up often in the wild. I assumed that tackling it would involve a bunch of loops and weird esoteric hacks, and that someone clever had probably already turned the solution into a gem.

Enter Matz…

… then Matz took the best of list processing from Lisp, and the best of OO from Smalltalk and other languages, and the best of iterators from CLU, and pretty much the best of everything from everyone.

Steve Yegge [1]

As that quote suggests, Ruby has a robust library in Enumerable that gives us programmers granular control over how we process a collection or any other enumerable, all in an elegant way.

The Problem

While I don’t remember the exact details of the lab, the main challenge was to group the items of two collections (arrays) by an arbitrary condition, or as we can put it, “group them in a semantic way”. The answer in all those Google searches was to use the built-in #group_by method.

The almighty #group_by method is available on any enumerable, meaning that if you want to arrange items, keys, or values in an arbitrary semantic way, you totally can. Ruby empowers you.

The documentation is clear that #group_by is meant to be called with a code block. The block is evaluated for each item, and its return value does double duty: it decides which group the item belongs to, and it becomes the key for that group in the output hash. (Called without a block, it returns an enumerator instead.)

I know, it is a mouthful, but after playing with it in a REPL environment, it will all make sense, so I will show you a few examples.
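Before the bigger examples, here is a minimal sketch of that idea. The `animals` array is made-up sample data, not something from the lab; the point is only that whatever the block returns ends up as a key.

```ruby
# Hypothetical sample data, used only to illustrate the idea.
animals = %w[cat dog horse mouse ox]

# The block's return value (here, the word length) becomes the hash key;
# every item that produced that value lands in the array under that key.
by_length = animals.group_by { |animal| animal.length }

by_length
# => {3=>["cat", "dog"], 5=>["horse", "mouse"], 2=>["ox"]}
```

The keys appear in the order in which they were first produced, which is why 2 comes last here.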

Demonstration

Again, these are not the same arrays or hashes from the lab, but for this purpose, these examples will do just fine.

Example 1

We are going to group an array called rock_hits. Every item in it is another array that represents a song: the first element is the rock band and the second is the rock hit. Remember that #group_by returns a hash.


rock_hits = [
    ["Queen", "Bohemian Rhapsody"],
    ["Queen", "Don't Stop Me Now"],
    ["Queen", "Another One Bites the Dust"],
    ["Queen", "We Will Rock You"],
    ["Queen", "Somebody to Love"],
    ["Queen", "I Want To Break Free"],
    ["Metallica", "Nothing Else Matters"],
    ["Metallica", "Enter Sandman"],
    ["Metallica", "The Unforgiven"],
    ["Metallica", "One"],
    ["Guns N' Roses", "Paradise City"],
    ["Guns N' Roses", "November Rain"],
    ["Guns N' Roses", "Knockin' On Heaven's Door"],
    ["Guns N' Roses", "Don't Cry"],
    ["Guns N' Roses", "Welcome to the Jungle"],
    ["Guns N' Roses", "Sweet Child O'Mine"],
    ["Guns N' Roses", "You Could be Mine"],
    ["AC/DC","Thunderstruck"],
    ["AC/DC","Back In Black"],
    ["AC/DC","Shoot to Thrill"],
    ["AC/DC","Dirty Deeds Done Dirt Cheap"]
  ]

We want to use the artist, or the first item of the nested arrays, for the key. We can do that by running the following command in irb or pry:


rock_hits.group_by { |song| song[0].itself }

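You may be asking yourself, what is #itself? It’s a Kernel method that … well, makes an object return itself. Handy, right? Strictly speaking, song[0] already returns the artist string, so song[0] on its own would produce the same grouping here.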

The returned hash looks like this:


{"Queen"=>
  [["Queen", "Bohemian Rhapsody"],
   ["Queen", "Don't Stop Me Now"],
   ["Queen", "Another One Bites the Dust"],
   ["Queen", "We Will Rock You"],
   ["Queen", "Somebody to Love"],
   ["Queen", "I Want To Break Free"]],
 "Metallica"=>
  [["Metallica", "Nothing Else Matters"],
   ["Metallica", "Enter Sandman"],
   ["Metallica", "The Unforgiven"],
   ["Metallica", "One"]],
 "Guns N' Roses"=>
  [["Guns N' Roses", "Paradise City"],
   ["Guns N' Roses", "November Rain"],
   ["Guns N' Roses", "Knockin' On Heaven's Door"],
   ["Guns N' Roses", "Don't Cry"],
   ["Guns N' Roses", "Welcome to the Jungle"],
   ["Guns N' Roses", "Sweet Child O'Mine"],
   ["Guns N' Roses", "You Could be Mine"]],
 "AC/DC"=>
  [["AC/DC", "Thunderstruck"],
   ["AC/DC", "Back In Black"],
   ["AC/DC", "Shoot to Thrill"],
   ["AC/DC", "Dirty Deeds Done Dirt Cheap"]]}

One may notice that even though we have technically accomplished our objective (the artist string is the key of the hash), it’s not exactly what was asked of us: the artist is still repeated inside every value. Mmmm… well okay… Let’s try again. What about this REPL command:


rock_hits.group_by { |song| song.shift }

will return …

{
  "Queen"=>
  [["Bohemian Rhapsody"],
   ["Don't Stop Me Now"],
   ["Another One Bites the Dust"],
   ["We Will Rock You"],
   ["Somebody to Love"],
   ["I Want To Break Free"]],
 "Metallica"=>[["Nothing Else Matters"], ["Enter Sandman"], ["The Unforgiven"], ["One"]],
 "Guns N' Roses"=>
  [["Paradise City"],
   ["November Rain"],
   ["Knockin' On Heaven's Door"],
   ["Don't Cry"],
   ["Welcome to the Jungle"],
   ["Sweet Child O'Mine"],
   ["You Could be Mine"]],
 "AC/DC"=>
  [["Thunderstruck"], ["Back In Black"], ["Shoot to Thrill"], ["Dirty Deeds Done Dirt Cheap"]]}

We are almost there, but we can see that each value of the returned hash is a bunch of nested arrays. Like I said, we are almost there, and we will take it there!

It seems we need to flatten the values? Array#flatten will serve that purpose. The catch is that we would have to iterate through every single value to flatten it.

We have a Ruby way of doing this, all without using #each!

Hash#transform_values to the rescue!

In a smash bros meme, Transform Values joins the battle

When we read the documentation, it says that #transform_values returns a new hash with the same keys, where each value is replaced by the result of running the block on it. Pretty straightforward.
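As a quick illustration, here is a minimal sketch of #transform_values on its own. The `prices` hash is made-up sample data, and the sketch assumes Ruby 2.4 or newer, where Hash#transform_values is available.

```ruby
# Hypothetical sample data for illustration only.
prices = { "coffee" => 3, "tea" => 2 }

# The block receives each value; its return value replaces that value.
# Keys are untouched, and a new hash is returned.
doubled = prices.transform_values { |price| price * 2 }

doubled # => {"coffee"=>6, "tea"=>4}
prices  # => {"coffee"=>3, "tea"=>2} (the original hash is unchanged)
```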

Let’s go back to irb, and try:

rock_hits.group_by { |song| song.shift }.transform_values do |values|
  values.flatten
end

And, this is what we get in return.

{
  "Queen"=>
  ["Bohemian Rhapsody",
   "Don't Stop Me Now",
   "Another One Bites the Dust",
   "We Will Rock You",
   "Somebody to Love",
   "I Want To Break Free"],
 "Metallica"=>
   ["Nothing Else Matters",
   "Enter Sandman",
   "The Unforgiven", "One"],
 "Guns N' Roses"=>
  ["Paradise City",
   "November Rain",
   "Knockin' On Heaven's Door",
   "Don't Cry",
   "Welcome to the Jungle",
   "Sweet Child O'Mine",
   "You Could be Mine"],
 "AC/DC"=>
  ["Thunderstruck",
   "Back In Black",
   "Shoot to Thrill",
   "Dirty Deeds Done Dirt Cheap"]
 }

Eureka! This is the hash we were looking for!!!

Public Service Announcement:

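Remember that `Array#shift` is destructive: it removes the first element from each inner array of rock_hits, so after running the commands above, the artist names are gone from the original data and running the same command again will give a different result. If you need to preserve the original array, make a copy of it first (a deep copy, since the inner arrays are the ones being changed).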
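One possible way to sidestep the problem altogether is to avoid #shift and read the elements with non-destructive methods instead. The sketch below uses a shortened copy of rock_hits, and the variable names `grouped` and `hits_by_band` are only illustrative.

```ruby
# A shortened copy of the rock_hits data from Example 1.
rock_hits = [
  ["Queen", "Bohemian Rhapsody"],
  ["Queen", "We Will Rock You"],
  ["Metallica", "Enter Sandman"],
  ["AC/DC", "Thunderstruck"]
]

# Group by the artist (the first element) without touching the inner arrays.
grouped = rock_hits.group_by { |song| song.first }

# Keep only the song title (the last element) of each [artist, title] pair.
hits_by_band = grouped.transform_values { |songs| songs.map { |song| song.last } }

hits_by_band
# => {"Queen"=>["Bohemian Rhapsody", "We Will Rock You"],
#     "Metallica"=>["Enter Sandman"],
#     "AC/DC"=>["Thunderstruck"]}

rock_hits.first # => ["Queen", "Bohemian Rhapsody"] (still intact)
```

Because nothing is removed from the inner arrays, this version can be run as many times as you like in the same irb session with the same result.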

Example 2

Say we need to group an array of country names by their first letter.

Now that we have some experience with Enumerable#group_by and Hash#transform_values, attacking this problem will be a piece of cake.

country_list =
["Afghanistan","Albania","Algeria","Andorra","Angola","Anguilla","Antigua & Barbuda","Argentina","Armenia","Aruba","Australia","Austria","Azerbaijan","Bahamas","Bahrain","Bangladesh","Barbados","Belarus","Belgium","Belize","Benin","Bermuda","Bhutan","Bolivia","Bosnia & Herzegovina","Botswana","Brazil","British Virgin Islands","Brunei","Bulgaria","Burkina Faso","Burundi","Cambodia","Cameroon","Cape Verde","Cayman Islands","Chad","Chile","China","Colombia","Congo","Cook Islands","Costa Rica","Cote D Ivoire","Croatia","Cruise Ship","Cuba","Cyprus","Czech Republic","Denmark","Djibouti","Dominica","Dominican Republic","Ecuador","Egypt","El Salvador",...

We run this command in our preferred REPL tool:

country_list.group_by { |country_name| country_name[0].to_sym }

And, we get what we wanted …

{:A=>
  ["Afghanistan",
   "Albania",
   "Algeria",
   "Andorra",
   "Angola",
   "Anguilla",
   "Antigua & Barbuda",
   "Argentina",
   "Armenia",
   "Aruba",
   "Australia",
   "Austria",
   "Azerbaijan"],
 :B=>
  ["Bahamas",
   "Bahrain",
   "Bangladesh",
   "Barbados",
   "Belarus",
   "Belgium",
   ....

I’ve used #to_sym to convert the string to a symbol, since symbols tend to make better hash keys. I’ll expand more on that in a later blog post.

We can even count how many countries there are, per letter. We will use our old trusty #transform_values.

country_list.group_by { |country_name| country_name[0].to_sym }.transform_values { |values| values.count }

And the returned hash has the actual count of countries per letter.

{:A=>13,
 :B=>19,
 :C=>17,
 :D=>4,
 :E=>6,
 :F=>7,
 :G=>15,
 :H=>4,
 :I=>9,
 :J=>4,
 :K=>4,
 :L=>9,
 :M=>18,
 :N=>10,
 :O=>1,
 :P=>10,
 :Q=>1,
 :R=>4,
 :S=>26,
 :T=>12,
 :U=>6,
 :V=>3,
 :Y=>1,
 :Z=>2}
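If all you need is the count, another option is Enumerable#tally, assuming you are on Ruby 2.7 or newer. This is only a sketch of an alternative, not what the lab asked for, and it uses a shortened, made-up sample of country_list.

```ruby
# A shortened, hypothetical sample of country_list.
country_list = ["Albania", "Algeria", "Belgium", "Brazil", "Chile"]

# Map each name to its first letter as a symbol, then count occurrences.
counts = country_list.map { |country_name| country_name[0].to_sym }.tally

counts # => {:A=>2, :B=>2, :C=>1}
```

The #group_by plus #transform_values version is still the better fit when you also want the grouped names themselves, not just how many there are.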

Example 3

In this example, we are going to group the entries of a hash under the keys :passed and :failed. Scores go from 0 to 100, and a student needs at least 60 to pass.

Here we have a real-world example of the usefulness of #group_by. It is quite possible that one day this method will come in really handy and save you from writing really hacky code.

Here is the grades hash.

# Grades range from 0 to 100.
# A student needs at least 60 to pass; anything lower is a fail.

grades = {
  "Pedro" => 60,
  "Malik" => 59,
  "Penny" => 88,
  "Marissa" => 93,
  "John" => 75,
  "Juan" => 48,
  "Amy" => 75,
  "Sophia" => 35,
  "Carmen" => 79,
  "Mario" => 80,
  "Giovanni" => 60
}

Going to irb, let’s run this code snippet.

grades.group_by do |student_name, grade|
  grade >= 60 ? :passed : :failed
end

If you are not familiar with the ternary operator (condition ? value_if_true : value_if_false), it comes in really handy for those clean one-liners.

Here the block states a condition: if the grade is greater than or equal to 60, the item is assigned to the :passed key; otherwise it goes under :failed.

And, here is our finished hash:

{
  :passed=>
  [
    ["Pedro", 60],
    ["Penny", 88],
    ["Marissa", 93],
    ["John", 75],
    ["Amy", 75],
    ["Carmen", 79],
    ["Mario", 80],
    ["Giovanni", 60]
  ],
  :failed=>
  [
    ["Malik", 59],
    ["Juan", 48],
    ["Sophia", 35]
  ]
}
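Notice that the values are arrays of [name, grade] pairs rather than hashes. If you would rather get hashes back, one way (a sketch, using a shortened copy of the grades hash and the illustrative names `grouped` and `results`) is to chain #transform_values with Array#to_h:

```ruby
# A shortened copy of the grades hash from this example.
grades = { "Pedro" => 60, "Malik" => 59, "Penny" => 88, "Juan" => 48 }

# Same grouping as above: each value is an array of [name, grade] pairs.
grouped = grades.group_by { |_student_name, grade| grade >= 60 ? :passed : :failed }

# Array#to_h turns each array of pairs back into a hash.
results = grouped.transform_values { |pairs| pairs.to_h }

results
# => {:passed=>{"Pedro"=>60, "Penny"=>88}, :failed=>{"Malik"=>59, "Juan"=>48}}
```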

Conclusion

Like peanut butter and jelly, Enumerable#group_by and Hash#transform_values go well together, and should be part of your repertoire as a Ruby developer. Once I start diving into advanced JavaScript development, I’ll look for the equivalents of these helpful methods.

Cheers.

Footnotes

[1] Tour de Babel by Steve Yegge