Manipulating emojis in Java, or: What is 🐻 + 1?

Manipulating emojis in Java, or: What is 🐻 + 1?

Warning: The code you’re about to see has no redeeming qualities whatsoever. We hope you enjoy it as much as we do.Β 

If you’re like us, you’ve probably been wondering about how to manipulate emojis in your Java programs. Or perhaps you’ve been thinking about that age-old question, “What is 🐻 + 1?” Thanks to a recent coding session in which yr author spent what could have been several productive hours going down a πŸ°πŸ•³(rabbit hole), we can help you answer that question.

First of all, here’s the code in its entirety, MysteryAnimal.java:

public class MysteryAnimal {
  public static void main(String[] args) {
    String bear = "🐻";

    // If the previous line doesn't show up in your editor,
    // you can comment it out and use this declaration instead: 
    // String bear = "\ud83d\udc3b";

    int bearCodepoint = bear.codePointAt(bear.offsetByCodePoints(0, 0));
    int mysteryAnimalCodepoint = bearCodepoint + 1;

    char mysteryAnimal[] = {Character.highSurrogate(mysteryAnimalCodepoint),
                            Character.lowSurrogate(mysteryAnimalCodepoint)};
    System.out.println("The Coderland Zoo's latest attraction: "
                       + String.valueOf(mysteryAnimal));
  }
}

The code creates a Java String whose value is the bear emoji. Note that you can represent the bear emoji with the string "\ud83d\udc3b" β€” two Unicode values that work as a pair. With the string defined (bear = 🐻), we get the Unicode code point for the emoji, add 1 to it, convert the updated code point back to a new pair of values, and finally print the result.

The problem for Java is that a char can only represent a 4-digit hex number (16 bits), while emojis are 5-digit hex numbers. If you’ve got some time to kill (and frankly, if you’re reading this article, you probably do), the Unicode site has the complete list of emojis.Β Spoiler alert: looking at that list might give away the answer to this puzzle. The Unicode code point for the bear emoji is \u1f43b. Again, that’s too large a number to be represented as a char.

The solution from the Unicode consortium was to create what are called surrogate characters. (This is despite the fact that emojis are vital to modern human existence.) An emoji is represented as an ordered pair of 4-digit hex numbers: a high surrogate and a low surrogate. There are fairly complicated mathematical formulas to calculate the two based on the original 5-digit hex number, but the Java Character class, as you can see above, can calculate those two values for you. To represent an emoji in a Java String, we create a char array with the two surrogates and use the String.valueOf() method to combine them.

You can find more information about surrogate characters in the documentation for the Java Character class. You can find even more information about surrogates in the Unicode standard itself.

Everything you need to grow your career.

With your free Red Hat Developer program membership, unlock our library of cheat sheets and ebooks on next-generation application development.

SIGN UP

And now, the answer

If you run the code above, you’ll get these results:

The Coderland Zoo's latest attraction: 🐼

And there you have it! From an emoji perspective, Bear + 1 = Panda. We hope you’ve enjoyed this diversion; if you’ve read this far, we assume you probably did. Enjoy manipulating emojis in Java, and let us know in the comments if you find any practical use for this technology. Have fun!

Share