These new stats let us dig deeper. Many of the most popular existing emoji would not have passed Unicode’s search criteria if they’d been in place at the time: smiling face with smiling eyes, face with tears of joy, loudly crying face, sparkle heart, eggplant, smiley poo, devil face, see-no-evil monkey, party popper, bicep, crossed fingers, and shrug. None of these comes anywhere near the benchmark 500 million results when you search for them in Google, even in 2019, when those results have been juiced by many pages about the emoji themselves. Instead, they got in by being on Japanese phones before Unicode started taking over the decision-making process. On the other hand, many emoji that do meet the search criteria have languished far below the median level of popularity since they were introduced, including scooter, pita with falafel, rhino, tin can of food, coat, fortune cookie, bobsled, pretzel, gloves, vampire, zebra, hedgehog, rockstar/singer, and astronaut.
To be sure, sometimes the results do align: red heart, heart-eyes, fire, balloon, thumbs-up, and thinking face are all very popular both as search results and as emoji. And surely the search criteria did manage to exclude some genuinely obscure candidates. (T. rex does pretty well both as an emoji and as a search result, but I doubt ichthyosaur would have achieved similar popularity.) But overall, using search results to predict emoji usage is, to update an idiom, a case of comparing apple emoji to orange emoji.
It’s not just that emoji approved according to the newer criteria have had less time to catch on, because other emoji introduced in those same years have rocketed to popularity, like the thinking face and face surrounded by hearts. It’s more about what concepts get encoded as emoji. Using search results biases us toward common nouns, which is how we get those rhinos and coats and vampires and pretzels. But people don’t generally use emoji as substitutes for nouns. They could, but they don’t. Instead, emoji are used in addition to words, as a way of providing further context or emotion or illustration, much as we use gestures alongside spoken language, and that’s what faces and hands and hearts are particularly good at.
Five or 10 years ago, in earlier editions of Unicode, we didn’t really know how (or even if) the world was going to start using emoji. Maybe they were just a Japanese thing. Maybe people would actually stick them in the middle of their sentences in place of words, or use emoji for the same things that they make websites about. But now that we do have this usage data (and I hope that this is why Unicode released it), we can add it as a useful counterbalance to search data when people are proposing new emoji. For example, if someone wants to propose an emoji for a new article of clothing (say, pajamas), they can see not just how well the word “pajamas” does in search by itself, but also how popular the existing clothing emoji are.
So which kinds of emoji should we expect to see more and less of, if Unicode starts taking into consideration the popularity of existing emoji? To find out, I downloaded Unicode’s emoji frequency data set, labeled all of the emoji by category (I believe this practice is now popularly referred to as “training the neural net”), and calculated some stats.
I used my own categories because I was interested in making finer-grained distinctions than are typically found on your emoji keyboard: distinguishing between traditional round faces like tears of joy and angry face; “weird faces” with expressions on other characters such as the devil smiley, the heart-eyed cat, or the see-no-evil monkey; people in specific poses such as the shrugging person or dancer; people with no particular pose or expression representing archetypes, such as the redhead or astronaut; and groups of people such as all the various couples and families.
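For anyone who wants to try something similar, here is a minimal sketch of the label-and-summarize step in Python. It is not the exact analysis behind this piece: the category labels are hand-assigned in the spirit described above, the `score` values are made-up placeholders rather than numbers from Unicode’s data set, and the function names are my own for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (emoji name, hand-assigned category, usage score).
# The scores are made-up placeholders, NOT values from Unicode's data set.
ROWS = [
    ("face with tears of joy", "round face", 100.0),
    ("loudly crying face", "round face", 60.0),
    ("see-no-evil monkey", "weird face", 30.0),
    ("heart-eyed cat", "weird face", 10.0),
    ("person shrugging", "pose", 25.0),
    ("astronaut", "archetype", 2.0),
    ("redhead", "archetype", 4.0),
    ("coat", "clothing", 1.0),
    ("gloves", "clothing", 3.0),
]


def median_by_category(rows):
    """Group usage scores by category and return each category's median."""
    grouped = defaultdict(list)
    for _name, category, score in rows:
        grouped[category].append(score)
    return {category: median(scores) for category, scores in grouped.items()}


def rank_categories(rows):
    """Return (category, median score) pairs, most used category first."""
    medians = median_by_category(rows)
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for category, score in rank_categories(ROWS):
        print(f"{category}: {score}")
```

The median is used rather than the mean so that one runaway hit, like tears of joy, doesn’t make a whole category look more popular than it is. Someone proposing a pajamas emoji could compare the clothing category’s median against the others before making their case.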