Trolls & Utauloids – Internal Structure and Appeal

A long time ago I wrote something about characters. That piece isn't important now, but I would like to share something that I have recently learnt about myself: I consider the things in the title to be categorically different from the thing they are almost always described as. What I want to talk about is how, and why.

But first, what are those things in the title, and why are they special?

The things in the title

There are two things in the title, and they are generally considered a type of character, which is something we'll get to later.


Many years ago, there was a man who made a comic on the web about four children playing a video game. Then it got worse, and seven years and well over one hundred thousand words later it ended, having passed through several dozen characters. It's very long, kind of hard to read now, and even harder to summarise reasonably without breaking the flow of this article, but it's ultimately a long story about characters discovering relations amongst each other while some unspecified, faraway conflict happens.

The important thing for our purposes is that there is a particular type of individual in the comic called a "Troll", which is one amongst several others. For several plot reasons, Trolls, as a species, are best described as a "like humans but" type of alien, in that they look very similar to humans with minor æsthetic changes.

The relevant changes are as follows:

The reason why they are called "Trolls" is that their first appearance in the comic was to hassle the other characters in the story in pure text form. They otherwise have little to do with the mythological troll.

Once they were revealed in the comic, it almost immediately became popular for readers to make their own trolls, specifically following the scheme above. This new type of original character ("OC", as the lingo goes) is called a "fantroll". Even now, half a decade after the end of the comic, the fantroll community continues to linger around on Tumblr, and its members generally interact with each other by means of role play. I still catch up with those blogs every day.


The word "Utauloid" has an interesting origin. Even further back, in the deep dark depths of the early 2000s, there was a company in Japan that had the brilliant thought of having a computer sing songs. The result is a program called Vocaloid, and it has had a massive impact on how someone can make music today. The word "Vocaloid" is pretty easy to understand: it's a voice ("vocal") that is paired with a character which is typically partially robotic in form ("-loid", from "android").

The problem is that Vocaloid is a paid program, and a pretty expensive one at that, so there's space for a program that is similar but free. A few years later a program called UTAU came out, which is basically just that.

(In case you're wondering, both programs came out before the idea of free software was in vogue, so they're both proprietary even though one of them is freeware. Indeed, Japan has some extremely restrictive copyright laws, which make everything hard. Also, UTAU still has most of its text encoded using a pre-Unicode scheme, which means that you need to set the Windows locale to Japanese to be able to use it intelligently. Oh, and did I mention both of them are Windows only?)

One important difference between the two is that UTAU is explicitly geared towards accepting voices that you create. Vocaloid can be used with a lot of voices, but only the ones that are sold commercially. This means that there are now ordinary people who, in their own free time, create voices for use with this program, and a community has formed around them. These new voice banks are called "Utauloids", by analogy. But notice that the L remained, even though it no longer has any real meaning in the word.

Like the Vocaloids they are named after, which come with mascots, Utauloids are almost always decorated with a visible name and face. Typically, they mimic the naming scheme, general appearance and characteristics that one particular company (the most popular one) gives to its products: an ordinary human name whose family name usually ends in the character for "sound", 音, pronounced "ne", and whose given name is written entirely in kana. This is by no means universal, as there are other companies with competing schemes and there's a lot of freedom as to how one designs them.

However, regardless of how one designs the "face", there has to be a "body" – the voicebank – in order for it to have any use. So even though there's a lot of setting and background information, one does not have to engage with it at all in order to use the voicebank to sing songs, and indeed the examples I tend to consume a lot generally don't.

The character and me

The last bit I need to write before everything comes together is to describe why "characters" are difficult creatures.

Characters have always been a hard concept for me to grasp for various reasons. Basically, they're hard for me to keep track of, because people are difficult and models of how people model other people are harder. It got to the point where I thought that they were unnecessary for storytelling – and to a large extent I still think so, if only because it turns out you can do a lot with very little, and because I grew up mostly with non-fiction books that don't necessarily feature living things anyway. The point is that I find them basically impenetrable, so I'd rather not engage with them altogether.

That has changed a little bit in the ten years since, but the basic point still remains: characters are /difficult/, and they are extraordinarily common for how much skill one needs to decode them. I think Stephen Hawking, when writing A Brief History of Time, was told that every formula he added to the book would scare half the potential audience away. I expected the same effect from characters and their relationships, but that was never the case, and this was a point of confusion for me for a long time.

With that out of the way, I shall now make the statement:

The things in the title have a simple internal structure

Here's the type hierarchy for the things I've referenced:

(defclass w:character () ...)
(defclass voicebank:utauloid (w:character) ...)
(defclass homestuck:fantroll (w:character) ...)

(I've taken a couple of liberties with package naming. Just imagine that PACKAGE-LOCAL-NICKNAMES is available and these nicknames are defined somewhere else.)

That's just a fancy way of saying that Utauloids and fantrolls are both characters of some sort, but here's the most important thing: it's surprisingly easy to just straight-up fill in the elided components, like this.

(defclass homestuck:fantroll (w:character)
  ((first-name :type (string 6))
   (last-name :type (string 6))
   (horn-shape :type svg+xml)
   (symbol :type svg+xml)
   (blood-colour :type keyword)
   (relationships :type (vector homestuck:fantroll 7))
   #| Plus a couple of other things
      that require comic-specific knowledge to comprehend,
      but they're all straightforward like this |#))

No really, that's it! (Minus the readers/writers and also the initforms and initargs, but it turns out that a lot of these are presumably read-only.) If you really want, a troll that appears in the story can simply subclass the class above with no further modification:

(defclass homestuck:troll (homestuck:fantroll) ())
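
For the curious, here is a minimal sketch of what the class might look like with the readers and initargs put back in. It rests on several assumptions of mine: BASIC-CHARACTER stands in for W:CHARACTER, the package prefixes are dropped so that it loads in a plain image, plain strings stand in for the undefined SVG+XML type, and the example troll's name and blood colour are made up for illustration. Every slot only gets a reader, in keeping with the guess that these are read-only.

```lisp
;; Stand-in for the article's W:CHARACTER (hypothetical name).
(defclass basic-character () ())

;; Fantroll with initargs and read-only readers added.
;; Strings stand in for the SVG+XML type, which is not defined here.
(defclass fantroll (basic-character)
  ((first-name    :initarg :first-name    :reader first-name
                  :type (string 6))
   (last-name     :initarg :last-name     :reader last-name
                  :type (string 6))
   (horn-shape    :initarg :horn-shape    :reader horn-shape
                  :type string)
   (symbol        :initarg :symbol        :reader troll-symbol
                  :type string)
   (blood-colour  :initarg :blood-colour  :reader blood-colour
                  :type keyword)
   ;; Seven relationship cells, empty (NIL) until filled in.
   (relationships :initarg :relationships :reader relationships
                  :initform (make-array 7 :initial-element nil)
                  :type (vector t 7))))

;; A troll from the story needs nothing beyond what a fantroll has.
(defclass troll (fantroll) ())

;; A made-up example instance; none of these values come from the comic.
(defparameter *tessal*
  (make-instance 'troll
                 :first-name "Tessal"
                 :last-name "Oriven"
                 :horn-shape "<svg/>"
                 :symbol "<svg/>"
                 :blood-colour :teal))
```

Because only :READER options are given, no SETF functions are generated, so an instance cannot be changed through its public interface once it has been made.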

Of course it helps that I used CLOS classes here: methods belong to generic functions rather than to classes, so I do not need to specify any methods at all. But no matter what those methods are, they can only use the direct slots of the class to get that authentic "fantroll" (as opposed to the generic character) experience.

Utauloids are similar, except in this case the class structure is already written for me (albeit in another language, and of course it's much more complicated). After all, with Utauloids, the voicebank is just the computer representation of how a program works. And even the mascot that corresponds to the voicebank isn't all that complicated: in a sense, they're little more than a name, a mass (optionally a secret), a length and an image. They don't even have a dimension of time to them. Talk about simplicity!
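
The mascot half of that description can be sketched in the same style. This is only my own rough model, not anything UTAU itself defines: the class name, the slot names, the units (kilograms and centimetres) and the example values are all assumptions, and the image is just a string naming a file.

```lisp
;; Hypothetical model of an Utauloid mascot: a name, a mass that may be
;; kept secret, a length and an image. Note the absence of any time slot.
(defclass utauloid-mascot ()
  ((name        :initarg :name        :reader mascot-name
                :type string)
   ;; Either a number (assumed to be kilograms) or the keyword :SECRET.
   (mass        :initarg :mass        :reader mascot-mass
                :initform :secret
                :type (or real (eql :secret)))
   ;; Assumed to be centimetres.
   (body-length :initarg :body-length :reader mascot-length
                :type real)
   (image       :initarg :image       :reader mascot-image
                :type string)))

(defun describe-mascot (mascot)
  "Return a one-line profile string built only from MASCOT's slots."
  (format nil "~A: ~A, ~A cm"
          (mascot-name mascot)
          (if (eq (mascot-mass mascot) :secret)
              "mass secret"
              (format nil "~A kg" (mascot-mass mascot)))
          (mascot-length mascot)))

;; A made-up mascot whose mass is left at the default, :SECRET.
(defparameter *example-mascot*
  (make-instance 'utauloid-mascot
                 :name "Example Ne"
                 :body-length 158
                 :image "example-ne.png"))
```

The whole profile fits in one FORMAT call, which is more or less the point.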

Here's the thing that shines however: at no point do I actually have to model a full character. Rather than dealing with a mess of individuals and relationships (and goodness gracious I have no idea how to model that method), these two types of characters are simple to reconstruct. Indeed it's clear that they have no other internal structure; aside from some trivial time-dependent component, it is fairly easy to just set two of these things together and predict how they would react. In some situations, the interaction is even formalised – in combat for instance, there is an omitted slot above called a "strife specibus", which is just a silly way for weapons to be held and accessed.

The other important thing to note is that they have a very easy identity component. A troll is uniquely identified by name, of course, but that name is also redundant with the symbol and the blood colour. In a sense, a troll is nothing but its name (which can be written in a number of ways) and its history. It's very much like a star, which is basically uniquely determined by age and mass, and that kind of simplicity makes it very appealing. Specifically, /they all look the same, aside from these distinguishing marks/ – unlike generic characters, there is no "hidden identity component" that you just have to know to "know" the characters correctly. It's almost like we're dealing with a paper cutout rather than a "character" per se, and that's fairly appealing to me.
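
To make the identity point concrete, here is a small sketch under my own assumptions: IDENTIFIED-FANTROLL is a cut-down, hypothetical class carrying only the three distinguishing marks, strings stand in for the symbol image, and the example values are invented. Identity is then nothing more than a comparison of visible slots.

```lisp
;; Cut-down hypothetical class holding only the distinguishing marks.
(defclass identified-fantroll ()
  ((full-name    :initarg :full-name    :reader full-name
                 :type string)
   (symbol       :initarg :symbol       :reader identity-symbol
                 :type string)
   (blood-colour :initarg :blood-colour :reader identity-blood-colour
                 :type keyword)))

(defun identity-marks (troll)
  "Collect every mark that distinguishes TROLL into a list."
  (list (full-name troll)
        (identity-symbol troll)
        (identity-blood-colour troll)))

(defun same-troll-p (a b)
  "True when A and B carry the same marks; no hidden state is consulted."
  (equal (identity-marks a) (identity-marks b)))

;; Two separately made objects with identical marks, plus one that differs.
(defparameter *first-copy*
  (make-instance 'identified-fantroll
                 :full-name "Tessal Oriven" :symbol "<svg/>"
                 :blood-colour :teal))
(defparameter *second-copy*
  (make-instance 'identified-fantroll
                 :full-name "Tessal Oriven" :symbol "<svg/>"
                 :blood-colour :teal))
(defparameter *someone-else*
  (make-instance 'identified-fantroll
                 :full-name "Another Fantro" :symbol "<svg/>"
                 :blood-colour :olive))
```

Here (same-troll-p *first-copy* *second-copy*) is true even though the two are distinct objects, which is the paper-cutout property in miniature.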

So what does that mean?

Ultimately, the important part is that although I have trouble with characters as a general concept, I think that it's a lot easier once they are simplified enough to fit comfortably in analysis. Once I can easily get my head around them, I can proceed to something more interesting.

It helps that these two types of characters are surprisingly easy to grasp, because they have a highly transparent structure: their identity is tied directly to various visual aspects, which allows for simple reconstruction and notation. That means less of a need to worry about how hidden variables work, and more computing power available to interact with other parts of a story, including mechanics and plot elements.
