What We Wish We’d Known About Making Audio Games (Excerpt I)
Joseph Jaquinta, CTO, TsaTsaTzu
(A lightly edited excerpt of the talk of the same name for the Berlin Voice Conference 2018.)
Ladies and gentlemen, welcome, and thank you for attending our talk. You can tell from our accents that we are Americans. Our country is currently suffering an unfortunate identity crisis. As a result, we usually over-compensate by engaging in a shameless minute of self-promotion. I will be reading the following statement as quickly as possible:
TsaTsaTzu is one of the oldest and most expansive game developers on the Alexa and Google Assistant platforms. We’ve had the first Massively Multiplayer game, the first Real Time Shooter, and were the first to take monetization seriously long before official APIs went live. Our experiences have been broad and deep, but not always successful. There are some things that work well in voice, some things that don’t, and it’s not always clear ahead of time which one your idea will be. Here we share all the joys and sorrows of the development path, with advice on how to chart a clear course. Ask questions, get honest answers.
End shameless self-promotion.
Here’s the nitty-gritty of what we do at TsaTsaTzu. We develop custom voice applications for companies all over the world. We believe this platform – Voice – plays to our strengths, giving us an advantage with its unique development and UI design challenges. We roll parts of our code into games as a way of deploying (read: testing) it – proving its performance and robustness before client rollout. We deeply appreciate our players, who pound on our code for free. We especially appreciate players who help us as we try to create compelling user experiences – or just work out what can and cannot be done – very quickly, with (sadly) precious little platform support. We are writing (and in fact have written) the manuals on this challenging work.
Because everybody loves playing games, right?
Let’s get to our first example, text adventure games.
A Case Study of the Limited Linear Voice Interface: Text Adventure Games.
One of the entries I created for my very first Alexa hack-a-thon was a voice port of the classic RPG, Colossal Cave Adventure. This game has been around since the 1970s (which most people in this crowd will not remember), and it is traditionally one of the first things ported to any new platform. We thought this game – which used the cutting-edge technology of the 70s to deliver an interactive RPG experience – might gain some added life in a voice-only interface.
So, I trolled the internet, found a port of it in a language that I liked, and even went so far as to get permission from the person who wrote it to use it. (You know that intellectual property is important and all that, right?) Since it was an open input system, it took a little effort to work out all the possible correct (and incorrect) inputs and design an audio model for them and their combinations. But it was a nice challenge rather than being insanely difficult. In short order I had a demo running and, in an emulator (which I had to write myself), was able to play the game from start to finish. There’s even a video of it up on YouTube somewhere.
I went into the hack-a-thon with visions of doors that could open. Other text adventure games waiting to be ported to this new platform. A wealth of opportunity for easy pickings. I had visions of new revenue sources dancing in my head.
At the hack-a-thon, I got to try my prototype out on an actual device. And there I learned it was terrible.
The game was way too verbose. On the screen there are lovely passages of detailed descriptions of the subterranean realm you are exploring. Some are just descriptive, others have clues to playing the game. But they were all much longer than was comfortable to hear in a computer-generated voice. The game itself recognized this and would abbreviate the description after you had visited a place three times. But that still left an interminable amount of verbiage.
The take-home was that what works well on a screen does not necessarily work well in voice. On a screen you can skip ahead, re-read, or just look for keywords. With voice you have to listen, from start to finish, at the pace the person is reading. The lesson learned was that every word must be relevant in voice. A three-sentence response is pushing it. You as a developer/designer need to focus on using the minimum amount of verbiage to convey your message.
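The abbreviation trick the original game used generalizes to any voice response: speak the full detail while the player is orienting, then trust their memory. Here is a minimal sketch of that idea in Python – the `Room` class, `FULL_VISITS` threshold, and field names are all invented for illustration, not taken from the game’s actual source.

```python
# Hypothetical sketch: speak the full description only while a location is
# new, then fall back to a terse one-liner. Names here are illustrative.

FULL_VISITS = 3  # after this many visits, switch to the short description

class Room:
    def __init__(self, name, full_desc, short_desc):
        self.name = name
        self.full_desc = full_desc    # the lovingly detailed screen text
        self.short_desc = short_desc  # a few words, comfortable in TTS
        self.visits = 0

    def describe(self) -> str:
        """Return the text the voice response should actually speak."""
        self.visits += 1
        return self.full_desc if self.visits <= FULL_VISITS else self.short_desc
```

The same pattern works for help prompts and menus: a first-time user hears the full explanation, a returning user hears a few words.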
The gameplay just wasn’t interesting in voice. At the dawn of natural language parsing, much of the endearing fun of the game was finding out the exact language you needed to use to trigger a certain action in the game. With a .001 MHz processor and just over a megabyte of memory, those machines didn’t have a lot of space for aliases or sophisticated analysis. What was an endearing process of learning how to communicate with a machine back in the day ends up being a UX nightmare for a modern audience.
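In a modern voice model you invert that burden: the system, not the player, absorbs the guessing game by folding many phrasings into one canonical action. A minimal sketch of that normalization step, assuming an invented alias table and `normalize` helper (none of this is from the original port):

```python
# Hypothetical sketch: collapse the many ways a player might phrase a command
# into one canonical verb, so the game logic only ever sees canonical forms.

ALIASES = {
    "pick up": "take", "grab": "take", "get": "take",
    "walk": "move", "head": "move", "go": "move",
    "look at": "examine", "inspect": "examine",
}

def normalize(utterance: str) -> str:
    """Rewrite a raw utterance so it starts with a canonical verb."""
    text = utterance.lower().strip()
    # Try longer aliases first so "pick up" wins over a hypothetical "pick".
    for alias in sorted(ALIASES, key=len, reverse=True):
        if text == alias or text.startswith(alias + " "):
            return ALIASES[alias] + text[len(alias):]
    return text
```

On Alexa or Google Assistant much of this is declared in the interaction model as sample utterances and synonyms rather than written by hand, but the principle is the same: every alias you add is one less phrase the player has to discover by trial and error.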
The lesson learned there is that voice is sufficiently different that just porting code will very seldom work. You need to redesign from the ground up. Choose-your-own-adventure-type games have had some moderate popularity on these platforms. But they are generally composed of content written specifically for voice rather than ports of successes from other platforms.