Holly Herndon: AI Music Production & Vocal Processing

In March of 2018 I met Holly Herndon at a hotel on the outskirts of Berlin, just two weeks before she was set to defend her Doctor of Musical Arts dissertation in music composition at Stanford University, and one month before the release of her third album, PROTO, on 4AD Records. We had a fantastic conversation about the making of PROTO, her creative process, and what it's like to collaborate with artificial intelligence and a choral ensemble.

How would you describe what you do?

I'm a computer musician.

What does that mean?

Well, that's always what comes next. I am a composer, and I'm a performer. My primary instrument is the computer.

What were the formative experiences in becoming the musician you are today?

Well, my introduction to musical performance was through choirs, often in a liturgical setting – in church, in this emotional ecstasy that one has in the religious singing experience. Another introduction would be coming to Berlin as a teenager and hearing Eurodance in the supermarket; also the crazy, synthetic pop music that was popular here. It's like a geographical point. Another one would be moving to Oakland, going to Mills [College], and starting to use a computer in a way that I had more control over. That's the trajectory.

What led you to Mills?

I was already writing music. I went to some master classes here [in Berlin]. I took some free improvisation with [vocalist] Lauren Newton. I was trying to teach myself some things. I had downloaded SuperCollider [a real-time audio synthesis programming language] and was trying to figure it out without any community. That is hard to do if you are coming at it blind, without any reference point. There was already a deep interest, and I was trying to figure things out myself. I figured that if I wanted to take things to the next level technically, I needed to retool. So, I decided to go to Mills.

SuperCollider was the first music-making software you interfaced with?

It was, and it wasn't what I ended up using. I ended up using Max [Max/MSP/Jitter; a visual programming language for music]. It's random. At Mills, one semester they would teach Max, and one semester they would teach SuperCollider. I happened to start on the semester they were teaching Max. That's the only reason. It's actually probably better that way, because it's visual. Actually, I was in the beta testing of Max for [Ableton] Live at the time, when I was first getting started.

Around what years?

I was at Mills from 2008 to 2010. This always happens to people when they first start to use Max: I was building this complicated, insane, disgusting patch that I'd never use again. But that's part of doing it; to learn how it works. I was building this stupid performance system that was super complicated. I spent forever on it, and then the Max for [Ableton] Live beta came out and answered everything. All of a sudden it was so much easier to do all the things I wanted to do, because I could just put individual Max patches on individual audio tracks – things that were difficult to code if you were just starting with an empty Max patch. That was a massive self-own; but it was also good, because I had to learn how to do all that. Now I have the stability of this DAW [Ableton Live] but with the flexibility of all this weirdness I want to do [Max for Live]. That then became a powerful performance tool.

I know those messy Max patches. When I was in school, I was doing the same thing. It was like, "If only there were something else!"

Then that something else came. I think a lot of people had that moment. A lot of people who didn't know what Max was bought Ableton Live, and they found out about [Max] through [Live]. Then they got into programming. That's cool that it introduced people on that level. So, then I started writing Movement [her first album] towards the end of Mills, and after Mills. Movement came out right when I started at Stanford [University]. I had no expectation that anyone would care about it at all. It was just this weird thing I made and thought was cool; then somebody wanted to put it out. Then, "Oh, you're invited to play this festival." That was exciting, but also weird timing, because I'd just started at Stanford, so I was taking classes and trying to tour the album at the same time. That was gnarly.

Can you talk a little bit about your work at Stanford's CCRMA [Center for Computer Research in Music and Acoustics, pronounced "karma"]?

How would I describe Stanford? I was drawn to the history of it; John Chowning, and the whole legacy there. Also, there's just an attitude. Chris Chafe is the director there, and he's my advisor. He has this beautiful, super open-minded approach to experimentation; wherever that might take you. There's no real expectation of what will come out of something. A lot of that comes from Chowning. You have this example of someone who's just messing around in the lab, and the next thing you know it's one of the most valuable patents [FM synthesis -ed.] that the university has ever seen. That will embolden a community to be like, "My experimentation is valuable!" Experimenting without necessarily a goal in mind is something that's valued there, and I think that's a beautiful thing.

John Chowning and CCRMA

The Center for Computer Research in Music and Acoustics, or CCRMA, was founded in 1975 at Stanford University by John Chowning and Leland Smith, along with research associates John Grey, James A. Moorer, and Loren Rush. This was the culmination of over a decade of work by Chowning and Smith that began in 1964 with help from Max Mathews of Bell Telephone Laboratories. During this initial decade, they pioneered early experiments with music, acoustics, and early mainframe computer systems. The first computer music course was offered in 1966, and in 1969 they established a summer workshop in computer-generated music. In 1967 Dr. Chowning began his work on frequency modulation (FM) synthesis, which resulted in an AES paper on the technique in 1973 and a patent in 1977. FM synthesis was licensed to Yamaha in 1973, and they eventually introduced the FM-based DX7 synthesizer in 1983 – one of the best-selling synthesizers in history. The patent for FM synthesis was one of the most profitable patents held by Stanford, earning over $23 million before it expired in 1994. When CCRMA was first founded it was part of the Stanford Artificial Intelligence Laboratory (SAIL) in an off-campus building in the hills of Palo Alto, CA. In 1979 SAIL moved to the main campus, and in 1986 CCRMA followed suit and moved into its current location at The Knoll Building on the main Stanford campus. Over the decades, CCRMA has had a huge impact on how music is made today, and many people in the industry – such as Bill Putnam Jr. of Universal Audio [Tape Op #24] – have completed studies there. In many ways, Holly Herndon's work with Spawn on PROTO comes almost full circle, back to the early promise of CCRMA and SAIL. -JB ccrma.stanford.edu

So, when you got there, did you know the direction you were going to go?

I didn't know. I was overwhelmed, in a sense. It's a very weird program. I'm in the composition department. The music department is one big thing, and then there's a composition track, a musicology track, and an engineering track. CCRMA is mostly the engineering track, but there are also composers there. A lot of the composition history at Stanford is a more traditional approach; very complex and score-oriented. One of the composers there is Brian Ferneyhough, who's famous for his contributions to the "new complexity" movement, which is all about the most ridiculously complex scores you could imagine; to challenge players that are extremely skilled and to push them to a next level of performance. It's a very different tradition or interest than I'm coming from. Maybe at times it was an awkward fit, but I always felt at home at CCRMA. Most of the people there are doing physical modeling, trying to improve a compression algorithm, or something like that. There's an openness to the community there. I like being around engineers and finding ways to integrate that work into my creative practice, even though that's not my focus. Some of that is just so insane. Like Jules [LaPlace]'s classes – some of the physical modeling – is super advanced math. It's a nice community.

Are you still working on your PhD?

I'm defending [it] in two weeks.

What's your dissertation about?

Because it's a Composition DMA [Doctor of Musical Arts], it's a body of work which is PROTO, and [the performance pieces] "Deep Belief" and "Chain Opera," and then an analysis of that. It's mostly writing about AI [artificial intelligence], as well as the aesthetic implications and subjects like that.

Do you have separate phases for writing and composing, and then taking those ideas in to record, or is the process fluid?

That would be the smart way to do it. My methodology is not always that perfect. I would write something, like a simple score, and then we would have regular rehearsals once a week, or every other week. The members would perform them, and I would record that. I'd then go back into the studio and work with it. Then I'd iterate on that and change the score; or I'd have them emulate a process that I applied to the score. Once it was at a certain point, I would go into the studio to record them, but I would still end up changing it and remixing it into its final iteration.

For the score, do you use traditional notation for the ensemble to read?

Yeah. Or sometimes I record a process and have them emulate a digital process. Then it, of course, becomes something entirely new when they're interpreting it.

What is the environment or studio that you work in like?

We had a very unusual setup in our old place. We just moved a couple months ago. We used to live in Kreuzberg [in Berlin], and we had a more industrial loft space with a large, open room where we could have rehearsals. I had a little sound-treated studio room with a nice speaker setup, where I could shut the door. I could do single recordings in there. We would also rent a recording studio to do a proper final run of recordings. I would record our rehearsals in the main room and then iterate on them in my studio. Once we got it to a point where we were rehearsed, we would take that to a recording studio. We'd further rehearse there and get real-time feedback and try it with different approaches. "Okay, let's try it this time staccato" or, "legato" or, "add a glissando to these parts." Just workshopping.

When you're recording the vocal ensemble at your house, do you have a particular way that you like to capture that audio? Any certain mic'ing techniques or mics that you like to use?

It depends on what we're recording. Usually I was just recording as a reference, but we ended up using some of that. I probably should have paid a bit more attention. I find the Sony PCM [handheld recorder] to be quite good at recording. The fidelity is good enough. I did a piece called "Body Sound" years ago with a dancer, and that was all recorded with the PCM. I was just holding it to his feet as he was dancing and recording his foot sounds. It's actually really crisp and clear. But there were a couple times that I rented microphones and set them up in our space for a couple of sessions when I knew that I didn't need to rent the studio. It's expensive to do that. There were a couple of background parts I wanted to record where I would set up some microphones. I can't remember the microphones that I ended up using. There's this place here called Echoschall. I would rent a nicer microphone when I would record soloists in my studio.

What are you recording with in your studio? What DAW do you use?

Straight into Ableton Live, and I just record with whatever preamp I've rented from Carsten [Lohmann of Echoschall], and then I have a Focusrite Clarett series sound card.

What about any go-to studios here that you record at?

There are a couple that I've used. One is called LowSwing Tonstudio, in Prenzlauer Berg, and then there's one called Blackbird Music Studio in Kreuzberg that's also pretty cool. Then I just discovered a new one in Mitte that I can't remember the name of. It's more expensive, but it was beautiful. I want to be able to afford to record there!

What attracts you to certain studios?

That studio just had a good vibe. It had good lighting, and good acoustics within the live room. When I'm in the studio, it's mostly about recording the live room with the ensemble performing. I'm not really doing any production in those studios. The production all happens at home. I have nice ADAM speakers. Then I mixed in London with Marta Salogni. She's amazing. She just built her own studio in London. Before she was renting out a studio in the Mute [Records] office. She's a sick mixing engineer. She usually works with bigger pop artists, but she's friends with a friend of mine, Lafawndah, and they were working together on Lafawndah's album [Ancestor Boy]. She was playing me some of the tracks that she just released, and I was like, "Oh, this mix is amazing!" She said, "You have to meet my friend, Marta." Then we found a time to synchronize. She's so good. I love working with mixing engineers, because it's like hearing your music in a new light. The process of writing clouds your judgment, at least my judgment's totally skewed by the end of it. [Marta] has this almost psych rock vibe to the way that she mixes digital instruments. She gives it this space and weight in a way that worked. All of the ensemble music was recorded in a space with people, and you feel that liveness and atmosphere. When you have digital instruments with that, it can often feel disconnected. How do you make all of these digital instruments work together? Then you have this artificial intelligence [component] that has its own aesthetic and lo-fi sound to it. How do you make those all sit in the same environment? Marta was huge in that. Her drum mixes are so epic.

I wanted to ask you about the drum sounds.

I feel like Marta made them sound so much better than they were! I feel like all of a sudden they were these Baywatch sounds; imagining these stadium sessions where the drummer has this huge kit all around them. She's a masterful drum mixing engineer.

Do you like being in the room during the mixing process? Were you sitting there together?

Yeah, for sure. Well this time, because she was in London, I sent her the material and she did a pre-mix. A lot of that takes time, and she can do it on her own schedule. Then I went to London, and we went through the album together and tweaked each one, song by song. Of course, Mat [Dryhurst] was part of that as well. We wrote and produced together. I don't want to write him out of this process. He's also sitting in the room. Mat has an almost savant ear for timing. Some people have perfect pitch. He has perfect timing, in a way that's almost annoying sometimes. Sometimes I will like a swing on something that he doesn't, because his timing is too sensitive. Sometimes we will argue over swing!

How does it work during production?

Sometimes we sit together towards the end of the process when we're tweaking things. The way the process usually works is that I'll start something and then we workshop it together. There's a lot of file sharing. We remix things together. This has been a long process to get to this point, because I used to be very sensitive about certain things being off-limits. Like, "Don't touch that. That's perfect!" That's not a great way to collaborate. I think the more comfortable I get with myself, the easier it is for me to let go of control. I've tried to be more like, "Nothing is holy." They can mess with anything. Everything is a "Save As" anyways. That's been a learning process for me. Then we just remix each other back and forth, and it creates this weird collage at the end.

That's awesome. So, it's not necessarily together?

We work better separately. We'll come together for a listening session and give each other feedback. But we don't jam together. We're not vibey like that. We're more neurotic and controlling of things. Whenever we try to sit together, we're fighting over the mouse.

I'm curious about how the different roles in the ensemble for PROTO work. What is the role of your AI bot, Spawn, and the role of your human ensemble?

It is a group. Almost more like a sports team, and different players are on the field at different times. They all have their own projects, and they're all traveling constantly, writing music, and playing in a million different things. It was a process of getting to know their voices and then trying to write things that bring out certain features of their voices. It's specific to them. On "Frontier," the wailing sound in the back is just so Stine [Janvin]. I actually wrote that line with a Granulator [sample-based granular synthesizer] and then asked Stine to sing that line. She emulates it and makes it come to life. Or like "Frontier," dealing with Sacred Harp music. One of the ensemble members, Evelyn [Saylor], is in a Sacred Harp ensemble. She taught everyone how to do that authentic delivery, and how to go for the chest voice. It's drawing on what different people brought to the table, as well as what they try to draw out of it. We were trying to see Spawn as an ensemble member. A lot of people doing machine learning music now are dealing with automated composition, like training on Bach and then writing new pieces in the style of Bach. We're not interested in that. I find that not at all interesting. What we wanted to do was use sound as material, kind of coming more from a musique concrète lineage. We wanted to see Spawn more as a performer.

L-R: Spawn (on top of the piano), Roman Ole, Evelyn Saylor, Jules LaPlace, Holly Herndon, Josa Peit, Mathew Dryhurst, and Albertine Sarges

Can you break down who or what Spawn is? Does Spawn use a pronoun?

I use she. Maybe it's super narcissistic, but the first experiments I did with Spawn were modeled on my voice. The first things that Spawn did sounded like me, so it was a she!

Where did Spawn come from?

Mat and I have been wanting to do some work with neural networks for a while. We got a grant from the German government, actually in honor of Beethoven. A celebration of his life and work. They gave maybe ten different composers little grants over three years to work on different things. Something experimental that you wouldn't normally be able to fund from your current setup. We didn't have a way to dive into this otherwise, so we were able to buy the hardware, and we were able to pay Jules LaPlace for his time. He's working on all the software. We had this opportunity, so that's what we decided to do with it. It's a three-year grant, and we're in year two.

So, Jules is an integral part? Are you guys continuing to develop together?

He is, yeah. We call him an ensemble member. He is amazing; also as a composer. Having somebody who has a musical ear makes a big difference. Jules is a wonderful alien.

How did Spawn develop over the last two years?

Well, the first six months were just awful. It was like we didn't get anything out of it. We were working mostly with TensorFlow [a symbolic math library, also used for machine learning applications], which is usually used for style transfer of images, so we were applying that to spectrograms. It was even lower fidelity than what we have now. It was very scratchy, and it wasn't that interesting. A lot of the voice work we were doing sounded very computerized; kind of vocoded. It was a good six months of us training it on all different kinds of things, and we kept getting awful results. Six months is a long time to not have anything. We were like, "This sucks! Why are we doing this?" We switched it up and decided to use SampleRNN, which is a different approach; more of a microsound sample-by-sample approach. That's when things started to turn around. I think the first thing we got out of it that was the most interesting is what became "Birth," where she's kind of emulating my voice. Jules made this GUI [graphical user interface] where Spawn will be rendering, and then she'll post when something's new. She has a little Slack bot where she's like, "Here's a new track!" So, we listened to it, and we were like, "Hallelujah! We finally have something interesting!" Of course, it ignited our interest again, because we were starting to wane on it. Then we wanted to put everything through SampleRNN and see what else we could do. It was a long, laborious process, and that was a watershed moment. When we started working on the voice model with the WORLD vocoder, with Jlin's "Godmother" track, that was another moment.
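[A rough illustration of what "sample-by-sample" means in practice: the sketch below generates audio one sample at a time, with each new sample computed from the ones before it. It is not Spawn's code or SampleRNN itself. The `toy_predictor` function is a hypothetical stand-in (a damped resonator plus a little noise) for a trained network, and the 16 kHz sample rate and all parameter values are assumptions chosen for the example. -ed.]

```python
import math
import random

SAMPLE_RATE = 16000  # assumed rate for this sketch


def toy_predictor(prev1, prev2, rng, freq_hz=220.0, damping=0.9995, noise=0.001):
    """Hypothetical stand-in for a trained model.

    Predicts the next sample from the two previous samples using a damped
    two-pole resonator, plus a small random term. A real neural model would
    replace this function with a learned prediction.
    """
    w = 2 * math.pi * freq_hz / SAMPLE_RATE
    resonant = 2 * damping * math.cos(w) * prev1 - damping ** 2 * prev2
    return resonant + rng.gauss(0.0, noise)


def generate(num_samples, seed=0):
    """Build a signal one sample at a time, each step fed by earlier output."""
    rng = random.Random(seed)
    samples = [0.0, 0.1]  # short seed so the loop has a history to start from
    while len(samples) < num_samples:
        nxt = toy_predictor(samples[-1], samples[-2], rng)
        samples.append(max(-1.0, min(1.0, nxt)))  # keep within [-1, 1]
    return samples[:num_samples]


if __name__ == "__main__":
    one_second = generate(SAMPLE_RATE)
    # One second of audio at this rate takes 16,000 sequential predictions.
    print(len(one_second), "samples generated")
```

[Because every sample depends on the previous ones, the predictions cannot be made all at once; even this toy version needs thousands of sequential steps per second of sound. -ed.]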

What are the opportunities of using artificial intelligence in your music?

I was interested in it as a subject. Also, I was interested in it as a philosophical, human subject of what AI means for the history of humanity, the history of intelligence, and for our evolution as a species. A broader, philosophical question. That filters into the music that's not even necessarily dealing with Spawn. "Extreme Love" is a text written by Jenna Sutela, who also works with artificial intelligence. Spawn is not even on that track, but it's dealing with these kinds of questions around intelligence and evolution. It was a broader theme that I was interested in dealing with, but also working with it in this DIY capacity. I was interested in what sound world would come out of that. Whether or not I'm successful, I don't know. I'm always trying to develop my own palette. Each album has its own tool kit of whatever I've developed for that record. Hopefully it has its own sound world. That takes time to develop. It's not like I want to just plug into whatever I did on the last one and start jamming away. It's a holistic approach. Let's build the ship and sail the ship, but you have to build it first. Through building it, you're learning how to play it and what it can do. I think that's why it takes me so long to write, as well. I have a hard time doing something that I feel like sounds like something I've done before. I feel like I'm trying to push myself to progress. Of course, it's not always successful, but at least that's what the goal is when I'm starting.

What are the training sessions like that you were doing with Spawn? I happened to be at the performance at Martin-Gropius-Bau.

Because the rendering time is so long, it's difficult to incorporate her into a live performance. It's something we're struggling with right now. We want to tour with her, but we're trying to figure out this real-time system. At that time we were like, "What part of this process is performative?" It's the creation of the data sets themselves, because you have to perform them in order to record them. We wanted to involve the audience in that as a way to make the process less opaque, as well as to explain what a training even is. If you just hear somebody say, "Oh yeah, I'm training an AI," it sounds like a mysterious thing. It's not. I'm creating audio files that I'm then feeding to this neural network. By involving the audience, we were hoping to explain it in a way that was slightly didactic, but also immersive and interesting, and maybe beautiful at the same time.

It was certainly beautiful.

Thank you!

Now that PROTO is out, what is the future for you and Spawn?

We're working on real-time systems now. It's difficult, but that's the goal.

For her to be able to learn on the spot?

I'm not going to say she will never, but our current capacity is not such that she will be able to learn on the spot. But I think that she will be able to perform on the spot in a way that is interactive with human performers on the stage. That's what we're hoping for. It's not seamless yet, and we won't present it until we've figured it out. I don't want to do some sort of fake AI bullshit on stage. We will start the tour without her. Whenever she is ready, she will join us.

What are your thoughts on the role that artificial intelligence can have in the musical sphere?

I think there are many things that will happen. I think we'll see a lot of automated writing. That's inevitable. It's a much cheaper way to generate genre-specific music, but I don't think that's interesting.

What's most interesting to you?

What's most interesting is seeing it as a different kind of intelligence, as well as trying to figure out what we can learn from that intelligence. Allowing it to surprise us with different options. With [the track] "Godmother" for example, I didn't teach Spawn how to beatbox at all. That would be embarrassing if I ever tried to beatbox, but Spawn ended up beatboxing, which was hilarious, weird, and funny. Be open to what surprises you're going to get back when you do something. Those kinds of moments are exciting and fun. It's revealing something of my voice that I'm training her on, when I didn't even have that in mind when I was training her. The fundamental difference (compositionally speaking, not performatively speaking) between algorithmic music and something using a neural network is that with algorithmic music, you set up the rules and you let it run, and that will surprise you by having some randomization in there. But with the neural network, the rule set can be derived from another piece. It's an alienated rule set, which is a fundamental difference. Usually the composer has to set that rule set, and now the composer can outsource the rule-writing itself to the neural network by being able to analyze an existing piece of music. I think that's fundamentally different, and it lends itself to a lazy approach of just trying to copy things that came before. But maybe there's a different way of applying that, that can reveal something about how we compose that's interesting. Instead of just trying to copy, maybe we can use the way the AI sees that rule set. Maybe that can reveal something about our impulse as composers that's interesting. I haven't seen that yet, but that's something that could be interesting.
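[The distinction between a composer-written rule set and a derived one can be sketched in a few lines. This is only an illustration under loud assumptions: a first-order Markov chain stands in for the neural network (it is far simpler and is not what was used on PROTO), and the note names, the pentatonic scale, and the example "piece" are all hypothetical. -ed.]

```python
import random
from collections import defaultdict


def handwritten_rules(length, rng):
    """Algorithmic approach: the composer writes the rule, randomness surprises.

    Rule chosen by hand for this example: stay on a pentatonic scale and
    move at most one scale step at a time.
    """
    scale = ["C", "D", "E", "G", "A"]
    idx = 0
    melody = [scale[idx]]
    for _ in range(length - 1):
        idx = max(0, min(len(scale) - 1, idx + rng.choice([-1, 0, 1])))
        melody.append(scale[idx])
    return melody


def derive_rules(existing_piece):
    """Derived approach: the 'rules' come from analyzing an existing piece.

    Here the rule set is a table of which note followed which.
    """
    table = defaultdict(list)
    for current, following in zip(existing_piece, existing_piece[1:]):
        table[current].append(following)
    return dict(table)


def generate_from_rules(table, start, length, rng):
    """Generate a new sequence using only transitions found in the source."""
    melody = [start]
    while len(melody) < length:
        options = table.get(melody[-1])
        if not options:  # the source never continued from this note
            break
        melody.append(rng.choice(options))
    return melody


if __name__ == "__main__":
    rng = random.Random(0)
    print("hand-written rules:", handwritten_rules(8, rng))

    piece = ["E", "D", "C", "D", "E", "E", "E"]  # hypothetical source material
    table = derive_rules(piece)
    print("derived rules:", generate_from_rules(table, "E", 8, rng))
```

[In the first function the composer decided what is allowed; in the second, nobody wrote the rules down, and the output can only ever recombine what the source piece already did. -ed.]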

So, this is going to be an avenue you continue to go down with Spawn's development, as well as to see what she does next?

Yeah, I'm definitely going to continue with it. Also, this real-time thing is what gets me excited, especially live. That's been my whole thing; live-processing. I feel like I'm back at this early computer music thing, where it's punch cards and then I put it in the computer, wait a day, and get the sounds back. It gets exciting when you can improvise with the process and have immediate feedback.

Is the biggest thing that's holding you back the technology of the processors?

Yes. So, we'll see! I'm sure we'll be able to figure it out. We have some ideas!