In January of this year, I started speaking to Mat Dryhurst about artificial intelligence and machine learning. I knew very little about it, save for the works of David Behrman, George Lewis, and the like, early pioneers of the interface between humans and computers. Mat was open and enthusiastic, and suggested many fine articles. With his partner Holly Herndon, he created Spawn, an artificial intelligence that is also a member of their vocal ensemble.
In April, I chatted at length with Dr. Holly Herndon about her upcoming album, PROTO, which was the first album to feature AI. It certainly won’t be the last.
And in November, an excerpt from this chat was finally published as part of New York Magazine’s Future Issue. You can read that here.
Below is a deeper dive into the future of music and machine learning.
What’s the starting point for Spawn?
HH: It came out of touring for two years and asking questions afterward: how do we want to make music going forward? We wanted to work with people, with players. I get lonely and depressed when I’m in the studio alone, and even with an ensemble, I spend even more hours alone doing editing. It was a need for human contact. From having toured so long on the electronic music circuit, I noticed that things were becoming so automated, the perfect AV show. It almost doesn’t matter that there’s someone onstage. Even with Platform, we tried to present the imperfection of it as well, showing the desktop and things like that, going behind the Wizard of Oz curtain to show the human aspect of it. There’s always a human component. We were working on AI in parallel.
Your daughter is way smarter than Spawn. Probably way cuter. It was more of a research thing. We didn’t even know if it would be usable or interesting. The first six months were pretty uninteresting. After that, we got more interesting results. We started training with my voice and Mat’s voice, and then we opened it up to the ensemble. My approach to technology is: how can technology make us more human? People misunderstand the Cyborg Manifesto and what Donna Haraway is saying. The cyborg is a metaphor for how women can be liberated from past gender roles. It’s not about trying to erase the human body or dehumanize ourselves at all.
To the contrary, it’s allowing a flexible definition of humanity so that we can allow ourselves to evolve. If a computer is doing all this heavy lifting and repetitive legwork, if the computer is doing all this crazy automated stuff, what’s left for us to do onstage? You could look at it and just say, “I’ll just leave the stage.” I see a lot of electronic music going that way. Or the computer is freeing us to be more human, to deal with other performative parameters and to celebrate each other onstage even more. Those are the two visions.
What does it mean when Spawn is trained on Jlin?
HH: Jlin has been involved from very early on. She declared herself the godmother really early. We tried her music in all different forms, and the voice model was the most interesting. When we put her music through SampleRNN, the outcome wasn’t very interesting. Her music is repetitive, but it’s the groove and the small inflections that make it interesting, and the neural network didn’t understand that. It was just trying to guess the next sample. It’s both super-impressive and like…terrible. It’s like ‘God, you’re so dumb!’ With the voice model, it made way more sense.
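For readers curious what “guessing the next sample” means: SampleRNN is an autoregressive model, predicting each audio sample from the samples before it, one at a time, and generating sound by feeding its own guesses back in. The sketch below is not SampleRNN itself (which uses hierarchical recurrent networks); it is a minimal pure-Python stand-in for the same task, using a two-sample linear predictor with illustrative parameter values.

```python
import math

def sine(freq_hz, rate, n):
    """n samples of a sine wave -- a stand-in for a recorded waveform."""
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

def fit_order2(x):
    """Least-squares fit of x[n] ~ a*x[n-1] + b*x[n-2], solving the 2x2
    normal equations by hand -- the crudest possible 'next sample' model."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for n in range(2, len(x)):
        p1, p2 = x[n - 1], x[n - 2]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        t1 += x[n] * p1
        t2 += x[n] * p2
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b

def continue_signal(x, a, b, steps):
    """Autoregressive generation: feed each guess back in as the next input."""
    out = list(x)
    for _ in range(steps):
        out.append(a * out[-1] + b * out[-2])
    return out

seed = sine(440.0, 16000, 1000)
a, b = fit_order2(seed)  # for a pure sine: a = 2*cos(w), b = -1
generated = continue_signal(seed, a, b, 100)
```

On a steady sine wave this continues the signal almost perfectly, but on music whose interest lives in groove and small inflections, sample-by-sample prediction with no larger sense of structure falls apart, which is roughly the failure Herndon describes.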
Spawn has such a limited perspective. People talk about AI versus AGI (artificial intelligence versus artificial general intelligence). AGI is what we are: we’re not artificial, but we have a general intelligence. I can apply what I learned from this coffee cup to the next thing. Spawn is just trained on the cup, just sees the cup, and doesn’t understand where the cup lives in the world.
Human bodies are such sophisticated sensorial machines. With AI, that’s going to be really hard to replicate. Do we even want to replicate us? And why does it have to always be tied back to us? Even though we refer to Spawn as a child, we don’t see her as a human child. She’s an inhuman child; it’s a different kind of intelligence.
How does Spawn hear the music?
She gobbles up WAV or AIFF files; you can feed her an iPod. Right before we left, we fed her an insane stream of music, and every once in a while you’d hear her gargle something out. She was chugging music (laughs). We’re now trying to figure out a real-time system.
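"Gobbling up" a WAV file means decoding it into a sequence of numbers a model can train on. A minimal sketch of that ingestion step, using only Python's standard-library wave module (the filename and parameter values are illustrative):

```python
import os
import struct
import tempfile
import wave

def write_wav(path, samples, rate=16000):
    """Encode floats in [-1, 1] as a 16-bit mono WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit PCM
        w.setframerate(rate)
        ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
        w.writeframes(struct.pack("<%dh" % len(ints), *ints))

def read_wav(path):
    """Decode a 16-bit mono WAV back into floats: the raw material for training."""
    with wave.open(path, "rb") as w:
        raw = w.readframes(w.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [i / 32767.0 for i in ints]

# Round-trip a tiny clip through a real file on disk.
clip = [0.0, 0.5, -0.5, 0.25]
path = os.path.join(tempfile.mkdtemp(), "clip.wav")
write_wav(path, clip)
decoded = read_wav(path)
```

Working directly on these raw sample values, rather than on MIDI or symbolic data, is what Herndon means by treating sound itself as the material.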
We wanted the people who trained the AI to be audible. We wanted to use people’s voices, so that you can hear them going into it. Instead of a huge data pool, we limited it to a specific community. That’s why we used sound as material rather than MIDI and statistical analysis. There’s an oral history there, a storytelling.
We were deliberate about not using material that we didn’t have permission to use. We wanted it to be our community. That’s going to be one of the big problems coming up with AI. Look at how sampling has been treated over the last 40 years and all of the ethical issues around it; we haven’t dealt with that well. There are people who still aren’t compensated. It’s a total quagmire. How much did Moby pay the Alan Lomax Archives for the voices on Play? Did the vocalists’ families get paid? We don’t know how that was dealt with.
There has been a historical entitlement toward sound material as just material. (We sing Enigma to each other.) Those were Taiwanese farmers, and Enigma just sampled them with no permission. People just take without context or attribution, without naming someone. People are so entitled when it comes to other people’s voices. On one hand, there’s something beautiful about remix culture, but there’s also something fucked up about that sense of entitlement and not understanding it. Even with a legal framework, we are still fucking up all the time.
AI has no legal framework. If we can’t figure out sampling, we’re in trouble with attribution and appropriation with AI. It’s going to open up a minefield for people.
What does Spawn look like?
HH: It’s a souped-up gaming PC, a tower. A laptop wouldn’t be strong enough. We have a GPU unit, we could have a stack of them. Mat wants to get some souped-up casing for it. You just need power, sound card, and a cooling system. She’s an old-school tower.
How far is AI from being integrated into popular music-making?
Not far at all. I think it’s just around the corner; it just depends on how that’s going to look. The sound-as-material approach that we took is still pretty rough. We compared early-1900s phonograph recordings to digital recordings today, and it’s insane how that developed over a hundred years. We’ll have very accurate voice models of past vocalists pretty soon, and that’s going to open up a new quagmire of questions about what we do with our forefathers’ and foremothers’ voice models. Who has the right to do whatever they want with them? The voice model will come a long way. I used to say we’ll have infinite Michael Jackson records, but that probably won’t happen anymore (laughs). Infinite Aretha Franklin records is maybe the better example. But there’s no opt-in or opt-out for that. She wasn’t able to say you can’t make a model of my voice. That’s coming.
Automated composing is already here. It’s wallpaper, it’s mood music. It makes past human labor opaque, taking it as a given canon and then creating something off the back of it without acknowledging that it ever happened. Ultimately that creates a recursive feedback loop of aesthetics that I’m not interested in. It gets us into an artistic cul-de-sac, but it’s cheap and functional, so we’ll see it. Anything idiomatic, like Hans Zimmer, where you know what’s going to happen, that’s where AI will really thrive.
“What if I just want my ambient music perfectly attuned to my mood?” people ask. One problem with that is that somebody at some point had to create ambient music. It now seems like this chill, relaxed thing, but at one point that wasn’t the norm; somebody had to take that chance and create that new thing. We need the next thing to be developed in order for us to develop as a culture and as a society. Music needs to reflect our current times. We can’t constantly be regurgitating the past.
We are constantly shifting the interest from the composer to the consumer, which is what Spotify does: it’s about pleasing the consumer. It doesn’t give a shit about the composer’s ideas. Payment aside, in how it deals with music as material, in just representing the music and allowing an album to be an album, it’s so consumer-focused. There are so many start-ups now that offer to change the music to match your heart rate. If you start jogging, the music matches that. The composer didn’t intend that.
While this shift from the composer to the consumer is very lucrative, we lose something in it. If I think about my 16-year-old self: if music was trying to please me as a teenager forever, how am I ever going to grow out of those aesthetic immaturities? You have to listen to things you don’t like, things that challenge you, things you don’t understand, things that are different. I’ve had so many conversations with programmers about this. Many programmers don’t see culture in this way; that’s not the problem they’re trying to solve. We don’t view cultural creators as experts. Musicians have spent years and years on how to communicate an idea through sound. Let’s respect that as an art form.
Difficult things, emotions that feel bad or different or unacceptable, that’s what music has always been about for me.
HH: And new emotions that we don’t have a Spotify category for! Not everything is so fucking quantifiable in this way. It’s a really impoverished way of looking at culture.
I’m curious how George Lewis and his writings about Voyager informed your work with Spawn.
We took so much inspiration from Lewis because he’s a badass. He has a very different approach to technology than other academics. He approaches AI, the inhuman Voyager, from the view of how African-Americans were also treated as inhuman during slavery. He gives the example of Blind Tom Wiggins, a savant pianist, a Mozart-like improviser and composer. He was treated as an automaton. They couldn’t make sense of a black man having this ability, so instead they would dehumanize him; he would perform in white parlors, and the language around him would be “Come see this automaton!” Lewis is careful not to immediately discount an inhuman intelligence because of that history. When he works with Voyager, he sees Voyager as a collaborator; he’s not afraid to dethrone himself as the artist. He’s not attached to the idea of the human input being the end-all, be-all, and there’s something beautiful about that humility. He’s also a rigorous intellectual.
There’s a lot of doomsday thinking tied to AI and its worst-case scenarios. With Spawn and the way forward, what do you see remaining human, and what will Spawn replace?
HH: AI has the ability to further entrench already existing inequalities in society. We have to fight against that tooth and nail. To make powerful AI, you need oceans of data and you need processing power. Who has access to oceans of data? The platform capitalists we’ve been freely feeding our digital selves to for the last ten years without questioning it, totally not valuing our digital selves at all. No government oversight or regulation. Our digital data is going into models for training things that we’ll never even experience, that won’t benefit our lives at all. It’s insane. It’s a natural resource that we don’t view as such; some people compare data to oil. Right now you can’t replace the human performer. Our physical bodies are amazing in how we respond.
Do you better appreciate your physical body?
HH: I love my physical body, but I understand that it has its limitations. My best gift is my mediocre voice, because it forced me to develop these digital appendages. If I had been born with a natural Adele voice, I probably wouldn’t have done that. I have these cyborgian things to make my voice interesting, to try and transcend that. I really appreciate a vocalist’s ability to respond in the moment, with memory and historical training and being present. The human desire to connect through music is innate. The Notre Dame footage featured people singing together, and it’s this innate thing to emote together. There’s a reason there are moments on this record that sound liturgical: I found myself craving this public moment of emotion, sharing it together. I don’t think we evolve past that. Our technology can evolve, but we don’t need to get rid of what’s wonderful about being human. People who make idiosyncratic music are in less danger of being replaced.
If you think about how pop music is written these days, it’s not artificial intelligence, but you have upwards of 20 writers, and they’re basically scanning players and the communities they’re involved with. It’s functioning in a similar way.
The first wave is just background music. There’s active listening and there’s passive listening, and passive listening generates money in a streaming society, which is also arrrrgh. Valuing a work of art by how many times it’s played is such a fucked-up way of valuing something. The most important music to me and my brain development has not been things I played on repeat at a dinner party. Things I only listened to once are just as valuable.
Writing about David Behrman’s early work, Paul DeMarinis once said: “Electronic instruments are in theory freed from such compromises; they permit as pure a harmony as the human mind can imagine…[Behrman] has been able ‘to use really rich harmonic material without having to deal with all the weight and forward direction usually associated with harmony. Without gravity.’”
HH: What you’re getting at is one of my godfathers of electronic music, John Chowning. You get these timbres that you don’t find in nature. Often technologists are just trying to emulate a piano instead of developing a new sound world. That sound world was entirely new because he came up with it, and now you hear it in all these movies. I love this idea of building on the past, progressing things, and having things respond to how we’ve developed today. I love trying to figure out what Spawn can do that we can’t.
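The sound world Herndon credits to Chowning is frequency modulation (FM) synthesis: using one oscillator to wobble another's phase, which scatters energy into sidebands and yields timbres with no acoustic counterpart. A minimal sketch of the idea; the frequencies and modulation index here are illustrative, not any particular patch:

```python
import math

def fm_tone(carrier_hz, mod_hz, index, rate, n):
    """Chowning-style FM: a modulator oscillator shifts the carrier's
    phase, producing sideband-rich timbres no acoustic instrument makes."""
    out = []
    for i in range(n):
        t = i / rate
        phase = (2 * math.pi * carrier_hz * t
                 + index * math.sin(2 * math.pi * mod_hz * t))
        out.append(math.sin(phase))
    return out

# index = 0 is a plain sine; raising it spreads energy into sidebands.
plain = fm_tone(440.0, 110.0, 0.0, 16000, 4000)
bright = fm_tone(440.0, 110.0, 5.0, 16000, 4000)
```

The single `index` knob morphing a pure tone into a clangorous, brassy one is why the technique became the signature sound of a generation of synthesizers and film scores.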