Episode 10: Machine Learning at Spotify 

Rob May and guest co-host Byron Galbraith, Talla's Co-Founder and Chief Data Scientist, interview David Murgatroyd. Tune in for a more technical episode this week on how they are using machine learning over at Spotify. 

Subscribe to AI at Work on iTunes or Google Play

Rob May, CEO and Co-Founder, Talla


Byron Galbraith, Chief Data Scientist and Co-Founder, Talla 


David Murgatroyd, Machine Learning Leader, Spotify




Episode Transcription 

Rob May: Welcome to the latest edition of the AI at Work podcast. I am Rob May, the co-founder and CEO of Talla. Instead of Brooke today, I have Byron Galbraith, my co-founder and chief data scientist, here with me.

Part of the reason for that is we have a more technical guest than we usually have on, so I thought he and Byron might want to talk about some things. Our guest is David Murgatroyd, who is the machine learning lead at Spotify here in the Boston office. Welcome, David. Why don't you tell us a little bit about your background, what you did before Spotify, and share what you're working on now.

David Murgatroyd: Good to be here with you guys. I've been doing machine learning for almost two decades now, and most of my background was doing natural language processing. I did that for a good while at a company called Basis Technology, and I kind of got kicked farther and farther upstairs, maybe a little further from the code and more into figuring out how we could get everything done that we needed to.

I moved to Spotify a couple of years ago and have been excited to work on a vertical application that has consumers who are being impacted by it, to see the mission that's there, and to be able to look at a broad variety of AI applications: looking at unstructured data, structured data, search, recommendations, voice, and seeing those things pulled together into a single experience. My background is in software, which I've been doing forever, and in ML and AI.

Byron Galbraith: Before we started, David and I were chatting about recommendation systems. That's a big part of what Spotify does, in addition to serving up all that music. What are some of the key challenges that you guys have with recommendation systems, and how do you capture and curate and just manage all that data?

DM: It's a big challenge and it's also really exciting because you can make an impact, like we were just talking here, having some users around the Talla office, and sort of what folks enjoy. There are two ends of the spectrum. One is, what is the domain? What is the content? What are the entities that you're trying to represent in that domain? Then the other end is, who is the user? What is the product? What is the intent? You're trying to connect those things together.

One challenge we have is how do we build our systems in a way that we can both optimize for the particular feature, the particular user intent, the particular thing that's happening, while also having shared infrastructure, shared models. It's a balance between those. The purpose of having some amount of shared models is to avoid having to spend a ton on compute for every feature in the world and to have a consistent experience and to benefit from improvements of quality. You also want to be able to fine tune for exactly what I'm doing right here. So, that's one thing.

In terms of the data management, the infrastructure these days makes it relatively straightforward. It's really valuable when you have folks who are doing machine learning and data science who can also do what these days, I guess, is called data engineering, so building data pipelines that can look across a ton of data. That's a challenge.

Then there's also being increasingly real time, taking advantage of more context, and making sure that our recommendations are fair with respect to any bias that might be in them, which is always there. There's also having some level of explanation, because you're not just providing some data, but you're really developing a relationship with your user. You want to have recommendations that are trustworthy and that get to delight and that really advance that purpose, rather than just scratch a temporary itch.

BG: Something that's sort of interesting about recommendation systems, especially when you have a broad set of information. You'll look at something like Amazon, where they say, “Hey. You just bought a wallet. Would you like to buy another wallet?” How do you get around the problem of saying, you just listened to Radiohead. You just listened to the Beatles. How about more Beatles? How about more Radiohead? Something about music sometimes is not just, “I want to hear more of what I want to hear.” It's, “I want to hear a song that's going to delight me, which might be something that I'm unfamiliar with.” So, how do you do that? How do you sort of say, “All right, we basically have this sort of censored data set, in the statistical sense, in that all I've seen is the things you've selected.”

How do I find a new thing that I think you might like, given that it's unclear, necessarily, that the things you've already listened to are a great indicator of that?

DM: I think a lot of it is where AI and machine learning come in, in their ability to generalize. You look at what someone has interacted with and the way they've interacted with it, and look at the content itself, what the properties are, the context in which they did it. Then it's the job of our models to be able to generalize, and generalize at scale. In systems these days, you'll have representations for entities that are in some vector space or something, so that you can do geometric calculations, which are pretty straightforward to do, pretty fast to do, which is wonderful for a lot of things.

Then also it may be that, that particular space isn't optimized for the corner of your application that your user is in right now. We end up then optimizing on top of it. But it's a challenge. We call it the cold start problem in recommendations.

There is cold start for users, too, where a user is new to the platform and you don't know anything about them. Or cold start for content, where you've got a new artist who's just shown up-- and, ironically, those are the people that we want to do the best job for. We want to do the best for everybody, but especially that person who is, like, in their garage. They've just got some indie label deal, so they're on Spotify. They're like, “Oh, now I have access to 170 million users.” They're the person that's hardest to figure out how to recommend. That's where we've really got to invest a lot.
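To make the "vector space" idea from David's answer concrete, here is a minimal sketch of the kind of geometric calculation he mentions: ranking tracks by cosine similarity to a query vector. The track names, the three-dimensional vectors, and the function names are hypothetical illustrations, not Spotify's representations or code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_tracks(query, track_vectors, k=2, exclude=()):
    """Return the names of the k tracks whose vectors are most similar to the query."""
    scored = [
        (cosine_similarity(query, vector), name)
        for name, vector in track_vectors.items()
        if name not in exclude
    ]
    # Highest similarity first; ties are broken by name so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


# Hypothetical toy embeddings; real systems learn many more dimensions.
TRACK_VECTORS = {
    "track_a": [0.9, 0.1, 0.0],
    "track_b": [0.8, 0.2, 0.1],
    "track_c": [0.0, 0.1, 0.9],
    "track_d": [0.1, 0.9, 0.2],
}

if __name__ == "__main__":
    print(nearest_tracks(TRACK_VECTORS["track_a"], TRACK_VECTORS, exclude={"track_a"}))
```

A brute-force scan like this is only reasonable for a toy catalog; at scale, an approximate nearest-neighbor index would typically replace the loop.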

RM: Byron's question sort of implied in some cases the Amazon algorithm's not good, right? I think there are people, maybe like Netflix users, who might say, “Oh, the algorithm's really good.” Probably, it's good for some people and not good for other people. As someone who's building recommendation engines and AI around it, do you think these systems are good? How much better do you think we can get?

DM: That's a good question. I mean, a lot of it is good for the purpose at hand. And a lot of this is where, in some ways, the role of product comes in. So, a story from Spotify-- one of our flagship features is called ‘Discover Weekly’. Every Monday, we give you a list of 50 tracks that you haven't heard before, that we think you're going to like.

The ML engine that's the main basis of it-- and it's advanced some since-- had actually been around at Spotify a bit before Discover Weekly was there, just powering our Discover page. It took someone with product insight to be able to say, well, maybe this broader Discovery page that says, because you liked blah, you're going to like this, maybe that's not the right way. Maybe what we need is a playlist because people consume playlists.

I think when you're able to kind of pivot the product articulation to fit the capability of the ML, then there's still room to grow there. It's a way to kind of fit it. I think some of the frontiers in recommendation are about being increasingly contextual, so taking a lot of other features into account that aren't just the long-term identity of the user and the content, but also where is the user right now, what are they doing, what did they just listen to, what did they just do. Those are things that are being worked on.

I think another thing, something I love to talk about, is the role of human expertise and machine coming together. That's one thing that we call “algotorial”, where we have an algorithmic and an editorial role in producing some experience. The role of our editors is to reflect culture and shape culture and to really understand where music is. So, they have a voice. Then the algorithm can say, well, of this space that they've identified as being really interesting, this corner of it is likely to be most interesting to Rob because I know that he likes this kind of stuff. I think they work all right.
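One way to picture the "algotorial" split David describes is a two-stage flow: editors choose the candidate pool, and an algorithm orders that pool for a specific listener. The sketch below assumes a hypothetical per-genre affinity score for each user; the data, field names, and scoring rule are illustrative only, not how Spotify implements it.

```python
def algotorial_rank(editor_pool, user_affinity, limit=3):
    """Order an editor-curated pool by a user's genre affinity.

    editor_pool: list of dicts with "title" and "genre", in the editors' order.
    user_affinity: dict mapping genre -> score in [0, 1]; missing genres count as 0.
    """
    # sorted() is stable, so tracks with equal affinity keep the editors' ordering.
    ranked = sorted(
        editor_pool,
        key=lambda track: -user_affinity.get(track["genre"], 0.0),
    )
    return [track["title"] for track in ranked[:limit]]


# Hypothetical editorial picks and a hypothetical listener profile.
EDITOR_POOL = [
    {"title": "song_1", "genre": "jazz"},
    {"title": "song_2", "genre": "rock"},
    {"title": "song_3", "genre": "folk"},
    {"title": "song_4", "genre": "rock"},
]
ROB_AFFINITY = {"rock": 0.9, "jazz": 0.4}

if __name__ == "__main__":
    print(algotorial_rank(EDITOR_POOL, ROB_AFFINITY))
```

The algorithm never adds tracks the editors did not choose; it only decides which corner of their selection a given listener sees first.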

RM: It's the Rolling Stones. For anybody listening who wants to know what I like and wants to send me an Amazon gift card, it's something for the Rolling Stones.

DM: Nice.

RM: My favorite band.

BG: Something that you brought up in some of the discussion and I think is sort of fascinating, and we had talked a little bit about this before as well, is a big thing about this is that you have to be able to sort of distill the entirety of a song, the experience of this song, the way someone might appreciate it, that for ML to work, you need to turn that into numbers. You need to turn it into a set of features.

I'm just curious-- we know other systems have tried things like this. The Music Genome Project tried to say, “Okay, we're going to identify all these kinds of things that can make up a song and try to describe the beat progression or the chord progression or the artist or all that kind of stuff.” So I'm curious if you could talk a little bit, as much as you can, about some of the representation-- the features and the way you represent this sort of information and how that's used.

DM: I think there's a few different buckets that it's important to look at when you're thinking about recommendation. There's things, like you were saying, sort of intrinsic to the content. This is the stuff you really need in those cold start situations where all you have is the content. It's stuff like energy or tempo or those kinds of things that are really intrinsic to the content.

Then, there's what you might call the collaborative information, in the sense of collaborative filtering, even though that as an exact approach itself is maybe receding a bit: the notion of collaborative information, like what other users like you have listened to. There's also kind of cultural knowledge, we call it. So the idea is-- and there are some blog posts on this from a few years ago-- that in the culture at large, let's say on some relatively not so well-known music blog, this content was described in this way and this other content was described in a similar way. So maybe those pieces of content have some similarities to each other. So there is that kind of information on the platform and outside the platform.

There are these different buckets. There's these different sources. One of the challenges or one of the exciting frontiers with something like deep learning is pushing the articulation of features to an increasingly lower level, or an increasingly kind of raw level, if you like, so that we can get representations that themselves are learned. There's a trade off there in terms of interpretability, in terms of amount of data that you need to be able to really have those hook in. Thankfully, we've got a decent amount of data, so that's not much of a problem. There's a lot of fun ways to think about it.
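The buckets David lists can be combined in many ways. As a rough sketch of why content features matter for cold start, the function below blends a content-based score with a collaborative score, trusting the collaborative signal more as a track accumulates plays. The weighting rule `plays / (plays + halfway)` and all names are assumptions made for illustration, not a description of Spotify's models.

```python
def blended_score(content_score, collab_score, play_count, halfway=100):
    """Blend content-based and collaborative scores for one track.

    With zero plays the result is purely content-based (the cold-start case);
    at `halfway` plays the two signals are weighted equally.
    """
    if play_count < 0 or halfway <= 0:
        raise ValueError("play_count must be >= 0 and halfway must be > 0")
    collab_weight = play_count / (play_count + halfway)
    return collab_weight * collab_score + (1.0 - collab_weight) * content_score


if __name__ == "__main__":
    # A brand-new track: no listening data yet, so only content features count.
    print(blended_score(content_score=0.8, collab_score=0.0, play_count=0))
    # An established track: collaborative information dominates.
    print(blended_score(content_score=0.8, collab_score=0.2, play_count=10_000))
```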

RM: Are you seeing anything, when you look at what's been happening with video and GANs in particular? They did that Obama speech, with him saying things that he never said. Then they came out with the whole deepfakes thing, and you had this problem where they were superimposing people's faces on pornographic videos and all this kind of stuff. Are you seeing anything like that? Are people building tools for similar audio things? It's like, “Hey, here's Willie Nelson doing the Run DMC version of whatever.” Are people doing that?

That's my first question. Then, is this going to turn someday into an arms race, with people creating all kinds of music so fast and automatically with AI that it's going to explode and dominate Spotify? Do you guys ever worry about that?

DM: A little bit. I think there's a gatekeeper in terms of getting things onto Spotify. It's not user-generated content. But the question is, what systems can we put at that gate so that we don't have to have tons of people or rely on people whose incentives may not be aligned with ours to be that entering place?

I think your point, which I generalize as any time there's enough money in something related to ML, you're going to have adversarial behavior. You're going to have people that are trying to game the system, trying to game the algorithm. That's something that you can see on any of these platforms from way back to keyword stuffing and websites for search engine optimization.

I think that's something that you want to watch out for and you need to sort of think about. What does robust machine learning look like when you need to be able to identify what kind of trust you can put in different content sources, or the shape of content that might be more adversarial or spammy or something?

RM: Are there rules right now around music and derivative machine learning works, where if I mix a whole bunch of things together and I create my own AI song out of it, are there legal issues around that right now? Do you know?

DM: I don't know for sure. I think there are. When it comes to sampling and other things, I presume there are. I mean, even-- what was it, like Vanilla Ice and Queen or something had way back when? There's "Under Pressure" and "Ice, Ice Baby." So I think when it comes to sampling and such, I would imagine that's the law. I'm not a lawyer, but it may be that there are sort of analogous constraints on machines that there would be on people. But that's a corner of it, thankfully, that I don't have to think about all that much.

BG: I'll go in a more positive direction of Rob's question about ML AI-generated content. There's also a lot of interest in a branch of the research community around AI for creativity, like creating, not doing it for, maybe, nefarious purposes. But, just what can we do? What can we enable artists to do? How can we use these things to make sort of cool, interesting experiences?

I know you've got a lot of really smart, talented researchers at Spotify. I know most people who work at Spotify tend to be really into music. I'm curious, is anyone at Spotify doing this internally, just like a hackathon or a fun project? Can you speak to any of that?

DM: There's actually a whole part of our research group. They came from Sony and they recently delivered an album. I believe it's called ‘Hello World’, which was, like, the first AI-assisted composed album ever. It's really interesting to get to talk to them and see how they support creators. I don't know the details of it, but Spotify's vision is to enable a million creators to make a living off their work by delighting a billion fans.

To be able to support creators in expressing themselves and trying to pull away some of the maybe mechanical or frustrating details that aren't really a matter of artistry but more sort of rote technique or something, I think that's exciting. It's kind of an analog to the algotorial notion that I was talking about before, allowing humans to do what humans are good at and machines to do what machines are good at. And hopefully, the result is something that benefits even more people.

RM: Let's change gears a little bit and talk about product management and AI and machine learning. I think this is a super interesting topic to me, because as the CEO of a company, if you want to make changes, if you go to your engineering team and you say, we have 10,000 monthly users and we think we have to go to a million, they kind of have a rough idea of what they need to do to get there and how long it might take, and are mostly sort of correct. Engineering is very well understood for a lot of those things. And data science and machine learning often aren't.

Sometimes, if you go to your data science team and you say, “Hey, we're at level x and I need to be at x plus 20% better”, the answer might be, “Great. How long do you want us to try this before we either give up or keep trying new things?” It's not as well understood a discipline. Talk a little bit about that topic and how it maybe changes the way that you do product management for products that are heavily ML focused.

DM: For product management for ML-based products, I think a really important thing is that the goal is not for the product manager to understand how ML works, but more the shape of ML in products, and that there is a shift from product management through things like diagrams and mockups to things like data and metrics. And that comes from a product manager's use of data to be able to sort of almost specify what the desired behavior is or, in the case that you give, there's an existing product which is shown to be valuable and you're trying to move a particular metric on it.

I think it's interesting to think about sort of the product life cycle, where you start from a place where you're not sure whether something's going to be valuable. And it's nice if you can at least experiment on that using something where you're pretty confident you can at least solve the problem. You don't know if the problem is worth solving, but you can solve it. And then there's the sort of shift to, OK, we've shown that this problem is worth solving. But to really do a good job at it, we're going to have to go chase this extra 20%.

The way that I think about that-- and there's no silver bullet-- is to try and identify some common place of commitment that both folks on the business side and on the technical side can talk about. Think about something even as simple as sprint planning, let's say, if you're doing Agile or Scrum or something. Someone on the product side might say, exactly as you did, I want you to move this metric x percent. Someone on the technology side might say, we want to time box it, as you say, like, we want to work on it for so long.

I think a thing that's in the middle there is to commit to an experiment. The idea is the engineers, the data scientists, they could say, well, we know how to run these four experiments. The product person would say, well, either I'll get out of some of those that my metric goes up some amount, or we'll get learning. As long as the experiments are well articulated, there's a hypothesis, it's based on such and such background data, intuition, whatever, and you have a means for doing enough error analysis that you can get learning out of it, I think there's a way to put that into a common language so that people can feel satisfied that the commitments that are important to them, they're able to make and they're able to receive.

Then, during the broader product lifecycle, there is a shift in terms of the things that you tend to talk about. Maybe, very early in the product lifecycle, when you're just trying to say, is this problem valuable, you might say, can we measure success? Kind of like, even if we have the world's most brain dead implementation, can we measure it? So, measure before you model. Can we measure it?

Now, we can measure it. Can we run enough experiments and see that they're affecting the metric at all, maybe in some simple way? Then you move through to being able to run more and more experiments and having more and more of those experiments be able to move the metric that you care about.

It's tough. You have to have that trust, as you do in any kind of relationship, that people are doing things that are in the best interest of the team overall and the business, and that they have enough product context so that when they're in the middle of whatever that cycle is, they can figure out what way to go to really provide value there.
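David's "measure before you model" advice can be sketched as: pick a metric, then score the simplest possible implementation against it before investing in anything smarter. The example below uses precision at k and a most-popular baseline; both are common choices swapped in here for illustration, and the listening data is hypothetical.

```python
from collections import Counter


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that appear in the relevant set."""
    top = recommended[:k]
    if not top:
        return 0.0
    hits = sum(1 for item in top if item in relevant)
    # Divides by the number of items actually recommended (at most k).
    return hits / len(top)


def most_popular_baseline(history, k):
    """A deliberately simple 'model': recommend the k most-played tracks overall."""
    counts = Counter(track for plays in history.values() for track in plays)
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [track for track, _ in ranked[:k]]


# Hypothetical listening history and one user's held-out listens.
HISTORY = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["a", "b", "d"]}
HELD_OUT = {"b", "e"}

if __name__ == "__main__":
    baseline = most_popular_baseline(HISTORY, k=2)
    print(baseline, precision_at_k(baseline, HELD_OUT, k=2))
```

Once the baseline has a number attached, each later experiment can be stated as a hypothesis about moving that same number.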

BG: Maybe to kind of push on all of that further-- we sort of talked about, gave an example of, a process that we want to improve using machine learning, where there is an existing process and we think it can be better. We have a clearly defined thing and we can say, OK, what is the thing we want? Let's make it better.

What would your recommendation be for, say, a product manager or somebody who's trying to say, I think that-- where they don't even know to ask? So you could get in this like, hey, I'm going to go over to the data science team and be like, what can you guys do? And they're like, what do you want us to do? And, you kind of get in this cycle of, well, I don't know how to think about maybe what features I want to ask for because I don't know what you guys are capable of.

And then you might say, well, we can do lots of things. But without focus, we don't know where to spend our time. So how do you sort of balance that tension?

DM: I've been around that a number of times. There's no silver bullet. I think if you can make a conversation about data and about looking at data, and say, literally, if a product manager is specifying some improvement, I would love for them to type into a spreadsheet or into a file what you would want the output of this thing to look like.

They might have intuition based on domain expertise about features in the machine learning sense that they would like to see experiments around. But more often I'd like that domain expertise to be in the data scientists and the machine learning folks, and instead for the product manager to say, here's how I'd like the behavior of this product to change. And I want it, for these kinds of input, to have this different behavior. And here's a metric by which you could be able to quantify that.

It can be difficult because building metrics that are useful and that are correlated with business value is very difficult, especially if you're in an online setting where often it takes an A/B test to really know for sure. And so there are all kinds of techniques to try and use historic A/B test results for current offline results. But that's like-- we could talk 10 podcasts about that stuff.

I think if you can make it a conversation about how you want to change this data and which way you want the metric to move, then the data scientists can understand some of the motivation from a product perspective, because the product manager might say, OK, for these kinds of users or these kinds of customers or for this kind of need, I think this is how we want it to move. And then the data scientist can say, well, based on this example data, we could go get some data that's like it. We can do some analysis of that. We can point to hypotheses for experiments. We can sort of turn that over. But it's hard.
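The "type into a spreadsheet what you want the output to look like" suggestion maps naturally onto a small evaluation harness: the product manager's rows become expected outputs, and the metric is how often the system agrees with them. The CSV columns, the queries, and the stand-in `toy_system` function below are all hypothetical.

```python
import csv
import io

# Hypothetical rows a product manager might type into a spreadsheet.
DESIRED_CSV = """query,desired_top_result
morning run,upbeat_playlist
focus at work,instrumental_playlist
dinner party,jazz_playlist
"""


def load_desired(csv_text):
    """Parse the spreadsheet export into a {query: desired_top_result} mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["query"]: row["desired_top_result"] for row in reader}


def agreement_rate(system, desired):
    """Fraction of specified queries where the system returns the desired result."""
    if not desired:
        return 0.0
    matches = sum(1 for query, want in desired.items() if system(query) == want)
    return matches / len(desired)


def toy_system(query):
    """Stand-in for the real product behavior being evaluated."""
    return "upbeat_playlist" if "run" in query else "jazz_playlist"


if __name__ == "__main__":
    print(agreement_rate(toy_system, load_desired(DESIRED_CSV)))
```

The rows the system gets wrong (here, "focus at work") are the natural starting point for the error analysis and experiment hypotheses David mentions.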

BG: And somewhat related to that-- and this is a conversation that I feel like we've actually had in the past. I'm curious to sort of see your thoughts, maybe now a year or two later, which is, where does data science fit in an organization?

DM: Ah. That's a great question. So maybe it would be useful for me to distinguish how I use data science versus machine learning. I think that's how it's used at Spotify. But I don't mean to claim that this is a general thing. And it might be maybe even contrary to how it's used at Talla. I don't know.

For us, data science is enabling decisions by people, people like product managers. So it's often product insights. It's about models, analysis of how people make decisions. Machine learning is enabling decisions by products. So, those machine learning engineers are building something that's going to live inside the product and make a decision that no human is going to intervene in before it hits a real live user.

I can answer the question for sort of each of those. Let's start with the machine learning engineers, as I've defined it. So to me, my whole framework for thinking about all this stuff is, where are you in a product life cycle? And also, maybe even as an organization, where are you in your maturity with respect to machine learning?

If you're very early, either in the product lifecycle or as an organization, I think you want to centralize those folks so that-- well, few reasons. One is so that they can collaborate with each other. Another is that your different product initiatives might not have a steady amount of work for those folks where there's really a need. And they may be still sort of exploring what product problem are we trying to solve.

Then later-- sort of in the middle of the product life cycle, I think you want to have more of an integrated, embedded model, so a notion where on different product teams or squads you would have a machine learning engineer or a few who are sort of shoulder-to-shoulder with folks who are delivering their product, and they're really optimizing for understanding the product use case, understanding what the data is, the domain, that sort of thing. They've got that really close relationship with their product manager.

Then as you get farther along in the product lifecycle, you've realized that, hey, we've got something valuable here. Really chasing that additional, not just 20%, but 2% is very valuable. And so we need to go out there and pull in the latest research to squeeze those extra percentages out. We need-- the difference between something published in 2015 and something published in 2018 is really important to us. So we need to be chasing that.

In that case, I recommend subdividing teams, maybe into something like you might call a workstream, where you might have a single squad that owns a product or a feature, and there might be some group that's worrying about delivering the stuff that we need to deliver right now and some group that's worrying about developing the six months, three months down the line version of it.

Then finally, the latest part of the product lifecycle, if you've gotten to the point where you have to do something completely new because you've realized, hey, we've got something. We've got an approach that we've pushed pretty far, but there's this segment of our domain or our users or something that's just not being served. Or we've hit the limit and there really isn't any way to go there. You need a completely new thing. And that's almost where you, in some sense, go back to the first version where you can take your machine learning team and sort of put them off and say, you guys need to step change us on this.

So it's tough. And I think, like I say, the big ones for me are where are you in the product lifecycle, and where are you as an organization in terms of maturity.

BG: Because I can see that if you-- you could clearly see maybe with the case where if you have a mismatch there, that that could be more disruptive than beneficial, that if I'm trying to solve core usability things and I've got someone-- people off doing foundational research, that may not be as beneficial as if I've got-- or conversely, I'm like, I've got a mature product. I'm trying to really do it. And I'm like, yeah, but we're not investing in the thing that's going to take a long amount of time and more thinking and energy, because we still have our process and you're over here, and you're doing that particular role.

So then you say, "Okay, we have this flexible dynamic." But how do you manage that?

DM: Yeah. It's really important for everyone on the team to trust everyone else on the team, that they're all valuable and that everybody is earning their keep. So, I think you've got to talk about it a lot. You have to encourage-- in cases, especially in the cases where there's either you're directly embedded, like we're all one team and the front-end person's sitting next to the machine learning engineer, sitting next to the back-end engineer or whatever. In those cases, there needs to be mutual respect and including a desire.

At Spotify and other places, we call it being T-shaped, where you have someone who maybe they do machine learning, but they're also willing to do back end or data or whatever. And not only are they willing, I often say, you want engineers who love their customers more than their code, and they're willing to do whatever they need to do for their customers. And so they're on this team and what this team needs to do is deliver this thing. And it doesn't have anything to do with machine learning. And so all right, well, I'm Mr. machine learning engineer. I'm going to jump in on that.

You have to realize when you ask someone to do that, it probably means that the person that was setting up the team didn't anticipate the needs because they thought there was going to be a steady stream of ML work and they weren't quite right about that. And so you have to sort of acknowledge that and say, the reason that we're all being asked to do this is because we didn't quite predict it. But that's-- you learn. That's OK. That's life in the real world.

So that trust, and a lot of it comes from the shared values, and that's really important. A lot of times, there can be a difference in perspective. Having the common goals and the common purpose can get through a lot of that.

RM: So we're going to have to wrap up. But one question we like to ask people on here, particularly if they've worked in the AI space is, where do you fall on the Musk versus Zuckerberg debate? AI's going to kill us all and we should be prepping for this or we don't have anything to worry about? What's your take?

DM: So I like-- I think it was Andrew Ng, who I like a lot, who said something like, I worry about AI killing us all the way I worry about overpopulation on Mars, meaning it might happen someday, but it's so far off that we have more immediate problems to deal with. I think a much more practical one is with self-driving cars, what are the truck drivers of the world going to do in 10 years or something? A lot more about unemployment and the effects on the economy. I know people are worrying about that, too.

To me, I think it's almost always the case that there's a little bit of truth in each end of the spectrum. I think there are more pressing problems that AI is going to bring to us as a society we should be thinking about, including this bias issue that I was talking about a little bit ago.

RM: Cool. You've got a lot of presentations out, right, on SlideShare and YouTube? So if people want to Google ‘David Murgatroyd from Spotify’, you could find some really interesting presentations that David's given over the years. And we'll post some of those as well along with this episode. So David, thanks for coming on and we'll see everybody next week.

DM: Thanks a lot. Pleasure.

Subscribe to AI at Work on iTunes or Google Play and share with your network! If you have feedback or questions, we'd love to hear from you at podcast@talla.com or tweet at us @talllainc.

Show Notes:

David Murgatroyd, Spotify, How to Train Your Product Owner: https://www.youtube.com/watch?v=Efv8K8DYRio

David Murgatroyd, Spotify, SlideShare: https://www.slideshare.net/dmurga