Episode 36: Data Science and Machine Learning at Wayfair with Dan Wulin
Host Rob May interviewed Dan Wulin, Head of Data Science and Machine Learning at Wayfair, and got his intel on the different techniques they use and what sets them apart, where Data Science sits within the organization and how they work with Engineering, what he has learned about communicating with non-technical executives around machine learning and data science, demystifying Data Science, and much more.
Rob May, CEO and Co-Founder, Talla
Dan Wulin, Head of Data Science and Machine Learning, Wayfair
Rob May: Hi everyone and welcome to the latest episode of AI At Work. I'm Rob May, the Co-Founder and CEO of Talla. I'm your host today. My guest is Dan Wulin, the Head of Data Science and Machine Learning at Wayfair. Welcome to the show and tell us a little bit about your background and what you did before Wayfair.
Dan Wulin: First, thanks for having me, Rob. Prior to Wayfair, and this is going back almost seven years ago now, studied physics, got my PhD there, decided that I wanted to get more into the business world. So I ended up joining the Boston Consulting Group. Did that for about a year and a half.
I was in Chicago at the time. Wanted to get back to the East Coast where I was originally from and then ended up joining Wayfair. Because very much in my mind I wanted to go to a place that was still in the business world, but had a ton of data to work with. I really fell in love with the culture. You know, had great interactions with the folks that I met during the interview process. I ended up at Wayfair, I think it was the tail end of 2013. I've been there ever since.
I started on the marketing side in a very business facing role. Owning a lot of the marketing budget focused on Google, and product ads, and that type of thing, and then over time gravitated more and more to data science and machine learning.
RM: Wayfair is one of those companies that a lot of people wonder how you compete with a place like Amazon or Walmart.com. It's interesting because I've actually heard the founders talk about that and how some of the things that you do, some of the products that you sell, are actually not good with the traditional keyword search and visuals.
You guys do a lot of stuff that's like lamps and things that are more attuned to visual search and all that. How do you think about some of the differences between Wayfair and maybe an Amazon or Walmart or other e-commerce player? And then how does that translate, are there different machine learning techniques that that drives as a result?
DW: I could start with the business perspective of that, and then narrow that down to what sets Wayfair apart from the data science side. From a business perspective, and you know this is something the co-founders talk a lot about, or you'll see in the investor materials, is the nature of what we're selling is fundamentally different from what core products could look like for some of our competitors.
We're typically selling larger parcel type of things. Things that are very much in the long tail. An example that I hear a lot is a lot of people buy the same kinds of paper towels, very few people are going to have the same chandelier or dining table that you have in your home. So whether it's from a product selection standpoint, or how you ship the big thing and how you do that efficiently, quite a bit different.
From a data science perspective, absolutely visuals are a huge piece of what we're thinking about as a team. Even more broadly than that, a way that I like to think about it is, and this is true for me even, you know a lot of our customers would have a hard time articulating what exactly they're looking to buy when they get to the website. That can be because they're just not familiar with what furniture style or home decor style actually is, how to put that into words.
There's a lot of ambiguity when you think about, you know, I want something that's coastal. Or even if you say I want something that's blue, like what exactly do you mean. So a lot of what we're doing, at least from the storefront side, and partnering with them on the data science team, is figuring out how do we remove some of that friction. And that could be using product imagery, doing smart analytics and data science around product attributes, and that sort of thing.
The other thing that sets us apart, at least as a data science team, so we do a lot of work on the marketing side. We're super ROI focused, so that helps make sure that we're getting the biggest bang for the buck. We're working closely with the marketing team on that. I can go on, but there are a lot of other teams that we partner up with to try to set us apart.
RM: How is the data science org structured then inside Wayfair? Is it one giant group?
Do you guys specialize? When you say you work with marketing, is that like there's four people on your team that do that? Or you guys all switch out? Does marketing put in a request?
DW: We're very much a hybrid organization. So in aggregate, my team, we're just over 100 data scientists or so. Globally across Wayfair we have probably 150 folks that are filling a data scientist type of role.
Central reporting, so that gives us centralized recruiting, training and development, helps keep a consistent bar across the team. We're able to cross train people, give them other opportunities, and so on.
We're decentralized from the perspective that within that group of 110 some odd folks, there are specific groups that are embedded with different business functions. So they're co-located with them. Rather than being a pool of engineering-type resources where they're rolling on and off of projects on a week to week or month to month basis, they're dedicated at the quarterly or even yearly level on particular projects. They're working really closely with business folks that have the subject matter expertise to make sure that the work is relevant and really moving the needle.
RM: How do you work with engineering? Are you using tools like R, doing your stuff and then throwing it across the wall? Or, do you have a bridge organization that specializes in implementing these kinds of things in production level code, or are you expected to do that yourself?
DW: That's a really good question. And the truth is that there are different flavors of how we do that across the org. And the sort of guiding principle is that we want to do whatever we need to get the work out in a high quality state as quickly as we can.
In some cases that does mean that if there is a preexisting team in engineering that maybe there are folks that are thinking about product recommendations, we'll work closely with them to figure out how do we get our algorithms injected into the recommendations. Conversely for sure there also have been cases where we're doing work and there aren't clear teams within engineering to hand that off to. And that's something where we'll either ask people on the team to stretch in that direction, or we'll go and hire the appropriate profile.
From a technical level, people are using everything from R, to Python, to Spark, pretty much any common machine learning technology. There's at least a couple people on the team that are using it. But the core message is, we try to be flexible and there's a ton of variety in what we have to do to do that.
RM: Within the data science team, do you tend to specialize in certain types of techniques, or are most of the people there more generalists?
DW: Similar, there's a mix. If you were to have asked me that question three years ago, I would have said that we're hiring generalists. And I think the reason why that was true at the time was the team was a lot smaller, the problems that we were chasing after I would characterize them as lower hanging fruit. You're able to make a lot of progress by using pretty common off the shelf packages with some modification.
What's happened over time is, not that those things have gone away, but for sure in some areas to be able to get the ROI that we're looking for, we need a higher level of specialization. So examples of that could be people that know natural language processing, or are experts in computer vision, and that sort of thing.
RM: How do you deal with projects that sometimes might come along where you don't know from the beginning how they might perform? One of the big differences that we talked about, and I hear between data science and engineering is, if I say our website has 100,000 visitors a month and needs to scale to a million, you kind of know what you need to do.
Sometimes people come to you and they say, oh it would be great if we could see if we have the data to predict this thing, and you're like, huh. We don't know, we might, the prediction might not be good enough. We might not have the right data or enough data.
Do you time box those things? Or, what's your approach? How long do you work on something before you decide I don't know that we can do that with what we have?
DW: There are probably two angles to it. One is what you're saying around time boxing. The second is, assuming that you do want to do it, how do you think about characterizing the value that you're adding and what are the different flavors of it? So in terms of time boxing, absolutely.
This is true for me as it's true for a ton of data scientists. It's very easy to get sucked into the problem that you're working on. So you want to be very deliberate around keeping yourself honest, and making sure whether it's two weeks or a month into a project, that you're getting, seeing what you need to, to believe that you can be successful.
We try to front load projects with data collection, building proof of concept models, and doing as close to an end to end. And it's going to be something super hacky to be honest. But getting as close to that sort of end to end so we get some level of confidence that it's achievable.
For sure there are projects where you do that, and you realize that hey, we don't have the data, or the techniques aren't going to work, or we would have to hire x many more people to actually do this. What I found is, usually at that few month mark, you know enough to at least be able to pivot to something that will work with a lot more confidence.
Once you're reasonably confident that you're ready to invest in a project over multiple quarters or even longer, we aspire to get as close to revenue and customer level impact as we can to measure what we're doing. We can in a ton of cases, but for sure there are going to be cases where we can't. One example would be what if we create something that's, and we've done this, that's more a decision support for merchandising.
One thing that we've started doing is guessing what are the different attributes of products in our catalog. Imagine you get a chair, we're trying to guess what is this thing's style, what is its material? Then you can imagine, you get all this nice metadata.
It's really hard to quantify on a customer KPI level how much you're getting at. They'll look at things like basically operational efficiency and how much sharper we're getting there.
RM: Where do you think, when you look out seven to 10 years, what are some things that are coming that we can't do yet but that we're going to be able to do? And how is it going to change the experience for your average online shopper? Is it just going to be better recommendations, or are there other very qualitative differences in the online shopping experience that are coming from data science?
DW: Speaking about Wayfair I can at least give you a couple things that I'm personally really excited about. So one problem that we're working on is understanding style. So if you go to Wayfair today, you look at our search bar, we have a little camera, and basically we let you submit photos.
We're trying to find furniture in the catalog that looks similar to that. That kind of gets you, I don't know if it's halfway or a third of the way there.
The real problem that you want to solve, I think, is understanding complementarity. A customer comes in, in reality what they have is a room that they're probably trying to furnish and add more to, and they want stuff that goes well. Whether it's with the wallpaper, or the sofa that they already have, or whatever it is.
We're trying to understand style at that level. What that could mean from a user experience, and there's so many different ways to approach this from a UX standpoint, is we want to have some understanding of things that you've either already browsed or already own, and find other things that go well with them.
Where, if you had just gone to a regular or transactional search page, tried filtering down, putting in the keywords to find what you were looking for, you wouldn't have been able to find it anyway. I mean, ultimately it'll result in recommendations that all of a sudden seem a lot more relevant. They're going to be much more inspirational and surprising in some ways.
The neat part about it is that you can do that in a very frictionless way. So from the user experience you're not asking them to do much more, but they're getting a lot more out of it. You know, that's probably one thing.
Kind of generalizing away from that, so why are we able to do that? We're able to do that because of technology around GPUs, and deep learning, and different techniques that let us use things like imagery and text information. We're able to use that at scale, in real time as people are browsing, to create really differentiated models and experiences.
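The photo-based search described in this answer is commonly built as a nearest-neighbor lookup over image embeddings. What follows is a minimal sketch of that general idea, not Wayfair's implementation: the three-number embeddings, the product names, and the `most_similar` helper are all hypothetical, and in a real system the vectors would come from a deep learning model run over product imagery.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_embedding, catalog_embeddings, top_k=2):
    """Return the ids of the top_k catalog items closest to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), product_id)
        for product_id, embedding in catalog_embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [product_id for _, product_id in scored[:top_k]]


# Hypothetical embeddings; a real model would produce hundreds of dimensions.
catalog = {
    "blue_velvet_sofa": [0.9, 0.1, 0.0],
    "navy_armchair": [0.8, 0.2, 0.1],
    "oak_dining_table": [0.0, 0.1, 0.9],
}
customer_photo = [1.0, 0.0, 0.0]  # embedding of the submitted photo

results = most_similar(customer_photo, catalog, top_k=2)
print(results)
```

A lookup like this only finds items that look alike. The complementarity problem raised in the same answer would need a model trained on which items go well together, which this sketch does not attempt.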
RM: What have you learned about interacting with non-technical executives around machine learning and data science? Because this is actually a super technical field, much more technical than some of the last waves of technology.
Cloud was easy, right? It was like, oh, it used to live here, now it lives there, right? Are there any lessons learned or any recurring themes or problems that come up from trying to communicate some of this to non-technical people?
DW: Well the first thing I would say contextually is the data science team at Wayfair is really fortunate because we do have a very data-driven culture in general that permeates all the business functions. I've never had to fight hard to convince anyone that data science is the right thing to be doing for any given problem that we were interested in.
If I had to distill it down to a few things, I would say the first thing that I always try to do is demystify what data science is. Because on the one hand, you can say, well it's this complicated math slash statistics and you need a certain kind of background to be able to do it, and you need to be able to code, and so on. When in reality, the truth of it is, what a data science algorithm is doing is not that different from what you would do in an Excel spreadsheet if you knew some statistics and had a huge amount of time.
It's doing it in an automated way at scale. I think taking away some of the perceived complexity of it so people feel more comfortable approaching it. You know, the other really important thing is working backwards from what the business is looking to achieve.
One approach that a data science team could take is start from a cool technology and then try to seek out use cases for it. At Wayfair, for a lot of reasons, we try to work backwards from what are the most critical challenges that the business is facing, or the most critical and innovative customer experiences that we want to create. Then we figure out how do we get data science to power that.
By starting from that, like the end result, people are going to understand that very naturally. It's going to help refine what we're doing from a data science side from the get go, and just get people a lot more comfortable. And probably the last bit is doing our best to couch everything in KPIs, or a mix of KPIs, as well as just qualitative output.
That as you're building this thing, whether it's something that's super simple from a data science perspective or really complex, that people are building an intuition around what it's actually doing and building confidence in it as well.
RM: You guys have been doing this for a while and maybe ahead of the curve compared to a lot of companies. When you think about data science and machine learning strategically, do you think there are advantages to being early?
When you look out and you think, okay, everybody is going to start to see the benefits of this, every Ecom company is going to do it, do you think that over time other people will catch up to us, or do you believe that by virtue of building the organization this way, some of the tools you have, what you're learning, all your data processes, that you'll maintain a lead because of that?
DW: I'm a firm believer that data science done right is a very virtuous cycle. And this goes far outside anything that I or my team have done at Wayfair. If you go back probably 10 years ago, Wayfair was investing extremely heavily in business intelligence and just getting the data itself in a very clean and accessible spot.
Then, what that later enabled, whether it's analysts or data scientists, is people can do really high quality, accurate analysis without a huge amount of ramp up or cost. And then from the data science side, for sure at the start, things tend to start as isolated or siloed problems that you're trying to solve.
What we're seeing over time is that there's a huge amount of interconnectivity between what we're doing. I mentioned product recommendations on the one hand. I had also mentioned briefly some of the work we're doing on merchandising where we're pulling in things like product attributes. Those two things feed off each other now.
We're able to characterize what are the products that you've looked at, what are their attributes, and then use that to give you better recommendations. And we're also using that information to make sure that the catalog itself is in a really clean, coherent spot for our suppliers.
That's just one example, but these sorts of connections between the different projects are growing over time. It's something that you can't go from zero to one overnight, you have to build up to it.
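One way to picture how product attributes can feed recommendations is a simple attribute-overlap score. The sketch below assumes a toy catalog with made-up product ids and attribute strings, and the `recommend` helper is purely illustrative. It is not the method used at Wayfair, only one instance of the general idea of profiling what a shopper has viewed and ranking other items against that profile.

```python
from collections import Counter


def attribute_profile(viewed_ids, catalog):
    """Count how often each attribute appears among the viewed products."""
    profile = Counter()
    for product_id in viewed_ids:
        profile.update(catalog[product_id])
    return profile


def recommend(viewed_ids, catalog, top_k=2):
    """Rank unviewed products by how strongly they match the profile."""
    profile = attribute_profile(viewed_ids, catalog)
    viewed = set(viewed_ids)
    scored = []
    for product_id, attributes in catalog.items():
        if product_id in viewed:
            continue
        # Counter returns 0 for attributes the shopper has never seen.
        score = sum(profile[attribute] for attribute in attributes)
        scored.append((score, product_id))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [product_id for score, product_id in scored[:top_k] if score > 0]


# Hypothetical catalog: each product maps to a set of predicted attributes.
catalog = {
    "chair_a": {"style:coastal", "material:rattan"},
    "table_b": {"style:coastal", "material:oak"},
    "lamp_c": {"style:industrial", "material:steel"},
    "sofa_d": {"style:coastal", "material:rattan"},
    "desk_e": {"style:industrial", "material:oak"},
}

recommendations = recommend(["chair_a", "table_b"], catalog)
print(recommendations)
```

The same attribute sets could also drive catalog clean-up, which is the kind of interconnection between projects described above: one set of metadata, two consumers.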
RM: Do you have an interest personally beyond any of the work that you do at Wayfair? Do you have a broader interest in AI, are there any new techniques or things that you follow just because you're interested in them? Are you worried about killer robots, ending of the world?
DW: To be honest, I think I take a very pragmatic perspective on AI. So for me, I view it as a-- it's not a strategy in and of itself. It's a real-- it can be a very powerful tool applied well.
I get particularly excited when I see AI that's getting folded into how businesses actually operate and it's able to influence the decisions they're making or the customer experience. That's exactly what we've been doing our best to do on the Wayfair data science team.
I mean, in general, to be honest, I probably learn as much as I do of what's going on in the industry just by chatting with folks on the team and externally to Wayfair, so I don't have any particularly favorite blogs or anything like that. And really, the truth is, I do gain the most out of those face to face chats.
I try to do that as much as possible. You know, when I find people that are into AI, machine learning, I just love hearing what they've been up to and bouncing ideas around.
RM: Do you have any core human in the loop workflows as part of the stuff that you do where you have ongoing labeling of data sets? And do you do those internally, or do you use a third party service?
DW: Number one, technically all the labeling we're doing is internal to the team. We do a mix of literally in-house like in Boston, as well as outsourcing, or using some offshore resources for it, you know, cases where we do it. One recent one is style-based recommendations, I was talking about that, so it'll literally be that we'll take an image of a room, and then have multiple people who've been trained in what the different styles are label it. So we get a sense of, for a given image, what is its distribution over styles. Because, oftentimes people aren't going to agree on one particular style, and we're able to take that information, and then use that to get-- make the model better.
Very often we'll train up models and then send the output to teams that can then validate it and then pull in that information. For sure that's a core piece of the workflow. It tends to be on things that are more style-oriented or more qualitative.
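The "distribution over styles" from multiple labelers can be sketched as turning annotator votes into soft labels. The example below is an assumption-laden illustration: the style names, the four hypothetical annotators, and the choice of cross-entropy as the training signal are not from the interview, which only says the label distribution is used to make the model better.

```python
import math
from collections import Counter


def style_distribution(labels):
    """Convert a list of annotator votes into a probability per style."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {style: n / total for style, n in counts.items()}


def soft_label_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy of a model's predicted distribution against soft labels.

    Lower is better; eps guards against log(0) for styles the model
    assigned zero probability.
    """
    return -sum(
        weight * math.log(predicted.get(style, 0.0) + eps)
        for style, weight in target.items()
    )


# Four hypothetical annotators label the same room photo.
room_votes = ["coastal", "coastal", "modern", "coastal"]
target = style_distribution(room_votes)

# Two hypothetical model outputs for that photo.
close_guess = {"coastal": 0.7, "modern": 0.3}
poor_guess = {"coastal": 0.1, "modern": 0.9}

print(target)
print(soft_label_cross_entropy(target, close_guess))
print(soft_label_cross_entropy(target, poor_guess))
```

Keeping the full distribution, rather than collapsing to a majority vote, preserves the disagreement between labelers, so a room that is mostly coastal with a modern touch is not treated the same as one everybody calls coastal.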
RM: Awesome. Well Dan, thanks for being on today. For those of you listening, if you have guests you'd like to see, or questions you'd like us to ask, please send those to podcast at Talla dot com. And with that, we'll see you on the next episode.