Episode 21: Computer Vision with Matt Scott

In this episode of AI at Work, Matt Scott, Co-Founder and CTO of Malong Technologies, calls in from China to talk with Rob May and guest co-host Byron Galbraith. Tune in to learn all about Malong Technologies, a computer vision AI company founded in 2014, and hear Matt's advice to business leaders.

Subscribe to AI at Work on iTunes or Google Play


Rob May, CEO and Co-Founder, Talla


Matt Scott, Co-Founder and CTO of Malong Technologies


Byron Galbraith, Chief Data Scientist and Co-Founder, Talla


 


 Episode Transcription 

Rob May: Hello, everybody. Welcome to the latest edition of AI at Work. I'm your host, Rob May, CEO and Co-Founder of Talla. Today, my Co-Founder and Chief Data Scientist at Talla, Byron Galbraith, is joining me as well.

Our guest today is Matt Scott. Matt is the CTO and Co-Founder at Malong Technologies. Matt, welcome to the show. Tell us a little bit about your background and about what you guys are working on.

Matt Scott: Absolutely. Thank you so much for the opportunity. I am the CTO and Co-Founder of Malong Technologies. We are a computer vision AI company founded in 2014. I founded it after about 10 years at Microsoft, most of that time at Microsoft Research, specifically at Microsoft Research Asia in Beijing, where a lot of the deep learning and machine learning technologies of recent years have emerged.

I started the company in China. I am not from China originally; I'm from New York, I'm an American. In China, there are some interesting opportunities in AI, and that is where we do our business. We focus on retail and other areas where product recognition can come into play. That's just a bit of the high level. I can go deeper or stay broad, based on your questions.

RM: Tell us a little bit about some of the most popular use cases that people use product AI for.

MS: Product AI is actually a core technology for recognizing products without barcodes in videos or images. This technology can be used to power smart retail, or what in some places is called new retail, which brings this type of technology into the offline shopping experience to enable unmanned shopping or to help prevent theft by double-checking the checkout experience.

We also work on various things outside the store as well. We build smart cabinets, which are these AI refrigerators, with a partner in China. You can just open the door, take stuff, close it, and walk away. You pay for it on WeChat.

We also apply some of this product recognition technology to industrial use cases, such as detecting various attributes of products, like quality. We work in defect detection as well, for some industrial quality cases.

Byron Galbraith: Matt, this is Byron. There are a few different things I'm curious about, but maybe first we'll dig into the core of what you do, which is a machine vision-based product.

I'm actually curious, given your background, what has your experience been watching the evolution of machine vision? I'm assuming you've worked on it for quite some time. The transition from Viola-Jones and optical flow methods to, basically, convnets everywhere, how have you seen that?

MS: Totally. I've been working on computer vision for over 15 years. I started working on the traditional techniques. Well, at the time, they were the modern techniques.

The difference between the traditional techniques and the techniques of today is mostly related to features. Features are, essentially, what the computer looks at to describe an image so that you can apply additional machine learning techniques to it.

Originally, in traditional machine learning or traditional computer vision, we would handcraft our features. We would, for example, look for things in the image and build if/else rules to pick out something interesting or differentiating. Then we would say, "as a human being, I can interpret this image and notice something important about it that we can represent as a number." Features are just represented as numbers, or as a feature vector, which is a series of numbers that can be projected into some space that you can then apply other algorithms on top of.

Normally, we would just come up with these features and say, something important about a product may be its color or its shape, or that edge or this edge. Then we would formulate that into our own idea of what a feature vector should look like. That worked, but it wasn't so robust.
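To make the handcrafted-features idea concrete, here is a minimal sketch, not Malong's actual pipeline, of turning an image into a small, hand-chosen feature vector (a coarse color histogram plus an edge-density number) using OpenCV. A classic classifier would then operate on that vector.

```python
# A minimal sketch of a handcrafted feature vector (illustrative only):
# describe an image by hand-chosen numbers such as a coarse color histogram
# and an edge-density measure.
import cv2
import numpy as np

def handcrafted_features(image_path: str) -> np.ndarray:
    img = cv2.imread(image_path)              # BGR image as a NumPy array
    if img is None:
        raise FileNotFoundError(image_path)

    # Color: an 8-bin histogram per channel, normalized to sum to 1.
    hist = [cv2.calcHist([img], [c], None, [8], [0, 256]).flatten() for c in range(3)]
    color = np.concatenate(hist)
    color /= color.sum()

    # Shape: fraction of pixels that lie on an edge (Canny edge detector).
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    edge_density = np.array([edges.mean() / 255.0])

    # The final "feature vector" is just these numbers stacked together;
    # a classic classifier (SVM, k-NN, ...) would then operate on it.
    return np.concatenate([color, edge_density])
```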

The revolution around 2012 was to just say, “Hey, let's not handcraft these features. Let's use a machine to learn the features. Then, at the same time, let's also build models on top of those features. Build end-to-end networks where we're extracting features and learning from those features.”

The technique is totally different now. We don't program as much in terms of the actual execution or inference of images. We let the machine do it. Essentially, instead of writing code to operate on images, we create a machine that writes its own code. There's a level of indirection.
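By contrast, in the end-to-end approach the feature extractor and the classifier are learned together from data. A minimal PyTorch sketch of that idea, with arbitrary illustrative layer sizes rather than any particular production architecture, might look like this:

```python
# A minimal end-to-end sketch (illustrative layer sizes only): the convolutional
# layers learn the features and the linear head is trained jointly on top of
# them, replacing hand-built rules.
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(          # learned feature extractor
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.features(x).flatten(1)     # the machine's own "feature vector"
        return self.classifier(feats)

# Training updates both parts at once: features and classifier are learned together.
model = TinyConvNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 64, 64)              # stand-in batch of images
labels = torch.randint(0, 10, (8,))             # stand-in labels
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
```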

The challenge is how to produce a great model. We've spent a lot of time working out the right architecture to approach this. The techniques behind this indirection, what we now call deep learning, are revolutionary in their performance. In years past, AI has gone through hot and cold periods, winter periods, where there's a lot of hype. There's still hype, of course. But there's something very real here, which is the performance of these models on benchmarks.

We were able to see, before 2012, when we were using traditional techniques, and after 2012, when we were using deep learning techniques, a massive difference in the benchmarks. We've essentially approached human-level performance on a number of computer vision benchmarks, and in some cases even exceeded it, reaching superhuman-level performance. It's a totally different approach that has had significant results in academia as well as industry. What we do at our company is strike a balance between basic research in computer vision and the application side, staying at the cutting edge of both.

BG: Fantastic. I'm glad you mentioned benchmarks, because I think that's a good segue. A lot of academic work, for instance the ImageNet benchmark, which is very standard now in computer vision, is not necessarily representative of the real-world computer vision challenges a lot of companies might be interested in solving. For instance, if I'm collecting data from a phone, I have to deal with things like how the user holds it, the lighting conditions, what they're pointing it at. Can you talk a bit about how you bridge the gap between "we have a great model that works really well on a nice, clean, curated benchmark data set" and "I have to deal with live, user-generated data"?

MS: That's actually a tremendous problem in the field. It can look like a mirage: there is a great database out there that algorithms work really well on, but it turns out it's not practical at all. Just for example, building that data set, which works well for that particular vertical, is not easy to do for other verticals. Very often, when we talk to customers, they say, let's build an ImageNet for retail, or let's build an ImageNet for x, y, or z. It turns out to be extremely difficult to do.

Let's look at some of the numbers. The ImageNet data set is around 10-plus million images. It took two years and 50,000 people to label, a tremendous amount of effort to curate and balance that data set. It's not just getting the labels right. It's even the inter-category distribution, balancing it so you don't bias towards one category or the other.

It's, in fact, incredibly difficult. You can almost think of it as being as difficult or as complex as the previous generation, where all the complexity was in the programming. Now all the complexity is in the data management. That doesn't really scale at all. It's just a good place to benchmark algorithms.

To reach the practical stage of applying deep learning and landing it in industry, we had to come up with something a bit different. At our company, we've spent several years working on a different approach. The approach used today on those large data sets, like ImageNet, is largely supervised learning, where you assume the data is essentially perfect, noise-free. In the real world, the data sets are very noisy.

We came up with an approach using deep learning with a concept called weakly supervised learning. Weakly supervised learning is the ability for the machine to learn from noisy data in such a way that it's as performant as learning from clean data, using data naturally, as it is: chaotic, random, and sometimes wrong or imbalanced. There's a whole set of techniques in this field of learning strategy.

Normally, when people are pursuing deep learning, they're going to be thinking about their data set. They're going to be thinking about their neural net architecture, even their loss function. But rarely do people think about their training strategy. That's where we came in with weakly supervised learning, a very large step forward in this field.

We came up with a technique called CurriculumNet, which we just published at ECCV, the European Conference on Computer Vision. We applied this technique to a very large benchmark from Google called WebVision, which is, essentially, a successor of the ImageNet competition. We can say it's a very big step forward because, looking at the benchmark, we won first place in this competition out of 100-plus competitors.

The difference between first and second place was quite a wide margin, a relative error rate of nearly 50%. It's a very big step forward. We're proud to contribute some of these techniques back to the scientific community, because we're hoping not just to improve our products but to help the industry. We don't want the industry to get confused into thinking purely supervised learning is the right way, because that may bring about another AI winter due to expectations not being set properly.
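The CurriculumNet paper and the released code are the authoritative reference. Purely as a rough illustration of the curriculum idea, and assuming features have already been extracted by some pretrained network, one could rank each example by how typical it looks for its assigned (noisy) label and then schedule training from the cleanest-looking subset to the noisiest:

```python
# Simplified sketch of a curriculum over noisy labels (not the released
# CurriculumNet code): rank each example by how typical it looks for its
# assigned class in feature space, then feed training subsets from the
# cleanest-looking to the noisiest.
import numpy as np

def curriculum_subsets(features: np.ndarray, labels: np.ndarray, n_stages: int = 3):
    """features: (N, D) vectors from any pretrained network; labels: (N,) noisy labels."""
    scores = np.empty(len(labels))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = features[idx].mean(axis=0)
        # Closer to the class centroid = denser region = more likely a clean label.
        scores[idx] = -np.linalg.norm(features[idx] - centroid, axis=1)

    order = np.argsort(-scores)             # cleanest-looking examples first
    return np.array_split(order, n_stages)  # stage 0 = cleanest, last = noisiest

# Training would then proceed stage by stage, e.g. train on subsets[0] first,
# then continue training with subsets[0] + subsets[1], and so on.
```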

We really want deep learning to work and land for everyone. We do encourage other companies to start taking up the weakly supervised learning approach. We did not just release a peer-reviewed paper. We also released some code, and even some models as well, to help people get started.

RM: For the stuff that you guys are working on, what's the most interesting problem, one that you wouldn't solve at your company yourselves, but that somebody else could solve, that would make your business and your product better? Is there anything out there where you go, oh, man, if somebody would fix this, it would help our pipeline or whatever we're doing?

MS: Where we can get help from others is actually at the algorithm level. We like to think of ourselves as a science-based company, where we really go down to the nuts and bolts of how deep learning works at its most fundamental level. Where we can get help from others is on the research side, where we publish papers at CVPR or ECCV.

At the end of a paper, we typically ask the reader to help explore the areas we didn't have time to explore ourselves, or open questions we don't have answers for. If other companies that have a research department, or universities, can actually help us in our research, that can not just help our products, because we base our products on these algorithms, but I think it can also help everyone around the world who works on deep learning.

BG: There's an interesting question here that I think comes up frequently, which is, isn't your technology your competitive advantage? Why put it out there? Why ask people to help you do this? Is the AI, the technology, the thing that you think is your differentiator? Or is it really just a component that, without, say, the rest of the entire process of running a business, doesn't matter as much?

MS: That's a great question. AI, I believe, is different from past technical revolutions, where if you had the technology, you just wanted to keep it in the basement, lock it up in the vault, and guard it to death. In AI, instead of a technology you want to keep and hoard, it's actually a capability. The capability is to always be at the cutting edge, because the technologies are continuously moving forward.

If you're using AI technology that's just one year old, it could already be out of date. We need to be at the cutting edge. Putting our stuff out there and getting the community involved can help us, and then help everyone else. We believe our differentiator is that we are continuously at the forefront, at the state of the art. That's where we have our differentiation.

Of course, in a business, there's much more to it than that. There's our customer base, who trusts us. We've got a lot of work going with our partners and customers, and this technology will help their products. Again, our company is an enterprise solution provider. We help other companies make their products better. Our customers trust us to help them be at the cutting edge. That's what we're all about.

BG: Great. And actually, you guys are based in China. What is it like being an entrepreneur in China and selling to Chinese companies versus, say, your experience when you worked at Microsoft, or from being in the United States?

MS: It's another great question. It turns out that being in China doesn't present any issues or differences that are very noticeable for doing the research, because we participate in the international research community. We're often participating in conferences, meeting up with researchers, and working with people across the world on this stuff. So it doesn't really matter where we are. I'm flying around all the time, and our team members are on calls with people all over the world. So it's not some walled-off China AI type of deal. It's more of an international effort when you actually get down to the research.

Most of our company's effort is actually on the core research and technology. For the actual business part, our customers are in China, so that's different, of course. We do have customers outside of China as well. Besides the time zones, there's obviously a cultural difference living in China.

I don't see any major issues or any major differences in doing my work. I could do my work in China. I could do it outside of China. I wouldn't see that much difference, other than the resources.

RM: Some of our listeners here are interested in machine vision, or they're interested in AI at their companies. There is a lot of bad information out there. What advice do you have for people that are just getting started? Where do they look inside their companies? What kinds of things should they be reading or thinking about or questions they should ask? Where do they start?

MS: That's definitely true. AI is now on everyone's mind; it's almost, in some way, jumped the shark. Every advertisement I see here is just AI this, AI that. It can be very confusing.

What exactly is AI? AI has now become a buzzword, just like "big data," where it sort of lost its meaning somewhere along the way. I would say the way to cut through all of that is to go to the primary sources.

If someone is really interested in actually understanding the algorithms and making this stuff work, I would go to the primary sources, the actual papers published in the field, the key ones that have changed the game. They're not too difficult to find.

If someone just wants to use the technology, not really fully understand it, I think it would probably be better to work with experts in the field. One of the issues with deep learning today is that there are a lot of myths. Expectations are not in sync. People will expect this stuff to work 100% of the time, but deep learning, and machine learning in general, is very rarely 100%-accuracy type of work. Especially if you don't know the ins and outs of it, you're not going to get very high performance, and that can dissuade people and make them think this stuff doesn't work very well at all.

I think getting the expectations set properly, and also figuring out how to marry the technology to a particular use case, is going to be very important. I think that people who are working in companies that are thinking about AI need to have a basic understanding of the technology and then find that right use case. Don't have the hammer looking for the nail type of deal. If you can get that use case, try to understand a bit of the technology. Work with experts to really land it. Otherwise, it could waste time and energy.

BG: If I'm curious about learning AI and I look around, I've heard big companies are pushing their various frameworks. Do you have a recommendation for people? Should they push those aside and focus on more simple, fundamental things, or should they just jump right in and do TensorFlow tutorials, for instance?

MS: I think it's about what the goal is. If the goal is to not get so deep, it's okay to just jump into some tutorials. But there's going to be some ceiling to your progress if you don't understand some of the fundamentals, and it's not even the fundamentals of deep learning, it's the fundamentals of machine learning.

Best practices with data, for example, are a universal machine learning concept. There are a lot of fundamentals that are very important for being successful in deep learning, if they're pursued. For that, there are lots of online courses and materials that can be found.

I would suggest going down that path of some basic education in the fundamentals before just jumping into the tutorials, unless the goal is just to stay extremely high-level. Then it's fine. If you want to make a "Hello, World" and see whether an image is a hot dog or not a hot dog, you can easily do something like that.
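For a sense of what that tutorial-level "Hello, World" looks like, here is a hedged sketch that fine-tunes a pretrained network on two folders of hot dog and not-hot-dog images. The directory layout and hyperparameters are made up for illustration; it is a toy starting point, not a production system.

```python
# A "Hello, World" sketch of the hot dog / not hot dog idea: fine-tune a
# pretrained network on two folders of images. Paths and hyperparameters are
# hypothetical; this only illustrates a tutorial-level start.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Expects data/hotdog/... and data/not_hotdog/... (hypothetical paths).
dataset = datasets.ImageFolder("data", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)   # two classes: hot dog / not hot dog

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:                    # one pass is enough for a demo
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```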

I think that's going to get old pretty quickly, because people are going to start to think, well, this stuff can really work, let me change my whole business to adopt it. Actually, there will be a lot of misunderstandings going down that path. Again, I think coming back to some of the fundamentals would be my recommendation.

RM: One of the questions we always wrap up on with a lot of our guests is this idea that a lot of people are worried about AI, not just the economics, but actually killer AI and all that. There was this somewhat public debate last year between Elon Musk and Mark Zuckerberg about how much we should be worried about, thinking about, and preparing for that. We always like to ask people in the field, where do you fall on that spectrum between "there's nothing to worry about" and "oh my god, AI might kill us, we should be putting a lot of time and effort into figuring this out"?

MS: I'm definitely in the camp of it not being a killer technology in the literal sense. I believe it will be ANI, Artificial Narrow Intelligence, for the foreseeable future. AGI, Artificial General Intelligence, that's the stuff of Terminator and science fiction. I don't believe we're anywhere close to that. I can give you some reasoning for that.

In computer science, we're good at emulating things we find in biology or some natural phenomenon. Once we fully understand that natural phenomenon, we're good at emulating it in code or models. For example, even deep learning itself, the idea of the layers, was biologically inspired, coming out of experiments in biology a long time ago.

The reason I don't think we're close to general intelligence is that we don't know how consciousness works, and AGI is, essentially, giving consciousness to a machine. How can we emulate consciousness in code if we don't understand it ourselves? In neuroscience, the study of consciousness hasn't made much progress. In the biological sense, we have very little idea of how consciousness works. How can we simulate that in software?

My view is that when we get some understanding of consciousness, once we understand the mechanics of it, then it will come pretty soon to software. But since, over the last 100 years or so, neuroscience hasn't made much progress on it, I don't anticipate that coming in another 100-plus years.

RM: Matt Scott, thanks for being on today. If people want to find out more about your company, what's the best URL to use?

MS: Oh, thank you, malong.com

RM: Thank you guys for listening. If you have guests you'd like to see or questions you'd like us to ask, please send those to podcasts@talla.com. We'll see you next week.

Subscribe to AI at Work on iTunes or Google Play and share with your network! If you have feedback or questions, we'd love to hear from you at podcast@talla.com or tweet at us @talllainc.