Rob May interviewed Ram Katamaraja, CEO and Founder of Colaberry, and Pawan Nandakishore, Colaberry's Data Scientist. Colaberry provides career-oriented training programs that reskill and upskill people who are losing jobs, helping them move into the data space. Rob, Ram, and Pawan talked about the unique story behind Colaberry and how they are applying data science today.
Rob May, CEO and Co-Founder, Talla
Ram Katamaraja, CEO and Founder, Colaberry
Pawan Nandakishore, Data Scientist, Colaberry
Rob May: Hi, everyone, welcome to the latest edition of AI at Work. I'm Rob May, co-founder and CEO of Talla. I'm here with Ram Katamaraja and Pawan Nandakishore. I may have butchered your last name. Did I get it right?
Pawan Nandakishore: Haha, you got it right.
RM: They are at Colaberry, which has a very, very interesting business. They got started by training veterans, and they've been able to build an interesting data science play around that, which I'm going to let them tell you about. Welcome to the show. Tell us a little bit about your background and what Colaberry does, and then tell us how you got involved with veterans and how you use data science. Give us the high-level overview.
Ram Katamaraja: Thanks, Rob, for having us. We started Colaberry in 2012. The objective was to give back to the community and the country by hiring a veteran into my team as a data analyst, but when I started looking, I could not find people with the skills. Fortunately, I ended up talking to friends, and we put together a team of five and started training them. I ended up hiring three of them, and the other two went and got jobs elsewhere.
In a way, I was done giving back to the country, because all I wanted to do was hire one person. But we also learned that there are a lot of people out there who need what we are doing. Basically, they need an opportunity to learn skills that are relevant to the industry, and we picked data as the key skill that we could teach. Unlike coding, which is like artwork and requires a lot of abstract thinking, data is very tangible. Literally, you can put data in front of anybody, and all they have to do is play around with it, like putting LEGO blocks together. We found that people were learning much faster when working with data.
We focused on that, and it grew from there through word of mouth. At this point, we are at a juncture where the world is being disrupted by AI, which is driven by data, and there are a lot of opportunities in the data space. Our work sits right in between: we have the opportunity to reskill and upskill people who are losing jobs, people who need to move into new jobs in the data space.
RM: What kinds of data skills are you teaching them? Is this more like data cleansing? Is it basic statistical analysis? Is it data visualization? There's a whole bunch of things you can do with data. Is it a pretty standardized curriculum that you teach everybody, or do people specialize? Tell us a little bit about the specifics there.
PN: Ultimately, for any organization to be able to use AI, it needs basic data skills. That means being able to acquire data and to label data. As an organization, you ultimately need to make data-driven decisions, and for that you need to be able to plot data. So it includes everything: the ability to collect data, to visualize it, and to tell stories using it.
Once you have that, you can bring in some machine learning, apply the data to those systems, and start making decisions from them. So that's the structure: first you teach basic data skills so that people can start making those decisions, and then you create systems where they can learn the skills to apply this to AI.
RK: One of our observations is that in a typical organization, front-line workers with data skills can be a great addition to the data analysis teams, because they bring direct domain knowledge. People with some programming skills can transition well into data engineering jobs. People with a statistics background can transition well into data science jobs. Our goal is to upskill them and help them move into higher-level jobs.
RM: Are you still focused only on veterans, or have you started to expand? What kind of background does somebody need to be successful here?
RK: We started with veterans, but right now we focus on everybody. It doesn't matter. I want to take a step back: when we talk about data and data science, there are a whole bunch of roles out there. They range from data acquisition, to data cleansing, to reporting on data and labeling data, to developing machine learning and data science algorithms, and ultimately to developing new AI products. When we work with front-line workers, we are fundamentally giving them skills to do their jobs much more efficiently, because they become data aware, and then giving them skills to transition into high-value jobs.
To give you an example, if a trucking company is automating its whole fleet, having its truckers act as the decision makers while the autonomous trucks are on the road is probably a good decision for the company. That's our approach: providing data skills so that people stay relevant in the new data-driven economy that is evolving. People don't really need any upfront skills, but one thing that is extremely helpful is domain knowledge. Once you have domain knowledge, you can be really good at working with data.
PN: Right, because ultimately, when you're trying to build any kind of data product, you need to understand what you're actually doing. In the trucking example, a truck driver knows how to drive a truck. He or she knows the challenges of that task, so a data science team can learn a lot just by talking to them. And if drivers have the right skills, they can even help collect data and help make the right choices, whether in algorithms, in how the data is collected, or in what kind of data is collected. Ultimately, as a business, that's what is valued.
RK: Yes. Our fundamental bet is this: 20-plus years ago, if you needed a job, you needed to know how to operate a computer. In a future that's being disrupted by AI, you need data skills. You need to understand how to work with data. Our goal is to create data literacy in individuals and across organizations, so that we can prepare the workforce for the future of work.
RM: Very cool. Have you had the chance to follow up with some of your people now? Do you have any statistics on how successful they've been? Once they've been deployed at some of these jobs, do they tend to stick around? Do they tend to be happy in the new roles? Do you have any numbers there?
RK: We have wild numbers. We have people who were working as truck drivers now working as data architects. We have veterans who were working as cable box operators now becoming IT managers. We have at-risk inner-city youth working as data analysts or data engineers at Silicon Valley companies of the Facebook and LinkedIn type.
The question you asked is a very important one, because one of our most significant learnings is that teaching hard skills is important, but teaching soft skills is a lot more important for people to succeed.
When somebody comes into our programs, the number one thing we try to tackle is the fear of technology. A lot of the time, these people don't even believe they can work with technology, so we have to start there. Then we need to give them problem-solving skills, and we need to build habit loops, which we do using gamification methods. We need to help them develop collaboration and communication skills. For that we use video technologies, audio technologies, and NLP, giving them AI-driven feedback so they can develop these skills. All of these skills allow them not just to transition into a job, but also to succeed and thrive in it.
Once they transition, the first thing they understand is that they can do this. Next, they make good money, because they're moving into career-oriented jobs instead of gigs. They want to succeed, and they're trying. We have a model where we don't just train people; we charge them after they get a job. Our business model depends on our students succeeding over a period of 12 months. Once it's been 12 months, there's no stopping them.
RM: Interesting. You mentioned soft skills, and one thing we hear a lot from people who came up in software but haven't typically had to work with a data science team is that data science is very different. The example I sometimes give on the show is this: if you're getting 10,000 visitors a month to your website and you need to scale that to 10 million, and you tell your engineering team, they roughly know what they need to do, conceptually. But if you have a machine learning model that's 85% accurate and you think it needs to be 95% accurate, there are things you can try, but you don't always know how long it's going to take, or whether you can get there at all with the data you have and the techniques you're using. You may have to use an entirely different technique than the one that got you to 85%.
Data science work can be a little more research-y. Then, of course, even on the data science side, there's a heavy engineering component: data engineering, cleansing, and all that. What have you seen, and what advice would you give to people who are new to working with data scientists? What do they need to know, or what techniques could they consider, to make data teams blend well with the rest of the departments?
PN: I think one aspect is definitely domain knowledge. If you don't know what kind of product you're building, even if you have data science skills, you won't be able to succeed, because you don't know what the product is being made for. For example, if you come into the energy sector as a data scientist, you need to understand how companies there measure success in building and releasing a product.
This is where, as a data scientist, you also have to be really good at understanding the business side of things. You can't just be someone who says, "Look, I know all these fancy algorithms." That gets you nowhere, because most of the time organizations want someone who understands data and also understands the business value. You need to be able to collect data within your organization and say, "Look, these are the things my organization values in the data I've collected, and on that basis, I'm going to make these decisions." For someone coming in and working with a data science team, that understanding is crucial, more than anything else.
RK: I can talk about how we apply data science. The way we think about it, we want to automate ourselves. That's the way to scale. We take a use case and ask how we can automate it. For example, take soft skills: how do you automate helping people interview well and communicate well? The standard way to help people prepare for interviews is mock interviews, a lot of role play, and so on. That takes hours and hours of time. If we could figure out how to apply data science to that and make it scalable, that would work.
We did something really simple. As people go through our programs, we have them record videos on a consistent basis about what they've learned. We took those recordings, started analyzing people's body posture, and gave them feedback like "Hey, you need to adjust your posture." That's the feedback you would give in a mock interview, but the AI is doing it. It's a need-based application. Then we analyze their vocabulary, see what words they're using, score that vocabulary, and give them feedback.
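The interview does not say how the vocabulary is scored. As a minimal sketch, assuming a simple scheme, one could measure lexical richness as the type-token ratio (distinct words divided by total words) and check how many expected domain terms a transcript uses. The `score_vocabulary` function and the `DOMAIN_TERMS` set are hypothetical illustrations, not Colaberry's actual method.

```python
import re

# Hypothetical set of domain terms a data-analyst candidate might be expected to use.
DOMAIN_TERMS = {"data", "cleansing", "visualization", "regression", "python", "sql"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_vocabulary(transcript: str, domain_terms: set[str] = DOMAIN_TERMS) -> dict[str, float]:
    """Score a transcript on lexical richness and domain-term coverage, each in 0..1."""
    tokens = tokenize(transcript)
    if not tokens:
        return {"richness": 0.0, "coverage": 0.0}
    # Type-token ratio: distinct words divided by total words.
    richness = len(set(tokens)) / len(tokens)
    # Fraction of the expected domain terms that appear at least once.
    coverage = len(domain_terms & set(tokens)) / len(domain_terms)
    return {"richness": richness, "coverage": coverage}


feedback = score_vocabulary("I used Python and SQL for data cleansing and visualization.")
print(feedback)
```

A real system would likely use a proper NLP tokenizer and a role-specific term list; the plain regex keeps the sketch dependency-free.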
Here again, our approach is to automate ourselves. We are not an AI-first organization; we are a need-based organization. Let's look at the problem. Let's solve it. Can we apply AI to figure it out? It used to take about 10 interviews, on average, for people to get a job, and anywhere from 10 to 100 hours of one-on-one mentoring with each student. That just evaporated completely.
Now our data tells us exactly when a person is ready to go and talk to a potential hiring manager. I wouldn't say we can tell with 100% certainty, but over time we learned that once you hit scores of 80% to 90%, you are ready to go. Then you apply your human intelligence and figure out how to cross that last hurdle. That's how we apply it.
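The readiness rule above can be sketched as a threshold check. This is a minimal sketch under stated assumptions: scores are on a 0 to 1 scale, the cutoff is the lower end of the 80% to 90% range mentioned, and readiness means the average of the last few practice scores meets that cutoff. The `is_ready` function, the window size, and the sample history are hypothetical.

```python
from statistics import mean

# Assumed cutoff: the lower end of the 80%-90% range mentioned in the interview.
READY_THRESHOLD = 0.80
# Hypothetical number of most recent practice recordings to average.
WINDOW = 3


def is_ready(scores: list[float], threshold: float = READY_THRESHOLD, window: int = WINDOW) -> bool:
    """Return True once the average of the last `window` scores meets the threshold.

    Requiring a sustained average, rather than a single high score, keeps one
    lucky recording from triggering an interview referral.
    """
    if len(scores) < window:
        return False  # not enough practice recordings yet
    return mean(scores[-window:]) >= threshold


history = [0.55, 0.62, 0.71, 0.78, 0.84, 0.86]
print(is_ready(history))
```

Averaging over a window is one design choice among several; a stricter variant could require every recent score to clear the cutoff.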
PN: I really like our "one interview, one job" philosophy. Maybe you can talk about that a little bit as well. That might be interesting.
RK: "One interview, one job" was one of the core principles behind how we developed our process, platform, and technology, and we are continuously developing it. When somebody looking for a job goes through the programs and then goes out for an interview, and there is no way to predict whether they will get the job, it's a waste of time, both for the job seeker and for the employer.
The way we looked at it, employers are looking for certain attributes: basic skills, the data capability to work in their data teams, and a culture match. Those are the things the employer is looking for. The job seeker, meanwhile, is thinking, "OK, I need to be able to ace this interview. I need to be able to confidently articulate what I know." So we said, let's figure out how to help job seekers confidently articulate what they know in an interview, both in their body language and in how they see and present themselves.
Once that barrier is crossed, it's just a matter of talking to the hiring manager. If somebody walks into your office who is a great communicator and can articulate well what they've learned, as an employer you usually give them the benefit of the doubt: "Oh, yes, this person is great. Let's bring them in, and they can learn the additional skills that are required." That's the typical bet an employer makes, and that helps us succeed. Once we identify that a student has crossed the threshold, they go in and try to get the job. For more than a year, we've been very lucky: almost every interview they attend, they get the job.
RM: That's awesome. That's a great track record. You've been doing this for a few years now. What have you seen change? What trends are you seeing in data and data science that, if you're listening and you're not deep in this field, you should pay attention to because they're growing? And what should you stop paying attention to because it's waning? What advice do you have there?
PN: One trend is definitely the heavy swing towards Python from a technology point of view. Many people have adopted it as the de facto language for doing data science and working with data in general. Also from a technology point of view, of course, there's a heavy move towards deep learning, because those tools are becoming more and more accessible. But on a fundamental level, what we are trying to emphasize is that even the basic idea of identifying what data exists in your organization is really important, and hopefully we can push things in that direction.
There are all these advanced algorithmic techniques that people are coming up with. But ultimately, what makes the difference is how well you use your logistic regression algorithms and how well you use the existing tools. I don't think that's being done well enough, so of course there are going to be swings towards various trends. But we do see that people ask for these basic skills more. There's also demand for a holistic view of data science: someone who can do data engineering and data science and, at the same time, communicate the results. That's the unicorn everybody likes to go after.
We feel that in the future, this will break off into people who specialize in very specific areas. There will be someone who specializes in gathering data and telling basic stories about it, describing it almost in a business analytics sense. Then you'll have people who do machine learning and so on.
RK: To add to that, I'd give this analogy: when the internet came up, everybody was a webmaster, and then probably a million roles came up. This area is evolving the same way, and data scientist is not the only role. Entire organizations are working towards becoming data-driven, and that is opening up lots of job opportunities. For example, take the business analyst role: five years ago, all they needed to know was Excel. Now they need to know Python, right?
Every job out there is becoming embedded with the need for data skills, so that's a huge opportunity. Organizations are moving towards AI, and AI is going to disrupt things. But organizations are also looking to hire new talent and upskill existing talent so they can work in these evolving organizations. We are seeing organizations bet on people having the capability to work with data in almost every role.
PN: I think the other thing is that, as time passes, a lot of the tools available right now are going to be packaged up. You're not going to be training a base-level neural network. What you're going to be doing is taking your data, putting it into a system, and getting a result. Honestly, that's how many web frameworks work today. In that situation, those kinds of specialized roles may not survive. What will survive is the ability to work with basic data: knowing what kind of data to put into a complex machine learning system.
I don't necessarily need to know what's happening inside, but I need to be able to figure out what to put in. Of course, we could get into explainable AI and the issues there. But at least at a basic level, most people will need to know how to work with these tools. And yeah, there's definitely a swing towards certain technologies, and we're going with the swing.
RM: We're now in 2019. Do you have any big predictions for AI this year? Any new breakthroughs you expect to see?
PN: I am hoping that reinforcement learning and the tools associated with it will mature much more, and I think we will see a lot of that happening this year. I also hope AI becomes less of a hype and more of a set of tools, good tools, and that's slowly emerging. You're seeing situations where people can simply work with machine learning and AI modules, and that's great for people.
As 2019 rolls in, I think you're going to see more people not just being excited about AI, but being able to apply it on a day-to-day basis and actually derive value from it. That's my prediction.
RM: Cool. Last year, one of the last questions we asked everybody on the show was about the Elon Musk versus Mark Zuckerberg debate, because that was a big thing going on. We're going to stop asking about that, and this year we're going to ask everybody about the debate that happened toward the end of last year: Gary Marcus versus everybody else, although some people are on his side. Where do you fall on the spectrum? On one end, deep learning has a lot more runway in what it can do and where it can take us, which I think is the conventional wisdom. On the other end is the Gary Marcus view: no, deep learning is fundamentally missing something, and we really need to look at other techniques to take AI to the next level of breakthrough.
PN: I think one of the main issues is that deep learning is great, but we still underestimate the scale and cost at which deep learning works. The other day I was reading a statistic that it recently took Nvidia $70,000 to train a model on Titan GPUs. And I thought, that's ridiculous. Nvidia can do it, because they're in that business. A small or even medium-sized organization cannot do that.
Deep learning has become something that's accessible only to a certain stratum of engineers and data scientists. It's not accessible to everyone, and I think it's important to address that. Deep learning still has a long way to go. In terms of the value it provides, I feel there's a lot of hype but not enough real value being delivered. Of course, big companies can get value from it, because they have the data.
Facebook has so much data. Smaller organizations are still collecting data and building their systems, and deep learning isn't always the best approach for everything. It's weird for me to say that, because a lot of my training is in deep learning. The most important thing is explainable AI: the ability to explain the decisions that are being made. Until we address that, you can use any deep learning algorithm, anything in the world, but you're not going to derive value from it.
Yes, it's going to give you an answer, but beyond that, “why did I get this answer?” is very important, because there are going to be legal implications. There are going to be all sorts of other implications, especially in the medical industry. I don't think it's a good idea to introduce AI without completely understanding what it can do.
RM: Good answer. Well, thank you, guys, for coming on the show. If people want to find out more about Colaberry, what's the best URL to go to?
RM: If you have guests you'd like to see on the program or questions you'd like us to ask, please send those to podcasts at talla.com. Thanks for listening, and we'll see you next week.