Data science is experiencing its moment in the limelight, with this vital discipline laying the groundwork for new artificial intelligence (AI) and machine learning (ML) advances on a daily basis. As humans and as business leaders, our appetite for the latest and greatest is always strong, and it’s no exception in this case. At present, everybody is looking to ramp up their data science practices and point to the use of AI and ML in their products and services.
We’re overdoing it. We’re over-engineering solutions. We’re hiring people for jobs that turn out not to be quite as advertised, and then either disbanding teams a year later or wondering why they’re disengaged and quietly leaving. In doing so, we’re also fueling an ecosystem that encourages people to inflate their credentials and seek data science jobs for which they’re not really qualified, making it harder for employers with genuine needs to hire the right talent. False economies don’t sustain.
Let me be clear though: Data science is an essential and even revolutionary field in the modern business landscape. The new techniques being developed for understanding and operationalizing data, increasingly in an automated fashion, are transformative. The way we function is changing, and must continue to do so. That said, in our absolute desire to be a part of the AI and ML story, we are drowning ourselves in inefficiency. Allow me to explain.
YouTube is rife with videos of complex Rube Goldberg machines. (If you haven’t seen them, I highly recommend—hours of fun while sheltering-in-place!) As remarkable as these devices are, they are—by definition—a needlessly complex solution to a simple problem. This approach becomes dangerous when we translate it to the business world by starting with a technology choice (“Let’s make sure our products use AI!”) rather than starting with the business problem.
As an example, consider my 14-year-old Toyota. I have a key fob that sits in my pocket. Even if I have a bag in each hand, I can walk up to the car and just open the door. No extra effort required.
Then along came phones with near-field communication (NFC) built in. All of a sudden, auto manufacturers were rushing to showcase how you could use your NFC-enabled phone to open your car door. To do so, you simply had to take the phone out of your pocket, hold it up to the NFC tag on the window, and then pull the door open.
This example checks all the boxes for auto manufacturers tasked with using the latest technologies. It allows an executive to get up on stage and talk about how you can use your phone to unlock your car. But now, I have to stop at my car, put down my grocery bags, take my phone out, hold it up, put it back in my pocket, pick up the bags, and then get into the car. My elegant and seamless experience just became riddled with pain points.
This is unfortunately what too many people are doing when they try to develop complex models, or build an AI solution, to perform tasks that have simple solutions available. Doing so just for the sake of it is a waste of resources and a long-term economic detriment. The savviest organizations show restraint and recognize that the best solutions often arise in the context of scarce resources and incentives that align with solving for customer and business value, rather than technological checkboxes.
The key fob example also serves to illustrate the old truism about not focusing effort on solved problems. If you’re trying to run a business efficiently, then you want to be tightly focused on the distinct value add that you provide. Where problems have been solved by others—and let’s face it, most of our problems are not as unique as we may want to believe—leverage their work. Stand on the shoulders of giants.
Instead of building out a team of data scientists to solve everything in-house, first explore the availability of open-sourced or licensable solutions elsewhere. As AI and machine learning mature as disciplines, we’re finding that many of the biggest players in this space—including Amazon, Google, and others—have already invested heavily in creating robust algorithms and tools that can easily be employed or adapted to solve any number of data challenges. There’s nothing to be gained by employing your own team of 50 data scientists to solve a problem that can be readily addressed by an off-the-shelf solution. (For technically minded readers, there’s an interesting article by Thomas Nield* that walks through a specific example of scheduling systems, for which there are several existing algorithms that solve the problem extremely efficiently, obviating the need to invest in re-invention.)
Above all, when you’re considering investing in data science, and more so in ML and/or AI, it is imperative that you recognize that the foundation for any potential successful outcome is the quality of data that you have available for your team and its models or tools. Garbage in, garbage out, as the saying goes.
A team of PhDs may well develop a machine learning image recognition system for you that outdoes even what the big guns today have in place. But if you train it with seven pictures of dogs that are labeled as cats, the only thing it will do is fail spectacularly.
Quality, of course, extends deeper than that, and any data scientist or data engineer worth their salt will demand that you focus here first. Accuracy, precision, recall, timeliness, and provenance are all important considerations, but defining what constitutes quality in your particular context is often given little more than lip service. Much like the vanity metrics that companies love to trot out (think “30 million people downloaded my app,” which tells you nothing about how many of them actually use it), if you don’t properly define what quality means for you, you won’t attain it.
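To make the distinction between those metrics concrete, here is a minimal sketch (with purely illustrative, hypothetical labels) showing how the same set of predictions scores differently depending on which notion of quality you measure—overall accuracy, precision (how many of those you targeted were right), or recall (how many of the real cases you found):

```python
# Hypothetical example: 1 = "household has a newborn", 0 = "no newborn".
# Both lists are made-up illustration data, not from any real data set.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]

# Tally the four outcomes of comparing predictions against reality.
true_pos  = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
false_pos = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
false_neg = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
true_neg  = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))

accuracy  = (true_pos + true_neg) / len(actual)  # overall correctness
precision = true_pos / (true_pos + false_pos)    # of those we targeted, how many were right?
recall    = true_pos / (true_pos + false_neg)    # of the real newborns, how many did we find?

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Note that these three numbers can diverge sharply on the same data, which is exactly why "quality" has to be defined for your context: a marketer paying per mailing cares most about precision, while one who cannot afford to miss a prospect cares most about recall.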
Consider a data set that concerns the presence of children in a household, and their ages. If you’re selling infant onesies to parents with newborns, then timeliness and precision are critical. Your target market is tight, and if you’re a few weeks too late, you’ve missed the mark. However, if you’re selling family board games, it might barely matter if you’re off by a couple of years if your accuracy is good. It’s the same data, but a different quality assessment.
AI and ML are going to be a fundamental part of our future. I am not asserting that today’s enterprises shouldn’t be employing best-in-class data scientists. I am simply saying that company leaders need to ensure that they are hiring against a well-defined strategy and need, and ensuring that they have clean, well (and ethically) sourced data that is sufficiently substantive to warrant significant modeling atop it. By focusing in this manner, you can ensure that your organization’s resources — as well as the time and talent of your data scientists — are being put to good use.
* https://towardsdatascience.com/sudokus-and-schedules-60f3de5dfe0d
**Article originally published on Tech Funnel.
As Chief Product Officer, Rohan oversees Data Axle’s products, technology, and data, with a view towards creating increasingly valuable solutions for our clients and partners. His teams are responsible for product management, design, engineering, analytics and data science, and IT & data operations as he leads the evolution of Data Axle’s product suite.