Modeling & Predictive Analytics

The evolution of AI and ML modeling capabilities

More and more, our marketing strategies aim to be personalized and automated, and the cloud has facilitated that evolution. AI and machine learning are central to that shift. We spoke with our expert, Mike Krueger, about the evolution of modeling capabilities. He gave us an in-depth look at how AI/ML modeling works and how businesses can make use of it.

MEET THE EXPERT
Michael Krueger
VP Data Science

Mike provides thought leadership in the development of data science solutions that address our clients’ data enrichment needs and marketing opportunities. He oversees a talented team of data scientists who share his natural curiosity about the impact data and machine learning can have on achieving business goals. Mike brings over 20 years of extensive experience spanning machine learning, segmentation, analytics, BI, and market research.

1

There are obviously benefits to having data in the cloud, but it also adds some complexity. Can you walk us through the process?

Data engineers today play a larger role than they did in the past. Data pipelines rapidly move internal data from our Consumer and Business database sources and specialty databases, as well as external client data, so it stays accessible to the Data Science team as the data changes or is updated by clients. These pipelines form a network of data streams that allow our teams to automate how we prepare and analyze data. We also have access to many more modeling algorithms to train and fit models, both during initial development and as new client data is provided. With this data foundation in place, the data science team can focus more attention on producing highly accurate predictions and pass that benefit back to our clients for their data strategy and marketing needs.
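To make that concrete, here is a minimal sketch of the pattern in Python: a pipeline expressed as small, reusable steps that can be re-run automatically whenever data changes. The step names and columns are made up for illustration; real pipelines run on cloud data platforms with orchestration tooling rather than in a single script.

```python
import pandas as pd

# Each step takes and returns a DataFrame, so steps can be added,
# reordered, or re-run automatically as new data arrives.
def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    return df.rename(columns=str.lower)

def drop_incomplete(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna(subset=["customer_id"])

STEPS = [standardize_columns, drop_incomplete]

def run_pipeline(df: pd.DataFrame) -> pd.DataFrame:
    for step in STEPS:
        df = step(df)
    return df

# Incoming client data flows through the same steps on every refresh.
raw = pd.DataFrame({"Customer_ID": [1, None, 3], "State": ["NE", "IA", None]})
prepared = run_pipeline(raw)
```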

2

How have AI and machine learning changed things in the last 5-10 years?

Ten years is a long time! The AI world is changing much faster than that, and part of the challenge is keeping up with new solutions and technologies to remain innovative and foster creative thinking. At its core, the opportunity is that data scientists now have at their fingertips quick access to many different methodologies that we can test in our experimentation. In years past, logistic regression models were our bread and butter, and they’re still part of the solutions we use today. But now we have the opportunity to develop several deep learning models almost instantaneously, models that capture patterns far beyond what the human brain can decipher.
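As an illustration of that shift, the sketch below trains the traditional bread-and-butter model alongside a small neural network on the same data. The dataset and settings are synthetic placeholders, not a description of any production model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a modeling sample.
X, y = make_classification(n_samples=5000, n_features=40, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The long-standing baseline and a deeper challenger, trained side by side.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "neural network": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```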

Time to deliver is an important metric, but we still have our gold standards in place for measuring the reliability and accuracy of our model outcomes. One example of how this applies is model inference: when new data is provided after initial model development, we use an A/B test design in which the initial algorithm is tested against another deep learning algorithm, either confirming that performance still holds or selecting the better-fitting model. With deeper models, the bar for explainability of results is higher if we are to maintain our clients’ trust and confidence, so we use various open-source libraries to share the analysis of our results. ML also lets us track results over time, flagging potential model degradation and the tuning it requires.
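A minimal sketch of that champion/challenger check might look like the following, assuming both models are already fitted and a fresh labeled sample has come in; the 0.01 AUC margin is an arbitrary placeholder, not a company standard. On the explainability side, open-source libraries such as SHAP can produce per-attribute contribution summaries even for deeper models (named here as one common option; the specific libraries the team uses weren’t mentioned).

```python
from sklearn.metrics import roc_auc_score

def choose_model(champion, challenger, X_new, y_new, margin=0.01):
    """Keep the incumbent unless the challenger clearly beats it on new data."""
    champ_auc = roc_auc_score(y_new, champion.predict_proba(X_new)[:, 1])
    chall_auc = roc_auc_score(y_new, challenger.predict_proba(X_new)[:, 1])
    print(f"champion AUC = {champ_auc:.3f}, challenger AUC = {chall_auc:.3f}")
    return challenger if chall_auc > champ_auc + margin else champion
```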

Whether we are deploying models to our core data platforms, to SMB lead gen data solutions, or back to clients directly, our clients benefit from faster delivery of models that continue to perform well over time (or that are adjusted in flight to ensure they do).

3

For any of our readers who aren’t data scientists, how would they access these methodologies?

It requires a basic understanding of a coding language (Python, in our case) and some foundation in statistics to appreciate what the different methodologies offer. Because data science is such a fast-growing field, there are many online education options available. With some level of education, a good source to review is GitHub, where developers publish frequently. You will likely find one of our team members out there in the community!
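For a sense of the entry point, a library such as scikit-learn (a common starting place in Python, though not one named in our conversation) lets a newcomer train and evaluate a first predictive model in a handful of lines:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# A bundled public dataset stands in for real marketing data.
X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)
print(cross_val_score(model, X, y, cv=5).mean())  # average accuracy
```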

4

Let’s say someone is working with us, and they’re a consumer data customer. What does that process look like? And also, on the back end, what is your team doing when they set up those models?

When we’re building a custom model for a client, the primary objective is either acquisition or retention. If we are looking at acquisition, we want to identify new prospective customers or leads, so we train a model and score our internal data for prospect campaign selection. If the goal is retention, we want to grow the value of existing customers. In that case, we use more first-party data provided by the client, in combination with our data, to train a model and then score the client’s customer base. The process is generally similar apart from more data discovery for retention-based models.
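In outline, both paths come down to the same train-then-score pattern. The sketch below uses synthetic stand-ins for the modeling sample and the audience to be scored, and a gradient-boosted model chosen for illustration rather than as the team’s actual algorithm.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins: a labeled modeling sample (responders vs.
# non-responders for acquisition; retained vs. churned for retention)
# and an unscored audience.
train_attributes = pd.DataFrame(rng.normal(size=(2000, 10)))
train_labels = (train_attributes.sum(axis=1) + rng.normal(size=2000) > 0).astype(int)
audience = pd.DataFrame(rng.normal(size=(10000, 10)))

# Train on the labeled sample, then score the target base: internal
# prospect records for acquisition, or the client's customer file for retention.
model = GradientBoostingClassifier(random_state=0)
model.fit(train_attributes, train_labels)
scores = model.predict_proba(audience)[:, 1]

# Select the top of the audience for the campaign.
selection = audience.assign(score=scores).nlargest(1000, "score")
```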

In either situation, we receive client data that we need to prepare and ingest into our machine learning platform. Preparation may include cleaning and standardizing the data as needed, then appending our data attributes to that client data to build the modeling sample we will use for model development. With a consumer data client, this most likely means attributes from our core consumer data, B2C link data, and some of our specialty databases (all of which are available in the cloud within our machine learning platform to join).
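The cleaning-and-appending step looks roughly like this in practice. The column names and the email match key are illustrative only; the actual attribute sets and linkage logic weren’t specified.

```python
import pandas as pd

# Hypothetical client file and internal attribute table.
client = pd.DataFrame({
    "Email": [" Ann@Example.com ", "bob@example.com"],
    "responded": [1, 0],
})
consumer_attributes = pd.DataFrame({
    "email": ["ann@example.com", "bob@example.com"],
    "homeowner": [1, 0],
    "est_income_band": [5, 3],
})

# Clean and standardize the client file, then append internal attributes
# on a shared key to form the modeling sample.
client["email"] = client["Email"].str.strip().str.lower()
modeling_sample = client.drop(columns="Email").merge(
    consumer_attributes, on="email", how="left"
)
```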

In the world of AI/ML, we now make thousands of different attributes available for the model to consider. Where there was once considerable manual effort to reduce the data through up-front analysis so the model would focus only on meaningful attributes, much of that is handled programmatically today. Our data scientists evaluate the results of each model during development and inference and work closely with clients or their partners to share results as desired. Once robust model performance is confirmed, final results are shared in a report that explains the model’s predicted lift and the potential volume available at certain thresholds. As the desired audience changes over time, the model adapts to ensure it still holds, or is tuned to do so.
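One common programmatic stand-in for that manual reduction work is to let a regularized or tree-based model rank the attributes itself. The sketch below uses scikit-learn’s SelectFromModel on synthetic data; it is one reasonable approach, not a description of the team’s actual method.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic sample: many candidate attributes, few of them informative.
X, y = make_classification(n_samples=3000, n_features=500, n_informative=20, random_state=0)

# Fit a forest and keep only the attributes above median importance.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0), threshold="median"
)
X_reduced = selector.fit_transform(X, y)
print(X.shape[1], "->", X_reduced.shape[1], "attributes")
```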

5

So, in that kind of scenario where we are letting the model run in the cloud and it’s updating as the data changes, are we typically presenting that info to the client in the form of updated presentations?

Not typically; we validate results and explain any potential changes in a simpler report than the initial presentation. Some models need a longer window to gather information, depending on the marketing channel used or the sales cycle involved. For instance, with direct mail campaigns it can take significantly more time to get responses back than with digital channels. If there is a weekly, monthly, or other recurring program, the model validation process can be scheduled in line with that calendar.
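Assuming a recurring program, that scheduled validation can be as simple as a job that compares fresh performance against the development baseline and flags degradation; the baseline value and 0.05 tolerance here are made-up placeholders.

```python
from sklearn.metrics import roc_auc_score

BASELINE_AUC = 0.78  # recorded at model development (illustrative value)

def validate(model, X_recent, y_recent, tolerance=0.05):
    """Run on the campaign calendar (e.g., monthly) against fresh outcomes."""
    auc = roc_auc_score(y_recent, model.predict_proba(X_recent)[:, 1])
    if auc < BASELINE_AUC - tolerance:
        print(f"AUC {auc:.3f} below baseline; flag for tuning or retraining")
    else:
        print(f"AUC {auc:.3f} holding within tolerance")
```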

6

There seems to be a pattern emerging in the benefits of the cloud: faster processes, for a start.

Yes, it’s much faster for sure but, as mentioned, without sacrificing quality. That’s partly because we can work with more data sources and more attributes than before. We can automate much of our processing, too, so our clients receive insights and new outcomes as they need them.

Our “Ask the Expert” series is always illuminating, and we appreciate Michael taking the time to share his knowledge. Contact us if you want to continue the conversation.