More and more, our marketing strategies aim to be personalized and automated, and the cloud has facilitated that evolution. AI and machine learning are part of that shift. We spoke with our expert, Mike Krueger, about the evolution of modeling capabilities. He gave us an in-depth look at how AI/ML modeling works and how businesses can utilize it.
Mike provides thought leadership in the development of data science solutions that serve our clients’ data enrichment needs and marketing opportunities. He oversees a talented team of data scientists who share his natural curiosity about the impact data and machine learning can have on business goals. Mike brings more than 20 years of experience spanning machine learning, segmentation, analytics, BI, and market research.
Data engineers today play a larger role than they may have in the past. Data pipelines rapidly move internal data from our Consumer and Business databases and specialty databases, along with external client data, so it is accessible to the Data Science team as that data changes or is updated by clients. These pipelines form a network of data streams that let our teams automate how we prepare and analyze data. We also have access to many more modeling algorithms to train and fit models, both during initial development and as new client data is provided. With this data foundation in place, the data science team can focus more attention on producing highly accurate predictions and pass that benefit back to our clients for their data strategy and marketing needs.
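As a rough illustration of the kind of automated preparation Mike describes, here is a minimal Python sketch of a prep pipeline whose steps re-run every time refreshed client data lands. The step names, the email match key, and the use of pandas are our own illustrative assumptions, not a description of the actual pipelines.

```python
import pandas as pd

def standardize_keys(df: pd.DataFrame) -> pd.DataFrame:
    # Trim and lowercase the match key so joins behave consistently.
    df = df.copy()
    df["email"] = df["email"].str.strip().str.lower()
    return df

def drop_duplicate_records(df: pd.DataFrame) -> pd.DataFrame:
    # Remove records repeated across refreshes.
    return df.drop_duplicates()

# The same ordered steps run on every refresh, keeping extracts consistent.
PREP_STEPS = [standardize_keys, drop_duplicate_records]

def run_prep(df: pd.DataFrame) -> pd.DataFrame:
    for step in PREP_STEPS:
        df = step(df)
    return df

# Toy refresh: after standardizing, the two rows collapse into one clean record.
refresh = pd.DataFrame({"email": [" Ann@Example.com", "ann@example.com "],
                        "spend": [120, 120]})
print(run_prep(refresh))
```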
Ten years is a long time! The AI world is changing much faster than that, and part of the challenge is keeping up with new solutions and technologies to remain innovative and foster creative thinking. At its core, the opportunity is that data scientists now have at their fingertips a quicker way to access many different methodologies we can test in our experimentation. In years past, logistic regression models were our bread and butter, and they are still part of the solutions we use today. But now we can develop several deep learning models almost instantly, models that pick up patterns far beyond what the human brain can decipher.
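To make that concrete, here is a hedged sketch of trying the classic baseline and a deeper model side by side on the same sample. The use of scikit-learn, the synthetic data, and the network size are our own assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for a modeling sample.
X, y = make_classification(n_samples=5000, n_features=40, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# The long-standing "bread and butter" baseline.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# A deeper model trained on the same sample for comparison.
deep = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0).fit(X_train, y_train)

for name, model in [("logistic regression", baseline), ("neural network", deep)]:
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: holdout AUC = {auc:.3f}")
```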
Timeframe to deliver is an important metric, but we still have our gold standards in place for measuring the reliability and accuracy of our model outcomes. Another example of how this applies is model inference: when new data is provided after initial model development, we use an A/B test design in which the initial algorithm is tested against another deep learning algorithm, either to confirm that performance still holds or to choose the model that fits better. With deeper models, explaining the performance results becomes even more important for maintaining trust and confidence with our clients, so we utilize different open source libraries to share the analysis of our results. ML also allows us to track results over time and flag potential model degradation and the tuning it may require.
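The champion/challenger idea behind that A/B design can be sketched in a few lines. The metric, the degradation threshold, and the function names below are illustrative assumptions on our part, not the team's actual tooling.

```python
from sklearn.metrics import roc_auc_score

DEGRADATION_THRESHOLD = 0.05  # assumed tolerance for AUC drop vs. development results

def evaluate_challenger(champion, challenger, X_new, y_new, baseline_auc):
    """Score newly arrived data with both models, flag degradation, keep the better fit."""
    champ_auc = roc_auc_score(y_new, champion.predict_proba(X_new)[:, 1])
    chall_auc = roc_auc_score(y_new, challenger.predict_proba(X_new)[:, 1])

    if baseline_auc - champ_auc > DEGRADATION_THRESHOLD:
        print("Champion has degraded on new data; tuning or retraining is warranted.")

    # Keep whichever model fits the new data better for subsequent scoring.
    return challenger if chall_auc > champ_auc else champion

# e.g., evaluate_challenger(baseline, deep, X_test, y_test, baseline_auc=0.80)
# (the models and baseline figure here are hypothetical, carried over from the sketch above)
```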
Whether we are deploying models to our core data platforms, SMB lead gen data solutions, or back to clients directly, our clients benefit from faster delivery of models that continue to perform well over time (or that are adjusted in flight to ensure they do).
It requires a basic understanding of a coding language (Python, in our case) and some foundation in statistics to appreciate what the different methodologies offer. Because data science is such a fast-growing field, there are many online education options available. With some level of education, a good source to review is GitHub, which developers publish to frequently. You will likely find one of our team members out there in the community!
When we’re building a custom model for a client, the primary objective is either acquisition or retention. If we are looking at acquisition, we want to identify new prospective customers or leads, so we train a model and score our internal data for prospect campaign selection. If the goal is retention, we want to grow the value of existing customers. In that case we use more first-party data provided by the client, in combination with our data, to train a model and then score the client’s customer base. The process is generally similar, apart from additional data discovery for retention-based models.
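For the acquisition case, a train-then-score flow might look like the sketch below. The column names, the gradient boosting choice, and the ranking step are our own illustrative assumptions about how such a selection could be done in Python.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def train_acquisition_model(sample: pd.DataFrame, features: list[str]):
    # "responded" marks prior responders in the modeling sample (hypothetical label).
    model = GradientBoostingClassifier(random_state=0)
    model.fit(sample[features], sample["responded"])
    return model

def score_prospects(model, prospects: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    # Score the internal prospect universe and rank it for campaign selection.
    scored = prospects.copy()
    scored["score"] = model.predict_proba(prospects[features])[:, 1]
    return scored.sort_values("score", ascending=False)
```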
In either situation, we receive client data that we need to prepare and ingest into our machine learning platform. Preparation may include cleaning and standardizing the data as needed, and then appending our data attributes to the client data to develop the modeling sample we will use for model development. With a consumer data client, this most likely means attributes from our core consumer data, B2C link data, and some of our specialty databases (all of which are available to join in the cloud within our machine learning platform).
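A toy version of that append step, assuming pandas and a simple email match key (the attribute names and matching logic are hypothetical; real matching to consumer, B2C link, and specialty data is more involved):

```python
import pandas as pd

client = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "email": [" Ann@Example.com", "bob@example.com ", "cara@example.com"],
    "purchased": [1, 0, 1],                 # client-provided outcome to model
})

internal = pd.DataFrame({
    "email": ["ann@example.com", "cara@example.com"],
    "income_band": ["C", "A"],              # example appended attributes
    "homeowner_flag": [1, 0],
})

# Clean and standardize the match key, then append our attributes to the client records.
client["email"] = client["email"].str.strip().str.lower()
modeling_sample = client.merge(internal, on="email", how="left")
print(modeling_sample)
```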
In the world of AI/ML, we now allow thousands of different attributes to be available for the model to consider. Where there was once considerable manual effort to reduce the data through up-front analysis so the model focused only on meaningful attributes, much of that is handled programmatically today. Our data scientists evaluate the results of each model during development and inference and work closely with clients or their partners to share results as desired. Once robust model performance is confirmed, final results are shared in a report that explains the predicted lift of the model and the potential volume available at certain thresholds. As the desired audience changes over time, the model will adapt to ensure it still holds, or will be tuned to do so.
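A lift-and-volume summary of that kind can be sketched as a decile table: score a holdout sample, cut it into ten score bands, and compare each band's response rate to the overall rate. The decile scheme and function name below are illustrative assumptions, not the team's actual report format.

```python
import pandas as pd

def lift_report(scores: pd.Series, actuals: pd.Series) -> pd.DataFrame:
    df = pd.DataFrame({"score": scores, "actual": actuals})
    df["decile"] = pd.qcut(df["score"], 10, labels=False, duplicates="drop")
    overall_rate = df["actual"].mean()

    report = (
        df.groupby("decile")
          .agg(volume=("actual", "size"), response_rate=("actual", "mean"))
          .sort_index(ascending=False)          # best-scoring decile first
    )
    # Lift > 1 means that score band responds better than the population average.
    report["lift"] = report["response_rate"] / overall_rate
    return report
```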
Not typically, as we’re validating results and explaining any potential changes in a simpler report than the initial presentation. Some models need more time to gather information, depending on the marketing channel used or the sales cycle involved. For instance, with direct mail campaigns, it can take significantly longer to get responses back than with digital channels. If there is a weekly, monthly, or other recurring program, the model validation process can be scheduled to align with that calendar.
Yes, it’s much faster for sure but, as mentioned, without sacrificing quality. That’s partly because we can work with more data sources and more attributes than before. We can also automate much of our processing, so our clients receive insights and new outcomes as they need them.
Our “Ask the Expert” series is always illuminating, and we appreciate Mike taking the time to share his knowledge. Contact us if you want to continue the conversation.