Data Modeling

Data in the CPG industry comes from a variety of sources, including SAP sales and supply chain transaction data, marketing sources, and customer behavior. The CPG industry is challenged with creating analytics on this data and using it for prediction and optimization. The complexity lies in creating a data model. Without proper knowledge of HANA columnar databases, open source databases, and POS integration technology, forecasting and analyzing SKU data is a challenge. Predictions that need to be made include: which customer is going to buy which product next, how many purchases will they make, and what is the customer churn? What is the customer lifetime value (CLV), how often do customers buy, and when did they last buy a product? Other types of predictions include trade and promotions, pricing, conversion costs, and marketing (SG&A) costs. TekMetrix tools are readily available for making these predictions and more. SKU-level predictions can be made and structured into a SKU-level P&L, cash flow statement, and balance sheet. Analysis of the P&L and other financial documents can be done by SKU, customer, geography, brand, and other attributes, including aggregations. We can also forecast your customer lifetime value into the future.

The question becomes: how do you design and develop an underlying data model that supports a variety of analytics and predictive metrics, including AI and ML capabilities embedded in the model itself? TekMetrix data, AI, and ML models and experience will accelerate your project timelines and add significant value to your business.

Defining the Right Questions

To run your business most effectively, you need to make predictions about the future. Predictive analytical forecasting takes historical data and creates a single-period or multi-period forecast. TekMetrix data analysis helps examine past performance and predict future performance, for example by forecasting:

  • Customer purchases
  • Customer lifetime value
  • Warehouse inventory
  • Cost of goods sold
  • Revenue
  • Customer returns
  • Pricing
  • Trade promotion costs
  • Contribution margin impact of incremental sales due to trade promotions

We want to make "when will" predictions for a single fixed period and for multiple periods into the future. Such predictions have the following characteristics:

  • Inherently granular
  • Customer behavior
  • Forward-looking
  • Multi-platform
  • Broadly applicable
  • Multidisciplinary

Predictive Statistics

Predictive statistics uses statistical measures for prediction and forecasting, generally one period ahead of the current period. The mean of the prediction is the sample mean, which is an unbiased estimate of the mean of the true demand distribution. The standard deviation used for prediction needs to be adjusted if there is insufficient data. If the data is normally distributed, then the data can be adjusted for predictive purposes.
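As a minimal sketch of a one-period-ahead forecast from the sample mean, the Python below uses hypothetical demand figures. The adjustment shown, widening the sample standard deviation by sqrt(1 + 1/n) to reflect uncertainty in the estimated mean, is one common textbook choice and is an assumption here, not necessarily the adjustment used in TekMetrix models. The function name is illustrative.

```python
import math
import statistics


def one_period_forecast(demand):
    """Return (forecast mean, adjusted standard deviation) for the next period."""
    n = len(demand)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = statistics.mean(demand)      # sample mean: the point forecast
    sample_sd = statistics.stdev(demand)  # sample standard deviation (n - 1 divisor)
    # Assumed adjustment: the fewer the observations, the wider the spread.
    prediction_sd = sample_sd * math.sqrt(1 + 1 / n)
    return mean, prediction_sd


# Hypothetical demand history for one SKU, five periods.
demand_history = [100, 110, 90, 105, 95]
forecast_mean, forecast_sd = one_period_forecast(demand_history)
print(f"forecast: {forecast_mean:.1f} +/- {forecast_sd:.2f}")
```

With only five observations the adjusted standard deviation is about 10% larger than the raw sample value; as the history grows, the adjustment factor approaches 1.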

If there is a trend in the data, then moving averages and the mean and standard deviation computations used for predictive purposes will lag the trend. In that case, linear regression and exponential smoothing are additional forecast options.
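The lag can be illustrated with a small Python sketch on a hypothetical, steadily rising sales series. The exponential smoothing variant used here is Holt's linear (double) exponential smoothing, which tracks a trend term; that choice and the smoothing parameters are assumptions for illustration only.

```python
def moving_average_forecast(series, window):
    """Next-period forecast: mean of the last `window` observations."""
    return sum(series[-window:]) / window


def linear_trend_forecast(series):
    """Fit y = intercept + slope * t by least squares, then extrapolate one period."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean
    return intercept + slope * n


def holt_forecast(series, alpha=0.5, beta=0.5):
    """Holt's linear exponential smoothing: smooth a level and a trend."""
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + trend


# Hypothetical sales that grow by 10 units every period; the next value would be 160.
sales = [100, 110, 120, 130, 140, 150]
ma = moving_average_forecast(sales, window=3)
lt = linear_trend_forecast(sales)
holt = holt_forecast(sales)
print(ma, lt, holt)
```

On this series the three-period moving average forecasts 140, well below the 160 that the trend implies, while the two trend-aware methods recover 160.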

Regression Example - Predict 1 Period Ahead of the Current Period

The regression equation and its variants are used in Artificial Intelligence (AI) and Machine Learning (ML) modeling. These equations help us understand the relationship between independent variables (features) and a dependent variable (target) by fitting a mathematical function to the data. Various types of regression models, such as linear regression, polynomial regression, and logistic regression, are commonly employed for prediction, classification, and modeling tasks. These models can be developed in SAP HANA, SAC, or PaPM. Additional tools include MATLAB, R, and Python, or Excel for smaller models.

  • Revenue growth management, as an example, can be optimized using regression analysis. The optimal price is the price that maximizes overall profit. Models are built to:
    • Quantify sales demand at different prices
    • Find the optimal price
    • Optimization is performed with the general regression formula: Y(n) = a + b1X1(n) + b2X2(n) + ... + bjXj(n) + E(n), where the X values are the independent variables, Y is the dependent variable, and E(n) is the error term
    • Multiple independent variables can be modeled, for example Sales = a + b1(Price) + b2(Advertising) + E
    • The regression equation and variants of the regression equation are used for AI and ML modeling
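The pricing steps above can be sketched in Python under simplifying assumptions: a single independent variable (price), a linear demand curve Sales = a + b(Price), a constant unit cost, and hypothetical observations. With those assumptions, profit (Price - cost) x (a + b x Price) is a parabola in price, and its maximum has a closed form. All names and numbers below are illustrative.

```python
def fit_simple_regression(x_values, y_values):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x_values)
    x_mean = sum(x_values) / n
    y_mean = sum(y_values) / n
    b = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_values, y_values)) / sum(
        (x - x_mean) ** 2 for x in x_values
    )
    a = y_mean - b * x_mean
    return a, b


def optimal_price(a, b, unit_cost):
    """Price that maximizes (price - unit_cost) * (a + b * price), for b < 0.

    Setting the derivative a + 2*b*price - b*unit_cost to zero gives the result.
    """
    if b >= 0:
        raise ValueError("demand must fall as price rises (b < 0)")
    return (b * unit_cost - a) / (2 * b)


# Hypothetical observations: units sold at four test prices.
prices = [1.0, 2.0, 3.0, 4.0]
units_sold = [90.0, 80.0, 70.0, 60.0]

a, b = fit_simple_regression(prices, units_sold)  # step 1: quantify demand vs. price
best_price = optimal_price(a, b, unit_cost=2.0)   # step 2: find the optimal price
print(f"Sales = {a:.1f} + ({b:.1f}) * Price; optimal price = {best_price:.2f}")
```

Adding further independent variables such as advertising would require a multiple regression fit, but the same idea applies: estimate the demand equation first, then optimize profit over price.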

Key Performance Indicators Used in Forecasting Beyond Period 2:

  • Direct marketing example: regression analysis can be used to predict future, multi-period customer behavior:
    • Use key performance indicators of past customer behavior to predict future behavior
    • Regression models are also useful for this type of modeling and forecasting
    • Regression models built on RFM (recency, frequency, monetary value) inputs are used to forecast customer behavior
      • Recency - how recently did the customer make a purchase (more important than frequency)?
      • Frequency - how many purchases did the customer make (more important than monetary value)?
      • Monetary value - what is the value of each of the purchases?
  • Probability models can be used to forecast longer term horizons
    • Buy-till-you-die (BTYD) models are powerful probabilistic models used to make long-range projections and to answer "when" questions, such as when a customer will churn
    • Customer lifetime value modeling using Pareto/NBD and BG/BB models
  • Limitations of regression models
    • When forecasting more than one period beyond the current period, regression models are limited because they need input data
    • When making predictions for period 3, period 2 data can be used as the independent variable
    • Regression models are limited by the data that is available to forecast multiple periods into the future
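As a minimal sketch of how the RFM inputs described above might be derived from raw transactions, the Python below builds a per-customer table. The transaction layout (customer ID, purchase date, amount), the function name, and the sample records are hypothetical, and monetary value is taken here as the average value per purchase, which is one of several conventions.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Build {customer_id: {recency_days, frequency, monetary}} from transactions.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    as_of: reference date from which recency is measured.
    """
    purchases_by_customer = defaultdict(list)
    for customer_id, purchase_date, amount in transactions:
        purchases_by_customer[customer_id].append((purchase_date, amount))

    table = {}
    for customer_id, purchases in purchases_by_customer.items():
        last_purchase = max(purchase_date for purchase_date, _ in purchases)
        amounts = [amount for _, amount in purchases]
        table[customer_id] = {
            "recency_days": (as_of - last_purchase).days,  # days since last purchase
            "frequency": len(purchases),                   # number of purchases
            "monetary": sum(amounts) / len(amounts),       # average value per purchase
        }
    return table


# Hypothetical transaction history.
transactions = [
    ("C1", date(2024, 1, 5), 20.0),
    ("C1", date(2024, 3, 1), 40.0),
    ("C2", date(2024, 2, 10), 15.0),
]
rfm = rfm_table(transactions, as_of=date(2024, 3, 31))
for customer_id, features in sorted(rfm.items()):
    print(customer_id, features)
```

The resulting recency, frequency, and monetary columns can then serve as the independent variables in a regression on next-period behavior, subject to the multi-period limitations noted above.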