Virtual Events: ER148 12/8  | Jobs @ ERA: Operations Intern (Virtual) | Investment Analyst

# How to Measure and Predict Customer Lifetime Value

Customer Lifetime Value (CLV) is the total revenue a customer generates for a business. It’s a simple formula — no fancy tools needed — which looks like this:

GML is the gross margin per lifespan (how much profit each customer provides in their lifespan), R is the monthly retention rate (inverse of the churn rate), and D is the monthly discount rate (or the opportunity cost, mainly consisting of inflation and risk).

That’s a mouthful, so let’s take an example: Netflix. Their gross margin is 39% while the average total customer revenue is around \$465 (\$10.82 per user per month, on average, times a whopping 43 months), and the GML is the product of these factors, or around \$181. The monthly retention rate is about 99.25% (yes, that’s extremely high), and we can take the standard discount rate of 10%.

Plugging these values into the CLV formula, we get \$181 * (0.9925/(1+0.1–0.9925) which is equal to a CLV of \$1,683.

However, the CLV formula doesn’t tell us the impact of different attributes on component variables in that formula (like churn and customer spend), so simply knowing CLV doesn’t let us optimize for the future. It’s like calculating the fuel economy of a car, without knowing how to actually optimize it.

We can use a slightly different approach to model a customer’s transactions and predict what they’ll do in the future, taking into account variables like order value and churn rate. We can then see what drives sales and retention, allowing us to maximize CLV, based on this formula:

T is the average number of transactions per month, AOV is the average order value, AGM is the average gross margin, ALT is the average customer lifespan in months, and n is the number of clients for a given period.

Let’s use this Olist e-commerce dataset as an example of calculating CLV with this formula. First, we need to find the average number of transactions, T. For this, we look at the file “olist_orders_dataset.csv.”

In 2017, there were 45,101 transactions, and in 2018 there were 54,011 transactions. The average transactions for a one-year period is 49,556, or 4,130 per month. We exclude 2016 due to its low transaction volume.

We can find AOV, the Average Order Value, by merging the orders dataset (which has order IDs and dates) and the orders payment dataset (which has order IDs and transaction values) on their shared column, order ID. The mean transaction value in 2017 was \$152.55 and in 2018 it was \$155.31, so AOV is \$153.93 (again, we’re not including orders in 2016).

The dataset doesn’t include the cost of sales, so we can’t precisely calculate the average gross margin, but Shopify’s median gross margin is 56%, so let’s use that as an estimate of Olist’s AGM.

Likely for privacy reasons, we don’t actually get real customer IDs. There’s the same number of customer IDs as order IDs, which would imply that there’s a 100% churn rate (since customer IDs don’t have more than one associated order ID), so instead we’ll assume that ALT (average customer lifespan in months) is 12 months, within the realm of a typical e-commerce company.

Now, we have the variables we need to calculate CLV.

• T = 4,130
• AOV = \$153.93
• AGM = 56%
• ALT = 12
• n = 99,440

CLV = (4,130 × \$153.93 × 56% × 12) / 99,440 = \$4,296

Again, knowing CLV doesn’t tell us how to optimize it, so let’s dive deeper.

# What Impacts CLV?

If we want to optimize CLV, we need to understand what drives churn and average order value. The Olist dataset doesn’t give us quite enough to work with, though we can use a synthetic Telco dataset to predict churn and analyze order values.

Let’s upload the Telco dataset to Apteo, and select “Churn” as the KPI to automatically find insights. Total charges has the biggest impact on churn, followed by the contract type.

Heading to the “Insights” tab, we can see that higher-cost and month-to-month contracts have a higher rate of churn, while lower-cost and yearly contracts have a lower rate of churn. Further, churn is higher with electronic checks as a payment method, as opposed to automated payment methods.

We can create another predictive model with “total_charges” as the KPI, using the attributes of gender, senior citizen, partner, and dependents. We can see that whether the customer has a partner is the biggest driver of total charges, which makes sense, given that couples will have greater needs and financial means than individuals.

# Summary

To optimize CLV, you need to analyze its component variables. By understanding the attributes that drive retention and sales, we can then maximize CLV.

In the Telco example, one could minimize churn by prioritizing yearly contracts, lower-cost options, and automated payment methods, while maximizing order value by prioritizing sales leads with partners and dependents.