
Aptitive has partnered with a variety of companies to build out extensive platforms for analyzing their data. The real value of a business intelligence environment is that it lets users answer much more complex and interesting questions. In addition to visualizing a question in a dashboard, it often makes sense to model the problem with statistical software, for example by performing cluster analysis in R.
As an example, let’s pretend we belong to a Supply Chain group that needs to answer the question: “How do I classify my products to determine the appropriate service level?” In other words, how do I group products together to decide how to stock them in the warehouse?
There are a variety of tools and analyses that make sense, but I’ve decided to use RStudio to create a clustering model.
Load Data Set
First, I created a file that contains 1000 rows and three columns: Product ID, Distinct Customer Count over the last year, and Revenue (in thousands) over the last year. Note that I only used two inputs for the sake of visualization; for a bigger analysis, we could include more relevant variables.
I imported the data to RStudio, loaded the component columns into vectors, and created a matrix for my analysis:

Sample Data
For your reference, here is the sample R code:
#pull columns from source .csv into vectors
CustomerCount <- SampleData$CustomerCount
RevenueK <- SampleData$RevenueK
#create a Matrix using the vectors
myMatrix <- matrix(c(CustomerCount, RevenueK), nrow=1000, ncol = 2 )
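Since the source file is not included here, the following minimal sketch builds a stand-in SampleData data frame so the rest of the code has something to run against. The values, distributions, and seed are assumptions for illustration, not the data used in this post. It also shows cbind as an alternative way to build the matrix, which avoids hard-coding the row count.

```r
# Hypothetical stand-in for the source .csv: values are randomly generated,
# not the data used in this post
set.seed(1)
n <- 1000
SampleData <- data.frame(
  ProductID     = seq_len(n),
  CustomerCount = rpois(n, lambda = 20),
  RevenueK      = round(rexp(n, rate = 1 / 50), 1)
)

# cbind keeps the column names and takes the row count from the data itself
myMatrix <- cbind(CustomerCount = SampleData$CustomerCount,
                  RevenueK      = SampleData$RevenueK)

dim(myMatrix)       # number of rows and columns
head(myMatrix, 3)   # first few rows
```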
Perform the Cluster Analysis
Next, I used the hierarchical clustering function in R to map out the possible clusters based on the two components in my matrix. Reading the resulting chart (i.e., the Cluster Dendrogram) from the top, all products start in the same group, and each branch below splits a group in two based on how similar the products are across the two components. Based on that visualization, I looked for a cut that splits the 1000 products into a small number of distinct, reasonably balanced groups. I concluded that four clusters make sense.

The y-axis (height) measures how far apart clusters are when they merge: the lower the merge, the closer the clusters
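#Use technique native in R to create clusters
#dist() takes the full matrix so that each row (product) is one observation
myclust <- hclust(dist(myMatrix))
plot(myclust)
#based on breakdown, decide appropriate number of clusters to create
clustcnt <- 4
#outline the chosen clusters on the dendrogram
rect.hclust(myclust, clustcnt)
#assign each product to one of the clusters using k-means
fit <- kmeans(myMatrix, clustcnt)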
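The two steps above rely on different algorithms: hclust builds the tree used to pick the number of clusters, while kmeans makes the final assignment. As a rough sketch on made-up data (the data, the seed, and nstart = 25 are all assumptions), cutree extracts group labels from the tree so they can be compared with the k-means result. Fixing the seed keeps the k-means cluster numbers stable between runs.

```r
# Made-up data standing in for the product matrix (not the post's data)
set.seed(42)
myMatrix <- cbind(CustomerCount = rpois(200, lambda = 20),
                  RevenueK      = rexp(200, rate = 1 / 50))

clustcnt <- 4

# Hierarchical clustering: cut the tree into clustcnt groups
myclust  <- hclust(dist(myMatrix))
hcGroups <- cutree(myclust, k = clustcnt)

# k-means: several random starts make the result less sensitive to the
# randomly chosen initial centers
fit <- kmeans(myMatrix, centers = clustcnt, nstart = 25)

# Cross-tabulate the two assignments to see how far they agree
table(hierarchical = hcGroups, kmeans = fit$cluster)
```

If the two methods disagree heavily, that may be a hint that the number of clusters read off the dendrogram does not carry over cleanly to k-means.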
Visualize the Clusters
Next, I assigned each product the cluster number from the k-means fit and colored each plot point to reflect the result:

Customer Count Vs Revenue colored by cluster
out <- cbind(myMatrix, ClusterNum = fit$cluster)
colnames(out)[1] <- "CustomerCount"
colnames(out)[2] <- "RevenueK"
#designate output vectors using the resulting matrix and plot by color
CustomerCountOutput <- out[, "CustomerCount"]
RevenueKOutput <- out[, "RevenueK"]
clustcolor <- out[, "ClusterNum"]
plot(CustomerCountOutput, RevenueKOutput, main = "Product Cluster",
     xlab = "Customer Count", ylab = "Revenue in Ks",
     col = ifelse(clustcolor == 1, "blue",
           ifelse(clustcolor == 2, "purple",
           ifelse(clustcolor == 3, "red", "green"))))
Analyze the Results
Finally, I am able to create a story about each category and make mindful decisions about how I would manage the stock:
- Blue (High Revenue, Low Customer variety): I might store these items directly in the customer’s site and maintain high safety stock since they bring in more money.
- Purple (Mid Revenue, Low/Mid Customer variety): I would perform more analysis to try to push the product into the “High Revenue” category.
- Red (Low Revenue, Low Customer variety): We might not stock these items at all and instead create a just-in-time (JIT) system for ordering.
- Green (Low Revenue, High/Mid Customer variety): We might try to minimize these product lines and carefully watch our costs to ensure we are getting a good return.
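To back up descriptions like these, it helps to look at each cluster's averages. The sketch below uses made-up data (an assumption, as are the seed and the clusterSummary name) and aggregate to compute the mean customer count and revenue per cluster. Keep in mind that k-means numbers its clusters arbitrarily, so which number, and therefore which color, ends up as the high-revenue group depends on the run.

```r
# Made-up data standing in for the clustered products (not the post's data)
set.seed(7)
out <- data.frame(CustomerCount = rpois(200, lambda = 20),
                  RevenueK      = rexp(200, rate = 1 / 50))
fit <- kmeans(as.matrix(out), centers = 4, nstart = 25)
out$ClusterNum <- fit$cluster

# Mean customer count and revenue for each cluster
clusterSummary <- aggregate(cbind(CustomerCount, RevenueK) ~ ClusterNum,
                            data = out, FUN = mean)

# Number of products in each cluster (same ascending cluster order)
clusterSummary$Products <- as.vector(table(out$ClusterNum))

clusterSummary
```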
Also note that if the firm added a new product line, it would only be a matter of rerunning the model to derive new classifications. No more long, painful projects to answer that same question all over again.
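One way to make rerunning the model a one-line task is to wrap the steps in a function. The sketch below is only an illustration: classifyProducts is a hypothetical helper, and the catalog, the new product line, the seed, and nstart = 25 are all made up.

```r
# Hypothetical helper: re-cluster whatever products are in the data frame.
# Note that set.seed() here also resets the global random number state.
classifyProducts <- function(products, clustcnt = 4, seed = 42) {
  set.seed(seed)
  m <- cbind(CustomerCount = products$CustomerCount,
             RevenueK      = products$RevenueK)
  fit <- kmeans(m, centers = clustcnt, nstart = 25)
  products$ClusterNum <- fit$cluster
  products
}

# Made-up existing catalog plus a made-up new product line
set.seed(3)
catalog <- data.frame(ProductID     = 1:200,
                      CustomerCount = rpois(200, lambda = 20),
                      RevenueK      = rexp(200, rate = 1 / 50))
newLine <- data.frame(ProductID     = 201:220,
                      CustomerCount = rpois(20, lambda = 35),
                      RevenueK      = rexp(20, rate = 1 / 120))

# Rerun the model on the combined product list
classified <- classifyProducts(rbind(catalog, newLine))
table(classified$ClusterNum)
```

Because the clusters are recomputed from scratch, existing products can change groups when new ones are added, so the cluster descriptions should be rechecked after each run.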
Conclusion
By performing more advanced analyses, companies can make better, data-driven decisions. What questions could this type of analysis answer in your business?
- Who are my organization’s best members?
- Who is most likely to buy my services?
- What types of products are most profitable?
- …we can go on forever.
Beyond clustering, there is tremendous potential to use data for predictive forecasting, regression modeling, principal component analysis, and much more! Of course, the consultants at Aptitive love to work with data. We partner with companies to answer the questions that save clients money, help them run their business more effectively, and give them a competitive edge. Please reach out if you would like to discuss more.
This post was originally posted on Medium