Informative, Curvy & Never Before Seen (by me at least) – The Associated Curve Chart
I’ll start off by saying that I really like this chart. It started out largely as an experiment in connecting datapoints, but during development it morphed into an entirely new chart type (I’ve never seen it in Qlikview or anywhere else, so please let me know if you have) that I believe can be very useful for showing associated and complementary (or not) products. I call it the Associated Curve Chart.
Firstly I need to explain what this chart shows, as that will hopefully convey its usefulness and potential. In my examples here I’m looking at customers who have bought multiple insurance products, but it could just as easily be students taking various exams or baskets of goods from a supermarket. So one insurance customer could have a Motor policy, a Life Insurance policy and Pet Cover, whilst the next may well have a totally different mix from the product set and another could have only a single product. It would be useful for the insurer in this example to know where to offer discounts: “Get 10% off Motor Insurance when you buy Life Insurance”. Alternatively, of course, the decision could be not to offer discounts to purchasers of Life Insurance, as you know they’re relatively likely to buy, say, Motor Insurance as well.
The chart is split across the middle with Male data at the top and Female at the bottom. This isn’t a requirement of the chart but shows that we can add in an additional metric. There is one circle per Product, in this case 9 evenly spaced across the middle. The curves show 2 metrics: the ‘strength’ (Saturation & Thickness) indicates the relative volume of customers buying at least both of the Products at either end of the curve, whilst the distance the curve extends from the X-axis across the middle shows the average premium earned from customers taking the 2 associated Products. Therefore a low, weak line shows that relatively few customers purchase both Products together and that when they do it generates a low Premium; conversely, a high, strong line means a relatively large volume and a high average premium.
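To make the two curve metrics concrete, here is a minimal Python sketch of how they could be calculated outside Qlikview. It is not the method used in the .qvw; the sample records, the `pair_metrics` function and the choice to treat a pair’s premium as the sum of the two policies’ premiums are all assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: (customer_id, gender, product, premium)
policies = [
    (1, "M", "Motor", 400.0),
    (1, "M", "Life", 250.0),
    (1, "M", "Pet", 100.0),
    (2, "F", "Motor", 500.0),
    (2, "F", "Life", 300.0),
    (3, "M", "Motor", 450.0),   # single product, so no curve
    (4, "M", "Motor", 600.0),
    (4, "M", "Life", 350.0),
]

def pair_metrics(rows):
    """Return customer count and average premium per (gender, product A, product B)."""
    # Collect each customer's products and the premium paid for each.
    by_customer = defaultdict(dict)
    gender_of = {}
    for cust, gender, product, premium in rows:
        by_customer[cust][product] = premium
        gender_of[cust] = gender

    # One entry per unordered product pair held by the same customer.
    totals = defaultdict(lambda: [0, 0.0])
    for cust, products in by_customer.items():
        for a, b in combinations(sorted(products), 2):
            key = (gender_of[cust], a, b)
            totals[key][0] += 1                           # drives curve 'strength'
            totals[key][1] += products[a] + products[b]   # assumed pair premium

    return {
        key: {"customers": n, "avg_premium": total / n}   # avg drives curve height
        for key, (n, total) in totals.items()
    }

metrics = pair_metrics(policies)
for key, value in sorted(metrics.items()):
    print(key, value)
```

In this sketch the customer count would map to the saturation and thickness of a curve, and the average premium to its height above (Male) or below (Female) the axis.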
So in the case of insurance you’d potentially look for high, faint curves (small volume but high premium) where you could decrease the premium charged to increase demand or, conversely, look to increase premiums where the line is low and strong.
Think of applying this to, say, a supermarket’s data to look for associations in customer purchases. Of course you’d see the obvious things like strawberries being bought at the same time as cream, or perhaps charcoal bought with steaks, but you’d start to see other less obvious but nonetheless relatively commonly purchased combinations…and that’s where you’d want to target discount offers: “50% off ice as we know you’re likely to buy Whisky at the same time”. It can get scarier than that; Target, the US retailer, was recently in trouble for mailing coupons for nursery furniture and maternity clothes to a young woman before she’d told anyone (including her parents) that she was pregnant – http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/ This was done thanks to Target spotting associations in their data: they had a database of women they knew to be pregnant, looked at the associated products they bought, then simply looked for the same associations in their wider unknown customer base and mailed those customers discount coupons for baby-related products.
Of course the vast majority of ‘associative purchases’ are false positives and nothing more than coincidence; you must have looked into your shopping basket at some point in life and thought something like: ‘what would someone think if they saw me buying avocados, bin liners and shoe laces?’ There’s nothing joining the items together to explain you buying them at the same time. However, as the volume of transactions increases, these coincidental associations fade and the ones that have shared common reasons (conscious or otherwise) behind them come to the surface, as they’re purchased more regularly by more people. It’s these associations that can be exploited for increased profit.
The associations could be shown simply by using the strength of the curve to point users in the right direction, but the addition of the average premium metric takes things further. For instance, you’d expect some people to purchase Motor Insurance and Pet Cover and others to buy Motor Insurance and Company Directors Insurance. The chart would show us where the greater volume was but also which of the 2 combinations generated the higher average premium. Simplistically, you’d expect the line connecting Motor and Directors Insurance to be higher, as the average company director will have a higher value car (and thus one more expensive to insure) than the average Pet owner. So offer people who buy only Directors cover an incentive to take Motor insurance as well, as you know it’s likely to generate a good premium. That’s an obvious simplified example, but I’d bet that there are others out there hidden in the data that aren’t so apparent without the use of a chart like this. This could just as easily apply to Student Grades: do students taking Geography & Physics get a higher average % mark than those taking Geography & Media Studies? The Associated Curve Chart can tell you.
And because this is Qlikview we can dive into the dataset. Reduce the supermarket data by season, for example: associations in the summer will be very different from those in the run up to Christmas, inner-city stores will differ from more rural locations, west coast will be different to east coast and so on, and all the while the chart responds and displays the underlying associations accordingly.
The chart isn’t perfect. For instance, try plotting every possible product a supermarket sells across the x-axis and it would end up 100ft wide and be wholly impractical, so it’s perhaps more suited to, say, selecting Strawberries and seeing what dairy products were purchased along with them: did we sell more single or double cream, and which combination made the most? On top of that, we need to create a row of data for every product combination under each order number, customer number etc., so our dataset is going to grow to several times its original size, which could well cause problems.
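As a rough illustration of that growth, the Python sketch below expands a single order into one row per product pair and counts how many rows baskets of different sizes would produce. The `pair_rows` helper and the sample order are hypothetical, and it assumes unordered pairs (A with B is the same as B with A), which may not match the data model in the .qvw.

```python
from itertools import combinations
from math import comb

def pair_rows(order_id, products):
    """Return one (order_id, product_a, product_b) row per unordered product pair."""
    unique = sorted(set(products))   # ignore repeated products within an order
    return [(order_id, a, b) for a, b in combinations(unique, 2)]

# Hypothetical order containing 4 distinct products.
rows = pair_rows("order-1", ["Strawberries", "Single Cream", "Double Cream", "Ice"])
for row in rows:
    print(row)

# Pair rows needed for baskets of k distinct products: k * (k - 1) / 2.
growth = {k: comb(k, 2) for k in (2, 5, 10, 20)}
print(growth)
```

Under these assumptions a 4-item order becomes 6 pair rows and a 20-item basket becomes 190, which is why restricting the chart to a selected product or category keeps things manageable.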
That said, I still think it’s of great potential use, from high-level indicator to deep data mining visualization, and it shows information other charts would struggle to, so download the .qvw, experiment with it and see if you can put any of the principles to use. As a note: there are slight differences in the way versions of Qlikview and operating systems format axes and fonts, so the version you open may not look 100% like those in the pictures.
*This is a bespoke chart created from specially tailored data, so in its current form it can’t readily be applied to other datasets. I am working on a method to generate the chart automatically to make it quicker to deploy.
The method used to create the chart loosely follows that used to generate my recent ‘Dynamic Network Flows’ chart, so the basics can be picked up here: http://qvdesign.wordpress.com/2012/06/22/new-qlikview-chart-type-dynamic-network-flow-charts/ There’s also the Radial Chart mentioned here: http://qvdesign.wordpress.com/2012/03/05/associative-radial-chart/ which shows associations between items purchased together in a more basic manner.
The complete .qvw can be downloaded here: https://docs.google.com/open?id=0BxloTMUod74taHpNZ1ZyS3VHZFE
As always I hope you can put it or something based on it to use.
All the best,