YOU CANalytics | Data Visualization - Banking Case Study Example (Part 1) – YOU CANalytics |

Data Visualization – Banking Case Study Example (Part 1)

Leonardo – by Roopam

A Scientist & An Artist

A few weeks ago, while wandering around Florence, the birthplace of the Renaissance, I could not escape the thought of Leonardo da Vinci: the greatest polymath of all time. Leonardo’s illustrious resume contains titles such as painter, inventor, physicist, astronomer, engineer, biologist, anatomist, geologist, and architect – no kidding! A smart cat would have to live all her nine lives to acquire the nine titles Leonardo mastered in one lifetime. Today, while discussing facets of data visualization, we should pay homage to Uncle Leonardo as we cross the realms of both art and science.

Art and Science of Data Visualization

Data Visualization – by Roopam

Data visualization, as mentioned earlier, is both art and science. I personally prefer to take a long look at the data, plotting it in various ways, before jumping into rigorous mathematical modeling. You might have noticed my penchant for art from the artwork presented in the posts on this blog. The saying – a picture is worth a thousand words – holds true during data analysis as well. Models in analytics can go horribly wrong if you have not spent enough time on the data exploration phase – which, to me, is all about data visualization. Let me present a case study example to explain the aspects of data visualization during the exploratory phase.

Banking Case Study Example – Risk Management

Assume you are the chief risk officer (CRO) of CyndiCat bank, which disbursed 60,816 auto loans in the quarter April–June 2012. Today, about a year and a quarter after disbursal, the loans have seasoned, which means bad loans can be tagged with greater certainty (read a detailed discussion). You have noticed a bad rate of around 2.5%, or 1,524 bad loans out of the 60,816 loans disbursed.
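As a quick sanity check, the portfolio bad rate quoted above can be recomputed from the two counts. This is only a minimal sketch; the variable names are mine and the two numbers are the ones stated in the case study.

```python
# Figures quoted in the case study.
total_loans = 60816
bad_loans = 1524

# Everything that is not tagged bad is treated as good.
good_loans = total_loans - bad_loans

# Bad rate = bad loans as a share of all disbursed loans.
bad_rate = bad_loans / total_loans

print(f"Good loans: {good_loans}")
print(f"Bad rate:   {bad_rate:.2%}")
```

The result comes out at roughly 2.5%, in line with the figure used in the rest of the example.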

Before you jump to multivariate analysis and credit scoring (read a detailed discussion on credit scoring), you want to analyze the bad rate across several individual variables. Based on your experience, you have a hunch that the borrower’s age at the time of loan disbursal is a key distinguishing factor for bad rates. Therefore, you have divided the loans by borrower age and created a table something like the one below.

Using the above table, you have created a histogram and zoomed into the area of interest (close to the bad loans) as shown in the plots below.

You must have noticed the following:

• The distribution of loans across age groups is a reasonably smooth, roughly bell-shaped (normal-looking) curve without too many outliers. Age often displays this kind of pattern for most products. However, do not expect similarly smooth curves for other variables commonly found in a business scenario. Often, you may have to resort to variable transformation to make the distributions smooth.

• The largest number of bad loans is in the 42 to 45 years age bucket. This certainly does not mean that risk is also highest in this bucket; I once heard someone draw exactly that conclusion in a quarterly business review meeting – a silly mistake. Note that the largest number of loans overall is also in the 42 to 45 years bucket. Absolute numbers do not provide enough information, hence we need to create a normalized plot.

• The data is really thin in the fringe buckets (i.e. the <21 and >60 years groups), with only 9 and 6 data points – be careful when dealing with such thin data. Sound business knowledge is extremely helpful for adjusting these fringe buckets during model development. For instance, you may know that loans to borrowers above 60 could be highly risky, but this data set does not have enough records to validate that hypothesis. In such situations we should supply an appropriate risk weight ourselves – however, be very careful while doing so.
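The frequency table behind a histogram like this can be sketched in a few lines of Python. Please treat it as an illustration under assumptions: the post only names the "<21", "42 to 45" and ">60" buckets, so the 3-year-wide buckets in between, the function names, and the sample records are all hypothetical.

```python
# Assumed bucket layout: "<21", then 3-year-wide buckets up to 60, then ">60".
LOWER, UPPER, WIDTH = 21, 60, 3


def age_bucket(age):
    """Return the label of the age bucket a borrower falls into."""
    if age < LOWER:
        return f"<{LOWER}"
    if age > UPPER:
        return f">{UPPER}"
    # An age of exactly 60 is kept in the last closed bucket (57-60).
    start = min(LOWER + ((age - LOWER) // WIDTH) * WIDTH, UPPER - WIDTH)
    return f"{start}-{start + WIDTH}"


def frequency_table(loans):
    """Count good and bad loans per age bucket.

    `loans` is an iterable of (age, is_bad) pairs.
    """
    counts = {}
    for age, is_bad in loans:
        row = counts.setdefault(age_bucket(age), {"good": 0, "bad": 0})
        row["bad" if is_bad else "good"] += 1
    return counts


# Made-up records purely for illustration, not the bank's data.
sample_loans = [(19, True), (23, False), (44, False), (44, True), (43, False), (62, False)]
table = frequency_table(sample_loans)
for bucket, row in table.items():
    print(bucket, row)
```

Plotting the "good" and "bad" counts per bucket as stacked bars would give a frequency plot of the kind discussed above.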

Normalized Plot

The normalized plot is easy to construct. The idea is to scale each age group to 100% and overlay the percentages of bad and good records on top. We could extend the table shown above to get the values for the normalized plot, as shown below.

Once you have the table ready, you can create a normalized plot quite easily, as shown below (again, we have zoomed into the plot to get a clear view of the bad rates).
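To make the normalization step concrete, here is a small sketch of how the extended table could be computed. The counts are hypothetical (they are NOT the figures from the table in this post), and the function and key names are my own.

```python
# Hypothetical counts per age bucket, chosen only to illustrate the idea.
counts = {
    "21-24": {"good": 380, "bad": 20},
    "42-45": {"good": 9700, "bad": 300},
    "57-60": {"good": 495, "bad": 5},
}


def normalize(counts):
    """Scale every bucket to 100% and split it into good % and bad %."""
    table = {}
    for bucket, row in counts.items():
        total = row["good"] + row["bad"]
        if total == 0:
            continue  # an empty bucket has no meaningful percentages
        bad_pct = 100.0 * row["bad"] / total
        table[bucket] = {
            "total": total,
            "bad_pct": bad_pct,
            "good_pct": 100.0 - bad_pct,
        }
    return table


normalized = normalize(counts)
for bucket, row in normalized.items():
    print(f"{bucket}: bad {row['bad_pct']:.1f}%  good {row['good_pct']:.1f}%")

# The bucket with the most bad loans need not be the riskiest bucket.
most_bad_loans = max(counts, key=lambda b: counts[b]["bad"])
highest_bad_rate = max(normalized, key=lambda b: normalized[b]["bad_pct"])
print("Most bad loans:  ", most_bad_loans)
print("Highest bad rate:", highest_bad_rate)
```

In this made-up data the 42-45 bucket holds the most bad loans simply because it holds the most loans, while the youngest bucket has the highest bad rate. That is the distinction between the frequency plot and the normalized plot.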

These plots are completely different from the original frequency count plot and present the information in a completely different light. The following are the things one could conclude from the plots.

• There is a definite trend between bad rates and age groups: as borrowers get older, they are less likely to default on their loans. That is a good insight.

• Again, the fringes (i.e. the <21 and >60 years groups) have thin data, but this cannot be seen in the normalized plot. Hence, you need to keep the frequency plot handy to treat thin data differently. A handy rule of thumb is to have at least 10 records each of good and bad cases before taking the information seriously – otherwise, the observed bad rate is unlikely to be statistically significant.
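The rule of thumb above is easy to automate. The sketch below flags thin buckets; the fringe totals of 9 and 6 records come from this post, but how they split into good and bad loans is my assumption, as are the names used.

```python
MIN_PER_CLASS = 10  # rule of thumb: at least 10 good and 10 bad records


def is_thin(good, bad, minimum=MIN_PER_CLASS):
    """A bucket is thin if either class has fewer than `minimum` records."""
    return good < minimum or bad < minimum


# Only the fringe totals (9 and 6) are from the post; the good/bad split
# and the middle bucket are hypothetical.
counts = {
    "<21": {"good": 8, "bad": 1},
    "42-45": {"good": 9700, "bad": 300},
    ">60": {"good": 6, "bad": 0},
}

thin_buckets = [b for b, row in counts.items() if is_thin(row["good"], row["bad"])]
print(thin_buckets)
```

Buckets flagged this way are the ones where business judgement, rather than the observed bad rate, should drive the risk weight.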

I must conclude by saying that data visualization is the beginning of the modeling process, not the destination. However, it is a good and creative beginning.

Sign-off Note

With big data, data analysis tools and technologies, scientific progress, and a democratic environment, we could be living in the Renaissance of our times. However, we will need more Leonardo da Vincis to make these times really special.

Posted in Banking Risk Case Study Example, Risk Analytics | Tags: Banking and Insurance Analytics, Business Analytics, Predictive Analytics, Roopam Upadhyay |

23 thoughts on “Data Visualization – Banking Case Study Example (Part 1)”

Amit Chandra says:
May 15, 2015 at 12:45 pm
Hi Roopam,
Thanks a lot buddy for giving a good insight on Analytics. Your blogs are really very informative and made me understand the nitty gritty of Analytics. Look forward to your future posts.. God bless you!!!
Thanks,
Amit Chandra
- Roopam Upadhyay says:
  October 3, 2015 at 8:43 am
  Thanks
Abhishek Shukla says:
September 23, 2015 at 4:06 pm
Hi Roopam,
Yours is easily the most effective guide on predictive analytics I have come across. You made statistics sound as cool as science and engineering.
Regards,
Abhishek
- Roopam Upadhyay says:
  October 3, 2015 at 8:42 am
  Thanks Abhishek, appreciate your kind words.
Sourav says:
September 30, 2015 at 4:12 pm
Hi Roopam,
I enjoy going through each of your articles. Incredible work!!
Can you help me understand how you decide on the different age bands?
Thanks
- Roopam Upadhyay says:
  October 3, 2015 at 8:42 am
  Thanks Sourav,
  Here, the age bands were formed by eyeballing the data. The idea is to look for a significant change in the gradient of average risk from one band to the next. You could also use univariate decision trees (CHAID) to create bands.
Sylvia says:
April 3, 2016 at 2:11 am
Profile analytics are fundamental to enriching the business!! Data mining combined with predictive models is what adds value to the business!!
Your posts are very good!!!! Thanks for sharing!!
Kriti Pandey says:
May 17, 2016 at 10:52 am
Here in your graph there is a clear downward trend. What if the data doesn’t have any (increasing/decreasing) trend? In that case, WOE values cannot be monotonically increasing or decreasing. What should be done in those cases?
- Roopam Upadhyay says:
  May 25, 2016 at 9:49 am
  A monotonically decreasing or increasing trend is not the primary requirement for developing models. For instance, age forms a U-shaped plot against bad rate for banks in developed economies. This is logical since the repayment capability of the elderly is not as strong as that of middle-aged working professionals. The data used in this case study is for a developing economy where borrowers’ age is fictitiously capped at 60 – I hope you have noticed the thin data for ages above 57 years. Hence, the important condition is logical consistency rather than the trend line. A regular trend line, in many cases, supports logical consistency better than a randomly fluctuating trend.
Mohi says:
August 26, 2016 at 2:46 pm
Hi Roopam,
Thanks a lot for the good insights on analytics.
Sir, I didn’t get the meaning of thin data, which you referred to in the article. Can you make it clear for me?
- Roopam Upadhyay says:
  August 26, 2016 at 6:13 pm
  Thin data refers to data with a low or insufficient sample size. This also applies to a small sample size for either the good or the bad class of loans. You will find this article useful: identify the right sample size for your analysis
Raghu says:
October 12, 2016 at 3:25 pm
Thanks Roopam – Your blogs are quite interesting, informative and intuitive.
dee says:
December 15, 2016 at 6:22 pm
Is there a sample data set for this example that I can use for practice?
Asutosh says:
May 9, 2017 at 11:53 pm
Hi Roopam,
Here you have considered age as the variable for binning and categorizing the data. Let’s say there is another variable, income, alongside age. This variable can also influence the good and bad rates. How do you now decide which is the distinguishing factor for bad rates?
Marcelo says:
May 24, 2017 at 3:38 am
Hi Roopam,
It is a good way to learn analytics. Excellent work!
Deepak Kulkarni says:
February 14, 2018 at 1:43 pm
Hi Roopam,
Hope you are doing well.
It was one of my best experiences working with you. You are a valuable asset to any organization and a great guide.
Good luck.
Deepak
- Roopam Upadhyay says:
  June 13, 2018 at 5:11 pm
  Thanks, Deepak! Really appreciate your comment. Be well.
Tejas Sanap says:
February 15, 2018 at 9:25 pm
Dear sir,
Can you share the Jupyter notebook/code you used in this article?
Chetan says:
June 25, 2018 at 10:52 am
Dear Roopam,
Let’s say I replace the original variables with their WOE values and then check for multicollinearity. Is this the right procedure?
- Roopam Upadhyay says:
  July 5, 2018 at 7:07 pm
  Yes, that’s fine.
Ankush Paul says:
February 15, 2019 at 12:29 pm
Hi Roopam,
The content is awesome. But can we have the data that is used here?
Much Appreciated,
Ankush
Syrus Mathew says:
May 12, 2020 at 5:12 am
Hi Roopam,
Thanks for the good work.
Javid says:
December 28, 2020 at 4:37 pm
How can I decrease the number of age groups? For example, I want to see coefficients for 5 groups.

CREDIT

I must thank my wife, Swati Patankar, for being the editor of this blog.
