YOU CANalytics | Data Visualization – Banking Case Study Example (Part 1)

Data Visualization – Banking Case Study Example (Part 1)

 

Leonardo – by Roopam

A Scientist & An Artist

A few weeks ago, while wandering around Florence, the birthplace of the Renaissance, I could not escape the thought of Leonardo da Vinci: the greatest polymath of all time. Leonardo’s illustrious resume contains titles such as painter, inventor, physicist, astronomer, engineer, biologist, anatomist, geologist, and architect – no kidding! A smart cat would have to live all her nine lives to acquire the nine titles Leonardo mastered in one lifetime. Today, while discussing facets of data visualization, we should pay homage to Uncle Leonardo as we cross the realms of both art and science.

Art and Science of Data Visualization

 

Data Visualization – by Roopam

Data visualization, as mentioned earlier, is both art and science. I personally prefer to take a long look at the data, plotting it in various ways, before jumping into rigorous mathematical modeling. You might have noticed my penchant for art in the artwork presented in the posts on this blog. The saying that a picture is worth a thousand words holds true during data analysis as well. Models in analytics can go horribly wrong if you have not spent enough time on the exploratory phase, which to me is all about data visualization. Let me present a case study example to explain the aspects of data visualization during the exploratory phase.

Banking Case Study Example – Risk Management

Assume you are the chief risk officer (CRO) for CyndiCat bank, which disbursed 60,816 auto loans in the quarter April–June 2012. Today, about a year and a quarter after disbursal, the loans have seasoned, meaning that bad loans can be tagged with greater certainty. You have noticed a bad rate of around 2.5%, or 1,524 bad loans out of the 60,816 disbursed.
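As a quick check on the arithmetic, the bad rate is simply the number of bad loans divided by the number of loans disbursed. The short Python sketch below uses only the two figures quoted above; the variable names are illustrative.

```python
# Figures quoted in the case study.
total_loans = 60816
bad_loans = 1524

# Bad rate = bad loans as a share of all disbursed loans.
bad_rate = bad_loans / total_loans
good_loans = total_loans - bad_loans

print(f"Bad rate:   {bad_rate:.2%}")
print(f"Good loans: {good_loans}")
```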

Before you jump to multivariate analysis and credit scoring, you want to analyze the bad rate across several individual variables. Based on your experience, you have a hunch that the borrower’s age at the time of loan disbursal is a key distinguishing factor for bad rates. Therefore, you have divided the loans by borrower age and created a table something like the one below.
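A table of this kind could be built roughly as in the Python sketch below. The 3-year band edges are an assumption inferred from the buckets mentioned in this post (<21, 42 to 45, >60), and the handful of loan records is made up purely for illustration; neither comes from the CyndiCat data.

```python
from bisect import bisect_right

# Assumed band edges in years (the original table is not reproduced here).
EDGES = [21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60]

def age_band(age):
    """Map an age to a band label such as '<21', '42-45' or '>60'."""
    if age < EDGES[0]:
        return f"<{EDGES[0]}"
    if age > EDGES[-1]:
        return f">{EDGES[-1]}"
    # Interior bands are [low, high); the last band also includes age 60.
    i = min(bisect_right(EDGES, age), len(EDGES) - 1)
    return f"{EDGES[i - 1]}-{EDGES[i]}"

# Hypothetical records: (borrower age at disbursal, loan went bad?).
loans = [(19, True), (23, False), (43, False), (44, False),
         (44, True), (58, False), (63, False)]

# Count good and bad loans per age band.
table = {}
for age, is_bad in loans:
    row = table.setdefault(age_band(age), {"good": 0, "bad": 0})
    row["bad" if is_bad else "good"] += 1

for band, row in table.items():
    print(f"{band:>6}  good={row['good']}  bad={row['bad']}")
```

Using fixed edges keeps the bands easy to explain in a business review; the edges themselves remain a judgment call.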

 

Using the above table, you have created a histogram and zoomed into the area of interest (close to the bad loans) as shown in the plots below.

 

You must have noticed the following:

• The distribution of loans across age groups is a reasonably smooth, roughly normal curve without too many outliers. Age often displays this kind of pattern for most products. However, do not expect similarly smooth curves for other variables commonly found in a business scenario. Often, you may have to resort to variable transformation to make the distributions smooth.

• The maximum number of bad loans is in the 42 to 45 years age bucket. This certainly does not mean the risk is also the highest in this bucket. I once heard someone draw exactly that conclusion in a quarterly business review meeting, which is a silly mistake. Note that the maximum number of loans is also in the 42 to 45 years bucket. Absolute numbers do not provide enough information, hence we need to create a normalized plot.

• The data is really thin in the fringe buckets (i.e. the <21 and >60 years groups), with only 9 and 6 data points, so be careful when dealing with such thin data. Sound business knowledge is extremely helpful for adjusting these fringe buckets during model development. For instance, you may know that loans to borrowers above 60 could be highly risky, but this data does not contain enough records to validate that hypothesis. We should supply an appropriate risk weight in such situations; however, be very careful while doing so.
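The second point above, that the bucket with the most bad loans is not necessarily the riskiest, can be illustrated with a minimal Python sketch. The counts are hypothetical and chosen only to make the contrast visible; they are not the CyndiCat figures.

```python
# Hypothetical (good, bad) counts per age bucket; not the CyndiCat figures.
counts = {
    "21-24": (1900, 100),
    "42-45": (9800, 200),
    "57-60": (495, 5),
}

def bad_rate(good, bad):
    """Share of bad loans within a bucket."""
    return bad / (good + bad)

# Bucket with the most bad loans in absolute terms.
most_bad = max(counts, key=lambda band: counts[band][1])

# Bucket with the highest bad rate once each bucket is scaled by its size.
riskiest = max(counts, key=lambda band: bad_rate(*counts[band]))

print("Most bad loans:  ", most_bad)
print("Highest bad rate:", riskiest)
```

In this made-up example the largest bucket holds the most bad loans simply because it holds the most loans, while a much smaller bucket carries the higher rate.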

Normalized Plot

The normalized plot is easy to construct. The idea is to scale each age group to 100% and overlay the percentages of bad and good records on top. We could extend the table shown above to get the values for the normalized plot, as shown below.
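The scaling step can be sketched in Python as follows. The counts are hypothetical placeholders, not the values from the original table, and the helper name normalize is illustrative.

```python
# Hypothetical (good, bad) counts per age bucket; not the CyndiCat figures.
counts = {
    "21-24": (1900, 100),
    "42-45": (9800, 200),
    "57-60": (495, 5),
}

def normalize(counts):
    """Scale each bucket to 100% and return (good %, bad %) per bucket."""
    table = {}
    for band, (good, bad) in counts.items():
        total = good + bad
        if total == 0:
            continue  # an empty bucket has no percentages to report
        table[band] = (100 * good / total, 100 * bad / total)
    return table

normalized = normalize(counts)
for band, (good_pct, bad_pct) in normalized.items():
    print(f"{band:>6}  good {good_pct:6.2f}%  bad {bad_pct:5.2f}%")
```

Each row adds up to 100%, so the bad percentage can be compared directly across buckets of very different sizes.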

 

Now, once you have the table ready you could create a normalized plot quite easily, as shown below (again we have zoomed into the plot to get a clear view of bad rates).

 

These plots are completely different from the original frequency count plot and present the information in a completely different light. The following are the things one could conclude from the plots.

• There is a definite trend between the bad rates and the age groups. Older borrowers are less likely to default on their loans. That is a good insight.

• Again, the fringes (i.e. the <21 and >60 years groups) have thin data, but this cannot be seen from the normalized plot. Hence, you need to keep the frequency plot handy to treat thin data differently. A handy rule of thumb is to have at least 10 records of both good and bad cases before taking the information seriously; with fewer, the bad rate for the bucket is not statistically reliable.
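The rule of thumb above can be turned into a simple check that flags thin buckets before their bad rates are read off the normalized plot. In this Python sketch the fringe totals of 9 and 6 follow the post, but the split between good and bad cases is invented for illustration, as are the names MIN_CASES and is_thin.

```python
# Rule of thumb from the text: at least 10 good and 10 bad cases per bucket.
MIN_CASES = 10

# (good, bad) counts per bucket. The good/bad split is hypothetical.
counts = {
    "<21": (7, 2),
    "42-45": (9800, 200),
    ">60": (6, 0),
}

def is_thin(good, bad, min_cases=MIN_CASES):
    """A bucket is thin if either class has fewer than min_cases records."""
    return good < min_cases or bad < min_cases

thin_buckets = [band for band, (good, bad) in counts.items()
                if is_thin(good, bad)]

print("Treat with caution:", thin_buckets)
```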

I must conclude by saying that data visualization is the beginning of the modeling process and not the destination. However, it is a good and creative beginning.

Sign-off Note

With big data, data analysis tools and technologies, scientific progress, and a democratic environment, we could be living in the Renaissance of our times. However, we will need more Leonardo da Vincis to make these times really special.

23 thoughts on “Data Visualization – Banking Case Study Example (Part 1)”

  1. Amit Chandra says:

    May 15, 2015 at 12:45 pm

    Hi Roopam,

    Thanks a lot buddy for giving a good insight on Analytics. Your blogs are really very informative and made me understand the nitty gritty of Analytics. Look forward to your future posts.. God bless you!!!

    Thanks,
    Amit Chandra

  2. Abhishek Shukla says:

    September 23, 2015 at 4:06 pm

    Hi Roopam,

    Yours is easily the most effective guide on predictive analytics I have come across. You made statistics sound as cool as science and engineering.

    regards,
    Abhishek

  3. Sourav says:

    September 30, 2015 at 4:12 pm

    Hi Roopam,

    I enjoy going through each of your articles. Incredible work!!

    Can you help me understand how you decide on the different age bands?

    Thanks

    • Roopam Upadhyay says:

      October 3, 2015 at 8:42 am

      Thanks Sourav,

      Here the age bands were formed by eyeballing the data. The idea is to look for a significant change in the gradient of average risk from one band to the next. You could also use univariate decision trees (CHAID) to create the bands.

  4. Sylvia says:

    April 3, 2016 at 2:11 am

    Profile analytics is fundamental to enriching the business!! Data mining combined with predictive models is the added value for the business!!
    Your posts are very good!!!! Thank you for sharing!!

  5. Kriti Pandey says:

    May 17, 2016 at 10:52 am

    Here in your graph there is a clear downward trend. What if the data doesn’t have any (increasing or decreasing) trend? In that case the WOE values cannot be monotonically increasing or decreasing. What should be done in those cases?

    • Roopam Upadhyay says:

      May 25, 2016 at 9:49 am

      A monotonically decreasing or increasing trend is not the primary requirement for developing models. For instance, age forms a U-shaped plot against bad rate for banks in developed economies. This is logical, since the repayment capability of the elderly is not as strong as that of middle-aged working professionals. The data used in this case study is for a developing economy where borrowers’ age is fictitiously capped at 60; I hope you have noticed the thin data for ages above 57 years. Hence, the important condition is logical consistency rather than the trend line. A regular trend line is, in many cases, easier to justify logically than a randomly fluctuating one.

  6. Mohi says:

    August 26, 2016 at 2:46 pm

    Hi Roopam,

    Thanks a lot for the good insights on analytics.

    Sir, I did not understand the meaning of thin data, which you referred to in the article. Can you make it clear for me?

  7. Raghu says:

    October 12, 2016 at 3:25 pm

    Thanks Roopam – Your blogs are quite interesting, informative and intuitive.

  8. dee says:

    December 15, 2016 at 6:22 pm

    is there a sample data set for this example that I can use for practice?

  9. Asutosh says:

    May 9, 2017 at 11:53 pm

    Hi Roopam,

    Here you have considered age as the variable for binning and categorizing the data. Let’s say there is another variable, income, along with age. This variable can also determine the good and bad rates. How do you now decide which is the distinguishing factor for bad rates?

  10. Marcelo says:

    May 24, 2017 at 3:38 am

    Hi Roopam,
    it is a good way to learn analytics. Excellent work!.

  11. Deepak Kulkarni says:

    February 14, 2018 at 1:43 pm

    Hi Roopam,

    Hope you are doing well.
    It was one of my best experiences working with you. You are a valuable asset to any Organization and a best guide 

    Good luck.

    Deepak

  12. Tejas Sanap says:

    February 15, 2018 at 9:25 pm

    dear sir,
    Can you share the Jupyter notebook/code you used in this article?

  13. Chetan says:

    June 25, 2018 at 10:52 am

    Dear Roopam,

    Let’s say I replace the original variables with WOE and then check for multicollinearity. Is this the right procedure?

  14. Ankush Paul says:

    February 15, 2019 at 12:29 pm

    Hi Roopam,

    The content is awesome. But can we have the data that is used here?

    Much Appreciated,
    Ankush

  15. Syrus Mathew says:

    May 12, 2020 at 5:12 am

    Hi Roopam,
    Thanks for the good work.

  16. Javid says:

    December 28, 2020 at 4:37 pm

    How can I decrease the number of age groups? For example, I want to see coefficients for 5 groups.

CREDIT

I must thank my wife, Swati Patankar, for being the editor of this blog.

© Roopam Upadhyay
