
It’s been nearly three years since I last wrote an article for this site so it’s perhaps cheeky of me to use it for my personal gain, but here you go: those three years have not been spent idle, on the writing front, it’s just that I’ve been writing about something a little more exciting. Like shipwrecks, car chases, and upside down shootouts.

Drachen, my debut thriller, is available on pre-order at Amazon (US: http://amzn.com/B0133U3HGC UK: http://www.amazon.co.uk/dp/B0133U3HGC)

 

A marine archaeologist standing up for herself. A psychopath with mother issues. A hitman who hates failure. A soldier with a point to prove. A policeman out on a limb. And a treasure that tests every allegiance.

 

Brett Rivera might not know exactly what’s going on, or who she can trust, but she’s in the race of her life and she knows she’s not going to give up; after three years of searching she has found the wreck of the Drachen. It goes downhill from there: first the hold is empty, and then she’s attacked, and then she’s almost killed.

Why is a mother-obsessed psychopath spending so much money to catch her? Who is the British soldier really? How are the hazy amber globe and the rusted keys she recovered supposed to help her locate the Hanseatic League’s greatest lost treasure?

Brett doesn’t know, but she has two things in her favour: Patrick, her best friend, and an ancient book which just might be the missing piece. She is pursued in Finland, double-crossed in Tallinn, abducted in Lübeck, shot at in Bremen, and she’s not taking it lying down.

 

A shipwreck. A lost treasure. A hell of a race from one to the other.

 

Reject Inference

I wrote my layman’s introduction to scoring a while ago now and never delivered the promised more in-depth articles. This is the first in a line of articles correcting that oversight. The team at Scorto has very kindly provided me with a white paper on scorecard building, which I will break into sections and reproduce here. In the first of those articles, I’ll look into reject inference, a topic that has been asked about before.

One of the inherent problems with a scorecard is that while you can easily test whether you made the right decision in accepting an application, it is less easy to know whether you made the right decision in rejecting an application. In the day-to-day running of a business this might not seem like much of a problem, but it is dangerous in two ways:
· it can limit the highly profitable growth opportunities around the cut-off point by hiding any segmenting behaviour a characteristic might have; and
· it can lead to a point where the data that is available for creating new scorecards represents only a portion of the population likely to apply. As this portion is disproportionately ‘good’ it can cause future scorecards to under-estimate the risk present in a population.

Each application provides a lender with a great deal of characteristic data: age, income, bureau score, etc. That application data is expensive to acquire, but of limited value until it is connected with behavioural data. When an application is approved, that value-adding behavioural data follows as a matter of course and comes cheaply: did the customer of age x and with income of y and a bureau score of z go “bad” or not? Every application that is rejected gets no such data, unless we go out of our way to get it; and that’s where reject inference comes into play.

The general population in a market will have an average probability of bad that is influenced by various national and economic characteristics, but is generally stable. A smaller sub-population will make up the total population of applicants for any given loan product; the average probability of bad in this total population will rise and fall more easily depending on marketing and product design. It is the risk of that total population of applicants that a scorecard should aim to understand.

However, the data from existing customers is not a full reflection of that population. It has been filtered through the approval process and stripped of a lot of its bads. Very often, the key data problem when building a scorecard is the lack of information on “bad”, since that’s what we’re trying to model: the probability that an application with a given set of characteristics will end up “bad”. The more conservative the scoring strategy in question, the more the data will become concentrated in the better score quadrants and the weaker it will become for future scorecard builds.

Clearly we need a way to bring back that information. Just because the rejected applications were too risky to approve doesn’t mean they’re too risky to add value in this exercise. We do this by combining the application data of the rejected applicants with external data sources or proxies. The main difficulty with this approach is the unavailability and/or inconsistency of the data, which may make it difficult to classify an outcome as “good” or “bad”. A number of methods can be used to infer the performance of rejected applicants.

Simple Augmentation
Not all rejected applications would have gone bad. We knew this at the time we rejected them; we just knew that too few would stay good to compensate for those that did go bad. So while a segment of applications with a 15% probability of bad might be deemed too risky, 85% of them would still be good accounts. Using that knowledge we can reconsider the rejected applications in the data exercise.

· A base scoring model is built using data from the borrowers whose behavior is known – the previously approved book.
· Using the developed model, the rejected applications are scored and an estimation is made of the percentage of “bad” borrowers and that performance is assigned at random but in proportion across the rejected applications.
· The cut-off point should be set in accordance with the rules of the current lending policy that define the permissible level of bad borrowers.
· Information on the rejected and approved requests is merged and the resulting set is used to build the final scoring model.
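
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `base_p_bad` is a hypothetical stand-in for whatever base model was built on the approved book, the data layout is invented for the example, "in proportion" is interpreted as labelling the expected number of bads across the whole rejected set, and the cut-off step is left out.

```python
import random

def simple_augmentation(approved, rejected, base_p_bad, seed=0):
    """Infer outcomes for rejects and merge them with the approved book.

    approved:   list of (application, is_bad) pairs with known behaviour
    rejected:   list of applications with no behavioural data
    base_p_bad: hypothetical callable returning the base model's
                estimated probability of bad for one application
    """
    rng = random.Random(seed)
    # Score every rejected application with the base model.
    p_bads = [base_p_bad(app) for app in rejected]
    # Estimate how many rejects would have gone bad.
    n_bad = round(sum(p_bads))
    # Assign the "bad" label at random, but in that proportion.
    bad_idx = set(rng.sample(range(len(rejected)), n_bad))
    inferred = [(app, i in bad_idx) for i, app in enumerate(rejected)]
    # Merge approved and inferred records for the final model build.
    return approved + inferred

# Made-up data: 20 rejects, each with an estimated 15% probability of bad.
approved = [({"bureau_score": 700}, False), ({"bureau_score": 650}, True)]
rejected = [{"bureau_score": 550} for _ in range(20)]
combined = simple_augmentation(approved, rejected, lambda app: 0.15)
inferred_bads = sum(1 for _, is_bad in combined[len(approved):] if is_bad)
print(len(combined), inferred_bads)
```

In practice the proportion would more sensibly be applied within score bands rather than across the whole rejected set; the single pool here just keeps the sketch short.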

Accept/ Reject Augmentation
The basis of this method is the correction of the weights of the base scoring model by taking into consideration the likelihood of the request’s approval.
· The first step is to build a model that evaluates the likelihood of a request’s approval or rejection.
· The weights of the characteristics are adjusted taking into consideration the likelihood of the request’s approval or rejection, determined during the previous step. This is done so that the weight given to each approved record is inversely proportional to the likelihood of the request’s approval. So, for example, if the original approval rate was 50% in a certain cluster then each approved record is replicated to stand in for itself and the one that was rejected.
· This method is preferable to the Simple Augmentation method, but not without its own drawbacks. Two key problems can be created by augmentation: the impact of small and unusual groups can be exaggerated (such as low-side overrides for VIP clients) and, because you’ve only modeled on approved accounts, the approval rates will be either 0% or 100% in each node.
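
As a rough illustration of the replication idea, the sketch below derives a weight per cluster from the observed approval rate. The `cluster` and `approved` field names are assumptions made for the example, and a real build would take the approval likelihood from the accept/reject model rather than from raw counts.

```python
from collections import defaultdict

def augmentation_weights(applications):
    """Return {cluster: weight} where weight = 1 / approval rate.

    applications: list of dicts with hypothetical keys
                  'cluster' (any hashable) and 'approved' (bool)
    """
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for app in applications:
        totals[app["cluster"]] += 1
        approvals[app["cluster"]] += 1 if app["approved"] else 0

    weights = {}
    for cluster, n in totals.items():
        if approvals[cluster] == 0:
            # No approved records exist to stand in for the rejects.
            continue
        # total / approved is the inverse of the approval rate.
        weights[cluster] = n / approvals[cluster]
    return weights

# Made-up data: cluster A approves 2 of 4, cluster B approves 4 of 5.
apps = (
    [{"cluster": "A", "approved": True}] * 2
    + [{"cluster": "A", "approved": False}] * 2
    + [{"cluster": "B", "approved": True}] * 4
    + [{"cluster": "B", "approved": False}] * 1
)
weights = augmentation_weights(apps)
print(weights)
```

With a 50% approval rate, each approved record in cluster A carries a weight of 2: it stands in for itself and for one reject. A cluster with very few approvals would get a very large weight, which is the exaggeration problem noted above.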

Fuzzy Augmentation
The distinguishing feature of this method is that each rejected request is split and used twice, to reflect the likelihood of each of the good and bad outcomes. In other words, if a rejected application has a 15% probability of going bad it is split and 15% of the person is assumed to go bad and 85% assumed to stay good.
· Classification
– Evaluation of a set of the rejected requests is performed using a base scoring model that was built based on requests with a known status;
– The likelihood of a default p(bad) and that of the “good” outcome p(good) are determined based on the set cut-off point, defining the required percentage of the “bad” requests (p(bad)+p(good)=1);
– Two records that correspond to the likelihood of the “good” and “bad” outcomes are formed for each rejected request;
– Evaluation of the rejected requests is performed taking into consideration the likelihood of the two outcomes. Those accounts that fall under the likelihood of the “good” outcome are assigned the weight p(good). The accounts that fall under the likelihood of the “bad” outcome are assigned the weight p(bad).
· Clarification
– The data on the approved requests is merged with the data on the rejected requests and the rating of each request is adjusted taking into consideration the likelihood of the request’s further approval. For example, the frequency of the “good” outcome for a rejected request is evaluated as the result of the “good” outcome multiplied by the weight coefficient.
– The final scoring model is built based on the combined data set.
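
The splitting step might look something like this in code. It is a sketch only: `base_p_bad` is again a hypothetical stand-in for the base model, approved records are given a weight of 1, and the further adjustment for approval likelihood described under Clarification is not shown.

```python
def fuzzy_augmentation(approved, rejected, base_p_bad):
    """Build a weighted data set of (application, is_bad, weight) records.

    approved:   list of (application, is_bad) pairs with known behaviour
    rejected:   list of applications with no behavioural data
    base_p_bad: hypothetical callable returning the base model's
                estimated probability of bad for one application
    """
    # Known outcomes keep a full weight of 1.
    records = [(app, is_bad, 1.0) for app, is_bad in approved]
    for app in rejected:
        p_bad = base_p_bad(app)
        p_good = 1.0 - p_bad  # p(bad) + p(good) = 1
        # Each reject is used twice: once as bad, once as good.
        records.append((app, True, p_bad))
        records.append((app, False, p_good))
    return records

# Made-up data: one reject with an estimated 15% probability of bad.
approved = [({"bureau_score": 700}, False)]
rejected = [{"bureau_score": 550}]
records = fuzzy_augmentation(approved, rejected, lambda app: 0.15)
for record in records:
    print(record)
```

The two records for each reject always sum to a weight of 1, so a rejected applicant never counts for more than one person in the final build.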

Reject inference is not a silver bullet. Used inexpertly it can lead to less accurate rather than more accurate results. Wherever possible, it is better to augment the exercise with a test-and-learn experiment to understand the true performance of small portions of key rejected segments. Then a new scorecard can be built based on the data from this new test segment alone, and the true bad rates from that model can be compared with, and averaged against, those from the reject inference model to get a more reliable bad rate for the rejected population.

RHINO REFLECTIONS


 

On a completely unrelated matter, but one a lot more important than credit risk strategy in the grander scheme of things, it’s great to see Chinese celebrities raising awareness about the terrible and futile cost of the trade in rhino horn.

We usually assume that in a given situation, the more conservative of two strategies will better protect the bank’s interest. So, in the sort of uncertain times that we are facing now, it is common to migrate towards more conservative approaches, but this isn’t always the best approach.
In fact, a more conservative approach can sometimes encourage the sort of behaviour that it aims to prevent. Provisions are a case in point.

Typically provisions are calculated based on a bank’s experience of risk over the last 6 months – as reflected in the net roll-rates. This period is long enough to smooth out any once-off anomalies and short enough to react quickly to changing conditions.
However, we were recently asked if it wouldn’t be more conservative to use the worst net roll-rates over the last 10 years. While this is technically more conservative (since the worst roll-rates in 120 months are almost certainly worse than the worst roll-rates in 6 months) it could actually help to create a higher risk portfolio. Yes, the bank would immediately be more secure, but over time two factors are likely to push risk in the wrong direction:

1) The provision rate is an important source of feedback. It tells the originations team a lot about the risk that is coming into the portfolio from internal and external forces. The sooner the provisions react to new risks, the sooner the originations strategies can be adjusted. So, because a 10 year worst case scenario is an almost static measure and unaffected by changes in risk, new risk could be entering the portfolio without triggering any warnings. A slow and unintentional slide in credit quality will result.
2) Admittedly, other metrics can alert a lender to increases in risk, but there is another incentive at work because provisions are the cost of carrying risk; by setting the cost of risk at a static and artificially high level you change the risk-reward dynamic in a portfolio.
A low risk customer segment should have a low cost of risk, allowing you to grow a portfolio by lending to low risk/ low margin customers. However, if all customers were to carry a high cost of risk regardless, only high margin customers would be profitable; and since high margin customers are usually also higher risk, there would be an incentive to grow the portfolio in the most risky segments.
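
A small worked example shows how the incentive flips. All figures are hypothetical, expressed in basis points of balance per year, and the static cost of risk is simply set to the high risk segment's expected loss for illustration.

```python
# Hypothetical figures in basis points; none of these come from a real book.
SEGMENTS = {
    "low_risk": {"margin": 300, "expected_loss": 100},
    "high_risk": {"margin": 1000, "expected_loss": 600},
}
# Assumed static, worst-case cost of risk applied to every customer.
STATIC_COST_OF_RISK = 600

def net_margin(margin, cost_of_risk):
    """Margin left after paying for the risk carried."""
    return margin - cost_of_risk

def compare(segments, static_cost):
    """Net margin per segment under risk-based and static provisioning."""
    result = {}
    for name, seg in segments.items():
        result[name] = {
            "risk_based": net_margin(seg["margin"], seg["expected_loss"]),
            "static": net_margin(seg["margin"], static_cost),
        }
    return result

comparison = compare(SEGMENTS, STATIC_COST_OF_RISK)
for name, nets in comparison.items():
    print(name, nets)
```

Under risk-based provisioning both segments are profitable; under the static cost, the low risk segment loses 300 basis points while the high risk segment is unaffected, so growth is pushed towards the riskier customers.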

In cases where the future is expected to be significantly worse than the recent past, it is therefore better to apply a flat provision overlay: a once-off increase in provisions that will increase coverage but still allow provisions to rise and fall with changing risk.
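
To make the difference concrete, the sketch below compares the two approaches on an invented roll-rate history. It is a simplification: a single roll-rate series (in percent) stands in for the full provision calculation, and the history, the spike, and the overlay size are all assumptions for the example.

```python
def recent_average(roll_rates, window=6):
    """Average roll-rate over the most recent `window` months."""
    recent = roll_rates[-window:]
    return sum(recent) / len(recent)

def worst_case(roll_rates):
    """Worst roll-rate seen anywhere in the history."""
    return max(roll_rates)

def with_overlay(roll_rates, overlay, window=6):
    """Recent average plus a flat, once-off overlay."""
    return recent_average(roll_rates, window) + overlay

# Invented 114-month history: stable at 4% with one 9% crisis month.
history = [4.0] * 60 + [9.0] + [4.0] * 53
calm = history + [4.0] * 6           # latest six months unchanged
deteriorating = history + [5.0] * 6  # latest six months show new risk
OVERLAY = 2.0                        # assumed flat overlay, in percent

print(worst_case(calm), worst_case(deteriorating))
print(with_overlay(calm, OVERLAY), with_overlay(deteriorating, OVERLAY))
```

The worst-case measure sits at 9% in both scenarios and gives the originations team no signal, while the overlay version moves from 6% to 7% as the recent roll-rates worsen.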

You will almost certainly have heard the phrase, ‘you can’t manage what you don’t measure’. This is true, but there is a corollary to that phrase which is often not considered, ‘you have to manage what you do measure’.

To manage a business you need to understand it, but more reports do not necessarily mean a deeper understanding. More reports do, however, mean more work, often exponentially more work. So while regular reporting is obviously important for the day-to-day functioning of a business, its extent should be carefully planned.
Since I started this article with one piece of trite wisdom, I’ll continue. I’m trying to write my first novel – man cannot live on tales of credit risk strategy alone – and in a writing seminar I attended the instructor made reference to this piece of wisdom which he picked up in an otherwise forgettable book on script writing, ‘if nothing has changed, nothing has happened’.
It is important to look at the regular reports generated in an organization with this philosophy in mind – do the embedded metrics enable the audience present to change the business? If the audience is not going to – or is not able to – change anything based on a metric then nothing is actually happening and if nothing is happening, why are we spending money doing it?
Don’t get me wrong, I am an ardent believer in the value of data and data analytics, I just question the value in regular reporting. Those two subjects are definitely related, but they’re not just different, at times I believe they are fundamentally opposed.

An over-reliance on reporting can damage a business in four ways:

Restricting Innovation and Creativity
Raw data – stored in a well-organized and accessible database – encourages creative and insightful problem solving, it begs for innovative relationships to be found, provides opportunities for surprising connections to be made, and encourages ‘what if’ scenario planning.
Reports are tools for managing an operation. Reports come with ingrained expectations and encourage more constrained and retrospective analysis. They ask questions like ‘did what we expect to happen, actually happen’.
The more an organization relies on reports the more, I believe, it will tend to become operational in nature and backward focused in its analytics, asking and explaining what happened last month and how that was different to plan and to the month before. Yes, it is important to know how many new accounts were opened and whether that was more or less than planned for in the annual budget, but no one ever changed the status quo by knowing how many accounts they had opened.
The easiest way to look good as the analytics department in an organization with a heavy focus on reports, is to get those reports to show stable numbers in-line with the annual plan, thus raising as few questions as possible; and the easiest way to do that is by implementing the same strategy year after year. To look good in an organization that understands the real value of data though, an analytics department has to add business value, has to delve into the data and has to come up with insightful stories about relationships that weren’t known last year, designing and implementing innovative strategies that are by their nature hard to plan accurately in an annual budgeting process, but which have the potential to change an industry.

Creating a False Sense of Control
Reports also create an often false sense of accuracy. A report, nicely formatted and with numbers showing month-on-month and year-to-date changes to the second decimal point, carries a sense of presence; if the numbers today look like the numbers did a year ago they feel like they must be right, but if the numbers today look like the numbers did a year ago there is also less of an incentive to test the underlying assumptions and the numbers can only ever be as accurate as those assumptions: how is profit estimated, how is long-term risk accounted for, how are marketing costs accounted for, how much growth is assumed, etc. and is this still valid?
Further, in a similar way to how too many credit policies can end up reducing the accountability of business leaders rather than increasing it, when too much importance is placed on reporting, managers become accountable for knowing their numbers rather than knowing their businesses. If you can say how much your numbers changed month-on-month but not why, then you’re focusing on the wrong things.

Raising Costs
Every report includes multiple individual metrics and goes to multiple stakeholders, and each of those metrics has the potential to raise a question with each of those stakeholders. This is good if the question being raised influences the actions of the business, but the work involved in answering a question is not related to the value of answering it, and so as more metrics of lesser importance are added to a business’ vocabulary, the odds of a question generating non-value-adding work increase exponentially.
Once it has been asked, it is hard to ignore a question pertaining to a report without looking like you don’t understand your business, but sometimes the opposite is true. If you really understand your business you’ll know which metrics are indicative of its overall state and which are not. While your own understanding of your business should encompass the multiple and detailed metrics impacting your business, you should only be reporting the most important of those to broader audiences.
And it is not just what you’re reporting, but to whom. Often a question asked out of interest by an uninvolved party can trigger a significant amount of work without providing any extra control or oversight. Better reports and better audiences should therefore replace old ones and metrics that are not value-adding in a context should not be displayed in that context; or the audience needs to change until the context is right.

Compounding Errors
The biggest problem, though, that I have with a report-based approach is the potential for compounding errors. When one report is compiled based off another report there is always the risk that an error in the first will be included in the second. This actually costs the organization in two ways: firstly the obvious risk of incorrectly informed decisions and secondly in the extra work needed to stay vigilant to this risk.
Numbers need to be checked and rechecked, formats need to be aligned or changed in synchronization, and reconciliations need to be carried out where constant differences exist – month-end data versus cycle end data, monthly average exchange rates versus month-end exchange rates, etc.
Time should never be spent getting the numbers to match; that changes nothing. Time should rather be spent creating a single source of data that can be accessed by multiple teams and which can be left in its raw state. Any customization of the data happening in one team will therefore remain isolated from all other teams.

Reports are important and will remain so, but their role should be understood. A few key metrics should be reported widely and these should each add a significant and unique piece of information about an organization’s health. One level down, a similar report should break down the team’s performance, but beyond that time and resources should be invested in the creative analysis of raw data, encouraging the creation of analytics-driven business stories.
Getting this right will involve a culture change more than anything, a move away from trusting the person who knows their numbers to trusting the person who provides the most genuine insight.
I know of a loan origination operation that charges sales people a token fee for any declined application which they asked to be manually referred, forcing them to consider the merits of the case carefully before adding to the costs. A similar approach might be helpful here, charging audiences for access to monthly reports on a per metric basis – this could be an actual monetary fine which is saved up for an end of year event or a virtual currency awarded on a quota basis.