Posts Tagged ‘Credit Risk Management’

I wrote my layman’s introduction to scoring a while ago now and never delivered the promised more in-depth articles. This is the first in a line of articles correcting that oversight. The team at Scorto has very kindly provided me with a white paper on scorecard building, which I will break into sections and reproduce here. In the first of those articles, I’ll look into reject inference, a topic that has been asked about before.

One of the inherent problems with a scorecard is that while you can test easily test whether you made the right decision in accepting an application, it is less easy to know whether you made the right decision in rejecting an application. In the day-to-day running of a business this might not seem like much of a problem, but it is dangerous in two ways: · it can limit the highly profitable growth opportunities around the cut-off point by hiding any segmenting behaviour a characteristic might have; and · it can lead to a point where the data that is available for creating new scorecards represents only a portion of the population likely to apply. As this portion is disproportionately ‘good’ it can cause future scorecards to under-estimate the risk present in a population. Each application provides a lender with a great deal of characteristic data: age, income, bureau score, etc. That application data is expensive to acquire, but of limited value until it is connected with behavioural data. When an application is approved, that value-adding behavioural data follows as a matter of course and comes cheaply: did the customer of age x and with income of y and a bureau score of z go “bad” or not? Every application that is rejected gets no such data. Unless we go out of our way to get it; and that’s where reject inference comes into play.

The general population in a market will have an average probability of bad that is influenced by various national and economic characteristics, but generally stable. A smaller sub-population will make-up the total population of applicants for any given loan product –the average probability of bad in this total population will rise and fall more easily depending on marketing and product design. It is the risk of that total population of applicants that a scorecard should aim to understand. However, the data from existing customers is not a full reflection of that population. It has been filtered through the approval process it stripped of a lot of its bads. Very often, the key data problem when building a scorecard build is the lack of information on “bad” since that’s what we’re trying to model, the probability an application with a given set of characteristics will end up “bad”. The more conservative the scoring strategy in question, the more the data will become concentrated in the better score quadrants and the weaker it will become for future scorecard builds. Clearly we need a way to bring back that information. Just because the rejected applications were too risky to approve doesn’t mean they’re too risky to add value in this exercise. We do this by combining the application data of the rejected applicants with external data sources or proxies. The main difficulty related to this approach is the unavailability and/ or inconsistency of the data which may make it difficult to classify an outcome as “good” or “bad”. A number of methods can be used to infer the performance of rejected applicants.

Simple Augmentation
Not all rejected applications would have gone bad. We knew this at the time we rejected them, we just knew that too few would stay good to compensate for those that did go bad. So while a segment of applications with a 15% probability of bad might be deemed too risky, 85% of them would still be good accounts. Using that knowledge we can reconsider the rejected applications in the data exercise.

· A base scoring model is built using data from the borrowers whose behavior is known – the previously approved book.
· Using the developed model, the rejected applications are scored and an estimation is made of the percentage of “bad” borrowers and that performance is assigned at random but in proportion across the rejected applications.
· The cut-off point should be set in accordance with the rules of the current lending policy that define the permissible level of bad borrowers.
· Information on the rejected and approved requests is merged and the resulting set is used to build the final scoring model.

Accept/ Reject Augmentation
The basis of this method consists in the correction of the weights of the base scoring model by taking into consideration the likelihood of the request‘s approval.
· The first step is to build a model that evaluates the likelihood of a requests approval or rejection. · The weights of the characteristics are adjusted taking into consideration the likelihood of the request‘s approval or rejection, determined during the previous step. This is done so that the resulting scores are inversely proportional to the likelihood of the request‘s approval. So, for example, if the original approval rate was 50% in a certain cluster then each approved record is replicated to stand in for itself and the one that was rejected.
· This method is preferable to the Simple Augmentation method, but not without its own drawbacks. Two key problems can be created by augmentation: the impact of small and unusual groups can be exaggerated (such as low-side overrides for VIP clients) and then because you’ve only modeled on approved accounts the approval rates will be either 0% or 100% in each node.

Fuzzy Augmentation
The distinguishing feature of this method is that each rejected request is split and used twice, to reflect each of the likelihood of the good and bad outcomes. In other words, if a rejected application has a 15% probability of going bad it is split and 15% of the person is assumed to go bad and 85% assumed to stay good.
· Classification
Evaluation of a set of the rejected requests is performed using a base scoring model that was built based on requests with a known status;
– The likelihood of a default p(bad) and that of the “good” outcome p(good) are determined based on the set cut-off point, defining the required percentage of the “bad” requests (p(bad)+p(good)=1); – Two records that correspond to the likelihood of the “good” and “bad” outcomes are formed for each rejected request;
– Evaluation of the rejected requests is performed taking into consideration the likelihood of the two outcomes. Those accounts that fall under the likelihood of the “good” outcome are assigned with the weight p(good). The accounts that fall under the likelihood of the “bad” outcome are assigned with the weight p(bad).
· Clarification
– The data on the approved requests is merged with the data on the rejected requests and the rating of each request is adjusted taking into consideration the likelihood of the request‘s further approval. For example, the frequency of the “good” outcome for a rejected request is evaluated as the result of the “good” outcome multiplied by the weight coefficient.
– The final scoring model is built based on the combined data set.

Reject inference is no a single silver bullet. Used inexpertly it can lead to less accurate rather than more accurate results. Wherever possible, it is better to augment the exercise with a test-and-learn experiment to understand the true performance of small portions of key rejected segments. Then a new scorecard can be built based on the data from this new test segment alone and the true bad rates from that model can be compared and averaged to those from the reject inference model to get a more reliable bad rate for the rejected population.


Read Full Post »

You will almost certainly have heard the phrase, ‘you can’t manage want you don’t measure’. This is true, but there is a corollary to that phrase which is often not considered, ‘you have to manage what you do measure’.

To manage a business you need to understand it, but more reports do not necessarily mean a deeper understanding. More reports do, however, mean more work, often exponentially more work. So while regular reporting is obviously important for the day-to-day functioning of a business, its extent should be carefully planned.
Since I started this article with one piece of trite wisdom, I’ll continue. I’m trying to write my first novel – man can not live on tales of credit risk strategy alone – and in a writing seminar I attended the instructor made reference to this piece of wisdom which he picked-up in an otherwise forgettable book on script writing, ‘if nothing has changed, nothing has happened’.
It is important to look at the regular reports generated in an organization with this philosophy in mind – do the embedded metrics enable the audience present to change the business? If the audience is not going to – or is not able to – change anything based on a metric then nothing is actually happening and if nothing is going happening, why are we spending money doing it?
Don’t get me wrong, I am an ardent believer in the value of data and data analytics, I just question the value in regular reporting. Those two subjects are definitely related, but they’re not just different, at times I believe they are fundamentally opposed.

An over-reliance on reporting can damage a business in four ways:

Restricting Innovation and Creativity
Raw data – stored in a well-organized and accessible database – encourages creative and insightful problem solving, it begs for innovative relationships to be found, provides opportunities for surprising connections to be made, and encourages ‘what if’ scenario planning.
Reports are tools for managing an operation. Reports come with ingrained expectations and encourage more constrained and retrospective analysis. They ask questions like ‘did what we expect to happen, actually happen’.
The more an organization relies on reports the more, I believe, it will tend to become operational in nature and backward focused in its analytics, asking and explaining what happened last month and how that was different to plan and to the month before. Yes it is import to know how many new accounts were opened and whether that was more or less than planned for in the annual budget, but no one ever changed the status quo by knowing how many accounts they had opened.
The easiest way to look good as the analytics department in an organization with a heavy focus on reports, is to get those reports to show stable numbers in-line with the annual plan, thus raising as few questions as possible; and the easiest way to do that is by implementing the same strategy year after year. To look good in an organization that understands the real value of data though, an analytics department has to add business value, has to delve into the data and has to come up with insightful stories about relationships that weren’t known last year, designing and implementing innovative strategies that are by their nature hard to plan accurately in an annual budgeting process, but which have the potential to change an industry.

Creating a False Sense of Control
Reports also create an often false sense of accuracy. A report, nicely formatted and with numbers showing month-on-month and year-to-date changes to the second decimal point, carries a sense of presence; if the numbers today look like the numbers did a year ago they feel like they must be right, but if the numbers today look like the numbers did a year ago there is also less of an incentive to test the underlying assumptions and the numbers can only ever be as accurate as those assumptions: how is profit estimated, how is long-term risk accounted for, how are marketing costs accounted for, how much growth is assumed, etc. and is this still valid?
Further, in a similar way to how too many credit policies can end up reducing the accountability of business leaders rather than increasing it, when too much importance is placed on reporting managers become accountable for knowing their numbers, rather than knowing their businesses. If you can say how much your numbers changed month-on-month but not why, then you’re focusing on the wrong things.

Raising Costs
Every report includes multiple individual metrics and goes to multiple stakeholders, each of those metrics has the potential to raise a question with each of those stakeholders. This is good if the question being raised influences the actions of the business, but the work involved in answering a question is not related to the value of answering it and so as more metrics of lesser importance are added to a business’ vocabulary, the odds of a question generating non-value-adding work increases exponentially.
Once it has been asked, it is hard to ignore a question pertaining to a report without looking like you don’t understand your business, but sometimes the opposite is true. If you really understand your business you’ll know which metrics are indicative of its overall state and which are not. While your own understanding of your business should encompass the multiple and detailed metrics impacting your business, you should only be reporting the most important of those to broader audiences.
And it is not just what you’re reporting, but to whom. Often a question asked out of interest by an uninvolved party can trigger a significant amount of work without providing any extra control or oversight. Better reports and better audiences should therefore replace old ones and metrics that are not value-adding in a context should not be displayed in that context; or the audience needs to change until the context is right.

Compounding Errors
The biggest problem, though, that I have with a report-based approach is the potential for compounding errors. When one report is compiled based off another report there is always the risk that an error in the first will be included in the second. This actually costs the organization in two ways: firstly the obvious risk of incorrectly informed decisions and secondly in the extra work needed to stay vigilant to this risk.
Numbers need to be checked and rechecked, formats need to be aligned or changed in synchronization, and reconciliations need to be carried out where constant differences exist – month-end data versus cycle end data, monthly average exchange rates versus month-end exchange rates, etc.
Time should never be spent getting the numbers to match; that changes nothing. Time should rather be spent creating a single source of data that can be accessed by multiple teams and which can be left in its raw state, any customization of the data happening in one team will therefore remain isolated from all other teams.

Reports are important and will remain so, but their role should be understood. A few key metrics should be reported widely and these should each add a significant and unique piece of information about an organization’s health, at one level down a similar report should break down the team’s performance, but beyond that time and resources should be invested in the creative analysis of raw data, encouraging the creation of analytics-driven business stories.
Getting this right will involve a culture change more than anything, a move away from trusting the person who knows their numbers to trusting the person who provides the most genuine insight.
I know of a loan origination operation that charges sales people a token fee for any declined application which they asked to be manually referred, forcing them to consider the merits of the case carefully before adding to the costs. A similar approach might be helpful here, charging audiences for access to monthly reports on a per metric basis – this could be an actual monetary fine which is added saved up for an end of year event or a virtual currency awarded on a quota basis.

Read Full Post »

Every lending organisation needs a good credit policy but at what point does ‘good’ policy become ‘too much’ policy?

There is of course a trade-off between risk control and operational efficiency but their relationship isn’t always as clear as you might think and it continues to evolve as the lending industry moves away from hard-coded, one-size-fits-all rules to more dynamic strategies.

So how do we know when there is too much policy? In my opinion, a credit policy is more like the army than it is like the police force – it should establish and defend the boundaries of the lending decision but problems arise when it becomes actively involved within those borders.
This manifests itself in the common complaint of policy teams spending too much time managing day-to-day policy compliance and too little time thinking about a policy’s purpose; creating a culture where people ask a lot of questions about ‘how’ something is done in a particular organization but very few about ‘why’ it is done.
This is a problem of policy process as well as policy content.

The process should not shift accountability along a sign-off chain
Credit policies tend to generate supporting processes that can easily devolve to a point where even simple change requests must pass through a complex sign-off chain. Where does the accountability reside in such a chain?
All too often, only at the top; of course it is important for the most senior approver to be accountable but it is even more important that the original decision-maker is accountable and the further up the chain a decision moves the less likely this is the case. Each new signature should not represent the new owner of the accountability but rather the new additional co-owner with the original decision-maker.
Of course there are situations where a chain of sign-offs is a genuine safe-guard but in many more cases they serve to undermine the decision-making process by removing that key relationship between action and accountability. As a result the person proposing an action is able to make lax decisions while the person agreeing to them is removed from the information and so more prone to oversights; bad proposals consume resources while moving up and down the sign-off chain or, worse, slip through the gaps and are approved.

The first step back in the right direction is to remove the policy team from the sign-off process. Since the policy already reflects their views, their sign-off is redundant. Instead the business owner should be able to sign-off to the fact that the proposed change is within the parameters set out in the policy and should be held accountable for that fact.
By doing this the business can make faster decisions while simultaneously been forced to better understand the policy. But does it mean that the policy team should just sit back and assume all of the policies are being adhered to? No. The policy team still plays two important roles in the process: they provide guidance as needed and they monitor the decisions that have already been made, only now they do so outside of the sign-off process. In most cases there is sufficient time between a decision being made and it being implemented for a re-active check to still be
The only cases that should require direct pre-emptive input from the policy team are those that the product team feels breach the current policy; which brings us to the second solution.

The content should not assume accountability, the person should
A credit policy that is rich in detail is also a credit policy that is likely to generate many and insignificant breaches and thus a constant stream of work for the policy team. Over time it is easy for any policy to evolve in this way as new rules and sub-rules get added to accommodate perceived new risks or to adjust to changing circumstances; indeed it is often in the policy team’s interest to allow it to do so. However, extra detail almost always leads to higher, not lower, risk.
Firstly, a complex policy is less likely to be understood and therefore more likely to lead to accountability shifting to the policy team through the sign-off chain as discussed above. By increasing the volume of ‘policy exception’ cases you also reduce the time and resources available for focusing on each request and so important projects may receive less diligence than they deserve.
But an overly complex policy can also shift accountability in another way: whenever you describe ten specific situations where a certain action is not allowed you can often be implied to be simultaneously implying that it is allowed in any other situation and thus freeing the actor from making a personal decision regarding its suitability to the given situation; the rule becomes more accountable than the person.
The first point is easily understood so I’ll focus on the second. By filling your policy with detailed rules you imply that anything that doesn’t expressly breach the policy is allowable and so expose the organization to risks that haven’t yet been considered.
The most apt example I can think to explain this point better relates not to credit policy but to something much simpler – travel expenses.

I used to work in a team that travelled frequently to international offices, typically spending three to four days abroad at any one time. When I joined we were a small team with a large amount of autonomy and my boss dictated the policy for travel-related expenses and his policy was: when you’re travelling for work, eat and drink as you would at home if you were paying for it.
He told us not to feel that we should sacrifice just because the country we were in happened to be an expensive one – it was the organisation’s decision to send us there after all – but similarly not to become extravagant just because the company was picking-up the tab.
It was a very broad policy with little in the way of detail and so it made us each accountable; it worked brilliantly and I never heard of a colleague that abused it or felt abused by it.
That policy was inherently fair in all situations because it was flexible. However, in time our parent company bought another local company and our team was brought under their ‘more developed’ corporate structures, including their travel claim policies. These policies, like at so many companies, tried to be fair by unwaveringly applying a single maximum value to all meal claims. In some locations this meant you could eat like a king while in others austerity was forced upon you.
In don’t have data to back this up but I am sure that it created a lose-lose situation: morale definitely dropped and I’m certain the cost of travel claims increased as everyone spent up to the daily cap maximum each day, either because they had to or simply because now they could without feeling any responsibility not to.
Of course this example doesn’t necessarily apply 100% to a credit policy but much of the underlying truth remains: broader policy rules makes people accountable and so they needn’t increase risk in any many cases they actually decrease it.

A credit policy that says ‘we don’t lend to customer segments where the expected returns fail to compensate for their risk’ makes the decision-maker accountable than a policy that says ‘we don’t lend to students, the self-employed or the unemployed’.
Under the former policy, if a decision-maker isn’t confident enough in their reasons for lending into a new segment they can’t go ahead with that decision. On the other hand though, if they have solid analysis and a risk controlling roll-out process in place, they can go ahead and, unhindered by needless policy, can make a name for themselves and money for the business.
The latter policy though makes the decision-maker accountable only for the fact that the new customer segment was not one of those expressively prohibited not to the fact that the decision is likely to be a profitable one.

Of course encouraging broader rules and more accountability pre-supposes that the staff in key positions are competent but if they are not, it’s not a new credit policy that you need…

Read Full Post »

In developed markets ‘comprehensive’ credit bureaus are common place; that is credit bureaus which store information relating to all of an individual’s past repayments, not just information relating to their defaults. Although there are some exceptions most borrowers, lenders and regulators in these markets believe that a trusted third-party holding a database of good and bad financial history provides borrowers with better products and at a fairer price.

But most markets don’t start at this point. In markets where credit bureaus are new or non-existent it is often difficult for regulators and borrowers to know how to decide between a positive bureau and a negative bureau. In many cases the costs of a positive bureau are relatively well understood – both the physical costs of development and the societal costs in the form of privacy concerns – while its benefits remain underestimated: usually assumed to be only those benefits accruing to lenders through better risk control. As a result, the initial push is often for a ‘negative data only’ bureau.

But a positive bureau also carries significant benefits to borrowers and I will use an applicable, albeit in the inverse, example to explain how and why this can is the case.

In Formula 1 motor-racing points are awarded to the ten best drivers in each race and accumulated over the season to identify an overall winner. The goal of this approach is to identify the ‘best’ driver in a given year and it serves this purpose well – sorting out the very best from the just very good. Since most stakeholders in Formula 1 – team owners, sponsors, drivers and spectators – are concerned almost exclusively with knowing which individual is the ‘best’ this system seldom comes under serious criticism.

But that doesn’t mean that it suits all purposes equally well. Imagine you are placed in charge of a new Formula 1 team that has started with a very limited budget. The team owners realize that this small budget effectively precludes the possibility of winning the title in the short-term but they also understand that if they can survive in the sport for two years they can gain a bigger sponsor and fund a more serious title challenge thereafter. So their goal it to survive for two years and to do that they need to maximize the exposure they provide their advertisers by finishing as many races as possible as far from the back as possible.

What the budget means for you as the team manager is that you can only afford to hire cheap drivers which we’ll assume means drivers who finished in the bottom ten places in the previous season. From that group you’ll still want to get the two best drivers possible but how will you identify the ‘best’ drivers in that group? The table below shows the driver standings at the end of the 2010 season:

As the table above shows, the present system is so focused on segmenting drivers at the top of the table that it struggles to differentiate drivers towards the bottom; in fact the bottom six drivers all finished with zero points. 
Vitantonio Liuzzi might look like a clear choice in more than double the next driver tally but is he really the best option and who would you choose to join him? A new model is needed for your purpose, one that separates the ‘worst’ from the simply ‘bad’.

Negative bureaus have a similarly one-sided focus, a focus that might have fits their initial purpose but that limits their use in other situations. A negative bureau only stores information on customer defaults; helping to separate the highest risk customers from the less high risk customers but struggling to segment low risk customers. Information is only created when a payment is missed – and usually only when it is missed for several months – and so an individual with a long history of timeously repaying multiple debts will be seen as the same risk as a customer who has only paid back one small debt, for example.

Returning now to the earlier scenario: the current Formula 1 model awards points for finishing in one of the top 10 places using a sliding scale of 25; 18; 15; 12; 10; 8; 6; 4; 2; 1. A model better suited for your new purpose should still retain the information relating to good performances but should also seek to create and store information relating to bad performances. The simplest way to do this would be to penalize drivers for finishing in the last ten places using the same scale but in reverse.

Implementing these simple changes across the 2010 season immediately provides more insight into the relative performance of drivers towards the bottom of the standings and in so doing the gap between each driver becomes clearer and useful information is been created. 

Although Vitantonio Liuzzi had previously looked like an obvious pick his good performances – a 6th place in Korea and a 7th place in Australia – were overshadowed by his many more poor performances – last place in Abu Dhabi, Brazil and Singapore and second-last place in Japan and China. When the whole picture is seen together, he is no longer such an attractive prospect. A better bet would be to approach Jaime Alguersuari who, although he never placed better than 9th finished in 13th place or better in 80% of his races and only came last once. Sebastien Buemi also finished last on only one occasion though he spread his remaining results more unevenly with both more top ten and more bottom ten finishes than Jaime.
Both of these drivers would offer theoretically better returns having placed worse on the accepted scale but with more of the sort of results you’re team is looking for.
Of course this isn’t the perfect model and real Formula 1 fans might take exception but it does illustrate how creating a holistic view of the relative performance of all drivers, not just the very good ones, can be value-adding. 

Similarly, a negative-only bureau suits the simple purpose of identifying the very worst of your potential customers but it struggles to identify good customers or to segment the ‘middle’ customers by relative risk; users of the bureau that wish to merely avoid the worst borrowers are well served by this information but lenders who wish to target the best customer segments for low risk/ low margin products or who wish to match pricing to risk are unable to do so. 
The societal costs of a negative only bureau are therefore born by the best performing borrowers in that market who are given the same products at the same price as average risk borrowers. 
A comprehensive positive and negative bureau avoids this societal cost though it usually does so with added build and maintenance costs. 

When deciding which bureau is best for a given market then, borrowers and regulators should focus on the trade-off between the borrowers privacy concerns and the borrowers access to fair products at a fair price while lenders should focus on the trade-off between the cost of a comprehensive bureau – passed onto them in the form of higher bureau fees – and the expected benefits to be achieved through more profitable niche products.

* * * 

The fact that the 2011 season has just finished stands testament to how long this article has sat in draft form, awaiting publishing. However, the big delay does at least afford me add an addendum on the performance of the proposed model.
Of course far too many factors are at play to make a scientific comparison, not least the fact that Vitantonio Liuzzi, the man our model told us not to pick, changed teams but here goes anyway:
Vitantonio Liuzzi didn’t qualify for one race, retired from 5 and ended the season without a top 10 finish and only 7 finishes within the top 20. In all, he didn’t manage to collect a single point and joined seven other drivers in joint last place. 
Both models suggested Sebastien Buemi and he also finished the season placed 15th with 7 top ten finishes against five retirements and no finishes outside of the top 15. While Jaime Alguersuari, our model’s wildcard pick, finished one spot better on the overall standings with 5 top ten places, 3 retirements and only one finish outside of the top 20.
Never shy to identify a trend from two data points, I’d call that a 2-1 win to the comprehensive model

Read Full Post »

Many lenders fail to fully appreciate the size of their fraud losses. By not actively searching for – and thus not classifying – fraud within their bad debt losses, they miss the opportunity to create better defences and so remain exposed to ever-growing losses. Any account that is written-off without ever having made a payment is likely to be fraud; any account that is in collections for their full limit within the first month or two is likely to be fraud; any recently on book account that is written-off because the account holder is untraceable is likely to be fraud, etc.

Credit scorecards do not detect application fraud very well because the link between the credit applicant and the credit payer is broken. In a ‘normal’ case the person applying for the credit is also the person that will pay the monthly instalments and so the data in the application represents the risk of the future account-holder and thus the risk of a missed payment. However, when a fraudster applies with a falsified or stolen identity there is no such link and so the data in that application no longer has any relationship to the future account-holder and so can’t represent the true risk.


First Person Fraud

Now that explanation assumes we are talking about third-party fraud; fraud committed by someone other than the person described on the application. That is the most clear-cut form of fraud. However, there is also the matter of first person fraud which is less clear-cut.

First person fraud is committed when a customer applies using their own identity but does so with no intention of paying back the debt, often also changing key data fields – like income – to improve their chances of a larger loan.

Some lenders will treat this as a form of bad debt while others prefer to treat it as a form of fraud. It doesn’t really matter so long as it is treated as a specific sub-type of either definition. I would, however, recommend treating it as a sub-type of fraud unless a strong internal preference exists for treating it as bad debt. Traditional models for detecting bad debt are built on the assumption that the applicant has the intention of paying their debt and so aim to measure the ability to do so which they then translate into a measure of risk. In these cases though, that assumption is not true and so there should instead be a model looking for the existence of the willingness to pay the debt rather than the ability to do so. From a recovery point of view, a criminal fraud case it is also a stronger deterrent to customers than a civil bad debt one.


Third Person Fraud

The rest of the fraud then, is third-party fraud. There are a number of ways fraud can happen but I’ll just cover the two most common types: false applications and identity take-overs.

False applications are applications using entirely or largely fictional data. This is the less sophisticated method and is usually the first stage of fraud in a market and so is quickly detected when a fraud solution or fraud scorecard is implemented. Creating entirely new and believable identities on a large-scale without consciously or subconsciously reverting to a particular pattern is difficult. There is therefore a good chance of detecting false applications by using simple rules based on trends, repeated but mismatched information, etc.

A good credit bureau can also limit the impact of false applications since most lenders will then look for some history of borrowing before a loan is granted. An applicant claiming to be 35 years old and earning €5 000 a month with no borrowing history will raise suspicions, especially where there is also a sudden increase in credit enquiries.

Identity take-over is harder to detect but also harder to perpetrate, so it is a more common problem in the more sophisticated markets. In these cases a fraudster adopts the identity – and therefore the pre-existing credit history – of a genuine person with only the slightest changes made to contact information in most cases. Again a good credit bureau is the first line of defence albeit now in a reactive capacity alerting the lender to multiple credit enquiries within a short period of time.

Credit bureau alerts should be supported by a rule-based fraud system with access to historical internal and, as much as possible, external data. Such a system will typically be built using three types of rules: rules specific to the application itself; rules matching information in the application to historical internal and external known frauds; rules matching information in the application to all historical applications.


Application Specific Rules

Application specific rules can be built and implemented entirely within an organisation and are therefore often the first phase in the roll-out of a full application fraud solution. These rules look only at the information captured from the application in question and attempt to identify known trends and logical data mismatches.

Based on a review of historical fraud trends the lender may have identified that the majority of their frauds originated through their online channel in loans to customers aged 25 years or younger, who were foreign citizens and who had only a short history at their current address. The lender would then construct a rule to identify all applications displaying these characteristics.

Over-and-above these trends there are also suspicious data mismatches that may be a result of the data being entered by someone less familiar with the data than a real customer would be expected to be with their own information. These data mismatches would typically involve things like an unusually high salary given the applicant’s age, an inconsistency between the applicant’s stated age and date of birth, etc.

In the simplest incarnation these rules would flag applications for further, manual investigation. In more sophisticated systems though, some form of risk-indicative score would be assigned to each rule and applications would then be prioritised based on the scores they accumulated from each rule hit.

These rules are easy to implement and need little in the way of infrastructure but they only detect those fraudulent attempts where a mistake was made by the fraudster. In order to broaden the coverage of the application fraud solution it is vital to look beyond the individual application and to consider a wider database of stored information relating to previous applications – both those known to have been fraudulent and those still considered to be good.


Known Fraud Data

The most obvious way to do this is to match the information in the application to all the information from previous applications that are known – or at least suspected – to have been fraudulent. The fraudster’s greatest weakness is that certain data fields need to be re-used either to subvert the lenders validation processes or to simplify their own processes.

For example many lenders may phone applicants to confirm certain aspects on their application or to encourage early utilisation and so in these cases the fraudster would need to supply at least one genuine contact number; in other cases lenders may automatically validate addresses and so in these cases the fraudster would need to supply a valid address. No matter the reason, as soon as some data is re-used it becomes possible to identify where that has happened and in so doing to identify a higher risk of fraud.

To do this, the known fraud data should be broken down into its component parts and matched separately so that any re-use of an individual data field – address, mobile number, employer name, etc. – can be identified even if it is used out of context. Once identified, it is important to calculate the relative importance in order to prioritise alerts. Again this is best done with a scorecard but expert judgement alone can still add value; for example it is possible that several genuine applicants will work for an employer that has been previously used in a fraudulent application but it would be much more worrying if a new applicant was to apply using a phone number or address that was previously used by a fraudster.

It is also common to prioritise the historical data itself based on whether it originated from a confirmed fraud or a suspected one. Fraud can usually only be confirmed if the loan was actually issued, not paid and then later shown to be fraudulent. Matches to data relating to these accounts will usually be prioritised. Data relating to applications that were stopped based on the suspicion of fraud, on the other hand, may be slightly de-prioritised.


Previous Applications

When screening new applications it is important to check their data not just against the known fraud data discussed above but also against all previous ‘good’ applications. This is for two reasons: firstly not all fraudulently applied for applications are detected and secondly, especially in the case of identity theft, the fraudster is not always the first person to use the data and so it is possible that a genuine customer had previously applied using the data that is now being used by a fraudster.

Previous application data should be matched in two steps if possible. Where the same applicant has applied for a loan before, their specific data should be matched and checked for changes and anomalies. The analysis must be able to show if, for a given social security number, there have been any changes in name, address, employer, marital status, etc. and if so, how likely those changes are to be the result of an attempted identity theft versus a simple change in circumstances. Then – or where the applicant has not previously applied for a loan – the data fields should be separated and matched to all existing data in the same way that the known fraud data was queried.

As with the known fraud data it is worth prioritising these alerts. A match to known fraud data should be prioritised over a match to a previous application and within the matches a similar prioritisation should occur: again it would not be unusual for several applicants to share the same employer while it would be unusual for more than one applicant to share a mobile phone number and it would be impossible for more than one applicant to share a social security or national identity number.


Shared Data

When matching data in this way the probability of detecting a fraud increase as more data becomes available for matching. That is why data sharing is such an important tool in the fight against application fraud. Each lender may only receive a handful of fraud cases which limits not only their ability to develop good rules but most importantly limits their ability to detect duplicated data fields.

Typically data is shared indirectly and through a trusted third-party. In this model each lender lists all their known and suspected frauds on a shared database that is used to generate alerts but cannot otherwise be accessed by lenders. Then all new applications are first matched to the full list of known frauds before being matched only to the lender’s own previous applications and then subjected to generic and customised application-specific rules as shown in the diagram below:


Read Full Post »

In terms of credit risk strategy, the lending markets in America and Britain undoubtedly lead the way while several other markets around the world are applying many of the same principles with accuracy and good results. However, for a number of reasons and in a number of ways, many more lending markets are much less sophisticated. In this article I will focus on these developing markets; discussing how credit risk strategies can be applied in such markets and how doing so will add value to a lender.

The fundamentals that underpin credit risk strategies are constant but as lenders develop in terms of sophistication the way in which these fundamentals are applied may vary. At the very earliest stages of development the focus will be on automating the decisioning processes; once this has been done the focus should shift to the implementation of basic scorecards and segmented strategies which will, in time, evolve from focusing on risk mitigation to profit maximisation.

Automating the Decisioning Process

The most under-developed markets tend to grant loans using a branch-based decisioning model as a legacy of the days of fully manual lending. As such, it is an aspect more typical of the older and larger banks in developing regions and one that is allowing newer and smaller competitors to enter the market and be more agile.

A typical branch-lending model looks something like the diagram below:

In a model like this, the credit policy is usually designed and signed-off by a committee of very senior managers working in the head-office. This policy is then handed-over to the branches for implementation; usually by delivering training and documentation to each of the bank’s branch managers. This immediately presents an opportunity for misinterpretation to arise as branch managers try to internalise the intentions of the policy-makers.

Once the policy has been handed-over, it becomes the branch manager’s responsibility to ensure that it is implemented as consistently as possible. However, since each branch manager is different, as is each member of branch staff, this is seldom possible and so policy implementation tends to vary to a greater or lesser extent across the branch network.

Even when the policy is well implemented though, the nature of a single written policy is such that it can identify the applicants that are considered too risky to qualify for a loan but it cannot go beyond that to segment accepted customers into risk groups. This means that the only way that senior management can ensure the policy is being implemented correctly in the highest risk situations is by using the size of the loan as an indication of risk. So, to do this a series triggers are set to escalate loan applications to management committees.

In this model, which is not an untypical one, there are three committees: one within the branch itself where senior branch staff review the work of the loan officer for small value loan applications; if the loan size exceeds the branch committee’s mandate though it must then be escalated to a regional committee or, if sufficiently large, all the way to a head-office committee.

Although it is easy to see how such a series of committees came into being, their on-going existence adds significant costs and delays to the application process.

In developing markets where skills are short there a significant premium must usually be paid to high quality management staff. So, to use the time of these managers to essentially remake the same decision over-and-over (having already decided on the policy, they now need to repeatedly decide whether an application meets the agreed upon criteria) is an inefficient way to invest a valuable resource. More importantly though are the delays that must necessarily accompany such a series of committees. As an application is passed on from one team – and more importantly from one location – to another a delay is incurred. Added to this is the fact that committees need to convene before they can make a decision and usually do so on fixed dates meaning that a loan application may have to wait a number of days until the next time the relevant committee meets.

But the costs and delays of such a model are not only incurred by the lender, the borrower too is burdened with a number of indirect costs. In order to qualify for a loan in a market where impartial third-party credit data is not widely available – i.e. where there are no strong and accurate credit bureaus – an applicant typically needs to ‘over prove’ their risk worthiness. Where address and identification data is equally unreliable this requirement is even more burdensome. In a typical example an applicant might need to first show an established relationship with the bank (6 months of salary payments, for example); provide a written undertaking from their employer that they will notify the bank of any change in employment status; the address of a reference who can be contacted when the original borrower can not; and often some degree of security, even for small value loans.

These added costs serve to discourage lending and add to what is usually the biggest problem faced by banks with a branch-based lending model: an inability to grow quickly and profitably.

Many people might think that the biggest issue faced by lenders in developing markets is the risk of bad debt but this is seldom the case. Lenders know that they don’t have access to all the information they need when they need it and so they have put in place the processes I’ve just discussed to mitigate the risk of losses. However, as I pointed out, those processes are ungainly and expensive. Too ungainly and too expensive as it turns out to facilitate growth and this is what most lenders want to change as they see more agile competitors starting to enter their markets.

A fundamental problem with growing with a branch-based lending model is that the costs of growing the system rise in line with the increase capacity. So, to serve twice as many customers will cost almost twice as much. This is the case for a few reasons. Firstly, each branch serves only a given geographical catchment area and so to serve customers in a new region, a new branch is likely to be needed. Unfortunately, it is almost impossible to add branches perfectly and each new branch is likely to lead to either an inefficient overlapping of catchment areas or ineffective gaps. Secondly, within the branch itself there is a fixed capacity both in terms of the number of staff it can accommodate and in terms of the number of customers each member of staff can serve. Both of these can be adjusted, but only slightly.

Added to this, such a model does not easily accommodate new lending channels. If, for example, the bank wished to use the internet as a channel it would need to replicate much of the infrastructure from the physical branches in the virtual branch because, although no physical buildings would be required and the coverage would be universal, the decisioning process would still require multiple loan officers and all the standard committees.

To overcome this many lenders have turned to agency agreements, most typically with large private and government employers. These employers will usually handle the administration of loan applications and loan payments for their staff and in return will either expect that their staff are offered loans at a discounted rate or that they themselves are compensated with a commission.

By simply taking the current policy rules from the branch based process and converting them into a series of automated rules in a centralised system many of these basic problems can be overcome; even before improving those rules with advanced statistical scorecards. Firstly the gap between policy design and policy implementation is removed, removing any risk of misinterpretation. Then the need for committees to ensure proper policy implementation is greatly reduced, greatly reducing the associated costs and delays. Thirdly the risk of inconsistent application is removed as every application, regardless of the branch originating it or the staff member capturing the data, is treated in the same way. Finally, since the decisioning is automated there is almost no cost to add a new channel onto the existing infrastructure meaning that new technologies like internet and mobile banking can be leveraged as profitable channels for growth.

The Introduction of Scoring

With the basic infrastructure in place it is time to start leveraging it to its full advantage by introducing scorecards and segmented strategies. One of the more subtle weaknesses of a manual decision is that it is very hard to use a policy to do anything other than decline an account. As soon as you try to make a more nuanced decision and categorise accepted accounts into risk groups the number of variables increases too fast to deal with comfortably.

It is easy enough to say that an application can be accepted only if the applicant is over 21 years of age, earns more than €10 000 a year and has been working for their current employer for at least a year but how do you segment all the qualifying applications into low, medium and high risk groups? A low risk customer might be one that is over 35 years old, earns more than €15 000 and has been working at their current employer for at least a year; or one that is over 21 years old but who earns more than €25 000 and has been working at their current employer for at least two years; or one that is over 40 years old, earns more than €15 000 and has been working at their current employer for at least a year, etc.

It is too difficult to manage such a policy using anything other than an automated system that uses a scorecard to identify and segment risk across all accounts. Being able to do this allows a bank to begin customising its strategies and its products to each customer segment/ niche. Low risk customers can be attracted with lower prices or larger limits, high spending customers can be offered a premium card with more features but also with higher fees, etc.

The first step in the process would be to implement a generic scorecard; that is a scorecard built using pooled third-party data that relates to a portfolio that is similar to the one in which it is to be implemented. These scorecards are cheap and quick to implement and, as when used to inform only simple strategies, offer almost as much value as a fully bespoke scorecard would. Over time the data needed to build a more specific scorecard can be captured so that the generic scorecard can be replaced after eighteen to twenty-four months.

But the making of a decision is not the end goal; all decisions must be monitored on an on-going basis so that strategy changes can be implemented as soon as circumstances dictate. Again this is not something that is possible to do using a manual system where each review of an account’s current performance tends to involve as much work as the original decision to lend to that customer did. Fully fledged behavioural scorecards can be complex to build for developing banks but at this stage of the credit risk evolution a series of simple triggers can be sufficient. Reviewing an account in an automated environment is virtually instantaneous and free and so strategy changes can be implemented as soon as they are needed: limits can be increased monthly to all low risk accounts that pass a certain utilisation trigger, top-up loans can be offered to all low and medium risk customers as soon as their current balances fall below a certain percentage of the original balance, etc.

In so doing, a lender can optimise the distribution of their exposure; moving exposure from high risk segments to low risk segments or vice versa to achieve their business objectives. To ensure that this distribution remains optimised the individual scores and strategies should be consistently tested using champion/ challenger experiments. Champion/ challenger is always a simple concept and can be applied to any strategy provided the systems exist to ensure that it is implemented randomly and that its results are measurable. The more sophisticated the strategies, the more sophisticated the champion/ challenger experiments will look but the underlying theory remains unchanged.

Elevating the Profile of Credit Risk

Once scorecards and risk segmented strategies have been implemented by the credit risk team, the team can focus on elevating their profile within the larger organisation. As credit risk strategies are first implemented they are unlikely to interest the senior managers of a lender who would likely have come through a different career path: perhaps they have more of a financial accounting view of risk or perhaps they have a background in something completely different like marketing. This may make it difficult for the credit risk team to garner enough support to fund key projects in the future and so may restrict their ability to improve.

To overcome this, the credit team needs to shift its focus from risk to profit. The best result a credit risk team can achieve is not to minimise losses but to maximise profits while keeping risk within an acceptable band. I have written several articles on profit models which you can read here, here, here and here but the basic principle is that once the credit risk department is comfortable with the way in which their models can predict risk they need to understand how this risk contributes to the organisation’s overall profit.

This shift will typically happen in two ways: as a change in the messages the credit team communicates to the rest of the organisation and as a change in the underlying models themselves.

To change the messages being communicated by the credit team they may need to change their recruitment strategies and bring in managers who understand both the technical aspects of credit risk and the business imperatives of a lending organisation. More importantly though, they need to always seek to translate the benefit of their work from technical credit terms – PD, LGD, etc. – into terms that can be more widely understood and appreciated by senior management – return on investment, reduced write-offs, etc. A shift in message can happen before new models are developed but will almost always lead to the development of more business-focussed models going forward.

So the final step then is to actually make changes to the models and it is by the degree to which such specialised and profit-segmented models have been developed and deployed that a lenders level of sophistication will be measured in more sophisticated markets.

Read Full Post »

First things first, I am by no means a scorecard technician. I do not know how to build a scorecard myself, though I have a fair idea of how they are built; if that makes sense. As the title suggests, this article takes a simplistic view of the subject. I will delve into the underlying mathematics at only the highest of levels and only where necessary to explain another point. This article treats scorecards as just another tool in the credit risk process, albeit an important one that enables most of the other strategies discussed on this blog. I have asked a colleague to write a more specialised article covering the technical aspects and will post that as soon as it is available.


Scorecards aim to replace subjective human judgement with objective and statistically valid measures; replacing inconsistent anecdote-based decisions with consistent evidence-based ones. What they do is essentially no different from what a credit assessor would do, they just do it in a more objective and repeatable way. Although this difference may seem small, it enables a large array of new and profitable strategies.

So what is a scorecard?

A scorecard is a means of assigning importance to pieces of data so that a final decision can be made regarding the underlying account’s suitableness for a particular strategy. They do this by separating the data into its individual characteristics and then assigning a score to each characteristic based on its value and the average risk represented by that value.

For example an application for a new loan might be separated into age, income, length of relationship with the bank, credit bureau score, etc. Then the each possible value of those characteristics will be assigned a score based on the degree to which they impact risk. In this example ages between 19 and 24 might be given a score of – 100, ages between 25 and 30 a score of -75 and so on until ages 50 and upwards are given a score of +10. In this scenario young applicants are ‘punished’ while older customers benefit marginally from their age. This implies that risk has been shown to be inversely related to age. The diagram below shows an extract of a possible scorecard:

The score for each of these characteristics is then added to reach a final score. The final score produced by the scorecard is attached to a risk measure; usually something like the probability of an account going 90 days into arrears within the next 12 months. Reviewing this score-to-risk relationship allows a risk manager to set the point at which they will decline applications (the cut-off) and to understand the relative risk of each customer segment on the book. The diagram below shows how this score-to-risk relationship can be used to set a cut-off.

How is a scorecard built?

Basically what the scorecard builder wants to do is identify which characteristics at one point in time are predictive of a given outcome before or at some future point in time. To do this historic data must be structured so that one period can represent the ‘present state’ and the subsequent periods can represent the ‘future state’. In other words, if two years of data is available for analysis (the current month can be called Month 0 and the last Month can be called Month -24) then the most distant six months (from Month -24 to Month -18) will be used to represent the ‘current state’ or, more correctly, the observation period while the subsequent months (Months -17 to 0) represent the known future of those first six months and are called ‘the outcome period’. The type of data used in each of these periods will vary to reflect these differences so that application data (applicant age, applicant income, applicant bureau score, loan size requested, etc.) is important in the observation period and performance data (current balance, current days in arrears, etc.) is important in the outcome period.

With this simple step completed the accounts in the observation period must be defined and sorted based on their performance during the outcome period. To start this process a ‘bad definition’ and ‘good definition’ must first be agreed upon. This is usually something like: ‘to be considered bad, an account must have gone past 90 days in delinquency at least once during the 18 month outcome period’ and ‘to be considered good an account must never have gone past 30 days in delinquency during the same period’. Accounts that meet neither definition are classified as ‘indeterminate’.

Thus separated, the unique characteristics of each group can be identified. The data that was available at the time of application for every ‘good’ and ‘bad’ account is statistically tested and those characteristics with largely similar values within one group but largely varying values across groups are valuable indicators of risk and should be considered for the scorecard. For example if younger customers were shown to have a higher tendency to go ‘bad’ than older customers, then age can be said to be predictive of risk. If on average 5% of all accounts go bad but a full 20% of customers aged between 19 and 25 go bad while only 2% of customers aged over 50 go bad then age can be said to be a strong predictor of risk. There are a number of statistical tools that will identify these key characteristics and the degree to which they influence risk more accurately than this but they won’t be covered here.

Once each characteristic that is predictive of risk has been identified along with its relative importance some cleaning-up of the model is needed to ensure that no characteristics are overly correlated. That is, that no two characteristics are in effect showing the same thing. If this is the case, only the best of the related characteristics will be kept while the other will be discarded to prevent, for want of a better term, double-counting. Many characteristics are correlated in some way, for example the older you are the more likely you are to be married, but this is fine so long as both characteristics add some new information in their own right as is usually the case with age and marital status – an older, married applicant is less risky than a younger, married applicant just as a married, older applicant is less risky than a single, older applicant. However, there are cases where the two characteristics move so closely together that the one does not add any new information and should therefore not be included.

So, once the final characteristics and their relative weightings have been selected the basic scorecard is effectively in place. The final step is to make the outputs of the scorecard useable in the context of the business. This usually involves summarising the scores into a few score bands and may also include the addition of a constant – or some other means of manipulating the scores – so that the new scores match with other existing or previous models.


How do scorecards benefit an organisation?

Scorecards benefit organisations in two major ways: by describing risk in very fine detail they allow lenders to move beyond simple yes/ no decisions and to implement a wide range of segmented strategies; and by formalising the lending decision they provide lenders with consistency and measurability.

One of the major weaknesses of a manual decisioning system is that it seldom does more than identify the applications which should be declined leaving those that remain to be accepted and thereafter treated as being the same. This makes it very difficult to implement risk-segmented strategies. A scorecard, however, prioritises all accounts in order of risk and then declines those deemed too risky. This means that all accepted accounts can still be segmented by risk and this can be used as a basis for risk-based pricing, risk-based limit setting, etc.

The second major benefit comes from the standardisation of decisions. In a manual system the credit policy may well be centrally conceived but the quality of its implementation will be dependent on the branch or staff member actually processing the application. By implementing a scorecard this is no longer the case and the roll-out of a scorecard is almost always accompanied by the reduction in bad rates.

Over-and-above these risk benefits, the roll-out of a scorecard is also almost always accompanied by an increase in acceptance rates. This is because manual reviewers tend to be more conservative than they need to be in cases that vary in some way from the standard. The nature of a single credit policy is such that to qualify for a loan a customer must exceed the minimum requirements for every policy rule. For example, to get a loan the customer must be above the minimum age (say 28), must have been with the bank for more than the minimum period (say 6 months) and must have no adverse remarks on the credit bureau. A client of 26 with a five year history with the bank and a clean credit report would be declined. With a scorecard in place though the relative importance of exceeding one criteria can be weighed against the relative importance of missing another and a more accurate decision can be made; almost always allowing more customers in.


Implementing scorecards

There are three levels of scorecard sophistication and, as with everything else in business, the best choice for any situation will likely involve a compromise between accuracy and cost.

The first option is to create an expert model. This is a manual approximation of a scorecard based on the experience of several experts. Ideally this exercise would be supported by some form of scenario planning tool where the results of various adjustments could be seen for a series of dummy applications – or genuine historic applications if these exist – until the results that meet the expectations of the ‘experts’. This method is better than manual decisioning since it leads to a system that looks at each customer in their entirety and because it enforces a standardised outcome. That said, since it is built upon relatively subjective judgements it should be replaced with a statistically built scorecard as soon as enough data is available to do so.

An alternative to the expert model is a generic scorecard. These are scorecards which have been built statistically but using a pool of similar though not customer-specific data. These scorecards are more accurate than expert models so as long as the data on which they were built reasonably resembles the situation in which they are to be employed. A bureau-level scorecard is probably the purest example of such a scorecard though generic scorecards exist for a range of different products and for each stage of the credit life-cycle.

Ideally, they should first be fine-tuned prior to their roll-out to compensate for any customer-specific quirks that may exist. During a fine-tuning, actual data is run through the scorecard and the results used to make small adjustments to the weightings given to each characteristic in the scorecard while the structure of the scorecard itself is left unchanged. For example, assume the original scorecard assigned the following weightings: -100 for the age group 19 to 24; -75 for the age group 25 to 30; -50 for the age group 31 to 40; and 0 for the age group 41 upwards. This could either be implemented as it is bit if there is enough data to do a fine-tune it might reveal that in this particular case the weightings should actually be as follows: -120 for the age group 19 to 24; -100 for the age group 25 to 30; -50 for the age group 31 to 40; and 10 for the age group 41 upwards. The scorecard structure though, as you can see, does not change.

In a situation where there is no client-specific data and no industry-level data exists, an expert model may be best. However, where there is no client-specific data but where there is industry-level data it is better to use a generic scorecard. In a case where there is both some client-specific data and some industry-level data a fine-tuned generic scorecard will produce the best results.

The most accurate results will always come, however, from a bespoke scorecard. That is a scorecard built from scratch using the client’s own data. This process requires significant levels of good quality data and access to advanced analytical skills and tools but the benefits of a good scorecard will be felt throughout the organisation.

Read Full Post »

Older Posts »