
Posts Tagged ‘Credit Risk Management’

I wrote my layman’s introduction to scoring a while ago now and never delivered the promised more in-depth articles. This is the first in a series of articles correcting that oversight. The team at Scorto has very kindly provided me with a white paper on scorecard building, which I will break into sections and reproduce here. In this first article, I’ll look into reject inference, a topic that has been asked about before.

One of the inherent problems with a scorecard is that while you can easily test whether you made the right decision in accepting an application, it is far harder to know whether you made the right decision in rejecting one. In the day-to-day running of a business this might not seem like much of a problem, but it is dangerous in two ways:
· it can limit the highly profitable growth opportunities around the cut-off point by hiding any segmenting behaviour a characteristic might have; and
· it can lead to a point where the data available for creating new scorecards represents only a portion of the population likely to apply. As this portion is disproportionately ‘good’, it can cause future scorecards to under-estimate the risk present in a population.
Each application provides a lender with a great deal of characteristic data: age, income, bureau score, etc. That application data is expensive to acquire, but of limited value until it is connected with behavioural data. When an application is approved, that value-adding behavioural data follows as a matter of course and comes cheaply: did the customer of age x, with income y and bureau score z, go “bad” or not? Every application that is rejected generates no such data, unless we go out of our way to get it; and that’s where reject inference comes into play.

The general population in a market will have an average probability of bad that is influenced by various national and economic characteristics but is generally stable. A smaller sub-population will make up the total population of applicants for any given loan product; the average probability of bad in this total population will rise and fall more easily depending on marketing and product design. It is the risk of that total population of applicants that a scorecard should aim to understand. However, the data from existing customers is not a full reflection of that population: it has been filtered through the approval process and stripped of a lot of its bads. Very often, the key data problem when building a scorecard is the lack of information on “bad”, since that’s what we’re trying to model: the probability that an application with a given set of characteristics will end up “bad”. The more conservative the scoring strategy in question, the more the data will become concentrated in the better score quadrants and the weaker it will become for future scorecard builds. Clearly we need a way to bring back that information. Just because the rejected applications were too risky to approve doesn’t mean they’re too risky to add value in this exercise. We do this by combining the application data of the rejected applicants with external data sources or proxies. The main difficulty with this approach is the unavailability and/or inconsistency of the data, which may make it difficult to classify an outcome as “good” or “bad”. A number of methods can be used to infer the performance of rejected applicants.

Simple Augmentation
Not all rejected applications would have gone bad. We knew this at the time we rejected them; we just knew that too few would stay good to compensate for those that did go bad. So while a segment of applications with a 15% probability of bad might be deemed too risky, 85% of them would still have become good accounts. Using that knowledge, we can reconsider the rejected applications in the data exercise.

· A base scoring model is built using data from the borrowers whose behavior is known – the previously approved book.
· Using the developed model, the rejected applications are scored, the percentage of “bad” borrowers among them is estimated, and that performance is assigned at random but in proportion across the rejected applications.
· The cut-off point should be set in accordance with the rules of the current lending policy that define the permissible level of bad borrowers.
· Information on the rejected and approved requests is merged and the resulting set is used to build the final scoring model.
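The four steps above can be sketched roughly as follows. This is an illustration only, using scikit-learn’s logistic regression as a stand-in for the scorecard model and synthetic data in place of a real approved book and reject pool:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic stand-ins: X_app/y_app are the previously approved book
# (behaviour known, 1 = "bad"), X_rej are the rejected applications.
X_app = rng.normal(size=(1000, 4))
y_app = (X_app[:, 0] + rng.normal(size=1000) > 1).astype(int)
X_rej = rng.normal(loc=0.5, size=(300, 4))

# Step 1: build the base scoring model on the approved book.
base = LogisticRegression().fit(X_app, y_app)

# Step 2: score the rejects and assign good/bad outcomes at random,
# in proportion to each reject's estimated probability of bad.
p_bad = base.predict_proba(X_rej)[:, 1]
y_rej = rng.binomial(1, p_bad)

# Steps 3-4: merge approved and rejected records and build the
# final scoring model on the combined set.
X_all = np.vstack([X_app, X_rej])
y_all = np.concatenate([y_app, y_rej])
final = LogisticRegression().fit(X_all, y_all)
```

In practice the random assignment in step 2 would be constrained by the cut-off rules of the current lending policy, rather than drawn freely as here.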

Accept/ Reject Augmentation
The basis of this method is the correction of the weights of the base scoring model by taking into consideration the likelihood of a request’s approval.
· The first step is to build a model that evaluates the likelihood of a request’s approval or rejection.
· The weights of the characteristics are then adjusted taking into consideration the likelihood of approval or rejection determined in the previous step. This is done so that the resulting weights are inversely proportional to the likelihood of the request’s approval. So, for example, if the original approval rate was 50% in a certain cluster, then each approved record is replicated to stand in for itself and the one that was rejected.
· This method is preferable to Simple Augmentation, but it is not without its own drawbacks. Augmentation can create two key problems: the impact of small and unusual groups can be exaggerated (such as low-side overrides for VIP clients); and, because the historical approval decision was often deterministic, some nodes will have approval rates of exactly 0% or 100%, for which the inverse weights become extreme or undefined.
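The reweighting idea can be sketched as below, again with scikit-learn and synthetic data purely for illustration; the clipping threshold is my own assumption, added to tame the extreme weights mentioned above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-ins: X holds all applications, 'approved' marks which
# were booked, y_perf holds good/bad performance for approved accounts only.
X = rng.normal(size=(1500, 4))
approved = X[:, 0] + rng.normal(scale=0.5, size=1500) < 0.5
y_perf = (X[approved][:, 1] + rng.normal(size=approved.sum()) > 1).astype(int)

# Step 1: model the likelihood of approval from application data.
approve_model = LogisticRegression().fit(X, approved.astype(int))

# Step 2: weight each approved account by the inverse of its approval
# probability, so a cluster with a 50% approval rate counts double --
# each approved record stands in for itself and a similar reject.
p_approve = approve_model.predict_proba(X[approved])[:, 1]
weights = 1.0 / np.clip(p_approve, 0.05, None)  # clip to cap weights at 20x

# Step 3: fit the scorecard on the approved accounts only, reweighted.
scorecard = LogisticRegression().fit(X[approved], y_perf, sample_weight=weights)
```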

Fuzzy Augmentation
The distinguishing feature of this method is that each rejected request is split and used twice, to reflect the likelihood of each of the good and bad outcomes. In other words, if a rejected application has a 15% probability of going bad, it is split and 15% of the person is assumed to go bad and 85% assumed to stay good.
· Classification
Evaluation of a set of the rejected requests is performed using a base scoring model that was built based on requests with a known status;
– The likelihood of default p(bad) and of the “good” outcome p(good) are determined based on the set cut-off point, defining the required percentage of “bad” requests (p(bad) + p(good) = 1);
– Two records, corresponding to the likelihood of the “good” and “bad” outcomes, are formed for each rejected request;
– Evaluation of the rejected requests is performed taking into consideration the likelihood of the two outcomes. Those accounts that fall under the likelihood of the “good” outcome are assigned with the weight p(good). The accounts that fall under the likelihood of the “bad” outcome are assigned with the weight p(bad).
· Clarification
– The data on the approved requests is merged with the data on the rejected requests and the rating of each request is adjusted taking into consideration the likelihood of the request‘s further approval. For example, the frequency of the “good” outcome for a rejected request is evaluated as the result of the “good” outcome multiplied by the weight coefficient.
– The final scoring model is built based on the combined data set.
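A rough sketch of the splitting idea, once more with scikit-learn and synthetic data in place of real application records:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic stand-ins, as before: approved book plus a reject pool.
X_app = rng.normal(size=(1000, 4))
y_app = (X_app[:, 0] + rng.normal(size=1000) > 1).astype(int)
X_rej = rng.normal(loc=0.5, size=(300, 4))

# Classification: score the rejects with the base model.
base = LogisticRegression().fit(X_app, y_app)
p_bad = base.predict_proba(X_rej)[:, 1]  # p(good) = 1 - p(bad)

# Each reject appears twice: a "bad" copy weighted p(bad)
# and a "good" copy weighted p(good).
X_all = np.vstack([X_app, X_rej, X_rej])
y_all = np.concatenate([y_app, np.ones(len(X_rej)), np.zeros(len(X_rej))])
w_all = np.concatenate([np.ones(len(X_app)), p_bad, 1.0 - p_bad])

# Clarification: the final model is built on the combined, weighted set.
final = LogisticRegression().fit(X_all, y_all, sample_weight=w_all)
```

Note that the two weighted copies of each reject sum to a weight of exactly one, so each rejected applicant still counts as a single record overall.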

Reject inference is not a silver bullet. Used inexpertly, it can lead to less accurate rather than more accurate results. Wherever possible, it is better to augment the exercise with a test-and-learn experiment to understand the true performance of small portions of key rejected segments. Then a new scorecard can be built based on the data from this test segment alone, and the true bad rates from that model can be compared and averaged with those from the reject inference model to get a more reliable bad rate for the rejected population.

Read Full Post »

You will almost certainly have heard the phrase, ‘you can’t manage what you don’t measure’. This is true, but there is a corollary to that phrase which is often not considered, ‘you have to manage what you do measure’.

To manage a business you need to understand it, but more reports do not necessarily mean a deeper understanding. More reports do, however, mean more work, often exponentially more work. So while regular reporting is obviously important for the day-to-day functioning of a business, its extent should be carefully planned.
Since I started this article with one piece of trite wisdom, I’ll continue. I’m trying to write my first novel – man cannot live on tales of credit risk strategy alone – and in a writing seminar I attended, the instructor made reference to a piece of wisdom he picked up in an otherwise forgettable book on script writing: ‘if nothing has changed, nothing has happened’.
It is important to look at the regular reports generated in an organization with this philosophy in mind – do the embedded metrics enable the audience present to change the business? If the audience is not going to – or is not able to – change anything based on a metric, then nothing is actually happening; and if nothing is happening, why are we spending money doing it?
Don’t get me wrong, I am an ardent believer in the value of data and data analytics; I just question the value of regular reporting. Those two subjects are definitely related, but they’re not just different – at times I believe they are fundamentally opposed.

An over-reliance on reporting can damage a business in four ways:

Restricting Innovation and Creativity
Raw data – stored in a well-organized and accessible database – encourages creative and insightful problem solving, it begs for innovative relationships to be found, provides opportunities for surprising connections to be made, and encourages ‘what if’ scenario planning.
Reports are tools for managing an operation. Reports come with ingrained expectations and encourage more constrained and retrospective analysis. They ask questions like ‘did what we expect to happen, actually happen’.
The more an organization relies on reports the more, I believe, it will tend to become operational in nature and backward focused in its analytics, asking and explaining what happened last month and how that was different to plan and to the month before. Yes, it is important to know how many new accounts were opened and whether that was more or less than planned for in the annual budget, but no one ever changed the status quo by knowing how many accounts they had opened.
The easiest way to look good as the analytics department in an organization with a heavy focus on reports, is to get those reports to show stable numbers in-line with the annual plan, thus raising as few questions as possible; and the easiest way to do that is by implementing the same strategy year after year. To look good in an organization that understands the real value of data though, an analytics department has to add business value, has to delve into the data and has to come up with insightful stories about relationships that weren’t known last year, designing and implementing innovative strategies that are by their nature hard to plan accurately in an annual budgeting process, but which have the potential to change an industry.

Creating a False Sense of Control
Reports also create an often false sense of accuracy. A report, nicely formatted and showing month-on-month and year-to-date changes to the second decimal point, carries a sense of authority; if the numbers today look like the numbers did a year ago, they feel like they must be right. But if the numbers today look like they did a year ago, there is also less incentive to test the underlying assumptions, and the numbers can only ever be as accurate as those assumptions: how is profit estimated, how is long-term risk accounted for, how are marketing costs accounted for, how much growth is assumed, and is all of this still valid?
Further, in a similar way to how too many credit policies can end up reducing the accountability of business leaders rather than increasing it, when too much importance is placed on reporting managers become accountable for knowing their numbers, rather than knowing their businesses. If you can say how much your numbers changed month-on-month but not why, then you’re focusing on the wrong things.

Raising Costs
Every report includes multiple individual metrics and goes to multiple stakeholders, and each of those metrics has the potential to raise a question with each of those stakeholders. This is good if the question being raised influences the actions of the business, but the work involved in answering a question is not related to the value of answering it, and so as more metrics of lesser importance are added to a business’ vocabulary, the odds of a question generating non-value-adding work increase exponentially.
Once it has been asked, it is hard to ignore a question pertaining to a report without looking like you don’t understand your business, but sometimes the opposite is true. If you really understand your business you’ll know which metrics are indicative of its overall state and which are not. While your own understanding of your business should encompass the multiple and detailed metrics impacting your business, you should only be reporting the most important of those to broader audiences.
And it is not just what you’re reporting, but to whom. Often a question asked out of interest by an uninvolved party can trigger a significant amount of work without providing any extra control or oversight. Better reports and better audiences should therefore replace old ones: metrics that are not value-adding in a context should not be displayed in that context, or the audience needs to change until the context is right.

Compounding Errors
The biggest problem, though, that I have with a report-based approach is the potential for compounding errors. When one report is compiled based off another report there is always the risk that an error in the first will be included in the second. This actually costs the organization in two ways: firstly the obvious risk of incorrectly informed decisions and secondly in the extra work needed to stay vigilant to this risk.
Numbers need to be checked and rechecked, formats need to be aligned or changed in synchronization, and reconciliations need to be carried out where constant differences exist – month-end data versus cycle end data, monthly average exchange rates versus month-end exchange rates, etc.
Time should never be spent getting the numbers to match; that changes nothing. Time should rather be spent creating a single source of data that can be accessed by multiple teams and left in its raw state; any customization of the data by one team will then remain isolated from all other teams.

Reports are important and will remain so, but their role should be understood. A few key metrics should be reported widely and these should each add a significant and unique piece of information about an organization’s health, at one level down a similar report should break down the team’s performance, but beyond that time and resources should be invested in the creative analysis of raw data, encouraging the creation of analytics-driven business stories.
Getting this right will involve a culture change more than anything, a move away from trusting the person who knows their numbers to trusting the person who provides the most genuine insight.
I know of a loan origination operation that charges sales people a token fee for any declined application which they ask to be manually referred, forcing them to consider the merits of the case carefully before adding to the costs. A similar approach might be helpful here: charging audiences for access to monthly reports on a per-metric basis. This could be an actual monetary fine which is saved up for an end-of-year event, or a virtual currency awarded on a quota basis.

Read Full Post »

Every lending organisation needs a good credit policy but at what point does ‘good’ policy become ‘too much’ policy?

There is of course a trade-off between risk control and operational efficiency but their relationship isn’t always as clear as you might think and it continues to evolve as the lending industry moves away from hard-coded, one-size-fits-all rules to more dynamic strategies.

So how do we know when there is too much policy? In my opinion, a credit policy is more like the army than it is like the police force – it should establish and defend the boundaries of the lending decision but problems arise when it becomes actively involved within those borders.
This manifests itself in the common complaint of policy teams spending too much time managing day-to-day policy compliance and too little time thinking about a policy’s purpose; creating a culture where people ask a lot of questions about ‘how’ something is done in a particular organization but very few about ‘why’ it is done.
This is a problem of policy process as well as policy content.

The process should not shift accountability along a sign-off chain
Credit policies tend to generate supporting processes that can easily devolve to a point where even simple change requests must pass through a complex sign-off chain. Where does the accountability reside in such a chain?
All too often, only at the top. Of course it is important for the most senior approver to be accountable, but it is even more important that the original decision-maker is accountable, and the further up the chain a decision moves, the less likely this is the case. Each new signature should not represent a new owner of the accountability but rather an additional co-owner alongside the original decision-maker.
Of course there are situations where a chain of sign-offs is a genuine safe-guard but in many more cases they serve to undermine the decision-making process by removing that key relationship between action and accountability. As a result the person proposing an action is able to make lax decisions while the person agreeing to them is removed from the information and so more prone to oversights; bad proposals consume resources while moving up and down the sign-off chain or, worse, slip through the gaps and are approved.

The first step back in the right direction is to remove the policy team from the sign-off process. Since the policy already reflects their views, their sign-off is redundant. Instead the business owner should be able to sign-off to the fact that the proposed change is within the parameters set out in the policy and should be held accountable for that fact.
By doing this, the business can make faster decisions while simultaneously being forced to better understand the policy. But does it mean that the policy team should just sit back and assume all of the policies are being adhered to? No. The policy team still plays two important roles in the process: they provide guidance as needed and they monitor the decisions that have already been made, only now they do so outside of the sign-off process. In most cases there is sufficient time between a decision being made and it being implemented for a reactive check to still be effective.
The only cases that should require direct pre-emptive input from the policy team are those that the product team feels breach the current policy; which brings us to the second solution.

The content should not assume accountability, the person should
A credit policy that is rich in detail is also a credit policy that is likely to generate many insignificant breaches and thus a constant stream of work for the policy team. Over time it is easy for any policy to evolve in this way as new rules and sub-rules get added to accommodate perceived new risks or to adjust to changing circumstances; indeed, it is often in the policy team’s interest to allow it to do so. However, extra detail almost always leads to higher, not lower, risk.
Firstly, a complex policy is less likely to be understood and therefore more likely to lead to accountability shifting to the policy team through the sign-off chain as discussed above. By increasing the volume of ‘policy exception’ cases you also reduce the time and resources available for focusing on each request and so important projects may receive less diligence than they deserve.
But an overly complex policy can also shift accountability in another way: whenever you describe ten specific situations where a certain action is not allowed, you can often be understood to be simultaneously implying that it is allowed in any other situation, thus freeing the actor from making a personal decision regarding its suitability to the given situation; the rule becomes more accountable than the person.
The first point is easily understood so I’ll focus on the second. By filling your policy with detailed rules you imply that anything that doesn’t expressly breach the policy is allowable and so expose the organization to risks that haven’t yet been considered.
The most apt example I can think to explain this point better relates not to credit policy but to something much simpler – travel expenses.

I used to work in a team that travelled frequently to international offices, typically spending three to four days abroad at any one time. When I joined we were a small team with a large amount of autonomy and my boss dictated the policy for travel-related expenses and his policy was: when you’re travelling for work, eat and drink as you would at home if you were paying for it.
He told us not to feel that we should sacrifice just because the country we were in happened to be an expensive one – it was the organisation’s decision to send us there, after all – but similarly not to become extravagant just because the company was picking up the tab.
It was a very broad policy with little in the way of detail and so it made us each accountable; it worked brilliantly and I never heard of a colleague that abused it or felt abused by it.
That policy was inherently fair in all situations because it was flexible. However, in time our parent company bought another local company and our team was brought under their ‘more developed’ corporate structures, including their travel claim policies. These policies, like at so many companies, tried to be fair by unwaveringly applying a single maximum value to all meal claims. In some locations this meant you could eat like a king while in others austerity was forced upon you.
I don’t have data to back this up, but I am sure that it created a lose-lose situation: morale definitely dropped, and I’m certain the cost of travel claims increased as everyone spent up to the daily maximum each day, either because they had to or simply because now they could without feeling any responsibility not to.
Of course this example doesn’t apply 100% to a credit policy, but much of the underlying truth remains: broader policy rules make people accountable, and so they needn’t increase risk; in many cases they actually decrease it.

A credit policy that says ‘we don’t lend to customer segments where the expected returns fail to compensate for their risk’ makes the decision-maker more accountable than a policy that says ‘we don’t lend to students, the self-employed or the unemployed’.
Under the former policy, if a decision-maker isn’t confident enough in their reasons for lending into a new segment they can’t go ahead with that decision. On the other hand though, if they have solid analysis and a risk controlling roll-out process in place, they can go ahead and, unhindered by needless policy, can make a name for themselves and money for the business.
The latter policy, though, makes the decision-maker accountable only for the fact that the new customer segment was not one of those expressly prohibited, not for whether the decision is likely to be a profitable one.

Of course encouraging broader rules and more accountability pre-supposes that the staff in key positions are competent but if they are not, it’s not a new credit policy that you need…


Read Full Post »

In developed markets, ‘comprehensive’ credit bureaus are commonplace; that is, credit bureaus which store information relating to all of an individual’s past repayments, not just information relating to their defaults. Although there are some exceptions, most borrowers, lenders and regulators in these markets believe that a trusted third-party holding a database of good and bad financial history provides borrowers with better products at a fairer price.

But most markets don’t start at this point. In markets where credit bureaus are new or non-existent it is often difficult for regulators and borrowers to know how to decide between a positive bureau and a negative bureau. In many cases the costs of a positive bureau are relatively well understood – both the physical costs of development and the societal costs in the form of privacy concerns – while its benefits remain underestimated: usually assumed to be only those benefits accruing to lenders through better risk control. As a result, the initial push is often for a ‘negative data only’ bureau.

But a positive bureau also carries significant benefits for borrowers, and I will use an analogous, albeit inverse, example to explain how and why this is the case.

In Formula 1 motor-racing points are awarded to the ten best drivers in each race and accumulated over the season to identify an overall winner. The goal of this approach is to identify the ‘best’ driver in a given year and it serves this purpose well – sorting out the very best from the just very good. Since most stakeholders in Formula 1 – team owners, sponsors, drivers and spectators – are concerned almost exclusively with knowing which individual is the ‘best’ this system seldom comes under serious criticism.

But that doesn’t mean that it suits all purposes equally well. Imagine you are placed in charge of a new Formula 1 team that has started with a very limited budget. The team owners realize that this small budget effectively precludes the possibility of winning the title in the short-term, but they also understand that if they can survive in the sport for two years, they can gain a bigger sponsor and fund a more serious title challenge thereafter. So their goal is to survive for two years, and to do that they need to maximize the exposure they provide their advertisers by finishing as many races as possible as far from the back as possible.

What the budget means for you as the team manager is that you can only afford to hire cheap drivers which we’ll assume means drivers who finished in the bottom ten places in the previous season. From that group you’ll still want to get the two best drivers possible but how will you identify the ‘best’ drivers in that group? The table below shows the driver standings at the end of the 2010 season:

As the table above shows, the present system is so focused on segmenting drivers at the top of the table that it struggles to differentiate drivers towards the bottom; in fact the bottom six drivers all finished with zero points. 
Vitantonio Liuzzi might look like a clear choice, with more than double the next driver’s tally, but is he really the best option, and who would you choose to join him? A new model is needed for your purpose: one that separates the ‘worst’ from the simply ‘bad’.

Negative bureaus have a similarly one-sided focus, a focus that might fit their initial purpose but that limits their use in other situations. A negative bureau only stores information on customer defaults, helping to separate the highest-risk customers from the less high-risk ones but struggling to segment low-risk customers. Information is only created when a payment is missed – and usually only when it is missed for several months – and so an individual with a long history of timeously repaying multiple debts will be seen as the same risk as a customer who has only ever paid back one small debt, for example.

Returning now to the earlier scenario: the current Formula 1 model awards points for finishing in one of the top 10 places using a sliding scale of 25; 18; 15; 12; 10; 8; 6; 4; 2; 1. A model better suited for your new purpose should still retain the information relating to good performances but should also seek to create and store information relating to bad performances. The simplest way to do this would be to penalize drivers for finishing in the last ten places using the same scale but in reverse.
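The adjusted scoring can be sketched in a few lines; the 24-car field size is my own assumption for illustration:

```python
# Points for the top ten places, as in the 2010 system.
TOP_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]

def race_points(position, field_size=24):
    """Points for one race: the top-ten scale, mirrored as
    penalties for the bottom ten places; zero in between."""
    if position <= 10:
        return TOP_POINTS[position - 1]
    if position > field_size - 10:
        return -TOP_POINTS[field_size - position]
    return 0

# A 6th place earns 8 points; last place in a 24-car field
# costs 25; a mid-field 12th place scores nothing.
assert race_points(6) == 8
assert race_points(24) == -25
assert race_points(12) == 0
```

Summing these race scores over a season produces the season standing, just as the official points are accumulated, but now the bottom of the table carries information too.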

Implementing these simple changes across the 2010 season immediately provides more insight into the relative performance of drivers towards the bottom of the standings; the gap between each driver becomes clearer and useful information has been created.

Although Vitantonio Liuzzi had previously looked like an obvious pick, his good performances – a 6th place in Korea and a 7th place in Australia – were overshadowed by his many more poor performances – last place in Abu Dhabi, Brazil and Singapore, and second-last place in Japan and China. When the whole picture is seen together, he is no longer such an attractive prospect. A better bet would be to approach Jaime Alguersuari who, although he never placed better than 9th, finished in 13th place or better in 80% of his races and only came last once. Sebastien Buemi also finished last on only one occasion, though he spread his remaining results more unevenly, with both more top-ten and more bottom-ten finishes than Jaime.
Both of these drivers would offer theoretically better returns, having placed worse on the accepted scale but with more of the sort of results your team is looking for.
Of course this isn’t the perfect model and real Formula 1 fans might take exception but it does illustrate how creating a holistic view of the relative performance of all drivers, not just the very good ones, can be value-adding. 

Similarly, a negative-only bureau suits the simple purpose of identifying the very worst of your potential customers but it struggles to identify good customers or to segment the ‘middle’ customers by relative risk; users of the bureau that wish to merely avoid the worst borrowers are well served by this information but lenders who wish to target the best customer segments for low risk/ low margin products or who wish to match pricing to risk are unable to do so. 
The societal costs of a negative-only bureau are therefore borne by the best-performing borrowers in that market, who are given the same products at the same price as average-risk borrowers.
A comprehensive positive and negative bureau avoids this societal cost though it usually does so with added build and maintenance costs. 

When deciding which bureau is best for a given market then, borrowers and regulators should focus on the trade-off between borrowers’ privacy concerns and borrowers’ access to fair products at a fair price, while lenders should focus on the trade-off between the cost of a comprehensive bureau – passed on to them in the form of higher bureau fees – and the expected benefits to be achieved through more profitable niche products.

* * * 

The fact that the 2011 season has just finished stands testament to how long this article has sat in draft form, awaiting publishing. However, the big delay does at least afford me the chance to add an addendum on the performance of the proposed model.
Of course far too many factors are at play to make a scientific comparison, not least the fact that Vitantonio Liuzzi, the man our model told us not to pick, changed teams but here goes anyway:
Vitantonio Liuzzi didn’t qualify for one race, retired from 5 and ended the season without a top 10 finish and only 7 finishes within the top 20. In all, he didn’t manage to collect a single point and joined seven other drivers in joint last place. 
Both models suggested Sebastien Buemi, and he finished the season placed 15th with 7 top-ten finishes against five retirements and no finishes outside of the top 15, while Jaime Alguersuari, our model’s wildcard pick, finished one spot better in the overall standings with 5 top-ten places, 3 retirements and only one finish outside of the top 20.
Never shy to identify a trend from two data points, I’d call that a 2-1 win for the comprehensive model.


Many lenders fail to fully appreciate the size of their fraud losses. By not actively searching for – and thus not classifying – fraud within their bad-debt losses, they miss the opportunity to build better defences and so remain exposed to ever-growing losses. Any account that is written off without ever having made a payment is likely to be fraud; any account that reaches collections for its full limit within the first month or two is likely to be fraud; any recently on-book account that is written off because the account holder is untraceable is likely to be fraud; and so on.

Credit scorecards do not detect application fraud very well because the link between the credit applicant and the credit payer is broken. In a ‘normal’ case the person applying for the credit is also the person who will pay the monthly instalments, so the data in the application represents the risk of the future account-holder and thus the risk of a missed payment. However, when a fraudster applies with a falsified or stolen identity there is no such link: the data in that application no longer bears any relationship to the future account-holder and so cannot represent the true risk.

 

First Person Fraud

Now that explanation assumes we are talking about third-party fraud: fraud committed by someone other than the person described on the application. That is the most clear-cut form of fraud. There is, however, also first-person fraud, which is less clear-cut.

First-person fraud is committed when a customer applies under their own identity but with no intention of repaying the debt, often also altering key data fields – like income – to improve their chances of a larger loan.

Some lenders treat this as a form of bad debt while others prefer to treat it as a form of fraud. It doesn’t really matter, so long as it is treated as a specific sub-type of either definition. I would, however, recommend treating it as a sub-type of fraud unless a strong internal preference exists for treating it as bad debt. Traditional models for detecting bad debt are built on the assumption that the applicant intends to repay, and so they aim to measure the ability to do so, which they then translate into a measure of risk. In these cases, though, that assumption does not hold, and so a model should instead look for the willingness to repay the debt rather than the ability to do so. From a recovery point of view, a criminal fraud case is also a stronger deterrent to customers than a civil bad-debt one.

 

Third Person Fraud

The rest of the fraud, then, is third-party fraud. This can happen in a number of ways, but I’ll cover just the two most common types: false applications and identity take-overs.

False applications use entirely or largely fictional data. This is the less sophisticated method and usually the first stage of fraud in a market, so it is quickly detected once a fraud solution or fraud scorecard is implemented. Creating entirely new and believable identities on a large scale, without consciously or subconsciously reverting to a particular pattern, is difficult. There is therefore a good chance of detecting false applications with simple rules based on trends, repeated but mismatched information, and the like.

A good credit bureau can also limit the impact of false applications since most lenders will then look for some history of borrowing before a loan is granted. An applicant claiming to be 35 years old and earning €5 000 a month with no borrowing history will raise suspicions, especially where there is also a sudden increase in credit enquiries.

Identity take-over is harder to detect but also harder to perpetrate, so it is more common in the more sophisticated markets. In these cases a fraudster adopts the identity – and therefore the pre-existing credit history – of a genuine person, in most cases with only the slightest changes to contact information. Again, a good credit bureau is the first line of defence, albeit now in a reactive capacity, alerting the lender to multiple credit enquiries within a short period of time.

Credit bureau alerts should be supported by a rule-based fraud system with access to historical internal and, as much as possible, external data. Such a system is typically built on three types of rules: rules specific to the application itself; rules matching information in the application to known internal and external frauds; and rules matching information in the application to all historical applications.

 

Application Specific Rules

Application specific rules can be built and implemented entirely within an organisation and are therefore often the first phase in the roll-out of a full application fraud solution. These rules look only at the information captured from the application in question and attempt to identify known trends and logical data mismatches.

Based on a review of historical fraud trends the lender may have identified that the majority of their frauds originated through their online channel in loans to customers aged 25 years or younger, who were foreign citizens and who had only a short history at their current address. The lender would then construct a rule to identify all applications displaying these characteristics.

Over-and-above these trends there are also suspicious data mismatches that may be a result of the data being entered by someone less familiar with the data than a real customer would be expected to be with their own information. These data mismatches would typically involve things like an unusually high salary given the applicant’s age, an inconsistency between the applicant’s stated age and date of birth, etc.

In the simplest incarnation these rules would flag applications for further, manual investigation. In more sophisticated systems though, some form of risk-indicative score would be assigned to each rule and applications would then be prioritised based on the scores they accumulated from each rule hit.
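A minimal sketch of such a scored rule engine might look like the following. The rule definitions, field names and score values here are all hypothetical, invented for illustration rather than taken from any real system:

```python
# Each rule pairs a predicate over the application data with a
# risk-indicative score; scores from all rule hits are accumulated
# and the total used to prioritise applications for review.
RULES = [
    ("young foreign online applicant",
     lambda a: a["channel"] == "online" and a["age"] <= 25
               and a["citizenship"] != "domestic"
               and a["months_at_address"] < 12, 40),
    ("salary implausible for age",
     lambda a: a["age"] < 21 and a["monthly_income"] > 10_000, 30),
    ("stated age disagrees with date of birth",
     lambda a: a["age"] != a["age_from_dob"], 25),
]

def score_application(app, threshold=50):
    """Return (total_score, names_of_rules_hit, refer_for_review)."""
    hits = [(name, pts) for name, pred, pts in RULES if pred(app)]
    total = sum(pts for _, pts in hits)
    return total, [name for name, _ in hits], total >= threshold
```

In the simplest incarnation the `refer_for_review` flag alone would route an application to manual investigation; the accumulated score allows the more sophisticated, prioritised queue described above.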

These rules are easy to implement and need little in the way of infrastructure but they only detect those fraudulent attempts where a mistake was made by the fraudster. In order to broaden the coverage of the application fraud solution it is vital to look beyond the individual application and to consider a wider database of stored information relating to previous applications – both those known to have been fraudulent and those still considered to be good.

 

Known Fraud Data

The most obvious way to do this is to match the information in the application to all the information from previous applications that are known – or at least suspected – to have been fraudulent. The fraudster’s greatest weakness is that certain data fields need to be re-used, either to subvert the lender’s validation processes or to simplify the fraudster’s own processes.

For example, many lenders phone applicants to confirm certain aspects of their application or to encourage early utilisation, so the fraudster must supply at least one genuine contact number; other lenders automatically validate addresses, so the fraudster must supply a valid address. Whatever the reason, as soon as any data is re-used it becomes possible to identify where that has happened, and in so doing to flag a higher risk of fraud.

To do this, the known fraud data should be broken down into its component parts and matched separately so that any re-use of an individual data field – address, mobile number, employer name, etc. – can be identified even if it is used out of context. Once identified, it is important to calculate the relative importance in order to prioritise alerts. Again this is best done with a scorecard but expert judgement alone can still add value; for example it is possible that several genuine applicants will work for an employer that has been previously used in a fraudulent application but it would be much more worrying if a new applicant was to apply using a phone number or address that was previously used by a fraudster.
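This field-by-field matching can be sketched as follows. The field names, weights and example records are hypothetical; in practice the weights would come from a scorecard or from expert judgement as described above:

```python
from collections import defaultdict

# Weights reflect how alarming a re-used value is: a shared phone
# number or address is far more suspicious than a shared employer.
FIELD_WEIGHTS = {"phone": 50, "address": 45, "employer": 10}

def build_fraud_index(known_frauds):
    """Break known/suspected fraud records into component fields,
    indexing every value seen so re-use is caught even out of context."""
    index = defaultdict(set)
    for record in known_frauds:
        for field, value in record.items():
            if field in FIELD_WEIGHTS and value:
                index[field].add(value)
    return index

def match_against_frauds(app, index):
    """Return the matched fields and their accumulated alert score."""
    matches = [f for f in FIELD_WEIGHTS if app.get(f) in index[f]]
    return matches, sum(FIELD_WEIGHTS[f] for f in matches)
```

An application re-using only a known fraudster’s employer would score low and might merely be noted, while one re-using a known fraudster’s phone number would be prioritised for investigation.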

It is also common to prioritise the historical data itself based on whether it originated from a confirmed fraud or a suspected one. Fraud can usually only be confirmed if the loan was actually issued, not paid and then later shown to be fraudulent. Matches to data relating to these accounts will usually be prioritised. Data relating to applications that were stopped based on the suspicion of fraud, on the other hand, may be slightly de-prioritised.

 

Previous Applications

When screening new applications it is important to check their data not just against the known fraud data discussed above but also against all previous ‘good’ applications. This is for two reasons: firstly, not all fraudulent applications are detected; secondly, especially in the case of identity theft, the fraudster is not always the first person to use the data, so a genuine customer may previously have applied using the data now being used by a fraudster.

Previous application data should be matched in two steps if possible. Where the same applicant has applied for a loan before, their specific data should be matched and checked for changes and anomalies. The analysis must be able to show if, for a given social security number, there have been any changes in name, address, employer, marital status, etc. and if so, how likely those changes are to be the result of an attempted identity theft versus a simple change in circumstances. Then – or where the applicant has not previously applied for a loan – the data fields should be separated and matched to all existing data in the same way that the known fraud data was queried.
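The first of those two steps could be sketched like this. The stable key, the compared fields and the example data are all hypothetical stand-ins:

```python
# Fields compared when the same person (same national ID) reapplies.
TRACKED_FIELDS = ("name", "address", "employer", "phone")

def check_returning_applicant(app, previous_by_id):
    """Step 1: if this national ID has applied before, report any
    changed fields. Returns None if the ID is new, signalling a fall
    through to the field-level matching of step 2."""
    old = previous_by_id.get(app["national_id"])
    if old is None:
        return None  # new applicant: proceed to step 2 matching
    return [f for f in TRACKED_FIELDS if app.get(f) != old.get(f)]
```

A change of address and phone together, with the name unchanged, is the classic identity-takeover pattern and would be weighted far more heavily than, say, a lone change of employer, which is usually just a change in circumstances.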

As with the known fraud data it is worth prioritising these alerts. A match to known fraud data should be prioritised over a match to a previous application and within the matches a similar prioritisation should occur: again it would not be unusual for several applicants to share the same employer while it would be unusual for more than one applicant to share a mobile phone number and it would be impossible for more than one applicant to share a social security or national identity number.

 

Shared Data

When matching data in this way, the probability of detecting a fraud increases as more data becomes available for matching. That is why data sharing is such an important tool in the fight against application fraud. Each lender may see only a handful of fraud cases, which limits not only their ability to develop good rules but, more importantly, their ability to detect duplicated data fields.

Typically, data is shared indirectly through a trusted third party. In this model each lender lists all their known and suspected frauds on a shared database that is used to generate alerts but cannot otherwise be accessed by lenders. Each new application is first matched against the full list of known frauds, then against the lender’s own previous applications, and finally subjected to generic and customised application-specific rules, as shown in the diagram below:

 

