Posts Tagged ‘fraud analytics’

Many lenders fail to fully appreciate the size of their fraud losses. By not actively searching for – and thus not classifying – fraud within their bad debt losses, they miss the opportunity to create better defences and so remain exposed to ever-growing losses. Any account that is written off without ever having made a payment is likely to be fraud; any account that reaches collections at its full limit within the first month or two is likely to be fraud; any recently on-book account that is written off because the account holder is untraceable is likely to be fraud; and so on.

Credit scorecards do not detect application fraud very well because the link between the credit applicant and the credit payer is broken. In a ‘normal’ case the person applying for the credit is also the person that will pay the monthly instalments and so the data in the application represents the risk of the future account-holder and thus the risk of a missed payment. However, when a fraudster applies with a falsified or stolen identity there is no such link and so the data in that application no longer has any relationship to the future account-holder and so can’t represent the true risk.


First Person Fraud

Now that explanation assumes we are talking about third-party fraud; fraud committed by someone other than the person described on the application. That is the most clear-cut form of fraud. However, there is also the matter of first person fraud which is less clear-cut.

First person fraud is committed when a customer applies using their own identity but does so with no intention of paying back the debt, often also changing key data fields – like income – to improve their chances of a larger loan.

Some lenders will treat this as a form of bad debt while others prefer to treat it as a form of fraud. It doesn’t really matter so long as it is treated as a specific sub-type of either definition. I would, however, recommend treating it as a sub-type of fraud unless a strong internal preference exists for treating it as bad debt. Traditional models for detecting bad debt are built on the assumption that the applicant intends to pay their debt, and so they aim to measure the ability to do so, which they then translate into a measure of risk. In these cases, though, that assumption does not hold, so there should instead be a model looking for the willingness to pay the debt rather than the ability to do so. From a recovery point of view, a criminal fraud case is also a stronger deterrent to customers than a civil bad debt one.


Third Person Fraud

The rest of the fraud, then, is third-party fraud. There are a number of ways it can happen but I’ll just cover the two most common types: false applications and identity take-overs.

False applications are applications using entirely or largely fictional data. This is the less sophisticated method and is usually the first stage of fraud in a market, so it is quickly detected when a fraud solution or fraud scorecard is implemented. Creating entirely new and believable identities on a large scale without consciously or subconsciously reverting to a particular pattern is difficult. There is therefore a good chance of detecting false applications by using simple rules based on trends, repeated but mismatched information, etc.

A good credit bureau can also limit the impact of false applications since most lenders will then look for some history of borrowing before a loan is granted. An applicant claiming to be 35 years old and earning €5 000 a month with no borrowing history will raise suspicions, especially where there is also a sudden increase in credit enquiries.

Identity take-over is harder to detect but also harder to perpetrate, so it is a more common problem in the more sophisticated markets. In these cases a fraudster adopts the identity – and therefore the pre-existing credit history – of a genuine person with only the slightest changes made to contact information in most cases. Again a good credit bureau is the first line of defence albeit now in a reactive capacity alerting the lender to multiple credit enquiries within a short period of time.

Credit bureau alerts should be supported by a rule-based fraud system with access to historical internal and, as much as possible, external data. Such a system will typically be built using three types of rules: rules specific to the application itself; rules matching information in the application to historical internal and external known frauds; rules matching information in the application to all historical applications.


Application Specific Rules

Application specific rules can be built and implemented entirely within an organisation and are therefore often the first phase in the roll-out of a full application fraud solution. These rules look only at the information captured from the application in question and attempt to identify known trends and logical data mismatches.

Based on a review of historical fraud trends the lender may have identified that the majority of their frauds originated through their online channel in loans to customers aged 25 years or younger, who were foreign citizens and who had only a short history at their current address. The lender would then construct a rule to identify all applications displaying these characteristics.

Over and above these trends there are also suspicious data mismatches that may be a result of the data being entered by someone less familiar with it than a real customer would be expected to be with their own information. These data mismatches would typically involve things like an unusually high salary given the applicant’s age, an inconsistency between the applicant’s stated age and date of birth, etc.

In the simplest incarnation these rules would flag applications for further, manual investigation. In more sophisticated systems though, some form of risk-indicative score would be assigned to each rule and applications would then be prioritised based on the scores they accumulated from each rule hit.
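The scored approach can be sketched as a small rule engine. This is a minimal illustration only: the rule names, fields, thresholds and score weights below are assumptions invented for the example, not any lender’s real configuration.

```python
# A sketch of application-specific fraud rules with per-rule scores.
# Rules, fields and weights are illustrative assumptions.
from datetime import date

def _age_from_dob(dob: date, today: date = date(2024, 1, 1)) -> int:
    """Age in whole years at a fixed reference date (for reproducibility)."""
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))

RULES = [
    # (name, score, predicate over the application dict)
    ("young_online_foreign", 40,
     lambda a: a["channel"] == "online" and a["age"] <= 25
               and a["foreign_citizen"] and a["months_at_address"] < 12),
    ("salary_age_mismatch", 25,
     lambda a: a["age"] < 21 and a["monthly_income"] > 8000),
    ("age_dob_mismatch", 30,
     lambda a: a["age"] != _age_from_dob(a["date_of_birth"])),
]

def score_application(app: dict) -> tuple[int, list[str]]:
    """Return the accumulated fraud score and the names of the rules hit."""
    hits = [name for name, _, pred in RULES if pred(app)]
    total = sum(score for name, score, _ in RULES if name in hits)
    return total, hits

app = {"channel": "online", "age": 23, "foreign_citizen": True,
       "months_at_address": 6, "monthly_income": 3000,
       "date_of_birth": date(2000, 6, 15)}
```

In the simple incarnation the list of rule hits alone would queue the application for manual review; in the scored version the accumulated total drives the priority of that review.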

These rules are easy to implement and need little in the way of infrastructure but they only detect those fraudulent attempts where a mistake was made by the fraudster. In order to broaden the coverage of the application fraud solution it is vital to look beyond the individual application and to consider a wider database of stored information relating to previous applications – both those known to have been fraudulent and those still considered to be good.


Known Fraud Data

The most obvious way to do this is to match the information in the application to all the information from previous applications that are known – or at least suspected – to have been fraudulent. The fraudster’s greatest weakness is that certain data fields need to be re-used, either to subvert the lender’s validation processes or to simplify their own.

For example, many lenders may phone applicants to confirm certain aspects of their application or to encourage early utilisation, in which case the fraudster would need to supply at least one genuine contact number; in other cases lenders may automatically validate addresses, in which case the fraudster would need to supply a valid address. No matter the reason, as soon as some data is re-used it becomes possible to identify where that has happened and, in so doing, to identify a higher risk of fraud.

To do this, the known fraud data should be broken down into its component parts and matched separately so that any re-use of an individual data field – address, mobile number, employer name, etc. – can be identified even if it is used out of context. Once identified, it is important to calculate the relative importance in order to prioritise alerts. Again this is best done with a scorecard but expert judgement alone can still add value; for example it is possible that several genuine applicants will work for an employer that has been previously used in a fraudulent application but it would be much more worrying if a new applicant was to apply using a phone number or address that was previously used by a fraudster.
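Field-level matching with expert-judgement weights can be sketched as below. The field names, sample values and weights are invented for illustration; the only point carried over from the text is that a shared phone number or address should alarm far more than a shared employer.

```python
# A sketch of field-level matching against a known-fraud database.
# Weights reflect relative importance and are illustrative assumptions.
FIELD_WEIGHTS = {"phone": 50, "address": 45, "employer": 10}

# Each known fraud is stored broken into its component fields.
known_frauds = [
    {"phone": "+371 2000 0001", "address": "12 Elm Street", "employer": "Acme Ltd"},
]

def fraud_match_score(app: dict) -> int:
    """Sum the weights of every application field previously seen in a known fraud."""
    score = 0
    for field, weight in FIELD_WEIGHTS.items():
        used_values = {f[field] for f in known_frauds if field in f}
        if app.get(field) in used_values:
            score += weight
    return score

# An applicant sharing only an employer with a past fraud scores low...
genuine = {"phone": "+371 2999 9999", "address": "7 Oak Road", "employer": "Acme Ltd"}
# ...while one re-using a fraudster's phone number scores high.
suspicious = {"phone": "+371 2000 0001", "address": "7 Oak Road", "employer": "Beta SIA"}
```

A production system would of course fit these weights statistically, as a scorecard, rather than setting them by hand.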

It is also common to prioritise the historical data itself based on whether it originated from a confirmed fraud or a suspected one. Fraud can usually only be confirmed if the loan was actually issued, not paid and then later shown to be fraudulent. Matches to data relating to these accounts will usually be prioritised. Data relating to applications that were stopped based on the suspicion of fraud, on the other hand, may be slightly de-prioritised.


Previous Applications

When screening new applications it is important to check their data not just against the known fraud data discussed above but also against all previous ‘good’ applications. This is for two reasons: firstly, not all fraudulent applications are detected; and secondly, especially in the case of identity theft, the fraudster is not always the first person to use the data, so a genuine customer may previously have applied using the data that is now being used by a fraudster.

Previous application data should be matched in two steps if possible. Where the same applicant has applied for a loan before, their specific data should be matched and checked for changes and anomalies. The analysis must be able to show if, for a given social security number, there have been any changes in name, address, employer, marital status, etc. and if so, how likely those changes are to be the result of an attempted identity theft versus a simple change in circumstances. Then – or where the applicant has not previously applied for a loan – the data fields should be separated and matched to all existing data in the same way that the known fraud data was queried.
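The first of those two steps, comparing a returning applicant’s data to what they supplied before, might look like the sketch below. The field names, the sample record and the choice of which fields count as high risk are all illustrative assumptions.

```python
# A sketch of the first matching step: for a returning applicant (same social
# security number), compare the new application to the stored one and flag
# changed fields. Data layout is an illustrative assumption.
previous_apps = {
    "800101-12345": {"name": "A. Smith", "address": "12 Elm Street",
                     "employer": "Acme Ltd", "phone": "+371 2000 1111"},
}

# Fields whose change is more often linked to identity take-over than to a
# genuine change of circumstances (an assumption for this sketch).
HIGH_RISK_CHANGES = {"phone", "address"}

def changed_fields(ssn: str, new_app: dict) -> tuple[list[str], bool]:
    """Return the fields that differ from the stored application and whether
    any of them is considered high risk."""
    old = previous_apps.get(ssn)
    if old is None:
        return [], False  # first-time applicant: fall back to field-level matching
    diffs = [f for f in old if new_app.get(f) != old[f]]
    return diffs, any(f in HIGH_RISK_CHANGES for f in diffs)

diffs, high_risk = changed_fields("800101-12345",
    {"name": "A. Smith", "address": "99 New Lane",
     "employer": "Acme Ltd", "phone": "+371 2000 1111"})
```

In practice the likelihood that a given change reflects identity theft rather than a change in circumstances would itself be modelled, not hard-coded.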

As with the known fraud data it is worth prioritising these alerts. A match to known fraud data should be prioritised over a match to a previous application and within the matches a similar prioritisation should occur: again it would not be unusual for several applicants to share the same employer while it would be unusual for more than one applicant to share a mobile phone number and it would be impossible for more than one applicant to share a social security or national identity number.


Shared Data

When matching data in this way, the probability of detecting a fraud increases as more data becomes available for matching. That is why data sharing is such an important tool in the fight against application fraud. Each lender may only see a handful of fraud cases, which limits not only their ability to develop good rules but, more importantly, their ability to detect duplicated data fields.

Typically data is shared indirectly, through a trusted third party. In this model each lender lists all their known and suspected frauds on a shared database that is used to generate alerts but cannot otherwise be accessed by lenders. All new applications are then matched first to the full list of known frauds, then to the lender’s own previous applications, and finally subjected to generic and customised application-specific rules.
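That screening order can be sketched as a simple pipeline. The three check functions below are stubs standing in for the matching logic discussed earlier; their contents are placeholder assumptions.

```python
# A sketch of the screening order: shared known-fraud data first, then the
# lender's own previous applications, then application-specific rules.
# The three checks are illustrative stubs, not real matching logic.
def matches_shared_fraud_list(app: dict) -> bool:
    return app.get("phone") in {"+371 2000 0001"}  # stub: industry-wide fraud data

def matches_own_history(app: dict) -> bool:
    return app.get("address") in {"12 Elm Street"}  # stub: lender's own applications

def application_rules_hit(app: dict) -> bool:
    return app.get("age", 99) <= 25 and app.get("channel") == "online"  # stub

def screen(app: dict) -> str:
    """Return the stage at which the application first raises an alert."""
    if matches_shared_fraud_list(app):
        return "shared fraud data"
    if matches_own_history(app):
        return "own previous applications"
    if application_rules_hit(app):
        return "application-specific rules"
    return "no alert"
```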



Read Full Post »

When it comes to the application of statistical models in the lending environment, the majority of the effort is dedicated to calculating the risk of bad debt; relatively little effort is dedicated to calculating the risk of fraud.

There are several good reasons for that, the primary one being that credit losses have a much larger impact on a lender’s profit.  Fraud losses tend to be restricted to certain situations and certain products: application fraud might affect many unsecured lending products but it does so to a lesser degree than total credit losses, while transactional fraud is typically restricted to card products.

I discuss application fraud in more detail in another article so in this one I will focus on modeling for transactional fraud and, in particular, how the assumptions underpinning these models vary from those underpinning traditional behavioural scoring models.

Credit Models

The purpose of most credit models is to forecast future behaviour.  Since the future of any particular account can’t be known, they do this by matching an account to a similar group of past accounts and assuming that this customer will behave in the same way as those customers did.  In other words, they ask of each account the question, ’how much does this look like all previous known-bad accounts?’.

So if the only thing we know about a customer is that they are 25 years old and married, a credit model will typically look at the behaviour of all previous 25-year-old married customers and assume that this customer will behave in the same way going forward.

The more sophisticated the model, the more accurate the matching; and the more accurate the matching between the current and past customers, the more valid the transfer of the latter group’s future behaviour to the former will be.

Imagine the example below where numerical characteristics have been replaced with illustrative ones.  Here there are three customer groups: high risk, medium risk and low risk.  A typical low risk customer is blue with stars, while a high risk customer is red with circles and a medium risk customer is green with diamonds.

A basic model would look at any new customer, in this case green with stars, and assign them to the group they  most closely matched – medium risk – and assume the associated outcome – a 3% bad rate.  A more sophisticated model would calculate the relative importance of the colour versus the shapes in predicting risk and would forecast an outcome somewhere between the medium and low risk outcomes.

An over-simplification, but the concept holds well enough for this article.

The key difficulty a credit model has to overcome is that it needs to forecast an unknown future based on a limited amount of data.  This forces the model to group similar accounts and to treat them as the same.  To extend the metaphor from above, few low risk accounts would actually have been blue with stars; there would have been varying shades of blue and varying star-like shapes.  Yet it is impossible to model each account separately so they would have been grouped together using the best possible description of them as a whole.

Transactional fraud models need not be so tightly bound by this requirement, though the extra flexibility that this allows is often over-looked by analysts too set in the traditional ways.

Transactional Fraud Models

Many transactional fraud models take the credit approach and ask ’how much does this transaction look like a typical fraud transaction?’.  In other words, they start by separating all transactions into ‘fraud’ and ‘non fraud’ groups, identifying a ‘typical’ fraud transaction and then comparing each new transaction to that template.

However, rather than only asking the question ’how much does this look like a typical fraud transaction?’, a fraud model can also ask ’how much does this look like this cardholder’s typical transaction?’.

A transactional fraud model does not need to group customers or transactions together to get a view of the future, it simply needs to identify a transaction that does not meet a specific customer’s established spend pattern.  Assume a typical fraud case involves six transactions in a day, each of a value between €50 and €500 and with the majority of them occurring in electronic stores.  A credit-style model might create an alert whenever a card received its sixth transaction in a day totaling at least €300 or when it received its third transaction from an electronic store.  However, if it was known that the cardholder in question had not previously used their card more than twice in a single day and had never bought goods at any of the stores visited, that same alert might have been triggered earlier and been attached to a higher probability of fraud.
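The contrast between the two styles of rule can be sketched as below. The thresholds, the history structure and the sample transactions are illustrative assumptions taken loosely from the example in the text.

```python
# A sketch contrasting a generic, credit-style rule with a customer-specific
# one. Thresholds and the history layout are illustrative assumptions.
def generic_alert(todays_txns: list[dict]) -> bool:
    """Credit-style rule: 6th transaction of the day totalling at least EUR 300."""
    return len(todays_txns) >= 6 and sum(t["amount"] for t in todays_txns) >= 300

def customer_alert(todays_txns: list[dict], history: dict) -> bool:
    """Customer-specific rule: alert earlier if today's activity breaks the
    cardholder's own established pattern."""
    new_merchants = sum(1 for t in todays_txns
                        if t["merchant"] not in history["known_merchants"])
    too_many = len(todays_txns) > history["max_txns_per_day"]
    return too_many and new_merchants >= 2

history = {"known_merchants": {"corner grocery", "petrol station"},
           "max_txns_per_day": 2}
txns = [{"merchant": "electro hub", "amount": 450},
        {"merchant": "gadget world", "amount": 390},
        {"merchant": "corner grocery", "amount": 35}]
```

Here the generic rule stays silent after three transactions, while the customer-specific rule already fires because the cardholder has never used the card this often nor at these stores.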

A large percentage of genuine spend on a credit card is recurring; that is to say, it happens at the same merchants month in and month out.  In a project on this subject, I found that an average of 50% of genuine transactions occurred at merchants that the cardholder had visited at least once in the previous six months (that number doesn’t drop much when one uses only the previous three months).  Some merchant categories are more susceptible to repeat purchases than others but this can be catered for during the modeling process.  For example, you probably buy your groceries at one of three or four stores every week but you might frequently try a new restaurant.

The majority of high value fraud is removed from the customer by time and geography.  A card might be ’skimmed’ at a restaurant in London but that data might then be emailed to America or Asia where, a month later, it is converted into a new card to be used by a fraudster.  This means that fraudsters seldom know the genuine customer’s spend history, so matching their fraudulent spend to the established patterns is nearly impossible.  In the same project, over 95% of fraud occurred at merchants that the genuine cardholder had not visited in the previous six months.  Simply applying a binary cut-off based on whether the merchant in question was a regular merchant would lead to a near doubling of hit rates from the existing rule set.

Maintaining Customer Histories

The standard approach to implementing a customer-specific history works as follows.  In the live environment new transactions are compared to the historical record and are flagged if the merchant is new or, in more sophisticated cases, if the transaction value exceeds merchant-level cut-offs.  The fact that a transaction falls outside the history is used, together with other fraud rules, to prioritise alerts.  Later, in a batch run, the history is updated with the data relating to new merchants and changes to merchant-level patterns.  If only a specific period’s worth of data is stored, older data is dropped off at this stage.  This is commonly done to improve response times, with three to six months’ worth of data usually being enough.
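The live check and batch maintenance can be sketched as below. The six-month retention window comes from the text; the per-card data layout, merchant names and dates are illustrative assumptions.

```python
# A sketch of customer-history maintenance: a live check flags merchants
# outside the stored history, and a later batch run records new merchants
# and drops entries older than the retention window.
from datetime import date, timedelta

RETENTION = timedelta(days=180)  # roughly six months of history

# merchant -> date last seen, for one card (illustrative data)
history = {"corner grocery": date(2024, 5, 20), "petrol station": date(2023, 9, 1)}

def is_outside_history(merchant: str) -> bool:
    """Live check: True if the merchant is not in the stored history."""
    return merchant not in history

def batch_update(todays_txns: list[dict], today: date) -> None:
    """Batch run: record today's merchants, then drop stale entries."""
    for t in todays_txns:
        history[t["merchant"]] = today
    for merchant in [m for m, seen in history.items() if today - seen > RETENTION]:
        del history[merchant]

today = date(2024, 6, 1)
batch_update([{"merchant": "new cafe"}], today)
```

Keeping only a last-seen date per merchant is the least data-intensive version; storing average values per merchant would sharpen the rules at the cost of a bigger record.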

Customer-specific patterns like this are not enough to indicate fraud on their own but, when used in conjunction with an existing rule set in this way, they can add significant value.

There are of course some downsides to this approach, primarily the amount of data that needs to be stored and accessed.  This is particularly true if your fraud rules are triggered during the authorisations process.  In these cases it may be necessary to sacrifice fraud risk for performance by using only basic rules in the authorisations system followed by the full rule set in the reactionary fraud detection system.  Most card issuers follow this sort of approach, where the key goal of authorisations is good customer service through fast turn-around times rather than fraud prevention.

The amount of data stored and accessed should be matched to each issuer’s data processing capabilities.  As mentioned earlier, simply accessing a rolling list of previously visited merchants can double the hit rate of existing rules and is not a data-intensive process.  Including average values, GIS or other value-added data will surely improve the rule hit rates even further but will do so with added processing costs.

A typical implementation works as follows:

In this set-up, customer history is not queried live but is instead used to update a series of specific fields such as customer parameters and an exception file.  The customer parameters relate to the value of spend typical of any one customer and could be updated daily or weekly – even monthly updates can suffice if sufficient leeway is built in when they are calculated.  An exception file includes specific customers to whom the high risk fraud rules should not apply.  This is usually done to allow frequent high risk spenders or frequent users of high risk merchant types – often casinos – to spend without continuously hitting fraud rules.

Once an authorisation decision has been made, that data is passed into the offline environment where it passes through a series of fraud rules and sometimes a fraud score.  It is in this environment that the most value can be attained from the addition of a customer-specific history.  Because this is an offline environment, there is more time to query larger data sets and to use that information to prioritise contact strategies, which should always include the use of SMS alerts.

Here the fact that a transaction has fallen outside of the historical norm will be used as an input into other rules.  For example, if there have been more than three transactions on an account in a day and at least two of those were at new merchants, a phone call is queued.
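That kind of combined rule, and the contact strategy it feeds, can be sketched as follows. The thresholds come from the example above; the tiered actions are an illustrative assumption.

```python
# A sketch of the offline contact decision: the out-of-history flag feeds
# other rules rather than indicating fraud on its own. Tiers are illustrative.
def contact_action(txn_count_today: int, new_merchant_count: int) -> str:
    """Decide the contact strategy for an account's daily activity."""
    if txn_count_today > 3 and new_merchant_count >= 2:
        return "queue phone call"
    if new_merchant_count >= 1:
        return "send SMS alert"
    return "no action"
```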

Read Full Post »

Protecting a bank from external fraud (as opposed to internal financial fraud, which is a matter dealt with by auditors) is a multi-faceted task that requires the input of several important teams.  The three most active of these can broadly be called Fraud Operations, Fraud Analytics and Information Technology* (IT).  This article will consider the relative “geography” of these teams and the internal structural requirements of each.

Fraud Operations is at the figurative ‘coalface’ of the fight against fraud.  This is the team that monitors the queues of system-generated alerts, identifies suspicious transactions, contacts customers to confirm or allay those suspicions and performs the administrative work required when a fraud is confirmed – closing the account, listing its details on industry databases, processing charge-backs, etc.  The success of this team is dictated by the efficiency with which it processes alerts.

If the success of Fraud Operations is dependent on the efficient execution of a given set of tasks, the success of Fraud Analytics is dependent on the inherent effectiveness of those tasks.  This team is responsible for analysing implicit and explicit data to optimise fraud-detecting rules and systems.  To do this well, the team must perform a mix of re-active and pro-active data analyses.

Re-active data analytics optimises the performance of the system by optimising the performance of its existing components, and does so in two important ways – the statistical review of historical data and the post-hoc reporting of fraud performance and trends.  Historical data is analysed to identify embedded patterns that may be indicative of fraudulent spend.  The results of such an analysis are then used to inform the design of the rules that will scan transactions and generate the alerts to be worked by Fraud Operations.  Historical data is also used as the basis for management reporting and, in particular, the reporting of prevailing fraud trends and the recent performance to budget. 

Pro-active data analytics, on the other hand, optimises the performance of the system by creating entirely new components.  Historical data is still a key input into the process but the results of the analysis thereof are forward-looking and usually presented in the form of a business case or project proposal.  So, where the results of re-active data analytics may inform an improvement in a particular fraud-detection rule, the results of pro-active data analytics may suggest a pilot project to test the value of SMS transaction alerts as an alternative to the rule altogether.

Linking these two teams is the IT team.  IT maintains the systems on which Fraud Operations depend and implements the updates and upgrades suggested by Fraud Analytics.  The role of IT is primarily to enable the efficient execution by Fraud Operations of the strategies set by Fraud Analytics.  As such, their success is linked directly to the performance of the systems they maintain and which, should they fail, have the potential to undermine the performance of the other major stakeholders.

The specific roles and responsibilities of each of these teams should be clearly demarcated and used to inform the evaluation of team performance and the recruitment process.


 * IT is an overly inclusive term and so, in any sufficiently large organisation, only some smaller part of the IT function will be involved in fraud prevention.  Nevertheless, the term will be used un-adapted in this article.

Read Full Post »

Managing transactional fraud is like searching for a needle in a haystack.  Except the needle is moving and the haystack is growing!  Faced with an environment as complex and daunting as this, banks invest large amounts in increasingly sophisticated fraud detection systems.  These systems are typically built around a statistical model and aim to identify those transactions which most closely resemble previous fraudulent transactions.  In so doing they seek to increase both efficiency and effectiveness: raising the probability that each customer contact will detect and confirm fraudulent spend while simultaneously increasing the total number of fraudulent transactions detected.

Investment in large transactional fraud systems is justified by the ever-increasing cost of fraud losses.  However, the idea that they alone can solve the problem is based on an old paradigm.

Traditionally, communicating directly with customers was expensive and time-consuming.  To confirm fraudulent transactions banks needed to contact customers telephonically.  Since it was not financially viable for banks to contact every customer to confirm every transaction, they invested in systems and analysts that could screen the mass of transactions and identify only those transactions likely enough to be fraudulent so as to warrant the cost of a confirmatory phone call.  This was true even while the configuration of those systems necessarily resulted in fraudulent transactions being ‘missed’.  The companies that produced these transactional fraud detection systems, meanwhile, focused their efforts on making them ever better at calculating the probability of any one transaction being fraudulent.

But the key underpinnings of this paradigm – namely that staff and communication are both expensive – are no longer true.  Once the old paradigm is abandoned, it is possible to find significant value in simple and cheap solutions like SMS transactional alerts.

An SMS transactional alert is an informative SMS that is automatically generated whenever a transaction meeting pre-set criteria is processed on a credit card.  These SMS alerts typically include some basic information about the transaction and ask customers to phone or text the bank if they did not originate the transaction.

SMS alerts are inspired by a new fraud management paradigm, one that is underpinned by the assumption that ‘staff’ can be free and that communication is very cheap.

SMS alerts clearly don’t change the direct costs of employing staff.  Rather, they transfer the workload of screening alerts from paid employees to unpaid customers.  If the bank sends an SMS alert to a customer, it is that customer who takes the time and effort to validate the transaction.  So, where once a large team of employees was needed to analyse transactions and to contact customers to confirm suspected frauds, it is now possible to screen almost all transactions with a small team of employees and a very large ‘team’ of customers.

It was the high cost of communicating with customers that made it essential for suspicious transactions to be manually screened and reduced before customers were contacted.  But, none of this is necessary now that banks can contact customers instantaneously and very cheaply through SMSes.

As a fraud prevention tool, SMSes do not preclude the need for traditional fraud management tools.  Rather, they free up manual resources and allow staff to focus immediately on the highest risk as identified by these systems.

When implementing SMS alerts, it is important to avoid two common mistakes that are often made when old paradigm thinking is allowed to persist.  Customers should not be charged for the service – though in some markets the practice does exist – and the triggers should be easily understood.

The value of an SMS alert system increases with its coverage, not with its efficiency.  Every SMS alert saves more money than it costs, so the bank saves more money as each additional customer is enrolled in the programme.  By trying to recover the running costs directly from its customers, a bank limits the scope of its programme and, in so doing, limits its savings.  That said, in some markets banks have successfully charged for the service without major reductions in customer take-up rates.

Alerts should be sent for all transactions over a nominal value-based trigger – either enforced or customer-selected.  It may be more efficient to send alerts based on calculated fraud rules but this, again, is false economy.  Because staff are free and communication is cheap, it is now cheaper to send alerts for all transactions than it is to risk missing a fraud.  It is also preferable to meet customer expectations by generating alerts when – and only when – they are expected.
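The trigger argued for here is deliberately simple. A minimal sketch, assuming a default threshold of €10 that customers may override, might look like this (the default value is an illustrative assumption, not a recommendation from the text):

```python
# A sketch of a value-based SMS alert trigger: alert on every transaction
# over a nominal threshold, either enforced or customer-selected.
from typing import Optional

def should_alert(amount: float, customer_threshold: Optional[float] = None,
                 default_threshold: float = 10.0) -> bool:
    """Send an SMS for any transaction over the applicable nominal value."""
    threshold = customer_threshold if customer_threshold is not None else default_threshold
    return amount > threshold
```

Note there is no fraud score anywhere in the trigger: the whole point is that the customer, not a model, screens the transaction.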

These alerts are not just a cheap way to limit fraud, they’re also a very effective way to do so.  When fraud does occur, an account that receives SMS alerts is likely to suffer losses fifty to seventy percent lower than those experienced by a similar account not receiving SMS alerts.

The benefits are not restricted to fraud savings either – customers value SMS alerts.  An SMS alert programme is therefore a win-win offering that reduces fraud losses while improving customer service.  The second non-financial benefit is an improvement in customer contact data.  Because customers expect and appreciate SMS alerts, they quickly become aware of any breakdown in communication between the bank and themselves.  And, because they appreciate these alerts, when they become aware of these broken communication lines they are more likely to pro-actively contact the bank to update their contact details.  Since all functions of the bank can access this information, they too benefit from better contact rates for their strategies.

In summary, a bank with a good SMS alert programme is likely to have lower fraud losses, lower fraud operational costs, happier customers and better customer contact details.

Read Full Post »