title: "📖 Notes on MSFM - Credit Risk Model"
date: 2019-12-04
tags: notes
mathjax: true

home_only: true

Standard Simulation Model on Credit Portfolio

Credit Risk

Lenders, such as banks, are subject to many kinds of risk, among which credit risk is the most likely to cause bank failure.


Each loan is part of a legal agreement that requires the borrower to pay interest and repay principal on schedule; some borrowers are also required to obey specified covenants, such as maintaining earnings above a certain threshold.

If the borrower fails to follow the agreement, the lender holds the borrower to be in default, which can be a money default or a covenant default. Purchasers of public bonds experience only money default.

At default, the loan agreement calls for fees to be paid by the borrower, gives the bank the power to seize collateral (for secured loans), and triggers a cross-default provision (all of the borrower's loans are in default once any one of them is).

For most of the 20th century, banks did not formally define default; a precise definition became necessary only when models for managing credit risk were developed.


Rating Agencies

There are three major Nationally Recognized Statistical Rating Organizations (NRSROs); firms pay them to rate their bonds in order to increase the bonds' liquidity.

Under S&P ratings, the grades are AAA, AA, A, BBB (investment grade), BB, B, CCC, CC, C (speculative grade), and D (default).


D and PD

Let D be the default indicator of a loan, taking only two values: 0 and 1. PD is the annual probability of default.

By mathematical identity:

$$\mathbb{E}[D] = \Pr[D = 1] = PD$$

In a portfolio of N firms, the portfolio default rate, DR, equals:

$$DR = \frac{1}{N}\sum_{i=1}^{N} D_i$$


Exposure, Recovery and LGD

Exposure is the amount owed by the borrower. Recovery is measured in either of two ways: the market price of the defaulted debt shortly after default, or the discounted value of the amounts ultimately collected through the workout process.


LGD (Loss Given Default) is a random variable with values usually between 0 and 1:

$$LGD = \frac{\text{Exposure} - \text{Recovery}}{\text{Exposure}}$$

For a defaulted loan, there are two ways to measure recovery/LGD. For a current loan, there is a distribution for LGD. The expectation is written as:

$$ELGD = \mathbb{E}[LGD]$$

Historically, the annual default rate of US investment-grade bonds is about 0.20%, while that of non-investment-grade bonds is about 3.60%. Bank loans are almost always senior to bonds and have lower LGD.

Loss and EL

Loss is measured as a fraction of exposure:

$$Loss = D \times LGD$$

EL is the expected loss. Because D and LGD are assumed to be independent:

$$EL = \mathbb{E}[D \times LGD] = PD \times ELGD$$

Lenders often need to estimate EL and include it in the spread they charge.
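For example, assuming (for illustration) PD = 1% and ELGD = 40%:

$$EL = PD \times ELGD = 0.01 \times 0.40 = 0.004$$

i.e. roughly 40 basis points of the exposure per year should be built into the spread.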


Change Of Variable

Note that LGD is often measured as a fraction of exposure. To restate the distribution in dollar amounts, we use the change-of-variable technique, which relies on the chain rule.

Given the pdf of LGD:

$$f_{LGD}(x), \quad x \in [0, 1]$$

We define the function g such that:

$$LGD^{\$} = g(LGD) = LGD \times \text{Exposure}$$

Hence the function g-inverse is:

$$g^{-1}(y) = \frac{y}{\text{Exposure}}$$

The derivative can be expressed as:

$$\frac{d\,g^{-1}(y)}{dy} = \frac{1}{\text{Exposure}}$$


By definition:

$$F_{LGD^{\$}}(y) = \Pr[LGD^{\$} \le y] = \Pr[LGD \le g^{-1}(y)] = F_{LGD}(g^{-1}(y))$$

Taking derivatives on both sides and applying the chain rule:

$$f_{LGD^{\$}}(y) = f_{LGD}(g^{-1}(y)) \cdot \frac{d\,g^{-1}(y)}{dy}$$

Finally:

$$f_{LGD^{\$}}(y) = \frac{1}{\text{Exposure}}\, f_{LGD}\!\left(\frac{y}{\text{Exposure}}\right)$$
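As a quick check, suppose (for illustration) that LGD ~ Uniform[0, 1], so $f_{LGD}(x) = 1$, and that Exposure = \$100. Then $f_{LGD^{\$}}(y) = 1/100$ for $y \in [0, 100]$, which integrates to 1 as required.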


Simulate Portfolio Loss On A Single Loan

We know that:

$$Loss = D \times LGD$$

To simulate loss, we first simulate D:

Draw x ~ Uniform[0, 1]
    If x < PD, then D = 1
    Else D = 0

Then simulate LGD from its pdf. Multiply each D by its LGD to get Loss. Repeat the process to produce a distribution of Loss.
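A minimal Python sketch of this procedure; the Beta(2, 5) LGD distribution is an arbitrary illustration, not part of the model:

import numpy as np

rng = np.random.default_rng(0)
PD = 0.05          # assumed annual probability of default
n_sims = 100_000   # number of simulated scenarios

# Simulate D: default when the uniform draw falls below PD
x = rng.uniform(0, 1, n_sims)
D = (x < PD).astype(int)

# Simulate LGD from an assumed pdf; Beta(2, 5) is for illustration only
LGD = rng.beta(2, 5, n_sims)

# Loss = D * LGD; averaging the simulated losses recovers EL = PD * ELGD
loss = D * LGD
print("Simulated EL:  ", round(loss.mean(), 5))
print("Theoretical EL:", round(PD * 2 / (2 + 5), 5))  # Beta mean is a/(a+b)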


Simulate Portfolio Loss On N Independent Loans

Assume the defaults of the N loans are independent and share the same probability of default, PD:

$$D_i \overset{iid}{\sim} \text{Bernoulli}(PD), \quad i = 1, \ldots, N$$

Then the total number of defaults follows a binomial distribution:

$$\sum_{i=1}^{N} D_i \sim \text{Binomial}(N, PD)$$

However, historical data show that the variance of the default rate is much higher than the binomial variance. Hence default correlation needs to be introduced.
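A quick numpy check of the binomial claim under independence; N, PD, and the number of scenarios are arbitrary choices:

import numpy as np

rng = np.random.default_rng(1)
N, PD, n_sims = 100, 0.05, 50_000

# Number of defaults per scenario when loans are independent
defaults = rng.binomial(N, PD, n_sims)
DR = defaults / N

print("Var of DR (simulated):", round(DR.var(), 6))
print("Var of DR (binomial): ", PD * (1 - PD) / N)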


Simulate Portfolio Loss On N Correlated Loans

Assume that there is a latent, unobserved variable $z_i \sim N(0, 1)$ that is responsible for the default of firm i, i.e. firm i defaults if:

$$z_i < \Phi^{-1}(PD_i)$$

Assume the latent variables of any two firms i and j are jointly normal, and denote their correlation:

$$\rho_{i,j} = \text{Corr}(z_i, z_j)$$

Let $r_{i,j}$ be the correlation between the asset returns of firms i and j. Under the Merton view, the latent variable is the standardized asset return, so:

$$\rho_{i,j} = r_{i,j}$$

Denote PDJ as the probability that firms i and j both default:

$$PDJ = \Pr[D_i = 1, D_j = 1] = \Phi_2\!\left(\Phi^{-1}(PD_i), \Phi^{-1}(PD_j);\, \rho_{i,j}\right)$$

To calculate PDJ with Python:

import numpy as np
from scipy.stats import norm
from scipy.stats import multivariate_normal


PD1, PD2 = 0.1, 0.2            # single-firm default probabilities
mean = [0, 0]                  # latent variables are standard normal
cov = [[1, .5], [.5, 1]]       # latent-variable correlation rho = 0.5

# Joint default probability: bivariate normal CDF evaluated at the
# default thresholds Phi^-1(PD1) and Phi^-1(PD2)
result = multivariate_normal(mean, cov)
PDJ = round(result.cdf(np.array([norm.ppf(PD1),
                                 norm.ppf(PD2)])), 4)
print("Pr[D1=1, D2=1]:", PDJ)

Returns:

Pr[D1=1, D2=1]: 0.0515

Now that we have the $D_i$, we can simulate the portfolio loss rate, given the LGD distribution and exposure of each firm.

Denote Dcorr as the correlation between $D_i$ and $D_j$:

$$Dcorr = \frac{PDJ - PD_i\, PD_j}{\sqrt{PD_i (1 - PD_i)\, PD_j (1 - PD_j)}}$$


Note that, holding $PD_i$ and $PD_j$ fixed, Dcorr is increasing in the latent-variable correlation $\rho_{i,j}$.
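A minimal sketch that turns the PDJ from the earlier example (PD1 = 0.1, PD2 = 0.2, latent correlation 0.5) into a default correlation:

import numpy as np
from scipy.stats import norm, multivariate_normal

PD1, PD2, rho = 0.1, 0.2, 0.5

# Joint default probability under the bivariate normal latent model
PDJ = multivariate_normal([0, 0], [[1, rho], [rho, 1]]).cdf(
    np.array([norm.ppf(PD1), norm.ppf(PD2)]))

# Default correlation: Cov(D1, D2) scaled by the Bernoulli std devs
Dcorr = (PDJ - PD1 * PD2) / np.sqrt(PD1*(1-PD1) * PD2*(1-PD2))
print("Dcorr:", round(Dcorr, 4))  # ~0.26, far below the latent rho of 0.5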


Copula

When we model more than two firms, pair-wise correlations are not enough to determine the entire distribution of outcomes: there are N PDs and N(N-1)/2 pair-wise correlations, while we need probabilities for $2^N$ joint outcomes. Hence we introduce the Gauss copula, which describes the group-wise dependence.

Consider a set of multivariate normal variables:

$$(Z_1, Z_2, \ldots, Z_N) \sim N(0, \Sigma)$$

The quantiles of the set are uniformly distributed by definition:

$$\Phi(Z_i) \sim \text{Uniform}[0, 1]$$

The copula of the set $(Z_1, Z_2, \ldots, Z_N)$ is defined as the joint cumulative distribution function of $(\Phi(Z_1), \Phi(Z_2), \ldots, \Phi(Z_N))$:

$$C(u_1, \ldots, u_N) = \Pr\!\left[\Phi(Z_1) \le u_1, \ldots, \Phi(Z_N) \le u_N\right]$$

The Gauss copula is as follows. Note that, among all possible copulas, the Gauss copula is the one supported by the Central Limit Theorem:

$$C^{Gauss}_{\Sigma}(u_1, \ldots, u_N) = \Phi_{\Sigma}\!\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_N)\right)$$


In fact, the copula does not contain any information about the marginal distributions. Here we set the marginal distribution $F_Z$ to the standard normal only as an example; it can be any continuous distribution such that:

$$F_Z(Z_i) \sim \text{Uniform}[0, 1]$$

And so, by Sklar's theorem, the joint distribution splits into the copula and the marginals:

$$\Pr[Z_1 \le z_1, \ldots, Z_N \le z_N] = C\!\left(F_{Z_1}(z_1), \ldots, F_{Z_N}(z_N)\right)$$



In the context of default modeling, we assume that each company's default follows a Bernoulli distribution and simulate it with a standard normal latent variable:

$$D_i = \mathbf{1}\!\left\{Z_i < \Phi^{-1}(PD_i)\right\}$$

The probability that all firms default at the same time is, by definition:

$$\Pr[D_1 = 1, \ldots, D_N = 1] = \Pr\!\left[Z_1 \le \Phi^{-1}(PD_1), \ldots, Z_N \le \Phi^{-1}(PD_N)\right]$$

Note that, given only a pair-wise correlation matrix Σ, this probability can take any value between 0 and the lowest single-firm default probability.

Now we assume all firms' latent variables are connected by the Gauss copula, which pins down a single value for the probability of all firms defaulting:

$$\Pr[D_1 = 1, \ldots, D_N = 1] = \Phi_{\Sigma}\!\left(\Phi^{-1}(PD_1), \ldots, \Phi^{-1}(PD_N)\right)$$


With Python we can either evaluate the integral numerically or use simulation to calculate the probability that all firms default at the same time.

import numpy as np
from scipy.stats import norm
from scipy.stats import multivariate_normal

np.random.seed(9999)
PD = [.5, .4, .3, .2, .1]      # single-firm default probabilities
mean = [0, 0, 0, 0, 0]
cov = [[1, .05, .1, .15, .2],  # pair-wise latent-variable correlations
       [.05, 1, .25, .3, .35],
       [.1, .25, 1, .4, .45],
       [.15, .3, .4, 1, .5],
       [.2, .35, .45, .5, 1]]

# Numerical evaluation: joint CDF at the five default thresholds
result = multivariate_normal(mean, cov)
PDA = round(result.cdf(np.array(norm.ppf(PD))), 4)
print('Probability Of All Default:', PDA)

# Simulation: draw latent variables, map them to quantiles, compare with PD
N = 10000
simulation = norm.cdf(np.random.multivariate_normal(mean, cov, N))
D = np.sum(simulation < PD, axis=1)   # number of defaults per scenario

PDA_simulated = round(np.count_nonzero(D == 5)/N, 4)
print('Probability Of All Default (Simulated):', PDA_simulated)

DR = round(sum(D)/(5 * N), 4)
print('Average DR:', DR)

Returns:

Probability Of All Default: 0.017
Probability Of All Default (Simulated): 0.0168
Average DR: 0.3014

Note that, compared to other copulas, the Gauss copula requires only a pair-wise correlation matrix and the PDs, yet conveys a great deal of information. Most of the time the Gauss copula itself has not been shown to be invalid; rather, the calibration of the marginals and of the correlation matrix is often found to be erroneous.


Simulate Rating Transitions

The default model has only two states, 0 and 1.

To simulate rating transitions, we require two matrices:


Factor Model

Single Factor Model

We construct the single risk factor model with latent variable $Z_i$:

$$Z_i = \sqrt{\rho_i}\, Z + \sqrt{1 - \rho_i}\,\epsilon_i$$

The pair-wise correlation between two firms i and j's latent variables is:

$$\text{Corr}(Z_i, Z_j) = \sqrt{\rho_i\, \rho_j}$$

Where:

$$Z, \epsilon_1, \ldots, \epsilon_N \overset{iid}{\sim} N(0, 1)$$

with $Z$ the systematic risk factor and the $\epsilon_i$ idiosyncratic.

cDR and Vasicek

Define the Conditional (Expected) Default Rate (cDR) as:

$$cDR_i(z) = \Pr[D_i = 1 \mid Z = z] = \Pr\!\left[\sqrt{\rho_i}\, z + \sqrt{1-\rho_i}\,\epsilon_i < \Phi^{-1}(PD_i)\right]$$

This gives the final form of cDR, which is called the Vasicek formula, named after Oldrich Vasicek:

$$cDR(z) = \Phi\!\left(\frac{\Phi^{-1}(PD) - \sqrt{\rho}\, z}{\sqrt{1-\rho}}\right)$$

Note that the Vasicek formula is monotonic in z and in PD: the lower the z (a bad year for the systematic factor) and the higher the PD, the higher the cDR.
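A minimal sketch of the formula; the PD and ρ values are illustrative:

from scipy.stats import norm

def cDR(z, PD, rho):
    # Vasicek conditional default rate given the systematic factor z
    # (low z corresponds to a bad year)
    return norm.cdf((norm.ppf(PD) - rho**0.5 * z) / (1 - rho)**0.5)

# A bad year (z at its 1st percentile) versus a median year
print(round(cDR(norm.ppf(0.01), PD=0.02, rho=0.15), 4))  # elevated cDR
print(round(cDR(0.0, PD=0.02, rho=0.15), 4))  # below PD: right-skewed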

The expected default rate for firm i is always $PD_i$, since, by the law of total expectation:

$$\mathbb{E}[cDR_i(Z)] = \mathbb{E}\!\left[\Pr(D_i = 1 \mid Z)\right] = \Pr[D_i = 1] = PD_i$$

However, when Z is known, the expected default rate is $cDR_i$. Firms are uncorrelated once Z is known:

$$\Pr[D_i = 1, D_j = 1 \mid Z] = cDR_i(Z)\, cDR_j(Z)$$

If there is a large number of identical firms with uniform PD and ρ, the default rate of such an asymptotic portfolio follows the unconditional Vasicek distribution.

The unconditional Vasicek pdf can be derived with the change-of-variable technique. Note that z is eliminated, and the pdf has only the parameters PD and ρ:

$$f(x; PD, \rho) = \sqrt{\frac{1-\rho}{\rho}}\, \exp\!\left\{\frac{1}{2}\left[\Phi^{-1}(x)\right]^2 - \frac{1}{2\rho}\left[\sqrt{1-\rho}\,\Phi^{-1}(x) - \Phi^{-1}(PD)\right]^2\right\}$$

The mean of cDR is PD:

$$\mathbb{E}[cDR] = PD$$
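A numerical sanity check (with illustrative parameters) that this density integrates to 1 and has mean PD:

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def vasicek_pdf(x, PD, rho):
    # Unconditional Vasicek density of the portfolio default rate
    num = np.sqrt(1 - rho) * norm.ppf(x) - norm.ppf(PD)
    return np.sqrt((1 - rho) / rho) * np.exp(
        0.5 * norm.ppf(x)**2 - num**2 / (2 * rho))

PD, rho = 0.02, 0.15
mass, _ = quad(lambda x: vasicek_pdf(x, PD, rho), 0, 1)
mean, _ = quad(lambda x: x * vasicek_pdf(x, PD, rho), 0, 1)
print("Total mass:", round(mass, 4))  # ~1.0
print("Mean cDR:  ", round(mean, 4))  # ~PD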

Multi-factor Model

Suppose that there are two jointly normal systematic risk factors ψ and ω, and two groups of firms, each depending on one of the factors:

$$Z_i = \sqrt{\rho}\,\psi + \sqrt{1-\rho}\,\epsilon_i, \qquad Z_j = \sqrt{\rho}\,\omega + \sqrt{1-\rho}\,\epsilon_j$$

Between the two groups:

$$\text{Corr}(Z_i, Z_j) = \rho\,\text{Corr}(\psi, \omega)$$

Note that the between-group correlation is no greater than the within-group correlation ρ, since $\text{Corr}(\psi, \omega) \le 1$.

Basel II Capital Formula

The Bank for International Settlements is located in Basel, Switzerland. The Basel Committee on Banking Supervision drafts accords that require banks to hold minimum capital, e.g. Basel II, Basel III, etc.

The Basel II formula is an Asymptotic Single Risk Factor (ASRF) model: the portfolio is assumed large enough for the Law of Large Numbers to work, and the model generalizes the Vasicek distribution to allow a diverse choice of PD and ρ within the portfolio. The core of the capital requirement for credit risk is the inverse CDF of the Vasicek distribution.

Inverse Vasicek (with parameters PD and ρ), evaluated at a quantile q of the default rate:

$$F^{-1}(q; PD, \rho) = \Phi\!\left(\frac{\Phi^{-1}(PD) + \sqrt{\rho}\,\Phi^{-1}(q)}{\sqrt{1-\rho}}\right)$$

Note: Basel II evaluates this at q = 0.999, i.e. the 99.9th percentile of the default rate.

Making sense of the Basel II formula: capital covers the unexpected loss, the gap between the 99.9th-percentile loss $ELGD \cdot F^{-1}(0.999; PD, \rho)$ and the expected loss $ELGD \cdot PD$, which is already covered by provisions and spreads.
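A sketch of the core calculation implied by these formulas, omitting Basel's maturity adjustment; the PD, ELGD, and ρ inputs are illustrative (Basel II actually prescribes ρ as a function of PD):

from scipy.stats import norm

def basel_k(PD, ELGD, rho, q=0.999):
    # Unexpected loss per unit exposure at the q-th Vasicek percentile;
    # EL is subtracted because it is covered by provisions and spreads
    cdr_q = norm.cdf((norm.ppf(PD) + rho**0.5 * norm.ppf(q))
                     / (1 - rho)**0.5)
    return ELGD * (cdr_q - PD)

print(round(basel_k(PD=0.01, ELGD=0.45, rho=0.12), 4))  # ~0.036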


Estimation, Statistical Test and Overfit

Estimating PD

Firms differ widely in their credit quality, and PDs tend to change over time as well, so a firm's PD is neither known nor fixed. To estimate PD, we analyze analogous firms with identical credit ratings.

Method 1, for all A-rated firms in the dataset: divide the total number of defaults by the total number of firm-years.

Method 2, for all A-rated firms in the dataset: compute the default rate for each year, then average the annual default rates.

Method 3: estimate PD as a parameter of a pdf describing A-rated firms. This tries to find the distribution that best fits the data. We focus on this method.

Method Of Moments

Given a dataset $\{X_i\}_{i=1}^{N}$ of default rates, we set the moments of the Vasicek distribution equal to the moments of the data.

First moment:

$$\widehat{PD} = \bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

Second moment (unbiased, using N - 1 in the denominator): under the Vasicek model $\mathbb{E}[cDR^2] = \Phi_2\!\left(\Phi^{-1}(PD), \Phi^{-1}(PD); \rho\right)$, so we solve for ρ from

$$\Phi_2\!\left(\Phi^{-1}(\widehat{PD}), \Phi^{-1}(\widehat{PD});\, \hat{\rho}\right) = \bar{X}^2 + \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2$$

Note: solving for ρ requires a numerical root search, since ρ enters only through the bivariate normal CDF.
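A minimal sketch of the method of moments, assuming a made-up series of annual default rates; ρ is recovered with a root search on the bivariate normal second moment:

import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm, multivariate_normal

# Hypothetical annual default rates (illustration only)
X = np.array([0.001, 0.004, 0.002, 0.010, 0.003, 0.007, 0.002, 0.005])

PD_hat = X.mean()                    # first moment: E[cDR] = PD
m2_hat = X.var(ddof=1) + PD_hat**2   # second raw moment, unbiased variance

def second_moment(rho, PD):
    # E[cDR^2] = Phi_2(ppf(PD), ppf(PD); rho) under the Vasicek model
    z = norm.ppf(PD)
    return multivariate_normal([0, 0], [[1, rho], [rho, 1]]).cdf([z, z])

rho_hat = brentq(lambda r: second_moment(r, PD_hat) - m2_hat, 1e-6, 0.99)
print("PD_hat:", round(PD_hat, 5), "rho_hat:", round(rho_hat, 4))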

Maximum Likelihood Estimation

The MLE method chooses the parameter values that make the observed data most likely under the assumed distribution. MLE matches the distribution to the data as a whole, as opposed to M.o.M., which matches only the moments; the MLE fits the entire pdf to the dataset.

When the data are not highly dispersed, however, the MLE estimates tend to be close to the M.o.M. estimates.

The MLE is a biased estimator; it chooses the parameter values that maximize the likelihood function. Given a dataset $\{X_i\}_{i=1}^{N}$, we assume the true default rates follow the Vasicek distribution. The likelihood function is:

$$\mathcal{L}(PD, \rho) = \prod_{i=1}^{N} f(X_i; PD, \rho)$$

Often we maximize the log-likelihood instead, i.e. find PD and ρ such that:

$$(\widehat{PD}, \hat{\rho}) = \arg\max_{PD,\,\rho} \sum_{i=1}^{N} \ln f(X_i; PD, \rho)$$
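A companion sketch that maximizes the Vasicek log-likelihood over the same hypothetical series; the log-density mirrors the pdf given above:

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

X = np.array([0.001, 0.004, 0.002, 0.010, 0.003, 0.007, 0.002, 0.005])

def neg_log_lik(params):
    PD, rho = params
    z = norm.ppf(X)
    log_pdf = (0.5 * np.log((1 - rho) / rho) + 0.5 * z**2
               - (np.sqrt(1 - rho) * z - norm.ppf(PD))**2 / (2 * rho))
    return -log_pdf.sum()

res = minimize(neg_log_lik, x0=[X.mean(), 0.1],
               bounds=[(1e-6, 0.5), (1e-6, 0.99)])
print("MLE PD:", round(res.x[0], 5), "MLE rho:", round(res.x[1], 4))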

Hypothesis Testing & Wilks’ Theorem

We do not assert truth, as truth is often unknown. With a given set of data, we can only assert that some models are better at predicting the future behavior of similar data.

We call the simpler model the null hypothesis and the more complicated one the alternative hypothesis. The null generally nests within the alternative, i.e. the alternative becomes the null when some of its parameters are set to certain values.

We prefer the null because it is simpler, and by doing so we avoid Type 1 error, the rejection of a true null.

Hence we only reject the null if the alternative fits the data significantly better through a statistical test.

Wilks' Theorem asserts that if the null hypothesis is true, then the statistic

$$D = -2 \ln \lambda = 2\left(\ln \mathcal{L}_{alt} - \ln \mathcal{L}_{null}\right)$$

has a distribution that approaches the $\chi^2$ distribution (with df = number of extra parameters in the alternative) as the dataset $\{X_i\}_{i=1}^{N}$ grows large.

The likelihood ratio is defined as follows. It is less than or equal to 1, because the alternative is more flexible and therefore assigns at least as high a likelihood to the data:

$$\lambda = \frac{\mathcal{L}_{null}}{\mathcal{L}_{alt}} \le 1$$


We reject the null if the D statistic is a tail observation, meaning that either the null is not true, or the null is true and something unlikely happened (a Type 1 error). We reject the null when:

$$D > \chi^2_{df,\,0.95}$$

For example, when df = 1 the critical value is 3.84, so we reject the null with 95% confidence when:

$$D > 3.84$$
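A minimal illustration of the rejection rule; the two maximized log-likelihoods are made-up numbers:

from scipy.stats import chi2

# Hypothetical maximized log-likelihoods (illustration only)
loglik_null, loglik_alt = -52.3, -49.9
df = 1  # one extra parameter in the alternative

D = 2 * (loglik_alt - loglik_null)  # equals -2 ln(likelihood ratio)
critical = chi2.ppf(0.95, df)       # 3.84 for df = 1
print("D:", round(D, 2), "critical:", round(critical, 2),
      "reject null:", D > critical)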

Overfit

An overfit model makes worse forecasts than a simpler model.

We assume the population data (X, Y) follow a bivariate normal distribution:

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$$

Given ρ, the population regression line is:

$$\mathbb{E}[Y \mid X] = \rho X$$

The sample regression line is:

$$\hat{y} = a + b x$$

From a sample of 30 observations of (X, Y), ordinary least squares (OLS) is performed to find the in-sample p-value of the coefficient and the $R^2$. MSE is used to evaluate forecast error.

image1.png

image2.png

This shows that when the population has a weak relationship (ρ = 0.2), estimates of the slope are more dispersed.



Now we look at the relationship between statistical significance and MSE. The population Mean-Squared Error (MSE) is an out-of-sample measure of forecast error; it does NOT depend on any in-sample data:

$$MSE(a, b) = \mathbb{E}\!\left[(Y - a - bX)^2\right] = a^2 + (b - \rho)^2 + (1 - \rho^2)$$

Taking partial derivatives shows that the population regression (b = ρ, a = 0) minimizes the MSE; the minimized MSE is $1 - \rho^2$, so the higher the ρ, the lower the MSE.


A regression is significant (at 95% confidence) if the p-value for the coefficient b is less than 0.05.

We have observed that when the population has a weak relationship (ρ = 0.2), regressions that happen to be statistically significant in-sample tend to have higher population MSE than insignificant ones.

This is because the strong relationship suggested by such a regression does NOT forecast the weak population relationship well.

When the population has a strong relationship (ρ = 0.8), however, the significant regression/high $R^2$ does hold up out-of-sample.
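A sketch of the experiment under the stated assumptions (30 observations, standard normal margins); it reports the dispersion of fitted slopes and the average population MSE for the two values of ρ used above:

import numpy as np

rng = np.random.default_rng(42)

def trial(rho, n=30):
    # Fit OLS on one sample; return the slope and the population MSE
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    b, a = np.polyfit(x, y, 1)
    return b, a**2 + (b - rho)**2 + (1 - rho**2)

for rho in (0.2, 0.8):
    slopes, mses = zip(*(trial(rho) for _ in range(2000)))
    print(f"rho={rho}: slope std={np.std(slopes):.3f}, "
          f"avg population MSE={np.mean(mses):.3f}")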


Conditional LGD Risk

cLGD

The history of bond LGD shows that LGD is elevated when the default rate is elevated. The elevation is moderate and similar across different debt types.

It is important to model LGD appropriately under different economic conditions. Like cDR, we define cLGD:

$$cLGD(z) = \mathbb{E}[LGD \mid Z = z]$$

Note that:

$$cLoss = cDR \times cLGD$$

There are two ways to calculate ELGD: time-weighted (average the LGD over periods) or default-weighted (total loss divided by total defaults, i.e. $ELGD = EL / PD$).

Furthermore,

Where:

Frye-Jacobs

Modeling cLGD separately from cDR introduces complexity and potential overfitting into the cLoss model. Instead, the Frye-Jacobs LGD function assumes that both cDR and cLoss follow Vasicek distributions and infers cLGD as a function of cDR.

Frye-Jacobs assumptions:

  1. cDR and cLoss are comonotonic.

  2. cDR follows the Vasicek distribution, which stems from the simplest portfolio structure.

  3. The distribution of cLoss does NOT depend on the definition of default. This implies that the distribution of cLoss does not have separate parameters for PD and ELGD; it does have a parameter EL.

  4. cLoss follows the Vasicek distribution.

  5. cLoss and cDR have the same ρ parameter. This ensures that the LGD function is monotonic.

Finally,

$$cLGD(cDR) = \frac{\Phi\!\left(\Phi^{-1}(cDR) - k\right)}{cDR}, \qquad k = \frac{\Phi^{-1}(PD) - \Phi^{-1}(EL)}{\sqrt{1 - \rho}}$$

Observations:

  1. cLGD is strictly monotonic with range (0, 1), for all k.
  2. cLGD increases slowly, and similarly for all k, at low cDR.
  3. Elasticity is greatest for loans with low LGD.
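A minimal sketch of the Frye-Jacobs LGD function; the parameter values are illustrative only:

from scipy.stats import norm

def frye_jacobs_cLGD(cdr, PD, EL, rho):
    # cLGD as a function of cDR alone; k collects PD, EL, and rho
    k = (norm.ppf(PD) - norm.ppf(EL)) / (1 - rho)**0.5
    return norm.cdf(norm.ppf(cdr) - k) / cdr

# Illustrative parameters: PD = 5%, ELGD = 40% (so EL = 2%), rho = 0.15
for cdr in (0.01, 0.05, 0.20):
    print(cdr, round(frye_jacobs_cLGD(cdr, PD=0.05, EL=0.02, rho=0.15), 4))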

Frye-Jacobs: Develop Alternative Hypothesis

Introduce an additional sensitivity parameter a to test the slope of the LGD function.

We know that:

$$EL = \mathbb{E}\!\left[cDR \cdot cLGD(cDR)\right]$$

In integration form:

$$EL = \int_0^1 x\, cLGD(x)\, f(x; PD, \rho)\, dx$$

Bring in the Frye-Jacobs cLGD function:

$$EL = \int_0^1 \Phi\!\left(\Phi^{-1}(x) - k\right) f(x; PD, \rho)\, dx$$

Note that EL appears on both the lhs and the rhs; divide each EL by $ELGD^a$:

Note that we have identified a new LGD function:

Analyzing the choice of a:

Frye-Jacobs: Hypothesis Test

We introduce a finite portfolio, which brings randomness into the $D_i$'s and the $LGD^{\$}$'s.

Under a finite portfolio, the probability of 0 defaults is:

$$\Pr\!\left[\textstyle\sum D = 0 \mid cDR\right] = (1 - cDR)^N$$

When conditioned on cDR and on $\sum D > 0$, the average portfolio LGD rate is approximately normal:

$$\overline{LGD} \,\Big|\, \left(cDR, \textstyle\sum D\right) \sim N\!\left(cLGD, \frac{\sigma^2}{\sum D}\right)$$

Let Y ~ N(0, 1) be a standard normal variable; then the portfolio LGD becomes:

$$\overline{LGD} = cLGD + \frac{\sigma}{\sqrt{\sum D}}\, Y$$

Now calculate Loss from DR and LGD:

$$Loss = DR \times \overline{LGD} = \frac{\sum D}{N}\left(cLGD + \frac{\sigma}{\sqrt{\sum D}}\, Y\right)$$

Use the change-of-variable technique to calculate the pdf of Loss:

Where:

Finally, the pdf of loss conditional on Σ D and cDR:

Removing the conditioning, the distribution of loss in a uniform portfolio, with N loans, the same PD and ρ, and the cLGD function, becomes:

Here is a plot of the unconditional loss density in a finite (N = 10) portfolio in red and the loss density in an infinite portfolio (Vasicek) in blue. (Note that the plot uses D to denote $\sum D$.)

image3.png

Now that we have the pdf of loss, we can test the hypothesis:

$$H_0\!: a = 0 \quad \text{vs.} \quad H_1\!: a \neq 0$$

The result is MLE(a) = 0.01 based on all loan data, and the test fails to reject the null. The same holds for bond data and for combined bond/loan data. We conclude that the Frye-Jacobs model is consistent with Moody's data.

Vendor Estimation

Distance-To-Default and EDF

Robert Merton argues that a firm's equity is a call option on the firm's assets, with strike equal to the face value of the firm's debt; default occurs when the asset value falls below that strike.

Moody's suggests that a loan contains the borrower's option to default and attempts to use a risk-neutral framework to estimate the probability of default. In the context of a put: the borrower effectively holds a put on the firm's assets, struck at the value of the liability, and default is the exercise of that put.

Under Moody's assumptions, the firm has an option to default once its asset value drops below its liability. Here the liability is the strike price, for which Moody's uses D, the "default point", defined as short-term debt plus half of long-term debt. DD stands for Distance-To-Default, as suggested by Merton. So the probability of default is:

$$PD = \Phi(-DD), \qquad DD = \frac{\ln(A / D) + (\mu - \sigma_A^2 / 2)\, T}{\sigma_A \sqrt{T}}$$

Moody's then estimates the value and volatility of the assets (unobservable) from the value and volatility of the firm's market capitalization (observable).

However, since Φ(-DD) gives very poor estimates of the default probability, Moody's instead sets the EDF (Expected Default Frequency) of a firm equal to the average historical default rate of firms with the same Distance-To-Default. An EDF uses DD to find historical analogs of current firms.
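A sketch of the Merton-style calculation with made-up inputs; in practice the asset value and volatility are themselves estimated from equity data:

import numpy as np
from scipy.stats import norm

# Illustrative inputs: asset value, asset volatility, drift, horizon
A, sigma_A, mu, T = 120.0, 0.25, 0.07, 1.0
default_point = 60.0 + 0.5 * 40.0  # short-term debt + half long-term debt

DD = (np.log(A / default_point) + (mu - 0.5 * sigma_A**2) * T) \
     / (sigma_A * np.sqrt(T))
print("DD:", round(DD, 2), " Phi(-DD):", round(norm.cdf(-DD), 4))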

Correlation

Merton assumes that the correlation ρ between the latent variable Z’s is equal to the asset return correlation r.

However, data suggest that correlations estimated from credit data are lower than correlations based on asset returns. Hence a credit portfolio model that uses asset correlations to estimate ρ overstates credit risk.