Increase of banks’ credit risks forecasting power by the usage of the set of alternative models
Alexander M. Karminsky, Ella Khromova
‡ National Research University Higher School of Economics, Moscow, Russia
Open Access


The paper aims to compare the divergence of existing credit risk models and to create a synergic model with superior forecasting power, based on a rating model and a probability of default model for Russian banks. It demonstrates that rating models, if applied alone, tend to overestimate the instability of a bank, whereas probability of default models give underestimated results. By assigning optimal weights and monotonic transformations to these models, a new synergic model of banks’ credit risks with higher forecasting power (44% of precise estimates) was obtained.


Keywords: banks, credit ratings, probability of default, ordered logit models, ordered probit models, rating agencies

JEL classification: G21, G33.

1. Introduction

Economic growth and stability of any country depend on the financial condition of its banking system. Given the critical role of banks as financial intermediaries, the estimation of their financial stability is one of the main goals of regulators and government. The most commonly used ways of assessing the financial performance and controlling the level of credit risk of a bank are the evaluation of its probability of default and of its rating grade. The probability of default (PD) is the likelihood of a bank failure over a fixed assessment horizon, while a rating determines the class to which a company belongs based on the PD. Although both methods have been intensely studied, their forecasting power still leaves wide room for improvement, and possible biases may lead to misleading results. PD estimates provided by a model forecast tend to be underestimated because of the imbalanced structure of datasets containing defaults: default events are rare, so a PD model becomes overfitted towards non-default events. Even the classical data balancing methods described by He and Garcia (2009) and Garcia et al. (2012) do not fully solve this problem, and a PD model still gives underestimated results (Karminsky and Kostrov, 2017). On the other hand, rating models are not fully reliable either. The main reason is a poor proxy for the dependent variable in the model. Researchers observe only the ratings assigned by rating agencies (RAs) and have to assume that this information is absolutely true and objective. In reality, however, a rating assessment is a subjective opinion of an agency that depends on its conservatism and methodology. Indeed, it has been shown that, recently, RAs care a lot about their reputation and tend to be overcautious in order not to miss the financial distress of a bank.
In the context of this problem, the first hypothesis of this paper is formulated (Hypothesis 1): There is a significant divergence between the predictions of credit rating models and PD models: credit rating models tend to overestimate the financial distress of a bank, whereas PD models give underestimated results.

In the presence of this divergence, this paper is aimed at adjusting the previously used models of credit risks to a single scale and creating a synergic reliable model of banks’ credit risks by using the set of alternative models based on publicly available information. PD models and rating models were chosen as a set of alternative models that will be considered in this research, but further research will provide the joint forecast of a wider set of credit risks models. According to the second hypothesis (Hypothesis 2) of this research, the usage of the set of alternative models (ratings and PD) will improve banks’ credit risks forecasting power. The relevance of the paper is determined by the ability to compare and interpret different credit risks models and to evaluate the financial stability in a valid and consistent manner.

This research is based on the “Banks and Finance” database provided by the informational agency “Mobile”. The panel dataset of Russian banks was used in the analysis. The total number of banks after filtration was 395 (86 of them experienced the default). The financial performance of these banks was considered on a quarterly basis from the year 2007 to 2016, so the overall number of observations was 11,627 which should be sufficient to make consistent conclusions.

The rest of the paper is structured as follows. Section 2 provides the results of a literature review. Section 3 describes the analysis of the empirical data and the formation of a representative sample. Section 4 deals with the econometric models for forecasting a bank’s rating and PD on the same dataset, with a further check of their “goodness of fit”. In Section 5, the PD and rating estimates are calibrated to a common scale, the distributions of their forecast errors are compared and the divergence in these estimates is analyzed. As a result, the synergic model with a higher forecasting power is constructed by assigning optimal weights and monotonic transformations to the PD and rating models. Finally, the synergic model is checked for its out-of-sample fit and conclusions are formulated.

2. Literature review

The paper unifies two seemingly separate areas of economic literature. The first area addresses the issue of underestimation of credit risk by default models, while the second area concerns the overcautious assignments of credit ratings.

All recent studies advise paying close attention to the class imbalance problem in data on defaults and to its impact on the estimation procedure and on some standard forecasting power indicators (Esarey and Pierce, 2012; Karminsky and Kostrov, 2017; Lanine and Vennet, 2006). Usually, too few default events are available in the training set to estimate the model properly. The main consequence of the class imbalance problem is the underestimation of the “rare” class, which deteriorates the forecasting power for bank failures (Florez-Lopez and Ramon-Jeronimo, 2014; Rösch and Scheule, 2014). Garcia et al. (2012) discuss the class imbalance problem and methods to overcome it. Among the most used methods are random omission of non-defaults, random inclusion of defaults and an increase in the weights of the rare class observations in the log-likelihood function.

The second literature stream has a long tradition of estimating the differences between the rating assessments of different RAs. Although many rating agencies use similar letter designations, their approaches to financial analysis differ. It was observed that Standard & Poor’s is more cautious and conservative when evaluating the financial stability of banks than its two largest competitors, Fitch and Moody’s, and that Moody’s approach to the assessment of banking risks is the most liberal (Karminsky and Peresetsky, 2007; Karminsky and Khromova, 2016). Many authors have studied the consistent difference between the scores of the various rating agencies and the financial stability of the corresponding banks (Morgan, 2002). It was found that the activity of rating agencies previously had little regulation, which allowed them to avoid responsibility for inaccuracies (overestimation) in assigned ratings, while investors suffered huge losses (Solovjova, 2016). Santoni and Arbia (2013) noted that the reputation of RAs has steadily deteriorated due to some notable failures (Enron, Worldcom, Parmalat) and to the subprime crisis (2007–2009). More recently, however, it was shown that RAs are very cautious in estimating banks’ financial stability, as their reputation fully depends on it. The reputation of an RA suffers more when the agency assigns a higher rating grade than it should. Therefore, RAs now tend to react sharply to any bad news about a well-performing bank by trying to predict the worst scenario of its performance: for them it is better to reassign the rating to a higher grade some time later than to miss a worsening of financial performance and lose their reputation. It was shown that ratings models overestimate the financial instability of a bank and that this gets worse if the model is applied out of sample (Karminsky and Khromova, 2016).
However, this skewed estimate is not so dramatic because RAs are always bound by the willingness of their clients and, if the ratings are significantly underestimated, many of the volatile banks will just avoid buying ratings from RAs.

Therefore, having observed the divergence between ratings and PD modeling, several researchers have proposed combining these two forecasts in order to increase the power of predicting the financial instability of a bank. Note that the two approaches give exactly opposite skews in their predictions, which makes their combination even more reliable. For example, Godlewski (2007) compared banks’ credit ratings in emerging countries with their corresponding probabilities of default. The research showed that ratings tend to aggregate banks’ default risk information into intermediate-low rating grades and thus demonstrated the partial divergence of ratings from the results of a PD scoring model. Following that, Pompella and Dicanio (2017) introduced a new approach (the PCMahalanobis method), which combines elements of PD and credit rating modeling, for testing the validity of bank ratings assigned by RAs. However, the PCMahalanobis method does not provide a numerical interpretation of results and only determines whether an observation belongs to one of two binary groups: healthy or likely-to-fail banks. In contrast, this research provides a method for forecasting an exact rating of a bank with a 32 dimensional accuracy. Therefore, following this new literature stream, this paper provides a new algorithm for creating a synergic model, which was applied to the rating model and the PD model of Russian banks.

The algorithm of this paper includes several steps. The first step is to construct the PD model and the credit ratings model separately on the same dataset, using the basic rating scale adjustment provided by Karminsky et al. (2011). This part of the research is based on the review of factors of potential influence on the credit risk of a bank that was summarized in the authors’ previous paper (Karminsky and Khromova, 2016). After the predicted values of both models are generated, ratings and PD are calibrated by the methodology of Pomasanov and Vlasov (2008) in order to bring them onto a single scale. Then the forecasting errors of each model are compared using the descriptive statistics of their distributions (mode, median, skew). The divergence of both models from the perfect forecast is assessed, and the optimal weight coefficients and monotonic transformations that bring the distribution of forecasting errors closer to a normal distribution are calculated. The resulting synergic model, which consists of the set of alternative models, is further checked for its out-of-sample fit.

3. Data adjustments

3.1. Building a representative sample from the empirical data

This research is based on the “Banks and Finance” database provided by the informational agency “Mobile”. It is a verified source of diverse information about international financial companies that is used extensively in the academic literature. The database provides monthly financial data that allowed us to obtain a panel dataset of Russian banks. Financial data includes balance sheet, income statement, calculated ratios and other information. There are 2071 banks in the “Mobile” database and the data was initially extracted from 2007 to 2016.

In order to generate a representative sample from the database, some data filtration methods were applied. First the bank’s distribution by the ownership type was considered. The focus of this paper is the individual profit maximizing banks, so all state-owned banks were omitted. The definition of state-owned bank was taken from the paper by Vernikov and Bokov (2008), where the direct government ownership is assigned to banks with more than 50% of shares that belong to the government (including different territorial entities of Russia or municipal corporations). According to this definition, 36 government banks were excluded from the dataset.

The main reduction of the sample size appeared due to the fact that only a small share of banks (395 banks) was assigned a rating grade. The data about history of rating changes was taken from and that are the main on-line aggregators of banking statistics. The extracted data contained assessments of national RAs (RAEX, Rus-Rating, AK&M, NRA, Ria-Rating) and international agencies (Moody’s, Standard & Poor’s or Fitch). Then the data on banks’ defaults were collected from and 86 Russian banks (which received a rating assessment at least once) had default in the concerned time period. Both ratings and defaults were added to the financial data with a two-quarter lag between them. This time lag was chosen due to the fact that the process of assigning a rating by the RA takes some time to complete all the necessary procedures.

The historical distribution of defaults of all Russian banks (before any filtration) for the period from 2007 to 2017 is shown in Fig. 1. From Fig. 1, we can conclude that there was an increase in the number of defaults after the 2008 crisis. Moreover, the tightening of banking regulation under the policy of Elvira Nabiullina at the Central Bank of Russia since 2014 is also highly noticeable on the diagram: the growth rate of defaults in 2014 reached 98%. The default policy of the Central Bank was unrelated to whether a bank had a rating assessment or not. This can be seen from the similar proportions of defaulted banks among all banks (447 defaults / 2071 total = 21.6% of defaults) and among banks that were assigned a rating at least once (86 defaults / 395 total = 21.8% of defaults). It shows that the policy of the Central Bank was aimed at all banks in financial distress and affected the most prominent banks at the same rate as little-known banks without a rating. Consequently, there is no sample bias in modeling the behavior of banks with a rating grade and extrapolating the results to the total population of Russian banks. Moreover, it means that banks that have not yet been assigned a rating grade are not worse than those with a rating, which again emphasizes the need for a reliable model that can assess the credit risk grade of a bank based on publicly available information.
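The similarity of the two default shares can also be checked formally. The sketch below is not part of the original analysis: it applies a standard two-proportion z-test to rated versus unrated banks, under the assumption that the 395 rated banks (and their 86 defaults) are a subset of the 2071 banks (and 447 defaults) in the database. The function name is hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts taken from the text; the rated/unrated split is an assumption.
rated_defaults, rated_total = 86, 395
unrated_defaults, unrated_total = 447 - 86, 2071 - 395

z, p_value = two_proportion_z(rated_defaults, rated_total,
                              unrated_defaults, unrated_total)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

A z-statistic well inside the usual ±1.96 band would be consistent with the claim that default rates do not differ between rated and unrated banks.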

Fig. 1.

Historical annual distribution of defaults of Russian banks from 2007 to 2017.

Source: Authors’ calculations.

The initial dataset was imbalanced (223 default observations compared to 11,404 non-default observations). The imbalance is intrinsic, i.e. it corresponds to the nature of the dataset. Furthermore, no data are available for the “default” class after a bank has experienced a default. This leads to embedded rarity and within-class imbalances, as well as to the failure of learning algorithms to generalize inductive rules (He and Garcia, 2009). Therefore, a combination of random undersampling and oversampling was used to obtain the final data for the regression analysis: a randomly selected set of majority class observations was removed from the data, and then a random set of minority class observations was added under new bank names.
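The combined resampling step can be sketched as follows. This is a minimal illustration rather than the authors’ procedure: the function name and the split of the balanced sample into 2,500 non-defaults and 789 defaults are hypothetical (only the class counts of the initial sample and the balanced sample size of 3,289 come from the paper), and the minority draw is done with replacement so that duplicated default observations play the role of “new” banks.

```python
import numpy as np


def rebalance(y, n_majority, n_minority, seed=0):
    """Return row indices after random under- and oversampling.

    y          : array of 0/1 labels (1 = default)
    n_majority : number of non-default rows to keep (undersampling)
    n_minority : number of default rows to draw with replacement (oversampling)
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    majority = np.flatnonzero(y == 0)
    minority = np.flatnonzero(y == 1)
    kept = rng.choice(majority, size=n_majority, replace=False)
    drawn = rng.choice(minority, size=n_minority, replace=True)
    idx = np.concatenate([kept, drawn])
    rng.shuffle(idx)
    return idx


# Class counts of the initial sample as reported in the text.
y = np.r_[np.zeros(11404, dtype=int), np.ones(223, dtype=int)]

# Hypothetical split that adds up to the balanced sample size of 3,289.
idx = rebalance(y, n_majority=2500, n_minority=789)
print(len(idx), int(y[idx].sum()))
```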

3.2. Adjustment of ratings to the base scale

RAs assign their grades in a symbolic form. However, in order to obtain coefficient estimates in an econometric model, these symbols should be transformed into numerical values. Moreover, symbolic ratings of different rating agencies should be unified to the base scale. The process of comparison of rating scales was taken from the paper of Karminsky et al. (2011).

As a result of comparisons of multiple mapping, it was found by Karminsky et al. (2011) that the best transformations of scales are obtained by using the class of linear-logarithmic transformations. In this case, the parametrization of mappings implies finding a pair of coefficients for mapping each of the scales into a basic one (free term and coefficient in front of the logarithm of the described rating scale). Moody’s was chosen as a “dependent” agency and, therefore, the base scale is associated with the international scale of this agency. The reason for this choice was the fact that Moody’s international scale is the closest one to the base scale. Therefore, the mapping was done by the following regression:

$$\ln(M) = \alpha_i \ln(R_i) + b_i \qquad (1)$$

where M is Moody’s international scale, taken as the base scale, and Ri is the rating scale that should be transformed to the base scale. To build the models, the agencies’ statistics for Russian banks from the first quarter of 2006 to the fourth quarter of 2010 were used, so the coefficients can be estimated by least-squares regression. The calculated coefficients for the international agencies Moody’s, Standard & Poor’s and Fitch (both international and national scales) and for the national agencies RAEX, NRA, Rus-Rating and AK&M were taken from the paper of Karminsky et al. (2011).

In this research, one more Russian rating agency was added to the comparison list: this agency is Ria-Rating with the estimated regression coefficients αi = 0.278 and bi = 2.392 (all coefficients are statistically significant at 1% level). The comparison of rating scales is summarized in Appendix Table. The symbolic rating was transformed into the numeric scale beginning from 1 that is given to banks with the best rating and ending with the last largest number for the worst rating.
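A minimal sketch of the mapping in equation (1) is given below, assuming the ratings have already been converted to numeric grades. The function names are hypothetical, and the data used to illustrate the fit are synthetic (generated from the reported Ria-Rating coefficients), not the agencies’ actual statistics.

```python
import numpy as np


def fit_scale_mapping(r, m):
    """Least-squares fit of ln(M) = alpha * ln(R) + b; returns (alpha, b)."""
    alpha, b = np.polyfit(np.log(r), np.log(m), 1)
    return float(alpha), float(b)


def to_base_scale(r, alpha, b):
    """Map a numeric grade r of agency i onto the base scale via equation (1)."""
    return float(np.exp(alpha * np.log(r) + b))


# Reported coefficients for Ria-Rating.
ALPHA_RIA, B_RIA = 0.278, 2.392

# Synthetic illustration: grades 1..10 mapped exactly by equation (1),
# then the coefficients are recovered by the regression.
r = np.arange(1, 11, dtype=float)
m = np.exp(ALPHA_RIA * np.log(r) + B_RIA)
alpha_hat, b_hat = fit_scale_mapping(r, m)
print(alpha_hat, b_hat, to_base_scale(1, ALPHA_RIA, B_RIA))
```

Because the relation is linear in logarithms, an ordinary first-degree polynomial fit on the log-transformed grades is sufficient.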

In order to avoid a loss of consistency in the model, ratings were assumed to remain unchanged until the moment of a new rating assignment. The final version of the dependent variable was obtained by averaging, for each bank and quarter, the single-scale numeric grades of all rating agencies. However, the averaging procedure produced numerous non-integer rating groups (e.g. rating = 17.43), and the differences between these groups were too small to be properly modeled. For this reason, the numeric rating was rounded to the closest integer. Therefore, in this paper, 30 different groups of ratings were considered.
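The construction of the dependent variable can be illustrated with a short sketch. The helper names and the sample grades are hypothetical; rounding half up is an assumption, since the text does not say how ties such as 16.5 were handled.

```python
import math


def carry_forward(grades):
    """Keep the last assigned grade until a new one appears (None = no new rating)."""
    filled, last = [], None
    for g in grades:
        if g is not None:
            last = g
        filled.append(last)
    return filled


def quarterly_grade(base_scale_grades):
    """Average the base-scale grades of all agencies and round to the closest integer."""
    mean = sum(base_scale_grades) / len(base_scale_grades)
    return math.floor(mean + 0.5)  # half-up rounding (assumption)


# One agency's grades over six quarters: rated in Q2, re-rated in Q5.
print(carry_forward([None, 15, None, None, 17, None]))

# Three agencies rating the same bank in the same quarter (hypothetical grades).
print(quarterly_grade([17.0, 18.0, 17.3]))
```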

4. Construction of empirical models of PD and credit ratings

4.1. Credit ratings and PD models

The models introduced in this paper allow interested agents to determine the probability of default and credit ratings for Russian banks, having at their disposal only public information. As for the modeling methods of this research, binary logit/probit regressions were chosen for PD estimation and multinomial ordered logit/probit for credit ratings modeling. It was shown (Jiao et al., 2007; Karminsky and Kostrov, 2017; Zan et al., 2004) that the predictions of more complex modeling methods like artificial intelligence models do not outperform the standard binary and ordered multinomial models. These methods were described and applied in the paper of Karminsky and Khromova (2016), and in Karminsky and Kostrov (2017).

The optimal set of indicators was selected on the basis of the most significant parameters chosen by a stepwise procedure (Hajek and Michalak, 2013), the overall significance of the model (likelihood ratio test, pseudo-R2, in-sample fit of the model) and the smallest Akaike and Schwarz information criteria (AIC and BIC). The expected signs of the coefficients were also considered. In the comparative analysis of logit and probit regressions, the decision was made in favor of the probit model, based on the minimization of AIC and BIC and the greatest number of significant coefficients. Moreover, concerning the specification of the unobserved heterogeneity term, random effects models were chosen as the best fit for both PD and ratings models. This conclusion was based on the Durbin-Wu-Hausman test and the rho statistic.

The variable specification of the models was continuously challenged by the choice of financial variables, their cross terms and macroeconomic variables used as principal components (PCs) (explained in the section 4.3). The final models were checked for multicollinearity and all explanatory variables had correlations less than 35% and reasonable descriptive statistics. The results obtained by the panel probit regressions of PD modeling are shown in Table 1, while credit ratings models are provided in Table 2. All regressions were conducted for two different samples: a sample without imbalanced data reduction (N = 11,627) and a sample with omission of non-defaults combined with additional random inclusion of defaults (N = 3,289). Two different types of variable specifications were tested: a model with financial variables only (2) and a model with financial variables combined with Principal Components (3).

Table 1. The results of PD models for different samples and different groups of variables.

Dependent variable / Independent variables 1.1.
Financial var only
Financial var and PC
Financial var only
Financial var and PC
Equity / Assets –10.023**
Operational expenses / Operating income 1.72***
Net interest margin –2.261*
Interbank ratio –0.0004**
Bank equity / Equity of all banks

Share PC –12.25***

Log total assets –1.909***
(Log total assets)2
Current ratio (CR)

Asset Liq PC –0.004***
Loan loss reserves / Gross loans 0.159***
CR × RGDP growth rate

CR × Loan loss reserves / Gr. loans
Real GDP growth rate

Macro PC1 PC2
CPI growth rate
Exchange rate USD/RUB 6.045***
RGDP per capita
Trade balance –3.018***
Number of observations 11 627 11 627 3289 3289
Log L –538.21 –307.55 –467.97 –309.03
Log Lo –1281.45 –1281.45 –882.97 –882.97
Pseudo R2 0.58 0.76 0.47 0.65
% of correct predictions
% Type I error
% Type II error
31.3 34.5
AIC 24 779.704 19 347.724 1276.8 845.2
BIC 24 909.996 19 722.892 1387.2 912.4
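The pseudo R2 row of Table 1 can be reproduced from the two log-likelihood rows. The sketch below assumes the statistic is McFadden’s pseudo R2, 1 − Log L / Log L0; this definition is not stated in the text but matches the reported PD-model values to two decimals.

```python
def mcfadden_r2(log_l, log_l0):
    """McFadden's pseudo R2: 1 - logL(model) / logL(constant-only model)."""
    return 1.0 - log_l / log_l0


# Log L and Log L0 of the four PD specifications, as reported in Table 1.
log_l = [-538.21, -307.55, -467.97, -309.03]
log_l0 = [-1281.45, -1281.45, -882.97, -882.97]

pseudo_r2 = [round(mcfadden_r2(l, l0), 2) for l, l0 in zip(log_l, log_l0)]
print(pseudo_r2)
```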

Table 2. The results of ratings models for different samples and different groups of variables.

Dependent variable / Independent variables 1.1.
Financial var only
Financial var and PC
Financial var only
Financial var and PC
Equity / Assets –0.249***
Operational expenses/ Operating income

Net interest margin –0.029***
Interbank ratio –0.569***
Bank equity/ Equity of all banks

Share PC

Log total assets –0.612*

(Log total assets)2 –0.009**
Current ratio (CR)

Liq PC
Loan loss reserves / Gross loans
CR × RGDP growth rate
CR × Loan loss reserves / Gr. loans –0.002***
Real GDP growth rate

Macro PC1 PC2
CPI growth rate
Exchange rate USD/RUB
RGDP per capita
Trade balance
Number of observations 11 627 11 627 3289 3289
Log L –3254.21 –2964.40 –2965.91 –2283.72
Log Lo –5827.95 –5827.95 –4361.92 –4361.92
Pseudo R2 0.53 0.68 0.38 0.47
% of correct predictions 12 18 5 7
AIC 24 779.704 22 347.724 12 779.735 11 649.745
BIC 24 909.996 22 722.892 12 937.468 11 732.491

, (2)

, (3)

4.2. Interpretation of financial factors of influence on credit risk of a bank

In order to interpret the signs of the estimated coefficients correctly, one should remember that the higher dependent variable is, the higher is the probability of default and the higher is the numeric value of a rating, which corresponds to banks with low financial stability. Keeping this in mind, we can conclude that all signs of coefficients coincide with their expected impact on PD and credit ratings for all regressions.

The first model specification (models 1.1 and 2.1) included only financial variables, which were based on the BFSR methodology described in the authors’ previous studies (Karminsky and Khromova, 2016). The ratio of equity to assets, which reflects the structure of a bank’s capital, appeared highly significant. This ratio shows the capitalization of a bank and is included in the model to capture its capital adequacy: the ability of a bank to cover risks with its own resources. It is inversely proportional to financial leverage. Another parameter, the ratio of operating expenses to revenues, shows the inefficiency of a bank; it adversely affects banks’ credit risks and is significant in all models. The net interest margin shows the profitability of a bank and becomes highly significant in the models with PCs. The interbank ratio shows the share of issued loans in the overall funds received on the interbank market. As this ratio increases, a bank becomes less dependent on interbank loans and therefore its rating is raised. This parameter is highly significant in the regressions constructed without multicollinearity. It has been repeatedly shown that market share is a significant factor for the credit risks of a bank. In this research, the market share was estimated as the ratio of a bank’s equity to the overall equity of all banks. An increase in market share significantly improves the financial performance of a bank in all models. The logarithm of total assets shows the size of a bank and has a positive relationship with a bank’s financial stability. Note that the squared bank size also appeared to be significant and shows a positive sign, which indicates a parabolic relationship between bank size and PD. It means that the largest and the smallest banks are the most unstable, so the idea of “too big to fail” was not supported by our research.
The current liquidity of a bank is also a very important factor in evaluating its rating. A higher level of current assets compared to current liabilities decreases financial instability in each model. The impact of the ratio of loan loss reserves to gross loans appeared to be significant and almost the same in all models: the high level of reserves indicates the presence of “bad” loans issued by a bank and leads to a downgrade in its rating and increase in PD.

4.3. Principal component analysis

Macroeconomic variables and cross terms of financial variables are heavily correlated with each other, which inevitably leads to multicollinearity problems if no measures are taken. Since the model is constructed primarily for forecasting, principal component analysis (PCA) is used to eliminate these potential problems. PCA (Hotelling, 1933; Pearson, 1901) is used to reveal the intrinsic structure of the relations between the variables involved and to reduce the number of dimensions needed to capture their dispersion. The method is implemented in several steps. First, the means and standard deviations of the variables in each group that enters a principal component (PC) are calculated. These groups are:

  • Asset–Liquidity group (includes 5 variables: Current ratio; Current ratio × GDP growth rate; Current ratio × Loan loss reserves / Gross loans; Loan loss reserves / Gross loans; GDP growth rate);
  • Market share group (includes 3 variables: Log total assets; (Log total assets)²; Bank equity share in total equity of all banks);
  • Macroeconomic group (includes 4 variables: CPI growth rate; Exchange rate USD/RUB; GDP per capita; Trade balance).

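Let the initial variables of the liquidity group be denoted LV1, ..., LV5, those of the market share group SV1, SV2, SV3 and those of the macroeconomic group MV1, ..., MV4. The data are then transformed so that each variable has zero mean and unit variance, i.e.

$$LV_k^{norm} = \frac{LV_k - \overline{LV}_k}{\sigma_{LV_k}} \qquad (4)$$

and likewise for the SV and MV variables, where the bar denotes the sample mean and σ the sample standard deviation of the variable.
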
Secondly, the normalized data are given a new orthogonal basis by constructing linear combinations of LV1norm ... LV5norm; SV1norm ... SV3norm; MV1norm ... MV4norm. Within each group, the new variables are uncorrelated by construction. Correlation between components of different groups may still be present, but it is below 35%. Note that only two principal components from each group were tested in the model, as the first two components cumulatively explain more than 80% of the variance of the initial variables.
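The two PCA steps (normalization and construction of an orthogonal basis) can be sketched with an eigendecomposition of the correlation matrix. This is an illustration under stated assumptions: the function name is hypothetical, and the four-variable input is synthetic data standing in for one group of variables, not the bank dataset.

```python
import numpy as np


def first_two_pcs(x):
    """Standardize the columns of x and return the first two principal components.

    Returns (scores, gamma, explained):
      scores    : (n, 2) values of PC1 and PC2
      gamma     : (2, k) loadings, so that PC_j = sum_k gamma[j, k] * x_norm[:, k]
      explained : share of total variance captured by the two components
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)   # zero mean, unit variance
    corr = np.cov(z, rowvar=False)                     # correlation matrix of x
    eigval, eigvec = np.linalg.eigh(corr)              # ascending eigenvalues
    order = np.argsort(eigval)[::-1]                   # sort descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    gamma = eigvec[:, :2].T
    scores = z @ gamma.T
    explained = eigval[:2].sum() / eigval.sum()
    return scores, gamma, explained


# Synthetic group of 4 correlated variables driven by 2 common factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
x = factors @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))

scores, gamma, explained = first_two_pcs(x)
print(scores.shape, gamma.shape, round(float(explained), 3))
```

The loadings returned as `gamma` are the quantities needed later to move from principal component coefficients back to the original variables.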

Once the PCA transformation is completed, the coefficients are no longer interpretable in an economic sense. In order to calculate the marginal effects of the variables that enter the PCs, the transformation has to be reversed, going from the coefficients of the principal components back to the coefficients of the initial variables.

The process will be shown on the example of macroeconomic variables. As each of PC1, PC2 is a linear combination of MV1 ... MV4, one can plug into the estimation equation coefficients for each of the principal components and restructure the equation. The transformation will construct a way from principal components back to the original variables and will provide an opportunity to interpret the model:

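$$Z_{it} = \beta_0 + \sum_{j=1}^{2} \beta_j PC_{j,it} + u_{it} \qquad (5)$$

$$Z_{it} = \beta_0 + \sum_{j=1}^{2} \beta_j \Big( \sum_{k=1}^{4} \gamma_{jk} MV_k \Big)_{it} + u_{it} \qquad (6)$$

$$Z_{it} = \beta_0 + \Big( \sum_{k=1}^{4} \sum_{j=1}^{2} \beta_j \gamma_{jk} MV_k \Big)_{it} + u_{it} \qquad (7)$$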

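At any point, the marginal effect of an original variable MV_k has the same sign as its combined coefficient in equation (7).

If $\sum_{j=1}^{2} \beta_j \gamma_{jk} > 0$, one can consider that MV_k has a positive effect on the probability of default of a bank.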
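A sketch of the reverse procedure is given below. It assumes a binary probit link, so that the marginal effect on the probability scale is the standard normal density at the index Z times the combined coefficient of the original variable; the paper does not spell out this formula here, and the coefficient and loading values are hypothetical.

```python
import math

import numpy as np


def original_coefficients(beta, gamma):
    """Combined coefficient of each original variable: sum_j beta_j * gamma_jk."""
    return np.asarray(beta, dtype=float) @ np.asarray(gamma, dtype=float)


def probit_marginal_effects(z, beta, gamma):
    """dP(default)/dMV_k at index value z, assuming a probit link."""
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return density * original_coefficients(beta, gamma)


# Hypothetical coefficients of PC1 and PC2 and loadings on MV1..MV4.
beta = [1.0, -2.0]
gamma = [[0.5, 0.5, 0.5, 0.5],
         [0.1, 0.2, 0.3, 0.4]]

coefs = original_coefficients(beta, gamma)
effects = probit_marginal_effects(0.0, beta, gamma)
print(coefs, effects)
```

Since the normal density is strictly positive, the sign of each marginal effect coincides with the sign of the combined coefficient, which is why the sign check on the sum is sufficient for interpretation.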

The reverse procedure for marginal effects yielded the expected signs. In the second specification (models 1.2 and 2.2), various cross products and macro variables were tested. The asset-liquidity group principal component shows the interdependence of banks’ asset quality and liquidity: banks with a better loan portfolio tend to have stable liquidity. Moreover, it shows a correlation between the GDP growth rate and the liquidity of a bank. An increase in the GDP growth rate leads to an increase in the investments and savings of firms and households, which, in turn, pay off their debts to banks more easily, so banks’ liquid funds increase. The market share group principal component was also highly significant in all models and shows the importance of including different measures of market share.

The comparison of predictive power of different variable specifications models gives us an expected result: the model with principal components (PC) that includes financial variables, their cross terms and macro variables gives us the highest level of forecasts in any sample of data. As the result of this step of the research, the predicted values for the best model specification for both PD default model (2.2) and credit ratings model (1.2) were computed.

5. Construction of synergic models

5.1. Calibration of rating scale and probability of default

In order to compare the forecasting power of PD and credit ratings models, they should be presented in the same scale. There are various papers that study calibration of ratings and defaults (Karminsky et al., 2015; Pomasanov and Vlasov, 2008) and some rating agencies publish their ratings scales correspondence to PD officially (Moody’s, 2011). In this research, the calibration scale of Standard & Poor’s national rating provided by Pomasanov and Vlasov (2008) was taken as a basis scale of calibration.

This scale clearly shows the non-linear relationship between PD and rating grade. In order to map the scale provided in Table 3 to the base rating scale of this research (see Appendix Table), one exponential and two polynomial transformations were applied. The results of the extrapolation of PD to all numeric rating grades of the base scale are provided in Fig. 2.

Table 3. Calibration of the rating scale of S&P and probability of default.

Base rating scale S&P Rating Scale PD, %
9 ruAAA 0.3626
11 ruAA+ 0.4885
12 ruAA 0.6579
13 ruAA– 0.8855
13.5 ruA+ 1.1909
14 ruA 1.5999
14.5 ruA– 2.1464
15 ruBBB+ 2.8741
15.25 ruBBB 3.8388
15.5 ruBBB– 5.1103
15.75 ruBB+ 6.7732
16 ruBB 8.9263
16.5 ruBB– 5.1103
17 ruB+ 15.1375
17.5 ruB 19.3964
18 ruB– 24.5074
18.5 ruCCC+ 30.4565
18.75 ruCCC 37.1391
19 ruCCC– 44.3529
19.5 ruCC 51.813
20 ruC 59.1931
21 ruD 66.1806
Fig. 2.

Calibration of probability default (%) and the base rating scale.

Source: Authors’ calculations.

Fig. 2 shows that the rating scale corresponds to PD non-linearly. The correspondence of PD to the highest ratings, from ruAAA to ruA+, was estimated by an exponential function, and for the middle ratings (from ruA+ to ruCCC+) a convex quadratic polynomial was used, which shows that PD increases at an accelerating pace over these rating grades. For the bottom ratings (from ruCCC+ to ruD), a concave quadratic polynomial proved to be the most appropriate approximation and shows a decelerating rate of change in PD along the rating scale.
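The three-segment approximation can be sketched with ordinary least-squares fits on the calibration points of Table 3. This is an illustration, not the authors’ fitted functions: the variable and function names are hypothetical, the sketch covers only the range of grades in Table 3, and the ruBB– row is left out because its printed PD repeats the ruBBB– value and would break monotonicity.

```python
import numpy as np

# (base grade, PD in %) pairs from Table 3, split into the three segments.
top = np.array([[9, 0.3626], [11, 0.4885], [12, 0.6579],
                [13, 0.8855], [13.5, 1.1909]])
mid = np.array([[13.5, 1.1909], [14, 1.5999], [14.5, 2.1464], [15, 2.8741],
                [15.25, 3.8388], [15.5, 5.1103], [15.75, 6.7732], [16, 8.9263],
                [17, 15.1375], [17.5, 19.3964], [18, 24.5074], [18.5, 30.4565]])
low = np.array([[18.5, 30.4565], [18.75, 37.1391], [19, 44.3529],
                [19.5, 51.813], [20, 59.1931], [21, 66.1806]])

# Exponential segment: fit ln(PD) as a linear function of the grade.
growth_rate, log_level = np.polyfit(top[:, 0], np.log(top[:, 1]), 1)
# Quadratic segments: fit PD as a second-degree polynomial of the grade.
mid_coef = np.polyfit(mid[:, 0], mid[:, 1], 2)
low_coef = np.polyfit(low[:, 0], low[:, 1], 2)


def pd_from_grade(grade):
    """Approximate PD (%) for a base-scale grade between 9 and 21."""
    if grade <= 13.5:
        return float(np.exp(log_level + growth_rate * grade))
    if grade <= 18.5:
        return float(np.polyval(mid_coef, grade))
    return float(np.polyval(low_coef, grade))


print(growth_rate, mid_coef[0], low_coef[0])
```

The sign of the leading quadratic coefficient distinguishes the convex middle segment from the concave bottom segment.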

5.2. Comparison of distributions of forecast errors of PD and ratings model

The calibrated PD and credit rating models can now be compared by their in-sample predictive power. For a clear visualization, the PD model’s forecasts were converted into the base rating scale, and the difference between the actual rating grade and the rating grade predicted by the PD model was calculated. Fig. 3 shows the distributions of the forecast errors of the rating model (1.2) and the PD model (2.2), calibrated to the same scale of rating grades. Moreover, the percentages of precise predictions and of deviations of less than one rating grade were calculated for each model.

Fig. 3.

Distribution of deviations of ratings model and PD model forecasts (%).

Source: Authors’ calculations.

Fig. 3 shows that the forecast errors of both models are distributed asymmetrically. The share of correct forecasts (Δ = 0) in the rating model was 18%, while the PD model was even more skewed than the rating model and forecasted correctly in only 6% of cases. In addition, the share of forecasts deviating from the actual rating by less than one rating grade (|Δ| < 1) was 34.8% for the rating model and 12.3% for the PD model. The rating model tended to forecast a grade lower than the actual rating, while the PD model, on the contrary, underestimated the financial problems of banks and predicted a grade higher than the actual rating. In the PD model, positive prediction errors dominate negative ones, which means that the actual numeric ratings exceed their forecasts. Because the numeric scale runs in the opposite direction to the symbolic grades (a larger number denotes a worse grade), this implies that the ratings forecasted by the PD model are overstated. This happened because the default model was initially estimated on imbalanced data, and even the formation of a representative sample did not solve the problem of overfitting the PD model towards non-defaulted banks. The distribution of the rating model’s forecast errors is much more symmetric with respect to zero; however, it also shows the skewed pattern discussed above.
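The summary statistics quoted for Fig. 3 can be computed along the following lines. This is a sketch on hypothetical data, with the error defined as the actual grade minus the predicted grade on the numeric base scale; the function name `summarize_errors` is an assumption and is not taken from the paper.

```python
def summarize_errors(actual, predicted):
    """Return shares (in %) of exact hits, near hits and positive/negative errors.

    The error is the actual grade minus the predicted grade (numeric base scale).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    deltas = [a - p for a, p in zip(actual, predicted)]
    n = len(deltas)

    def share(condition):
        return 100.0 * sum(1 for d in deltas if condition(d)) / n

    return {
        "exact": share(lambda d: d == 0),           # delta = 0
        "within_one": share(lambda d: abs(d) < 1),  # |delta| < 1
        "positive": share(lambda d: d > 0),         # actual number above forecast
        "negative": share(lambda d: d < 0),         # actual number below forecast
    }

# Hypothetical grades for five bank-period observations.
actual = [14.0, 15.5, 16.0, 17.0, 13.5]
pd_based = [13.0, 15.0, 16.0, 15.5, 13.0]
stats = summarize_errors(actual, pd_based)
print(stats)
```

In this made-up sample the positive errors dominate, which is the pattern the text attributes to the PD model: the forecast number is lower than the actual one, so the forecast grade is better than the actual grade.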

To sum up the comparison, we conclude that the first hypothesis of this paper was not rejected by the empirical modelling. Indeed, rating models tend to overestimate the financial instability of a bank, whereas PD models underestimate it.

5.3. Construction of synergic models

In order to construct a reliable synergic model, the rating grade forecasts of the PD model and the rating model should be computed for the same observations. Note that the rating model was estimated on 11627 observations, while the PD model has 3489 estimates. Each observation has its own ID and time stamp, so we selected all id_time estimates that are present in both data sets. The overlap of these data sets included 3011 estimates, because the PD sample contained some artificially generated defaults. Then regressions in which the dependent variable was the actual rating and the explanatory variables were the fitted values of the rating and PD models were run on these 3011 observations.

The first synergic model was obtained as a linear combination of the PD and rating models:
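The matching step can be sketched as an intersection of the two sets of fitted values by their (bank ID, period) key. The bank identifiers, the quarterly period labels and the grades below are hypothetical and serve only to show the mechanics.

```python
# Hypothetical fitted grades keyed by (bank_id, period) on the base scale.
rating_fit = {
    ("B001", "2014Q1"): 15.5,
    ("B001", "2014Q2"): 15.5,
    ("B002", "2014Q1"): 17.0,
}
pd_fit = {
    ("B001", "2014Q1"): 14.0,
    ("B002", "2014Q1"): 16.0,
    ("SYNTH", "2014Q1"): 21.0,  # artificially generated default, absent from the rating sample
}

def overlap(rating_fit, pd_fit):
    """Keep only the observations for which both models produced a forecast."""
    common = sorted(rating_fit.keys() & pd_fit.keys())
    return [(key, rating_fit[key], pd_fit[key]) for key in common]

rows = overlap(rating_fit, pd_fit)
for key, rat_hat, pd_hat in rows:
    print(key, rat_hat, pd_hat)
```

Observations that appear in only one of the two samples, such as the artificial default in this example, drop out of the overlap.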

$$Y_{it} = \alpha + \beta_1 PD_{it} + \beta_2 Rat_{it} + u_{it} \qquad (9)$$

where Y_it is the actual rating, and PD_it and Rat_it are the ratings predicted by the PD model and the rating model respectively. The regression output is provided below in Table 4, and the distribution of the forecast errors of this synergic model is illustrated in Fig. 4.

Table 4. Estimated coefficients for the linear synergic model.

| Coefficient | Estimate |
|---|---|
| α | 8.265*** |
| β1 | 0.182*** |
| β2 | 0.344*** |
| Pseudo R2 | 0.184 |
Fig. 4.

Distribution of forecast errors for the linear synergic model (%).

Source: Authors’ calculations.

The linear synergic model shows much higher predictive power than either the PD or the rating model on its own. It predicts 32% of rating grades precisely and 58% of ratings with an error of less than one grade. However, its error distribution still has heavy tails. In order to solve this problem, we use the logarithmic specification of the synergic model:

$$Y_{it} = \alpha + \beta_1 \log_{20}(Rat_{it} \cdot PD_{it}) + u_{it} \qquad (10)$$

The regression output is provided below in Table 5, and the distribution of the forecast errors of this synergic model is illustrated in Fig. 5.

Table 5. Estimated coefficients for the logarithmic synergic model.

| Coefficient | Estimate |
|---|---|
| α | –7.268*** |
| β1 | 10.981*** |
| Pseudo R2 | 0.21 |
Fig. 5.

Distribution of forecast errors for the logarithmic synergic model (%).

Source: Authors’ calculations.

The synergic model based on the logarithm of the product of the rating and PD forecasts was found to have the highest predictive power with the smallest deviations. Note that this distribution has very thin tails, so this specification of the model does not produce any prediction errors larger than three rating grades. Therefore, this optimal combination yields the most consistent estimates, with more than 44% of ratings predicted precisely and 83% predicted with a deviation of less than one rating grade.
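The construction of the regressor in specification (10) can be sketched as follows. Two assumptions are made here: the argument of the logarithm is read as the product of the two forecasts, and a plain least-squares fit stands in for the estimator, which may differ from the one behind Table 5. All data and function names are hypothetical.

```python
import math

def log20_product(rat, pd_grade):
    """Regressor of specification (10): base-20 log of Rat * PD (both grades > 0)."""
    return math.log(rat * pd_grade, 20)

def fit_simple_ols(x, y):
    """Closed-form least squares for y = alpha + beta * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta

# Hypothetical data: actual grade, rating-model forecast, PD-model forecast.
actual = [13.5, 15.0, 16.0, 17.0, 18.0]
rat_hat = [14.5, 16.0, 16.5, 18.0, 19.0]  # rating model: worse (higher) numbers
pd_hat = [12.0, 14.0, 15.0, 16.0, 17.5]   # PD model: better (lower) numbers

x = [log20_product(r, p) for r, p in zip(rat_hat, pd_hat)]
alpha, beta = fit_simple_ols(x, actual)
fitted = [alpha + beta * xi for xi in x]
print(round(alpha, 3), round(beta, 3))
```

Since the logarithm of a product is the sum of the logarithms, this specification gives both forecasts the same weight after a monotonic transformation, which dampens large disagreements between the two models.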

5.3.1. Out-of-sample check of the synergic model

The second part of this section is devoted to the analysis of the out-of-sample predictive power of the logarithmic synergic model. To this end, the data were limited to the observations from 2007 to 2015. Based on the new coefficients of the PD and rating models, the forecast for the year 2016 was made. To obtain the predicted ratings, the predicted probability of each rating grade was calculated as the difference between the values of the standard normal cumulative distribution function (F) at two points. These points were calculated from the estimated boundary values (cut_j) and the product of the vector of estimated coefficients (β) and the values of the explanatory variables for the year 2016 (x'_k), according to the formula:

$$\Pr(\mathrm{outcome}_k = j) = F(\mathrm{cut}_j - x'_k\beta) - F(\mathrm{cut}_{j-1} - x'_k\beta) \qquad (11)$$
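Formula (11) can be sketched in a few lines. The boundary values, the linear index and the grade labels below are hypothetical; the lowest and highest outcomes are handled by treating the outer boundaries as minus and plus infinity, which is the usual convention for ordered models and is assumed here.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function F."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def grade_probabilities(xb, cuts):
    """Probabilities of the ordered outcomes for the linear index xb = x'beta.

    cuts are increasing boundary values; len(cuts) + 1 probabilities are returned.
    """
    bounds = [-math.inf] + list(cuts) + [math.inf]
    return [
        normal_cdf(bounds[j] - xb) - normal_cdf(bounds[j - 1] - xb)
        for j in range(1, len(bounds))
    ]

# Hypothetical example: four adjacent grades separated by three boundary values.
grades = [14, 15, 16, 17]
probs = grade_probabilities(xb=0.4, cuts=[-1.0, 0.0, 1.5])
best = grades[probs.index(max(probs))]  # grade with the highest predicted probability
print([round(p, 4) for p in probs], best)
```

The probabilities sum to one by construction, because consecutive differences of F telescope from 0 to 1.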

The rating grade with the highest predicted probability was selected as the rating model’s forecast. For the PD model, the probability forecasts were estimated and then calibrated to the rating scale. Then the predicted rating grades of the PD model and the rating model were combined using the functional form of the synergic model, with its coefficients estimated on the 2007–2015 data. The following coefficients and forecast error distributions were obtained under the out-of-sample check of the logarithmic synergic model (Table 6).

Table 6. Estimated coefficients for the logarithmic synergic model (out-of-sample).

| Coefficient | Estimate |
|---|---|
| α | –8.375*** |
| β1 | 10.729*** |
| Pseudo R2 | 0.19 |

Then the actual financial data for the year 2016 were used as explanatory variables in both models, and two separate forecasts were obtained: one from the PD model (calibrated into ratings) and one from the rating model. These forecasts were placed into the two different specifications of the synergic model, and the final synergic forecasts for the year 2016 were obtained. The synergic forecasts were compared with the actual ratings assigned to banks in the year 2016, and the distributions of forecast errors illustrated in Fig. 6 were constructed.

Fig. 6.

Distribution of forecast errors for the logarithmic synergic model (out-of-sample, %).

Source: Authors’ calculations.

The results show a slight, expected deterioration in the predictive power of the synergic models under the out-of-sample check. Nevertheless, the logarithmic model predicts the exact rating grade in 31.4% of cases. In addition, in 72.3% of cases the prediction error of the expected rating of a bank is less than one rating grade. Based on this analysis, we can conclude that the logarithmic synergic model can be of practical use for predicting credit risks. Therefore, the aim of this research was achieved by constructing a synergic model with higher forecasting power from a set of alternative models: the model of probability of default and the model of credit ratings for Russian banks (hypothesis 2 was not rejected). Moreover, it should be noted that the horizontal axis in all forecast error distributions shows the deviation of the actual rating from the forecast on the 30-grade rating scale. This scale is much more detailed than the usual 22-grade scale of an international rating agency. That means that all results in this research would be even more precise if the forecasts were transformed to the 22-grade scale.

6. Conclusion

The paper is aimed at comparing the divergence of existing credit risk models and at creating a reliable synergic model. For this purpose, credit rating and PD models were applied to the same dataset and their estimates were normalized to a common scale. After thorough analysis of the probability density functions of that output, optimal weights and monotonic transformations were assigned to each model. As a result, a logarithmic synergic model with higher forecasting power (44% of precise estimates on a 30-grade scale) was obtained.

It was found that there is a significant divergence in the predictions of credit rating and PD models: credit rating models tend to overestimate the probability of financial distress of a bank, whereas PD models give underestimated results, so the first hypothesis was not rejected. Indeed, the distribution of the rating model’s forecast errors has a negative mode, while that of the PD model’s forecast errors has a positive mode. Therefore, both models have a forecasting bias that decreases the number of correct forecasts.

The second hypothesis was not rejected either. The usage of the set of alternative models (ratings and PD) has improved the forecasting power for banks’ credit risks. In-sample, the logarithmic synergic model produced 44% of precise estimates and 83% of estimates with a deviation of less than one grade. That is even higher than the biased modes of the separate distributions of the PD and rating models (33% and 36%). Moreover, it has shown out-of-sample predictive power of 31% of precise estimates and more than 70% of forecasts with a deviation of less than one rating grade on a 30-grade rating scale.

The novelty of the paper lies in deriving rating and PD econometric models on a single scale using a new comprehensive database. Moreover, optimal weights and a monotonic transformation of the rating and PD models were derived for a logarithmic synergic model that increases the forecasting power for banks’ credit risks. In further research, we are going to apply this technique of deriving a synergic model to other credit risk measures. More sophisticated methods of balanced dataset formation (He and Garcia, 2009) should also be tested. Furthermore, other statistical or artificial intelligence methodologies can be used to forecast credit risks.


References

  • Esarey, J., & Pierce, A. (2012). Assessing fit quality and testing for misspecification in binary-dependent variable models. Political Analysis, 20(4), 480–500.
  • Florez-Lopez, R., & Ramon-Jeronimo, J. M. (2014). Modelling credit risk with scarce default data: On the suitability of cooperative bootstrapped strategies for small low-default portfolios. The Journal of the Operational Research Society, 65(3), 416–434.
  • Garcia, V., Sanchez, J., & Mollineda, R. (2012). On the effectiveness of preprocessing methods when dealing with different levels of class imbalance. Knowledge-Based Systems, 25(1), 13–21.
  • Godlewski, C. J. (2007). Are ratings consistent with default probabilities? Empirical evidence on banks in emerging market economies. Emerging Markets Finance & Trade, 43(4), 5–23.
  • Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6), 417–441.
  • Karminsky, A. M., & Peresetsky, A. A. (2007). Models of ratings of international rating agencies. Applied Economics, 1, 3–19. (in Russian)
  • Karminsky, A. M., Solodkov, V. M., & Sosurko, V. V. (2011). The unified rating mapping: A step from the myth to reality. Bankovskoe Delo, 6, 58–63. (in Russian)
  • Karminsky, A. M., Morgunov, A. V., & Bogdanov, P. M. (2015). The assessment of default probability for the project finance transactions. Journal of the New Economic Association, 2, 99–122. (in Russian)
  • Karminsky, A. M., & Kostrov, A. (2017). The back side of banking in Russia: Forecasting bank failures with negative capital. International Journal of Computational Economics and Econometrics, 7(1/2), 170–209.
  • Lanine, G., & Vennet, R. (2006). Failure prediction in the Russian bank sector with logit and trait recognition models. Expert Systems with Applications, 30(3), 463–478.
  • Pomasanov, M., & Vlasov, A. (2008). Calibration of national rating systems. Rynok Cennikh Bumag, 12, 74–79. (in Russian)
  • Rösch, D., & Scheule, H. (2014). Forecasting probabilities of default and loss rates given default in the presence of selection. The Journal of the Operational Research Society, 65(3), 393–407.
  • Solovjova, I. (2016). New approaches to regulating the activities of rating agencies: A comparative analysis. Procedia: Social and Behavioral Sciences, 229, 115–125.
  • Vernikov, A., & Bokov, V. (2008). Quality of governance and bank valuation in Russia: An empirical study. Journal of Corporate Finance, 3(7), 5–16.
  • Zan, H., Chen, H., Hsu, C.-J., Chen, W.-H., & Wu, S. (2004). Credit rating analysis with support vector machines and neural networks: A market comparative study. Decision Support Systems, 37(4), 543–558.

Appendix. Ratings’ comparison scale

The base rating scale for national and international ratings assigned by different rating agencies (S&P, Fitch, Moody's, RAEX, Rus-Rating, AK&M, NRA, Ria Rating) in different currencies (RUB and USD).

Base rating scale S&P Fitch Moody’s RAEX Rus–Rating AK&M NRA Ria
I / N N I / N N I / N N N I / N N N N N
2 AA+ AA+ AA+ AA+ Aa1 Aa1
3 AA AA AA AA Aa2 Aa2
4 AA– AA– AA– AA– Aa3 Aa3
5 A+ A+ A+ A+ A1 A1
6 A A A A A2 A2
7 A– A– A– A– A3 A3
8 BBB+ BBB+ BBB+ BBB+ Baa1 Baa1 A+
8.5 AAA
9 BBB BBB ruAAA BBB BBB Aaa Baa2 Baa2 A
9.5 AAA
10 BBB– BBB– BBB– BBB– AA+ Baa3 Baa3 A++ A–
10.5 Aa1 AA+
11 BB+ BB+ ruAA+ BB+ BB+ AA Ba1 Ba1 BBB+
11.5 AA
12 BB BB ruAA BB BB AA– Aa2 Ba2 Ba2 BBB AAA
12.5 A+ AA–
13 BB– BB– ruAA– BB– BB– A Aa3 Ba3 Ba3 BBB– A+
13.5 ruA+ A– A1 A+ A AA+
14 B+ B+ ruA B+ B+ BBB+ A2 B1 B1 BB+ A– A+ AA
14.5 ruA– BBB A3 BBB+ AA
15 B B ruBBB+ B B BBB– B2 B2 BB AA–
15.25 ruBBB BB+ Baa1 BBB A AA–
15.5 ruBBB– BB A BB– A+ A+
15.75 ruBB+ BB– Baa2 BBB–
16 B– B– ruBB B– B– B+ Baa3 B3 B3 A A
16.25 B++
16.5 ruBB– B Ba1 B+ BB+ A– A–
16.75 Ba2
17 CCC+ CCC+ ruB+ CCC CCC B– Ba3 Caa1 Caa1 B++ B BB BBB+
17.25 BBB
17.5 ruB B1 B+ BBB– BBB+
17.75 B2 BB+
18 CCC CCC ruB– CC CC B3 Caa2 Caa2 B+ B– BB BBB
18.25 B BB–
18.5 Caa1 BB+
19 CCC– CCC– ruCCC– C C Caa2 Caa3 Caa3 B CCC+ B C++ C
19.5 Caa3 CCC B–
19.75 C+
20 Ca Ca Ca C++ C CC
21 D D ruD D D D C C C E D C C