Corresponding author: Tatiana Ratnikova ( taratnikova@yandex.ru ) © 2016 Non-profit partnership “Voprosy Ekonomiki”.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits to copy and distribute the article for non-commercial purposes, provided that the article is not altered or modified and the original author and source are credited.
Citation:
Murashov Y, Ratnikova T (2016) Under-reported income of Russian households. Russian Journal of Economics 2(1): 56-85. https://doi.org/10.1016/j.ruje.2016.04.004
In the proposed paper, an attempt is made to estimate the proportion of unstated income for Russian households based on micro data. An overview of microeconomic approaches to estimating the scale of under-reported income is provided. These approaches are weakly represented in the national literature, so their strengths and weaknesses are also analyzed. A theoretical model of household consumer behavior is described that allows the size of under-reported income to be estimated. The structure of household incomes and expenditures is studied based on an RLMS sample for 2012. The model is estimated using household subsamples based on the type of household and household income. The estimation technique utilizes regression variables and random effects. The resulting subsample estimates were applied to the general population and compared with those obtained by other researchers using alternative methods and other data. A comparison is made to estimates of under-reported income developed for British households.
under-reported household income, household consumer behavior, under-reported income parameter, regression analyses, instruments, random effects
Under-reported income for households is one side of the shadow economy, whose definition includes those economic activities and the income derived from them that circumvent or otherwise avoid government regulation, taxation or observation (
Typically, the size of the shadow economy is assessed in proportion to the gross domestic product. Conventional wisdom says that in developed countries it does not exceed 10% to 12% of GDP; in developing countries, it can reach 40% to 45%; and in transition economies, it accounts for 22% to 25% of GDP (
In modern Russia, where, according to a 2012 World Bank survey, half of the GDP is produced in the informal sector, the prevailing shadow economy is a particularly acute problem. In addition to adversely affecting the sociopolitical and economic life of the country, this phenomenon also undermines its international reputation.
The majority of Russian and foreign research papers on the shadow economy address the macroeconomic level, revealing that part of the economy—measured as a share of the GDP—that is hidden from taxes. The methods for evaluating the size of the shadow economy are based on certain macroeconomic indicators linked to the amount of goods and services produced within the country. Macroeconomists have a wide array of officially published statistical data at their disposal, and they utilize an extensive set of theoretical methods. However, the aggregate nature of macroeconomic data, while revealing the big picture, does not take into account the heterogeneity of contributions made by different social groups to the shadow economy, thus impeding the development of targeted measures that could pull household activities out from the shadows.
The alternative approach studies microeconomic data obtained from household budget surveys. It determines the size of the shadow economy based on the estimated share of income hidden by households. Unlike macroeconomics, the microeconomic approach provides access to the individual characteristics of surveyed subjects and analyzes the particular aspects of their consumer behavior. The advantage of this approach is that it is not linked to the direct accounting of tax revenues paid by households, thereby potentially enabling a more realistic estimate of the size of the shadow economy. Sample surveys to assess the shadow economy are used in many countries. The inconvenience of these methods is associated with the standard defects in all surveys. The results largely depend on the willingness of respondents to cooperate; however, the majority of respondents are reluctant to acknowledge fraudulent behavior, and thus, their responses are of questionable reliability. (On the other hand, the respondents’ cooperation is not for issues related to their sources of income and spending on major purchases). Another disadvantage is that surveys cannot cover all clandestine activities and all income groups (respondents from high-income segments of the population who have the largest proportion of concealed revenue are not available for interviews). For these reasons, this approach always produces an underestimation.
As research for the OECD countries (
Estimated size of the shadow economy in Russia (% of GDP).
The estimates obtained from an analysis of microdata about concealed household income are scarce in the literature, as there are no sufficiently developed systematic approaches to adequately replicate these estimates for the entire population.
However, to make effective decisions on regulatory measures, it is important not only to know the estimated size of concealed income across the country as a whole but also to know in more detail which population groups are more inclined to use those practices. Micro-analysis enables a more objective approach to the medium- and low-income segments of the population, which are the segments covered by representative national household budget surveys.
The relevance of surveys of concealed household income in lower income groups may be associated with an analysis of the changes in prices for goods consumed by certain groups of households, which may serve as a basis for estimating the effects of price policies. For example,
The study aims to contribute to the modeling of concealed income for low- and medium-income population groups within Russia. It provides an estimate of the share of concealed income for various Russian socio-demographic groups covered by RLMS statistics, using an approach that takes the consumer behavior of households into account. This analysis will be based on the consumption model of certain vital goods as a function of unobserved income, the indirect information about which can be obtained from declared income. The methodology for obtaining estimates of under-reported income parameters will be derived from a number of hypotheses about the distinctive features of consumer behavior for self-employed and hired workers. Using econometric analysis, we will obtain interval-based estimates of the under-reported income parameters for selected groups of households and, based on these estimates, calculate intervals for the proportion of concealed income for each group. To compare these estimates with those of macro approaches, we will extrapolate the results by using the information provided by Rosstat on the distribution of income of the general population.
Serious scientific studies on the shadow economy only started to appear during the second half of the 20th century. G. Becker, P. Gutmann, E. Sutherland, H. de Soto, E. Feige, K. Hart, and others can be considered the founders of this kind of analysis. They created the conceptual apparatus, investigated the nature of the shadow economy, and developed a methodology for its analysis (Burov and Samarukha, 2013). At the same time, in the USSR, G. Grossman, A. Kaliberda, T. Koryagina, D. Kaufmann and D. Schneider conducted studies to prove the existence of the shadow economy as part of the plan-based socialist economic system. A significant contribution to the developmental issues of the hidden economy was made by D. Makarov, A. Ponomarenko, B. Ispravnikov, and A. Oleinik, who explored the phenomenon of the shadow economy in Russia at the end of the 20th century and identified its various sectors (shadow, informal, concealed, illegal, black, etc.). The works by L. Kosals studied the origins of the shadow structures and institutions of the shadow economy. V. Tambovtsev, V. Radaev, S. Malakhov, and A. Oleinik analyzed the impact of transaction costs in the Russian economy on the proliferation of shadow practices. Radaev studied the prevalence of violence in the shadow economy.
The objective of many studies was to identify the causes behind the emergence of the shadow economy.
Among the social causes of the shadow economy, one of the main recognized causes is the gap between the standard of living of the majority of the population and that of the middle class. Inequality in wealth and social status determines the number of potential participants in the shadow economy. The shadow economy is largely represented by the poor and marginalized strata: youth, the unemployed, migrant workers, etc. In terms of raw numbers, they make up the greatest portion of participants in the shadow economy (
The imperfection of laws is the legal reason for the shadow sector of the economy to persist. The legislative framework fails to react quickly to the rapidly changing environment of a market economy, with gaps occurring in the legal field that create favorable conditions for the shadow economy (
The political reasons researchers refer to include the merger of power and major capital, which produces an oligarchy. By lobbying their oligarchic interests, large businesses place small companies in poor conditions and “push” them into the shadow.
Finally, there is the desire to receive more with less effort. This kind of rationale, in the absence or weakness of restrictions, encourages people to engage in shady activities (
The focus of a number of papers was to find answers to the question, what are the dangers of the shadow economy? There are multiple negative effects on society from the shadow economy (
However, there is an alternative view that argues that some kinds of shadow economies (especially the informal) actually support the development of the formal economy rather than hinder it. According to
The answers to most questions of interest to researchers are closely related to measuring the scale of the shadow economy.
The effectiveness of such measurements is greatly limited by the very nature of the phenomenon, which is not covered by official statistics. The quantitative estimation methods for shadow economy parameters rely on data from reports by relevant ministries, agencies and organizations, and on sample surveys of enterprises and households. Depending on the type of information sources used, the approaches are divided into two groups: indirect (macroeconomic) and direct (microeconomic). Direct methods tend to underestimate the shadow economy; indirect methods usually overestimate it.
The majority of analytical tools are macroeconomic, among which are a number of monetary approaches associated with cash demand or associated with alternative calculations of the GDP (Barsukova, 2003). There are also popular methods for estimating the size of the shadow economy based on electricity consumption and the divergence between the official and real labor market (Italian method). The comparability of results is the main advantage of macroeconomic methods (
The number of microeconomic tools for analyzing the shadow economy is much smaller. Two primary areas of microeconomic estimates are being developed to determine the size of the shadow economy (Barsukova, 2003): the first explores the discrepancy between overall household income and expenditures, while the second examines the divergence between the consumption of certain specific goods by hired and self-employed households.
The first group of models analyzes the divergence between income and expenditures, and the key conclusion is that the core of the shadow economy is self-employed households. This finding becomes the main hypothesis for the models in the second group, which compare the ratios of income and expenditures for certain categories of goods and services between self-employed and hired workers.
The second hypothesis of these models is that employees correctly report their wages, while the self-employed hide their income. The third premise of the models is that households correctly declare expenditures on current consumer goods.
The last two hypotheses can be considered a bottleneck of the model that compares the expenses of self-employed households and others.
Each of the aforementioned microeconomic areas uses household survey data. Both areas have similar defects. Selection bias (the absence of high-income households from the samples) prevents the full extent of the shadow economy from being determined based on sample data. There is the problem of comparing multi-temporal cash flows. Shadow incomes that do not appear in consumption are not captured (Barsukova, 2003), while the lack of reliable information on the total distribution of households by income and expenditures creates problems for replicating estimates across the general population and prevents the comparison of results between macro and micro approaches. However, there is independent value in modeling concealed income based on household budget survey data, because such models can be built separately for different segments of the population and can analyze the determinants for escaping into the shadow in greater detail for different socio-demographic and socio-professional groups.
The first area was the basis for the paper by
One of the key works mentioned in many empirical studies on microdata is
The basic idea of the predecessors’ model is to split demand for current consumer goods and durable goods. Considering the different categories of demand, the authors note that the portion of income a household spends on various categories depends not only on income but also on its source. As statistics show, self-employed households spend more on luxury goods while spending too little on food and other necessities, given the size of a household's total income.
This method estimates the size of the shadow economy (hereinafter, the size of concealed household income) without studying the residual variations in household incomes. This is its advantage because there is no need for special calculations of the upper and lower limits of the shadow economy parameter. The disadvantage of this method is its lower statistical capacity compared to Pissarides and Weber's approach.
These methods analyze concealed income drawing on the economic interests of individuals. However, a study of the incentives contingent upon social interactions among people is also noteworthy. The authors of many empirical studies suggest that an individual decides on his/her employment in the informal sector of the economy based on the interests of other people.
Another empirical study by
Accounting for the social interaction of households and the allocation of expenses into purchases of goods in the formal and informal sectors are areas for future research; it is not yet possible to implement them based on Russian statistical data. However, based on the RLMS database (Russia Longitudinal Monitoring Survey, see
As noted above, the theoretical model in this study is based on the model developed by
The theoretical model presented by
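The consumption function of household i for a group of goods j reads as follows:
$$\ln C_{ij} = Z_i \alpha_j + \beta_j \ln Y_i^{P} + \varepsilon_{ij} \qquad (1)$$
where $C_{ij}$ is household i's consumption of the group of goods j; $Z_i$ is the vector of household i's characteristics, with coefficient vector $\alpha_j$; $Y_i^{P}$ is the “permanent income”, i.e., the portion of income affecting consumer decisions; $\beta_j$ is the marginal propensity to consume the group of goods j; and $\varepsilon_{ij}$ is a random error.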
In turn, permanent income is related to actual (real) income $Y_i$ through the following ratio:
$$Y_i = p_i Y_i^{P} \qquad (2)$$
where $p_i$ is the income variation due to unforeseen circumstances and is a random variable for households.
The authors suggest that the differences between permanent and actual income are attributable to external economic factors (e.g., adverse economic conditions across the country or in a separate branch) whose vector of influence is not dependent on the type of household by income source. Therefore, the average value of $p_i$ is not dependent on the type of household. However, the variance of this parameter is related to the type of household: $V(p_i \mid i \in ee) < V(p_i \mid i \in se)$.
For self-employed households ($i \in se$), the variance of $p_i$ is greater than for hired households ($i \in ee$) because income for the self-employed is less stable. Hereinafter, the ee index refers to households whose main source of income is wages, and the se index refers to households whose primary source of income is derived from self-employment.
In accordance with the premises of the model, $Y_i$ is not observable, because the income of self-employed households is stated incorrectly. If $k_i$ is taken as the ratio indicating by how many times the real household income $Y_i$ exceeds the stated income net of taxes $Y_i^{d}$, then the relationship between real and stated income will read as follows:
$$Y_i = k_i Y_i^{d} \qquad (3)$$
According to the theory, employees state their income correctly; therefore, for households whose members are hired, $k_i = 1$. For the self-employed, the $k_i$ parameter represents a random variable, and $k_i > 1$.
Thus, the unobserved permanent income, which is associated with the consumption function, can be expressed through current income and the model parameters as follows:
$$\ln Y_i^{P} = \ln Y_i^{d} - \ln p_i + \ln k_i \qquad (4)$$
where ln pi and ln ki are two additional random regressors.
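As a quick numerical check of the identity linking permanent and stated income, the sketch below assumes the multiplicative forms Y = p·Y^P and Y = k·Y^d implied by the definitions of p_i and k_i above. The function name and the household figures are hypothetical and serve only as an illustration.

```python
import math


def log_permanent_income(stated_income, p, k):
    """Return ln Y^P = ln Y^d - ln p + ln k.

    Assumes Y = p * Y^P (transitory factor p) and Y = k * Y^d
    (under-reporting factor k); both forms are assumptions of this sketch.
    """
    if min(stated_income, p, k) <= 0:
        raise ValueError("income, p and k must all be positive")
    return math.log(stated_income) - math.log(p) + math.log(k)


# Hypothetical household: states 30,000 RUB, true income is 1.25 times that,
# and this period's income is 20% above its permanent level.
stated, k, p = 30_000.0, 1.25, 1.2
actual = k * stated        # Y = k * Y^d
permanent = actual / p     # Y^P = Y / p
print(math.isclose(log_permanent_income(stated, p, k), math.log(permanent)))
```

For a hired household, setting k = 1 removes the last term, so stated and permanent income differ only through the transitory factor.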
To verify the statistical hypothesis that households hide part of their income, we need assumptions about the distribution of the parameters $p_i$ and $k_i$ responsible for the discrepancy between observed income and permanent income. The authors suggest considering this distribution to be lognormal (because, according to the theory, such is the distribution of household income), so that $\ln p_i$ and $\ln k_i$ are normally distributed. In this case, the logarithms of the model parameters can be represented as the sum of their averages and deviations from the averages:
$$\ln p_i = \mu_p + u_i, \qquad \ln k_i = \mu_k + v_i \qquad (5)$$
where $u_i$ and $v_i$ have zero means and variances $\sigma_u^2$ and $\sigma_v^2$.
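Then, the consumption function will read as follows:
$$\ln C_{ij} = Z_i \alpha_j + \beta_j \ln Y_i^{d} - \beta_j(\mu_p - \mu_k) - \beta_j(u_i - v_i) + \varepsilon_{ij} \qquad (6)$$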
The dependent variable is a certain category of household expenditures.
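Under lognormality, the expected value of $p_i$ is related to $\mu_p$ as follows:
$$\ln E(p_i) = \mu_p + \tfrac{1}{2}\sigma_u^2 \qquad (7)$$
where $\sigma_u^2$ is the variance of $\ln p_i$.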
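If $E(p_i)$ is not dependent on the type of household employment, $\ln E(p_i)$ does not depend on it either; therefore, it is possible to compare the mean logarithmic values for the hired and the self-employed:
$$\mu_{p,se} - \mu_{p,ee} = -\tfrac{1}{2}\left(\sigma^2_{u,se} - \sigma^2_{u,ee}\right) \qquad (8)$$
where $\sigma^2_{u,se}$ and $\sigma^2_{u,ee}$ are the variances of $\ln p_i$ for self-employed and hired households, respectively.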
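The lognormal relationship used here can be checked numerically. The sketch below is an illustration under the stated assumption that ln p is normal with mean mu and standard deviation sigma; the function names, the seed, and the parameter values are hypothetical.

```python
import math
import random


def log_mean_lognormal(mu, sigma):
    """ln E(p) when ln p ~ N(mu, sigma^2)."""
    return mu + 0.5 * sigma ** 2


def mu_gap(var_u_se, var_u_ee):
    """mu_p(se) - mu_p(ee) implied by an equal E(p) in both groups."""
    return -0.5 * (var_u_se - var_u_ee)


def simulated_log_mean(mu, sigma, n=100_000, seed=0):
    """Monte Carlo estimate of ln E(p) from n lognormal draws."""
    rng = random.Random(seed)
    draws = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    return math.log(sum(draws) / n)


# The closed form and the simulation should agree up to sampling noise.
print(log_mean_lognormal(0.0, 0.5), simulated_log_mean(0.0, 0.5))
```

Because the self-employed are assumed to have the larger variance, `mu_gap` is negative: their mean log deviation must be lower for the expected value of p to coincide across groups.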
If equation
Rather than estimating the model based on two separate subsamples of hired employees and the self-employed, we can introduce a dummy variable to account for the type of household employment, and use a linear regression in the following form:
$$\ln C_{ij} = Z_i \alpha_j + \beta_j \ln Y_i^{d} + \gamma_j SE_i + \xi_{ij} \qquad (9)$$
where $SE_i = 1$ if the household is self-employed, and 0 otherwise. This model can be estimated by OLS, adjusted for the heteroscedasticity of the error $\xi_{ij}$.
By estimating this equation, we can obtain an estimate of the under-reported income parameter ki (indicating how many times the real income of a household exceeds the income stated).
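If we assume that the marginal propensity to consume the jth category of goods $\beta_j$ coincides for self-employed and hired workers, the $\gamma_j$ coefficient has the following meaning:
$$\gamma_j = \beta_j\left[\mu_k + \tfrac{1}{2}\left(\sigma^2_{u,se} - \sigma^2_{u,ee}\right)\right] \qquad (10)$$
where $\mu_k$ is the mean of $\ln k_i$ among the self-employed, and $\sigma^2_{u,se}$ and $\sigma^2_{u,ee}$ are the variances of $\ln p_i$ for self-employed and hired households.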
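To illustrate how the dummy-variable regression recovers the under-reporting factor, the following sketch simulates a deliberately simplified case: no transitory income variation and a constant factor k for every self-employed household, so that all variance terms vanish and exp(γ/β) equals k. The log-linear specification ln C = a + β ln Y^d + γ SE + error, the data-generating values, and the small `ols` helper are assumptions of this example, not the estimation procedure applied to the RLMS data.

```python
import math
import random


def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    m = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[a] * r[b] for r in X) for b in range(m)]
        + [sum(r[a] * yi for r, yi in zip(X, y))]
        for a in range(m)
    ]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(m):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][m] for i in range(m)]


def simulate(n=4000, k_true=1.5, beta=0.6, seed=1):
    """Synthetic households; the self-employed state income / k_true."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        se = 1.0 if rng.random() < 0.3 else 0.0
        ln_true = rng.gauss(10.0, 0.5)
        ln_stated = ln_true - se * math.log(k_true)
        ln_food = 1.0 + beta * ln_true + rng.gauss(0.0, 0.1)
        X.append([1.0, ln_stated, se])
        y.append(ln_food)
    return X, y


X, y = simulate()
_, beta_hat, gamma_hat = ols(X, y)
k_hat = math.exp(gamma_hat / beta_hat)
print(round(beta_hat, 3), round(gamma_hat, 3), round(k_hat, 3))
```

With income variances that differ between the two groups, γ/β no longer identifies the mean of ln k on its own, which is why the variance terms have to be handled separately.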
The above analysis and expression (11)
We can estimate the difference in the error variances of income between self-employed and hired workers based on an income decomposition, but this would require building auxiliary regressions of observed income for each of the two household categories.
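Observable income can be represented as follows:
$$\ln Y_i^{d} = Z_i \delta_1 + X_i \delta_2 + \zeta_i \qquad (12)$$
where $Z_i$ is the vector of household characteristics; $X_i$ is the set of instruments; $\delta_1$ and $\delta_2$ are coefficient vectors; $\zeta_i = u_i - v_i$ for self-employed and $\zeta_i = u_i$ for hired employees.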
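The residual variances of the auxiliary income regressions are the quantities needed later for the interval estimate. The sketch below computes a residual variance for one group using a single regressor in place of the full sets Z and X; that simplification, the function names, and the degrees-of-freedom correction are assumptions of this example.

```python
def residual_variance(x, y):
    """Residual variance from an OLS fit of y on one regressor x plus intercept."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least 3 paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("regressor has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return ssr / (n - 2)  # two estimated coefficients


def variance_gap(x_se, y_se, x_ee, y_ee):
    """Residual variance of the self-employed minus that of hired households."""
    return residual_variance(x_se, y_se) - residual_variance(x_ee, y_ee)
```

In use, `y` would hold log stated income for one household group and `x` a household characteristic; running the function once per group yields the two variances whose difference enters the bounds.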
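Then, $\sigma^2_{\zeta,se} = \sigma^2_{u,se} + \sigma^2_{v} - 2\,\mathrm{cov}(u, v)_{se}$ for the self-employed and $\sigma^2_{\zeta,ee} = \sigma^2_{u,ee}$ for hired employees. Because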
A point estimate of the expected value of the under-reported income parameter cannot be found because the value of cov (u, v)se is unknown, but it is possible to calculate an interval estimate for this parameter.
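If we accept the assumption that $\mathrm{cov}(u, v)_{se} = 0$, then the lower limit of the interval is determined subject to the condition that $\sigma_v^2 = 0$, and the upper limit is obtained when making the assumption $\sigma^2_{u,se} = \sigma^2_{u,ee}$ (the limiting case of the requirement that self-employed income is at least as volatile as hired income).
In this case, the interval limits are expressed through residual income variances:
$$\frac{\gamma_j}{\beta_j} - \tfrac{1}{2}\left(\sigma^2_{\zeta,se} - \sigma^2_{\zeta,ee}\right) \le \ln \bar{k} \le \frac{\gamma_j}{\beta_j} + \tfrac{1}{2}\left(\sigma^2_{\zeta,se} - \sigma^2_{\zeta,ee}\right) \qquad (13)$$
where $\bar{k}$ is the expected value of $k_i$, and $\sigma^2_{\zeta,se}$ and $\sigma^2_{\zeta,ee}$ are the residual variances of the income regressions for self-employed and hired households.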
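A minimal sketch of the interval calculation follows. It assumes bounds of the Pissarides and Weber type, ln k between γ/β − ½(σ²ζ,se − σ²ζ,ee) and γ/β + ½(σ²ζ,se − σ²ζ,ee), with zero covariance between u and v; the function names and the input values are hypothetical. The concealed share is computed as 1 − 1/k, which is one possible definition (the concealed part as a fraction of true income).

```python
import math


def k_interval(gamma, beta, var_zeta_se, var_zeta_ee):
    """Lower and upper bounds for the mean under-reporting factor k."""
    if beta == 0:
        raise ValueError("beta must be non-zero")
    gap = var_zeta_se - var_zeta_ee
    if gap < 0:
        raise ValueError("self-employed residual variance should not be smaller")
    centre = gamma / beta
    return math.exp(centre - 0.5 * gap), math.exp(centre + 0.5 * gap)


def concealed_share(k):
    """Share of true income left unreported when true income = k * stated."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 - 1.0 / k


# Hypothetical estimates: gamma = 0.2, beta = 0.5, residual variances 0.5 and 0.3.
low, high = k_interval(0.2, 0.5, 0.5, 0.3)
print(round(low, 3), round(high, 3))
print(round(concealed_share(low), 3), round(concealed_share(high), 3))
```

The width of the interval depends only on the gap between the two residual variances, so groups with similar income volatility produce tight bounds.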
Based on the calculated limits of the under-reported income parameter, we draw conclusions about the real income of households participating in the shadow economy (self-employed households according to the model).
Researchers are not only interested in the overall analytic result across the entire available sample of households but also in a more detailed study of the portion of concealed income across various socio-demographic groups.
Russia's shadow economy, according to
The subjects of the first type of relationship are large businesses that gained access to power, allocation, distribution, and assignment of material and financial resources. This segment of the population is not represented in the RLMS household budget survey data and, therefore, is not touched upon in this study.
The second type of relationship involves representatives of various socio-professional business groups whose goal is to obtain funds for business development and personal business income through the paid provision of goods and services that meet the mass needs of the population (illegal sector and concealed component of activities). This segment of the population is poorly represented in the RLMS data.
The third type of relationship involves representatives of various socio-professional groups whose purpose is to make a living through paid satisfaction of mass needs of the population (informal sector). It is this segment of the population that is best represented in the RLMS data, and it is to this segment that our research hypotheses refer.
The shadow activities of individuals and households in the informal sector of the economy, according to
Depending on the level of household shadow activities, we will differentiate between reported income, expenditures for certain goods, and the size of underreported income.
We will assume that the level of household shadow activities is largely determined by its level of monetary income and the opportunities that exist for households living in various types of settlements.
Based on monetary income, households can be divided into three groups:
Hypothesis 1. The share of concealed income is higher in the lower and upper income groups than in the middle income group.
In support of this hypothesis, we can draw on the following arguments. In the lower income group, people may be unable to earn reportable income because of a lack of jobs in the legal sector as well as the low level of skill or its inconsistency with the structure of available jobs; therefore, almost all of the income will be concealed because of the need to ensure physical survival. The middle group includes households for whom a significant portion of income is derived from wages; these households are not as strongly motivated to conceal their earnings as members of the lower group. The upper income group contains representatives of the affluent segments of society who are not concerned with physical survival and for whom hired employment does not seem attractive. However, when promising business opportunities arise, the proceeds from them may be concealed to a great extent because of the desire not to attract attention from the criminal element of the shadow economy and the regulatory agencies, as well as the desire to optimize tax deductions.
Households can be grouped by settlement type based on two factors: size or administrative status. The hypotheses underlying the choice of these target groups are as follows.
Hypothesis 2. In small settlements and cities with million-plus populations, the share of under-reported income is higher than in medium-sized towns and villages.
This hypothesis seems valid because, in the former case, the number of formal jobs is very limited and people are forced to seek shadow income to support themselves, while in the latter case, on the contrary, there are a very large number of attractive opportunities for shadow earnings, especially in metropolitan areas.
Hypothesis 3. In rural areas and regional centers, the proportion of hidden income is higher than in urban settlements and secondary cities.
In a literature review on the shadow economy, Smith (1986) concluded that the informal economy is mostly run by the self-employed who own small family businesses. Such enterprises are widely represented in rural areas, e.g., in agriculture, forestry, fisheries, construction, distribution, and repairs. In regional centers, where schools, medical facilities, shopping malls and markets are concentrated, there are additional opportunities such as tutoring, private medical services, private trucking, and various forms of freelancing.
This section describes sampling restrictions, classifies households, and examines in detail the characteristic features that differentiate self-employed households from others. Studying these features is necessary to determine the consistency of data and key hypotheses for the model and is necessary for the choice of control variables and variables for the instrumentation of income in the consumption equation.
The paper uses data from the 21st RLMS wave (2012), which represents a non-governmental longitudinal household survey. The RLMS covers a wide range of issues and produces an extensive base of socio-economic variables that can describe the structure of income and expenditure, the structure of food consumption, the level of material well-being of the population, education levels, investment, occupations, migration, health, etc. The wave includes individual and household data. The sample represents the current (2012) situation in households of the Russian Federation (more precisely, the income groups of Russian households available for the RLMS).
The following charts demonstrate the relevance of the RLMS sampling distribution by income in 2012 for the income distribution of the total Russian population in 2012 (according to the official Rosstat data).
Bar charts for the distribution of the Russian population by monthly income per capita, 2012 (RUB thousand).
Source: RLMS data for 2012, and the household budget survey data from Rosstat, adjusted for balance of income and expenditure indicators for the population during the same period.
A comparison of distributions shows that a direct replication of the findings from this study on the entire population would be impossible. We can only speak of modeling under-reported income for low- and medium-income segments of the population.
To build the model, it is important to carefully examine the typology of households and household income distribution across the main sources. There are three such sources:
income from self-employment = total income − wage − “other income”
Formally, income from self-employment is part of “unearned income”:
unearned income = total income − wage
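The two accounting identities above can be written as a small helper. This is an illustrative sketch; the function name, the dictionary keys, and the sample figures are hypothetical and do not correspond to RLMS variable names.

```python
def decompose_income(total, wage, other):
    """Split total household income using the two identities above."""
    if min(total, wage, other) < 0:
        raise ValueError("income components must be non-negative")
    return {
        # income from self-employment = total income - wage - other income
        "self_employment": total - wage - other,
        # unearned income = total income - wage
        "unearned": total - wage,
    }


# Hypothetical household: 50,000 RUB in total, 30,000 in wages, 5,000 in pensions.
print(decompose_income(50_000, 30_000, 5_000))
```

Since self-employment income is obtained as a residual, any error in the reported wage or other income passes directly into it.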
As we see from the descriptive statistics (
Descriptive statistics of income decomposition.
An important part of the study is the analysis of variance of different income components. Income from self-employment has the greatest coefficient of variation, at 2.33 (the ratio of the standard deviation to the mean). On the other hand, the coefficient of variation for “other components” of income is the lowest (0.71) because the most important element within them is the least-variable part of income, i.e., pensions, scholarships, etc. The variation in wages is greater than for income from self-employment, but the relative variation, or the ratio of the standard deviation to the mean value, is higher for income from self-employment (2.33 compared with 0.89). This empirical fact means we do not have to reject the hypothesis that income from self-employment is more volatile, allowing us to carry out the decomposition by
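The distinction between absolute and relative variation can be illustrated with the coefficient of variation (standard deviation divided by the mean). The sketch below uses the sample standard deviation and made-up figures; none of the numbers come from the RLMS data.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return statistics.stdev(values) / mean


# Hypothetical monthly amounts in RUB.
wages = [10_000, 40_000, 70_000]
self_employment = [0, 1_000, 8_000]

print(statistics.stdev(wages) > statistics.stdev(self_employment))
print(coefficient_of_variation(self_employment) > coefficient_of_variation(wages))
```

In this made-up example wages vary more in absolute terms, while self-employment income varies more relative to its mean, which is the pattern described in the text.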
A separate task is the classification of households (division into self-employed and others) which can be solved by studying the proportion of income from self-employment out of total household income.
Distribution of the proportion of household income from self-employment within the sample (among households with positive income from self-employment; N
A noticeable spike in the distribution around a proportion of 1 indicates the presence of a large group of households for which income from self-employment is their primary income.
Descriptive statistics to decompose income for households with positive income from self-employment, grouped by the proportion of income from self-employment out of total income (RUB).
The model's last important assumption required to estimate the shadow economy factor is the lognormality of household income. This assumption has been verified based on empirical data, using the Kolmogorov-Smirnov test. The hypothesis of lognormality is not rejected at a significance level of 1% for total income or for income from self-employment (the p-values are 14.7% for total income and 1.4% for income from self-employment).
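A lognormality check of this kind can be sketched with the standard library alone. The example below takes logs, fits a normal distribution, computes the Kolmogorov-Smirnov distance, and uses the asymptotic Kolmogorov series for an approximate p-value. Estimating the mean and standard deviation from the same data makes that p-value conservative, so this is an illustration rather than the exact procedure used in the study; the function name and the simulated sample are hypothetical.

```python
import math
import random
import statistics


def ks_lognormal(incomes):
    """KS distance and approximate p-value for H0: incomes are lognormal."""
    logs = sorted(math.log(x) for x in incomes)
    n = len(logs)
    mu = statistics.fmean(logs)
    sd = statistics.stdev(logs)

    def normal_cdf(z):
        return 0.5 * (1.0 + math.erf((z - mu) / (sd * math.sqrt(2.0))))

    # Largest gap between the empirical and the fitted normal CDF.
    d = max(
        max((i + 1) / n - normal_cdf(z), normal_cdf(z) - i / n)
        for i, z in enumerate(logs)
    )
    # Asymptotic Kolmogorov distribution with a small-sample correction.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Simulated incomes that are lognormal by construction.
rng = random.Random(42)
sample = [rng.lognormvariate(9.5, 0.8) for _ in range(1000)]
d_stat, p_value = ks_lognormal(sample)
print(round(d_stat, 4), round(p_value, 3))
```

A p-value above the chosen significance level means lognormality is not rejected, which is how the 14.7% and 1.4% figures are read against the 1% level.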
The choice of control variables (household characteristics) is an important part of the technique for estimating the proportion of under-reported household income that will be implemented in this study. Appendix
In this study, income from self-employment means all income from entrepreneurial activity, regardless of whether the employer is self-employed (working for himself) or an individual entrepreneur who hires employees. (This paper uses a slightly different principle for categorizing individuals as self-employed than that used in
People qualified as skilled labor include individuals within ISCO groups 1–5: lawyers or government employees, expert professionals of the highest category, skilled workers in agriculture, industry workers, or other skilled workers. Unskilled workers include respondents within ISCO groups 6–9: medium-skilled specialists, clerks, services and trade employees, and unskilled workers. The basic education group includes those with a completed or incomplete secondary education.
To describe the resulting patterns, we will, for the sake of brevity, regard the self-employed as individuals who earn more than 20% of their income from self-employment.
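The 20% rule can be expressed as a simple classification function. The sketch below is illustrative; the function name and the treatment of households with non-positive total income (classified as not self-employed) are assumptions of this example.

```python
def is_self_employed(total_income, self_employment_income, threshold=0.20):
    """True if more than `threshold` of total income comes from self-employment."""
    if total_income <= 0:
        return False  # assumption: no share can be computed without income
    return self_employment_income / total_income > threshold


# Hypothetical households (RUB per month).
print(is_self_employed(50_000, 15_000))  # share of 30%
print(is_self_employed(50_000, 10_000))  # share of exactly 20%
```

Keeping the threshold as a parameter makes it easy to check how sensitive the grouping is to the cut-off.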
We can conclude from Appendix
The self-employed want to get another job more often than hired employees (39.57% vs. 29.71%), but they do not state that they have a job at the moment (which perhaps means they are concealing the shadow portion of their income, or have no other paid sources of income). Financial opportunities and opportunities to improve living conditions, to take a vacation, or to pay for a child's education are no different for the self-employed from the average level of the full sample, although these opportunities exceed the capacity of those who have a small portion of income from self-employment.
We can conclude that a number of selected characteristics of household members should be used in a regression analysis because they correct some structural differences between households of the self-employed and of hired employees and may affect a household's intention to hide their income.
Appendix
Based on descriptive statistics analysis, we can conclude that the self-employed live more often in rural areas (33.2% vs. 21.2% for the sub-sample of individuals with zero income from self-employment) than in towns (18.7% vs. 27.4% for the sub-sample of individuals with zero income from self-employment). In the regional centers, their number is roughly the same.
Regarding housing, the self-employed own a greater amount of total area and residential area, with a larger number of rooms, than those having no income from self-employment or having a small income from it. Their homes are less frequently equipped with central water supply, central plumbing, hot water supply, sewer, gas and telephone. They prefer satellite antennae to cable television. Less often than other types of households, they have cottages, lawnmowers, a foreign-made car with a GPS navigator, a washing machine and a microwave, but they more often own trucks, motorcycles and tractors, and they sell their crops more often (5.7% vs 2.2%, and 1.1% for the sub-sample of individuals with no income from self-employment). Desktop computers are less common with the self-employed, while laptops are more common. Low-speed internet is more affordable for the self-employed than broadband.
The statistics show that among the self-employed with a high proportion of income from self-employment, there must be a substantial proportion of farmers (farms).
The final step of the preliminary data study is to decompose costs, on the basis of which we will build the variables for the equation model.
Households report information on the number of purchased products, their prices, and overall expenses over the past seven days. Monthly household expenditures on food represent the sum of expenditures for all products, normalized to a 30-day period.
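The normalization described above can be illustrated with a minimal sketch. The function name, the item costs, and the default period lengths are hypothetical and are used only to show the arithmetic of scaling a seven-day recall period to a 30-day month.

```python
def monthly_food_expenditure(item_costs, recall_days=7, month_days=30):
    """Scale the summed item costs from the recall period to a 30-day month."""
    if recall_days <= 0:
        raise ValueError("recall_days must be positive")
    return sum(item_costs) * month_days / recall_days


# Hypothetical costs (RUB) reported by one household for the past seven days.
weekly_costs = [350.0, 1200.0, 550.0]
monthly = monthly_food_expenditure(weekly_costs)
print(monthly)
```

The same scaling would apply to other recall periods, for example by passing `recall_days=90` for expenditures reported over the past 90 days.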
According to the theory, household expenses can be divided into expenditures for the purchase of durable goods and current consumer goods. Expenditures for current consumer goods include food, clothing, and services. Spending on durable goods includes expenditures for purchasing household appliances.
Descriptive statistics for household expenditures, including the consumption of household-produced goods (RUB).
Household consumption includes goods produced for a household's own consumption, and it is higher for self-employed households. This is more closely associated with the significant number of self-employed in farming households, for which this type of consumption is most characteristic, rather than concealed income. Cash expenditures on food for the self-employed households and the others do not vary greatly.
Our approach involves estimating an expenditure equation for an individual good, in which expenditure depends on household characteristics, including income and employment type (self-employed and others), to obtain an unbiased estimate of the under-reported income parameter.
Food costs are often regarded in the literature as the most accurately reported in the statistics. The model uses a log-linear form for the dependence of expenditure on income, given as equation (14),
where h indexes households; the dependent variable is food expenditure; Yh is income; Zh is the vector of household characteristics; the dummy variable takes the value 1 for the self-employed and 0 otherwise; and the random error is independently and identically (normally) distributed with zero mean.
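For orientation, one way to write a log-linear expenditure equation that matches the definitions above is sketched below. The symbols for the dependent variable, the constant, the self-employment dummy and the error term are assumptions introduced here, because they did not survive in the text; β and γ are the coefficients on log income and on the self-employment dummy that the paper refers to later.

```latex
% Sketch only: C_h^{F}, \alpha_0, SE_h and \varepsilon_h are assumed symbol names.
\ln C_h^{F} = \alpha_0 + Z_h'\alpha + \beta \ln Y_h + \gamma\, SE_h + \varepsilon_h ,
\qquad \varepsilon_h \sim \text{iid } N(0, \sigma_\varepsilon^2)
```

In this form, a positive γ means that self-employed households spend more on the good than other households with the same reported income and characteristics.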
When the model was estimated, the variable indicating the type of household was found to be significant at the 1% level, but with a negative sign (
Comparison of estimates of expenditure models for certain categories of goods.
However, other household expenditure categories can be used, provided that they are reported accurately and that they depend on overall household income rather than on the structure of income (a condition necessary to estimate the expenditure equation correctly). Expenditures for durable goods are unsuitable because of the first requirement. Expenditures that include the cost of transportation and are linked to the place of work cannot be used because of the second. Clothing expenditures meet both conditions, so they can be used as an indicator of a household's well-being.
We therefore estimate equation (15), in which the dependent variable is the logarithm of spending on clothes for adults and children over the past 90 days (the scale of the variable is not important because the logarithm is used; rescaling only shifts the constant).
The clothing expenditure model produces a significant coefficient showing an excess of expenditures by the self-employed compared with other groups in the sample.
This model will be used as the basis for calculating the proportion of concealed household income and for studying how this value depends on the household's socio-economic and demographic characteristics.
It should be noted that the income value in equation
Another methodological feature of this research is the use of random effects in the expenditure model, both on the constant and on the coefficient γ, as in equation (16).
The index i denotes the income group to which a household belongs, the group corresponding to the size of the locality, or the group corresponding to the administrative status of the locality where the household resides. The random group effects α0i and γCli are assumed to be iid with zero mean and constant variances, and to be uncorrelated with each other and across groups. This set of tools enables us to estimate the model across the entire sample, which provides a number of observations sufficient to obtain adequate estimates. The approach also accounts for cross-group heterogeneity in expenditures and for the uneven distribution of self-employed households between the groups, which supports the consistency of the estimates.
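A random-effects specification of this kind can be sketched as follows. This is an illustrative reading, not the paper's own equation: apart from β, γ, α0i and γCli, which appear in the text, the symbol names (the dependent variable, the fixed constant, the self-employment dummy and the error term) are assumptions.

```latex
% Sketch only: C_{hi}, \alpha_0, SE_{hi} and \varepsilon_{hi} are assumed symbol names.
\ln C_{hi} = (\alpha_0 + \alpha_{0i}) + Z_{hi}'\alpha + \beta \ln Y_{hi}
           + (\gamma + \gamma_{Cl,i})\, SE_{hi} + \varepsilon_{hi}
```

Here h indexes households and i indexes groups, so each group has its own intercept shift α0i and its own shift γCli in the self-employment effect, while β is common to all groups.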
Estimates of β and γ.
These results suggest that differences in the consumption of clothing by the self-employed and other households do exist and vary considerably across income and settlement groups. This allows us to assess the proportion of concealed income within the selected subgroups.
The next step in estimating the proportion of concealed income is to estimate the residual income variances for the self-employed and for other households from auxiliary regressions of the logarithm of income on the vector of household characteristics Zi and a set of instruments Xi, and then to calculate the lower and upper limits for the under-reported income parameter k (according to expressions
The estimate of the under-reported income share is based on interval estimates of k, as given in expression (17),
where k is the value of the under-reported income parameter (the higher the value, the greater the need to adjust the household income to meet the “true” level); α is the share of self-employed households in the sample relative to the total number of households (the higher the percentage, the more households escape to the shadow economy); Ise is the average income for the self-employed (the more earned by the self-employed, the greater the amount of adjusted income); Iother is the average income of other households (not self-employed). (Here, as before, the self-employed include those and only those households that obtain more than 20% of their total income from self-employment.)
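The calculation of the limits can be sketched in code. The sketch assumes the symmetric bounds commonly associated with the Pissarides and Weber approach, ln k = γ/β ± ½(σ²se − σ²other), where the variances are the residual variances from the auxiliary income regressions. The expressions used in this paper are not reproduced in the text, so the formula, the function name and the numbers are assumptions for illustration only.

```python
import math


def k_bounds(gamma, beta, var_se, var_other):
    """Lower and upper limits for k under the assumed symmetric bounds.

    gamma, beta : coefficients on the self-employment dummy and on log income
    var_se      : residual variance of log income, self-employed households
    var_other   : residual variance of log income, other households
    """
    center = gamma / beta
    half_gap = 0.5 * abs(var_se - var_other)
    return math.exp(center - half_gap), math.exp(center + half_gap)


# Hypothetical inputs, not estimates from the paper.
k_low, k_high = k_bounds(gamma=0.1, beta=0.5, var_se=0.5, var_other=0.4)
print(round(k_low, 4), round(k_high, 4))
```

A value of k above 1 means that the reported income of the self-employed has to be scaled up by that factor to reach the "true" level.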
From the above expression, it follows that, in the event of equal incomes between the self-employed and other households, the share of the concealed income will be proportional to the share of the self-employed within the sample. The higher the income of the self-employed compared to other household income, the higher the estimate.
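The two properties just described can be checked with a small numerical sketch. The expression itself is not reproduced in the text, so the formula below is only one possible reading: concealed income of the self-employed, α(k − 1)Ise, divided by total reported income, αIse + (1 − α)Iother. The function name and all numbers are hypothetical.

```python
import math


def concealed_share(k, alpha, income_se, income_other):
    """Share of concealed income relative to total reported income (assumed form)."""
    concealed = alpha * (k - 1.0) * income_se
    reported = alpha * income_se + (1.0 - alpha) * income_other
    return concealed / reported


# Equal incomes: the share reduces to alpha * (k - 1), proportional to alpha.
equal = concealed_share(k=1.25, alpha=0.2, income_se=30000.0, income_other=30000.0)

# Higher income of the self-employed raises the share.
richer_se = concealed_share(k=1.25, alpha=0.2, income_se=45000.0, income_other=30000.0)
print(equal, richer_se)
```

Evaluating the function at the lower and upper limits of k gives the corresponding interval for the concealed income share.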
Interval estimate of concealed income share.
On average, the proportion of income concealed is 5% of total household income. Pissarides and Weber's estimate for British households in 1989 was 5.5%, but the essential difference is that their sample was more representative of the general population, and their analysis demonstrated that, on average, the shadow economy accounts for 5.5% of GDP in the UK. We cannot extrapolate the sample estimates directly to the general population in this study, but we can attempt an extrapolation using the interval distributions of per capita income based on Rosstat data and the RLMS sample for 2012, as described in
Extrapolation of under-reported income share to the distribution of the general population by the income in the interval estimates based on per capita income and the distributions of per capita income (RUB thousand).
Source: RLMS data for 2012, and the household budget survey data from Rosstat, adjusted for balance of income and expenditure indicators for the population during the same period.
Interval estimates of the proportion of concealed income based on (a) the administrative status of the locality where the household resides and (b) the size of the settlement (in millions).
A study of the graphs shows that Hypothesis 3 (in rural areas and regional centers, the share of concealed income is higher than in urban-type settlements and secondary cities) does not contradict the data of the study (
If we look at the situation from a different angle, dividing the households by settlement size, it all looks slightly different (
Interval estimate of the proportion of concealed income per household income group relative to the median household income in the RLMS sample (med
The results support Hypothesis 1 (in the lower and upper income groups, the proportion of concealed income is higher than in the middle). However, it turns out that the poorest households hide nearly 1.5 times more income than the most affluent households, whereas the percentage of self-employed households is the same for all three subsamples. The number of observations in the subsamples is sufficiently large (> 600), so it is unlikely that there are any technical reasons to consider the estimates inadequate. Rather, we are dealing here with economic mechanisms. A lack of jobs and the need to compensate for the costs of inflation, which are much higher for poor households than for the wealthy (
In this study, an attempt was made to estimate the proportion of income concealed by Russian households based on RLMS data for 2012. As a theoretical framework for the study, we used a model proposed in
Because incomes are measured with error and because external shocks simultaneously affect income and expenses, income is an endogenous regressor that must be instrumented. Strong and exogenous instruments for income in the clothing expenditure equation are indicators of the presence of expensive durable goods and foreign-made cars in a given household. The resulting set of instruments is radically different from the set used in
To account parsimoniously and adequately for the heterogeneity of expenditures in various income and settlement groups, we used a multilevel (hierarchical) modeling technique not previously used in similar studies. (We introduced random group effects into the model on the constant and on the coefficient of the variable indicating whether the household is self-employed.)
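The reason for instrumenting income can be illustrated with a small simulation. This is a sketch on synthetic data, not the paper's estimation: the data-generating process, the coefficient values and the single binary instrument (a stand-in for an indicator such as ownership of an expensive durable good) are all assumptions. With one instrument, the two-stage least squares slope equals the ratio of covariances used below.

```python
import random

random.seed(0)


def cov(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


n = 20000
true_beta = 0.6  # assumed income elasticity of clothing expenditure
instrument, log_income, log_clothing = [], [], []
for _ in range(n):
    owns_durable = 1.0 if random.random() < 0.4 else 0.0  # hypothetical instrument
    shock = random.gauss(0, 1)  # unobserved shock hitting income and spending
    income = 10 + 0.8 * owns_durable + 0.5 * shock + random.gauss(0, 0.5)
    clothing = 2 + true_beta * income + 0.5 * shock + random.gauss(0, 0.5)
    instrument.append(owns_durable)
    log_income.append(income)
    log_clothing.append(clothing)

# OLS slope: biased upward because the shock enters both equations.
beta_ols = cov(log_income, log_clothing) / cov(log_income, log_income)
# IV (just-identified 2SLS) slope: uses only variation driven by the instrument.
beta_iv = cov(instrument, log_clothing) / cov(instrument, log_income)
print(round(beta_ols, 3), round(beta_iv, 3))
```

In this setup the instrument is exogenous by construction; in applied work, its strength and exogeneity have to be tested rather than assumed.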
In their fundamental study,
The estimates of concealed income based on the RLMS sample, extrapolated to the general population, were comparable to estimates by other experts and researchers. Averaging the results obtained by extrapolation produces estimates of the lower and upper limits of the share of concealed income of 16% and 23%, respectively. These results fall within the interval between the estimate of the scale of the shadow economy in Russia produced in 2011 by Rosstat on the basis of the SNA (15%) and the findings of individual experts (35–40%) for the same period; however, they are less than half as high as the World Bank's estimate for 2011 (50%). The differences may be explained by the fact that this study does not take into account the contribution of the criminal element to the shadow economy.
This paper has developed an approach to assessing the share of household income excluded from statistical observations, and it shows that, drawing on the results, we can get an idea of the size of the shadow economy in Russia, both in general and with respect to certain social groups. The estimates for particular groups can be especially significant, because when analyzing the effects of pricing and tax policies, it could prove useful to understand which types of households are more likely to escape into the shadow economy. Such assessments can clarify the extent to which escaping into the shadow economy will mitigate the excesses of economic policies for different population groups and will thus serve as a “social shock absorber”. They can also illuminate the extent to which the shadow economy may harm the pension system.
This approach may be more broadly used, as household income indicators are applied to estimate the well-being of different social strata within the population and the degree of stratification among them. Also, it is important to understand how accounting for income measurement errors can change the view of the situation and affect the mechanisms for providing targeted social support.
Descriptive statistics for the socio-demographic characteristics of household members (%).
List of statistically significant explanatory and instrumental variables for regression models of expenditures (statistically significant at the 1% level of significance).
Descriptive statistics of household characteristics in general (%).
Interval estimate of the proportion of concealed income, depending on the size of the locality.
Interval estimate of the proportion of concealed income, depending on the administrative status of the locality.
Interval estimate of the proportion of concealed income depending on the income group.
Sample estimates of the limits of the proportion of concealed income and their extrapolation to the general population.