# «Daniel Domb An honors thesis submitted in partial fulfillment of the requirements for the degree of Bachelor of Science Undergraduate College Leonard ...»

Although we can conclude that housing starts and new home sales together give a good picture of the real estate market, I must still test how important housing starts are in describing fixed mortgage rates. A simple regression of the spread between the fixed rate and the 10-year Treasury on monthly housing starts is statistically significant, with an F-statistic of 18.13 and an R-sq of 10.1%. Although this independent variable has little descriptive value on its own, I believe that in combination with new home sales it tells an interesting story about the state of the housing market. The residuals for this regression are not as tight as the residuals for new home sales, but a similar pattern has emerged.
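As a minimal sketch of what stands behind the F-statistic and R-sq quoted for a simple regression like this one, the following computes both by ordinary least squares. The function name `simple_ols` and the five data pairs are hypothetical and purely illustrative; they are not the thesis data, and the thesis does not say which software produced its results.

```python
from statistics import mean

def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares (one regressor)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)          # total sum of squares
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2    # residual sum of squares
              for xi, yi in zip(x, y))
    r_sq = 1.0 - sse / sst
    # One regressor: 1 numerator and n - 2 denominator degrees of freedom.
    f_stat = (sst - sse) / (sse / (n - 2))
    return {"slope": slope, "intercept": intercept, "r_sq": r_sq, "f_stat": f_stat}

# Made-up numbers for illustration only (NOT housing starts or the actual spread).
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [2.0, 4.0, 5.0, 4.0, 5.0]
fit = simple_ols(x_demo, y_demo)
print(fit)
```

With one regressor, the F-statistic is simply the square of the slope's t-statistic, so a significant F and a significant slope are the same finding.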

regression with two independent variables and the same dependent variable. I ran a regression with housing starts and the number of new homes sold as the independent variables. I wanted to run this regression before continuing in order to show how these two seemingly similar variables, combined, can shed more light on the spread between the fixed rate and the 10-year Treasury. This regression yields an R-sq of 22.4%, and each variable is still statistically significant at the 95% confidence level.

Both variables will be reexamined later when they are entered into the final regression model.

Another issue I have seemingly overlooked thus far is the potential removal of outliers seen in the residuals or in graphs of the data. I have not overlooked this issue; I have decided to refrain from removing any outliers at this time for two reasons. First, I intend to run all of my simple regressions without taking out any outliers, in hopes of finding a single variable that will describe the months that seem unexplainable. Second, before I remove an outlier I must have a valid reason for doing so. There are periods of time that appear to act outside the scope of the model, but at this point I am not comfortable identifying those months as outliers from the rest of the data.

The state of the overall economy is a large strain on interest rates and, more specifically, on consumers’ choices. I was determined to find, or create, a variable that would indicate whether we were in a recessionary period or an expansionary period.

month between the peak of one recession to the bottom of the trough, and signifies the rest of the months as a ‘0’. This variable could potentially show the impact of the cyclical nature of interest rates. It can also be used as an index of consumer sentiment about the overall state of the United States economy. If consumers are more confident in the economy, they may be more willing to take on higher interest rates; conversely, in a recessionary period they will want lower rates to compensate for their riskier investment. However, we again run into the problem of the limited 13 ½ year period of my data set. The data set contains two recessionary periods, one in the early 1990s and one in the early 2000s. Therefore, across the entire data set there are only two clumps of ‘1’, and the rest of the months have a ‘0’. Setting aside the short time span of my data, I ran a regression with this dummy variable as the independent variable and the spread between the fixed rate and the 10-year Treasury as the dependent variable. The simple regression is statistically significant with an F-statistic of 5.58. However, the R-sq is very low at 3.3%.

I believe that this variable will shed some light on the final regression, but the effect will be minimal.
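The construction of the peak-to-trough dummy can be sketched as follows. This is an illustration under stated assumptions, not the original procedure: the recession dates are the NBER peak and trough months (July 1990 to March 1991, and March 2001 to November 2001), which this section does not name as its source; both endpoints are treated as recession months; and the window of 164 months starting in January 1990 is likewise an assumption.

```python
# Assumed NBER business-cycle (peak, trough) months as (year, month) tuples.
# The thesis text does not state which dating it used.
RECESSIONS = [
    ((1990, 7), (1991, 3)),
    ((2001, 3), (2001, 11)),
]

def month_range(start, n_months):
    """Return n_months consecutive (year, month) tuples beginning at start."""
    year, month = start
    months = []
    for _ in range(n_months):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

def peak_to_trough_dummy(months, recessions=RECESSIONS):
    """1 for every month from a peak through its trough (inclusive), else 0."""
    return [
        int(any(peak <= m <= trough for peak, trough in recessions))
        for m in months
    ]

# Assumed sample window: 164 monthly observations starting January 1990.
months = month_range((1990, 1), 164)
dummy = peak_to_trough_dummy(months)
print(sum(dummy), "recession months out of", len(dummy))
```

Under these assumptions only a small share of the months carry a ‘1’, which is in line with the low R-sq reported for this variable.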

I believe the next variable I examined is the most important single variable in my model. The ‘refinance index’, created by the Mortgage Bankers Association of America, measures refinance activity across the United States for a given month. When individuals choose to refinance, they believe that there are financing tools available that are more suitable to their needs than their current mortgage.

use for other purchases. However, there are many educated borrowers who choose to refinance their homes because they want to lock in a lower interest rate than that of their current mortgage.

The refinance index is collected weekly, and I averaged the weekly data within each month so that I could compare the index to monthly fixed mortgage rates. To examine the data, I plotted a line graph of the refinance index in Exhibit 21. The graph shows that few people refinanced in the early nineties and the index was extremely low, either just above or below 100. But from January 2002 until the end of the data set, refinancing hit an all-time high, practically touching 10,000 in May of 2003. Looking more closely at the graph, the first two years of the data set are the exact opposite of the last two years, with a lot of volatility in the center. I did not doubt that the refinance index would be statistically significant, but to test my presumption I ran a simple regression with the refinance index as the independent variable and the spread between the fixed rate and the 10-year Treasury as the dependent variable. Once again the simple linear regression is statistically significant, with an F-statistic of 43.33 and an R-sq of 21.1%. This is the highest R-sq of any simple regression I have run in trying to explain the spread between the fixed rate and the 10-year Treasury. Looking at the residuals of this regression in Exhibit 22, the residuals span a larger range than in some of the other residual plots. However, in the middle years, the residuals are very close to the 0 residual line, bringing the total sum of all squared residuals to the lowest number hence the

other individual independent variable, and therefore, must be a part of my final regression model.
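The step of compiling the weekly refinance index into a monthly average can be sketched as below. The readings are made-up values for illustration only, and assigning each week to the calendar month of its reporting date is an assumption; the text does not say how weeks that straddle two months were handled.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def monthly_average(weekly):
    """Average weekly (date, value) observations within each calendar month."""
    buckets = defaultdict(list)
    for day, value in weekly:
        buckets[(day.year, day.month)].append(value)
    # Keys are (year, month); sorting keeps the months in chronological order.
    return {month: mean(values) for month, values in sorted(buckets.items())}

# Hypothetical weekly readings (NOT actual refinance index values).
weekly_index = [
    (date(2003, 5, 2), 9000.0),
    (date(2003, 5, 9), 9500.0),
    (date(2003, 5, 16), 9800.0),
    (date(2003, 5, 23), 9900.0),
    (date(2003, 5, 30), 9300.0),
    (date(2003, 6, 6), 8000.0),
    (date(2003, 6, 13), 7000.0),
]
monthly_index = monthly_average(weekly_index)
print(monthly_index)
```

Averaging treats a five-week month and a four-week month alike, which keeps the monthly series on the same scale as the weekly index.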

The last variable I examined is the total dollar amount of mortgage originations.

Unfortunately, information on mortgage originations is difficult to find. For 1990-1997 I found mortgage origination information on the Department of Housing and Urban Development’s website, and for the remainder of the data period the information came from the Mortgage Bankers Association website. However, I was only able to collect quarterly rather than monthly information. To convert the quarterly data to monthly data, I divided each quarterly figure by three to get a monthly number. This manipulation smooths the data more than is desirable, but mortgage origination is an important indicator of the fixed rate mortgage market, so even though the variable isn’t perfect, it is better than nothing.
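The quarterly-to-monthly conversion described above amounts to spreading each quarterly total evenly over its three months. A minimal sketch, with a hypothetical function name and made-up totals:

```python
def quarterly_to_monthly(quarterly_totals):
    """Spread each quarterly total evenly across its three months."""
    monthly = []
    for total in quarterly_totals:
        monthly.extend([total / 3.0] * 3)
    return monthly

# Hypothetical quarterly origination totals (illustrative units only).
quarterly = [300.0, 450.0]
monthly = quarterly_to_monthly(quarterly)
print(monthly)
```

The monthly series keeps each quarterly total intact but is constant within a quarter, so any month-to-month movement inside a quarter is lost and the series changes only at quarter boundaries.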

Exhibit 23 shows the graph of quarterly mortgage originations broken out on a monthly basis. The graph looks rough because the three months within each quarter share the same value, so there are four shifts per year. The graph is fairly volatile and, not surprisingly, the most recent couple of years show the biggest volatility in mortgage originations. Comparing Exhibit 21 to this graph, we can conclude that much of the increase in originations comes from the flood of people rushing to refinance their homes in the last couple of years, before rates increase from their current 40-year lows. Performing a simple regression with the mortgage originations as the independent variable, and again using the spread between

relationship between these two variables. Not surprisingly, the volume of mortgage origination is statistically significant in explaining the spread between fixed rates and the 10-year Treasury. The F-statistic from this regression is 34.00 and the R-sq is 17.3%. The residuals for this simple regression are plotted in Exhibit 24. The residuals from all of the simple regressions are beginning to look very similar, and this plot specifically emphasizes data points on the graph that should be removed as outliers. I would not hesitate to speculate that if I could have used actual monthly data, the relationship between the volume of mortgage originations and the dependent variable would be even stronger.
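For a regression with a single independent variable, the F-statistic and the R-sq are tied together by F = R-sq × (n − 2) / (1 − R-sq), where n is the number of observations. The sketch below solves this for n − 2 using the figures quoted for the simple regressions in this section. Because the R-sq values are rounded to one decimal place, the results are only approximate; the function name is hypothetical.

```python
def implied_residual_df(f_stat, r_sq):
    """Solve F = R^2 * (n - 2) / (1 - R^2) for the residual d.f., n - 2."""
    return f_stat * (1.0 - r_sq) / r_sq

# (F-statistic, R-sq) pairs as reported in the text for each simple regression.
reported = {
    "housing starts": (18.13, 0.101),
    "recession dummy": (5.58, 0.033),
    "refinance index": (43.33, 0.211),
    "mortgage originations": (34.00, 0.173),
}

implied = {name: implied_residual_df(f, r2) for name, (f, r2) in reported.items()}
for name, df in implied.items():
    print(f"{name}: implied n - 2 is about {df:.1f}")
```

All four pairs point to roughly 160 to 165 residual degrees of freedom, so the reported statistics are consistent with one another and with a monthly sample of a little over 13 ½ years.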

**Contemporaneous Model:**

After running about 13 simple regressions and verifying the statistical significance of many of these variables, I am ready to put a preliminary model together.

First, I will run a multi-linear regression with all of the individual independent variables against the spread between the fixed rate and the 10-year Treasury rate. The F-statistic of this multi-linear regression is 27.3 and the combined R-sq is 64.1%. However, since this is a multi-linear regression I must be conscious of the R-sq adjusted, which in this model is 61.7%. Because the two measures of R-sq differ by more than 2 percentage points, I ran another regression removing one variable to see if the model improves. Looking back at the simple regressions, the dummy variable I created to measure whether the United States is in a recessionary or expansionary period had the lowest statistical significance. In addition, in the multi-linear regression, the “peak to trough”

trough” variable yields an F-statistic of 30.53, the same R-sq as the previous regression, and an R-sq adjusted that increased to 62%. Although the R-sq adjusted has gone up, there is still room for improvement. I proceeded to remove the other two variables with high p-values, “investment to loan ratio” and “new home sales”, from my model. The new regression resulted in an F-statistic of 39.55, an R-sq slightly lower at 64% and an R-sq adjusted of 62.3%. Refer to Exhibit 25 for the full regression model and residuals for this regression. Together, the final seven variables explain 62.3% of the variation in the spread between the fixed rate and the 10-year Treasury.
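The R-sq adjusted penalizes each additional variable: with n observations and k independent variables, it equals 1 − (1 − R-sq)(n − 1)/(n − k − 1), and the overall F-statistic equals (R-sq / k) / ((1 − R-sq)/(n − k − 1)). The sketch below applies both formulas to the three models discussed. The sample size of 164 monthly observations is an assumption, chosen because it is consistent with the F-statistics and R-sq values reported for the simple regressions; the function names are hypothetical.

```python
def adjusted_r_sq(r_sq, n_obs, n_vars):
    """R-sq adjusted for the number of independent variables."""
    return 1.0 - (1.0 - r_sq) * (n_obs - 1) / (n_obs - n_vars - 1)

def overall_f(r_sq, n_obs, n_vars):
    """Overall F-statistic of a regression, expressed through its R-sq."""
    return (r_sq / n_vars) / ((1.0 - r_sq) / (n_obs - n_vars - 1))

N_OBS = 164  # ASSUMPTION: not stated in the text, inferred from reported statistics

# (number of independent variables, reported R-sq) for the three models.
models = [(10, 0.641), (9, 0.641), (7, 0.640)]
for k, r2 in models:
    print(k, round(adjusted_r_sq(r2, N_OBS, k), 3), round(overall_f(r2, N_OBS, k), 2))
```

Under that assumption the formulas reproduce the reported figures to within rounding, and they show why dropping weak variables raises the R-sq adjusted even when the R-sq itself stays flat or falls slightly.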

Although I have developed a good, statistically significant model, my work is not complete. I have only attempted to describe the spread between the fixed rate and the 10-year Treasury; I must still manipulate the information I have already used to come up with a model that describes the current fixed rate alone. For my first multi-linear regression I started with 10 independent variables. If I add the 10-year Treasury as another independent variable, bringing the total to 11, I can use these variables to try to describe the fixed rate alone.

Earlier in my analysis I ran a regression between the fixed rate and the 10-year Treasury, yielding an R-sq of 93.2% and an R-sq adjusted of 93.1%. I expected the addition of the 10 new independent variables to increase the statistical significance of my model. My next regression contains the 10 independent variables plus the 10-year Treasury regressed against the effective mortgage rate. This regression produced an f

contemporaneous model with all 11 independent variables does a much better job of describing the fixed rate mortgage than the 10-year Treasury did alone. In Exhibit 26, the R-sq and the R-sq adjusted are so close to each other that I decided not to refine the model further, considering that I have succeeded in explaining 98.2% of the variation in the fixed rate. In addition, the residual plots in both Exhibit 25 and Exhibit 26 are, for the first time, almost completely random. There are still clumps of residuals in some areas, but the defined patterns that were plaguing my earlier models have been washed out by the addition of more independent variables.

When I presented the simple regressions, I mentioned that the residual plots at times appeared to show outliers lurking within my data sets. The original residual plots seem to have several upper peaks around mid 1991, mid 1998, early 2000 and early 2002. There are also several dips in the data that occur most visibly in mid 1994 and at the very end of the data period in mid 2003.

Originally I was very concerned by these peaks and dips in the residuals. I was convinced that there must be some macro-economic event causing these strange results within the residual plots. However, each residual plot identified different data points as having unusually high residual values. On the plots they appeared to be the same, but usually they were within a couple of months of each other.

There are two data points that came up continually as having higher than average residual values. The first unusual residual is data point 124, or April 2000. Recalling April 2000, the economy was in a very strange place; the United States had just come off

month across the markets, which came as a huge surprise, and it is understandable that my model has difficulty with that month’s data. The second time period is the last four data points in the 13 ½ year period. My explanation for this period is that rates were extremely low through the summer of 2003, so low that interest rates were at levels not seen in over 40 years. A defense of my model is that it is unable to account for the lowest period in 40 years because it cannot relate to economic data and consumers’ choices when we are at some of the highest levels. In addition, home prices have gone up so much that more borrowers are being forced into their financial decisions and can’t afford to make the same choices they would have made several years earlier, when prices were lower. If the data period extended from the early 1970s through 2003, I argue that the residual plot would not identify this time period as an outlier; instead, the model is having difficulty explaining the extreme low in the fixed mortgage rate market.

Considering all of this information, I have chosen not to eliminate these data points. After checking the model without these five data points, I found that the model is relatively unaffected and the R-sq numbers do not change, so removing them is not worth the narrowing of the data period.
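The check described above, refitting without the flagged observations and comparing the R-sq, can be sketched as follows. This is only an illustration of the procedure: it uses a one-variable regression on synthetic data rather than the 11-variable model, and the series, the size of the bumps and the helper names are all hypothetical. Only the positions of the five flagged points (data point 124 and the last four of an assumed 164) follow the text.

```python
from statistics import mean

def r_squared(x, y):
    """R-sq of a one-variable least-squares fit (squared correlation)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

def drop_points(series, points_1_based):
    """Return the series without the listed 1-based data points."""
    drop = set(points_1_based)
    return [v for i, v in enumerate(series, start=1) if i not in drop]

# Data point 124 and the last four points of an ASSUMED 164-month sample.
flagged = [124, 161, 162, 163, 164]

# Synthetic, deterministic series for illustration only (NOT the thesis data).
x = [float(i) for i in range(1, 165)]
y = [2.0 + 0.01 * xi + 0.05 * ((i * 7) % 5 - 2) for i, xi in enumerate(x, start=1)]
for p in flagged:
    y[p - 1] += 0.3  # make the flagged months stand out from the trend

r2_full = r_squared(x, y)
r2_trimmed = r_squared(drop_points(x, flagged), drop_points(y, flagged))
print(round(r2_full, 3), round(r2_trimmed, 3))
```

If the two R-sq values are nearly identical, as the text reports for the actual model, keeping the full data period costs little.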

**Predicting Fixed Mortgage Rates:**