Math that Makes Sense: Regression Analysis for Facility Management

Predictive analytics are becoming increasingly important in the realm of facility management. Being able to effectively forecast outcomes – whether related to equipment performance or construction scheduling – is a key element of the way NIKA’s Enterprise Technologies experts help organizations better manage facility projects across the globe.

For example, given data on completed construction projects, you can estimate the length of time a new project will actually take, using the cost and original duration that the construction company gives you.  If you have data on energy consumption in a building and the outside temperature, you can estimate how much electricity or natural gas the building will consume on a given day. A great tool for making these kinds of predictions is regression analysis.

Understanding Regression Analysis

With any data set, there are three main groups of analysis that can be performed: reporting on what the data is, correlation analysis (or how some of the data relates to other parts of the data), and predictive analysis (or using the data to predict what will happen in the future).

Regression analysis falls into the last category. It takes data from observations and finds the function that best fits them. A regression model predicts a dependent variable (the outcome you care about) by combining one or more independent variables, each weighted by a coefficient.
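To make "finds the function that best fits them" concrete, here is a minimal sketch of a one-variable least-squares fit, using the temperature and energy example from above. The numbers are made up for illustration, and `fit_line` is a hypothetical helper name, not something from a particular statistics package.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for the line y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up illustrative data: outside temperature (deg F) and daily electricity use (kWh).
temps = [60, 65, 70, 75, 80, 85, 90]
kwh = [410, 432, 455, 470, 498, 515, 540]

intercept, slope = fit_line(temps, kwh)

# Plug a new temperature into the fitted line to get a forecast.
predicted_kwh = intercept + slope * 78
print(f"kWh = {intercept:.1f} + {slope:.2f} * temp")
print(f"Forecast for a 78 deg F day: {predicted_kwh:.0f} kWh")
```

In practice a statistical package would do this fit for you and would also report the fit statistics discussed next.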

Statistical software packages calculate the coefficients of the variables, along with two key statistics: the R-squared and the p-value. The R-squared¹ can be described as the fit of the model, and a higher value is better (a perfect model has an R-squared of 1). The p-value² indicates the statistical significance of the model or of an individual variable, and a lower value is better.
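As a rough illustration of what the R-squared measures, the sketch below computes it by hand from actual and predicted values. The function name `r_squared` and the numbers are assumptions for illustration. The p-value is left out because it relies on a t-distribution, which a statistical package would normally supply.

```python
def r_squared(actual, predicted):
    """1 minus (unexplained variation / total variation)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up observations of daily electricity use (kWh).
actual = [410, 432, 455, 470, 498]

# A model that predicts every observation exactly.
perfect = list(actual)
# A model that ignores its inputs and always predicts the average.
mean_only = [sum(actual) / len(actual)] * len(actual)

print(r_squared(actual, perfect))    # 1.0
print(r_squared(actual, mean_only))  # 0.0
```

A model that does no better than guessing the average scores 0, and a model that reproduces every observation scores 1, which is why values closer to 1 indicate a better fit.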

What is Needed for Regression Analysis?

First and foremost, you need a data set with plenty of observations. You can use a smaller data set provided the observations come from a group with more in common. For example, a data set containing information only on chukar partridges could be smaller than one that covers multiple kinds of birds.

Next, you should decide how to deal with outliers³. If you have a large data set, the effect of a few outliers is usually small. If you don’t have a large data set, then you might want to consider removing the outliers so they don’t exert too much influence on the model.
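One common way to spot candidate outliers is the interquartile range (IQR) rule, sketched below. This is only one of several possible conventions, the 1.5 multiplier is a customary default rather than something prescribed here, and the project durations are made up for illustration.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the middle 50% of the data."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up durations (in months) of completed construction projects.
durations = [11, 12, 12, 13, 14, 14, 15, 16, 48]

outliers = iqr_outliers(durations)
print(outliers)  # [48]

# Whether to drop flagged points is a judgment call; here they are set aside.
kept = [d for d in durations if d not in outliers]
```

A flagged value is worth a second look before it is removed, since it may be a data-entry error or a genuinely unusual project.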

It is always best to have a complete data set with many observations, none of which have missing data. This is the ideal, but we can also work with data sets that are not ideal. If there is missing data, it is necessary to consider whether the missing data is randomly distributed or follows a pattern. Depending on the number of records, you can either remove the incomplete records or use estimated (imputed) values to fill the gaps where data is missing at random.
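The two options for missing data can be sketched as follows. The records and field names (`temp_f`, `kwh`) are hypothetical, and filling gaps with the mean is just the simplest form of imputation. It is only reasonable when the data appears to be missing at random.

```python
from statistics import mean

# Made-up daily records; None marks a missing meter reading.
records = [
    {"temp_f": 60, "kwh": 410},
    {"temp_f": 65, "kwh": None},
    {"temp_f": 70, "kwh": 455},
    {"temp_f": 75, "kwh": 470},
]

# Option 1: remove incomplete records (fine when plenty of records remain).
complete = [r for r in records if None not in r.values()]

# Option 2: impute the gap with the mean of the observed readings.
kwh_mean = mean(r["kwh"] for r in records if r["kwh"] is not None)
imputed = [
    {**r, "kwh": kwh_mean if r["kwh"] is None else r["kwh"]}
    for r in records
]

print(len(complete), "complete records")
print("Imputed reading:", imputed[1]["kwh"])
```

Removing records shrinks the data set, while imputing keeps its size but adds values that were never observed, so the choice depends on how many records you can afford to lose.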

If you are familiar with the data or have reason to believe the variables are connected (whether from experience or literature), then you can jump right into the regression analysis. If you are not, then you will want to see if there are relationships within the data. You can do this by running correlation analysis or taking a look at scatterplots. These checks may show that you should transform the data (for example, by taking a logarithm) so that the relationship between the variables becomes linear.
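The sketch below illustrates the correlation check and the effect of a transformation. It assumes made-up data in which one variable grows exponentially with the other, and `pearson_r` is a hypothetical helper name. Taking the logarithm of the curved variable straightens the relationship, which shows up as a correlation much closer to 1.

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y grows exponentially with x, so a straight line fits poorly.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0 * math.exp(0.8 * x) for x in xs]

raw_r = pearson_r(xs, ys)
# After a log transform the relationship is a straight line.
log_r = pearson_r(xs, [math.log(y) for y in ys])

print(f"correlation before transform: {raw_r:.3f}")
print(f"correlation after log transform: {log_r:.3f}")
```

If you fit the regression on the transformed variable, remember to convert predictions back (here, with `math.exp`) before reporting them.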

What Does Regression Analysis Tell Me?

Once you run your regression analysis, you have a model for your dependent variable (what you are trying to predict). It can be for energy usage, construction costs, or political outcomes. Take a look at the R-squared for the model and the p-values for the model and its coefficients to make sure that the fit is good enough and the results are statistically significant. If they are, then it’s great! You can plug in values to see what your dependent variable would be in that case. If they are not, then you might want to check your premise or the data you have.

Even if you have a model that isn’t great at predicting your dependent variable, it can still be useful in discussing the effect of the independent variables on it. If a variable has a positive coefficient in the model, then an increase in that variable is associated with an increase in the dependent variable, holding the other variables constant; a negative coefficient means the opposite.
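To show how coefficients are read, here is a minimal sketch of a regression with two independent variables, echoing the earlier construction example (planned duration and cost predicting actual duration). The data is made up and deliberately built from a known relationship so the recovered coefficients can be checked. The helper names are hypothetical, and a real analysis would normally use a statistical package rather than hand-rolled linear algebra.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ols(rows, ys):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up projects: (planned duration in months, cost in $ millions).
projects = [(12, 5), (18, 9), (24, 10), (10, 3), (30, 20), (15, 4)]
# Actual durations constructed as 2 + 1.2 * planned + 0.5 * cost.
actual_months = [18.9, 28.1, 35.8, 15.5, 48.0, 22.0]

intercept, b_planned, b_cost = fit_ols(projects, actual_months)
print(f"intercept={intercept:.2f}, planned={b_planned:.2f}, cost={b_cost:.2f}")

# Plug in a new project: 20 planned months and a $8 million budget.
estimate = intercept + b_planned * 20 + b_cost * 8
print(f"Estimated actual duration: {estimate:.1f} months")
```

Both coefficients come out positive here, so in this made-up data a longer planned schedule or a larger budget is each associated with a longer actual duration when the other variable is held fixed.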

Regression Analysis in Action

Here at NIKA, we have used regression analysis in many different ways. Case in point: We were hired by an organization whose construction contractor was assuring them that their project was running on schedule. Our client had a gut feeling that that wasn’t the case. NIKA’s Enterprise Technologies team performed regression analysis on similar completed projects, and produced a model that projected the construction project would be completed two years after the anticipated completion date, confirming the client’s suspicion. Based upon the objective analysis, our client was able to prepare a more realistic scenario for the project’s completion and wait to assign the initial outfitting money that would have been spent on an unfinished project.

For more information on how regression analysis and predictive analytics can be put to work for your facilities, contact NIKA’s Enterprise Technologies team.

¹ More formally, the R-squared is the proportion of the variance in the dependent variable that is explained by the model. What counts as an acceptable R-squared varies by discipline: some hold that anything over 0.5 is good, while published economics articles have reported R-squared values in the neighborhood of 0.4.

² Likewise more formally, the p-value is the probability of seeing a result at least as extreme as the one observed if the null hypothesis (that is, the coefficient of the variable is 0) were true. The generally accepted threshold for statistical significance is a p-value of 0.05 or less.

³ In the statistical sense, not the Malcolm Gladwell book.