# “But when we apply statistical models, do we need to care about whether a model can retrieve the relationship between variables?”

Tongxi Hu writes:

Could you please answer a question about the application of statistical models? Let's take regression models as an example.

In the real world, we use statistical models to find relationships between variables because we do not know the true relationship, for example, the relationship between crop yield, temperature, and precipitation. But when we apply statistical models, do we need to care about whether a model can retrieve the relationship between variables?

Example:
Suppose the true relationship between crop yield (Y), temperature (T), and precipitation (P) is:
$$Y = T + \sin(T/6) + P + \exp\{-(P - 160)/4\}$$
Suppose we also simulate some observations of Y, T, and P and then fit a linear regression model to them. I am sure we can find a statistical model that fits them, and fits them well. Let's say the fitted model is:
$$Y = aT + bT^2 + cP + dP^2 + e$$
Clearly, the fitted model can't retrieve the true relationship between Y, T, and P. Can we really use it for inference?
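A minimal sketch of the simulation described above, in Python with NumPy. The question does not give ranges, noise, or sample size, so these are assumptions: T between 10 and 30, P between 150 and 200 (kept near 160 because exp{-(P-160)/4} grows very quickly as P drops below 160), Gaussian noise with standard deviation 1, 500 observations, and a fixed seed. The code fits the quadratic model by ordinary least squares and reports the in-sample R². The point it illustrates is that a high R² on its own says nothing about whether the functional form is the true one.

```python
import numpy as np

def true_yield(T, P):
    # "True" relationship from the question: Y = T + sin(T/6) + P + exp(-(P - 160)/4)
    return T + np.sin(T / 6) + P + np.exp(-(P - 160) / 4)

def design_matrix(T, P):
    # Columns for Y = a*T + b*T^2 + c*P + d*P^2 + e (last column is the intercept e)
    return np.column_stack([T, T**2, P, P**2, np.ones_like(T)])

def fit_quadratic(T, P, Y):
    # Ordinary least squares via NumPy's SVD-based solver
    coef, *_ = np.linalg.lstsq(design_matrix(T, P), Y, rcond=None)
    return coef

def r_squared(Y, Y_hat):
    ss_res = np.sum((Y - Y_hat) ** 2)
    ss_tot = np.sum((Y - Y.mean()) ** 2)
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(0)
n = 500                                     # assumed sample size
T = rng.uniform(10, 30, n)                  # assumed temperature range
P = rng.uniform(150, 200, n)                # assumed precipitation range
Y = true_yield(T, P) + rng.normal(0, 1, n)  # assumed noise sd = 1

coef = fit_quadratic(T, P, Y)
Y_hat = design_matrix(T, P) @ coef
print("coefficients a, b, c, d, e:", np.round(coef, 3))
print("in-sample R^2:", round(float(r_squared(Y, Y_hat)), 3))
```

None of the fitted coefficients corresponds to a term in the true function; the quadratic is only a local approximation to the sine and exponential terms over the sampled ranges.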

Many researchers predict future crop yields using statistical models fitted to historical observations. Some of this work is published in top journals such as Science and Nature. I doubt their conclusions. My argument is that if we cannot make sure a model is capable of retrieving the true relationships, inference from these models can be misleading.
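A second sketch, under the same kind of assumptions (hypothetical ranges, noise sd of 1, 500 observations, fixed seed), looks at the prediction question. The quadratic model is fitted to a hypothetical "historical" sample in which precipitation stays between 160 and 200, then used to predict new conditions inside that range and a hypothetical drier future with P between 140 and 160. Errors are measured against the noise-free true mean yield. This is an illustration of extrapolation under assumed conditions, not a claim about any published crop model.

```python
import numpy as np

def true_yield(T, P):
    # "True" relationship from the question
    return T + np.sin(T / 6) + P + np.exp(-(P - 160) / 4)

def design_matrix(T, P):
    # Quadratic regression Y = a*T + b*T^2 + c*P + d*P^2 + e
    return np.column_stack([T, T**2, P, P**2, np.ones_like(T)])

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

rng = np.random.default_rng(1)
n = 500

# Hypothetical historical record: precipitation never fell below 160
T_hist = rng.uniform(10, 30, n)
P_hist = rng.uniform(160, 200, n)
Y_hist = true_yield(T_hist, P_hist) + rng.normal(0, 1, n)
coef, *_ = np.linalg.lstsq(design_matrix(T_hist, P_hist), Y_hist, rcond=None)

def predict(T, P):
    return design_matrix(T, P) @ coef

# New conditions inside the historical range
T_in = rng.uniform(10, 30, n)
P_in = rng.uniform(160, 200, n)
# Hypothetical drier future, outside the historical range
T_out = rng.uniform(10, 30, n)
P_out = rng.uniform(140, 160, n)

# Prediction error relative to the noise-free true mean yield
err_in = rmse(predict(T_in, P_in), true_yield(T_in, P_in))
err_out = rmse(predict(T_out, P_out), true_yield(T_out, P_out))
print(f"RMSE inside historical range:  {err_in:.2f}")
print(f"RMSE outside historical range: {err_out:.2f}")
```

In this setup the exponential term is nearly flat (between about 0 and 1) over the fitting range, so the quadratic has no information about how sharply it rises once P drops below 160. How much this matters in a real study depends on how far the prediction conditions lie from the data the model was fitted to.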