This is a rather philosophical matter. Historically, models were calibrated to long-run growth facts and then cross-validated by looking at their implied short- to medium-run behavior over the business cycle, which is, in a sense, a different dataset.
When estimating a model, the parameters are chosen using the same dataset whose second moments you then try to match. You could argue that this is not a strict "out-of-sample" test.
People nevertheless do this because, when estimating, you try to minimize the forecast error, so there is no guarantee that a selected set of second moments is well matched. Checking whether the model matches them is therefore a sensible test (in the informal sense, not meant to denote a statistical test).
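To illustrate why minimizing forecast errors need not pin down other second moments, here is a hypothetical Python/NumPy sketch that is not part of the original answer. A simulated AR(2) series stands in for the observed data, an AR(1) "model" is fitted by least squares (that is, by minimizing one-step-ahead forecast errors), and the model-implied variance and autocorrelations are set next to their sample counterparts. The AR(2) data-generating process, its coefficients, and all function names are assumptions chosen purely for illustration.

```python
import numpy as np


def simulate_ar2(phi1, phi2, n, rng, burn=500):
    """Simulate x_t = phi1*x_{t-1} + phi2*x_{t-2} + e_t with e_t ~ N(0, 1)."""
    e = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(2, n + burn):
        x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + e[t]
    return x[burn:]  # drop the burn-in so the start-up values do not matter


def sample_acf(x, k):
    """Sample autocorrelation at lag k >= 1 (standard estimator)."""
    d = x - x.mean()
    return np.dot(d[k:], d[:-k]) / np.dot(d, d)


def fit_ar1(x):
    """Least-squares fit of x_t = rho*x_{t-1} + e_t, i.e. minimize one-step forecast errors."""
    rho = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
    resid = x[1:] - rho * x[:-1]
    return rho, resid.var()


def compare_moments(x, max_lag=4):
    """Return (name, model-implied value, sample value) for the variance and ACF lags."""
    rho, s2 = fit_ar1(x)
    rows = [("variance", s2 / (1 - rho**2), x.var())]
    for k in range(1, max_lag + 1):
        rows.append((f"acf lag {k}", rho**k, sample_acf(x, k)))
    return rows


rng = np.random.default_rng(0)
data = simulate_ar2(1.3, -0.4, 50_000, rng)  # hypothetical "observed" series
for name, model, observed in compare_moments(data):
    print(f"{name:>10}: model {model:7.3f}   data {observed:7.3f}")
```

Because the least-squares fit targets the one-step forecast error, the variance and the lag-1 autocorrelation come out close to the data, while the implied autocorrelations at longer lags (rho**k) decay too slowly. This is the kind of gap an eyeball comparison of second moments can reveal.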
What you could do is perform a test of the overidentifying restrictions, see e.g. This test will be much stricter than the eyeball econometrics performed on second moments. See also If you do Bayesian estimation, you should not be testing at all. Rather, you do model comparison and only reject your current model if you have found a better one (the idea being that a poor model is still better than no model at all).
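In a GMM setting, the standard test of overidentifying restrictions is Hansen's J statistic: with more moment conditions than parameters, J = n * gbar' W gbar evaluated at the efficient two-step estimate is asymptotically chi-square with (number of moments minus number of parameters) degrees of freedom if all moment conditions hold. The following is a minimal, hypothetical sketch in Python/NumPy for a linear instrumental-variables model rather than a DSGE model; the simulated data, the function names `gmm_j_test` and `chi2_sf`, and the closed-form chi-square tail (used only to avoid a SciPy dependency) are all assumptions for illustration.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), integer df >= 1, via closed forms (no SciPy)."""
    y = x / 2.0
    if df % 2 == 0:
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    total = math.erfc(math.sqrt(y))
    for i in range((df - 1) // 2):
        total += math.exp(-y) * y ** (i + 0.5) / math.gamma(i + 1.5)
    return total


def gmm_j_test(y, X, Z):
    """Two-step efficient GMM for y = X b + u with E[Z'u] = 0; returns (b, J, df, p)."""
    n = len(y)
    ZX, Zy = Z.T @ X, Z.T @ y

    def solve(W):
        # b = (X'Z W Z'X)^{-1} X'Z W Z'y
        A = ZX.T @ W
        return np.linalg.solve(A @ ZX, A @ Zy)

    b1 = solve(np.linalg.inv(Z.T @ Z / n))          # step 1: 2SLS weighting
    u1 = y - X @ b1
    S = (Z * u1[:, None] ** 2).T @ Z / n             # robust covariance of the moments
    W2 = np.linalg.inv(S)
    b2 = solve(W2)                                   # step 2: efficient weighting
    gbar = Z.T @ (y - X @ b2) / n                    # sample moment conditions at b2
    J = n * gbar @ W2 @ gbar
    df = Z.shape[1] - X.shape[1]                     # number of overidentifying restrictions
    return b2, J, df, chi2_sf(J, df)


rng = np.random.default_rng(1)
n = 2000
Z = rng.standard_normal((n, 3))                      # three instruments, one parameter
v = rng.standard_normal(n)
u = 0.5 * v + rng.standard_normal(n)                 # u correlated with x through v
x = Z @ np.array([1.0, 0.5, 0.5]) + v
X = x[:, None]
y_valid = 2.0 * x + u                                # all moment conditions hold
y_invalid = 2.0 * x + 0.5 * Z[:, 2] + u              # third instrument violates E[z u] = 0

for label, y in [("valid", y_valid), ("invalid", y_invalid)]:
    b, J, df, p = gmm_j_test(y, X, Z)
    print(f"{label}: b = {b[0]:.3f}, J = {J:.2f}, df = {df}, p = {p:.4g}")
```

With the invalid instrument, J becomes very large and the restrictions are rejected, even though the point estimate alone would not flag the problem. This is the sense in which such a formal test is stricter than eyeballing moments.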