1. There is no such thing as an optimal sample size (except for the trivial "as much data as you have from that joint probability distribution"). If you are Bayesian, you can even do without any data. What you are left with is then just your prior.
2. As detailed in Pfeifer (2013), "A Guide to Specifying Observation Equations for the Estimation of DSGE Models" (https://sites.google.com/site/pfeiferecon/Pfeifer_2013_Observation_Equations.pdf), the user is responsible for specifying the mapping from model variables to data variables. That also covers the case of first differences.
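
As a minimal sketch of such a mapping, assume the model is solved in log-deviations from steady state and the observed series is the demeaned first difference of log output. The symbols used here ($\hat{y}_t$ for the model variable, $\Delta y_t^{obs}$ for the data series) are illustrative assumptions and not necessarily the notation of the guide. The observation equation you would have to add to the model then reads:

```latex
\Delta y_t^{obs} = \hat{y}_t - \hat{y}_{t-1}
```

If the data is not demeaned, a constant capturing the mean growth rate would typically have to be added on the right-hand side.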