Hi, I think it even applies to the stochastic trend case. If you difference, you implicitly assume that your data has either a deterministic trend or a unit root (potentially with drift) that you want to remove with the first-difference filter. The nice thing, in principle, is that at this step you don't have to worry about which of the two is present. However, the mean of the differenced series corresponds either to the drift term of the random walk or to the per-period increase (slope) of the deterministic trend, and in both cases you have to account for it when writing down your model.
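A quick simulation illustrates the point about the mean of the differenced series. This is a minimal sketch with illustrative parameters (slope and drift both set to 0.5, an AR(1) coefficient of 0.5 for the stationary part); `simulate` is a hypothetical helper, not part of any toolbox. Both differenced series end up with a mean near 0.5, so the difference filter itself does not reveal which trend generated the data.

```python
import numpy as np

def simulate(T=2000, slope=0.5, drift=0.5, seed=0):
    """Simulate a trend-stationary series and a random walk with drift (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    t = np.arange(T)
    # Trend-stationary: y_t = 1 + slope * t + u_t, with u_t a stationary AR(1)
    u = np.zeros(T)
    eps = rng.normal(size=T)
    for i in range(1, T):
        u[i] = 0.5 * u[i - 1] + eps[i]
    y_det = 1.0 + slope * t + u
    # Random walk with drift: y_t = y_{t-1} + drift + e_t
    y_rw = np.cumsum(drift + rng.normal(size=T))
    return y_det, y_rw

y_det, y_rw = simulate()
# Both means are close to 0.5: the trend slope in the first case,
# the drift in the second. Differencing alone does not tell them apart.
print(np.diff(y_det).mean(), np.diff(y_rw).mean())
```

Whichever process is true, that mean has to show up somewhere in the model's observation equation (as the slope or as the drift).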
I may be wrong, but I don't see how using the data in levels would change anything. If the trend in your data is deterministic but you match it to a model detrended under the assumption of a random walk with drift, you will have a problem regardless of whether you use differenced or level data. The same holds if your data follow a random walk with drift but you base your model on a deterministic trend.
I don't think that by differencing "you're exactly specifying the kind of trend that you think is present in the data", other than specifying that "it has a unit root".
Maybe this is just a matter of semantics, but by using a first-difference filter you implicitly assume that this filter removes the trend in the data, with the remainder being the business-cycle fluctuation around this stochastic trend that you are trying to explain with your model. Hence, you are specifying that the trend present in the data is this unit root. In contrast, by using an HP filter you specify a different kind of trend.
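To make the contrast between the two filters concrete, here is a minimal sketch (numpy only) that applies both to the same simulated random walk with drift. `hp_filter` and `first_difference` are hypothetical helper names; the HP filter is written out through its normal equations, (I + lam * D'D) tau = y, with lam = 1600 as the conventional quarterly value (statsmodels also provides `hpfilter` in `statsmodels.tsa.filters.hp_filter`). The two resulting "cycles" are different objects: the first difference assigns every permanent shock to the trend, while the HP filter imposes a smooth trend, so part of the random-walk movement ends up in its cycle.

```python
import numpy as np

def hp_filter(y, lam=1600.0):
    """Hodrick-Prescott trend/cycle split via the dense normal equations.

    Solves (I + lam * D'D) tau = y, where D is the (T-2) x T second-difference matrix.
    Dense O(T^3) solve: fine for illustration, not for very long series.
    """
    y = np.asarray(y, dtype=float)
    T = y.size
    D = np.zeros((T - 2, T))
    for i in range(T - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    trend = np.linalg.solve(np.eye(T) + lam * D.T @ D, y)
    return trend, y - trend

def first_difference(y):
    # Implicit assumption: the trend is a unit root, so Delta y_t is the stationary part
    return np.diff(np.asarray(y, dtype=float))

rng = np.random.default_rng(1)
y = np.cumsum(0.5 + rng.normal(size=200))  # random walk with drift
trend, hp_cycle = hp_filter(y)              # smooth trend, cycle in levels
dy = first_difference(y)                    # growth rates, one observation shorter
```

Note that the HP cycle is measured in levels (deviations from a smooth trend) while the first difference yields growth rates, so the model's observation equation has to match whichever filter was applied to the data.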
To phrase my point differently: given the trend structure embedded in your model and the corresponding data, you can transform the model estimated on differenced data into a model estimated on level data simply by "inverting" the trend filtering you performed (for example, using the information embedded in the model that the driving process follows a random walk with drift).
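The "inversion" of a first-difference filter is just cumulative summation plus one initial level. A minimal sketch, assuming the data are in logs and the model implies growth rates dy_t = mu + x_t with a drift mu and a stationary component x_t; `undifference`, `mu`, `x`, and the starting value are hypothetical names and illustrative numbers, not taken from any particular toolbox.

```python
import numpy as np

def undifference(dy, y0):
    """Invert a first-difference filter (hypothetical helper).

    Given growth rates dy_1..dy_T and the initial level y0, rebuild
    y_0..y_T via y_t = y0 + dy_1 + ... + dy_t.
    """
    dy = np.asarray(dy, dtype=float)
    return y0 + np.concatenate(([0.0], np.cumsum(dy)))

# Model-implied growth rates under a random walk with drift mu (illustrative values):
# dy_t = mu + x_t, where x_t is the stationary component the model explains.
rng = np.random.default_rng(2)
mu = 0.005
x = 0.01 * rng.normal(size=100)
dy = mu + x
log_level = undifference(dy, y0=4.6)
```

The only information the difference filter discards is the initial level y0; everything else about the level path, including the drift, is recoverable from the growth rates.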