You need to convert the song first with the Multi-Genre Model.
Then you convert the resulting instrumental with the stacked model, which removes additional leftover vocals, mostly in the mid-high range.
The Multi-Genre model and other main models like it are the primary remover models.
The stacking-style models are for the second, third, fourth passes, and so on.
Bear in mind that each additional pass with the stacking model yields diminishing returns. What the first pass removes will always be mostly vocals, and what additional passes remove will contain more and more instrumentation, so the trick is finding the sweet spot: how many passes get the audio as good as it can be before it starts to degrade? Also, some parts of a track may do better with more passes than other parts. So really, if you want to be a perfectionist, you need to go through every additional vocal track that the stacking model produces and decide what to reinsert into the instrumental, like you would with the original Multi-Genre Model conversion.
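To make the workflow above concrete, here is a rough sketch in Python. It is not the tool's actual code or API: `run_passes`, `reinsert`, and the `primary` / `stacking` callables are hypothetical stand-ins for "run the Multi-Genre model" and "run the stacking model", and audio is represented as a plain list of samples just to keep the example self-contained.

```python
from typing import Callable, List, Tuple

Audio = List[float]
# Hypothetical separator signature: takes audio, returns (instrumental, vocals).
Separator = Callable[[Audio], Tuple[Audio, Audio]]


def run_passes(mix: Audio, primary: Separator, stacking: Separator,
               num_passes: int) -> Tuple[Audio, List[Audio]]:
    """Run the primary model once, then the stacking model on the instrumental.

    Returns the final instrumental and the vocal track removed by each pass,
    so every removed track can be reviewed afterwards.
    """
    if num_passes < 1:
        raise ValueError("num_passes must be at least 1")

    # Pass 1: the primary remover (e.g. the Multi-Genre model).
    instrumental, vocals = primary(mix)
    removed = [vocals]

    # Passes 2, 3, 4, ...: feed the previous instrumental to the stacking model.
    for _ in range(num_passes - 1):
        instrumental, vocals = stacking(instrumental)
        removed.append(vocals)

    return instrumental, removed


def reinsert(instrumental: Audio, removed_track: Audio,
             start: int, end: int) -> Audio:
    """Mix samples [start, end) of a removed track back into the instrumental."""
    out = list(instrumental)
    for i in range(start, end):
        out[i] += removed_track[i]
    return out
```

Keeping every removed track around is what lets you compare pass counts and put back any section where a later pass took out instrumentation rather than vocals.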
All this being said, if you are all converting with the Multi-Genre Model first and then the Stacking Model and getting poor results... then there is a bug. :)