Is performance reduced when executing loops whose uop count is not a multiple of processor width?

I’m wondering how loops of various sizes perform on recent x86 processors, as a function of the number of uops in the loop body.
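To make the question concrete, here is a minimal sketch of the naive model behind it. It assumes a processor that issues up to `width` uops per cycle and that an issue group never spans two loop iterations. Both the model and the function names are assumptions for illustration, not measurements or a description of any specific CPU; real hardware may do better or worse.

```python
import math


def naive_cycles_per_iteration(uops: int, width: int = 4) -> int:
    """Cycles to issue one iteration if issue groups never span iterations.

    Hypothetical model: the last group of each iteration may be partly empty.
    """
    return math.ceil(uops / width)


def naive_uops_per_cycle(uops: int, width: int = 4) -> float:
    """Effective issue throughput under the same naive model."""
    return uops / naive_cycles_per_iteration(uops, width)


if __name__ == "__main__":
    # Tabulate loop sizes from 1 to 12 uops for a 4-wide machine.
    for n in range(1, 13):
        cycles = naive_cycles_per_iteration(n)
        rate = naive_uops_per_cycle(n)
        print(f"{n:2d} uops: {cycles} cycles/iter, {rate:.2f} uops/cycle")
```

Under this model, throughput only reaches the full width when the uop count is a multiple of it. Whether real processors behave this way is exactly what the question asks.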

Here’s a quote from Peter Cordes who raised the issue of non-multiple-of-4 counts in ano…

