The Cortex-M4F has separate hardware for integer and floating-point arithmetic. Integer divides (SDIV, UDIV) take 2 to 12 clock cycles depending on the operands, and a floating-point divide (VDIV.F32) takes 14. I've verified that integer instructions immediately following a VDIV execute while the VDIV is still finishing. However, the reverse does not seem to be true: floating-point instructions immediately following an integer divide (SDIV or UDIV) must wait for the divide to complete before they proceed. Does anyone know why you can't overlap the execution of an integer divide with the execution of floating-point instructions?
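
To make the difference concrete, here is a rough sketch of the cycle counts I'd expect in the two cases. It is a toy model, not a description of the actual pipeline: the function names are made up for illustration, and it assumes the divide takes one cycle to issue and that every following instruction is single-cycle and independent of the divide result.

```c
#include <stdio.h>

/* Toy model only: names and assumptions are hypothetical, not taken from
 * ARM documentation. Every follower is assumed to be a single-cycle
 * instruction that does not depend on the divide result. */

/* No overlap: the followers cannot start until the divide has completed. */
static unsigned serialized_cycles(unsigned div_latency, unsigned followers)
{
    return div_latency + followers;
}

/* Overlap: the divide issues in one cycle, then the followers issue one per
 * cycle while the divide finishes in the background. The total is whichever
 * ends later: the instruction stream or the divide itself. */
static unsigned overlapped_cycles(unsigned div_latency, unsigned followers)
{
    unsigned issue_time = 1u + followers;
    return issue_time > div_latency ? issue_time : div_latency;
}

/* Helper that prints both estimates for one scenario. */
static void print_estimate(unsigned div_latency, unsigned followers)
{
    printf("serialized: %u cycles, overlapped: %u cycles\n",
           serialized_cycles(div_latency, followers),
           overlapped_cycles(div_latency, followers));
}
```

For a 12-cycle divide followed by 8 single-cycle instructions, `print_estimate(12, 8)` gives 20 cycles without overlap and 12 with it. The first figure matches the pattern I see for SDIV/UDIV followed by floating-point instructions, the second the pattern for VDIV followed by integer instructions.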