NEON floating-point multiply instruction
VMUL.F32 Q0, Q1, D4[0]
This is a floating-point NEON vector multiply by scalar instruction. It is a multi-cycle instruction that has source operand requirements in both the first and second cycles. In the first cycle, Source1, in this case Q1Lo or D2, is required in N2. Source2, in this case D4, is required in N1. In the second cycle, Source1, in this case Q1Hi or D3, is required in N2. The result of the multiply, stored in Q0 for this case, is available in N5 for the next instruction that requires this register as a source operand. The low half of the result, Q0Lo or D0, is calculated in the first cycle. The high half of the result, Q0Hi or D1, is calculated in the second cycle. Assuming no data hazards, the instruction takes a minimum of two cycles to execute as indicated by the value in the Cycles column.