I don't think there is a definite answer to your question, because the time to complete the transaction depends mainly on how quickly the memory system can accept the write data transfers, and on how long it then takes to return the response that completes the transaction.
The minimum number of clock cycles would be 1 for the AW transfer, "n" for the data transfers in an n-beat transaction, and 1 for the B channel response, so for your example of a 2-beat burst this would be a minimum of 4 cycles. This assumes that the master can supply new write data in consecutive cycles, that the slave can accept data in consecutive cycles, and that the slave returns the write response in the cycle immediately after the final write data transfer completes.
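To make the arithmetic concrete, here is a minimal sketch of that best-case count. The function name and parameter are my own and purely illustrative, and it assumes the ideal conditions described above (no wait states on any channel).

```python
def min_write_cycles(beats: int) -> int:
    """Best-case cycle count for a write burst with no wait states."""
    if beats < 1:
        raise ValueError("a burst needs at least one data beat")
    aw_cycles = 1      # one cycle for the AW address transfer
    w_cycles = beats   # one cycle per W data beat, back to back
    b_cycles = 1       # one cycle for the B write response
    return aw_cycles + w_cycles + b_cycles


print(min_write_cycles(2))  # 2-beat burst -> 4
```

Any real system will add to this figure whenever the master or slave inserts wait states.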
The "burst size", or data transfer width, is largely irrelevant to the number of cycles, unless that width is wider than the data bus of the destination slave. In that case you will have added wait states while each transfer is "downsized" into several narrower ones, either in the interconnect logic or in the slave itself.
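As a rough illustration of the downsizing cost, the sketch below estimates the data-phase cycles when each wide beat has to be split into several narrower ones. This is a simplified model of my own, not a description of any particular interconnect: it assumes every beat uses the full transfer width, that the split transfers run back to back, and the function and parameter names are hypothetical.

```python
import math


def downsized_data_cycles(beats: int, transfer_bytes: int, slave_bytes: int) -> int:
    """Estimate data-phase cycles when beats may be wider than the slave.

    Assumes each beat is split into ceil(transfer_bytes / slave_bytes)
    narrower transfers with no further wait states (a simplification).
    """
    if beats < 1 or transfer_bytes < 1 or slave_bytes < 1:
        raise ValueError("all arguments must be positive")
    # A transfer no wider than the slave needs no splitting.
    splits_per_beat = max(1, math.ceil(transfer_bytes / slave_bytes))
    return beats * splits_per_beat


# Two 64-bit (8-byte) beats into a 32-bit (4-byte) slave.
print(downsized_data_cycles(2, 8, 4))  # -> 4
# Same burst into a slave that is at least as wide: no penalty.
print(downsized_data_cycles(2, 8, 8))  # -> 2
```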
In theory it is possible for the AW transfer to complete at the same time as the first W channel data transfer, but this is unlikely as you would normally need this first cycle to decode the AW transfer information to select the correct slave.
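To show what that overlap would save, here is a small sketch that lists which channel transfers complete in each cycle of the ideal case, with and without the AW transfer coinciding with the first W transfer. The function name and the `aw_overlaps_first_w` flag are hypothetical, and no wait states are modelled.

```python
def write_timeline(beats: int, aw_overlaps_first_w: bool = False) -> list:
    """Return, per cycle, the channel transfers completing in that cycle."""
    if beats < 1:
        raise ValueError("a burst needs at least one data beat")
    timeline = []
    if not aw_overlaps_first_w:
        timeline.append(["AW"])          # address gets a cycle of its own
    for beat in range(beats):
        channels = [f"W{beat}"]
        if aw_overlaps_first_w and beat == 0:
            channels.insert(0, "AW")     # address shares the first data cycle
        timeline.append(channels)
    timeline.append(["B"])               # write response after the last beat
    return timeline


for cycle, channels in enumerate(write_timeline(2, aw_overlaps_first_w=True), start=1):
    print(cycle, "+".join(channels))
```

With the overlap a 2-beat burst takes 3 cycles instead of 4, which is the theoretical floor rather than something to expect in practice.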