| Original audio | |
| Griffin-Lim (3 iterations) | |
| Griffin-Lim (50 iterations) | |
| Griffin-Lim (150 iterations) | |
| SPSI | |
| SPSI + Griffin-Lim (3 iterations) | |
| SPSI + Griffin-Lim (50 iterations) | |
| MCNN (baseline) | |
| MCNN (2 heads) | |
| MCNN (filter width of 9) | |
| MCNN (losses (1) and (2)) | |
| MCNN (loss (1) only) | |