
How to get stft_mag from a image? #2

Open
littleTwelve opened this issue Apr 4, 2018 · 6 comments

@littleTwelve

From your code, I can't tell how you convert the 512x512 image into an stft_mag, whose size differs from 512x512. How is that done?

@Michalaq
Member

Michalaq commented Apr 4, 2018

I suspect you are asking how I convert a 512x512 image to stft_mag, since stft_mag should be a 513x512 image (I cut the 0-frequency row of the picture for training and ignore the negative frequencies). You should look at the SoundSaver postprocessor:

def image_to_sound(self, image):

Basically, the OutputGenerator plugin feeds the output of the Generator network into some Postprocessors, which perform transformations on the output and save the results to disk.

For simplicity, the ImageSaver postprocessor does not pad the spectrograms with the missing values, since it wouldn't carry any more information.
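The padding-and-inversion step described above can be sketched roughly like this (a hypothetical illustration, not the project's actual SoundSaver code; the function name, `n_fft`, and `hop` values are assumptions, and a real reconstruction would recover phase with something like Griffin-Lim rather than assuming zero phase):

```python
import numpy as np

def image_to_sound_sketch(stft_mag_img, n_fft=1024, hop=256):
    """Hypothetical inverse of the preprocessing: the network outputs a
    512x512 magnitude image with the 0-frequency row cut, so we first pad
    that row back to get the full 513x512 magnitude, then invert with a
    naive zero-phase inverse STFT and overlap-add."""
    # Re-insert the dropped 0-frequency (DC) row as zeros -> 513 x 512
    full_mag = np.vstack([np.zeros((1, stft_mag_img.shape[1])), stft_mag_img])
    # Zero-phase inverse STFT: one irfft per frame (513 bins -> 1024 samples)
    frames = np.fft.irfft(full_mag.astype(complex), n=n_fft, axis=0)
    # Overlap-add the frames at the hop size to rebuild the waveform
    signal = np.zeros(hop * (full_mag.shape[1] - 1) + n_fft)
    for t in range(full_mag.shape[1]):
        signal[t * hop:t * hop + n_fft] += frames[:, t]
    return signal
```

The point is only the shape bookkeeping: 512 rows become 513 again before the inverse transform, which is why the saved pictures can stay 512x512 without losing information beyond the DC row.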

If you are curious about the creation of the spectrogram dataset, you should take a look here:

def load_file(self, item):

@littleTwelve
Author

Thanks!

@littleTwelve
Author

I think I misunderstood your code. I thought you trained the network with .jpg images. It seems that in the 'load_file' function you do not save the spectrogram as a .jpg image but use the spectrogram matrix directly as input to train the network. Am I right? @Michalaq

@Michalaq
Member

Michalaq commented Apr 4, 2018

Yes! I compute the spectrograms directly from the .wav files and deliver them as tensors to the input of the network. If you are looking for a way to train directly on pictures (which may, of course, be spectrograms), you can look into DefaultImageFolderDataset, which does exactly that: it takes a directory of image files and delivers tensors:

class DefaultImageFolderDataset(FolderDataset):

SoundImageDataset takes a directory of .wav files as input and delivers them as images (in tensors, of course), hence the name. And since most of the logic, except for load_file, is the same as for image processing, it subclasses DefaultImageFolderDataset.
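For readers following along, the .wav-to-spectrogram step of load_file might look roughly like this (a minimal numpy sketch under assumed parameters, not the repository's actual implementation, which does additional processing):

```python
import numpy as np

def load_file_sketch(wav, n_fft=1024, hop=256):
    """Hypothetical sketch of the dataset's load_file step: compute an STFT
    magnitude from a raw waveform and drop the 0-frequency row, so the
    result is a 512-row matrix the network can consume directly -- no .jpg
    round-trip, the matrix itself is the training input."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    # Slice the waveform into overlapping windowed frames: n_fft x n_frames
    frames = np.stack([wav[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)], axis=1)
    stft = np.fft.rfft(frames, axis=0)   # (n_fft // 2 + 1) x n_frames bins
    mag = np.abs(stft)                   # keep magnitude, discard phase
    return mag[1:]                       # cut the DC row: 512 x n_frames
```

With `n_fft=1024` the rfft yields 513 frequency bins, and cutting the DC row gives the 512-row image discussed above; a .wav long enough to produce 512 frames yields the square 512x512 input.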

@littleTwelve
Author

littleTwelve commented Apr 5, 2018

Thanks! In your experiments, do using spectrograms and using pictures produce different results? I mean the quality of the music.

@Michalaq
Member

I didn't run an experiment with this implementation that loads spectrograms saved as pictures. As long as the pictures are saved carefully, so that after loading they are always equivalent to the spectrograms scaled to [0,255] (which I do anyway in order to use SoundImageDataset's superclass methods for other processing), I don't see a reason why using pictures wouldn't work.
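The "saved carefully" caveat amounts to a reversible scaling plus a lossless image format. A hypothetical sketch of the round-trip (illustrative function names, not the project's code; note PNG, not JPEG, since lossy compression would break the equivalence, and the scaling bounds must be kept to invert it):

```python
import numpy as np

def mag_to_image(mag):
    """Scale a magnitude spectrogram into [0, 255] uint8 so it can be
    stored as a picture. Returns the scaling bounds needed to invert."""
    lo, hi = mag.min(), mag.max()
    img = np.round((mag - lo) / (hi - lo) * 255).astype(np.uint8)
    return img, lo, hi

def image_to_mag(img, lo, hi):
    """Invert the scaling after loading the picture back from disk."""
    return img.astype(np.float64) / 255 * (hi - lo) + lo
```

The 8-bit quantization introduces an error of at most half a grey level, i.e. (hi - lo) / 510 per bin, which is the price of the picture representation relative to training on the float matrices directly.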
