
How to get stft_mag from a image? #2

Open
littleTwelve opened this issue Apr 4, 2018 · 6 comments

@littleTwelve

From your code, I can't tell how you convert the 512x512 image into an stft_mag, whose size differs from 512x512. How is that done?

@Michalaq
Member

Michalaq commented Apr 4, 2018

I suspect you are asking how I convert a 512x512 image to stft_mag, since stft_mag should be a 513x512 image (I cut the 0-frequency row of the picture for training and ignore the negative frequencies). You should look at the SoundSaver postprocessor:

def image_to_sound(self, image):

Basically, the OutputGenerator plugin feeds the output of the Generator network into some Postprocessors, which perform transformations on the output and save the results to disk.

For simplicity, the ImageSaver postprocessor does not pad the spectrograms with the missing values, since it wouldn't carry any more information.
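The padding-and-inversion step described above can be sketched roughly like this (a hypothetical illustration, not the project's actual SoundSaver code; the function name, `n_fft`, and `hop` values are assumptions, and a real reconstruction would recover phase with something like Griffin-Lim rather than assuming zero phase):

```python
import numpy as np

def image_to_sound_sketch(stft_mag_img, n_fft=1024, hop=256):
    """Hypothetical inverse of the preprocessing: the network outputs a
    512x512 magnitude image with the 0-frequency row cut, so we first pad
    that row back to get the full 513x512 magnitude, then invert with a
    naive zero-phase inverse STFT and overlap-add."""
    # Re-insert the dropped 0-frequency (DC) row as zeros -> 513 x 512
    full_mag = np.vstack([np.zeros((1, stft_mag_img.shape[1])), stft_mag_img])
    # Zero-phase inverse STFT: one irfft per frame (513 bins -> 1024 samples)
    frames = np.fft.irfft(full_mag.astype(complex), n=n_fft, axis=0)
    # Overlap-add the frames at the hop size to rebuild the waveform
    signal = np.zeros(hop * (full_mag.shape[1] - 1) + n_fft)
    for t in range(full_mag.shape[1]):
        signal[t * hop:t * hop + n_fft] += frames[:, t]
    return signal
```

The point is only the shape bookkeeping: 512 rows become 513 again before the inverse transform, which is why the saved pictures can stay 512x512 without losing information beyond the DC row.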

If you are curious about the creation of the spectrogram dataset, you should take a look here:

def load_file(self, item):

@littleTwelve
Author

Thanks!

@littleTwelve
Author

I think I misunderstood your code. I thought you trained the network with .jpg images. It seems that in the 'load_file' function you do not save the spectrogram as a .jpg image but use the spectrogram matrix directly as input to train the network. Am I right? @Michalaq

@Michalaq
Member

Michalaq commented Apr 4, 2018

Yes! I compute the spectrograms directly from the .wav files and deliver them as tensors to the input of the network. If you are looking for a way to train directly on pictures (which may, of course, be spectrograms), you can look into DefaultImageFolderDataset, which does exactly that: it takes a directory of image files and delivers tensors:

class DefaultImageFolderDataset(FolderDataset):

SoundImageDataset takes a directory of .wav files as input and delivers them as images (in tensors, of course), hence the name. And since most of the logic, except for load_file, is the same as for image processing, it subclasses DefaultImageFolderDataset.
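For readers following along, the .wav-to-spectrogram step of load_file might look roughly like this (a minimal numpy sketch under assumed parameters, not the repository's actual implementation, which does additional processing):

```python
import numpy as np

def load_file_sketch(wav, n_fft=1024, hop=256):
    """Hypothetical sketch of the dataset's load_file step: compute an STFT
    magnitude from a raw waveform and drop the 0-frequency row, so the
    result is a 512-row matrix the network can consume directly -- no .jpg
    round-trip, the matrix itself is the training input."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    # Slice the waveform into overlapping windowed frames: n_fft x n_frames
    frames = np.stack([wav[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)], axis=1)
    stft = np.fft.rfft(frames, axis=0)   # (n_fft // 2 + 1) x n_frames bins
    mag = np.abs(stft)                   # keep magnitude, discard phase
    return mag[1:]                       # cut the DC row: 512 x n_frames
```

With `n_fft=1024` the rfft yields 513 frequency bins, and cutting the DC row gives the 512-row image discussed above; a .wav long enough to produce 512 frames yields the square 512x512 input.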

@littleTwelve
Author

littleTwelve commented Apr 5, 2018

Thanks! In your experiments, do using spectrograms and using pictures produce different results? I mean the quality of the music.

@Michalaq
Member

I didn't run an experiment with this implementation that loads spectrograms saved as pictures. As long as the pictures are saved carefully, so that after loading they are always equivalent to the spectrograms scaled to [0,255] (which I do anyway in order to use SoundImageDataset's superclass methods for other processing), I don't see a reason why using pictures wouldn't work.
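The "saved carefully" caveat amounts to a reversible scaling plus a lossless image format. A hypothetical sketch of the round-trip (illustrative function names, not the project's code; note PNG, not JPEG, since lossy compression would break the equivalence, and the scaling bounds must be kept to invert it):

```python
import numpy as np

def mag_to_image(mag):
    """Scale a magnitude spectrogram into [0, 255] uint8 so it can be
    stored as a picture. Returns the scaling bounds needed to invert."""
    lo, hi = mag.min(), mag.max()
    img = np.round((mag - lo) / (hi - lo) * 255).astype(np.uint8)
    return img, lo, hi

def image_to_mag(img, lo, hi):
    """Invert the scaling after loading the picture back from disk."""
    return img.astype(np.float64) / 255 * (hi - lo) + lo
```

The 8-bit quantization introduces an error of at most half a grey level, i.e. (hi - lo) / 510 per bin, which is the price of the picture representation relative to training on the float matrices directly.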
