How to get stft_mag from a image? #2
Comments
I suspect you are asking how to convert a 512x512 image to stft_mag, since stft_mag should be a 513x512 image (for training I cut the 0-frequency bin from the picture and ignore the negative frequencies). You should look at the SoundSaver postprocessor: pggan-pytorch/output_postprocess.py Line 107 in dab2ec7
Basically, the OutputGenerator plugin feeds the output of the Generator network to a set of Postprocessors, which transform the output and save the results to disk. For simplicity, the ImageSaver postprocessor does not pad the spectrograms with the missing values, since that wouldn't add any information. If you are curious how the spectrogram dataset is created, take a look here: Line 285 in dab2ec7
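To make the padding step concrete, here is a minimal sketch of turning a 512x512 generator image back into a 513x512 magnitude array by putting the dropped 0-frequency bin back. It is not the SoundSaver code. The helper name `pad_dc_bin`, the choice to fill the restored row with zeros, and the assumption that frequency runs along axis 0 with the 0-frequency bin at row 0 are all illustrative guesses. Undoing any value scaling applied for training is not shown.

```python
import numpy as np


def pad_dc_bin(image: np.ndarray) -> np.ndarray:
    """Restore the dropped 0-frequency row so a (512, T) image becomes (513, T)."""
    if image.ndim != 2:
        raise ValueError("expected a 2-D array of shape (freq_bins, frames)")
    # Assumption: frequency runs along axis 0 and the 0-frequency bin sits at row 0.
    # Assumption: the missing bin is filled with zeros (no DC component).
    dc_row = np.zeros((1, image.shape[1]), dtype=image.dtype)
    return np.concatenate([dc_row, image], axis=0)


image = np.random.rand(512, 512).astype(np.float32)  # stand-in for a generator output
stft_mag = pad_dc_bin(image)
print(stft_mag.shape)  # (513, 512)
```

513 bins is what a one-sided transform of size 1024 produces (1024 / 2 + 1), which matches ignoring the negative frequencies.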
Thanks!
I think I misunderstood your code. I thought you trained the network on .jpg images. It seems that the 'load_file' function does not save the spectrogram as a .jpg image but uses the spectrogram matrix directly as input to train the network. Am I right? @Michalaq
Yes! I compute the spectrograms directly from the .wav files and deliver them as tensors to the network. If you are looking for a way to train directly on pictures (which may as well be spectrograms, of course), you can look into DefaultImageFolderDataset, which does exactly that: it takes a directory of image files and delivers tensors: Line 209 in dab2ec7
SoundImageDataset takes a directory of .wav files as input and delivers them as images (in tensors, of course), hence the name. Since most of the logic apart from load_file is the same as for image processing, it subclasses DefaultImageFolderDataset.
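As a rough illustration of going from a waveform to a 512x512 training image, the sketch below computes a one-sided STFT magnitude with NumPy, drops the 0-frequency bin, and scales the result to [0, 255]. It is not the repository's load_file. The function names, the FFT size of 1024, the hop of 256, the Hann window, and the linear min-max scaling are all assumptions chosen so the shapes match the ones discussed in this thread.

```python
import numpy as np


def stft_magnitude(signal: np.ndarray, n_fft: int = 1024, hop: int = 256) -> np.ndarray:
    """One-sided STFT magnitude with shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # rfft keeps only the non-negative frequencies: n_fft // 2 + 1 bins per frame.
    return np.abs(np.fft.rfft(frames, axis=1)).T


def to_training_image(mag: np.ndarray) -> np.ndarray:
    """Drop the 0-frequency row and scale the rest linearly to [0, 255]."""
    image = mag[1:, :]
    lo, hi = image.min(), image.max()
    # The small epsilon avoids division by zero for a constant input.
    return (image - lo) / (hi - lo + 1e-12) * 255.0


# Synthetic 440 Hz tone, long enough for exactly 512 frames.
sample_rate = 16000
n_samples = 1024 + 511 * 256
t = np.arange(n_samples) / sample_rate
signal = np.sin(2 * np.pi * 440.0 * t)

image = to_training_image(stft_magnitude(signal))
print(image.shape)  # (512, 512)
```

In a dataset class, an array like this would then be wrapped in a tensor, so the rest of the image pipeline can treat it like any other single-channel picture.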
Thanks! In your experiments, does training on spectrograms versus pictures produce different results? I mean in terms of the quality of the music.
I haven't run an experiment that loads spectrograms saved as pictures with this implementation. If the pictures are saved carefully, so that after loading they are always equivalent to the spectrograms scaled to [0,255] (which I do anyway in order to use the methods of SoundImageDataset's superclass for other processing), I don't see a reason why using pictures wouldn't work.
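To get a feel for what "saved carefully" costs, the sketch below scales a spectrogram to [0, 255], rounds it to 8-bit pixels as an image file would store it, and maps it back. The helper names and the linear min-max scaling are assumptions for illustration, not code from the repository. With this scheme the rounding error is at most half a quantization step, that is (hi - lo) / 510. A lossless format such as PNG would preserve these pixel values exactly, whereas lossy JPEG compression would add further error on top.

```python
import numpy as np


def to_uint8(spec: np.ndarray):
    """Scale linearly to [0, 255] and round to 8-bit pixel values."""
    lo, hi = float(spec.min()), float(spec.max())
    if hi == lo:
        raise ValueError("constant spectrogram cannot be min-max scaled")
    scaled = (spec - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8), lo, hi


def from_uint8(pixels: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Invert the scaling; lo and hi must be stored alongside the picture."""
    return pixels.astype(np.float64) / 255.0 * (hi - lo) + lo


rng = np.random.default_rng(0)
spec = rng.random((512, 512))  # stand-in for a real spectrogram

pixels, lo, hi = to_uint8(spec)
restored = from_uint8(pixels, lo, hi)
max_error = np.abs(restored - spec).max()
half_step = (hi - lo) / 510.0
print(max_error <= half_step + 1e-9)  # True
```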
From your code, I can't tell how you convert the 512x512 image to a stft_mag, whose size is different from 512x512.