You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I noticed some unexpected behavior while testing dendrograms on random 1D spectra, to learn about how the algorithm worked. The file given is just an array of Gaussian noise with stdev 1, with a Gaussian function added with mean 50, amplitude 6, and stdev 2. I made a large number of varying spectra and collected examples of failed detections; some of them were too broken up by noise, which makes sense, but the rest had this sort of profile:
Loading that file into an array and running Dendrogram.compute(spectrum, min_value=3, min_delta=1, min_npix=3), I get an empty dendrogram, because only one structure is detected at all, and it fails the independence test for min_delta. Lines 84-91 in pruning.py:
def result(structure, index=None, value=None):
if value is None:
if structure.parent is not None:
return (structure.height - structure.parent.height) >= delta
return (structure.vmax - structure.vmin) >= delta
return (structure.vmax - value) >= delta
return result
Here, structure.parent is None because the collection of points above min_value=3, at x=48 through x=51, forms the only structure, so line 89 compares the highest (5.25) and lowest (5.03) value in the structure and returns False.
I might have misunderstood the algorithm example in the documentation, but why this line compares vmax and vmin isn't very clear to me. My understanding was that the purpose of min_delta was to set a cutoff for distinguishing a substructure from noise. Why would a large-enough structure far above the min_value threshold not ought to be counted if it happens to be alone and to have a plateauing profile? This is just the sort of shape that can appear upon sampling a noisy Gaussian; I expect it might occur in the 2D case as well as 1D. Is this behavior intentional, and if so, what's the reasoning behind it?
The text was updated successfully, but these errors were encountered:
I think you're right, this sort of object should be included, and I think you've pinpointed a line that incorrectly excludes it. This warrants a deeper review. I'd like to hear from others whether there's any good explanation for this.
I also suggest we add this example to the test suite as a regression test.
I noticed some unexpected behavior while testing dendrograms on random 1D spectra, to learn about how the algorithm worked. The file given is just an array of Gaussian noise with stdev 1, with a Gaussian function added with mean 50, amplitude 6, and stdev 2. I made a large number of varying spectra and collected examples of failed detections; some of them were too broken up by noise, which makes sense, but the rest had this sort of profile:
bad_spectrum_file.txt
Loading that file into an array and running
Dendrogram.compute(spectrum, min_value=3, min_delta=1, min_npix=3)
, I get an empty dendrogram, because only one structure is detected at all, and it fails the independence test for min_delta. Lines 84-91 in pruning.py:Here, structure.parent is None because the collection of points above min_value=3, at x=48 through x=51, forms the only structure, so line 89 compares the highest (5.25) and lowest (5.03) value in the structure and returns False.
I might have misunderstood the algorithm example in the documentation, but why this line compares vmax and vmin isn't very clear to me. My understanding was that the purpose of min_delta was to set a cutoff for distinguishing a substructure from noise. Why would a large-enough structure far above the min_value threshold not ought to be counted if it happens to be alone and to have a plateauing profile? This is just the sort of shape that can appear upon sampling a noisy Gaussian; I expect it might occur in the 2D case as well as 1D. Is this behavior intentional, and if so, what's the reasoning behind it?
The text was updated successfully, but these errors were encountered: