Ondřejov Spectra Dataset

The Ondřejov dataset contains 12,936 labeled stellar spectra from the Ondřejov CCD700 archive. The spectra were observed with the Ondřejov Perek 2m Telescope.

The code used to generate this dataset is in the podondra/ondrejov-dataset GitHub repository.

Context

The dataset was created to support the discovery of emission-line spectra in the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST) survey. The main idea was to use the Ondřejov dataset to train a machine learning algorithm and, in combination with domain adaptation, find interesting objects in the large LAMOST spectral archive.

Contents

The dataset contains the following columns:

id: a unique identifier (FITS file name)
label: assigned class
object: title of the observation
ra: right ascension
dec: declination
expval: exposure value in photon counts [Mcounts]
gratang: diffraction grating angle
detector: name of the detector
chipid: name of the CCD chip
specfilt: spectral filter
date-obs: UTC date of the start of the observation
dichmir: dichroic mirror number
fluxes: 140 columns of fluxes sampled uniformly between 6519 and 6732 Angstroms

Classes

Spectra are divided into three classes according to the profile of the H-alpha spectral line:

absorption: 6102 spectra (47.17%)
emission: 5301 spectra (40.98%)
double-peak: 1533 spectra (11.85%)
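
The class percentages follow directly from the counts above, and the three classes sum to the 12,936 spectra in the dataset. A short check, using only the numbers listed here:

```python
# Class counts as listed in this README
counts = {"absorption": 6102, "emission": 5301, "double-peak": 1533}

total = sum(counts.values())  # total number of spectra
# Share of each class in percent, rounded to two decimal places
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

print(total)   # 12936
print(shares)  # {'absorption': 47.17, 'emission': 40.98, 'double-peak': 11.85}
```

The double-peak class is roughly four times smaller than the absorption class, which is worth keeping in mind when training a classifier on this dataset.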

Preprocessing

This section describes all preprocessing methods applied to each spectrum, because the original FITS files containing the raw spectra cannot be provided.

Air to Vacuum Wavelength Conversion

Spectra from the Ondřejov CCD700 archive use air wavelengths, but LAMOST spectra use vacuum wavelengths. Therefore, the Ondřejov spectra were converted to vacuum wavelengths according to the formulas provided on the Vienna Atomic Line Database (VALD) wiki.
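
A minimal sketch of the conversion, assuming the air-to-vacuum formula attributed to N. Piskunov on the VALD wiki (the coefficients below are reproduced as we understand them and should be checked against the wiki). It computes the refractive index of air from the wavenumber s = 10^4 / λ_air (in inverse micrometres) and multiplies the air wavelength by it. The function name `air_to_vacuum` is hypothetical and not taken from the dataset's code.

```python
import numpy as np


def air_to_vacuum(wavelength_air):
    """Convert air wavelengths [Angstrom] to vacuum wavelengths [Angstrom].

    ASSUMPTION: uses the air-to-vacuum formula (N. Piskunov) as given on
    the VALD wiki; verify the coefficients there before relying on them.
    """
    wl = np.asarray(wavelength_air, dtype=float)
    s = 1e4 / wl  # wavenumber in inverse micrometres
    s2 = s ** 2
    # Refractive index of air as a function of wavenumber
    n = (1.0 + 0.00008336624212083
         + 0.02408926869968 / (130.1065924522 - s2)
         + 0.0001599740894897 / (38.92568793293 - s2))
    return wl * n  # vacuum wavelength = air wavelength * refractive index


print(air_to_vacuum(6562.80))  # H-alpha: roughly 6564.61 Angstrom in vacuum
```

The shift near H-alpha is about 1.8 Angstroms, which is comparable to the sampling of the final 140-point wavelength grid, so skipping the conversion would misalign the line relative to LAMOST spectra.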

Gaussian Blur

The spectral resolving power of the LAMOST spectrograph is between 500 and 1800, which is much lower than the resolving power of 13,000 of the Ondřejov spectrograph at H-alpha. To reduce this difference, the spectra in the dataset were blurred with a Gaussian filter with a standard deviation of 7.
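
The blur can be sketched with SciPy's one-dimensional Gaussian filter. This is an illustration only: it assumes the standard deviation of 7 is measured in array elements (pixels) of the spectrum, which this README does not state, and that edges are handled by repeating the nearest value. The helper name `blur_spectrum` is hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def blur_spectrum(flux, sigma=7.0):
    """Lower the effective resolution of a spectrum by Gaussian convolution.

    ASSUMPTION: sigma is in pixels (array elements); mode="nearest" pads
    the edges by repeating the first and last flux values.
    """
    flux = np.asarray(flux, dtype=float)
    return gaussian_filter1d(flux, sigma=sigma, mode="nearest")


# A narrow synthetic emission line becomes lower and wider after blurring
flux = np.ones(201)
flux[100] = 3.0
blurred = blur_spectrum(flux)
print(flux.max(), blurred.max())
```

Because the Gaussian kernel is normalized, the continuum level and the integrated line flux are preserved while the line profile is smoothed out.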

Resampling

Machine learning algorithms require every input to be described by the same set of features. To obtain the same features for all spectra, each spectrum is resampled so that its fluxes are measured at the same wavelengths as in every other spectrum. It is then straightforward to build a design matrix in which each row is a spectrum and the columns contain fluxes at 140 wavelengths spaced uniformly between 6519 and 6732 Angstroms.
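
A minimal sketch of the resampling and of building the design matrix. It assumes linear interpolation with `numpy.interp` and a grid that includes both 6519 and 6732 Angstroms as endpoints; neither detail is confirmed by this README, and the names `WAVELENGTH_GRID`, `resample` and `design_matrix` are hypothetical.

```python
import numpy as np

# ASSUMPTION: 140 uniformly spaced wavelengths including both endpoints
WAVELENGTH_GRID = np.linspace(6519.0, 6732.0, 140)


def resample(wavelength, flux, grid=WAVELENGTH_GRID):
    """Linearly interpolate one spectrum onto the common wavelength grid.

    ASSUMPTION: linear interpolation; `wavelength` must be increasing.
    np.interp repeats the edge values outside the observed range, so the
    spectrum should cover 6519-6732 Angstroms.
    """
    return np.interp(grid, wavelength, flux)


def design_matrix(spectra, grid=WAVELENGTH_GRID):
    """Stack resampled spectra: one row per spectrum, one column per wavelength."""
    return np.vstack([resample(wl, fl, grid) for wl, fl in spectra])


# Two synthetic spectra sampled on different wavelength grids
wl_a = np.linspace(6500.0, 6750.0, 1000)
wl_b = np.linspace(6510.0, 6740.0, 777)
spectra = [(wl_a, np.ones_like(wl_a)), (wl_b, 2.0 * np.ones_like(wl_b))]
X = design_matrix(spectra)
print(X.shape)  # (2, 140)
```

After resampling, column j of the matrix refers to the same wavelength for every spectrum, so the 140 fluxes can be used directly as features.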

Contact

Ondřej Podsztavek [email protected]
Petr Škoda [email protected]
