AstroBinUpload.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""

Release history
--------------------------------------------------------------------------------------------
Version 1.0.4
6th December 2023 changes

1.No longer required to manually create configuration csv files. 
    Checks if csv files exist and creates them if they do not exist. 
    Default values can still be edited in csv files but also in the configurations dictionary at the start of the code. 
    Default keywords changed to lower case
    To ensure csv edits don't cause issues the data read from files is: 
        stripped of leading and trailing spaces
        keywords are converted to upper case to match FITS header keywords
        column names are converted to lower case to ensure code can work with them
        corrected data frames are saved back to csv files to format issues are resolved
2. Improved extract header function 
    converts floats to 4 decimal places
    converts dates to format %Y-%m-%d, rounds input to microseconds to ensure conversion works
    creates a subset of the header data that matches AstroBin requirements
3.Sites.csv file latitudes and longitudes saved with 4 decimal places but processed to 2 decimal 
    places to ensure the same site is not recorded multiple times.
4. Corrected issue with Bortle and SQM values not being updated correctly
5. Corrected issue with Keywords from .XISF files not being read correctly 
6. Improved code to correct file data reading and saving logic
7. Runtime option to stop program if new csv files are created to edit them.
8. Corrected program logic related to import, access and storage of external parmeters.
9. Refactored code to improve readability
10. Updated docstrings
11. Works with files generated by both Sequence Generator Pro (SGP) and NINA (.FITS, .FIT, .FTS, .XISF) 
12. Looks for filter in FITS headers and converts them to 5 digit codes used by AstroBin ( use to be four digit codes)
--------------------------------------------------------------------------------------------
Version 1.0.3
27th November 2023 changes

1. Handles pre and post text spaces in data from csv files
2. Process both FITS and XIFS files or a mixture of both
3. Focal ratio now extracted from header and reported.
4. Exports a session summary report

--------------------------------------------------------------------------------------------
Version 1.0.2
24th November 2023 changes

1. Changes to how the code handles missing Keywords from FITS headers.
2. The code use a defaults.csv to enable the user to configure values for missing keywords. 
   These default keywords are then applied to all missing header keywords allowing for a more complete upload of information to AstroBin.
   The changes attempt to make the code agnostic to the types of FITS headers processed.
3. HFR recovered from the defaults.csv file, instead of a command line entry.

--------------------------------------------------------------------------------------------
Version 1.0.1
23rd November 2023 changes

1. Code checks last LIGHT frame to determine if FITS was generated by NINA

--------------------------------------------------------------------------------------------
Version 1.0.0
23rd November 2023

1. Initial release
--------------------------------------------------------------------------------------------

acqusition.csv uploader see (https://welcome.astrobin.com/importing-acquisitions-from-csv/)

This implementation is not endorsed nor related with AstroBin development team.

Copyright (C) 2023 Steve Greaves

This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, version 3 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
more details.

You should have received a copy of the GNU General Public License along with
this program.  If not, see <http://www.gnu.org/licenses/>.
"""

__version__ = "1.0.4"

import pandas as pd
import os
import sys
from astropy.io import fits
import struct
import xml.etree.ElementTree as ET
import requests
import math
import re
from datetime import datetime
import numpy as np

"""
   
    Configuration dictionary used to create the default values csv file. You can edit these values or modify the csv files created to suit your needs.
    
    The 'configurations' dictionary is organized into four key sections:

    1. 'defaults': Contains default values for various parameters related to astronomical imaging.
       - 'key': Lists the parameters in the header files required by AstroBin, do not change these keys.
       - 'value': Provides the default values for each corresponding key, thes can be modified to suit your equipment setup.
       - 'comment': Offers a brief description or comment for each parameter.

    2. 'filters': Maps  your astronomical filters to their respective codes. Ensure the names match those that your image creation package uses and ensure that the Astrobin codes are correct for your filters. 
        The default codes are for an 2 inch Astronomik LRGB and narrowband set.
       - 'filter': Lists the names of the filters (e.g., 'Ha', 'SII', 'OIII', etc.).
       - 'code': Provides the corresponding AstroBin code for each filter.

    3. 'secret': Stores sensitive information for API access. You will have to edit this section to include your own API key. 
       - 'api key': The key required for accessing the relevant API.
       - 'api endpoint': The URL of the API endpoint.

    4. 'sites': Holds information about observation sites. This is updated automatically by the script when a new site is encountered.
       - 'latitude', 'longitude': The geographical coordinates of the site.
       - 'bortle', 'sqm': Bortle scale classification and Sky Quality Meter reading for the site.

    This dictionary is instrumental in initializing, validating, and processing data in the astronomical
    data analysis pipeline. It ensures the dat obtaine dcan be uploaded sucessfully to AstroBin.
    """


configurations = {
    'defaults': {
        'key': ['IMAGETYP','EXPOSURE', 'DATE-LOC', 'XBINNING', 'GAIN', 'XPIXSZ', 'CCD-TEMP', 'FOCALLEN', 'FOCRATIO', 'SITELAT', 'SITELONG', 'FILTER', 'OBJECT', 'FOCTEMP', 'SWCREATE','HFR'],
        'value': ['LIGHT','100', '2023-01-01', '1', '0', '1', '-10', '540', '5.4', '52.25', '-0.12', 'No Filter', 'No target', '20','Unknown package', '1.6'],
        'comment': ['Exposure type','Exposure time in seconds', 'Observation date', 'Camera binning', 'Camera gain', 'Camera pixel size in um', 'Camera sensor temperature in degrees C', 'Telescope focal length in mm', 'Telescope focal ratio', 'Observation site latitude in decimal degrees', 'Observation site longitude in decimal degrees', 'Filter name', 'Target name', 'Ambient temperature in degrees C as measure by the focuser','Creation package', 'Half-flux radius in pixels']
    },
    'filters': {
        'filter': ['Ha', 'SII', 'OIII', 'Red', 'Green', 'Blue', 'Lum', 'CLS'],
        'code': [4663, 4844, 4752, 4649, 4643, 4637, 2906, 4061]
    },
    'secret': {
        'api key': 'xxxxxxxxxx', # enter you API key here
        'api endpoint': 'https://www.lightpollutionmap.info/QueryRaster/'
    },
    'sites': {
        'latitude': '',
        'longitude': '',
        'bortle': '',
        'sqm': ''
    }
}

def read_or_create_csv(dictionaries):
    """
    Reads from or creates CSV files based on the input dictionaries.

    This function iterates over each dictionary provided in the input 'dictionaries'.
    For each dictionary, the function checks if a corresponding CSV file (named after the dictionary) exists.
    If the file exists, it reads the CSV file into a pandas DataFrame.
    If the file does not exist, it creates a new CSV file from the dictionary data, ensuring to:
      - Convert scalar values to single-item lists.
      - Replace NaN values with an empty string.
    It also strips whitespaces from string columns and converts keys to lower case in the DataFrame and 
    checks that the defaults 'key' column is upper case to match the header files key words.

    After processing, it saves any changes to the CSV file and updates the DataFrame in the output dictionary.

    Parameters:
    - dictionaries (dict): A dictionary where keys are the names for the CSV files to be read or created,
                            and values are dictionaries containing the data for the corresponding CSV file.

    Returns:
    - tuple: 
        - A dictionary of DataFrames corresponding to each input dictionary.
        - A boolean flag indicating whether any new CSV file was created during the function's execution.
    """
    # [Function implementation]

    dataframes = {}
    file_created = False  # Flag to track if any file is created

    # Iterate over each dictionary in the input
    for dictionary_name, dictionary_data in dictionaries.items():
        csv_file = f"{dictionary_name}.csv"
        
        # Convert scalar values to single-item lists
        for key, value in dictionary_data.items():
            if not isinstance(value, list):
                dictionary_data[key] = [value]
        
        # Check if the CSV file already exists
        if os.path.exists(csv_file):
            # If it exists, read the DataFrame from the CSV file
            df = pd.read_csv(csv_file)
            print('Reading', csv_file)
        else:
            # If it doesn't exist, create a new DataFrame from the dictionary data
            df = pd.DataFrame(dictionary_data)
            print(f"File '{csv_file}' was missing, so it was created.")
            file_created = True  # Set the flag to True as a file was created
            
            # Replace NaN values with an empty string
            df = df.fillna('')
        
            # Save the DataFrame to the CSV file
            df.to_csv(csv_file, index=False)
        
        # Strip whitespaces from string columns (object dtype)
        df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)

        # Strip whitespaces from column names and make them lower case
        df.columns = df.columns.str.strip().str.lower()
        if dictionary_name == 'defaults':
            # Ensure the 'key' column is upper case to match the header files key words
            df['key'] = df['key'].str.upper()

        if dictionary_name == 'sites':
            # Ensure the 'key' column is upper case to match the header files key words
            df = df.replace('', np.nan)
    
        # Check if any changes were made (i.e., if there were any spaces stripped)
        if df.to_csv(index=False) != pd.read_csv(csv_file).to_csv(index=False):
            # Save the DataFrame back to the CSV file to correct errored input data
             df.to_csv(csv_file, index=False)
    
        dataframes[dictionary_name] = df

    return dataframes, file_created

def read_xisf_header(file_path):
        """
        Reads the header of an XISF file.

        Opens and reads the XISF file specified by 'file_path'. It checks for the XISF signature,
        reads the header, and returns it as a string. If the file is not a valid XISF file or
        an error occurs, it returns None.

        Parameters:
        - file_path (str): The path to the XISF file.

        Returns:
        - str or None: The XISF header as a string, or None if the file is invalid or an error occurs.
        """
        # Function implementation
        try:
            with open(file_path, 'rb') as file:
                signature = file.read(8).decode('ascii')
                # Check for the XISF signature
                if signature != 'XISF0100':
                    print("Invalid file format")
                    return None
                # Read and skip header length and reserved field
                header_length = struct.unpack('<I', file.read(4))[0]
                file.read(4)  # Skip reserved field
                xisf_header = file.read(header_length).decode('utf-8')
                
                return xisf_header

        except Exception as e:
            print(f"Error: {e}")
            return None
        
def xml_to_data(xml_data):
        """
        Converts XML data to a dictionary.

        Parses XML data, specifically extracting 'FITSKeyword' tags, and converts them into
        a dictionary with 'name' as keys and 'value' as values.

        Parameters:
        - xml_data (str): A string containing XML data.

        Returns:
        - dict: A dictionary containing data extracted from XML.
        """
        # Function implementation

        # Register the namespace
        ns = {'xisf': 'http://www.pixinsight.com/xisf'}
        ET.register_namespace('', ns['xisf'])

        # Parse the XML data
        root = ET.fromstring(xml_data)

        # Create a list to store our data
        data = {}

        # Iterate through each 'FITSKeyword' tag in the XML
        for fits_keyword in root.findall('.//xisf:FITSKeyword', namespaces=ns):
            #print(fits_keyword)
            name = fits_keyword.get('name')
            value = fits_keyword.get('value')

            # Add the 'name' and 'value' to the dictionary
            data[name] = value

        # Convert the list to a DataFrame
        #df = pd.DataFrame(data)

        return data

def sync_headers(default_header, fits_header):
        """
        Synchronizes FITS headers with default headers.

        Takes two dictionaries, 'default_header' and 'fits_header'. It ensures that the keys
        in 'fits_header' match those in 'default_header', filling in missing values from
        'default_header' as needed.

        Parameters:
        - default_header (dict): A dictionary containing default header values.
        - fits_header (dict): A dictionary containing FITS header values.

        Returns:
        - dict: A dictionary representing the synchronized FITS header.
        """
        # Function implementation

        # Initialize an empty dictionary
        updated_fits_header = {}

        # Add entries from fits_header that also exist in default_header
        for k, v in fits_header.items():
            if k in default_header['value'].index:
                updated_fits_header[k] = v

        # Add entries from default_header that don't exist in updated_fits_header
        for k, v in default_header['value'].items():
            if k not in updated_fits_header:
                updated_fits_header[k] = v

        return updated_fits_header

def try_parse_date(s):
    '''
    Attempts to parse a string as a date in the format '%Y-%m-%dT%H:%M:%S.%f' and returns it as '%d-%m-%Y'.
    If it fails, it raises a ValueError.

    Parameters:
    s (str): The string to parse.

    Returns:
    str: The parsed date string in the format '%Y-%m-%d'.
    
    Raises:
    ValueError: If the string cannot be parsed as a date.
    '''
    try:
        # Truncate to microsecond precision
        s = s[:26]
        return datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f').strftime('%Y-%m-%d')
    except ValueError:
        raise ValueError("Could not parse date")

def dms_to_decimal(dms_str):
    '''
    Converts a string in the format 'degrees minutes seconds' to a decimal degree value.

    Parameters:
    dms_str (str): The string to convert.

    Returns:
    float: The converted decimal degree value.
    '''
    match = re.match(r"([+-]?\d+)\s+(\d+)\s+([\d\.]+)", dms_str)
    if match:
        degrees, minutes, seconds = map(float, match.groups())
        return round(abs(degrees) + minutes / 60 + seconds / 3600, 4) * (1 if degrees > 0 else -1)
    else:
        return dms_str

def round_floats_and_convert_datetime_in_dict(d):
    '''
    Iterates over a dictionary, rounds float values to 4 decimal places, and converts datetime strings to the format '%d-%m-%Y'.

    Parameters:
    d (dict): The dictionary to process.

    Returns:
    dict: The processed dictionary with rounded float values and converted datetime strings.
    '''
    for key, value in d.items():
        try:
            # Try to convert to float and round
            d[key] = round(float(value), 4)
        except ValueError:
            # If it's not a float, try to convert it to a date
            if isinstance(value, str):
                try:
                    d[key] = try_parse_date(value)
                except ValueError:
                    # If it's not a date, try to convert it from DMS to decimal degrees
                    d[key] = dms_to_decimal(value)
                    
    return d


def extract_headers(directories, default_values):
    """
    Extracts headers from FITS and XISF files in the specified directories.

    Parameters:
    - directories (list): A list of directories to search for files.
    - default_values (DataFrame): A DataFrame containing default header values.

    Returns:
    - list: A list of dictionaries representing the processed headers.
    """
    # Start of function extract_headers
    headers = []  # List to store the processed headers

    # Set 'key' as the index in default_values DataFrame if not already set
    try:
        default_values.set_index('key', inplace=True)
    except Exception as e:
        pass
        #print(f"Index setting issue: {e}")
    
    # Convert default_values DataFrame to a dictionary for easy lookup
    default_header = dict(default_values)

    # Iterate over each directory provided
    for directory in directories:
        # Walk through the directory
        for root, _, files in os.walk(directory):
            print(f"Extracting headers from directory: {root}")

            # Process each file in the directory
            for file in files:
                file_path = os.path.join(root, file)  # Full path of the file
                
                # Check file extension and process accordingly
                if file.lower().endswith(('.fits', '.fit', '.fts', '.xisf')):
                    
                    try:
                        if file.lower().endswith(('.fits','.fit','.fts')):
                            # Open FITS file and extract header
                            with fits.open(file_path) as hdul:
                                header = hdul[0].header
                        elif file.lower().endswith('.xisf'):
                            # Read and parse XISF file header
                            header_xml = read_xisf_header(file_path)
                            header = xml_to_data(header_xml)

                        # Convert header to dictionary and process
                        header_dict = dict(header)
                        # Synchronize header with default values and add file name
                        reduced_header_dict = sync_headers(default_header, header_dict)
                        # Round floats and converts datetime string in dictionary
                        reduced_header_dict = round_floats_and_convert_datetime_in_dict(reduced_header_dict)

                        reduced_header_dict['FILENAME'] = os.path.basename(file_path)

                        # Accumulate processed headers
                        headers.append(reduced_header_dict)

                    except Exception as e:
                        print(f"Error reading {file_path}: {e}")

    # Extract software capture information from the last processed header
    
    swcreate = header.get('CREATOR', header.get('SWCREATE', 'unknown package')) if header else 'unknown package'
    print(f"\nImages captured by {swcreate}")

    return pd.DataFrame(headers)  # Return list of dataFrames representing the processed headers

def format_seconds_to_hms(seconds):
    """
    Formats a time duration given in seconds into a human-readable format (hours, minutes, seconds).

    Converts a duration in seconds to a string format, expressing the duration in hours, minutes,
    and seconds. For example, 3661 seconds would be converted to '1 hrs 1 mins 1 secs'.

    Parameters:
    - seconds (int or float): The time duration in seconds.

    Returns:
    - str: The formatted time string.
    """
    # Function implementation

    # Divide the total seconds into hours and remainder seconds
    hours, remainder = divmod(seconds, 3600)
    # Further divide the remainder into minutes and seconds
    minutes, seconds = divmod(remainder, 60)

    # Initialize an empty list to hold time parts
    time_parts = []
    
    # Append hours to time_parts if hours is greater than 0
    if hours > 0:
        time_parts.append(f"{hours} hrs")
    
    # Append minutes to time_parts if minutes is greater than 0
    if minutes > 0:
        time_parts.append(f"{minutes} mins")
    
    # Append seconds to time_parts
    time_parts.append(f"{seconds:.0F} secs")

    # Join the time parts with spaces and return the formatted string
    return ' '.join(time_parts)

def summarize_session(df):    

    #main code for summarize_session
    summary = ""

    # Check if 'IMAGETYP' column exists in the DataFrame
    if 'IMAGETYP' in df:
        txt = "\nObservation session Summary:\n"
        summary += txt 

        # Process different frame types: LIGHT, FLAT, BIAS, and DARK
        for imagetyp in ['LIGHT', 'FLAT', 'BIAS', 'DARK']:
            if imagetyp in df['IMAGETYP'].values:
                group = df[df['IMAGETYP'] == imagetyp]

                # Process LIGHT and FLAT frames
                if imagetyp in ['LIGHT', 'FLAT']:
                    txt = f"\n{imagetyp}S:\n"
                    summary += txt

                    # Group by FILTER and summarize
                    for filter_type, group_df in group.groupby('FILTER'):
                        frame_count = group_df.shape[0]
                        total_exposure = group_df['EXPOSURE'].astype(float).sum()
                        formatted_time = format_seconds_to_hms(total_exposure)
                        txt = f"\n  Filter {filter_type}:\t {frame_count} frames, Exposure time: {formatted_time}"
                        summary += txt
                    summary += '\n'
                # Process BIAS and DARK frames, grouped by GAIN
                elif imagetyp in ['BIAS', 'DARK']:
                    for gain_value, gain_group in group.groupby('GAIN'):
                        frame_count = gain_group.shape[0]
                        total_exposure = gain_group['EXPOSURE'].astype(float).sum()
                        formatted_time = format_seconds_to_hms(total_exposure)
                        txt = f"\n{imagetyp} with GAIN {gain_value}:\t {frame_count} frames, Exposure time: {formatted_time}"
                        summary += txt
                    
                # Additional summary for LIGHT frames
                if imagetyp == 'LIGHT':
                    total_light_exposure = group['EXPOSURE'].astype(float).sum()
                    formatted_total_light_time = format_seconds_to_hms(total_light_exposure)
                    txt = f"\nTotal session exposure for LIGHTs:\t {formatted_total_light_time}\n"
                    summary += txt
    else:
        # Handling case where 'IMAGETYP' column is not present
        txt = "No 'IMAGETYP' column found in headers."
        summary += txt

    return summary

def create_calibration_df(df):
    """
    Generates a DataFrame summarizing calibration frame data from a given DataFrame based on 
    specific image types (IMAGETYP), GAIN values, and FILTER values where applicable.

    This function filters the input DataFrame for relevant calibration frame types (DARK, BIAS, 
    FLAT, FLATDARKS), then groups the data by these types along with GAIN, and FILTER (for FLAT frames).
    It provides a count of each group, which is useful for assessing the calibration data available.

    Parameters:
    - df (pandas.DataFrame): The DataFrame containing FITS header data.

    Returns:
    - pandas.DataFrame: A DataFrame with columns 'TYPE', 'GAIN', 'FILTER' (if applicable), 
      and 'NUMBER' representing the count of each group.
    """

    # Define relevant frame types for calibration
    relevant_types = ['DARK', 'BIAS', 'FLAT', 'FLATDARKS']
    
    # Filter the DataFrame for relevant frame types
    filtered_df = df[df['IMAGETYP'].isin(relevant_types)].copy()

    # Group by IMAGETYP and GAIN, and additionally by FILTER for FLAT frames
    if 'FILTER' in df.columns:
        # Set FILTER to empty string for non-FLAT frames
        filtered_df.loc[filtered_df['IMAGETYP'] != 'FLAT', 'FILTER'] = ''
        # Group by TYPE, GAIN, and FILTER, and count the number of frames
        group_counts = filtered_df.groupby(['IMAGETYP', 'GAIN', 'FILTER']).size().reset_index(name='NUMBER')
    else:
        # Group by TYPE and GAIN if FILTER column doesn't exist, and count the number of frames
        group_counts = filtered_df.groupby(['IMAGETYP', 'GAIN']).size().reset_index(name='NUMBER')

    # Rename 'IMAGETYP' column to 'TYPE'
    return group_counts.rename(columns={'IMAGETYP': 'TYPE'})

def create_lights_df(df: pd.DataFrame)-> pd.DataFrame:
    """
    Creates a DataFrame for 'LIGHT' type data 

    Args:
        df (pd.DataFrame): DataFrame containing FITS header data.

    Returns:
        pd.DataFrame: Aggregated DataFrame with 'LIGHT' type data.
    """
    # Filter the DataFrame for rows where the image type is 'LIGHT'
    light_df = df[df['IMAGETYP'] == 'LIGHT'].copy()

    # Return the DataFrame with 'LIGHT' type data 
    return pd.DataFrame(light_df)


def sqm_to_bortle(sqm):
        """
        Converts an SQM (Sky Quality Meter) value to the corresponding Bortle scale classification.

        The Bortle scale is a nine-level numeric scale used to quantify the astronomical observability of celestial objects,
        affected by light pollution. The scale ranges from 1, indicating the darkest skies, to 9, the brightest.

        Args:
        sqm (float): The SQM value indicating the level of light pollution.

        Returns:
        int: The Bortle scale classification (ranging from 1 to 9).
        """
        # Bortle scale classification based on SQM values
        if sqm > 21.99:
            return 1  # Class 1: Excellent dark-sky site
        elif 21.50 <= sqm <= 21.99:
            return 2  # Class 2: Typical truly dark site
        elif 21.25 <= sqm <= 21.49:
            return 3  # Class 3: Rural sky
        elif 20.50 <= sqm <= 21.24:
            return 4  # Class 4: Rural/suburban transition
        elif 19.50 <= sqm <= 20.49:
            return 5  # Class 5: Suburban sky
        elif 18.50 <= sqm <= 19.49:
            return 6  # Class 6: Bright suburban sky
        elif 17.50 <= sqm <= 18.49:
            return 7  # Class 7: Suburban/urban transition
        elif 17.00 <= sqm <= 17.49:
            return 8  # Class 8: City sky
        else:
            return 9  # Class 9: Inner-city sky
        
def get_bortle_sqm(lat: float, lon:float, secret_df):
    """
    Retrieves the Bortle scale classification and SQM (Sky Quality Meter) value for a given latitude and longitude.

    Parameters:
    - lat (float): The latitude coordinate.
    - lon (float): The longitude coordinate.
    - secret_df (pandas.DataFrame): A DataFrame containing the API key and endpoint.

    Returns:
    - tuple: A tuple containing the Bortle scale classification, SQM value, error message (if any), 
             and flags indicating the validity of the API key and endpoint.
    """
    # Function implementation

    def is_valid_api_key(api_key):
        """ Check if the API key is valid. """
        return api_key is not None and len(api_key) == 16 and api_key.isalnum()

    def is_valid_api_endpoint(api_endpoint):
        """ Check if the API endpoint is valid. """
        return bool(api_endpoint and api_endpoint.strip())

    if secret_df.empty or secret_df.isna().values.any():
        return 0, 0, "api_key and/or api_endpoint are empty", False, False

    # Extract the API key and endpoint from the DataFrame
    api_key = secret_df.get('api key', pd.Series([None])).iloc[0].strip() if isinstance(secret_df.get('api key', pd.Series([None])).iloc[0], str) else None
    api_endpoint = secret_df.get('api endpoint', pd.Series([None])).iloc[0].strip() if isinstance(secret_df.get('api endpoint', pd.Series([None])).iloc[0], str) else None

    api_valid = is_valid_api_key(api_key)
    api_endpoint_valid = is_valid_api_endpoint(api_endpoint)

    # Check the validity of the API key and endpoint
    if not api_valid and not api_endpoint_valid:
        return 0, 0, "Both API key and API endpoint are invalid.", api_valid, api_endpoint_valid
    elif not api_valid:
        return 0, 0, "API key is malformed.", api_valid, api_endpoint_valid
    elif not api_endpoint_valid:
        return 0, 0, "API endpoint is empty.", api_valid, api_endpoint_valid

    # Define the parameters for the API request
    params = {
        'ql': 'wa_2015',
        'qt': 'point',
        'qd': f'{lon},{lat}',
        'key': api_key
    }

    try:
        response = requests.get(api_endpoint, params=params)
        response.raise_for_status()

        if response.text.strip() == 'Invalid authentication.':
            return 0, 0, "Authentication error: Missing or invalid API key.", False, api_endpoint_valid

        artificial_brightness = float(response.text)
        sqm = (math.log10((artificial_brightness + 0.171168465)/108000000)/-0.4)
        bortle_class = sqm_to_bortle(sqm)
        return bortle_class, round(sqm, 2), None, api_valid, api_endpoint_valid

    except requests.exceptions.HTTPError as err:
        return 0, 0, f"HTTP Error: {err}", api_valid, False
    except ValueError:
        return 0, 0, "Could not convert response to float.", api_valid, api_endpoint_valid
    except Exception as e:
        return 0, 0, f"An error occurred: {e}", api_valid, api_endpoint_valid


def calculate_auxiliary_parameters(df, defaults_df, secret_df, sites_df):
    """
    Calculates auxiliary parameters for a DataFrame containing FITS header data.

    This function calculates and adds auxiliary parameters to the DataFrame, including:
      - BORTLE: Bortle scale classification for the observation site.
      - SQM: Sky Quality Meter reading for the observation site.
      - HFR: Half-flux radius in pixels.
      - IMSCALE: Image scale in arcseconds per pixel.
      - FWHM: Full-width at half-maximum in arcseconds.

    Parameters:
    - df (pandas.DataFrame): A DataFrame containing FITS header data.
    - defaults_df (pandas.DataFrame): A DataFrame containing default header values.
    - secret_df (pandas.DataFrame): A DataFrame containing API key and endpoint.
    - sites_df (pandas.DataFrame): A DataFrame containing observation site data.

    Returns:
    - pandas.DataFrame: The DataFrame with auxiliary parameters added.
    """
   #main code for calculate_auxiliary_parameters

    # Convert Lat and Long to float and round to 2 decimal places
    #df['SITELAT'] = df['SITELAT'].astype(float).round(2)
    #df['SITELONG'] = df['SITELONG'].astype(float).round(2)

    bortle, sqm, api_response_text, valid_api_key, valid_api_endpoint = get_bortle_sqm('0.0', '54.0',secret_df)

    #check if sites_df is empty
    empty_sites = sites_df.empty or (sites_df.values == '').any() or sites_df.isna().values.any()
    
    # Extract default HFR value from defaults DataFrame
    hfr_set = defaults_df.loc['HFR', 'value'].strip()

    # Set to keep track of processed latitude-longitude pairs
    processed_sites = set()


    # Iterate over each row in the DataFrame
    for index, row in df.iterrows():
        lat, lon = row['SITELAT'],row['SITELONG']
        latr, lonr = round(lat,2),round(lon,2)

        # Checking for existing site data in sites_df
        site_data = ((round(sites_df['latitude'],2) == latr) & (round(sites_df['longitude'],2) == lonr)).any()

        #check lat and long dont exist in processd sites
        coordinates_processed = (latr, lonr) in processed_sites

        if not site_data:
            if valid_api_key:
                # Fetch Bortle and SQM from API if
                # API key has valid form 
                # and 
                # (latitude and longtitude not found in sites_df, ie new site
            
                bortle, sqm, api_response_text, valid_api_key, valid_api_endpoint = get_bortle_sqm(latr, lonr,secret_df)

                #check if get_bortle_sqm returned valid values
                if api_response_text is not None:
                    msg = f"\nAPI request failed for lat {lat}, lon {lon}: Using 0 for Bortle and SQM. "
                    bortle, sqm = 0, 0
                else:
                    # Adding new site data to sites_df and save it
                    new_site = {'latitude': lat, 'longitude': lon, 'bortle': bortle, 'sqm': sqm}
                    new_site_df = pd.DataFrame([new_site])
                    if empty_sites:
                        sites_df = new_site_df
                    else:
                        sites_df = pd.concat([sites_df, new_site_df], ignore_index=True, sort=False)
                    sites_df.to_csv('sites.csv', index=False)
                    msg = f"\nRetrieved bortle {bortle} and sqm {sqm} for lat {lat}, lon {lon} from api endpoint"
            else:
                msg = f"\nlat {lat}, lon {lon} not in sites.csv and invalid api key: using 0 for bortle and sqm."
                bortle, sqm = 0, 0
        else:
            bortle, sqm = sites_df.iloc[0]['bortle'], sites_df.iloc[0]['sqm']
            msg = f"\nRetrieved Bortle {bortle} and SQM {sqm} for lat {lat}, lon {lon} from sites.csv"

        if not (latr,lonr) in processed_sites:
            processed_sites.add((latr, lonr))  # Mark as processed
            print(msg) 
        
        # Update the DataFrame with Bortle and SQM values
        df.at[index, 'BORTLE'] = bortle
        df.at[index, 'SQM'] = sqm


        # Calculate and update HFR, IMSCALE, and FWHM values
        file_path = row['FILENAME']
        hfr_match = re.search(r'HFR_([0-9.]+)', file_path)
        hfr = float(hfr_match.group(1)) if hfr_match and float(hfr_match.group(1)) > 0 else float(hfr_set)
        imscale = float(row['XPIXSZ']) / float(row['FOCALLEN']) * 206.265
        fwhm = hfr * imscale if hfr >= 0.0 else 0.0

        df.at[index, 'HFR'] = round(hfr,2)
        df.at[index, 'IMSCALE'] = round(imscale,2)
        df.at[index, 'FWHM'] = round(fwhm,2)

    print('\nCompleted sky quality extraction')
    return df

# Function to retrieve calibration data for a given row
def get_calibration_data(row: pd.Series, cal_type: str, calibration_df: pd.DataFrame) -> int:
    """
    Retrieves the count of calibration frames for a given row based on specified calibration type.

    This nested function matches a row from the aggregated DataFrame with the calibration DataFrame
    based on the calibration type (e.g., FLAT, DARK, BIAS, FLATDARKS) and other parameters like 'GAIN'
    and 'FILTER'. It returns the sum of 'NUMBER' of matched calibration frames.

    Parameters:
    - row (pd.Series): A series representing a row in the aggregated DataFrame.
    - cal_type (str): The type of calibration data to match (e.g., 'FLAT', 'DARK').
    - calibration_df (pandas.DataFrame): The DataFrame containing calibration frame data.

    Returns:
    - int: The total count of matching calibration frames.
    """
    # Function implementation

    if cal_type == 'FLAT':
        # Matching both 'GAIN' and 'FILTER' for FLAT type
        match = calibration_df[(calibration_df['TYPE'] == cal_type) & 
                                (calibration_df['GAIN'] == row['gain']) &
                                (calibration_df['FILTER'].str.upper() == row['filter'].upper())]
    else:
        # Matching 'GAIN' for other types
        match = calibration_df[(calibration_df['TYPE'] == cal_type) & 
                                (calibration_df['GAIN'] == row['gain'])]
    return match['NUMBER'].sum() if not match.empty else 0

def aggregate_parameters(lights_df, calibration_df):
    """
    Aggregates astronomical observation parameters from light frames and calibration data.

    This function processes a DataFrame of light frame data ('lights_df') and a DataFrame of calibration 
    data ('calibration_df'). It standardizes column names and formats in 'lights_df', aggregates data 
    by specific parameters (date, filter, gain, binning, and exposure), and adds calibration data counts 
    (for darks, flats, bias, and flat darks) from 'calibration_df'. The function also includes Bortle scale 
    and SQM values, and calculates the mean FWHM, sensor cooling, and temperature for each group.

    Parameters:
    - lights_df (pandas.DataFrame): A DataFrame containing data from light frames, with columns like
                                    'date-loc', 'filter', 'gain', 'xbinning', 'exposure', etc.
    - calibration_df (pandas.DataFrame): A DataFrame containing calibration frame data, with columns
                                         like 'TYPE', 'GAIN', 'FILTER', 'NUMBER', etc.

    Returns:
    - pandas.DataFrame: An aggregated DataFrame with detailed information for each set of grouped parameters.
    """

    # Standardizing column names to lower case and converting 'date-loc' to date format
    lights_df.columns = lights_df.columns.str.lower()
    lights_df['date-loc'] = pd.to_datetime(lights_df['date-loc']).dt.date
    lights_df['ccd-temp'] = lights_df['ccd-temp'].astype(float).round(0)
    lights_df['foctemp'] = lights_df['foctemp'].astype(float).round(2)

    # Aggregating data by date, filter, gain, xbinning, and exposure
    aggregated_df = lights_df.groupby(['date-loc', 'filter', 'gain', 'xbinning', 'exposure']).agg(
        number=('date-loc', 'count'),
        sensorCooling=('ccd-temp', 'mean'),
        temperature=('foctemp', 'mean'),
        meanFwhm=('fwhm', 'mean')
    ).reset_index()

    # Renaming columns for clarity
    aggregated_df.rename(columns={
        'xbinning': 'binning',
        'exposure': 'duration',
        'focratio': 'fnumber'}, inplace=True)

    # Applying get_calibration_data to aggregate calibration data
    aggregated_df['darks'] = aggregated_df.apply(get_calibration_data, args=('DARK', calibration_df), axis=1)
    aggregated_df['flats'] = aggregated_df.apply(get_calibration_data, args=('FLAT', calibration_df), axis=1)
    aggregated_df['bias'] = aggregated_df.apply(get_calibration_data, args=('BIAS', calibration_df), axis=1)
    aggregated_df['flatDarks'] = aggregated_df.apply(get_calibration_data, args=('FLATDARKS', calibration_df), axis=1)

    # Adding Bortle scale and SQM values
    aggregated_df['bortle'] = lights_df['bortle'].round(2)
    aggregated_df['meanSqm'] = lights_df['sqm'].round(2)

    # Adding fNumber, and rounding sensor cooling and temperature
    aggregated_df['fNumber'] = lights_df['focratio']
    aggregated_df['sensorCooling'] = aggregated_df['sensorCooling'].round().astype(int)
    aggregated_df['temperature'] = aggregated_df['temperature'].astype(float).round(2)
    aggregated_df['meanFwhm'] = aggregated_df['meanFwhm'].astype(float).round(2)

    return aggregated_df


def update_filter(filter_value, filter_to_code):
    """
    Maps a filter name to its corresponding code based on the 'filter_to_code' dictionary.

    If the filter name exists in 'filter_to_code', this function returns the associated code.
    If the code is not a four-digit integer, an error message is printed, and a placeholder
    indicating no code found is returned.

    Parameters:
    - filter_value (str): The name of the filter to be mapped to a code.
    - filter_to_code (dict): A dictionary mapping filter names to their corresponding codes.

    Returns:
    - int or str: The code corresponding to the filter name, or an error message if no valid code is found.
    """
    # Function implementation
    
    #check code is a five digit integer, if it is assume valid code
    code = filter_to_code.get(filter_value)
    # Try to convert the code to an integer
    try:
        code = int(code)
    except (ValueError, TypeError):
        code = None
        #Thanks to Francisco Bitto for flagging this.
    if isinstance(code, int) and 1000 <= code <= 99999:
        return code
    else:
        # Print an error message if the code is not found for a filter
        print(f"\nWarning: for filter {filter_value}: no code found. Enter a valid code in filters.csv file or input the code in the astrobin upload file.")
        return f"{filter_value}: no code found"

def create_astrobin_output(df, filter_df):
    """
    Transforms a DataFrame into a format suitable for AstroBin output.

    This function takes a DataFrame containing astronomical observation data and a DataFrame mapping
    filter names to codes. It updates the 'filter' column in the observation DataFrame to use filter codes 
    instead of names. The function also reorders and renames columns to match the expected format for AstroBin, 
    a platform for sharing astrophotography. It ensures the DataFrame columns align with AstroBin's data 
    requirements, including the transformation of filter names into corresponding codes.

    Parameters:
    - df (pandas.DataFrame): A DataFrame containing observation data.
    - filter_df (pandas.DataFrame): A DataFrame mapping filter names to their corresponding codes.

    Returns:
    - pandas.DataFrame: The transformed DataFrame with columns renamed and reordered to match AstroBin's format.
    """
    # Mapping filter name to filter code
    filter_to_code = filter_df.set_index('filter')['code']

    # Finding the position of the 'filter' column in df
    filter_col_index = df.columns.get_loc('filter')

    # Apply the update_filter function to the 'filter' column if the contents are strings
    if df['filter'].dtype == 'object':
        df['filter'] = df['filter'].apply(lambda x: update_filter(x, filter_to_code))

    # Reordering columns to match AstroBin's expected format
    column_order = ['date', 'filter', 'number', 'duration', 'binning', 'gain', 
                    'sensorCooling', 'fNumber', 'darks', 'flats', 'flatDarks', 'bias', 'bortle',
                    'meanSqm', 'meanFwhm', 'temperature']

    # Renaming columns to match AstroBin's expected format
    df.rename(columns={'date-loc': 'date'}, inplace=True)

    # Return the transformed DataFrame with rounded values and reordered columns
    return df[column_order]


def main():
    """
    Main function to process astronomical observation data for analysis and AstroBin output.

    This function performs several steps to process astronomical observation data:
    1. Reads or creates configuration CSV files.
    2. Validates directory paths provided via command line arguments.
    3. Extracts FITS headers from files in the given directories.
    4. Creates and prints a summary of the observation session.
    5. Creates DataFrame for calibration data.
    6. Creates DataFrame for light frame data.
    7. Calculates auxiliary parameters like Bortle scale and SQM.
    8. Aggregates parameters for analysis.
    9. Transforms data into a format suitable for AstroBin output.

    The function also exports the session summary and final data to text and CSV files, respectively.

    It uses several configuration files (defaults, filters, secrets, sites) to aid in processing.
    If new configuration files are created, the user is given the option to terminate the program 
    to edit these files before proceeding.
    """
    # Implementation of the main function


    # Step 1: Read or create configuration CSV files
    params, file_created = read_or_create_csv(configurations)

    # Check if any configuration file was created
    if file_created:
        # Give the user the option to terminate the program
        user_input = input("New configuration files were created. Do you wish to edit them before continuing? (y/n): ")
        if user_input.lower() == 'y':
            print("Exiting the program. Please edit the configuration files as needed and rerun the script.")
            sys.exit(0)


    # Step 2: Validate directory paths
    if len(sys.argv) < 2:
        print("Error: Please provide at least one directory path.")
        sys.exit(1)

    directory_paths = sys.argv[1:]
    # Validate directory paths
    for directory in directory_paths:
        if not os.path.isdir(directory):
            print(f"Error: The directory '{directory}' does not exist.")
            sys.exit(1)
        print(f"Processing directory: {directory}")

    # Step 3: Extract FITS headers
    print('\nReading FITS headers...\n')
    headers_df = extract_headers(directory_paths, params['defaults'])
    
    # Step 4: Create and print summary of observation session
    summary = summarize_session(headers_df)
    print(summary)

    # Step 5: Create calibration DataFrames
    calibration_df = create_calibration_df(headers_df)
    
    # Step 6: Create lights DataFrames
    lights_df = create_lights_df(headers_df)

    # Step 7: Calculate auxiliary parameters
    total_lights_df = calculate_auxiliary_parameters(lights_df, params['defaults'], params['secret'], params['sites'])
    
    # Step 8: Aggregate parameters
    aggregated_df = aggregate_parameters(total_lights_df, calibration_df)

    # Step 9: Transform data for AstroBin output
    astroBin_df = create_astrobin_output(aggregated_df, params['filters'])

    # Export summary to a text file
    summary_txt = os.path.basename(directory_paths[0]) + " session summary.txt"
    with open(summary_txt, 'w') as file:
        file.write(summary)
    print(f"\nProcessing summary exported to {summary_txt}")

    # Export final data to CSV
    output_csv = os.path.basename(directory_paths[0]) + " acquisition.csv"
    astroBin_df.to_csv(output_csv, index=False)
    print(f"\nAstroBin data exported to {output_csv}")
    

    return  astroBin_df

if __name__ == "__main__":
    df = main()
    print()
    print(df)