This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

Providing access to ground observatory data #8

Open
smithara opened this issue Mar 2, 2020 · 0 comments

smithara commented Mar 2, 2020

Ground observatory data should eventually be made accessible through VirES. In the meantime we can provide access to them in the shared folder with demo notebooks showing how to read them.

UPDATE: There is a notebook demonstrating direct download and reading from the BGS FTP server, so duplicating the data to the VRE may not really be necessary:
https://nbviewer.jupyter.org/github/lmar76/swarmnb/blob/master/obsdata.ipynb

Re-processed observatory data are available from BGS at ftp://ftp.nerc-murchison.ac.uk/geomag/Swarm/AUX_OBS/

There are three collections: hour (from 1900), minute (from 1997), and second (from 2012). Each contains mixed records from different observatories. They are updated every 4 months(?)

The following should be duplicated to ~/shared/AUX_OBS/ and the .ZIP files extracted:

  • ftp://ftp.nerc-murchison.ac.uk/geomag/Swarm/AUX_OBS/hour/ (hourly means)
    • 520MB compressed, 4.4GB extracted
    • Contains zipped text files, one for each year from 1900 to "now". The latest file (SW_OPER_AUX_OBS_2__20200101T000000_20201231T235959_0122.txt) will be updated over time. Each year has a data file and a metadata file:
    SW_OPER_AUX_OBS_2__19000101T000000_19001231T235959_0122.txt  (inside .ZIP)
    AUX_OBS_2_1900.input  (outside .ZIP)
    
  • ftp://ftp.nerc-murchison.ac.uk/geomag/Swarm/AUX_OBS/minute/
    • 15GB compressed
    • Contains zipped CDF files (extension .DBL), one for each day from 1997 onwards, e.g.:
    SW_OPER_AUX_OBSM2__20091231T000000_20091231T235959_0101.DBL
    SW_OPER_AUX_OBSM2__20091231T000000_20091231T235959_0101.HDR
    
  • ftp://ftp.nerc-murchison.ac.uk/geomag/Swarm/AUX_OBS/second/
    • 30GB compressed
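The duplicate-and-extract step described above could be scripted along the following lines. This is only a minimal sketch using the Python standard library, not a tested mirroring tool. It assumes that the server accepts anonymous FTP logins, that each collection directory contains only plain files, and that `nlst()` returns bare file names. The function names `mirror_collection` and `extract_all` are hypothetical helpers introduced here for illustration.

```python
from ftplib import FTP
from pathlib import Path
import zipfile

HOST = "ftp.nerc-murchison.ac.uk"
REMOTE_ROOT = "/geomag/Swarm/AUX_OBS"


def mirror_collection(collection, dest_root="~/shared/AUX_OBS"):
    """Download every file of one collection ("hour", "minute" or "second").

    Files that already exist locally are skipped, so the call can be repeated.
    """
    dest = Path(dest_root).expanduser() / collection
    dest.mkdir(parents=True, exist_ok=True)
    with FTP(HOST) as ftp:
        ftp.login()  # anonymous login (assumed to be permitted)
        ftp.cwd(f"{REMOTE_ROOT}/{collection}")
        for name in ftp.nlst():  # assumed to list plain files only
            target = dest / name
            if target.exists():
                continue
            # Download to a temporary name so an interrupted transfer
            # is not mistaken for a complete file on the next run
            partial = dest / (name + ".part")
            with open(partial, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)
            partial.rename(target)
    return dest


def extract_all(directory):
    """Extract every .ZIP file found in `directory` into that same directory.

    Returns the names of all extracted archive members.
    """
    directory = Path(directory).expanduser()
    extracted = []
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.suffix.lower() == ".zip":
            with zipfile.ZipFile(path) as zf:
                zf.extractall(directory)
                extracted.extend(zf.namelist())
    return extracted
```

For example, `extract_all(mirror_collection("hour"))` would fetch and unpack the hourly collection. Given the sizes listed above, the minute and second collections are probably better handled separately. Note that the skip-if-exists check means the latest hourly file, which is updated over time, would have to be deleted locally before re-running to pick up a newer copy.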
Code to read the hourly files:
import pandas as pd
import xarray as xr

def load_dataset(filename, as_pandas=False):
    df = pd.read_csv(
        filename,
        comment="#",
        names=['obs', 'gc_lat', 'long', 'rad', 'yyyy', 'mm', 'dd', 'UT', 'N', 'E', 'C'],
        sep=r"\s+")  # whitespace-delimited (delim_whitespace is deprecated in pandas >= 2.2)
    # Convert to datetime index
    df.index = pd.to_datetime(
        df["yyyy"]*100000000 + df["mm"]*1000000 + df["dd"]*10000 + df["UT"].astype(int)*100 + 30,
        format="%Y%m%d%H%M")
    df = df.drop(columns=["yyyy", "mm", "dd", "UT"])
    # Note that the time series is repeated over and over, for each observatory
    # Note also that there are jumps in each time series
    if as_pandas:
        return df
    # Convert to xarray
    # Set up empty dataset with just the times
    year = df.index[0].year
    times = pd.date_range(start=f"{year}-01-01T00:30", end=f"{year}-12-31T23:30", freq="h")
    ds = xr.Dataset(
        {"NEC": ["N", "E", "C"], "Timestamp": times})
    # Loop through each sub-dataframe (containing just measurements from one observatory)
    #   Add each as a DataArray
    for obsname, df_obs in df.groupby("obs"):
        # Record the observatory location before reindexing: infilled rows are
        # all-NaN, so the first row may be empty if the year starts with a gap
        location = df_obs[["gc_lat", "long", "rad"]].iloc[0].round(3)
        # Infill gaps in the time series (with nans)
        df_obs = df_obs.reindex(times)
        # Add data for each observatory
        ds = ds.assign({
            f"{obsname}": (("Timestamp", "NEC"), (df_obs[["N", "E", "C"]].values))})
        # Add attributes with observatory locations
        ds[obsname].attrs = {"Latitude": location["gc_lat"],
                             "Longitude": location["long"],
                             "Radius": location["rad"]}
    return ds
# Example loading:
load_dataset('hour/SW_OPER_AUX_OBS_2__19020101T000000_19021231T235959_0122.txt')

gives

<xarray.Dataset>
Dimensions:    (NEC: 3, Timestamp: 8760)
Coordinates:
  * NEC        (NEC) <U1 'N' 'E' 'C'
  * Timestamp  (Timestamp) datetime64[ns] 1902-01-01T00:30:00 ... 1902-12-31T23:30:00
Data variables:
    CLH0       (Timestamp, NEC) float64 1.994e+04 -1.792e+03 ... 5.663e+04
    POT0       (Timestamp, NEC) float64 1.844e+04 -3.22e+03 ... 4.313e+04
    TOK0       (Timestamp, NEC) float64 nan nan nan ... -2.427e+03 3.451e+04
    VLJ0       (Timestamp, NEC) float64 1.886e+04 -5.148e+03 ... 4.218e+04
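
The example above hardcodes the full file name, including the trailing `0122`. A small lookup helper could avoid that. The following is a sketch only: `find_hourly_file` is a hypothetical name, and it assumes that the trailing four digits are a version counter, so that the last match in sorted order is the most recent one. It also assumes the files have already been extracted into a local `hour/` directory.

```python
from pathlib import Path


def find_hourly_file(year, directory="hour"):
    """Return the path of the extracted hourly data file covering `year`."""
    # File name pattern taken from the examples in this issue; the final
    # field is matched with a wildcard (assumed to be a version counter)
    pattern = f"SW_OPER_AUX_OBS_2__{year}0101T000000_{year}1231T235959_*.txt"
    matches = sorted(Path(directory).expanduser().glob(pattern))
    if not matches:
        raise FileNotFoundError(f"No hourly file for {year} in {directory}")
    # If several versions are present, take the last one in sorted order
    return matches[-1]
```

It could then be combined with the loader as `load_dataset(find_hourly_file(1902))`.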