Tutorial: install and maintain cheta telemetry archive
This tutorial will guide you through the steps of installing a subset or all of the cheta (AKA Ska engineering archive) data archive to a local directory on a standalone laptop or desktop, followed by instructions on how to keep your cheta archive up to date.
Cheta vs. Ska engineering archive
The Ska engineering archive is being re-branded to cheta to make it easier to import
and easier to say. This is not an acronym, though you can think "Chandra engineering
telemetry archive" if that makes you happier. Mostly cheta means it is fast. In any
code you can replace Ska.engarchive with cheta (but the original will always work):
from cheta import fetch_eng as fetch
A quick word on sizes and speeds
The data files that comprise the cheta archive take around 200 GB as of late 2019. While this is large, it is within the storage capability of modern laptops.
In contrast, network speeds can limit the initial sync: at 1 MB/sec the whole archive would take about 2.5 days, but if 10 MB/sec can be sustained it takes only about 6 hours. VPN servers can limit bandwidth substantially, so it is important to estimate your transfer speed before diving in to copy the whole archive.
Using my cheap Seagate external 2 TB drive with USB 3.0, I see transfer speeds of 80-90 MB/sec, which translates to copying the entire archive in a bit less than an hour.
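The arithmetic behind these estimates is easy to script for your own connection. The following is a minimal sketch, assuming the rates quoted above are megabytes per second and using decimal units (1 GB = 1000 MB); the function name is made up for illustration.

```python
def transfer_hours(size_gb, rate_mb_per_s):
    """Return the hours needed to copy size_gb gigabytes at rate_mb_per_s."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = size_gb * 1000.0 / rate_mb_per_s  # 1 GB = 1000 MB (assumption)
    return seconds / 3600.0


# Rough estimates for a 200 GB archive at a few sustained rates
for rate in (1, 10, 85):
    print(f"{rate:3d} MB/s -> {transfer_hours(200, rate):5.1f} hours")
```

Measure your sustained rate by copying a single large file first, then plug it in before committing to a full sync.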
At this time the cheta archive package is available in the Ska3 flight environment, so no independent installation is necessary.
FOT MATLAB tools users
To initialize python in MATLAB, type the following command at the MATLAB command prompt:
pyexec('update_path=True')
For syncing to work correctly, you will need to manually create the folder in which you plan to store the data. By default, the Python tools will look for the folder:
%SKA%\Ska_data\data\eng_archive
where %SKA% is a Windows environment variable set for you by the MATLAB FOT Tools. You can check what this is using the following command in the MATLAB Command Window:
getenv('SKA')
If the eng_archive directory doesn't exist on your machine, you will need to create it and then restart MATLAB before continuing, or manually specify a different directory using the --data-root flag discussed later in this tutorial.
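If you would rather create the folder from Python than by hand, something like the following should work. It is a minimal sketch that assumes the default layout described above (Ska_data\data\eng_archive under the SKA root); the helper name is hypothetical and is not part of cheta.

```python
from pathlib import Path


def ensure_eng_archive_dir(ska_root):
    """Create <ska_root>/Ska_data/data/eng_archive if missing and return it."""
    archive_dir = Path(ska_root) / "Ska_data" / "data" / "eng_archive"
    # parents=True builds the intermediate folders; exist_ok=True makes
    # the call harmless if the folder is already there.
    archive_dir.mkdir(parents=True, exist_ok=True)
    return archive_dir


# Example (uncomment to run against your own SKA root):
# import os
# print(ensure_eng_archive_dir(os.environ["SKA"]))
```

Remember that MATLAB still needs to be restarted after the folder is created.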
MacOSX and linux users
A prerequisite for all of this is to have a standalone Ska3 environment installed and running
on your machine, with a $SKA environment variable defined and pointing to a directory
with a data/ subdirectory.
This is covered in the Ska3 runtime environment for users wiki.
Now we will make a new conda environment just for doing cheta archive maintenance. This shows a good practice for doing development / experimental package updates with Ska3: leave your "flight Ska3" in a clean state corresponding to the most recent official (tested!) release, and check out new packages in a clone of the flight environment:
conda create --clone=ska3 --name ska3-cheta
source activate ska3-cheta
Now we will put the new version of cheta into our new environment by installing with pip
directly from a branch/tag on GitHub. The old versions of the conda package manager and
the pip installer that we use are not quite compatible, so we need to first uninstall
the Ska.engarchive conda package and then install the new version with pip:
conda uninstall --force ska.engarchive
pip install --egg --no-deps --ignore-installed git+https://github.com/sot/[email protected]
In the examples you will see commands beginning with !. For Windows users you must type that ! symbol.
For MacOS/linux users, do NOT type the ! symbol (instead pretend that is your command prompt).
First, let's confirm that you have the right version of cheta installed:
! python -m cheta.update_client_archive --version
update_client_archive.py 4.47.3
For this tutorial you need to be on a network that can see the ICXC web site. To test, try loading https://icxc.cfa.harvard.edu/.
In the commands below, you will see something like [--data-root=.]. If you already
have a local copy of the cheta archive on your laptop and want to do the tutorial "on the side",
then include that option to instruct the commands to store data in that root directory.
Most people can skip this, in which case the new data will be added in the standard location which you can discover with:
! python -c "from cheta import fetch; print(fetch.msid_files.basedir)"
/Users/aldcroft/ska/data/eng_archive
The list of MSIDs that you want to copy is provided to the cheta_sync tool in a file.
There are three related ways to select MSIDs:
- MSIDs that match the name or pattern are included, for example aopcadmd or aacccd*. Note that case does not matter.
- MSIDs with the same subsystem and sampling rate as given MSIDs are included. For example, */1wrat gives all ACIS engineering telemetry nominally sampled at 16.4 sec, while */aopcadmd gives all PCAD telemetry at 1.025 sec sampling.
- MSIDs with the same subsystem regardless of sampling rate. For example, **/3tscpos gives all engineering SIM telemetry, while **/aopcadmd gives all PCAD telemetry (which is more than 100 GB).
So in your favorite editor create a file named msid_specs and enter the following:
aacccd*
# aopcadmd (275 Mb for just this MSID)
# */1wrat (205 Mb for all 16.4 sec ACIS engineering telemetry)
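To make the three spec forms concrete, here is a rough sketch of how such a file could be interpreted. This is not cheta's actual implementation: the catalog below is a made-up toy mapping of MSID names to content types, and it assumes that "same subsystem and sampling rate" means "same content type" and that the subsystem is the content-type name up to its first digit.

```python
import re
from fnmatch import fnmatchcase

# Toy catalog: MSID name -> content type. These entries are invented
# for illustration and do not describe the real archive.
TOY_CATALOG = {
    "AAA1": "pcad3eng",
    "AAA2": "pcad3eng",
    "BBB1": "pcad8eng",
    "CCC1": "acis2eng",
}


def read_specs(text):
    """Return the specs in a file's text, dropping comments and blank lines."""
    specs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            specs.append(line)
    return specs


def subsystem(content):
    """Return the leading non-digit part of a content type (assumption)."""
    return re.match(r"[^\d]+", content).group(0)


def select_msids(specs, catalog=TOY_CATALOG):
    """Return the sorted MSIDs in catalog selected by the given specs."""
    selected = set()
    for spec in specs:
        spec = spec.upper()  # case does not matter
        if spec.startswith("**/"):
            ref = subsystem(catalog[spec[3:]])
            selected |= {m for m, c in catalog.items() if subsystem(c) == ref}
        elif spec.startswith("*/"):
            ref = catalog[spec[2:]]
            selected |= {m for m, c in catalog.items() if c == ref}
        else:
            selected |= {m for m in catalog if fnmatchcase(m, spec)}
    return sorted(selected)
```

With the toy catalog, the spec aaa* matches by name only, */aaa1 pulls in everything sharing its content type, and **/aaa1 additionally pulls in the other content type with the same subsystem prefix.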
Now we're finally ready for the copy, but let's do a dry run first to see what it will copy. Instead of ccosmos you can use chimchim.
WINDOWS users: you need to include the final & shown in the examples to allow typing your password. You will also need to set --server-data-root to chimchim.
! python -m cheta.update_client_archive --add-msids=msid_specs [--data-root=.] \
--server-data-root=<username>@ccosmos.cfa.harvard.edu --dry-run [&]
So let's do it:
! python -m cheta.update_client_archive --add-msids=msid_specs [--data-root=.] \
--server-data-root=<username>@ccosmos.cfa.harvard.edu [&]
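If you run the sync often, it can help to assemble the command line in one place so the dry run and the real run cannot drift apart. The following is a minimal sketch that only uses the options shown above; the helper name is hypothetical, and the server string in the example is a placeholder you must replace with your own.

```python
import shlex


def build_sync_command(msid_specs, server, data_root=None, dry_run=False):
    """Return the argument list for python -m cheta.update_client_archive."""
    cmd = [
        "python", "-m", "cheta.update_client_archive",
        f"--add-msids={msid_specs}",
        f"--server-data-root={server}",
    ]
    if data_root is not None:
        cmd.append(f"--data-root={data_root}")  # optional, as in the text
    if dry_run:
        cmd.append("--dry-run")
    return cmd


# Print the dry-run command for inspection (nothing is executed here)
print(shlex.join(build_sync_command("msid_specs", "USER@SERVER", dry_run=True)))
```

Once the printed command looks right, the list could be passed to subprocess.run, though running it directly in a terminal is simpler when a password prompt is involved.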
The next step is bringing your local cheta archive up to date with the server version.
However, in this tutorial you have already synced with the latest available data, so
doing an update will not actually do anything. So we will apply a command to
truncate 7 days of data from the local archive to force it to be out of date. First, if you
are installing to the standard cheta archive location (instead of --data-root=.)
then use this command to discover that directory name:
! python -c "from cheta import fetch; print(fetch.msid_files.basedir)"
/Users/aldcroft/ska/data/eng_archive
Now run the actual truncate command, first with the --dry-run option:
! python -m cheta.update_archive --content=pcad5eng --data-root=<previous_output OR .> \
--truncate=-7 --dry-run
Note that this command is slightly dangerous, so the default for --data-root is . (the current directory)
to prevent accidentally wiping out your local Ska cheta archive. If you do not
supply the --content option then the entire archive will be truncated.
Note also that if your cheta archive gets corrupted during an update (e.g. power loss) then truncating the archive to a time before the update will often fix things.
First, get into Ska3 ipython.
For MATLAB FOT Tools users, this means executing the following command from the MATLAB Command Window:
starting_dir = cd(get_python_install_dir()); system('start_ipython.exe&'); cd(starting_dir);
Once you have ipython open you can test out the following python code:
import os
# set the 'SKA' environment variable to the absolute path
# of your --data-root. The folder structure should look like:
#
# [data_root]/data/eng_archive/data/[cheta data]
#
# where you set 'SKA' or --data-root to be [data_root]
#
os.environ['SKA'] = os.path.abspath('.')
# Now import fetch and get some data
from cheta import fetch_eng as fetch
# Print location we are fetching from
print(fetch.msid_files.basedir)
%matplotlib
dat = fetch.Msid('aacccdpt', -14) # 14 days before now
dat.plot()
# Print the available time range
fetch.get_time_range('aacccdpt', 'fits')
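If the fetch fails, a common cause is that SKA points at the wrong level of the directory tree. As a quick sanity check, the folder structure described in the comments above can be verified before importing cheta. This is a minimal sketch and the helper name is hypothetical.

```python
from pathlib import Path


def find_cheta_data_dir(data_root):
    """Return [data_root]/data/eng_archive/data if it exists, else None."""
    data_dir = Path(data_root).resolve() / "data" / "eng_archive" / "data"
    return data_dir if data_dir.is_dir() else None


# Example: check the current directory, as in the tutorial's --data-root=.
print(find_cheta_data_dir("."))
```

A result of None means the chosen root does not contain the expected data/eng_archive/data tree.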
The cheta server archive on the HEAD network (in /proj/sot/ska/data/eng_archive) is updated
every morning by 9am Eastern local. In order to get your local cheta archive sync'ed to the
primary server version you simply run this command:
! python -m cheta.update_client_archive [--data-root=.]
This command has plenty of options (see --help) but most users will never need them.
Now go back and do the steps in the previous Check out section to prove that it worked.
Performance
On a fast network with a solid-state drive, you can do a daily update of the entire cheta archive in about 6 minutes. You can catch up a month of data in an hour or two. With a slower network or a slower drive it will take longer, with drive speed generally being the more important factor.
We'll talk about this in a future session! For now run it by hand as needed.
The cheta sync archive keeps the last 60 days of updates. If you wait longer than that you will get a message like below when updating:
ERROR: unexpected discontinuity for full msid=1DEICACU content=acis2eng
Looks like your archive is in a bad state, CONTACT your local Ska expert with this info:
First row0 in new data 19557749 != length of existing data 19406329
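To avoid running into this, you can check how stale your archive is before updating. The following is a minimal sketch based only on the 60-day figure above; how you record the date of your last update is up to you, and treating exactly 60 days as still safe is an assumption.

```python
from datetime import date

SYNC_WINDOW_DAYS = 60  # the sync archive keeps the last 60 days of updates


def can_update_incrementally(last_update, today=None,
                             window_days=SYNC_WINDOW_DAYS):
    """Return True if last_update is recent enough for a normal update."""
    if today is None:
        today = date.today()
    return (today - last_update).days <= window_days


# Example with made-up dates: 31 days is fine, 90 days is too long
print(can_update_incrementally(date(2021, 1, 1), today=date(2021, 2, 1)))
print(can_update_incrementally(date(2021, 1, 1), today=date(2021, 4, 1)))
```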
The way to recover from this is by using the rsync command to refresh your archive:
### Using kady for the data (SOT) ###
rsync -av --existing <user>@kady:/proj/sot/ska/data/eng_archive/data/ <local_SKA>/data/eng_archive/data/
rm <local_SKA>/data/eng_archive/data/*/5min/last_date_id
rm <local_SKA>/data/eng_archive/data/*/daily/last_date_id
### Using GRETA for the data (FOT) ###
rsync -av --existing <user>@cheru:/proj/sot/ska/data/eng_archive/data/ <local_SKA>/data/eng_archive/data/
# NOTE: the "rm" commands should not be necessary if you sync from cheru
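On Windows, where the rm commands above are not available in a plain command prompt, the same cleanup can be done from Python. This is a minimal sketch that mirrors the two rm patterns above; the function name is hypothetical, and it defaults to a dry run so you can inspect the list first.

```python
from pathlib import Path


def remove_last_date_ids(local_ska, dry_run=True):
    """Find (and optionally delete) the 5min and daily last_date_id files.

    Returns the list of matching paths. Nothing is deleted unless
    dry_run is False.
    """
    data_dir = Path(local_ska) / "data" / "eng_archive" / "data"
    matches = []
    for interval in ("5min", "daily"):
        for path in sorted(data_dir.glob(f"*/{interval}/last_date_id")):
            if not dry_run:
                path.unlink()
            matches.append(path)
    return matches
```

Call it once with the default dry_run=True to review the paths, then again with dry_run=False to delete them.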
On Windows machines, MATLAB FOT_Tools has an rsync executable available in the directory
FOT_Tools\local\tools\
Circa 2021, these were the directory sizes for each content type:
953M acis2eng
224M acis3eng
235M acis4eng
1.0G acisdeahk
762M angleephem
243M ccdm10eng
2.6G ccdm11eng
748M ccdm12eng
380M ccdm13eng
11M ccdm14eng
19M ccdm15eng
463M ccdm1eng
399M ccdm2eng
139M ccdm3eng
4.4G ccdm4eng
407M ccdm5eng
575M ccdm7eng
639M ccdm8eng
510M cpe1eng
273M dp_acispow128
956M dp_eps16
164M dp_eps8
400M dp_orbit1280
626M dp_pcad1
450M dp_pcad16
1.0G dp_pcad32
6.2G dp_pcad4
18G dp_thermal1
1.1G dp_thermal128
437M ephhk
57M ephin1eng
88M ephin2eng
2.9G eps10eng
750M eps1eng
217M eps2eng
706M eps3eng
200M eps4eng
118M eps5eng
691M eps6eng
486M eps7eng
5.3G eps9eng
179M hrc0hk
388M hrc0ss
122M hrc2eng
392M hrc4eng
114M hrc5eng
5.0M hrc7eng
1.6G lunarephem0
675M lunarephem1
263M misc1eng
357M misc2eng
126M misc3eng
617M misc4eng
85M misc5eng
112M misc6eng
242M misc7eng
897M misc8eng
249M obc3eng
1.9G obc4eng
353M obc5eng
1.7G orbitephem0
709M orbitephem1
144M pcad10eng
76M pcad11eng
656M pcad12eng
952M pcad13eng
264M pcad14eng
7.7G pcad15eng
62G pcad3eng
397M pcad4eng
881M pcad5eng
758M pcad6eng
21G pcad7eng
28G pcad8eng
8.6M pcad9eng
1.2G prop1eng
582M prop2eng
116M sim1eng
57M sim21eng
57M sim2eng
315M sim3eng
92M sim_mrg
61M simcoor
125M simdiag
186M sms1eng
75M sms2eng
1.5G solarephem0
604M solarephem1
572M tel1eng
259M tel2eng
589M tel3eng
764M thm1eng
176M thm2eng
186M thm3eng
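To budget disk space for a subset of content types, a listing like the one above can be totaled with a few lines of Python. This is a minimal sketch that assumes each line has the form "<size> <name>" and that the K/M/G/T suffixes are binary multiples (as du -h reports them); both are assumptions about how the listing was produced.

```python
UNIT_BYTES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '953M' or '1.0G' to bytes."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNIT_BYTES:
        return float(text[:-1]) * UNIT_BYTES[suffix]
    return float(text)  # no suffix: already in bytes


def total_bytes(listing):
    """Sum the sizes in a multi-line '<size> <name>' listing."""
    total = 0.0
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            total += parse_size(parts[0])
    return total


# Example with three lines taken from the listing above
sample = "953M acis2eng\n224M acis3eng\n1.0G acisdeahk\n"
print(f"{total_bytes(sample) / 1024**3:.2f} GiB")
```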