new daily_avg code

kristinemlarson · Aug 10, 2023 · 82cfc03 · 82cfc03
1 parent 1def93b
commit 82cfc03
Show file tree

Hide file tree

Showing 14 changed files with 146 additions and 79 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,12 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
 
+## 1.5.0
+August 10, 2023
+
+Updated daily_avg to have three sections.  raw arcs, arcs with QC, daily average with QC.
+Updated the discussion page.
+
 ## 1.4.9
 August 10, 2023
 

diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # gnssrefl
 
-**github version: 1.4.9** [![PyPI Version](https://img.shields.io/pypi/v/gnssrefl.svg)](https://pypi.python.org/pypi/gnssrefl) [![DOI](https://zenodo.org/badge/doi/10.5281/zenodo.5601495.svg)](http://dx.doi.org/10.5281/zenodo.5601495) [![Documentation Status](https://readthedocs.org/projects/gnssrefl/badge/?version=latest)](https://gnssrefl.readthedocs.io/en/latest/?badge=latest)
+**github version: 1.5.0** [![PyPI Version](https://img.shields.io/pypi/v/gnssrefl.svg)](https://pypi.python.org/pypi/gnssrefl) [![DOI](https://zenodo.org/badge/doi/10.5281/zenodo.5601495.svg)](http://dx.doi.org/10.5281/zenodo.5601495) [![Documentation Status](https://readthedocs.org/projects/gnssrefl/badge/?version=latest)](https://gnssrefl.readthedocs.io/en/latest/?badge=latest)
 
 August 7, 2023, again
 

diff --git a/docs/_static/mchn_01.png b/docs/_static/mchn_01.png
diff --git a/docs/_static/mchn_02.png b/docs/_static/mchn_02.png
diff --git a/docs/_static/mchn_03.png b/docs/_static/mchn_03.png
diff --git a/docs/_static/mchn_04.png b/docs/_static/mchn_04.png
diff --git a/docs/_static/mchn_05.png b/docs/_static/mchn_05.png
diff --git a/docs/_static/mchn_06.png b/docs/_static/mchn_06.png
diff --git a/docs/_static/mchn_07.png b/docs/_static/mchn_07.png
diff --git a/docs/_static/mchn_08.png b/docs/_static/mchn_08.png
diff --git a/docs/pages/README_dailyavg.md b/docs/pages/README_dailyavg.md
@@ -1,84 +1,95 @@
 # daily_avg<a name="module5"></a>
 
+Updated August 10, 2023 by Kristine Larson
+
 <code>daily_avg</code> is a utility for people interested 
 in daily averaged reflector heights, such as are used for measuring 
 snow accumulation or water levels in lakes/rivers. *It is not to be used for tides!*
-The goal is to make a valid daily average - for this reason, we have two required inputs 
-for quality control. 
+There are two required inputs for quality control. 
 
 - The first is called a *median filter* value. This input helps remove
 large outliers. For each day, a median RH is found. Then all values larger than the 
 *median filter* value from the median RH are thrown out. 
 
-- The second required input to <code>daily_avg</code> sets a limit for how 
+- The second required input *ReqTracks* sets a limit for how 
 many satellite arcs are considered sufficient to create a trustworth daily average. 
 If you had 5 arcs, for example, you probably would not want to compare that 
 with another day where 100 arcs were available. The number of tracks required 
 varies a lot depending on the azimuth mask and the number of frequencies available.
 If you are not sure what values to use at your GNSS site, run it once with very minimal constraints.
 The code provides some feedback plots that will let you pick better values.
 
-Note: the computed daily average value should be associated with 12:00 UTC, not midnight.   
+Notes: 
 
+- the computed daily average value should be associated with 12:00 UTC, not midnight.   
 
-Here is an example from one of our use cases where there are a few large outliers.  
-I have set the median filter value to 2 meters and the required number of tracks to 12:
+- If you are interested in looking at subdaily variations of reflector height, you should
+be using subdaily.
 
-<code> daily_avg mchn 2 12 </code>
 
-You can easily see the outliers. 
+The outputs:
 
-<p align=center>
-<img width=500 src=../_static/mchn-A.png>
-</p>
+- completely raw RH that have been concatenated into a single file.  The location of the file
+is printed to the screen and a plot is created.
 
-Is 12 a good choice?  The code also prints out a plot telling you how many
-tracks are available each day:
+- all RH that meet the QC criteria
 
-<p align=center>
-<img width=500 src=../_static/mchn_nvals.png>
-</p>
+- daily average RH that meet the QC criteria
 
-These can vary quite a bit by year as the station operators change receivers and/or 
-tracking strategies. You should pick the values that are best for your experiment.
 
+I illustrate the steps you might take with station MCHN. The antenna is very close to the water, so that is 
+But the receiver itself is operated suboptimally and only L1 GPS data can be used here.
+This severely limits the number of tracks that can be used. 
 
-Next I have rerun the code with a better median filter constraint of 0.25 meters:
+I start out with almost no QC (outliers with 2 meters of the median value, only 5 satelliet arcs required per day):
 
-<code> daily_avg mchn 0.25 12 </code>
+
+<code> daily_avg mchn 2 5 </code>
 
 <p align=center>
-<img width=500 src=../_static/mchn-B.png>
+<img width=500 src=../_static/mchn_01.png>
 </p>
 
-Since this documentation was written, I have added the median value to the plots:
-
 <p align=center>
-<img width=500 src=../_static/mchn_tighter.png>
+<img width=500 src=../_static/mchn_02.png>
 </p>
 
-If you still are finding it challenging to see the variations from the median value, you
-can try setting the plot_limits option:
+<p align=center>
+<img width=500 src=../_static/mchn_03.png>
+</p>
 
 
-<code> daily_avg mchn 0.25 12 -plot_limits T </code>
+You can easily see the outliers - and that you need to use something more 
+useful than 2 meters for the median filter.  I will use 0.25 meters.  I am also going to 
+change the required tracks to 10
 
 <p align=center>
-<img width=500 src=../_static/mchn_wlimits.png>
+<img width=500 src=../_static/mchn_04.png>
 </p>
 
+<p align=center>
+<img width=500 src=../_static/mchn_05.png>
+</p>
 
-The daily average plot:
 
 <p align=center>
-<img width=500 src=../_static/mchn-C.png>
+<img width=500 src=../_static/mchn_06.png>
 </p>
 
-Two txt files are created. One has all the tracks that fit the QC. The other has the desired daily
-average.  The location of the files is printed to the screen. You are welcome to generate your 
-own quality control codes for the daily average if you find this one does not meet your purposes.
 
-Updated August 10, 2023
+There is also an optional plot_limits setting so you can see how close the variation of
+the measurements is with respect to your choices. I will also tighten the median filter a bit:
+
+<code> daily_avg mchn 0.20 10 -plot_limits T </code>
+
+You can see that at least visually, this makes no change in the daily averages.
+
+<p align=center>
+<img width=500 src=../_static/mchn_07.png>
+</p>
+
+<p align=center>
+<img width=500 src=../_static/mchn_08.png>
+</p>
 
-Kristine M. Larson
 
diff --git a/gnssrefl/daily_avg.py b/gnssrefl/daily_avg.py
@@ -102,6 +102,9 @@ def readin_plot_daily(station,extension,year1,year2,fr,alldatafile,csvformat,
     It reads in RH files created by gnssir. Applies median filter and saves average results 
     for further analysis
 
+    if there is only one RH on a given day - there is no median value and thus nothing will 
+    be saved for that day.  
+
     Parameters
     ----------
     station : str
@@ -159,8 +162,19 @@ def readin_plot_daily(station,extension,year1,year2,fr,alldatafile,csvformat,
 
     """
     xdir = os.environ['REFL_CODE']
-    print('All RH retrievals that meet your QC criteria will be written to: ' )
-    print(alldatafile)
+    print('All RH retrievals - including bad ones - will be written to: ' )
+    alldatafile2 = alldatafile + '.noqc'
+    print(alldatafile2, '\n')
+
+    noqc = open(alldatafile2, 'w+')
+    # put in a header
+    noqc.write(" {0:s}  \n".format('% NO QUALITY CONTROL AT ALL' ))
+    noqc.write(" {0:s}  \n".format('% year,doy, RH(m), Month, day, azimuth(deg),freq, satNu, LSP amp,pk2noise,UTC(hr)' ))
+    noqc.write(" {0:s}  \n".format('% (1), (2), (3),   (4),   (5),  (6),        (7),   (8),  (9),     (10),    (11)' ))
+
+
+    print('All RH retrievals that meet your median filter and ReqTracks criteria will be written to: ' )
+    print(alldatafile, '\n')
     allrh = open(alldatafile, 'w+')
     # put in a header
     allrh.write(" {0:s}  \n".format('% year,doy, RH(m), Month, day, azimuth(deg),freq, satNu, LSP amp,pk2noise,UTC(hr)' ))
@@ -227,14 +241,20 @@ def readin_plot_daily(station,extension,year1,year2,fr,alldatafile,csvformat,
                             gfreq = frequency[cc]
                             # added 21may14
                             gutcTime = utcTime[cc]
+
+                            # write out everyting including bad retrievals
+                            dumb= write_out_all(noqc, csvformat, len(rh), yr, doy, d, rh, azimuth, frequency, sat,amplitude,peak2noise,utcTime,[])
 
-                            NG = len(good)
                             # tvall no longer being used as a variable but still sending it to the function.
                             # unfortunately the info is not sorted - because of the way the directory listing works....
-                            tvall = write_out_all(allrh, csvformat, NG, yr, doy, d, good, gazim, gfreq, gsat,gamp,gpeak2noise,gutcTime,tvall)
+                            NG = len(good)
 
         # only save if there are some minimal number of values
                             if (len(good) >= ReqTracks):
+
+                                # write out the individual tracks that made your met your QC metrics ...
+                                tvall = write_out_all(allrh, csvformat, NG, yr, doy, d, good, gazim, gfreq, gsat,gamp,gpeak2noise,gutcTime,tvall)
+
                                 rh = good
                                 # this is the plot with all the data -not the daily average
                                 alltimes = []
@@ -264,7 +284,6 @@ def readin_plot_daily(station,extension,year1,year2,fr,alldatafile,csvformat,
                                     ijk = (gsat > 200) * (gsat < 300); # galileo
                                     ngal = np.append(ngal, len(gsat[ijk]))
 
-                                medv_obstime = datetime.datetime(year=yr, month=d.month, day=d.day, hour=12, minute=0, second=0)
                                 obstimes.append(datetime.datetime(year=yr, month=d.month, day=d.day, hour=12, minute=0, second=0))
 
                                 medRH.append(medv)
@@ -315,44 +334,76 @@ def readin_plot_daily(station,extension,year1,year2,fr,alldatafile,csvformat,
         ax.plot(tttimes, tv_median[:,8]-howBig,'-',color='gray',label='median-limit')
 
     plt.ylabel('meters',fontsize=fs)
-    plt.title('All Reflector Heights for Station: ' + station.upper(),fontsize=fs)
+    plt.title('All Reflector Heights for ' + station.upper() + ' when QC is applied: ' ,fontsize=fs)
     plt.xticks(fontsize=fs)
     plt.yticks(fontsize=fs)
     plt.gca().invert_yaxis()
     # this command changes the x-axis
     fig.autofmt_xdate()
-    plt.legend(loc="upper right")
+    #plt.legend(loc="upper right")
+    plt.legend(loc="best")
     plt.grid()
+
+    if NumFiles > 1:
+        pltname = xdir + '/Files/' + subdir + '/' + station + '_AllRH.png'
+        print('All RH png file saved as: ', pltname)
+        plt.savefig(pltname)
 #   new plot
     print('A total of ', NumFiles, ' days were evaluated.')
-    print( NotEnough, ' days did not meet the threshold set for a dependable daily average')
-    if (NumFiles < 1):
-        sys.exit()
 
-    pltname = xdir + '/Files/' + subdir + '/' + station + '_AllRH.png'
-    plt.savefig(pltname)
-    print('All RH png file saved as: ', pltname)
+    print( NotEnough, ' days did not meet the threshold set for a dependable daily average')
+    allrh.close()
+    noqc.close()
 
-    #nr,nc = tv.shape
-    # calculate frequency biases and print to the screen
-    # turnning this off because tvall was slowing it down
-    #fbias_daily_avg(station)
-    #print(nr,nc)
+    quick_raw(alldatafile2,xdir, station,subdir)
 
-    # close the file with all the RH values 
-    allrh.close()
+    # save daily average plot ... 
 
 
     # plot the number of retrievals vs time
     txtdir =  xdir + '/Files/' + subdir 
     nr,nc = tv.shape
-    if nr > 0:
+    if (nr > 0) & (NumFiles > 1):
         daily_avg_stat_plots(obstimes,meanRH,meanAmp, station,txtdir,tv,ngps,nglo,ngal,nbei,test)
-    else:
-        print('No results that met your criteria were found, so no need to make a plot')
 
     return tv, obstimes
 
+def quick_raw(alldatafile2,xdir,station,subdir):
+    """
+    quick plot of the raw RH data.  No QC
+
+    Parameters
+    ----------
+    alldatafile2 : str
+        name of the raw file to be read
+    xdir : str
+        code environ variable (I think)
+    station : str
+        4 ch station name
+    subdir : str
+        subdirectory name for results in xdir/Files
+
+    """
+    if os.path.exists(alldatafile2):
+       raw = np.loadtxt(alldatafile2,comments='%')
+       ns= len(raw.shape)
+       if ns == 2:
+           nr,nc = raw.shape
+       elif ns == 1:
+           nr = 1
+
+       if ns > 0:
+           plt.figure()
+           plt.plot(raw[:,0] + raw[:,1]/365.25, raw[:,2], 'b.')
+           plt.grid()
+           plt.xlabel('year')
+           plt.ylabel('reflector height (m)')
+           plt.title(station + ': Completely raw RH data ... no QC applied ')
+           pltname = xdir + '/Files/' + subdir + '/' + station + '_AllRH_noQC.png'
+           plt.savefig(pltname)
+           print('All RH png file without QC saved as: ', pltname)
+
+
 def daily_avg_stat_plots(obstimes,meanRH,meanAmp, station,txtdir,tv,ngps,nglo,ngal,nbei,test):
     """
     plots of results for the daily avg code
@@ -399,7 +450,8 @@ def daily_avg_stat_plots(obstimes,meanRH,meanAmp, station,txtdir,tv,ngps,nglo,ng
     fig.autofmt_xdate()
     plt.ylabel('Reflector Height (m)',fontsize=fs)
     today = str(date.today())
-    plt.title(station.upper() + ': Daily Mean Reflector Height, Computed ' + today,fontsize=fs)
+    #plt.title(station.upper() + ': Daily Mean Reflector Height, Computed ' + today,fontsize=fs)
+    plt.title(station.upper() + ': Daily Mean Reflector Height',fontsize=fs)
     plt.grid()
     plt.xticks(fontsize=fs); plt.yticks(fontsize=fs)
 
@@ -431,7 +483,8 @@ def daily_avg_stat_plots(obstimes,meanRH,meanAmp, station,txtdir,tv,ngps,nglo,ng
     fig.autofmt_xdate()
     plt.ylabel('Amplitude (v/v)',fontsize=fs)
     today = str(date.today())
-    plt.title(station.upper() + ': Daily Mean Reflection Amplitude, Computed ' + today,fontsize=fs)
+    #plt.title(station.upper() + ': Daily Mean Reflection Amplitude, Computed ' + today,fontsize=fs)
+    plt.title(station.upper() + ': Daily Mean Reflection Amplitude', fontsize=fs)
     plt.grid()
     plt.xticks(fontsize=fs); plt.yticks(fontsize=fs)
     pltname = txtdir + '/' + station + '_RHamp.png'
@@ -517,7 +570,7 @@ def write_out_RH_file(obstimes,tv,outfile,csvformat):
 def write_out_all(allrh, csvformat, NG, yr, doy, d, good, gazim, gfreq, gsat,gamp,gpeak2noise,gutcTime,tvall ):
     """
     writing out all the RH retrievals to a single file: file ID is allrh)
-    tvall had everything in it.  but it was slowing everything down, so i removed it
+    tvall had everything in it,  but it was slowing everything down, so i do not do anything with it.
 
     Parameters
     ----------

diff --git a/gnssrefl/daily_avg_cl.py b/gnssrefl/daily_avg_cl.py
@@ -44,28 +44,25 @@ def daily_avg(station: str , medfilter: float, ReqTracks: int, txtfile: str = No
     daily averaged RH without outliers. These daily average values are nominally associated 
     with the time of 12 hours UTC.
 
-    There are multiple optional choices as discussed below. The code also creates a file with all 
-    the subdaily RH as well. 
 
-    The two required parameters - medfilter and ReqTracks. These are quality control parameters.
+    There are two required parameters - medfilter and ReqTracks. These are quality control parameters.
     They are applied in two steps. The code first calculates the median value each day - and keeps
-    only the RH that are within medfilter (meters) of this median value.  All these RH are written out
-    to a file.  The location of the file is written to the screen.
+    only the RH that are within medfilter (meters) of this median value.  
 
-    If you are unfamiliar with a median filter, please see the discussion on the issues page 
-    of github.
-
-    In the next step the code checks that a reasonable number of RH values are 
-    available on each day to warrant making a daily average. You could have occasions when
-    there are 100 RH values one day and 5 another. If you blindly compute daily averages you
-    might consider those to be of the same quality.   
-
-    The other constraints are more practical. Like limiting the years you want to evaluate.
-    Or the min and max azimuths.
+    If you are unfamiliar with a median filter, please see 
+    https://gnssrefl.readthedocs.io/en/latest/pages/README_dailyavg.html
 
     The outputs are stored in $REFL_CODE/Files/station by default.  If you want to specify a new
-    subdirectory, I believe that is an allowed option. 
+    subdirectory, I believe that is an allowed option.  You can also specify specific years to analyze 
+    and apply fairly simple azimuth constraints.
     
+    Three text files are created
+
+    1. individual RH values with no QC applied
+    
+    2. individual RH values with QC applied
+
+    3. daily average RH 
 
     Examples
     -------- 

diff --git a/setup.py b/setup.py
@@ -40,7 +40,7 @@
 ]
 setup(
     name="gnssrefl",
-    version="1.4.9",
+    version="1.5.0",
     author="Kristine Larson",
     author_email="[email protected]",
     description="A GNSS reflectometry software package ",