Release/0.8.0 (#102)
* Feature/update add collection test (#94)

* update add collection test to get the url for json history

* update changelog

* /version 0.7.0a24

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Feature/update add collection test (#95)

* update add collection test to get the url for json history

* update changelog

* update test to test for nan

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Bump jinja2 from 3.1.2 to 3.1.3 (#99)

Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Using cmr-umm-updater default branch (develop)

* use develop


* Update CONTRIBUTING.md


* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update uat_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Update ops_associations.txt with new collections

* Issue #96: ensure the created dimension is sorted (#101)

* implement sorting of the output queue according to the order of the input queue to satisfy issue #96

* Update CHANGELOG.md with issue-96 fix

* release 0.8.0


---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: sliu008 <[email protected]>
Co-authored-by: concise bot <[email protected]>
Co-authored-by: jonathansmolenski <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Frank Greguska <[email protected]>
Co-authored-by: ank1m <[email protected]>
Co-authored-by: James Wood <[email protected]>
8 people authored Mar 1, 2024
1 parent f86cc5b commit c9157ec
Showing 10 changed files with 124 additions and 139 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/add-collection-test.yml
@@ -46,7 +46,7 @@ jobs:
           pip3 install netCDF4
           pip3 install git+https://github.com/nasa/harmony-py.git
           pip3 install git+https://github.com/podaac/cmr-umm-updater.git
-          pip3 install git+https://github.com/podaac/cmr-association-diff.git@6193079a14e36f4c9526aa426015c2b6be41f0e2
+          pip3 install git+https://github.com/podaac/cmr-association-diff.git
           pip3 install python-dateutil --upgrade
       - name: Run CMR Association diff scripts
         run: |
4 changes: 2 additions & 2 deletions .github/workflows/build-pipeline.yml
@@ -141,7 +141,7 @@ jobs:
           git tag -a "${{ env.software_version }}" -m "Version ${{ env.software_version }}"
           git push origin "${{ env.software_version }}"
       - name: Publish UMM-S with new version
-        uses: podaac/cmr-umm-updater@feature/umm_version
+        uses: podaac/cmr-umm-updater@develop
         if: |
           github.ref == 'refs/heads/main' ||
           startsWith(github.ref, 'refs/heads/release')
@@ -160,7 +160,7 @@
           LAUNCHPAD_TOKEN_UAT: ${{secrets.LAUNCHPAD_TOKEN_UAT}}
           LAUNCHPAD_TOKEN_OPS: ${{secrets.LAUNCHPAD_TOKEN_OPS}}
       - name: Publish L2ss Concise Chain UMM-S with new version
-        uses: podaac/cmr-umm-updater@feature/umm_version
+        uses: podaac/cmr-umm-updater@develop
         if: |
           github.ref == 'refs/heads/main' ||
           startsWith(github.ref, 'refs/heads/release')
16 changes: 14 additions & 2 deletions CHANGELOG.md
@@ -8,7 +8,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 ### Changed
-### Deprecated
+### Deprecated
 ### Removed
 ### Fixed
 
+
+## [0.8.0]
+
+### Added
+### Changed
+- [issues/96](https://github.com/podaac/concise/issues/96):
+  - Preserve the order of the input files so the output file matches order
+### Deprecated
+### Removed
+### Fixed
+
@@ -24,7 +35,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Updated jupyter notebook
 - Update notebook test to use python code directly instead of using jupyter notebook
 - Updated python libraries
-- Update history json to have url in history
+- Update history json to have url in history
+- Update add collection test to use url in json history
 ### Deprecated
 ### Removed
 ### Fixed
10 changes: 1 addition & 9 deletions CONTRIBUTING.md
@@ -77,14 +77,6 @@ If any performance improvements are being made, include graphs and charts.
 - `feature/issue-#`
   - Work for enhancements and new features should be done in a branch with this naming convention
   - The issue number should match the associated Github issue number
-- `bugfix/issue-#`
-  - Work for bug fixes should be done in a branch with this naming convention
-  - The issue number should match the associated Github issue number
-- `hotfix/issue-#` or `hotfix/short-fix-description`
-  - Rare/special case to address a special anomaly.
-  - The issue number should match the associated Github issue number,
-    unless no such issue exists. If not, use a short description of the
-    issue e.g. `hotfix/fix-request-url`
 
 ### Changelog
 
@@ -200,4 +192,4 @@ All functions should contain a docstring, though short or trivial
 function may contain a 1-line docstring.
 
 If adding a new module, ensure it has been added to [index.rst](docs/index.rst)
-for inclusion in auto-generated Sphinx docs.
+for inclusion in auto-generated Sphinx docs.
45 changes: 18 additions & 27 deletions add_collection_test.py
@@ -6,6 +6,7 @@
 import numpy as np
 import netCDF4 as nc
 import requests
+import json
 from harmony import BBox, Client, Collection, Request, Environment
 import argparse
 from utils import FileHandler
@@ -135,22 +136,29 @@ def verify_variables(merged_group, origin_group, subset_index, both_merged):
         merged_data = np.resize(merged_var[subset_index], origin_var.shape)
         origin_data = origin_var
 
+        equal_nan = True
+        if merged_data.dtype.kind == 'S':
+            equal_nan = False
+
         # verify variable data
         if isinstance(origin_data, str):
             unittest.TestCase().assertEqual(merged_data, origin_data)
         else:
-            unittest.TestCase().assertTrue(np.array_equal(merged_data, origin_data, equal_nan=True))
+            unittest.TestCase().assertTrue(np.array_equal(merged_data, origin_data, equal_nan=equal_nan))
 
 
-def verify_groups(merged_group, origin_group, subset_index, both_merged=False):
+def verify_groups(merged_group, origin_group, subset_index, file=None, both_merged=False):
+    if file:
+        print("verifying groups ....." + file)
+
     verify_dims(merged_group, origin_group, both_merged)
     verify_attrs(merged_group, origin_group, both_merged)
     verify_variables(merged_group, origin_group, subset_index, both_merged)
 
     for child_group in origin_group.groups:
         merged_subgroup = merged_group[child_group]
         origin_subgroup = origin_group[child_group]
-        verify_groups(merged_subgroup, origin_subgroup, subset_index, both_merged)
+        verify_groups(merged_subgroup, origin_subgroup, subset_index, both_merged=both_merged)
 
 
 # GET TOKEN FROM CMR
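
A note on the `equal_nan` guard introduced above: `np.array_equal(..., equal_nan=True)` routes the comparison through `np.isnan`, which is undefined for byte-string (`'S'` dtype) arrays, so the test now disables NaN-equality when comparing string data. A minimal sketch of the failure mode it avoids (behavior varies by NumPy version; recent releases short-circuit dtypes that cannot hold NaN):

```python
import numpy as np

# Float arrays: equal_nan=True lets NaN compare equal to NaN.
floats = np.array([1.0, np.nan])
assert np.array_equal(floats, floats, equal_nan=True)

# Byte-string arrays: np.isnan is undefined for 'S' dtype, so many NumPy
# versions raise TypeError here, hence the dtype.kind == 'S' check above.
strings = np.array([b"a", b"b"], dtype="S1")
try:
    np.array_equal(strings, strings, equal_nan=True)
except TypeError:
    pass  # expected on NumPy versions without the non-float short-circuit
assert np.array_equal(strings, strings)  # plain comparison is always safe
```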
@@ -173,7 +181,7 @@ def download_file(url, local_path, headers):
         with open(local_path, 'wb') as file:
             for chunk in response.iter_content(chunk_size=8192):
                 file.write(chunk)
-        print("Original File downloaded successfully.")
+        print("Original File downloaded successfully. " + local_path)
     else:
         print(f"Failed to download the file. Status code: {response.status_code}")
 
@@ -217,6 +225,7 @@ def test(collection_id, venue):
     print('\nDone downloading.')
 
     filename = file_names[0]
+
     # Handle time dimension and variables dropping
     merge_dataset = nc.Dataset(filename, 'r')
 
@@ -233,34 +242,16 @@
     }
 
     original_files = merge_dataset.variables['subset_files']
+    history_json = json.loads(merge_dataset.history_json)
     assert len(original_files) == max_results
 
-    for file in original_files:
-
-        # if the file name end in an alphabet so we know there is some extension
-        if file[-1].isalpha():
-            file_name = file.rsplit(".", 1)[0]
-        else:
-            file_name = file
-
-        print(file_name)
-        cmr_query = f"{cmr_base_url}{file_name}&collection_concept_id={collection_id}"
-        print(cmr_query)
-
-        response = requests.get(cmr_query, headers=headers)
-
-        result = response.json()
-        links = result.get('items')[0].get('umm').get('RelatedUrls')
-        for link in links:
-            if link.get('Type') == 'GET DATA':
-                data_url = link.get('URL')
-                parsed_url = urlparse(data_url)
-                local_file_name = os.path.basename(parsed_url.path)
-                download_file(data_url, local_file_name, headers)
+    for url in history_json[0].get("derived_from"):
+        local_file_name = os.path.basename(url)
+        download_file(url, local_file_name, headers)
 
     for i, file in enumerate(original_files):
         origin_dataset = nc.Dataset(file)
-        verify_groups(merge_dataset, origin_dataset, i)
+        verify_groups(merge_dataset, origin_dataset, i, file=file)
 
 
 def run():
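The test now discovers its source granules straight from the merged file's `history_json` global attribute rather than querying CMR `RelatedUrls` and guessing filenames from extensions. A minimal sketch of the new read path, assuming a hypothetical local merged granule `merged.nc4` and the `history_json` schema used above (a JSON array whose first entry carries a `derived_from` list of source URLs):

```python
import json
import os

import netCDF4 as nc

merged = nc.Dataset("merged.nc4", "r")  # hypothetical local merged granule
history = json.loads(merged.history_json)

# Each derived_from entry is the URL of one input granule; its basename
# doubles as the local download target, mirroring the loop in test().
for url in history[0].get("derived_from"):
    local_name = os.path.basename(url)
    print(f"would fetch {url} -> {local_name}")  # i.e. download_file(url, local_name, headers)
```

This drops one CMR round trip per input file and removes the fragile "does the name end in a letter" extension check from the deleted block.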
29 changes: 29 additions & 0 deletions cmr/ops_associations.txt
@@ -70,3 +70,32 @@ C2274919541-POCLOUD
 C2205620319-POCLOUD
 C2183155461-POCLOUD
 C2208421887-POCLOUD
+C2628595723-POCLOUD
+C2746966926-POCLOUD
+C2628600898-POCLOUD
+C2746966928-POCLOUD
+C2746966927-POCLOUD
+C2754895884-POCLOUD
+C2746966657-POCLOUD
+C2628598809-POCLOUD
+C2799465529-POCLOUD
+C2799465526-POCLOUD
+C2799465507-POCLOUD
+C2799465497-POCLOUD
+C2799465538-POCLOUD
+C2799465544-POCLOUD
+C2799465542-POCLOUD
+C2799465428-POCLOUD
+C2799438350-POCLOUD
+C2799438351-POCLOUD
+C2799438353-POCLOUD
+C2296989388-POCLOUD
+C2205553958-POCLOUD
+C2706513160-POCLOUD
+C2147480877-POCLOUD
+C2147478146-POCLOUD
+C2730520815-POCLOUD
+C2799465509-POCLOUD
+C2799465518-POCLOUD
+C2799465522-POCLOUD
+C2068529568-POCLOUD
37 changes: 37 additions & 0 deletions cmr/uat_associations.txt
@@ -80,3 +80,40 @@ C1238621102-POCLOUD
 C1240739713-POCLOUD
 C1243175554-POCLOUD
 C1245295750-POCLOUD
+C1256783381-POCLOUD
+C1259115177-POCLOUD
+C1256783388-POCLOUD
+C1259115167-POCLOUD
+C1259115178-POCLOUD
+C1256783382-POCLOUD
+C1259115166-POCLOUD
+C1261072655-POCLOUD
+C1261072658-POCLOUD
+C1261072648-POCLOUD
+C1261072646-POCLOUD
+C1261072656-POCLOUD
+C1261072645-POCLOUD
+C1261072659-POCLOUD
+C1261072654-POCLOUD
+C1254854453-LARC_CLOUD
+C1254855648-LARC_CLOUD
+C1254854962-LARC_CLOUD
+C1247485682-LARC_CLOUD
+C1247485690-LARC_CLOUD
+C1247485685-LARC_CLOUD
+C1242274079-POCLOUD
+C1240739526-POCLOUD
+C1261072651-POCLOUD
+C1261072650-POCLOUD
+C1242274070-POCLOUD
+C1240739691-POCLOUD
+C1257081729-POCLOUD
+C1261072661-POCLOUD
+C1261072652-POCLOUD
+C1261072662-POCLOUD
+C1261072660-POCLOUD
+C1261645986-LARC_CLOUD
+C1258237266-POCLOUD
+C1259966654-POCLOUD
+C1258237267-POCLOUD
+C1240739686-POCLOUD
10 changes: 5 additions & 5 deletions podaac/merger/harmony/download_worker.py
@@ -44,8 +44,8 @@ def multi_core_download(urls, destination_dir, access_token, cfg, process_count=
     url_queue = manager.Queue(len(urls))
     path_list = manager.list()
 
-    for url in urls:
-        url_queue.put(url)
+    for iurl, url in enumerate(urls):
+        url_queue.put((iurl, url))
 
     # Spawn worker processes
     processes = []
@@ -64,7 +64,7 @@
 
     path_list = deepcopy(path_list) # ensure GC can cleanup multiprocessing
 
-    return [Path(path) for path in path_list]
+    return [Path(path) for ipath, path in sorted(path_list)]
 
 
 def _download_worker(url_queue, path_list, destination_dir, access_token, cfg):
@@ -91,7 +91,7 @@
 
     while not url_queue.empty():
         try:
-            url = url_queue.get_nowait()
+            iurl, url = url_queue.get_nowait()
         except queue.Empty:
             break
 
@@ -105,4 +105,4 @@
         else:
             logger.warning('Origin filename could not be assertained - %s', url)
 
-        path_list.append(str(path))
+        path_list.append((iurl, str(path)))
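
The issue #96 fix is a tag-and-sort pattern: every URL is enqueued with its input index, workers append `(index, path)` pairs in whatever order they happen to finish, and a final `sorted()` restores input order. A self-contained sketch of the pattern (the worker body and file names are stand-ins, not the real download logic):

```python
import queue
from multiprocessing import Manager, Process


def _worker(url_queue, path_list):
    """Drain the queue, appending (index, result) pairs in completion order."""
    while not url_queue.empty():
        try:
            iurl, url = url_queue.get_nowait()
        except queue.Empty:
            break
        path_list.append((iurl, f"/tmp/file_{iurl}.nc"))  # stand-in for a download


if __name__ == "__main__":
    manager = Manager()
    url_queue = manager.Queue()
    path_list = manager.list()

    for iurl, url in enumerate(["url_a", "url_b", "url_c"]):
        url_queue.put((iurl, url))

    processes = [Process(target=_worker, args=(url_queue, path_list)) for _ in range(2)]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()

    # Tuples sort by their first element, so ordering by the index tag
    # recovers input order no matter which worker finished first.
    print([path for _, path in sorted(path_list)])
```

Because tuples compare index-first, `sorted(path_list)` puts the paths back in exactly the order the URLs went in, which is what `multi_core_download` now returns.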