Commit e33a137

Avantol13, Avantol13-machine-user, and mcannalte authored
(PXP-7855): Feat/merge refactor (#85)
* chore(tests): add edge cases to tests and expected output, ensure no duplicate records
* feat(merge): refactored merge code and updated test calls to correctly handle duplicates and other edge cases
* Apply automatic documentation changes
* chore(merge): refactor again, breakout functions, improve readability
* Apply automatic documentation changes
* chore(tests): handle more edge cases when no guid is specified
* fix(merge): edge case with multiple empty guids, need to handle updating all previous records
* Apply automatic documentation changes
* chore(merge): don't make copies unnecessarily, cleanup getting values from dict
* Apply automatic documentation changes
* chore(merge): cleaner updating of headers
* Apply automatic documentation changes
* fix(merge): add headers back where needed
* Apply automatic documentation changes
* feat(merge-refactor-suggestion): remove headers arg from _get_updated_records, remove unused reference to "existing_urls" variable (#86)
  Co-authored-by: Matthew Cannalte <[email protected]>
* Apply automatic documentation changes
* fix(merge): ensure no duplicates when GUID="", more test cases for handling commas and spaces in file names
* fix(merge): handle case with multiple duplicates but no guid
* Apply automatic documentation changes

Co-authored-by: Alexander VT <[email protected]>
Co-authored-by: Matthew Cannalte <[email protected]>
1 parent 0d16282 commit e33a137
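
The commit message lists the edge cases the refactor targets: duplicate records, rows with an empty GUID, and several empty-GUID rows that must update every earlier record. The merge code itself is not shown on this page, so the following is only a rough illustration of how such a merge could behave. `merge_rows`, the row layout (`guid`, `md5`, `urls`), and the rule that GUID-less rows fold into every record sharing their md5 are all assumptions made for this example, not the code from this commit.

```python
def merge_rows(rows):
    """Merge manifest rows that share an md5 (hypothetical helper, not the gen3 API).

    Each row is a dict with "guid", "md5" and a space-separated "urls" string.
    """
    merged = {}  # md5 -> {guid: set of urls}

    # Pass 1: every row that has a GUID becomes (or extends) its own record.
    for row in rows:
        if row["guid"]:
            by_guid = merged.setdefault(row["md5"], {})
            by_guid.setdefault(row["guid"], set()).update(row["urls"].split())

    # Pass 2 (assumed rule): a row without a GUID updates all records that
    # share its md5. If there are none, all such rows collapse into a single
    # GUID-less record instead of producing duplicates.
    for row in rows:
        if not row["guid"]:
            by_guid = merged.setdefault(row["md5"], {"": set()})
            for urls in by_guid.values():
                urls.update(row["urls"].split())

    return [
        {"guid": guid, "md5": md5, "urls": " ".join(sorted(urls))}
        for md5, by_guid in merged.items()
        for guid, urls in by_guid.items()
    ]


# Made-up example rows: two GUIDs and two empty-GUID rows for one md5,
# plus a duplicated empty-GUID row for another md5.
rows = [
    {"guid": "", "md5": "abc", "urls": "s3://b/f"},
    {"guid": "g1", "md5": "abc", "urls": "gs://b/f"},
    {"guid": "g2", "md5": "abc", "urls": ""},
    {"guid": "", "md5": "abc", "urls": "s3://b/f"},
    {"guid": "", "md5": "def", "urls": "s3://b/x"},
    {"guid": "", "md5": "def", "urls": "s3://b/x"},
]
merged_rows = merge_rows(rows)
```

In this sketch the six input rows reduce to three records: one per GUID for the first md5, each carrying the URL from the GUID-less rows, and a single GUID-less record for the second md5.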

18 files changed, +364 -162 lines changed

.secrets.baseline

Lines changed: 7 additions & 7 deletions
@@ -3,7 +3,7 @@
 "files": "poetry.lock",
 "lines": null
 },
-"generated_at": "2021-04-16T20:42:51Z",
+"generated_at": "2021-04-28T19:37:37Z",
 "plugins_used": [
 {
 "name": "AWSKeyDetector"
@@ -176,37 +176,37 @@
 {
 "hashed_secret": "96c9184fb19c9c1618ccf44d141f8029a739891c",
 "is_verified": false,
-"line_number": 115,
+"line_number": 121,
 "type": "Hex High Entropy String"
 },
 {
 "hashed_secret": "e1da93616713812cb50e0ac845b1e9e305d949f1",
 "is_verified": false,
-"line_number": 311,
+"line_number": 317,
 "type": "Hex High Entropy String"
 },
 {
 "hashed_secret": "47f42f4c34fddab383b817e689dc0fb75af81266",
 "is_verified": false,
-"line_number": 335,
+"line_number": 341,
 "type": "Hex High Entropy String"
 },
 {
 "hashed_secret": "300d95dd5d30ab6928ffda6c08c6a129a23e5b39",
 "is_verified": false,
-"line_number": 359,
+"line_number": 365,
 "type": "Hex High Entropy String"
 },
 {
 "hashed_secret": "f9e664db75c7f23a299b0b055c10e08d47073e93",
 "is_verified": false,
-"line_number": 421,
+"line_number": 427,
 "type": "Hex High Entropy String"
 },
 {
 "hashed_secret": "7c35c215b326b9463b669b657c1ff9873ff53d9a",
 "is_verified": false,
-"line_number": 446,
+"line_number": 452,
 "type": "Hex High Entropy String"
 }
 ]
0 Bytes
Binary file not shown.
-4 Bytes
Binary file not shown.
0 Bytes
Binary file not shown.

docs/_build/html/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

docs/_build/html/tools/indexing.html

Lines changed: 1 addition & 1 deletion
@@ -364,7 +364,7 @@ <h1>Indexing Tools<a class="headerlink" href="#indexing-tools" title="Permalink
 
 <dl class="py function">
 <dt id="gen3.tools.indexing.verify_manifest.async_verify_object_manifest">
-<em class="property"><span class="pre">async</span> </em><code class="sig-prename descclassname"><span class="pre">gen3.tools.indexing.verify_manifest.</span></code><code class="sig-name descname"><span class="pre">async_verify_object_manifest</span></code><span class="sig-paren">(</span><em class="sig-param"><span class="pre">commons_url</span></em>, <em class="sig-param"><span class="pre">manifest_file</span></em>, <em class="sig-param"><span class="pre">max_concurrent_requests=24</span></em>, <em class="sig-param"><span class="pre">manifest_row_parsers={'acl':</span> <span class="pre">&lt;function</span> <span class="pre">_get_acl_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'authz':</span> <span class="pre">&lt;function</span> <span class="pre">_get_authz_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'file_name':</span> <span class="pre">&lt;function</span> <span class="pre">_get_file_name_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'file_size':</span> <span class="pre">&lt;function</span> <span class="pre">_get_file_size_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'guid':</span> <span class="pre">&lt;function</span> <span class="pre">_get_guid_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'md5':</span> <span class="pre">&lt;function</span> <span class="pre">_get_md5_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'urls':</span> <span class="pre">&lt;function</span> <span class="pre">_get_urls_from_row&gt;}</span></em>, <em class="sig-param"><span class="pre">manifest_file_delimiter=None</span></em>, <em class="sig-param"><span class="pre">output_filename='verify-manifest-errors-1619452575.9644923.log'</span></em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/gen3/tools/indexing/verify_manifest.html#async_verify_object_manifest"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#gen3.tools.indexing.verify_manifest.async_verify_object_manifest" title="Permalink to this definition"></a></dt>
+<em class="property"><span class="pre">async</span> </em><code class="sig-prename descclassname"><span class="pre">gen3.tools.indexing.verify_manifest.</span></code><code class="sig-name descname"><span class="pre">async_verify_object_manifest</span></code><span class="sig-paren">(</span><em class="sig-param"><span class="pre">commons_url</span></em>, <em class="sig-param"><span class="pre">manifest_file</span></em>, <em class="sig-param"><span class="pre">max_concurrent_requests=24</span></em>, <em class="sig-param"><span class="pre">manifest_row_parsers={'acl':</span> <span class="pre">&lt;function</span> <span class="pre">_get_acl_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'authz':</span> <span class="pre">&lt;function</span> <span class="pre">_get_authz_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'file_name':</span> <span class="pre">&lt;function</span> <span class="pre">_get_file_name_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'file_size':</span> <span class="pre">&lt;function</span> <span class="pre">_get_file_size_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'guid':</span> <span class="pre">&lt;function</span> <span class="pre">_get_guid_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'md5':</span> <span class="pre">&lt;function</span> <span class="pre">_get_md5_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'urls':</span> <span class="pre">&lt;function</span> <span class="pre">_get_urls_from_row&gt;}</span></em>, <em class="sig-param"><span class="pre">manifest_file_delimiter=None</span></em>, <em class="sig-param"><span class="pre">output_filename='verify-manifest-errors-1619720217.934012.log'</span></em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/gen3/tools/indexing/verify_manifest.html#async_verify_object_manifest"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#gen3.tools.indexing.verify_manifest.async_verify_object_manifest" title="Permalink to this definition"></a></dt>
 <dd><p>Verify all file object records into a manifest csv</p>
 <dl class="field-list simple">
 <dt class="field-odd">Parameters</dt>

docs/_build/html/tools/metadata.html

Lines changed: 1 addition & 1 deletion
@@ -102,7 +102,7 @@ <h1>Metadata Tools<a class="headerlink" href="#metadata-tools" title="Permalink
 
 <dl class="py function">
 <dt id="gen3.tools.metadata.ingest_manifest.async_ingest_metadata_manifest">
-<em class="property"><span class="pre">async</span> </em><code class="sig-prename descclassname"><span class="pre">gen3.tools.metadata.ingest_manifest.</span></code><code class="sig-name descname"><span class="pre">async_ingest_metadata_manifest</span></code><span class="sig-paren">(</span><em class="sig-param"><span class="pre">commons_url</span></em>, <em class="sig-param"><span class="pre">manifest_file</span></em>, <em class="sig-param"><span class="pre">metadata_source</span></em>, <em class="sig-param"><span class="pre">auth=None</span></em>, <em class="sig-param"><span class="pre">max_concurrent_requests=24</span></em>, <em class="sig-param"><span class="pre">manifest_row_parsers={'guid_for_row':</span> <span class="pre">&lt;function</span> <span class="pre">_get_guid_for_row&gt;</span></em>, <em class="sig-param"><span class="pre">'indexed_file_object_guid':</span> <span class="pre">&lt;function</span> <span class="pre">_query_for_associated_indexd_record_guid&gt;}</span></em>, <em class="sig-param"><span class="pre">manifest_file_delimiter=None</span></em>, <em class="sig-param"><span class="pre">output_filename='ingest-metadata-manifest-errors-1619452576.3926728.log'</span></em>, <em class="sig-param"><span class="pre">get_guid_from_file=True</span></em>, <em class="sig-param"><span class="pre">metadata_type=None</span></em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/gen3/tools/metadata/ingest_manifest.html#async_ingest_metadata_manifest"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#gen3.tools.metadata.ingest_manifest.async_ingest_metadata_manifest" title="Permalink to this definition"></a></dt>
+<em class="property"><span class="pre">async</span> </em><code class="sig-prename descclassname"><span class="pre">gen3.tools.metadata.ingest_manifest.</span></code><code class="sig-name descname"><span class="pre">async_ingest_metadata_manifest</span></code><span class="sig-paren">(</span><em class="sig-param"><span class="pre">commons_url</span></em>, <em class="sig-param"><span class="pre">manifest_file</span></em>, <em class="sig-param"><span class="pre">metadata_source</span></em>, <em class="sig-param"><span class="pre">auth=None</span></em>, <em class="sig-param"><span class="pre">max_concurrent_requests=24</span></em>, <em class="sig-param"><span class="pre">manifest_row_parsers={'guid_for_row':</span> <span class="pre">&lt;function</span> <span class="pre">_get_guid_for_row&gt;</span></em>, <em class="sig-param"><span class="pre">'indexed_file_object_guid':</span> <span class="pre">&lt;function</span> <span class="pre">_query_for_associated_indexd_record_guid&gt;}</span></em>, <em class="sig-param"><span class="pre">manifest_file_delimiter=None</span></em>, <em class="sig-param"><span class="pre">output_filename='ingest-metadata-manifest-errors-1619720218.4036705.log'</span></em>, <em class="sig-param"><span class="pre">get_guid_from_file=True</span></em>, <em class="sig-param"><span class="pre">metadata_type=None</span></em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/gen3/tools/metadata/ingest_manifest.html#async_ingest_metadata_manifest"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#gen3.tools.metadata.ingest_manifest.async_ingest_metadata_manifest" title="Permalink to this definition"></a></dt>
 <dd><p>Ingest all metadata records into a manifest csv</p>
 <dl class="field-list simple">
 <dt class="field-odd">Parameters</dt>

gen3/tools/indexing/manifest_columns.py

Lines changed: 1 addition & 1 deletion
@@ -180,7 +180,7 @@ def _parse_multiple_values(values):
     ['/a', '/b']
     ['/a', '/b']
     """
-    values = values.translate(values.maketrans("[],\"'", "     "))
+    values = values.translate(values.maketrans("[]\"'", "    "))
     return values.split()

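The change to `_parse_multiple_values` drops the comma from the set of characters that are turned into spaces before splitting, which fits the commit message's note about handling commas in file names. A minimal sketch of the difference, assuming the replacement string holds one space per replaced character (`str.maketrans` requires both arguments to have equal length). The names `parse_before` and `parse_after` and the sample URLs are made up for this illustration:

```python
# Translation tables before and after the change. Assumption: each listed
# character maps to a single space, as str.maketrans needs equal lengths.
STRIP_WITH_COMMA = str.maketrans("[],\"'", " " * 5)
STRIP_WITHOUT_COMMA = str.maketrans("[]\"'", " " * 4)


def parse_before(values):
    """Old behavior: brackets, quotes AND commas become separators."""
    return values.translate(STRIP_WITH_COMMA).split()


def parse_after(values):
    """New behavior: commas are left untouched."""
    return values.translate(STRIP_WITHOUT_COMMA).split()


# Hypothetical input: two URLs whose file names contain a comma.
urls = "['s3://bucket/file,1.txt' 'gs://bucket/file,1.txt']"

print(parse_before(urls))  # the comma splits each file name in two
print(parse_after(urls))   # each URL stays in one piece
```

One consequence visible in this sketch: with the comma no longer stripped, a comma-separated input such as `"/a, /b"` yields `'/a,'` and `'/b'`, so values need to be separated by whitespace rather than commas.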