Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/dynamic aws param store to airflow dag #48

Merged
merged 14 commits into from
Aug 20, 2024

Conversation

jayckaiser
Copy link
Collaborator

Feature: Dynamic AWSParamStoreToAirflowDAG Paths

Description & motivation

The previous implementation of the AWSParamStoreToAirflowDAG has a serious limitation: the keys in Parameter Store had to be structured as {prefix}/{tenant_code}/secret. This was the structure of keys historically, but newer batches of keys have placed tenant in the middle of the key (i.e., {prefix}/{tenant_code}/{suffix}/secret).

This PR updates the DAG and associated SSM helper classes to accept either structure when building out connections.

PR Merge Priority:

  • Low
  • Medium
  • High

This DAG is really only used in South Carolina and GSN to my understanding. I need this feature to populate the SC 2025 connections in Airflow, but I can temporarily checkout the branch in prod while it's under review.

Changes to existing files:

  • ea_airflow_util/callables/ssm.py: Add conditionals to handle original and new path formats.
  • ea_airflow_util/dags/aws_param_store_to_airflow_dag.py: Improve DAG logging.

Tests and QC done:

Ran successfully in South Carolina dev.

Future ToDos & Questions:

It is likely that this approach of retrieving credentials for Ed-Fi ODSes will be deprecated very soon.

Copy link
Contributor

@rlittle08 rlittle08 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left one suggestion for error handling, but not sure if it's even a real error case. If this has been tested in SC already, LGTM


# If a wildcard is in the prefix string, extract and set the tenant code as the first path element.
if self.TENANT_REPR in self._prefix:
inferred_tenant = re.search(self._prefix.replace(self.TENANT_REPR, "(.*)"), p['Name']).group(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would there every be a case where no TENANT_REPR doesn't exist anywhere in p['Name']? If so, this would error, I think

also, this line 104 could use some more friendly commenting, but that's a nitpick :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more comments. I would like to come up with an example of the situation you describe so we can work around it.

@jayckaiser jayckaiser changed the base branch from main to rc/0.3.2 August 20, 2024 22:44
@jayckaiser jayckaiser merged commit dd02571 into rc/0.3.2 Aug 20, 2024
@jayckaiser jayckaiser deleted the feature/dynamic_aws_param_store_to_airflow_dag branch September 5, 2024 18:58
jayckaiser added a commit that referenced this pull request Sep 5, 2024
* Update project version metadata.

* Feature/Sharefile custom users DAG & S3 to Snowflake standardization (#50)

* add sharefile dag and helpers

* use S3ToSnowflake operator

* use correct operator

* fix metadata column name

* datetime issues

* rename to num_expected_files and make use optional

* Feature/simplify sharefile to disk (#49)

* Add type hints and comments; make ds_nodash and ts_nodash optional; reduce number of loops and simply variable-referencing.

* Simplify Regex call.

* Deprecate ds_nodash and ts_nodash from sharefile callable.

* Raise HTTP errors for all ShareFile hook exceptions, instead of generic Airflow exceptions.

* Update .gitignore.

* add logging when deleting from sharefile

* Add kwargs to sharefile_to_disk() for Airflow context paramter overflow.

---------

Co-authored-by: johncmerfeld <[email protected]>

* Feature/dynamic aws param store to airflow dag (#48)

* Allow wildcards to be passed to SSM prefix strings (untested).

* BugFix (wrong datatype for ParameterFilters).

* Bugfix (revert logic when not using wildcards).

* Minor cleanup.

* Add 1 second sleep between page gets.

* Ignore unexpected keys in collected connections.

* Attempt to fix parameter filters to use both Recursive and Contains options.

* Clean up error handling in AWS ParamStore DAG.

* Cleanup to try to use a single filter.

* Output missing paramstore keys in DAG.

* Bugfix: set.difference()

* Update metadata files.

* Update ssm.py

* Update CHANGELOG with missing updates.

* Callable: S3 to Sharefile (#51)

* init

* tested code

* update docs

* use None instead of empty string

* update allshared comment

---------

Co-authored-by: Jay Kaiser <[email protected]>

* Hotfix/support files in ftp (#52)

* Overload FTP.download_all() to support files.

* Fix file passed as remote_dir.

* Update CHANGELOG.

---------

Co-authored-by: Samantha LeBlanc <[email protected]>
Co-authored-by: johncmerfeld <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants