Epic: Refactor the Crawler for modularity and better maintainability #153
Labels: enhancement (New feature or request)
Problem statement
Introduction
The crawl module is responsible for querying GCP resources via the REST API. As of this writing, crawl.py comprises 29 methods that retrieve information about different GCP resources. However, the list of supported resources is expected to expand significantly, so a modular crawler package is essential to keep the scanning functionality scalable and maintainable over time.
Moreover, there is quite a bit of repetitive code in the system. Minimizing this repetition and giving the code a single point of maintenance would enhance the overall robustness of the codebase.
Explanation with references
For instance, the methods in crawl.py take either a discovery.Resource or a Credentials object. The credentials are then used to create a client of type discovery.Resource so that we can query the resources from GCP. For example, the get_managed_zones() function takes credentials and creates the following discovery.Resource to call the discovery API. Reference code can be found [here](https://github.com/google/gcp_scanner/blob/c2b85cc1023498c9a5a507abce9d928bf02592e9/src/gcp_scanner/crawl.py#L396) in the main gcp_scanner codebase. If we look at another code block, we can see that this same client-building code is repeated. Hence, this code needs to be refactored and modularized using the object-oriented paradigm.
Another example: building the DNS service: occurrence 1, occurrence 2
In the above examples and many other places, we need to build a client and then pass it into the crawl function. However, the current process of building the client is neither consistent nor well maintained. For example, here gke_client is passed to the crawl.get_gke_clusters() function, whereas in this other case the credentials are passed directly to the crawl function. Hence, it is necessary to maintain a consistent way to implement client so that it becomes straightforward for new contributors to implement additional clients and GCP resources.
High-Level Acceptance criteria
This refactoring plan should achieve the following objectives.
Proposed solution
Crawler
A new crawler package will be implemented, located at src/gcp_scanner/crawler. Next, the existing crawl methods will be categorized according to their resource type and placed into their own modules. The Factory Design Pattern will be implemented to increase the scalability of the code. For example, the following methods fetch information on compute resources.
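The factory shape described above could be sketched as follows. All names here (CrawlerFactory, ComputeCrawler, DNSCrawler) are assumptions for illustration, not the final gcp_scanner API: each resource category gets its own crawler class, and the factory maps a resource name to the right class so scanner code never hard-codes a class.

```python
# Sketch of the Factory Design Pattern for crawlers; class names are
# hypothetical, and crawl() returns a stub dict instead of real API results.

class ComputeCrawler:
    def crawl(self, project_id, client):
        # would query instances, disks, images, etc. via the client
        return {"resource": "compute", "project": project_id}

class DNSCrawler:
    def crawl(self, project_id, client):
        # would query managed zones, policies, etc. via the client
        return {"resource": "dns", "project": project_id}

class CrawlerFactory:
    # Adding a new resource type only requires registering its class here.
    _registry = {
        "compute": ComputeCrawler,
        "dns": DNSCrawler,
    }

    @classmethod
    def create(cls, name):
        try:
            return cls._registry[name]()
        except KeyError:
            raise ValueError(f"No crawler registered for {name!r}")

crawler = CrawlerFactory.create("compute")
print(crawler.crawl("my-project", client=None))
# -> {'resource': 'compute', 'project': 'my-project'}
```

The registry dict is the extension point: a contributor adds a module with one crawler class and one registry entry, with no changes to the scanning loop.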
Expected modules
A category-wise list of the existing functions that need to be put into their own module is given below.
Compute Resources
GCP App Resources
GCP storage resources
GCP DNS resources
GKE resources
SQL instances
Bigquery
PubSub Subscriptions
CloudFunctions
Bigtable
Spanner
FileStore
KMS
Endpoints
Serviceusage
Sourcerepo
Cloudresourcemanager
Client
We also need to refactor and implement a client module to reduce code repetition. Here, using the factory design pattern, we can refactor the creation of the discovery.Resource client.
Idea on refactoring repetitive if statements in scanner.py
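A hedged sketch of such a client factory follows. The ClientFactory class and its service table are assumptions, not the existing gcp_scanner API; the (service, version) pairs mirror googleapiclient's `discovery.build(serviceName, version, credentials=...)` call, and the build function is injected so the example runs without network access.

```python
# Hypothetical ClientFactory centralizing discovery.Resource creation.
# Service names/versions are examples only.

class ClientFactory:
    _services = {
        "dns": ("dns", "v1"),
        "compute": ("compute", "v1"),
        "gke": ("container", "v1"),
    }

    def __init__(self, build_fn):
        # build_fn is injected so tests can stub googleapiclient's
        # discovery.build; in production it would be discovery.build itself.
        self._build = build_fn

    def get_client(self, name, credentials):
        service, version = self._services[name]
        return self._build(service, version, credentials=credentials)

factory = ClientFactory(lambda s, v, credentials: (s, v))
print(factory.get_client("dns", credentials=None))  # -> ('dns', 'v1')
```

With one factory, every crawl function receives a ready-made client instead of deciding for itself whether to accept credentials or a client, which addresses the inconsistency noted above.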
Testing
Accommodate existing unit tests with the new modular structure
Task List
Client
Could not refactor the following methods in crawl.py; the signatures of these functions are slightly different.
Crawl
refactoring repetitive if statements
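One possible shape for the "repetitive if statements" task is a dispatch table: instead of one if block per resource in scanner.py, a dict maps each resource name to its crawl function, and the scan loop iterates over whatever is enabled. The names below are illustrative stubs, not the existing scanner.py code.

```python
# Sketch: a dispatch table replacing a chain of per-resource if statements.
# The handlers are stubs standing in for the real crawl functions.

CRAWL_DISPATCH = {
    "compute_instances": lambda project, client: ["instance-a"],
    "managed_zones": lambda project, client: ["zone-a"],
}

def scan(project, client, enabled):
    results = {}
    for name in enabled:
        handler = CRAWL_DISPATCH.get(name)
        if handler:  # silently skip resource names with no handler
            results[name] = handler(project, client)
    return results

print(scan("my-project", None, ["managed_zones"]))
# -> {'managed_zones': ['zone-a']}
```

Adding a resource then means adding one dispatch entry rather than another if branch, which pairs naturally with the crawler factory proposed above.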
Refinement tickets