Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

package: Add --exclude option to exclude components from being started. #492

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

junhaoliao
Copy link
Collaborator

Description

When developing / testing certain components which provide a live reload debug server, other CLP components also need to be started for functionality verifications. Previously, developers have been starting the whole CLP package and later stopping the component(s) to be debugged, to avoid resource (e.g. TCP host + port) conflicts. The approach works but has proven to be inconvenience and prone to manual errors. This PR:

  1. Add --exclude option to exclude components from being started.

Validation performed

  1. Built the package.
    cd <clp-root>
    task
    
  2. Started the package, excluding components webui and log_viewer_webui.
    cd build/clp-package/sbin
    ./start-clp.sh --exclude webui log_viewer_webui
    
  3. Observed all components started except webui and log_viewer_webui.

@junhaoliao junhaoliao requested a review from haiqi96 July 23, 2024 20:43
Comment on lines 1101 to 1132
if target in (ALL_TARGET_NAME, DB_COMPONENT_NAME) and DB_COMPONENT_NAME not in exclude:
start_db(instance_id, clp_config, conf_dir)
if target in (ALL_TARGET_NAME, CONTROLLER_TARGET_NAME, DB_COMPONENT_NAME):
if target in (ALL_TARGET_NAME, CONTROLLER_TARGET_NAME, DB_COMPONENT_NAME) and CONTROLLER_TARGET_NAME not in exclude and DB_COMPONENT_NAME not in exclude:
create_db_tables(instance_id, clp_config, container_clp_config, mounts)
if target in (ALL_TARGET_NAME, CONTROLLER_TARGET_NAME, QUEUE_COMPONENT_NAME):
if target in (ALL_TARGET_NAME, CONTROLLER_TARGET_NAME, QUEUE_COMPONENT_NAME) and CONTROLLER_TARGET_NAME not in exclude and QUEUE_COMPONENT_NAME not in exclude:
start_queue(instance_id, clp_config)
if target in (ALL_TARGET_NAME, CONTROLLER_TARGET_NAME, REDIS_COMPONENT_NAME):
if target in (ALL_TARGET_NAME, CONTROLLER_TARGET_NAME, REDIS_COMPONENT_NAME) and CONTROLLER_TARGET_NAME not in exclude and REDIS_COMPONENT_NAME not in exclude:
start_redis(instance_id, clp_config, conf_dir)
if target in (ALL_TARGET_NAME, RESULTS_CACHE_COMPONENT_NAME):
if target in (ALL_TARGET_NAME, RESULTS_CACHE_COMPONENT_NAME) and RESULTS_CACHE_COMPONENT_NAME not in exclude:
start_results_cache(instance_id, clp_config, conf_dir)
if target in (ALL_TARGET_NAME, CONTROLLER_TARGET_NAME, RESULTS_CACHE_COMPONENT_NAME):
create_results_cache_indices(instance_id, clp_config, container_clp_config, mounts)
if target in (
ALL_TARGET_NAME,
CONTROLLER_TARGET_NAME,
COMPRESSION_SCHEDULER_COMPONENT_NAME,
):
) and CONTROLLER_TARGET_NAME not in exclude and COMPRESSION_SCHEDULER_COMPONENT_NAME not in exclude:
start_compression_scheduler(instance_id, clp_config, container_clp_config, mounts)
if target in (ALL_TARGET_NAME, CONTROLLER_TARGET_NAME, QUERY_SCHEDULER_COMPONENT_NAME):
if target in (ALL_TARGET_NAME, CONTROLLER_TARGET_NAME, QUERY_SCHEDULER_COMPONENT_NAME) and CONTROLLER_TARGET_NAME not in exclude and QUERY_SCHEDULER_COMPONENT_NAME not in exclude:
start_query_scheduler(instance_id, clp_config, container_clp_config, mounts)
if target in (ALL_TARGET_NAME, COMPRESSION_WORKER_COMPONENT_NAME):
if target in (ALL_TARGET_NAME, COMPRESSION_WORKER_COMPONENT_NAME) and COMPRESSION_WORKER_COMPONENT_NAME not in exclude:
start_compression_worker(
instance_id, clp_config, container_clp_config, num_workers, mounts
)
if target in (ALL_TARGET_NAME, QUERY_WORKER_COMPONENT_NAME):
if target in (ALL_TARGET_NAME, QUERY_WORKER_COMPONENT_NAME) and QUERY_WORKER_COMPONENT_NAME not in exclude:
start_query_worker(instance_id, clp_config, container_clp_config, num_workers, mounts)
if target in (ALL_TARGET_NAME, REDUCER_COMPONENT_NAME):
if target in (ALL_TARGET_NAME, REDUCER_COMPONENT_NAME) and REDUCER_COMPONENT_NAME not in exclude:
start_reducer(instance_id, clp_config, container_clp_config, num_workers, mounts)
if target in (ALL_TARGET_NAME, WEBUI_COMPONENT_NAME):
if target in (ALL_TARGET_NAME, WEBUI_COMPONENT_NAME) and WEBUI_COMPONENT_NAME not in exclude:
start_webui(instance_id, clp_config, mounts)
if target in (ALL_TARGET_NAME, LOG_VIEWER_WEBUI_COMPONENT_NAME):
if target in (ALL_TARGET_NAME, LOG_VIEWER_WEBUI_COMPONENT_NAME) and LOG_VIEWER_WEBUI_COMPONENT_NAME not in exclude:
start_log_viewer_webui(instance_id, clp_config, container_clp_config, mounts)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@haiqi96
These look simple but are definitely less clean and scalable. I am thinking about refactoring this into a dict which stores {component_name: (method, (args1, args2))} pairs and a for-loop to go over the dict. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it's ok to have the line that launches the component. the part that introduce complexity is the if statement, especially we have some component -> COMPONENT_NAME / GROUP_MAPPING but it's not very clear what are all the mappings. So I would prefer to approach this from simplifying the if statements.

Copy link
Collaborator Author

@junhaoliao junhaoliao Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I agree. Abstracting the if-condition checks into a single function call per if should keep the code simple and clean.

@@ -1096,37 +1098,37 @@ def main(argv):
conf_dir = clp_home / "etc"

# Start components
if target in (ALL_TARGET_NAME, DB_COMPONENT_NAME):
if target in (ALL_TARGET_NAME, DB_COMPONENT_NAME) and DB_COMPONENT_NAME not in exclude:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't like how the code is written now. it's quite hard to read and could be error prone. In addition, it looks like if user make a typo in the exclude list, CLP will sliently run the "exclude target"

In the closed source CLP, we have a config file to specify what components to run and one can remove component from the config file to remove components.
It might make more sense if we let the start-clp to launch everything by default (i.e. if a config file is not provided), or launch the specified component based on the config file.

If we plan to go with the current approach, I think if the if statement can be viewed as

if target in (ALL_TARGET, [TASK_TARGET]) and TASK_TARGET not in exclude:

maybe we can factor it out as a function to make the code less messy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't like how the code is written now. it's quite hard to read and could be error prone. In addition, it looks like if user make a typo in the exclude list, CLP will sliently run the "exclude target"

In the closed source CLP, we have a config file to specify what components to run and one can remove component from the config file to remove components. It might make more sense if we let the start-clp to launch everything by default (i.e. if a config file is not provided), or launch the specified component based on the config file.

If we plan to go with the current approach, I think if the if statement can be viewed as

if target in (ALL_TARGET, [TASK_TARGET]) and TASK_TARGET not in exclude:

maybe we can factor it out as a function to make the code less messy.

With help from @Henry8192 , we have refactored the if-checks into calls of a checker helper. Also now we also validate whether the names in the exclude list are valid before proceeding. Let me know if this is more readable and maintainable now.

@junhaoliao junhaoliao marked this pull request as ready for review July 27, 2024 17:15
@junhaoliao junhaoliao requested a review from haiqi96 July 29, 2024 16:15
:param target: The target name specified by the user.
:param exclude: A list of component names that should be excluded from starting.
:param component_names: A tuple of components to which the target component corresponds
or depends on.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any example of "depends on"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I wrote "depends on" because I previously understood that "target"s (specially CONTROLLER_TARGET_NAME) are master components that contains subcomponents like DB, queue, redis, etc..

Now having understood the relationships better, I think it might make more sense to name this target_names or target_and_component_names. Let's discuss more about the glossary in thread https://github.com/y-scope/clp/pull/492/files#r1695767819

@haiqi96
Copy link
Contributor

haiqi96 commented Jul 29, 2024

Thanks for making the change, it now makes much more sense to me

@@ -59,6 +59,22 @@
validate_worker_config,
)

# Constants
COMPONENT_NAMES = (
CONTROLLER_TARGET_NAME,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit, a target isn't a component. One option is to change this list to COMPONENT_AND_TARGET_NAMES, or we can have two lists, one for individual components and another one for target(s)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it fair to say a component can always be a target? If so, maybe we can rename is as TARGET_NAMES and the usages below would be like we check the user specified "target" against this targets list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I think I had this discussion with Kirk before and it's fair to call it TARGET_NAMES (so it is also consistent with ALL_TARGET_NAMES

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now when I look at the component_should_start, I kinda feel the function is a bit unintuitive and maybe can be simplified .. let me think if I can come up with somthing better

@junhaoliao junhaoliao requested a review from haiqi96 July 29, 2024 19:54
@haiqi96
Copy link
Contributor

haiqi96 commented Aug 7, 2024

Trying to think about this from another perspective, what if we rearrange the code, such that we have a function, takes in the target, the exclude, and generates a List: components_to_run. in that case, for each if statement, we simply need to write something like if DB_COMPONENT in components_to_run.

A trade off is that now for targets correspond to mulitple component, we would need to have some good way to define them.
Like

"ALL_TARGET": [
"comp1",
"comp2",
"comp3,
...
]
"CONTROLLER_TARGET": [
"comp1",
...
]

In the long term, this might introduce some clarity though. now it isn't very obvious from code that what components are in CONTROLLER_TARGET (like you have to find every if statement that has CONTROLLER_TARGET in it)

@kirkrodrigues
Copy link
Member

What's the status of this PR?

@haiqi96
Copy link
Contributor

haiqi96 commented Nov 13, 2024

What's the status of this PR?

I guess Junhao got too busy on other stuff and this gets deprioritized

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants