[Feature] Add HuaweiCloud provider #1011
Comments
Hi @kiwik, it's great that you can contribute to CloudTik.
You can submit a PR if you have some code. |
@jerrychenhf thank you for the reminder about the key points. I have only just started reading the CloudTik documentation and have not begun design or coding yet. I will discuss it with my colleagues, and we may submit a design in the coming months. Our team is interested in Big Data and AI technology and enhancements, and we can build cooperation with the oap-project team on several open source projects. |
Task list:
|
@kiwik Great to hear that! As for the tasks, that is exactly the right sequence. You can start by implementing workspace create, delete and status, which implement the VPC design (VPC and subnets, firewalls, ..., with or without public IP, with VPC peering or not), identity and roles for instance authentication and authorization, and managed cloud storage for the workspace. Once the workspace is implemented, the next step is to implement the Node Provider to create or delete instances, tag them, get instance information and so on. For Spark (and other workloads) to access the cloud storage, there is some lightweight implementation in the Runtime configuration steps, but this can be left to the last step. CloudTik also supports a K8S provider with Cloud integration (mostly OIDC provider integration for identity and roles). If Huawei Cloud has a K8S engine that integrates with Huawei Cloud resources, an integration layer can be developed for the Cloud Kubernetes Provider. |
Thank you @jerrychenhf for adding so many details; it helps make the whole workflow clear. As a high-level plan, I have started to implement the HuaweiCloud ECS provider first, which is based on virtual machines. HuaweiCloud supports a K8S engine too, named the CCE service, and I will implement the CCE provider after the ECS provider is ready. I will commit the workspace-related functions for the ECS provider this week. I plan to split the whole HuaweiCloud provider code into a series of patch sets that each focus on a certain feature, such as workspace, node provider and so on. Hopefully, these small-scale PRs will make code review a little easier. |
1. Create and delete workspace networking resources 2. Add HUAWEICLOUD SDK package into setup.py and requirements.txt 3. Add HUAWEICLOUD default config files, and update schema. Related-with: oap-project#1011
1. Create and delete workspace networking resources and cloud storage. 2. Add HUAWEICLOUD SDK package into setup.py and requirements.txt. 3. Add HUAWEICLOUD default config files, and update schema. Related-with: oap-project#1011
1. Create and delete workspace networking resources and cloud storage. 2. Add HUAWEI CLOUD SDK package into setup.py and requirements.txt. 3. Add HUAWEI CLOUD default config files, and update schema. Related-with: oap-project#1011
1. Create and delete workspace networking resources and cloud storage. 2. Add HUAWEI CLOUD SDK package into setup.py and requirements.txt. 3. Add HUAWEI CLOUD default config files, and update schema. Related-with: #1011
@kiwik The plan looks great! The workspace functions have been committed. While you are implementing, I will be able to help in some aspects when I have time. A few notes to help your implementation of this part:
I look forward to your next patch for this. I can help by starting on the Spark runtime and FUSE support for Huawei cloud storage. One best practice is to follow an existing node provider (the one closest to the HuaweiCloud API). Avoid changes without a specific reason, so that the new implementation is less likely to break any assumptions. |
@kiwik I merged the code for Huawei Cloud Hadoop to integrate OBS storage: #1138. I found a few issues in the Hadoop Huawei project (https://github.com/huaweicloud/obsa-hdfs), so I made some modifications to the source code and compiled a Hadoop 3.3.1 version of the hadoop-huawei jar. |
@kiwik PR #1140 made some improvements to the make_xxx_client functions so that the credential handling logic is shared and can be improved more easily in the future. So the following code: `ecs_client = make_ecs_client(config)` should become `ecs_client = make_ecs_client(config, region_of_working_node)` |
I pinged the Huawei Cloud obsa-hdfs maintainer and hope he can help. ^ It is also very kind of you to add the framework of the Node provider. It is an entry point that lets me start working through the whole code path. |
@kiwik That's great! Mounting a cloud file system locally is a very useful feature, especially for ML/DL cases. Additionally, obsfs seems to be quite old and doesn't mention support for newer Linux versions such as 20.04 (which we use). |
It's different in Huawei Cloud: VPC Peering only supports connecting VPCs in the same region, while a separate concept, Cloud Connect, connects VPCs across regions; see the reference below. They use different APIs and SDKs, so I prefer to keep it simple for CloudTik in the first release with Huawei Cloud support: support VPC Peering within the same region now, then support cross-region Cloud Connect in the future. Maybe we can add a description of this limitation to the CloudTik documentation for the Huawei Cloud provider. VPC Peering: https://support.huaweicloud.com/usermanual-vpc/zh-cn_topic_0046655036.html |
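The same-region limitation described above could be enforced with a small guard before any peering call is made. This is only a sketch: `select_vpc_connection` and its return value are hypothetical names for illustration, not part of CloudTik or the Huawei Cloud SDK.

```python
def select_vpc_connection(workspace_region: str, working_region: str) -> str:
    """Pick how the working VPC would be connected to the workspace VPC.

    Hypothetical helper: VPC Peering is used only when both VPCs are in the
    same region; cross-region setups would need Cloud Connect, which this
    sketch deliberately does not implement.
    """
    if workspace_region == working_region:
        return "vpc-peering"
    raise NotImplementedError(
        "Cross-region VPCs would need Cloud Connect, which is not supported "
        f"in this sketch: {working_region} -> {workspace_region}"
    )
```

Failing early with a clear message keeps the unsupported cross-region case visible to the user instead of surfacing as an opaque API error later.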
You are right, I should update the reference: https://support.huaweicloud.com/api-obs/obs_04_0021.html |
@jerrychenhf ^ a quick fix so as not to block your work. |
I see. Thanks! |
|
Test env: openEuler 20.03 LTS SP3 OS on Arm64 HuaweiCloud VM Test cases (commands):
|
1. Enable the OBSClient security provider policy chain, which can work with the environment variables "OBS_ACCESS_KEY_ID" and "OBS_SECRET_ACCESS_KEY" or the ECS agent to get AK/SK automatically; it is disabled by default. 2. Fix cloud.storage.uri to return the whole OBS bucket URI for the command "cloudtik workspace info" Related-with: oap-project#1011
1. Enable the OBSClient security provider policy chain, which can work with the environment variables "OBS_ACCESS_KEY_ID" and "OBS_SECRET_ACCESS_KEY" or the ECS agent to get AK/SK automatically; it is disabled by default. 2. Fix cloud.storage.uri to return the whole OBS bucket URI for the command "cloudtik workspace info" 3. Fix the ECS create server error for the command "cloudtik start" by removing an unnecessary item from the create_server dict. Related-with: oap-project#1011
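The idea of a credential provider chain mentioned in this commit message can be illustrated with a minimal, SDK-independent sketch: each provider is tried in order and the first one that yields an AK/SK pair wins. Only the two environment variable names come from the commit message; `credentials_from_env`, `resolve_credentials` and the stand-in for the ECS agent are hypothetical and do not reproduce the OBS SDK's own implementation.

```python
import os
from typing import Callable, Iterable, Mapping, Optional, Tuple

Credentials = Tuple[str, str]  # (access key, secret key)


def credentials_from_env(env: Mapping[str, str] = os.environ) -> Optional[Credentials]:
    """Read AK/SK from the environment variables named in the commit message."""
    ak = env.get("OBS_ACCESS_KEY_ID")
    sk = env.get("OBS_SECRET_ACCESS_KEY")
    if ak and sk:
        return ak, sk
    return None


def resolve_credentials(
    providers: Iterable[Callable[[], Optional[Credentials]]],
) -> Optional[Credentials]:
    """Try each provider in order and return the first credentials found."""
    for provider in providers:
        credentials = provider()
        if credentials is not None:
            return credentials
    return None


def credentials_from_ecs_agent() -> Optional[Credentials]:
    """Placeholder (assumption): a real version would ask the ECS agent for temporary AK/SK."""
    return None


if __name__ == "__main__":
    found = resolve_credentials([credentials_from_env, credentials_from_ecs_agent])
    print("credentials found" if found else "no credentials available")
```

Ordering the environment provider first lets a developer override instance-provided credentials locally, while nodes without those variables fall through to the next provider.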
1. The "ap-southeast-3" region is in Singapore, which gives more stable and faster network access to some resources. 2. Update server flavor and image to match the region. 3. Allocate and attach EIP to head node 4. Add workspace security group egress rule Related-with: oap-project#1011
1. The "ap-southeast-3" region is in Singapore, which gives more stable and faster network access to some resources. 2. Update server flavor and image to match the region. 3. Allocate and attach EIP to head node 4. Add workspace security group egress rule 5. Add workspace subnet DNS option Related-with: oap-project#1011
1. The "ap-southeast-3" region is in Singapore, which gives more stable and faster network access to some resources. 2. Update server flavor and image to match the region. 3. Allocate and attach EIP to head node 4. Add workspace security group egress rule 5. Add workspace subnet DNS option 6. Add configurable workspace bandwidth option for EIP and NAT Related-with: oap-project#1011
1. Enable the OBSClient security provider policy chain, which can work with the environment variables "OBS_ACCESS_KEY_ID" and "OBS_SECRET_ACCESS_KEY" or the ECS agent to get AK/SK automatically; it is disabled by default. 2. Fix cloud.storage.uri to return the whole OBS bucket URI for the command "cloudtik workspace info" 3. Fix the ECS create server error for the command "cloudtik start" by removing an unnecessary item from the create_server dict. Related-with: #1011
1. The "ap-southeast-3" region is in Singapore, which gives more stable and faster network access to some resources. 2. Update server flavor and image to match the region. 3. Allocate and attach EIP to head node 4. Add workspace security group egress rule 5. Add workspace subnet DNS option 6. Add configurable workspace bandwidth option for EIP and NAT 7. Add fs.obs.endpoint in core-site.xml for HuaweiCloud provider Related-with: oap-project#1011
1. The "ap-southeast-3" region is in Singapore, which gives more stable and faster network access to some resources. 2. Update server flavor and image to match the region. 3. Allocate and attach EIP to head node 4. Add workspace security group egress rule 5. Add workspace subnet DNS option 6. Add configurable workspace bandwidth option for EIP and NAT 7. Add fs.obs.endpoint in core-site.xml for HuaweiCloud provider Related-with: #1011
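As an illustration of item 7, the sketch below adds or updates a `fs.obs.endpoint` property in a Hadoop-style `<configuration>` document using only the Python standard library. The helper names are hypothetical, and the endpoint pattern `obs.<region>.myhuaweicloud.com` is an assumption that should be checked against the Huawei Cloud endpoint list; this is not the code from the commit.

```python
import xml.etree.ElementTree as ET


def obs_endpoint_for(region: str) -> str:
    """Assumed endpoint pattern; verify against the Huawei Cloud endpoint list."""
    return f"obs.{region}.myhuaweicloud.com"


def set_hadoop_property(configuration: ET.Element, name: str, value: str) -> None:
    """Set <property><name>name</name><value>value</value></property>, updating it if present."""
    for prop in configuration.findall("property"):
        if prop.findtext("name") == name:
            value_element = prop.find("value")
            if value_element is None:
                value_element = ET.SubElement(prop, "value")
            value_element.text = value
            return
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


if __name__ == "__main__":
    # Start from an empty core-site.xml body and add the endpoint for the default region.
    root = ET.fromstring("<configuration></configuration>")
    set_hadoop_property(root, "fs.obs.endpoint", obs_endpoint_for("ap-southeast-3"))
    print(ET.tostring(root, encoding="unicode"))
```

Deriving the endpoint from the configured region, rather than hard-coding it, keeps the Hadoop configuration consistent when the user changes the region.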
@kiwik One question about the recent change of the default region, image_ref and flavor_ref: what happens when the user sets the region in the configuration file to another region? Do they have to change image_ref and flavor_ref, or do the current default image_ref and flavor_ref work for other regions too? If the current defaults work only for the default region you set (ap-southeast-3), we would need some improvements for a better user experience:
Thanks, |
Understood, your opinion makes sense: the default values should work for most cases so that the user does not have to change the configuration file. Actually, for HuaweiCloud the flavor_ref is the same across regions, but in some regions certain flavors may be sold out or only the latest generation of flavors may be offered. The flavor_ref ai1s.* is more widely used than ai1.* in HuaweiCloud, so I changed it. https://support.huaweicloud.com/productdesc-ecs/ecs_01_0047.html |
@kiwik So the improvement is only needed for image_ref. Please refer to the _configure_ami function in the AWS bootstrap step, which configures the image id automatically. The basic logic is: if the user specified an image in the configuration file, we use that. If the user didn't specify one explicitly, we take two steps to get the image id. First, we try to use the API to list the image ids that satisfy our needs for that region, and use one if there is any. We also keep a static list of known image ids for major regions and use it as the last choice. With this method, we don't need a default image value in the default.yaml file, so we can distinguish whether the user explicitly specified one or not. |
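The three-step fallback described above can be sketched as follows. This is not the actual `_configure_ami` code: `resolve_image_ref`, the `list_images` callback and the `KNOWN_IMAGES` table are hypothetical names, and the image id in the table is a placeholder rather than a real UUID.

```python
from typing import Callable, Dict, List, Mapping, Optional

# Hypothetical static fallback table; the value is a placeholder, not a real image id.
KNOWN_IMAGES: Dict[str, str] = {
    "ap-southeast-3": "<known-image-id-for-ap-southeast-3>",
}


def resolve_image_ref(
    node_config: Mapping[str, str],
    region: str,
    list_images: Callable[[str], List[str]],
    known_images: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve the image id: explicit config, then API lookup, then static table."""
    if known_images is None:
        known_images = KNOWN_IMAGES

    # 1. An image explicitly set by the user always wins.
    explicit = node_config.get("image_ref")
    if explicit:
        return explicit

    # 2. Ask the cloud API (injected as a callback) for suitable images in the region.
    try:
        candidates = list_images(region)
    except Exception:
        candidates = []  # Treat an API failure as "nothing found" and fall through.
    if candidates:
        return candidates[0]

    # 3. Last choice: a static list of known image ids for major regions.
    if region in known_images:
        return known_images[region]

    raise ValueError(f"No image found for region {region!r}; please set image_ref explicitly.")
```

Because the default configuration no longer carries an image value, an empty `image_ref` unambiguously means the user did not choose one, which is what makes the first check reliable.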
No problem, thank you for showing a reference example. |
Hi @jerrychenhf , I update |
1. Add "op_svc_userid" into head node metadata so that the head node can apply the temporary agency AK/SK context to launch worker nodes with the workspace keypair. 2. Add default security ingress rule in workspace security group Related-with: oap-project#1011
1. Add "op_svc_userid" into worker node metadata so that the head node can apply the temporary agency AK/SK context to launch worker nodes with the workspace keypair. 2. Add default security ingress rule in workspace security group Related-with: oap-project#1011
1. Add "op_svc_userid" into worker node metadata so that the head node can apply the temporary agency AK/SK context to launch worker nodes with the workspace keypair. 2. Add default security ingress rule in workspace security group Related-with: #1011
1. Remove the image_ref UUID from defaults.yaml in the HuaweiCloud provider, and try to get a default image if the user doesn't specify image_ref 2. Update HuaweiCloud Python SDK versions Related-with: oap-project#1011
1. Remove the image_ref UUID from defaults.yaml in the HuaweiCloud provider, and try to get a default image if the user doesn't specify image_ref 2. Update HuaweiCloud Python SDK versions Related-with: #1011
It would be great if CloudTik could support a HuaweiCloud provider. We would like to use CloudTik to launch Spark/ML clusters with OAP enhancements on HuaweiCloud, and I can help implement it.
https://www.huaweicloud.com/