-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add ssh remote connection & validator installation for sn12 #39
Conversation
drunest
commented
Sep 25, 2024
•
edited
Loading
edited
- Add ssh remote connection without password. but it requires passphrase for the private key
- Generate pre_config file based on the subnet codename
- Transfer the required files to the remote server
- Make .env file on the remove server based on the pre_config.json and .env.template
- Run the pre_install.sh on the remote server
- Run the install.sh on the remote server
- Child hotkey is configurable when the validator is installed on the provision server
- Add dumper commands and normalizing codename API endpoints
- SECRET_KEY and POSTRES_PASSWORD are using random value
Currently, it has only generating new child hotkey based on the main hotkey |
app/src/auto_validator/core/admin.py
Outdated
# change the path to the actual path of the private key | ||
# change the hotkey to the actual hotkey of the validator | ||
# change the IP address to the actual IP address of the server | ||
# change the username to the actual username of the server | ||
# change the path to the actual path of the private key | ||
# change the passphrase to the actual passphrase of the private key | ||
install_validator_on_remote_server( | ||
subnet_slot.subnet.codename, | ||
subnet_slot.blockchain, | ||
subnet_slot.netuid, | ||
"/root/.bittensor/wallets/validator", | ||
"validator-hotkey", | ||
"219.15.67.27", | ||
"root", | ||
"/root/.ssh/id_rsa", | ||
"1234567890", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not ready for review, why are you submitting it for review when the first thing that I read in the PR is that this isn't finished?
app/src/auto_validator/core/api.py
Outdated
@@ -9,6 +12,8 @@ | |||
from .authentication import HotkeyAuthentication | |||
from .utils.bot import trigger_bot_send_message | |||
|
|||
YAML_FILE_PATH = settings.LOCAL_SUBNETS_SCRIPTS_PATH + "/subnets.yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YAML_FILE_PATH = settings.LOCAL_SUBNETS_SCRIPTS_PATH + "/subnets.yaml" | |
SUBNETS_CONFIG_PATH = pathlib.Path(settings.LOCAL_SUBNETS_SCRIPTS_PATH) / "subnets.yaml" |
"yaml file" is ambigous and this is a string representing a path, not a python Path object that it should be. Also you assume that the path delimiter is /
which does not necessarily need to be true.
yaml_file_path = f"{LOCAL_SUBNETS_SCRIPTS_PATH}/subnets.yaml" | ||
csv_file_path = f"{LOCAL_SUBNETS_SCRIPTS_PATH}/secrets.csv" | ||
|
||
local_hotkey_path = f"{coldkey_path}/hotkeys/{hotkey_name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
local_hotkey_path = f"{coldkey_path}/hotkeys/{hotkey_name}" | |
local_hotkey_path = coldkey_path / 'hotkeys' / hotkey_name |
"subnet1", "mainnet", "1", "/root/.bittensor/wallets/validator/", "validator-hotkey", "29.92.12.1", "root", "/root/.ssh/id_rsa", "1234567890" | ||
) | ||
""" | ||
ssh = create_ssh_client(ssh_ip_address, ssh_user, ssh_key_path, ssh_passphrase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be a context manager which will call .close() on __exit__
ssh.close() | ||
|
||
|
||
def create_new_child_hotkey(netuid, wallet_name, parent_hotkey, child_hotkey, proportion): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add typing everywhere, it's not python 2.7 ;)
proportion (float): proportion of the parent hotkey, (0, 1]. | ||
|
||
Examples: | ||
create_new_child_hotkey("1", "validator", "validator-hotkey", "child-hotkey", "0.5") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_new_child_hotkey("1", "validator", "validator-hotkey", "child-hotkey", "0.5") | |
>>> create_new_child_hotkey("1", "validator", "validator-hotkey", "child-hotkey", "0.5") | |
>>> |
commands = [ | ||
"btcli", | ||
"stake", | ||
"set_children", | ||
"--netuid", | ||
netuid, | ||
"--children", | ||
child_wallet.hotkey.ss58_address, | ||
"--proportion", | ||
proportion, | ||
"--hotkey", | ||
parent_wallet.hotkey.ss58_address, | ||
"--wallet.name", | ||
wallet_name, | ||
] | ||
result = subprocess.run(commands, capture_output=True, text=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should use bittensor
sdk, not subprocess.run
pyproject.toml
Outdated
"paramiko>=3.5.0", | ||
"scp>=0.15.0", | ||
"discord.py==2.4.0", | ||
"python-dotenv>=1.0.1", | ||
"paramiko>=3.5.0", | ||
"scp>=0.15.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"paramiko>=3.5.0", | |
"scp>=0.15.0", | |
"discord.py==2.4.0", | |
"python-dotenv>=1.0.1", | |
"paramiko>=3.5.0", | |
"scp>=0.15.0", | |
"discord.py==2.4.0", | |
"python-dotenv>=1.0.1", | |
"paramiko>=3.5.0", | |
"scp>=0.15.0", |
netuid: int, | ||
child_wallet_name: str, | ||
child_hotkey_name: str, | ||
proportion: float = 0.1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
proportion: float = 0.1, | |
proportion: float = 1, |
except Exception as e: | ||
self.logger.exception(f"Error while getting children hotkeys: {e}") | ||
return None | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't catch it, let it 500. Returning None is much harder to handle
except Exception as e: | ||
self.logger.exception(f"Error while revoking children hotkeys: {e}") | ||
return False | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let it crash
else: | ||
secrets[row["SECRET_KEYS"]] = row["SECRET_VALUES"] | ||
secrets["SUBNET_CODENAME"] = subnet_codename | ||
secrets["BITTENSOR_NETWORK"] = "finney" if blockchain == "mainnet" else "test" | ||
secrets["BITTENSOR_CHAIN_ENDPOINT"] = ( | ||
"wss://entrypoint-finney.opentensor.ai:443" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
take a value from constance, which will use finney by default
if blockchain == "mainnet" | ||
else "wss://test.finney.opentensor.ai:443" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not mainnet is not necessarily testnet, you cannot make that assumption
origin = repo.remotes.origin | ||
origin.pull("master") | ||
except GitCommandError as e: | ||
raise ValueError(f"Error while pulling the repository: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which repository? for which netuid?
remote_env_template_path = f"{remote_path}.env.template" | ||
remote_pre_config_path = f"{remote_path}pre_config.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/
should be on this side, not in the path
print(stderr.read().decode()) | ||
ssh.close() | ||
# Run pre_install.sh on remote server | ||
# ssh_manager.execute_command(f"bash {remote_path}/install.sh") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah run it
sys.stdin = StringIO("y\n") | ||
cli_manager.stake_revoke_children( | ||
wallet_name=self.parent_wallet_name, | ||
wallet_hotkey=self.parent_hotkey_name, | ||
wallet_path=self.parent_wallet_path, | ||
network=network, | ||
netuid=netuid, | ||
all_netuids=False, | ||
quiet=True, | ||
verbose=False, | ||
wait_for_finalization=True, | ||
wait_for_inclusion=True, | ||
) | ||
sys.stdin = sys.__stdin__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that sys.stdin thing should be in the CLIManager class, not in every single place where you are calling it
@@ -26,22 +26,19 @@ def connect(self) -> bool: | |||
passphrase=self.passphrase, | |||
) | |||
except Exception as e: | |||
self.logger.exception(f"SSH Execute Command Error: {e}") | |||
self.logger.exception("SSH Execute Command Error: %s", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What command execution are we logging here about?
This is a connecftion attempt, not a command
# Ensure local_files is a list | ||
if isinstance(local_files, str): | ||
local_files = [local_files] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Ensure local_files is a list | |
if isinstance(local_files, str): | |
local_files = [local_files] |
don't try to fix input, error out instead. Static code analyzer (or IDE) should be used to check types, if we really need it (not in this project though)
@@ -75,6 +76,8 @@ def copy_files_to_remote(self, local_files: str, remote_path: str) -> None: | |||
break | |||
self.logger.info("Files copied to remote server successfully") | |||
except SCPException as e: | |||
self.logger.exception(f"SCP Error: {e}") | |||
self.logger.exception("SCP Error: %s", e) | |||
raise SCPException("SCP Error: %s", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise SCPException("SCP Error: %s", e) | |
raise |
except OSError as e: | ||
self.logger.exception(f"IOError: {e}") | ||
self.logger.exception("IOError: %s", e) | ||
raise OSError("IOError: %s", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise OSError("IOError: %s", e) | |
raise |
remote_env_template_path = os.path.join(remote_path, ".env.template") | ||
remote_pre_config_path = os.path.join(remote_path, "pre_config.json") | ||
remote_env_path = os.path.join(remote_path, ".env") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remote_env_template_path = os.path.join(remote_path, ".env.template") | |
remote_pre_config_path = os.path.join(remote_path, "pre_config.json") | |
remote_env_path = os.path.join(remote_path, ".env") | |
remote = pathlib.Path(remote_path) | |
remote_env_template_path = remote / ".env.template" | |
remote_pre_config_path = remote / "pre_config.json" | |
remote_env_path = remote / ".env" |
# Run pre_install.sh on remote server | ||
# ssh_manager.execute_command(f"bash {remote_path}/install.sh") | ||
# Run install.sh on remote server | ||
remote_install_script_path = os.path.join(remote_path, "install.sh") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remote_install_script_path = os.path.join(remote_path, "install.sh") | |
remote_install_script_path = remote / "install.sh" |
# ssh_manager.execute_command(f"bash {remote_path}/install.sh") | ||
# Run install.sh on remote server | ||
remote_install_script_path = os.path.join(remote_path, "install.sh") | ||
ssh_manager.execute_command(f"bash {remote_install_script_path}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This returns an empty string on error and you don't check it here, so the entire operation will succeed and the user of the web interface won't know about it unless he logs into the server and inspects the logs. Unacceptable. The command should fail and the user should get a message of some sort.
result = self.cli_manager.stake_revoke_children(*args, **kwargs) | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result = self.cli_manager.stake_revoke_children(*args, **kwargs) | |
return result | |
with self: | |
result = self.cli_manager.stake_revoke_children(*args, **kwargs) | |
return result |
and then the caller doesn't need to with
but can just call a method