Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add ssh remote connection & validator installation for sn12 #39

Merged
merged 46 commits into from
Oct 18, 2024

Conversation

drunest
Copy link
Contributor

@drunest drunest commented Sep 25, 2024

  • Add ssh remote connection without password. but it requires passphrase for the private key
  • Generate pre_config file based on the subnet codename
  • Transfer the required files to the remote server
  • Make .env file on the remove server based on the pre_config.json and .env.template
  • Run the pre_install.sh on the remote server
  • Run the install.sh on the remote server
  • Child hotkey is configurable when the validator is installed on the provision server
  • Add dumper commands and normalizing codename API endpoints
  • SECRET_KEY and POSTRES_PASSWORD are using random value

@drunest
Copy link
Contributor Author

drunest commented Oct 1, 2024

Currently, it has only generating new child hotkey based on the main hotkey
subnets.yaml file includes the codename list for normalizing the subnet_identifier
Add dumper commands and normalizing codename API endpoint. there is no authentication here right now. It would considerable in the future.

Comment on lines 115 to 131
# change the path to the actual path of the private key
# change the hotkey to the actual hotkey of the validator
# change the IP address to the actual IP address of the server
# change the username to the actual username of the server
# change the path to the actual path of the private key
# change the passphrase to the actual passphrase of the private key
install_validator_on_remote_server(
subnet_slot.subnet.codename,
subnet_slot.blockchain,
subnet_slot.netuid,
"/root/.bittensor/wallets/validator",
"validator-hotkey",
"219.15.67.27",
"root",
"/root/.ssh/id_rsa",
"1234567890",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not ready for review, why are you submitting it for review when the first thing that I read in the PR is that this isn't finished?

@@ -9,6 +12,8 @@
from .authentication import HotkeyAuthentication
from .utils.bot import trigger_bot_send_message

YAML_FILE_PATH = settings.LOCAL_SUBNETS_SCRIPTS_PATH + "/subnets.yaml"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
YAML_FILE_PATH = settings.LOCAL_SUBNETS_SCRIPTS_PATH + "/subnets.yaml"
SUBNETS_CONFIG_PATH = pathlib.Path(settings.LOCAL_SUBNETS_SCRIPTS_PATH) / "subnets.yaml"

"yaml file" is ambigous and this is a string representing a path, not a python Path object that it should be. Also you assume that the path delimiter is / which does not necessarily need to be true.

app/src/auto_validator/core/api.py Outdated Show resolved Hide resolved
app/src/auto_validator/core/api.py Outdated Show resolved Hide resolved
app/src/auto_validator/core/api.py Outdated Show resolved Hide resolved
yaml_file_path = f"{LOCAL_SUBNETS_SCRIPTS_PATH}/subnets.yaml"
csv_file_path = f"{LOCAL_SUBNETS_SCRIPTS_PATH}/secrets.csv"

local_hotkey_path = f"{coldkey_path}/hotkeys/{hotkey_name}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
local_hotkey_path = f"{coldkey_path}/hotkeys/{hotkey_name}"
local_hotkey_path = coldkey_path / 'hotkeys' / hotkey_name

"subnet1", "mainnet", "1", "/root/.bittensor/wallets/validator/", "validator-hotkey", "29.92.12.1", "root", "/root/.ssh/id_rsa", "1234567890"
)
"""
ssh = create_ssh_client(ssh_ip_address, ssh_user, ssh_key_path, ssh_passphrase)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be a context manager which will call .close() on __exit__

ssh.close()


def create_new_child_hotkey(netuid, wallet_name, parent_hotkey, child_hotkey, proportion):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add typing everywhere, it's not python 2.7 ;)

proportion (float): proportion of the parent hotkey, (0, 1].

Examples:
create_new_child_hotkey("1", "validator", "validator-hotkey", "child-hotkey", "0.5")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
create_new_child_hotkey("1", "validator", "validator-hotkey", "child-hotkey", "0.5")
>>> create_new_child_hotkey("1", "validator", "validator-hotkey", "child-hotkey", "0.5")
>>>

Comment on lines 273 to 288
commands = [
"btcli",
"stake",
"set_children",
"--netuid",
netuid,
"--children",
child_wallet.hotkey.ss58_address,
"--proportion",
proportion,
"--hotkey",
parent_wallet.hotkey.ss58_address,
"--wallet.name",
wallet_name,
]
result = subprocess.run(commands, capture_output=True, text=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should use bittensor sdk, not subprocess.run

pyproject.toml Outdated
Comment on lines 28 to 33
"paramiko>=3.5.0",
"scp>=0.15.0",
"discord.py==2.4.0",
"python-dotenv>=1.0.1",
"paramiko>=3.5.0",
"scp>=0.15.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"paramiko>=3.5.0",
"scp>=0.15.0",
"discord.py==2.4.0",
"python-dotenv>=1.0.1",
"paramiko>=3.5.0",
"scp>=0.15.0",
"discord.py==2.4.0",
"python-dotenv>=1.0.1",
"paramiko>=3.5.0",
"scp>=0.15.0",

app/src/auto_validator/core/api.py Outdated Show resolved Hide resolved
netuid: int,
child_wallet_name: str,
child_hotkey_name: str,
proportion: float = 0.1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
proportion: float = 0.1,
proportion: float = 1,

Comment on lines 82 to 85
except Exception as e:
self.logger.exception(f"Error while getting children hotkeys: {e}")
return None
return result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't catch it, let it 500. Returning None is much harder to handle

Comment on lines 108 to 111
except Exception as e:
self.logger.exception(f"Error while revoking children hotkeys: {e}")
return False
return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let it crash

app/src/auto_validator/core/utils/ssh.py Show resolved Hide resolved
else:
secrets[row["SECRET_KEYS"]] = row["SECRET_VALUES"]
secrets["SUBNET_CODENAME"] = subnet_codename
secrets["BITTENSOR_NETWORK"] = "finney" if blockchain == "mainnet" else "test"
secrets["BITTENSOR_CHAIN_ENDPOINT"] = (
"wss://entrypoint-finney.opentensor.ai:443"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take a value from constance, which will use finney by default

Comment on lines 118 to 119
if blockchain == "mainnet"
else "wss://test.finney.opentensor.ai:443"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not mainnet is not necessarily testnet, you cannot make that assumption

origin = repo.remotes.origin
origin.pull("master")
except GitCommandError as e:
raise ValueError(f"Error while pulling the repository: {e}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which repository? for which netuid?

Comment on lines 180 to 181
remote_env_template_path = f"{remote_path}.env.template"
remote_pre_config_path = f"{remote_path}pre_config.json"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ should be on this side, not in the path

print(stderr.read().decode())
ssh.close()
# Run pre_install.sh on remote server
# ssh_manager.execute_command(f"bash {remote_path}/install.sh")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah run it

Comment on lines 85 to 98
sys.stdin = StringIO("y\n")
cli_manager.stake_revoke_children(
wallet_name=self.parent_wallet_name,
wallet_hotkey=self.parent_hotkey_name,
wallet_path=self.parent_wallet_path,
network=network,
netuid=netuid,
all_netuids=False,
quiet=True,
verbose=False,
wait_for_finalization=True,
wait_for_inclusion=True,
)
sys.stdin = sys.__stdin__
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sys.stdin thing should be in the CLIManager class, not in every single place where you are calling it

@@ -26,22 +26,19 @@ def connect(self) -> bool:
passphrase=self.passphrase,
)
except Exception as e:
self.logger.exception(f"SSH Execute Command Error: {e}")
self.logger.exception("SSH Execute Command Error: %s", e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What command execution are we logging here about?

This is a connecftion attempt, not a command

Comment on lines 66 to 69
# Ensure local_files is a list
if isinstance(local_files, str):
local_files = [local_files]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Ensure local_files is a list
if isinstance(local_files, str):
local_files = [local_files]

don't try to fix input, error out instead. Static code analyzer (or IDE) should be used to check types, if we really need it (not in this project though)

@@ -75,6 +76,8 @@ def copy_files_to_remote(self, local_files: str, remote_path: str) -> None:
break
self.logger.info("Files copied to remote server successfully")
except SCPException as e:
self.logger.exception(f"SCP Error: {e}")
self.logger.exception("SCP Error: %s", e)
raise SCPException("SCP Error: %s", e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise SCPException("SCP Error: %s", e)
raise

except OSError as e:
self.logger.exception(f"IOError: {e}")
self.logger.exception("IOError: %s", e)
raise OSError("IOError: %s", e)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise OSError("IOError: %s", e)
raise

Comment on lines 168 to 170
remote_env_template_path = os.path.join(remote_path, ".env.template")
remote_pre_config_path = os.path.join(remote_path, "pre_config.json")
remote_env_path = os.path.join(remote_path, ".env")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
remote_env_template_path = os.path.join(remote_path, ".env.template")
remote_pre_config_path = os.path.join(remote_path, "pre_config.json")
remote_env_path = os.path.join(remote_path, ".env")
remote = pathlib.Path(remote_path)
remote_env_template_path = remote / ".env.template"
remote_pre_config_path = remote / "pre_config.json"
remote_env_path = remote / ".env"

# Run pre_install.sh on remote server
# ssh_manager.execute_command(f"bash {remote_path}/install.sh")
# Run install.sh on remote server
remote_install_script_path = os.path.join(remote_path, "install.sh")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
remote_install_script_path = os.path.join(remote_path, "install.sh")
remote_install_script_path = remote / "install.sh"

# ssh_manager.execute_command(f"bash {remote_path}/install.sh")
# Run install.sh on remote server
remote_install_script_path = os.path.join(remote_path, "install.sh")
ssh_manager.execute_command(f"bash {remote_install_script_path}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns an empty string on error and you don't check it here, so the entire operation will succeed and the user of the web interface won't know about it unless he logs into the server and inspects the logs. Unacceptable. The command should fail and the user should get a message of some sort.

Comment on lines 23 to 24
result = self.cli_manager.stake_revoke_children(*args, **kwargs)
return result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
result = self.cli_manager.stake_revoke_children(*args, **kwargs)
return result
with self:
result = self.cli_manager.stake_revoke_children(*args, **kwargs)
return result

and then the caller doesn't need to with but can just call a method

@ppolewicz ppolewicz merged commit 48fce20 into bactensor:master Oct 18, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants