[DPE-2645] POC Add shards to cluster #260

Merged: 51 commits merged on Oct 5, 2023
Changes from 49 commits

Commits
ac24eed
charm can start as shard or config server
MiaAltieri Sep 13, 2023
f8ff8a1
mongos, shard, and config server all start without error
MiaAltieri Sep 14, 2023
9b64f31
use correct snap
MiaAltieri Sep 15, 2023
c167fad
Merge branch 'main' into start-shard-config-mongos
MiaAltieri Sep 15, 2023
11f3c65
fmt + lint
MiaAltieri Sep 15, 2023
4bf9d5f
update error processing
MiaAltieri Sep 15, 2023
e924c00
bump lib patch
MiaAltieri Sep 15, 2023
51a22ec
Merge branch 'main' into start-shard-config-mongos
MiaAltieri Sep 15, 2023
d1f0a0d
enable auth
MiaAltieri Sep 18, 2023
06af49a
mongos should be run on 0.0.0.0
MiaAltieri Sep 19, 2023
363db5f
addressing PR comments
MiaAltieri Sep 19, 2023
c4c0c27
Merge branch 'main' into start-shard-config-mongos
MiaAltieri Sep 19, 2023
ac78ed3
PR comments
MiaAltieri Sep 19, 2023
49cba1b
Merge branch 'main' into start-shard-config-mongos
MiaAltieri Sep 19, 2023
563f049
correct ip binding
MiaAltieri Sep 19, 2023
67964c7
Merge branch 'start-shard-config-mongos' into start-mongos-auth
MiaAltieri Sep 19, 2023
e8aaeb5
mongos and config server now start correctly, and mongos has auth ena…
MiaAltieri Sep 19, 2023
0598dde
cleaning up code
MiaAltieri Sep 19, 2023
7c64ce0
fix unit tests
MiaAltieri Sep 19, 2023
666f3dc
Merge branch '6/edge' into start-mongos-auth
MiaAltieri Sep 20, 2023
58d91fa
don't publish 6/edge changes to 5/edge
MiaAltieri Sep 20, 2023
80c8bf9
revert changes on init admin user
MiaAltieri Sep 20, 2023
02fcf7c
add new lib
MiaAltieri Sep 20, 2023
ff83789
set up basic relation structure
MiaAltieri Sep 20, 2023
426cff8
operator password and keyfile now shared from config server
MiaAltieri Sep 21, 2023
af99322
fixes + working with replicas now
MiaAltieri Sep 21, 2023
caef63d
add docstrings
MiaAltieri Sep 21, 2023
38b7b8f
unit, lint, fmt
MiaAltieri Sep 21, 2023
766a59b
simplify function for tox
MiaAltieri Sep 21, 2023
e6dff94
Merge branch '6/edge' into share-secrets
MiaAltieri Sep 21, 2023
a9f6cd6
personal nits
MiaAltieri Sep 21, 2023
0a432ea
PR comments
MiaAltieri Sep 22, 2023
059356f
propogating passwords of internal db users happens automatically
MiaAltieri Sep 22, 2023
bed71ce
fix bug in role retrieval
MiaAltieri Sep 25, 2023
b86259f
revert changes to set operator password
MiaAltieri Sep 27, 2023
315eaa5
Merge branch '6/edge' into share-secrets
MiaAltieri Sep 27, 2023
3f5d3d2
lint + fmt
MiaAltieri Sep 27, 2023
b8ca419
add shard works
MiaAltieri Sep 28, 2023
7f4da07
cleaning
MiaAltieri Sep 28, 2023
6213835
Pr comments
MiaAltieri Sep 28, 2023
a940412
create lib file in charmcraft
MiaAltieri Sep 28, 2023
bd07155
pr comments
MiaAltieri Sep 28, 2023
74398c9
deferred events should be followed by a return
MiaAltieri Sep 29, 2023
98ad1b1
add additional log
MiaAltieri Sep 29, 2023
659f22d
Merge branch 'share-secrets' into add-shards
MiaAltieri Sep 29, 2023
587f3e5
add more logs
MiaAltieri Sep 29, 2023
9bcbc85
Merge branch '6/edge' into add-shards
MiaAltieri Sep 29, 2023
0f2fb68
PR comments
MiaAltieri Oct 4, 2023
243317a
updates for new linter
MiaAltieri Oct 4, 2023
1321f59
use ips from relation
MiaAltieri Oct 4, 2023
bd75d13
Merge branch '6/edge' into add-shards
MiaAltieri Oct 5, 2023
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -70,7 +70,7 @@ Testing high availability on a production cluster can be done with:
tox run -e ha-integration -- --model=<model_name>
```

-Note if you'd like to test storage re-use in ha-testing, your storage must not be of the type `rootfs`. `rootfs` storage is tied to the machine lifecycle and does not stick around after unit removal. `rootfs` storage is used by default with `tox run -e ha-integration`. To test ha-testing for storage re-use:
+Note if you'd like to test storage reuse in ha-testing, your storage must not be of the type `rootfs`. `rootfs` storage is tied to the machine lifecycle and does not stick around after unit removal. `rootfs` storage is used by default with `tox run -e ha-integration`. To test ha-testing for storage reuse:
```shell
juju create-storage-pool mongodb-ebs ebs volume-type=standard # create a storage pool
juju deploy ./*charm --storage mongodb=mongodb-ebs,7G,1 # deploy 1 or more units of application with said storage pool
2 changes: 1 addition & 1 deletion lib/charms/mongodb/v0/mongodb.py
@@ -308,7 +308,7 @@ def create_role(self, role_name: str, privileges: dict, roles: dict = []):

Args:
role_name: name of the role to be added.
-privileges: privledges to be associated with the role.
+privileges: privileges to be associated with the role.
roles: List of roles from which this role inherits privileges.
"""
try:
4 changes: 2 additions & 2 deletions lib/charms/mongodb/v0/mongodb_backups.py
@@ -505,7 +505,7 @@ def _try_to_restore(self, backup_id: str) -> None:

If PBM is resyncing, the function will retry to create backup
(up to BACKUP_RESTORE_MAX_ATTEMPTS times) with BACKUP_RESTORE_ATTEMPT_COOLDOWN
-time between attepts.
+time between attempts.

If PBM returns any other error, the function will raise RestoreError.
"""
@@ -541,7 +541,7 @@ def _try_to_backup(self):

If PBM is resyncing, the function will retry to create backup
(up to BACKUP_RESTORE_MAX_ATTEMPTS times)
-with BACKUP_RESTORE_ATTEMPT_COOLDOWN time between attepts.
+with BACKUP_RESTORE_ATTEMPT_COOLDOWN time between attempts.

If PBM returns any other error, the function will raise BackupError.
"""
170 changes: 170 additions & 0 deletions lib/charms/mongodb/v0/mongos.py
@@ -0,0 +1,170 @@
"""Code for interactions with MongoDB."""
# Copyright 2023 Canonical Ltd.
# See LICENSE file for licensing details.

import logging
from dataclasses import dataclass
from typing import Optional, Set
from urllib.parse import quote_plus

from pymongo import MongoClient
from pymongo.errors import PyMongoError

from config import Config

# The unique Charmhub library identifier, never change it
LIBID = "e20d5b19670d4c55a4934a21d3f3b29a"

# Increment this major API version when introducing breaking changes
LIBAPI = 0

# Increment this PATCH version before using `charmcraft publish-lib` or reset
# to 0 if you are raising the major API version
LIBPATCH = 1

logger = logging.getLogger(__name__)


@dataclass
class MongosConfiguration:
"""Class for mongos configuration.

- database: database name.
- username: username.
- password: password.
- hosts: full list of hosts to connect to, needed for the URI.
- port: integer for the port to connect to mongodb.
- roles: set of roles granted to the user.
- tls_external: indicator for use of external TLS connection.
- tls_internal: indicator for use of internal TLS connection.
"""

database: Optional[str]
username: str
password: str
hosts: Set[str]
port: int
roles: Set[str]
tls_external: bool
tls_internal: bool

@property
def uri(self):
"""Return URI concatenated from fields."""
hosts = [f"{host}:{self.port}" for host in self.hosts]
hosts = ",".join(hosts)
# Auth DB should be specified while user connects to application DB.
auth_source = ""
if self.database != "admin":
auth_source = "authSource=admin"
return (
f"mongodb://{quote_plus(self.username)}:"
f"{quote_plus(self.password)}@"
f"{hosts}/{quote_plus(self.database)}?"
f"{auth_source}"
)
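The `uri` property only formats strings, so its behaviour can be checked without a server. A minimal standalone sketch of the same formatting, assuming a hypothetical `build_uri` helper (not part of the library); it places `authSource=admin` directly after `?`, because an option string that begins with `&` contains an empty option that is not a `key=value` pair.

```python
from typing import Iterable
from urllib.parse import quote_plus


def build_uri(username: str, password: str, hosts: Iterable[str], port: int, database: str) -> str:
    """Hypothetical standalone mirror of MongosConfiguration.uri."""
    # Sorted only to make the output deterministic; the dataclass iterates a set.
    host_list = ",".join(f"{host}:{port}" for host in sorted(hosts))
    # Users of an application database authenticate against the admin database.
    options = "authSource=admin" if database != "admin" else ""
    return (
        f"mongodb://{quote_plus(username)}:{quote_plus(password)}@"
        f"{host_list}/{quote_plus(database)}?{options}"
    )


print(build_uri("operator", "p@ss", {"10.0.0.5"}, 27018, "admin"))
# mongodb://operator:p%40ss@10.0.0.5:27018/admin?
print(build_uri("app", "pw", ["h2", "h1"], 27018, "mydb"))
# mongodb://app:pw@h1:27018,h2:27018/mydb?authSource=admin
```

`quote_plus` percent-encodes characters such as `@` in credentials, which would otherwise be read as the separator between credentials and hosts.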


class NotReadyError(PyMongoError):
"""Raised when not all replica set members are healthy or have finished initial sync."""


class MongosConnection:
"""Creates a connection object to mongos.

The real connection is created on the first call to mongos.
Delayed connectivity lets us first check database readiness
and then reuse the same connection for an actual query later in the code.

The connection is automatically closed when the context manager exits,
which keeps calling code clean.

Note that using the connection may raise the following pymongo errors: ConfigurationError,
OperationFailure. It is suggested that the following pattern be adopted
when using MongosConnection:

with MongosConnection(self._mongos_config) as mongo:
try:
mongo.<some operation from this class>
except (ConfigurationError, OperationFailure):
<error handling as needed>
"""

def __init__(self, config: MongosConfiguration, uri=None, direct=False):
"""A MongoDB client interface.

Args:
config: MongoDB Configuration object.
uri: allow using custom MongoDB URI, needed for replSet init.
direct: force a direct connection to a specific host, avoiding
reading replica set configuration and reconnection.
"""
self.mongodb_config = config

if uri is None:
uri = config.uri

self.client = MongoClient(
uri,
directConnection=direct,
connect=False,
serverSelectionTimeoutMS=1000,
connectTimeoutMS=2000,
)
return

def __enter__(self):
"""Return a reference to the new connection."""
return self

def __exit__(self, object_type, value, traceback):
"""Disconnect from MongoDB client."""
self.client.close()
self.client = None
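`MongosConnection` builds its `MongoClient` with `connect=False`, so no network I/O happens until the first command, and `__exit__` closes the client. A minimal sketch of that lifecycle, assuming a hypothetical `FakeClient` in place of `pymongo.MongoClient` and a hypothetical `LazyConnection` wrapper so it runs without a server:

```python
from typing import List


class FakeClient:
    """Hypothetical stand-in for pymongo.MongoClient(..., connect=False)."""

    def __init__(self) -> None:
        self.connected = False  # nothing is opened at construction time
        self.closed = False
        self.commands: List[str] = []

    def command(self, name: str) -> dict:
        self.connected = True  # the first command is what opens the connection
        self.commands.append(name)
        return {"ok": 1.0}

    def close(self) -> None:
        self.closed = True


class LazyConnection:
    """Mirrors the __enter__/__exit__ pair of MongosConnection."""

    def __init__(self) -> None:
        self.client = FakeClient()

    def __enter__(self) -> "LazyConnection":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and on exceptions; returning None re-raises them.
        self.client.close()


with LazyConnection() as conn:
    client = conn.client
    print(client.connected)  # False: no I/O yet
    client.command("listShards")
print(client.connected, client.closed)  # True True
```

Because `__exit__` does not suppress exceptions, a `try`/`except (ConfigurationError, OperationFailure)` inside the `with` block still sees pymongo errors, while the client is closed either way.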

def get_shard_members(self) -> Set[str]:
"""Gets shard members.

Returns:
A set of the shard members as reported by mongos.

Raises:
ConfigurationError, OperationFailure
"""
shard_list = self.client.admin.command("listShards")
curr_members = [
self._hostname_from_hostport(member["host"]) for member in shard_list["shards"]
]
return set(curr_members)
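`get_shard_members` reduces the `listShards` reply to shard names by taking the part of each `host` value before `/`. A sketch of that parsing on a hand-written reply, assuming a hypothetical `shard_names` helper; each real entry also carries the shard name in `_id`, but the sketch parses `host` as the library does.

```python
from typing import Set


def shard_names(list_shards_reply: dict) -> Set[str]:
    """Hypothetical helper: shard names from a listShards reply."""
    # Each "host" value has the form "<replset name>/<host:port>,<host:port>,...";
    # the part before "/" is what the library keeps.
    return {shard["host"].split("/")[0] for shard in list_shards_reply["shards"]}


reply = {
    "shards": [
        {"_id": "shard03", "host": "shard03/host7:27018,host8:27018,host9:27018", "state": 1},
        {"_id": "shard04", "host": "shard04/host1:27018", "state": 1},
    ],
    "ok": 1.0,
}
print(sorted(shard_names(reply)))  # ['shard03', 'shard04']
```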

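def add_shard(self, shard_name, shard_hosts, shard_port=Config.MONGODB_PORT):
"""Adds a shard to the cluster, unless a shard with that name is already present.

Args:
shard_name: name of the shard to add.
shard_hosts: hosts of the shard's members.
shard_port: port on which the shard's members listen.

Raises:
ConfigurationError, OperationFailure
"""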
shard_hosts = [f"{host}:{shard_port}" for host in shard_hosts]
shard_hosts = ",".join(shard_hosts)
shard_url = f"{shard_name}/{shard_hosts}"
# TODO Future PR raise error when number of shards currently adding are higher than the
# number of secondaries on the primary shard. This will be challenging, as there is no
# MongoDB command to retrieve the primary shard. Will likely need to be done via
# mongosh

if shard_name in self.get_shard_members():
logger.info("Skipping adding shard %s, shard is already in cluster", shard_name)
return

logger.info("Adding shard %s", shard_name)
self.client.admin.command("addShard", shard_url)
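The argument passed to `addShard` is a seed list of the form `<replica set>/<host:port>,<host:port>,...`. A sketch of how `add_shard` assembles it, assuming a hypothetical `shard_seed_list` helper and assuming `Config.MONGODB_PORT` is MongoDB's default port 27017:

```python
from typing import Iterable

MONGODB_PORT = 27017  # assumption: value of Config.MONGODB_PORT (MongoDB's default port)


def shard_seed_list(shard_name: str, hosts: Iterable[str], port: int = MONGODB_PORT) -> str:
    """Hypothetical helper: the "<replset>/<host:port>,..." argument of addShard."""
    return f"{shard_name}/" + ",".join(f"{host}:{port}" for host in hosts)


print(shard_seed_list("shard-one", ["10.0.0.1", "10.0.0.2"]))
# shard-one/10.0.0.1:27017,10.0.0.2:27017
```

With a live client, the string would then be passed as `client.admin.command("addShard", seed)`, as in the diff above.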

@staticmethod
def _hostname_from_hostport(hostname: str) -> str:
"""Return the shard name from a host string returned by mongos.

mongos typically returns a value that contains the shard name, hosts, and ports,
e.g. input: shard03/host7:27018,host8:27018,host9:27018
This function returns only the shard name,
e.g. output: shard03
"""
return hostname.split("/")[0]
125 changes: 118 additions & 7 deletions lib/charms/mongodb/v0/shards_interface.py
@@ -6,14 +6,17 @@
This class handles the sharing of secrets between sharded components, adding shards, and removing
shards.
"""
import json
import logging
from typing import Optional

from charms.mongodb.v0.helpers import KEY_FILE
from charms.mongodb.v0.mongodb import MongoDBConnection, NotReadyError, PyMongoError
from charms.mongodb.v0.mongos import MongosConnection
from charms.mongodb.v0.users import MongoDBUser, OperatorUser
from ops.charm import CharmBase
from ops.charm import CharmBase, RelationBrokenEvent
from ops.framework import Object
from ops.model import BlockedStatus, MaintenanceStatus, WaitingStatus
from ops.model import ActiveStatus, BlockedStatus, MaintenanceStatus, WaitingStatus
from tenacity import RetryError, Retrying, stop_after_delay, wait_fixed

from config import Config
@@ -29,8 +32,9 @@

# Increment this PATCH version before using `charmcraft publish-lib` or reset
# to 0 if you are raising the major API version
LIBPATCH = 1
LIBPATCH = 2
KEYFILE_KEY = "key-file"
HOSTS_KEY = "hosts"
OPERATOR_PASSWORD_KEY = MongoDBUser.get_password_key_name_for_user(OperatorUser.get_username())


@@ -48,7 +52,11 @@ def __init__(
self.framework.observe(
charm.on[self.relation_name].relation_joined, self._on_relation_joined
)
# TODO Future PR, enable shard drainage by listening for relation departed events
self.framework.observe(
charm.on[self.relation_name].relation_changed, self._on_relation_event
)

# TODO Follow up PR, handle rotating passwords

def _on_relation_joined(self, event):
"""Handles providing shards with secrets and adding shards to the config server."""
@@ -84,8 +92,67 @@ def _on_relation_joined(self, event):
},
)

# TODO Future PR, add shard to config server
# TODO Follow up PR, handle rotating passwords
def _on_relation_event(self, event):
"""Handles adding, removing, and updating of shards."""
if self.charm.is_role(Config.Role.REPLICATION):
self.charm.unit.status = BlockedStatus("role replication does not support sharding")
logger.error("sharding interface not supported with config role=replication")
return

if not self.charm.is_role(Config.Role.CONFIG_SERVER):
logger.info(
"skipping relation event: the sharding interface is only executed by the config-server"
)
return

if not self.charm.unit.is_leader():
return

if not self.charm.db_initialised:
event.defer()
return

departed_relation_id = None
if isinstance(event, RelationBrokenEvent):
departed_relation_id = event.relation.id

try:
logger.info("Adding shards not present in cluster.")
self.add_shards(departed_relation_id)
# TODO Future PR, enable updating shards by listening for relation changed events
# TODO Future PR, enable shard drainage by listening for relation departed events
except PyMongoError as e:
logger.error("Deferring _on_relation_event for shards interface since: error=%r", e)
event.defer()
return
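The guard clauses in `_on_relation_event` run in a fixed order before any shard is added. A minimal sketch of that ordering as a pure function, assuming a hypothetical `Role` enum in place of `Config.Role` and string results in place of status updates and `event.defer()`:

```python
from enum import Enum


class Role(Enum):
    """Hypothetical stand-in for Config.Role."""

    REPLICATION = "replication"
    CONFIG_SERVER = "config-server"
    SHARD = "shard"


def relation_event_action(role: Role, is_leader: bool, db_initialised: bool) -> str:
    """Return what the sharding provider would do for a relation event."""
    if role is Role.REPLICATION:
        return "block"  # sharding is unsupported for a plain replica set
    if role is not Role.CONFIG_SERVER:
        return "skip"  # only the config-server adds shards
    if not is_leader:
        return "skip"  # only the leader unit acts on the event
    if not db_initialised:
        return "defer"  # defer, then stop: do not fall through to add_shards
    return "add_shards"


print(relation_event_action(Role.CONFIG_SERVER, True, False))  # defer
```

Stopping right after `"defer"` reflects the rule from commit 74398c9 that a deferred event should be followed by a return.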

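def add_shards(self, departed_shard_id):
"""Adds shards that are related to the config-server but missing from the cluster.

Args:
departed_shard_id: id of a relation that is being broken, if any; its shard is skipped.

Raises:
PyMongoError
"""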
with MongosConnection(self.charm.mongos_config) as mongo:
cluster_shards = mongo.get_shard_members()
relation_shards = self._get_shards_from_relations(departed_shard_id)

# TODO Future PR, limit number of shards added at a time, based on the number of
# replicas in the primary shard
for shard in relation_shards - cluster_shards:
try:
shard_hosts = self._get_shard_hosts(shard)
if not shard_hosts:
logger.info("host info for shard %s not yet added, skipping", shard)
continue

self.charm.unit.status = MaintenanceStatus(
f"Adding shard {shard} to config-server"
)
logger.info("Adding shard: %s", shard)
mongo.add_shard(shard, shard_hosts)
except PyMongoError as e:
logger.error("Failed to add shard %s to the config server, error=%r", shard, e)
raise

self.charm.unit.status = ActiveStatus("")
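`add_shards` only calls `addShard` for shards that are related but not yet registered with mongos, skipping a relation that is being broken and shards whose hosts are not published yet. A sketch of that selection with plain data, assuming each relation is modelled as a hypothetical `(relation_id, app_name, raw_hosts)` tuple instead of an `ops` `Relation`, and that the app name is the shard name, as in `_get_shard_name_from_relation`:

```python
import json
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical relation model: (relation id, remote app name, raw "hosts" value or None).
RelationInfo = Tuple[int, str, Optional[str]]


def shards_to_add(
    relations: Sequence[RelationInfo],
    cluster_shards: Set[str],
    departed_relation_id: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Map each related shard that is missing from the cluster to its hosts."""
    pending: Dict[str, List[str]] = {}
    for relation_id, app_name, raw_hosts in relations:
        if relation_id == departed_relation_id:
            continue  # relation is being broken; do not (re)add this shard
        if app_name in cluster_shards:
            continue  # already registered with mongos
        hosts = json.loads(raw_hosts or "[]")
        if not hosts:
            continue  # hosts not published yet; a later relation event retries
        pending[app_name] = hosts
    return pending


relations = [
    (1, "shard-one", '["10.0.0.1", "10.0.0.2"]'),
    (2, "shard-two", None),
    (3, "shard-three", '["10.0.0.3"]'),
]
print(shards_to_add(relations, {"shard-three"}))
# {'shard-one': ['10.0.0.1', '10.0.0.2']}
```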

def _update_relation_data(self, relation_id: int, data: dict) -> None:
"""Updates a set of key-value pairs in the relation.
@@ -103,6 +170,28 @@ def _update_relation_data(self, relation_id: int, data: dict) -> None:
if relation:
relation.data[self.charm.model.app].update(data)

def _get_shards_from_relations(self, departed_shard_id: Optional[int]):
"""Returns the set of shards related to the config-server, excluding a departed relation."""
relations = self.model.relations[self.relation_name]
return {
self._get_shard_name_from_relation(relation)
for relation in relations
if relation.id != departed_shard_id
}

def _get_shard_hosts(self, shard_name) -> list:
"""Retrieves the hosts for a specified shard (an empty list if not yet published)."""
relations = self.model.relations[self.relation_name]
for relation in relations:
if self._get_shard_name_from_relation(relation) == shard_name:
return json.loads(relation.data[relation.app].get(HOSTS_KEY, "[]"))

def _get_shard_name_from_relation(self, relation):
"""Returns the name of a shard for a specified relation."""
return relation.app.name


class ConfigServerRequirer(Object):
"""Manage relations between the config server and the shard, on the shard's side."""
@@ -166,7 +255,13 @@ def _on_relation_changed(self, event):
)
return

# TODO future PR, leader unit verifies shard was added to cluster
# send shard hosts to config-server mongos, so that shard can be added to the cluster.
self._update_relation_data(
event.relation.id,
{HOSTS_KEY: json.dumps(self.charm._unit_ips)},
)

# TODO future PR, leader unit verifies shard was added to cluster (update-status hook)
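The shard's leader publishes its unit IPs under the `hosts` key as a JSON string, since Juju relation databags only store string values; the config-server decodes them in `_get_shard_hosts` with an empty-list default. A sketch of that round trip, using plain dicts as hypothetical stand-ins for the application databags and hypothetical `publish_hosts`/`read_hosts` helpers:

```python
import json
from typing import Dict, List

HOSTS_KEY = "hosts"  # same key as in shards_interface.py


def publish_hosts(app_databag: Dict[str, str], unit_ips: List[str], is_leader: bool) -> None:
    """Shard side: write unit IPs to the application databag (leader only)."""
    if not is_leader:
        return  # mirrors the is_leader() check in _update_relation_data
    app_databag[HOSTS_KEY] = json.dumps(unit_ips)


def read_hosts(app_databag: Dict[str, str]) -> List[str]:
    """Config-server side: decode the hosts, defaulting to an empty list."""
    return json.loads(app_databag.get(HOSTS_KEY, "[]"))


databag: Dict[str, str] = {}
publish_hosts(databag, ["10.0.0.1", "10.0.0.2"], is_leader=True)
print(databag)  # {'hosts': '["10.0.0.1", "10.0.0.2"]'}
print(read_hosts(databag))  # ['10.0.0.1', '10.0.0.2']
```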

def update_operator_password(self, new_password: str) -> None:
"""Updates the password for the operator user.
@@ -233,3 +328,19 @@ def update_keyfile(self, key_file_contents: str) -> None:
self.charm.set_secret(
Config.Relations.APP_SCOPE, Config.Secrets.SECRET_KEYFILE_NAME, key_file_contents
)

def _update_relation_data(self, relation_id: int, data: dict) -> None:
"""Updates a set of key-value pairs in the relation.

This function writes in the application data bag, therefore, only the leader unit can call
it.

Args:
relation_id: the identifier for a particular relation.
data: dict containing the key-value pairs
that should be updated in the relation.
"""
if self.charm.unit.is_leader():
relation = self.charm.model.get_relation(self.relation_name, relation_id)
if relation:
relation.data[self.charm.model.app].update(data)