Setup (#250)
* allow hdf5:// prefix - #247

* fix flake8 errors

* use tcp by default in hsds app

* update test to run with quick stat

* add toml file

* catch login name not defined

* remove defunct rangeget test

* updated readme quick start

* bump aiohttp version to 3.8.5

* fix quick start instructions
jreadey authored Aug 25, 2023
1 parent 334bef8 commit 6e26cf4
Showing 23 changed files with 422 additions and 387 deletions.
9 changes: 7 additions & 2 deletions Dockerfile
@@ -12,13 +12,18 @@ RUN mkdir /usr/local/src/hsds/ \
/usr/local/src/hsds/hsds/util/ \
/etc/hsds/

COPY setup.py /usr/local/src/hsds/
COPY pyproject.toml /usr/local/src/hsds/
COPY setup.cfg /usr/local/src/hsds/
COPY hsds/*.py /usr/local/src/hsds/hsds/
COPY hsds/util/*.py /usr/local/src/hsds/hsds/util/
COPY admin/config/config.yml /etc/hsds/
COPY admin/config/config.yml /usr/local/src/hsds/admin/config/
COPY entrypoint.sh /
RUN /bin/bash -c 'cd /usr/local/src/hsds; pip install -e ".[azure]" ; cd -'
RUN /bin/bash -c 'cd /usr/local/src/hsds; \
pip install build;\
python -m build;\
pip install -v . ;\
cd -'

EXPOSE 5100-5999
ENTRYPOINT ["/bin/bash", "-c", "/entrypoint.sh"]
32 changes: 16 additions & 16 deletions README.md
@@ -5,27 +5,27 @@
HSDS is a web service that implements a REST-based API for HDF5 data stores.
Data can be stored in either a POSIX file system, or using object-based storage such as
AWS S3, Azure Blob Storage, or [MinIO](https://min.io).
HSDS can be run on a single machine using Docker or on a cluster using Kubernetes (or AKS on Microsoft Azure).
HSDS can be run on a single machine with or without Docker or on a cluster using Kubernetes (or AKS on Microsoft Azure).

In addition, HSDS can be run in serverless mode with AWS Lambda or h5pyd local mode.

## Quick Start

Make sure you have Python 3, Pip, and git installed, then:

1. Clone this repo: `$ git clone https://github.com/HDFGroup/hsds`
2. Go to the hsds directory: `$ cd hsds`
3. Run install: `$ python setup.py install` OR install from pypi: `$ pip install hsds`
4. Setup password file: `$ cp admin/config/passwd.default admin/config/passwd.txt`
5. Create a directory the server will use to store data, and then set the ROOT_DIR environment variable to point to it: `$ mkdir hsds_data; export ROOT_DIR="${PWD}/hsds_data"` For Windows: `C:> set ROOT_DIR=%CD%\hsds_data`
6. Create the hsds test bucket: `$ mkdir hsds_data/hsdstest`
7. Start server: `$ ./runall.sh --no-docker` For Windows: `C:> runall.bat`
8. In a new shell, set environment variables for the admin account: `$ export ADMIN_USERNAME=admin` and `$ export ADMIN_PASSWORD=admin` (adjust for any changes made to the passwd.txt file). For Windows - use the corresponding set commands
9. Run the test suite: `$ python testall.py --skip_unit`
10. (Optional) Post install setup (test data, home folders, cli tools, etc): [docs/post_install.md](docs/post_install.md)
11. (Optional) Install the h5pyd package for an h5py compatible api and tool suite: https://github.com/HDFGroup/h5pyd

To shut down the server, and the server was started with the --no-docker option, just control-C.
Make sure you have Python 3 and Pip installed, then:

1. Run install: `$ ./build.sh` from source tree OR install from pypi: `$ pip install hsds`
2. Create a directory the server will use to store data, example: `$ mkdir ~/hsds_data`
3. Start server: `$ hsds --root_dir ~/hsds_data`
4. Run the test suite. In a separate terminal run:
- Set user_name: `$ export USER_NAME=$USER`
- Set user_password: `$ export USER_PASSWORD=$USER`
- Set admin name: `$ export ADMIN_USERNAME=$USER`
- Set admin password: `$ export ADMIN_PASSWORD=$USER`
- Run test suite: `$ python testall.py --skip_unit`
5. (Optional) Install the h5pyd package for an h5py compatible api and tool suite: https://github.com/HDFGroup/h5pyd
6. (Optional) Post install setup (test data, home folders, cli tools, etc): [docs/post_install.md](docs/post_install.md)

To shut down the server when it is not running in Docker, press control-C.

If using docker, run: `$ ./stopall.sh`

18 changes: 12 additions & 6 deletions build.sh
@@ -27,11 +27,17 @@ if [ $run_pyflakes ]; then
fi
fi

echo "running setup.py"
python setup.py install
pip install --upgrade build

echo "clean stopped containers"
docker rm -v $(docker ps -aq -f status=exited)
echo "running build"
python -m build
pip install -v .

echo "building docker image"
docker build -t hdfgroup/hsds .
command -v docker
if [ $? -ne 1 ]; then
echo "clean stopped containers"
docker rm -v $(docker ps -aq -f status=exited)

echo "building docker image"
docker build -t hdfgroup/hsds .
fi
4 changes: 0 additions & 4 deletions entrypoint.sh
@@ -22,10 +22,6 @@ elif [ $NODE_TYPE == "head_node" ]; then
echo "running hsds-headnode"
export PYTHONUNBUFFERED="1"
hsds-headnode
elif [ $NODE_TYPE == "rangeget" ]; then
echo "running hsds-rangeget"
export PYTHONUNBUFFERED="1"
hsds-rangeget
else
echo "Unknown NODE_TYPE: " $NODE_TYPE
fi
103 changes: 63 additions & 40 deletions hsds/app.py
@@ -15,24 +15,33 @@
import sys
import logging
import time
import uuid

from .hsds_app import HsdsApp
from . import config

_HELP_USAGE = "Starts hsds a REST-based service for HDF5 data."
_HELP_USAGE = "Starts HSDS, a REST-based service for HDF5 data."

_HELP_EPILOG = """Examples:
- with a POSIX-based storage using a directory: ./hsdata for storage:
hsds --root_dir ~/hsdata
- with POSIX-based storage and config settings and password file:
hsds --root_dir ~/hsdata --password-file ./admin/config/passwd.txt \
--config_dir ./admin/config
- with minio data storage:
hsds --s3-gateway http://localhost:6007 --access-key-id demo:demo
--secret-access-key DEMO_PASS --password-file ./admin/config/passwd.txt
--bucket-name hsds.test
- with a POSIX-based storage for 'hsds.test' sub-folder in the './data'
folder:
- with AWS S3 storage and a bucket in the us-west-2 region:
hsds --s3-gateway http://s3.us-west-2.amazonaws.com --access-key-id ${AWS_ACCESS_KEY_ID} \
--secret-access-key ${AWS_SECRET_ACCESS_KEY} --password-file ./admin/config/passwd.txt
hsds --bucket-dir ./data/hsds.test
"""

# maximum number of characters if socket directory is given
@@ -139,14 +148,13 @@ def main():
epilog=_HELP_EPILOG,
)

group = parser.add_mutually_exclusive_group(required=True)
group.add_argument(
parser.add_argument(
"--root_dir",
type=str,
dest="root_dir",
help="Directory where to store the object store data",
)
group.add_argument(
parser.add_argument(
"--bucket_name",
nargs=1,
type=str,
@@ -197,7 +205,7 @@ def main():
)
parser.add_argument(
"--count",
default=1,
default=4,
type=int,
dest="dn_count",
help="Number of dn sub-processes to create.",
@@ -241,16 +249,25 @@ def main():
print(f"unsupported log_level: {log_level_cfg}, using INFO instead")
log_level = logging.INFO

print("set logging to:", log_level)
print("set logging to::", log_level)
logging.basicConfig(level=log_level)

userConfig = UserConfig()

# set username based on command line, .hscfg, $USER, or $JUPYTERHUB_USER
login_username = None
try:
login_username = os.getlogin()
except OSError:
pass # ignore

# set username based on command line, .hscfg, or login user
if args.hs_username:
username = args.hs_username
elif "HS_USERNAME" in userConfig:
username = userConfig["HS_USERNAME"]
elif not args.password_file:
# no password file, add the login name as user
username = login_username
else:
username = None
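
The username fallback in this hunk can be sketched in isolation as follows. This is a minimal illustration rather than the code in `hsds/app.py`: the helper names `get_login_name` and `resolve_username` are hypothetical, and a plain dict stands in for the `UserConfig` object.

```python
import os


def get_login_name():
    """Return the OS login name, or None when it cannot be determined."""
    try:
        return os.getlogin()
    except OSError:
        # os.getlogin() raises OSError when no controlling terminal is
        # available (for example in some containers or services)
        return None


def resolve_username(cli_username, user_config, password_file, login_name):
    """Pick a username: command line first, then user config, then login name.

    The login name is only used when no password file was supplied.
    """
    if cli_username:
        return cli_username
    if "HS_USERNAME" in user_config:
        return user_config["HS_USERNAME"]
    if not password_file:
        return login_name
    return None
```

With a password file present and no explicit username, the sketch returns `None`, which mirrors the `else` branch shown in the hunk.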

@@ -260,7 +277,7 @@ def main():
elif "HS_PASSWORD" in userConfig:
password = userConfig["HS_PASSWORD"]
else:
password = "1234"
password = login_username

if username:
kwargs["username"] = username
@@ -271,38 +288,23 @@ def main():
sys.exit(f"password file: {args.password_file} not found")
kwargs["password_file"] = args.password_file

if args.host:
# use TCP connect
kwargs["host"] = args.host
# use unix domain socket if a socket dir is set
if args.socket_dir:
socket_dir = os.path.abspath(args.socket_dir)
if not os.path.isdir(socket_dir):
raise FileNotFoundError(f"directory: {socket_dir} not found")
kwargs["socket_dir"] = socket_dir
else:
# USE TCP connect
if args.host:
kwargs["host"] = args.host
else:
kwargs["host"] = "localhost"
# sn_port only relevant for TCP connections
if args.port:
kwargs["sn_port"] = args.port
else:
kwargs["sn_port"] = 5101 # TBD - use config
else:
# choose a tmp directory for socket if one is not provided
if args.socket_dir:
socket_dir = os.path.abspath(args.socket_dir)
if not os.path.isdir(socket_dir):
raise FileNotFoundError(f"directory: {socket_dir} not found")
else:
if "TMP" in os.environ:
# This should be set at least on Windows
tmp_dir = os.environ["TMP"]
print("set tmp_dir:", tmp_dir)
else:
tmp_dir = "/tmp"
if not os.path.isdir(tmp_dir):
raise FileNotFoundError(f"directory {tmp_dir} not found")
rand_name = uuid.uuid4().hex[:8]
socket_dir = os.path.join(tmp_dir, f"hs{rand_name}")
print("using socket dir:", socket_dir)
if len(socket_dir) > MAX_SOCKET_DIR_PATH_LEN:
raise ValueError(
f"length of socket_dir must be less than: {MAX_SOCKET_DIR_PATH_LEN}"
)
os.mkdir(socket_dir)
kwargs["socket_dir"] = socket_dir

if args.logfile:
logfile = os.path.abspath(args.logfile)
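
Because the added and removed lines are interleaved above, the resulting connection choice (Unix domain socket only when a socket directory is given, TCP otherwise) may be easier to follow as a standalone sketch. The function name `connection_kwargs` and its parameters are assumptions made for illustration; only the key names, the `localhost` default and the `5101` default port come from the hunk.

```python
import os


def connection_kwargs(socket_dir=None, host=None, port=None, default_port=5101):
    """Return connection settings for the service.

    A Unix domain socket is used only when socket_dir is given;
    otherwise TCP is used, defaulting to localhost and default_port.
    """
    if socket_dir:
        socket_dir = os.path.abspath(socket_dir)
        if not os.path.isdir(socket_dir):
            raise FileNotFoundError(f"directory: {socket_dir} not found")
        return {"socket_dir": socket_dir}
    # TCP is the default; sn_port is only relevant for TCP connections
    return {"host": host or "localhost", "sn_port": port or default_port}
```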
@@ -329,6 +331,27 @@ def main():
if args.dn_count:
kwargs["dn_count"] = args.dn_count

if args.bucket_name:
bucket_name = args.bucket_name
else:
bucket_name = config.get("bucket_name")
if not bucket_name:
sys.exit("bucket_name not set")
if args.root_dir:
root_dir = args.root_dir
else:
root_dir = config.get("root_dir")
if not root_dir:
# check that AWS_S3_GATEWAY or AZURE_CONNECTION_STRING is set
if not config.get("aws_s3_gateway") and not config.get("azure_connection_string"):
sys.exit("root_dir not set (and no S3 or Azure connection info)")
else:
if not os.path.isdir(root_dir):
sys.exit(f"directory: {root_dir} not found")
bucket_path = os.path.join(root_dir, bucket_name)
if not os.path.isdir(bucket_path):
os.mkdir(bucket_path)

app = HsdsApp(**kwargs)
app.run()
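
The bucket directory check added in this hunk can be sketched on its own as below. The helper name `ensure_bucket_dir` is hypothetical, and the sketch raises an exception where the hunk calls `sys.exit`. It assumes `bucket_name` is a plain string; argparse stores an option declared with `nargs=1` as a one-element list, so a caller passing the parsed value directly would need to unwrap it first.

```python
import os


def ensure_bucket_dir(root_dir, bucket_name):
    """Create root_dir/bucket_name if it is missing and return its path.

    root_dir itself must already exist; bucket_name must be a string.
    """
    if not os.path.isdir(root_dir):
        raise FileNotFoundError(f"directory: {root_dir} not found")
    bucket_path = os.path.join(root_dir, bucket_name)
    if not os.path.isdir(bucket_path):
        os.mkdir(bucket_path)
    return bucket_path
```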

2 changes: 1 addition & 1 deletion hsds/basenode.py
@@ -33,7 +33,7 @@
from .util.k8sClient import getDnLabelSelector, getPodIps
from . import hsds_logger as log

HSDS_VERSION = "0.8.1"
HSDS_VERSION = "0.8.2"


def getVersion():
13 changes: 11 additions & 2 deletions hsds/config.py
@@ -42,13 +42,21 @@ def getCmdLineArg(x):
# return value of command-line option
# use "--x=val" to set option 'x' to 'val'
# use "--x" for boolean flags

option = "--" + x + "="
for i in range(1, len(sys.argv)):
arg = sys.argv[i]
if i < len(sys.argv) - 1:
next_arg = sys.argv[i + 1]
else:
next_arg = None
if arg == "--" + x:
# boolean flag
debug(f"got cmd line flag for {x}")
return True
if next_arg is None or next_arg.startswith("-"):
# treat as a boolean flag
return True
else:
return next_arg
elif arg.startswith(option):
# found an override
nlen = len(option)
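
As a reading aid, the lookup that `getCmdLineArg` appears to perform after this change (accepting `--x val` as well as `--x=val`, and treating a bare `--x` as a boolean flag) is sketched below. The function `get_cmd_line_arg` is a hypothetical stand-in that takes `argv` explicitly; returning `None` when the option is absent is an assumption, since the rest of the original function is not shown in the hunk.

```python
def get_cmd_line_arg(name, argv):
    """Look up option `name` in argv (argv[0] is the program name).

    Supports "--name=val", "--name val", and a bare "--name" flag.
    """
    option = "--" + name + "="
    for i in range(1, len(argv)):
        arg = argv[i]
        next_arg = argv[i + 1] if i < len(argv) - 1 else None
        if arg == "--" + name:
            if next_arg is None or next_arg.startswith("-"):
                # nothing usable follows, so treat as a boolean flag
                return True
            return next_arg
        if arg.startswith(option):
            # "--name=val" form: return the text after the "="
            return arg[len(option):]
    return None  # assumption: option not present
```

One consequence of this scheme is that a value beginning with `-` cannot be passed in the space-separated form; it would have to use `--x=val`.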
@@ -69,6 +77,7 @@ def _load_cfg():
config_dir = getCmdLineArg("config_dir")

if config_dir:
eprint("got command line arg for config_dir:", config_dir)
config_dirs.append(config_dir)
if not config_dirs and "CONFIG_DIR" in os.environ:
config_dirs.append(os.environ["CONFIG_DIR"])
7 changes: 1 addition & 6 deletions hsds/domain_sn.py
@@ -1123,11 +1123,6 @@ async def PUT_Domain(request):
else:
is_toplevel = False

if is_toplevel and not is_folder:
msg = "Only folder domains can be created at the top-level"
log.warn(msg)
raise HTTPBadRequest(reason=msg)

if is_toplevel and not isAdminUser(app, username):
msg = "creation of top-level domains is only supported by admin users"
log.warn(msg)
@@ -1164,7 +1159,7 @@ async def PUT_Domain(request):
linked_json = await getDomainJson(app, l_d, reload=True)
log.debug(f"got linked json: {linked_json}")
if "root" not in linked_json:
msg = "Folder domains cannot ber used as link target"
msg = "Folder domains cannot be used as link target"
log.warn(msg)
raise HTTPBadRequest(reason=msg)
root_id = linked_json["root"]
2 changes: 2 additions & 0 deletions hsds/hsds_app.py
@@ -274,6 +274,8 @@ def run(self):
pargs = [py_exe, cmd_path, "--node_type=sn", "--log_prefix=sn "]
if self._username:
pargs.append(f"--hs_username={self._username}")
# make this user admin
pargs.append(f"--admin_user={self._username}")
if self._password:
pargs.append(f"--hs_password={self._password}")
if self._password_file:
4 changes: 4 additions & 0 deletions hsds/servicenode_lib.py
@@ -270,6 +270,10 @@ async def getObjectIdByPath(app, obj_id, h5path, bucket=None, refresh=False, dom
# find domain object is stored under
domain = link_json["h5domain"]

if domain.startswith("hdf5:/"):
# strip off prefix
domain = domain[6:]

if bucket:
domain = bucket + domain

4 changes: 4 additions & 0 deletions hsds/util/domainUtil.py
@@ -213,6 +213,10 @@ def getDomainFromRequest(request, validate=True, allow_dns=True):
if not domain:
raise ValueError("no domain")

if domain.startswith("hdf5:/"):
# strip off the prefix to make following logic easier
domain = domain[6:]

if domain[0] != "/":
# DNS style hostname
if validate:
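
The `hdf5://` handling added here and in `servicenode_lib.py` strips the six characters `hdf5:/`, which leaves the second slash in place so that the result is an absolute domain path. A small sketch, using a hypothetical helper name and made-up domain paths, shows the effect:

```python
def strip_hdf5_prefix(domain):
    """Remove a leading "hdf5:/" so "hdf5://a/b" becomes "/a/b"."""
    prefix = "hdf5:/"  # six characters; the following "/" is kept
    if domain.startswith(prefix):
        return domain[len(prefix):]
    return domain


# Example paths below are illustrative only
print(strip_hdf5_prefix("hdf5://home/test_user1/file.h5"))
print(strip_hdf5_prefix("/home/test_user1/file.h5"))
```

Both calls print `/home/test_user1/file.h5`, since a domain without the prefix is returned unchanged.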
1 change: 1 addition & 0 deletions hsds/util/s3Client.py
@@ -145,6 +145,7 @@ def _get_client_kwargs(self):
kwargs["endpoint_url"] = self._s3_gateway
kwargs["use_ssl"] = self._use_ssl
kwargs["config"] = self._aio_config
log.debug(f"s3 kwargs: {kwargs}")
return kwargs

def _renewToken(self):