Allow disabling virtual-hosted-style addressing #208

Closed
DamienMatias opened this issue Jun 3, 2024 · 6 comments · Fixed by #218
Labels
enhancement (New feature or request) · good first issue (Good for newcomers)

Comments

DamienMatias commented Jun 3, 2024

Tell us more about this new feature.

Hello,

Is there a way to force path-style requests instead of the default virtual-hosted-style addressing?
This is actually possible with mountpoint-s3 as you can see here in their documentation.

To disable virtual-hosted-style addressing, use the --force-path-style command-line flag to instead send requests to https://example.com/docexamplebucket/.
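To make the difference between the two addressing styles concrete, here is a minimal sketch of how the request URL differs. The function name `s3_request_url` and its parameters are hypothetical and exist only for illustration; neither mountpoint-s3 nor the connector is known to build URLs this way.

```python
from urllib.parse import quote


def s3_request_url(endpoint_host: str, bucket: str, key: str,
                   force_path_style: bool = False) -> str:
    """Build the HTTPS URL for an object under either addressing style.

    Hypothetical helper for illustration only.
    """
    encoded_key = quote(key, safe="/")  # percent-encode the key, keep "/" separators
    if force_path_style:
        # Path style: the bucket is the first segment of the URL path.
        return f"https://{endpoint_host}/{bucket}/{encoded_key}"
    # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host.
    return f"https://{bucket}.{endpoint_host}/{encoded_key}"


print(s3_request_url("example.com", "docexamplebucket", "data/train.csv"))
print(s3_request_url("example.com", "docexamplebucket", "data/train.csv",
                     force_path_style=True))
```

With virtual-hosted style, every bucket needs its own resolvable hostname; with path style, a single endpoint hostname serves all buckets.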

This would allow us to use the lakeFS S3 Gateway, and potentially other tools that are still built around path-style addressing.

Thank you 🙏

DamienMatias added the enhancement label Jun 3, 2024
@fuatbasik
Contributor

Hello, @DamienMatias !
Thank you for your interest in Amazon S3 Connector for PyTorch. We'll discuss this internally and report back here when we have something concrete to share.

N-o-Z commented Jun 21, 2024

@DamienMatias Hi - lakeFS maintainer here 👋🏽
As a workaround for this issue, you can configure LAKEFS_GATEWAYS_S3_DOMAIN_NAME so that lakeFS works with virtual-hosted-style addressing.
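To illustrate why a gateway needs to know its own domain name before it can serve virtual-hosted-style requests, here is a minimal sketch of recovering the bucket from the `Host` header. This is not lakeFS's implementation; the function `bucket_from_host` and the hostnames are hypothetical.

```python
from typing import Optional


def bucket_from_host(host_header: str, gateway_domain: str) -> Optional[str]:
    """Return the bucket encoded in a virtual-hosted-style Host header, if any.

    Hypothetical helper: a gateway can only split "<bucket>.<domain>" when it
    knows which part of the host is its own domain.
    """
    host = host_header.split(":", 1)[0].lower()  # drop an optional port
    suffix = "." + gateway_domain.lower()
    if host.endswith(suffix) and len(host) > len(suffix):
        return host[: -len(suffix)]
    # Bare gateway domain (a path-style request) or an unrelated host.
    return None


print(bucket_from_host("my-repo.s3.local.example:8000", "s3.local.example"))
print(bucket_from_host("s3.local.example", "s3.local.example"))
```

The first call yields the bucket name, while the second yields `None` because the host is the gateway domain itself.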

@devel4848

I'm also interested in disabling virtual-hosted-style addressing. I would like to use S3 Connector for PyTorch with an S3-compatible Ceph storage that cannot be configured for virtual-hosted-style addressing because of the DNS implications. Only path-style addressing can be used in our context.

@jamesbornholt
Member

The S3 client we use supports disabling virtual-hosted-style addressing here, so I think this would just be a matter of plumbing through a new flag from the various constructors (S3IterableDataset.from_prefix and friends) to the Rust constructor here, similar to #195. I'm not sure we're going to get to this in the short term, but we'd happily review a PR!
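The "plumbing" described above can be sketched roughly as follows. Every class here (`ClientConfig`, `LowLevelClient`, `Dataset`) is a hypothetical stand-in, not the connector's actual code; the sketch only shows a flag travelling from a public constructor down to the underlying client.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical stand-in for a client configuration object."""
    force_path_style: bool = False


class LowLevelClient:
    """Hypothetical stand-in for the native (Rust-backed) client."""

    def __init__(self, region: str, force_path_style: bool = False):
        self.region = region
        self.force_path_style = force_path_style


class Dataset:
    """Hypothetical stand-in for a dataset class with a from_prefix constructor."""

    def __init__(self, uri: str, region: str, config: Optional[ClientConfig] = None):
        self.uri = uri
        self._region = region
        self._config = config or ClientConfig()
        self._client: Optional[LowLevelClient] = None

    @classmethod
    def from_prefix(cls, uri: str, *, region: str,
                    config: Optional[ClientConfig] = None) -> "Dataset":
        return cls(uri, region, config)

    @property
    def client(self) -> LowLevelClient:
        # Create the client lazily and forward the flag unchanged.
        if self._client is None:
            self._client = LowLevelClient(
                self._region, force_path_style=self._config.force_path_style
            )
        return self._client


dataset = Dataset.from_prefix(
    "s3://bucket/prefix", region="us-east-1",
    config=ClientConfig(force_path_style=True),
)
print(dataset.client.force_path_style)
```

Grouping the flag in a config object, rather than adding a keyword argument to every constructor, keeps the public signatures stable when further client options are added.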

dannycjones added the good first issue label Jul 30, 2024
balamurugana added a commit to balamurugana/s3-connector-for-pytorch that referenced this issue Jul 31, 2024
This PR extends support to other S3-compatible object stores, such as MinIO,
that use path-style addressing to access buckets and objects.

For example:
```py
from s3torchconnector import S3MapDataset, S3IterableDataset

DATASET_URI = "s3://<BUCKET>/<PREFIX>"
REGION = "us-east-1"

iterable_dataset = S3IterableDataset.from_prefix(DATASET_URI, region=REGION, path_style=True)

for item in iterable_dataset:
    print(item.key)

map_dataset = S3MapDataset.from_prefix(DATASET_URI, region=REGION, path_style=True)

item = map_dataset[0]

bucket = item.bucket
key = item.key
content = item.read()
len(content)
```

And

```py
from s3torchconnector import S3Checkpoint

import torchvision
import torch

CHECKPOINT_URI = "s3://<BUCKET>/<KEY>/"
REGION = "us-east-1"
checkpoint = S3Checkpoint(region=REGION, path_style=True)

model = torchvision.models.resnet18()

with checkpoint.writer(CHECKPOINT_URI + "epoch0.ckpt") as writer:
    torch.save(model.state_dict(), writer)

with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
    state_dict = torch.load(reader)

model.load_state_dict(state_dict)
```

Fixes awslabs#208

Signed-off-by: Bala.FA <[email protected]>
balamurugana added a commit to balamurugana/s3-connector-for-pytorch that referenced this issue Aug 1, 2024
This PR extends support to other S3-compatible object stores, such as MinIO,
that use path-style addressing to access buckets and objects.

For example:
```py
from s3torchconnector import S3ClientConfig, S3MapDataset, S3IterableDataset

DATASET_URI = "s3://<BUCKET>/<PREFIX>"
REGION = "us-east-1"
s3client_config = S3ClientConfig(path_style=True)

iterable_dataset = S3IterableDataset.from_prefix(DATASET_URI, region=REGION, s3client_config=s3client_config)

for item in iterable_dataset:
    print(item.key)

map_dataset = S3MapDataset.from_prefix(DATASET_URI, region=REGION, s3client_config=s3client_config)

item = map_dataset[0]

bucket = item.bucket
key = item.key
content = item.read()
len(content)
```

And

```py
from s3torchconnector import S3Checkpoint, S3ClientConfig

import torchvision
import torch

CHECKPOINT_URI = "s3://<BUCKET>/<KEY>/"
REGION = "us-east-1"
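# Define the client config here so this block runs on its own.
s3client_config = S3ClientConfig(path_style=True)
checkpoint = S3Checkpoint(region=REGION, s3client_config=s3client_config)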

model = torchvision.models.resnet18()

with checkpoint.writer(CHECKPOINT_URI + "epoch0.ckpt") as writer:
    torch.save(model.state_dict(), writer)

with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
    state_dict = torch.load(reader)

model.load_state_dict(state_dict)
```

Fixes awslabs#208

Signed-off-by: Bala.FA <[email protected]>
balamurugana added a commit to balamurugana/s3-connector-for-pytorch that referenced this issue Aug 1, 2024
This PR extends support to other S3-compatible object stores, such as MinIO,
that use path-style addressing to access buckets and objects.

For example:
```py
from s3torchconnector import S3ClientConfig, S3MapDataset, S3IterableDataset

DATASET_URI = "s3://<BUCKET>/<PREFIX>"
REGION = "us-east-1"
s3client_config = S3ClientConfig(path_style=True)

iterable_dataset = S3IterableDataset.from_prefix(DATASET_URI, region=REGION, s3client_config=s3client_config)

for item in iterable_dataset:
    print(item.key)

map_dataset = S3MapDataset.from_prefix(DATASET_URI, region=REGION, s3client_config=s3client_config)

item = map_dataset[0]

bucket = item.bucket
key = item.key
content = item.read()
len(content)
```

And

```py
from s3torchconnector import S3Checkpoint, S3ClientConfig

import torchvision
import torch

CHECKPOINT_URI = "s3://<BUCKET>/<KEY>/"
REGION = "us-east-1"
s3client_config = S3ClientConfig(path_style=True)
checkpoint = S3Checkpoint(region=REGION, s3client_config=s3client_config)

model = torchvision.models.resnet18()

with checkpoint.writer(CHECKPOINT_URI + "epoch0.ckpt") as writer:
    torch.save(model.state_dict(), writer)

with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
    state_dict = torch.load(reader)

model.load_state_dict(state_dict)
```

Fixes awslabs#208

Signed-off-by: Bala.FA <[email protected]>
balamurugana added a commit to balamurugana/s3-connector-for-pytorch that referenced this issue Aug 7, 2024
This PR extends support to other S3-compatible object stores, such as MinIO,
that use path-style addressing to access buckets and objects.

For example:
```py
from s3torchconnector import S3ClientConfig, S3MapDataset, S3IterableDataset

DATASET_URI = "s3://<BUCKET>/<PREFIX>"
REGION = "us-east-1"
s3client_config = S3ClientConfig(force_path_style=True)

iterable_dataset = S3IterableDataset.from_prefix(DATASET_URI, region=REGION, s3client_config=s3client_config)

for item in iterable_dataset:
    print(item.key)

map_dataset = S3MapDataset.from_prefix(DATASET_URI, region=REGION, s3client_config=s3client_config)

item = map_dataset[0]

bucket = item.bucket
key = item.key
content = item.read()
len(content)
```

And

```py
from s3torchconnector import S3Checkpoint, S3ClientConfig

import torchvision
import torch

CHECKPOINT_URI = "s3://<BUCKET>/<KEY>/"
REGION = "us-east-1"
s3client_config = S3ClientConfig(force_path_style=True)
checkpoint = S3Checkpoint(region=REGION, s3client_config=s3client_config)

model = torchvision.models.resnet18()

with checkpoint.writer(CHECKPOINT_URI + "epoch0.ckpt") as writer:
    torch.save(model.state_dict(), writer)

with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
    state_dict = torch.load(reader)

model.load_state_dict(state_dict)
```

Fixes awslabs#208

Signed-off-by: Bala.FA <[email protected]>
IsaevIlya pushed a commit that referenced this issue Aug 8, 2024
This PR extends support to other S3-compatible object stores, such as MinIO,
that use path-style addressing to access buckets and objects.

For example:
```py
from s3torchconnector import S3ClientConfig, S3MapDataset, S3IterableDataset

DATASET_URI = "s3://<BUCKET>/<PREFIX>"
REGION = "us-east-1"
s3client_config = S3ClientConfig(force_path_style=True)

iterable_dataset = S3IterableDataset.from_prefix(DATASET_URI, region=REGION, s3client_config=s3client_config)

for item in iterable_dataset:
    print(item.key)

map_dataset = S3MapDataset.from_prefix(DATASET_URI, region=REGION, s3client_config=s3client_config)

item = map_dataset[0]

bucket = item.bucket
key = item.key
content = item.read()
len(content)
```

And

```py
from s3torchconnector import S3Checkpoint, S3ClientConfig

import torchvision
import torch

CHECKPOINT_URI = "s3://<BUCKET>/<KEY>/"
REGION = "us-east-1"
s3client_config = S3ClientConfig(force_path_style=True)
checkpoint = S3Checkpoint(region=REGION, s3client_config=s3client_config)

model = torchvision.models.resnet18()

with checkpoint.writer(CHECKPOINT_URI + "epoch0.ckpt") as writer:
    torch.save(model.state_dict(), writer)

with checkpoint.reader(CHECKPOINT_URI + "epoch0.ckpt") as reader:
    state_dict = torch.load(reader)

model.load_state_dict(state_dict)
```

Fixes #208

Signed-off-by: Bala.FA <[email protected]>

dannycjones commented Aug 8, 2024

I'll reopen this: while it's merged, there's no new release yet. (Looks like I mistakenly linked closing the PR to closing this issue.)

This should be supported in the next published version!

dannycjones reopened this Aug 8, 2024
@IsaevIlya
Contributor

This feature was released in v1.2.5, so closing the request.
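To check whether an installed copy is new enough, a minimal sketch like the following could be used. The helper names `parse_version` and `has_path_style_support` are hypothetical, and the sketch assumes plain numeric `MAJOR.MINOR.PATCH` version strings and that the package is installed under the distribution name `s3torchconnector`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Parse a plain 'MAJOR.MINOR.PATCH' string into a tuple of ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def has_path_style_support(installed: str, minimum: str = "1.2.5") -> bool:
    """Return True if the installed version is at least the given minimum."""
    # Tuples compare element by element, so (1, 10, 0) > (1, 2, 5).
    return parse_version(installed) >= parse_version(minimum)


try:
    installed = version("s3torchconnector")
    print(installed, has_path_style_support(installed))
except PackageNotFoundError:
    print("s3torchconnector is not installed")
except ValueError:
    print("unexpected version format")
```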
