Redirecting stream config to BlockCache (#1445)
* Auto convert streaming config to block-cache config
ashruti-msft authored Nov 5, 2024
1 parent ba739b6 commit 46a557c
Showing 22 changed files with 169 additions and 3,670 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,9 @@
**Bug Fixes**
- [#1426](https://github.com/Azure/azure-storage-fuse/issues/1426) Read panic in block-cache due to boundary conditions.

**Other Changes**
- Stream config will be converted to block-cache config implicitly, and the 'stream' component is no longer used from this release onwards.

## 2.3.2 (2024-09-03)
**Bug Fixes**
- Fixed the case where file creation using SAS on HNS accounts was returning back wrong error code.
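
The changelog only states that the conversion is implicit. A minimal sketch of how such a redirect could work, assuming the pipeline is an ordered list of component names: every `stream` entry is swapped for `block_cache`, and a package-level flag (playing the role of the `common.IsStream` variable this commit adds to `common/util.go`) records that the user's config asked for streaming. `redirectStream` and `isStream` are hypothetical names; the mount-time code that performs the real redirect is not part of this excerpt.

```go
package main

import "fmt"

// isStream plays the role of common.IsStream: it records that the user's
// config asked for the retired "stream" component. Hypothetical stand-in.
var isStream bool

// redirectStream returns a copy of the component pipeline with every
// "stream" entry replaced by "block_cache". Hypothetical helper.
func redirectStream(components []string) []string {
	out := make([]string, 0, len(components))
	for _, c := range components {
		if c == "stream" {
			isStream = true
			c = "block_cache"
		}
		out = append(out, c)
	}
	return out
}

func main() {
	pipeline := []string{"libfuse", "stream", "attr_cache", "azstorage"}
	fmt.Println(redirectStream(pipeline), isStream)
	// prints: [libfuse block_cache attr_cache azstorage] true
}
```

Keeping a flag instead of rewriting the user's config file leaves it to the block-cache component to decide whether to read the legacy `stream` section, which matches the `if common.IsStream` check in the `block_cache.go` hunk of this commit.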
9 changes: 3 additions & 6 deletions README.md
@@ -13,7 +13,7 @@ Please submit an issue [here](https://github.com/azure/azure-storage-fuse/issues
## NOTICE
- Due to known data consistency issues when using Blobfuse2 in `block-cache` mode, it is strongly recommended that all Blobfuse2 installations be upgraded to version 2.3.2. For more information, see [this](https://github.com/Azure/azure-storage-fuse/wiki/Blobfuse2-Known-issues).
- As of version 2.3.0, blobfuse has updated its authentication methods. For Managed Identity, Object-ID based OAuth is solely accessible via CLI-based login, requiring Azure CLI on the system. For a dependency-free option, users may utilize Application/Client-ID or Resource ID based authentication.
- `streaming` mode is being deprecated. This is the older option and is replaced with the `block-cache` mode which is the more performant streaming option.
- `streaming` mode is deprecated. Blobfuse2 will implicitly convert your streaming config to block-cache.

## Limitations in Block Cache
- Concurrent write operations on the same file using multiple handles is not checked for data consistency and may lead to incorrect data being written.
@@ -38,7 +38,7 @@ Visit [this](https://github.com/Azure/azure-storage-fuse/wiki/Blobfuse2-Supporte
- Basic file system operations such as mkdir, opendir, readdir, rmdir, open,
read, create, write, close, unlink, truncate, stat, rename
- Local caching to improve subsequent access times
- Streaming/Block-Cache to support reading AND writing large files
- Block-Cache to support reading AND writing large files
- Parallel downloads and uploads to improve access time for large files
- Multiple mounts to the same container for read-only workloads

@@ -65,7 +65,7 @@ One of the biggest BlobFuse2 features is our brand new health monitor. It allows
- CLI to check or update a parameter in the encrypted config
- Set MD5 sum of a blob while uploading
- Validate MD5 sum on download and fail file open on mismatch
- Large file writing through write streaming/Block-Cache
- Large file writing through Block-Cache

## Blobfuse2 performance compared to blobfuse(v1.x.x)
- 'git clone' operation is 25% faster (tested with vscode repo cloning)
@@ -154,8 +154,6 @@ To learn about a specific command, just include the name of the command (For exa
* `--high-disk-threshold=<PERCENTAGE>`: If local cache usage exceeds this, start early eviction of files from cache.
* `--low-disk-threshold=<PERCENTAGE>`: If local cache usage comes below this threshold then stop early eviction.
* `--sync-to-flush=false` : Sync call will force upload a file to storage container if this is set to true, otherwise it just evicts file from local cache.
- Stream options
* `--block-size-mb=<SIZE IN MB>`: Size of a block to be downloaded during streaming.
- Block-Cache options
* `--block-cache-block-size=<SIZE IN MB>`: Size of a block to be downloaded as a unit.
* `--block-cache-pool-size=<SIZE IN MB>`: Size of pool to be used for caching. This limits total memory used by block-cache. Default - 80% of free memory available.
@@ -230,7 +228,6 @@ Below diagrams guide you to choose right configuration for your workloads.
<br/><br/>
- [Sample File Cache Config](./sampleFileCacheConfig.yaml)
- [Sample Block-Cache Config](./sampleBlockCacheConfig.yaml)
- [Sample Stream Config](./sampleStreamingConfig.yaml)
- [All Config options](./setup/baseConfig.yaml)
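
The `--block-cache-pool-size` option above defaults to 80% of free memory. A sketch of that arithmetic, assuming free memory is already known in bytes (how Blobfuse2 measures it is not shown here) and assuming the pool is divided into whole blocks of `--block-cache-block-size`; `defaultPoolSize` and `blocksInPool` are hypothetical helpers, not Blobfuse2 functions:

```go
package main

import "fmt"

const mb uint64 = 1024 * 1024

// defaultPoolSize returns 80% of freeBytes, mirroring the documented default
// for --block-cache-pool-size. Multiplying before dividing avoids losing
// precision to integer truncation. Hypothetical helper.
func defaultPoolSize(freeBytes uint64) uint64 {
	return freeBytes * 80 / 100
}

// blocksInPool reports how many whole blocks of blockSizeMB fit in the pool.
// Hypothetical helper.
func blocksInPool(poolBytes, blockSizeMB uint64) uint64 {
	return poolBytes / (blockSizeMB * mb)
}

func main() {
	free := 10240 * mb // assume 10 GiB of free memory
	pool := defaultPoolSize(free)
	fmt.Println(pool/mb, blocksInPool(pool, 16)) // prints: 8192 512
}
```

With 10 GiB free, the default pool comes to 8192 MB, which holds 512 blocks of 16 MB under these assumptions.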


7 changes: 6 additions & 1 deletion azure-pipeline-templates/e2e-tests-block-cache.yml
@@ -76,7 +76,12 @@ steps:
displayName: 'Unmount RW mount'
- script: |
$(WORK_DIR)/blobfuse2 gen-test-config --config-file=$(WORK_DIR)/testdata/config/azure_key_bc.yaml --container-name=${{ parameters.container }} --temp-path=${{ parameters.temp_dir }} --output-file=${{ parameters.config_file }}
if [ "${{ parameters.idstring }}" = "Stream" ]; then
CONFIG_FILE=$(WORK_DIR)/testdata/config/azure_stream.yaml
else
CONFIG_FILE=$(WORK_DIR)/testdata/config/azure_key_bc.yaml
fi
$(WORK_DIR)/blobfuse2 gen-test-config --config-file=$CONFIG_FILE --container-name=${{ parameters.container }} --temp-path=${{ parameters.temp_dir }} --output-file=${{ parameters.config_file }}
displayName: 'Create Config File for RO mount'
env:
NIGHTLY_STO_ACC_NAME: ${{ parameters.account_name }}
71 changes: 71 additions & 0 deletions blobfuse2-nightly.yaml
@@ -1596,6 +1596,77 @@ stages:
mount_dir: $(MOUNT_DIR)
block_size_mb: "8"

- stage: StreamDataValidation
jobs:
# Ubuntu Tests
- job: Set_1
timeoutInMinutes: 300
strategy:
matrix:
Ubuntu-22:
AgentName: 'blobfuse-ubuntu22'
containerName: 'test-cnt-ubn-22'
adlsSas: $(AZTEST_ADLS_CONT_SAS_UBN_22)
fuselib: 'libfuse3-dev'
tags: 'fuse3'

pool:
name: "blobfuse-ubuntu-pool"
demands:
- ImageOverride -equals $(AgentName)

variables:
- group: NightlyBlobFuse
- name: ROOT_DIR
value: "/usr/pipeline/workv2"
- name: WORK_DIR
value: "/usr/pipeline/workv2/go/src/azure-storage-fuse"
- name: skipComponentGovernanceDetection
value: true
- name: MOUNT_DIR
value: "/usr/pipeline/workv2/blob_mnt"
- name: TEMP_DIR
value: "/usr/pipeline/workv2/temp"
- name: BLOBFUSE2_CFG
value: "/usr/pipeline/workv2/blobfuse2.yaml"
- name: GOPATH
value: "/usr/pipeline/workv2/go"

steps:
- template: 'azure-pipeline-templates/setup.yml'
parameters:
tags: $(tags)
installStep:
script: |
sudo apt-get update --fix-missing
sudo apt update
sudo apt-get install cmake gcc $(fuselib) git parallel -y
if [ $(tags) == "fuse2" ]; then
sudo apt-get install fuse -y
else
sudo apt-get install fuse3 -y
fi
displayName: 'Install fuse'

- template: 'azure-pipeline-templates/e2e-tests-block-cache.yml'
parameters:
conf_template: azure_stream.yaml
config_file: $(BLOBFUSE2_CFG)
container: $(containerName)
idstring: Stream
adls: false
account_name: $(NIGHTLY_STO_BLOB_ACC_NAME)
account_key: $(NIGHTLY_STO_BLOB_ACC_KEY)
account_type: block
account_endpoint: https://$(NIGHTLY_STO_BLOB_ACC_NAME).blob.core.windows.net
distro_name: $(AgentName)
quick_test: false
verbose_log: ${{ parameters.verbose_log }}
clone: true
# TODO: These can be removed one day and replace all instances of ${{ parameters.temp_dir }} with $(TEMP_DIR) since it is a global variable
temp_dir: $(TEMP_DIR)
mount_dir: $(MOUNT_DIR)

- stage: FNSDataValidation
jobs:
# Ubuntu Tests
1 change: 0 additions & 1 deletion cmd/imports.go
@@ -40,5 +40,4 @@ import (
_ "github.com/Azure/azure-storage-fuse/v2/component/file_cache"
_ "github.com/Azure/azure-storage-fuse/v2/component/libfuse"
_ "github.com/Azure/azure-storage-fuse/v2/component/loopback"
_ "github.com/Azure/azure-storage-fuse/v2/component/stream"
)
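
Every component in `cmd/imports.go` is pulled in with a blank (`_`) import, which in Go links the package in and runs its `init` functions purely for their side effects. Assuming each component registers a constructor under its name from `init` (a common Go plugin pattern; the `register`/`lookup` registry below is hypothetical and not Blobfuse2's internal API), deleting the `stream` import means no constructor is ever recorded for that name. A single-file sketch, where one `init` stands in for the `block_cache` package and nothing registers `stream`:

```go
package main

import (
	"fmt"
	"sort"
)

// Component is a minimal stand-in for a pipeline stage. Hypothetical.
type Component interface{ Name() string }

// registry maps component names to constructors. In a multi-package program
// each component package would call register from its own init(), which is
// what a blank import triggers. Hypothetical.
var registry = map[string]func() Component{}

func register(name string, ctor func() Component) { registry[name] = ctor }

type blockCache struct{}

func (blockCache) Name() string { return "block_cache" }

// init stands in for component/block_cache's registration. There is
// deliberately no init registering "stream", mirroring the removed import.
func init() {
	register("block_cache", func() Component { return blockCache{} })
}

// lookup builds a component by name, or fails if nothing registered it.
func lookup(name string) (Component, error) {
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("component %q is not registered", name)
	}
	return ctor(), nil
}

func main() {
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	fmt.Println("registered:", names) // prints: registered: [block_cache]

	if _, err := lookup("stream"); err != nil {
		fmt.Println(err) // prints: component "stream" is not registered
	}
}
```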
8 changes: 4 additions & 4 deletions cmd/mountv1.go
@@ -47,9 +47,9 @@ import (
"github.com/Azure/azure-storage-fuse/v2/common/log"
"github.com/Azure/azure-storage-fuse/v2/component/attr_cache"
"github.com/Azure/azure-storage-fuse/v2/component/azstorage"
"github.com/Azure/azure-storage-fuse/v2/component/block_cache"
"github.com/Azure/azure-storage-fuse/v2/component/file_cache"
"github.com/Azure/azure-storage-fuse/v2/component/libfuse"
"github.com/Azure/azure-storage-fuse/v2/component/stream"

"github.com/spf13/cobra"
"github.com/spf13/pflag"
@@ -96,7 +96,7 @@ type PipelineConfig struct {
NonEmptyMountOption bool `yaml:"nonempty,omitempty"`
LogOptions `yaml:"logging,omitempty"`
libfuse.LibfuseOptions `yaml:"libfuse,omitempty"`
stream.StreamOptions `yaml:"stream,omitempty"`
block_cache.StreamOptions `yaml:"stream,omitempty"`
file_cache.FileCacheOptions `yaml:"file_cache,omitempty"`
attr_cache.AttrCacheOptions `yaml:"attr_cache,omitempty"`
azstorage.AzStorageOptions `yaml:"azstorage,omitempty"`
@@ -113,7 +113,7 @@ var bfv2FuseConfigOptions libfuse.LibfuseOptions
var bfv2FileCacheConfigOptions file_cache.FileCacheOptions
var bfv2AttrCacheConfigOptions attr_cache.AttrCacheOptions
var bfv2ComponentsConfigOptions ComponentsConfig
var bfv2StreamConfigOptions stream.StreamOptions
var bfv2StreamConfigOptions block_cache.StreamOptions
var bfv2ForegroundOption bool
var bfv2ReadOnlyOption bool
var bfv2NonEmptyMountOption bool
@@ -132,7 +132,7 @@ func resetOptions() {
bfv2FileCacheConfigOptions = file_cache.FileCacheOptions{}
bfv2AttrCacheConfigOptions = attr_cache.AttrCacheOptions{}
bfv2ComponentsConfigOptions = ComponentsConfig{}
bfv2StreamConfigOptions = stream.StreamOptions{}
bfv2StreamConfigOptions = block_cache.StreamOptions{}
bfv2ForegroundOption = false
bfv2ReadOnlyOption = false
bfv2NonEmptyMountOption = false
4 changes: 2 additions & 2 deletions cmd/mountv1_test.go
@@ -46,8 +46,8 @@ import (
"github.com/Azure/azure-storage-fuse/v2/common/log"
"github.com/Azure/azure-storage-fuse/v2/component/attr_cache"
"github.com/Azure/azure-storage-fuse/v2/component/azstorage"
"github.com/Azure/azure-storage-fuse/v2/component/block_cache"
"github.com/Azure/azure-storage-fuse/v2/component/file_cache"
"github.com/Azure/azure-storage-fuse/v2/component/stream"

"github.com/spf13/cobra"
"github.com/spf13/pflag"
@@ -607,7 +607,7 @@ func (suite *generateConfigTestSuite) TestCLIParamStreaming() {
suite.assert.Nil(err)

// Read the generated v2 config file
options := stream.StreamOptions{}
options := block_cache.StreamOptions{}

viper.SetConfigType("yaml")
config.ReadFromConfigFile(v2ConfigFile.Name())
1 change: 1 addition & 0 deletions common/util.go
@@ -55,6 +55,7 @@ import (

var RootMount bool
var ForegroundMount bool
var IsStream bool

// IsDirectoryMounted is a utility function that returns true if the directory is already mounted using fuse
func IsDirectoryMounted(path string) bool {
14 changes: 10 additions & 4 deletions component/block_cache/block_cache.go
@@ -83,9 +83,9 @@ type BlockCache struct {
maxDiskUsageHit bool // Flag to indicate if we have hit max disk usage
noPrefetch bool // Flag to indicate if prefetch is disabled
prefetchOnOpen bool // Start prefetching on file open call instead of waiting for first read

lazyWrite bool // Flag to indicate if lazy write is enabled
fileCloseOpt sync.WaitGroup // Wait group to wait for all async close operations to complete
stream *Stream
lazyWrite bool // Flag to indicate if lazy write is enabled
fileCloseOpt sync.WaitGroup // Wait group to wait for all async close operations to complete
}

// Structure defining your config parameters
@@ -175,7 +175,13 @@ func (bc *BlockCache) Stop() error {
// Return failure if any config is not valid to exit the process
func (bc *BlockCache) Configure(_ bool) error {
log.Trace("BlockCache::Configure : %s", bc.Name())

if common.IsStream {
err := bc.stream.Configure(true)
if err != nil {
log.Err("BlockCache:Stream::Configure : config error [invalid config attributes]")
return fmt.Errorf("config error in %s [%s]", bc.Name(), err.Error())
}
}
defaultMemSize := false
conf := BlockCacheOptions{}
err := config.UnmarshalKey(bc.Name(), &conf)
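
`TestZZZZZStreamToBlockCacheConfig` in this commit pins down the arithmetic of the conversion: with `block-size-mb: 16`, `max-buffers: 80` and `buffer-size-mb: 8`, the block cache ends up with a 16 MB block size and an 8 × 80 = 640 MB memory pool. A minimal sketch of that mapping, assuming only these three keys matter; `streamOptions`, `blockCacheSizes` and `fromStream` are hypothetical, and the real logic behind `bc.stream.Configure` is not shown in this diff:

```go
package main

import "fmt"

// mb is one mebibyte, assumed to match the test's _1MB constant.
const mb = 1024 * 1024

// streamOptions holds the legacy "stream:" keys used in the test config.
// Hypothetical mirror of block_cache.StreamOptions (only the fields needed).
type streamOptions struct {
	BlockSizeMB  uint64 // block-size-mb
	MaxBuffers   uint64 // max-buffers
	BufferSizeMB uint64 // buffer-size-mb
}

// blockCacheSizes is the subset of block-cache state the test asserts on.
type blockCacheSizes struct {
	blockSize uint64 // bytes per block
	memSize   uint64 // total bytes in the in-memory pool
}

// fromStream applies the relationship the test checks:
// blockSize = block-size-mb MB, memSize = buffer-size-mb MB * max-buffers.
func fromStream(s streamOptions) blockCacheSizes {
	return blockCacheSizes{
		blockSize: s.BlockSizeMB * mb,
		memSize:   s.BufferSizeMB * mb * s.MaxBuffers,
	}
}

func main() {
	got := fromStream(streamOptions{BlockSizeMB: 16, MaxBuffers: 80, BufferSizeMB: 8})
	fmt.Println(got.blockSize/mb, got.memSize/mb) // prints: 16 640
}
```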
13 changes: 13 additions & 0 deletions component/block_cache/block_cache_test.go
@@ -163,6 +163,7 @@ func (tobj *testObj) cleanupPipeline() error {
os.RemoveAll(tobj.fake_storage_path)
os.RemoveAll(tobj.disk_cache_path)

common.IsStream = false
return nil
}

@@ -2597,6 +2598,18 @@ func (suite *blockCacheTestSuite) TestReadWriteBlockInParallel() {
suite.assert.Equal(fs.Size(), int64(62*_1MB))
}

func (suite *blockCacheTestSuite) TestZZZZZStreamToBlockCacheConfig() {
common.IsStream = true
config := "read-only: true\n\nstream:\n block-size-mb: 16\n max-buffers: 80\n buffer-size-mb: 8\n"
tobj, err := setupPipeline(config)
defer tobj.cleanupPipeline()

suite.assert.Nil(err)
suite.assert.Equal(tobj.blockCache.Name(), "block_cache")
suite.assert.EqualValues(tobj.blockCache.blockSize, 16*_1MB)
suite.assert.EqualValues(tobj.blockCache.memSize, 8*_1MB*80)
}

// In order for 'go test' to run this suite, we need to create
// a normal test function and pass our suite to suite.Run
func TestBlockCacheTestSuite(t *testing.T) {

1 comment on commit 46a557c

@github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark.
Benchmark result of this commit is worse than the previous benchmark result, exceeding the threshold of 1.60.

| Benchmark suite | Current: 46a557c | Previous: 8c6a53e | Ratio |
|---|---|---|---|
| read_10_20GB_file | 55.08630561828613 seconds | 30.89565896987915 seconds | 1.78 |

This comment was automatically generated by workflow using github-action-benchmark.
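
The alert compares the two runs as a ratio: for a lower-is-better metric such as elapsed seconds, 55.086 / 30.896 ≈ 1.78, which is above the 1.60 threshold. A minimal sketch of that comparison; the `regressed` helper is hypothetical and is not github-action-benchmark's own implementation:

```go
package main

import "fmt"

// regressed reports the current/previous ratio and whether it exceeds
// threshold, for a metric where smaller is better (e.g. seconds).
func regressed(current, previous, threshold float64) (ratio float64, alert bool) {
	ratio = current / previous
	return ratio, ratio > threshold
}

func main() {
	r, alert := regressed(55.08630561828613, 30.89565896987915, 1.60)
	fmt.Printf("ratio=%.2f alert=%v\n", r, alert) // prints: ratio=1.78 alert=true
}
```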
