Set block-size for a file before creation #31

Open
magicDGS opened this issue Mar 22, 2017 · 3 comments
@magicDGS

I would like to create an HDFS file with a custom "dfs.blocksize". Is there any way to override the default block-size for a file before it is written? Thank you very much in advance.

My first try is something like this:

private OutputStream getHdfsOutputStream(final java.nio.file.Path hdfsPath, final Long blockSize) throws IOException {
    // only try to set the block size for HDFS paths
    if ("hdfs".equals(hdfsPath.toUri().getScheme())) {
        hdfsPath.getFileSystem().provider().setAttribute(hdfsPath, "hadoop:blockSize", blockSize);
    }
    return Files.newOutputStream(hdfsPath);
}

But it's not working, because the setAttribute implementation only allows setting lastModifiedTime, lastAccessTime or creationTime. I need to create, from a Java application, different files with different block-sizes, but without touching the configuration file or modifying the Hadoop cluster. At the same time, I have to deal with non-Hadoop paths, which is why I'm checking for hdfs URIs.
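
For reference, this is roughly what I can already do when I bypass NIO.2 and use Hadoop's own API, which accepts an explicit per-file block size (just a sketch; it does not help with the non-Hadoop paths I also have to support):

import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hadoop-native workaround: FileSystem.create takes the block size directly.
OutputStream openWithBlockSize(final URI hdfsUri, final long blockSize) throws IOException {
    final Configuration conf = new Configuration();
    final FileSystem fs = FileSystem.get(hdfsUri, conf);
    final Path path = new Path(hdfsUri);
    return fs.create(path, true /* overwrite */,
            conf.getInt("io.file.buffer.size", 4096),
            fs.getDefaultReplication(path), blockSize);
}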

I would appreciate it if someone could help me :)

@damiencarol
Owner

damiencarol commented Aug 30, 2017

@magicDGS the concept of an attribute in this Java API is to describe existing files.
For example, the attribute hadoop:blockSize tells you the configuration of an existing file (in this implementation).
I don't know if it's possible to change the block size of an existing file in the current version of HDFS (2.x).
To do what you want (specify dfs.block.size BEFORE creating the files), you have 2 solutions:

  • change the dfs.block.size value in the hdfs-site.xml file loaded in your application
  • change this implementation to add some parameter mechanism

Something that would let us customize the configuration of the provider.
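
For example, a rough sketch of such a mechanism (placeholder code, not the current API of this project): the provider could copy overrides from the env map that FileSystems.newFileSystem(uri, env) already accepts into the Hadoop Configuration.

import java.util.Map;
import org.apache.hadoop.conf.Configuration;

// Sketch: copy String entries from the NIO.2 env map into the Hadoop
// Configuration, so callers can override settings such as dfs.blocksize
// for a given provider instance.
Configuration buildConfiguration(final Map<String, ?> env) {
    final Configuration conf = new Configuration();
    for (final Map.Entry<String, ?> entry : env.entrySet()) {
        if (entry.getValue() instanceof String) {
            conf.set(entry.getKey(), (String) entry.getValue());
        }
    }
    return conf;
}

A caller could then write FileSystems.newFileSystem(uri, Collections.singletonMap("dfs.blocksize", "134217728")) to override the default for that file system instance.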

@magicDGS
Author

Changing hdfs-site.xml is not an option for my use case, because the block-size is going to change for every file, so it won't use the default. I also don't think the provider configuration should be changed, because I want to modify the block-size on a per-Path basis.

Maybe adding a new HDFS-specific OpenOption class to jsr203-hadoop and handling it in the HadoopFileSystem class would be useful. Let me know if this option works for you and I can submit a patch. Thank you in advance!
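
A minimal sketch of what I have in mind (all names are hypothetical, nothing like this exists in jsr203-hadoop yet):

import java.nio.file.OpenOption;

// Hypothetical HDFS-specific OpenOption carrying a per-file block size.
// HadoopFileSystem#newOutputStream would need to recognize this option
// instead of rejecting it as unsupported.
public final class BlockSizeOption implements OpenOption {
    private final long blockSize;

    public BlockSizeOption(final long blockSize) {
        this.blockSize = blockSize;
    }

    public long getBlockSize() {
        return blockSize;
    }
}

On the caller side it would look like Files.newOutputStream(hdfsPath, new BlockSizeOption(64L * 1024 * 1024)), passed only for hdfs paths since other providers would reject an unknown option.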

@damiencarol
Owner

@magicDGS hello, I'm restarting maintenance on this package. Any patch is welcome.
