S3FileSystemProvider.newInputStream issue #87

Open
glassfox opened this issue Dec 4, 2017 · 4 comments
glassfox commented Dec 4, 2017

s3fs version: 1.5.3

Hi all,

When I try to read a lot of files simultaneously (2000 files or more), the following exception is thrown:

com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool 
    	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1038) 
    	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742) 
    	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716) 
    	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) 
    	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) 
    	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) 
    	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) 
    	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4191) 
    	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4138) 
    	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1385) 
    	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1263) 
    	at com.upplication.s3fs.S3FileSystemProvider.newInputStream(S3FileSystemProvider.java:347)  


Proposal:
After a short investigation on the internet, I found that the S3Object needs to be closed at the end of use:
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/S3Object.html#close--
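
As an illustration of that proposal, a minimal sketch (my own example, not code from s3fs) that reads the whole object and then closes the S3Object so its HTTP connection goes back to the pool:

    import java.io.IOException;

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.model.S3Object;
    import com.amazonaws.util.IOUtils;

    class S3ReadExample {
        // Read the whole object, then close the S3Object; closing releases
        // the underlying HTTP connection back to the client's pool.
        static byte[] readAllBytes(AmazonS3 client, String bucket, String key) throws IOException {
            // S3Object implements Closeable, so try-with-resources closes it.
            try (S3Object object = client.getObject(bucket, key)) {
                return IOUtils.toByteArray(object.getObjectContent());
            }
        }
    }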


jarnaiz commented Dec 9, 2017

Hello @glassfox

I tried to reproduce the issue but I can't...
If I close the s3Object in S3FileSystemProvider#newInputStream(...):

S3Object object = s3Path.getFileSystem().getClient().getObject(s3Path.getFileStore().name(), key);

then we can't read the InputStream, because closing the S3Object also closes the InputStream it holds.

Maybe the problem is the AmazonS3Client itself, or Amazon S3's request-rate limits:
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
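
One way to reconcile both constraints (a sketch only, not the library's actual fix) is to hand the caller a FilterInputStream whose close() also closes the owning S3Object, so the pooled connection is released exactly when the caller finishes reading:

    import java.io.FilterInputStream;
    import java.io.IOException;

    import com.amazonaws.services.s3.model.S3Object;

    // Hypothetical wrapper: reading works as normal, and close() releases
    // both the content stream and the S3Object that owns the connection.
    class S3ObjectClosingInputStream extends FilterInputStream {
        private final S3Object object;

        S3ObjectClosingInputStream(S3Object object) {
            super(object.getObjectContent());
            this.object = object;
        }

        @Override
        public void close() throws IOException {
            try {
                super.close();   // closes the wrapped content stream
            } finally {
                object.close();  // closes the S3Object itself
            }
        }
    }

newInputStream(...) could then return new S3ObjectClosingInputStream(object) instead of object.getObjectContent(), so the connection is released as soon as the caller closes the stream, even without reading to the end.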

This is my test to try to reproduce the issue:

@Test
public void testConcurrency() throws IOException, InterruptedException {

    int concurrency = 2000;

    final FileSystemProvider provider = fileSystemAmazon.provider();

    final List<Path> uploadedFiles = new ArrayList<>();
    final List<String> uploadedContent = new ArrayList<>();

    for (int i = 0; i < concurrency; i++) {
        final String content = "sample content" + i;
        final Path file = uploadSingleFile(content);
        uploadedFiles.add(file);
        uploadedContent.add(content);
    }

    ExecutorService service = Executors.newFixedThreadPool(concurrency);
    for (int i = 0; i < concurrency; i++) {
        final int index = i;
        service.execute(new Runnable() {
            @Override
            public void run() {
                try {
                    System.out.println("Reading: " + index);
                    InputStream stream = provider.newInputStream(uploadedFiles.get(index));
                    String result = new String(IOUtils.toByteArray(stream));
                    assertEquals(uploadedContent.get(index), result);
                    stream.close();
                    System.out.println("Closing: " + index);
                } catch (IOException e) {
                    fail("err!");
                }
            }
        });
    }

    // shutdown() so awaitTermination can return once all reads finish
    service.shutdown();
    service.awaitTermination(20, TimeUnit.SECONDS);
}

private static final String bucket = EnvironmentBuilder.getBucket();
private static final URI uriGlobal = EnvironmentBuilder.getS3URI(S3_GLOBAL_URI_IT);

private FileSystem fileSystemAmazon;

@Before
public void setup() throws IOException {
    System.clearProperty(S3FileSystemProvider.AMAZON_S3_FACTORY_CLASS);
    fileSystemAmazon = build();
}

private static FileSystem createNewFileSystem() throws IOException {
    return FileSystems.newFileSystem(uriGlobal, EnvironmentBuilder.getRealEnv());
}

private static FileSystem build() throws IOException {
    try {
        FileSystems.getFileSystem(uriGlobal).close();
        return createNewFileSystem();
    } catch (FileSystemNotFoundException e) {
        return createNewFileSystem();
    }
}

private Path uploadSingleFile(String content) throws IOException {
    try (FileSystem linux = MemoryFileSystemBuilder.newLinux().build("linux")) {

        Path file = Files.createFile(linux.getPath(UUID.randomUUID().toString()));
        Files.write(file, content.getBytes());

        Path result = fileSystemAmazon.getPath(bucket, UUID.randomUUID().toString());

        Files.copy(file, result);
        return result;
    }
}

private Path uploadDir() throws IOException {
    try (FileSystem linux = MemoryFileSystemBuilder.newLinux().build("linux")) {
        Path assets = Files.createDirectories(linux.getPath("/upload/assets1"));
        Path dir = fileSystemAmazon.getPath(bucket, "0000example" + UUID.randomUUID().toString() + "/");
        Files.walkFileTree(assets.getParent(), new CopyDirVisitor(assets.getParent(), dir));
        return dir;
    }
}
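
Separately, since the exception is a pool-exhaustion timeout, raising the SDK's connection pool limit can mask the problem while the closing behaviour is investigated. A sketch against the AWS SDK v1 API (the value 2000 is only an example matching this test's concurrency):

    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3Client;

    // SDK v1 defaults to 50 pooled connections, which 2000 concurrent
    // reads exhaust quickly; raising the limit is a workaround, not a
    // fix for streams that are never closed.
    ClientConfiguration config = new ClientConfiguration()
            .withMaxConnections(2000);   // example value
    AmazonS3 client = new AmazonS3Client(config);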

glassfox commented Dec 10, 2017 via email

jarnaiz commented Dec 12, 2017

Hi,

That is an option and I'm going to test it, but I read the source code of S3Object and, if I'm not wrong, the close method only closes the InputStream; it is the (wrapped) InputStream that releases the HTTP connections.

You can see here:
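
Per that reading, the close path is roughly equivalent to the following (a paraphrase of the behaviour described above, not verbatim SDK source):

    // S3Object.close() just closes its content stream; the wrapped
    // S3ObjectInputStream is what actually releases the HTTP connection.
    public void close() throws IOException {
        InputStream is = getObjectContent();
        if (is != null) {
            is.close();   // connection release happens inside the stream
        }
    }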

glassfox commented Dec 12, 2017 via email
