Move the project under FSCrawler? #11

Open
dadoonet opened this issue Jan 31, 2019 · 8 comments

@dadoonet

Hey @shadiakiki1986

What do you think of moving your project under https://github.com/dadoonet/fscrawler?
I would like to provide a Docker image for FSCrawler out of the box (dadoonet/fscrawler#586), but I don't want to reinvent the wheel since you have already done a lot of work here.

WDYT?

@shadiakiki1986
Owner

shadiakiki1986 commented Jan 31, 2019 via email

@dadoonet
Author

Well. I'd need a pull request! 🤣

I mean: would you like to contribute such a thing?
Note that from now on I'm producing one version per major Elasticsearch version: 5.x, 6.x and 7.x.

@shadiakiki1986
Owner

shadiakiki1986 commented Jan 31, 2019 via email

@dadoonet
Author

I think we need to create new modules under https://github.com/dadoonet/fscrawler/tree/master/distribution, such as docker, or docker-es7, docker-es6 and docker-es5, and follow the example described at https://github.com/moditect/moditect/wiki/Creating-dependency-images#example-a-dependency-image-for-vertx

I have no experience with this 😄

@shadiakiki1986
Owner

shadiakiki1986 commented Jan 31, 2019

The main docker file in this repository is for fscrawler only, and not for elasticsearch. The way it works is that it downloads a particular branch of the fscrawler repository (could be master). You can see this here.

There is a separate docker file for the elasticsearch instance, but it's just a wrapper that uses the official elasticsearch image from docker.elastic.co. You can see this here.

Therefore, it makes more sense to just have a docker folder instead of docker-es{5,6,7}.

About the moditect repository, their idea of splitting the Dockerfile into 2 files is nice, but I feel it's overkill for the fscrawler Dockerfile. The logs on hub.docker.com don't show the exact time it takes to build the image, but I don't think it was more than a minute. What would matter more is reducing the total size of the docker image. I first used alpine linux as the base image for fscrawler 2.4, and the fscrawler image was 250 MB. Later, I ran into problems compiling fscrawler 2.5 and 2.6 with alpine linux, so I moved to ubuntu as the base image. This inflated the docker image to 750 MB (ref). That hasn't been a big deal for my use, though.

@dadoonet
Author

> The main docker file in this repository is for fscrawler only, and not for elasticsearch.

Yeah. FSCrawler es 5,6,7 actually does not embed any elasticsearch instance. It just describes which Rest High Level Client is packaged with FSCrawler.

So if you want FSCrawler to speak with ES5, you need to use the es5 version of FSCrawler.
About moditect, it was just an example. The interesting part to me is:

<plugin>
    <groupId>io.fabric8</groupId>
    <artifactId>docker-maven-plugin</artifactId>
    <executions>
        <execution>
            <id>build-dependency-image</id>
            <phase>package</phase>
            <goals>
                <goal>build</goal>
            </goals>
            <configuration>
                <images>
                    <image>
                        <alias>vertx-helloworld-base</alias>
                        <name>moditect/vertx-helloworld-base</name>
                        <build>
                            <dockerFileDir>${project.basedir}/src/main/docker-base</dockerFileDir>
                            <assembly>
                                <descriptor>assembly-base.xml</descriptor>
                            </assembly>
                        </build>
                    </image>
                </images>
            </configuration>
        </execution>
    </executions>
</plugin>

@dadoonet
Author

I realize I should have only linked to http://dmp.fabric8.io/ and specifically http://dmp.fabric8.io/#docker:build and http://dmp.fabric8.io/#docker:source

@shadiakiki1986
Owner

> Yeah. FSCrawler es 5,6,7 actually does not embed any elasticsearch instance. It just describes which Rest High Level Client is packaged with FSCrawler.

Ok I understand now. In this case, yes it makes sense to have docker-es{5,6,7} folders.

> I realize I should have only linked to http://dmp.fabric8.io/ and specifically http://dmp.fabric8.io/#docker:build and http://dmp.fabric8.io/#docker:source

I see you want to use the docker-maven-plugin to build the docker images from maven. I don't have experience with that either, but the docs under "1.1. Building Images" say:

> An external Dockerfile can be specified in which Maven properties can be inserted. This is also the default mode, if only a single image should be built and a top-level Dockerfile exists. See Simple Dockerfile build for details of this zero XML configuration mode.

I would expect this simple build to work for the existing Dockerfile as it is. Of course, adding XML tags to control which Rest High Level Client to use would reduce the need from three Dockerfiles to a single Dockerfile with a parameter whose value differs in each of the three XML files.
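
As a rough sketch of that single-Dockerfile idea, each docker-es{5,6,7} module could pass its flavor to a shared Dockerfile through the plugin's `<args>` map, which the docker-maven-plugin forwards as Docker build arguments. Everything specific here is an assumption made for illustration and is not taken from the FSCrawler build: the `fscrawler.flavor` property, the `FSCRAWLER_FLAVOR` build argument, the image name, the execution id and the location of the shared Dockerfile are all hypothetical, and the existing Dockerfile would need an `ARG FSCRAWLER_FLAVOR` line to consume the value. The plugin version is left out, as in the snippet quoted earlier in this thread.

```xml
<!-- POM fragment for a hypothetical docker-es7 module.
     docker-es6 and docker-es5 would differ only in the property value. -->
<properties>
    <!-- Hypothetical property: which FSCrawler flavor this module packages. -->
    <fscrawler.flavor>es7</fscrawler.flavor>
</properties>

<build>
    <plugins>
        <plugin>
            <groupId>io.fabric8</groupId>
            <artifactId>docker-maven-plugin</artifactId>
            <executions>
                <execution>
                    <!-- Hypothetical execution id. -->
                    <id>build-fscrawler-image</id>
                    <phase>package</phase>
                    <goals>
                        <goal>build</goal>
                    </goals>
                    <configuration>
                        <images>
                            <image>
                                <!-- Hypothetical image name and tag. -->
                                <name>dadoonet/fscrawler:${project.version}-${fscrawler.flavor}</name>
                                <build>
                                    <!-- Hypothetical directory holding the one shared Dockerfile. -->
                                    <dockerFileDir>${project.basedir}/../docker</dockerFileDir>
                                    <args>
                                        <!-- Forwarded as a Docker build argument;
                                             the Dockerfile must declare ARG FSCRAWLER_FLAVOR. -->
                                        <FSCRAWLER_FLAVOR>${fscrawler.flavor}</FSCRAWLER_FLAVOR>
                                    </args>
                                </build>
                            </image>
                        </images>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```

With this layout the three modules would stay nearly identical, and only the property value would change between them.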
