Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Testing of large files takes a long time #74

Open
dstenger opened this issue Jun 26, 2018 · 11 comments
Open

Testing of large files takes a long time #74

dstenger opened this issue Jun 26, 2018 · 11 comments
Assignees

Comments

@dstenger
Copy link
Contributor

This issue occurs with the current status of master branch.

Testing of large files takes a very long time.

E.g. with version 0.6, a GeoPackage file of size 700 MB can be tested in around 3 Minutes.
With current version of master branch, testing a GeoPackage file of size 700 MB takes more than 1 hour (I cancelled the test run after 1 hour).

What is the reason for this behaviour?
This makes testing of large files very difficult.

@jyutzler
Copy link
Contributor

This is due to the new geometry tests built by @ajanett in #62. Yes, this takes a long time. What do you suggest we do about it?

@lgoltz
Copy link
Contributor

lgoltz commented Jun 26, 2018

@ajanett Can you explain why the test needs so long? What are your suggestions?

@ajanett
Copy link
Contributor

ajanett commented Jun 26, 2018

See issue #61. The ETS code is now visiting each and every feature instance and examining the geometry BLOB of each one. Yes, this can take a long time, even in a not so large GeoPackage. I don't like the time it is taking but I've looked at the implementation options and it currently is as good as it'll get given the ETS framework. All of the tests that are doing this into one code block in order to reduce the processing time as much as possible. However, the fact remains that each geometry BLOB is looked at now and this will take a LONG time. I've suggested allowing an option on ETS for geopackage to provide for a preliminary test looking only at a certain subset of each feature - say only testing the first N rows ... because in 90% of the time, if there is a problem it'll be in all of the feature instances... therefore do we really need to examine each feature instance all the time? Another option might be to provide a secondary ETS geopackage test that focuses only on the geometry tests since they do take so long.

@ajanett
Copy link
Contributor

ajanett commented Jun 26, 2018

There are 2 tests we could remove (or comment out) that are likely taking the most time: 78 (which the requirement for 78 was removed), and 67 which is testing an extension requirement. This will only help a little bit, but since these two involve additional SQL statements, I expect removal of these would provide the maximum relief.

It sure would be nice if TEAM Engine at least gave us some feedback that the tests are actually being run.

The only other options are those I've listed previously:

  • Provide a parameter allowing an option to limit the test to N feature instance rows. I find this type of thing to be very valuable in database schema tests in general. Run a test first with a limit on the number of rows so it can be determined quickly whether the geopackage is valid at all. If it passes, run it a second time over all of the data.
  • Split up the ETS for GeoPackage into two ETS packages. One for general content, and the second for feature geometry tests that take a long time.

@dstenger
Copy link
Contributor Author

I am in favour of creating an option (e.g. extended geometry validation) which activates full validation of geometries.
If this option is not activated, limited validation is executed (e.g. not all geometries are validated).

@jyutzler
Copy link
Contributor

@dstenger Please specify how you want this option added to the system. I do not know enough about the Team Engine to know what you need here. Once it is specified, I will attempt to implement it.

@bermud
Copy link
Contributor

bermud commented Jul 2, 2018

I suggest to:
1- (default and can be use for certification) test the first one found for each feature type
2- add a parameter (option) that allows to do the full geometry validation

@dstenger
Copy link
Contributor Author

A quick fix solves the problem temporarily: ab9eb29

However, a clean solution should be implemented as soon as possible.

@ajanett
Copy link
Contributor

ajanett commented Jul 13, 2018

I believe the fix needs to change the initial SQL statement in featureGeometryEncodingTesting(), adding the LIMIT clause when we want to restrict the test to a certain number of records, because then we wouldn't be fetching an excessive number of records.

@jyutzler
Copy link
Contributor

This is currently assigned to me, but I am not aware of any action expected of me. Please revise and/or re-assign.

@dstenger
Copy link
Contributor Author

dstenger commented Jan 6, 2021

As far as I see, ajanett#1 is not part of the master branch of this repository. We should check whether we want to migrate the code.
Afterwards, we can verify if it solves the problem described in this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: To do
Development

No branches or pull requests

5 participants