error when files are overwritten with CRS set #69

Closed
wiesehahn opened this issue Jul 8, 2024 · 7 comments

@wiesehahn

I was wondering if it would be possible to update/overwrite existing files in place. My original intent was to compress files from LAS to LAZ and get rid of the uncompressed files directly, but I guess this is not possible with lasR.

I also tried to update files with wrong CRS information. The files are actually overwritten, and the CRS information in the header is updated. However, the files are only 257 kB in size afterwards; it seems they are cropped to a smaller extent.

I guess this is not intended, otherwise it's easy to corrupt entire archives of data!?

library(lasR)
folder <- "L:/lidar/ALS/he/tmp"
file <- list.files(folder, full.names = TRUE)


file.size(file)
#> [1] 175523266
las <-  lidR::readLAS(file)
summary(las)
#> class        : LAS (v1.3 format 1)
#> memory       : 1.6 Gb 
#> extent       : 539000, 540000, 5622000, 5623000 (xmin, xmax, ymin, ymax)
#> coord. ref.  : NA 
#> area         : 1 kunits²
#> points       : 28.34 million points
#> density      : 28.34 points/units²
#> density      : 18.04 pulses/units²
#> File signature:           LASF 
#> File source ID:           0 
#> Global encoding:
#>  - GPS Time Type: GPS Week Time 
#>  - Synthetic Return Numbers: no 
#>  - Well Know Text: CRS is GeoTIFF 
#>  - Aggregate Model: false 
#> Project ID - GUID:        00000000-0000-0000-0000-000000000000 
#> Version:                  1.3
#> System identifier:        LAStools (c) by rapidlasso GmbH 
#> Generating software:      las2las (version 190307) 
#> File creation d/y:        182/2019
#> header size:              235 
#> Offset to point data:     235 
#> Num. var. length record:  0 
#> Point data format:        1 
#> Point data record length: 28 
#> Num. of point records:    28343611 
#> Num. of points by return: 18042017 7036515 2522135 620908 107134 
#> Scale factor X Y Z:       0.001 0.001 0.001 
#> Offset X Y Z:             0 6e+06 0 
#> min X Y Z:                539000 5622000 238.524 
#> max X Y Z:                540000 5623000 398.355 
#> Variable Length Records (VLR):  void
#> Extended Variable Length Records (EVLR):  void
pipeline = set_crs(25832) + write_las(ofile = paste0(folder, "/*.laz"))
exec(pipeline, with = list(progress = TRUE, ncores = concurrent_files(half_cores())), on = file)
#> Read files headers: [==========] 100% (1 threads)
#> Overall: [==========] 100% (1 threads)
#> [1] "L:\\lidar\\ALS\\he\\tmp\\3dm_32539_5622_1_he.laz"
file.size(file)
#> [1] 262568
las <-  lidR::readLAS(file)
summary(las)
#> class        : LAS (v1.3 format 1)
#> memory       : 2.1 Mb 
#> extent       : 539000, 539070.7, 5622417, 5622500 (xmin, xmax, ymin, ymax)
#> coord. ref.  : ETRS89 / UTM zone 32N 
#> area         : 4696 m²
#> points       : 39.2 thousand points
#> density      : 8.34 points/m²
#> density      : 4.85 pulses/m²
#> File signature:           LASF 
#> File source ID:           0 
#> Global encoding:
#>  - GPS Time Type: GPS Week Time 
#>  - Synthetic Return Numbers: no 
#>  - Well Know Text: CRS is GeoTIFF 
#>  - Aggregate Model: false 
#> Project ID - GUID:        00000000-0000-0000-0000-000000000000 
#> Version:                  1.3
#> System identifier:        LAStools (c) by rapidlasso GmbH 
#> Generating software:      las2las (version 190307) 
#> File creation d/y:        182/2019
#> header size:              235 
#> Offset to point data:     305 
#> Num. var. length record:  1 
#> Point data format:        1 
#> Point data record length: 28 
#> Num. of point records:    39183 
#> Num. of points by return: 22763 11707 3697 830 160 
#> Scale factor X Y Z:       0.001 0.001 0.001 
#> Offset X Y Z:             0 6e+06 0 
#> min X Y Z:                539000 5622417 274.449 
#> max X Y Z:                539070.7 5622500 326.112 
#> Variable Length Records (VLR):
#>    Variable Length Record 1 of 1 
#>        Description: by LAStools of rapidlasso GmbH 
#>        Tags:
#>           Key 3072 value 25832 
#> Extended Variable Length Records (EVLR):  void

Created on 2024-07-08 with reprex v2.1.0

@Jean-Romain
Collaborator

Jean-Romain commented Jul 8, 2024

I was wondering if it would be possible to update/overwrite existing files in place. My original intent was to compress files from LAS to LAZ and get rid of the uncompressed files directly, but I guess this is not possible with lasR.

What you want is to create new LAZ files with the same names as the LAS files and remove the LAS files afterwards. Replacing the original collection of LAS files on the fly would be technically possible, but it requires custom development: the internal list of files would have to be updated on the fly, with a lot of tests to guarantee that everything stays valid.
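The "remove the LAS files afterwards" step can be sketched in base R as follows. This is only an illustration under stated assumptions: `remove_converted_las` is a hypothetical helper, not part of lasR, and it treats a `.las` file as converted as soon as a non-empty `.laz` file with the same base name exists in the same folder. It does not verify point counts or file integrity.

```r
# Hypothetical helper (not part of lasR): delete each .las file only if a
# non-empty .laz file with the same base name exists in the same folder.
remove_converted_las <- function(folder) {
  las <- list.files(folder, pattern = "\\.las$", full.names = TRUE,
                    ignore.case = TRUE)
  laz <- sub("\\.las$", ".laz", las, ignore.case = TRUE)
  size <- file.size(laz)              # NA when the .laz file is missing
  done <- !is.na(size) & size > 0
  unlink(las[done])
  las[done]                           # paths of the files that were removed
}
```

Running this only after the conversion pipeline has finished without errors keeps the originals available if anything goes wrong during writing.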

I also tried to update files with wrong CRS information. The files are actually overwritten, and the CRS information in the header is updated. However, the files are only 257 kB in size afterwards; it seems they are cropped to a smaller extent.

I tried this minimal reproducible example. Everything looks good to me.

library(lasR)
file <- system.file("extdata", "MixedConifer.las", package = "lasR")

pipeline = set_crs(25832) + write_las(ofile = paste0(tempfile(fileext = ".laz")))
ans = exec(pipeline, on = file)

file.size(file)
#> [1] 1356219
file.size(ans)
#> [1] 266571

library(lidR)
las1 = readLAS(file)
las2 = readLAS(ans)
plot(las1)
plot(las2)

@wiesehahn
Author

Sorry for mixing this up. Compressing files in place was just my initial thought and has nothing to do with this issue, I guess.
Your example works for me as well.

In my example above we overwrite a file that initially has no proper CRS set (it was already compressed beforehand). It should have the same number of points afterwards; its area is 1 kunits² before and should be 1 km² afterwards.

@Jean-Romain
Collaborator

Jean-Romain commented Jul 8, 2024

Does this look like your use case? In this case you get an error and your original file is corrupted. I should add a protection against overwriting files.

What is happening is that MixedConifer.laz is not loaded in memory but streamed, i.e. we are loading one point at a time in memory. But while we are reading the file we are also overwriting it. This is obviously failing.

library(lasR)
file <- system.file("extdata", "MixedConifer.laz", package = "lidR")
file2 =  tempfile(fileext = ".laz")

file.copy(file, file2)

file.size(file)
file.size(file2)

pipeline = set_crs(25832) + write_las(ofile = file2)
ans = exec(pipeline, on = file2)

file.size(file)
file.size(file2)
file.size(ans)

@wiesehahn
Author

Does this look like your use case? In this case you get an error and your original file is corrupted. I should add a protection against overwriting files.

Yes, exactly. I tried this because when updating files (e.g. setting the CRS, compressing, ...) it would be convenient not to duplicate the data on disk when running on large collections (even temporarily, if we delete the old files afterwards). But if it is not possible there should be an error instead of corrupting files I think.

What is happening is that MixedConifer.laz is not loaded in memory but streamed, i.e. we are loading one point at a time in memory. But while we are reading the file we are also overwriting it. This is obviously failing.

With your example I also get a slightly smaller, corrupted file, but I additionally get the error ERROR: ERROR: 'end-of-file during chunk with index 0' after 37176 of 37657 points, which did not occur in my example.

@Jean-Romain
Collaborator

Yes, exactly. I tried this because when updating files (e.g. setting the CRS, compressing, ...) it would be convenient not to duplicate the data on disk when running on large collections (even temporarily, if we delete the old files afterwards).

I understand. Your use case is not supported yet, but it is not technically impossible. By the way, you could do it by forcing the data to be loaded into memory. In this case overwriting the file does not corrupt the read, because the file has already been read completely. Yet, if the pipeline crashes for any reason you lose your original file. This is why a better method would be to write a temporary file and move it into place only if the write succeeds.

pipeline = set_crs(25832) + lasR:::nothing(read = TRUE) + write_las(ofile = file2)
ans = exec(pipeline, on = file2)
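The "write a temporary file and move it" approach mentioned above could look roughly like the sketch below. It is not lasR functionality: `replace_safely` and its `process` argument are hypothetical names, and `process(input, output)` stands for whatever writes the new file. The temporary file is created in the same folder as the original so that the final step is a rename rather than a copy.

```r
# Hypothetical helper (not part of lasR): write to a temporary file next to
# `path` and replace `path` only if the output was written successfully.
replace_safely <- function(path, process) {
  ext <- tools::file_ext(path)
  tmp <- tempfile(tmpdir = dirname(path),
                  fileext = if (nzchar(ext)) paste0(".", ext) else "")
  ok <- tryCatch({
    process(path, tmp)
    file.exists(tmp) && file.size(tmp) > 0
  }, error = function(e) FALSE)
  if (!ok) {
    unlink(tmp)            # discard partial output; the original is untouched
    return(FALSE)
  }
  file.rename(tmp, path)   # TRUE if the original was replaced
}

# Assumed usage with the pipeline from this thread (not run here):
# process <- function(input, output) {
#   lasR::exec(lasR::set_crs(25832) + lasR::write_las(ofile = output), on = input)
# }
# replace_safely(file2, process)
```

This still needs disk space for one extra file at a time, but never for a second copy of the whole collection.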

But if it is not possible there should be an error instead of corrupting files I think.

Absolutely

With your example I also get a slightly smaller, corrupted file, but I additionally get the error ERROR: ERROR: 'end-of-file during chunk with index 0' after 37176 of 37657 points, which did not occur in my example.

Please provide a reproducible example then.

@wiesehahn
Author

Please provide a reproducible example then.

Not yet sure how to do this, probably by making the CRS in the header invalid.

Trying it like this crashes R instantly:

library(lasR)
file <- system.file("extdata", "MixedConifer.laz", package = "lidR")
file2 =  tempfile(fileext = ".laz")

pipeline = set_crs(0) + write_las(ofile = file2)
ans = exec(pipeline, on = file)

whereas pipeline = set_crs(1000) + write_las(ofile = file2)
gives Error: EPSG:1000 PROJ: proj_create_from_database: crs not found

@Jean-Romain
Collaborator

Jean-Romain commented Jul 8, 2024

I added a test to prevent overwriting the processed files (devel). Yet your use case is valid and this could be a new feature.
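Until a version with that protection is installed, a similar check can be done on the user side before calling exec(). The sketch below is not the test added to lasR. `expand_template` and `would_overwrite` are hypothetical helpers, and the expansion rule (the `*` in `ofile` is replaced by the input file's base name without extension) is an assumption based on the output path shown earlier in this thread.

```r
# Assumption: "*" in the output template stands for the input file's base
# name without its extension.
expand_template <- function(template, inputs) {
  stems <- tools::file_path_sans_ext(basename(inputs))
  vapply(stems, function(s) sub("*", s, template, fixed = TRUE),
         character(1), USE.NAMES = FALSE)
}

# TRUE for every output path that points at one of the input files.
# Paths are normalised first so that "dir/./a.laz" and "dir/a.laz" compare equal.
would_overwrite <- function(inputs, outputs) {
  norm <- function(p) normalizePath(p, winslash = "/", mustWork = FALSE)
  norm(outputs) %in% norm(inputs)
}

# Assumed usage with the variables from the first example (not run here):
# out <- expand_template(paste0(folder, "/*.laz"), file)
# if (any(would_overwrite(file, out))) stop("output would overwrite an input file")
```

The comparison relies on normalizePath(), so it only catches collisions with files that already exist, which is exactly the overwriting case discussed here.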

@Jean-Romain Jean-Romain self-assigned this Jul 10, 2024
@Jean-Romain Jean-Romain added the enhancement New feature or request label Jul 11, 2024