Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Las 1.3 and 1.4 read support. #16

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

[WIP] Las 1.3 and 1.4 read support. #16

wants to merge 13 commits into from

Conversation

evetion
Copy link
Collaborator

@evetion evetion commented Apr 1, 2018

  • Read in EVLR
  • Support custom attributes pasted on top of standard point types 😞
  • Read in waveform data
  • Tests for waveform

@evetion
Copy link
Collaborator Author

evetion commented Apr 1, 2018

@visr In my opinion this requires us to change our header, points = load() ways, because of several reasons:

  • EVLR can only be read after the pointdata (if we support pipes)
  • Waveform data in the las file (or externally) doesnt fit in either header or points
  • 1.4 supports very large files (both in counts as in the extra waveform data) which leads to not wanting to load all points or waveform at once.

I'm thinking about the following API (apart from the whole datastreams).

header = load(file)
points = points(header)
waves = waves(header)

Where the header (could be renamed) stores the file/stream reference and can load the points or waves on the fly. All small VLRs can be stored in the header, but larger ones (internal waves) will only be streamed/mmapped.

I'm also thinking of disabling pipes for these versions and updating LazIO with 1.3/1.4 support. Pipes for loading .laz files can't work with this API at all and with the filesizes it quickly leads to memory problems.

Let me know your thoughts.

@evetion evetion changed the title Las 1.3 and 1.4 read support. [WIP] [WIP] Las 1.3 and 1.4 read support. Apr 1, 2018
@evetion
Copy link
Collaborator Author

evetion commented Apr 1, 2018

I'm aware of the las14 branch started by @c42f, and will probably merge the work in there (mainly fixed size strings) at a later stage.

@visr
Copy link
Owner

visr commented Apr 1, 2018

Exciting! Yeah I realize the API will probably need to change to support this. Let's try to get it right this time. Will need some time to go over this.

Regarding the fixed size strings, I was also looking at https://github.com/JuliaComputing/FixedSizeStrings.jl, but it's not very active, so might be better to provide for that in here for now.

@evetion
Copy link
Collaborator Author

evetion commented Apr 17, 2018

Got a first working version of waveform parsing for the internal 1.3 spec. Still using hardcoded vlrs and assuming 8 bits per sample, but we're getting there.

h, p, e = load("test/LDR100301_120340_4.LAS")
wf = LasIO.waveform(p[3], h, e[end])
plot(wf[:, 1], wf[:,end], label="$(p[3])", xlabel="Distance (ps)", ylabel="Sample (V)", title="Digitized waveform")

wf

@codecov-io
Copy link

codecov-io commented Apr 22, 2018

Codecov Report

Merging #16 into master will decrease coverage by 5.86%.
The diff coverage is 43.68%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #16      +/-   ##
==========================================
- Coverage   62.62%   56.75%   -5.87%     
==========================================
  Files           8       10       +2     
  Lines         396      481      +85     
==========================================
+ Hits          248      273      +25     
- Misses        148      208      +60
Impacted Files Coverage Δ
src/LasIO.jl 100% <ø> (ø) ⬆️
src/waveform.jl 0% <0%> (ø)
src/point.jl 61.53% <0%> (-2.47%) ⬇️
src/meta.jl 3.12% <0%> (-1.43%) ⬇️
src/srs.jl 91.54% <100%> (-0.02%) ⬇️
src/vlrs.jl 39.72% <29.03%> (-60.28%) ⬇️
src/fileio.jl 43.75% <42.3%> (-2.41%) ⬇️
src/header.jl 87.37% <75%> (+19.7%) ⬆️
src/fixedstrings.jl 95% <95%> (ø)
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c35f4c...2e87503. Read the comment docs.

@evetion
Copy link
Collaborator Author

evetion commented Apr 24, 2018

@visr Could you review the current functionality and the way it's implemented? The goal here is to fully support reading LAS 1.x.

I will start writing tests and documentation now. Write support can be added later in another PR.

Copy link
Owner

@visr visr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also thinking of disabling pipes for these versions and updating LazIO with 1.3/1.4 support. Pipes for loading .laz files can't work with this API at all and with the filesizes it quickly leads to memory problems.

Yeah dropping this piping LAZ support is fine by me. We can work on getting LazIO up to speed afterwards.

I'm thinking about the following API (apart from the whole datastreams).

header = load(file)
points = points(header)
waves = waves(header)

Where the header (could be renamed) stores the file/stream reference and can load the points or waves on the fly. All small VLRs can be stored in the header, but larger ones (internal waves) will only be streamed/mmapped.

Something like that sounds good to me, it's also in line with #4 (comment). How do you decide what's a small VLR and what is big though? Do you mean VLR vs EVLR? We don't want to have a situation where people don't have enough RAM to read the header since it also reads large VLRs. So perhaps we should take it out of the header altogether? It aligns also better with the structure described in the spec:

image

We might want to name the functions laspoints/getpoints or something like that if we want to export them.

variable_length_records::Vector{LasVariableLengthRecord}
user_defined_bytes::Vector{UInt8}
end

function Base.show(io::IO, header::LasHeader)
n = Int(header.records_count)
n = Int(header.records_count_new)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that here you need to do something similar as above?

n = lv >= v"1.4" ? header.records_count_new : header.records_count

@@ -128,15 +131,15 @@ end

function save(s::Stream{format"LAS"}, header::LasHeader, pointdata::AbstractVector{<:LasPoint})
# checks
header_n = header.records_count
header_n = header.records_count_new
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here

@@ -64,7 +64,7 @@ mutable struct LasHeader
point_return_count_new::Vector{UInt64} # 15

# VLRs
variable_length_records::Vector{LasVariableLengthRecord}
variable_length_records::Dict{UInt16, Union{LasVariableLengthRecord, ExtendedLasVariableLengthRecord}}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you prefer to make the Vector into a Dict?

Would be good to mention here what the key is supposed to be. It is Record ID right?
That means we cannot store two of the same Record IDs, I don't see why that should not be allowed. Also from the LAS 1.4 spec:

The Record ID is dependent upon the User ID.

Different User IDs are likely to use the same Record IDs.

@@ -8,7 +8,7 @@ using FileIO
using FixedPointNumbers
using ColorTypes
using GeometryTypes # for conversion
using Compat
using StaticArrays
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REQUIRE needs to be updated for this

@visr
Copy link
Owner

visr commented Dec 6, 2018

Rebased, this branch now runs on julia 1.0 as well.

@c42f
Copy link
Collaborator

c42f commented Dec 7, 2018

Cool stuff guys, sorry I can't help with it (don't have the time... I no longer use this stuff at all!)

Random additional thoughts on lazy loading:

las_file = LasIO.open("test.las") # create `LASFile` stream, reading only the minimal amount (header without VLRs?)
# Following possibly return iterators?
# How to get points to associate neatly with waveforms? `mmap` would help a lot there.
p = collect(points(las_file))
v = collect(vlrs(las_file))
h = header(las_file)

data = LasIO.load("test.las") # just load all the data as in the old interface
# Could return a `LASData` instead of a tuple?
p = points(data)
h = header(data)
v = vlrs(data)
w = waveforms(data)

# It's a bit weird, but if `LASData` supports iteration, destructuring can be made to work
# (not suitable for production code which will have to deal with optional waveforms)
header,points = LasIO.load("test.las")

The function names there are a bit generic, not to be taken too seriously ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants