-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Following the comment in discourse #162
Comments
Hi @sl-solution ! Can you please test against master? |
For scenarios like using Random, InMemoryDatasets
ds = Dataset(rand(1:1000, 10^7,3), :auto);
sds = view(ds, shuffle(1:nrow(ds)), :);
@time show(ds); # this is fast
@time show(sds); # 2.419944 seconds (5.29 k allocations: 283.562 KiB) |
Hi @sl-solution ! Sorry, I just have time to take a look at this issue now. After profiling, I see that almost the entire execution is inside this function: According to Tables.jl API, I need to access an element using this function: Tables.getcolumn(ctable.table, column_name)[I] In your case, it calls julia> @btime Tables.getcolumn(sds, :x1)[1]
4.688 ms (1 allocation: 48 bytes)
314
julia> @btime Tables.getcolumn(ds, :x1)[1]
147.847 ns (2 allocations: 272 bytes)
403 |
If that's the case, does the following help? julia> _tmp = Tables.getcolumn(sds, :x1) # this is expensive but we can do it once
julia> _tmp[1] |
Actually no, because this function is called to get the element at each cell. Thus, I cannot store it temporally. Since PrettyTables.jl must work with many other types of data, including those that do no support Tables.jl, it will not be good to create functions to store columns at each different case. |
but here _tmp isn't the whole column it is like a pointer to the actual values. |
Yes, but PrettyTables will call:
for each table element, leading to the same result. The only way to improve inside PrettyTables is to store a column and then get all the elements, instead of asking for a column at every cell. However, this is not feasible since I do not render only structures that supports Tables.jl API. |
I see, let me think more about this. |
Thank you for supporting the development of
InMemoryDatasets.jl
.I added a comment to your post in discourse (here), however, I thought it might be better to discuss it here.
Currently printing a large view of a data set is very slow, I wondering if this can be improved?
The text was updated successfully, but these errors were encountered: