Dan Kelley edited this page Jan 5, 2017 · 5 revisions

This work is being done in branch dk-1093-large-rdi, not in an older branch called dk-1093. (I am experimenting with the idea of more informative branch names, of the form developerInitials-issueNumber-WordsSeparatedWithHyphens.)

  • 2017 Jan 4: I think things are working now for blocks where from and to yield a subset that is small enough to fit into R. However, I do not think this is the common use case. When I work with data, I would likely prefer to use the by argument to get a rough overview of the whole timeseries, before focussing on smaller time intervals. I need to write more C code to handle by in this way, and so I would say the work is only about 1/4 done. Remaining tasks:
    1. Handle by better, by filling up an unsigned char array with the results of a series of seek and fread calls.
    2. Handle the case of numeric from and to faster (hand these arguments to the existing C function -- easy peasy).
    3. See whether the present scheme of determining the segment pointers is inefficient. The present code reads the whole file twice: the first pass merely counts pointers (for a memory allocation) and the second stores them into the allocated memory. Another approach would be to have a growable allocation, so I will try that, now that I have a 6Gb file as a test case. (The worry with growable allocation is that time will be spent copying that memory, especially if the growth factor is small, but that we can still run out of memory if the growth factor is large.)
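
As a rough illustration of tasks 1 and 3, here is a minimal C sketch. It is not the code in the branch. The names offset_list, offset_list_push and read_every_by are hypothetical, and the growth factor of 1.5, the initial capacity of 1024 and the fixed segment length are assumptions made only for the example. The first function collects segment pointers in a single pass by growing the allocation with realloc. The second fills an unsigned char array with every by-th segment, using one fseek and one fread per segment.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Growable list of byte offsets into the file (hypothetical type). */
typedef struct {
    long *data;      /* offsets of segment starts */
    size_t n;        /* entries in use */
    size_t capacity; /* entries allocated */
} offset_list;

/* Append one offset, growing the allocation by a factor of 1.5 when full.
 * The factor is an assumption: smaller values copy more often, larger
 * values waste more memory. Returns 0 on success, -1 if out of memory. */
static int offset_list_push(offset_list *list, long offset)
{
    if (list->n == list->capacity) {
        size_t new_capacity =
            list->capacity ? list->capacity + list->capacity / 2 : 1024;
        long *p = realloc(list->data, new_capacity * sizeof *p);
        if (p == NULL)
            return -1; /* list->data is still valid and unchanged */
        list->data = p;
        list->capacity = new_capacity;
    }
    list->data[list->n++] = offset;
    return 0;
}

/* Copy every by-th segment into out, with one fseek and one fread each.
 * Assumes every segment is seg_len bytes long and that out can hold
 * ((n + by - 1) / by) * seg_len bytes. Returns the number of segments
 * copied, which is smaller than expected if a seek or read fails. */
static size_t read_every_by(FILE *fp, const long *offsets, size_t n,
                            size_t by, size_t seg_len, unsigned char *out)
{
    size_t copied = 0;
    assert(by > 0);
    for (size_t i = 0; i < n; i += by) {
        if (fseek(fp, offsets[i], SEEK_SET) != 0)
            break;
        if (fread(out + copied * seg_len, 1, seg_len, fp) != seg_len)
            break;
        copied++;
    }
    return copied;
}
```

One caveat for a file of this size: fseek takes a long offset, which is only 32 bits on some platforms, so offsets beyond 2 GB would need something like POSIX fseeko with off_t there.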