
ObjectReader/Writer support .cobz extension for compressed .cob files #420

Closed
wants to merge 4 commits

Conversation

@davidsminor (Contributor)

We're having some network traffic problems, which this would alleviate.

@johnhaddon (Member)

I like how you've kept future compression schemes in mind with the "compressionType", but I'm wondering a bit how this interacts with the use of the "cobz" extension to trigger the compression/decompression behaviour. It seems like we'd actually need a parameter to the writer to specify the compression type if we wanted to support more types in the future (either that, or keep inventing new extensions). What's the reason for not doing that now, and using the extension instead? Is it because we want to support writing compressed files from interfaces which currently expose only a filename? I can see how that might be a problem, but to me it seems cleaner in the long run to continue to use just the ".cob" extension and a parameter to specify an arbitrary compression type.

@andrewkaufman (Member)

Yeah, we need to be able to specify only a file name and have compression enabled. In Houdini, the cob IO has to go through their IOTranslator interface, and the only argument we get passed is a file name, so just .cob won't do for that. Also, I think it's pretty confusing to an artist to have some cobs compressed and some not... how are they meant to know the difference, when IT yells at them about taking up too much disk space, unless the files have different extensions?

@johnhaddon (Member)

What would be the impact of just making compressed files the default?

@davidsminor (Contributor, Author)

I guess I'll write out a couple of monster particle sequences and do a bit of testing...

@andrewkaufman (Member)

We're also using cobs in rigs, to store deformer weights and other relationships, so if we start writing them all compressed, we'd need to make sure the decompression doesn't impact rig performance. We used that as a reasonable excuse to delay performance testing compression inside scc's, and go with cobz, since the cobz files would only be in use when explicitly chosen by a user.

I take it you're really against a new extension John?

@johnhaddon (Member)

Really against it would be a bit of an overstatement. Just one extension does feel a bit cleaner and more future proof if we want to add new compression options (blosc seems quite compelling). And I suspect there are a bunch of other places where we'd need to start testing for .cobz in addition to .cob as we do already in a couple of places in this pull request.

So if a quick bit of testing revealed no downsides for rigging then I'd advocate for a "compression" parameter defaulting to "gzip". But if you're not comfortable with that at this point that's fair enough too - it seems like we could add the parameter and provide back-compatibility for .cobz easily enough in the future if we wanted to change our mind.
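
For illustration, the parameter-plus-back-compatibility idea above boils down to a small dispatch at write time; a rough sketch (the enum, function names and the ".cobz implies gzip" rule are hypothetical, not code from this pull request):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical compression schemes a writer parameter could select between.
enum CompressionType { NoCompression, GzipCompression, BloscCompression };

// Maps the parameter's string value to a scheme; unknown values are an error
// rather than silently falling back to uncompressed output.
CompressionType compressionFromString( const std::string &value )
{
	if( value == "none" ) return NoCompression;
	if( value == "gzip" ) return GzipCompression;
	if( value == "blosc" ) return BloscCompression;
	throw std::runtime_error( "Unknown compression type \"" + value + "\"" );
}

// Back-compatibility idea from the comment above: when no explicit parameter
// value is given, a ".cobz" extension could simply imply "gzip".
CompressionType compressionForFile( const std::string &fileName, const std::string &parameterValue )
{
	if( !parameterValue.empty() )
	{
		return compressionFromString( parameterValue );
	}
	const bool isCobz = fileName.size() >= 5 && fileName.compare( fileName.size() - 5, 5, ".cobz" ) == 0;
	return isCobz ? GzipCompression : NoCompression;
}
```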

@davidsminor (Contributor, Author)

So I've done a test on a .cob and a .cobz, with equivalent content (a big goddang point cloud). The .cob is 1 858 018 624 bytes, and the .cobz is 464 187 761 bytes. Here are the results from a test run I did reading/writing the file (I did it on local storage so the network didn't confuse things):

read /tmp/fxCache.260250.cob in 0.6571829319 sec
read /tmp/fxCache.260250.cobz in 10.3590641022 sec
write /tmp/fxCache.260250.new.cob in 20.7204079628 sec
write /tmp/fxCache.260250.new.cobz in 47.2128419876 sec

So it looks like a large .cobz takes 10-20 times longer to read than a .cob, and 2-3 times longer to write. I'll see what it's like with smaller files - switching the default behaviour isn't looking too safe given those results though.
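
For reference, a timing harness for numbers like these can be very simple; a sketch using Cortex's Reader/Writer factories and std::chrono (illustrative only - the file paths and methodology here are assumptions, not the actual test script):

```cpp
#include "IECore/Reader.h"
#include "IECore/Writer.h"

#include <chrono>
#include <iostream>
#include <string>

// Times a single read of `fileName` via the appropriate Reader, then times
// writing the loaded object back out to `outFileName`.
void timeRoundTrip( const std::string &fileName, const std::string &outFileName )
{
	using Clock = std::chrono::steady_clock;

	Clock::time_point t0 = Clock::now();
	IECore::ReaderPtr reader = IECore::Reader::create( fileName );
	IECore::ObjectPtr object = reader->read();
	std::chrono::duration<double> readTime = Clock::now() - t0;
	std::cout << "read " << fileName << " in " << readTime.count() << " sec" << std::endl;

	t0 = Clock::now();
	IECore::WriterPtr writer = IECore::Writer::create( object, outFileName );
	writer->write();
	std::chrono::duration<double> writeTime = Clock::now() - t0;
	std::cout << "write " << outFileName << " in " << writeTime.count() << " sec" << std::endl;
}
```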

@davidsminor (Contributor, Author)

Hmm. Ok, so I just copied all the .cob files from a typical animation rig to my /tmp directory, converted them to .cobz, and read all the .cobs followed by all the .cobz files. Here are the results:

read all .cob in 0.162973165512
read all .cobz in 3.1422150135

So the .cobz files still take about 20 times longer to load, but that's 20 times longer than almost nothing, so it doesn't seem too offensive...

@davidsminor (Contributor, Author)

I think I'm in the mood for a bit of bloscing.

@davidsminor (Contributor, Author)

Ok, after some light bloscing, here are some results... There's a compression level setting in blosc, and I experimented with a range of values on the giant cob files in the previous test:

uncompressed:

/tmp/fxCache.260250.cob:                  1 858 018 624 bytes
write /tmp/fxCache.260250.cob in          20.7204079628 sec
read /tmp/fxCache.260250.cob in            0.6571829319 sec

gzip:

/tmp/fxCache.260250.cobz:                   464 187 809 bytes
write /tmp/fxCache.260250.new.cobz in      47.212841987 sec
read /tmp/fxCache.260250.cobz in          10.3590641022 sec

compression level = 0:

/tmp/fxCache.260250.blosc.cobz:           1 858 018 734 bytes
write  /tmp/fxCache.260250.blosc.cobz in  24.183438062  sec
read  /tmp/fxCache.260250.blosc.cobz in   3.8174369335  sec

compression level = 3:

/tmp/fxCache.260250.blosc.cobz:           1 129 696 949 bytes
write  /tmp/fxCache.260250.blosc.cobz in  21.279296875  sec
read  /tmp/fxCache.260250.blosc.cobz in   3.9342408180  sec

compression level = 5:

/tmp/fxCache.260250.blosc.cobz:             530 222 106 bytes
write  /tmp/fxCache.260250.blosc.cobz in   17.39113903  sec
read  /tmp/fxCache.260250.blosc.cobz in    4.780226945  sec

compression level = 7:

/tmp/fxCache.260250.blosc.cobz:             506 482 864 bytes
write  /tmp/fxCache.260250.blosc.cobz in  16.5465929508 sec
read  /tmp/fxCache.260250.blosc.cobz in    4.7145540714 sec

compression level = 9:

/tmp/fxCache.260250.blosc.cobz:             501 427 592 bytes
write  /tmp/fxCache.260250.blosc.cobz in  16.8133058548 sec
read  /tmp/fxCache.260250.blosc.cobz in   4.70250201225 sec

Looks like anything below 5 isn't particularly worthwhile because the compression ratio's rubbish, and above that it gets faster as you push the compression level, so level 9 looks like the best. It also looks like a giant blosc cob compressed at level 9 actually writes faster than an uncompressed cob, which is nice. It's about 7 times slower to read than an uncompressed cob - i.e. twice as fast as gzip. The compression ratio's not quite as good, but it looks like a reasonable price to pay for the extra speed.

I tried compression level 9 with different thread counts just for kicks. It definitely improves things, but it's not exactly amazing:

2 threads:

write  /tmp/fxCache.260250.blosc.cobz  in  15.6342961788  sec
read   /tmp/fxCache.260250.blosc.cobz  in  4.02384114265  sec

4 threads:

write  /tmp/fxCache.260250.blosc.cobz  in  13.6540808678  sec
read   /tmp/fxCache.260250.blosc.cobz  in  3.47949504852  sec

8 threads:

write  /tmp/fxCache.260250.blosc.cobz  in  13.2812790871  sec
read   /tmp/fxCache.260250.blosc.cobz  in  3.24927020073  sec

16 threads:

write  /tmp/fxCache.260250.blosc.cobz  in  13.5564141273  sec
read   /tmp/fxCache.260250.blosc.cobz  in  3.17400693893  sec

...and there's another parameter called the type size (set to 4 bytes for the previous tests), which I experimented with at compression level 9/1 thread too:

typesize = 1 byte:
/tmp/fxCache.260250.blosc.cobz:  579 539 063 bytes
write  /tmp/fxCache.260250.blosc.cobz  in  18.2077488899  sec
read  /tmp/fxCache.260250.blosc.cobz  in  4.22861099243  sec

typesize = 2 bytes:
/tmp/fxCache.260250.blosc.cobz:  631 653 905 bytes
write  /tmp/fxCache.260250.blosc.cobz  in  18.8769581318  sec
read  /tmp/fxCache.260250.blosc.cobz  in  5.12113189697  sec

typesize = 4 bytes:
/tmp/fxCache.260250.blosc.cobz:  501 427 592 bytes
write  /tmp/fxCache.260250.blosc.cobz  in  16.2243359089  sec
read  /tmp/fxCache.260250.blosc.cobz  in  4.7057390213  sec

typesize = 6 bytes:
/tmp/fxCache.260250.blosc.cobz:  864 141 538 bytes
write  /tmp/fxCache.260250.blosc.cobz  in  23.7693929672  sec
read  /tmp/fxCache.260250.blosc.cobz  in  7.60991311073  sec

typesize = 8 bytes:
/tmp/fxCache.260250.blosc.cobz:  659 098 477 bytes
write  /tmp/fxCache.260250.blosc.cobz  in  18.9456019402  sec
read  /tmp/fxCache.260250.blosc.cobz  in  5.31748104095  sec

typesize = 12 bytes (sizeof(Imath::V3f)):
/tmp/fxCache.260250.blosc.cobz:  732 889 062 bytes
write  /tmp/fxCache.260250.blosc.cobz  in  21.3244469166  sec
read  /tmp/fxCache.260250.blosc.cobz  in  7.57901287079  sec

typesize = 16 bytes:
/tmp/fxCache.260250.blosc.cobz:  1 023 986 354 bytes
write  /tmp/fxCache.260250.blosc.cobz  in  25.0788691044  sec
read  /tmp/fxCache.260250.blosc.cobz  in  6.34875392914  sec

So basically everything other than 4 kinda sucks.

In summary, the best settings I found were faster than uncompressed for writing, twice as fast as gzip (but 7 times slower than uncompressed) for reading, and gave a 73% size reduction, vs 75% for gzip.
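
For anyone wanting to run tests along these lines, the c-blosc C API exposes the three knobs experimented with above (compression level, typesize and thread count) directly; a minimal round-trip sketch, with buffer contents and settings chosen purely for illustration:

```cpp
#include <blosc.h>

#include <cstdio>
#include <cstring>
#include <vector>

int main()
{
	blosc_init();
	blosc_set_nthreads( 8 ); // thread count, as varied in the tests above

	// Some example data to compress; in the real tests this would be the
	// serialised cob contents.
	std::vector<float> data( 1024 * 1024, 1.0f );
	const size_t nbytes = data.size() * sizeof( float );

	// The destination must allow for blosc's worst-case overhead.
	std::vector<char> compressed( nbytes + BLOSC_MAX_OVERHEAD );

	const int clevel = 9;      // compression level, 0-9
	const int shuffle = 1;     // byte shuffle filter enabled
	const size_t typesize = 4; // element size in bytes - the sweet spot found above

	int csize = blosc_compress(
		clevel, shuffle, typesize,
		nbytes, data.data(),
		compressed.data(), compressed.size()
	);
	if( csize <= 0 )
	{
		std::fprintf( stderr, "compression failed or data not compressible\n" );
		blosc_destroy();
		return 1;
	}
	std::printf( "compressed %zu bytes down to %d bytes\n", nbytes, csize );

	// Round-trip to verify the data survives.
	std::vector<float> decompressed( data.size() );
	int dsize = blosc_decompress( compressed.data(), decompressed.data(), nbytes );
	if( dsize < 0 || std::memcmp( data.data(), decompressed.data(), nbytes ) != 0 )
	{
		std::fprintf( stderr, "decompression failed\n" );
		blosc_destroy();
		return 1;
	}

	blosc_destroy();
	return 0;
}
```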

@johnhaddon (Member)

Thanks for looking into that David - what do you reckon? Seems like there's enough justification for sticking with your original approach?

@carstenkolve (Contributor)

Re the filename thing: it looks like the approach OpenVDB is taking is to not make blosc compression visible to the user in the filename (just like any other choices the user has made that affect file size, like bit depth etc., aren't visible in the filename either). EXR and many other image/video formats also don't usually expose the compression scheme used within the file. Houdini does though, naming the files .bgeo.gz or .bgeo.sc from H14 onwards (which I personally find useful, although it doesn't have much impact on my workflow).
An indicator when writing or reading whether the data is compressed might be sufficient?
I think it depends on whether we foresee many cases where we want the user to control whether we compress or not. If we decide to compress by default, invisibly to the user (and potentially also inside scc), then it's probably only useful during the transition period.

@davidsminor (Contributor, Author)

It's possible that blosc may not have been bloscing quite hard enough - I wasn't actually enabling the SSE stuff, so I think I'll give that a go. Apparently SideFX have been seeing a 3-10 fold improvement over gzip, rather than the 2-3 I've been experiencing.

@johnhaddon (Member)

If you're going to take a look at bloscing again, I'd be interested in a second set of tests pushing the files over the network rather than doing it all locally. It's possible that we'd see extra benefit there (doing a bit more work to do the compression, but winning it back because there's less data to push to the servers).

@andrewkaufman (Member)

How are you feeling about adding a blosc dependency to Cortex at this stage? It's probably a bit late in the game, but I guess we technically are still in beta...

@johnhaddon (Member)

I'm feeling like it's OK if we handle it like we handle TIFF etc - with conditional compilation.
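
A rough sketch of that kind of conditional compilation (the IECORE_WITH_BLOSC define is a made-up name for illustration, and this doesn't claim to match how the TIFF support is actually gated in the build):

```cpp
// Compiled only when the build system defines IECORE_WITH_BLOSC (hypothetical name).
#ifdef IECORE_WITH_BLOSC
#include <blosc.h>
#endif

#include <stdexcept>
#include <vector>

// Compresses `src` into `dst` when blosc support was built in, and throws a
// clear error otherwise, so callers can fall back or report the problem.
void bloscCompress( const std::vector<char> &src, std::vector<char> &dst )
{
#ifdef IECORE_WITH_BLOSC
	dst.resize( src.size() + BLOSC_MAX_OVERHEAD );
	int csize = blosc_compress( 9, 1, 4, src.size(), src.data(), dst.data(), dst.size() );
	if( csize <= 0 )
	{
		throw std::runtime_error( "blosc compression failed" );
	}
	dst.resize( csize );
#else
	(void)src; (void)dst;
	throw std::runtime_error( "Cortex was built without blosc support" );
#endif
}
```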

@davidsminor (Contributor, Author)

Curse that treacherous blosc! Turns out I actually was using SSE in those tests - typing garbage into the SSE part of one of the .c files stops it from compiling, so it was definitely being built. Although it was actually a little bit faster if I compiled the library separately and linked to it - 4.6 seconds for a read vs 4.7 seconds. Also, this went up to 5.6 seconds if I hacked out the SSE stuff. So much for that then...

@andrewkaufman added the pr-revision label ("PRs that should not be merged because a Reviewer has requested changes.") on May 14, 2015
@davidsminor (Contributor, Author)

Ok, picking this up again, this time dropping the system caches every time I run my test. I start out by reading a 1 201 596 803 byte cob file, and then write out an uncompressed version, write out a compressed version, clear the caches, read in the uncompressed file and read in the compressed file.

Write uncompressed: 11.4818680286s
Write compressed: 11.6397240162s
Read uncompressed: 10.8295259476s
Read compressed: 7.21835708618s

So those timings ain't bad. Does seem to get a bit crashy when I try it on a 5GB file though (it's while it's writing a MemoryIndexedIO), so I'll have to investigate that...
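
For reference, one common way of dropping the Linux page cache between runs like these (assuming the test above did something along these lines; it needs root, and this is just a sketch, not the actual test code):

```cpp
#include <cstdlib>

// Flushes dirty pages and then drops the page cache, dentries and inodes, so
// the next read really hits the disk rather than memory. Requires root.
void dropSystemCaches()
{
	std::system( "sync" );
	std::system( "echo 3 > /proc/sys/vm/drop_caches" );
}
```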

@davidsminor (Contributor, Author)

Seems to be an overflow in std::stringstream...

@davidsminor (Contributor, Author)

Ok, well if I replace the stringstream with a boost::iostreams::stream that writes directly to an std::vector, it don't crash no more (although the unit tests are broken). Actually, does anyone know a nice concise way of making something a bit like a stringstream that wraps round a vector? The method I've been using is a bit long winded.
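
One reasonably concise option (offered as a suggestion, not what the commit actually does): boost::iostreams provides back_insert_device for writing into a growing vector and array_source for reading from one, and both can be wrapped in a boost::iostreams::stream. A minimal sketch:

```cpp
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/stream.hpp>

#include <cassert>
#include <string>
#include <vector>

namespace io = boost::iostreams;

int main()
{
	// Writing: the device appends to the vector, which grows as needed and
	// isn't subject to std::stringstream's internal limits.
	std::vector<char> buffer;
	io::stream< io::back_insert_device< std::vector<char> > > os( buffer );
	os << "hello cob";
	os.flush();

	// Reading: an array_source over the vector's existing storage, no copy.
	io::stream< io::array_source > is( buffer.data(), buffer.size() );
	std::string word;
	is >> word;
	assert( word == "hello" );

	return 0;
}
```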

@davidsminor (Contributor, Author)

Ok, finally some data for the 5 072 942 220 byte file after replacing the stringstream with something else (compresses to 1 944 729 448 bytes):

Write uncompressed: 46.9117369652s
Write compressed: 45.4420931339s
Read uncompressed: 46.7038700581s
Read compressed: 29.811563015s

Again, this was clearing the system caches before reading the files.

@davidsminor (Contributor, Author)

Right, well I've just pushed my latest version in case people want to have a look at it.

Regarding the MemoryIndexedIO commit: it currently breaks one of the unit tests, and I'm not really sure about the structure of it, but I put it up anyway in case someone knows an easier way of doing what I want to do. My goal was to stop using std::stringstream, which seems to crash if you write more than 4GB into it, and to avoid copying the contents of "ConstCharVectorDataPtr buf" in read mode (it'd be nice if I could avoid this in append mode too, but I don't really know if I can). Also, I can't help thinking MemoryIndexedIO::buffer() should return a ConstCharVectorDataPtr, because it's not something you're going to want to write to anyway, and it means you don't have to do any copying in that method.
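
On the "avoid copying the contents of buf in read mode" point, the read path can stay entirely on the const side; a sketch assuming IECore's CharVectorData and boost::iostreams (illustrative only, not the code in the commit):

```cpp
#include "IECore/VectorTypedData.h"

#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>

#include <vector>

// Builds a read-only stream over the data already held by the buffer. Only
// readable() is called, so no copy-on-write copy of the data is triggered.
void readFromBuffer( IECore::ConstCharVectorDataPtr buf )
{
	const std::vector<char> &contents = buf->readable();
	boost::iostreams::stream<boost::iostreams::array_source> is( contents.data(), contents.size() );

	char header[4];
	is.read( header, sizeof( header ) ); // e.g. peek at the first few bytes
}
```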

@andrewkaufman removed the pr-revision label on Jun 16, 2015

Inline review comments on:
class MemoryIndexedIO::StreamFile : public StreamIndexedIO::StreamFile
class MemoryIndexedIO::CharVectorDataDevice
(Member)

I think I'd be tempted to make this a CharVectorDevice that is initialised with a vector<char> &, just to make the code a bit less wordy, and to increase the chance that we might reuse it elsewhere at some point. Is that possible or is there something I'm missing?

(Member)

No wait, I guess the benefit is that this way, if you only ever call CharVectorDataDevice::read(), we never call writable(), whereas we would have to initialise a CharVectorDevice with myData->writable(), and possibly pay for a copy?

@johnhaddon (Member)

So, as I understand it, the reason for needing an IStreamFile and an OStream file is just to avoid a copy for the OStream case, where we're the sole owners of the data (because we got it with either new or copy in the constructor)? In practice does the copy matter, since it'll be lazy-copy-on-writed? Or do we have use cases where the caller of MemoryIndexedIO::buffer() actually does then call writable() on the result? I agree though, that it'd be much better if MemoryIndexedIO::buffer() just returned const data so we didn't need to care.

Also, what's the failing unit test, and why does it fail?

@johnhaddon (Member)

A few other thoughts:

  • It seems a pity that this is isolated to just cob files - is it worth doing it in the array writes at the IndexedIO level, so we get it for SceneCaches too? Presumably if we did it at that level, we'd have smaller memory spikes, because we wouldn't be compressing the entire thing at once, but we'd lose out on some compression that would be possible between fields if done at the higher level? It'd be interesting to have some data on the tradeoffs there for some real world scenarios.
  • It seems a bit odd that we're still doing the compression of the indexedio index with gzip compression, before then passing the whole thing through blosc.
  • It might be worth having a gzip mode to allow some level of compression to be available even when building without blosc.

@davidsminor (Contributor, Author)

Thanks for looking - I'll work on another draft. Doing this at the IndexedIO level would certainly be good, and I'd imagine you'd get similar compression ratios, at least for large point clouds or meshes. The hope was that this modification would be helpful for our current show, and enabling compression at the IndexedIO level seemed like too big a change to deploy safely. The show's approaching completion though, and they're finding other ways to cut their data usage, so maybe we need to decide whether we want to continue down this road or investigate IndexedIO-level compression for Cortex 10.

@andrewkaufman added the pr-revision label on Jun 26, 2015
@andrewkaufman (Member)

I've made an issue for the longer term investigation (#481) so I'll close this for now.
