Chunked image encoding #793
Any ideas on that?
This does not necessarily need to be a part of Node Mapnik. There might be some Node module that can encode PNG this way when given the raw data from the rendered image.
@cklein05 I believe that you need not do anything on the node side to do this, but rather that if you use Apache or nginx they will be able to control this completely (but this is just my hunch). Overall, I do not feel that node-mapnik should have any responsibility for this, so I do not think we should add it to the library.
In node-mapnik, the slowest part of the processing should simply be rendering the image. My best suggestion is this: if speed is a requirement, I would suggest you investigate another solution besides WMS. WMS is not a fast protocol for many reasons, one of them being caching.
@flippmoke OK, I have to accept this. However, a few remarks:
Of course, Node.js as well as Apache and NGINX support chunked encoding perfectly. However, that is of little performance benefit if we must wait until Mapnik has encoded the whole image. The idea was to overlap/parallelize image encoding and streaming. Imagine replacing the destination string in save_to_string with a series of small buffers. So, an approximation of the above would be to have those buffers sent back through a callback function. The idea was not to split up image processing/encoding, but to split up, or buffer, the output stream.
On the other hand, I am not sold that this would actually lower the memory footprint much, or at all, if we preserve the life of the Image object during encoding. In short, I don't think it's impossible, but it could be hard to implement properly.
The goal is not to save memory, but rather to overlap encoding and transfer as described above.
For this purpose, I would create a new std::streambuf implementation that hands over its buffer through a callback whenever it fills up.
Something like this:

#include <array>
#include <iostream>

template <typename Callback, std::size_t Size, class Char = char>
class callback_streambuf : public std::basic_streambuf<Char>
{
public:
    using base_type = std::basic_streambuf<Char>;
    using char_type = typename base_type::char_type;
    using int_type = typename base_type::int_type;

    callback_streambuf(Callback callback)
        : buffer_{},
          callback_(callback)
    {
        base_type::setp(buffer_.data(), buffer_.data() + buffer_.size());
    }

protected:
    int sync() override
    {
        bool ok = callback_(base_type::pbase(),
                            base_type::pptr() - base_type::pbase());
        base_type::setp(buffer_.data(), buffer_.data() + buffer_.size());
        return ok ? 0 : -1;
    }

    int_type overflow(int_type ch) override
    {
        if (sync() != 0)
        {
            return base_type::traits_type::eof();
        }
        if (ch == base_type::traits_type::eof())
        {
            // flush request only, nothing more to store
            return base_type::traits_type::not_eof(ch);
        }
        base_type::sputc(static_cast<char_type>(ch));
        return ch;
    }

private:
    std::array<char_type, Size> buffer_;
    Callback callback_;
};

struct cout_callback
{
    template <class Char, class Size>
    bool operator()(const Char* buffer, Size size) const
    {
        std::cout << "CALLBACK: '";
        std::cout.write(buffer, size);
        std::cout << "' (" << size << ")" << std::endl;
        return true;
    }
};

int main()
{
    cout_callback callback;
    callback_streambuf<cout_callback, 10> streambuf(callback);
    std::ostream stream(&streambuf);
    stream << "Lorem ipsum dolor sit amet, "
              "consectetur adipiscing elit, "
              "sed do eiusmod tempor incididunt ut "
              "labore et dolore magna aliqua.";
    stream.flush();
}

$ g++ -std=c++14 callback_stream.cpp -o callback_stream && ./callback_stream
CALLBACK: 'Lorem ipsu' (10)
CALLBACK: 'm dolor si' (10)
CALLBACK: 't amet, co' (10)
CALLBACK: 'nsectetur ' (10)
CALLBACK: 'adipiscing' (10)
CALLBACK: ' elit, sed' (10)
CALLBACK: ' do eiusmo' (10)
CALLBACK: 'd tempor i' (10)
CALLBACK: 'ncididunt ' (10)
CALLBACK: 'ut labore ' (10)
CALLBACK: 'et dolore ' (10)
CALLBACK: 'magna aliq' (10)
CALLBACK: 'ua.' (3)
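To connect this back to the original request, here is a rough sketch of how such a streambuf could be fed by Mapnik's encoder. It assumes mapnik::save_to_stream(image, std::ostream&, format) from mapnik/image_util.hpp is available and reuses cout_callback from the sample above; the function name encode_in_chunks is made up, and this is illustrative wiring only, not code from the actual branch:

#include <mapnik/image.hpp>
#include <mapnik/image_util.hpp>
#include <ostream>

// Hypothetical wiring: in a real server the callback would hand each chunk
// to the HTTP response instead of printing it like cout_callback does.
void encode_in_chunks(mapnik::image_rgba8 const& image)
{
    cout_callback callback;
    callback_streambuf<cout_callback, 4096> streambuf(callback);
    std::ostream stream(&streambuf);
    mapnik::save_to_stream(image, stream, "png");  // encoder writes through the chunking streambuf
    stream.flush();                                // deliver the final, possibly partial chunk
}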
@talaj That's it. That's what I was asking for... 👍 Thank you for investigating and creating that descriptive sample application as well as the callback_streambuf itself. It still needs to be checked whether this actually works with an asynchronous method.
Some remarks: the template parameter Size should rather become a constructor argument, so that the chunk size can be set at runtime, for example:

int chunk_size = 10; /* from NaN method */
callback_streambuf<cout_callback> streambuf(callback, chunk_size);
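A minimal sketch of such a runtime-sized variant, assuming the std::array buffer is simply replaced by a std::vector sized in the constructor (the name runtime_callback_streambuf is made up, and this is not the code from the branch):

#include <streambuf>
#include <vector>

template <typename Callback, class Char = char>
class runtime_callback_streambuf : public std::basic_streambuf<Char>
{
public:
    using base_type = std::basic_streambuf<Char>;
    using char_type = typename base_type::char_type;
    using int_type = typename base_type::int_type;

    runtime_callback_streambuf(Callback callback, std::size_t chunk_size)
        : buffer_(chunk_size),
          callback_(callback)
    {
        base_type::setp(buffer_.data(), buffer_.data() + buffer_.size());
    }

protected:
    int sync() override
    {
        // Hand the filled part of the buffer to the callback and reset it.
        bool ok = callback_(base_type::pbase(),
                            base_type::pptr() - base_type::pbase());
        base_type::setp(buffer_.data(), buffer_.data() + buffer_.size());
        return ok ? 0 : -1;
    }

    int_type overflow(int_type ch) override
    {
        if (sync() != 0)
            return base_type::traits_type::eof();
        if (ch == base_type::traits_type::eof())
            return base_type::traits_type::not_eof(ch);
        base_type::sputc(static_cast<char_type>(ch));
        return ch;
    }

private:
    std::vector<char_type> buffer_;  // chunk size chosen at runtime
    Callback callback_;
};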
I'm sorry for not responding, I'm on vacation. I will try to take a look at it later this week.
It depends on whether you want to set the size of the intermediate buffer at runtime. Probably yes; it was just an example.
I'm also thinking about how to signal the last chunk: either by a distinct callback function or by an indication such as a zero size.
Never mind. That's all voluntary work and there's no rush. Enjoy your vacation :)
@talaj Great work! I will have a look at it ASAP. However, some other things are still in my queue. Maybe you could tell me the basic steps required to test this on an Ubuntu 14.04 box (clone the repo, apply the pull request, build, etc.)? Sorry for not yet being so deep into Node.js, npm, gyp, Travis & Co.
@talaj Finally I got it sorted out: I fetched your branch and compiled it. The newly built Node Mapnik seems to work as expected. However, it will take some time before I can make any statements about performance gains when using this with our WMS server. I also want to test how different buffer sizes influence performance. I'll keep you informed.
I will make the buffer size a parameter of encodeChunked(). It makes sense at least for testing.
The chunk size parameter is there now, see the unit tests.
Thanks for the new chunk size parameter. Sorry for not responding, I've been out of the office for some days. First simple tests showed that the new method basically works as expected. Again, thanks a lot for your work. Although the image transfer over the network is now definitely spread out (in contrast to a transfer burst after the whole encoding process), we could not yet detect any absolute performance gain, even with slow network connections. However, the process now seems much more network friendly. I guess we need some more sophisticated tests... We'll continue with more tests in two weeks (it's still vacation time).
Finally, I can come back with some successful and quite promising tests. However, testing the benefits of this patch is quite challenging, as there are many variables that affect the results, such as image complexity and size, chunk size, and network bandwidth.
Let me describe our basic testing setup. We decided to eliminate rendering time and used two different pre-rendered images. One is a highly complex random (noise) image (800 x 600 px), which Mapnik compresses to a 1.652.373 byte (~1.6 MB) PNG in ~130 milliseconds. The other is a low-complexity cadastral map image (1201 x 801 px), which Mapnik compresses to a 231.261 byte (~200 KB) PNG in ~65 milliseconds. For each of the two images, we measured the response time of a WMS GetMap request with these chunk sizes (buffer sizes):
over physical networks with these bandwidths:
which makes 72 distinct requests in total. Measurements were taken with Apache JMeter, recording the average response time of 100 sequential request samples for each of the 72 requests over a physical local network. Network bandwidth was limited on Linux using the wondershaper traffic shaping script.

In the attached Excel sheet there are three blocks of values and line charts, one chart for each block.

The first block notes the absolute response times in milliseconds. This should be quite self-explanatory.

The second block represents the relative response times in percent, with the non-chunked response time being 100 percent. Those values are calculated as t_rel = t_x / t_notchunked * 100. The lower the value, the more time was saved by chunking. In the chart, the non-chunked requests would form a horizontal line at 100 percent. A point at 80 percent means that only 80 percent of the time of the non-chunked request was required for that request. Values above 100 percent seem to be caused by the chunk-generating overhead in the C++ code, which kicks in earlier the smaller the encoded image, the faster the network and the smaller the chunk size (which is not a surprise).

The first and second blocks of values and charts always include the fact that transferring data over a slow network takes more time. So, in the third block, we tried to express time savings independently of the overall response time. There we compared the absolute savings to the encoding time of the image, which was measured on the server and is provided as a single (average) constant, one for each of the two images. This may be arbitrary, but that time is the only one we know for the request that does not include network transfer time. The results look quite right and give us some higher resolution at the 2 Mbit/s end of the charts. So, finally, the third block shows the time saving compared to the encoding time, in percent. The higher this value, the better; that is, the more of the encoding time was saved on the overall response time. Those values are calculated as t_saving = (t_notchunked - t_x) / t_encoding * 100.

Get the Excel sheet from here:

More to come soon... in the meantime, I'm curious about any comments.
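To make the two metrics concrete, here is a tiny self-contained sketch with purely illustrative response times (the variable names follow the formulas above; the numbers are made up and are not measured results, except that ~130 ms matches the stated encoding time of the random image):

#include <iostream>

int main()
{
    // Illustrative values in milliseconds, not measurements.
    double t_notchunked = 500.0;  // response time without chunking
    double t_x          = 400.0;  // response time with chunk size x
    double t_encoding   = 130.0;  // server-side encoding time of the image

    // Second block: relative response time, non-chunked request = 100 percent.
    double t_rel = t_x / t_notchunked * 100.0;                    // 80 percent

    // Third block: absolute saving expressed as a share of the encoding time.
    double t_saving = (t_notchunked - t_x) / t_encoding * 100.0;  // ~77 percent

    std::cout << "relative response time:   " << t_rel << " %\n";
    std::cout << "saving vs. encoding time: " << t_saving << " %\n";
}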
Great elaboration! One could expect the upper bound of the time saving to be the encoding time. Interestingly, some relative time savings for the random image show more than 100% of the encoding time saved. That would mean a chunked image can be transferred faster than a one-piece image. Or is it a measurement problem? The relative time savings for the cadastral map image show up to 75% of the encoding time saved on slower lines. So the concept works. I think we can also reduce the chunking overhead a bit. On the other hand, are tens of milliseconds, compared to hundreds of milliseconds of overall response time, worth the effort? In my experience, it is much simpler to strip tens or even hundreds of milliseconds from the rendering time.
Yes, we also find those more-than-100%-saving requests interesting. I cannot guarantee that it's not a measurement problem, since the server is theoretically used by several people from the Internet. But I'm quite sure it's not: we took 100 samples per request and worked with the average times. Obviously, the transfer is spread out into small chunks. On fast networks, and if encoding is slow, there may even be gaps on the wire between those chunks. Maybe such transfers are simply faster than one single burst transfer. I'm not a networking expert, but this may well be the case.

Yes, the concept really works and your buffered stream works great. If you see room for improving your code, you're welcome to do so... 👍 BTW, we've noticed that provided palettes are not handled by the new encodeChunked method.

Yes, it's likely much simpler to gain more speed from rendering optimization. However, that typically involves adjusting the stylesheet and/or the database. As the creators of a WMS server, we actually have no knowledge of those parts of the setup, since they are up to the server's maintainer. Even if that setup is optimal, we can still get some improvement with this chunking technique. Actually, the goal was to optimize network transfer; more precisely, to do it like Apache Tomcat / GeoServer does it. It really annoyed me to see that our Mapnik-based WMS server is always faster from within the 1 GBit/s LAN and always slower (only 5 to 10 percent slower) than GeoServer over the Internet, although the server will likely often be used over the Internet. Now, with the chunked encoding in place, the server is almost always faster than GeoServer. Additionally, a common use case for WMS (in times when tile servers are used everywhere) is high-DPI maps or printing. There, you often have huge images, for which a lot of savings can be achieved, even with fast networks.
Summing it up...

Chunk Sizes

It seems that chunk sizes smaller than 2048 bytes are not so useful, since there is too much chunk-generating overhead. With fast networks (roughly >= 100 Mbit/s) there may even be a negative benefit, that is, chunked transfer is slower than non-chunked transfer. With 2 KB and 4 KB chunks, chunked encoding was always faster than non-chunked encoding for both test images. There may be a negative effect with smaller images (< 200 KB), but those are not subject to this test. Even more performance could be obtained with bigger chunks (e.g. 8 KB, 16 KB). Maybe we will run some more tests on that. AFAIK, GeoServer on Apache Tomcat uses a default chunk size of 8 KB.

Network Bandwidth

Most savings can be achieved at network bandwidths between 10 Mbit/s and 100 Mbit/s. It seems that for smaller images the optimal bandwidth range is closer to 10 Mbit/s (shifted to the left), whereas for bigger images it is closer to 100 Mbit/s (shifted to the right). For networks with a bandwidth below 10 Mbit/s there is less benefit, since the overall transfer time eats up most of the savings. Maybe a better effect could be achieved with even smaller images; however, chunking makes less sense for smaller images in general. For networks with 1000 Mbit/s and above (true intranets) there is also less benefit, since the network is too fast compared to the time needed to encode the image. There is no real overlapping of encoding and transfer, which is what this patch tries to benefit from. With really big images the performance gain may be greater, but such images are unlikely to be requested from a WMS server. So, the goal for those networks is just not to be slower than with non-chunked encoding due to chunk-generating overhead. This can easily be achieved with big enough chunk sizes (> 2 KB). Between 10 Mbit/s and 100 Mbit/s (likely even up to 200 or 300 Mbit/s), savings range from 80 down to 50 percent of the non-chunked transfer time for bigger images. These bandwidths represent typical broadband Internet connections and are the networks this patch was initially developed for. As you can see in the third block of values and the chart "Relative Time Saving compared to Encoding Time in Percent", up to 90 percent of the encoding time of an image can be saved.

Rendering Time

Since we ran our tests with pre-rendered images, the savings discussed so far are of course only half the story. With a real WMS GetMap request, rendering time often plays the leading role in overall performance. Rendering time adds a positive offset to the values in the first data block, which constantly moves the lines in the chart "Response Time [ms]" up. The lines of the second chart, "Relative Response Time compared to Not Chunked Response Time in Percent", get flattened and move towards the 100 percent line (the lines get squeezed towards the 100 percent line). The third chart, "Relative Time Saving compared to Encoding Time in Percent", is not changed by the added rendering time. Obviously, a long rendering time wrecks the positive effect on performance caused by chunked image encoding: the longer the rendering time, the smaller the positive effect. However, it's not easy to describe this mathematically (at least for me :-p).

Conclusion

With this patch, the goal was not to generally improve performance (which is better done by a process called rendering optimization), but to not waste time when encoding and transferring the rendered image.
In other words, we tried to make this final step as efficient as possible. I reckon the numbers and charts provided clearly show that we have been successful. The real-world test for this patch is a setup that compares our Mapnik-based WMS server with GeoServer running in an Apache Tomcat server. As already said in the first post, Mapnik was always faster from within the local 1 GBit/s network and almost always slower when queried from the Internet. Now, with this patch in place, the Mapnik-based server is almost always faster when queried from both the LAN and the Internet. Additionally, there are more benefits from transferring an image in chunks. In general, I believe that smaller chunks are likely more network friendly than a single burst transfer. Our tests suggest that they may be transferred even faster than a single big block of data. (For some requests the saving was greater than 100 percent of the encoding time; in theory, the encoding time should be the saving's upper limit.) Finally, this patch lowers the server's memory footprint, since only a small buffer for a single chunk is required at a time for a single encoding process (in contrast to a buffer that holds the whole encoded image). With all that in mind, I'd really love to see this patch implemented in Node Mapnik.
This is fixed and covered by the unit tests now. I've also fixed a nasty bug where resources were released at the wrong time, causing crashes.
Thank you for fixing this... :-) I haven't tested it yet but will soon. Have you also been able to reduce the chunking overhead further?
I was doing some simple profiling with perf. From what I could see, the time spent on chunking was insignificant next to other calls in Node. I will try to find some time to do more performance tests. In any case, I think that performance could be improved in two ways:
I completely agree with #531. However, I don't see any benefit coming from your second suggestion. Since chunks are instantly sent down the wire when the callback is called, isn't this the same as increasing the chunk size, which is already implemented? Of course, from your (or the encodeChunked method's) point of view, the focus lies on generating chunks, no matter when they are delivered. From my (or the HTTP server's, which implements the chunked encoding) point of view, this is likely not worth it, since, for example, getting two chunks per callback is effectively the same as doubling the chunk size.
I must say that I'm not sure.
But the callback is not instantly called when a chunk is generated. A chunk is generated on a separate thread, while the callback must be called from the main thread. The main thread processes a queue of operations, so it takes some time until the main thread gets through the queue to the callback. In the meantime, new chunks may have been generated. The implementation therefore loops over all chunks generated so far in a single callback invocation.
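For illustration, here is a rough sketch of that producer/consumer pattern using plain standard-library primitives; the names (chunk_queue, push, drain) are made up, and node-mapnik's actual implementation relies on its libuv/NAN machinery rather than on this code:

#include <mutex>
#include <queue>
#include <string>
#include <vector>

class chunk_queue
{
public:
    // Called on the worker thread for every chunk the encoder produces.
    void push(const char* data, std::size_t size)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        chunks_.emplace(data, size);
    }

    // Called on the main thread: drains everything produced so far, so one
    // callback invocation may deliver several chunks at once.
    std::vector<std::string> drain()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::string> batch;
        while (!chunks_.empty())
        {
            batch.push_back(std::move(chunks_.front()));
            chunks_.pop();
        }
        return batch;
    }

private:
    std::mutex mutex_;
    std::queue<std::string> chunks_;
};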
As you may know, we are implementing a fully featured WMS server based on Node Mapnik. It turns out that most of Mapnik's incredible rendering speed, compared to many other WMS servers, has only little effect over slow network connections. For example, many typical map requests are served about 1.5 to 2.5 times faster than by GeoServer on a local 1 GBit/s network (not to mention localhost requests). However, this factor tends to fall to 1.0 or even 0.8 on typical Internet links.
One reason may be the missing ability to implement chunked HTTP Transfer-Encoding as GeoServer does. Here, the final map image is encoded to, for example, PNG format in chunks. Each chunk is sent down the wire as soon as it is available. With Node Mapnik, the whole rendered image is encoded into one single chunk by one of the save_to_string methods. So, with GeoServer, there is a level of parallelism by overlapping image encoding and sending, which actually brings some extra speed compared to our implementation.

So, I'm asking for a new encodeChunked method for the Node Mapnik Image object. Likely the simplest implementation works with a callback function that gets called for each chunk available. Typically, one would specify the desired chunk size as an argument when the function is called. The callback function gets invoked once for each chunk, which is passed as a Node.js Buffer object. Additionally, a boolean flag last_chunk is required to signal which one is the last chunk.

Since I'm no C++ expert, I currently have no idea how to implement this on the C++ side. This seems to require a stream/buffer that triggers an event/callback after N bytes have been written to the stream. Maybe there is some Boost template for that? Also, I have no idea whether it's better to implement this as a sync or an async method (from a performance point of view), since for each chunk some (expensive?) inter-thread access seems to be required. Actually, this new method could as well be encodeChunkedSync.

On the Javascript side this could simply look like (heavily shortened):