Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate multiple diagrams using pipe mode #2

Closed
krisztianb opened this issue May 19, 2020 · 12 comments
Closed

Generate multiple diagrams using pipe mode #2

krisztianb opened this issue May 19, 2020 · 12 comments

Comments

@krisztianb
Copy link

Hi @agirorn

I read your post here: markushedvall/node-plantuml#15 (comment) about using nailgun in combination with plantuml.

I'm using PlantUML output in a project of mine: https://github.com/krisztianb/typedoc-umlclass and would really like to improve the performance of the diagram generation.

Is your approach faster then the node-plantuml one? Since the node-plantuml project seems to have been abandoned I was wondering whether you are still maintaining this project?

@agirorn
Copy link
Owner

agirorn commented May 25, 2020

Yes, you can say I'm maintaining it, but it is a young project and there have been no maintenance requests so far. Do you have a maintenance request for this project? I also accept pull requests if you want to contribute.

@krisztianb
Copy link
Author

Hey! I would really like to improve the performance when creating a lot of PlantUML diagrams. As I see it one can only achieve that with the nailGun approach. So adding such a feature would be cool. What do you think?

@agirorn
Copy link
Owner

agirorn commented May 26, 2020

I like the idea of improving its performance.

I do have to admit that I'm not very excited about the nailGun approach since I believe that markushedvall/node-plantuml is doing that unsuccessfully on some platforms?

Is the main processing penalty for processing multiple PalntUML files not the initial forking cost of the PlantUML .jar?

I assume that if we can run PlanTUML with -pipe as I said in comment markushedvall/node-plantuml#15 (comment). could we not just fork it once and then pipe one UML after the other and then kill it when are done, when it is idle for some time(timeout) or on process.exit?

manual kill

const plantuml = require('plantuml')({
  idle: 2000,
});
await save(await plantuml(UML_TEXT);  // using the same forck
await save(await plantuml(UML_TEXT); // using the same forck
if (plantum.isRunning()) {
  plantum.stop(); // manual kill if skipped it will die after 2 sec by default
}

auto kill

const delay = require('delay');

const plantuml = require('plantuml')({
  idle: 2000,
});
await save(await plantuml(UML_TEXT); // using the same forck
await save(await plantuml(UML_TEXT); // using the same forck
delay(3000);
// Now the forck should be dead and it we have to start a new slow one before processing the UML file.
await save(await plantuml(UML_TEXT);

What do you think?

@krisztianb
Copy link
Author

It seems that you are right and this would be an even smarter approach.

As I see it Markus was also adding that flag when generating diagrams: https://github.com/markushedvall/node-plantuml/blob/master/lib/node-plantuml.js#L149

The catch would be to start the process only once, write the diagrams into the stdin stream of the process and read the results from stdout. Markus's solution seems to do this stdin/stdout processing but starts a separate process for every diagram.

But looking at the command line options and FAQ:

https://plantuml.com/command-line#f92fb84f0d5ea07f
https://plantuml.com/faq#e0b670e765a9adba

It seems that one can provide multiple diagrams after one another separated by a delimitor you can specify.

@krisztianb krisztianb changed the title Are you still maintining this project? Improve performance using pipe mode May 26, 2020
@agirorn
Copy link
Owner

agirorn commented May 26, 2020

@krisztianb do you want to take a crack at implementing this and do a PR?

@krisztianb
Copy link
Author

krisztianb commented May 26, 2020

I had some more thoughts on this. I would prefer the module returning a constructor that can be used to build objects that encapsulate the process. So if you decide to use multiple processes (eg. one for each CPU) you can simply construct as many objects as you like.

So we would have something that looks like this:

const options = { "format": "svg" };
const generator = new PlantUmlDiagramGenerator(options);

generator.start();
const promise = generator.generate("some plantuml code here");
promise.then((buffer) => {
  // do something with the result
});
generator.stop();

What do you think?

It would be fun to try implementing this. However I'd rather switch to TypeScript while doing so. What about you? Don't you want to give it a try? I'll have to fix some other bug reports of my plugin before I can take this on. 😃

@krisztianb
Copy link
Author

krisztianb commented Jun 1, 2020

Good news! This is definitely doable. I finally got some time to try this out.

I downloaded the latest JAR file and created a test.puml file with the following content:

@startuml
some plant uml commands
@enduml
@startuml
some plant uml commands
@enduml

Then I executed the following command in the console:

type test.puml | java -jar plantuml.jar -pipe -tsvg -pipedelimitor XXXXXXXXXXX

Out came 2 SVG "files" on stdout each ending with: </svg>XXXXXXXXXXX

Note: I also tried this with the JAR file that comes with the node-plantuml v0.5.0 module and it does not work. It only outputs the first SVG without the delimitor. It seems that this is not working with the version included in that module. I'm using that rather old version, because I found that the newer ones produce malformed SVG output.

@krisztianb krisztianb changed the title Improve performance using pipe mode Generate multiple diagrams using pipe mode Jun 1, 2020
@krisztianb
Copy link
Author

krisztianb commented Jun 1, 2020

What do you think about the following approach as a solution?

The module provides a function whose interface looks like this (TypeScript):

function fromStream(diagramFinished: (svgData: string) => unknown): stream.Writable

It returns a writable stream (the stdin of the spawned JAVA PlantUML child process) and gets a callback function as an argument. The caller has to write the PlantUML code into the returned stream and close it.

The function has an event handler on the "data" event of the stdout stream of the child process and reads the output from there. Whenever the output contains the delimiter, it calls the callback with the SVG data it read from the stream. The callback is called as many times as diagrams were read from the stream.

Your original solution could be put into a function called fromText and remain as it is.

@agirorn
Copy link
Owner

agirorn commented Jun 3, 2020

You are just full of good ideas and I like your enthusiasm.

One of my main goals for this project is to hide as much of the problem of processing the UML from the user so splitting the stream after processing maybe don not quite fit here.

From per perspective, I don't think it is not a good idea to have many programing models in the same node_module since I find it gets confusing quite fast and it is much better to have one module do the work as a stream, a generator or class factory and then have other building blocks that bridge that gap to programming models.

Regarding typescript, it the code is converted to typescript duration the improvement process then so be, I guess that is the future of all JS code in some sense.

Maybe what we should be building is a Transform stream that you can pipe UML into and get SVG from

process.stdin.pip(plantUmlTransformStream({ delimiter: 'XXXX' }).pipe(process.stdout)

This would have the following benefits

  • It hides the process forking
  • The transform can pay the forking cost once.
  • It is fairly easy to split the stream using tools like split2
  • it should be fairly easy to wrap the stream in a promise or callback style function.
const streamify = require('stream-array');
const os = require('os');
const {
  transformStream, // The new shine stream
  transform,  // Just like the old one.
} = require('plantum');
const { UML1, UML2 } = require('./umls');

streamify([UML1, UML2, os.EOL]).pip(transformStream({ delimiter: 'XXXX' }).pipe(process.stdout);

process.stdout.write(await transform(UML1))
process.stdout.write(await transform(UML1))

transform(UML1, (err, svg) => {
  if (err) {
    console.error(err);
  } else {
    process.stdout.write(svg)
  }
});

I guess transformStream could have 2 options, delimiter set by default as "XXXXXX" and split: boolean set by default to true, then users of the module would get one data event per processed UML and for the once that need more control they can have the standard buffered stream where you can pass in the prefers hunk size etc...

I also think we do not have to have all the bells and whistles in the first release, maybe it sold just be grate to keep the async function and add the stream that splits by default.

@krisztianb
Copy link
Author

Nice. I haven't seen Split2 before. Good idea to use that.

I also like the transform stream approach. I saw Markus use something similar here.

The parameters look good to me. I'll give it a try and see what I can come up with.

@krisztianb
Copy link
Author

Hmmmm... I haven't worked much with streams yet, have you?

It seems to me that a tranform stream won't work here, because we can't do the transformation within the _transform method that must be overwritten by our sub class (let's call it PlantUmlTransformStream), because it is done by the JAVA process that we spawn.

Maybe we need a different approach like Markus was using here. He returns an object with an input and an output stream that can then be used to pipe data into it and out from it.

@krisztianb
Copy link
Author

This is what I came up with: https://github.com/krisztianb/plantuml-pipe

Looking forward to your feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants