Distributed Flows / Nanomsg #8

Open · roscopecoltran opened this issue Dec 15, 2016 · 12 comments

@roscopecoltran

Hi,

Hope you are all well!

I was wondering how complicated it would be to add the surveyor pattern from Nanomsg to msgflo-cpp, to enable some distributed search in the graph as described in the resources below.

Resources:
https://daniel-j-h.github.io/post/distributed-search-nanomsg-bond/
https://github.com/daniel-j-h/DistributedSearch

It would be cool to have a nanomsg socket that could be used to communicate with other scripts; it would make image processing or computer vision pipelines more efficient.

Thanks in advance for your input.

Regards,
Richard

@jonnor (Member) commented Jan 26, 2017

Hi Richard! I think the surveyor pattern is quite doable in MsgFlo. In fact we use something quite similar for analyzing images at The Grid, where each of the different computer vision algorithms is its own "respondent". It is in NoFlo, but the principle is the same for MsgFlo, and in terms of scalability it is actually a better fit for the distributed/multi-process nature of MsgFlo - we just didn't have MsgFlo when we initially built it!

One possible implementation is a unidirectional data flow, where sending out the survey and collecting the results is done by two different participants.

In .FBP DSL it would look something like this:

issuer(IssueSurvey) SURVEY -> IN a(RespondentA)
issuer SURVEY -> IN b(RespondentB)
issuer SURVEY -> IN c(RespondentC)

a RESPONSE -> IN collector(CollectResponse)
b RESPONSE -> IN collector
c RESPONSE -> IN collector

The collector would typically be responsible for storing responses to a database as they happen. If any "completion" notifications are needed, it would typically also do that. Perhaps it has a surveycompleted outport also.

The questions are (as usual) how to do synchronization and decide "when are we done?". We're assuming that we have a static number of respondents (N=3). Then you can let each response contain an identifier of the respondent, e.g. respondent: "a". The response should also pass on an identifier carried in the survey (probably a UUID). The collector can then group the responses per survey and, if desired, wait until each of the respondents has given an answer.
Sometimes in distributed search one might want "first to respond" completion semantics, or "first 10 to respond".

If you want timeout handling, then you need an additional connection, issuer SURVEY -> START collector, which can be used to notify the collector that a new survey has started. If the payload includes the timestamp at which the survey was started, then one does not have to infer this at receive time, and one is more independent of delivery times - good for high levels of concurrency.
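
To make that bookkeeping concrete, here is a minimal sketch of the collector side in TypeScript. It is plain application logic, not a MsgFlo API; the type and field names (SurveyStart, Response, expectedRespondents, and so on) are hypothetical and only illustrate the idea of tracking respondent identifiers, the survey UUID, and the issuer-provided start timestamp.

// Sketch of the collector-side bookkeeping (not a MsgFlo API). All names here
// (SurveyStart, Response, Collector, expectedRespondents) are hypothetical.

interface SurveyStart {
  surveyId: string;    // UUID carried in the survey
  startedAt: number;   // timestamp set by the issuer, in milliseconds
}

interface Response {
  surveyId: string;    // passed through from the survey
  respondent: string;  // identifier of the respondent, e.g. "a"
  payload: unknown;
}

class Collector {
  // surveyId -> (respondent -> response)
  private responses = new Map<string, Map<string, Response>>();
  // surveyId -> issuer-provided start timestamp
  private started = new Map<string, number>();

  constructor(
    private expectedRespondents: string[], // static N=3 case: ["a", "b", "c"]
    private timeoutMs: number,
  ) {}

  // Wired to the START inport (issuer SURVEY -> START collector).
  onStart(start: SurveyStart): void {
    this.started.set(start.surveyId, start.startedAt);
    this.responses.set(start.surveyId, new Map());
  }

  // Wired to the IN inport (a/b/c RESPONSE -> IN collector).
  // Returns true when every expected respondent has answered ("DONE").
  onResponse(res: Response): boolean {
    const perSurvey = this.responses.get(res.surveyId) ?? new Map<string, Response>();
    perSurvey.set(res.respondent, res);
    this.responses.set(res.surveyId, perSurvey);
    return this.expectedRespondents.every((r) => perSurvey.has(r));
  }

  // Periodic check: surveys started more than timeoutMs ago and still incomplete
  // are "TIMEDOUT". Using the issuer's timestamp avoids depending on delivery time.
  timedOut(now: number): string[] {
    return [...this.started.entries()]
      .filter(([id, startedAt]) =>
        now - startedAt > this.timeoutMs &&
        !this.expectedRespondents.every((r) => this.responses.get(id)?.has(r)))
      .map(([id]) => id);
  }
}

For "first to respond" or "first 10 to respond" semantics, onResponse would just compare the number of collected responses against the desired count instead of checking every expected respondent.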

@jonnor (Member) commented Jan 26, 2017

A complete graph could look like this:

[Image: msgflo-survey-and-collect graph]

issuer(IssueSurvey) SURVEY -> START collector(CollectResponse)

issuer SURVEY -> IN a(RespondentA)
issuer SURVEY -> IN b(RespondentB)
issuer SURVEY -> IN c(RespondentC)

a RESPONSE -> IN collector
b RESPONSE -> IN collector
c RESPONSE -> IN collector

collector DONE -> IN show(ShowResults)
collector TIMEDOUT -> FAIL log(LogErrors)

@jonnor (Member) commented Jan 26, 2017

Regarding NanoMsg: msgflo-cpp supports getting the binary data from the AMQP/MQTT message. So as long as the participants on each side agree, you can use whatever you want, including NanoMsg.
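
As an illustration of the binary-payload point (a sketch, not the msgflo-cpp API): a Node.js participant could, for instance, move opaque bytes through the broker with amqplib. The queue name and buffer contents below are placeholders; the broker never inspects the bytes, so both sides simply agree on how to decode them.

// Sketch: publishing and consuming an opaque binary payload over AMQP with
// amqplib. Queue name and buffer contents are placeholders.
import amqp from "amqplib";

async function main(): Promise<void> {
  const conn = await amqp.connect("amqp://localhost");
  const channel = await conn.createChannel();
  const queue = "vision.frames"; // placeholder queue name
  await channel.assertQueue(queue, { durable: false });

  // Producer side: send raw bytes.
  const frame = Buffer.from([0x89, 0x50, 0x4e, 0x47]); // e.g. the start of a PNG
  channel.sendToQueue(queue, frame);

  // Consumer side: hand the raw bytes to whatever decoder was agreed on.
  await channel.consume(queue, (msg) => {
    if (msg !== null) {
      console.log(`received ${msg.content.length} bytes`); // msg.content is a Buffer
      channel.ack(msg);
    }
  });
}

main().catch(console.error);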

@jonnor (Member) commented Jan 26, 2017

Hope this answers your questions!

@jonnor (Member) commented Jan 26, 2017

I see now that nanomsg also has pubsub facilities and so on. In order to use this with msgflo one would have to have a nanomsg-powered broker (to replace mosquitto/rabbitmq) and then teach msgflo to use it to bind the topics/queues of participants together. Definitely possible, but unless there is a particular need to use nanomsg, I'd recommend going with AMQP, at least initially.

@roscopecoltran (Author) commented Feb 2, 2017

Hi @jonnor,

Hope you are all well!

Thanks for this detailed reply, and many apologies for my late response!

Questions:

- noflo/msgflo/flowhub as an API gateway

The closest flow to my little backend project (Go-based rather than Node.js, for better expected QPS, with a load-balancing strategy using Vulcand) is the one proposed by KrakenD in the picture below.

[Image: KrakenD gateway architecture]
NB: the input requests can be multipart POST requests with an image to classify via another Docker container or REST service, probably pre-processed by another service like imaginary or picfit to crop it or convert it to grayscale.

So, I spotted some pull requests from Syntace bridging goflow and noflo-ui.

Would it be better to combine goflow, KrakenD, and go-mangos, and administrate service trees with noflo-ui, to get a kind of elastic service gateway? Or to use msgflo-cpp + noflo-ui as a relay for all incoming requests?

- Noflo-ui

  • How does noflo-ui compare to node-red and cascades for such tasks, from your point of view as the author?
  • Is there any slim Docker (Alpine-based) or minified version of noflo-ui?
  • Can we bundle noflo plus some plugins and generate a dist bundle with some addons, to get a really small Docker container at the end?

- Docker related/demos:

Hope I was clear about the global intent of my project; it is all about a distributed request-pipeline manager with an aggregating gateway API.

Cheers,
Richard

@jonnor (Member) commented Feb 2, 2017

How does noflo-ui compare to node-red and cascades for such tasks, from your point of view as the author?

As far as I understand, node-red is a combination of a dataflow runtime, and a UI for programming it. It only supports Node.js. noflo-ui communicates with a runtime over Websocket, and supports anything which implements the FBP runtime protocol.

cascades is not a UI, and thus is not comparable with noflo-ui - it is more comparable with MsgFlo (both support multiple languages and communicate over the network). The project does not seem to have been updated for about 2 years, which is not so good. The main architectural difference that I see is that MsgFlo uses a message broker, like RabbitMQ. These are battle-tested and have many good features like persistence, clustering, authentication and so on.
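
For reference, the FBP runtime protocol mentioned above is just JSON messages over a WebSocket. A minimal sketch of asking a runtime to describe itself, using the ws package, could look like this; the runtime address, port, and secret are placeholders.

// Sketch of talking the FBP runtime protocol directly: JSON envelopes with
// protocol/command/payload fields over a WebSocket. Address and secret are
// placeholders for whatever the runtime was configured with.
import WebSocket from "ws";

const socket = new WebSocket("ws://localhost:3569", "noflo");

socket.on("open", () => {
  // Ask the runtime to describe itself (type, capabilities, current graph, ...).
  socket.send(JSON.stringify({
    protocol: "runtime",
    command: "getruntime",
    payload: { secret: "my-secret" },
  }));
});

socket.on("message", (data) => {
  const message = JSON.parse(data.toString());
  console.log(message.protocol, message.command, message.payload);
});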

@jonnor (Member) commented Feb 2, 2017

Is there any slim Docker (Alpine-based) or minified version of noflo-ui?

Not that I am aware of. We might create such an image in the future.
For many cases one can just use app.flowhub.io - as noflo-ui is a client-side webapp, communication to/from the runtime is direct.

@jonnor (Member) commented Feb 2, 2017

Can we bundle noflo plus some plugins and generate a dist bundle with some addons, to get a really small Docker container at the end?

Of course. NoFlo is a regular Node.js library, so you can use any Node.js/NPM tools for this. webpack for instance supports creating single-file builds, also for Node.js. You can use it directly or through https://github.com/noflo/grunt-noflo-browser
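
A minimal sketch of such a single-file Node.js build with webpack could look like the configuration below (written as webpack.config.ts). The entry point and output names are placeholders, and a real NoFlo build would add whatever component loading your setup needs - that is the kind of setup grunt-noflo-browser helps with.

// webpack.config.ts - minimal sketch of a single-file Node.js bundle.
// The entry point and output names are placeholders; "target: 'node'" keeps
// Node.js built-ins (fs, net, ...) out of the bundle instead of polyfilling them.
import * as path from "path";
import type { Configuration } from "webpack";

const config: Configuration = {
  mode: "production",
  target: "node",
  entry: "./src/main.js", // placeholder: your NoFlo entry point
  output: {
    path: path.resolve(__dirname, "dist"),
    filename: "bundle.js",
  },
};

export default config;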

@jonnor (Member) commented Feb 2, 2017

Would it be better to combine goflow, KrakenD, and go-mangos, and administrate service trees with noflo-ui, to get a kind of elastic service gateway? Or to use msgflo-cpp + noflo-ui as a relay for all incoming requests?

I have not developed API gateways before, so there is a limit to how specific my advice can be. But I would say that unless there are good reasons to do otherwise, reusing battle-tested solutions is a good thing (provided they are not so complex that one does not understand them). Focus the effort on the unique thing you want/need.
For an HTTP gateway/proxy, most of the hard parts are going to be dealing with HTTP (and doing it at scale). msgflo-cpp does not help you there (it's not an HTTP library). By nature, HTTP is also request-response, so a lot of the benefits of message queues are hard to realize. The clients will keep the connection open and expect a response as quickly as possible.

@jonnor (Member) commented Feb 2, 2017

NB: the input requests can be multipart POST requests with an image to classify via another Docker container or REST service, probably pre-processed by another service like imaginary or picfit to crop it or convert it to grayscale.

If you can, avoid doing heavy computation like image processing in a synchronous HTTP request-response. Instead have a system where one request creates an image processing "job", and a response is returned as soon as the job is registered. Put the job data on a queue, and persist some status in a database.
This is also a case where the API should support batching, i.e. specifying many images to be processed in one go. Then use a webhook and/or EventSource to notify of job changes/completion.
Processing the images is then done by a dedicated MsgFlo role which takes jobs off the queue. One might want a dedicated MsgFlo role for delivering the webhook as well, as this has different performance characteristics.
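
A minimal sketch of the "one request creates a job" pattern could look like this, using Express for the HTTP side and amqplib for the queue. The endpoint paths, queue name, and in-memory status map are placeholders - a real setup would persist job status in a database and notify via webhook/EventSource as described above.

// Sketch: the synchronous HTTP request only registers a job; the heavy image
// processing is done by a separate worker consuming from the queue.
import express from "express";
import amqp from "amqplib";
import { randomUUID } from "crypto";

async function main(): Promise<void> {
  const conn = await amqp.connect("amqp://localhost");
  const channel = await conn.createChannel();
  await channel.assertQueue("image.jobs", { durable: true });

  // Stand-in for the database that would hold job status in a real deployment.
  const jobStatus = new Map<string, string>();

  const app = express();
  app.use(express.json());

  // Batched submission: many images in one request, one job id per image.
  app.post("/jobs", (req, res) => {
    const images: string[] = req.body.images ?? []; // e.g. URLs or object-store keys
    const jobs = images.map((image) => {
      const id = randomUUID();
      jobStatus.set(id, "queued");
      channel.sendToQueue("image.jobs", Buffer.from(JSON.stringify({ id, image })), {
        persistent: true,
      });
      return { id, status: "queued" };
    });
    res.status(202).json({ jobs }); // reply immediately, before any processing
  });

  // Clients poll here; a webhook/EventSource would push changes instead.
  app.get("/jobs/:id", (req, res) => {
    const status = jobStatus.get(req.params.id);
    res.status(status ? 200 : 404).json({ id: req.params.id, status });
  });

  app.listen(3000);
}

main().catch(console.error);

A separate worker, for instance a MsgFlo participant bound to the image.jobs queue, would then consume these messages, do the actual processing, and update the job status.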

If you can't avoid synchronous HTTP for long processing requests, then it can still be beneficial to put them on a queue. Heroku for instance recommends putting anything taking longer than 500ms on a queue and using a background worker to process it.

@roscopecoltran (Author)

The input comes from mobile device cameras at a rate of 4 images per second, already resized to 320x240px, in real time, and I need to give back a reply within 300ms. That's why I was thinking about it that way.

After giving a reply to the mobile user, I will queue some async jobs for more advanced processing or logging.
