Sendto of some models fails #55

Open
cortner opened this issue Jun 24, 2023 · 5 comments
@cortner
Member

cortner commented Jun 24, 2023

The line assemble.jl#L25

    (nprocs() > 1) && sendto(workers(), basis = basis)

fails for some not-entirely-standard models.

A while back we discussed serializing models to JSON, and then transferring those to the processes.

Again, we may want input from a Julia expert here on how this is best done, instead of hacking something together.

CC @tjjarvinen

@tjjarvinen
Collaborator

I would like to make changes to the whole parallel assembly.

A better way to do this would be to use a pipeline-style parallelization:

  • The main process spawns an async task to fill a RemoteChannel with job items
  • Spawned workers listen to the job RemoteChannel for work
  • When a worker gets a job, it computes it and deposits the result into a results RemoteChannel
  • The main process listens to the results RemoteChannel and assembles the results together

To do this you need to do something like this:

# jobs and results are RemoteChannels
# f is the function that is meant to be run in parallel
function parallel_work(jobs, results, f)
    while true
        id, input = take!(jobs)   # blocks until a job is available
        x = f(input...)
        put!(results, (id, x))
    end
end

# Create RemoteChannels for the work
jobs = RemoteChannel(() -> Channel(2 * nworkers()))
results = RemoteChannel(() -> Channel(2 * nworkers()))

# Create an input-feeding task; because it is async,
# it runs in the background and execution continues
@async for (id, x) in enumerate(inputs)   # use the index as the job id
    put!(jobs, (id, x))
end

# This spawns parallel workers that will do the work
for p in workers()
    remote_do(parallel_work, p, jobs, results, function_to_execute)
end

# Collect the results and assemble them together
for _ in inputs
    id, result = take!(results)
    # assemble the result into the correct place
end

This is what I have done earlier here, if you would like to look at a reference.

You have two options for moving basis. One is to pass it through the RemoteChannel; the other is to send it to all workers before the calculation. The latter is probably the best option here.

Also, in my opinion, the multithreaded version should do the same thing: just change RemoteChannel to Channel and remote_do to @spawn, and everything else can stay the same.

@cortner
Member Author

cortner commented Jun 25, 2023

This sounds perfect. Yes, please make a new PR!

@cortner
Member Author

cortner commented Jun 26, 2023

One other thing to keep in mind when you do this, please: in the past, with multi-threaded assembly, we sometimes had the problem that writing into the LSQ matrix ended up being a huge overhead (and bottleneck!). We thought this might be related to memory layout but never fully explored it.

EDIT: ... and I'm noticing this again with the implementation in ACEfit: the processor load drops to around 100% (i.e. ~1 thread) even though all threads are still active...

@wcwitt
Collaborator

wcwitt commented Jun 26, 2023

Open to being persuaded, but I'm pretty reluctant to make this change without a compelling reason.

The main process spawns an async task to fill a RemoteChannel with job items
Spawned workers listen to the job RemoteChannel for work
When a worker gets a job, it computes it and deposits the result into a results RemoteChannel
The main process listens to the results RemoteChannel and assembles the results together

This is functionally identical to the current approach (right?), but it needs 2-3x more code and is less readable for non-experts.

You have two options for moving basis. One is to pass it through the RemoteChannel; the other is to send it to all workers before the calculation. The latter is probably the best option here.

The latter is what we do currently, but it fails because of a Julia bug for some basis types. Hence this issue.

Again, I'm open to being persuaded (and if there are performance issues we need to address them), but I spent a fair bit of time weighing the alternatives before choosing this pattern, and I don't (yet) see how this change could improve performance.

@cortner
Member Author

cortner commented Jun 26, 2023

To me, what Teemu suggests looks quite natural, and I like the idea of having an implementation that can naturally run distributed or multithreaded.

But TBH this discussion doesn't currently address my main problem from the OP.
