Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange behavior with repeating jobs of the same type #235

Open
tcastelli opened this issue May 23, 2017 · 4 comments
Open

Strange behavior with repeating jobs of the same type #235

tcastelli opened this issue May 23, 2017 · 4 comments

Comments

@tcastelli
Copy link

tcastelli commented May 23, 2017

Hello @vsivsi , I am currently finding a strange behaviour and maybe you can shed some light on the situation:

These are all the configured settings:
Meteor 1.4.3.2
job-collection 1.5.1
server: Galaxy

JobCollection
promote: every 30 min

Queue processJobs
concurrency: 1,
payload: 2,
pollInterval: 1 minute,
prefetch: 2,
callbackStrict: true,
workTimeout: 1 hour

Jobs of type TT
priority : 'normal'
repeat: schedule value "every 24 hours"
retry : wait value 15 minutes, backoff: "constant"

I create 3 Jobs of the same type "TT" and what I am expecting to see is that when they are created, they are set to run for the first time on the following day at 00:00 (that works correctly), and then, that they will be promoted to ready as late as 00:30 and than they'll be fetched by the queue in groups of 2 starting 1 minute later. In this case since all 3 should become ready at the same time (no dependencies) the prefetch would make the third one also be fetched while processing the first 2. After being processed and completed next job should be created and scheduled for next day at 00:00. Intead, this is what I see:
jc

I don't get why:

  • the after value for the first repeating jobs is set to the same day at 00:03:06 instead of 00:00:00 of next day (for 2 of them)
  • Why this happens again (settings the wrong after value) also for one of the jobs the next time both jobs that were set to after 00:03:06 are processed.

Update: Ok, so I have checked and it seems that whenever the job changes to ready, the after value gets updated, so the problem is why the "after" date is set to an incorrect value whenever the first job is done. I assume this happened since two of the jobs("mubJZ.....", and "rWNED......" were processed before the 24 at 00:00:00 and that's when they were supposed to run again like "eX6Q......")

Update2:
I have seen this problem #217 and the workaround seems to be implemented in job save,

schedule = @later?.schedule(doc.repeatWait)
 +      unless schedule and next = schedule.next(2, schedule.prev(1, doc.after))[1]

but if you look at job.done function later is also used but without that fix

          next = @later?.schedule(doc.repeatWait).next(2)
          if next and next.length > 0
            d = new Date(next[0])
            if (d - time > 500) or (next.length > 1)
              if d - time <= 500
                d = new Date(next[1])
              else
              wait = d - time
              if doc.repeatUntil - wait >= time
                jobId = @_rerun_job doc, doc.repeats - 1, wait

Not sure if that has anything to do with the problem, did you check that part too @mitar ?
(the 500 value seems a little weird to me)

Update3: I have verified that the problem comes from the previous code at job.done. If the server/db is busy and you set it like me to "every 24 hours", it is possible that the time of next[0] gets translated simply into a new Date() because of a later.js bug (if it gets done during the same 00 hour in my case) and this value can have a difference of more than 500 ms with the time when the variable time is created (db is busy or slow). In that case you stay with the value of next[0] which is the wrong date :(

I think the job.save workaround can be used also here to avoid this kind of problem.

Thanks for any update on this matter.

@vsivsi
Copy link
Owner

vsivsi commented May 25, 2017

Hi, just so you know, I'm about to drop off the internet for almost a month, so I won't be able to look into this until around June 20th. Sorry for any inconvenience.

@mitar
Copy link
Collaborator

mitar commented May 26, 2017

@tcastelli: Care to make a PR with a followup to #218 and with a test for what are you reporting here?

@tcastelli
Copy link
Author

I can try, but not really into coffeescript so a revision must be done by someone who is more fluent on that. I think by just copying and pasting your workaround would make it work for the other case too. For now, what we have done is just use "every 24hours at XX:XX" being xx:xx the time delimitation, making it harder for the error to happen (It should be done and finished within the same second)

@mitar
Copy link
Collaborator

mitar commented May 29, 2017

CoffeeScript will not bite you. :-) Try copying as you are suggesting and see if it works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants