Improve Sync To User performance (batch 3) #1898
Conversation
These are used for CSV upload; no need to serialize them.
AuditLog.create_logs_async(current_user, records_to_sync, "fetch", Time.current) unless disable_audit_logs?
records = records_to_sync

AuditLog.create_logs_async(current_user, records, "fetch", Time.current) unless disable_audit_logs?
Can we remove this? If we still need the logging, we could send it to Datadog I guess, but it's adding overhead to the sync path here:
records_by_class = records.group_by { |record| record.class.to_s }
records_by_class.each do |record_class, records_for_class|
log_data = {
user_id: user.id,
record_class: record_class,
record_ids: records_for_class.map(&:id),
action: action,
time: time
}.to_json
Note that the above is not done async; it's done before the handoff to the Sidekiq job.
If no one has looked at these logs in the past month or so, maybe we can just drop them.
We debated this a fair bit, and yes, that block of code is synchronous. No one has looked at these logs, not just in the past month or so, but ever in the last 3 years.
However, the argument for keeping this is that it's a fail-safe, I think; it's like insurance.
I'll post some numbers shortly around all of these, which will show that the time consumed by this is relatively small compared to the other things, which is why we didn't bother cleaning this up.
We can't drop these or move them to Datadog, AFAIK.
The purpose of this is to be able to audit which users have accessed specific data.
The example scenario we've used for this is: a legal authority asks us for information on which users have accessed data related to x.
Unless something has changed so that this is no longer a scenario we need to provide logs for, we'll need to keep this around in some form. Datadog doesn't work because we only retain data for 30 days there.
Correcto, this is a compliance issue that we need to be bullet-proof against. We'll have to take the perf hit for it. That being said, if there is any data massaging that we can push into an async job, all the better.
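To make the "push the data massaging into an async job" idea concrete, here's a minimal sketch: the request thread only collects class/id pairs, and the grouping plus JSON building happens inside the job. The worker name and the AUDIT_LOGGER call are hypothetical, not the existing implementation behind AuditLog.create_logs_async.

```ruby
# Hypothetical sketch only -- AuditLogWorker and AUDIT_LOGGER are assumed names.
class AuditLogWorker
  include Sidekiq::Worker

  # class_id_pairs: [["Encounter", "uuid-1"], ["BloodPressure", "uuid-2"], ...]
  def perform(user_id, class_id_pairs, action, time)
    class_id_pairs.group_by(&:first).each do |record_class, pairs|
      log_data = {
        user_id: user_id,
        record_class: record_class,
        record_ids: pairs.map(&:last),
        action: action,
        time: time
      }.to_json

      AUDIT_LOGGER.info(log_data)
    end
  end
end

# Request thread: no group_by, no to_json -- just class/id pairs.
AuditLogWorker.perform_async(
  current_user.id,
  records_to_sync.map { |r| [r.class.to_s, r.id] },
  "fetch",
  Time.current.iso8601
)
```

This keeps the audit trail intact while moving the per-class grouping and serialization off the sync path.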
end

def to_response(model)
  rename_attributes(model.attributes, inverted_key_mapping).as_json
  rename_attributes(model.attributes, to_response_key_mapping).as_json
Do we need `as_json` here? I think `to_json` does that for us, and so `Oj.dump` would do the same. We may be invoking `as_json` twice for every payload, worth testing at least.
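A rough, illustrative way to check whether the extra `as_json` pass costs anything measurable (the payload here borrows the Encounter model from the console session below; everything else is just a benchmark sketch):

```ruby
require "benchmark"
require "oj"

payload = Encounter.first.attributes

Benchmark.bm(22) do |x|
  x.report("as_json then Oj.dump") { 10_000.times { Oj.dump(payload.as_json, mode: :compat) } }
  x.report("Oj.dump only")         { 10_000.times { Oj.dump(payload, mode: :compat) } }
end
```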
I tried removing this in 3fbcacc, which caused these specs to fail. I haven't dug into this fully, but `as_json` does something weird with timestamps.
[3] pry(main)> e = Encounter.first
=> #<Encounter:0x00007fcb8b256f80
....
encountered_on: Thu, 23 Jan 2020,
created_at: Thu, 23 Jan 2020 07:36:27 UTC +00:00,
updated_at: Thu, 23 Jan 2020 07:36:27 UTC +00:00>
[4] pry(main)> e.as_json
=> { ...
"encountered_on"=>Thu, 23 Jan 2020,
"created_at"=>Thu, 23 Jan 2020 07:36:27 UTC +00:00,
"updated_at"=>Thu, 23 Jan 2020 07:36:27 UTC +00:00}
[5] pry(main)> e.as_json.as_json
=> {...
"encountered_on"=>"2020-01-23",
"created_at"=>"2020-01-23T07:36:27.163Z",
"updated_at"=>"2020-01-23T07:36:27.163Z"}
I didn't want to change this just in case the app is relying on this behaviour, and to keep the surface of change small. Will dig into this some more.
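For what it's worth, my understanding of why the second pass matters (stock ActiveSupport behaviour, not something specific to this codebase): the model's `as_json` leaves Date/Time objects in the hash values, and it's the hash's own `as_json` pass that converts each value to its string form.

```ruby
# Plain ActiveSupport behaviour (assumed; shown outside the app for illustration).
require "active_support/all"

h = {
  "encountered_on" => Date.new(2020, 1, 23),
  "created_at" => Time.utc(2020, 1, 23, 7, 36, 27)
}

h["created_at"].as_json # => "2020-01-23T07:36:27.000Z"
h.as_json               # => {"encountered_on"=>"2020-01-23", "created_at"=>"2020-01-23T07:36:27.000Z"}
```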
Couldn't figure out a fix here. Calling `Oj.add_to_json` doesn't have any effect either. Leaving this be for now; I want to avoid unknowingly breaking timestamps for the apps.
"process_token" => encode_process_token(response_process_token) | ||
}, | ||
}, mode: :compat), |
Have you tried using the Oj optimized versions of encoding for some of the core objects? Trying that for dates, times, arrays, and hashes would be interesting, to see if it helps serialization time. See https://github.com/ohler55/oj/blob/develop/pages/JsonGem.md ... I think we would need calls like this in an initializer somewhere:
Oj.add_to_json(Array)
Oj.add_to_json(Hash)
etc...
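If we try this, a sketch of what that initializer could look like (the file name and class list are guesses; the Oj docs also mention `Oj.optimize_rails` as a broader switch):

```ruby
# config/initializers/oj.rb -- illustrative sketch, not a tested config.
require "oj"

# Replace the json gem's to_json on these core classes with Oj's optimized
# encoders (see the JsonGem compatibility page linked above).
[Hash, Array, Time, Date, DateTime, BigDecimal].each do |klass|
  Oj.add_to_json(klass)
end
```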
def inverted_key_mapping
  key_mapping.invert
def to_response_key_mapping
  from_request_key_mapping.invert
Love this naming change.
Looks good to me.
Story card: ch2086
Because
Whilst perf-testing block-syncs, we discovered various inefficiencies in our sync (GET) code-paths that get exacerbated when many users re-sync at the same time.
A full list of improvements and the rationale is here.
This addresses
This cleans up some CPU-intensive, inefficient code from our sync-to-user pipeline. Most noticeably: serializing responses with Oj (instead of `to_json`). This is a drop-in solution and improves our `render` times by ~2.5x.
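For reference, the shape of the drop-in (simplified from the diff above; the `patients_payload` name here is hypothetical):

```ruby
# Before: Rails walks the hash itself via as_json/to_json when rendering.
render json: {
  "patients" => patients_payload,
  "process_token" => encode_process_token(response_process_token)
}

# After: serialize once with Oj in :compat mode and hand render a string.
render json: Oj.dump({
  "patients" => patients_payload,
  "process_token" => encode_process_token(response_process_token)
}, mode: :compat)
```

The `mode: :compat` matches the diff above, keeping the output aligned with the json gem's behaviour.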