
Improve Sync To User performance (batch 3) #1898

Merged: 6 commits merged into master on Dec 23, 2020
Conversation

prabhanshuguptagit (Contributor) commented Dec 17, 2020

Story card: ch2086

Because

Whilst perf-testing block syncs, we discovered various inefficiencies in our sync (GET) code paths that get exacerbated when many users re-sync at the same time.

A full list of improvements and the rationale is here.

This addresses

This cleans up some CPU-intensive, inefficient code in our sync-to-user pipeline. Most notably:

  • Use Oj instead of the default Rails serializer (to_json). This is a drop-in replacement and improves our render times by ~2.5x; a sketch of the change is shown after this list.
  • Some cleaning up in the transformers.
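For illustration only (this is not code from the PR; the controller snippet and variable names are hypothetical), a minimal sketch of the Oj drop-in, where mode: :compat keeps the output compatible with the default Rails/JSON-gem behaviour:

    require "oj"

    # Hypothetical controller snippet; `payload` stands in for the
    # already-transformed response hash a sync controller builds.
    #
    # Before (default Rails serializer):
    #   render json: payload.to_json
    #
    # After (drop-in replacement; :compat mode mimics the JSON gem's output):
    render json: Oj.dump(payload, mode: :compat)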

-  AuditLog.create_logs_async(current_user, records_to_sync, "fetch", Time.current) unless disable_audit_logs?
+  records = records_to_sync
+
+  AuditLog.create_logs_async(current_user, records, "fetch", Time.current) unless disable_audit_logs?
Contributor:

Can we remove this? If we still need the logging, we could send it to Datadog I guess, but it's adding overhead to the sync path here:

    records_by_class = records.group_by { |record| record.class.to_s }

    records_by_class.each do |record_class, records_for_class|
      log_data = {
        user_id: user.id,
        record_class: record_class,
        record_ids: records_for_class.map(&:id),
        action: action,
        time: time
      }.to_json

Note that the above is not done async; it's done before the handoff to the Sidekiq job.

If no one has looked at these logs in the past month or so, maybe we can just drop them.

Contributor:

We debated this a fair bit, and yes, that block of code is synchronous. No one has looked at these logs, not just in the past month or so, but ever in the last 3 years.

However, the argument for keeping this is that it acts as a fail-safe, I think; it's like insurance.

I'll post some numbers shortly for all of these, which will show that the time consumed by this is small relative to the other things, which is why we didn't bother cleaning it up.

Contributor:

We can't drop these or move them to Datadog, afaik.

The purpose of this is to be able to audit which users have accessed specific data.
The example scenario we've used for this is: a legal authority asks us for information on which users have accessed data related to x.

Unless something has changed so that this is no longer a scenario we need to provide logs for, we'll need to keep this around in some form. Datadog doesn't work because we only retain data for 30 days there.

Contributor:

Correcto, this is a compliance issue that we need to be bullet-proof against. We'll have to take the perf hit for it. That being said, if there is any data massaging that we can push into an async job, all the better.
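Purely as an illustration of the "push the data massaging into an async job" idea (this is not code from the PR; the worker, queue, and logging names below are hypothetical), the group_by and JSON building could move inside the Sidekiq job, leaving only a cheap map on the sync path:

    # Hypothetical worker: only (class name, id) pairs are built synchronously;
    # the grouping and JSON encoding happen in the background job.
    class CreateAuditLogsWorker
      include Sidekiq::Worker

      # class_name_id_pairs looks like [["Patient", "uuid-1"], ["Encounter", "uuid-2"], ...]
      def perform(user_id, class_name_id_pairs, action, time)
        class_name_id_pairs
          .group_by { |(class_name, _id)| class_name }
          .each do |record_class, pairs|
            log_data = {
              user_id: user_id,
              record_class: record_class,
              record_ids: pairs.map { |(_class_name, id)| id },
              action: action,
              time: time
            }.to_json
            Rails.logger.info(log_data)
          end
      end
    end

    # The sync path would then only do:
    # CreateAuditLogsWorker.perform_async(
    #   current_user.id,
    #   records_to_sync.map { |record| [record.class.to_s, record.id] },
    #   "fetch",
    #   Time.current.iso8601
    # )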

Contributor (author):

[screenshot: response-time breakdown showing the audit logging step]

Fwiw this takes very little time (only ~1% of the response time, ~20ms). I think we can keep this around, even without moving out the data massaging, and come back to it later in a fun Friday cleanup.

   end

   def to_response(model)
-    rename_attributes(model.attributes, inverted_key_mapping).as_json
+    rename_attributes(model.attributes, to_response_key_mapping).as_json
Contributor:

Do we need as_json here? I think to_json does that for us, and so Oj.dump would do the same. We may be invoking as_json twice for every payload; worth testing at least.
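One way to check that would be a quick micro-benchmark with Ruby's standard Benchmark module (an illustrative sketch; the model and record count are placeholders, not from this PR):

    require "benchmark"

    records = Encounter.limit(1_000).to_a

    Benchmark.bm(15) do |x|
      x.report("as_json once")  { records.each(&:as_json) }
      x.report("as_json twice") { records.each { |r| r.as_json.as_json } }
    end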

prabhanshuguptagit (author) commented Dec 18, 2020:

I tried removing this in 3fbcacc, which caused these specs to fail. I haven't dug into this fully, but as_json does something weird with timestamps:

[3] pry(main)> e = Encounter.first
=> #<Encounter:0x00007fcb8b256f80
....
 encountered_on: Thu, 23 Jan 2020,
 created_at: Thu, 23 Jan 2020 07:36:27 UTC +00:00,
 updated_at: Thu, 23 Jan 2020 07:36:27 UTC +00:00>
[4] pry(main)> e.as_json
=> { ...
 "encountered_on"=>Thu, 23 Jan 2020,
 "created_at"=>Thu, 23 Jan 2020 07:36:27 UTC +00:00,
 "updated_at"=>Thu, 23 Jan 2020 07:36:27 UTC +00:00}
[5] pry(main)> e.as_json.as_json
=> {...
"encountered_on"=>"2020-01-23",
 "created_at"=>"2020-01-23T07:36:27.163Z",
 "updated_at"=>"2020-01-23T07:36:27.163Z"}

I didn't want to change this in case the app is relying on this behaviour, and to keep the surface of change small. Will dig into this some more.

prabhanshuguptagit (author) commented Dec 18, 2020:

Couldn't figure out a fix here. Calling Oj.add_to_json doesn't have any effect either. Leaving this be for now; I want to avoid unknowingly breaking timestamps for the apps.

"process_token" => encode_process_token(response_process_token)
},
}, mode: :compat),
Contributor:

Have you tried using the Oj optimized versions of encoding for some of the core objects? Trying that for dates, times, arrays, and hashes would be interesting, to see if it helps serialization time. See https://github.com/ohler55/oj/blob/develop/pages/JsonGem.md ... I think we would need calls like this in an initializer somewhere:

Oj.add_to_json(Array)
Oj.add_to_json(Hash)
etc...
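For concreteness, a sketch of where those calls could live (the file path and the exact set of classes are assumptions, not something settled in this PR):

    # config/initializers/oj.rb (hypothetical path)
    require "oj"

    # Give core classes Oj's optimized to_json, per the JsonGem compatibility
    # doc linked above.
    Oj.add_to_json(Array)
    Oj.add_to_json(Hash)
    Oj.add_to_json(Time)
    Oj.add_to_json(Date)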

-  def inverted_key_mapping
-    key_mapping.invert
+  def to_response_key_mapping
+    from_request_key_mapping.invert
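For context, a minimal sketch of the idea behind the rename in the hunk above (the module name and mapping contents are illustrative, not taken from the codebase): the request-to-model key mapping is defined once and inverted to build responses.

    # Hypothetical transformer; the real key mappings live elsewhere in the app.
    module ExampleTransformer
      def self.from_request_key_mapping
        # maps API payload keys to model attribute names (example values only)
        {"created_at" => "device_created_at", "updated_at" => "device_updated_at"}
      end

      def self.to_response_key_mapping
        from_request_key_mapping.invert
      end
    end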
Contributor:

Love this naming change.

harimohanraj89 (Contributor) left a comment:

Looks good to me.

vkrmis merged commit 8ff7ba1 into master on Dec 23, 2020
vkrmis deleted the sync-perf-3 branch on December 23, 2020 13:31
kitallis added a commit that referenced this pull request on Dec 24, 2020:
* Revert "Improve Sync To User performance (batch 3) (#1898)"

This reverts commit 8ff7ba1.

* Revert "Improve Sync To User performance (batch 2) (#1897)"

This reverts commit d78d0ca.

* Add the old sync indexes back
kitallis pushed a commit that referenced this pull request Dec 30, 2020
We discovered that the perf improvements for block-level sync tend to slow down FG syncing. We reverted the improvements (#1920) to unblock deploys while a fix was figured out.

This combines the reverted perf improvements (#1898 and #1897) and fixes them to be FG compatible.