Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[prototype] Introduce KeyGrouping to make it possible to enable OrderedSerialization without code changes #1857

Open
wants to merge 3 commits into
base: 0.17.x
Choose a base branch
from

Conversation

ttim
Copy link
Collaborator

@ttim ttim commented Jun 2, 2018

We're considering to enable OrderedSerialization for most users at Twitter. Currently we have a blocker for that - users needs to change a source code to enable it (and not only pass a parameter, which is much more beneficial for us because we have auto-tuning infrastructure to make incremental land of such things possible).

I've tried to prototype how we can workaround this - basic idea is to introduce KeyGrouping class which holds implicitly provided Ordering and generated OrderedSerialization. With this approach we can choose which one of them to use in runtime.
Another pros of this approach:

  • users will not be needed to import OrderedSerialization if they want to use it - it would be provided by default
  • Scalding's key grouping will not mix with general orderings in the code - right now when you import OrderedSerialization on top level you can accidentally break some other ordering use cases.

Another thing which is nice about this approach is the fact it's source backward compatible.

This is just prototype (it's not even holds both OrderedSerialization and Ordering, I used it to gather Ordering's stats across our repo) but it illustrates the idea.

@johnynek what do you think?

@CLAassistant
Copy link

CLAassistant commented Jun 2, 2018

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ttim
❌ Timur Abishev


Timur Abishev seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Collaborator

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My knee jerk is reaction is that we should be very suspect of an unusual macro here. We have thought about having OrderedSerialization required but I want to explore this carefully. Another approach is a wrapper that requires OrderedSerialization.

If you have users with conflicting Orderings, can we examine that a bit more? How commonly? What are the use cases? Why does it conflict with OrderedSerialization which is a superset?

val desc = ordSerDesc(implicitInScope.toString())
.getOrElse(implicitInScope.toString())
val path =
if (pos.source.path.contains("/workspace/"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this?

// implicit def unsafeConvert[T](ord: Ordering[T]): KeyGrouping[T] = KeyGrouping(ord)

implicit def convert[T]: KeyGrouping[T] = macro KeyGroupingMacro[T]
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment what this macro is and why it is needed?

@@ -1 +1 @@
version in ThisBuild := "0.17.2"
version in ThisBuild := "0.17.2"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version is out of date.

}
import scala.math.Ordering

case class KeyGrouping[T](ord: Ordering[T])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we introduce a new type here I’d like to be a sealed trait and either be an Ordering or an OrderedSerialization.

@johnynek
Copy link
Collaborator

Thinking back at this...

We could have an alternative approach:
require OrderedSerialization everywhere we use Ordering, but have a low priority implicit that uses Kryo to supply the implicit if you have an ordering.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants