-
Notifications
You must be signed in to change notification settings - Fork 705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge develop into cascading3 #1762
Conversation
Conflicts: CHANGES.md README.md project/Build.scala scalding-core/src/main/scala/com/twitter/package.scala tutorial/execution-tutorial/ExecutionTutorial.scala version.sbt
* Set up repl tutorial * Move wonderland tutorial under tutorial/
* Clean up temporary files created by forceToDiskExecution * Try disabling travis caching * Revert "Try disabling travis caching" This reverts commit b9a04b6. * Forward filesToCleanup to parent in cleanCache * consistency * synchronize in addFilesToCleanup
…empty-dir Revert FileSource support for empty directories
…es. (twitter#1627) * TypedText - Added versions of dailyPrefixSuffix for TSV and CSV sources. * TypedText - Refactored functions for loading files with different types of delimiters.
* upgrade sbt * fix repl
…witter#1626) * Allow supplying better error messages and chained exceptions in InvalidSourceTap / InvalidSourceException * Add 2ndary constructor to InvalidSourceException
Bump chill version to 0.8.3 and fix started to fail `ConfigTest` because `void` is serializable by default in Kryo now.
…ache_execution Added API for Execution/Config to work with DistributedCache
Pick up Algebird 0.13.0
…_vkvs Use JobConf input files list for input size computation used by ReducersEstimators
Using MAP_OUTPUT_BYTES instead of FILE_BYTES_READ in RatioBasedEstimator
…ator Fixed reduce estimator for paths with a glob pattern
Expand libjars globs in ScaldingShell to match the behavior of Tool.
This PR cherry-picks changes to CHANGES.md file made during 0.17.2 release. Also this PR includes mima check related changes, but disables mima check because it's intended that binary compatibility can be broken (and it is) on develop branch.
…ph-lib Add summingbird graph library as a zero-dep module
|
looks like the hadoop tests has one test failure:
Thanks a ton for working on this. Would love to get this merged and move to cascading 3. |
Yeah there's an issue in the hadoop test and some failures in the estimator-tests. I think I might have missed something in the merge as there were a bunch of reducer & memory estimator changes that went into develop. I'm working on this on the side so if you're urgently blocked on this, I wouldn't want to hold you up so I can grant push access to my branch (or you can fork this one). |
Actually @johnynek looking at the job that fails:
The check seems to be failing on the fact that we are expecting 4 steps (we get 3 though). Tracking back the original bug - cwensel/cascading#52 it was more regarding the fact that the Cascading3 planner failed on this code path. I'm not sure if the 3 steps is a bad thing? |
@piyushnarang I agree. I think you can change the test to be I think 3 is better right? 1 Map-only job for the second part, and 1 MR job for the forced part, and 1 Map only for the force -> write. Seems like 3 is what I would have expected. |
Yeah that's what I was thinking too. Ok, I switched this out. We have some failures in the estimators left and then we should be good to go (famous last words) :-) |
very exciting. We are somehow hitting constant problems in planning now as Stripe. Many jobs take hours to plan and some have been stuck in planning for over 36 hours until we killed them. I am hoping that cascading 3 can fix this issue for us. |
hey @johnynek - so our tests are green now. There's a lot of changes here as we've diverged a fair bit. I've taken a pass to sanity check but I might have missed something :-). Looking at the diff against develop (develop...piyushnarang:cascading3) it largely looks like only cascading3 changes there. |
minor issues: develop...piyushnarang:cascading3#diff-0336b62cdc4bda6c450ac5c6a3e3fc52R273 I don't see why the recursion on the head uses the descriptions but not the tail. Why the change with respect to the current code? |
I think we should go ahead and merge this after @piyushnarang comments on my above question. I reviewed the diff. Looks expected. The only issue, I guess is some template taps were removed. I hope that is not a major downstream pain. Note: this merge is into the twitter:cascading3 branch, not develop. @pankajroark can someone at Twitter kick-off a test run of a version of scalding against that branch? If it looks good, we should merge. We are really struggling with planner runtime issues at Stripe currently and I am hopeful the new planner will help. We are creating very big jobs (since our data is smaller than yours, people can write much more complex jobs that still complete in reasonable time), so we have hundreds of steps. |
i'm going to merge this into cascading3 branch and try to fix up with a followup PR changing the build to point to https for more repos. |
Putting this up to have CI kick off tests so that we can move on to fixing tests and cleaning up stuff I might have messed up as the merge was pretty large.