Releases: mars-project/mars
v0.10.0
What's Changed
- Optimize tile of DataFrame.setitem by reducing time of generating chunk meta by @qinxuye in #3140
- Increase the default value of alru cache max size by @zhongchun in #3146
- Support scipy special function with tuple output by @RandomY-2 in #3139
- Fix `DAG.to_dot` when reducers have multiple outputs by @chaokunyang in #3150
- Fix deserializing RandomStateField when its value is None by @chaokunyang in #3149
- Patch pandas magic functions to allow reverse operands by @wjsi in #3155
- Run flaky test `test_load_third_party_modules` separately by @chaokunyang in #3162
- Manually install cri-dockerd before installing kubernetes by @wjsi in #3166
- [Shuffle] Add `n_mappers` and `n_reducers` to `ShuffleProxy` by @chaokunyang in #3160
- [Ray] task based shuffle for ray by @chaokunyang in #3040
- Add support for `{DataFrame,Series}.align` by @wjsi in #3147
- Integrate remaining error functions and fresnel integrals except `fresnel_zeros` by @RandomY-2 in #3172
- Improve numexpr fusion by @fyrestone in #3177
- Ensure key is a valid Python identifier by @fyrestone in #3190
- Bump terser from 5.7.1 to 5.14.2 in web component by @dependabot in #3194
- Implement airy functions (except the `ai_zeros` and `bi_zeros` functions) by @shantam-8 in #3195
- Disable version updates for dependabot by @wjsi in #3203
- [Ray] Fix ray memory leak by @fyrestone in #3184
- [Ray] Support reducer has inputs which isn't mapper by @chaokunyang in #3206
- Refine UT and logs by @fyrestone in #3204
- release actor lock when set_subtask_result by @chaokunyang in #3210
- Refine apply key generation by @chaokunyang in #3208
- fix remove mapper data by @chaokunyang in #3214
- [Ray] Configurable subtask num_cpus by @fyrestone in #3207
- Fix versioneer compatibility with PEP600 by @chaokunyang in #3223
- Support get mappers data without index/mapperids by @chaokunyang in #3222
- [Ray] RayExecutionContext.get_chunk_meta from meta service by @fyrestone in #3212
- [Ray] Share RayTaskState across tasks by @fyrestone in #3219
- [Shuffle] Support shuffle operands mapper whose outputs aren't mapper blocks by @chaokunyang in #3228
- Apply Operand Closure clean up by @vcfgv in #3205
- Fix dataframe sort_values with multiple ascendings bug in pandas < 1.4 by @fyrestone in #3234
- Lifecycle gc task service by @fyrestone in #3230
- Fix dataframe loc with slice returns incorrect results by @fyrestone in #3241
- Fix dataframe setitem bugs when partial indexes exist in target dataframe by @fyrestone in #3240
- [Shuffle] isolate mappers in different subtasks for fetch_by_index mode by @chaokunyang in #3239
- TypeDispatcher support one type multiple serializers by @fyrestone in #3242
- [Shuffle] Skip store shuffle object refs to reduce meta overhead by @chaokunyang in #3209
- [ray] Support scheduling ray tasks in Ray oscar deploy backend by @chaokunyang in #3165
- Dump subtask graph for all backends by @fyrestone in #3245
- [Metrics] Fix metrics and docs by @zhongchun in #3233
- Remove storage service from supervisor by @vcfgv in #3254
- Fix optimization rule memory leak by @fyrestone in #3246
- fsspec integration by @hekaisheng in #3253
- [Ray] Enable CI of mars/dataframe for Ray DAG by @fyrestone in #3250
- Fix minikube installation by @hekaisheng in #3244
- Implements scipy.stats.rankdata by @shantam-8 in #3218
- Add S3 support by @fyrestone in #3258
- Fix tensor frexp by @fyrestone in #3259
- Optimize the display of task process bar by @zhongchun in #3264
- [Ray] Optimize ray executor submit subtask by @fyrestone in #3271
- [Ray] Enable CI of mars/learn for Ray DAG by @fyrestone in #3261
- [Ray] Enable CI of mars/tensor for Ray DAG by @fyrestone in #3275
- Compatible with pandas 1.5.0 by @hekaisheng in #3276
- Remove skip_ray_dag mark for raydataset tests by @vcfgv in #3255
- MapChunk Operand Closure and Callable cleanup by @vcfgv in #3238
- [Ray] Spread scheduling subtasks with empty dependencies by @fyrestone in #3281
- Speedup mars deserialization by new by @chaokunyang in #3283
- A cython-based ordered_set to speedup `discard` operation by @chaokunyang in #3277
- Optimize concat by @fyrestone in #3286
- Fix `md.concat` error when there are same fetch chunk data by @zhongchun in #3285
- [Ray] Improve Ray executor GC by @fyrestone in #3287
- Fix some CI issues by @hekaisheng in #3296
- [Ray] Implement Ray executor subtask GC by @fyrestone in #3294
- [Ray] Add metrics for Ray executor by @fyrestone in #3295
- Bump up required vineyard version to address the CI failure. by @sighingnow in #3298
- [Operand] support loc setitem by @chaokunyang in #3291
- [Ray] Support worker_mem for ray executor by @fyrestone in #3300
- Fix duplicate execution by @fyrestone in #3301
- Fix CI by @hekaisheng in #3306
- [Ray] Basic slow subtask detection by @fyrestone in #3305
- Fix stats tests and pin sphinx version by @hekaisheng in #3313
- Fix s3 client kwargs by @fyrestone in #3316
- Update Mars on Ray doc by @fyrestone in #3311
Full Changelog: v0.10.0a1...v0.10.0
v0.10.0a1
These are the release notes for v0.10.0a1. See here for the complete list of solved issues and merged PRs.
New Features
- Oscar
  - Stop importing main module when starting Mars local cluster (#3110)
- Tensor
- DataFrame
Enhancements
- Disable bloom filter in merge for now (#2967)
- [Ray] Implement ray task executor progress (#3008)
- Dump remote tracebacks to make local ones more friendly (#3028)
- Use tell when remove mapper data after execution (#3027)
- Optimize import speed for Mars package (#3022)
- Do not aggressively choose tree method in tile of groupby for distributed setting (#3032)
- [Ray] Implements get_chunks_result for Ray execution context (#3023)
- Refine ThreadedServiceContext.get_chunks_meta usage (#3037)
- Shuffle both sides at the same time for `md.merge` (#3041)
- Assign reducer ops in task assigner to make them more balanced across cluster (#3048)
- [Ray] Destroy Ray executor when the task finish (#3049)
- [Ray] Implements get_chunks_meta for Ray execution context (#3052)
- [Ray] Support basic subtask retry and lineage reconstruction (#2969)
- Combine tree and shuffle methods in `DataFrameGroupBy.agg` tile (#3051)
- [Ray] Implements get_total_n_cpu for Ray execution context (#3059)
- [Ray] Implement cancel method on Ray task executor (#3044)
- Use OS-designated ports instead of random ports to create sub pools (#3053)
- Unify DataFrameGroupByAgg's tile logic for auto method (#3084)
- Simplify router clean up when pools or clusters ends (#3086)
- Call immutable web API only once when previous call blocks (#3085)
- [Ray] Create RayTaskState actor as needed by default (#3081)
- [Ray] Implement gc for ray task executor context (#3061)
- Simplify argument passing in actor batch calls (#3098)
- Optimize performance of transfer (#3091)
- Add `n_reducers` and `reducer_ordinal` to shuffle operands (#3055)
- Optimize serializable memory (#3120)
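
As background for the "OS-designated ports" item (#3053): the general technique is to bind a socket to port 0 and let the operating system choose a free port, which avoids picking a random port that is already in use. The following is a minimal standard-library sketch of that idea, not the Mars implementation; the helper name `bind_os_assigned_port` is hypothetical.

```python
import socket


def bind_os_assigned_port(host="127.0.0.1"):
    """Bind a listening TCP socket to port 0 so the OS picks a free port.

    Hypothetical helper for illustration only; it is not part of Mars.
    Returns the bound socket together with the port the OS assigned.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the OS for any free port
    sock.listen()
    return sock, sock.getsockname()[1]


# Two sockets that are bound at the same time receive distinct ports.
sock_a, port_a = bind_os_assigned_port()
sock_b, port_b = bind_os_assigned_port()
print(port_a, port_b)

sock_a.close()
sock_b.close()
```

Keeping the socket open until the port is handed over is what prevents another process from taking the same port in the meantime.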
Bug fixes
- Fix errors when deleting mapper data (#3018)
- Fix recursive_tile that it may cause duplicated tile for one tileable (#3021)
- Fix error message when sparse data format not supported (#3046)
- Patch pandas to make pickle compatible between 1.2 and 1.3 (#3047)
- Fix chunk index error in auto_merge_chunks (#3057)
- [Ray] Fix ray worker failover (#3080)
- [Metric] Fix prometheus metric backend (#3124)
- Fix mt.{cumsum, cumprod} when the first chunk is empty (#3134)
Tests
- Check initialization of serializables on CI (#3007)
- Use @pytest_asyncio.fixture instead of @pytest.fixture for async fixtures (#3025)
- Change code owners to Mars PMC maintainers (#3031)
- [Ray] Fix ray executor progress test (#3033)
- [Ray] Optimize Ray CI execution time and stability (#3102)
- Make test_session_set_progress more stable under Ray tests (#3103)
- Update pytest imports for test_special.py (#3129)
- [Ray] Fix flaky test `test_optional_supervisor_node` (#3133)
Others
v0.9.0
These are the release notes for v0.9.0. See here for the complete list of solved issues and merged PRs.
These release notes only cover the differences from v0.9.0rc3; for all highlights and changes, please refer to the release notes of the pre-releases:
alpha1
alpha2
beta1
beta2
rc1
rc2
rc3
Changes that break compatibility
From v0.9 on, support for Python 3.6 is dropped.
Highlights
- Performance has been optimized throughout this version; feedback is welcome.
New Features
- Oscar
  - Stop importing main module when starting Mars local cluster (#3113)
- Tensor
- DataFrame
Enhancements
- Dump remote tracebacks to make local ones more friendly (#3030)
- Optimize import speed for Mars package (#3035)
- [Ray] Implement ray task executor progress (#3065)
- Shuffle both sides at the same time for `md.merge` (#3066)
- Refine ThreadedServiceContext.get_chunks_meta usage (#3067)
- Do not aggressively choose tree method in tile of groupby for distributed setting (#3070)
- Disable bloom filter in merge for now (#3071)
- [Ray] Implements get_chunks_result for Ray execution context (#3072)
- Use tell when remove mapper data after execution (#3073)
- Assign reducer ops in task assigner to make them more balanced across cluster (#3075)
- [Ray] Destroy Ray executor when the task finish (#3074)
- Combine tree and shuffle methods in `DataFrameGroupBy.agg` tile (#3077)
- [Ray] Implements get_chunks_meta for Ray execution context (#3076)
- Use OS-designated ports instead of random ports to create sub pools (#3087)
- Call immutable web API only once when previous call blocks (#3088)
- Unify DataFrameGroupByAgg's tile logic for auto method (#3094)
- [Ray] Support basic subtask retry and lineage reconstruction (#3097)
- Simplify argument passing in actor batch calls (#3100)
- [Ray] Implements get_total_n_cpu for Ray execution context (#3104)
- Optimize performance of transfer (#3105)
- Add `n_reducers` and `reducer_ordinal` to shuffle operands (#3107)
- [Ray] Implement cancel method on Ray task executor (#3093)
- [Ray] Create RayTaskState actor as needed by default (#3114)
- [Ray] Implement gc for ray task executor context (#3116)
- Optimize serializable memory (#3126)
Bug fixes
- Patch pandas to make pickle compatible between 1.2 and 1.3 (#3050)
- Fix errors when deleting mapper data (#3064)
- Fix chunk index error in auto_merge_chunks (#3068)
- Fix recursive_tile that it may cause duplicated tile for one tileable (#3069)
- [Ray] Fix ray worker failover (#3115)
- [Ray] Fix pandas schema parsing when reading Ray dataset (#3117)
- [Ray] fix auto scale-in hang (#3125)
- [Metric] Fix prometheus metric backend (#3127)
- Fix mt.{cumsum, cumprod} when the first chunk is empty (#3136)
Tests
- Check initialization of serializables on CI (#3013)
- [Ray] Optimize Ray CI execution time and stability (#3121)
- Update pytest imports for test_special.py (#3131)
- [Ray] Fix flaky test test_optional_supervisor_node (#3135)
Others
- Build web code before CIBW when deploying to PyPI (#3016)
v0.8.7
v0.9.0rc3
These are the release notes for v0.9.0rc3. See here for the complete list of solved issues and merged PRs.
New Features
- Tensor
  - Implementing Ellipsoidal Harmonics Functions (#2891, thanks @shantam-8!)
- Services
Enhancements
- Add execution API to enable customization of Mars Task Service (#2894)
- Optimize serialization performance (#2914)
- Skip adding band in meta when fetch shuffle data (#2922)
- Store complete meta on worker and update supervisor meta via fetching from workers (#2912)
- Use cython to accelerate core serialization (#2924)
- Refine lifecycle api to support incref or decref with ref counts (#2926)
- Ignore fetch operands when assign initial nodes (#2929)
- Use cython to accelerate message serialization (#2932)
- Ignore broadcaster's locality when assign subtasks (#2943)
- Allow spawning serialization to threads for large objects (#2944)
- Add metrics and event report for Ray channels (#2936)
- Add more logs about execution info (#2940)
- Add support for `dask.persist` (#2953, thanks @loopyme!)
- Remove `should_be_monotonic` property (#2949)
- Add metrics on operand and subtask executions (#2947, thanks @zhongchun!)
- [Ray] optimize ray fetcher by query in remote node (#2957)
- Improve deploy backend (#2958)
- Support reporting tile progress (#2954)
- Add logic key for tileable graph (#2961, thanks @zhongchun!)
- [Ray] Loads the subtask inputs from meta (#2976)
- New ExecutionConfig API (#2968)
- Fix speculative execution compatibility with coloring (#2995)
- Make functions that may take long run in thread for lifecycle tracker (#2992)
- Optimize metric configs (#2996, thanks @zhongchun!)
- Expand the ability of resource evaluator (#2997, thanks @zhongchun!)
- Optimize gen subtask graph (#3004)
- [Ray] Ray execution state (#3002)
Bug fixes
- Fix parameter issue of worker actor pool (#2911, thanks @zhongchun!)
- Fix default config to ensure storage backends configured (#2935)
- Wrap errors in operand execution to protect scheduling service (#2964)
- Fix dtype of series result for `DataFrame.apply` (#2978)
- Fix potential data leak for shuffle tasks (#2975)
- Fix potential empty chunks when creating DataFrame from pandas (#2987)
- [Ray] Support new ray cluster through ray client (#2981)
- Fix missing extra_params when constructing operands (#2999)
- Fix `msg_to_simple_str` in Ray backend and add tests (#3003)
- Fix incorrect result for `df.sort_values` when specifying multiple ascending (#2984)
Documentation
- Add development documents for metrics (#2955, thanks @zhongchun!)
Tests
v0.8.6
These are the release notes for v0.8.6. See here for the complete list of solved issues and merged PRs.
New Features
- Tensor
  - Implementing Ellipsoidal Harmonics Functions (#2927, thanks @shantam-8!)
Enhancements
- Add support for `dask.persist` (#2990, thanks @loopyme!)
- Optimize gen subtask graph (#3006)
- Ignore broadcaster's locality when assign subtasks (#2994)
Bug fixes
- Fix task hang when error object cannot be pickled (#2913)
- Fix potential KeyError in actor_ref calls when running with multiple processes (#2962)
- Wrap errors in operand execution to protect scheduling service (#2971)
- Fix dtype of series result for `DataFrame.apply` (#2979)
- Fix default config to ensure storage backends configured (#2989)
- Fix potential empty chunks when creating DataFrame from pandas (#2991)
- Fix incorrect result for `df.sort_values` when specifying multiple ascending (#3006)
- Fix missing extra_params when constructing operands (#3006)
Tests
- Fix version mismatch between kubernetes and minikube (#2988)
v0.9.0rc2
These are the release notes for v0.9.0rc2. See here for the complete list of solved issues and merged PRs.
New Features
- Web
  - Add stack display page on Mars Web (#2876)
Enhancements
- Avoid printing too many messages in Oscar (#2871)
- Expand slot scheduler to resource scheduler (#2846, thanks @zhongchun!)
- Optimized iterative tiling by pruning unrelated chunks (#2874)
- Optimize `DataFrameIsin`'s tile (#2864)
- Add benchmark for serialization (#2901)
- [Ray] Ray client channel get recv when first complied (#2740, thanks @Catch-Bull!)
- Use bloom filter to optimize df.merge execution (#2895)
- Stop recording all mapper meta (#2900)
- [Ray] Use main pool as owner when autoscale disabled (#2878)
Bug fixes
- Fix XGBoost when some workers do not have `evals` data (#2861)
- Fix duplicate node iteration in GraphAssigner (#2857)
- Raise ActorNotExist when no supervisors available (#2859)
- Fix dtype infer in DataFrame arithmetic on datetime consts (#2879)
- Fix timeout for `wait_task` (#2883)
- Make sure error can be raised in `Actor.__pre_destroy__` (#2887)
Tests
v0.8.5
These are the release notes for v0.8.5. See here for the complete list of solved issues and merged PRs.
New Features
- Web
  - Add stack display page on Mars Web (#2881)
Enhancements
- Avoid printing too many messages in Oscar (#2880)
- [Ray] Use main pool as owner when autoscale disabled (#2903)
Bug fixes
- Fix XGBoost when some workers do not have `evals` data (#2863)
- Raise ActorNotExist when no supervisors available (#2869)
- Fix dtype infer in DataFrame arithmetic on datetime consts (#2880)
- Fix duplicate node iteration in GraphAssigner (#2880)
- Fix timeout for `wait_task` (#2890)
- Make sure errors can be raised in `Actor.__pre_destroy__` (#2892)
Tests
v0.9.0rc1
These are the release notes for v0.9.0rc1. See here for the complete list of solved issues and merged PRs.
New Features
- Tensor
  - Implements `mars.tensor.setdiff1d` (#2823)
- Learn
  - Added support for `mars.learn.metrics.roc_auc_score` (#2832)
- Services
  - A speculative execution based task scheduler (#2576)
- Metric
  - [ray] Add metric for ray object store (#2776, thanks @Catch-Bull!)
- Others
  - Use versioneer to manage release versions (#2806)
Enhancements
- Support generating a DOT file for subtask graph (#2803)
- Support generating dtypes, index_value etc lazily for DataFrame chunks (#2756)
- [ray] Default enable fault tolerance for ray (#2801)
- Improve subtask details in logs (#2836)
- Accurate resource management for global slot manager (#2732)
- Configure nthread of XGBoost jobs (#2844)
- Improved performance of `mars.learn.metrics.{roc_curve, roc_auc_score}` (#2838)
- Bump minimist and nanoid in Mars UI due to security alerts (#2849)
- Fix store duplicate chunk and meta per subtask (#2845)
Bug fixes
- Fix default value of `gpu` property for some operands (#2811)
- Fix the failure on Vineyard CI by ensuring the input tensor chunk is a NumPy ndarray (#2817)
- Fix race condition of `set_subtask_result` (#2784)
- Fix duplicate subtask submit (#2815)
- Change `StorageHandlerActor` to stateful (#2824)
- Fix running xgboost on Ray cluster (#2826)
- Fix `FileSystem.ls` for OSS (#2837)
- Stop fetching data when pure dependencies specified (#2840)
- Fix dirty version number caused by versioneer when building with cibuildwheel (#2855)
Tests
v0.8.4
These are the release notes for v0.8.4. See here for the complete list of solved issues and merged PRs.
New Features
- Tensor
  - Implements `mars.tensor.setdiff1d` (#2829)
- Learn
  - Added support for `mars.learn.metrics.roc_auc_score` (#2841)
- Others
Enhancements
- Support generating a DOT file for subtask graph (#2818)
- Enhance subtask details in logs (#2842)
- Configure cores of XGBoost jobs (#2847)
- Improved performance of `mars.learn.metrics.{roc_curve, roc_auc_score}` (#2850)
- Fix store duplicate chunk and meta per subtask (#2851)
- Bump minimist and nanoid in Mars UI due to security alerts (#2851)
Bug fixes
- Fix race condition of set_subtask_result (#2819)
- Fix duplicate subtask submit (#2819)
- Fix the failure on Vineyard CI by ensuring the input tensor chunk is a NumPy ndarray (#2819)
- Fix default value of gpu property for some operands (#2820)
- Fix running xgboost on Ray cluster (#2830)
- Change StorageHandlerActor to stateful (#2830)
- Fix `FileSystem.ls` for OSS (#2842)
- Stop fetching data when pure dependencies specified (#2843)