This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

[20220323] NAS Roadmap 2022

Yuge Zhang edited this page Mar 23, 2022 · 2 revisions

This document tracks the current status and work items of NAS.

Current version: v2.6 (v2.7 pending release).

Planned Architecture Diagram

Milestones

  • The first end-to-end example: the DARTS algorithm on the DARTS search space: https://github.com/microsoft/nni/pull/4509 (blocked by the one-shot refactoring)
  • Showcase of a slightly more complex case, demonstrating SOTA results.

Breakdowns

Unfinished work items until the first milestone:

Until the second milestone:

  • Needs discussion to finalize a concrete "complex case".
  • There may be further bug fixes and enhancements that have not been identified yet.

Backlog

  • Constructing model space
    • Space hub
      • APIs on retrieving searched results directly
      • Test already implemented spaces (we will have a space hub reproducibility list)
        • Make sure they are runnable
        • Load checkpoint of searched architecture and evaluate
        • Reproduce re-training
        • Runnable with built-in algos
        • Reproduce result with at least one algo
          • Pending work item: integrating training service of Microsoft internal clusters
        • If it is a benchmark search space, test it with the benchmark
      • Incorporating spaces featuring NLP and speech tasks.
    • Mutation primitives
      • More APIs, e.g., Permute, ValueRange
      • A higher-level API to unify the usages, for example, oneof() to unify the XXXChoice APIs
      • Primitive mutators are too messy and need refactoring
    • User experience
      • Use value choices on the base model's arguments
  • Evaluator
    • More built-in evaluators for cases like self-supervised learning and object detection (depending on the space hub)
    • More fine-grained control for logs, checkpoints, visualizations for Lightning-based evaluators (details yet to be discovered)
  • Strategy
  • Engine
    • Refactor the model-IR converter into the "base" execution engine.
  • Experiment
    • Interface refactor (unify with HPO experiment)
    • Bug fixes
      • Process won't stop without start()
      • Sometimes fails with OS Error 9
      • Sometimes Ctrl+C must be pressed twice to kill the experiment
    • Export model enhancements
    • Visualization
      • Friendlier tips when the model can't be visualized
      • Visualizing one-shot experiments
  • Others
    • Serializer
      • Known issue with inheritance
      • Known issue with __new__

Some features that I think are not important at the current stage are not listed in the backlog, for example, cross-graph optimization, TensorFlow support, and unifying the mutation APIs between one-shot and multi-trial.