Fix Invalid Component 'NoPreprocessing' in 'data_preprocessor' Argument (Fixes #1745) #1750

Open · wants to merge 16 commits into base: master

2 changes: 1 addition & 1 deletion .github/workflows/citation_cff.yml
@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Check out a copy of the repository
uses: actions/checkout@v3
uses: actions/checkout@v3.1.0

- name: Check whether the citation metadata from CITATION.cff is valid
uses: citation-file-format/[email protected]
2 changes: 1 addition & 1 deletion .github/workflows/dist.yml
@@ -22,7 +22,7 @@ jobs:

steps:
- name: Check out the repo
uses: actions/checkout@v3
uses: actions/checkout@v3.1.0
with:
submodules: recursive

2 changes: 1 addition & 1 deletion .github/workflows/docker-publish.yml
@@ -22,7 +22,7 @@ jobs:

steps:
- name: Check out the repo
uses: actions/checkout@v3
uses: actions/checkout@v3.1.0
with:
submodules: recursive

2 changes: 1 addition & 1 deletion .github/workflows/docs.yml
@@ -23,7 +23,7 @@ jobs:
steps:

- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v3.1.0
with:
submodules: recursive

2 changes: 1 addition & 1 deletion .github/workflows/generate-baselines.yml
@@ -64,7 +64,7 @@ jobs:
python-version: ${{ steps.python-version.outputs.value }}

- name: Checkout Automlbenchmark
uses: actions/checkout@v2
uses: actions/checkout@v3.1.0
with:
repository: ${{ env.AUTOMLBENCHMARK_REPO }}
ref: ${{ env.AUTOMLBENCHMARK_REF }}
2 changes: 1 addition & 1 deletion .github/workflows/pre-commit-update.yml
@@ -11,7 +11,7 @@ jobs:
auto-update:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3.1.0

- uses: actions/setup-python@v2

2 changes: 1 addition & 1 deletion .github/workflows/pre-commit.yaml
@@ -20,7 +20,7 @@ jobs:
run-all-files:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v3.1.0
with:
submodules: recursive

2 changes: 1 addition & 1 deletion .github/workflows/pytest.yml
@@ -74,7 +74,7 @@ jobs:
steps:

- name: Checkout
uses: actions/checkout@v3
uses: actions/checkout@v3.1.0
with:
submodules: recursive

2 changes: 1 addition & 1 deletion .github/workflows/regressions.yml
@@ -82,7 +82,7 @@ jobs:
# branch: the branch name

- name: Checkout Automlbenchmark
uses: actions/checkout@v3
uses: actions/checkout@v3.1.0
with:
repository: ${{ env.AUTOMLBENCHMARK_REPO }}
ref: ${{ env.AUTOMLBENCHMARK_REF }}
2 changes: 1 addition & 1 deletion .github/workflows/stale.yaml
@@ -9,7 +9,7 @@ jobs:
stale:
runs-on: ubuntu-latest
steps:
- uses: actions/stale@v5
- uses: actions/stale@v6
with:
days-before-stale: 60
days-before-close: 7
8 changes: 4 additions & 4 deletions .pre-commit-config.yaml
@@ -4,7 +4,7 @@
repos:

- repo: https://github.com/pycqa/isort
rev: 5.10.1
rev: 5.11.5
hooks:
- id: isort
name: isort imports autosklearn
@@ -15,7 +15,7 @@ repos:
files: test/.*

- repo: https://github.com/psf/black
rev: 22.6.0
rev: 23.3.0
hooks:
- id: black
name: black formatter autosklearn
@@ -31,15 +31,15 @@

# This is disabled as most modules fail this
- repo: https://github.com/pycqa/pydocstyle
rev: 6.1.1
rev: 6.3.0
hooks:
- id: pydocstyle
files: DISABLED # autosklearn/.*
always_run: false
additional_dependencies: ["toml"] # Needed to parse pyproject.toml

- repo: https://github.com/pre-commit/mirrors-mypy
rev: v0.971
rev: v1.2.0
hooks:
- id: mypy
name: mypy auto-sklearn
7 changes: 4 additions & 3 deletions CONTRIBUTING.md
@@ -252,10 +252,11 @@ Lastly, if the feature really is a game changer or you're very proud of it, cons
make doc
```
* If you're unfamiliar with sphinx, it's a documentation generator which can read comments and docstrings from within the code and generate html documentation.
* If you've added documentation, we also has a command `linkcheck` for making sure all the links correctly go to some destination.
* If you've added documentation, we also have a command `links` for making
sure all the links correctly go to some destination.
This helps tests for dead links or accidental typos.
```bash
make linkcheck
make links
```
* We also use sphinx-gallery which can take python files (such as those in the `examples` folder) and run them, creating html which shows the code and the output it generates.
```bash
@@ -396,7 +397,7 @@ Lastly, if the feature really is a game changer or you're very proud of it, cons
# If you changed documentation:
# This will generate all documentation and check links
make doc
make linkcheck
make links
make examples # mainly needed if you modified some examples

# ... fix any issues
2 changes: 1 addition & 1 deletion autosklearn/__version__.py
@@ -1,4 +1,4 @@
"""Version information."""

# The following line *must* be the last in the module, exactly as formatted:
__version__ = "0.15.0"
__version__ = "0.16.0dev"
35 changes: 23 additions & 12 deletions autosklearn/automl.py
@@ -120,6 +120,7 @@
warnings_to,
)
from autosklearn.util.parallel import preload_modules
from autosklearn.util.progress_bar import ProgressBar
from autosklearn.util.smac_wrap import SMACCallback, SmacRunCallback
from autosklearn.util.stopwatch import StopWatch

@@ -239,6 +240,7 @@ def __init__(
get_trials_callback: SMACCallback | None = None,
dataset_compression: bool | Mapping[str, Any] = True,
allow_string_features: bool = True,
disable_progress_bar: bool = False,
):
super().__init__()

@@ -295,6 +297,7 @@ def __init__(
self.logging_config = logging_config
self.precision = precision
self.allow_string_features = allow_string_features
self.disable_progress_bar = disable_progress_bar
self._initial_configurations_via_metalearning = (
initial_configurations_via_metalearning
)
@@ -626,6 +629,12 @@ def fit(
# By default try to use the TCP logging port or get a new port
self._logger_port = logging.handlers.DEFAULT_TCP_LOGGING_PORT

progress_bar = ProgressBar(
total=self._time_for_task,
disable=self.disable_progress_bar,
desc="Fitting to the training data",
colour="green",
)
# Once we start the logging server, it starts in a new process
# If an error occurs then we want to make sure that we exit cleanly
# and shut it down, else it might hang
@@ -643,6 +652,7 @@
# space
self._backend.save_start_time(self._seed)

progress_bar.start()
self._stopwatch = StopWatch()

# Make sure that input is valid
@@ -961,6 +971,7 @@ def fit(
self._logger.exception(e)
raise e
finally:
progress_bar.join()
self._fit_cleanup()

self.fitted = True
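
The `ProgressBar` helper used in the hunks above lives in `autosklearn/util/progress_bar.py`, which is not part of this diff. Purely for orientation, below is a minimal sketch of a helper that would be compatible with the calls made in `fit()` (`ProgressBar(total=..., disable=..., desc=..., colour=...)`, `start()`, `join()`); the thread-plus-tqdm design and everything else in it is an assumption, not the PR's actual implementation.

```python
# Hedged sketch only -- not the real autosklearn/util/progress_bar.py.
# It mirrors the interface used in fit(): total/disable/desc/colour, start(), join().
from __future__ import annotations

import time
from threading import Thread

from tqdm import trange


class ProgressBar(Thread):
    """Background thread that ticks a tqdm bar once per second for `total` seconds."""

    def __init__(self, total: int, disable: bool = False, **tqdm_kwargs) -> None:
        super().__init__(name="ProgressBar")
        self.total = total              # seconds, i.e. the time budget for fit()
        self.disable = disable
        self.tqdm_kwargs = tqdm_kwargs  # e.g. desc="Fitting ...", colour="green"

    def start(self) -> None:
        # Only spawn the thread when the bar is enabled.
        if not self.disable:
            super().start()

    def run(self) -> None:
        # One tick per second; a production version would also stop early
        # once fitting finishes instead of running out the full budget.
        for _ in trange(self.total, disable=self.disable, **self.tqdm_kwargs):
            time.sleep(1)

    def join(self, timeout: float | None = None) -> None:
        if not self.disable:
            super().join(timeout)
```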
@@ -1910,15 +1921,17 @@ def cv_results_(self):
metric_dict[metric.name] = []
metric_mask[metric.name] = []

model_ids = []
mean_fit_time = []
params = []
status = []
budgets = []

for run_key in self.runhistory_.data:
run_value = self.runhistory_.data[run_key]
for run_key, run_value in self.runhistory_.data.items():
config_id = run_key.config_id
config = self.runhistory_.ids_config[config_id]
if run_value.additional_info and "num_run" in run_value.additional_info:
model_ids.append(run_value.additional_info["num_run"])

s = run_value.status
if s == StatusType.SUCCESS:
@@ -1979,6 +1992,8 @@ def cv_results_(self):
metric_dict[metric.name].append(metric_value)
metric_mask[metric.name].append(mask_value)

results["model_ids"] = model_ids

if len(self._metrics) == 1:
results["mean_test_score"] = np.array(metric_dict[self._metrics[0].name])
rank_order = -1 * self._metrics[0]._sign * results["mean_test_score"]
@@ -2154,14 +2169,11 @@ def show_models(self) -> dict[int, Any]:
warnings.warn("No ensemble found. Returning empty dictionary.")
return ensemble_dict

def has_key(rv, key):
return rv.additional_info and key in rv.additional_info

table_dict = {}
for run_key, run_val in self.runhistory_.data.items():
if has_key(run_val, "num_run"):
model_id = run_val.additional_info["num_run"]
table_dict[model_id] = {"model_id": model_id, "cost": run_val.cost}
for run_key, run_value in self.runhistory_.data.items():
if run_value.additional_info and "num_run" in run_value.additional_info:
model_id = run_value.additional_info["num_run"]
table_dict[model_id] = {"model_id": model_id, "cost": run_value.cost}

# Checking if the dictionary is empty
if not table_dict:
@@ -2174,21 +2186,20 @@ def has_key(rv, key):

table = pd.DataFrame.from_dict(table_dict, orient="index")
table.sort_values(by="cost", inplace=True)
table["rank"] = np.arange(1, len(table.index) + 1)

# Check which resampling strategy is chosen and selecting the appropriate models
is_cv = self._resampling_strategy == "cv"
models = self.cv_models_ if is_cv else self.models_

rank = 1 # Initializing rank for the first model
for (_, model_id, _), model in models.items():
model_dict = {} # Declaring model dictionary

# Inserting model_id, rank, cost and ensemble weight
model_dict["model_id"] = table.loc[model_id]["model_id"].astype(int)
model_dict["rank"] = rank
model_dict["rank"] = table.loc[model_id]["rank"].astype(int)
model_dict["cost"] = table.loc[model_id]["cost"]
model_dict["ensemble_weight"] = table.loc[model_id]["ensemble_weight"]
rank += 1 # Incrementing rank by 1 for the next model

# The steps in the models pipeline are as follows:
# 'data_preprocessor': DataPreprocessor,
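
To illustrate the two user-visible changes in `automl.py` above (the new `model_ids` entry in `cv_results_` and the `rank` in `show_models()` now derived from the cost-sorted table), here is a minimal usage sketch; the dataset, time budget, and printed fields are illustrative assumptions, not part of the PR.

```python
# Illustrative sketch (not part of the diff): inspecting the new "model_ids"
# entry in cv_results_ and the per-model "rank" reported by show_models().
import sklearn.datasets
from autosklearn.classification import AutoSklearnClassifier

X, y = sklearn.datasets.load_iris(return_X_y=True)

automl = AutoSklearnClassifier(time_left_for_this_task=60, per_run_time_limit=15)
automl.fit(X, y)

# cv_results_ now exposes the model id of each run that produced one, so rows
# can be matched against show_models() and the leaderboard.
results = automl.cv_results_
print(results["model_ids"])

# Each show_models() entry carries model_id, rank, cost and ensemble_weight;
# rank now comes from the cost-sorted table instead of a running counter.
for model_id, info in automl.show_models().items():
    print(model_id, info["rank"], info["cost"])
```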
11 changes: 9 additions & 2 deletions autosklearn/estimators.py
@@ -76,6 +76,7 @@ def __init__(
get_trials_callback: SMACCallback | None = None,
dataset_compression: Union[bool, Mapping[str, Any]] = True,
allow_string_features: bool = True,
disable_progress_bar: bool = False,
):
"""
Parameters
@@ -275,12 +276,12 @@ def __init__(

smac_scenario_args : dict, optional (None)
Additional arguments inserted into the scenario of SMAC. See the
`SMAC documentation <https://automl.github.io/SMAC3/main/api/smac.scenario.scenario.html#module-smac.scenario.scenario>`_
`SMAC documentation <https://automl.github.io/SMAC3/main/api/smac.scenario.html#smac.scenario.Scenario>`_
for a list of available arguments.

get_smac_object_callback : callable
Callback function to create an object of class
`smac.optimizer.smbo.SMBO <https://automl.github.io/SMAC3/main/api/smac.optimizer.smbo.html>`_.
`smac.facade.AbstractFacade <https://automl.github.io/SMAC3/main/api/smac.facade.html>`_.
The function must accept the arguments ``scenario_dict``,
``instances``, ``num_params``, ``runhistory``, ``seed`` and ``ta``.
This is an advanced feature. Use only if you are familiar with
@@ -381,6 +382,10 @@ def __init__(
Whether autosklearn should process string features. By default the
textpreprocessing is enabled.

disable_progress_bar: bool = False
Whether to disable the progress bar that is displayed in the console
while fitting to the training data.

Attributes
----------
cv_results_ : dict of numpy (masked) ndarrays
@@ -475,6 +480,7 @@ def __init__(
self.get_trials_callback = get_trials_callback
self.dataset_compression = dataset_compression
self.allow_string_features = allow_string_features
self.disable_progress_bar = disable_progress_bar

self.automl_ = None # type: Optional[AutoML]

@@ -525,6 +531,7 @@ def build_automl(self):
get_trials_callback=self.get_trials_callback,
dataset_compression=self.dataset_compression,
allow_string_features=self.allow_string_features,
disable_progress_bar=self.disable_progress_bar,
)

return automl
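
A short usage sketch for the new estimator flag follows; everything except the `disable_progress_bar` keyword itself (data, time budget) is an illustrative assumption.

```python
# Illustrative sketch (not part of the diff): turning off the console progress
# bar that fit() now displays while training.
import sklearn.datasets
from autosklearn.classification import AutoSklearnClassifier

X, y = sklearn.datasets.load_digits(return_X_y=True)

automl = AutoSklearnClassifier(
    time_left_for_this_task=60,
    disable_progress_bar=True,  # new keyword introduced by this PR
)
automl.fit(X, y)
```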
8 changes: 7 additions & 1 deletion autosklearn/experimental/askl2.py
@@ -166,6 +166,7 @@ def __init__(
load_models: bool = True,
dataset_compression: Union[bool, Mapping[str, Any]] = True,
allow_string_features: bool = True,
disable_progress_bar: bool = False,
):

"""
@@ -263,7 +264,7 @@ def __init__(

smac_scenario_args : dict, optional (None)
Additional arguments inserted into the scenario of SMAC. See the
`SMAC documentation <https://automl.github.io/SMAC3/main/api/smac.scenario.scenario.html#module-smac.scenario.scenario>`_
`SMAC documentation <https://automl.github.io/SMAC3/main/api/smac.scenario.html#smac.scenario.Scenario>`_
for a list of available arguments.

logging_config : dict, optional (None)
@@ -284,6 +285,10 @@
load_models : bool, optional (True)
Whether to load the models after fitting Auto-sklearn.

disable_progress_bar: bool = False
Whether to disable the progress bar that is displayed in the console
while fitting to the training data.

Attributes
----------

@@ -337,6 +342,7 @@ def __init__(
scoring_functions=scoring_functions,
load_models=load_models,
allow_string_features=allow_string_features,
disable_progress_bar=disable_progress_bar,
)

def train_selectors(self, selected_metric=None):