This issue is to track the development of the Cardea class. Previous updates were also mentioned in #73.
The Cardea class is responsible for handling and interacting with all the components (data_assembler, data_labeler, featurizer, and modeler).
Overall, the Cardea class:
- Provides simple user-facing abstractions:
  - label: generate label times
  - featurize: generate feature matrix
  - fit/predict
  - evaluate
  - save/load
- Hides away the interaction with other systems:
  - Entityset
  - Featuretools DeepFeatureSynthesis
  - ComposeML
  - MLBlocks Pipelines
  - Pipeline Selection and Tuning
Design choices:
- Remove load_entityset and assume that a Cardea instance deals with only one data source. The data is loaded upon instantiation.
- Rename generate_label_time to label.
- Rename generate_feature_matrix to featurize.
- Allow the user to inspect label_times and feature_matrix.
This should be the class public interface:
    from types import FunctionType
    from typing import List, Union

    import numpy as np
    import pandas as pd
    from mlblocks import MLPipeline

    # DEFAULT_DATA, DEFAULT_LABELER, DEFAULT_PIPELINE and DEFAULT_METRICS are
    # module-level constants that are not defined in this listing.


    class Cardea:

        def __init__(self,
                     data: str = DEFAULT_DATA,
                     labeler: FunctionType = DEFAULT_LABELER,
                     pipeline: Union[str, dict, MLPipeline] = DEFAULT_PIPELINE,
                     hyperparameters: dict = None):
            pass

        def label(self,
                  labeler: FunctionType = None,
                  parameter: dict = None) -> pd.DataFrame:
            """Create label times using the data labeler.

            Args:
                labeler (function):
                    Function that defines the prediction task. It should return a
                    tuple of the labeling function, the dataframe, and the name of
                    the target entity.
                parameter (dict):
                    Variables to change the default parameters, if any.

            Returns:
                pandas.DataFrame:
                    A dataframe of cutoff times and their target labels.
            """
            pass

        def featurize(self,
                      label_times: pd.DataFrame,
                      verbose: bool = False) -> pd.DataFrame:
            """Return the calculated feature matrix.

            Args:
                label_times (pandas.DataFrame):
                    A dataframe that indicates the cutoff time for each instance.
                verbose (bool):
                    Indicate verbosity of the featurization.

            Returns:
                pandas.DataFrame:
                    Generated feature matrix.
            """
            pass

        def fit(self,
                X: Union[np.ndarray, pd.DataFrame],
                y: Union[np.ndarray, pd.Series, list],
                tune: bool = False,
                max_evals: int = 10,
                scoring: str = None,
                verbose: bool = False) -> None:
            """Train the cardea pipeline.

            Args:
                X (pandas.DataFrame or numpy.ndarray):
                    Inputs to the pipeline.
                y (pandas.Series, numpy.ndarray or list):
                    Target values.
                tune (bool):
                    Whether to optimize hyper-parameters of the pipelines.
                max_evals (int):
                    Maximum number of hyper-parameter optimization iterations.
                scoring (str):
                    The name of the scoring function used in the hyper-parameter
                    optimization.
                verbose (bool):
                    Whether to log information during processing.
            """
            pass

        def predict(self, X: Union[np.ndarray, pd.DataFrame]) -> Union[np.ndarray, list]:
            """Get predictions from the cardea pipeline.

            Args:
                X (pandas.DataFrame or numpy.ndarray):
                    Inputs to the pipeline.

            Returns:
                numpy.ndarray or list:
                    Predictions for the input data.
            """
            pass

        def fit_predict(self,
                        X: Union[np.ndarray, pd.DataFrame],
                        y: Union[np.ndarray, pd.Series, list],
                        tune: bool = False,
                        max_evals: int = 10,
                        scoring: str = None,
                        verbose: bool = False) -> Union[np.ndarray, list]:
            """Train a cardea pipeline then make predictions.

            Args:
                X (pandas.DataFrame or numpy.ndarray):
                    Inputs to the pipeline.
                y (pandas.Series, numpy.ndarray or list):
                    Target values.
                tune (bool):
                    Whether to optimize hyper-parameters of the pipelines.
                max_evals (int):
                    Maximum number of hyper-parameter optimization iterations.
                scoring (str):
                    The name of the scoring function used in the hyper-parameter
                    optimization.
                verbose (bool):
                    Whether to log information during processing.

            Returns:
                numpy.ndarray or list:
                    Predictions for the input data.
            """
            pass

        def evaluate(self,
                     X: Union[np.ndarray, pd.DataFrame],
                     y: Union[np.ndarray, pd.Series, list],
                     test_size: float = 0.2,
                     shuffle: bool = True,
                     fit: bool = False,
                     tune: bool = False,
                     max_evals: int = 10,
                     scoring: str = None,
                     metrics: List[str] = DEFAULT_METRICS,
                     verbose: bool = False) -> pd.Series:
            """Evaluate the cardea pipeline.

            Args:
                X (pandas.DataFrame or numpy.ndarray):
                    Inputs to the pipeline.
                y (pandas.Series, numpy.ndarray or list):
                    Target values.
                test_size (float):
                    The proportion of the dataset to include in the test dataset.
                shuffle (bool):
                    Whether or not to shuffle the data before splitting.
                fit (bool):
                    Whether to fit the pipeline before evaluating it.
                    Defaults to ``False``.
                tune (bool):
                    Whether to optimize hyper-parameters of the pipelines.
                max_evals (int):
                    Maximum number of hyper-parameter optimization iterations.
                scoring (str):
                    The name of the scoring function used in the hyper-parameter
                    optimization.
                metrics (list):
                    A list of scoring function names. The scoring functions should
                    be consistent with the problem type.
                verbose (bool):
                    Whether to log information during processing.

            Returns:
                pandas.Series:
                    ``pandas.Series`` containing one element for each metric
                    applied, with the metric name as index.
            """
            pass

        def save(self, path: str) -> None:
            """Save this object using pickle.

            Args:
                path (str):
                    Path to the file where the serialization of this object
                    will be stored.
            """
            pass

        @classmethod
        def load(cls, path: str) -> 'Cardea':
            """Load a Cardea instance from a pickle file.

            Args:
                path (str):
                    Path to the file where the instance has been previously
                    serialized.

            Returns:
                Cardea:
                    A Cardea instance.

            Raises:
                ValueError:
                    If the serialized object is not a Cardea instance.
            """
            pass
In addition to the main API, there will be helper functions, such as set_pipeline and train_test_split, that give users some flexibility in modifying the MLPipeline and its hyperparameters.
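The save/load pair in the interface is specified closely enough (pickle serialization, a ValueError when the file holds something else) that it can be sketched on its own. The block below is only an illustration under assumptions: `CardeaSketch` and its single `pipeline` attribute are hypothetical stand-ins, not the actual Cardea implementation, and the error message wording is made up.

```python
import pickle


class CardeaSketch:
    """Hypothetical stand-in for Cardea; only save/load are sketched."""

    def __init__(self, pipeline="default_pipeline"):
        # Placeholder state so there is something to round-trip.
        self.pipeline = pipeline

    def save(self, path: str) -> None:
        """Serialize this object to ``path`` using pickle."""
        with open(path, "wb") as pickle_file:
            pickle.dump(self, pickle_file)

    @classmethod
    def load(cls, path: str) -> "CardeaSketch":
        """Load an instance from ``path`` and verify its type."""
        with open(path, "rb") as pickle_file:
            instance = pickle.load(pickle_file)

        # Reject files that contain some other pickled object.
        if not isinstance(instance, cls):
            raise ValueError(
                "Serialized object is not a {} instance".format(cls.__name__)
            )

        return instance
```

Declaring `load` as a classmethod lets callers write `CardeaSketch.load(path)` without first building an instance, which matches the `cls` argument in the proposed signature. As with any pickle-based persistence, files should only be loaded from trusted sources.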