Exercise Instructions

Open pipeline.ipynb and follow the instructions in the notebook.

Knowledge Check

❓ Question: How can we change where ParallelRunStep writes its output?

✅ See solution!

We can use the OutputFileDatasetConfig class. Its destination argument takes a (datastore, path) tuple that points to a folder on a datastore:

# Assumes `datastore` is an existing Datastore, e.g. ws.get_default_datastore()
from azureml.data import OutputFileDatasetConfig

# Direct path
output_dataset = OutputFileDatasetConfig(name='batch_results', destination=(datastore, 'batch-scoring-results/'))

# {run-id} is replaced with the run's ID
output_dataset = OutputFileDatasetConfig(name='batch_results', destination=(datastore, 'batch-scoring-results/{run-id}/'))

# {output-name} is replaced with the output's name, in this case batch_results
output_dataset = OutputFileDatasetConfig(name='batch_results', destination=(datastore, 'batch-scoring-results/{output-name}/'))

# Lastly, we can automatically register the output as a Dataset in the workspace
output_dataset = OutputFileDatasetConfig(name='batch_results', destination=(datastore, 'batch-scoring-results/')).register_on_complete(name='batch-scoring-results')
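To show where the configured output ends up, here is a minimal sketch of passing it to a ParallelRunStep. The helper name build_scoring_step, the step name, and the input name are hypothetical and not taken from the notebook. The sketch assumes you already have a datastore, a ParallelRunConfig, and a FileDataset at hand.

```python
def build_scoring_step(datastore, parallel_run_config, input_dataset):
    """Hypothetical helper: build a ParallelRunStep that writes to `datastore`."""
    # Imported inside the function so the sketch loads without the SDK installed.
    from azureml.data import OutputFileDatasetConfig
    from azureml.pipeline.steps import ParallelRunStep

    # One output folder per run, thanks to the {run-id} placeholder.
    output_dataset = OutputFileDatasetConfig(
        name="batch_results",
        destination=(datastore, "batch-scoring-results/{run-id}/"),
    )

    return ParallelRunStep(
        name="batch-scoring",  # hypothetical step name
        parallel_run_config=parallel_run_config,
        inputs=[input_dataset.as_named_input("batch_data")],  # hypothetical input name
        output=output_dataset,  # this is where the step's results are written
        allow_reuse=False,
    )
```

The step's output argument is the only place the destination needs to be wired in. The scoring script itself does not need to know the target folder.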

❓ Question: How does ParallelRunStep know that the minibatch has been successfully processed?

✅ See solution!

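The run(file_list) function in your score_parallel.py is expected to return an array or pandas DataFrame. Each returned element or row counts as one successfully processed input, so a fully successful minibatch returns as many elements/rows as len(file_list).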
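As an illustration of that contract, here is a minimal sketch of a scoring script that returns one result per input file. The "model" is a hypothetical stand-in (it just measures the file's size in bytes), and the output format is an assumption. The actual score_parallel.py would load model.pkl in init() instead.

```python
import os

model = None  # set once per worker process by init()


def init():
    """Called once per worker; a real script would load model.pkl here."""
    global model
    model = len  # hypothetical stand-in: "scores" a file by its size in bytes


def run(file_list):
    """Process one minibatch of file paths and return one result per file."""
    results = []
    for path in file_list:
        with open(path, "rb") as f:
            data = f.read()
        # One entry per successfully processed file.
        results.append(f"{os.path.basename(path)},{model(data)}")
    return results
```

If a file were skipped inside the loop, the returned list would be shorter than file_list, and ParallelRunStep would treat the missing item as not successfully processed.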
