This example shows a simple way of leveraging some of the most widely used Machine Learning libraries available in Python.
The DApp generates a logistic regression model using scikit-learn, NumPy and pandas, and then uses m2cgen (Model to Code Generator) to transpile that model into native Python code with no dependencies. This approach is inspired by Davis David's Machine Learning tutorial, and is useful for a Cartesi DApp because it removes the need to port all those Machine Learning libraries to the Cartesi Machine's RISC-V architecture, making the development process easier and the final back-end code simpler to execute.
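To make the idea of dependency-free model code more concrete, the sketch below shows the general shape that a logistic regression can take once it is reduced to plain arithmetic. The intercept, the coefficients and the function names are made-up placeholders for illustration; this is not the code that m2cgen generates for this DApp.

```python
# Hypothetical parameters, for illustration only (not learned from any dataset).
INTERCEPT = 0.5
COEFFICIENTS = [-0.02, 1.5, -0.3]


def score(features):
    """Return the linear decision value: intercept + sum(coefficient * feature)."""
    if len(features) != len(COEFFICIENTS):
        raise ValueError("unexpected number of features")
    return INTERCEPT + sum(c * x for c, x in zip(COEFFICIENTS, features))


def predict(features):
    """Return class 1 or 0 from the sign of the decision value.

    For logistic regression, a probability of at least 0.5 is equivalent
    to a decision value of at least 0, so no sigmoid call is needed here.
    """
    return 1 if score(features) >= 0 else 0
```

Because a function like this only uses built-in Python operations, it can run anywhere a Python interpreter is available, without installing scikit-learn, NumPy or pandas.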
The practical goal of this application is to predict a classification based on the Titanic dataset, which shows characteristics of people onboard the Titanic and whether that person survived the disaster or not. As such, users can submit inputs describing a person's features to find out if that person is likely to have survived.
The model currently takes into account only three characteristics of a person to predict their survival, even though other attributes are available in the dataset:
- Age
- Sex, which can be male or female
- Embarked, which corresponds to the port of embarkation and can be C (Cherbourg), Q (Queenstown), or S (Southampton)
As such, inputs to the DApp should be given as a JSON string such as the following:
{ "Age": 37, "Sex": "male", "Embarked": "S" }
The predicted classification result will be given as 0 (did not survive) or 1 (did survive).
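As an illustration of the input format described above, the following sketch parses and validates such a JSON string and turns it into a numeric feature list. The function name and the numeric codes for Sex and Embarked are assumptions made for this example; the DApp's actual back-end may encode the categories differently.

```python
import json

# Hypothetical category encodings; the real DApp may use different values.
SEX_CODES = {"male": 0, "female": 1}
EMBARKED_CODES = {"C": 0, "Q": 1, "S": 2}


def parse_input(payload):
    """Parse a JSON input string and return [age, sex_code, embarked_code].

    Raises ValueError if the JSON is malformed or a field is missing or invalid.
    """
    data = json.loads(payload)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("input must be a JSON object")

    missing = [key for key in ("Age", "Sex", "Embarked") if key not in data]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))

    age = data["Age"]
    if isinstance(age, bool) or not isinstance(age, (int, float)) or age < 0:
        raise ValueError("Age must be a non-negative number")

    sex = data["Sex"]
    if not isinstance(sex, str) or sex not in SEX_CODES:
        raise ValueError("Sex must be 'male' or 'female'")

    embarked = data["Embarked"]
    if not isinstance(embarked, str) or embarked not in EMBARKED_CODES:
        raise ValueError("Embarked must be 'C', 'Q' or 'S'")

    return [float(age), SEX_CODES[sex], EMBARKED_CODES[embarked]]
```

Rejecting malformed inputs early keeps invalid data away from the model, which only expects numbers.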
We can use the frontend-console application to interact with the DApp. Ensure that the application has already been built before using it.
First, go to a separate terminal window and switch to the frontend-console directory:
cd frontend-console
Then, send an input as follows:
yarn start input send --payload '{"Age": 37, "Sex": "male", "Embarked": "S"}'
In order to verify the notices generated by your inputs, run the command:
yarn start notice list
The payload of the notice corresponds to whether the person survived ("1") or not ("0"). In this case, it should be "0".
When developing an application, it is often important to be able to test and debug it easily. To that end, it is possible to run the Cartesi Rollups environment in host mode, so that the DApp's back-end can be executed directly on the host machine, allowing it to be debugged using regular development tools such as an IDE.
In order to start the back-end, run the following commands in a dedicated terminal:
cd m2cgen
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
ROLLUP_HTTP_SERVER_URL="http://127.0.0.1:5004" python3 m2cgen.py
The final command runs the back-end, which sends its outputs to the Rollups HTTP server on port 5004.
This command can optionally be configured in an IDE to allow interactive debugging using features like breakpoints.
You can also use a tool like entr to restart the back-end automatically when the code changes. For example:
ls *.py | ROLLUP_HTTP_SERVER_URL="http://127.0.0.1:5004" entr -r python3 m2cgen.py
After the back-end successfully starts, it should print an output like the following:
INFO:__main__:HTTP rollup_server url is http://127.0.0.1:5004
INFO:__main__:Sending finish
After that, you can interact with the application normally as explained above.
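When debugging in host mode, it can help to have a mental model of what the back-end does after it logs "Sending finish". The sketch below is a simplified request loop written with the standard library only. It assumes the HTTP API shape used by Cartesi's example back-ends (a /finish endpoint that answers 202 when nothing is pending, an advance_state request type, a /notice endpoint, and hex-encoded payloads). The function names are hypothetical and the DApp's own m2cgen.py may be organized differently.

```python
import json
import urllib.request


def str_to_hex(text):
    """Encode a string as a 0x-prefixed hex payload."""
    return "0x" + text.encode("utf-8").hex()


def hex_to_str(payload):
    """Decode a 0x-prefixed hex payload back into a string."""
    return bytes.fromhex(payload[2:]).decode("utf-8")


def post_json(url, body):
    """POST a JSON body and return (HTTP status code, raw response bytes)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status, response.read()


def handle_advance(data, predict, send_notice):
    """Decode one input, run the predictor, and emit the result as a notice."""
    try:
        features = json.loads(hex_to_str(data["payload"]))
        result = predict(features)
    except (KeyError, TypeError, ValueError):
        return "reject"  # malformed input: emit nothing and reject it
    send_notice(str_to_hex(str(result)))
    return "accept"


def run_loop(server_url, predict):
    """Repeatedly ask the rollup server for the next request and handle it."""

    def send_notice(payload):
        post_json(server_url + "/notice", {"payload": payload})

    status = "accept"
    while True:
        code, raw = post_json(server_url + "/finish", {"status": status})
        status = "accept"
        if code == 202:
            continue  # assumed to mean "no pending request"; ask again
        rollup_request = json.loads(raw)
        if rollup_request["request_type"] == "advance_state":
            status = handle_advance(rollup_request["data"], predict, send_notice)
```

Keeping handle_advance separate from the network loop means it can be exercised with a fake predictor and a plain list, for instance handle_advance({"payload": str_to_hex('{"Age": 37}')}, lambda f: 0, notices.append), which is convenient when stepping through the code with breakpoints.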
This DApp was implemented in a rather generic way and, as such, it is possible to easily change the target dataset as well as the predictor algorithm.
To change those, open the file m2cgen/model/build_model.py and change the following variables defined at the beginning of the script:
- model: defines the scikit-learn predictor algorithm to use. While it currently uses sklearn.linear_model.LogisticRegression, many other possibilities are available, from several types of linear regressions to solutions such as support vector machines (SVMs).
- train_csv: a URL or file path to a CSV file containing the dataset. It should contain a first row with the feature names, followed by the data.
- include: an optional list indicating a subset of the dataset's features to be used in the prediction model.
- dependent_var: the feature to be predicted, such as the entry's classification
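To illustrate how include and dependent_var relate to the CSV layout described above, here is a small standard-library sketch that splits a CSV into feature rows and target values. It is not the code in build_model.py: the load_dataset function is hypothetical, and the "Survived" column name is an assumption about the dataset's header.

```python
import csv
import io


def load_dataset(csv_text, include, dependent_var):
    """Split CSV text into (feature names, feature rows, target values).

    If include is empty or None, every column except dependent_var is used.
    Values are returned as strings, exactly as they appear in the CSV.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    columns = list(include) if include else [n for n in header if n != dependent_var]

    feature_rows, targets = [], []
    for row in reader:
        feature_rows.append([row[name] for name in columns])
        targets.append(row[dependent_var])
    return columns, feature_rows, targets


# Made-up rows for illustration only; these are not real Titanic records.
sample_csv = "Survived,Age,Sex,Embarked,Fare\n0,30,male,S,7.5\n1,25,female,C,50\n"
columns, feature_rows, targets = load_dataset(
    sample_csv, include=["Age", "Sex", "Embarked"], dependent_var="Survived"
)
```

With the sample above, only the three listed columns end up in feature_rows, while the Fare column is ignored because it is not part of include.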