Hi all,
I am trying to integrate MLflow with dask-sql machine learning on a distributed cluster, but I could not find any documentation on it.
I found that it is possible to export the trained model in MLflow format, but my question is whether we can use MLflow to track and compare multiple models, as we usually do with ML in Python.
Hi @guillaumeeb
I also looked at that article on the internet. However, the author did machine learning in plain Python, not in dask-sql.
I want something similar, but in dask-sql only. I want to log all the models to MLflow as in the article above.
My question is whether dask-sql supports this kind of MLflow model tracking, so that I can log all the model details to the MLflow dashboard.
Do you have some example code of your workflow using dask-sql?
I don’t think you’ll be able to use MLFlow within dask-sql without injecting some Python code. But you should be able to mix dask-sql with other Dask interfaces.
This is the code that I want to track with MLflow:
```python
from dask_sql import Context

# Assumes the flower1 table (iris data) is already registered,
# e.g. with c.create_table("flower1", df)
c = Context()

# Grid search over a GradientBoostingClassifier, trained on the first 100 rows
query = """
CREATE EXPERIMENT my_exp WITH (
    model_class = 'sklearn.ensemble.GradientBoostingClassifier',
    experiment_class = 'sklearn.model_selection.GridSearchCV',
    tune_parameters = (n_estimators = ARRAY [16, 32, 2],
                       learning_rate = ARRAY [0.1, 0.01, 0.001],
                       max_depth = ARRAY [3, 4, 5, 10]
    ),
    target_column = 'target'
) AS (
    SELECT sepal_length, sepal_width, petal_length, petal_width,
        CASE
            WHEN species = 'Iris-setosa' THEN 0
            WHEN species = 'Iris-versicolor' THEN 1
            WHEN species = 'Iris-virginica' THEN 2
        END AS target
    FROM flower1
    LIMIT 100
)
"""
result1 = c.sql(query)

# Predict on the remaining rows, keeping the true label as "actual"
c.sql("""
SELECT * FROM PREDICT (
    MODEL my_exp,
    SELECT sepal_length, sepal_width, petal_length, petal_width,
        CASE
            WHEN species = 'Iris-setosa' THEN 0
            WHEN species = 'Iris-versicolor' THEN 1
            WHEN species = 'Iris-virginica' THEN 2
        END AS actual
    FROM flower1
    OFFSET 100
)
""").compute()
```
I could not find any documentation or resource on adding MLflow tracking to this code. Is there one?
Thanks, I was not even aware there was such functionality in dask-sql!
You might be able to achieve what the linked post above does with custom functions, but honestly, if you really want an MLflow integration, you should probably use plain Dask Python code instead of dask-sql. As far as I know, there is no example of integrating MLflow with dask-sql yet.