MLflow integration with Dask-SQL machine learning

Hi all,
I am trying to integrate MLflow with dask-sql machine learning on a distributed cluster, but I could not find any documentation on that.

I found that it is possible to export a trained model in MLflow format, but my question is whether MLflow can also be used to track and compare multiple models, as we usually do for ML in Python.

Please add your comments on this.

Thanks in advance

Hi @Nirajkanth,

It is not very clear to me what you are trying to do.

I’ve found this resource on the internet with a quick search:

Does it correspond to what you are after?

Hi @guillaumeeb
I also looked at that article. Actually, the author did the machine learning in plain Python, not in dask-sql.

I want something similar, but in dask-sql only: I want to log all the models to MLflow as in the article above.
My question is whether dask-sql supports this kind of MLflow model tracking, so that I can log all the model details to the MLflow dashboard.

Thank you for your valuable time

Do you have some example code of your workflow using dask-sql?

I don’t think you’ll be able to use MLflow within dask-sql without injecting some Python code. But you should be able to mix dask-sql with other Dask interfaces.

This is the code that I want to track with mlflow.

# c is assumed to be a dask_sql.Context with the table flower1 registered

# Grid search over GradientBoostingClassifier, trained on the first 100 rows
query = """
CREATE EXPERIMENT my_exp WITH (
    model_class = 'sklearn.ensemble.GradientBoostingClassifier',
    experiment_class = 'sklearn.model_selection.GridSearchCV',
    tune_parameters = (n_estimators = ARRAY [16, 32, 2],
                       learning_rate = ARRAY [0.1,0.01,0.001],
                       max_depth = ARRAY [3,4,5,10]
                       ),
    target_column = 'target'
) AS (
    SELECT sepal_length, sepal_width, petal_length, petal_width,
        CASE
            WHEN species = 'Iris-setosa' THEN 0
            WHEN species = 'Iris-versicolor' THEN 1
            WHEN species = 'Iris-virginica' THEN 2
        END AS target
    FROM flower1
    LIMIT 100
)
"""
result1 = c.sql(query)

# Predict on the remaining rows with the best model found by the experiment
c.sql("""
SELECT * FROM PREDICT (
    MODEL my_exp,
    SELECT sepal_length, sepal_width, petal_length, petal_width,
        CASE
            WHEN species = 'Iris-setosa' THEN 0
            WHEN species = 'Iris-versicolor' THEN 1
            WHEN species = 'Iris-virginica' THEN 2
        END AS actual
    FROM flower1
    OFFSET 100
)
""").compute()
I could not find any documentation or resource on how to add MLflow tracking to this code.

Thanks, I was not even aware there was such a functionality in dask-sql!

You might be able to achieve what is in the linked post above with custom functions, but honestly, if you really want an MLflow integration, you should probably use plain Dask Python code instead of dask-sql. I don’t think there is any example of integrating MLflow in dask-sql as of now.
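
To make the plain-Python route a bit more concrete, here is a minimal sketch, not a tested dask-sql integration. It assumes you run the grid search yourself with scikit-learn's GridSearchCV (for example on data pulled out of dask-sql and computed to pandas) and then log one MLflow run per parameter combination. The helper names `runs_from_cv_results` and `log_runs_to_mlflow` are made up for illustration, and the experiment name simply reuses `my_exp` from the query above.

```python
from typing import Any, Dict, List, Mapping, Sequence


def runs_from_cv_results(
    cv_results: Mapping[str, Sequence[Any]],
) -> List[Dict[str, Any]]:
    """Turn a GridSearchCV.cv_results_ mapping into one record per candidate.

    Only the keys "params", "mean_test_score" and "std_test_score" are read;
    these are the ones scikit-learn fills in for single-metric scoring.
    """
    runs = []
    for i, params in enumerate(cv_results["params"]):
        runs.append(
            {
                "params": dict(params),
                "metrics": {
                    "mean_test_score": float(cv_results["mean_test_score"][i]),
                    "std_test_score": float(cv_results["std_test_score"][i]),
                },
            }
        )
    return runs


def log_runs_to_mlflow(
    runs: List[Dict[str, Any]], experiment_name: str = "my_exp"
) -> None:
    """Log each candidate as a separate MLflow run (hypothetical helper)."""
    # Imported here so the helper above can be used without MLflow installed.
    import mlflow

    mlflow.set_experiment(experiment_name)
    for i, run in enumerate(runs):
        with mlflow.start_run(run_name=f"candidate_{i}"):
            mlflow.log_params(run["params"])
            mlflow.log_metrics(run["metrics"])
```

After fitting something like `search = GridSearchCV(GradientBoostingClassifier(), param_grid).fit(X, y)`, calling `log_runs_to_mlflow(runs_from_cv_results(search.cv_results_))` should give you one run per candidate in the MLflow UI, where they can be compared side by side. Whether the object created by CREATE EXPERIMENT in dask-sql exposes its `cv_results_` is something I have not checked, which is why this sketch stays outside dask-sql.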

Thanks a lot for the clarification.