A Generic ML Command in PPL#971
Conversation
Signed-off-by: Jing Zhang <jngz@amazon.com>
Signed-off-by: Jing Zhang <jngz@amazon.com>
penghuo
left a comment
There was a problem hiding this comment.
running git checkout sql-jdbc/docs/img/tableau_graph.PNG restore deleted file. reference. #865 (comment)
| } | ||
|
|
||
| /** | ||
| * Kmeans command. |
There was a problem hiding this comment.
Good catch, will correct it to ml.
| | (ANOMALY_SCORE_THRESHOLD EQUAL anomaly_score_threshold=decimalLiteral) | ||
| ; | ||
|
|
||
| mlCommand |
There was a problem hiding this comment.
does ml also no arguments? e.g. source = index | ml. The syntax actaully allow this.
There was a problem hiding this comment.
If no action or algorithm arguments, it will throw exception. I don't think we have to restrict the command content at command parsing stage, we leave all parameters validation to ml-client.
| @Getter | ||
| @ToString | ||
| @EqualsAndHashCode(callSuper = true) | ||
| public class LogicalML extends LogicalPlan { |
There was a problem hiding this comment.
Do you plan deprecated Kmeans/AD in Logical plan?
There was a problem hiding this comment.
Yes, we will deprecate them in the following version like 2.5, then remove them in 3.0.
| public Map<String, ExprCoreType> getOutputSchema(TypeEnvironment env) { | ||
| switch (getAction()) { | ||
| case TRAIN: | ||
| env.clearAllFields(); |
There was a problem hiding this comment.
The ml train command will return only model/task id, and status, so remove all fields from input fields.
|
|
||
| LogicalPlan actual = analyze(AstDSL.project( | ||
| new ML(AstDSL.relation("schema"), argumentMap), AstDSL.allFields())); | ||
| assertTrue(((LogicalProject) actual).getProjectList().size() >= 2); |
There was a problem hiding this comment.
why >= 2, does the result non deterministic?
There was a problem hiding this comment.
It is deterministic, just at least 2.
| * @param nodeClient node client | ||
| * @return ml-commons result | ||
| */ | ||
| protected MLOutput getMLOutput(DataFrame inputDataFrame, |
There was a problem hiding this comment.
could also deprecated getMLPredictionResult if it is not required?
There was a problem hiding this comment.
It is required for now, we can remove it when removing kmeans and ad commands.
| */ | ||
| @RequiredArgsConstructor | ||
| @EqualsAndHashCode(callSuper = false) | ||
| public class MLOperator extends MLCommonsOperatorActions { |
There was a problem hiding this comment.
what is difference with MLCommonsOperator, deptecated MLCommonsOperator?
There was a problem hiding this comment.
MLOperator is totally new operator for all ml algorithms from ml-commons. I don't want to mix it with old operator as there is a big gap between them, and also for smoothly deprecating old operator in the future.
Signed-off-by: Jing Zhang <jngz@amazon.com>
penghuo
left a comment
There was a problem hiding this comment.
Thanks for the change!
Please add ML commands related doc. And add deprecated describtion for AD and Kmeans command.
* Add generic ml command in ppl. Signed-off-by: Jing Zhang <jngz@amazon.com> * Recover ml client dependency. Signed-off-by: Jing Zhang <jngz@amazon.com> * Address the comments I. Signed-off-by: Jing Zhang <jngz@amazon.com> Signed-off-by: Jing Zhang <jngz@amazon.com> (cherry picked from commit c6b234c)

Description
A generic ml command in ppl to apply algorithms in ml-commons.
Issues Resolved
#849
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.