A Generic ML Command in PPL by jngz-es · Pull Request #971 · opensearch-project/sql · GitHub
Skip to content

A Generic ML Command in PPL#971

Merged
dai-chen merged 5 commits into
opensearch-project:2.xfrom
jngz-es:ml
Oct 31, 2022
Merged

A Generic ML Command in PPL#971
dai-chen merged 5 commits into
opensearch-project:2.xfrom
jngz-es:ml

Conversation

@jngz-es

@jngz-es jngz-es commented Oct 26, 2022

Copy link
Copy Markdown
Contributor

Description

A generic ml command in ppl to apply algorithms in ml-commons.

Issues Resolved

#849

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Jing Zhang <jngz@amazon.com>
Signed-off-by: Jing Zhang <jngz@amazon.com>
@jngz-es jngz-es requested a review from a team as a code owner October 26, 2022 19:22
@codecov-commenter

codecov-commenter commented Oct 26, 2022

Copy link
Copy Markdown

@penghuo penghuo added the v2.4.0 'Issues and PRs related to version v2.4.0' label Oct 26, 2022

@penghuo penghuo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running git checkout sql-jdbc/docs/img/tableau_graph.PNG restore deleted file. reference. #865 (comment)

}

/**
* Kmeans command.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct comments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, will correct it to ml.

| (ANOMALY_SCORE_THRESHOLD EQUAL anomaly_score_threshold=decimalLiteral)
;

mlCommand

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does ml also no arguments? e.g. source = index | ml. The syntax actaully allow this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no action or algorithm arguments, it will throw exception. I don't think we have to restrict the command content at command parsing stage, we leave all parameters validation to ml-client.

@Getter
@ToString
@EqualsAndHashCode(callSuper = true)
public class LogicalML extends LogicalPlan {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan deprecated Kmeans/AD in Logical plan?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we will deprecate them in the following version like 2.5, then remove them in 3.0.

public Map<String, ExprCoreType> getOutputSchema(TypeEnvironment env) {
switch (getAction()) {
case TRAIN:
env.clearAllFields();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why clean all fields?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ml train command will return only model/task id, and status, so remove all fields from input fields.


LogicalPlan actual = analyze(AstDSL.project(
new ML(AstDSL.relation("schema"), argumentMap), AstDSL.allFields()));
assertTrue(((LogicalProject) actual).getProjectList().size() >= 2);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why >= 2, does the result non deterministic?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is deterministic, just at least 2.

* @param nodeClient node client
* @return ml-commons result
*/
protected MLOutput getMLOutput(DataFrame inputDataFrame,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could also deprecated getMLPredictionResult if it is not required?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is required for now, we can remove it when removing kmeans and ad commands.

*/
@RequiredArgsConstructor
@EqualsAndHashCode(callSuper = false)
public class MLOperator extends MLCommonsOperatorActions {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is difference with MLCommonsOperator, deptecated MLCommonsOperator?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MLOperator is totally new operator for all ml algorithms from ml-commons. I don't want to mix it with old operator as there is a big gap between them, and also for smoothly deprecating old operator in the future.

@dai-chen dai-chen added ml Issues related to integration with ML commons and plugin PPL Piped processing language labels Oct 27, 2022
@jngz-es

jngz-es commented Oct 28, 2022

Copy link
Copy Markdown
Contributor Author

Signed-off-by: Jing Zhang <jngz@amazon.com>
@jngz-es jngz-es requested a review from penghuo October 28, 2022 20:52

@penghuo penghuo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change!
Please add ML commands related doc. And add deprecated describtion for AD and Kmeans command.

@dai-chen dai-chen merged commit c6b234c into opensearch-project:2.x Oct 31, 2022
opensearch-trigger-bot Bot pushed a commit that referenced this pull request Oct 31, 2022
* Add generic ml command in ppl.

Signed-off-by: Jing Zhang <jngz@amazon.com>

* Recover ml client dependency.

Signed-off-by: Jing Zhang <jngz@amazon.com>

* Address the comments I.

Signed-off-by: Jing Zhang <jngz@amazon.com>

Signed-off-by: Jing Zhang <jngz@amazon.com>
(cherry picked from commit c6b234c)
penghuo pushed a commit that referenced this pull request Nov 1, 2022
* Address the comments I.

Signed-off-by: Jing Zhang <jngz@amazon.com>

Signed-off-by: Jing Zhang <jngz@amazon.com>
(cherry picked from commit c6b234c)

Co-authored-by: Jing Zhang <jngz@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.4 ml Issues related to integration with ML commons and plugin PPL Piped processing language v2.4.0 'Issues and PRs related to version v2.4.0'

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants