Relational directives #641

Open
wants to merge 56 commits into
base: develop
56 commits
a161c7c
testing
saimukkamala May 19, 2023
e1fbdd5
poc for running on spark sql
saimukkamala May 19, 2023
0efb2d3
add RelationalDirective interface to system directive registry
saimukkamala May 22, 2023
e6603da
fixing bug
saimukkamala May 24, 2023
400892a
fixing registry nullPointer exception
saimukkamala May 24, 2023
b4dd875
move relation execution into directive
saimukkamala May 30, 2023
aa99ccd
Case transform directives
shrverma Jun 19, 2023
6a736c9
Implement trim space directives
shrverma Jun 19, 2023
9e492e3
Implement rename directive
shrverma Jun 19, 2023
ddf5f7a
implement keep and copy directives
shrverma Jun 20, 2023
1e0eda0
Implement merge directive
shrverma Jun 20, 2023
2cdc2b3
Undo changes in Lower.execute
shrverma Jun 20, 2023
cd4c680
Implement set-type directive
shrverma Jun 24, 2023
b25accf
Add utility class for set-type sql expression
shrverma Jun 24, 2023
2bb3fe2
Implement ChangeColumnCase Directive
shrverma Jun 25, 2023
f7c5bd3
Move getExpressionfactory() to the Directive interface
shrverma Jun 25, 2023
6d4093b
Move generateColumnExpMap to Directive.java
shrverma Jun 27, 2023
376e313
Clean up code
shrverma Jun 28, 2023
c3002ef
Move sql expression generator functions to a new util class
shrverma Jun 28, 2023
51d3d05
Change util class name
shrverma Jun 28, 2023
31e6f84
Fix checkstyle errors
shrverma Jun 28, 2023
e217358
Changes to set type util function
shrverma Jun 30, 2023
c5165f1
Fix rename directive implementation
shrverma Jul 4, 2023
67755cf
Merge branch 'develop' into relational-directives
shrverma Jul 11, 2023
7cb8aa3
Add UI toggle to wrangler
shrverma Jul 11, 2023
1adbb2d
Fix checkstyle error
shrverma Jul 11, 2023
2a1c2b7
Implement swap directive
shrverma Jul 12, 2023
19518e6
Refactor execution logic
shrverma Jul 14, 2023
35fbf93
Implement filter directives
shrverma Jul 14, 2023
0158b38
Change UI toggle
shrverma Jul 14, 2023
cb5eb7f
Move feature flag checks to separate function
shrverma Jul 14, 2023
8f926a5
Implement SetRecordDelimiter directive
shrverma Jul 15, 2023
68785b5
Implement split-email directive
shrverma Jul 15, 2023
6404c8d
Implement transformation directives
shrverma Jul 17, 2023
ca89034
Refactor code
shrverma Jul 22, 2023
09aa10e
Implement UUID, split-rows and JSON-object directives
shrverma Jul 22, 2023
278a30e
Remove row filter directive implementation
shrverma Jul 22, 2023
480f2aa
Implement fill-null-or-empty
shrverma Jul 22, 2023
7f3f3cb
Implement URL encoding and decoding directives
shrverma Jul 22, 2023
63a5932
Move partially supported directives
shrverma Jul 24, 2023
69e8527
Implement fixed-length-parser
shrverma Jul 24, 2023
3ee71d7
Update Directive.java
shrverma Jul 25, 2023
016c989
Update ChangeColCaseNames.java
shrverma Jul 25, 2023
5899112
Merge pull request #646 from data-integrations/sql-directives
shrverma Jul 25, 2023
ff57c83
Merge branch 'relational-directives' into UI-change-wrangler
shrverma Jul 25, 2023
1e1d898
Merge pull request #648 from data-integrations/UI-change-wrangler
shrverma Jul 25, 2023
7ebc4c3
Add directiverelationaltransform interface
shrverma Jul 25, 2023
6f7d14d
Refactor execution logic
shrverma Jul 25, 2023
29eda87
Refactor code
shrverma Jul 25, 2023
a656f47
Add sql directive validation
shrverma Jul 26, 2023
5ce676d
Refactor code
shrverma Jul 26, 2023
b5ce529
Refactor code
shrverma Jul 26, 2023
ed0e9a4
Fix class not found error
shrverma Jul 26, 2023
36326f4
Merge pull request #653 from data-integrations/Directive-validation
shrverma Jul 31, 2023
bec9679
Remove extra function
shrverma Aug 1, 2023
39e360f
Merge pull request #651 from data-integrations/sql-temp
shrverma Aug 1, 2023
@@ -16,6 +16,9 @@

package io.cdap.wrangler.api;

import io.cdap.cdap.etl.api.relational.LinearRelationalTransform;
import io.cdap.cdap.etl.api.relational.Relation;
import io.cdap.cdap.etl.api.relational.RelationalTranformContext;
import io.cdap.wrangler.api.parser.UsageDefinition;

import java.util.List;
@@ -51,7 +54,8 @@
* }
* </code>
*/
- public interface Directive extends Executor<List<Row>, List<Row>>, EntityMetrics {
+ public interface Directive extends Executor<List<Row>, List<Row>>, EntityMetrics,
+ DirectiveRelationalTransform {
/**
* This defines an interface variable that is static and final to specify
* the {@code type} of the plugin this interface would provide.
@@ -126,4 +130,5 @@ default List<EntityCountMetric> getCountMetrics() {
// no op
return null;
}

}
@@ -0,0 +1,54 @@
/*
* Copyright © 2023 Cask Data, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License"); you may not
* use this file except in compliance with the License. You may obtain a copy of
* the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations under
* the License.
*/

package io.cdap.wrangler.api;

import io.cdap.cdap.etl.api.relational.InvalidRelation;
import io.cdap.cdap.etl.api.relational.LinearRelationalTransform;
import io.cdap.cdap.etl.api.relational.Relation;
import io.cdap.cdap.etl.api.relational.RelationalTranformContext;
import io.cdap.cdap.etl.api.relational.RelationalTransform;

/**
* {@link DirectiveRelationalTransform} provides relational transform support for
* wrangler directives.
*/
public interface DirectiveRelationalTransform extends LinearRelationalTransform {

/**
* Implementation of linear relational transform for each supported directive.
*
* @param relationalTranformContext transformation context with engine, input and output parameters
* @param relation input relation upon which the transformation is applied.
* @return transformed relation as the output relation. By default, returns an Invalid relation
* for unsupported directives.
*/
default Relation transform(RelationalTranformContext relationalTranformContext,
Relation relation) {
return new InvalidRelation("SQL execution for the directive is currently not supported.");
}

/**
* Indicates whether the directive is supported by relational transformation or not.
*
* @return boolean value for the directive SQL support.
* By default, returns false, indicating that the directive is currently not supported.
*/
default boolean isSQLSupported() {
return false;
}

}
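The opt-in contract of `DirectiveRelationalTransform` (unsupported by default, with each directive overriding `transform` and `isSQLSupported` to opt in) can be sketched without the CDAP dependencies. Everything below is a hypothetical stand-in written for illustration: `Rel`, `ValidRel`, `InvalidRel`, `SqlCapable`, `PassThrough`, `RowOnly` and `SqlSupportSketch` are not CDAP or Wrangler classes, and the way the caller chains steps is an assumption, not the PR's execution logic.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a relation; not io.cdap.cdap.etl.api.relational.Relation.
interface Rel {
  boolean isValid();
}

final class ValidRel implements Rel {
  public boolean isValid() { return true; }
}

final class InvalidRel implements Rel {
  final String reason;
  InvalidRel(String reason) { this.reason = reason; }
  public boolean isValid() { return false; }
}

// Mirrors the shape of DirectiveRelationalTransform: unsupported unless overridden.
interface SqlCapable {
  default Rel transform(Rel input) {
    return new InvalidRel("SQL execution for the directive is currently not supported.");
  }

  default boolean isSQLSupported() {
    return false;
  }
}

// A step that opts in by overriding both methods.
final class PassThrough implements SqlCapable {
  @Override public Rel transform(Rel input) { return input; }
  @Override public boolean isSQLSupported() { return true; }
}

// A step that inherits the defaults and therefore stays row-based only.
final class RowOnly implements SqlCapable { }

final class SqlSupportSketch {
  // True only if every step reports SQL support.
  static boolean allSupported(List<SqlCapable> steps) {
    return steps.stream().allMatch(SqlCapable::isSQLSupported);
  }

  // Applies steps in order and stops at the first invalid relation.
  static Rel apply(List<SqlCapable> steps, Rel input) {
    Rel current = input;
    for (SqlCapable step : steps) {
      current = step.transform(current);
      if (!current.isValid()) {
        return current;
      }
    }
    return current;
  }

  public static void main(String[] args) {
    List<SqlCapable> mixed = Arrays.asList(new PassThrough(), new RowOnly());
    System.out.println(allSupported(mixed));                   // prints false
    System.out.println(apply(mixed, new ValidRel()).isValid()); // prints false
  }
}
```

Checking `isSQLSupported()` up front lets a caller fall back to row-based execution before any relation is built, while the invalid-relation default still guards against a directive that is invoked anyway.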
@@ -0,0 +1,34 @@
/*
* Copyright © 2017-2019 Cask Data, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License"); you may not
* use this file except in compliance with the License. You may obtain a copy of
* the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
* License for the specific language governing permissions and limitations under
* the License.
*/

package io.cdap.wrangler.api;

import io.cdap.cdap.etl.api.relational.LinearRelationalTransform;
import io.cdap.cdap.etl.api.relational.Relation;
import io.cdap.cdap.etl.api.relational.RelationalTranformContext;

/**
* Directive interface which supports relational transformations.
*/
public interface RelationalDirective extends Directive, LinearRelationalTransform {

@Override
default Relation transform(RelationalTranformContext relationalTranformContext,
Relation relation) {
// no-op
return relation;
}
}
22 changes: 21 additions & 1 deletion wrangler-core/src/main/java/io/cdap/directives/column/Copy.java
@@ -19,6 +19,10 @@
import io.cdap.cdap.api.annotation.Description;
import io.cdap.cdap.api.annotation.Name;
import io.cdap.cdap.api.annotation.Plugin;
import io.cdap.cdap.etl.api.relational.ExpressionFactory;
import io.cdap.cdap.etl.api.relational.InvalidRelation;
import io.cdap.cdap.etl.api.relational.Relation;
import io.cdap.cdap.etl.api.relational.RelationalTranformContext;
import io.cdap.wrangler.api.Arguments;
import io.cdap.wrangler.api.Directive;
import io.cdap.wrangler.api.DirectiveExecutionException;
@@ -28,11 +32,11 @@
import io.cdap.wrangler.api.Row;
import io.cdap.wrangler.api.annotations.Categories;
import io.cdap.wrangler.api.lineage.Lineage;
import io.cdap.wrangler.api.lineage.Many;
import io.cdap.wrangler.api.lineage.Mutation;
import io.cdap.wrangler.api.parser.ColumnName;
import io.cdap.wrangler.api.parser.TokenType;
import io.cdap.wrangler.api.parser.UsageDefinition;
import io.cdap.wrangler.utils.SqlExpressionGenerator;

import java.util.List;

@@ -110,4 +114,20 @@ public Mutation lineage() {
.conditional(source.value(), destination.value())
.build();
}

@Override
public Relation transform(RelationalTranformContext relationalTranformContext,
Relation relation) {
java.util.Optional<ExpressionFactory<String>> expressionFactory = SqlExpressionGenerator
.getExpressionFactory(relationalTranformContext);
if (!expressionFactory.isPresent()) {
return new InvalidRelation("Cannot find an Expression Factory");
}
return relation.setColumn(destination.value(), expressionFactory.get().compile(source.value()));
}

@Override
public boolean isSQLSupported() {
return true;
}

}
@@ -19,6 +19,10 @@
import io.cdap.cdap.api.annotation.Description;
import io.cdap.cdap.api.annotation.Name;
import io.cdap.cdap.api.annotation.Plugin;
import io.cdap.cdap.etl.api.relational.ExpressionFactory;
import io.cdap.cdap.etl.api.relational.InvalidRelation;
import io.cdap.cdap.etl.api.relational.Relation;
import io.cdap.cdap.etl.api.relational.RelationalTranformContext;
import io.cdap.wrangler.api.Arguments;
import io.cdap.wrangler.api.Directive;
import io.cdap.wrangler.api.DirectiveExecutionException;
@@ -32,6 +36,7 @@
import io.cdap.wrangler.api.parser.ColumnNameList;
import io.cdap.wrangler.api.parser.TokenType;
import io.cdap.wrangler.api.parser.UsageDefinition;
import io.cdap.wrangler.utils.SqlExpressionGenerator;

import java.util.ArrayList;
import java.util.Arrays;
@@ -101,4 +106,18 @@ public Mutation lineage() {
.relation(Many.columns(columns), targetColumn)
.build();
}

@Override
public Relation transform(RelationalTranformContext relationalTranformContext,
Relation relation) {
java.util.Optional<ExpressionFactory<String>> expressionFactory = SqlExpressionGenerator
.getExpressionFactory(relationalTranformContext);
if (!expressionFactory.isPresent()) {
return new InvalidRelation("Cannot find an Expression Factory");
}
String getColumnString = String.join(",", columns);
return relation.setColumn(targetColumn, expressionFactory.get().compile(String
.format("struct(%s)", getColumnString)));
}

}
17 changes: 17 additions & 0 deletions wrangler-core/src/main/java/io/cdap/directives/column/Drop.java
@@ -19,11 +19,14 @@
import io.cdap.cdap.api.annotation.Description;
import io.cdap.cdap.api.annotation.Name;
import io.cdap.cdap.api.annotation.Plugin;
import io.cdap.cdap.etl.api.relational.Relation;
import io.cdap.cdap.etl.api.relational.RelationalTranformContext;
import io.cdap.wrangler.api.Arguments;
import io.cdap.wrangler.api.Directive;
import io.cdap.wrangler.api.DirectiveExecutionException;
import io.cdap.wrangler.api.DirectiveParseException;
import io.cdap.wrangler.api.ExecutorContext;
import io.cdap.wrangler.api.RelationalDirective;
import io.cdap.wrangler.api.Row;
import io.cdap.wrangler.api.annotations.Categories;
import io.cdap.wrangler.api.lineage.Lineage;
@@ -88,4 +91,18 @@ public Mutation lineage() {
.drop(Many.of(columns))
.build();
}

@Override
public Relation transform(RelationalTranformContext relationalTranformContext,
Relation relation) {
for (String col: columns) {
relation = relation.dropColumn(col);
}
return relation;
}

@Override
public boolean isSQLSupported() {
return true;
}
}
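The loop in `Drop.transform` reassigns `relation` on every iteration, which suggests `dropColumn` returns a new relation rather than mutating the receiver. A minimal sketch of that pattern, using a hypothetical immutable `ColumnSet` (not the CDAP `Relation` type) and a hypothetical `DropSketch.dropAll` helper:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable stand-in for a relation; not the CDAP Relation type.
final class ColumnSet {
  private final List<String> columns;

  ColumnSet(List<String> columns) {
    this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
  }

  // Returns a new ColumnSet without the given column; the receiver is unchanged.
  ColumnSet dropColumn(String name) {
    List<String> remaining = new ArrayList<>(columns);
    remaining.remove(name);
    return new ColumnSet(remaining);
  }

  List<String> columns() {
    return columns;
  }
}

final class DropSketch {
  // Same loop shape as Drop.transform: reassign so each drop builds on the previous one.
  static ColumnSet dropAll(ColumnSet relation, List<String> toDrop) {
    for (String col : toDrop) {
      relation = relation.dropColumn(col);
    }
    return relation;
  }

  public static void main(String[] args) {
    ColumnSet input = new ColumnSet(Arrays.asList("id", "name", "email"));
    ColumnSet output = dropAll(input, Arrays.asList("name", "email"));
    System.out.println(output.columns()); // prints [id]
    System.out.println(input.columns());  // prints [id, name, email]
  }
}
```

Without the reassignment, each call would start from the original relation and only the last drop would appear to take effect.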
26 changes: 26 additions & 0 deletions wrangler-core/src/main/java/io/cdap/directives/column/Keep.java
@@ -19,6 +19,11 @@
import io.cdap.cdap.api.annotation.Description;
import io.cdap.cdap.api.annotation.Name;
import io.cdap.cdap.api.annotation.Plugin;
import io.cdap.cdap.etl.api.relational.Expression;
import io.cdap.cdap.etl.api.relational.ExpressionFactory;
import io.cdap.cdap.etl.api.relational.InvalidRelation;
import io.cdap.cdap.etl.api.relational.Relation;
import io.cdap.cdap.etl.api.relational.RelationalTranformContext;
import io.cdap.wrangler.api.Arguments;
import io.cdap.wrangler.api.Directive;
import io.cdap.wrangler.api.DirectiveExecutionException;
@@ -32,9 +37,12 @@
import io.cdap.wrangler.api.parser.ColumnNameList;
import io.cdap.wrangler.api.parser.TokenType;
import io.cdap.wrangler.api.parser.UsageDefinition;
import io.cdap.wrangler.utils.SqlExpressionGenerator;

import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
@@ -93,4 +101,22 @@ public Mutation lineage() {
keep.forEach(column -> builder.relation(column, column));
return builder.build();
}

@Override
public Relation transform(RelationalTranformContext relationalTranformContext,
Relation relation) {
Optional<ExpressionFactory<String>> expressionFactory = SqlExpressionGenerator
.getExpressionFactory(relationalTranformContext);
if (!expressionFactory.isPresent()) {
return new InvalidRelation("Cannot find an Expression Factory");
}
Map<String, Expression> keepCol = SqlExpressionGenerator
.generateColumnExpMap(keep, expressionFactory.get());
return relation.select(keepCol);
}

@Override
public boolean isSQLSupported() {
return true;
}

}
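`Keep.transform` hands `relation.select` a map from column name to expression. The body of `SqlExpressionGenerator.generateColumnExpMap` is not part of this diff, so the following is only one plausible shape for such a helper: each kept column maps to an expression that selects the column itself. `KeepSketch`, `identityProjection` and the use of plain strings in place of `Expression` objects are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class KeepSketch {
  // Maps each kept column to an expression selecting that same column.
  // `compile` stands in for an expression factory; expressions are plain strings here.
  static Map<String, String> identityProjection(List<String> keep,
                                                Function<String, String> compile) {
    Map<String, String> projection = new LinkedHashMap<>();
    for (String column : keep) {
      projection.put(column, compile.apply(column));
    }
    return projection;
  }

  public static void main(String[] args) {
    Map<String, String> projection =
        identityProjection(Arrays.asList("id", "email"), name -> name);
    System.out.println(projection); // prints {id=id, email=email}
  }
}
```

Any column absent from the map is simply not selected, which is how a projection of this kind can express "keep only these columns".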
22 changes: 22 additions & 0 deletions wrangler-core/src/main/java/io/cdap/directives/column/Merge.java
@@ -19,6 +19,10 @@
import io.cdap.cdap.api.annotation.Description;
import io.cdap.cdap.api.annotation.Name;
import io.cdap.cdap.api.annotation.Plugin;
import io.cdap.cdap.etl.api.relational.ExpressionFactory;
import io.cdap.cdap.etl.api.relational.InvalidRelation;
import io.cdap.cdap.etl.api.relational.Relation;
import io.cdap.cdap.etl.api.relational.RelationalTranformContext;
import io.cdap.wrangler.api.Arguments;
import io.cdap.wrangler.api.Directive;
import io.cdap.wrangler.api.DirectiveExecutionException;
@@ -33,10 +37,12 @@
import io.cdap.wrangler.api.parser.Text;
import io.cdap.wrangler.api.parser.TokenType;
import io.cdap.wrangler.api.parser.UsageDefinition;
import io.cdap.wrangler.utils.SqlExpressionGenerator;
import org.apache.commons.lang3.StringEscapeUtils;

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/**
* A directive for merging two columns and creates a third column.
@@ -108,4 +114,20 @@ public Mutation lineage() {
.relation(Many.columns(col1, col2), Many.of(col1, col2, dest))
.build();
}

@Override
public Relation transform(RelationalTranformContext relationalTranformContext,
Relation relation) {
Optional<ExpressionFactory<String>> expressionFactory = SqlExpressionGenerator
.getExpressionFactory(relationalTranformContext);
if (!expressionFactory.isPresent()) {
return new InvalidRelation("Cannot find an Expression Factory");
}
return relation.setColumn(dest, expressionFactory.get()
.compile(String.format("CONCAT(%s,'%s',%s)", col1, delimiter, col2)));
}

@Override
public boolean isSQLSupported() {
return true;
}

}
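To make the expression string built in `Merge.transform` concrete, here is the same `String.format` call isolated in a hypothetical helper (`MergeExpressionSketch.mergeExpression` is not part of the PR). It assumes `col1`, `col2` and `delimiter` are plain strings, as their use in the format call suggests.

```java
// Hypothetical helper mirroring the String.format call in Merge.transform.
final class MergeExpressionSketch {
  static String mergeExpression(String col1, String delimiter, String col2) {
    return String.format("CONCAT(%s,'%s',%s)", col1, delimiter, col2);
  }

  public static void main(String[] args) {
    System.out.println(mergeExpression("first", " ", "last")); // prints CONCAT(first,' ',last)
    // A delimiter that contains a single quote is interpolated as-is.
    System.out.println(mergeExpression("first", "'", "last")); // prints CONCAT(first,''',last)
  }
}
```

The second call shows that the delimiter reaches the SQL string literal unescaped, so a delimiter containing a single quote may need escaping appropriate to the target SQL engine before it is compiled.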
23 changes: 23 additions & 0 deletions wrangler-core/src/main/java/io/cdap/directives/column/Rename.java
@@ -19,6 +19,10 @@
import io.cdap.cdap.api.annotation.Description;
import io.cdap.cdap.api.annotation.Name;
import io.cdap.cdap.api.annotation.Plugin;
import io.cdap.cdap.etl.api.relational.ExpressionFactory;
import io.cdap.cdap.etl.api.relational.InvalidRelation;
import io.cdap.cdap.etl.api.relational.Relation;
import io.cdap.cdap.etl.api.relational.RelationalTranformContext;
import io.cdap.wrangler.api.Arguments;
import io.cdap.wrangler.api.Directive;
import io.cdap.wrangler.api.DirectiveExecutionException;
@@ -31,8 +35,10 @@
import io.cdap.wrangler.api.parser.TokenType;
import io.cdap.wrangler.api.parser.UsageDefinition;
import io.cdap.wrangler.utils.ColumnConverter;
import io.cdap.wrangler.utils.SqlExpressionGenerator;

import java.util.List;
import java.util.Optional;

/**
* A directive for renaming columns.
@@ -82,4 +88,21 @@ public Mutation lineage() {
.relation(source, target)
.build();
}

@Override
public Relation transform(RelationalTranformContext relationalTranformContext,
Relation relation) {
Optional<ExpressionFactory<String>> expressionFactory = SqlExpressionGenerator
.getExpressionFactory(relationalTranformContext);
if (!expressionFactory.isPresent()) {
return new InvalidRelation("Cannot find an Expression Factory");
}
relation = relation.setColumn(target.value(), expressionFactory.get().compile(source.value()));
return relation.dropColumn(source.value());
}

@Override
public boolean isSQLSupported() {
return true;
}

}
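`Rename.transform` expresses a rename as two steps: set the target column to the source expression, then drop the source column. A minimal sketch of that composition, using a hypothetical `ExprRelation` that maps column names to expression strings (it is not the CDAP `Relation` type, and its column ordering is an artifact of the stand-in, not a claim about any SQL engine):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: column name -> SQL expression string. Not the CDAP Relation.
final class ExprRelation {
  private final Map<String, String> columns;

  ExprRelation(Map<String, String> columns) {
    this.columns = new LinkedHashMap<>(columns);
  }

  // Returns a copy with the column added or overwritten.
  ExprRelation setColumn(String name, String expression) {
    Map<String, String> next = new LinkedHashMap<>(columns);
    next.put(name, expression);
    return new ExprRelation(next);
  }

  // Returns a copy without the column.
  ExprRelation dropColumn(String name) {
    Map<String, String> next = new LinkedHashMap<>(columns);
    next.remove(name);
    return new ExprRelation(next);
  }

  Map<String, String> columns() {
    return new LinkedHashMap<>(columns);
  }
}

final class RenameSketch {
  // Rename expressed as "add target defined by source, then drop source".
  static ExprRelation rename(ExprRelation relation, String source, String target) {
    return relation.setColumn(target, source).dropColumn(source);
  }

  public static void main(String[] args) {
    Map<String, String> cols = new LinkedHashMap<>();
    cols.put("id", "id");
    cols.put("fname", "fname");
    ExprRelation renamed = rename(new ExprRelation(cols), "fname", "first_name");
    System.out.println(renamed.columns()); // prints {id=id, first_name=fname}
  }
}
```

In this stand-in, renaming a column to its own name would overwrite it and then drop it, so a caller composing rename this way may want to treat `source.equals(target)` as a no-op.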