diff --git a/flink-connector-aws/flink-connector-redshift/README.md b/flink-connector-aws/flink-connector-redshift/README.md
new file mode 100644
index 00000000..30fe503d
--- /dev/null
+++ b/flink-connector-aws/flink-connector-redshift/README.md
@@ -0,0 +1,121 @@
+# Flink Redshift Connector
+
+This is the initial proof of concept for the Flink Redshift connector. It supports two sink modes:
+
+- JDBC
+- COPY
+
+This POC only supports sink tables.
+
+## Connector Options
+| Option | Required | Default | Type | Description |
+|:-------|:---------|:---------|:-----|:------------|
+| hostname | required | none | String | Redshift connection hostname |
+| port | required | 5439 | Integer | Redshift connection port |
+| username | required | none | String | Redshift username |
+| password | required | none | String | Redshift user password |
+| database-name | required | dev | String | Redshift database to connect to |
+| table-name | required | none | String | Redshift table name |
+| sink.batch-size | optional | 1000 | Integer | Maximum number of buffered records; the sink flushes once this size is exceeded. |
+| sink.flush-interval | optional | 1s | Duration | Flush interval after which asynchronous threads flush buffered data. |
+| sink.max-retries | optional | 3 | Integer | Maximum number of retries when writing records to the database fails. |
+| copy-mode | required | false | Boolean | Whether to use the Redshift COPY command for insert/upsert. |
+| copy-temp-s3-uri | conditionally required | none | String | If copy-mode=true, the Redshift COPY command requires a temporary S3 URI. |
+| iam-role-arn | conditionally required | none | String | If copy-mode=true, the Redshift COPY command requires an IAM role with the necessary privileges attached to the Redshift cluster. |
+
+**Update/Delete Data Considerations:**
+Data is updated and deleted by the primary key.
+
+## Data Type Mapping
+
+| Flink Type | Redshift Type |
+|:--------------------|:--------------------------------------------------------|
+| CHAR | VARCHAR |
+| VARCHAR | VARCHAR |
+| STRING | VARCHAR |
+| BOOLEAN | Boolean |
+| BYTES | Not supported |
+| DECIMAL | Decimal |
+| TINYINT | Int8 |
+| SMALLINT | Int16 |
+| INTEGER | Int32 |
+| BIGINT | Int64 |
+| FLOAT | Float32 |
+| DOUBLE | Float64 |
+| DATE | Date |
+| TIME | Timestamp |
+| TIMESTAMP | Timestamp |
+| TIMESTAMP_LTZ | Timestamp |
+| INTERVAL_YEAR_MONTH | Int32 |
+| INTERVAL_DAY_TIME | Int64 |
+| ARRAY | Not supported |
+| MAP | Not supported |
+| ROW | Not supported |
+| MULTISET | Not supported |
+| RAW | Not supported |
+
+
+
+## How the POC Is Tested
+
+### Create and sink a table in pure JDBC mode
+
+```SQL
+
+-- register a Redshift table `t_user` in Flink SQL
+CREATE TABLE t_user (
+    `user_id` BIGINT,
+    `user_type` INTEGER,
+    `language` STRING,
+    `country` STRING,
+    `gender` STRING,
+    `score` DOUBLE,
+    PRIMARY KEY (`user_id`) NOT ENFORCED
+) WITH (
+    'connector' = 'redshift',
+    'hostname' = 'xxxx.redshift.amazonaws.com',
+    'port' = '5439',
+    'username' = 'awsuser',
+    'password' = 'passwordxxxx',
+    'database-name' = 'tutorial',
+    'table-name' = 'users',
+    'sink.batch-size' = '500',
+    'sink.flush-interval' = '1000',
+    'sink.max-retries' = '3'
+);
+
+-- write data into the Redshift table from the table `T`
+INSERT INTO t_user
+SELECT CAST(`user_id` AS BIGINT), `user_type`, `lang`, `country`, `gender`, `score` FROM T;
+
+```
+
+### Create and sink a table in COPY mode
+
+```SQL
+
+-- register a Redshift table `t_user` in Flink SQL
+CREATE TABLE t_user ( + `user_id` BIGINT, + `user_type` INTEGER, + `language` STRING, + `country` STRING, + `gender` STRING, + `score` DOUBLE, + PRIMARY KEY (`user_id`) NOT ENFORCED +) WITH ( + 'connector' = 'redshift', + 'hostname' = 'xxxx.redshift.awsamazon.com', + 'port' = '5439', + 'username' = 'awsuser', + 'password' = 'passwordxxxx', + 'database-name' = 'tutorial', + 'table-name' = 'users', + 'sink.batch-size' = '500', + 'sink.flush-interval' = '1000', + 'sink.max-retries' = '3', + 'copy-mode' = 'true', + 'copy-temp-s3-uri' = 's3://bucket-name/key/temp', + 'iam-role-arn' = 'arn:aws:iam::xxxxxxxx:role/xxxxxRedshiftS3Rolexxxxx' +); +``` diff --git a/flink-connector-aws/flink-connector-redshift/pom.xml b/flink-connector-aws/flink-connector-redshift/pom.xml new file mode 100644 index 00000000..e7749c53 --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/pom.xml @@ -0,0 +1,125 @@ + + + + + 4.0.0 + + + org.apache.flink + flink-connector-aws-parent + 4.3-SNAPSHOT + + + flink-connector-redshift + Flink : Connectors : AWS : Amazon Redshift + + + 2.1.0.17 + 1.2 + 1.10.0 + 2.12 + + + jar + + + + org.apache.flink + flink-connector-aws-base + ${parent.version} + provided + + + + + + + + + + + + + + + + + + + + + + + + + + + + org.apache.flink + flink-table-common + ${flink.version} + provided + + + + + + + + + + + + com.amazon.redshift + redshift-jdbc42 + ${redshift.jdbc.version} + provided + + + + org.apache.commons + commons-csv + ${commons-csv.version} + + + + com.amazonaws + aws-java-sdk-core + ${aws.sdkv1.version} + + + com.amazonaws + aws-java-sdk-s3 + ${aws.sdkv1.version} + + + + + + commons-logging + commons-logging + ${commons-logging.version} + + + + + \ No newline at end of file diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/connection/RedshiftConnectionProvider.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/connection/RedshiftConnectionProvider.java new file mode 100644 index 00000000..a834b632 --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/connection/RedshiftConnectionProvider.java @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.redshift.connection; + +import org.apache.flink.connector.redshift.options.RedshiftOptions; + +import com.amazon.redshift.jdbc.RedshiftConnectionImpl; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.Serializable; +import java.sql.DriverManager; +import java.sql.SQLException; + +/** Redshift Connection Provider. 
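+ * Lazily creates and caches a single JDBC connection to the configured Redshift cluster,
+ * using username/password credentials when a username is provided.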
*/ +public class RedshiftConnectionProvider implements Serializable { + private static final long serialVersionUID = 1L; + + static final Logger LOG = LoggerFactory.getLogger(RedshiftConnectionProvider.class); + + private static final String REDSHIFT_DRIVER_NAME = "com.amazon.redshift.Driver"; + + private transient RedshiftConnectionImpl connection; + + private final RedshiftOptions options; + + public RedshiftConnectionProvider(RedshiftOptions options) { + this.options = options; + } + + public synchronized RedshiftConnectionImpl getConnection() throws SQLException { + if (connection == null) { + connection = + createConnection( + options.getHostname(), options.getPort(), options.getDatabaseName()); + } + return connection; + } + + private RedshiftConnectionImpl createConnection(String hostname, int port, String dbName) + throws SQLException { + // String url = parseUrl(urls); + + RedshiftConnectionImpl conn; + String url = "jdbc:redshift://" + hostname + ":" + port + "/" + dbName; + LOG.info("connection to {}", url); + + try { + Class.forName(REDSHIFT_DRIVER_NAME); + } catch (ClassNotFoundException e) { + throw new SQLException(e); + } + + if (options.getUsername().isPresent()) { + conn = + (RedshiftConnectionImpl) + DriverManager.getConnection( + url, + options.getUsername().orElse(null), + options.getPassword().orElse(null)); + } else { + conn = (RedshiftConnectionImpl) DriverManager.getConnection(url); + } + + return conn; + } + + public void closeConnection() throws SQLException { + if (this.connection != null) { + this.connection.close(); + } + } + + public RedshiftConnectionImpl getOrCreateConnection() throws SQLException { + if (connection == null) { + connection = + createConnection( + options.getHostname(), options.getPort(), options.getDatabaseName()); + } + return connection; + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/converter/RedshiftConverterUtils.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/converter/RedshiftConverterUtils.java new file mode 100644 index 00000000..f1930c50 --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/converter/RedshiftConverterUtils.java @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.connector.redshift.converter; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.table.data.ArrayData; +import org.apache.flink.table.data.DecimalData; +import org.apache.flink.table.data.GenericArrayData; +import org.apache.flink.table.data.GenericMapData; +import org.apache.flink.table.data.MapData; +import org.apache.flink.table.data.StringData; +import org.apache.flink.table.data.TimestampData; +import org.apache.flink.table.types.logical.ArrayType; +import org.apache.flink.table.types.logical.DecimalType; +import org.apache.flink.table.types.logical.LogicalType; +import org.apache.flink.table.types.logical.MapType; + +import java.math.BigDecimal; +import java.math.BigInteger; +import java.sql.Array; +import java.sql.Date; +import java.sql.SQLException; +import java.sql.Time; +import java.sql.Timestamp; +import java.time.LocalDate; +import java.time.LocalDateTime; +import java.time.LocalTime; +import java.util.HashMap; +import java.util.Map; + +/** Converter utility between Flink Rich DataTypes and Redshift DataTypes. */ +@Internal +public class RedshiftConverterUtils { + public static final int BOOL_TRUE = 1; + + /** + * Converts Flink RichDatatype to Redshift DataType. + * + * @param value Associated Value. + * @param type flink LogicalType for the field. + * @return Datatype of Redshift. + */ + public static Object toExternal(Object value, LogicalType type) { + switch (type.getTypeRoot()) { + case BOOLEAN: + case TINYINT: + case SMALLINT: + case INTEGER: + case INTERVAL_YEAR_MONTH: + case BIGINT: + case INTERVAL_DAY_TIME: + case FLOAT: + case DOUBLE: + case BINARY: + case VARBINARY: + return value; + case CHAR: + case VARCHAR: + return value.toString(); + case DATE: + return Date.valueOf(LocalDate.ofEpochDay((Integer) value)); + case TIME_WITHOUT_TIME_ZONE: + LocalTime localTime = LocalTime.ofNanoOfDay(((Integer) value) * 1_000_000L); + return toEpochDayOneTimestamp(localTime); + case TIMESTAMP_WITH_TIME_ZONE: + case TIMESTAMP_WITHOUT_TIME_ZONE: + return ((TimestampData) value).toTimestamp(); + case TIMESTAMP_WITH_LOCAL_TIME_ZONE: + return Timestamp.from(((TimestampData) value).toInstant()); + case DECIMAL: + return ((DecimalData) value).toBigDecimal(); + case ARRAY: + LogicalType elementType = + ((ArrayType) type) + .getChildren().stream() + .findFirst() + .orElseThrow( + () -> + new RuntimeException( + "Unknown array element type")); + ArrayData.ElementGetter elementGetter = ArrayData.createElementGetter(elementType); + ArrayData arrayData = ((ArrayData) value); + Object[] objectArray = new Object[arrayData.size()]; + for (int i = 0; i < arrayData.size(); i++) { + objectArray[i] = + toExternal(elementGetter.getElementOrNull(arrayData, i), elementType); + } + return objectArray; + case MAP: + LogicalType keyType = ((MapType) type).getKeyType(); + LogicalType valueType = ((MapType) type).getValueType(); + ArrayData.ElementGetter keyGetter = ArrayData.createElementGetter(keyType); + ArrayData.ElementGetter valueGetter = ArrayData.createElementGetter(valueType); + MapData mapData = (MapData) value; + ArrayData keyArrayData = mapData.keyArray(); + ArrayData valueArrayData = mapData.valueArray(); + Map objectMap = new HashMap<>(keyArrayData.size()); + for (int i = 0; i < keyArrayData.size(); i++) { + objectMap.put( + toExternal(keyGetter.getElementOrNull(keyArrayData, i), keyType), + toExternal(valueGetter.getElementOrNull(valueArrayData, i), valueType)); + } + return objectMap; + case MULTISET: + case ROW: + case RAW: + default: + 
throw new UnsupportedOperationException("Unsupported type:" + type); + } + } + + public static Object toInternal(Object value, LogicalType type) throws SQLException { + switch (type.getTypeRoot()) { + case NULL: + return null; + case BOOLEAN: + return BOOL_TRUE == ((Number) value).intValue(); + case FLOAT: + case DOUBLE: + case INTERVAL_YEAR_MONTH: + case INTERVAL_DAY_TIME: + case INTEGER: + case BIGINT: + case BINARY: + case VARBINARY: + return value; + case TINYINT: + return ((Integer) value).byteValue(); + case SMALLINT: + return value instanceof Integer ? ((Integer) value).shortValue() : value; + case DECIMAL: + final int precision = ((DecimalType) type).getPrecision(); + final int scale = ((DecimalType) type).getScale(); + return value instanceof BigInteger + ? DecimalData.fromBigDecimal( + new BigDecimal((BigInteger) value, 0), precision, scale) + : DecimalData.fromBigDecimal((BigDecimal) value, precision, scale); + case DATE: + return (int) (((Date) value).toLocalDate().toEpochDay()); + case TIME_WITHOUT_TIME_ZONE: + return (int) (((Time) value).toLocalTime().toNanoOfDay() / 1_000_000L); + case TIMESTAMP_WITH_TIME_ZONE: + case TIMESTAMP_WITHOUT_TIME_ZONE: + return TimestampData.fromTimestamp((Timestamp) value); + case TIMESTAMP_WITH_LOCAL_TIME_ZONE: + return TimestampData.fromInstant(((Timestamp) value).toInstant()); + case CHAR: + case VARCHAR: + return StringData.fromString((String) value); + case ARRAY: + LogicalType elementType = + type.getChildren().stream() + .findFirst() + .orElseThrow( + () -> new RuntimeException("Unknown array element type")); + Object externalArray = ((Array) value).getArray(); + int externalArrayLength = java.lang.reflect.Array.getLength(externalArray); + Object[] internalArray = new Object[externalArrayLength]; + for (int i = 0; i < externalArrayLength; i++) { + internalArray[i] = + toInternal(java.lang.reflect.Array.get(externalArray, i), elementType); + } + return new GenericArrayData(internalArray); + case MAP: + LogicalType keyType = ((MapType) type).getKeyType(); + LogicalType valueType = ((MapType) type).getValueType(); + Map externalMap = (Map) value; + Map internalMap = new HashMap<>(externalMap.size()); + for (Map.Entry entry : externalMap.entrySet()) { + internalMap.put( + toInternal(entry.getKey(), keyType), + toInternal(entry.getValue(), valueType)); + } + return new GenericMapData(internalMap); + case ROW: + case MULTISET: + case RAW: + default: + throw new UnsupportedOperationException("Unsupported type:" + type); + } + } + + public static Timestamp toEpochDayOneTimestamp(LocalTime localTime) { + LocalDateTime localDateTime = localTime.atDate(LocalDate.ofEpochDay(1)); + return Timestamp.valueOf(localDateTime); + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/converter/RedshiftCopyRowConverter.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/converter/RedshiftCopyRowConverter.java new file mode 100644 index 00000000..c7c9dc17 --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/converter/RedshiftCopyRowConverter.java @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.redshift.converter; + +import org.apache.flink.table.data.RowData; +import org.apache.flink.table.types.logical.DecimalType; +import org.apache.flink.table.types.logical.LogicalType; +import org.apache.flink.table.types.logical.TimestampType; + +import java.io.Serializable; +import java.time.LocalDate; +import java.time.format.DateTimeFormatter; +import java.util.ArrayList; + +/** Copy Converter Utils. */ +public class RedshiftCopyRowConverter implements Serializable { + + private final LogicalType[] fieldTypes; + + public RedshiftCopyRowConverter(LogicalType[] fieldTypes) { + this.fieldTypes = fieldTypes; + } + + public String[] toExternal(RowData rowData) { + ArrayList csvLine = new ArrayList<>(); + for (int index = 0; index < rowData.getArity(); index++) { + LogicalType type = fieldTypes[index]; + String val = ""; + switch (type.getTypeRoot()) { + case BOOLEAN: + val = Boolean.toString(rowData.getBoolean(index)); + break; + case FLOAT: + val = String.valueOf(rowData.getFloat(index)); + break; + case DOUBLE: + val = String.valueOf(rowData.getDouble(index)); + break; + case INTERVAL_YEAR_MONTH: + case INTEGER: + val = String.valueOf(rowData.getInt(index)); + break; + case INTERVAL_DAY_TIME: + case BIGINT: + val = String.valueOf(rowData.getLong(index)); + break; + case TINYINT: + case SMALLINT: + case CHAR: + case VARCHAR: + val = rowData.getString(index).toString(); + break; + case BINARY: + case VARBINARY: + case DATE: + val = + LocalDate.ofEpochDay(rowData.getInt(index)) + .format(DateTimeFormatter.ISO_DATE); + break; + case TIME_WITHOUT_TIME_ZONE: + case TIMESTAMP_WITH_TIME_ZONE: + case TIMESTAMP_WITHOUT_TIME_ZONE: + final int timestampPrecision = ((TimestampType) type).getPrecision(); + final DateTimeFormatter timeFormaterr = + DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS"); + val = + rowData.getTimestamp(index, timestampPrecision) + .toLocalDateTime() + .format(timeFormaterr); + break; + case TIMESTAMP_WITH_LOCAL_TIME_ZONE: + + case DECIMAL: + final int decimalPrecision = ((DecimalType) type).getPrecision(); + final int decimalScale = ((DecimalType) type).getScale(); + val = String.valueOf(rowData.getDecimal(index, decimalPrecision, decimalScale)); + break; + case ARRAY: + + case MAP: + case MULTISET: + case ROW: + case RAW: + default: + throw new UnsupportedOperationException("Unsupported type:" + type); + } + csvLine.add(val); + } + return csvLine.toArray(new String[0]); + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/converter/RedshiftRowConverter.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/converter/RedshiftRowConverter.java new file mode 100644 index 00000000..1cfb7aeb --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/converter/RedshiftRowConverter.java @@ -0,0 
+1,239 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.redshift.converter; + +import org.apache.flink.table.data.DecimalData; +import org.apache.flink.table.data.GenericRowData; +import org.apache.flink.table.data.RowData; +import org.apache.flink.table.data.StringData; +import org.apache.flink.table.data.TimestampData; +import org.apache.flink.table.types.logical.DecimalType; +import org.apache.flink.table.types.logical.LocalZonedTimestampType; +import org.apache.flink.table.types.logical.LogicalType; +import org.apache.flink.table.types.logical.RowType; +import org.apache.flink.table.types.logical.RowType.RowField; +import org.apache.flink.table.types.logical.TimestampType; +import org.apache.flink.util.Preconditions; + +import com.amazon.redshift.jdbc.RedshiftPreparedStatement; + +import java.io.Serializable; +import java.math.BigDecimal; +import java.math.BigInteger; +import java.sql.Date; +import java.sql.ResultSet; +import java.sql.SQLException; +import java.sql.Time; +import java.sql.Timestamp; +import java.time.LocalDate; +import java.time.LocalTime; +import java.util.UUID; + +/** Redshift Row Converter. 
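+ * Converts between Flink internal RowData and JDBC objects for Redshift prepared statements
+ * and result sets, using per-field converters derived from the row type.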
*/ +public class RedshiftRowConverter implements Serializable { + private static final long serialVersionUID = 1L; + + private RowType rowType; + + private final DeserializationConverter[] toInternalConverters; + + private final SerializationConverter[] toExternalConverters; + + public RedshiftRowConverter(RowType rowType) { + this.rowType = Preconditions.checkNotNull(rowType); + LogicalType[] logicalTypes = + rowType.getFields().stream().map(RowField::getType).toArray(LogicalType[]::new); + this.toInternalConverters = new DeserializationConverter[rowType.getFieldCount()]; + this.toExternalConverters = new SerializationConverter[rowType.getFieldCount()]; + + for (int i = 0; i < rowType.getFieldCount(); i++) { + this.toInternalConverters[i] = createToInternalConverter(rowType.getTypeAt(i)); + this.toExternalConverters[i] = createToExternalConverter(logicalTypes[i]); + } + } + + public RowData toInternal(ResultSet resultSet) throws SQLException { + GenericRowData genericRowData = new GenericRowData(rowType.getFieldCount()); + for (int pos = 0; pos < rowType.getFieldCount(); pos++) { + Object field = resultSet.getObject(pos + 1); + if (field != null) { + genericRowData.setField(pos, toInternalConverters[pos].deserialize(field)); + } else { + genericRowData.setField(pos, null); + } + } + return genericRowData; + } + + public void toExternal(RowData rowData, RedshiftPreparedStatement insertStatement) + throws SQLException { + for (int index = 0; index < rowData.getArity(); index++) { + if (!rowData.isNullAt(index)) { + toExternalConverters[index].serialize(rowData, index, insertStatement); + } else { + insertStatement.setObject(index + 1, null); + } + } + } + + protected RedshiftRowConverter.DeserializationConverter createToInternalConverter( + LogicalType type) { + switch (type.getTypeRoot()) { + case NULL: + return val -> null; + case BOOLEAN: + return val -> RedshiftConverterUtils.BOOL_TRUE == ((Number) val).intValue(); + case FLOAT: + case DOUBLE: + case INTERVAL_YEAR_MONTH: + case INTERVAL_DAY_TIME: + case INTEGER: + case BIGINT: + case BINARY: + case VARBINARY: + return val -> val; + case TINYINT: + return val -> ((Integer) val).byteValue(); + case SMALLINT: + return val -> val instanceof Integer ? ((Integer) val).shortValue() : val; + case DECIMAL: + final int precision = ((DecimalType) type).getPrecision(); + final int scale = ((DecimalType) type).getScale(); + return val -> + val instanceof BigInteger + ? DecimalData.fromBigDecimal( + new BigDecimal((BigInteger) val, 0), precision, scale) + : DecimalData.fromBigDecimal((BigDecimal) val, precision, scale); + case DATE: + return val -> (int) ((Date) val).toLocalDate().toEpochDay(); + case TIME_WITHOUT_TIME_ZONE: + return val -> (int) (((Time) val).toLocalTime().toNanoOfDay() / 1_000_000L); + case TIMESTAMP_WITH_TIME_ZONE: + case TIMESTAMP_WITHOUT_TIME_ZONE: + return val -> TimestampData.fromTimestamp((Timestamp) val); + case TIMESTAMP_WITH_LOCAL_TIME_ZONE: + return val -> TimestampData.fromInstant(((Timestamp) val).toInstant()); + case CHAR: + case VARCHAR: + return val -> + val instanceof UUID + ? 
StringData.fromString(val.toString()) + : StringData.fromString((String) val); + case ARRAY: + case MAP: + return val -> RedshiftConverterUtils.toInternal(val, type); + case ROW: + case MULTISET: + case RAW: + default: + throw new UnsupportedOperationException("Unsupported type:" + type); + } + } + + protected RedshiftRowConverter.SerializationConverter createToExternalConverter( + LogicalType type) { + switch (type.getTypeRoot()) { + case BOOLEAN: + return (val, index, statement) -> + statement.setBoolean(index + 1, val.getBoolean(index)); + case FLOAT: + return (val, index, statement) -> + statement.setFloat(index + 1, val.getFloat(index)); + case DOUBLE: + return (val, index, statement) -> + statement.setDouble(index + 1, val.getDouble(index)); + case INTERVAL_YEAR_MONTH: + case INTEGER: + return (val, index, statement) -> statement.setInt(index + 1, val.getInt(index)); + case INTERVAL_DAY_TIME: + case BIGINT: + return (val, index, statement) -> statement.setLong(index + 1, val.getLong(index)); + case TINYINT: + return (val, index, statement) -> statement.setByte(index + 1, val.getByte(index)); + case SMALLINT: + return (val, index, statement) -> + statement.setShort(index + 1, val.getShort(index)); + case CHAR: + case VARCHAR: + // value is BinaryString + return (val, index, statement) -> + statement.setString(index + 1, val.getString(index).toString()); + case BINARY: + case VARBINARY: + return (val, index, statement) -> + statement.setBytes(index + 1, val.getBinary(index)); + case DATE: + return (val, index, statement) -> + statement.setDate( + index + 1, Date.valueOf(LocalDate.ofEpochDay(val.getInt(index)))); + case TIME_WITHOUT_TIME_ZONE: + return (val, index, statement) -> { + LocalTime localTime = LocalTime.ofNanoOfDay(val.getInt(index) * 1_000_000L); + statement.setTimestamp( + index + 1, RedshiftConverterUtils.toEpochDayOneTimestamp(localTime)); + }; + case TIMESTAMP_WITH_TIME_ZONE: + case TIMESTAMP_WITHOUT_TIME_ZONE: + final int timestampPrecision = ((TimestampType) type).getPrecision(); + return (val, index, statement) -> + statement.setTimestamp( + index + 1, + val.getTimestamp(index, timestampPrecision).toTimestamp()); + case TIMESTAMP_WITH_LOCAL_TIME_ZONE: + final int localZonedTimestampPrecision = + ((LocalZonedTimestampType) type).getPrecision(); + return (val, index, statement) -> + statement.setTimestamp( + index + 1, + Timestamp.from( + val.getTimestamp(index, localZonedTimestampPrecision) + .toInstant())); + case DECIMAL: + final int decimalPrecision = ((DecimalType) type).getPrecision(); + final int decimalScale = ((DecimalType) type).getScale(); + return (val, index, statement) -> + statement.setBigDecimal( + index + 1, + val.getDecimal(index, decimalPrecision, decimalScale) + .toBigDecimal()); + case MAP: + return (val, index, statement) -> + statement.setObject( + index + 1, + RedshiftConverterUtils.toExternal(val.getMap(index), type)); + case MULTISET: + case ROW: + case RAW: + default: + throw new UnsupportedOperationException("Unsupported type:" + type); + } + } + + @FunctionalInterface + interface SerializationConverter extends Serializable { + /** Convert an internal field to java object and fill into. */ + void serialize(RowData rowData, int index, RedshiftPreparedStatement statement) + throws SQLException; + } + + @FunctionalInterface + interface DeserializationConverter extends Serializable { + /** Convert an object to the internal data structure object. 
*/ + Object deserialize(Object field) throws SQLException; + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftBatchExecutor.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftBatchExecutor.java new file mode 100644 index 00000000..23d5eb7e --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftBatchExecutor.java @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.redshift.executor; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.api.common.functions.RuntimeContext; +import org.apache.flink.connector.redshift.connection.RedshiftConnectionProvider; +import org.apache.flink.connector.redshift.converter.RedshiftRowConverter; +import org.apache.flink.connector.redshift.options.RedshiftOptions; +import org.apache.flink.table.data.RowData; + +import com.amazon.redshift.jdbc.RedshiftConnectionImpl; +import com.amazon.redshift.jdbc.RedshiftPreparedStatement; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.SQLException; + +/** Redshift Batch Executor for COPY Mode. 
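+ * Appends INSERT records to a prepared statement and flushes them as a batch,
+ * retrying failed flushes up to the configured maximum number of retries.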
*/ +@Internal +public class RedshiftBatchExecutor implements RedshiftExecutor { + + private static final long serialVersionUID = 1L; + + private static final Logger LOG = LoggerFactory.getLogger(RedshiftBatchExecutor.class); + + private final String insertSql; + + private final RedshiftRowConverter converter; + + private final int maxRetries; + + private transient RedshiftPreparedStatement statement; + + private transient RedshiftConnectionProvider connectionProvider; + + public RedshiftBatchExecutor( + String insertSql, RedshiftRowConverter converter, RedshiftOptions options) { + this.insertSql = insertSql; + this.converter = converter; + this.maxRetries = options.getMaxRetries(); + } + + @Override + public void prepareStatement(RedshiftConnectionImpl connection) throws SQLException { + statement = (RedshiftPreparedStatement) connection.prepareStatement(insertSql); + } + + @Override + public void prepareStatement(RedshiftConnectionProvider connectionProvider) + throws SQLException { + this.connectionProvider = connectionProvider; + prepareStatement(connectionProvider.getOrCreateConnection()); + } + + @Override + public void setRuntimeContext(RuntimeContext context) {} + + @Override + public void addToBatch(RowData record) throws SQLException { + switch (record.getRowKind()) { + case INSERT: + converter.toExternal(record, (RedshiftPreparedStatement) statement); + statement.addBatch(); + break; + case UPDATE_AFTER: + case DELETE: + case UPDATE_BEFORE: + break; + default: + throw new UnsupportedOperationException( + String.format( + "Unknown row kind, the supported row kinds is: INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE, but get: %s.", + record.getRowKind())); + } + } + + @Override + public void executeBatch() throws SQLException { + attemptExecuteBatch(statement, maxRetries); + } + + @Override + public void closeStatement() { + if (statement != null) { + try { + statement.close(); + } catch (SQLException exception) { + LOG.warn("Redshift batch statement could not be closed.", exception); + } finally { + statement = null; + } + } + } + + @Override + public String toString() { + return "RedshiftBatchExecutor{" + + "insertSql='" + + insertSql + + '\'' + + ", maxRetries=" + + maxRetries + + ", connectionProvider=" + + connectionProvider + + '}'; + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftExecutor.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftExecutor.java new file mode 100644 index 00000000..9b84dfb1 --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftExecutor.java @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.redshift.executor; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.api.common.functions.RuntimeContext; +import org.apache.flink.connector.redshift.connection.RedshiftConnectionProvider; +import org.apache.flink.connector.redshift.converter.RedshiftRowConverter; +import org.apache.flink.connector.redshift.options.RedshiftOptions; +import org.apache.flink.connector.redshift.statement.RedshiftStatementFactory; +import org.apache.flink.table.data.GenericRowData; +import org.apache.flink.table.data.RowData; +import org.apache.flink.table.types.logical.LogicalType; +import org.apache.flink.table.types.logical.RowType; + +import com.amazon.redshift.jdbc.RedshiftConnectionImpl; +import com.amazon.redshift.jdbc.RedshiftPreparedStatement; +import org.apache.commons.lang3.ArrayUtils; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.Serializable; +import java.sql.SQLException; +import java.util.Arrays; +import java.util.function.Function; +import java.util.stream.IntStream; + +import static org.apache.flink.table.data.RowData.createFieldGetter; + +/** Redshift Executor. */ +@Internal +public interface RedshiftExecutor extends Serializable { + Logger LOG = LoggerFactory.getLogger(RedshiftUpsertExecutor.class); + + void prepareStatement(RedshiftConnectionImpl connection) throws SQLException; + + void prepareStatement(RedshiftConnectionProvider connectionProvider) throws SQLException; + + void setRuntimeContext(RuntimeContext context); + + void addToBatch(RowData rowData) throws SQLException; + + void executeBatch() throws SQLException; + + void closeStatement(); + + default void attemptExecuteBatch(RedshiftPreparedStatement stmt, int maxRetries) + throws SQLException { + attemptExecuteBatch(stmt, maxRetries, true); + } + + default void attemptExecuteBatch( + RedshiftPreparedStatement stmt, int maxRetries, Boolean batchMode) throws SQLException { + for (int i = 0; i <= maxRetries; i++) { + try { + if (batchMode) { + stmt.executeBatch(); + } else { + stmt.execute(); + } + + return; + } catch (Exception exception) { + LOG.error("Redshift executeBatch error, retry times = {}", i, exception); + if (i >= maxRetries) { + throw new SQLException( + String.format( + "Attempt to execute batch failed, exhausted retry times = %d", + maxRetries), + exception); + } + try { + Thread.sleep(1000L * i); + } catch (InterruptedException ex) { + Thread.currentThread().interrupt(); + throw new SQLException( + "Unable to flush; interrupted while doing another attempt", ex); + } + } + } + } + + static RedshiftExecutor createRedshiftExecutor( + String[] fieldNames, + String[] keyFields, + LogicalType[] fieldTypes, + RedshiftOptions options) { + if (keyFields.length > 0) { + switch (options.getSinkMode()) { + case COPY: + LOG.info("Creating COPY Mode UPSERT Executor."); + return createUploadUpsertExecutor(fieldNames, keyFields, fieldTypes, options); + case JDBC: + LOG.info("Creating JDBC Mode UPSERT Executor."); + return createUpsertExecutor(fieldNames, keyFields, fieldTypes, options); + default: + throw new IllegalArgumentException( + "Sink Mode " + + options.getSinkMode() + + " not recognised. 
" + + "Flink Connector Redshift Supports only JDBC / COPY mode."); + } + + } else { + switch (options.getSinkMode()) { + case COPY: + LOG.info("Creating COPY Mode batch Executor."); + return createUploadBatchExecutor(fieldNames, fieldTypes, options); + case JDBC: + LOG.info("Creating JDBC Mode batch Executor."); + return createBatchExecutor(fieldNames, fieldTypes, options); + default: + throw new IllegalArgumentException( + "Sink Mode " + + options.getSinkMode() + + " not recognised. " + + "Flink Connector Redshift Supports only JDBC / COPY mode."); + } + } + } + + /** + * + * @param fieldNames field names. + * @param fieldTypes dataTypes for the field. + * @param options Redshift options. + * @return { @RedshiftUploadBatchExecutor } executor. + */ + static RedshiftUploadBatchExecutor createUploadBatchExecutor( + final String[] fieldNames, + final LogicalType[] fieldTypes, + final RedshiftOptions options) { + return new RedshiftUploadBatchExecutor(fieldNames, fieldTypes, options); + } + + static RedshiftUploadUpsertExecutor createUploadUpsertExecutor( + String[] fieldNames, + String[] keyFields, + LogicalType[] fieldTypes, + RedshiftOptions options) { + int[] delFields = + Arrays.stream(keyFields) + .mapToInt(pk -> ArrayUtils.indexOf(fieldNames, pk)) + .toArray(); + LogicalType[] delTypes = + Arrays.stream(delFields).mapToObj(f -> fieldTypes[f]).toArray(LogicalType[]::new); + + return new RedshiftUploadUpsertExecutor( + fieldNames, + keyFields, + fieldTypes, + new RedshiftRowConverter(RowType.of(delTypes)), + createExtractor(fieldTypes, delFields), + options); + } + + static RedshiftBatchExecutor createBatchExecutor( + String[] fieldNames, LogicalType[] fieldTypes, RedshiftOptions options) { + String insertSql = + RedshiftStatementFactory.getInsertIntoStatement(options.getTableName(), fieldNames); + RedshiftRowConverter converter = new RedshiftRowConverter(RowType.of(fieldTypes)); + return new RedshiftBatchExecutor(insertSql, converter, options); + } + + static RedshiftUpsertExecutor createUpsertExecutor( + String[] fieldNames, + String[] keyFields, + LogicalType[] fieldTypes, + RedshiftOptions options) { + String tableName = options.getTableName(); + String insertSql = RedshiftStatementFactory.getInsertIntoStatement(tableName, fieldNames); + String updateSql = + RedshiftStatementFactory.getUpdateStatement(tableName, fieldNames, keyFields); + String deleteSql = RedshiftStatementFactory.getDeleteStatement(tableName, keyFields); + + // Re-sort the order of fields to fit the sql statement. 
+ int[] delFields = + Arrays.stream(keyFields) + .mapToInt(pk -> ArrayUtils.indexOf(fieldNames, pk)) + .toArray(); + int[] updatableFields = + IntStream.range(0, fieldNames.length) + .filter(idx -> !ArrayUtils.contains(keyFields, fieldNames[idx])) + .toArray(); + int[] updFields = ArrayUtils.addAll(updatableFields, delFields); + + LogicalType[] delTypes = + Arrays.stream(delFields).mapToObj(f -> fieldTypes[f]).toArray(LogicalType[]::new); + LogicalType[] updTypes = + Arrays.stream(updFields).mapToObj(f -> fieldTypes[f]).toArray(LogicalType[]::new); + + return new RedshiftUpsertExecutor( + insertSql, + updateSql, + deleteSql, + new RedshiftRowConverter(RowType.of(fieldTypes)), + new RedshiftRowConverter(RowType.of(updTypes)), + new RedshiftRowConverter(RowType.of(delTypes)), + createExtractor(fieldTypes, updFields), + createExtractor(fieldTypes, delFields), + options); + } + + static Function createExtractor(LogicalType[] logicalTypes, int[] fields) { + final RowData.FieldGetter[] fieldGetters = new RowData.FieldGetter[fields.length]; + for (int i = 0; i < fields.length; i++) { + fieldGetters[i] = createFieldGetter(logicalTypes[fields[i]], fields[i]); + } + + return row -> { + GenericRowData rowData = new GenericRowData(row.getRowKind(), fieldGetters.length); + for (int i = 0; i < fieldGetters.length; i++) { + rowData.setField(i, fieldGetters[i].getFieldOrNull(row)); + } + return rowData; + }; + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftS3Util.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftS3Util.java new file mode 100644 index 00000000..55e25e6e --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftS3Util.java @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.redshift.executor; + +import org.apache.flink.annotation.Internal; + +import com.amazonaws.services.s3.AmazonS3; +import com.amazonaws.services.s3.AmazonS3URI; +import org.apache.commons.csv.CSVFormat; +import org.apache.commons.csv.CSVPrinter; + +import java.io.IOException; +import java.io.OutputStreamWriter; +import java.io.Serializable; +import java.util.List; +import java.util.Optional; +import java.util.UUID; + +/** S3 Utils for Redshift for COPY Mode. 
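+ * Helpers that write CSV rows to a temporary S3 object, delete the object after use,
+ * and derive unique per-batch object keys under the configured temporary S3 URI.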
*/ +@Internal +public class RedshiftS3Util implements Serializable { + + private static final long serialVersionUID = 1L; + + public static void s3OutputCsv(AmazonS3 s3Client, String s3Uri, List csvData) + throws IOException { + try { + String bucket = getS3Parts(s3Uri)[0]; + String key = getS3Parts(s3Uri)[1]; + + S3OutputStream tempOut = new S3OutputStream(s3Client, bucket, key); + OutputStreamWriter tempOutWriter = new OutputStreamWriter(tempOut); + CSVPrinter csvPrinter = new CSVPrinter(tempOutWriter, CSVFormat.DEFAULT); + csvPrinter.printRecords(csvData); + tempOutWriter.close(); + tempOut.close(); + csvPrinter.close(); + } catch (Exception e) { + throw new IOException(String.format("The S3 URI {} is not correct! ", s3Uri), e); + } + } + + public static void s3DeleteObj(AmazonS3 s3Client, String s3Uri) throws IOException { + try { + String bucket = getS3Parts(s3Uri)[0]; + String key = getS3Parts(s3Uri)[1]; + s3Client.deleteObject(bucket, key); + } catch (Exception e) { + throw new IOException(String.format("The S3 object {} delete error!", s3Uri), e); + } + } + + public static String[] getS3Parts(String s3Uri) throws IllegalArgumentException { + AmazonS3URI amazonS3Uri = new AmazonS3URI(s3Uri); + String key = amazonS3Uri.getKey(); + if ((key.charAt(key.length() - 1)) == '/') { + key = removeLastCharOptional(key); + } + return new String[] {amazonS3Uri.getBucket(), key}; + } + + public static String removeLastCharOptional(String s) { + return Optional.ofNullable(s) + .filter(str -> str.length() != 0) + .map(str -> str.substring(0, str.length() - 1)) + .orElse(s); + } + + public static String getS3UriWithFileName(String s3Uri) { + String tempFileName = UUID.randomUUID().toString(); + String tempS3Uri = s3Uri; + if (tempS3Uri.charAt(tempS3Uri.length() - 1) == '/') { + tempS3Uri = removeLastCharOptional(tempS3Uri); + } + return tempS3Uri + "/" + tempFileName; + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftUploadBatchExecutor.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftUploadBatchExecutor.java new file mode 100644 index 00000000..357d3e50 --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftUploadBatchExecutor.java @@ -0,0 +1,144 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.connector.redshift.executor; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.api.common.functions.RuntimeContext; +import org.apache.flink.connector.redshift.connection.RedshiftConnectionProvider; +import org.apache.flink.connector.redshift.converter.RedshiftCopyRowConverter; +import org.apache.flink.connector.redshift.options.RedshiftOptions; +import org.apache.flink.connector.redshift.statement.RedshiftStatementFactory; +import org.apache.flink.table.data.RowData; +import org.apache.flink.table.types.logical.LogicalType; + +import com.amazon.redshift.jdbc.RedshiftConnectionImpl; +import com.amazon.redshift.jdbc.RedshiftPreparedStatement; +import com.amazonaws.services.s3.AmazonS3; +import com.amazonaws.services.s3.AmazonS3ClientBuilder; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.SQLException; +import java.util.ArrayList; +import java.util.List; + +/** Upload Batch Executor. */ +@Internal +public class RedshiftUploadBatchExecutor implements RedshiftExecutor { + private static final long serialVersionUID = 1L; + + private static final Logger LOG = LoggerFactory.getLogger(RedshiftUploadBatchExecutor.class); + + private final int maxRetries; + + private final String tableName; + + private final String[] fieldNames; + + private String copySql; + + private final RedshiftCopyRowConverter copyRowConverter; + + private final String tempS3Uri; + + private final String iamRoleArn; + + private transient List csvData; + + private transient AmazonS3 s3Client; + + private transient RedshiftPreparedStatement statement; + + private transient RedshiftConnectionProvider connectionProvider; + + public RedshiftUploadBatchExecutor( + String[] fieldNames, LogicalType[] fieldTypes, RedshiftOptions options) { + this.tableName = options.getTableName(); + this.fieldNames = fieldNames; + this.maxRetries = options.getMaxRetries(); + this.csvData = new ArrayList<>(); + this.s3Client = AmazonS3ClientBuilder.defaultClient(); + this.copyRowConverter = new RedshiftCopyRowConverter(fieldTypes); + + this.tempS3Uri = RedshiftS3Util.getS3UriWithFileName(options.getTempS3Uri()); + this.iamRoleArn = options.getIamRoleArn(); + } + + @Override + public void prepareStatement(RedshiftConnectionImpl connection) throws SQLException { + copySql = + RedshiftStatementFactory.getTableCopyStatement( + tableName, tempS3Uri, fieldNames, iamRoleArn); + statement = (RedshiftPreparedStatement) connection.prepareStatement(copySql); + } + + @Override + public void prepareStatement(RedshiftConnectionProvider connectionProvider) + throws SQLException { + this.connectionProvider = connectionProvider; + prepareStatement(connectionProvider.getOrCreateConnection()); + } + + @Override + public void setRuntimeContext(RuntimeContext context) {} + + @Override + public void addToBatch(RowData record) throws SQLException { + csvData.add(copyRowConverter.toExternal(record)); + } + + @Override + public void executeBatch() throws SQLException { + LOG.info("Begin to COPY command."); + try { + RedshiftS3Util.s3OutputCsv(s3Client, tempS3Uri, csvData); + attemptExecuteBatch(statement, maxRetries, false); + RedshiftS3Util.s3DeleteObj(s3Client, tempS3Uri); + } catch (Exception e) { + throw new SQLException("Batch Copy failed!", e); + } + + LOG.info("End to COPY command."); + } + + @Override + public void closeStatement() { + if (statement != null) { + try { + statement.close(); + } catch (SQLException exception) { + LOG.warn("Redshift batch statement could not be closed.", 
exception); + } finally { + statement = null; + } + } + } + + @Override + public String toString() { + return "RedshiftUploadBatchExecutor{" + + "copySql='" + + copySql + + '\'' + + ", maxRetries=" + + maxRetries + + ", connectionProvider=" + + connectionProvider + + '}'; + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftUploadUpsertExecutor.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftUploadUpsertExecutor.java new file mode 100644 index 00000000..cc00a33a --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftUploadUpsertExecutor.java @@ -0,0 +1,245 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.redshift.executor; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.api.common.functions.RuntimeContext; +import org.apache.flink.connector.redshift.connection.RedshiftConnectionProvider; +import org.apache.flink.connector.redshift.converter.RedshiftCopyRowConverter; +import org.apache.flink.connector.redshift.converter.RedshiftRowConverter; +import org.apache.flink.connector.redshift.options.RedshiftOptions; +import org.apache.flink.connector.redshift.statement.RedshiftStatementFactory; +import org.apache.flink.table.data.RowData; +import org.apache.flink.table.types.logical.LogicalType; + +import com.amazon.redshift.jdbc.RedshiftConnectionImpl; +import com.amazon.redshift.jdbc.RedshiftPreparedStatement; +import com.amazonaws.services.s3.AmazonS3; +import com.amazonaws.services.s3.AmazonS3ClientBuilder; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.SQLException; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.function.Function; + +/** Redshift Upload Upsert Executor. 
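+ * COPYs inserted rows directly into the target table; updated rows are COPYed into a staging
+ * table and merged into the target within a single transaction, while deletes are executed as
+ * JDBC statements keyed by the primary key.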
*/ +@Internal +public class RedshiftUploadUpsertExecutor implements RedshiftExecutor { + + private static final long serialVersionUID = 1L; + + private static final Logger LOG = LoggerFactory.getLogger(RedshiftUploadUpsertExecutor.class); + + private final int maxRetries; + + private final String tableName; + + private final String stageTableName; + + private final String[] fieldNames; + + private final String[] keyFields; + + private final String tempS3Uri; + + private final String iamRoleArn; + + private String copyInsertSql; + + private String updateTrxSql; + + private String deleteSql; + + private final RedshiftRowConverter deleteConverter; + + private final Function deleteExtractor; + + private transient RedshiftPreparedStatement insertStatement; + + private transient RedshiftPreparedStatement deleteStatement; + + private transient RedshiftPreparedStatement updateTrxStatement; + + private transient List csvInsertData; + + private transient List csvUpdateData; + + private final RedshiftCopyRowConverter copyRowConverter; + + private transient AmazonS3 s3Client; + + private RedshiftConnectionProvider connectionProvider; + + public RedshiftUploadUpsertExecutor( + String[] fieldNames, + String[] keyFields, + LogicalType[] fieldTypes, + RedshiftRowConverter deleteConverter, + Function deleteExtractor, + RedshiftOptions options) { + + this.maxRetries = options.getMaxRetries(); + this.fieldNames = fieldNames; + this.keyFields = keyFields; + this.deleteConverter = deleteConverter; + this.deleteExtractor = deleteExtractor; + this.csvInsertData = new ArrayList<>(); + this.csvUpdateData = new ArrayList<>(); + + this.tableName = options.getTableName(); + this.iamRoleArn = options.getIamRoleArn(); + this.s3Client = AmazonS3ClientBuilder.defaultClient(); + this.copyRowConverter = new RedshiftCopyRowConverter(fieldTypes); + this.stageTableName = options.getDatabaseName() + "/" + tableName + "_stage"; + this.tempS3Uri = RedshiftS3Util.getS3UriWithFileName(options.getTempS3Uri()); + } + + @Override + public void prepareStatement(RedshiftConnectionImpl connection) throws SQLException { + final String createTableSql = + RedshiftStatementFactory.getCreateTempTableAsStatement(tableName, stageTableName); + final String insertSql = + RedshiftStatementFactory.getInsertFromStageTable( + tableName, stageTableName, fieldNames); + deleteSql = RedshiftStatementFactory.getDeleteStatement(tableName, keyFields); + final String deleteFromStageSql = + RedshiftStatementFactory.getDeleteFromStageTable( + tableName, stageTableName, keyFields); + final String truncateSql = RedshiftStatementFactory.getTruncateTable(stageTableName); + copyInsertSql = + RedshiftStatementFactory.getTableCopyStatement( + tableName, tempS3Uri, fieldNames, iamRoleArn); + final String copyUpdateSql = + RedshiftStatementFactory.getTableCopyStatement( + stageTableName, tempS3Uri, fieldNames, iamRoleArn); + + updateTrxSql = + "BEGIN;" + + createTableSql + + "; " + + truncateSql + + ";" + + copyUpdateSql + + "; " + + deleteFromStageSql + + "; " + + insertSql + + "; " + + "END;"; + + insertStatement = (RedshiftPreparedStatement) connection.prepareStatement(copyInsertSql); + updateTrxStatement = (RedshiftPreparedStatement) connection.prepareStatement(updateTrxSql); + deleteStatement = (RedshiftPreparedStatement) connection.prepareStatement(deleteSql); + } + + @Override + public void prepareStatement(RedshiftConnectionProvider connectionProvider) + throws SQLException { + this.connectionProvider = connectionProvider; + 
prepareStatement(connectionProvider.getOrCreateConnection()); + } + + @Override + public void setRuntimeContext(RuntimeContext context) {} + + @Override + public void addToBatch(RowData record) throws SQLException { + switch (record.getRowKind()) { + case INSERT: + csvInsertData.add(copyRowConverter.toExternal(record)); + break; + case UPDATE_AFTER: + csvUpdateData.add(copyRowConverter.toExternal(record)); + break; + case DELETE: + deleteConverter.toExternal(deleteExtractor.apply(record), deleteStatement); + deleteStatement.addBatch(); + break; + case UPDATE_BEFORE: + break; + default: + throw new UnsupportedOperationException( + String.format( + "Unknown row kind, the supported row kinds is: INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE, but get: %s.", + record.getRowKind())); + } + } + + @Override + public void executeBatch() throws SQLException { + + LOG.info("Begin to COPY command."); + try { + if (!csvInsertData.isEmpty()) { + RedshiftS3Util.s3OutputCsv(s3Client, tempS3Uri, csvInsertData); + attemptExecuteBatch(insertStatement, maxRetries, false); + } + if (!csvUpdateData.isEmpty()) { + RedshiftS3Util.s3OutputCsv(s3Client, tempS3Uri, csvUpdateData); + attemptExecuteBatch(updateTrxStatement, maxRetries, false); + } + if (deleteStatement != null) { + attemptExecuteBatch(deleteStatement, maxRetries); + } + + RedshiftS3Util.s3DeleteObj(s3Client, tempS3Uri); + csvInsertData = new ArrayList<>(); + csvUpdateData = new ArrayList<>(); + } catch (Exception e) { + throw new SQLException("Redshift COPY mode execute error!", e); + } + + LOG.info("End to COPY command."); + } + + @Override + public void closeStatement() { + for (RedshiftPreparedStatement redshiftStatement : + Arrays.asList(insertStatement, updateTrxStatement, deleteStatement)) { + if (redshiftStatement != null) { + try { + redshiftStatement.close(); + } catch (SQLException exception) { + LOG.warn("Redshift statement could not be closed.", exception); + } + } + } + } + + @Override + public String toString() { + return "RedshiftUploadUpsertExecutor{" + + "copyInsertSql='" + + copyInsertSql + + '\'' + + "updateTrxSql='" + + updateTrxSql + + '\'' + + ", deleteSql='" + + deleteSql + + '\'' + + ", maxRetries=" + + maxRetries + + ", connectionProvider=" + + connectionProvider + + '}'; + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftUpsertExecutor.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftUpsertExecutor.java new file mode 100644 index 00000000..ef04601d --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/RedshiftUpsertExecutor.java @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.redshift.executor; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.api.common.functions.RuntimeContext; +import org.apache.flink.connector.redshift.connection.RedshiftConnectionProvider; +import org.apache.flink.connector.redshift.converter.RedshiftRowConverter; +import org.apache.flink.connector.redshift.options.RedshiftOptions; +import org.apache.flink.table.data.RowData; + +import com.amazon.redshift.jdbc.RedshiftConnectionImpl; +import com.amazon.redshift.jdbc.RedshiftPreparedStatement; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.sql.SQLException; +import java.util.Arrays; +import java.util.function.Function; + +/** Upsert Executor. */ +@Internal +public class RedshiftUpsertExecutor implements RedshiftExecutor { + private static final long serialVersionUID = 1L; + + private static final Logger LOG = LoggerFactory.getLogger(RedshiftUpsertExecutor.class); + + private final String insertSql; + + private final String updateSql; + + private final String deleteSql; + + private final RedshiftRowConverter insertConverter; + + private final RedshiftRowConverter updateConverter; + + private final RedshiftRowConverter deleteConverter; + + private final Function updateExtractor; + + private final Function deleteExtractor; + + private final int maxRetries; + + private transient RedshiftPreparedStatement insertStatement; + + private transient RedshiftPreparedStatement updateStatement; + + private transient RedshiftPreparedStatement deleteStatement; + + private transient RedshiftConnectionProvider connectionProvider; + + public RedshiftUpsertExecutor( + String insertSql, + String updateSql, + String deleteSql, + RedshiftRowConverter insertConverter, + RedshiftRowConverter updateConverter, + RedshiftRowConverter deleteConverter, + Function updateExtractor, + Function deleteExtractor, + RedshiftOptions options) { + this.insertSql = insertSql; + this.updateSql = updateSql; + this.deleteSql = deleteSql; + this.insertConverter = insertConverter; + this.updateConverter = updateConverter; + this.deleteConverter = deleteConverter; + this.updateExtractor = updateExtractor; + this.deleteExtractor = deleteExtractor; + this.maxRetries = options.getMaxRetries(); + } + + @Override + public void prepareStatement(RedshiftConnectionImpl connection) throws SQLException { + this.insertStatement = + (RedshiftPreparedStatement) connection.prepareStatement(this.insertSql); + this.updateStatement = + (RedshiftPreparedStatement) connection.prepareStatement(this.updateSql); + this.deleteStatement = + (RedshiftPreparedStatement) connection.prepareStatement(this.deleteSql); + } + + @Override + public void prepareStatement(RedshiftConnectionProvider connectionProvider) + throws SQLException { + this.connectionProvider = connectionProvider; + prepareStatement(connectionProvider.getOrCreateConnection()); + } + + @Override + public void setRuntimeContext(RuntimeContext context) {} + + @Override + public void addToBatch(RowData record) throws SQLException { + // TODO: how to handle the ROW sequece? 
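+        // Row kinds are handled by the switch below: INSERT and UPDATE_AFTER are batched on their own prepared statements, DELETE on the delete statement, and UPDATE_BEFORE is skipped because the key fields already identify the row being updated.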
+ switch (record.getRowKind()) { + case INSERT: + insertConverter.toExternal(record, insertStatement); + insertStatement.addBatch(); + break; + case UPDATE_AFTER: + updateConverter.toExternal(updateExtractor.apply(record), updateStatement); + updateStatement.addBatch(); + break; + case DELETE: + deleteConverter.toExternal(deleteExtractor.apply(record), deleteStatement); + deleteStatement.addBatch(); + break; + case UPDATE_BEFORE: + break; + default: + throw new UnsupportedOperationException( + String.format( + "Unknown row kind, the supported row kinds is: INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE, but get: %s.", + record.getRowKind())); + } + } + + @Override + public void executeBatch() throws SQLException { + for (RedshiftPreparedStatement redshiftStatement : + Arrays.asList(insertStatement, updateStatement, deleteStatement)) { + if (redshiftStatement != null) { + attemptExecuteBatch(redshiftStatement, maxRetries); + } + } + } + + @Override + public void closeStatement() { + for (RedshiftPreparedStatement redshiftStatement : + Arrays.asList(insertStatement, updateStatement, deleteStatement)) { + if (redshiftStatement != null) { + try { + redshiftStatement.close(); + } catch (SQLException exception) { + LOG.warn("Redshift upsert statement could not be closed.", exception); + } + } + } + } + + @Override + public String toString() { + return "RedshiftUpsertExecutor{" + + "insertSql='" + + insertSql + + '\'' + + ", updateSql='" + + updateSql + + '\'' + + ", deleteSql='" + + deleteSql + + '\'' + + ", maxRetries=" + + maxRetries + + ", connectionProvider=" + + connectionProvider + + '}'; + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/S3OutputStream.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/S3OutputStream.java new file mode 100644 index 00000000..b409c1e9 --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/executor/S3OutputStream.java @@ -0,0 +1,198 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.connector.redshift.executor; + +import org.apache.flink.annotation.Internal; + +import com.amazonaws.services.s3.AmazonS3; +import com.amazonaws.services.s3.model.AbortMultipartUploadRequest; +import com.amazonaws.services.s3.model.CannedAccessControlList; +import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest; +import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest; +import com.amazonaws.services.s3.model.InitiateMultipartUploadResult; +import com.amazonaws.services.s3.model.ObjectMetadata; +import com.amazonaws.services.s3.model.PartETag; +import com.amazonaws.services.s3.model.PutObjectRequest; +import com.amazonaws.services.s3.model.UploadPartRequest; +import com.amazonaws.services.s3.model.UploadPartResult; + +import java.io.ByteArrayInputStream; +import java.io.OutputStream; +import java.util.ArrayList; +import java.util.List; + +/** S3 Output Steam. */ +@Internal +public class S3OutputStream extends OutputStream { + + /** Default chunk size is 10MB. */ + protected static final int BUFFER_SIZE = 10000000; + + /** The bucket-name on Amazon S3. */ + private final String bucket; + + /** The path (key) name within the bucket. */ + private final String path; + + /** The temporary buffer used for storing the chunks. */ + private final byte[] buf; + + /** The position in the buffer. */ + private int position; + + /** Amazon S3 client. */ + private final AmazonS3 s3Client; + + /** The unique id for this upload. */ + private String uploadId; + + /** Collection of the etags for the parts that have been uploaded. */ + private final List etags; + + /** indicates whether the stream is still open / valid. */ + private boolean open; + + /** + * Creates a new S3 OutputStream. + * + * @param s3Client the AmazonS3 client + * @param bucket name of the bucket + * @param path path within the bucket + */ + public S3OutputStream(AmazonS3 s3Client, String bucket, String path) { + this.s3Client = s3Client; + this.bucket = bucket; + this.path = path; + this.buf = new byte[BUFFER_SIZE]; + this.position = 0; + this.etags = new ArrayList<>(); + this.open = true; + } + + /** + * Write an array to the S3 output stream. + * + * @param b the byte-array to append + */ + @Override + public void write(byte[] b) { + write(b, 0, b.length); + } + + /** + * Writes an array to the S3 Output Stream. + * + * @param byteArray the array to write + * @param o the offset into the array + * @param l the number of bytes to write + */ + @Override + public void write(final byte[] byteArray, final int o, final int l) { + this.assertOpen(); + int ofs = o, len = l; + int size; + while (len > (size = this.buf.length - position)) { + System.arraycopy(byteArray, ofs, this.buf, this.position, size); + this.position += size; + flushBufferAndRewind(); + ofs += size; + len -= size; + } + System.arraycopy(byteArray, ofs, this.buf, this.position, len); + this.position += len; + } + + /** Flushes the buffer by uploading a part to S3. 
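+     * Note: flush() only verifies that the stream is still open; parts are actually uploaded by flushBufferAndRewind() once the internal buffer fills, and any remaining bytes are uploaded on close().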
*/ + @Override + public synchronized void flush() { + this.assertOpen(); + } + + protected void flushBufferAndRewind() { + if (uploadId == null) { + final InitiateMultipartUploadRequest request = + new InitiateMultipartUploadRequest(this.bucket, this.path) + .withCannedACL(CannedAccessControlList.BucketOwnerFullControl); + InitiateMultipartUploadResult initResponse = s3Client.initiateMultipartUpload(request); + this.uploadId = initResponse.getUploadId(); + } + uploadPart(); + this.position = 0; + } + + protected void uploadPart() { + UploadPartResult uploadResult = + this.s3Client.uploadPart( + new UploadPartRequest() + .withBucketName(this.bucket) + .withKey(this.path) + .withUploadId(this.uploadId) + .withInputStream(new ByteArrayInputStream(buf, 0, this.position)) + .withPartNumber(this.etags.size() + 1) + .withPartSize(this.position)); + this.etags.add(uploadResult.getPartETag()); + } + + @Override + public void close() { + if (this.open) { + this.open = false; + if (this.uploadId != null) { + if (this.position > 0) { + uploadPart(); + } + this.s3Client.completeMultipartUpload( + new CompleteMultipartUploadRequest(bucket, path, uploadId, etags)); + } else { + final ObjectMetadata metadata = new ObjectMetadata(); + metadata.setContentLength(this.position); + final PutObjectRequest request = + new PutObjectRequest( + this.bucket, + this.path, + new ByteArrayInputStream(this.buf, 0, this.position), + metadata) + .withCannedAcl(CannedAccessControlList.BucketOwnerFullControl); + this.s3Client.putObject(request); + } + } + } + + public void cancel() { + this.open = false; + if (this.uploadId != null) { + this.s3Client.abortMultipartUpload( + new AbortMultipartUploadRequest(this.bucket, this.path, this.uploadId)); + } + } + + @Override + public void write(int b) { + this.assertOpen(); + if (position >= this.buf.length) { + flushBufferAndRewind(); + } + this.buf[position++] = (byte) b; + } + + private void assertOpen() { + if (!this.open) { + throw new IllegalStateException("Closed"); + } + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/format/AbstractRedshiftOutputFormat.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/format/AbstractRedshiftOutputFormat.java new file mode 100644 index 00000000..5b37c1d3 --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/format/AbstractRedshiftOutputFormat.java @@ -0,0 +1,189 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.connector.redshift.format; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.api.common.io.RichOutputFormat; +import org.apache.flink.configuration.Configuration; +import org.apache.flink.connector.redshift.connection.RedshiftConnectionProvider; +import org.apache.flink.connector.redshift.executor.RedshiftExecutor; +import org.apache.flink.connector.redshift.options.RedshiftOptions; +import org.apache.flink.table.data.RowData; +import org.apache.flink.table.types.DataType; +import org.apache.flink.table.types.logical.LogicalType; +import org.apache.flink.util.Preconditions; +import org.apache.flink.util.concurrent.ExecutorThreadFactory; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.Flushable; +import java.io.IOException; +import java.util.Arrays; +import java.util.concurrent.ScheduledExecutorService; +import java.util.concurrent.ScheduledFuture; +import java.util.concurrent.ScheduledThreadPoolExecutor; +import java.util.concurrent.TimeUnit; + +/** Abstract Redshift Output format. */ +@Internal +public abstract class AbstractRedshiftOutputFormat extends RichOutputFormat + implements Flushable { + + private static final long serialVersionUID = 1L; + + private static final Logger LOG = LoggerFactory.getLogger(AbstractRedshiftOutputFormat.class); + + protected transient volatile boolean closed = false; + + protected transient ScheduledExecutorService scheduler; + + protected transient ScheduledFuture scheduledFuture; + + protected transient volatile Exception flushException; + + public AbstractRedshiftOutputFormat() {} + + @Override + public void configure(Configuration parameters) {} + + public void scheduledFlush(long intervalMillis, String executorName) { + Preconditions.checkArgument(intervalMillis > 0, "flush interval must be greater than 0"); + scheduler = new ScheduledThreadPoolExecutor(1, new ExecutorThreadFactory(executorName)); + scheduledFuture = + scheduler.scheduleWithFixedDelay( + () -> { + synchronized (this) { + if (!closed) { + try { + flush(); + } catch (Exception e) { + flushException = e; + } + } + } + }, + intervalMillis, + intervalMillis, + TimeUnit.MILLISECONDS); + } + + public void checkBeforeFlush(final RedshiftExecutor executor) throws IOException { + checkFlushException(); + try { + executor.executeBatch(); + } catch (Exception e) { + throw new IOException(e); + } + } + + @Override + public synchronized void close() { + if (!closed) { + closed = true; + + try { + flush(); + } catch (Exception exception) { + LOG.warn("Flushing records to Redshift failed.", exception); + } + + if (scheduledFuture != null) { + scheduledFuture.cancel(false); + this.scheduler.shutdown(); + } + + closeOutputFormat(); + checkFlushException(); + } + } + + protected abstract void closeOutputFormat(); + + protected void checkFlushException() { + if (flushException != null) { + throw new RuntimeException("Flush exception found.", flushException); + } + } + + /** Builder class. 
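+     * Collects the connector options, field names, field types and primary keys, and builds the output format; with a non-empty primary key the sink runs in UPSERT mode, otherwise it is INSERT-only.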
*/ + public static class Builder { + + private static final Logger LOG = + LoggerFactory.getLogger(AbstractRedshiftOutputFormat.Builder.class); + + private DataType[] fieldTypes; + + private LogicalType[] logicalTypes; + + private RedshiftOptions connectionProperties; + + private String[] fieldNames; + + private String[] primaryKeys; + + public Builder() {} + + public Builder withConnectionProperties(RedshiftOptions connectionProperties) { + this.connectionProperties = connectionProperties; + return this; + } + + public Builder withFieldTypes(DataType[] fieldTypes) { + this.fieldTypes = fieldTypes; + this.logicalTypes = + Arrays.stream(fieldTypes) + .map(DataType::getLogicalType) + .toArray(LogicalType[]::new); + return this; + } + + public Builder withFieldNames(String[] fieldNames) { + this.fieldNames = fieldNames; + return this; + } + + public Builder withPrimaryKey(String[] primaryKeys) { + this.primaryKeys = primaryKeys; + return this; + } + + public AbstractRedshiftOutputFormat build() { + Preconditions.checkNotNull(fieldNames); + Preconditions.checkNotNull(fieldTypes); + Preconditions.checkNotNull(primaryKeys); + if (primaryKeys.length > 0) { + LOG.warn("If primary key is specified, connector will be in UPSERT mode."); + LOG.warn( + "The data will be updated / deleted by the primary key, you will have significant performance loss."); + } else { + LOG.warn("No primary key is specified, connector will be INSERT only mode."); + } + + RedshiftConnectionProvider connectionProvider = + new RedshiftConnectionProvider(connectionProperties); + + return new RedshiftOutputFormat( + connectionProvider, + fieldNames, + primaryKeys, + logicalTypes, + connectionProperties); + } + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/format/RedshiftOutputFormat.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/format/RedshiftOutputFormat.java new file mode 100644 index 00000000..4417da00 --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/format/RedshiftOutputFormat.java @@ -0,0 +1,120 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.connector.redshift.format; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.connector.redshift.connection.RedshiftConnectionProvider; +import org.apache.flink.connector.redshift.executor.RedshiftExecutor; +import org.apache.flink.connector.redshift.options.RedshiftOptions; +import org.apache.flink.table.data.RowData; +import org.apache.flink.table.types.logical.LogicalType; +import org.apache.flink.util.Preconditions; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.sql.SQLException; + +/** Redshift Output Format. */ +@Internal +public class RedshiftOutputFormat extends AbstractRedshiftOutputFormat { + + private static final long serialVersionUID = 1L; + + private static final Logger LOG = LoggerFactory.getLogger(RedshiftOutputFormat.class); + + private final RedshiftConnectionProvider connectionProvider; + + private final String[] fieldNames; + + private final String[] keyFields; + + private final LogicalType[] fieldTypes; + + private final RedshiftOptions options; + + private transient RedshiftExecutor executor; + + private transient int batchCount = 0; + + protected RedshiftOutputFormat( + RedshiftConnectionProvider connectionProvider, + String[] fieldNames, + String[] keyFields, + LogicalType[] fieldTypes, + RedshiftOptions options) { + this.connectionProvider = Preconditions.checkNotNull(connectionProvider); + this.fieldNames = Preconditions.checkNotNull(fieldNames); + this.keyFields = Preconditions.checkNotNull(keyFields); + this.fieldTypes = Preconditions.checkNotNull(fieldTypes); + this.options = Preconditions.checkNotNull(options); + } + + @Override + public void open(int taskNumber, int numTasks) throws IOException { + try { + executor = + RedshiftExecutor.createRedshiftExecutor( + fieldNames, keyFields, fieldTypes, options); + + executor.prepareStatement(connectionProvider); + executor.setRuntimeContext(getRuntimeContext()); + + LOG.info("Executor: " + executor); + + long flushIntervalMillis = options.getFlushInterval().toMillis(); + scheduledFlush(flushIntervalMillis, "redshift-batch-output-format"); + } catch (Exception exception) { + throw new IOException("Unable to establish connection with Redshift.", exception); + } + } + + @Override + public synchronized void writeRecord(RowData record) throws IOException { + checkFlushException(); + + try { + executor.addToBatch(record); + batchCount++; + if (batchCount >= options.getBatchSize()) { + flush(); + } + } catch (SQLException exception) { + throw new IOException("Writing record to Redshift statement failed.", exception); + } + } + + @Override + public synchronized void flush() throws IOException { + if (batchCount > 0) { + checkBeforeFlush(executor); + batchCount = 0; + } + } + + @Override + public synchronized void closeOutputFormat() { + try { + executor.closeStatement(); + connectionProvider.closeConnection(); + } catch (SQLException exception) { + LOG.error("Close Redshift statement failed.", exception); + } + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/mode/SinkMode.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/mode/SinkMode.java new file mode 100644 index 00000000..53840ab5 --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/mode/SinkMode.java @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one 
or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.redshift.mode;
+
+import org.apache.flink.annotation.Internal;
+
+/**
+ * Sink modes for AWS Redshift. Redshift can be reached through JDBC, ODBC or the COPY command;
+ * this connector supports two of them: 1. JDBC 2. COPY.
+ */
+@Internal
+public enum SinkMode {
+    JDBC("JDBC"),
+    COPY("COPY");
+
+    private final String mode;
+
+    SinkMode(String mode) {
+        this.mode = mode;
+    }
+
+    @Override
+    public String toString() {
+        return mode;
+    }
+}
diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/options/RedshiftOptions.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/options/RedshiftOptions.java
new file mode 100644
index 00000000..a14ef491
--- /dev/null
+++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/options/RedshiftOptions.java
@@ -0,0 +1,239 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.connector.redshift.options;
+
+import org.apache.flink.annotation.Internal;
+import org.apache.flink.connector.redshift.mode.SinkMode;
+import org.apache.flink.util.Preconditions;
+
+import java.io.Serializable;
+import java.time.Duration;
+import java.util.Optional;
+
+/** Options.
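+ * Immutable holder for the Redshift connection and sink settings: hostname, port, credentials, database and table name, batch size, flush interval, max retries, sink mode, and the COPY staging S3 URI and IAM role ARN.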
*/ +@Internal +public class RedshiftOptions implements Serializable { + private static final long serialVersionUID = 1L; + + private final String hostname; + + private final int port; + + private final String username; + + private final String password; + + private final String databaseName; + + private final String tableName; + + private final int batchSize; + + private final Duration flushInterval; + + private final int maxRetries; + + private final SinkMode sinkMode; + + private final String tempFileS3Uri; + + private final String iamRoleArn; + + private RedshiftOptions( + String hostname, + int port, + String username, + String password, + String databaseName, + String tableName, + int batchSize, + Duration flushInterval, + int maxRetires, + SinkMode sinkMode, + String tempFileS3Uri, + String iamRoleArn) { + this.hostname = hostname; + this.port = port; + this.username = username; + this.password = password; + this.databaseName = databaseName; + this.tableName = tableName; + this.batchSize = batchSize; + this.flushInterval = flushInterval; + this.maxRetries = maxRetires; + this.sinkMode = sinkMode; + this.tempFileS3Uri = tempFileS3Uri; + this.iamRoleArn = iamRoleArn; + } + + public String getHostname() { + return this.hostname; + } + + public int getPort() { + return this.port; + } + + public Optional getUsername() { + return Optional.ofNullable(this.username); + } + + public Optional getPassword() { + return Optional.ofNullable(this.password); + } + + public String getDatabaseName() { + return this.databaseName; + } + + public String getTableName() { + return this.tableName; + } + + public int getBatchSize() { + return this.batchSize; + } + + public Duration getFlushInterval() { + return this.flushInterval; + } + + public int getMaxRetries() { + return this.maxRetries; + } + + public SinkMode getSinkMode() { + return this.sinkMode; + } + + public String getTempS3Uri() { + return this.tempFileS3Uri; + } + + public String getIamRoleArn() { + return this.iamRoleArn; + } + + /** Builder Class. 
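+     * Fluent builder for RedshiftOptions; hostname, port, database name, table name and sink mode are mandatory and are checked in build().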
*/ + public static class Builder { + private String hostname; + + private int port; + + private String username; + + private String password; + + private String databaseName; + + private String tableName; + + private int batchSize; + + private Duration flushInterval; + + private int maxRetries; + + private SinkMode sinkMode; + + private String tempS3Uri; + + private String iamRoleArn; + + public Builder withHostname(String hostname) { + this.hostname = hostname; + return this; + } + + public Builder withPort(int port) { + this.port = port; + return this; + } + + public Builder withUsername(String username) { + this.username = username; + return this; + } + + public Builder withPassword(String password) { + this.password = password; + return this; + } + + public Builder withDatabaseName(String databaseName) { + this.databaseName = databaseName; + return this; + } + + public Builder withTableName(String tableName) { + this.tableName = tableName; + return this; + } + + public Builder withBatchSize(int batchSize) { + this.batchSize = batchSize; + return this; + } + + public Builder withFlushInterval(Duration flushInterval) { + this.flushInterval = flushInterval; + return this; + } + + public Builder withMaxRetries(int maxRetries) { + this.maxRetries = maxRetries; + return this; + } + + public Builder withSinkMode(SinkMode sinkMode) { + this.sinkMode = sinkMode; + return this; + } + + public Builder withTempS3Uri(String tempS3Uri) { + this.tempS3Uri = tempS3Uri; + return this; + } + + public Builder withIamRoleArn(String iamRoleArn) { + this.iamRoleArn = iamRoleArn; + return this; + } + + public RedshiftOptions build() { + Preconditions.checkNotNull(this.hostname, "No hostname supplied."); + Preconditions.checkNotNull(this.port, "No port supplied."); + Preconditions.checkNotNull(this.databaseName, "No databaseName supplied."); + Preconditions.checkNotNull(this.tableName, "No tableName supplied."); + Preconditions.checkNotNull(this.sinkMode, "No copyMode supplied."); + return new RedshiftOptions( + this.hostname, + this.port, + this.username, + this.password, + this.databaseName, + this.tableName, + this.batchSize, + this.flushInterval, + this.maxRetries, + this.sinkMode, + this.tempS3Uri, + this.iamRoleArn); + } + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/statement/RedshiftStatementFactory.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/statement/RedshiftStatementFactory.java new file mode 100644 index 00000000..eb2256c7 --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/statement/RedshiftStatementFactory.java @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.redshift.statement; + +import org.apache.flink.annotation.Internal; + +import org.apache.commons.lang3.ArrayUtils; + +import java.io.Serializable; +import java.util.Arrays; +import java.util.stream.Collectors; + +/** Statement Factory. */ +@Internal +public class RedshiftStatementFactory implements Serializable { + private static final long serialVersionUID = 1L; + + public static String getInsertIntoStatement(String tableName, String[] fieldNames) { + String columns = + Arrays.stream(fieldNames) + .map(RedshiftStatementFactory::quoteIdentifier) + .collect(Collectors.joining(", ")); + String placeholders = + Arrays.stream(fieldNames).map(f -> "?").collect(Collectors.joining(", ")); + return "INSERT INTO " + + quoteIdentifier(tableName) + + "(" + + columns + + ") VALUES (" + + placeholders + + ")"; + } + + public static String getUpdateStatement( + String tableName, String[] fieldNames, String[] keyFields) { + String setClause = + Arrays.stream(fieldNames) + .filter(f -> !ArrayUtils.contains(keyFields, f)) + .map(f -> quoteIdentifier(f) + "= ?") + .collect(Collectors.joining(", ")); + String conditionClause = + Arrays.stream(keyFields) + .map(f -> quoteIdentifier(f) + "= ?") + .collect(Collectors.joining(" AND ")); + + return "UPDATE " + + quoteIdentifier(tableName) + + " set " + + setClause + + " WHERE " + + conditionClause; + } + + public static String getDeleteStatement(String tableName, String[] conditionFields) { + String conditionClause = + Arrays.stream(conditionFields) + .map(f -> quoteIdentifier(f) + "=?") + .collect(Collectors.joining(" AND ")); + return "DELETE FROM " + quoteIdentifier(tableName) + " WHERE " + conditionClause; + } + + public static String getRowExistsStatement(String tableName, String[] conditionFields) { + String fieldExpressions = + Arrays.stream(conditionFields) + .map(f -> quoteIdentifier(f) + "=?") + .collect(Collectors.joining(" AND ")); + return "SELECT 1 FROM " + quoteIdentifier(tableName) + " WHERE " + fieldExpressions; + } + + public static String getUpsertStatement( + String tableName, + String stageTableName, + String[] fieldNames, + String[] conditionFields) { + String columns = + Arrays.stream(fieldNames) + .map(RedshiftStatementFactory::quoteIdentifier) + .collect(Collectors.joining(", ")); + String matchCondition = + Arrays.stream(conditionFields) + .map( + f -> + quoteIdentifier(tableName) + + "." + + quoteIdentifier(f) + + "=" + + quoteIdentifier("stage") + + "." + + quoteIdentifier(f)) + .collect(Collectors.joining(" AND ")); + String setClause = + Arrays.stream(fieldNames) + .filter(f -> !ArrayUtils.contains(conditionFields, f)) + .map( + f -> + quoteIdentifier(f) + + "=" + + quoteIdentifier("stage") + + "." + + quoteIdentifier(f)) + .collect(Collectors.joining(", ")); + String insertValue = + Arrays.stream(fieldNames) + .map(f -> quoteIdentifier("stage") + "." 
+ quoteIdentifier(f)) + .collect(Collectors.joining(", ")); + return "MERGE INTO " + + quoteIdentifier(tableName) + + " USING " + + quoteIdentifier(stageTableName) + + " stage on " + + matchCondition + + " WHEN MATCHED THEN UPDATE SET " + + setClause + + " WHEN NOT MATCHED THEN INSERT (" + + columns + + ") VALUES (" + + insertValue + + ")"; + } + + public static String getDropTableStatement(String tableName) { + return "DROP TABLE IF EXISTS " + quoteIdentifier(tableName); + } + + public static String getCreateTempTableAsStatement(String tableName, String tempTableName) { + return "CREATE TEMP TABLE IF NOT EXISTS " + + quoteIdentifier(tempTableName) + + "(LIKE " + + quoteIdentifier(tableName) + + ")"; + } + + public static String getTableCopyStatement( + String tableName, String s3Uri, String[] fieldNames, String iamRoleArn) { + final String columns = Arrays.stream(fieldNames).collect(Collectors.joining(", ")); + return "COPY " + + quoteIdentifier(tableName) + + "(" + + columns + + ")" + + " FROM " + + "'" + + s3Uri + + "'" + + " iam_role " + + "'" + + iamRoleArn + + "' " + + "FORMAT AS CSV"; + } + + public static String getInsertFromStageTable( + String tableName, String tempTableName, String[] fieldNames) { + String columns = + Arrays.stream(fieldNames) + .map(RedshiftStatementFactory::quoteIdentifier) + .collect(Collectors.joining(", ")); + return "INSERT INTO " + + quoteIdentifier(tableName) + + "(" + + columns + + ") SELECT " + + columns + + " FROM " + + quoteIdentifier(tempTableName); + } + + public static String getDeleteFromStageTable( + String tableName, String tempTableName, String[] conditionFields) { + String matchCondition = + Arrays.stream(conditionFields) + .map( + f -> + quoteIdentifier(tableName) + + "." + + quoteIdentifier(f) + + "=" + + quoteIdentifier(tempTableName) + + "." + + quoteIdentifier(f)) + .collect(Collectors.joining(" AND ")); + return "DELETE FROM " + + quoteIdentifier(tableName) + + " USING " + + quoteIdentifier(tempTableName) + + " WHERE " + + matchCondition; + } + + public static String getTruncateTable(String tableName) { + return "TRUNCATE " + quoteIdentifier(tableName); + } + + public static String quoteIdentifier(String identifier) { + return identifier; + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/table/RedshiftDynamicTableFactory.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/table/RedshiftDynamicTableFactory.java new file mode 100644 index 00000000..c18a4a6a --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/table/RedshiftDynamicTableFactory.java @@ -0,0 +1,213 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package org.apache.flink.connector.redshift.table;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.configuration.ConfigOption;
+import org.apache.flink.configuration.ConfigOptions;
+import org.apache.flink.configuration.ReadableConfig;
+import org.apache.flink.connector.redshift.executor.RedshiftS3Util;
+import org.apache.flink.connector.redshift.mode.SinkMode;
+import org.apache.flink.connector.redshift.options.RedshiftOptions;
+import org.apache.flink.table.catalog.ResolvedSchema;
+import org.apache.flink.table.catalog.UniqueConstraint;
+import org.apache.flink.table.connector.sink.DynamicTableSink;
+import org.apache.flink.table.factories.DynamicTableSinkFactory;
+import org.apache.flink.table.factories.FactoryUtil;
+import org.apache.flink.table.types.DataType;
+
+import java.time.Duration;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.Set;
+
+/** Dynamic table sink factory for the 'redshift' connector. */
+@PublicEvolving
+public class RedshiftDynamicTableFactory implements DynamicTableSinkFactory {
+    public static final String IDENTIFIER = "redshift";
+
+    public static final ConfigOption<String> HOSTNAME =
+            ConfigOptions.key("hostname")
+                    .stringType()
+                    .noDefaultValue()
+                    .withDescription("AWS Redshift cluster hostname.");
+
+    public static final ConfigOption<Integer> PORT =
+            ConfigOptions.key("port")
+                    .intType()
+                    .defaultValue(5439)
+                    .withDescription("AWS Redshift port number.\nDefault value: 5439.");
+
+    public static final ConfigOption<String> USERNAME =
+            ConfigOptions.key("username")
+                    .stringType()
+                    .noDefaultValue()
+                    .withDescription("AWS Redshift cluster username.");
+
+    public static final ConfigOption<String> PASSWORD =
+            ConfigOptions.key("password")
+                    .stringType()
+                    .noDefaultValue()
+                    .withDescription("AWS Redshift cluster password.");
+
+    public static final ConfigOption<String> DATABASE_NAME =
+            ConfigOptions.key("sink.database-name")
+                    .stringType()
+                    .defaultValue("dev")
+                    .withDescription(
+                            "AWS Redshift cluster database name. Default value set to `dev`.");
+
+    public static final ConfigOption<String> TABLE_NAME =
+            ConfigOptions.key("sink.table-name")
+                    .stringType()
+                    .noDefaultValue()
+                    .withDescription("AWS Redshift cluster sink table name.");
+
+    public static final ConfigOption<Integer> SINK_BATCH_SIZE =
+            ConfigOptions.key("sink.batch-size")
+                    .intType()
+                    .defaultValue(1000)
+                    .withDescription(
+                            "`sink.batch-size` is the maximum number of buffered records; "
+                                    + "once this threshold is exceeded, the sink triggers a flush.\n"
+                                    + "Default value: 1000.");
+
+    public static final ConfigOption<Duration> SINK_FLUSH_INTERVAL =
+            ConfigOptions.key("sink.flush-interval")
+                    .durationType()
+                    .defaultValue(Duration.ofSeconds(1L))
+                    .withDescription(
+                            "The flush interval; after it elapses, an asynchronous thread flushes any buffered data. The default value is 1s.");
+
+    public static final ConfigOption<Integer> SINK_MAX_RETRIES =
+            ConfigOptions.key("sink.max-retries")
+                    .intType()
+                    .defaultValue(3)
+                    .withDescription(
+                            "The maximum number of retries when writing records to the database fails.");
+
+    public static final ConfigOption<SinkMode> SINK_MODE =
+            ConfigOptions.key("sink.mode")
+                    .enumType(SinkMode.class)
+                    .noDefaultValue()
+                    .withDescription(
+                            "Currently, 2 modes are supported for Flink connector redshift.\n"
+                                    + "\t 1) COPY Mode."
+ + "\t 2) JDBC Mode."); + public static final ConfigOption TEMP_S3_URI = + ConfigOptions.key("sink.copy-mode.aws.s3-uri") + .stringType() + .noDefaultValue() + .withDescription("using Redshift COPY command must provide a S3 URI."); + public static final ConfigOption IAM_ROLE_ARN = + ConfigOptions.key("sink.aws.iam-role-arn") + .stringType() + .noDefaultValue() + .withDescription( + "using Redshift COPY function must provide a IAM Role which have attached to the Cluster."); + + @Override + public DynamicTableSink createDynamicTableSink(Context context) { + FactoryUtil.TableFactoryHelper helper = FactoryUtil.createTableFactoryHelper(this, context); + ReadableConfig config = helper.getOptions(); + helper.validate(); + validateConfigOptions(config); + ResolvedSchema resolvedSchema = context.getCatalogTable().getResolvedSchema(); + String[] primaryKeys = + resolvedSchema + .getPrimaryKey() + .map(UniqueConstraint::getColumns) + .map(keys -> keys.toArray(new String[0])) + .orElse(new String[0]); + String[] fieldNames = resolvedSchema.getColumnNames().toArray(new String[0]); + DataType[] fieldDataTypes = resolvedSchema.getColumnDataTypes().toArray(new DataType[0]); + return new RedshiftDynamicTableSink( + getOptions(config), primaryKeys, fieldNames, fieldDataTypes); + } + + @Override + public String factoryIdentifier() { + return IDENTIFIER; + } + + @Override + public Set> requiredOptions() { + return new HashSet<>(Arrays.asList(HOSTNAME, PORT, DATABASE_NAME, TABLE_NAME, SINK_MODE)); + } + + @Override + public Set> optionalOptions() { + return new HashSet<>( + Arrays.asList( + USERNAME, + PASSWORD, + SINK_BATCH_SIZE, + SINK_FLUSH_INTERVAL, + SINK_MAX_RETRIES, + TEMP_S3_URI, + IAM_ROLE_ARN)); + } + + private RedshiftOptions getOptions(ReadableConfig config) { + return (new RedshiftOptions.Builder()) + .withHostname(config.get(HOSTNAME)) + .withPort(config.get(PORT)) + .withUsername(config.get(USERNAME)) + .withPassword(config.get(PASSWORD)) + .withDatabaseName(config.get(DATABASE_NAME)) + .withTableName(config.get(TABLE_NAME)) + .withBatchSize(config.get(SINK_BATCH_SIZE)) + .withFlushInterval(config.get(SINK_FLUSH_INTERVAL)) + .withMaxRetries(config.get(SINK_MAX_RETRIES)) + .withSinkMode(config.get(SINK_MODE)) + .withTempS3Uri(config.get(TEMP_S3_URI)) + .withIamRoleArn(config.get(IAM_ROLE_ARN)) + .build(); + } + + private void validateConfigOptions(ReadableConfig config) { + if (config.get(SINK_MODE) == SinkMode.COPY + && !config.getOptional(TEMP_S3_URI).isPresent()) { + throw new IllegalArgumentException("A S3 URL must be provided when sink mode is COPY"); + } else if (config.getOptional(TEMP_S3_URI).isPresent()) { + String uri = config.get(TEMP_S3_URI); + try { + RedshiftS3Util.getS3Parts(uri); + } catch (Exception e) { + throw new IllegalArgumentException( + "The attempt to access S3 has failed due to an incorrect S3 URL provided." + + " Please verify the authentication credentials and ensure the accessibility of the specified bucket." 
+ + "Resolution Steps:\n" + + "\n" + + "Double-check the accuracy of the provided S3 URL.\n" + + "Verify the correctness of the authentication credentials.\n" + + "Ensure that the specified bucket is accessible and properly configured.", + e); + } + } + + if (config.get(SINK_MODE) == SinkMode.COPY + && !config.getOptional(IAM_ROLE_ARN).isPresent()) { + throw new IllegalArgumentException( + "Requirement for COPY Mode in Amazon Redshift Cluster\n" + + "\n" + + "To utilize the COPY mode, it is mandatory to furnish the IAM Role ARN linked to the Amazon Redshift cluster. " + + "Please ensure that the IAM Role ARN is accurately specified to enable seamless functionality in COPY mode."); + } + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/table/RedshiftDynamicTableSink.java b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/table/RedshiftDynamicTableSink.java new file mode 100644 index 00000000..daaa732d --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/java/org/apache/flink/connector/redshift/table/RedshiftDynamicTableSink.java @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.connector.redshift.table; + +import org.apache.flink.annotation.PublicEvolving; +import org.apache.flink.connector.redshift.format.AbstractRedshiftOutputFormat; +import org.apache.flink.connector.redshift.options.RedshiftOptions; +import org.apache.flink.table.connector.ChangelogMode; +import org.apache.flink.table.connector.sink.DynamicTableSink; +import org.apache.flink.table.connector.sink.OutputFormatProvider; +import org.apache.flink.table.types.DataType; +import org.apache.flink.types.RowKind; +import org.apache.flink.util.Preconditions; + +/** AWS Redshift Dynamic Table Sink . 
*/ +@PublicEvolving +public class RedshiftDynamicTableSink implements DynamicTableSink { + private final String[] primaryKeys; + + private final String[] fieldNames; + + private final DataType[] fieldDataTypes; + + private final RedshiftOptions options; + + public RedshiftDynamicTableSink( + RedshiftOptions options, + String[] primaryKeys, + String[] fieldNames, + DataType[] fieldDataTypes) { + + this.primaryKeys = primaryKeys; + this.fieldNames = fieldNames; + this.fieldDataTypes = fieldDataTypes; + this.options = options; + } + + @Override + public ChangelogMode getChangelogMode(ChangelogMode requestedMode) { + validatePrimaryKey(requestedMode); + return ChangelogMode.newBuilder() + .addContainedKind(RowKind.INSERT) + .addContainedKind(RowKind.UPDATE_AFTER) + .addContainedKind(RowKind.DELETE) + .build(); + } + + private void validatePrimaryKey(ChangelogMode requestedMode) { + Preconditions.checkState( + (ChangelogMode.insertOnly().equals(requestedMode) || primaryKeys.length > 0), + "Declare primary key for UPSERT operation."); + } + + @Override + public SinkRuntimeProvider getSinkRuntimeProvider(Context context) { + + AbstractRedshiftOutputFormat outputFormat = + new AbstractRedshiftOutputFormat.Builder() + .withConnectionProperties(options) + .withFieldNames(fieldNames) + .withFieldTypes(fieldDataTypes) + .withPrimaryKey(primaryKeys) + .build(); + return OutputFormatProvider.of(outputFormat); + } + + @Override + public DynamicTableSink copy() { + return new RedshiftDynamicTableSink( + this.options, this.primaryKeys, this.fieldNames, this.fieldDataTypes); + } + + @Override + public String asSummaryString() { + return "Amazon Redshift Sink"; + } +} diff --git a/flink-connector-aws/flink-connector-redshift/src/main/resources/META-INF/licenses/.gitkeep b/flink-connector-aws/flink-connector-redshift/src/main/resources/META-INF/licenses/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/flink-connector-aws/flink-connector-redshift/src/main/resources/META-INF/services/.gitkeep b/flink-connector-aws/flink-connector-redshift/src/main/resources/META-INF/services/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/flink-connector-aws/flink-connector-redshift/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory b/flink-connector-aws/flink-connector-redshift/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory new file mode 100644 index 00000000..e281664a --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory @@ -0,0 +1,16 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +org.apache.flink.connector.redshift.table.RedshiftDynamicTableFactory diff --git a/flink-connector-aws/flink-connector-redshift/src/main/resources/log4j2.properties b/flink-connector-aws/flink-connector-redshift/src/main/resources/log4j2.properties new file mode 100644 index 00000000..c64a340a --- /dev/null +++ b/flink-connector-aws/flink-connector-redshift/src/main/resources/log4j2.properties @@ -0,0 +1,25 @@ +################################################################################ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +################################################################################ + +rootLogger.level = OFF +rootLogger.appenderRef.console.ref = ConsoleAppender + +appender.console.name = ConsoleAppender +appender.console.type = CONSOLE +appender.console.layout.type = PatternLayout +appender.console.layout.pattern = %d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n
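
For reference, the builders introduced in this patch can also be wired together programmatically. The snippet below is a minimal sketch, not part of the patch itself: the class name, field names, endpoint, credentials, S3 URI and IAM role ARN are placeholders, and in a real job this wiring is performed by `RedshiftDynamicTableSink`.

```java
import java.time.Duration;

import org.apache.flink.connector.redshift.format.AbstractRedshiftOutputFormat;
import org.apache.flink.connector.redshift.mode.SinkMode;
import org.apache.flink.connector.redshift.options.RedshiftOptions;
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

/** Illustrative only: shows how the option and output-format builders compose. */
public class RedshiftSinkWiringSketch {

    public static void main(String[] args) {
        // Connection and sink behaviour; COPY mode additionally needs a staging S3 URI and an IAM role.
        RedshiftOptions options =
                new RedshiftOptions.Builder()
                        .withHostname("example-cluster.xxxxxx.redshift.amazonaws.com") // placeholder
                        .withPort(5439)
                        .withUsername("awsuser") // placeholder
                        .withPassword("secret") // placeholder
                        .withDatabaseName("dev")
                        .withTableName("orders")
                        .withBatchSize(1000)
                        .withFlushInterval(Duration.ofSeconds(1))
                        .withMaxRetries(3)
                        .withSinkMode(SinkMode.COPY)
                        .withTempS3Uri("s3://example-bucket/redshift/tmp") // placeholder
                        .withIamRoleArn("arn:aws:iam::000000000000:role/ExampleRedshiftCopyRole") // placeholder
                        .build();

        // Target schema; a non-empty primary key switches the output format into UPSERT mode.
        String[] fieldNames = {"order_id", "currency", "amount"};
        DataType[] fieldTypes = {DataTypes.BIGINT(), DataTypes.STRING(), DataTypes.DOUBLE()};
        String[] primaryKeys = {"order_id"};

        AbstractRedshiftOutputFormat outputFormat =
                new AbstractRedshiftOutputFormat.Builder()
                        .withConnectionProperties(options)
                        .withFieldNames(fieldNames)
                        .withFieldTypes(fieldTypes)
                        .withPrimaryKey(primaryKeys)
                        .build();

        // The dynamic table sink exposes this format to Flink via OutputFormatProvider.of(outputFormat).
        System.out.println(outputFormat);
    }
}
```

With `SinkMode.JDBC`, the staging S3 URI and IAM role ARN can be omitted, since they are only validated and used by the COPY path.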