Spark Component Release Notes

This page lists the main features added to the Spark Component and the bugs fixed in each version.

Change Log

Version 2023.1.10

Bug fixes

  • DI-8797: Template LOAD RDBMS TO SPARK: Fixed incorrect query generation with externalized table names.

Version 2023.1.4

Bug fixes

  • DI-7837: The LOAD HDFS XML TO SPARK template generates incorrect queries in some cases, causing errors at runtime.

Version 2023.1.3

Bug fixes

  • DI-7361: INTEGRATION Spark to Hdfs Parquet: An error is raised when one of the columns of the target datastore is not mapped.

Version 2023.1.0

Feature improvements

  • DI-6403: The INTEGRATION Spark to XML template allows writing HDFS XML files from Spark.

  • DI-6406: Hierarchical fields are now supported in the INTEGRATION Spark to Hdfs Parquet template.

Bug fixes

  • DI-6396: Spark templates do not generate Java files encoded in UTF-8.

  • DI-6572: When loading null values using the LOAD Spark to Rdbms template, a NullPointerException is raised.

Version 5.3.8 (Component Pack)

Feature improvements

  • DI-5872: The defined decimal precision is unexpectedly ignored.

  • DI-6088: The LOAD Hdfs XML to Spark template is now available.

  • DI-6397: Spark 1.6 templates have been removed.

  • DI-6571: The LOAD Hdfs XML to Spark template has been updated to ensure that the datastore and column names are not truncated.

Version 3.0.0 (Component Pack)

Feature improvements

  • DI-4508: Update Components and Designer to take into account dedicated license permissions

  • DI-4727: Rebranding: Templates and sample projects

  • DI-4731: Rebranding: Template messages

  • DI-4962: Improved component dependencies and requirements management

Version 2.1.0 (Spark Component)

Bug fixes

  • DI-3028: Mappings - Spark Templates were unexpectedly proposed in some situations even when Spark was not involved

Version 2.0.5 (Spark Component)

Feature improvements

  • DI-4011: Addition of ability to specify deploy mode (cluster or client)

  • DI-4012: Addition of ability to work with resource files through a new dedicated node in Metadata and a new parameter on the Spark Submit TOOL

  • DI-4042: Addition of ability to handle spark session configuration when multiple targets are loaded with Spark
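
For readers unfamiliar with the options named above, here is a minimal sketch of how a deploy mode and a list of resource files are typically passed to Spark's own spark-submit command line. It relies only on the standard --master, --deploy-mode and --files options of spark-submit. How the Spark Submit TOOL maps its parameters onto them is not stated in these notes, so the helper name build_spark_submit, the yarn master and the file names are assumptions made purely for illustration.

```python
import shlex


def build_spark_submit(app, deploy_mode="client", files=(), master="yarn"):
    """Build a spark-submit argument list.

    Hypothetical helper for illustration only; it is not part of the
    Spark Component and does not launch anything.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    args = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if files:
        # spark-submit expects resource files as one comma-separated value
        args += ["--files", ",".join(files)]
    args.append(app)
    return args


# Hypothetical application and resource file names
cmd = build_spark_submit(
    "job.jar",
    deploy_mode="cluster",
    files=["lookup.csv", "settings.properties"],
)
print(shlex.join(cmd))
```

In cluster mode the driver runs on a cluster node rather than on the submitting machine, which is why resource files generally have to be shipped with the job instead of being read from a local path.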

Bug fixes

  • DI-4037: Template LOAD Hdfs File to Spark - The error message raised when an error occurs while reading a file has been updated to be clearer (instead of a NullPointerException)

Version 2.0.4 (Spark Component)

Feature improvements

  • DI-3580: Template - LOAD Hdfs File to Spark - new parameter "In File List"

  • DI-3581: Template - LOAD Hdfs File to Spark - support compressed files when using fileDriver Read Method

  • DI-3719: Template - LOAD Hdfs Json to Spark - new Template to load JSON files stored in HDFS into Spark

  • DI-3800: TOOL - Spark Execution Unit Launcher - add number of partitions to debug prints

Bug fixes

  • DI-3579: Templates - use createOrReplaceTempView instead of registerTempTable, which is deprecated

Version 2.0.3 (Spark Component)

Feature improvements

  • DI-2002: Spark - add support for HTTPS and Kerberized Livy connections

  • DI-2539: Spark - enhance datatype conversion when reading through JDBC

Bug fixes

  • DI-2540: Spark - when a Spark session executed by Livy fails, the error cause is unexpectedly not returned

Version 2.0.2 (Spark Component)

Feature improvements

  • DI-1912: Templates updated - support CDC sources on Templates that did not previously support them (such as staging templates)

  • DI-1909: Templates updated - New Parameters 'Unlock Cdc Table' and 'Lock Cdc Table' to configure the locking behaviour of CDC tables

Bug fixes

  • DI-1907: Templates updated - The 'Cdc Subscriber' parameter was ignored in some Templates when querying the source data