Sparklyr 0.5.0 (UNRELEASED)

  • Resolved an issue where predict() could produce results in the wrong order for large Spark DataFrames.

  • Implemented support for na.action with the various Spark.ML routines, and set the default as na.omit. Users can customize the na.action argument through the ml.options object accepted by all ML routines.

  • Fixed spark_connect() on Windows when paths are long or contain spaces.

  • The lag() window function now accepts numeric values for n. (#249)

  • Added support for configuring Spark environment variables through spark.env.* config settings.

  • Added support for the Tokenizer and RegexTokenizer feature transformers. These are exported as the ft_tokenizer() and ft_regex_tokenizer() functions.

  • Resolved an issue where attempting to call copy_to() with an R data.frame containing many columns could fail with a Java StackOverflow. (#244)

  • Resolved an issue where attempting to call collect() on a Spark DataFrame containing many columns could produce the wrong result. (#242)

  • Added support for configuring network timeouts through the sparklyr.backend.timeout, sparklyr.gateway.start.timeout, and sparklyr.gateway.connect.timeout config settings.

  • Improved logging while establishing connections to sparklyr.

  • Added sparklyr.gateway.port and sparklyr.gateway.address as config settings.

  • Added an Eclipse project to ease development of the Scala codebase within sparklyr.

  • Added a filter parameter to spark_log() to easily filter log entries by a character string.

  • Increased the default network timeout set by sparklyr.backend.timeout.

  • Moved spark.jars.default setting from options to spark config.

  • sparklyr now properly respects the Hive metastore directory with the sdf_save_table() and sdf_load_table() APIs for Spark < 2.0.0.

  • Added sdf_quantile() as a means of computing (approximate) quantiles for a column of a Spark DataFrame.

  • Added support for n_distinct(...), based on a call to the Hive function count(DISTINCT ...). (#220)
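
As a rough illustration of the configuration-related entries in this release (spark.env.*, the gateway address and port, and the timeout settings), the sketch below sets them on the list returned by spark_config(). All values are illustrative assumptions, not defaults or recommendations; SPARK_LOCAL_IP is used only as an example of an environment variable, and the timeouts are assumed to be in seconds.

```r
library(sparklyr)

# Start from the default configuration, which is an ordinary named list.
config <- spark_config()

# spark.env.* entries are passed to Spark as environment variables.
# SPARK_LOCAL_IP is only an example variable (assumption).
config[["spark.env.SPARK_LOCAL_IP"]] <- "127.0.0.1"

# Gateway address and port (illustrative values).
config[["sparklyr.gateway.address"]] <- "127.0.0.1"
config[["sparklyr.gateway.port"]] <- 8881

# Network timeouts (illustrative values, assumed to be in seconds).
config[["sparklyr.backend.timeout"]] <- 120
config[["sparklyr.gateway.start.timeout"]] <- 60
config[["sparklyr.gateway.connect.timeout"]] <- 5

# Connecting requires a local Spark installation.
sc <- spark_connect(master = "local", config = config)
```

Keeping these settings in one config object makes it straightforward to reuse the same values across connections.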

Sparklyr 0.4.0

  • First release to CRAN.