- Resolved an issue where `predict()` could produce results in the wrong order for large Spark DataFrames.
- Implemented support for `na.action` with the various Spark ML routines, and set the default to `na.omit`. Users can customize the `na.action` argument through the `ml.options` object accepted by all ML routines.
- Fixed Windows `spark_connect()` with long paths and spaces.
- The `lag()` window function now accepts numeric values for `n`. (#249)
- Added support for configuring Spark environment variables using the `spark.env.*` config.
- Added support for the `Tokenizer` and `RegexTokenizer` feature transformers. These are exported as the `ft_tokenizer()` and `ft_regex_tokenizer()` functions.
- Resolved an issue where attempting to call `copy_to()` with an R `data.frame` containing many columns could fail with a Java StackOverflow. (#244)
- Resolved an issue where attempting to call `collect()` on a Spark DataFrame containing many columns could produce the wrong result. (#242)
- Added support for parameterizing network timeouts using the `sparklyr.backend.timeout`, `sparklyr.gateway.start.timeout` and `sparklyr.gateway.connect.timeout` config settings.
- Improved logging while establishing connections to `sparklyr`.
- Added `sparklyr.gateway.port` and `sparklyr.gateway.address` as config settings.
- Added an Eclipse project to ease development of the Scala codebase within `sparklyr`.
- Added a `filter` parameter to `spark_log()` to easily filter log entries by a character string.
- Increased the network timeout for `sparklyr.backend.timeout`.
- Moved the `spark.jars.default` setting from options to the Spark config.
- `sparklyr` now properly respects the Hive metastore directory with the `sdf_save_table()` and `sdf_load_table()` APIs for Spark < 2.0.0.
- Added `sdf_quantile()` as a means of computing (approximate) quantiles for a column of a Spark DataFrame.
- Added support for `n_distinct(...)`, based on a call to the Hive function `count(DISTINCT ...)`. (#220)
- First release to CRAN.
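
The new connection-related settings above can be passed through `spark_config()`. A minimal sketch (the setting names come from the entries above; the environment variable, its value, and the timeout values are illustrative assumptions):

```r
library(sparklyr)

config <- spark_config()

# Spark environment variables via spark.env.* (variable and value are examples)
config$spark.env.SPARK_LOCAL_IP <- "127.0.0.1"

# Network timeout settings (values chosen for illustration only)
config$sparklyr.backend.timeout         <- 120
config$sparklyr.gateway.start.timeout   <- 60
config$sparklyr.gateway.connect.timeout <- 60

sc <- spark_connect(master = "local", config = config)
```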
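
A sketch of how the tokenizer transformers and the `na.action` support might be used. The `input.col`/`output.col` argument names and an `ml_options()` constructor for the `ml.options` object are assumptions for illustration, not confirmed by the entries above:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Copy a small data.frame with a text column to Spark.
sentences_tbl <- copy_to(
  sc, data.frame(text = c("hello spark", "sparklyr tokenizes text"))
)

# Split the text column into arrays of words with the Tokenizer transformer
# (argument names are assumptions).
tokenized <- sentences_tbl %>%
  ft_tokenizer(input.col = "text", output.col = "words")

# Customize na.action through the ml.options object accepted by ML routines
# (ml_options() and its na.action argument are assumptions).
iris_tbl <- copy_to(sc, iris, "iris")
fit <- iris_tbl %>%
  ml_linear_regression(Petal_Length ~ Petal_Width,
                       ml.options = ml_options(na.action = na.omit))
```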
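
The aggregation additions can be sketched as follows; the exact argument names of `sdf_quantile()` are an assumption:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, "mtcars")

# Approximate quantiles for a single column of a Spark DataFrame
# (the "probabilities" argument name is an assumption).
sdf_quantile(mtcars_tbl, "mpg", probabilities = c(0.25, 0.5, 0.75))

# n_distinct() translates to Hive's count(DISTINCT ...)
mtcars_tbl %>% summarise(gears = n_distinct(gear))
```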