Update java/src/com/twitter/pycascading/Util.java#3
TaoLinVT wants to merge 2 commits into ianoc:casc2
Conversation
Add gzip compression support (Also need to enable it in python code).
There's no if statement around any of this; are we changing the upstream behavior of pycascading?
There are two reasons: (1) My tests show that these settings have no effect unless the Python code enables compression. For example, change the following to: Only the one with compression enabled will generate gz files. All the others remain the same. (2) Consider the use case in which one flow has multiple sinks. If one sink uses a compress-enabled scheme, then we have to enable the Java compression settings. Those properties will be applied to the other sinks as well, even if they have no compression scheme. Because this cannot be avoided, there is no point in adding an if statement around the new statements. Thanks,
It's very specific to GZIP and a particular set of flags to go into upstream. I'll look at being able to pass a set of Hadoop properties into the run option instead. Changing the code upstream if we decide on a different codec seems like a bad idea. If it's passed into run rather than a sink to control what compression is used, then it will affect all sinks as expected?
OK, I see. How about the following, which allows us to use a different codec later by adding entries to config: Supported codec strings are defined by:
This seems to:
The config map is a String -> Object map, so one of its values can itself be a String -> String map, i.e. config.get("pycascading.hadoop.mapred.options") -> Map<String, String>. We then iterate through this string map, setting all the keys present. It should leave the existing behavior in pycascading as it was and let us easily pass in a hash map to set any options (for compression or anything else in future too). Thoughts?
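A minimal Java sketch of that idea, not the code from this PR: the class name `HadoopOptionsSketch` and the method `applyHadoopOptions` are hypothetical, and a plain `java.util.Properties` stands in for whatever properties object `Util.run()` actually builds. The two option names in `main` are the classic Hadoop output-compression settings and are used here only as sample data.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, for illustration only; not part of pycascading.
final class HadoopOptionsSketch {
    // Config key proposed in the comment above.
    static final String OPTIONS_KEY = "pycascading.hadoop.mapred.options";

    /** Copies every entry of the nested option map, if present, into props. */
    static void applyHadoopOptions(Map<String, Object> config, Properties props) {
        Object raw = config.get(OPTIONS_KEY);
        if (!(raw instanceof Map)) {
            return; // no options supplied: props are left exactly as they were
        }
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) raw).entrySet()) {
            // Properties stores strings, so stringify whatever the caller passed.
            props.setProperty(String.valueOf(entry.getKey()),
                              String.valueOf(entry.getValue()));
        }
    }

    public static void main(String[] args) {
        // Sample options: enable gzip output compression (assumed values).
        Map<String, String> options = new LinkedHashMap<>();
        options.put("mapred.output.compress", "true");
        options.put("mapred.output.compression.codec",
                    "org.apache.hadoop.io.compress.GzipCodec");

        Map<String, Object> config = new LinkedHashMap<>();
        config.put(OPTIONS_KEY, options);

        Properties props = new Properties();
        applyHadoopOptions(config, props);
        System.out.println(props);
    }
}
```

Because the method returns early when the key is absent, a script that never sets the option map sees no change in behavior, and switching codecs would only require a different value in the map rather than a change to the Java code.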
OK. Here is the plan: pycascading.pipe.config = dict() is defined in python/pycascading/bootstrap.py (line 70), and it will be passed in python/pycascading/tap.py (line 237) as the config parameter to Util.run(). So
…script and default behavior is the same as before.
Please see the new diff. Thank you! -- Tao
Add gzip compression support (Also need to enable it in python code).