Open
Labels
question: Further information is requested
Description
This fails:
netconv.read('/home/main/Downloads/imdb.graphml', 'graphml')
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-4-7aff295fd165> in <module>
----> 1 netconv.read('/home/main/Downloads/imdb.graphml', 'graphml')

~/git/netconv/netconv/__init__.py in read(fname, fmt, *args, **kwargs)
     29 def read(fname, fmt, *args, **kwargs):
     30     with open(fname) as file:
---> 31         text = file.read()
     32     return decode(text, fmt, *args, **kwargs)
     33

~/.pyenv/versions/miniconda3-latest/lib/python3.6/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 7734: invalid continuation byte
As does this:
with open("/home/main/Downloads/imdb.graphml", 'r', encoding='utf-8') as fin:
    text = fin.read()
G = netconv.decoders.decode_graphml(text)

This works:
with open("/home/main/Downloads/imdb.graphml", 'r', encoding='latin-1') as fin:
    text = fin.read()
G = netconv.decoders.decode_graphml(text)

However, the imdb dataset has a great many non-ASCII characters.
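A note on why Latin-1 "working" may not settle the question: Latin-1 maps every possible byte to some character, so decoding with it never raises, but any text that was really UTF-8 comes out garbled. The snippet below is only an illustration using a made-up sample string, not data from the imdb file:

```python
# Made-up sample text, used only to illustrate the behaviour.
original = "Crème"

# The same text stored as UTF-8 takes two bytes for "è" (0xc3 0xa8).
utf8_bytes = original.encode("utf-8")

# Latin-1 decodes any byte sequence without error, but reads each
# byte as its own character, so the two-byte "è" becomes two characters.
misread = utf8_bytes.decode("latin-1")

# Decoding with the right codec recovers the original text.
recovered = utf8_bytes.decode("utf-8")

print(repr(misread))
print(repr(recovered))
```

So a successful Latin-1 read only shows that no exception was raised; whether the names in the graph are correct depends on which encoding the file was actually written in.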
I'm inclined to dismiss it as a data quality issue, but it might be good to be able to pass the choice of encoding to netconv.read().
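A minimal sketch of what passing the encoding through could look like. `read_text` is a hypothetical stand-alone helper, not netconv's actual API; the `encoding` and `errors` parameters are assumptions that simply forward to the built-in `open()`. The demo file is a temporary stand-in containing the same byte (0xe8) reported in the traceback:

```python
import os
import tempfile


def read_text(fname, encoding="utf-8", errors="strict"):
    """Hypothetical helper: read a file using a caller-chosen encoding."""
    with open(fname, encoding=encoding, errors=errors) as file:
        return file.read()


# Stand-in file: "è" stored as the single Latin-1 byte 0xe8.
with tempfile.NamedTemporaryFile(suffix=".graphml", delete=False) as tmp:
    tmp.write(b"<data>Cr\xe8me</data>")
    demo_path = tmp.name

try:
    # With the UTF-8 default, 0xe8 followed by "m" is not valid UTF-8.
    try:
        read_text(demo_path)
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    # Passing the encoding explicitly lets the caller decide.
    latin1_text = read_text(demo_path, encoding="latin-1")
finally:
    os.remove(demo_path)

print(utf8_failed, latin1_text)
```

If netconv.read() accepted a similar keyword and forwarded it to open(), the default behaviour would stay the same while files like this one could still be loaded.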