Decoding issue #18

Open
jaredforth opened this issue Aug 28, 2024 · 2 comments

@jaredforth

Hello,

I'm running python fs_to_json.py ../2024-08-27T13:49:35_42810/all_namespaces/all_kinds/ out, where the input directory ../2024-08-27T13:49:35_42810/all_namespaces/all_kinds/ contains:

├── all_namespaces_all_kinds.export_metadata
├── output-0
├── output-1
├── output-10
├── output-11
├── output-12
├── output-13
├── output-14
├── output-15
├── output-16
├── output-17
├── output-18
├── output-19
├── output-2
├── output-3
├── output-4
├── output-5
├── output-6
├── output-7
├── output-8
└── output-9

1 directory, 21 files

The conversion worked for 9 files but created 12 empty JSON files. Is this a known bug, and are there any tips on how to resolve it? The run ends with the following traceback:

Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
           ^^^^^^^^^^^^^^^^
  File "/Users/jaredforth/Downloads/firestore-export-json/converter/command.py", line 145, in process_file
    json.dumps(json_tree, default=serialize_json, ensure_ascii=False, indent=2)
  File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
          ^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 202, in encode
    chunks = list(chunks)
             ^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 432, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 326, in _iterencode_list
    yield from chunks
  File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/json/encoder.py", line 439, in _iterencode
    o = _default(o)
        ^^^^^^^^^^^
  File "/Users/jaredforth/Downloads/firestore-export-json/converter/utils.py", line 77, in serialize_json
    return str(obj)
           ^^^^^^^^
  File "/Users/jaredforth/Downloads/firestore-export-json/venv/lib/python3.12/site-packages/google/appengine/api/datastore_types.py", line 1227, in __str__
    return self.decode('utf-8')
           ^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xeb in position 24: invalid continuation byte
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/jaredforth/Downloads/firestore-export-json/fs_to_json.py", line 14, in <module>
    main()
  File "/Users/jaredforth/Downloads/firestore-export-json/fs_to_json.py", line 10, in main
    command.main(args=args)
  File "/Users/jaredforth/Downloads/firestore-export-json/converter/command.py", line 93, in main
    process_files(
  File "/Users/jaredforth/Downloads/firestore-export-json/converter/command.py", line 113, in process_files
    p.map(f, files)
  File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.5/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/pool.py", line 774, in get
    raise self._value
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xeb in position 24: invalid continuation byte
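
For reference, the final error can probably be reproduced in isolation. The sketch below assumes, based only on the last frame of the traceback, that the failing value is a bytes subclass whose __str__ decodes strictly as UTF-8. FakeByteString is a hypothetical stand-in for that class, not the real google.appengine type, and the sample bytes are made up.

```python
class FakeByteString(bytes):
    """Hypothetical stand-in for the bytes subclass whose __str__ appears in the traceback."""

    def __str__(self):
        # Same call as the last frame of the traceback: strict UTF-8 decoding.
        return self.decode("utf-8")


# 0xeb announces a three-byte UTF-8 sequence, but the next byte (0x20, a space)
# is not a continuation byte, so strict decoding fails.
value = FakeByteString(b"abc\xeb def")

try:
    str(value)
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is cleared when the except block ends
    print(exc.reason, "at byte offset", exc.start)
```

If this matches what happens in the converter, the affected fields hold bytes that are not valid UTF-8 text (binary data, or text in another encoding), and str() fails on them before json.dumps can write anything.
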
@LeoPraktisk

LeoPraktisk commented Sep 9, 2024

I am having the same issue. I am also having problems parsing documents that have collections within them.

@LeoPraktisk

I have been fiddling with it a little and sort of found a solution.

I replaced the serialize_json function in utils.py with the following code:

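import calendar
import datetime


def serialize_json(obj):
    # Datetimes are converted to UTC and returned as epoch milliseconds.
    if isinstance(obj, datetime.datetime):
        if obj.utcoffset() is not None:
            obj = obj - obj.utcoffset()
        millis = int(calendar.timegm(obj.timetuple()) * 1000 + obj.microsecond / 1000)
        return millis
    try:
        return str(obj)
    except UnicodeDecodeError:
        # The value is bytes-like but not valid UTF-8: decode it anyway,
        # silently dropping the bytes that cannot be decoded.
        return obj.decode("utf-8", errors="ignore")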

The old code looked like this:

def serialize_json(obj):
    if isinstance(obj, datetime.datetime):
        if obj.utcoffset() is not None:
            obj = obj - obj.utcoffset()
        millis = int(calendar.timegm(obj.timetuple()) * 1000 + obj.microsecond / 1000)
        return millis
    return str(obj)

This fixes the error, or rather bypasses the problem, but some parts still don't work. When the converter runs over a list of objects, it seems unable to parse it properly, and the list becomes a jumble of Unicode escape sequences.

I don't know if this helps, but it will allow you to run the program without any errors.
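
One caveat with errors="ignore" is that it silently drops whatever bytes fail to decode, so binary fields come out truncated. A possible alternative, sketched below, is to base64-encode such values so nothing is lost. This is only an illustration: FakeByteString is a hypothetical stand-in for the bytes subclass in the traceback, and serialize_lossless and the "__base64__" key are made-up names, not part of the converter.

```python
import base64
import json


class FakeByteString(bytes):
    """Hypothetical stand-in for the bytes subclass seen in the traceback."""

    def __str__(self):
        # Strict UTF-8 decoding, as in the traceback.
        return self.decode("utf-8")


def serialize_lossless(obj):
    """Fallback for json.dumps: keep undecodable bytes as base64 text."""
    try:
        return str(obj)
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat it as binary data and keep every byte.
        return {"__base64__": base64.b64encode(bytes(obj)).decode("ascii")}


doc = {
    "name": FakeByteString(b"plain text"),
    "blob": FakeByteString(b"\x00\xeb\x20\xff"),
}
encoded = json.dumps(doc, default=serialize_lossless, ensure_ascii=False)
print(encoded)
```

Valid UTF-8 values still come out as ordinary strings, and anything else can be recovered later with base64.b64decode. Whether this also helps with the garbled lists is untested.
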
