Describe the bug
Iterating over the documents of a LoTTE dataset raises a FileNotFoundError: the collection.tsv file that ir_datasets tries to open is missing from the local cache.
Affected dataset(s)
LoTTE
To Reproduce
Run in Python:
import ir_datasets
dataset = ir_datasets.load("lotte/recreation/test")
for doc in dataset.docs_iter():
    print(doc)
    break
Get the error:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
/home/user/misc/ir-datasets.ipynb Cell 3 line 4
1 import ir_datasets
3 dataset = ir_datasets.load("lotte/recreation/test")
----> 4 for doc in dataset.docs_iter():
5 print(doc)
6 break
File ~/miniconda3/envs/py311/lib/python3.11/site-packages/ir_datasets/util/__init__.py:147, in DocstoreSplitter.__next__(self)
146 def __next__(self):
--> 147 return next(self.it)
File ~/miniconda3/envs/py311/lib/python3.11/site-packages/ir_datasets/formats/tsv.py:92, in TsvIter.__next__(self)
91 def __next__(self):
---> 92 line = next(self.line_iter)
93 cols = line.rstrip('\n').split('\t')
94 num_cols = len(self.cls._fields)
File ~/miniconda3/envs/py311/lib/python3.11/site-packages/ir_datasets/formats/tsv.py:28, in FileLineIter.__next__(self)
26 self.stream = io.TextIOWrapper(self.ctxt.enter_context(self.dlc[self.stream_idx].stream()))
27 else:
---> 28 self.stream = io.TextIOWrapper(self.ctxt.enter_context(self.dlc.stream()))
29 while self.pos < self.start:
30 line = self.stream.readline()
File ~/miniconda3/envs/py311/lib/python3.11/contextlib.py:502, in _BaseExitStack.enter_context(self, cm)
499 except AttributeError:
500 raise TypeError(f"'{cls.__module__}.{cls.__qualname__}' object does "
501 f"not support the context manager protocol") from None
--> 502 result = _enter(cm)
503 self._push_cm_exit(cm, _exit)
504 return result
File ~/miniconda3/envs/py311/lib/python3.11/contextlib.py:137, in _GeneratorContextManager.__enter__(self)
135 del self.args, self.kwds, self.func
136 try:
--> 137 return next(self.gen)
138 except StopIteration:
139 raise RuntimeError("generator didn't yield") from None
File ~/miniconda3/envs/py311/lib/python3.11/site-packages/ir_datasets/util/fileio.py:148, in RelativePath.stream(self)
146 @contextlib.contextmanager
147 def stream(self):
--> 148 with open(self.path(), 'rb') as f:
149 yield f
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/.ir_datasets/lotte/lotte_extracted/lotte/recreation/test/collection.tsv'
Expected behavior
I expect to see the first document in the collection, as I do with beir/msmarco/test:
dataset = ir_datasets.load("beir/msmarco/test")
for doc in dataset.docs_iter():
    print(doc)
    break
returns:
GenericDoc(doc_id='0', text='The presence of communication amid scientific minds was equally important to the success of the Manhattan Project as scientific intellect was. The only cloud hanging over the impressive achievement of the atomic researchers and engineers is what their success truly meant; hundreds of thousands of innocent lives obliterated.')
Additional context
In the terminal, cd ~/.ir_datasets/lotte && ls -R . returns:
.:
lotte_extracted
./lotte_extracted:
lotte
./lotte_extracted/lotte:
lifestyle recreation
./lotte_extracted/lotte/lifestyle:
test
./lotte_extracted/lotte/lifestyle/test:
collection.tsv.pklz4
./lotte_extracted/lotte/lifestyle/test/collection.tsv.pklz4:
bin bin.meta
./lotte_extracted/lotte/recreation:
test
./lotte_extracted/lotte/recreation/test:
collection.tsv.pklz4
./lotte_extracted/lotte/recreation/test/collection.tsv.pklz4:
bin bin.meta
I'm working with:
Python implementation: CPython
Python version : 3.11.0
IPython version : 8.14.0
ir_datasets: 0.5.5
Compiler : GCC 11.3.0
OS : Linux
Release : 5.15.0-84-generic
Machine : x86_64
Processor : x86_64
CPU cores : 16
Architecture: 64bit
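The directory listing above shows collection.tsv.pklz4 directories but no collection.tsv next to them, which matches the path in the FileNotFoundError. A minimal sketch for checking this on another machine follows. It assumes the cache layout shown in the listing. The helper name find_missing_collections is hypothetical and not part of ir_datasets. It only inspects the file system and does not call the library.

```python
from pathlib import Path


def find_missing_collections(root):
    """Return expected collection.tsv paths that are absent under root.

    Hypothetical diagnostic helper, not part of ir_datasets. For every
    entry named `collection.tsv.pklz4` found under `root`, it reports the
    sibling `collection.tsv` when that file does not exist.
    """
    root = Path(root)
    if not root.is_dir():
        # Nothing cached at this location, so nothing to report.
        return []
    missing = []
    for docstore in sorted(root.rglob("collection.tsv.pklz4")):
        source = docstore.with_name("collection.tsv")
        if not source.is_file():
            missing.append(source)
    return missing
```

Calling find_missing_collections(Path.home() / ".ir_datasets" / "lotte") would be expected to report both the lifestyle/test and recreation/test paths for the listing above, since neither directory contains a collection.tsv.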