Implementing support for exporting threads #3

Open
aytey wants to merge 1 commit into base: master

@aytey aytey commented Dec 22, 2020

Hi,

This PR adds support for dumping threads (in the regular Slack format) from channels.

Please see here for some more background: zach-snell#40

Cheers,

Andrew

Signed-off-by: Andrew V. Jones [email protected]

@filipedematos

Just got this message, any clues?

Successfully authenticated for team ******* and user *****
Found 13 Users
Found 2 Public Channels
Traceback (most recent call last):
  File "slack_export.py", line 384, in <module>
    bootstrapKeyValues()
  File "slack_export.py", line 284, in bootstrapKeyValues
    groups = slack.conversations.list(limit = 1000, types=('private_channel', 'mpim')).body['channels']
  File "/Users/syox/Library/Python/2.7/lib/python/site-packages/slacker/__init__.py", line 217, in list
    'limit': limit
  File "/Users/syox/Library/Python/2.7/lib/python/site-packages/slacker/__init__.py", line 120, in get
    api, **kwargs
  File "/Users/syox/Library/Python/2.7/lib/python/site-packages/slacker/__init__.py", line 102, in _request
    raise Error(response.error)
slacker.Error: missing_scope

Thank you

@aytey
Author

aytey commented Nov 29, 2021

> Just got this message, any clues?
>
> [...] slacker.Error: missing_scope

@filipedematos: you get this error if the "bot" you've created to get access to the channel does not have enough access permissions to read what you're trying to export.
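
For reference, `missing_scope` is the Slack Web API error for a token that lacks an OAuth scope the called method needs. Below is a minimal sketch of working out which read scopes a `conversations.list` call would need for a given `types` argument. The helper names `required_scopes` and `missing_scopes` are hypothetical (they are not in `slack_export.py` or slacker), and the type-to-scope mapping is my understanding of Slack's scope naming, so please check it against the current API docs:

```python
# Hypothetical helpers, not part of slack_export.py or slacker.
# Assumed mapping from conversations.list "types" values to the
# OAuth scope that grants read access to that conversation type.
SCOPE_FOR_TYPE = {
    "public_channel": "channels:read",
    "private_channel": "groups:read",
    "mpim": "mpim:read",
    "im": "im:read",
}


def required_scopes(types):
    """Return the set of scopes needed to list the given conversation types."""
    return {SCOPE_FOR_TYPE[t] for t in types}


def missing_scopes(granted, types):
    """Return the needed scopes that are absent from the granted ones, sorted."""
    return sorted(required_scopes(types) - set(granted))


if __name__ == "__main__":
    # The failing call in the traceback asked for private channels and mpims.
    print(missing_scopes(["channels:read", "users:read"], ("private_channel", "mpim")))
```

If the list it prints is non-empty, those scopes would need to be added to the app and the token reissued. Exporting message history presumably needs the matching `*:history` scopes as well.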

@geverl

geverl commented Nov 29, 2021

I'm trying to export my channels but the script keeps trying to fetch the first channel forever, irrespective of which channel (public or private) I choose.
Output:
"Fetching history for Public Channel: b'what_is_slack'
..............................................................................................................................................................................

..............................................................................................................................................................................

..............................................................................................................................................................................

..............................................................................................................................................................................

..............................................................................................................................................................................

..............................................................................................................................................................................

..............................................................................................................................................................................

..............................................................................................................................................................................

.............................................................................................................................................................................. "

@aytey
Author

aytey commented Nov 29, 2021

Why do you think it is not still running? This export is extremely slow because Slack heavily rate-limits connections.

@geverl

geverl commented Nov 29, 2021

Because when using https://github.com/lumbric/slack-export/tree/working-draft the job is done in less than 5 minutes (yes, without threads), whereas this script has now been running for 2 hours and is apparently still working on the first (public) channel, which contains barely any messages. And it has not written any data during those 2 hours except at the beginning, when the channels.json, dms.json, groups.json, mpims.json and users.json files were created. The what_is_slack directory is still empty.

@aytey
Author

aytey commented Nov 30, 2021

Are you getting "fresh dots"? I no longer have access to a Slack instance to heavily test this with (which is why I wrote it).

> channels.json, dms.json, groups.json, mpims.json and users.json files were created.

These don't get updated when dumping threads; you get separate json files for each date (and in each channel). I can't remember if they're written as dumping happens or at the end.
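
To illustrate the per-date layout, here is a rough sketch of splitting a channel's messages into one bucket per day. `group_by_date` is a hypothetical name, not the function the script uses, and bucketing by UTC is an assumption on my part (the script may well use local time):

```python
from collections import defaultdict
from datetime import datetime, timezone


def group_by_date(messages):
    """Group Slack messages by calendar day, keyed as YYYY-MM-DD.

    Assumption: days are computed in UTC.
    """
    by_date = defaultdict(list)
    for message in messages:
        # Slack's "ts" is a string of epoch seconds, e.g. "1638230400.000200".
        day = datetime.fromtimestamp(float(message["ts"]), tz=timezone.utc)
        by_date[day.strftime("%Y-%m-%d")].append(message)
    return dict(by_date)
```

Each key would then become a file name such as `2021-11-30.json` inside the channel's directory.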

> whereas this script has now been running for 2 hours and is apparently still working on the first (public) channel

I left this running over a whole weekend to dump a "small" channel. The rate limits on Slack are something like 10 messages (or maybe 100) every 1 second -- it really takes a long time to dump this way.
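
To show why this adds up, here is a minimal sketch of a cursor-paginated fetch with a fixed pause between requests. It is not the code in this PR: `fetch_all` and the `fetch_page` callable are hypothetical, and the page shape (`messages`, `has_more`, `response_metadata.next_cursor`) is assumed to follow Slack's usual paginated responses:

```python
import time


def fetch_all(fetch_page, pause_seconds=1.0, sleep=time.sleep):
    """Collect every message from a cursor-paginated endpoint.

    fetch_page(cursor) is a hypothetical callable returning a dict shaped
    like a Slack response: {"messages": [...], "has_more": bool,
    "response_metadata": {"next_cursor": str}}.
    """
    messages = []
    cursor = None
    while True:
        page = fetch_page(cursor)
        messages.extend(page.get("messages", []))
        next_cursor = page.get("response_metadata", {}).get("next_cursor", "")
        # Stop when no further pages are reported, or when no new cursor is
        # given (which would otherwise loop on the same page forever).
        if not page.get("has_more") or not next_cursor or next_cursor == cursor:
            return messages
        cursor = next_cursor
        sleep(pause_seconds)  # fixed pause between requests
```

With one request per second and one replies request per thread, a channel with 10,000 threads would need at least 10,000 seconds (roughly 2.8 hours) for the replies alone.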

If you care, before this line:

add a print of response["messages"] to see what's being dumped.
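
If the raw print is too noisy, a one-line summary per response may be easier to read. This is only a sketch: `summarize_messages` is a hypothetical helper, and it assumes `response` behaves like a dict with a `messages` list whose items carry Slack's string `ts` field:

```python
def summarize_messages(response):
    """Return a one-line summary of the messages in a response-like dict."""
    messages = response.get("messages", [])
    if not messages:
        return "0 messages"
    # Compare timestamps numerically but report the original strings.
    oldest = min(messages, key=lambda m: float(m["ts"]))["ts"]
    newest = max(messages, key=lambda m: float(m["ts"]))["ts"]
    threaded = sum(1 for m in messages if "thread_ts" in m)
    return "{} messages, ts {} to {}, {} in threads".format(
        len(messages), oldest, newest, threaded
    )
```

Calling `print(summarize_messages(response))` at that spot would show whether each request returns new timestamps or keeps returning the same ones.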

@geverl

geverl commented Nov 30, 2021

After running for about 20 hours, the script breaks:
Traceback (most recent call last):
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    httplib_response = conn.getresponse()
  File "/home/asterix/anaconda3/lib/python3.8/http/client.py", line 1332, in getresponse
    response.begin()
  File "/home/asterix/anaconda3/lib/python3.8/http/client.py", line 303, in begin
    version, status, reason = self._read_status()
  File "/home/asterix/anaconda3/lib/python3.8/http/client.py", line 264, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/asterix/anaconda3/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/home/asterix/anaconda3/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/home/asterix/anaconda3/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 724, in urlopen
    retries = retries.increment(
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/urllib3/util/retry.py", line 403, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 428, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/urllib3/connectionpool.py", line 335, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='slack.com', port=443): Read timed out. (read timeout=10)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "slack_export.py", line 479, in <module>
    fetchPublicChannels(selectedChannels)
  File "slack_export.py", line 235, in fetchPublicChannels
    messages = getHistory(slack.conversations, channel['id'])
  File "slack_export.py", line 126, in getHistory
    messages.extend(getReplies(channelId, message["thread_ts"], pageSize))
  File "slack_export.py", line 23, in getReplies
    response = conversationObject.replies(
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/slacker/__init__.py", line 246, in replies
    return self.get(
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/slacker/__init__.py", line 120, in get
    return self._request(
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/slacker/__init__.py", line 97, in _request
    response = request_method(
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/home/asterix/anaconda3/lib/python3.8/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='slack.com', port=443): Read timed out. (read timeout=10)

@aytey
Author

aytey commented Nov 30, 2021

¯\_(ツ)_/¯

you're welcome to dive into the code
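
A possible starting point: the traceback above ends in a single read timeout (`read timeout=10`) that nothing catches, so one slow response ends a 20-hour run. Below is a minimal sketch of a retry wrapper with exponential backoff. `call_with_retries` is a hypothetical helper, not something in this PR or in slacker, and the attempt count and delays are arbitrary assumptions:

```python
import time


def call_with_retries(func, retry_on=(TimeoutError,), attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on the given exceptions.

    retry_on is a tuple of exception classes to treat as transient.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x, ... base_delay before trying again.
            sleep(base_delay * 2 ** attempt)
```

In the script, the `conversationObject.replies(...)` call inside `getReplies` could be wrapped in a lambda and passed in with `retry_on=(requests.exceptions.ReadTimeout,)`. Writing each channel's data out as it is fetched would also limit how much is lost if a run still dies.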
