Cache dump to disk #41

Open · Ortovoxx wants to merge 122 commits into master
Conversation

@Ortovoxx (Contributor)

Currently creates a .sessions folder containing just sessions.json and guild.json, to be restored at a later date.

Fails when constructing roles.

Added a cache options array to the client options, but we might want to take that out and just have it auto-dump to disk with the hot reloading option.

@timotejroiko (Owner)

I haven't forgotten about this, I'm working on some other projects. I'll revisit it soon.

@Ortovoxx (Contributor, Author)

Need to add the _unpatch method to the bot user and take a look at the client options etc. to do some cleaning up.

@timotejroiko (Owner) commented Mar 21, 2021

Nice job with the unpatch.

Here are my thoughts about it:
There is enough cache control as is; there is no need for further cache control when rehydrating. Let's just dump everything that is currently cached and restore everything. It's the only way to make sure no weird things happen, and it's also the closest thing to a true session reload, with zero data loss.

With that said, the unpatch method should be moved to the client, and should loop over users, channels and guilds.
From each user, get the raw data according to the API, for each channel do the same, and for each guild do the same except for guild channels.
Guild channels should be rehydrated from the channel cache as a reference, since they are already linked by reference anyway.
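
A minimal sketch of what that client-level loop could look like, assuming each cached structure gets a private _unpatch() returning its raw API-shaped data (the names follow this discussion and are not an existing API):

function unpatchClient(client) {
    const users = client.users.cache.map(user => user._unpatch());
    const channels = client.channels.cache.map(channel => channel._unpatch());
    // guilds dump everything except their channels, which are later restored
    // by reference from the channel cache when rehydrating
    const guilds = client.guilds.cache.map(guild => guild._unpatch());
    return { users, channels, guilds };
}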

As for the exit event, I don't feel there is a need for that. If the user wants custom cache dumping behavior, they should add their own event listeners to exit, SIGINT, etc... and use the newly exposed client.unpatch method. We only need a single function to dump all, and a single client option to specify the source to rehydrate all (can be a file path or an object).
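
For example, a user-side exit listener along those lines (a sketch assuming a logged-in client is in scope and using the proposed client.unpatch name; the file path is only illustrative):

const fs = require("fs");

process.on("SIGINT", () => {
    const cache = client.unpatch(); // dump everything currently cached
    fs.writeFileSync("./.sessions/cache.json", JSON.stringify(cache));
    process.exit(0);
});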

All related client options can then be moved into a single object, something like this:

Client({
    hotreload: {
        enabled: true,
        sessionData: { // defaults to ".sessions/sessions.json", can also be passed an object like this
            0: { id: "wef23bf", sequence: 4234 },
            1: { id: "wef23bf", sequence: 4234 }
        },
        unpatchOnExit: true, // if false, users can use client.unpatch in their own exit listeners
        // onUnpatch: cache => {} // alternatively replace the above with something like this
        patchSource: "./cache.json" // defaults to ".sessions/cache.json", can also be passed a cache object for people who use redis etc...
    }
})

We have two possibilities here: we could keep sessions and cache separate, each with its own dump method and restore option, or simplify it even further and put everything together.

Another issue is getting all of this to be compatible with the sharding manager; for example, processes cannot write concurrently to the same file. But this can be dealt with later on.

Generally, the direction I want to follow is to try and make this as simple as possible, nothing too complex that would change discord.js too much. Any super-advanced feature would be better suited to a future standalone library.

@Ortovoxx (Contributor, Author)

Simplifying it down is a good idea, and it's better to use the client options already presented when dumping; as you say, it stops weird things from happening and keeps it simple.

Although I think there should still be a private unpatch method for channels, guilds and members which returns that object's raw Discord API data, and then in the client there should be a public unpatch method which loops over all of the objects we want to cache and calls their private unpatch method.
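
Something along these lines, as a rough sketch only, using the Structures.extend mechanism discord.js already exposes; the field list is trimmed and illustrative, not the full API shape:

const Discord = require("discord.js");

Discord.Structures.extend("Guild", G => class extends G {
    _unpatch() {
        return {
            id: this.id,
            name: this.name,
            icon: this.icon,
            owner_id: this.ownerID,
            member_count: this.memberCount
            // ...roles, emojis, members, etc. would follow; channels are
            // excluded and rehydrated from the channel cache instead
        };
    }
});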

Rehydrating like that sounds like a good solution though.

Yes, I agree about the exit events; it's better to leave the customisation up to the user. I see you added a client option for turning our native disk caching off, which is good. Using client.unpatch to get all the cache is a better solution as it allows for more extension by users.

In terms of how the cache should be stored, it's tricky. I think sessions should be stored separately from the cache, as that allows for potentially swapping caches and sessions across machines etc. The simplicity of this won't matter as much, since the user will have limited exposure to the .sessions folder.

Potentially we create guilds, channels and users subfolders (or file prefixes) and store their data in files mapped by their IDs, i.e. .sessions/guilds/581072557512458241.json with all the data about that guild. I don't know how efficient writing all of these files to disk like that would be, but it is an alternative to having one massive JSON file. It might work better if people want to restore cache across shards when resharding, and it might also solve some problems with the sharding manager later down the line, since shards won't have to write to the same file.
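
A sketch of that one-file-per-ID layout (the folder names follow the proposal above, the helper itself is hypothetical):

const fs = require("fs");
const path = require("path");

function dumpToFolder(folder, entries) {
    fs.mkdirSync(folder, { recursive: true });
    for (const [id, data] of entries) { // entries: iterable of [id, rawData] pairs
        // e.g. .sessions/guilds/581072557512458241.json
        fs.writeFileSync(path.join(folder, `${id}.json`), JSON.stringify(data));
    }
}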

I agree that we should wait until we have internal sharding working and then look at the sharding manager and make changes to work with that.

@timotejroiko (Owner) commented Mar 21, 2021

Yes, a private unpatch for channel, user and guild is good.

We can try going with one file per ID and see how it works out. Performance will likely be atrocious for very large bots, but at that point the user should switch to a better backend anyway, so it should be fine.

Let's try something like this for the client options:

Client({
    hotreload: {
        enabled: true,
        sessionData: { // if not provided, load sessions from .sessions/sessions/[id].json
            0: { id: "wef23bf", sequence: 4234 },
            1: { id: "wef23bf", sequence: 4234 }
        },
        cacheData: { // if not provided, load data from .sessions/guilds/[id].json etc...
            guilds: { ... },
            channels: { ... },
            users: { ... }
        },
        onUnload: sessions => { // if not provided, store sessions in json files
            // else do something with sessions
        },
        onUnpatch: data => { // if not provided, store data in json files
            // else do something with data
        }
    }
})

It kind of goes against what I said before about the exit event, but maybe it will turn out to be simpler like this after all.

We don't need to support file paths, since if the user wants a custom file they might as well load it themselves and just give us the object.

@Ortovoxx (Contributor, Author) commented Mar 21, 2021

Okay, that all looks good! Although I will remove the enabled property, as users can disable it by setting hotreload to false and enable it by setting it to an object with all of their settings.

Maybe it's better to combine the dumping of sessions and data into one function?

I'll start implementing some features over the next week or so.

@timotejroiko (Owner) commented Mar 21, 2021

Could be, but then there would be no way of turning them off individually; a single function would shut down the JSON files option for both. That might not be a big deal tbh; maybe it will be good to keep it all-or-nothing like everything else in the lib.

@Ortovoxx (Contributor, Author)

True, thinking about it, there doesn't seem to be a use case where someone would want to use one but not the other. We'll stick with one for now and can split them into two separate functions if the need arises later.

@Ortovoxx (Contributor, Author)

Just added some helper methods and general scaffolding. Now I need to add the storing of cache on exit and the patching of cache back in on startup. I will probably work on it next weekend.

this._patchCache() needs to take in { guilds, channels, users } and rehydrate the caches. The cache supplied is either the user-supplied cache or the cache from disk if the user supplies none. It shouldn't error if no cache is supplied.

this.dumpCache() should take a snapshot of all the cache and return it in Discord API form (but not do anything with it, hence why it's public, so people can add their own cache storage solution). We need to go through Channel, Guild and User to create a _unpatch() method for each, which will return their data in Discord API form, then just loop through everything in cache and call that method on each.

this.onUnload() should call dumpCache and store all of that to disk. Users can override this function with their own cache storage methods.
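
A hedged sketch of how those three could fit together from the user's side; only the method names come from this discussion, and myCustomStore is a hypothetical stand-in for whatever backend the user runs:

client.onUnload = () => {
    const snapshot = client.dumpCache(); // { guilds, channels, users } in API form
    // replace the default disk write with any custom storage backend
    return myCustomStore.save("djs-cache", JSON.stringify(snapshot));
};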

@Ortovoxx (Contributor, Author) commented Mar 22, 2021

Going to need your help as to where to place the _unpatch() methods on the user and channels, as well as writing the rehydration part, as I don't think I have enough knowledge of how all the discord.js types and classes interact.

Just the _unpatch() methods and cache rehydration to go.

@Ortovoxx (Contributor, Author) left a comment

We're storing each object as a JSON file with the object's ID as its file name. We could easily rehydrate guilds to shards with the sharding formula. Channels and users could either be prefixed with the shard id (0_267292139208048641, 2_804030576759144468, etc.) or potentially have another file which "points" each of the shard IDs to the different caches.

I.e. a JSON file like this (or one JSON file per shard to prevent two shards writing to the same file, but the premise is the same):

{
    "0": {
        "guilds": [267292139208048641,267292139208048641],
        "channels": [267292139208048641,267292139208048641],
        "users": [267292139208048641,267292139208048641]
    },
    "1": {
        "guilds": [267292139208048641,267292139208048641],
        "channels": [267292139208048641,267292139208048641],
        "users": [267292139208048641,267292139208048641]
    }
}
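
For reference, the standard Discord sharding formula mentioned above, which would decide which shard a stored guild file belongs to (BigInt because snowflake IDs exceed Number.MAX_SAFE_INTEGER):

function shardIdForGuild(guildId, shardCount) {
    return Number(BigInt(guildId) >> 22n) % shardCount;
}

// shardIdForGuild("581072557512458241", 2); // maps that guild file to shard 0 or 1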

@timotejroiko (Owner)

We can use the sharding formula for guilds, yes, good idea.
Channels can be obtained from their guildID, falling back to shard 0 if they don't have one.
Users should be tied to members; as members are loaded, so will their users. In fact, maybe we don't need a user cache at all, I'll look into it.

@timotejroiko (Owner)

The cache storing should be fine like this for now; we don't actually need DM users or DM channels.
GuildMembers already include the user data, and we can include the channel data from guilds as well.

Now we need to move the loading of the cache to the WebSocket manager, so the loading only happens after we have a list of shard IDs. From there we either pick matching data from the user-supplied cache object, or we load the matching data from the JSON files.
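
A rough sketch of that per-shard selection (paths and the helper name are assumptions, not the actual implementation):

const fs = require("fs");

function guildsForShard(shardId, shardCount, suppliedGuilds) {
    let all;
    if (suppliedGuilds) {
        // prefer the user-supplied cache object
        all = Object.values(suppliedGuilds);
    } else if (fs.existsSync("./.sessions/guilds")) {
        // otherwise read the per-guild JSON files from disk
        all = fs.readdirSync("./.sessions/guilds").map(file => JSON.parse(fs.readFileSync(`./.sessions/guilds/${file}`, "utf8")));
    } else {
        all = [];
    }
    return all.filter(guild => Number(BigInt(guild.id) >> 22n) % shardCount === shardId);
}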

@timotejroiko (Owner)

Should be mostly ready for testing, let me know if there's anything I missed.

@Ortovoxx (Contributor, Author) commented Apr 2, 2021

There is a slight problem. Because we only fetch the WebSocket shard session data from disk inside identify(), the WebSocket manager is currently queueing up shards like it normally would and is not fast-connecting. The only way to fix this would be to know how many shards the manager is expected to handle at startup and whether we can correctly resume them, meaning we would have to read all the data from disk at startup.

I think we definitely want to keep a way to fast-connect, as with lots of shards you would otherwise be waiting unnecessarily.

Fixed with 76f7622

Ortovoxx added a commit to Ortovoxx/discord.js-light that referenced this pull request Apr 3, 2021
@Ortovoxx (Contributor, Author) commented Apr 3, 2021

Fixed a bunch of the minor issues and tested it. Seems to work pretty smoothly so far!

We need to consider what we do when a client increases their shard count and still provides session data. Currently, if 2 shard sessions are stored and the user then increases their shards to 3 and loads those sessions, it resumes 2 and identifies 1 more. This is obviously not the desired behaviour, as we now have 2 sessions handling all guilds and then 1 more session handling 1/3 of the guilds.

Potentially we can just force-identify all shards if the shard count is different from the number of stored shards. However, this presents a problem when a user downshards (as there will be redundant shards stored on disk) or if a user simply wants the behaviour mentioned above (a user potentially wanting to increase the sessions handled by a client).

We can solve this by adding a hotReload.forceIdentify option, but the same effect can be achieved by not providing any data (though that would mean guild cache is not loaded).

Lots of the problems that arise from this are edge cases, I understand, but the first issue is definitely not, and most users would encounter a problem here. I will leave it up to you to decide how you wish to handle it.

@@ -287,8 +287,8 @@ Discord.Structures.extend("Guild", G => {
 this.members.add(member);
 }
 }
-if(!this.members.cache.has(this.client.user.id)) {
-this.members.fetch(this.client.user.id).catch(() => {});
+if(!this.members.cache.has(this.client.user?.id)) {
@timotejroiko (Owner) commented Apr 3, 2021

This is a problem: optional chaining short-circuits to undefined, so if client.user is falsy, then it's doing .has(undefined), which will count as the user not being cached and attempt to fetch(undefined).

There shouldn't ever be a situation where a guild is loaded without the client having logged in, especially now that you moved the cache loading to the resume event.
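
For illustration only (an assumption, not the change that was actually made), a guard that avoids both .has(undefined) and fetch(undefined) by skipping the check until client.user exists:

if(this.client.user && !this.members.cache.has(this.client.user.id)) {
    this.members.fetch(this.client.user.id).catch(() => {});
}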

@timotejroiko (Owner)

> There is a slight problem. Because we only fetch the WebSocket shard session data from disk inside identify(), the WebSocket manager is currently queueing up shards like it normally would and is not fast-connecting. The only way to fix this would be to know how many shards the manager is expected to handle at startup and whether we can correctly resume them, meaning we would have to read all the data from disk at startup.
>
> I think we definitely want to keep a way to fast-connect, as with lots of shards you would otherwise be waiting unnecessarily.
>
> Fixed with 76f7622

I have a suggestion:

Move the session loading to the manager. We need the session to check whether we should resume or reidentify; we already have a shard id in createShards(), so we take that id and load the session file there if relevant.
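
A minimal sketch of that, keyed by shard id and using the file layout proposed earlier (the helper name is hypothetical):

const fs = require("fs");

function loadSessionForShard(shardId) {
    const file = `./.sessions/sessions/${shardId}.json`;
    if(!fs.existsSync(file)) { return null; } // nothing stored, identify normally
    return JSON.parse(fs.readFileSync(file, "utf8")); // e.g. { id, sequence }
}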

@timotejroiko (Owner)

> Fixed a bunch of the minor issues and tested it. Seems to work pretty smoothly so far!
>
> We need to consider what we do when a client increases their shard count and still provides session data. Currently, if 2 shard sessions are stored and the user then increases their shards to 3 and loads those sessions, it resumes 2 and identifies 1 more. This is obviously not the desired behaviour, as we now have 2 sessions handling all guilds and then 1 more session handling 1/3 of the guilds.
>
> Potentially we can just force-identify all shards if the shard count is different from the number of stored shards. However, this presents a problem when a user downshards (as there will be redundant shards stored on disk) or if a user simply wants the behaviour mentioned above (a user potentially wanting to increase the sessions handled by a client).
>
> We can solve this by adding a hotReload.forceIdentify option, but the same effect can be achieved by not providing any data (though that would mean guild cache is not loaded).
>
> Lots of the problems that arise from this are edge cases, I understand, but the first issue is definitely not, and most users would encounter a problem here. I will leave it up to you to decide how you wish to handle it.

We can store the total shards in each session object as well, since the session ID literally depends on it. If the shard file exists but the total shards in the file is different from the total shards in the client, then reidentify instead of resuming.
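
Sketched out, with totalShards as an assumed name for the extra field stored alongside each session:

function shouldResume(storedSession, currentTotalShards) {
    return Boolean(
        storedSession &&
        storedSession.id &&
        storedSession.totalShards === currentTotalShards // mismatch means reidentify
    );
}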

@Ortovoxx (Contributor, Author) commented Apr 3, 2021

These are both better ideas. Just added them along with some extra debug logs. Works like a charm

#41 (comment) Resolved by 7280818

#41 (comment) Resolved by a160704

init.js (outdated):
shard.loadCacheTimeout = this.client.setTimeout(() => {
    this.debug("Shard cache was never loaded as the session didn't resume in 15s", shard);
    shard.loadCacheTimeout = null;
    shard.removeEventListener(Constants.ShardEvents.RESUMED) // Remove the event in a better way?
@Ortovoxx (Contributor, Author)

What is the preferred way to remove the event?

@Ortovoxx (Contributor, Author)

818d0e9 fixes the errors and it seems to work. Double-check to make sure it is the correct way to do things.

@timotejroiko (Owner)

If we join the session loading and cache loading, then we don't need to remove the event; we add it only if there is a session load.

Ortovoxx and others added 29 commits April 3, 2021 16:13
Promises are supported inside of uncaughtException, SIGINT and SIGTERM but not exit. Most restarts call SIGINT or SIGTERM before they exit.

I don't know why you removed the check for an infinite loop of uncaught exceptions? If there is an uncaught exception inside of the user-defined function or our function, it will loop forever. I added a try/catch block around the user function so it won't happen for user functions, but if you don't want a check for it we need to be 100% sure our code won't cause a loop.
Fixed some copy-paste fails assigning an object to a value with : as well as a spelling error, and created folders for each of the caches.
Removed the obj variable, as JSON.stringify() can be called directly inside of the write file call.
…identifies

Sessions time out after about 1-2 minutes, so we should check when the session was last connected; if it is more than a minute old (and therefore probably stale) we shouldn't try to reconnect, as the identify will probably fail.
Spent ages trying to figure out why shard 0 wasn't being treated as truthy... Of course JS treats the number 0 as falsy xD
Different data types were being assigned to the same variables; changed the loadCache method to reflect the same data that the user will supply, as it makes it easier to work with. Did a hard compare for the filter to stop any falsy value returns (like shard id 0 being wrongly filtered because the number 0 is falsy; see the sketch after these commit notes).
If the session resume checks fail, the client will try to identify and not resume. If shards are identifying they will receive new data, so we don't need to restore old data.
I kept getting 4003 Not authenticated errors from the gateway, meaning we were sending packets before identifying. There must be some WS connection going on in our load cache method, so now we wait until after the session has resumed before loading cache.
First loops through all shards and requeues them if they need to identify
Checks if the session is required to reidentify.
Added a few extra debug logs
Moved cache patching to WebSocketManager and added timeout of 15s to remove the loadCache event. Now only attaches the event if a session ID is present to stop the event being attached on an identify connection attempt

I don't know the best way to remove the event listener in the timeout so let me know how you would do it.
Swapped Channel around and added optional chaining to nullable properties
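
To illustrate the shard id 0 pitfall from the commit notes above: a plain truthiness filter silently drops shard 0, while an explicit comparison keeps it (the sample data is made up).

const sessions = [{ shard: 0 }, { shard: 1 }, { shard: null }];

sessions.filter(s => s.shard);                     // [{ shard: 1 }] -> loses shard 0
sessions.filter(s => typeof s.shard === "number"); // keeps shard 0 and 1, drops null
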
@timotejroiko (Owner)

This essentially needs to be started from scratch now, if there is still interest. However, the djs maintainers have also stated a certain interest in making something like this a thing in djs v14 or djs-next, so I'm not sure if it's worth working on this.
