Cache dump to disk #41

Open · Ortovoxx wants to merge 122 commits into master
Conversation

@Ortovoxx (Contributor)

Currently creates a .sessions folder containing just sessions.json and guild.json, to be restored at a later date.

Fails when constructing roles.

Added a cache options array to the client options, but we might want to take that out and just have it auto-dump to disk with the hot reloading option.

@timotejroiko (Owner)

I haven't forgotten about this, I'm working on some other projects. I'll revisit it soon.

@Ortovoxx (Contributor, Author)

Need to add the _unpatch method to the bot user and take a look at the client options etc. to do some cleaning up.

@timotejroiko (Owner) commented Mar 21, 2021

Nice job with the unpatch.

Here are my thoughts about it:
There is enough cache control as is; there is no need for further cache control when rehydrating. Let's just dump everything that is currently cached and restore everything. It's the only way to make sure no weird things happen, and it's also the closest thing to a true session reload, with zero data loss.

With that said, the unpatch method should be moved to the client, and should loop over users, channels and guilds.
From each user, get the raw data according to the API, for each channel do the same, and for each guild do the same except for guild channels.
Guild channels should be rehydrated from the channel cache as a reference, since they are already linked by reference anyway.
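
A minimal sketch of what that client-level loop could look like, assuming each cached structure gets a private _unpatch() returning its raw API-shaped data (the names follow this discussion and are not an existing API):

function unpatchClient(client) {
    const users = client.users.cache.map(user => user._unpatch());
    const channels = client.channels.cache.map(channel => channel._unpatch());
    // guilds dump everything except their channels, which are later restored
    // by reference from the channel cache when rehydrating
    const guilds = client.guilds.cache.map(guild => guild._unpatch());
    return { users, channels, guilds };
}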

As for the exit event, I don't feel there is a need for that. If the user wants custom cache dumping behavior, they should add their own event listeners to exit, SIGINT, etc... and use the newly exposed client.unpatch method. We only need a single function to dump all, and a single client option to specify the source to rehydrate all (can be a file path or an object).
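
For example, a user-side exit listener along those lines (a sketch assuming a logged-in client is in scope and using the proposed client.unpatch name; the file path is only illustrative):

const fs = require("fs");

process.on("SIGINT", () => {
    const cache = client.unpatch(); // dump everything currently cached
    fs.writeFileSync("./.sessions/cache.json", JSON.stringify(cache));
    process.exit(0);
});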

All related client options can then be moved into a single object, something like this:

Client({
    hotreload: {
        enabled: true,
        sessionData: { // defaults to ".sessions/sessions.json", can also be passed an object like this
            0: { id: "wef23bf", sequence: 4234 },
            1: { id: "wef23bf", sequence: 4234 }
        },
        unpatchOnExit: true, // if false, users can use client.unpatch in their own exit listeners
        // onUnpatch: cache => {} // alternatively replace the above with something like this
        patchSource: "./cache.json" // defaults to ".sessions/cache.json", can also be passed a cache object for people who use redis etc...
    }
})

We have two possibilities here: we could keep sessions and cache separate, each with its own dump method and restore option, or simplify it even further and put everything together.

Another issue is getting all of this to be compatible with the sharding manager; for example, processes cannot write concurrently to the same file. But this can be dealt with later on.

Generally, the direction I want to follow is to try and make this as simple as possible, nothing too complex that would change discord.js too much. Any super-advanced feature would be better suited to a future standalone library.

@Ortovoxx (Contributor, Author)

Simplifying it down is a good idea, and it's better to use the client options already presented when dumping; as you say, it stops weird things from happening and keeps it simple.

Although I think there should still be a private unpatch method for channels, guilds and members which returns that object's raw Discord API data, and then in the client there should be a public unpatch method which loops over all of the objects we want to cache and calls their private unpatch method.
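
Something along these lines, as a rough sketch only, using the Structures.extend mechanism discord.js already exposes; the field list is trimmed and illustrative, not the full API shape:

const Discord = require("discord.js");

Discord.Structures.extend("Guild", G => class extends G {
    _unpatch() {
        return {
            id: this.id,
            name: this.name,
            icon: this.icon,
            owner_id: this.ownerID,
            member_count: this.memberCount
            // ...roles, emojis, members, etc. would follow; channels are
            // excluded and rehydrated from the channel cache instead
        };
    }
});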

Rehydrating like that sounds like a good solution though.

Yes, I agree about the exit events; it's better to leave the customisation up to the user. I see you added a client option for turning our native disk caching off, which is good. Using client.unpatch to get all the cache is a better solution as it allows for more extension by users.

In terms of how the cache should be stored, it's tricky. I think sessions should be stored separately from the cache, as that allows for potentially swapping caches and sessions across machines etc. The simplicity of this won't matter as much, since the user will have limited exposure to the .sessions folder.

Potentially we create guilds, channels and users subfolders (or file prefixes) and store their data in files mapped by their IDs, i.e. .sessions/guilds/581072557512458241.json with all the data about that guild. I don't know how efficient writing all of these files to disk like that would be, but it is an alternative to having one massive JSON file. It might work better if people want to restore cache across shards when resharding, and it might also solve some problems with the sharding manager later down the line, since shards won't have to write to the same file.
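
A sketch of that one-file-per-ID layout (the folder names follow the proposal above, the helper itself is hypothetical):

const fs = require("fs");
const path = require("path");

function dumpToFolder(folder, entries) {
    fs.mkdirSync(folder, { recursive: true });
    for (const [id, data] of entries) { // entries: iterable of [id, rawData] pairs
        // e.g. .sessions/guilds/581072557512458241.json
        fs.writeFileSync(path.join(folder, `${id}.json`), JSON.stringify(data));
    }
}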

I agree that we should wait until we have internal sharding working and then look at the sharding manager and make changes to work with that.

@timotejroiko (Owner) commented Mar 21, 2021

Yes, a private unpatch for channel, user and guild is good.

We can try going with one file per ID and see how it works out. Performance will likely be atrocious for very large bots, but at that point the user should switch to a better backend anyway, so it should be fine.

Let's try something like this for the client options:

Client({
    hotreload: {
        enabled: true,
        sessionData: { // if not provided, load sessions from .sessions/sessions/[id].json
            0: { id: "wef23bf", sequence: 4234 },
            1: { id: "wef23bf", sequence: 4234 }
        },
        cacheData: { // if not provided, load data from .sessions/guilds/[id].json etc...
            guilds: { ... },
            channels: { ... },
            users: { ... }
        },
        onUnload: sessions => { // if not provided, store sessions in json files
            // else do something with sessions
        },
        onUnpatch: data => { // if not provided, store data in json files
            // else do something with data
        }
    }
})

It kind of goes against what I said before about the exit event, but maybe it will turn out to be simpler like this after all.

We don't need to support file paths, since if the user wants a custom file they might as well load it themselves and just give us the object.

@Ortovoxx (Contributor, Author) commented Mar 21, 2021

Okay, that all looks good! Although I will remove the enabled property, as users can disable it by setting hotreload to false and enable it by setting it to an object with all of their settings.

Maybe it's better to combine the dumping of sessions and data into one function?

I'll start implementing some features over the next week or so.

@timotejroiko (Owner) commented Mar 21, 2021

Could be, but then there would be no way of turning them off individually; a single function would shut down the JSON files option for both. That might not be a big deal tbh; maybe it will be good to keep it all-or-nothing like everything else in the lib.

@Ortovoxx (Contributor, Author)

True, thinking about it, there doesn't seem to be a use case where someone would want to use one but not the other. We'll stick with one for now and can split them into two separate functions if the need arises later.

@Ortovoxx (Contributor, Author)

Just added some helper methods and general scaffolding. Now I need to add the storing of cache on exit and the patching of cache back in on startup. I will probably work on it next weekend.

this._patchCache() needs to take in { guilds, channels, users } and rehydrate the caches. The cache supplied is either the user-supplied cache or the cache from disk if the user supplies none. It shouldn't error if no cache is supplied.

this.dumpCache() should take a snapshot of all the cache and return it in Discord API form (but not do anything with it, hence why it's public, so people can add their own cache storage solution). We need to go through Channel, Guild and User to create a _unpatch() method for each, which will return their data in Discord API form, then just loop through everything in cache and call that method on each.

this.onUnload() should call dumpCache and store all of that to disk. Users can override this function with their own cache storage methods.
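
A hedged sketch of how those three could fit together from the user's side; only the method names come from this discussion, and myCustomStore is a hypothetical stand-in for whatever backend the user runs:

client.onUnload = () => {
    const snapshot = client.dumpCache(); // { guilds, channels, users } in API form
    // replace the default disk write with any custom storage backend
    return myCustomStore.save("djs-cache", JSON.stringify(snapshot));
};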

@Ortovoxx (Contributor, Author) commented Mar 22, 2021

Going to need your help as to where to place the _unpatch() methods on the user and channels, as well as writing the rehydration part, as I don't think I have enough knowledge of how all the discord.js types and classes interact.

Just the _unpatch() methods and cache rehydration to go.

@Ortovoxx (Contributor, Author) left a comment

We're storing each object as a JSON file with the object's ID as its file name. We could easily rehydrate guilds to shards with the sharding formula. Channels and users could either be prefixed with the shard id (0_267292139208048641, 2_804030576759144468, etc.) or potentially have another file which "points" each of the shard IDs to the different caches.

I.e. a JSON file like this (or one JSON file per shard to prevent two shards writing to the same file, but the premise is the same):

{
    "0": {
        "guilds": [267292139208048641,267292139208048641],
        "channels": [267292139208048641,267292139208048641],
        "users": [267292139208048641,267292139208048641]
    },
    "1": {
        "guilds": [267292139208048641,267292139208048641],
        "channels": [267292139208048641,267292139208048641],
        "users": [267292139208048641,267292139208048641]
    }
}
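
For reference, the standard Discord sharding formula mentioned above, which would decide which shard a stored guild file belongs to (BigInt because snowflake IDs exceed Number.MAX_SAFE_INTEGER):

function shardIdForGuild(guildId, shardCount) {
    return Number(BigInt(guildId) >> 22n) % shardCount;
}

// shardIdForGuild("581072557512458241", 2); // maps that guild file to shard 0 or 1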

@timotejroiko (Owner)

We can use the sharding formula for guilds, yes, good idea.
Channels can be obtained from their guildID, falling back to shard 0 if they don't have one.
Users should be tied to members; as members are loaded, so will their users. In fact, maybe we don't need a user cache at all, I'll look into it.

@timotejroiko (Owner)

The cache storing should be fine like this for now; we don't actually need DM users or DM channels.
GuildMembers already include the user data, and we can include the channel data from guilds as well.

Now we need to move the loading of the cache to the WebSocket manager, so the loading only happens after we have a list of shard IDs. From there we either pick matching data from the user-supplied cache object, or we load the matching data from the JSON files.
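
A rough sketch of that per-shard selection (paths and the helper name are assumptions, not the actual implementation):

const fs = require("fs");

function guildsForShard(shardId, shardCount, suppliedGuilds) {
    let all;
    if (suppliedGuilds) {
        // prefer the user-supplied cache object
        all = Object.values(suppliedGuilds);
    } else if (fs.existsSync("./.sessions/guilds")) {
        // otherwise read the per-guild JSON files from disk
        all = fs.readdirSync("./.sessions/guilds").map(file => JSON.parse(fs.readFileSync(`./.sessions/guilds/${file}`, "utf8")));
    } else {
        all = [];
    }
    return all.filter(guild => Number(BigInt(guild.id) >> 22n) % shardCount === shardId);
}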

@timotejroiko (Owner)

Should be mostly ready for testing, let me know if there's anything I missed.

@Ortovoxx (Contributor, Author) commented Apr 2, 2021

There is a slight problem. Because we only fetch the WebSocket shard session data from disk inside identify(), the WebSocket manager is currently queueing up shards like it normally would and is not fast-connecting. The only way to fix this would be to know how many shards the manager is expected to handle at startup and whether we can correctly resume them, meaning we would have to read all the data from disk at startup.

I think we definitely want to keep a way to fast-connect, as with lots of shards you would otherwise be waiting unnecessarily.

Fixed with 76f7622

Ortovoxx added a commit to Ortovoxx/discord.js-light that referenced this pull request Apr 3, 2021
@Ortovoxx (Contributor, Author) commented Apr 3, 2021

Fixed a bunch of the minor issues and tested it. Seems to work pretty smoothly so far!

We need to consider what we do when a client increases their shard count and still provides session data. Currently, if 2 shard sessions are stored and the user then increases their shards to 3 and loads those sessions, it resumes 2 and identifies 1 more. This is obviously not the desired behaviour, as we now have 2 sessions handling all guilds and then 1 more session handling 1/3 of the guilds.

Potentially we can just force-identify all shards if the shard count is different from the number of stored shards. However, this presents a problem when a user downshards (as there will be redundant shards stored on disk) or if a user simply wants the behaviour mentioned above (a user potentially wanting to increase the sessions handled by a client).

We can solve this by adding a hotReload.forceIdentify option, but the same effect can be achieved by not providing any data (though that would mean guild cache is not loaded).

Lots of the problems that arise from this are edge cases, I understand, but the first issue is definitely not, and most users would encounter a problem here. I will leave it up to you to decide how you wish to handle it.

@@ -287,8 +287,8 @@ Discord.Structures.extend("Guild", G => {
 this.members.add(member);
 }
 }
-if(!this.members.cache.has(this.client.user.id)) {
-this.members.fetch(this.client.user.id).catch(() => {});
+if(!this.members.cache.has(this.client.user?.id)) {
@timotejroiko (Owner) commented Apr 3, 2021

This is a problem: optional chaining short-circuits to undefined, so if client.user is falsy, then it's doing .has(undefined), which will count as the user not being cached and attempt to fetch(undefined).

There shouldn't ever be a situation where a guild is loaded without the client having logged in, especially now that you moved the cache loading to the resume event.
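
For illustration only (an assumption, not the change that was actually made), a guard that avoids both .has(undefined) and fetch(undefined) by skipping the check until client.user exists:

if(this.client.user && !this.members.cache.has(this.client.user.id)) {
    this.members.fetch(this.client.user.id).catch(() => {});
}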

@timotejroiko (Owner)

> There is a slight problem. Because we only fetch the WebSocket shard session data from disk inside identify(), the WebSocket manager is currently queueing up shards like it normally would and is not fast-connecting. The only way to fix this would be to know how many shards the manager is expected to handle at startup and whether we can correctly resume them, meaning we would have to read all the data from disk at startup.
>
> I think we definitely want to keep a way to fast-connect, as with lots of shards you would otherwise be waiting unnecessarily.
>
> Fixed with 76f7622

I have a suggestion:

Move the session loading to the manager. We need the session to check whether we should resume or reidentify; we already have a shard id in createShards(), so we take that id and load the session file there if relevant.
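
A minimal sketch of that, keyed by shard id and using the file layout proposed earlier (the helper name is hypothetical):

const fs = require("fs");

function loadSessionForShard(shardId) {
    const file = `./.sessions/sessions/${shardId}.json`;
    if(!fs.existsSync(file)) { return null; } // nothing stored, identify normally
    return JSON.parse(fs.readFileSync(file, "utf8")); // e.g. { id, sequence }
}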

@timotejroiko (Owner)

> Fixed a bunch of the minor issues and tested it. Seems to work pretty smoothly so far!
>
> We need to consider what we do when a client increases their shard count and still provides session data. Currently, if 2 shard sessions are stored and the user then increases their shards to 3 and loads those sessions, it resumes 2 and identifies 1 more. This is obviously not the desired behaviour, as we now have 2 sessions handling all guilds and then 1 more session handling 1/3 of the guilds.
>
> Potentially we can just force-identify all shards if the shard count is different from the number of stored shards. However, this presents a problem when a user downshards (as there will be redundant shards stored on disk) or if a user simply wants the behaviour mentioned above (a user potentially wanting to increase the sessions handled by a client).
>
> We can solve this by adding a hotReload.forceIdentify option, but the same effect can be achieved by not providing any data (though that would mean guild cache is not loaded).
>
> Lots of the problems that arise from this are edge cases, I understand, but the first issue is definitely not, and most users would encounter a problem here. I will leave it up to you to decide how you wish to handle it.

We can store the total shards in each session object as well, since the session ID literally depends on it. If the shard file exists but the total shards in the file is different from the total shards in the client, then reidentify instead of resuming.
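
Sketched out, with totalShards as an assumed name for the extra field stored alongside each session:

function shouldResume(storedSession, currentTotalShards) {
    return Boolean(
        storedSession &&
        storedSession.id &&
        storedSession.totalShards === currentTotalShards // mismatch means reidentify
    );
}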

@Ortovoxx (Contributor, Author) commented Apr 3, 2021

These are both better ideas. Just added them along with some extra debug logs. Works like a charm

#41 (comment) Resolved by 7280818

#41 (comment) Resolved by a160704

init.js (outdated):
shard.loadCacheTimeout = this.client.setTimeout(() => {
    this.debug("Shard cache was never loaded as the session didn't resume in 15s", shard);
    shard.loadCacheTimeout = null;
    shard.removeEventListener(Constants.ShardEvents.RESUMED) // Remove the event in a better way?
@Ortovoxx (Contributor, Author)

What is the preferred way to remove the event?

@Ortovoxx (Contributor, Author)

818d0e9 fixes the errors and it seems to work. Double-check to make sure it is the correct way to do things.

@timotejroiko (Owner)

If we join the session loading and cache loading, then we don't need to remove the event; we add it only if there is a session load.

Ortovoxx and others added 29 commits April 3, 2021 16:13
Promises are supported inside of uncaughtException, SIGINT and SIGTERM but not exit. Most restarts call SIGINT or SIGTERM before they exit.

I don't know why you removed the check for an infinite loop of uncaught exceptions? If there is an uncaught exception inside of the user-defined function or our function, it will loop forever. I added a try/catch block around the user function so it won't happen for user functions, but if you don't want a check for it we need to be 100% sure our code won't cause a loop.
Fixed some copy-paste fails assigning an object to a value with : as well as a spelling error, and created folders for each of the caches.
Removed the obj variable, as JSON.stringify() can be called directly inside of the write file call.
…identifies

Sessions time out after about 1-2 minutes, so we should check when the session was last connected; if it is more than a minute old (and therefore probably stale) we shouldn't try to reconnect, as the identify will probably fail.
Spent ages trying to figure out why shard 0 wasn't being treated as truthy... Of course JS treats the number 0 as falsy xD
Different data types were being assigned to the same variables; changed the loadCache method to reflect the same data that the user will supply, as it makes it easier to work with. Did a hard compare for the filter to stop any falsy value returns (like shard id 0 being wrongly filtered because the number 0 is falsy; see the sketch after these commit notes).
If the session resume checks fail, the client will try to identify and not resume. If shards are identifying they will receive new data, so we don't need to restore old data.
I kept getting 4003 Not authenticated errors from the gateway, meaning we were sending packets before identifying. There must be some WS connection going on in our load cache method, so now we wait until after the session has resumed before loading cache.
First loops through all shards and requeues them if they need to identify
Checks if the session is required to reidentify.
Added a few extra debug logs
Moved cache patching to WebSocketManager and added timeout of 15s to remove the loadCache event. Now only attaches the event if a session ID is present to stop the event being attached on an identify connection attempt

I don't know the best way to remove the event listener in the timeout so let me know how you would do it.
Swapped Channel around and added optional chaining to nullable properties
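
To illustrate the shard id 0 pitfall from the commit notes above: a plain truthiness filter silently drops shard 0, while an explicit comparison keeps it (the sample data is made up).

const sessions = [{ shard: 0 }, { shard: 1 }, { shard: null }];

sessions.filter(s => s.shard);                     // [{ shard: 1 }] -> loses shard 0
sessions.filter(s => typeof s.shard === "number"); // keeps shard 0 and 1, drops null
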
@timotejroiko (Owner)

This essentially needs to be started from scratch now, if there is still interest. However, the djs maintainers have also stated a certain interest in making something like this a thing in djs v14 or djs-next, so I'm not sure if it's worth working on this.
