
change NIP-01 to only allow a single filter per REQ #1645

Open · wants to merge 1 commit into master

Conversation

fiatjaf
Member

fiatjaf commented Dec 13, 2024

In practice, the option to put multiple filters in a single REQ has no benefit; it only hurts performance and increases the complexity of codebases.

Anything that can be done with a single REQ and multiple filters can much more easily be done with multiple REQs each with a single filter.

I know this is a breaking change, but not a very drastic one. We can transition to it easily by having clients stop being built with this "feature" in mind; relays can follow later. If a client sends a REQ with more than one filter to a relay that only supports one, that's also OK: the client will just get fewer results than it expected.
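The equivalence the PR relies on can be sketched mechanically. This is an illustrative helper (not from any NIP or existing library) that splits a multi-filter REQ into the single-filter REQs fiatjaf describes:

```typescript
// Hypothetical helper, just to illustrate the equivalence claimed above.
// A Nostr REQ is ["REQ", <subId>, <filter1>, <filter2>, ...]; the same query
// can always be expressed as N single-filter REQs.

type Filter = Record<string, unknown>;

function splitReq(req: unknown[]): unknown[][] {
  const [verb, subId, ...filters] = req as [string, string, ...Filter[]];
  // One REQ per filter, with a derived subscription id so EOSE/CLOSED can be
  // attributed to the exact filter that triggered them.
  return filters.map((f, i) => [verb, `${subId}-${i}`, f]);
}

// A two-filter REQ becomes two independent single-filter REQs:
const reqs = splitReq(["REQ", "sub1", { kinds: [1], limit: 20 }, { kinds: [0], authors: ["abc"] }]);
console.log(reqs.length); // 2
```

A client library could apply this transformation transparently, which is part of why the transition is claimed to be soft: relays that still accept multiple filters lose nothing.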

@mikedilger
Contributor

I do use this multiple-filter thing, but not in many cases. I'm not sure the benefit of this change outweighs the cost of breaking things. And I'm a bit afraid relays will not like even more subscriptions per connection. Many relays have unwisely low caps on how many subscriptions you can have open at once.

@alexgleason
Member

I wish you did this 2 years ago. 😂 Then I would have supported it enthusiastically. Now I have mixed feelings.

@arthurfranca
Contributor

What's wrong with that feature is that the limit applies to each filter individually. That's why it's not really an OR query but equivalent to multiple REQs, i.e. events end up not being sorted together.
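This point can be shown with a small sketch (illustrative data, not a real relay implementation): the union of two per-filter top-Ns is not the top-N of the union.

```typescript
// `limit` applies per filter, so a two-filter REQ returns each filter's
// newest events independently -- not the newest events overall.

type Ev = { kind: number; created_at: number };

const stored: Ev[] = [
  { kind: 1, created_at: 100 }, { kind: 1, created_at: 90 },
  { kind: 7, created_at: 50 },  { kind: 7, created_at: 40 },
];

function perFilterLimit(events: Ev[], kinds: number[], limit: number): Ev[] {
  return events
    .filter((e) => kinds.includes(e.kind))
    .sort((a, b) => b.created_at - a.created_at)
    .slice(0, limit);
}

// ["REQ", "s", {kinds:[1],limit:1}, {kinds:[7],limit:1}] yields one event per
// filter (timestamps 100 and 50) ...
const multi = [...perFilterLimit(stored, [1], 1), ...perFilterLimit(stored, [7], 1)];
// ... which is NOT a true OR query with a shared limit of 2 (100 and 90):
const orQuery = perFilterLimit(stored, [1, 7], 2);
console.log(multi.map((e) => e.created_at), orQuery.map((e) => e.created_at));
```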

@fiatjaf
Member Author

fiatjaf commented Dec 13, 2024

Many relays have unwisely low caps on how many subscriptions you can have open at once.

This is exactly the point. Relays have these stupid limits, and the only reason people use multiple filters per REQ is to get around them. It's an unnecessary complication on both sides, and also unfair, because today clients that take on this unnecessary complication gain access to higher limits on these relays. Meanwhile the entire thing makes no sense on the relay side: it hurts performance for relays to have to check duplicates and reorder the results of multiple queries together when 100% of the time what clients really want are separate queries.

I'm pretty sure relays will increase their default limits if we pass this. Arguably they would have already switched to limiting by filter instead of by REQ, but haven't because that was too hard.

@fiatjaf
Member Author

fiatjaf commented Dec 13, 2024

I wish you did this 2 years ago. 😂 Then I would have supported it enthusiastically. Now I have mixed feelings.

Are those mixed feelings stemming from the fact that you have now already spent a lot of programming time implementing convoluted batch-then-segregate-filters logic on the client side?

@vitorpamplona
Collaborator

vitorpamplona commented Dec 13, 2024

I am not sure if this solves anything.

Keep in mind that having multiple filters within a single subscription does address some problems inherent to Nostr's data model. For instance, to correctly handle EOSE when monitoring multiple event IDs and their associated author information simultaneously (e.g., for author metadata, replies, reactions, etc.), such as in a thread view, each event ID or pubkey-based filter might require its own since and until properties. Since all of these filters relate to the same screen or UI component, it makes sense for them to be part of the same subscription.

If this approach is merged, each of these filters would become an individual subscription, which might consume more resources.

On Amethyst, we found that it is best to adhere to the lowest common limitations, which generally is:

  • The number of different kinds in a single filter must be ≤10.
  • The number of filters must be ≤10.
  • The number of subscriptions must be ≤12.
  • A single REQ cannot exceed 65,535 characters, which some follow lists or thread-loading operations could easily surpass.
  • Use only one connection per IP.
  • Rate limits of about 200 ms, i.e. filters can be rotated at most 5 times per second.

Because we frequently hit these limits—particularly the character limit for REQ—the app often splits and rotates filters very quickly. This rotation, in turn, hits rate limits.
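The character cap in particular forces the kind of splitting described here. A sketch of a client-side splitter, assuming the 65,535-character budget quoted above (the helper name and chunking strategy are illustrative, not Amethyst's actual code):

```typescript
// Hypothetical splitter: chunk a big author list so each serialized REQ stays
// under the relay's character cap.

type Filter = { authors: string[]; kinds?: number[] };

function splitByteBudget(subId: string, filter: Filter, maxChars = 65_535): string[] {
  const reqs: string[] = [];
  let chunk: string[] = [];
  const flush = () => {
    if (chunk.length) reqs.push(JSON.stringify(["REQ", subId, { ...filter, authors: chunk }]));
    chunk = [];
  };
  for (const a of filter.authors) {
    // Would adding this author push the serialized REQ over the cap?
    const candidate = JSON.stringify(["REQ", subId, { ...filter, authors: [...chunk, a] }]);
    if (candidate.length > maxChars) flush();
    chunk.push(a);
  }
  flush();
  return reqs;
}

const many = Array.from({ length: 200 }, (_, i) => i.toString(16).padStart(64, "0"));
console.log(splitByteBudget("sub", { authors: many, kinds: [0] }).length); // 1 (well under the cap)
```

Each resulting chunk then needs its own subscription slot, which is how the character limit and the subscription limit end up compounding.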

When this happens and Tor is enabled (which is by default), I mitigate the issue by creating two connections to the same relay, one over the clearnet and another over Tor, and splitting the filters between the two. This effectively doubles all limits. It's a hack but it works.

If limits are the issue you are trying to solve, we can work on that. The current PR just shifts things around from filters to subs.

@fiatjaf
Member Author

fiatjaf commented Dec 13, 2024

each event ID or pubkey-based filter might require its own since and until properties

Because they are completely independent, and mashing them together in the same REQ just leads to confusion.

each of these filters would become an individual subscription

They already are, in a way. Making that explicit is good. I can't see any performance downsides, only upsides.

If limits are the issue you are trying to solve

Not only. I am mostly worried about the perspective of relays handling custom queries in a way that isn't just the dumb "give the user everything he wants" and dealing with custom CLOSED messages. If each filter is its own subscription it becomes straightforward for both the relay and the client to agree on what is allowed and what isn't, what requires AUTH and what doesn't and so on.

The simplest example, today, is NIP-17 recommending that relays reject REQs for messages not addressed to the authenticated user. This is quite simple and straightforward -- but if the client decides to send two filters in the same REQ (one for messages and another for profile metadata, for example; or, who knows) then it becomes an unsolvable quagmire.
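The NIP-17 example can be sketched as relay-side code, assuming single-filter REQs (function and message strings are illustrative; kind 1059 is the NIP-59 gift wrap that NIP-17 DMs are delivered in):

```typescript
// With exactly one filter per REQ, the policy check is a pure function from
// (filter, authenticated pubkey) to either "allow" or one CLOSED reason.

type Filter = { kinds?: number[]; "#p"?: string[] };

function checkDmReq(filter: Filter, authedPubkey: string | null): string | null {
  const asksForDms = filter.kinds?.includes(1059) ?? false;
  if (!asksForDms) return null; // not a DM query: allow
  if (!authedPubkey) return "auth-required: must authenticate to read DMs";
  // Only allow querying gift wraps addressed to the authenticated user.
  const ok = filter["#p"]?.every((p) => p === authedPubkey) ?? false;
  return ok ? null : "restricted: can only query your own DMs";
}

console.log(checkDmReq({ kinds: [1059], "#p": ["alice"] }, "alice")); // null (allowed)
```

With two filters in one REQ, one might pass this check and the other fail, and a single CLOSED message for the subscription cannot express that split verdict.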

The same problem also showed up recently in @staab's implementation of NIP-29 conflicting with my relay assumptions (fiatjaf/relay29#13), and for anyone who expects the Nostr relay ecosystem to grow in diversity, specifically relay diversity (as I do and hope), issues like this will continue to show up.

Maybe in these cases the solution could be just to recommend single-filter REQs unless you are pretty sure multiple filters are OK. But then we hit the problem of libraries that implement filter batching by default, which feels wrong to me but is something the current NIP-01 text encourages; and also the problem of the inconsistent and misguidedly harsh filter limits mentioned above. So I think it's better to phase out multi-filter REQs slowly, if we can agree on that.

@vitorpamplona
Collaborator

I think this "relay diversity" issue goes beyond single filters.

NIP-17 use case

Multi-account clients with NIP-17 downloads require multiple connections to AUTH with a different user. Many relays don't accept multiple connections per IP, so the app has to either rotate (with wasteful since/until +/- 48 hours) or subscribe using Tor with multiple exit nodes, one for each account.

if the client decides to send two filters in the same REQ (one for messages and another for profile metadata, for example; or, who knows) then it becomes an unsolvable quagmire

I do not think this is unsolvable. It is hard, but matching incoming filters, no matter how complex, against the authorization the logged-in user has should be solvable. Modifying the filter is hard, but running the filter and then applying a second, authorization-based pass over the results, keeping only what the user is allowed to see, is a lot easier.
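That query-then-authorize approach can be sketched as follows (types and the `canSee` policy are illustrative stand-ins, not any relay's actual code):

```typescript
// Run the query as-is, then apply a per-user visibility check to the result
// set instead of rewriting the incoming filter.

type Ev = { kind: number; pubkey: string; tags: string[][] };

function queryThenAuthorize(results: Ev[], canSee: (e: Ev) => boolean): Ev[] {
  // Simpler than filter rewriting, at the cost of fetching rows the user may
  // not be allowed to see (fiatjaf's efficiency objection above).
  return results.filter(canSee);
}

const events: Ev[] = [
  { kind: 1059, pubkey: "x", tags: [["p", "alice"]] },
  { kind: 1059, pubkey: "y", tags: [["p", "bob"]] },
];
const visible = queryThenAuthorize(events, (e) =>
  e.tags.some(([name, value]) => name === "p" && value === "alice"),
);
console.log(visible.length); // 1
```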

@nostr-wine had many of these issues last year when building inbox.nostr.wine. Maybe we can learn from those.

batching by default, etc.

I can't see a future where Nostr libraries will not rearrange all these filters/subs by default. There is just no way app devs can deal with the complexity of building and splitting filters/subscriptions/connections. We will need to provide higher-level tools and find good ways to convert them back into what each relay operator wants their clients to do.

If we want to give relays more power to diversify their responses while letting Clients know how to deal with those customizations, we need a new abstraction layer.

@vitorpamplona
Collaborator

vitorpamplona commented Dec 13, 2024

@fiatjaf do you think custom relays should still be called "relays"? As you said, optionality sucks. There should be a pitch somewhere to separate the relay network as we know today, with a simple, well-defined behavior, from new networks of custom relays. Clients could then declare compliance with the main network and/or other networks/schemes.

We shouldn't expect every client to be able to deal with ALL of these custom relays and their rules.

@fiatjaf
Member Author

fiatjaf commented Dec 13, 2024

Multi-account clients with NIP-17 downloads require multiple connections to AUTH with a different user. Many relays don't accept multiple connections per IP

This is a bug on the relay though. A relay that deals with AUTH should accept more than one connection per IP as long as these connections are authenticated. The best course of action here is to talk to the relay maintainer.

I do not think this is unsolvable. It is hard, but matching incoming filters, no matter how complex they are, with the authorization the logged-in user has should be solvable. Modifying the filter is hard, but running the filter and then applying a second filter on top of the results to filter what the user can see from the resulting event set is a lot easier.

This can be done, of course, but my point is that it becomes unnecessarily confusing. As the relay you can't just return a simple CLOSED notifying the client of your expectations; you have to do an inefficient filtering operation and return just a subset of results -- and for what? The user/client won't get what they expected; they will think there is a bug on their side or in the client because events are not showing up, or they will not perform authentication as expected because the relay didn't return the expected CLOSED / auth-required message -- and so on.

@fiatjaf
Member Author

fiatjaf commented Dec 13, 2024

@fiatjaf do you think custom relays should still be called "relays"? As you said, optionality sucks. There should be a pitch somewhere to separate the relay network as we know today, with a simple, well-defined behavior, from new networks of custom relays. Clients could then declare compliance with the main network and/or other networks/schemes.

It's an interesting idea, although I can't imagine it right now. I think the current idioms we have are enough to power a thousand "custom relays" already, so if we make these things separate entities we risk splitting the network too much into different API interfaces. But I'm not opposed to it; I want to see what you have in mind.

We shouldn't expect every client to be able to deal with ALL of these custom relays and their rules.

Well, I wasn't expecting all clients to do it; I was expecting clients to hardcode some behaviors specifically tailored to some relays (i.e. DM apps hardcode behavior for NIP-17 relays, group apps hardcode behavior for NIP-29 relays, etc.). So yeah, maybe having more specific "relay APIs" (or whatever you want to call them) for each use case could be interesting, as long as they are standardized.

@vitorpamplona
Collaborator

vitorpamplona commented Dec 13, 2024

you have to do an inefficient filtering operation and return just a subset of results -- and for what? The user/client won't get what they expected

Don't assume users are not getting what they are expecting, getting mad, etc. The current thinking is the opposite: relays are always removing/hiding events, and users/clients almost never see exactly what they expect to see. We are already used to having to work with whatever we get from each relay. To me, relays SHOULD take any filter but only return what the user can see, with whatever rule they want to apply, and only CLOSE when there is nothing to return/monitor.

@staab
Member

staab commented Dec 13, 2024

I agree that it would probably only result in performance improvements, since in every relay implementation I've seen the relay just iterates through filters and sends the results anyway, keeping the subscription open slightly longer than if each filter had its own. Also, with batching, subscriptions are held open until the last subscription finishes. Breaking into separate subscriptions would result in better concurrency at the very slight cost of more messages being sent back and forth.

Unfortunately, the core architecture of most sophisticated libraries (at least welshman and NDK, and it sounds like Alex has a similar thing going on) is based around splitting/merging, so a lot of work would have to go into taking that out. On the other hand, it would probably make things way simpler. Right now my request flow looks something like this:

  • Make a request
  • Wait 800ms (or so) for other requests to come in
  • Group subscriptions by timeout, auth timeout, and whether to close on eose
  • Union filters for each relay and send to each relay independently
  • Keep track of how many open subscriptions there are and wait for others to close before sending the next one
  • When an event comes in, match it against the caller subscription's filters and pass it along

I've spent an incredible amount of time developing (and troubleshooting) this logic. Sunk cost notwithstanding, it would probably be better for everyone if we moved away from multiple filters per request. It would simplify the process to:

  • Make a request
  • Send to the relay immediately
  • Keep track of how many open subscriptions there are and wait for others to close before sending the next one
  • When an event comes in, pass it along

This would remove the need for the batching timeout, and plenty of performance-intensive and error-prone filter matching/merging.

I originally started batching at @pablof7z's suggestion, specifically to deal with relay.nostr.band's very low concurrent subscription limits (8 subs at a time as I recall). So yeah, batching is almost entirely a workaround for stingy relay limits, although I think there are a few cases in which batching makes sense, like when fetching reactions/replies/zaps for a parent note. Instead of injecting that logic into the data loading code, you can just put it in the leaf component, which simplifies things a lot. Of course, it would not be hard to write batching logic for that case which only results in a single filter, so maybe that's a moot point.

Another good point fiatjaf made in fiatjaf/relay29#13:

If we are to allow all filters even those that we disallow just because they might be grouped together with other filters that might be allowed that means the CLOSED becomes useless

CLOSED runs into the same problem that custom relay policies do, and it's baked into the protocol.

Another thing:

We shouldn't expect every client to be able to deal with ALL of these custom relays and their rules.

I've been saying for a long time that relays are defined by an interface, and differ by policy. Graceful degradation and progressive enhancement through feature support signaling are the way to handle this. NIP 29 (despite the rigidity of the relay29 implementation in that it rejects any non-h-tagged event) only breaks the relay interface in this one place (which is why I had to address the relay implementation rather than work around it), despite its policy being wildly different from most relays on the network. I think this is great.

TL;DR I support clients moving away from multiple filters, although I'd like to see how well that works in practice before really committing to the switch. If this really doesn't cause any problems for clients, relays should watch for multi-filter requests to stop coming in before dropping support.

@alexgleason
Member

@fiatjaf Are those mixed feelings stemming from the fact that you have now already spent a lot of programming time implementing convoluted batch-then-segregate-filters logic on the client side?

Yes. I am also depending on it now too. But after looking at my code, it's less than I thought. So, I think it wouldn't be so bad after all.

@staab every relay implementation I've seen the relay just iterates through filters and sends the results anyway

Why would we accept such shoddy implementations? Of course I generate one SQL query for combined filters. https://gitlab.com/soapbox-pub/nostrify/-/blob/main/packages/db/NPostgres.ts?ref_type=heads#L265

@mikedilger
Contributor

Unfortunately, the core architecture of most sophisticated libraries (at least welshman and NDK, and it sounds like Alex has a similar thing going on) is based around splitting/merging,

Gossip must be pretty dumb by comparison then. I just ask for what I want, when I want it, and if the relays complain I consider it a relay problem. I don't split or merge. I don't try to limit kinds, filters, or subscriptions to small numbers. I don't rate limit other than timing out for 15 seconds if the relay was rude. Then again, I don't think any of my REQs are unreasonable. By far the most common relay complaint is that I didn't AUTH and it insists that I AUTH just to do a REQ.

@fiatjaf
Member Author

fiatjaf commented Dec 14, 2024

I generate one SQL query for combined filters.

But the SQL database underneath probably just iterates through all the filters independently.
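For illustration only: a "single" SQL statement for a multi-filter REQ typically compiles to a UNION of per-filter subqueries, which the database still executes as independent scans. This is a hypothetical query builder, not nostrify's actual code:

```typescript
// Each filter becomes its own WHERE clause; the "combined" query is just the
// UNION of the per-filter queries, which is fiatjaf's point above.

type Filter = { kinds?: number[]; authors?: string[] };

function filterToSql(f: Filter): string {
  const clauses: string[] = [];
  if (f.kinds) clauses.push(`kind IN (${f.kinds.join(",")})`);
  if (f.authors) clauses.push(`pubkey IN (${f.authors.map((a) => `'${a}'`).join(",")})`);
  return `SELECT * FROM events WHERE ${clauses.join(" AND ")}`;
}

function combinedQuery(filters: Filter[]): string {
  return filters.map(filterToSql).join(" UNION ");
}

console.log(combinedQuery([{ kinds: [1] }, { authors: ["abc"] }]));
// SELECT * FROM events WHERE kind IN (1) UNION SELECT * FROM events WHERE pubkey IN ('abc')
```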

@fiatjaf
Member Author

fiatjaf commented Dec 14, 2024

Gossip must be pretty dumb by comparison then.

This is good to hear, because I have never done any of the batching people are doing and never understood why they felt it was so necessary; but I was never sure, since none of my clients are heavily used. Gossip works basically perfectly though, and the fact that it is so naïve reinforces my beliefs.

It's weird: if I started to hit relay limits so badly that I felt the need to complicate my client code with hacks just to make basic things work, I wouldn't consider that normal and keep piling hacks on top of hacks; I would conclude the protocol was broken and needed a serious reboot (because if it doesn't work today, damn, imagine if we had more than 16 users).

@mikedilger
Contributor

FYI:

For most relays in most situations, gossip has only 4 subscriptions (FeedFuture, FeedChunk, Augments, and Metadata). Augments gets rewritten every time you scroll, but no faster than once every 5 seconds (that is probably too infrequent; I'm gonna speed that up to 2 seconds). It might have multiple FeedChunks open if someone hits "load more" before the previous one finished.

When roles coincide, there may be more than 4 subscriptions: if that relay happens to also be a discovery relay, it may have Discover. If it happens to be your outbox, it will subscribe to your Config. And if it is your NIP-46 bunker, it will subscribe to NIP46.

So in that worst case, that is only 7 subscriptions (with possibly more if you quickly spam the "load more" button). Every other filter set only gets subscribed if you navigate somewhere else (in which case some of the above ones close): InboxFeed, InboxFuture, DMChannel, Giftwraps, Search, RepliesToById, RepliesToByAddr and FollowersOf. These don't stack much at all, always fewer than the worst case of 7 I already mentioned. This is all defined in gossip-lib/src/filter_set.rs (it used to be spread out but I cleaned it up recently).

Notably, there are things gossip could be subscribing to that we choose not to. For example:

  • In the feed views you do not see a number counting how many replies a note has. That would require another count subscription.

@alexgleason
Member

I think it's worth it.

I always thought it was wrong that you can pass multiple filters in a single REQ. It makes the "limit" property meaningless and harms your ability to do rate limiting and set timeouts on the relay side.

However, clients still need to request multiple filters sometimes. This responsibility is now shifted to the client. You could argue it always was. But still, clients now need to loop over the filters and possibly aggregate the results. This is actually more powerful for the client, so it probably should have always been a client responsibility.

Finally, it will simplify logs, and simplify queries. By removing this we actually get more predictable outcomes. This will make it easier to improve performance.
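The client-side responsibility described here is essentially a merge loop. A sketch, with illustrative types and data: issue one REQ per filter, then deduplicate by event id and sort the merged results by created_at.

```typescript
// Aggregating per-filter result sets on the client. The same event may match
// more than one filter, so dedup by id before sorting.

type Ev = { id: string; created_at: number };

function aggregate(resultSets: Ev[][]): Ev[] {
  const seen = new Map<string, Ev>();
  for (const set of resultSets)
    for (const e of set) seen.set(e.id, e); // dedup across filters
  return [...seen.values()].sort((a, b) => b.created_at - a.created_at);
}

const merged = aggregate([
  [{ id: "a", created_at: 10 }, { id: "b", created_at: 30 }],
  [{ id: "b", created_at: 30 }, { id: "c", created_at: 20 }],
]);
console.log(merged.map((e) => e.id)); // [ 'b', 'c', 'a' ]
```

Doing this on the client is what makes it "more powerful": the client chooses the merge order and the effective overall limit, rather than inheriting the relay's per-filter behavior.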

@bezysoftware
Contributor

Would this change also propagate to COUNT?

@nostrband
Collaborator

I like this.

But I don't see a great path to obsoleting multi-filter REQs. Asking for serious rewrites of critical, hard-to-debug parts of big clients for the benefit of relays is a very one-sided deal. Maybe clients could ask for higher rate limits first, and if relays agreed, clients wouldn't need the batching and the use of multi-filter REQs would just fade away. Without higher rate limits this change won't fly anyway.

@fiatjaf
Member Author

fiatjaf commented Dec 26, 2024

@nostrband you're right, I guess we have to start first with the limits.

If I got it right @hoytech has agreed to change strfry limits to be based on number of filters instead of subscriptions, so 1 subscription with 2 filters will be equivalent to 2 subscriptions with 1 filter each. I don't know when he plans to implement that, but once he does I believe he will also increase the default limit.

After that I think we can nudge clients to start using REQs with a single filter, and after that we can deprecate multi-filter REQs softly.
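The limit accounting fiatjaf attributes to the planned strfry change can be sketched as follows (names and numbers are illustrative, not strfry's actual configuration):

```typescript
// Count open filters rather than open subscriptions, so 1 sub with 2 filters
// costs the same as 2 subs with 1 filter each.

type Sub = { id: string; filters: object[] };

function withinFilterBudget(openSubs: Sub[], incoming: Sub, maxFilters: number): boolean {
  const open = openSubs.reduce((n, s) => n + s.filters.length, 0);
  return open + incoming.filters.length <= maxFilters;
}

const open: Sub[] = [{ id: "a", filters: [{}, {}] }]; // 2 filters in use
console.log(withinFilterBudget(open, { id: "b", filters: [{}] }, 3)); // true
console.log(withinFilterBudget(open, { id: "c", filters: [{}, {}] }, 3)); // false
```

Under this accounting, a client gains nothing by packing filters into one subscription, which removes the incentive that keeps multi-filter REQs alive.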

@nostrband
Collaborator

I don't know when he plans to implement that, but once he does I believe he will also increase the default limit.

After that I think we can nudge clients to start using REQs with a single filter, and after that we can deprecate multi-filter REQs softly.

Ok this sounds promising!

@kehiy
Contributor

kehiy commented Dec 26, 2024

Would this change also propagate to COUNT?

COUNT seems to be deprecated; nobody uses it.

@kehiy
Contributor

kehiy commented Dec 26, 2024

@nostrband you're right, I guess we have to start first with the limits.

If I got it right @hoytech has agreed to change strfry limits to be based on number of filters instead of subscriptions, so 1 subscription with 2 filters will be equivalent to 2 subscriptions with 1 filter each. I don't know when he plans to implement that, but once he does I believe he will also increase the default limit.

After that I think we can nudge clients to start using REQs with a single filter, and after that we can deprecate multi-filter REQs softly.

@fiatjaf we can support it as well in our relay. I need to take a look at the code, but it probably makes the process more lightweight. Currently, we aggregate queries based on kind and send separate queries to MongoDB (we still haven't found the best practice for that, and there is even the possibility of moving to another database). This makes things more lightweight, I think, since we make fewer queries (most of the time) and each query searches a smaller collection rather than all events. But I believe this change would still make it better. Also, we would need to advertise this in the NIP-11 document and remove the current max-filters field.

9 participants