Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to get_random_community_id: add filters for nsfw and private, use algorthm that doesn't scan the entire table #5267

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open
2 changes: 1 addition & 1 deletion crates/api/src/community/random.rs
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ pub async fn get_random_community(
let local_user = local_user_view.as_ref().map(|u| &u.local_user);

let random_community_id =
Community::get_random_community_id(&mut context.pool(), &data.type_).await?;
Community::get_random_community_id(&mut context.pool(), &data.type_, data.show_nsfw).await?;

let is_mod_or_admin = is_mod_or_admin_opt(
&mut context.pool(),
Expand Down
2 changes: 2 additions & 0 deletions crates/api_common/src/community.rs
Original file line number Diff line number Diff line change
Expand Up @@ -274,6 +274,8 @@ pub struct TransferCommunity {
pub struct GetRandomCommunity {
#[cfg_attr(feature = "full", ts(optional))]
pub type_: Option<ListingType>,
#[cfg_attr(feature = "full", ts(optional))]
pub show_nsfw: Option<bool>,
}

#[skip_serializing_none]
Expand Down
67 changes: 54 additions & 13 deletions crates/db_schema/src/impls/community.rs
Original file line number Diff line number Diff line change
Expand Up @@ -22,12 +22,13 @@ use crate::{
utils::{
action_query,
find_action,
functions::{coalesce, lower, random},
functions::{coalesce, coalesce_2_nullable, lower, random_smallint},
get_conn,
now,
uplete,
DbPool,
},
CommunityVisibility,
ListingType,
SubscribedType,
};
Expand Down Expand Up @@ -209,23 +210,62 @@ impl Community {
pub async fn get_random_community_id(
pool: &mut DbPool<'_>,
type_: &Option<ListingType>,
show_nsfw: Option<bool>,
) -> Result<CommunityId, Error> {
let conn = &mut get_conn(pool).await?;

let mut query = community::table
.filter(not(community::deleted))
.filter(not(community::removed))
.into_boxed();
// This is based on the random page selection algorithm in MediaWiki. It assigns a random number
// X to each item. To pick a random one, it generates a random number Y and gets the item with
// the lowest X value where X >= Y.
//
// https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/specials/SpecialRandomPage.php;763c5f084101676ab1bc52862e1ffbd24585a365
//
// The difference is we also regenerate the item's assigned number when the item is picked.
// Without this, items would have permanent variations in the probability of being picked.
// Additionally, in each group of multiple items that are assigned the same random number (a
// more likely occurence with `smallint`), there would be only one item that ever gets
// picked.

let try_pick = || {
dessalines marked this conversation as resolved.
Show resolved Hide resolved
let mut query = community::table
.filter(not(
community::deleted
.or(community::removed)
.or(community::visibility.eq(CommunityVisibility::Private)),
))
.order(community::random_number.asc())
.select(community::id)
.into_boxed();

if let Some(ListingType::Local) = type_ {
query = query.filter(community::local);
}

if let Some(ListingType::Local) = type_ {
query = query.filter(community::local);
}
if !show_nsfw.unwrap_or(false) {
query = query.filter(not(community::nsfw));
}

query
};

query
.select(community::id)
.order(random())
.limit(1)
.first::<CommunityId>(conn)
diesel::update(community::table)
.filter(
community::id.nullable().eq(coalesce_2_nullable(
try_pick()
.filter(community::random_number.nullable().ge(
// Without `select` and `single_value`, this would call `random_smallint` separately
// for each row
select(random_smallint()).single_value(),
))
.single_value(),
// Wrap to the beginning if the generated number is higher than all
// `community::random_number` values, just like in the MediaWiki algorithm
try_pick().single_value(),
)),
)
.set(community::random_number.eq(random_smallint()))
.returning(community::id)
.get_result::<CommunityId>(conn)
.await
}
}
Expand Down Expand Up @@ -570,6 +610,7 @@ mod tests {
posting_restricted_to_mods: false,
instance_id: inserted_instance.id,
visibility: CommunityVisibility::Public,
random_number: inserted_community.random_number,
};

let community_follower_form = CommunityFollowerForm {
Expand Down
1 change: 1 addition & 0 deletions crates/db_schema/src/schema.rs
Original file line number Diff line number Diff line change
Expand Up @@ -220,6 +220,7 @@ diesel::table! {
visibility -> CommunityVisibility,
#[max_length = 150]
description -> Nullable<Varchar>,
random_number -> Int2,
}
}

Expand Down
2 changes: 2 additions & 0 deletions crates/db_schema/src/source/community.rs
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,8 @@ pub struct Community {
/// A shorter, one-line description of the site.
#[cfg_attr(feature = "full", ts(optional))]
pub description: Option<String>,
#[serde(skip)]
pub random_number: i16,
}

#[derive(Debug, Clone, derive_new::new)]
Expand Down
4 changes: 4 additions & 0 deletions crates/db_schema/src/utils.rs
Original file line number Diff line number Diff line change
Expand Up @@ -545,13 +545,17 @@ pub mod functions {

define_sql_function!(fn random() -> Text);

define_sql_function!(fn random_smallint() -> SmallInt);

// really this function is variadic, this just adds the two-argument version
define_sql_function!(fn coalesce<T: diesel::sql_types::SqlType + diesel::sql_types::SingleValue>(x: diesel::sql_types::Nullable<T>, y: T) -> T);

define_sql_function! {
#[aggregate]
fn json_agg<T: diesel::sql_types::SqlType + diesel::sql_types::SingleValue>(obj: T) -> Json
}

define_sql_function!(#[sql_name = "coalesce"] fn coalesce_2_nullable<T: diesel::sql_types::SqlType + diesel::sql_types::SingleValue>(x: diesel::sql_types::Nullable<T>, y: diesel::sql_types::Nullable<T>) -> diesel::sql_types::Nullable<T>);
}

pub const DELETED_REPLACEMENT_TEXT: &str = "*Permanently Deleted*";
Expand Down
1 change: 1 addition & 0 deletions crates/db_views/src/comment_view.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1058,6 +1058,7 @@ mod tests {
moderators_url: data.inserted_community.moderators_url.clone(),
featured_url: data.inserted_community.featured_url.clone(),
visibility: CommunityVisibility::Public,
random_number: data.inserted_community.random_number,
},
counts: CommentAggregates {
comment_id: data.inserted_comment_0.id,
Expand Down
1 change: 1 addition & 0 deletions crates/db_views/src/post_view.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1873,6 +1873,7 @@ mod tests {
moderators_url: inserted_community.moderators_url.clone(),
featured_url: inserted_community.featured_url.clone(),
visibility: CommunityVisibility::Public,
random_number: inserted_community.random_number,
},
counts: PostAggregates {
post_id: inserted_post.id,
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
ALTER TABLE community
DROP COLUMN random_number;

DROP FUNCTION random_smallint;

Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
-- * inclusive bounds of `smallint` range from https://www.postgresql.org/docs/17/datatype-numeric.html
-- * built-in `random` function has `VOLATILE` and `PARALLEL RESTRICTED` according to:
-- * https://www.postgresql.org/docs/current/parallel-safety.html#PARALLEL-LABELING
-- * https://www.postgresql.org/docs/17/xfunc-volatility.html
CREATE FUNCTION random_smallint ()
RETURNS smallint
LANGUAGE sql
VOLATILE PARALLEL RESTRICTED RETURN
-- https://stackoverflow.com/questions/1400505/generate-a-random-number-in-the-range-1-10/1400752#1400752
-- (65536 = exclusive upper bound - inclusive lower bound)
trunc ((random() * (65536)) - 32768
);

ALTER TABLE community
ADD COLUMN random_number smallint NOT NULL DEFAULT random_smallint ();

CREATE INDEX idx_community_random_number ON community (random_number) INCLUDE (local, nsfw)
WHERE
NOT (deleted OR removed OR visibility = 'Private');

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why smallint? due to birthday paradox you have a high collision probability already with 2^(n/2) communities = 2^8 = 256 communities

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses less space, and collisions are fine because the random number is updated after the community is picked

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would you update the number after picking?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its described here. I'm sure there are others ways of doing it, allowing collisions and updating the random number after picking, being one way to do it.

Copy link
Collaborator

@phiresky phiresky Dec 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, i missed the last paragraph there. Still seems a bit awkward to me, especially having a read-only innocuous request cause a table write.

Why don't you just do select * from community where id = 1+(random() * (select max(id) from community))::int in a loop? The id of communities is already unique and uniformly distributed, with a fill rate of like 99% (so needing more than one/two queries is exceedingly unlikely) - no new column or modifying query or function needed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that would make filtered (nswf/local) more difficult, mh.