Compute release 2024-09-25 #9151
Commits on Sep 24, 2024
Commit a65d437
Move the patch to compute (#9120)
## Problem
All the other patches had been moved to the compute directory; only one was left in the patches subdirectory of the root directory.
## Summary of changes
The remaining patch was moved to the compute directory, like the others.
Commit b224a5a
test: Make test_hot_standby_feedback more forgiving of slow initialization (#9113)
Don't start waiting for the index to appear in the secondary until it has been created in the primary. Before, if the "pgbench -i" step took more than 60 s, we would give up. There was a flaky test failure along those lines at: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9105/10997477941/index.html#suites/950eff205b552e248417890b8b8f189e/73cf4b5648fa6f74/
Hopefully, this avoids such failures in the future.
Commit 70fe007
test: Skip fsync when initdb'ing the storage controller db
After initdb, we configure it with "fsync=off" anyway.
Commit 589594c
test: Poll pageserver availability more aggressively at test startup
Even with the 100 ms interval, the pageserver always becomes available on the second attempt on my laptop, so this saves about 900 ms at every test startup.
Commit 2f7ceca
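The arithmetic behind the "about 900 ms" figure in the polling commit above can be sketched as follows. This is a minimal illustration, not the test fixture's actual code: `wait_until`, `ready_on_attempt`, and `total_wait` are hypothetical names, and the 1 s previous interval is inferred from the stated saving rather than given in the commit message.

```python
import time


def wait_until(check, interval_s, timeout_s, sleep=time.sleep):
    """Call check() every interval_s seconds until it returns True.

    Returns the number of attempts made; raises TimeoutError once the
    accumulated wait would exceed timeout_s.
    """
    attempts = 0
    waited = 0.0
    while True:
        attempts += 1
        if check():
            return attempts
        if waited + interval_s > timeout_s:
            raise TimeoutError(f"not ready after {attempts} attempts")
        sleep(interval_s)
        waited += interval_s


def ready_on_attempt(n):
    """Hypothetical stand-in for a service that answers on the n-th probe."""
    state = {"calls": 0}

    def check():
        state["calls"] += 1
        return state["calls"] >= n

    return check


def total_wait(interval_s):
    """Total time slept before a service that is ready on attempt 2 answers."""
    slept = []  # record sleeps instead of actually sleeping
    wait_until(ready_on_attempt(2), interval_s, timeout_s=10.0, sleep=slept.append)
    return sum(slept)


# Assumed old interval of 1 s versus the new 100 ms interval.
saved = total_wait(1.0) - total_wait(0.1)
print(f"saved {saved:.1f} s")
```

Injecting `sleep` keeps the sketch deterministic; a real fixture would probe the pageserver's HTTP endpoint in `check`.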
pageserver: handle decompression outside vectored `read_blobs` (#8942)
Part of #8130.
## Problem
Currently, decompression is performed within the `read_blobs` implementation, and the decompressed blob is appended to the end of the `BytesMut` buffer. We will lose this flexibility of extending the buffer when we switch to our own dio-aligned buffer (WIP in #8730). To facilitate the adoption of the aligned buffer, we need to refactor the code to perform decompression outside `read_blobs`.
## Summary of changes
- `VectoredBlobReader::read_blobs` will return `VectoredBlob` without performing decompression or appending the decompressed blob. It becomes the caller's responsibility to decompress the buffer.
- Added a new `BufView` type that functions as `Cow<Bytes, &[u8]>`.
- Perform decompression within `VectoredBlob::read` so that people don't have to think explicitly about compression when using the reader interface.
Signed-off-by: Yuchen Liang <[email protected]>
Commit 4f67b02
Catch Cancelled and don't print a warning for it (#9121)
In the `imitate_synthetic_size_calculation_worker` function, we might obtain the `Cancelled` error variant instead of hitting the cancellation token based path. Therefore, catch `Cancelled` and handle it analogously to the cancellation case. Fixes #8886.
Commit c47f355
Fix compiler warnings on macOS (#9128)
## Problem
Compilation of neon extension on macOS produces a warning
```
pgxn/neon/neon_perf_counters.c:50:1: error: non-void function does not return a value [-Werror,-Wreturn-type]
```
## Summary of changes
- Change the return type of `NeonPerfCountersShmemInit` to void
Commit 523cf71
test: Make test_lfc_resize more robust (#9117)
1. Increase statement_timeout. It defaults to 120 s, which is not quite enough on slow or busy systems with a debug build. On my laptop, the index creation takes about 100 s. On the buildfarm, we've seen failures, e.g.: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9084/10997888708/index.html#suites/821f97908a487f1d7d3a2a4dd1571e99/db1834bddfe8c5b9/
2. Keep twiddling the LFC size through the whole test. Before, we would do it for the first 10 seconds, but that only covers a small part of the pgbench initialization phase. Change the loop so that the pgbench run time determines how long the test runs, and we keep changing the LFC size for the whole time.
In passing, also fix the bogus test description, which was copy-pasted from a completely unrelated test.
Commit af5c54e
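The loop change in item 2 of the test_lfc_resize commit above (let the workload's run time bound the test and keep resizing until it finishes) might look roughly like the sketch below. This is an illustration under assumptions, not the test's actual code: `tweak_while_running` is a hypothetical helper, a short `time.sleep` stands in for the pgbench run, and appending to a list stands in for changing the LFC size.

```python
import threading
import time


def tweak_while_running(workload, tweak, interval_s=0.01):
    """Run workload() in a thread and call tweak(i) until it finishes.

    Returns how many times tweak was called.
    """
    worker = threading.Thread(target=workload)
    worker.start()
    i = 0
    # The workload's lifetime, not a fixed 10 s budget, bounds the loop.
    while worker.is_alive():
        tweak(i)
        i += 1
        time.sleep(interval_s)
    worker.join()
    return i


sizes = []
rounds = tweak_while_running(
    workload=lambda: time.sleep(0.1),  # stand-in for the pgbench run
    tweak=lambda i: sizes.append(1 if i % 2 else 8),  # stand-in for a resize
)
print(f"resized {rounds} times while the workload ran")
```

The same shape works with a subprocess: poll the process for completion instead of calling `is_alive()`.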
Remove TenantState::Loading (#9118)
The last real use was removed in commit de90bf4. It was still used in a few unit tests, but they can use Attaching too.
Commit 5cbf5b4
chore(docker-compose): fix typo in readme (#9133)
Typo in the readme inside the docker-compose folder.
## Summary of changes
- Update the readme
Commit 938b163
fix(test): storage scrubber should only log to stdout with info (#9067)
As @koivunej mentioned in the storage channel, for regress tests we don't need to create a log file for the scrubber, and we should reduce noisy logs.
## Summary of changes
* Disable log file creation for storage scrubber
* Only log at info level
---------
Signed-off-by: Alex Chi Z <[email protected]>
Commit 5f2f31e
Commits on Sep 25, 2024
storcon: add tags to scheduler logs (#9127)
We log something at info level each time we schedule a shard to a non-secondary location. Might as well have context for it.
Commit a26cc29
Commit 7dcfccc
storcon: do az aware scheduling (#9083)
## Problem
Storage controller didn't previously consider AZ locality between compute and pageservers when scheduling nodes. Control plane has this feature, and, since we are migrating tenants away from it, we need feature parity to avoid perf degradations.
## Summary of changes
The change itself is fairly simple:
1. Thread az info into the scheduler
2. Add an extra member to the scheduling scores

Step (2) deserves some more discussion. Let's break it down by the shard type being scheduled:

**Attached Shards**
We wish for attached shards of a tenant to end up in the preferred AZ of the tenant, since that is where the compute is likely to be. The AZ member for `NodeAttachmentSchedulingScore` has been placed below the affinity score (so it has the second biggest weight for picking the node). The rationale for going below the affinity score is to avoid having all shards of a single tenant placed on the same node in 2-node regions, since that would mean that one tenant can drive the general workload of an entire pageserver. I'm not 100% sure this is the right decision, so I'm open to discussing hoisting the AZ up to first place.

**Secondary Shards**
We wish for secondary shards of a tenant to be scheduled in a different AZ from the preferred one for HA purposes. The AZ member for `NodeSecondarySchedulingScore` has been placed first, so nodes in different AZs from the preferred one will always be considered first. On small clusters, this can mean that all the secondaries of a tenant are scheduled to the same pageserver, but secondaries don't use up as many resources as the attached location, so IMO the argument made for attached shards doesn't hold.

Related: #8848
Commit 2cf47b1
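The effect of member ordering in the scheduling scores described above can be illustrated with a small sketch. It assumes the scores are compared field by field in declaration order and that lower is better; the Python classes only loosely mirror `NodeAttachmentSchedulingScore` and `NodeSecondarySchedulingScore`, and the field names, the `utilization` tie-breaker, and the node names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(order=True, frozen=True)
class AttachedScore:
    # Fields compare in declaration order; the lowest score wins.
    affinity: int     # shards of this tenant already on the node
    az_mismatch: int  # 0 when the node is in the tenant's preferred AZ
    utilization: int  # hypothetical tie-breaker


@dataclass(order=True, frozen=True)
class SecondaryScore:
    az_match: int     # 0 when the node is outside the preferred AZ
    affinity: int
    utilization: int


def pick(scores):
    """Return the node whose score sorts lowest."""
    return min(scores, key=scores.get)


# Affinity outranks AZ for attached shards: ps-b wins despite the AZ mismatch.
attached = {
    "ps-a": AttachedScore(affinity=1, az_mismatch=0, utilization=10),
    "ps-b": AttachedScore(affinity=0, az_mismatch=1, utilization=10),
}
# With equal affinity, the preferred AZ decides, even on a busier node.
attached_tie = {
    "ps-a": AttachedScore(affinity=0, az_mismatch=0, utilization=50),
    "ps-b": AttachedScore(affinity=0, az_mismatch=1, utilization=10),
}
# AZ comes first for secondaries: the node outside the preferred AZ wins.
secondary = {
    "ps-a": SecondaryScore(az_match=1, affinity=0, utilization=10),
    "ps-b": SecondaryScore(az_match=0, affinity=2, utilization=90),
}

print(pick(attached), pick(attached_tie), pick(secondary))
# prints "ps-b ps-a ps-b"
```

Hoisting the AZ to first place for attached shards, as the commit message floats, would amount to swapping the first two fields of `AttachedScore`.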
storage controller: make proxying of GETs to pageservers more robust (#9065)
## Problem
These commits are split off from https://github.com/neondatabase/neon/pull/8971/commits where I was fixing this to make a better scale test pass -- Vlad also independently recognized these issues with cloudbench in #9062.
1. The storage controller proxies GET requests to pageservers based on their intent, not the ground truth of where they're really attached.
2. Proxied requests can race with scheduling to tenants, resulting in 404 responses if the request hits the wrong pageserver.

Closes: #9062
## Summary of changes
1. If a shard has a running reconciler, then use the database generation_pageserver to decide who to proxy the request to
2. If such a request gets a 404 response and its scheduled node has changed since the request was dispatched.
Commit 4b711ca
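A rough sketch of the two proxying changes described above, under stated assumptions. The second item of the summary does not say what happens after the 404; the sketch assumes the request is retried against the new target, which is a guess. The shard dictionary, its keys other than `generation_pageserver`, and the functions `pick_target` and `proxy_get` are all hypothetical.

```python
def pick_target(shard):
    """Choose the pageserver to proxy to.

    While a reconciler is running, trust the database's
    generation_pageserver rather than the scheduling intent.
    """
    if shard["reconciler_running"]:
        return shard["generation_pageserver"]
    return shard["intent_attached"]


def proxy_get(shard, send, max_attempts=2):
    """Proxy a GET; retry a 404 if the target changed while it was in flight.

    The retry is an assumption, not something the commit message states.
    """
    for attempt in range(max_attempts):
        target = pick_target(shard)
        status = send(target)
        moved = pick_target(shard) != target
        if status == 404 and moved and attempt + 1 < max_attempts:
            continue
        return status


shard = {
    "reconciler_running": True,
    "generation_pageserver": "ps-1",
    "intent_attached": "ps-2",
}
seen = []


def send(node):
    """Fake transport: the migration to ps-2 completes during the first call."""
    seen.append(node)
    if node == "ps-1":
        shard.update(reconciler_running=False, generation_pageserver="ps-2")
        return 404
    return 200


status = proxy_get(shard, send)
print(status, seen)
```

A 404 from a target that has not moved is returned to the caller unchanged, since in that case the resource is presumably missing rather than misrouted.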
Commit 518f598
Build images for PG17 using Debian 12 "Bookworm" (#9132)
This increases the support window of the OS used for PG17 by 2 years compared to the previous usage of Debian 11 "Bullseye".
Commit c4f5736
storcon: include timeline ID in LSN waiting logs (#9141)
## Problem
Hard to tell which timeline is holding the migration.
## Summary of Changes
Add timeline id to log.
Commit c597238
fix(pageserver): handle lsn lease requests for unnormalized lsns (#9137)
Fixes #9098.
## Problem
See #9098 (comment).
### Related
A similar problem happened with branch creation, which was discussed [here](#2143 (comment)) and fixed by #2529.
## Summary of changes
- Normalize the LSN on the pageserver side upon an LSN lease request, and store the normalized LSN.
Signed-off-by: Yuchen Liang <[email protected]>
Commit d447f49
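The commit message above does not spell out what "normalized" means. One plausible reading, based on how PostgreSQL lays out WAL pages, is that an LSN falling exactly on a page boundary points at a page header rather than a record, so it is advanced past that header. The sketch below illustrates that reading only. The constants are PostgreSQL defaults on typical 64-bit builds, and `normalize_lsn` and `request_lease` are hypothetical names, not the pageserver's API.

```python
XLOG_BLCKSZ = 8192                   # default WAL page size
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size
SHORT_PAGE_HEADER = 24               # header on ordinary WAL pages
LONG_PAGE_HEADER = 40                # header on the first page of a segment


def normalize_lsn(lsn, seg_size=WAL_SEGMENT_SIZE):
    """Advance an LSN that sits on a page boundary past the page header."""
    if lsn % XLOG_BLCKSZ != 0:
        return lsn  # already points inside a page
    header = LONG_PAGE_HEADER if lsn % seg_size == 0 else SHORT_PAGE_HEADER
    return lsn + header


def request_lease(leases, lsn, valid_until):
    """Store the lease under the normalized LSN (hypothetical lease table)."""
    key = normalize_lsn(lsn)
    leases[key] = max(valid_until, leases.get(key, valid_until))
    return key


leases = {}
# Requests for the boundary LSN and for its normalized form share one lease.
first = request_lease(leases, 8192, valid_until=100)
second = request_lease(leases, 8192 + SHORT_PAGE_HEADER, valid_until=200)
print(first == second, leases)
```

Normalizing on the server side means two clients that name the same position differently end up renewing the same lease.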