feat(PE-6695/PE-6696): remove resolver, add redis support for arns cache #200

Merged
merged 10 commits into from
Sep 10, 2024
24 changes: 1 addition & 23 deletions docker-compose.yaml
@@ -71,11 +71,11 @@ services:
- WEBHOOK_INDEX_FILTER=${WEBHOOK_INDEX_FILTER:-}
- WEBHOOK_BLOCK_FILTER=${WEBHOOK_BLOCK_FILTER:-}
- CONTIGUOUS_DATA_CACHE_CLEANUP_THRESHOLD=${CONTIGUOUS_DATA_CACHE_CLEANUP_THRESHOLD:-}
- TRUSTED_ARNS_RESOLVER_URL=${TRUSTED_ARNS_RESOLVER_URL:-}
- TRUSTED_ARNS_GATEWAY_URL=${TRUSTED_ARNS_GATEWAY_URL:-https://__NAME__.arweave.net}
- ARNS_RESOLVER_PRIORITY_ORDER=${ARNS_RESOLVER_PRIORITY_ORDER:-on-demand,gateway}
- ARNS_CACHE_TTL_SECONDS=${ARNS_CACHE_TTL_SECONDS:-3600}
- ARNS_CACHE_MAX_KEYS=${ARNS_CACHE_MAX_KEYS:-10000}
- ARNS_CACHE_TYPE=${ARNS_CACHE_TYPE:-redis}
- ENABLE_MEMPOOL_WATCHER=${ENABLE_MEMPOOL_WATCHER:-false}
- MEMPOOL_POOLING_INTERVAL_MS=${MEMPOOL_POOLING_INTERVAL_MS:-}
- AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID:-}
@@ -139,28 +139,6 @@ services:
networks:
- ar-io-network

resolver:
image: ghcr.io/ar-io/arns-resolver:${RESOLVER_IMAGE_TAG:-7fe02ecda2027e504248d3f3716579f60b561de5}
restart: on-failure
ports:
- 6000:6000
environment:
- PORT=6000
- LOG_LEVEL=${LOG_LEVEL:-info}
- IO_PROCESS_ID=${IO_PROCESS_ID:-}
- RUN_RESOLVER=${RUN_RESOLVER:-false}
- EVALUATION_INTERVAL_MS=${EVALUATION_INTERVAL_MS:-}
- ARNS_CACHE_TTL_MS=${RESOLVER_CACHE_TTL_MS:-}
- ARNS_CACHE_PATH=${ARNS_CACHE_PATH:-./data/arns}
- AO_CU_URL=${AO_CU_URL:-}
- AO_MU_URL=${AO_MU_URL:-}
- AO_GATEWAY_URL=${AO_GATEWAY_URL:-}
- AO_GRAPHQL_URL=${AO_GRAPHQL_URL:-}
volumes:
- ${ARNS_CACHE_PATH:-./data/arns}:/app/data/arns
networks:
- ar-io-network
Comment on lines -142 to -162
Collaborator Author

👋


litestream:
image: ghcr.io/ar-io/ar-io-litestream:${LITESTREAM_IMAGE_TAG:-latest}
build:
18 changes: 5 additions & 13 deletions src/config.ts
@@ -258,6 +258,8 @@ export const WEBHOOK_BLOCK_FILTER = createFilter(
// ArNS Resolution
//

export const ARNS_CACHE_TYPE = env.varOrDefault('ARNS_CACHE_TYPE', 'node');

export const ARNS_CACHE_TTL_SECONDS = +env.varOrDefault(
'ARNS_CACHE_TTL_SECONDS',
`${60 * 60}`, // 1 hour
@@ -269,23 +271,13 @@ export const ARNS_CACHE_MAX_KEYS = +env.varOrDefault(
);

export const ARNS_RESOLVER_PRIORITY_ORDER = env
.varOrDefault('ARNS_RESOLVER_PRIORITY_ORDER', 'resolver,on-demand,gateway')
.varOrDefault('ARNS_RESOLVER_PRIORITY_ORDER', 'on-demand,gateway')
.split(',');

// TODO: support multiple gateway urls
export const TRUSTED_ARNS_GATEWAY_URL = env.varOrDefault(
'TRUSTED_ARNS_GATEWAY_URL',
'https://__NAME__.arweave.dev',
);

// @deprecated - use ARNS_RESOLVER_PRIORITY_ORDER instead to specify the order
// of resolvers to try if the first one is not available.
export const TRUSTED_ARNS_RESOLVER_TYPE = env.varOrDefault(
'TRUSTED_ARNS_RESOLVER_TYPE',
'gateway',
);

export const TRUSTED_ARNS_RESOLVER_URL = env.varOrUndefined(
'TRUSTED_ARNS_RESOLVER_URL',
'https://__NAME__.arweave.net',
);

//
49 changes: 37 additions & 12 deletions src/init/resolvers.ts
@@ -16,30 +16,61 @@
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
import { Logger } from 'winston';
import { StandaloneArNSResolver } from '../resolution/standalone-arns-resolver.js';
import { OnDemandArNSResolver } from '../resolution/on-demand-arns-resolver.js';
import { TrustedGatewayArNSResolver } from '../resolution/trusted-gateway-arns-resolver.js';
import { NameResolver } from '../types.js';
import { KVBufferStore, NameResolver } from '../types.js';
import { AoIORead } from '@ar.io/sdk';
import { CompositeArNSResolver } from '../resolution/composite-arns-resolver.js';
import { RedisKvStore } from '../store/redis-kv-store.js';
import { NodeKvStore } from '../store/node-kv-store.js';
import { KvArnsStore } from '../store/kv-arns-store.js';

const supportedResolvers = ['on-demand', 'resolver', 'gateway'] as const;
const supportedResolvers = ['on-demand', 'gateway'] as const;
export type ArNSResolverType = (typeof supportedResolvers)[number];

export const isArNSResolverType = (type: string): type is ArNSResolverType => {
return supportedResolvers.includes(type as ArNSResolverType);
};

export const createArNSKvStore = ({
log,
type,
redisUrl,
ttlSeconds,
maxKeys,
}: {
type: 'redis' | 'node' | string;
log: Logger;
redisUrl: string;
ttlSeconds: number;
maxKeys: number;
}): KVBufferStore => {
log.info(`Using ${type} as KVBufferStore for arns`, {
type,
redisUrl,
ttlSeconds,
maxKeys,
});
if (type === 'redis') {
return new RedisKvStore({
log,
redisUrl,
ttlSeconds,
});
}
return new NodeKvStore({ ttlSeconds, maxKeys });
};

export const createArNSResolver = ({
log,
cache,
resolutionOrder,
standaloneArnResolverUrl,
trustedGatewayUrl,
networkProcess,
}: {
log: Logger;
cache: KvArnsStore;
resolutionOrder: (ArNSResolverType | string)[];
standaloneArnResolverUrl?: string;
trustedGatewayUrl?: string;
networkProcess?: AoIORead;
}): NameResolver => {
@@ -49,13 +80,6 @@ export const createArNSResolver = ({
log,
networkProcess,
}),
resolver:
standaloneArnResolverUrl !== undefined
? new StandaloneArNSResolver({
log,
resolverUrl: standaloneArnResolverUrl,
})
: undefined,
gateway:
trustedGatewayUrl !== undefined
? new TrustedGatewayArNSResolver({
@@ -82,5 +106,6 @@ export const createArNSResolver = ({
return new CompositeArNSResolver({
log,
resolvers,
cache,
});
};
15 changes: 15 additions & 0 deletions src/metrics.ts
@@ -180,6 +180,21 @@ export const redisErrorCounter = new promClient.Counter({
help: 'Number of errors redis cache has received',
});

export const arnsCacheHitCounter = new promClient.Counter({
name: 'arns_cache_hit_total',
help: 'Number of hits in the arns cache',
});

export const arnsCacheMissCounter = new promClient.Counter({
name: 'arns_cache_miss_total',
help: 'Number of misses in the arns cache',
});

export const arnsResolutionTime = new promClient.Summary({
name: 'arns_resolution_time_ms',
help: 'Time in ms it takes to resolve an arns name',
});

// Data source metrics

export const getDataErrorsTotal = new promClient.Counter({
7 changes: 5 additions & 2 deletions src/middleware/arns.ts
@@ -23,7 +23,7 @@ import { headerNames } from '../constants.js';
import { sendNotFound } from '../routes/data/handlers.js';
import { DATA_PATH_REGEX } from '../constants.js';
import { NameResolver } from '../types.js';

import * as metrics from '../metrics.js';
const EXCLUDED_SUBDOMAINS = new Set('www');

export const createArnsMiddleware = ({
@@ -53,7 +53,7 @@ export const createArnsMiddleware = ({
if (
EXCLUDED_SUBDOMAINS.has(arnsSubdomain) ||
// Avoid collisions with sandbox URLs by ensuring the subdomain length
// is below the mininimum length of a sandbox subdomain. Undernames are
// is below the minimum length of a sandbox subdomain. Undernames are
// an exception because they can be longer and '_' cannot appear in
// base32.
(arnsSubdomain.length > 48 && !arnsSubdomain.match(/_/))
@@ -67,15 +67,18 @@ export const createArnsMiddleware = ({
return;
}

const start = Date.now();
const { resolvedId, ttl, processId } =
await nameResolver.resolve(arnsSubdomain);
metrics.arnsResolutionTime.observe(Date.now() - start);
if (resolvedId === undefined) {
sendNotFound(res);
return;
}
res.header(headerNames.arnsResolvedId, resolvedId);
res.header(headerNames.arnsTtlSeconds, ttl.toString());
res.header(headerNames.arnsProcessId, processId);
// TODO: add a header for arns cache status
res.header('Cache-Control', `public, max-age=${ttl}`);
dataHandler(req, res, next);
});
52 changes: 44 additions & 8 deletions src/resolution/composite-arns-resolver.ts
@@ -16,34 +16,70 @@
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
import winston from 'winston';
import { NameResolution, NameResolver } from '../types.js';
import { KVBufferStore, NameResolution, NameResolver } from '../types.js';
import * as metrics from '../metrics.js';
import { KvArnsStore } from '../store/kv-arns-store.js';

export class CompositeArNSResolver implements NameResolver {
private log: winston.Logger;
private resolvers: NameResolver[];
private cache: KVBufferStore;

constructor({
log,
resolvers,
cache,
}: {
log: winston.Logger;
resolvers: NameResolver[];
cache: KvArnsStore;
}) {
this.log = log.child({ class: 'CompositeArNSResolver' });
this.log = log.child({ class: this.constructor.name });
this.resolvers = resolvers;
this.cache = cache;
}

async resolve(name: string): Promise<NameResolution> {
this.log.info('Resolving name...', { name });

try {
const cachedResolutionBuffer = await this.cache.get(name);
if (cachedResolutionBuffer) {
const cachedResolution: NameResolution = JSON.parse(
cachedResolutionBuffer.toString(),
);
if (
cachedResolution !== undefined &&
cachedResolution.resolvedAt !== undefined &&
cachedResolution.ttl !== undefined &&
cachedResolution.resolvedAt + cachedResolution.ttl * 1000 > Date.now()
) {
metrics.arnsCacheHitCounter.inc();
this.log.info('Cache hit for arns name', { name });
return cachedResolution;
}
}
metrics.arnsCacheMissCounter.inc();
this.log.info('Cache miss for arns name', { name });

for (const resolver of this.resolvers) {
this.log.debug('Attempting to resolve name with resolver', {
resolver,
});
const resolution = await resolver.resolve(name);
if (resolution.resolvedId !== undefined) {
return resolution;
try {
this.log.info('Attempting to resolve name with resolver', {
type: resolver.constructor.name,
name,
});
const resolution = await resolver.resolve(name);
if (resolution.resolvedId !== undefined) {
await this.cache.set(name, Buffer.from(JSON.stringify(resolution)));
Collaborator

I was thinking about whether we need to await the cache set here. For other caches we generally don't wait for the write to complete.
If we call resolve for the same name several times while the first result is still being cached, the subsequent calls will not wait for the cache anyway, right? If so, I don't think we should await the cache set.

What do you think?
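
One way to picture the non-awaited cache write discussed above is the sketch below. It is a minimal illustration, not the change made in this PR: `StringStore`, `setInBackground`, `memoryStore`, and `brokenStore` are hypothetical names, and values are plain strings rather than the `Buffer` values that `KVBufferStore` uses. The write is started but not awaited, and a `.catch` handler keeps a failed write from surfacing as an unhandled rejection.

```typescript
// Hypothetical, simplified stand-in for KVBufferStore (string values only).
interface StringStore {
  set(key: string, value: string): Promise<void>;
}

// Starts the write and returns immediately. A rejected write is handed to
// onError instead of becoming an unhandled promise rejection.
function setInBackground(
  store: StringStore,
  key: string,
  value: string,
  onError: (error: unknown) => void,
): void {
  void store.set(key, value).catch(onError);
}

// Demo stores (assumptions for illustration): one works, one always fails.
const written = new Map<string, string>();
const writeErrors: unknown[] = [];

const memoryStore: StringStore = {
  set: async (key, value) => {
    written.set(key, value);
  },
};
const brokenStore: StringStore = {
  set: async () => {
    throw new Error('store unavailable');
  },
};

const payload = JSON.stringify({ resolvedId: 'example-id', ttl: 3600 });
setInBackground(memoryStore, 'gateways', payload, (e) => writeErrors.push(e));
setInBackground(brokenStore, 'gateways', payload, (e) => writeErrors.push(e));
```

The trade-off is that a caller can return a resolution before it is cached, so a request arriving in that window may still miss the cache.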

Collaborator Author

@dtfiedler dtfiedler Sep 10, 2024

Good callout, removed in f1bd0cf.

While thinking about the implications of this, I also realized we aren't handling concurrent requests well. I'd like to propose adding a request cache in the ArNS middleware that protects against concurrent calls to nameResolver.resolve(name) by storing the promise in a NodeCache:

    // check if this instance is already in the process of resolving the requested name, and return that promise if so, otherwise set it in the cache
    const getArnsResolutionPromise = async (): Promise<NameResolution> => {
      if (arnsRequestCache.has(arnsSubdomain)) {
        const arnsResolutionPromise =
          arnsRequestCache.get<Promise<NameResolution>>(arnsSubdomain);
        if (arnsResolutionPromise) {
          return arnsResolutionPromise;
        }
      }
      const arnsResolutionPromise = nameResolver.resolve(arnsSubdomain);
      arnsRequestCache.set(arnsSubdomain, arnsResolutionPromise);
      return arnsResolutionPromise;
    };

    const start = Date.now();
    const { resolvedId, ttl, processId } =
      await getArnsResolutionPromise().finally(() => {
        // remove from cache after resolution
        arnsRequestCache.del(arnsSubdomain);
      });
    metrics.arnsResolutionTime.observe(Date.now() - start);
    if (resolvedId === undefined) {
      sendNotFound(res);
      return;
    }
    res.header(headerNames.arnsResolvedId, resolvedId);
    res.header(headerNames.arnsTtlSeconds, ttl.toString());
    res.header(headerNames.arnsProcessId, processId);
    // TODO: add a header for arns cache status
    res.header('Cache-Control', `public, max-age=${ttl}`);
    dataHandler(req, res, next);
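
The same idea can be sketched in a self-contained form. This is only an illustration under stated assumptions: it swaps the NodeCache for a plain `Map`, and `coalesce`, `ResolveFn`, `countingResolve`, and the trimmed `NameResolution` shape are hypothetical names, not the middleware's actual code.

```typescript
// Hypothetical, trimmed shape; the real NameResolution has more fields.
interface NameResolution {
  name: string;
  resolvedId: string | undefined;
  ttl: number;
}

type ResolveFn = (name: string) => Promise<NameResolution>;

// Wraps a resolve function so that concurrent calls for the same name share
// one in-flight promise. The entry is removed once the promise settles, so
// this deduplicates concurrent work but does not cache settled results.
function coalesce(resolve: ResolveFn): ResolveFn {
  const inFlight = new Map<string, Promise<NameResolution>>();
  return (name) => {
    const pending = inFlight.get(name);
    if (pending !== undefined) {
      return pending;
    }
    const promise = resolve(name).finally(() => {
      inFlight.delete(name);
    });
    inFlight.set(name, promise);
    return promise;
  };
}

// Demo: count how often the (hypothetical) upstream is actually called.
let upstreamCalls = 0;
const countingResolve: ResolveFn = async (name) => {
  upstreamCalls += 1;
  await Promise.resolve(); // stand-in for a slow network round trip
  return { name, resolvedId: `id-for-${name}`, ttl: 3600 };
};

const resolveOnce = coalesce(countingResolve);
const first = resolveOnce('gateways');
const second = resolveOnce('gateways'); // joins the in-flight promise
```

Because the entry is deleted as soon as the promise settles, a later call for the same name starts a new upstream request unless the resolver's own cache answers it.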

After adding this change, I did some tests using hey.

Test: 100 concurrent requests for an arns name

Before:

Each ArNS request created an independent promise to AO, resulting in a wide range of resolution times, with some requests hitting rate limiting/throttling and having to fall back to the TrustedGatewayArNSResolver.

❯ hey -n 100 -c 100 -t 60 -host 'gateways.example.com' http://localhost:4000

Summary:
  Total:        65.3634 secs
  Slowest:      65.3604 secs
  Fastest:      17.1049 secs
  Average:      31.1251 secs
  Requests/sec: 1.5299
  
  Total data:   172414200 bytes
  Size/request: 1724142 bytes

Response time histogram:
  17.105 [1]    |■
  21.930 [71]   |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  26.756 [3]    |■■
  31.582 [0]    |
  36.407 [0]    |
  41.233 [0]    |
  46.058 [0]    |
  50.884 [0]    |
  55.709 [0]    |
  60.535 [6]    |■■■
  65.360 [19]   |■■■■■■■■■■■


Latency distribution:
  10% in 19.6950 secs
  25% in 20.5840 secs
  50% in 21.3610 secs
  75% in 59.9175 secs
  90% in 63.3682 secs
  95% in 64.5496 secs
  99% in 65.3604 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0074 secs, 17.1049 secs, 65.3604 secs
  DNS-lookup:   0.0026 secs, 0.0016 secs, 0.0042 secs
  req write:    0.0002 secs, 0.0000 secs, 0.0020 secs
  resp wait:    27.6524 secs, 15.5326 secs, 65.3426 secs
  resp read:    3.4650 secs, 0.0111 secs, 6.5513 secs

Status code distribution:
  [200] 100 responses

Logs: these are printed 100 times (once per request), and several requests hit rate limits from AO infrastructure, forcing the CompositeArNSResolver to fall back to gateways to get resolution data.

core-1  | info: Attempting to resolve name with resolver {"class":"CompositeArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.052Z","type":"TrustedGatewayArNSResolver"}
core-1  | info: Resolving name... {"class":"TrustedGatewayArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.052Z"}
core-1  | warn: Unable to resolve name: fetch failed {"class":"OnDemandArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.054Z"}
core-1  | info: Attempting to resolve name with resolver {"class":"CompositeArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.054Z","type":"TrustedGatewayArNSResolver"}
core-1  | info: Resolving name... {"class":"TrustedGatewayArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.055Z"}
core-1  | warn: Unable to resolve name: fetch failed {"class":"OnDemandArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.056Z"}
core-1  | info: Attempting to resolve name with resolver {"class":"CompositeArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.056Z","type":"TrustedGatewayArNSResolver"}
core-1  | info: Resolving name... {"class":"TrustedGatewayArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.057Z"}
core-1  | warn: Unable to resolve name: fetch failed {"class":"OnDemandArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.060Z"}
core-1  | info: Attempting to resolve name with resolver {"class":"CompositeArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.060Z","type":"TrustedGatewayArNSResolver"}
core-1  | info: Resolving name... {"class":"TrustedGatewayArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.060Z"}
core-1  | warn: Unable to resolve name: fetch failed {"class":"OnDemandArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.064Z"}
core-1  | info: Attempting to resolve name with resolver {"class":"CompositeArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.065Z","type":"TrustedGatewayArNSResolver"}
core-1  | info: Resolving name... {"class":"TrustedGatewayArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.065Z"}
core-1  | warn: Unable to resolve name: fetch failed {"class":"OnDemandArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.069Z"}
core-1  | info: Attempting to resolve name with resolver {"class":"CompositeArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.069Z","type":"TrustedGatewayArNSResolver"}
core-1  | info: Resolving name... {"class":"TrustedGatewayArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.069Z"}
core-1  | warn: Unable to resolve name: fetch failed {"class":"OnDemandArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.074Z"}
core-1  | info: Attempting to resolve name with resolver {"class":"CompositeArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.074Z","type":"TrustedGatewayArNSResolver"}
core-1  | info: Resolving name... {"class":"TrustedGatewayArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.074Z"}
core-1  | warn: Unable to resolve name: fetch failed {"class":"OnDemandArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.078Z"}
core-1  | info: Attempting to resolve name with resolver {"class":"CompositeArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.078Z","type":"TrustedGatewayArNSResolver"}
core-1  | info: Resolving name... {"class":"TrustedGatewayArNSResolver","name":"joose","timestamp":"2024-09-10T14:19:59.078Z"}

After:

All 100 requests were satisfied by a single request to AO, and resolution times were approximately the same.

❯ hey -n 100 -c 100 -t 60 -host 'gateways.example.com' http://localhost:4000

Summary:
  Total:        44.0368 secs
  Slowest:      44.0354 secs
  Fastest:      43.6696 secs
  Average:      43.8166 secs
  Requests/sec: 2.2708
  
  Total data:   172414200 bytes
  Size/request: 1724142 bytes

Response time histogram:
  43.670 [1]    |■
  43.706 [0]    |
  43.743 [8]    |■■■■■■■■■■
  43.779 [22]   |■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  43.816 [31]   |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  43.853 [10]   |■■■■■■■■■■■■■
  43.889 [17]   |■■■■■■■■■■■■■■■■■■■■■■
  43.926 [3]    |■■■■
  43.962 [4]    |■■■■■
  43.999 [3]    |■■■■
  44.035 [1]    |■


Latency distribution:
  10% in 43.7457 secs
  25% in 43.7732 secs
  50% in 43.8027 secs
  75% in 43.8615 secs
  90% in 43.9001 secs
  95% in 43.9574 secs
  99% in 44.0354 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0091 secs, 43.6696 secs, 44.0354 secs
  DNS-lookup:   0.0021 secs, 0.0015 secs, 0.0029 secs
  req write:    0.0002 secs, 0.0000 secs, 0.0016 secs
  resp wait:    42.9208 secs, 42.9030 secs, 42.9412 secs
  resp read:    0.8865 secs, 0.7390 secs, 1.0992 secs

Status code distribution:
  [200] 100 responses

Logs: printed only once for all 100 requests; no fallback to the gateway resolver was necessary.

core-1  | info: Cache miss for arns name {"class":"CompositeArNSResolver","name":"gateways","timestamp":"2024-09-10T13:34:09.999Z"}
core-1  | info: Attempting to resolve name with resolver {"class":"CompositeArNSResolver","name":"gateways","timestamp":"2024-09-10T13:34:09.999Z","type":"OnDemandArNSResolver"}
core-1  | info: Resolving name... {"class":"OnDemandArNSResolver","name":"gateways","timestamp":"2024-09-10T13:34:10.000Z"}

This should also help avoid slamming AO on fresh or expired names, and it is likely a pattern we could extend to other request paths.

cc @djwhitt

this.log.info('Resolved name', { name, resolution });
return resolution;
}
} catch (error: any) {
this.log.error('Error resolving name with resolver', {
resolver,
message: error.message,
stack: error.stack,
});
}
}
Collaborator

Should we fall back to the cached resolution here, perhaps with some staleness threshold, if we have one?

Collaborator Author

Sure, I can add that. We can use the TTL of the cache as the staleness threshold: if the cache has an entry and we can't fetch anything new, we return what the cache has until it expires.

Collaborator Author

@dtfiedler dtfiedler Sep 10, 2024

Modified to return the cached resolution data on error, if we have it, in b9e4fe6.
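
A rough sketch of the stale-fallback behavior described in this thread follows. It is not the code from b9e4fe6: `resolveWithStaleFallback`, `CachedResolution`, `Resolver`, and the demo values are hypothetical, and a plain `Map` stands in for a TTL-bound KV store (which would evict entries on its own once the cache TTL passes).

```typescript
// Hypothetical, trimmed resolution shape used only for this sketch.
interface CachedResolution {
  resolvedId: string;
  ttl: number; // seconds the resolution is considered fresh
  resolvedAt: number; // epoch milliseconds when it was resolved
}

type Resolver = (name: string) => Promise<CachedResolution | undefined>;

// Same freshness rule as the diff: resolvedAt + ttl * 1000 > now.
const isFresh = (r: CachedResolution, now: number): boolean =>
  r.resolvedAt + r.ttl * 1000 > now;

async function resolveWithStaleFallback(
  name: string,
  cache: Map<string, CachedResolution>,
  resolvers: Resolver[],
  now: number = Date.now(),
): Promise<CachedResolution | undefined> {
  const cached = cache.get(name);
  if (cached !== undefined && isFresh(cached, now)) {
    return cached; // cache hit
  }
  for (const resolve of resolvers) {
    try {
      const resolution = await resolve(name);
      if (resolution !== undefined) {
        cache.set(name, resolution);
        return resolution;
      }
    } catch {
      // Try the next resolver; a real implementation would log the error.
    }
  }
  // Every resolver failed or returned nothing: serve the stale entry if the
  // store still holds one, otherwise undefined.
  return cached;
}

// Demo inputs (assumptions for illustration).
const staleCache = new Map<string, CachedResolution>([
  ['gateways', { resolvedId: 'old-id', ttl: 60, resolvedAt: 0 }],
]);
const failingResolver: Resolver = async () => {
  throw new Error('fetch failed');
};
const workingResolver: Resolver = async () => ({
  resolvedId: 'new-id',
  ttl: 60,
  resolvedAt: 120000,
});
```

With the store's own TTL acting as the staleness threshold, an expired-but-present entry is served only while every resolver is failing, and disappears once the store evicts it.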

this.log.warn('Unable to resolve name against all resolvers', { name });