-
After reading the autodidax manual, the JAX source code, and the unit tests for the persistent cache, I came up with this solution:

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax import jit
from jax._src import compilation_cache as cc
from jax._src import xla_bridge

@jit
def fun(x):
    print("tracing!")
    return x ** 2

x_input = jnp.array([1.0, 2.0])

# Compile and save the executable to disk; run this code only the first time.
cc.initialize_cache("mycache")
backend = xla_bridge.get_backend()
devices = np.array([[jax.local_devices()[0]]])
compile_options = xla_bridge.get_compile_options(
    num_replicas=1, num_partitions=1
)
computation = (
    jax.jit(fun)
    .lower(x_input)
    .compiler_ir()
)
executable = backend.compile(str(computation), compile_options)
cc.put_executable("myexecutable", "afun", executable, backend)

# After a Python restart, just run the following:
cc.initialize_cache("mycache")
backend = xla_bridge.get_backend()
compile_options = xla_bridge.get_compile_options(
    num_replicas=1, num_partitions=1
)
executable = cc.get_executable("myexecutable", compile_options, backend)
executable.execute([jnp.array([2.0, 3.0])])[0]
```

This loads the executable from disk and runs it directly, without invoking the tracers, so it is really fast. I am aware that the executable is compiled only for a specific input shape and will not work if the shape changes, but I can manage that by recompiling the function for different input shapes. One question though: is this solution safe to use?
-
Is it possible to write a persistent cache for jit tracers such that tracing won't be executed again after a Python restart, but will instead be retrieved from disk?
For example:
On the second run (after a Python restart) I get:

```
DEBUG:jax._src.dispatch:Finished tracing + transforming fun for pjit in 0.0014641284942626953 sec
```

If I read that correctly, this line means that tracing is performed. For my actual function the tracing takes a long time to finish, and I was wondering how I can store the tracing information so it won't be recomputed.
If I run the function again (without a Python restart), no tracing is performed this time.
In my actual case I have one function that takes many combinations of input dimensions, and tracing being re-run after each dimension change is troublesome: it takes many seconds, while the actual computation is done in milliseconds. Why do we have this mechanism at all when the persistent compiled cache is on?
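The behavior described here, where the persistent cache skips XLA compilation but not tracing, can be illustrated with a small JAX-free sketch of the two cache levels involved. All names are hypothetical: `trace` stands in for JAX's tracing step, `process_cache` for the in-memory jit cache that dies with the process, and `persistent_cache` for the on-disk compilation cache.

```python
# Toy model of the two cache levels discussed above: an in-process trace
# cache (lost on restart) in front of a persistent compile cache (survives
# restart). `trace` is a hypothetical stand-in for JAX's tracing step.

trace_events = []

persistent_cache = {}  # survives "restarts" in this toy model
process_cache = {}     # cleared on each "restart"

def trace(f, shape):
    """Pretend to trace `f` for a given input shape (the slow step)."""
    trace_events.append(shape)
    return ("traced", f.__name__, shape)

def jit_call(f, xs):
    shape = (len(xs),)
    key = (f.__name__, shape)
    if key not in process_cache:
        # Tracing is NOT skipped by the persistent cache: it must run
        # again in every new process before the cache key can be built.
        jaxpr = trace(f, shape)
        executable = persistent_cache.setdefault(key, ("compiled", jaxpr))
        process_cache[key] = executable
    return process_cache[key]

def fun(x):
    return x

jit_call(fun, [1.0, 2.0])
jit_call(fun, [3.0, 4.0])  # same shape: no new trace
process_cache.clear()      # simulate a Python restart
jit_call(fun, [1.0, 2.0])  # traced again, though the compile is cached
print(trace_events)        # → [(2,), (2,)]
```

This mirrors why the DEBUG line reappears after a restart: the compiled executable is cached on disk, but the trace cache lives only in memory, and tracing is what produces the key used to look up the persistent entry.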