Cannot deduce type of copy call void @llvm.memcpy.p10i8.p0i8.i64 #1547

Closed
vchuravy opened this issue Jun 20, 2024 · 12 comments

Comments

@vchuravy
Member

Reproducer:

git clone https://github.com/vchuravy/WaterLily.jl
cd WaterLily.jl/examples
git checkout vc/enzyme
# instantiate the local project environment
julia +1.10 --project=. -e 'using Pkg; Pkg.instantiate()'
julia +1.10 --project=. TandemFoilOptim.jl
ERROR: LoadError: Enzyme execution failed.
Enzyme cannot deduce type
Current scope: 
; Function Attrs: mustprogress willreturn
define internal fastcc void @preprocess_julia__make_foils_1_2261([6 x {} addrspace(10)*]* noalias nocapture nofree noundef nonnull writeonly sret([6 x {} addrspace(10)*]) align 8 dereferenceable(48) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,8]:Pointer, [-1,16]:Pointer, [-1,32]:Pointer}" %0, float "enzyme_type"="{[-1]:Float@float}" "enzymejl_parmtype"="138083780338720" "enzymejl_parmtype_ref"="0" %1) unnamed_addr #42 !dbg !725 {
; ...

Cannot deduce type of copy   call void @llvm.memcpy.p10i8.p0i8.i64(i8 addrspace(10)* noundef align 1 dereferenceable(7) %newstruct31.sroa.3.sroa.2.0.newstruct31.sroa.3.0..sroa_raw_idx.sroa_raw_idx, i8* noundef nonnull align 1 dereferenceable(7) %newstruct31.sroa.3.sroa.2.1.newstruct24.sroa.3.0.sroa_idx.sroa_idx, i64 noundef 7, i1 noundef false) #44, !dbg !85

Caused by:
Stacktrace:
 [1] Simulation
   @ ~/src/WaterLily/src/WaterLily.jl:65
 [2] #make_foils#1
   @ ~/src/WaterLily/examples/TandemFoilOptim.jl:24

Full log: https://gist.github.com/vchuravy/8e70c7ff38fd150f941fef6a7af6cc92

@wsmoses
Member

wsmoses commented Jun 21, 2024

The problem is that this type doesn't have any info in its typetree between bytes 16 and 24:

  %box34 = call noalias nonnull dereferenceable(240) "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Float@float, [-1,40]:Float@double, [-1,48]:Integer, [-1,49]:Integer, [-1,50]:Integer, [-1,51]:Integer, [-1,52]:Integer, [-1,53]:Integer, [-1,54]:Integer, [-1,55]:Integer, [-1,56]:Float@double, [-1,64]:Integer, [-1,65]:Integer, [-1,66]:Integer, [-1,67]:Integer, [-1,68]:Integer, [-1,69]:Integer, [-1,70]:Integer, [-1,71]:Integer, [-1,72]:Integer, [-1,73]:Integer, [-1,74]:Integer, [-1,75]:Integer, [-1,76]:Integer, [-1,77]:Integer, [-1,78]:Integer, [-1,79]:Integer, [-1,80]:Integer, [-1,81]:Integer, [-1,82]:Integer, [-1,83]:Integer, [-1,84]:Integer, [-1,85]:Integer, [-1,86]:Integer, [-1,87]:Integer, [-1,88]:Integer, [-1,89]:Integer, [-1,90]:Integer, [-1,91]:Integer, [-1,92]:Integer, [-1,93]:Integer, [-1,94]:Integer, [-1,95]:Integer, [-1,96]:Integer, [-1,97]:Integer, [-1,98]:Integer, [-1,99]:Integer, [-1,100]:Integer, [-1,101]:Integer, [-1,102]:Integer, [-1,103]:Integer, [-1,104]:Integer, [-1,105]:Integer, [-1,106]:Integer, [-1,107]:Integer, [-1,108]:Integer, [-1,109]:Integer, [-1,110]:Integer, [-1,111]:Integer, [-1,112]:Integer, [-1,113]:Integer, [-1,114]:Integer, [-1,115]:Integer, [-1,116]:Integer, [-1,117]:Integer, [-1,118]:Integer, [-1,119]:Integer, [-1,120]:Integer, [-1,121]:Integer, [-1,122]:Integer, [-1,123]:Integer, [-1,124]:Integer, [-1,125]:Integer, [-1,126]:Integer, [-1,127]:Integer, [-1,128]:Integer, [-1,136]:Integer, [-1,137]:Integer, [-1,138]:Integer, [-1,139]:Integer, [-1,140]:Integer, [-1,141]:Integer, [-1,142]:Integer, [-1,143]:Integer, [-1,144]:Float@float, [-1,152]:Float@double, [-1,160]:Integer, [-1,161]:Integer, [-1,162]:Integer, [-1,163]:Integer, [-1,164]:Integer, 
[-1,165]:Integer, [-1,166]:Integer, [-1,167]:Integer, [-1,168]:Float@double, [-1,176]:Integer, [-1,177]:Integer, [-1,178]:Integer, [-1,179]:Integer, [-1,180]:Integer, [-1,181]:Integer, [-1,182]:Integer, [-1,183]:Integer, [-1,184]:Integer, [-1,185]:Integer, [-1,186]:Integer, [-1,187]:Integer, [-1,188]:Integer, [-1,189]:Integer, [-1,190]:Integer, [-1,191]:Integer, [-1,192]:Integer, [-1,193]:Integer, [-1,194]:Integer, [-1,195]:Integer, [-1,196]:Integer, [-1,197]:Integer, [-1,198]:Integer, [-1,199]:Integer, [-1,200]:Integer, [-1,201]:Integer, [-1,202]:Integer, [-1,203]:Integer, [-1,204]:Integer, [-1,205]:Integer, [-1,206]:Integer, [-1,207]:Integer, [-1,208]:Integer, [-1,209]:Integer, [-1,210]:Integer, [-1,211]:Integer, [-1,212]:Integer, [-1,213]:Integer, [-1,214]:Integer, [-1,215]:Integer, [-1,216]:Integer, [-1,217]:Integer, [-1,218]:Integer, [-1,219]:Integer, [-1,220]:Integer, [-1,221]:Integer, [-1,222]:Integer, [-1,223]:Integer, [-1,224]:Integer, [-1,225]:Integer, [-1,226]:Integer, [-1,227]:Integer, [-1,228]:Integer, [-1,229]:Integer, [-1,230]:Integer, [-1,231]:Integer, [-1,232]:Integer, [-1,233]:Integer, [-1,234]:Integer, [-1,235]:Integer, [-1,236]:Integer, [-1,237]:Integer, [-1,238]:Integer, [-1,239]:Integer}" {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1, i64 noundef 240, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 137776915697616 to {}*) to {} addrspace(10)*)) #46, !dbg !757
  %35 = bitcast {} addrspace(10)* %box34 to i8 addrspace(10)*, !dbg !757


julia> obj(137776915697616)
AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}
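One rough way to locate such a hole is to scan an `enzyme_type` string for byte offsets that no entry covers. This is a sketch under stated assumptions. `uncovered_bytes` is a hypothetical helper, not an Enzyme.jl API. The entry widths (1 byte per `Integer`, 4 for `Float@float`, 8 for `Float@double` and `Pointer`) are inferred from the dumps in this issue, not taken from Enzyme's documentation. Applied to bytes 8 through 31 of the `%box34` annotation, the helper reports bytes 17 through 23. That is seven bytes, the same length as the memcpy in the error.

```julia
# ASSUMPTION: widths inferred from the typetree dumps in this issue.
const ENTRY_WIDTH = Dict("Integer" => 1, "Float@float" => 4,
                         "Float@double" => 8, "Pointer" => 8)

# Hypothetical helper: byte offsets in [lo, hi) that no top-level entry
# of the form "[-1,OFF]:KIND" covers. Unknown kinds count as 1 byte.
function uncovered_bytes(tree::AbstractString, lo::Int, hi::Int)
    covered = falses(hi - lo)
    # Matches only direct children such as "[-1,16]:Integer",
    # not nested entries such as "[-1,0,-1]:Float@float".
    for m in eachmatch(r"\[-1,(\d+)\]:([A-Za-z@]+)", tree)
        off = parse(Int, m.captures[1])
        width = get(ENTRY_WIDTH, m.captures[2], 1)
        for b in off:off+width-1
            lo <= b < hi && (covered[b - lo + 1] = true)
        end
    end
    return [lo + i - 1 for i in eachindex(covered) if !covered[i]]
end

# Excerpt of the %box34 annotation above, bytes 8..31.
excerpt = "[-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, " *
          "[-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, " *
          "[-1,16]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, " *
          "[-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, " *
          "[-1,31]:Integer"

println(uncovered_bytes(excerpt, 8, 32))  # prints [17, 18, 19, 20, 21, 22, 23]
```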

@wsmoses
Member

wsmoses commented Jun 21, 2024

Okay, I'm deeply confused by this memcpy of 7 bytes. Why is this happening? Where does it come from?

@wsmoses
Member

wsmoses commented Jun 21, 2024

Logs of relevance, so we don't need to rerun:

julia> obj(x) = Base.unsafe_pointer_to_objref(Base.reinterpret(Ptr{Cvoid}, x))
obj (generic function with 1 method)

julia> obj(137776915697616)
AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}

julia> T =obj(137776915697616)
AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}

julia> fieldtypes(T)
(WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}})

julia> fieldoffsets(T)
ERROR: UndefVarError: `fieldoffsets` not defined
Stacktrace:
 [1] top-level scope
   @ REPL[10]:1

julia> fieldoffset(T, 1)
0x0000000000000000

julia> fieldoffset(T, 2)
0x0000000000000080

julia> T = fieldtypes(T)[1]
WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}

julia> fieldtypes(T)
(Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}})

julia> fieldtypes(T, 1)
ERROR: MethodError: no method matching fieldtypes(::Type{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, ::Int64)

Closest candidates are:
  fieldtypes(::Type)
   @ Base reflection.jl:919

Stacktrace:
 [1] top-level scope
   @ REPL[15]:1

julia> fieldoffset(T, 1)
0x0000000000000000

julia> fieldoffset(T, 2)
0x0000000000000008

julia> fieldoffset(T, 3)
0x0000000000000010

julia> fieldoffset(T, 4)
ERROR: BoundsError: attempt to access DataType at index [4]
Stacktrace:
 [1] fieldoffset(x::DataType, idx::Int64)
   @ Base ./reflection.jl:779
 [2] top-level scope
   @ REPL[19]:1

julia> S = fieldtype(T, 3)
var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}

julia> size(S)
ERROR: MethodError: no method matching size(::Type{var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}})

Closest candidates are:
  size(::LLVM.FunctionBlockSet)
   @ LLVM ~/.julia/packages/LLVM/6cDbl/src/core/function.jl:129
  size(::BitVector)
   @ Base bitarray.jl:104
  size(::BitVector, ::Integer)
   @ Base bitarray.jl:107
  ...

Stacktrace:
 [1] top-level scope
   @ REPL[21]:1

julia> sizeof(S)
112

julia> using LLVM
 │ Package LLVM not found, but a package named LLVM is available from a registry. 
 │ Install package?
 │   (examples) pkg> add LLVM 
 └ (y/n/o) [y]: y
   Resolving package versions...
    Updating `~/git/Enzyme.jl/WaterLily.jl/examples/Project.toml`
  [929cbde3] + LLVM v7.2.1
  No Changes to `~/git/Enzyme.jl/WaterLily.jl/examples/Manifest.toml`
Precompiling project...
  ✗ GLMakie
  74 dependencies successfully precompiled in 59 seconds. 296 already precompiled.
  3 dependencies had output during precompilation:
┌ WaterLily → WaterLilyWriteVTKExt
│  ┌ Warning: 
│  │ Using WaterLily in serial (ie. JULIA_NUM_THREADS=1) is not recommended because it disables the GPU backend and defaults to serial CPU.
│  │ Use JULIA_NUM_THREADS=auto, or any number of threads greater than 1, to allow multi-threading in CPU or GPU backends.
│  └ @ WaterLily ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:142
└  
┌ WaterLily → WaterLilyCUDAExt
│  ┌ Warning: 
│  │ Using WaterLily in serial (ie. JULIA_NUM_THREADS=1) is not recommended because it disables the GPU backend and defaults to serial CPU.
│  │ Use JULIA_NUM_THREADS=auto, or any number of threads greater than 1, to allow multi-threading in CPU or GPU backends.
│  └ @ WaterLily ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:142
└  
┌ WaterLily → WaterLilyReadVTKExt
│  ┌ Warning: 
│  │ Using WaterLily in serial (ie. JULIA_NUM_THREADS=1) is not recommended because it disables the GPU backend and defaults to serial CPU.
│  │ Use JULIA_NUM_THREADS=auto, or any number of threads greater than 1, to allow multi-threading in CPU or GPU backends.
│  └ @ WaterLily ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:142
└  
  1 dependency errored.
  For a report of the errors see `julia> err`. To retry use `pkg> precompile`

julia> ctx = LLVM.Context()
LLVM.Context(0x0000000005bc7470, typed ptrs)

julia> tt(T) = string(Enzyme.typetree(T, ctx, ""))
tt (generic function with 1 method)

julia> tt(S)
"{[0]:Integer, [8]:Integer, [9]:Integer, [10]:Integer, [11]:Integer, [12]:Integer, [13]:Integer, [14]:Integer, [15]:Integer, [16]:Float@float, [24]:Float@double, [32]:Integer, [33]:Integer, [34]:Integer, [35]:Integer, [36]:Integer, [37]:Integer, [38]:Integer, [39]:Integer, [40]:Float@double, [48]:Integer, [49]:Integer, [50]:Integer, [51]:Integer, [52]:Integer, [53]:Integer, [54]:Integer, [55]:Integer, [56]:Integer, [57]:Integer, [58]:Integer, [59]:Integer, [60]:Integer, [61]:Integer, [62]:Integer, [63]:Integer, [64]:Integer, [65]:Integer, [66]:Integer, [67]:Integer, [68]:Integer, [69]:Integer, [70]:Integer, [71]:Integer, [72]:Integer, [73]:Integer, [74]:Integer, [75]:Integer, [76]:Integer, [77]:Integer, [78]:Integer, [79]:Integer, [80]:Integer, [81]:Integer, [82]:Integer, [83]:Integer, [84]:Integer, [85]:Integer, [86]:Integer, [87]:Integer, [88]:Integer, [89]:Integer, [90]:Integer, [91]:Integer, [92]:Integer, [93]:Integer, [94]:Integer, [95]:Integer, [96]:Integer, [97]:Integer, [98]:Integer, [99]:Integer, [100]:Integer, [101]:Integer, [102]:Integer, [103]:Integer, [104]:Integer, [105]:Integer, [106]:Integer, [107]:Integer, [108]:Integer, [109]:Integer, [110]:Integer, [111]:Integer}"

julia> fieldtypes(S)
(Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}})

julia> fieldoffset(S, 1)
0x0000000000000000

julia> fieldoffset(S, 2)
0x0000000000000008

julia> fieldoffset(S, 3)
0x0000000000000010

julia> fieldoffset(S, 4)
0x0000000000000018

julia> Int(fieldoffset(S, 4))
24

julia> Int(fieldoffset(S, 3))
16

julia> fieldtypes(S)[3]
Float32
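The `fieldoffset`/`sizeof` session above can be condensed into a helper that lists the padding holes between the fields of `S`. This is a minimal sketch. `padding_gaps` and `MapLike` are hypothetical names. `MapLike` replaces each `SVector{2, Rational{Int64}}` with `NTuple{2, Rational{Int64}}`, on the assumption that both have the same 32-byte layout, so the example needs no StaticArrays. The two gaps it reports (bytes 1 to 7 and 20 to 23) are exactly the offsets that have no entry in `tt(S)` above.

```julia
# Hypothetical stand-in for var"#map#2"{Bool, Int64, Float32, Float64, Int64,
# Float64, SVector{2,Rational{Int64}}, SVector{2,Rational{Int64}}}.
struct MapLike
    flag::Bool
    i1::Int64
    x32::Float32
    x64::Float64
    i2::Int64
    y64::Float64
    p::NTuple{2,Rational{Int64}}
    q::NTuple{2,Rational{Int64}}
end

# (first_byte, length) of every top-level gap not occupied by a field.
# Padding nested inside a field's own type is not inspected.
function padding_gaps(T::DataType)
    gaps = Tuple{Int,Int}[]
    cursor = 0                                  # first byte not yet covered
    for i in 1:fieldcount(T)
        off = Int(fieldoffset(T, i))
        off > cursor && push!(gaps, (cursor, off - cursor))
        cursor = off + sizeof(fieldtype(T, i))
    end
    sizeof(T) > cursor && push!(gaps, (cursor, sizeof(T) - cursor))
    return gaps
end

println(sizeof(MapLike))         # prints 112, same as sizeof(S) above
println(padding_gaps(MapLike))   # prints [(1, 7), (20, 4)]
```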

@vchuravy
Member Author

> Why is this happening? Where does it come from?

This is likely LLVM optimizing a copy loop, but I don't know why it copies 7 bytes and not 9.
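One possible reading of the 7 comes from the nested layout. This is an unconfirmed interpretation of the dumps, not a diagnosis from the thread. Inside `AutoBody`, the `comp` struct starts at offset 0 and its third field (the `map` closure) starts at offset 16. That closure's leading `Bool` therefore sits at absolute byte 16, and the next `Int64` starts at byte 24. This leaves 24 - 16 - 1 = 7 bytes of padding, which is the 16-to-24 range flagged earlier.

The sketch below flattens nested isbits structs into absolute leaf offsets and lists every padding run. `MapLayout`, `CompLayout`, `BodyLayout`, `leaves!` and `nested_gaps` are hypothetical names. `SVector{2,Rational{Int64}}` is modelled as `NTuple{2,Rational{Int64}}`, and `var"#sdf#3"{Int64}` as a plain 8-byte `Int64`, matching the offsets 8 and 16 shown above. The reported holes line up with the offsets missing from the `%box34` typetree.

```julia
# Hypothetical stand-ins mirroring the nesting AutoBody -> comp -> map.
struct MapLayout
    flag::Bool
    i1::Int64
    x32::Float32
    x64::Float64
    i2::Int64
    y64::Float64
    p::NTuple{2,Rational{Int64}}
    q::NTuple{2,Rational{Int64}}
end
struct CompLayout
    flag::Bool
    sdf::Int64            # ASSUMPTION: stands in for var"#sdf#3"{Int64}
    map::MapLayout
end
struct BodyLayout
    comp::CompLayout
    map::MapLayout
end

# Recursively collect (absolute_offset, leaf_type) for every primitive leaf.
function leaves!(out, T::DataType, base::Int = 0)
    if fieldcount(T) == 0                 # Bool, Int64, Float32, ...
        push!(out, (base, T))
    else
        for i in 1:fieldcount(T)
            leaves!(out, fieldtype(T, i), base + Int(fieldoffset(T, i)))
        end
    end
    return out
end

# Absolute (first_byte, length) of every byte range that no leaf occupies.
function nested_gaps(T::DataType)
    ls = leaves!(Tuple{Int,DataType}[], T)
    gaps = Tuple{Int,Int}[]
    for k in eachindex(ls)
        stop = ls[k][1] + sizeof(ls[k][2])
        nxt = k < length(ls) ? ls[k+1][1] : sizeof(T)
        nxt > stop && push!(gaps, (stop, nxt - stop))
    end
    return gaps
end

println(sizeof(BodyLayout))        # prints 240, like dereferenceable(240) on %box34
println(nested_gaps(BodyLayout))   # prints [(1, 7), (17, 7), (36, 4), (129, 7), (148, 4)]
```

Note that there are three 7-byte holes (after each `Bool`). Which one the memcpy touches is not established here. Only the hole at bytes 17 to 23 matches the range named above.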

@wsmoses
Member

wsmoses commented Sep 28, 2024

Okay, I've fixed the actual issues at hand.

However, now it... segfaults.

@wsmoses
Member

wsmoses commented Sep 28, 2024

This is now resolved on main, both the original error and the segfault. However, the full code still doesn't run, because Enzyme's cache algorithm gets confused:

(base) wmoses-macbookpro2:examples wmoses$ julia --project=. TandemFoilOptim.jl 
┌ Warning: 
│ Using WaterLily in serial (ie. JULIA_NUM_THREADS=1) is not recommended because it disables the GPU backend and defaults to serial CPU.
│ Use JULIA_NUM_THREADS=auto, or any number of threads greater than 1, to allow multi-threading in CPU or GPU backends.
└ @ WaterLily ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:142
ERROR: LoadError: Enzyme compilation failed.
Current scope: 
; Function Attrs: mustprogress nofree willreturn
define internal fastcc void @preprocess_julia___kern_421_131_9980({ [2 x i64], {} addrspace(10)* } addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(24) "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer, [-1,4]:Integer, [-1,5]:Integer, [-1,6]:Integer, [-1,7]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Pointer, [-1,16,-1]:Pointer}" "enzymejl_parmtype"="5475090832" "enzymejl_parmtype_ref"="1" %0, {} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %1, i64 signext "enzyme_inactive" "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="4855344304" "enzymejl_parmtype_ref"="0" %2, {} addrspace(10)* nocapture nofree noundef nonnull readonly align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, 
[-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %3) unnamed_addr #22 !dbg !265 {
top:
  %4 = call {}*** @julia.get_pgcstack() #26
  %ptls_field72 = getelementptr inbounds {}**, {}*** %4, i64 2
  %5 = bitcast {}*** %ptls_field72 to i64***
  %ptls_load7374 = load i64**, i64*** %5, align 8, !tbaa !12
  %6 = getelementptr inbounds i64*, i64** %ptls_load7374, i64 2
  %safepoint = load i64*, i64** %6, align 8, !tbaa !16
  fence syncscope("singlethread") seq_cst
  call void @julia.safepoint(i64* %safepoint) #26, !dbg !266
  fence syncscope("singlethread") seq_cst
  %7 = getelementptr inbounds { [2 x i64], {} addrspace(10)* }, { [2 x i64], {} addrspace(10)* } addrspace(11)* %0, i64 0, i32 0, i64 1, !dbg !267
  %unbox2 = load i64, i64 addrspace(11)* %7, align 8, !dbg !271, !tbaa !16, !alias.scope !64, !noalias !65, !enzyme_inactive !0
  %8 = add i64 %unbox2, -1, !dbg !271
  %9 = call i64 @llvm.smax.i64(i64 %8, i64 noundef 1) #26, !dbg !273
  %10 = icmp ult i64 %9, 2, !dbg !276
  br i1 %10, label %L208, label %L36.preheader, !dbg !280

L36.preheader:                                    ; preds = %top
  %11 = getelementptr inbounds { [2 x i64], {} addrspace(10)* }, { [2 x i64], {} addrspace(10)* } addrspace(11)* %0, i64 0, i32 0, i64 0, !dbg !267
  %unbox = load i64, i64 addrspace(11)* %11, align 8, !dbg !271, !tbaa !16, !alias.scope !64, !noalias !65
  %12 = add i64 %unbox, -1, !dbg !271
  %13 = call i64 @llvm.smax.i64(i64 %12, i64 noundef 1) #26, !dbg !281
  %14 = icmp ult i64 %13, 2
  %.not76 = icmp eq i64 %2, 1
  %.not77 = icmp eq i64 %2, 2
  %15 = select i1 %.not77, i64 -2, i64 -1
  %.phi.trans.insert64 = addrspacecast {} addrspace(10)* %3 to {} addrspace(10)* addrspace(11)*
  %arraysize_ptr.phi.trans.insert = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %.phi.trans.insert64, i64 3
  %.phi.trans.insert65 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr.phi.trans.insert to i64 addrspace(11)*
  %arraysize_ptr31.phi.trans.insert = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %.phi.trans.insert64, i64 4
  %.phi.trans.insert68 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr31.phi.trans.insert to i64 addrspace(11)*
  %16 = addrspacecast {} addrspace(10)* %3 to float addrspace(13)* addrspace(11)*
  %17 = add i64 %2, -1
  %18 = addrspacecast {} addrspace(10)* %1 to {} addrspace(10)* addrspace(11)*
  %arraysize_ptr47 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %18, i64 3
  %19 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr47 to i64 addrspace(11)*
  %arraysize_ptr50 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %18, i64 4
  %20 = bitcast {} addrspace(10)* addrspace(11)* %arraysize_ptr50 to i64 addrspace(11)*
  %21 = addrspacecast {} addrspace(10)* %1 to float addrspace(13)* addrspace(11)*
  %22 = select i1 %.not76, i64 2, i64 3
  %23 = add nsw i64 %13, -2
  br label %L36, !dbg !282

L36:                                              ; preds = %L187, %L36.preheader
  %iv = phi i64 [ %iv.next, %L187 ], [ 0, %L36.preheader ]
  %24 = shl nuw i64 %iv, 1, !dbg !282
  %25 = add i64 %24, 2, !dbg !282
  %26 = add nuw i64 %iv, 2, !dbg !282
  %iv.next = add nuw nsw i64 %iv, 1, !dbg !282
  br i1 %14, label %L187, label %L47.lr.ph, !dbg !282

L47.lr.ph:                                        ; preds = %L36
  %27 = shl nuw i64 %26, 1
  %28 = add i64 %27, -2
  %29 = add i64 %27, %15
  %.not79 = icmp sgt i64 %28, %29
  %30 = add i64 %27, -3
  %value_phi16 = select i1 %.not79, i64 %30, i64 %29
  %31 = icmp sgt i64 %28, %value_phi16
  %arraysize.pre = load i64, i64 addrspace(11)* %.phi.trans.insert65, align 8, !enzyme_inactive !0
  %arraysize32.pre = load i64, i64 addrspace(11)* %.phi.trans.insert68, align 16, !enzyme_inactive !0
  %arrayptr.pre80 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %16, align 16
  %32 = mul i64 %arraysize32.pre, %17
  %33 = add i64 %32, -1
  %arraysize48 = load i64, i64 addrspace(11)* %19, align 8, !enzyme_inactive !0
  %34 = add nsw i64 %26, -1
  %arraysize51 = load i64, i64 addrspace(11)* %20, align 16, !enzyme_inactive !0
  %arrayptr5482 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %21, align 16
  %35 = mul i64 %arraysize51, %17
  %reass.add85 = add i64 %34, %35
  %reass.mul86 = mul i64 %reass.add85, %arraysize48
  br label %L66, !dbg !283

L66:                                              ; preds = %L178, %L47.lr.ph
  %iv1 = phi i64 [ %iv.next2, %L178 ], [ 0, %L47.lr.ph ]
  %36 = shl nuw i64 %iv1, 1, !dbg !284
  %37 = add i64 %36, 2, !dbg !284
  %iv.next2 = add nuw nsw i64 %iv1, 1, !dbg !284
  %38 = shl nuw i64 %iv1, 1, !dbg !287
  %39 = add nuw i64 %38, 2, !dbg !295
  %40 = add nuw i64 %22, %38, !dbg !295
  %.not78 = icmp sgt i64 %39, %40, !dbg !298
  %41 = or i64 %38, 1, !dbg !300
  %value_phi15 = select i1 %.not78, i64 %41, i64 %40, !dbg !300
  %42 = icmp sgt i64 %39, %value_phi15, !dbg !306
  %not. = or i1 %31, %42, !dbg !309
  br i1 %not., label %L178, label %L130.outer.preheader, !dbg !292

L130.outer.preheader:                             ; preds = %L66
  br label %L130.outer, !dbg !310

L130.outer:                                       ; preds = %L130.outer.preheader, %L148
  %iv3 = phi i64 [ 0, %L130.outer.preheader ], [ %iv.next4, %L148 ]
  %value_phi30.ph = phi float [ %48, %L148 ], [ 0.000000e+00, %L130.outer.preheader ]
  %43 = add i64 %25, %iv3
  %iv.next4 = add nuw nsw i64 %iv3, 1
  %reass.add = add i64 %33, %43
  %reass.mul = mul i64 %reass.add, %arraysize.pre
  %44 = add i64 %reass.mul, -1
  br label %L130, !dbg !310

L130:                                             ; preds = %L130, %L130.outer
  %iv5 = phi i64 [ %iv.next6, %L130 ], [ 0, %L130.outer ]
  %value_phi30 = phi float [ %48, %L130 ], [ %value_phi30.ph, %L130.outer ]
  %45 = add i64 %37, %iv5, !dbg !313
  %iv.next6 = add nuw nsw i64 %iv5, 1, !dbg !313
  %46 = add i64 %44, %45, !dbg !313
  %47 = getelementptr inbounds float, float addrspace(13)* %arrayptr.pre80, i64 %46, !dbg !313
  %arrayref = load float, float addrspace(13)* %47, align 4, !dbg !313, !tbaa !134, !alias.scope !31, !noalias !34
  %48 = fadd fast float %arrayref, %value_phi30, !dbg !316
  %49 = add i64 %45, 1, !dbg !317
  %50 = icmp sgt i64 %39, %49, !dbg !319
  %51 = icmp sgt i64 %49, %value_phi15, !dbg !319
  %52 = or i1 %50, %51, !dbg !310
  %53 = icmp eq i64 %45, %value_phi15
  %or.cond = or i1 %53, %52, !dbg !310
  br i1 %or.cond, label %L148, label %L130, !dbg !310

L148:                                             ; preds = %L130
  %54 = add i64 %43, 1, !dbg !322
  %55 = icmp sle i64 %28, %54, !dbg !325
  %56 = icmp sle i64 %54, %value_phi16, !dbg !325
  %57 = and i1 %55, %56, !dbg !329
  %58 = icmp ne i64 %43, %value_phi16, !dbg !328
  %extract.t = and i1 %58, %57, !dbg !330
  br i1 %extract.t, label %L130.outer, label %L178.loopexit, !dbg !312

L178.loopexit:                                    ; preds = %L148
  br label %L178, !dbg !331

L178:                                             ; preds = %L178.loopexit, %L66
  %value_phi46 = phi float [ 0.000000e+00, %L66 ], [ %48, %L178.loopexit ]
  %59 = fmul fast float %value_phi46, 5.000000e-01, !dbg !331
  %60 = add i64 %iv.next2, %reass.mul86, !dbg !333
  %61 = getelementptr inbounds float, float addrspace(13)* %arrayptr5482, i64 %60, !dbg !333
  store float %59, float addrspace(13)* %61, align 4, !dbg !333, !tbaa !134, !alias.scope !31, !noalias !335
  %exitcond.not = icmp eq i64 %iv1, %23, !dbg !338
  br i1 %exitcond.not, label %L187.loopexit, label %L66, !dbg !283, !llvm.loop !339

L187.loopexit:                                    ; preds = %L178
  br label %L187, !dbg !340

L187:                                             ; preds = %L187.loopexit, %L36
  %62 = add nuw i64 %26, 1, !dbg !340
  %63 = icmp slt i64 %62, 2, !dbg !344
  %64 = icmp sgt i64 %62, %9, !dbg !344
  %65 = icmp eq i64 %26, %9, !dbg !347
  %not.not.84 = or i1 %63, %64, !dbg !347
  %narrow83 = or i1 %65, %not.not.84, !dbg !347
  br i1 %narrow83, label %L208.loopexit, label %L36, !dbg !343

L208.loopexit:                                    ; preds = %L187
  br label %L208, !dbg !270

L208:                                             ; preds = %L208.loopexit, %top
  ret void, !dbg !270
}

Illegal replace ficticious phi for:   %unbox_replacementA = phi i64 , !dbg !21 of   %unbox = load i64, i64 addrspace(11)* %11, align 8, !dbg !28, !tbaa !16, !alias.scope !34, !noalias !37
; Function Attrs: mustprogress nofree willreturn
define internal fastcc void @diffejulia___kern_421_131_9980({ [2 x i64], {} addrspace(10)* } addrspace(11)* nocapture nofree readonly align 8 dereferenceable(24) "enzyme_type"="{[-1]:Pointer, [-1,0]:Integer, [-1,1]:Integer, [-1,2]:Integer, [-1,3]:Integer, [-1,4]:Integer, [-1,5]:Integer, [-1,6]:Integer, [-1,7]:Integer, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Pointer, [-1,16,-1]:Pointer}" "enzymejl_parmtype"="5475090832" "enzymejl_parmtype_ref"="1" %0, {} addrspace(10)* nocapture nofree readonly align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %1, {} addrspace(10)* nocapture nofree align 16 "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" 
"enzymejl_parmtype_ref"="2" %"'", i64 signext "enzyme_inactive" "enzyme_type"="{[-1]:Integer}" "enzymejl_parmtype"="4855344304" "enzymejl_parmtype_ref"="0" %2, {} addrspace(10)* nocapture nofree readonly align 16 dereferenceable(40) "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %3, {} addrspace(10)* nocapture nofree align 16 "enzyme_type"="{[-1]:Pointer, [-1,0]:Pointer, [-1,0,-1]:Float@float, [-1,8]:Integer, [-1,9]:Integer, [-1,10]:Integer, [-1,11]:Integer, [-1,12]:Integer, [-1,13]:Integer, [-1,14]:Integer, [-1,15]:Integer, [-1,16]:Integer, [-1,17]:Integer, [-1,18]:Integer, [-1,19]:Integer, [-1,20]:Integer, [-1,21]:Integer, [-1,22]:Integer, [-1,23]:Integer, [-1,24]:Integer, [-1,25]:Integer, [-1,26]:Integer, [-1,27]:Integer, [-1,28]:Integer, [-1,29]:Integer, [-1,30]:Integer, [-1,31]:Integer, [-1,32]:Integer, [-1,33]:Integer, [-1,34]:Integer, [-1,35]:Integer, [-1,36]:Integer, [-1,37]:Integer, [-1,38]:Integer, [-1,39]:Integer}" "enzymejl_parmtype"="4548309328" "enzymejl_parmtype_ref"="2" %"'1", { i64, i64, i64*, i64** } %tapeArg) unnamed_addr #22 !dbg !631 {
top:
  %4 = call {}*** @julia.get_pgcstack() #26
  %ptls_field72_replacementA = phi {}*** 
  %_replacementA14 = phi i64*** 
  %ptls_load7374_replacementA = phi i64** 
  %_replacementA13 = phi i64** 
  %safepoint_replacementA = phi i64* 
  %_replacementA12 = phi i64 addrspace(11)* , !dbg !632
  %unbox2_replacementA = phi i64 , !dbg !636
  %_replacementA = phi i64 , !dbg !636
  %5 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0, !dbg !638
  %6 = icmp ult i64 %5, 2, !dbg !641
  br i1 %6, label %L208, label %L36.preheader, !dbg !645

L36.preheader:                                    ; preds = %top
  %_replacementA21 = phi i64 addrspace(11)* , !dbg !632
  %unbox_replacementA = phi i64 , !dbg !636
  %_replacementA20 = phi i64 , !dbg !636
  %7 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1, !dbg !646
  %8 = icmp ult i64 %7, 2
  %.not76 = icmp eq i64 %2, 1
  %.not77 = icmp eq i64 %2, 2
  %9 = select i1 %.not77, i64 -2, i64 -1
  %.phi.trans.insert64_replacementA = phi {} addrspace(10)* addrspace(11)* 
  %arraysize_ptr.phi.trans.insert_replacementA = phi {} addrspace(10)* addrspace(11)* 
  %arraysize_ptr31.phi.trans.insert_replacementA = phi {} addrspace(10)* addrspace(11)* 
  %.phi.trans.insert68_replacementA = phi i64 addrspace(11)* 
  %"'ipc26" = addrspacecast {} addrspace(10)* %"'1" to float addrspace(13)* addrspace(11)*
  %10 = addrspacecast {} addrspace(10)* %3 to float addrspace(13)* addrspace(11)*
  %_replacementA19 = phi i64 
  %_replacementA18 = phi {} addrspace(10)* addrspace(11)* 
  %arraysize_ptr47_replacementA = phi {} addrspace(10)* addrspace(11)* 
  %_replacementA17 = phi i64 addrspace(11)* 
  %arraysize_ptr50_replacementA = phi {} addrspace(10)* addrspace(11)* 
  %_replacementA16 = phi i64 addrspace(11)* 
  %"'ipc" = addrspacecast {} addrspace(10)* %"'" to float addrspace(13)* addrspace(11)*
  %_replacementA15 = phi float addrspace(13)* addrspace(11)* 
  %11 = select i1 %.not76, i64 2, i64 3
  %12 = add i64 %7, -2
  %13 = add nsw i64 %5, -2, !dbg !647
  %14 = add nuw nsw i64 %5, 1, !dbg !647
  %smax = call i64 @llvm.smax.i64(i64 %14, i64 3), !dbg !647
  %15 = add nsw i64 %smax, -3, !dbg !647
  %umin = call i64 @llvm.umin.i64(i64 %13, i64 %15), !dbg !647
  %16 = add nuw i64 %umin, 1, !dbg !647
  %17 = add nuw i64 %12, 1, !dbg !647
  %18 = mul nuw nsw i64 %17, %16, !dbg !647
  %19 = mul nuw i64 %18, 8, !dbg !647
  %20 = call noalias nonnull i8* @malloc(i64 %19), !dbg !647, !enzyme_cache_alloc !648
  %loopLimit_malloccache = bitcast i8* %20 to i64*, !dbg !647
  store i64* %loopLimit_malloccache, i64** %loopLimit_cache, align 8, !dbg !647, !invariant.group !650
  store i64 %7, i64* %_cache81, align 8, !dbg !647, !invariant.group !651
  store i64 %unbox_replacementA, i64* %unbox_cache, align 8, !dbg !647, !tbaa !16, !invariant.group !652
  %21 = mul nuw i64 %18, 8, !dbg !647
  %22 = call noalias nonnull i8* @malloc(i64 %21), !dbg !647, !enzyme_cache_alloc !653
  %loopLimit_malloccache3 = bitcast i8* %22 to i64**, !dbg !647
  store i64** %loopLimit_malloccache3, i64*** %loopLimit_cache2, align 8, !dbg !647, !invariant.group !655
  %23 = mul nuw i64 %18, 8, !dbg !647
  %24 = mul nuw i64 %16, 8, !dbg !647
  br label %L36, !dbg !647

L36:                                              ; preds = %L187, %L36.preheader
  %iv = phi i64 [ %iv.next, %L187 ], [ 0, %L36.preheader ]
  %iv.next = add nuw nsw i64 %iv, 1, !dbg !647
  %25 = shl nuw i64 %iv, 1, !dbg !647
  %26 = add i64 %25, 2, !dbg !647
  %27 = add nuw i64 %iv, 2, !dbg !647
  br i1 %8, label %L187, label %L47.lr.ph, !dbg !647

L47.lr.ph:                                        ; preds = %L36
  %28 = shl nuw i64 %27, 1
  %29 = add i64 %28, -2
  %30 = add i64 %28, %9
  %.not79 = icmp sgt i64 %29, %30
  %31 = add i64 %28, -3
  %value_phi16 = select i1 %.not79, i64 %31, i64 %30
  %32 = icmp sgt i64 %29, %value_phi16
  %arraysize.pre_replacementA = phi i64 
  %arraysize32.pre_replacementA = phi i64 
  %"arrayptr.pre80'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc26", align 16, !alias.scope !656, !noalias !659, !invariant.group !661
  %arrayptr.pre80 = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %10, align 16, !alias.scope !659, !noalias !656, !invariant.group !662
  %_replacementA25 = phi i64 
  %_replacementA24 = phi i64 
  %arraysize48_replacementA = phi i64 
  %_replacementA23 = phi i64 
  %arraysize51_replacementA = phi i64 
  %"arrayptr5482'ipl" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc", align 16, !alias.scope !663, !noalias !666, !invariant.group !668
  %arrayptr5482_replacementA = phi float addrspace(13)* 
  %_replacementA22 = phi i64 
  %reass.add85_replacementA = phi i64 
  %33 = load i64*, i64** %mdyncache_fromtape_cache93, align 8, !dbg !669, !dereferenceable !236, !invariant.group !670
  %34 = getelementptr inbounds i64, i64* %33, i64 %iv, !dbg !669
  %reass.mul86 = load i64, i64* %34, align 8, !dbg !669, !invariant.group !671
  br label %L66, !dbg !669

L66:                                              ; preds = %L178, %L47.lr.ph
  %iv1 = phi i64 [ %iv.next2, %L178 ], [ 0, %L47.lr.ph ]
  %iv.next2 = add nuw nsw i64 %iv1, 1, !dbg !672
  %35 = shl nuw i64 %iv1, 1, !dbg !672
  %36 = add i64 %35, 2, !dbg !672
  %37 = shl nuw i64 %iv1, 1, !dbg !675
  %38 = add nuw i64 %37, 2, !dbg !683
  %39 = add nuw i64 %11, %37, !dbg !683
  %.not78 = icmp sgt i64 %38, %39, !dbg !686
  %40 = or i64 %37, 1, !dbg !688
  %value_phi15 = select i1 %.not78, i64 %40, i64 %39, !dbg !688
  %41 = icmp sgt i64 %38, %value_phi15, !dbg !694
  %not. = or i1 %32, %41, !dbg !697
  br i1 %not., label %L178, label %L130.outer.preheader, !dbg !680

L130.outer.preheader:                             ; preds = %L66
  %42 = mul nuw nsw i64 %17, %16, !dbg !698
  %43 = mul nuw nsw i64 %iv, %17, !dbg !698
  %44 = add nuw nsw i64 %iv1, %43, !dbg !698
  %45 = load i64**, i64*** %loopLimit_cache2, align 8, !dbg !698, !invariant.group !701
  %46 = getelementptr inbounds i64*, i64** %45, i64 %44, !dbg !698
  store i64* null, i64** %46, align 8, !dbg !698
  %47 = mul nuw nsw i64 %17, %16, !dbg !698
  %48 = mul nuw nsw i64 %iv, %17, !dbg !698
  %49 = add nuw nsw i64 %iv1, %48, !dbg !698
  %50 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 3, !dbg !698
  %51 = getelementptr inbounds i64*, i64** %50, i64 %49, !dbg !698
  %52 = mul nuw nsw i64 %17, %16, !dbg !698
  %53 = mul nuw nsw i64 %iv, %17, !dbg !698
  %54 = add nuw nsw i64 %iv1, %53, !dbg !698
  %55 = load i64**, i64*** %mdyncache_fromtape_cache, align 8, !dbg !698, !invariant.group !702
  %56 = getelementptr inbounds i64*, i64** %55, i64 %54, !dbg !698
  br label %L130.outer, !dbg !698

L130.outer:                                       ; preds = %L148, %L130.outer.preheader
  %iv3 = phi i64 [ 0, %L130.outer.preheader ], [ %iv.next4, %L148 ]
  %value_phi30.ph_replacementA = phi float 
  %iv.next4 = add nuw nsw i64 %iv3, 1
  %57 = load i64*, i64** %51, align 8
  %58 = load i64*, i64** %46, align 8
  %59 = bitcast i64* %58 to i8*
  %loopLimit_realloccache = call i8* @__enzyme_exponentialallocationzero(i8* %59, i64 %iv.next4, i64 8)
  %60 = bitcast i8* %loopLimit_realloccache to i64*
  store i64* %60, i64** %46, align 8
  %61 = add i64 %26, %iv3
  %reass.mul_replacementA = phi i64 
  %62 = load i64**, i64*** %mdyncache_fromtape_cache, align 8, !dbg !698, !dereferenceable !236, !invariant.group !703
  %63 = mul nuw nsw i64 %17, %16, !dbg !698
  %64 = mul nuw nsw i64 %iv, %17, !dbg !698
  %65 = add nuw nsw i64 %iv1, %64, !dbg !698
  %66 = getelementptr inbounds i64*, i64** %62, i64 %65, !dbg !698
  %67 = load i64*, i64** %66, align 8, !dbg !698, !dereferenceable !236, !invariant.group !704
  %68 = getelementptr inbounds i64, i64* %67, i64 %iv3, !dbg !698
  %69 = load i64, i64* %68, align 8, !dbg !698, !invariant.group !705
  %70 = mul nuw nsw i64 %17, %16, !dbg !698
  %71 = mul nuw nsw i64 %iv, %17, !dbg !698
  %72 = add nuw nsw i64 %iv1, %71, !dbg !698
  br label %L130, !dbg !698

L130:                                             ; preds = %L130, %L130.outer
  %iv5 = phi i64 [ %iv.next6, %L130 ], [ 0, %L130.outer ]
  %value_phi30_replacementA = phi float 
  %iv.next6 = add nuw nsw i64 %iv5, 1, !dbg !706
  %73 = add i64 %36, %iv5, !dbg !706
  %74 = add i64 %69, %73, !dbg !706
  %"'ipg" = getelementptr inbounds float, float addrspace(13)* %"arrayptr.pre80'ipl", i64 %74, !dbg !706
  %75 = getelementptr inbounds float, float addrspace(13)* %arrayptr.pre80, i64 %74, !dbg !706
  %arrayref_replacementA = phi float , !dbg !706
  %_replacementA27_replacementA = phi float , !dbg !709
  %76 = add i64 %73, 1, !dbg !710
  %77 = icmp sgt i64 %38, %76, !dbg !712
  %78 = icmp sgt i64 %76, %value_phi15, !dbg !712
  %79 = or i1 %77, %78, !dbg !698
  %80 = icmp eq i64 %73, %value_phi15
  %or.cond = or i1 %80, %79, !dbg !698
  br i1 %or.cond, label %L148, label %L130, !dbg !698

L148:                                             ; preds = %L130
  %81 = phi i64 [ %iv5, %L130 ], !dbg !715
  %82 = load i64**, i64*** %loopLimit_cache2, align 8, !dbg !715, !dereferenceable !236, !invariant.group !655
  %83 = mul nuw nsw i64 %17, %16, !dbg !715
  %84 = mul nuw nsw i64 %iv, %17, !dbg !715
  %85 = add nuw nsw i64 %iv1, %84, !dbg !715
  %86 = getelementptr inbounds i64*, i64** %82, i64 %85, !dbg !715
  %87 = load i64*, i64** %86, align 8, !dbg !715, !dereferenceable !236, !invariant.group !718
  %88 = getelementptr inbounds i64, i64* %87, i64 %iv3, !dbg !715
  store i64 %81, i64* %88, align 8, !dbg !715, !invariant.group !719
  %89 = add i64 %61, 1, !dbg !715
  %90 = icmp sle i64 %29, %89, !dbg !720
  %91 = icmp sle i64 %89, %value_phi16, !dbg !720
  %92 = and i1 %90, %91, !dbg !724
  %93 = icmp ne i64 %61, %value_phi16, !dbg !723
  %extract.t = and i1 %93, %92, !dbg !725
  br i1 %extract.t, label %L130.outer, label %L178.loopexit, !dbg !700

L178.loopexit:                                    ; preds = %L148
  %94 = phi i64 [ %iv3, %L148 ], !dbg !726
  %95 = load i64*, i64** %loopLimit_cache, align 8, !dbg !726, !dereferenceable !236, !invariant.group !650
  %96 = mul nuw nsw i64 %17, %16, !dbg !726
  %97 = mul nuw nsw i64 %iv, %17, !dbg !726
  %98 = add nuw nsw i64 %iv1, %97, !dbg !726
  %99 = getelementptr inbounds i64, i64* %95, i64 %98, !dbg !726
  store i64 %94, i64* %99, align 8, !dbg !726, !invariant.group !730
  br label %L178, !dbg !726

L178:                                             ; preds = %L178.loopexit, %L66
  %value_phi46_replacementA = phi float 
  %_replacementA67_replacementA = phi float , !dbg !726
  %100 = add i64 %iv.next2, %reass.mul86, !dbg !728
  %"'ipg58" = getelementptr inbounds float, float addrspace(13)* %"arrayptr5482'ipl", i64 %100, !dbg !728
  %_replacementA66 = phi float addrspace(13)* , !dbg !728
  %exitcond.not = icmp eq i64 %iv1, %12, !dbg !731
  br i1 %exitcond.not, label %L187.loopexit, label %L66, !dbg !669, !llvm.loop !732

L187.loopexit:                                    ; preds = %L178
  br label %L187, !dbg !733

L187:                                             ; preds = %L187.loopexit, %L36
  %101 = add nuw i64 %27, 1, !dbg !733
  %102 = icmp slt i64 %101, 2, !dbg !737
  %103 = icmp sgt i64 %101, %5, !dbg !737
  %104 = icmp eq i64 %27, %5, !dbg !740
  %not.not.84 = or i1 %102, %103, !dbg !740
  %narrow83 = or i1 %104, %not.not.84, !dbg !740
  br i1 %narrow83, label %L208.loopexit, label %L36, !dbg !736

L208.loopexit:                                    ; preds = %L187
  br label %L208, !dbg !635

L208:                                             ; preds = %L208.loopexit, %top
  br label %invertL208, !dbg !635

allocsForInversion:                               ; No predecessors!
  %"iv'ac" = alloca i64, align 8
  %"iv1'ac" = alloca i64, align 8
  %"iv3'ac" = alloca i64, align 8
  %loopLimit_cache = alloca i64*, align 8
  %"iv5'ac" = alloca i64, align 8
  %loopLimit_cache2 = alloca i64**, align 8
  %unbox_cache = alloca i64, align 8
  %"value_phi30.ph'de" = alloca float, align 4
  %105 = getelementptr float, float* %"value_phi30.ph'de", i64 0
  store float 0.000000e+00, float* %105, align 4
  %"'de" = alloca float, align 4
  %106 = getelementptr float, float* %"'de", i64 0
  store float 0.000000e+00, float* %106, align 4
  %"arrayref'de" = alloca float, align 4
  %107 = getelementptr float, float* %"arrayref'de", i64 0
  store float 0.000000e+00, float* %107, align 4
  %"value_phi30'de" = alloca float, align 4
  %108 = getelementptr float, float* %"value_phi30'de", i64 0
  store float 0.000000e+00, float* %108, align 4
  %_cache = alloca i64**, align 8
  %reass.mul86_cache = alloca i64*, align 8
  %"'de65" = alloca float, align 4
  %109 = getelementptr float, float* %"'de65", i64 0
  store float 0.000000e+00, float* %109, align 4
  %"value_phi46'de" = alloca float, align 4
  %110 = getelementptr float, float* %"value_phi46'de", i64 0
  store float 0.000000e+00, float* %110, align 4
  %_cache81 = alloca i64, align 8
  %111 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 3
  %mdyncache_fromtape_cache = alloca i64**, align 8
  store i64** %111, i64*** %mdyncache_fromtape_cache, align 8
  %112 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 2
  %mdyncache_fromtape_cache93 = alloca i64*, align 8
  store i64* %112, i64** %mdyncache_fromtape_cache93, align 8

inverttop:                                        ; preds = %invertL208, %invertL36.preheader
  fence syncscope("singlethread") seq_cst
  fence syncscope("singlethread") seq_cst
  ret void

invertL36.preheader:                              ; preds = %invertL36
  %113 = load i64, i64* %"iv'ac", align 8
  %114 = load i64, i64* %"iv1'ac", align 8
  %forfree = load i64*, i64** %loopLimit_cache, align 8, !dereferenceable !236, !invariant.group !650
  %115 = bitcast i64* %forfree to i8*
  call void @free(i8* nonnull %115), !dbg !741, !enzyme_cache_free !648
  %116 = load i64, i64* %"iv'ac", align 8
  %117 = load i64, i64* %"iv1'ac", align 8
  %forfree4 = load i64**, i64*** %loopLimit_cache2, align 8, !dereferenceable !236, !invariant.group !655
  %118 = bitcast i64** %forfree4 to i8*
  call void @free(i8* nonnull %118), !dbg !741, !enzyme_cache_free !653
  %119 = load i64, i64* %"iv'ac", align 8
  %120 = load i64, i64* %"iv1'ac", align 8
  %121 = load i64, i64* %"iv'ac", align 8
  %122 = load i64, i64* %"iv'ac", align 8
  %123 = load i64, i64* %"iv1'ac", align 8
  %forfree87 = load i64**, i64*** %mdyncache_fromtape_cache, align 8, !dereferenceable !236, !invariant.group !703
  %124 = bitcast i64** %forfree87 to i8*
  call void @free(i8* nonnull %124), !dbg !741
  %125 = load i64, i64* %"iv'ac", align 8
  %forfree94 = load i64*, i64** %mdyncache_fromtape_cache93, align 8, !dereferenceable !236, !invariant.group !670
  %126 = bitcast i64* %forfree94 to i8*
  call void @free(i8* nonnull %126), !dbg !741
  br label %inverttop

invertL36:                                        ; preds = %invertL187, %invertL47.lr.ph
  %127 = load i64, i64* %"iv'ac", align 8
  %128 = icmp eq i64 %127, 0
  %129 = xor i1 %128, true
  br i1 %128, label %invertL36.preheader, label %incinvertL36

incinvertL36:                                     ; preds = %invertL36
  %130 = load i64, i64* %"iv'ac", align 8
  %131 = add nsw i64 %130, -1
  store i64 %131, i64* %"iv'ac", align 8
  br label %invertL187

invertL47.lr.ph:                                  ; preds = %invertL66
  br label %invertL36

invertL66:                                        ; preds = %invertL178, %invertL130.outer.preheader
  %132 = load i64, i64* %"iv1'ac", align 8
  %133 = icmp eq i64 %132, 0
  %134 = xor i1 %133, true
  br i1 %133, label %invertL47.lr.ph, label %incinvertL66

incinvertL66:                                     ; preds = %invertL66
  %135 = load i64, i64* %"iv1'ac", align 8
  %136 = add nsw i64 %135, -1
  store i64 %136, i64* %"iv1'ac", align 8
  br label %invertL178

invertL130.outer.preheader:                       ; preds = %invertL130.outer
  %137 = load i64, i64* %"iv'ac", align 8
  %138 = load i64, i64* %"iv1'ac", align 8
  %139 = load i64, i64* %"iv3'ac", align 8
  %_unwrap = load i64**, i64*** %loopLimit_cache2, align 8, !dbg !698, !invariant.group !701
  %140 = load i64, i64* %unbox_cache, align 8, !dbg !636, !tbaa !16, !alias.scope !64, !noalias !65, !invariant.group !652
  %_unwrap5 = add i64 %140, -1
  %_unwrap97 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1, !dbg !646
  %_unwrap6 = add i64 %_unwrap97, -2
  %_unwrap7 = add nuw i64 %_unwrap6, 1
  %_unwrap8 = mul nuw nsw i64 %137, %_unwrap7
  %_unwrap9 = add nuw nsw i64 %138, %_unwrap8
  %_unwrap10 = getelementptr inbounds i64*, i64** %_unwrap, i64 %_unwrap9
  %forfree11 = load i64*, i64** %_unwrap10, align 8, !dereferenceable !236, !invariant.group !718
  %141 = bitcast i64* %forfree11 to i8*
  call void @free(i8* nonnull %141), !dbg !741
  %142 = load i64, i64* %"iv3'ac", align 8
  %143 = load i64, i64* %"iv'ac", align 8
  %144 = load i64, i64* %"iv1'ac", align 8
  %145 = load i64, i64* %"iv3'ac", align 8
  %_unwrap88 = load i64**, i64*** %mdyncache_fromtape_cache, align 8, !dbg !698, !invariant.group !702
  %_unwrap89 = mul nuw nsw i64 %143, %_unwrap7
  %_unwrap90 = add nuw nsw i64 %144, %_unwrap89
  %_unwrap91 = getelementptr inbounds i64*, i64** %_unwrap88, i64 %_unwrap90
  %forfree92 = load i64*, i64** %_unwrap91, align 8, !dereferenceable !236, !invariant.group !704
  %146 = bitcast i64* %forfree92 to i8*
  call void @free(i8* nonnull %146), !dbg !741
  br label %invertL66

invertL130.outer:                                 ; preds = %invertL130_amerge
  %147 = load float, float* %"value_phi30.ph'de", align 4
  store float 0.000000e+00, float* %"value_phi30.ph'de", align 4
  %148 = load i64, i64* %"iv3'ac", align 8
  %149 = icmp eq i64 %148, 0
  %150 = xor i1 %149, true
  %151 = select fast i1 %150, float %147, float 0.000000e+00
  %152 = load float, float* %"'de", align 4
  %153 = fadd fast float %152, %147
  %154 = select fast i1 %149, float %152, float %153
  store float %154, float* %"'de", align 4
  br i1 %149, label %invertL130.outer.preheader, label %incinvertL130.outer

incinvertL130.outer:                              ; preds = %invertL130.outer
  %155 = load i64, i64* %"iv3'ac", align 8
  %156 = add nsw i64 %155, -1
  store i64 %156, i64* %"iv3'ac", align 8
  br label %invertL148

invertL130:                                       ; preds = %mergeinvertL130_L148, %incinvertL130
  %157 = load float, float* %"'de", align 4, !dbg !709
  store float 0.000000e+00, float* %"'de", align 4, !dbg !709
  %158 = load float, float* %"arrayref'de", align 4, !dbg !709
  %159 = fadd fast float %158, %157, !dbg !709
  store float %159, float* %"arrayref'de", align 4, !dbg !709
  %160 = load float, float* %"value_phi30'de", align 4, !dbg !709
  %161 = fadd fast float %160, %157, !dbg !709
  store float %161, float* %"value_phi30'de", align 4, !dbg !709
  %162 = load float, float* %"arrayref'de", align 4, !dbg !706
  store float 0.000000e+00, float* %"arrayref'de", align 4, !dbg !706
  %163 = load i64, i64* %"iv5'ac", align 8, !dbg !706
  %164 = load i64, i64* %"iv3'ac", align 8, !dbg !706
  %165 = load i64, i64* %"iv1'ac", align 8, !dbg !706
  %166 = load i64, i64* %"iv'ac", align 8, !dbg !706
  %_unwrap28 = addrspacecast {} addrspace(10)* %3 to float addrspace(13)* addrspace(11)*, !dbg !706
  %arrayptr.pre80_unwrap = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %_unwrap28, align 16, !alias.scope !659, !noalias !656, !invariant.group !662
  %_unwrap101 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0, !dbg !706
  %_unwrap35 = add nsw i64 %_unwrap101, -2, !dbg !706
  %_unwrap36 = add nuw nsw i64 %_unwrap101, 1, !dbg !706
  %167 = call i64 @llvm.smax.i64(i64 %_unwrap36, i64 3), !dbg !647
  %_unwrap37 = add nsw i64 %167, -3, !dbg !706
  %168 = call i64 @llvm.umin.i64(i64 %_unwrap35, i64 %_unwrap37), !dbg !647
  %169 = add nuw i64 %168, 1, !dbg !706
  %_unwrap96 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1, !dbg !706
  %_unwrap39 = add i64 %_unwrap96, -2, !dbg !706
  %170 = add nuw i64 %_unwrap39, 1, !dbg !706
  %171 = mul nuw nsw i64 %170, %169, !dbg !706
  %172 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 3, !dbg !706
  %173 = mul nuw nsw i64 %170, %169, !dbg !706
  %174 = mul nuw nsw i64 %166, %170, !dbg !706
  %175 = add nuw nsw i64 %165, %174, !dbg !706
  %176 = getelementptr inbounds i64*, i64** %172, i64 %175, !dbg !706
  %177 = load i64*, i64** %176, align 8, !dbg !706, !dereferenceable !236, !invariant.group !742
  %178 = getelementptr inbounds i64, i64* %177, i64 %164, !dbg !706
  %179 = load i64, i64* %178, align 8, !dbg !706, !invariant.group !743
  %_unwrap40 = shl nuw i64 %165, 1, !dbg !706
  %_unwrap41 = add i64 %_unwrap40, 2, !dbg !706
  %_unwrap42 = add i64 %_unwrap41, %163, !dbg !706
  %_unwrap43 = add i64 %179, %_unwrap42, !dbg !706
  %_unwrap44 = getelementptr inbounds float, float addrspace(13)* %arrayptr.pre80_unwrap, i64 %_unwrap43, !dbg !706
  %180 = load i64, i64* %"iv5'ac", align 8, !dbg !706
  %181 = load i64, i64* %"iv3'ac", align 8, !dbg !706
  %182 = load i64, i64* %"iv1'ac", align 8, !dbg !706
  %183 = load i64, i64* %"iv'ac", align 8, !dbg !706
  %"'ipc26_unwrap" = addrspacecast {} addrspace(10)* %"'1" to float addrspace(13)* addrspace(11)*, !dbg !706
  %"arrayptr.pre80'ipl_unwrap" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc26_unwrap", align 16, !alias.scope !656, !noalias !659, !invariant.group !661
  %"'ipg_unwrap" = getelementptr inbounds float, float addrspace(13)* %"arrayptr.pre80'ipl_unwrap", i64 %_unwrap43, !dbg !706
  %184 = icmp ne float addrspace(13)* %_unwrap44, %"'ipg_unwrap", !dbg !706
  br i1 %184, label %invertL130_active, label %invertL130_amerge, !dbg !706

invertL130_active:                                ; preds = %invertL130
  %185 = load float, float addrspace(13)* %"'ipg_unwrap", align 4, !dbg !706, !tbaa !134, !alias.scope !744, !noalias !747
  %186 = fadd fast float %185, %162, !dbg !706
  store float %186, float addrspace(13)* %"'ipg_unwrap", align 4, !dbg !706, !tbaa !134, !alias.scope !744, !noalias !747
  br label %invertL130_amerge, !dbg !706

invertL130_amerge:                                ; preds = %invertL130_active, %invertL130
  %187 = load float, float* %"value_phi30'de", align 4
  store float 0.000000e+00, float* %"value_phi30'de", align 4
  %188 = load i64, i64* %"iv5'ac", align 8
  %189 = icmp eq i64 %188, 0
  %190 = xor i1 %189, true
  %191 = select fast i1 %190, float %187, float 0.000000e+00
  %192 = load float, float* %"'de", align 4
  %193 = fadd fast float %192, %187
  %194 = select fast i1 %189, float %192, float %193
  store float %194, float* %"'de", align 4
  %195 = select fast i1 %189, float %187, float 0.000000e+00
  %196 = load float, float* %"value_phi30.ph'de", align 4
  %197 = fadd fast float %196, %187
  %198 = select fast i1 %189, float %197, float %196
  store float %198, float* %"value_phi30.ph'de", align 4
  br i1 %189, label %invertL130.outer, label %incinvertL130

incinvertL130:                                    ; preds = %invertL130_amerge
  %199 = load i64, i64* %"iv5'ac", align 8
  %200 = add nsw i64 %199, -1
  store i64 %200, i64* %"iv5'ac", align 8
  br label %invertL130

invertL148:                                       ; preds = %mergeinvertL130.outer_L178.loopexit, %incinvertL130.outer
  %_unwrap102 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0
  %_unwrap47 = add nsw i64 %_unwrap102, -2
  %_unwrap48 = add nuw nsw i64 %_unwrap102, 1
  %201 = call i64 @llvm.smax.i64(i64 %_unwrap48, i64 3), !dbg !647
  %_unwrap49 = add nsw i64 %201, -3
  %202 = call i64 @llvm.umin.i64(i64 %_unwrap47, i64 %_unwrap49), !dbg !647
  %203 = add nuw i64 %202, 1
  %_unwrap98 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1
  %_unwrap51 = add i64 %_unwrap98, -2
  %204 = add nuw i64 %_unwrap51, 1
  %205 = mul nuw nsw i64 %204, %203
  %206 = load i64**, i64*** %loopLimit_cache2, align 8, !dereferenceable !236, !invariant.group !655
  %207 = load i64, i64* %"iv1'ac", align 8
  %208 = load i64, i64* %"iv'ac", align 8
  %209 = mul nuw nsw i64 %204, %203
  %210 = mul nuw nsw i64 %208, %204
  %211 = add nuw nsw i64 %207, %210
  %212 = getelementptr inbounds i64*, i64** %206, i64 %211
  %213 = load i64*, i64** %212, align 8, !dereferenceable !236, !invariant.group !718
  %214 = load i64, i64* %"iv3'ac", align 8
  %215 = getelementptr inbounds i64, i64* %213, i64 %214
  %216 = load i64, i64* %215, align 8, !invariant.group !719
  br label %mergeinvertL130_L148

mergeinvertL130_L148:                             ; preds = %invertL148
  store i64 %216, i64* %"iv5'ac", align 8
  br label %invertL130

invertL178.loopexit:                              ; preds = %invertL178
  %_unwrap99 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0
  %_unwrap53 = add nsw i64 %_unwrap99, -2
  %_unwrap54 = add nuw nsw i64 %_unwrap99, 1
  %217 = call i64 @llvm.smax.i64(i64 %_unwrap54, i64 3), !dbg !647
  %_unwrap55 = add nsw i64 %217, -3
  %218 = call i64 @llvm.umin.i64(i64 %_unwrap53, i64 %_unwrap55), !dbg !647
  %219 = add nuw i64 %218, 1
  %_unwrap95 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 1
  %_unwrap57 = add i64 %_unwrap95, -2
  %220 = add nuw i64 %_unwrap57, 1
  %221 = mul nuw nsw i64 %220, %219
  %222 = load i64*, i64** %loopLimit_cache, align 8, !dereferenceable !236, !invariant.group !650
  %223 = load i64, i64* %"iv1'ac", align 8
  %224 = load i64, i64* %"iv'ac", align 8
  %225 = mul nuw nsw i64 %220, %219
  %226 = mul nuw nsw i64 %224, %220
  %227 = add nuw nsw i64 %223, %226
  %228 = getelementptr inbounds i64, i64* %222, i64 %227
  %229 = load i64, i64* %228, align 8, !invariant.group !730
  br label %mergeinvertL130.outer_L178.loopexit

mergeinvertL130.outer_L178.loopexit:              ; preds = %invertL178.loopexit
  store i64 %229, i64* %"iv3'ac", align 8
  br label %invertL148

invertL178:                                       ; preds = %mergeinvertL66_L187.loopexit, %incinvertL66
  %230 = load i64, i64* %"iv1'ac", align 8, !dbg !728
  %231 = load i64, i64* %"iv'ac", align 8, !dbg !728
  %"'ipc_unwrap" = addrspacecast {} addrspace(10)* %"'" to float addrspace(13)* addrspace(11)*, !dbg !728
  %"arrayptr5482'ipl_unwrap" = load float addrspace(13)*, float addrspace(13)* addrspace(11)* %"'ipc_unwrap", align 16, !alias.scope !663, !noalias !666, !invariant.group !668
  %iv.next2_unwrap = add nuw nsw i64 %230, 1, !dbg !728
  %_unwrap100 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 0, !dbg !728
  %_unwrap61 = add nsw i64 %_unwrap100, -2, !dbg !728
  %_unwrap62 = add nuw nsw i64 %_unwrap100, 1, !dbg !728
  %232 = call i64 @llvm.smax.i64(i64 %_unwrap62, i64 3), !dbg !647
  %_unwrap63 = add nsw i64 %232, -3, !dbg !728
  %233 = call i64 @llvm.umin.i64(i64 %_unwrap61, i64 %_unwrap63), !dbg !647
  %234 = add nuw i64 %233, 1, !dbg !728
  %235 = extractvalue { i64, i64, i64*, i64** } %tapeArg, 2, !dbg !728
  %236 = getelementptr inbounds i64, i64* %235, i64 %231, !dbg !728
  %237 = load i64, i64* %236, align 8, !dbg !728, !invariant.group !749
  %_unwrap64 = add i64 %iv.next2_unwrap, %237, !dbg !728
  %"'ipg58_unwrap" = getelementptr inbounds float, float addrspace(13)* %"arrayptr5482'ipl_unwrap", i64 %_unwrap64, !dbg !728
  %238 = load float, float addrspace(13)* %"'ipg58_unwrap", align 4, !dbg !728, !tbaa !134, !alias.scope !750, !noalias !753
  store float 0.000000e+00, float addrspace(13)* %"'ipg58_unwrap", align 4, !dbg !728, !tbaa !134, !alias.scope !750, !noalias !753
  %239 = load float, float* %"'de65", align 4, !dbg !728
  %240 = fadd fast float %239, %238, !dbg !728
  store float %240, float* %"'de65", align 4, !dbg !728
  %241 = load float, float* %"'de65", align 4, !dbg !726
  store float 0.000000e+00, float* %"'de65", align 4, !dbg !726
  %242 = fmul fast float %241, 5.000000e-01, !dbg !726
  %243 = load float, float* %"value_phi46'de", align 4, !dbg !726
  %244 = fadd fast float %243, %242, !dbg !726
  store float %244, float* %"value_phi46'de", align 4, !dbg !726
  %245 = load float, float* %"value_phi46'de", align 4
  store float 0.000000e+00, float* %"value_phi46'de", align 4
  %246 = load i64, i64* %"iv1'ac", align 8
  %247 = load i64, i64* %"iv'ac", align 8
  %_unwrap68 = add nuw i64 %247, 2
  %_unwrap69 = shl nuw i64 %_unwrap68, 1
  %_unwrap70 = add i64 %_unwrap69, -2
  %.not77_unwrap = icmp eq i64 %2, 2
  %_unwrap71 = select i1 %.not77_unwrap, i64 -2, i64 -1
  %_unwrap72 = add i64 %_unwrap69, %_unwrap71
  %.not79_unwrap = icmp sgt i64 %_unwrap70, %_unwrap72
  %_unwrap73 = add i64 %_unwrap69, -3
  %value_phi16_unwrap = select i1 %.not79_unwrap, i64 %_unwrap73, i64 %_unwrap72
  %_unwrap74 = icmp sgt i64 %_unwrap70, %value_phi16_unwrap
  %_unwrap75 = shl nuw i64 %246, 1
  %_unwrap76 = add nuw i64 %_unwrap75, 2
  %.not76_unwrap = icmp eq i64 %2, 1
  %_unwrap77 = select i1 %.not76_unwrap, i64 2, i64 3
  %_unwrap78 = add nuw i64 %_unwrap77, %_unwrap75
  %.not78_unwrap = icmp sgt i64 %_unwrap76, %_unwrap78
  %_unwrap79 = or i64 %_unwrap75, 1
  %value_phi15_unwrap = select i1 %.not78_unwrap, i64 %_unwrap79, i64 %_unwrap78
  %_unwrap80 = icmp sgt i64 %_unwrap76, %value_phi15_unwrap
  %not._unwrap = or i1 %_unwrap74, %_unwrap80
  %248 = xor i1 %not._unwrap, true
  %249 = select fast i1 %248, float %245, float 0.000000e+00
  %250 = load float, float* %"'de", align 4
  %251 = fadd fast float %250, %245
  %252 = select fast i1 %not._unwrap, float %250, float %251
  store float %252, float* %"'de", align 4
  br i1 %not._unwrap, label %invertL66, label %invertL178.loopexit

invertL187.loopexit:                              ; preds = %invertL187
  %253 = load i64, i64* %"iv'ac", align 8
  %254 = load i64, i64* %_cache81, align 8, !invariant.group !651
  %_unwrap82 = add i64 %254, -2
  br label %mergeinvertL66_L187.loopexit

mergeinvertL66_L187.loopexit:                     ; preds = %invertL187.loopexit
  store i64 %_unwrap82, i64* %"iv1'ac", align 8
  br label %invertL178

invertL187:                                       ; preds = %mergeinvertL36_L208.loopexit, %incinvertL36
  %255 = load i64, i64* %"iv'ac", align 8
  %256 = load i64, i64* %_cache81, align 8, !invariant.group !651
  %_unwrap83 = icmp ult i64 %256, 2
  br i1 %_unwrap83, label %invertL36, label %invertL187.loopexit

invertL208.loopexit:                              ; preds = %invertL208
  %_unwrap84 = add nsw i64 %5, -2
  %_unwrap85 = add nuw nsw i64 %5, 1
  %257 = call i64 @llvm.smax.i64(i64 %_unwrap85, i64 3), !dbg !647
  %_unwrap86 = add nsw i64 %257, -3
  %258 = call i64 @llvm.umin.i64(i64 %_unwrap84, i64 %_unwrap86), !dbg !647
  br label %mergeinvertL36_L208.loopexit

mergeinvertL36_L208.loopexit:                     ; preds = %invertL208.loopexit
  store i64 %258, i64* %"iv'ac", align 8
  br label %invertL187

invertL208:                                       ; preds = %L208
  br i1 %6, label %inverttop, label %invertL208.loopexit
}

LLVM.LoadInst(%unbox = load i64, i64 addrspace(11)* %11, align 8, !dbg !28, !tbaa !16, !alias.scope !34, !noalias !37)
LLVM.PHIInst(%unbox_replacementA = phi i64 , !dbg !21)


Stacktrace:
 [1] -
   @ ./int.jl:86
 [2] #127
   @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:29
 [3] map
   @ ./tuple.jl:292
 [4] macro expansion
   @ ./simdloop.jl:69
 [5] ##kern#421#131
   @ ~/git/Enzyme.jl/WaterLily.jl/src/util.jl:103

Stacktrace:
  [1] julia_error(cstr::Cstring, val::Ptr{LLVM.API.LLVMOpaqueValue}, errtype::Enzyme.API.ErrorType, data::Ptr{Nothing}, data2::Ptr{LLVM.API.LLVMOpaqueValue}, B::Ptr{LLVM.API.LLVMOpaqueBuilder})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:2713
  [2] EnzymeCreatePrimalAndGradient(logic::Enzyme.Logic, todiff::LLVM.Function, retType::Enzyme.API.CDIFFE_TYPE, constant_args::Vector{Enzyme.API.CDIFFE_TYPE}, TA::Enzyme.TypeAnalysis, returnValue::Bool, dretUsed::Bool, mode::Enzyme.API.CDerivativeMode, runtimeActivity::Bool, width::Int64, additionalArg::Ptr{LLVM.API.LLVMOpaqueType}, forceAnonymousTape::Bool, typeInfo::Enzyme.FnTypeInfo, uncacheable_args::Vector{Bool}, augmented::Ptr{Nothing}, atomicAdd::Bool)
    @ Enzyme.API ~/git/Enzyme.jl/src/api.jl:253
  [3] enzyme!(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, mod::LLVM.Module, primalf::LLVM.Function, TT::Type, mode::Enzyme.API.CDerivativeMode, width::Int64, parallel::Bool, actualRetType::Type, wrap::Bool, modifiedBetween::NTuple{5, Bool}, returnPrimal::Bool, expectedTapeType::Type, loweredArgs::Set{Int64}, boxedArgs::Set{Int64})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:5058
  [4] codegen(output::Symbol, job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}; libraries::Bool, deferred_codegen::Bool, optimize::Bool, toplevel::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing)
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8191
  [5] codegen
    @ ~/git/Enzyme.jl/src/compiler.jl:7028 [inlined]
  [6] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:9299
  [7] _thunk
    @ ~/git/Enzyme.jl/src/compiler.jl:9299 [inlined]
  [8] cached_compilation
    @ ~/git/Enzyme.jl/src/compiler.jl:9340 [inlined]
  [9] thunkbase(ctx::LLVM.Context, mi::Core.MethodInstance, ::Val{0x0000000000007b3e}, ::Type{Const{typeof(Core.kwcall)}}, ::Type{Const{Nothing}}, tt::Type{Tuple{Const{@NamedTuple{perdir::Tuple{}}}, Const{typeof(WaterLily.restrictL!)}, Duplicated{Array{Float32, 3}}, Duplicated{Array{Float32, 3}}}}, ::Val{Enzyme.API.DEM_ReverseModePrimal}, ::Val{1}, ::Val{(true, true, true, true, true)}, ::Val{true}, ::Val{false}, ::Type{FFIABI}, ::Val{false}, ::Val{true})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:9472
 [10] #s2067#19669
    @ ~/git/Enzyme.jl/src/compiler.jl:9609 [inlined]
 [11] var"#s2067#19669"(FA::Any, A::Any, TT::Any, Mode::Any, ModifiedBetween::Any, width::Any, ReturnPrimal::Any, ShadowInit::Any, World::Any, ABI::Any, ErrIfFuncWritten::Any, RuntimeActivity::Any, ::Any, ::Type, ::Type, ::Type, tt::Any, ::Type, ::Type, ::Type, ::Type, ::Type, ::Type, ::Type, ::Any)
    @ Enzyme.Compiler ./none:0
 [12] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
    @ Core ./boot.jl:602
 [13] runtime_generic_augfwd(activity::Type{Val{(false, false, false, true, true)}}, runtimeActivity::Val{true}, width::Val{1}, ModifiedBetween::Val{(true, true, true, true, true)}, RT::Val{@NamedTuple{1, 2, 3}}, f::typeof(Core.kwcall), df::Nothing, primal_1::@NamedTuple{perdir::Tuple{}}, shadow_1_1::@NamedTuple{perdir::Tuple{}}, primal_2::typeof(WaterLily.restrictL!), shadow_2_1::Nothing, primal_3::Array{Float32, 3}, shadow_3_1::Array{Float32, 3}, primal_4::Array{Float32, 3}, shadow_4_1::Array{Float32, 3})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/rules/jitrules.jl:468
 [14] restrictML
    @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:23 [inlined]
 [15] restrictML
    @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:0 [inlined]
 [16] augmented_julia_restrictML_9751_inner_1wrap
    @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:0
 [17] macro expansion
    @ ~/git/Enzyme.jl/src/compiler.jl:9229 [inlined]
 [18] enzyme_call
    @ ~/git/Enzyme.jl/src/compiler.jl:8795 [inlined]
 [19] AugmentedForwardThunk
    @ ~/git/Enzyme.jl/src/compiler.jl:8632 [inlined]
 [20] runtime_generic_augfwd(activity::Type{Val{(false, true)}}, runtimeActivity::Val{true}, width::Val{1}, ModifiedBetween::Val{(true, true)}, RT::Val{@NamedTuple{1, 2, 3}}, f::typeof(WaterLily.restrictML), df::Nothing, primal_1::Poisson{Float32, Matrix{Float32}, Array{Float32, 3}}, shadow_1_1::Poisson{Float32, Matrix{Float32}, Array{Float32, 3}})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/rules/jitrules.jl:483
 [21] _
    @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:54
 [22] MultiLevelPoisson
    @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:51 [inlined]
 [23] MultiLevelPoisson
    @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:0 [inlined]
 [24] augmented_julia_MultiLevelPoisson_9175_inner_1wrap
    @ ~/git/Enzyme.jl/WaterLily.jl/src/MultiLevelPoisson.jl:0
 [25] macro expansion
    @ ~/git/Enzyme.jl/src/compiler.jl:9229 [inlined]
 [26] enzyme_call(::Val{false}, ::Ptr{Nothing}, ::Type{Enzyme.Compiler.AugmentedForwardThunk}, ::Val{1}, ::Val{true}, ::Type{Tuple{Const{@NamedTuple{perdir::Tuple{}}}, Const{Type{MultiLevelPoisson}}, Duplicated{Matrix{Float32}}, Duplicated{Array{Float32, 3}}, Duplicated{Matrix{Float32}}}}, ::Type{Duplicated{MultiLevelPoisson{Float32, Matrix{Float32}, Array{Float32, 3}}}}, ::Const{typeof(Core.kwcall)}, ::Type{@NamedTuple{1::@NamedTuple{1, 2, 3::@NamedTuple{1, 2, 3::@NamedTuple{1, 2::@NamedTuple{1, 2, 3, 4, 5, 6, 7::Bool, 8, 9}, 3::UInt64}, 4, 5, 6::@NamedTuple{1, 2, 3, 4, 5::Tuple{UInt64, UInt64, Core.LLVMPtr{UInt64, 0}, Core.LLVMPtr{UInt64, 0}}, 6, 7, 8, 9, 10, 11::Tuple{UInt64, UInt64, Core.LLVMPtr{UInt64, 0}, Core.LLVMPtr{Float32, 0}}, 12, 13, 14, 15::Bool, 16::Bool}, 7, 8, 9, 10, 11, 12::UInt64, 13::UInt64, 14::UInt64}, 4, 5::@NamedTuple{1, 2::@NamedTuple{1, 2, 3, 4, 5, 6, 7::Bool, 8, 9}, 3::UInt64}, 6, 7, 8::@NamedTuple{1, 2, 3::@NamedTuple{1, 2::Core.LLVMPtr{Core.LLVMPtr{UInt8, 0}, 0}, 3, 4::Core.LLVMPtr{Core.LLVMPtr{UInt8, 0}, 0}, 5, 6, 7, 8, 9::Bool, 10, 11}, 4::UInt64, 5::UInt64}, 9, 10, 11, 12, 13, 14, 15, 16::Bool, 17::Core.LLVMPtr{UInt64, 0}, 18, 19::Core.LLVMPtr{Bool, 0}, 20::Core.LLVMPtr{UInt64, 0}, 21, 22::Core.LLVMPtr{Bool, 0}}, 2}}, ::Const{@NamedTuple{perdir::Tuple{}}}, ::Const{Type{MultiLevelPoisson}}, ::Duplicated{Matrix{Float32}}, ::Duplicated{Array{Float32, 3}}, ::Duplicated{Matrix{Float32}})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8795
 [27] (::Enzyme.Compiler.AugmentedForwardThunk{Ptr{Nothing}, Const{typeof(Core.kwcall)}, Duplicated{MultiLevelPoisson{Float32, Matrix{Float32}, Array{Float32, 3}}}, Tuple{Const{@NamedTuple{perdir::Tuple{}}}, Const{Type{MultiLevelPoisson}}, Duplicated{Matrix{Float32}}, Duplicated{Array{Float32, 3}}, Duplicated{Matrix{Float32}}}, 1, true, @NamedTuple{1::@NamedTuple{1, 2, 3::@NamedTuple{1, 2, 3::@NamedTuple{1, 2::@NamedTuple{1, 2, 3, 4, 5, 6, 7::Bool, 8, 9}, 3::UInt64}, 4, 5, 6::@NamedTuple{1, 2, 3, 4, 5::Tuple{UInt64, UInt64, Core.LLVMPtr{UInt64, 0}, Core.LLVMPtr{UInt64, 0}}, 6, 7, 8, 9, 10, 11::Tuple{UInt64, UInt64, Core.LLVMPtr{UInt64, 0}, Core.LLVMPtr{Float32, 0}}, 12, 13, 14, 15::Bool, 16::Bool}, 7, 8, 9, 10, 11, 12::UInt64, 13::UInt64, 14::UInt64}, 4, 5::@NamedTuple{1, 2::@NamedTuple{1, 2, 3, 4, 5, 6, 7::Bool, 8, 9}, 3::UInt64}, 6, 7, 8::@NamedTuple{1, 2, 3::@NamedTuple{1, 2::Core.LLVMPtr{Core.LLVMPtr{UInt8, 0}, 0}, 3, 4::Core.LLVMPtr{Core.LLVMPtr{UInt8, 0}, 0}, 5, 6, 7, 8, 9::Bool, 10, 11}, 4::UInt64, 5::UInt64}, 9, 10, 11, 12, 13, 14, 15, 16::Bool, 17::Core.LLVMPtr{UInt64, 0}, 18, 19::Core.LLVMPtr{Bool, 0}, 20::Core.LLVMPtr{UInt64, 0}, 21, 22::Core.LLVMPtr{Bool, 0}}, 2}})(::Const{typeof(Core.kwcall)}, ::Const{@NamedTuple{perdir::Tuple{}}}, ::Vararg{Any})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8632
 [28] runtime_generic_augfwd(activity::Type{Val{(false, false, false, true, true, true)}}, runtimeActivity::Val{true}, width::Val{1}, ModifiedBetween::Val{(true, true, true, true, true, true)}, RT::Val{@NamedTuple{1, 2, 3}}, f::typeof(Core.kwcall), df::Nothing, primal_1::@NamedTuple{perdir::Tuple{}}, shadow_1_1::Nothing, primal_2::Type{MultiLevelPoisson}, shadow_2_1::Nothing, primal_3::Matrix{Float32}, shadow_3_1::Matrix{Float32}, primal_4::Array{Float32, 3}, shadow_4_1::Array{Float32, 3}, primal_5::Matrix{Float32}, shadow_5_1::Matrix{Float32})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/rules/jitrules.jl:483
 [29] _
    @ ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:76 [inlined]
 [30] _
    @ ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:0 [inlined]
 [31] augmented_julia___270_5848_inner_1wrap
    @ ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:0
 [32] macro expansion
    @ ~/git/Enzyme.jl/src/compiler.jl:9229 [inlined]
 [33] enzyme_call(::Val{false}, ::Ptr{Nothing}, ::Type{Enzyme.Compiler.AugmentedForwardThunk}, ::Val{1}, ::Val{true}, ::Type{Tuple{Const{Float64}, Active{Float64}, Const{Nothing}, Const{Nothing}, Const{Int64}, Const{Tuple{}}, Const{Nothing}, Const{Bool}, Active{AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, Const{Type{Float32}}, Const{Type{Array}}, Const{Type{Simulation}}, Const{Tuple{Int64, Int64}}, Const{Tuple{Int64, Int64}}, Const{Int64}}}, ::Type{Duplicated{Simulation}}, ::Const{WaterLily.var"#_#270#274"}, ::Type{@NamedTuple{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38::Core.LLVMPtr{UInt8, 0}, 39::Core.LLVMPtr{UInt8, 0}, 40::Core.LLVMPtr{UInt8, 0}}}, ::Const{Float64}, ::Active{Float64}, ::Const{Nothing}, ::Const{Nothing}, ::Const{Int64}, ::Const{Tuple{}}, ::Const{Nothing}, ::Const{Bool}, ::Active{AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, ::Const{Type{Float32}}, ::Const{Type{Array}}, ::Const{Type{Simulation}}, ::Const{Tuple{Int64, Int64}}, ::Const{Tuple{Int64, Int64}}, ::Const{Int64})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8795
 [34] (::Enzyme.Compiler.AugmentedForwardThunk{Ptr{Nothing}, Const{WaterLily.var"#_#270#274"}, Duplicated{Simulation}, Tuple{Const{Float64}, Active{Float64}, Const{Nothing}, Const{Nothing}, Const{Int64}, Const{Tuple{}}, Const{Nothing}, Const{Bool}, Active{AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, Const{Type{Float32}}, Const{Type{Array}}, Const{Type{Simulation}}, Const{Tuple{Int64, Int64}}, Const{Tuple{Int64, Int64}}, Const{Int64}}, 1, true, @NamedTuple{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38::Core.LLVMPtr{UInt8, 0}, 39::Core.LLVMPtr{UInt8, 0}, 40::Core.LLVMPtr{UInt8, 0}}})(::Const{WaterLily.var"#_#270#274"}, ::Const{Float64}, ::Vararg{Any})
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/compiler.jl:8632
 [35] runtime_generic_augfwd(activity::Type{Val{(false, false, true, false, false, false, false, false, false, true, false, false, false, false, false, false)}}, runtimeActivity::Val{true}, width::Val{1}, ModifiedBetween::Val{(true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true)}, RT::Val{@NamedTuple{1, 2, 3}}, f::WaterLily.var"#_#270#274", df::Nothing, primal_1::Float64, shadow_1_1::Nothing, primal_2::Float64, shadow_2_1::Base.RefValue{Float64}, primal_3::Nothing, shadow_3_1::Nothing, primal_4::Nothing, shadow_4_1::Nothing, primal_5::Int64, shadow_5_1::Nothing, primal_6::Tuple{}, shadow_6_1::Nothing, primal_7::Nothing, shadow_7_1::Nothing, primal_8::Bool, shadow_8_1::Nothing, primal_9::AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, shadow_9_1::Base.RefValue{AutoBody{WaterLily.var"#comp#232"{Bool, var"#sdf#3"{Int64}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}, var"#map#2"{Bool, Int64, Float32, Float64, Int64, Float64, SVector{2, Rational{Int64}}, SVector{2, Rational{Int64}}}}}, primal_10::Type{Float32}, shadow_10_1::Nothing, primal_11::Type{Array}, shadow_11_1::Nothing, primal_12::Type{Simulation}, shadow_12_1::Nothing, primal_13::Tuple{Int64, Int64}, shadow_13_1::Nothing, primal_14::Tuple{Int64, Int64}, shadow_14_1::Nothing, primal_15::Int64, shadow_15_1::Nothing)
    @ Enzyme.Compiler ~/git/Enzyme.jl/src/rules/jitrules.jl:483
 [36] Simulation
    @ ~/git/Enzyme.jl/WaterLily.jl/src/WaterLily.jl:65 [inlined]
 [37] #make_foils#1
    @ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:24
 [38] make_foils
    @ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:3 [inlined]
 [39] mean_drag (repeats 5 times)
    @ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:39
 [40] f
    @ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:49 [inlined]
 [41] augmented_julia_f_2861wrap
    @ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:0
 [42] macro expansion
    @ ~/git/Enzyme.jl/src/compiler.jl:9229 [inlined]
 [43] enzyme_call
    @ ~/git/Enzyme.jl/src/compiler.jl:8795 [inlined]
 [44] AugmentedForwardThunk
    @ ~/git/Enzyme.jl/src/compiler.jl:8632 [inlined]
 [45] autodiff
    @ ~/git/Enzyme.jl/src/Enzyme.jl:384 [inlined]
 [46] autodiff
    @ ~/git/Enzyme.jl/src/Enzyme.jl:512 [inlined]
 [47] g!
    @ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:59 [inlined]
 [48] macro expansion
    @ ./timing.jl:279 [inlined]
 [49] top-level scope
    @ ~/git/Enzyme.jl/WaterLily.jl/examples/TandemFoilOptim.jl:269

@wsmoses
Member

wsmoses commented Sep 28, 2024

@wsmoses
Member

wsmoses commented Sep 28, 2024

@wsmoses
Member

wsmoses commented Sep 28, 2024

@wsmoses
Member

wsmoses commented Sep 29, 2024

With EnzymeAD/Enzyme#2089, this now hits #1781.

@wsmoses wsmoses closed this as completed Sep 29, 2024
@b-fg

b-fg commented Sep 30, 2024

Hey, thanks for working on this! I have followed the thread of fixes, and my understanding is that you managed to fix them all? Does KA (KernelAbstractions.jl) need to be updated before I can try running our WaterLily example with Enzyme in GPU simulations?

@wsmoses
Member

wsmoses commented Oct 1, 2024

You may need JuliaGPU/KernelAbstractions.jl#534 as well. However, when running locally with the above issue fixed, there was still a strange memory issue with the full model. @b-fg, if you're able to reduce any remaining issues to an MWE, we can try to get them fixed.
