[Draft PR] Graphcore backend support. #1659
base: main
Conversation
… from the documentation online to check what library is present
…page, this commit didn't work. They also have something similar: adding the Tensor Core backend from Nvidia as an external codegen.
… the next commit" This reverts commit 429737d.
…f experimental changes which try to understand the SDFG IR and the changes needed to register IPUCodeGen into the frame_targets.
1. The fix was to call `self._frame.generate_state` recursively from `ipu.generate_state`. This goes into framecode.py and calls the recursive function, which traverses substates and calls the respective codegen. Based on this learning, a point to note is to remember to call the recursive functions inside `generate_*()` functions; an example is `self._dispatcher.dispatch_subgraph` in `generate_scope`. 2. Two `ipu/ipu` folders were created recursively. Fix: remove `target_type='ipu'` from `CodeObject`.
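The recursive-dispatch point above can be sketched in a few lines. This is an illustrative stand-in, not DaCe's actual API: `FrameCode`, `IPUCodeGen`, and the dict-based "state" are hypothetical, and only the shape of the mutual recursion mirrors the fix described.

```python
# Hypothetical sketch of the recursive-dispatch pattern: the target's
# generate_state must hand nested subgraphs back to the framework,
# otherwise substates are silently skipped.

class FrameCode:
    """Stands in for the recursive traversal in framecode.py."""
    def __init__(self, target):
        self.target = target
        self.emitted = []

    def generate_state(self, state):
        self.emitted.append(f"state:{state['name']}")
        for sub in state.get("substates", []):
            # Delegate each substate to the target codegen.
            self.target.generate_state(self, sub)

class IPUCodeGen:
    """Stands in for targets/ipu.py."""
    def generate_state(self, frame, state):
        # Emit target-specific code, then recurse via the frame,
        # mirroring the 'self._frame.generate_state' call in the PR.
        frame.emitted.append(f"ipu:{state['name']}")
        frame.generate_state(state)

target = IPUCodeGen()
frame = FrameCode(target)
frame.generate_state({"name": "top", "substates": [{"name": "inner"}]})
```

Forgetting the `frame.generate_state(state)` call at the end of the target's method is exactly the bug described: the top-level state would be emitted but `inner` would never be visited.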
…ok right now into GPU/MPI
…ers and the IPUDevice; this goes in the __init__/exit part, not in the SDFG
…ested might be buggy
The code snippet below checks whether the file is the frame file (.cpp vs .cu); if so, it matches the target and dumps the headers.

```
if backend == 'frame':  # on the cpu .cpp file
    for target in self.targets:
        if target.target_name == 'ipu':
```

Some cosmetic changes removed the prints.
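The header-emission check can be fleshed out into a runnable fragment. The names (`Target`, `target_name`, `headers`, `frame_headers`) mirror the snippet above but are illustrative assumptions here, not DaCe's actual fields:

```python
# Illustrative sketch: emit target-specific #include lines only when
# generating the frame (.cpp) file, and only for the IPU target.

class Target:
    def __init__(self, target_name, headers):
        self.target_name = target_name
        self.headers = headers

def frame_headers(backend, targets):
    """Collect header lines for the frame file."""
    lines = []
    if backend == 'frame':  # .cpp, not .cu
        for target in targets:
            if target.target_name == 'ipu':
                lines.extend(f'#include <{h}>' for h in target.headers)
    return lines

headers = frame_headers('frame', [Target('ipu', ['poplar/Engine.hpp'])])
```

Moving this logic into ipu.py (as the next commit message suggests) would keep framecode.py target-agnostic.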
… add more handcrafted tests, remove the header generation from framecode.py and add into ipu.py
…f how cpu codegen works for a tasklet
… one generates the golden code for poplar.
2. On the Host/ side the pipeline + functions are present. 3. The pipeline is not yet set; the next patch might set it. 4. Lots of bugs still. 5. Create Host/ and Device/ files along with cpu/.
…t the inputs to the library and the inputs to the SDFG; it seems we need to dig into other ways of allocating variables and such details." Reverting this, as it is making it hard to understand a node as of now. This reverts commit 63fad92.
…ave IPUCodegen + library codegen together.
The golden base code I plan to generate for the IPU is here. The current test I plan to support is (AccessNode + Library Node) - |
…m fpga and cuda. This creates a codegen which dumps both library + nodes
This reverts commit 2ffc0c9.
Using more descriptive names for the memory locations will be more beneficial for future users.
I do not know the memory hierarchy of Graphcore, but if for now we assume that Graphcore has only global memory and an on-chip L1 memory, then I would go with the names IPU_L1 and IPU_Global.
The naming also matters if IPUs change their memory hierarchy across generations. Say they add a level called L2: with the current scheme it would become IPU_Memory and IPU_L2 (and in my opinion, that would be quite confusing).
Same for the schedule: IPU_Multicore could be better (this again depends on what Graphcore calls them).
For example:

```
StorageType.IPU_L1: ScheduleType.IPU_Multicore,
StorageType.IPU_Global: ScheduleType.IPU_Device
```
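The reviewer's suggestion can be sketched as a storage-to-schedule default map. The enum member names follow the comment above, but the enums themselves are hypothetical stand-ins; DaCe's real `StorageType`/`ScheduleType` enums live in dtypes.py and have many more members:

```python
# Hedged sketch: infer a default schedule from where the data lives.
from enum import Enum, auto

class StorageType(Enum):
    IPU_L1 = auto()      # on-chip, per-tile memory
    IPU_Global = auto()  # off-chip / streaming memory

class ScheduleType(Enum):
    IPU_Multicore = auto()  # work spread across tiles
    IPU_Device = auto()     # whole-device execution

# Default schedule for each storage location.
STORAGE_DEFAULT_SCHEDULE = {
    StorageType.IPU_L1: ScheduleType.IPU_Multicore,
    StorageType.IPU_Global: ScheduleType.IPU_Device,
}
```

Keeping the mapping in one dict makes it cheap to extend if a future generation adds a level (e.g. an L2) without renaming existing members.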
There are type maps such as _CTYPES (C/C++) or _OCL_TYPES (OpenCL) in dtypes.py.
I think having this map there with the other type maps would be consistent with the previous implementations.
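Such a map could look like the sketch below. The `_IPU_TYPES` name and the IPU-side type strings are illustrative assumptions in the style of `_CTYPES`/`_OCL_TYPES`, not verified against the Poplar headers or DaCe's dtypes.py:

```python
# Hedged sketch of an IPU type map alongside _CTYPES/_OCL_TYPES in
# dtypes.py; the right-hand-side names are assumptions for illustration.
_IPU_TYPES = {
    None: "void",
    bool: "bool",
    int: "int",
    float: "float",
}

def ipu_type(py_type):
    """Look up the IPU-side type string for a Python type."""
    return _IPU_TYPES[py_type]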
Init code is for the part of the code the used library requires at initialization (MPI_Init), and finalize is for the code needed at finalization (MPI_Finalize).
From what I see, engine.run(0) is similar to a CUDA kernel launch. Wouldn't it be better to generate this in target/ipu.py?
If implemented like this, wouldn't there be problems if you need to call .run more than once?
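The reviewer's point can be sketched as a split between one-time setup and per-invocation launches. `Engine` here is a stand-in, not the real poplar::Engine API, and the `__dace_init_ipu`/`__dace_exit_ipu` names are assumptions modeled on DaCe's per-target init/exit convention:

```python
# Hedged sketch: build the engine once in init code, call run() per
# program invocation, so that repeated runs work.

class Engine:
    """Stand-in for an IPU engine; counts launches."""
    def __init__(self):
        self.runs = 0

    def run(self, program_id):
        self.runs += 1
        return f"ran program {program_id}"

# __dace_init_ipu (assumed name): one-time device/engine setup.
engine = Engine()

# Program body: the generated code may launch many times,
# so run() must not live in the init/exit section.
for _ in range(3):
    engine.run(0)

# __dace_exit_ipu (assumed name) would release the device here.
```

If `engine.run(0)` were generated in the init code instead, a second call into the SDFG entry point would never reach the device.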
Due to how Graphcore requires you to use the poplar library, using it as a library is unavoidable. I want to point out that you can still put as much as possible in the IPU code-gen and have the IPU code-gen make this library mandatory.
Good luck and success in your DaCe backend implementation!
…lude, debugging issue on real IPU machine, pushing incomplete changes"
…ted on another testcase
… for transients and scope must be inside state
…snode is not IPU_Memory
…s no schedule= in code
…_memory() triggers
This PR is a draft with lots of learnings from the fpga.py and cuda.py backends.
The goal of the PR is to add a Graphcore backend to DaCe and map DaCe graphs onto Graphcore graphs.
Helpful comments on parts I might be doing wrong would be super useful for the next set of patches.