-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Variorium Connector: kernel granularity and JSON #265
base: develop
Are you sure you want to change the base?
Conversation
Please explain the behavior of the connector tool including options. What is the minimum variorum version required? We should check for that when trying to find |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide sample output.
#include <inttypes.h> | ||
#include <chrono> | ||
#include <cstdlib> | ||
#include <cstring> | ||
#include <vector> | ||
#include <unordered_set> | ||
#include <string> | ||
#include <regex> | ||
#include <ctime> | ||
#include <cxxabi.h> | ||
#include <dlfcn.h> | ||
#include <ctime> | ||
#include <chrono> | ||
#include <iostream> | ||
#include <fstream> | ||
|
||
#include <inttypes.h> | ||
#include <iostream> | ||
#include <regex> | ||
#include <stdio.h> | ||
#include <string> | ||
#include <unordered_set> | ||
#include <vector> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which of these header files are actually required? Please make sure to only use what you need.
int power = -1; | ||
int power1 = 0; | ||
int power2 = 0; | ||
int filemake = -1; | ||
long long time1 = 0; | ||
long long time2 = 0; | ||
uint32_t gdevID = -1; | ||
std::vector<float> gpu_powers; | ||
std::vector<float> gpu_powers2; | ||
std::string output = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are all these global variables used for? Please add comments in the code explaining their usage.
int type_of_profiling = | ||
0; // 0 is for both print power & json, 1 is for print power, 2 is for json | ||
bool usingMPI = false; | ||
bool verbosePrint = false; | ||
bool mpiOutPut = false; | ||
0; // 0 is for both print power & json, 1 is for print power, 2 is for json | ||
bool usingMPI = false; | ||
bool verbosePrint = true; | ||
bool mpiOutPut = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all of these still used/necessary?
/*void printFile() { | ||
|
||
std::ofstream file("variorumoutput.txt", std::ios::app); // Open in append | ||
mode if (!file) { std::cerr << "Error creating the file!" << std::endl; } else { | ||
std::cout << "File created or opened successfully." << std::endl; | ||
} | ||
file.close(); // Close the file | ||
i}*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove unused code.
file.close(); // Close the file | ||
i}*/ | ||
void printFile() { | ||
const std::string filename = "variorumoutput.txt"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make this configurable via an environment variable.
// std::cout << "Number of Sockets: " << num_sockets << | ||
// std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// std::cout << "Number of Sockets: " << num_sockets << | |
// std::endl; |
char *num_sockets_pos = strstr(s, "\"num_gpus_per_socket\":"); | ||
if (num_sockets_pos != nullptr) { | ||
num_sockets_pos += strlen("\"num_gpus_per_socket\":"); | ||
|
||
char *num_sockets_end_pos = strchr(num_sockets_pos, ','); | ||
if (num_sockets_end_pos == nullptr) { | ||
num_sockets_end_pos = strchr(num_sockets_pos, '}'); | ||
} | ||
if (num_sockets_end_pos != nullptr) { | ||
|
||
std::string num_sockets_str(num_sockets_pos, | ||
num_sockets_end_pos - num_sockets_pos); | ||
|
||
num_sockets = std::stoll(num_sockets_str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please try using jansson here as well (and below).
std::cout << " Gpu start power: " << gpu_powers2[gdevID] | ||
<< " Gpu end power: " << gpu_powers[gdevID] | ||
<< " Energy Estimation: " | ||
<< ((gpu_powers[gdevID] + gpu_powers2[0]) / 2) * | ||
((temp - time1) * .001) | ||
<< std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just not print anything to stdout but only to the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fine to me.
/* std::cout << "kokkos library call\n" << std::endl; | ||
if (usingMPI) { | ||
variorum_call_mpi(); | ||
} else { | ||
variorum_call(); | ||
} | ||
time_t total_time; | ||
time_t end_time; | ||
time(&end_time); | ||
std::cout << "End Time: " << end_time << "\nStart Time: " << start_time | ||
<< "\n"; | ||
total_time = end_time - start_time; | ||
|
||
std::cout << "The kokkos library was alive for " << total_time << " | ||
seconds." | ||
<< std::endl;*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* std::cout << "kokkos library call\n" << std::endl; | |
if (usingMPI) { | |
variorum_call_mpi(); | |
} else { | |
variorum_call(); | |
} | |
time_t total_time; | |
time_t end_time; | |
time(&end_time); | |
std::cout << "End Time: " << end_time << "\nStart Time: " << start_time | |
<< "\n"; | |
total_time = end_time - start_time; | |
std::cout << "The kokkos library was alive for " << total_time << " | |
seconds." | |
<< std::endl;*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twilk10 Is this suggestion OK with you? If so, can you commit it?
It is good with me.
std::ostream &operator<<( | ||
std::ostream &os, | ||
const Kokkos::Tools::Experimental::ExecutionSpaceIdentifier &identifier) { | ||
|
||
os << " Type: " << identifier.type << ""; | ||
os << " Device ID: " << identifier.device_id << ""; | ||
gdevID = identifier.device_id; | ||
os << " Instance ID: " << identifier.instance_id; | ||
// output += " Device ID: " + std::to_string(identifier.device_id) + | ||
// " Instance ID: " + | ||
// std::to_string(identifier.instance_id); | ||
writeToFile("variorumoutput.txt", output); | ||
return os; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't overload operator<<
but just write to the output file directly.
Please use |
+1 @twilk10 Thanks for submitting this PR! To provide an explanation of this connector tool’s options and a high-level behavior from the Kokkos user perspective: maybe you can add |
Thank you I will work on getting that information up as soon as possible. |
Thanks! Be sure to run clang-format-8 on each code file that you changed in this PR, as @masterleinad mentions in one of his comments. This is a prereq for your PR to be merged. Note that this is the CI check that doesn't pass. |
I ran clang-format 8 on it I will have to evaluate what went wrong and try it again. |
@twilk10 OK. You want to do something like:
You can diff in between to see whether and/or how the file has indeed changed and then If that doesn't work, can you send the error output with the flag |
@twilk10 Can you look at @masterleinad 's comments particularly on printing output and parsing json with jansson? I think they are good suggestions and maybe you can commit those before committing other changes with clang-format. Also, make sure to note to not include any unneeded headers. |
I will take a look and try updating withthe help of these suggestions. |
I am having trouble getting access to clang-format-8 but I can get other versions. |
I have solved this issue. |
Great! All checks have passed for clang-format-8. Can you address the comments by @masterleinad ? (You can commit suggestions if you agree). I also updated the title to capture the changes you made in your code better. @masterleinad Are there any other things to resolve here? |
I am currently working on updating the connector to address the comments made by @masterleinad |
@@ -1,4 +1,5 @@ | |||
//@HEADER | |||
//@HEADER140 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//@HEADER140 | |
//@HEADER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still have HEADER140
here.
std::ostream &operator<<( | ||
std::ostream &os, | ||
const Kokkos::Tools::Experimental::DeviceType &deviceType) { | ||
switch (deviceType) { | ||
case Kokkos::Tools::Experimental::DeviceType::Serial: | ||
os << "CPU"; | ||
output += "Type: CPU"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::OpenMP: | ||
os << "OpenMP"; | ||
output += "Type: OpenMP"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::Cuda: | ||
os << "cuda"; | ||
output += "Type: CUDA"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::HIP: | ||
os << "hip"; | ||
output += "Type: HIP"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::OpenMPTarget: | ||
os << "openmptarget"; | ||
output += "Type: openmptarget"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::HPX: | ||
os << "hpx"; | ||
output += "Type: hpx"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::Threads: | ||
os << "threads"; | ||
output += "Type: threads"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::SYCL: | ||
os << "sycl"; | ||
output += "Type: SYCL"; | ||
break; | ||
case Kokkos::Tools::Experimental::DeviceType::OpenACC: | ||
os << "openacc"; | ||
output += "Type: OPENACC"; | ||
break; | ||
|
||
default: | ||
os << "Unknown Device Type"; | ||
output += "Type: Uknown Device Type"; | ||
break; | ||
} | ||
time_t total_time; | ||
time_t end_time; | ||
time(&end_time); | ||
std::cout << "End Time: " << end_time << "\nStart Time: " << start_time | ||
<< "\n"; | ||
total_time = end_time - start_time; | ||
|
||
std::cout << "The kokkos library was alive for " << total_time << " seconds." | ||
<< std::endl; | ||
return os; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define a function such as
kokkos-tools/profiling/space-time-stack/kp_space_time_stack.cpp
Lines 54 to 67 in 5e895a3
Space get_space(SpaceHandle const& handle) { | |
// check that name starts with "Cuda" | |
if (strncmp(handle.name, "Cuda", 4) == 0) return SPACE_CUDA; | |
// check that name starts with "SYCL" | |
if (strncmp(handle.name, "SYCL", 4) == 0) return SPACE_SYCL; | |
// check that name starts with "OpenMPTarget" | |
if (strncmp(handle.name, "OpenMPTarget", 12) == 0) return SPACE_OMPT; | |
// check that name starts with "HIP" | |
if (strncmp(handle.name, "HIP", 3) == 0) return SPACE_HIP; | |
if (strcmp(handle.name, "Host") == 0) return SPACE_HOST; | |
abort(); | |
return SPACE_HOST; | |
} |
Kokkos::Tools::Experimental::DeviceType
implicitly printable.
This is an update to the Kokkos-tool variorum connector and can be used to get the power data an Compatible with variorum version 6.0 |
You pushed your build directory. You need to remove it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this file.
@@ -1,4 +1,5 @@ | |||
//@HEADER | |||
//@HEADER140 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
//#include <cxxabi.h> | ||
//#include <dlfcn.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//#include <cxxabi.h> | |
//#include <dlfcn.h> |
If you don't need it, remove it.
if (gdevID == 0) { | ||
switch (gdevID) { | ||
case 1: { | ||
json_t* gpu_value = json_object_get(power_gpu_watts, "GPU_1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
json_t* gpu_value = json_object_get(power_gpu_watts, "GPU_1"); | |
json_t* gpu_value = json_object_get(power_gpu_watts, "GPU_" + std::to_string(gdevID)); |
and remove all the switch cases.
output = " Energy Estimation " + std::to_string(((power1 + power2) / 2) * | ||
((temp - time1) * .001)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the unit?
} | ||
} catch (int e) { | ||
std::cout << "No MPI Option provided, not using per rank output" | ||
<< std::endl; | ||
usingMPI = false; | ||
// usingMPI = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// usingMPI = false; |
std::ostream& operator<<( | ||
std::ostream& os, | ||
const Kokkos::Tools::Experimental::ExecutionSpaceIdentifier& identifier) { | ||
gdevID = identifier.device_id; | ||
|
||
output += | ||
" Device ID: " + std::to_string(identifier.device_id) + | ||
" Instance ID: " + std::to_string(identifier.instance_id) + | ||
" DeviceType: " + | ||
deviceTypeToString(static_cast<Kokkos::Tools::Experimental::DeviceType>( | ||
identifier.device_id)); | ||
|
||
writeToFile(filename, output); | ||
return os; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't introduce an overload for operator<<
but use a stringstream
or write to the file immediately instead.
writeToFile(filename, "name: " + std::string(name)); | ||
gdevID = devID; | ||
auto result = Kokkos::Tools::Experimental::identifier_from_devid(devID); | ||
std::cout << result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't print to cout
anywhere but only to the output file.
Please provide output from the incremental tests or so. |
// The variable below assist with logic and serve as a way to save measurements | ||
// for calaculation | ||
int power = -1; | ||
int power1 = 0; | ||
int power2 = 0; | ||
int filemake = -1; | ||
long long time1 = 0; | ||
long long time2 = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be more precise?
// This is a global variable instituted to track what the device ID is | ||
uint32_t gdevID = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need a global variable for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for catching that.
bool verbosePrint = false; | ||
bool mpiOutPut = false; | ||
|
||
bool verbosePrint = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still need this? What does it do?
} | ||
|
||
void printFile() { | ||
// std::string filename = getFile("KOKKOS_TOOLS_VARIORUM_OUTPUT_FILE"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this commented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To give the option to users to use environment variable if they chose to or have the file variorumoutput.txt put directly into their directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right but why is it not active? If you want to fall back to "variorumoutput.txt", you should not error out in getFile
but rather use that if the environment variable couldn't be parsed correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good idea
@@ -89,19 +130,19 @@ bool mpiOutPut = false; | |||
// value. | |||
std::string variorum_print_power_call() { | |||
std::string outputString; | |||
json_t* power_obj = json_object(); | |||
char* s = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
char* s = NULL; | |
char* s = nullptr; |
json_t* root = NULL; | ||
json_t* socket_0 = NULL; | ||
json_t* timestamp_value = NULL; | ||
json_t* power_gpu_watts = NULL; | ||
json_t* gpu_0_value = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
json_t* root = NULL; | |
json_t* socket_0 = NULL; | |
json_t* timestamp_value = NULL; | |
json_t* power_gpu_watts = NULL; | |
json_t* gpu_0_value = NULL; | |
json_t* root = nullptr; | |
json_t* socket_0 = nullptr; | |
json_t* timestamp_value = nullptr; | |
json_t* power_gpu_watts = nullptr; | |
json_t* gpu_0_value = nullptr; |
fprintf(stderr, "Expected 'power_gpu_watts' to be an object.\n"); | ||
} | ||
|
||
std::cout << "Device ID: " << gdevID << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::cout << "Device ID: " << gdevID << std::endl; |
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert deleting this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which line are you referring to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at the comments via https://github.com/kokkos/kokkos-tools/pull/265/files.
int power = -1; // This variable is used to keep track of if variorum call has | ||
// been called twice to check if the variables need to be reset | ||
// for the next kernel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't a bool
be better?
int filemake = -1; // This variable is to check if the file has already been | ||
// made earlier in the program |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't a bool
make more sense?
double power1 = | ||
0; // This variable is used to obtain the initial power for calculation | ||
double power2 = | ||
0; // This variable is used to obtain the final power for calculation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be a double array.
long long time1 = | ||
0; // This variable is used to obtain the initial time for calculation | ||
long long time2 = | ||
0; // This variable is used to obtain the final time for calculation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be a double array.
inline std::string getFile(const char* env_var_name) { | ||
char* parsed_output_file = getenv(env_var_name); | ||
if (!parsed_output_file) { | ||
std::cerr << "Couldn't parse KOKKOS_TOOLS_VARIORUM_OUTPUT_FILE environment " | ||
"variable!\n"; | ||
std::abort(); | ||
"variable! Printed to variorumoutput.txt\n"; | ||
// parsed_output_file = "variorumoutput.txt"; | ||
char vararr[19] = "variorumoutput.txt"; | ||
parsed_output_file = vararr; | ||
} | ||
return std::string(parsed_output_file); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be sufficient to call this when Kokkos
is initialized?
// parsed_output_file = "variorumoutput.txt"; | ||
char vararr[19] = "variorumoutput.txt"; | ||
parsed_output_file = vararr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// parsed_output_file = "variorumoutput.txt"; | |
char vararr[19] = "variorumoutput.txt"; | |
parsed_output_file = vararr; | |
return "variorumoutput.txt"; |
@@ -140,7 +149,7 @@ std::string variorum_print_power_call() { | |||
power_node = json_real_value(json_object_get(power_obj, "power_node")); | |||
const char* hostnameChar = | |||
json_string_value(json_object_get(power_obj, "hostname")); | |||
|
|||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks suspicious.
std::cout << "No MPI Option provided, not using per rank output" | ||
<< std::endl; | ||
|
||
} | ||
// Simple timer code to keep track of the general amount of time the | ||
// application ran for. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is any of this still used?
Current out put with Incremental test
|
// Initial and final power values for a kernel | ||
double global_power[2] = {0, 0}; | ||
// Initial and final time value for a kernel | ||
long long global_time[2] = {0, 0}; | ||
uint32_t global_device_id = -1; | ||
uint32_t global_instance_id = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obvisously, this tool is not thread-safe.
uint32_t global_instance_id = -1; | ||
std::string global_filename; | ||
std::string global_kernel_name; | ||
std::string global_device_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need so many global variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to record a bunch of information in the "begin kernel" calls that we are using in the "end kernel" calls. We can't pass that through the API interfaces directly.
Updates to the current connector with updates such as json updates, energy estimation, device information. It has also been altered to get power information at the kernel level.