Unable to cast Python instance to C++ type of TensorRT 8.4 when running INT8 calibration on GPU A100 #3871

Closed
yjiangling opened this issue May 16, 2024 · 2 comments


@yjiangling

When I try to run INT8 quantization in Python, it always gives the following error during calibration:

[05/16/2024-18:22:28] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2904, GPU 74855 (MiB)
[05/16/2024-18:22:28] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2904, GPU 74863 (MiB)
[05/16/2024-18:22:28] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +10, now: CPU 2904, GPU 74839 (MiB)
[05/16/2024-18:22:28] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2904, GPU 74847 (MiB)
[05/16/2024-18:22:28] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +16, now: CPU 130, GPU 272 (MiB)
[05/16/2024-18:22:28] [TRT] [I] Starting Calibration.
[ERROR] Exception caught in get_batch(): Unable to cast Python instance to C++ type (compile in debug mode for details)
[05/16/2024-18:22:30] [TRT] [I] Post Processing Calibration data in 2.704e-06 seconds.
[05/16/2024-18:22:30] [TRT] [E] 1: Unexpected exception _Map_base::at
Failed to create the engine

How can I fix it? The calibrator class, including its get_batch() method, is implemented like this:

class ASRCalibrator(trt.IInt8EntropyCalibrator2):
	def __init__(self, calibration_files=[], batch_size=1, cache_file="", preprocess_func=None):
		super().__init__()
		self.cache_file = cache_file
		self.batch_size = batch_size
		self.files = calibration_files
		self.batch = (None, None)

		self.batches = self.load_batches()
		self.preprocess_func = preprocess_func

	def get_batch_size(self):
		return self.batch_size

	def load_batches(self):
		for filename in self.files:
			self.batch = self.preprocess_func(filename)
			yield self.batch

	def get_batch(self, names):
		try:
			batch = next(self.batches)
			data, data_len = batch

			device_input0 = cuda.mem_alloc(data.nbytes)
			device_input1 = cuda.mem_alloc(data_len.nbytes)

			# Copy the calibration data from the CPU to the GPU
			cuda.memcpy_htod(device_input0, data.ravel())
			cuda.memcpy_htod(device_input1, data_len.ravel())

			return [(device_input0, data.shape), (device_input1, data_len.shape)]

		except StopIteration:
			return []

	def read_calibration_cache(self):
		# If the calibration cache file exists, read the calibration table directly from it
		if os.path.exists(self.cache_file):
			with open(self.cache_file, "rb") as f:
				return f.read()

	def write_calibration_cache(self, cache):
		# If calibration was run, write the calibration table to a file so it can be reused next time
		with open(self.cache_file, "wb") as f:
			f.write(cache)
			f.flush()
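
A probable cause, going by the TensorRT Python API documentation for get_batch(): the method is expected to return a list of device addresses as plain Python ints, one per network input, while the code above returns (allocation, shape) tuples, which the binding cannot convert. The sketch below shows that return contract in isolation. It is not a confirmed fix for this issue. BatchFeeder, FakeAllocation, fake_copy, the alloc/copy_htod parameters, and the input names "data" and "data_len" are all hypothetical, introduced only so the example runs without a GPU. With pycuda, alloc would be cuda.mem_alloc and copy_htod would be cuda.memcpy_htod.

```python
from typing import Any, Callable, Iterable, Iterator, List, Sequence


class BatchFeeder:
    """get_batch() logic kept separate from TensorRT so it can run anywhere.

    `alloc` and `copy_htod` are injected (hypothetical parameter names). With
    pycuda they would be cuda.mem_alloc and cuda.memcpy_htod.
    """

    def __init__(self, batches: Iterable[Sequence[bytes]],
                 alloc: Callable[[int], Any],
                 copy_htod: Callable[[Any, bytes], None]) -> None:
        self._batches: Iterator[Sequence[bytes]] = iter(batches)
        self._alloc = alloc
        self._copy_htod = copy_htod
        # Keep the allocations referenced: pycuda frees device memory when
        # the DeviceAllocation object is garbage collected.
        self._live: List[Any] = []

    def get_batch(self, names: Sequence[str]) -> List[int]:
        try:
            host_buffers = next(self._batches)
        except StopIteration:
            return []  # no more calibration data
        self._live = []
        for buf in host_buffers:
            dev = self._alloc(len(buf))  # use arr.nbytes for NumPy arrays
            self._copy_htod(dev, buf)
            self._live.append(dev)
        # One plain int address per input, in the same order as `names`.
        return [int(dev) for dev in self._live]


class FakeAllocation:
    """CPU-only stand-in for pycuda's DeviceAllocation (supports int())."""

    _next_address = 0x1000

    def __init__(self, nbytes: int) -> None:
        self.data = bytearray(nbytes)
        self.address = FakeAllocation._next_address
        FakeAllocation._next_address += max(nbytes, 1)

    def __int__(self) -> int:
        return self.address


def fake_copy(dst: FakeAllocation, src: bytes) -> None:
    """Stand-in for a host-to-device copy."""
    dst.data[:] = src


feeder = BatchFeeder(
    batches=[(b"\x00" * 16, b"\x04\x00\x00\x00")],  # (data, data_len) as raw bytes
    alloc=FakeAllocation,
    copy_htod=fake_copy,
)
first = feeder.get_batch(["data", "data_len"])   # two int addresses
second = feeder.get_batch(["data", "data_len"])  # data exhausted
```

In the calibrator above, the equivalent change would be to store device_input0 and device_input1 on self and return [int(device_input0), int(device_input1)]. Storing them matters because returning only the integer addresses would otherwise let the local allocations be freed as soon as get_batch() returns.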
@yjiangling
Author

@rmccorm4 Hi, I wrote the get_batch() function following your instructions in issue https://github.com/NVIDIA/TensorRT/issues/688, but it still raises the error: RuntimeError: Unable to cast Python instance to C++ type (compile in debug mode for details). Could you please help me check what's wrong? Thank you very much!

@liyuli1997
+1
