add toCUDA
chenxinfeng committed Apr 5, 2024
1 parent e312a1a commit 130c07f
Showing 3 changed files with 109 additions and 10 deletions.
61 changes: 53 additions & 8 deletions README.md
@@ -8,16 +8,18 @@

English Version | [中文版本](./README_CN.md) | [Resume 开发者简历 陈昕枫](https://gitee.com/lilab/chenxinfeng-cv/blob/master/README.md)

ffmpegcv provides a Video Reader and Video Writer with an ffmpeg backbone, which are faster and more powerful than cv2. Integrating ffmpegcv into your deep-learning pipeline is very smooth.

- ffmpegcv is API **compatible** with open-cv.
- ffmpegcv can use **GPU-accelerated** encoding and decoding*.
- ffmpegcv supports many more video **codecs** than open-cv.
- ffmpegcv supports **RGB** & BGR & GRAY formats as you like.
- ffmpegcv supports fp32 CHW & HWC formats as a shortcut to CUDA.
- ffmpegcv supports **stream reading** (IP camera) with low latency.
- ffmpegcv supports ROI operations. You can **crop**, **resize** and **pad** the ROI.
- ffmpegcv supports a shortcut for CUDA memory copy.

In all, ffmpegcv is just similar to the opencv API, but it has more codecs, doesn't require opencv to be installed, and fits well into deep-learning pipelines.

## Functions:
- `VideoWriter`: Write a video file.
@@ -28,6 +30,7 @@
- `VideoCaptureStream`: Read a RTP/RTSP/RTMP/HTTP stream.
- `VideoCaptureStreamRT`: Read a RTSP stream (IP Camera) in real time low latency as possible.
- `noblock`: Read/Write a video file in the background using multiprocessing.
- `toCUDA`: Load a video/stream into a CUDA device as CHW/HWC float32, >2x faster.
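
The idea behind `noblock` — prefetching frames on a background worker so that decoding overlaps with downstream work — can be sketched in a few lines. This is a toy illustration only, not ffmpegcv's real implementation (which uses multiprocessing; a thread and a made-up `FakeReader` are used here to keep the sketch self-contained):

```python
import queue
import threading

class NoblockReader:
    """Toy sketch: prefetch frames on a background thread so read() never
    blocks on decoding. ffmpegcv's real `noblock` uses multiprocessing."""
    def __init__(self, reader, maxsize=4):
        self._q = queue.Queue(maxsize=maxsize)
        self._t = threading.Thread(target=self._worker, args=(reader,), daemon=True)
        self._t.start()

    def _worker(self, reader):
        while True:
            ret, frame = reader.read()
            self._q.put((ret, frame))  # blocks when the queue is full
            if not ret:
                break

    def read(self):
        return self._q.get()

class FakeReader:
    """Stand-in for an ffmpegcv.VideoCapture with n frames."""
    def __init__(self, n):
        self._n = n

    def read(self):
        if self._n > 0:
            self._n -= 1
            return True, 'frame'
        return False, None

cap = NoblockReader(FakeReader(3))
frames = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)
```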

## Install
You need to download ffmpeg before you can use ffmpegcv.
@@ -38,15 +41,16 @@ You need to download ffmpeg before you can use ffmpegcv.
#1D. CONDA: conda install ffmpeg=6.0.0 #don't use the default 4.x.x version
#2. python
pip install ffmpegcv #stable version
pip install git+https://github.com/chenxinfeng4/ffmpegcv #latest version
```

## When you should choose `ffmpegcv` over `opencv`:
- `opencv` is hard to install. ffmpegcv only requires `numpy` and `FFmpeg`, and works across Mac/Windows/Linux platforms.
- `opencv` bundles a large image-processing toolbox. You just want simple video/camera IO with GPU access.
- `opencv` doesn't support `h264`/`h265` and other video writers.
- You want to **crop**, **resize** and **pad** the video/camera ROI.
- You are building a deep-learning pipeline.

## Basic example
Read a video by CPU, and rewrite it by GPU.
```python
cap = ffmpegcv.VideoCaptureCAM(0)
cap = ffmpegcv.VideoCaptureCAM("Integrated Camera")
```

A deep-learning pipeline:
```python
# video -> crop -> resize -> RGB -> CUDA:CHW float32 -> model
cap = ffmpegcv.toCUDA(
ffmpegcv.VideoCaptureNV(file, pix_fmt='nv12', resize=(W,H)),
tensor_format='CHW')

for frame_CHW_cuda in cap:
frame_CHW_cuda = (frame_CHW_cuda - mean) / std
result = model(frame_CHW_cuda)
```
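
In loops like the one above, `mean` and `std` are per-channel statistics you supply yourself; nothing in ffmpegcv defines them. A minimal NumPy sketch, assuming hypothetical ImageNet-style values scaled to the 0-255 range of decoded frames:

```python
import numpy as np

# Hypothetical ImageNet-style per-channel stats, shaped (3, 1, 1) so they
# broadcast over a CHW frame; scaled by 255 because decoded frames are 0-255.
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1) * 255
std = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1) * 255

frame_CHW = np.random.randint(0, 256, (3, 4, 4)).astype(np.float32)  # stand-in frame
normed = (frame_CHW - mean) / std
```

With `toCUDA` the same arithmetic runs on the GPU; with pytorch tensors the broadcasting rule is identical.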
## Cross platform

ffmpegcv is based on Python+FFmpeg and works across `Windows, Linux, Mac, X86, Arm` systems.
@@ -172,6 +187,36 @@ cap = ffmpegcv.VideoCapture(file, resize=(640, 480), resize_keepratio=True)
cap = ffmpegcv.VideoCapture(file, crop_xywh=(0, 0, 640, 480), resize=(512, 512))
```

## toCUDA device
---
ffmpegcv can convert a video/stream from HWC uint8 on the CPU to CHW float32 on a CUDA device. This significantly reduces your CPU load and is more than 2x faster than converting manually.

Prepare your environment: a CUDA toolchain and the `pycuda` package are required; the `pytorch` package is optional.
> nvcc --version # check that the NVIDIA CUDA compiler is installed
> pip install pycuda # install pycuda
```python
# Read a video file to a CUDA device, the manual way
cap = ffmpegcv.VideoCaptureNV(file, pix_fmt='rgb24')
ret, frame_HWC_CPU = cap.read()
frame_CHW_CUDA = torch.from_numpy(frame_HWC_CPU).permute(2, 0, 1).cuda().contiguous().float() # 120fps, 1200% CPU load

# speed up
cap = toCUDA(ffmpegcv.VideoCapture(file, pix_fmt='yuv420p')) #required: yuv420p for the cpu codec
cap = toCUDA(ffmpegcv.VideoCaptureNV(file, pix_fmt='nv12')) #required: nv12 for the gpu codec

ret, frame_CHW_pycuda = cap.read() #380fps, 200% CPU load, [pycuda array]
ret, frame_CHW_pycudamem = cap.read_cudamem() #same as [pycuda mem_alloc]
ret, frame_CHW_CUDA = cap.read_torch() #same as [pytorch tensor]

frame_CHW_pycuda[:] = (frame_CHW_pycuda - mean) / std #normalize
```

Why is `toCUDA` faster in your deep-learning pipeline?
> 1. ffmpeg uses the CPU to convert the video pix_fmt from the original YUV to RGB24, which is slow. ffmpegcv uses CUDA to accelerate the pix_fmt conversion.
> 2. Using `yuv420p` or `nv12` saves CPU load and reduces the memory copied from CPU to GPU.
> 3. ffmpeg stores images in HWC format. ffmpegcv can use HWC & CHW formats to accelerate video reading.

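
Step 1 above can be sketched on the CPU with NumPy to show what the pix_fmt conversion involves. The full-range BT.601 coefficients and nearest-neighbour chroma upsampling here are assumptions for illustration; the actual CUDA kernel in ffmpegcv may differ:

```python
import numpy as np

def yuv420p_to_rgb(yuv, h, w):
    """Convert one yuv420p frame (flat uint8 buffer) to an HWC RGB float32
    array. yuv420p stores a full-resolution Y plane followed by quarter-size
    U and V planes, i.e. 1.5 bytes/pixel versus 3 for rgb24."""
    y = yuv[: h * w].reshape(h, w).astype(np.float32)
    u = yuv[h * w : h * w + h * w // 4].reshape(h // 2, w // 2).astype(np.float32)
    v = yuv[h * w + h * w // 4 :].reshape(h // 2, w // 2).astype(np.float32)
    # upsample chroma planes to full resolution (nearest neighbour)
    u = u.repeat(2, axis=0).repeat(2, axis=1) - 128.0
    v = v.repeat(2, axis=0).repeat(2, axis=1) - 128.0
    # full-range BT.601 (JPEG) coefficients
    r = y + 1.402 * v
    g = y - 0.344136 * u - 0.714136 * v
    b = y + 1.772 * u
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255)

# a 2x2 mid-gray frame: Y=128, zero chroma (U=V=128) -> RGB (128, 128, 128)
frame = np.full(6, 128, dtype=np.uint8)  # 2*2*1.5 = 6 bytes
rgb = yuv420p_to_rgb(frame, 2, 2)
```

Doing this per pixel on the CPU is exactly the cost `toCUDA` moves onto the GPU, and it also shows why shipping yuv420p/nv12 across PCIe copies half the bytes of rgb24.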
## Video Writer
---
```python
50 changes: 48 additions & 2 deletions README_CN.md
@@ -8,7 +8,7 @@

[English Version](./README.md) | 中文版本 | [Resume 开发者简历 陈昕枫](https://gitee.com/lilab/chenxinfeng-cv/blob/master/README.md)

ffmpegcv provides an ffmpeg-based video reader and video writer, faster and more powerful than cv2, and well suited to deep-learning video processing.

- ffmpegcv has an API **compatible** with open-cv.
- ffmpegcv can use **GPU-accelerated** encoding and decoding.
@@ -17,6 +17,7 @@ ffmpegcv provides an ffmpeg-based video reader and video writer, faster and more powerful than cv2
- ffmpegcv supports network **stream reading** (IP surveillance cameras).
- ffmpegcv supports ROI operations: you can **crop**, **resize** and **pad** the ROI.
In all, ffmpegcv's API is very similar to opencv's, but it has more codecs and doesn't require opencv to be installed.
- ffmpegcv supports exporting frames to a CUDA device.

## Functions:
- `VideoWriter`: write a video file.
@@ -27,6 +28,7 @@ ffmpegcv provides an ffmpeg-based video reader and video writer, faster and more powerful than cv2
- `VideoCaptureStream`: read an RTP/RTSP/RTMP/HTTP stream.
- `VideoCaptureStreamRT`: read an RTSP stream (IP surveillance camera) in real time with low latency.
- `noblock`: read/write a video file in the background using multiprocessing (faster).
- `toCUDA`: export frames to a CUDA device in CHW/HWC float32 format, >2x faster.

## Install
You need to download `ffmpeg` before you can use ffmpegcv.
@@ -37,7 +39,8 @@ ffmpegcv provides an ffmpeg-based video reader and video writer, faster and more powerful than cv2
#1D. CONDA: conda install ffmpeg=6.0.0
#2. python
pip install ffmpegcv #stable version
pip install git+https://github.com/chenxinfeng4/ffmpegcv #latest version
```

## When to choose `ffmpegcv` over `opencv`
@@ -67,6 +70,18 @@ cap = ffmpegcv.VideoCaptureCAM(0)
cap = ffmpegcv.VideoCaptureCAM("Integrated Camera")
```

A deep-learning pipeline:
```python
# video -> crop -> resize -> RGB -> CUDA:CHW float32 -> model
cap = ffmpegcv.toCUDA(
ffmpegcv.VideoCaptureNV(file, pix_fmt='nv12', resize=(W,H)),
tensor_format='CHW')

for frame_CHW_cuda in cap:
frame_CHW_cuda = (frame_CHW_cuda - mean) / std
result = model(frame_CHW_cuda)
```

## GPU acceleration
- NVIDIA GPUs only; tested on x86_64.
- Native support for **Windows**, **Linux** and **Anaconda**.
@@ -167,6 +182,37 @@ cap = ffmpegcv.VideoCapture(file, resize=(640, 480), resize_keepratio=True)
cap = ffmpegcv.VideoCapture(file, crop_xywh=(0, 0, 640, 480), resize=(512, 512))
```

## toCUDA: export frames to a CUDA device quickly
---
ffmpegcv can convert a video/stream from HWC uint8 on the CPU to CHW float32 on a CUDA device. This significantly reduces your CPU load and is more than 2x faster than converting manually.

Prepare your environment: a CUDA toolchain and the `pycuda` package are required; the `pytorch` package is optional.
> nvcc --version # check that the NVIDIA CUDA compiler is installed
> pip install pycuda # install pycuda
```python
# Read a video to a CUDA device, before the speed-up
cap = ffmpegcv.VideoCaptureNV(file, pix_fmt='rgb24')
ret, frame_HWC_CPU = cap.read()
frame_CHW_CUDA = torch.from_numpy(frame_HWC_CPU).permute(2, 0, 1).cuda().contiguous().float() # 120fps, 1200% CPU load

# after the speed-up
cap = toCUDA(ffmpegcv.VideoCapture(file, pix_fmt='yuv420p')) #required: yuv420p for the cpu codec
cap = toCUDA(ffmpegcv.VideoCaptureNV(file, pix_fmt='nv12')) #required: nv12 for the gpu codec

ret, frame_CHW_pycuda = cap.read() #380fps, 200% CPU load, [pycuda array]
ret, frame_CHW_pycudamem = cap.read_cudamem() #same as [pycuda mem_alloc]
ret, frame_CHW_CUDA = cap.read_torch() #same as [pytorch tensor]

frame_CHW_pycuda[:] = (frame_CHW_pycuda - mean) / std #normalize
```

Why is `toCUDA` faster in a deep-learning pipeline?

> 1. ffmpeg uses the CPU to convert the video pix_fmt from the original YUV to RGB24, which is slow. `toCUDA` uses CUDA to accelerate the pix_fmt conversion.
> 2. Using yuv420p or nv12 saves CPU load and reduces the memory copied from CPU to GPU.
> 3. ffmpeg stores images in HWC format. ffmpegcv can use HWC & CHW formats to accelerate video reading.

## Video Writer
---
```python
8 changes: 8 additions & 0 deletions ffmpegcv/__init__.py
@@ -517,3 +517,11 @@ def VideoCapturePannels(
)

VideoReaderPannels = VideoCapturePannels


def toCUDA(vid: FFmpegReader, gpu: int = 0, tensor_format: str = 'chw') -> FFmpegReader:
"""
Convert frames to CUDA tensor float32 in 'chw' or 'hwc' format.
"""
from ffmpegcv.ffmpeg_reader_cuda import FFmpegReaderCUDA
return FFmpegReaderCUDA(vid, gpu, tensor_format)
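
`toCUDA` relies on a wrapper pattern: a reader that wraps another reader and post-processes each frame. A CPU-only toy illustration with made-up class names (not the real `FFmpegReaderCUDA`, which does the conversion on the GPU):

```python
import numpy as np

class FakeReader:
    """Stand-in for an ffmpegcv reader; yields the frames it was given."""
    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None

class ToCHWFloat32:
    """Wraps a reader and returns each frame as CHW float32 — a CPU
    stand-in for what FFmpegReaderCUDA does on the CUDA device."""
    def __init__(self, vid):
        self.vid = vid

    def read(self):
        ret, frame_hwc = self.vid.read()
        if not ret:
            return False, None
        return True, frame_hwc.transpose(2, 0, 1).astype(np.float32)

cap = ToCHWFloat32(FakeReader([np.zeros((4, 6, 3), dtype=np.uint8)]))
ret, frame = cap.read()  # frame is now CHW float32
```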
