Check the CHANGELOG file for a global overview of the latest updates and new features!
Check the provided notebooks for an overview of the available features!
├── example_data : data used for the demonstrations
├── loggers : custom utilities for the `logging` module
│   ├── __init__.py : defines useful utilities to control `logging`
│   ├── telegram_handler.py : custom logger using the telegram bot api
│   ├── time_logging.py : custom timer features
│   └── tts_handler.py : custom logger using the Text-To-Speech models
├── tests : custom unit-testing for the different modules
│   ├── data : test data files
│   ├── __reproduction : expected output files for reproducibility tests
│   ├── test_custom_train_objects.py
│   ├── test_utils_audio.py
│   ├── test_utils_boxes.py
│   ├── test_utils_compile.py
│   ├── test_utils_distance.py
│   ├── test_utils_embeddings.py
│   ├── test_utils_files.py
│   ├── test_utils_image.py
│   ├── test_utils_keras.py
│   ├── test_utils_ops.py
│   ├── test_utils_sequence.py
│   ├── test_utils_stream.py
│   └── test_utils_text.py
├── utils
│   ├── audio : audio utilities
│   │   ├── audio_annotation.py : annotation features for new TTS/STT dataset creation
│   │   ├── audio_io.py : audio loading / writing
│   │   ├── audio_player.py : audio playback functionality
│   │   ├── audio_processing.py : audio normalization / processing
│   │   ├── audio_recorder.py : audio recording functionality
│   │   ├── audio_stream.py : audio streaming support
│   │   ├── mkv_utils.py : processing for the .mkv video format
│   │   ├── noisereducev1.py : maintained version of the old `noisereduce` library
│   │   └── stft.py : implementations of various mel-spectrogram methods
│   ├── callbacks : callback management system
│   │   ├── __init__.py
│   │   ├── callback.py : base callback implementation
│   │   ├── displayer.py : display-related callbacks
│   │   ├── file_saver.py : file saving callbacks
│   │   └── function_callback.py : function-based callbacks
│   ├── datasets : dataset utilities
│   │   ├── audio_datasets : audio dataset implementations
│   │   │   ├── common_voice.py : Mozilla Common Voice dataset
│   │   │   ├── libri_speech.py : LibriSpeech dataset
│   │   │   ├── processing.py : audio dataset processing
│   │   │   ├── siwis.py : SIWIS dataset
│   │   │   └── voxforge.py : VoxForge dataset
│   │   ├── builder.py : dataset building utilities
│   │   ├── loader.py : dataset loading utilities
│   │   └── summary.py : dataset summary tools
│   ├── image : image features
│   │   ├── bounding_box : features for bounding box manipulation
│   │   │   ├── combination.py : combines groups of boxes
│   │   │   ├── converter.py : box format conversion
│   │   │   ├── filters.py : box filtering
│   │   │   ├── locality_aware_nms.py : LA-NMS implementation
│   │   │   ├── metrics.py : box metrics (IoU, etc.)
│   │   │   ├── non_max_suppression.py : NMS implementation
│   │   │   ├── processing.py : box processing
│   │   │   └── visualization.py : box extraction / drawing
│   │   ├── custom_cameras.py : custom camera implementations
│   │   ├── image_io.py : image loading / writing
│   │   ├── image_normalization.py : normalization schema
│   │   └── image_processing.py : image processing utilities
│   ├── keras : keras and hardware acceleration utilities
│   │   ├── ops : operation interfaces for different backends
│   │   │   ├── builder.py : operation builder
│   │   │   ├── core.py : core operations
│   │   │   ├── execution_contexts.py : execution context management
│   │   │   ├── image.py : image operations
│   │   │   ├── linalg.py : linear algebra operations
│   │   │   ├── math.py : mathematical operations
│   │   │   ├── nn.py : neural network operations
│   │   │   ├── numpy.py : numpy-compatible operations
│   │   │   └── random.py : random operations
│   │   ├── runtimes : model runtime implementations
│   │   │   ├── onnx_runtime.py : ONNX runtime
│   │   │   ├── runtime.py : base runtime class
│   │   │   ├── saved_model_runtime.py : saved model runtime
│   │   │   ├── tensorrt_llm_runtime.py : TensorRT LLM runtime
│   │   │   └── tensorrt_runtime.py : TensorRT runtime
│   │   ├── compile.py : graph compilation features
│   │   └── gpu.py : GPU utilities
│   ├── text : text-related features
│   │   ├── abreviations
│   │   ├── parsers : document parsers (new implementation)
│   │   │   ├── combination.py : box combination for parsing
│   │   │   ├── docx_parser.py : DOCX document parser
│   │   │   ├── java_parser.py : Java code parser
│   │   │   ├── md_parser.py : Markdown parser
│   │   │   ├── parser.py : base parser implementation
│   │   │   ├── pdf_parser.py : PDF parser
│   │   │   ├── py_parser.py : Python code parser
│   │   │   └── txt_parser.py : text file parser
│   │   ├── cleaners.py : text cleaning methods
│   │   ├── ctc_decoder.py : CTC-decoding
│   │   ├── metrics.py : text evaluation metrics
│   │   ├── numbers.py : number cleaning methods
│   │   ├── sentencepiece_tokenizer.py : sentencepiece tokenizer interface
│   │   ├── text_processing.py : text processing functions
│   │   ├── tokenizer.py : tokenizer implementation
│   │   └── tokens_processing.py : token-level processing
│   ├── threading : threading utilities
│   │   ├── async_result.py : asynchronous result handling
│   │   ├── priority_queue.py : priority queue with order consistency
│   │   ├── process.py : process management
│   │   └── stream.py : data streaming implementation
│   ├── comparison_utils.py : convenient comparison features for various data types
│   ├── distances.py : distance and similarity metrics
│   ├── embeddings.py : embeddings saving / loading
│   ├── file_utils.py : data saving / loading
│   ├── generic_utils.py : generic features
│   ├── plot_utils.py : plotting functions
│   ├── sequence_utils.py : sequence manipulation
│   └── wrappers.py : function wrappers and decorators
├── example_audio.ipynb
├── example_custom_operations.ipynb
├── example_generic.ipynb
├── example_image.ipynb
├── example_text.ipynb
├── LICENSE
├── Makefile
├── README.md
└── requirements.txt
The `loggers` module is independent from the `utils` one, making it easily reusable / extractable.
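As an illustration of the custom-handler pattern used by `telegram_handler.py` and `tts_handler.py`, here is a minimal, simplified sketch of plugging a custom handler into the standard `logging` module (the `ListHandler` class is hypothetical and only for demonstration, not the repo's actual API):

```python
import logging

class ListHandler(logging.Handler):
    """Minimal custom handler: collects formatted records in a list.

    The repo's Telegram / TTS handlers follow the same pattern,
    overriding `emit` to forward the message elsewhere.
    """
    def __init__(self, level = logging.NOTSET):
        super().__init__(level)
        self.messages = []

    def emit(self, record):
        # `self.format` applies the formatter set via `setFormatter`
        self.messages.append(self.format(record))

logger = logging.getLogger('demo')
logger.setLevel(logging.INFO)
handler = ListHandler()
handler.setFormatter(logging.Formatter('%(levelname)s : %(message)s'))
logger.addHandler(handler)

logger.info('hello')
print(handler.messages)  # ['INFO : hello']
```

Because the handlers only rely on the standard `logging` interface, they can be attached to any existing logger without modifying the rest of the code.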
See the installation guide for a step-by-step installation!

Here is a summary of the installation procedure, assuming you have a working Python environment:
- Clone this repository : `git clone https://github.com/yui-mhcp/data_processing.git`
- Go to the root of this repository : `cd data_processing`
- Install the requirements : `pip install -r requirements.txt`
- Open an example notebook and follow the instructions!
The `utils/{audio,image,text}` modules are not loaded by default, meaning that you do not have to install the requirements of a submodule you do not want to use. In this case, simply remove the submodule and run the `pipreqs` command to compute a new `requirements.txt` file!

Important Note : no backend (i.e., `tensorflow`, `torch`, ...) is installed by default, so make sure to properly install one beforehand!
- Make example for audio processing
- Make example for image processing
- Make example for text processing
- Make example for plot utils
- Make example for embeddings manipulation
- Make example for the `producer-consumer` utility
- Make the code keras-3 compatible
- Update the `audio` module
- Update the `image` module
- Update the `text` module
- Update the `utils` module
- Make unit-testing to check correctness for the different keras backends
- Make unit-testing for the `graph_compile` and `executing_eagerly` in all backends
- Make every function compatible with `tf.data`, no matter the keras backend (see `example_custom_ops.ipynb` for more information)
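The `producer-consumer` utility mentioned above follows the classic threading pattern; a minimal stdlib sketch of the idea (illustrative only, not the repo's actual `utils/threading` API):

```python
import threading, queue

def produce(q, items):
    """Push items into the shared queue, then a sentinel to stop the consumer."""
    for item in items:
        q.put(item)
    q.put(None)  # sentinel value signalling the end of the stream

def consume(q, results):
    """Pop items until the sentinel arrives, applying a processing step."""
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item * 2)  # placeholder processing step

q = queue.Queue(maxsize = 4)  # bounded queue: producer blocks when full
results = []
producer = threading.Thread(target = produce, args = (q, range(5)))
consumer = threading.Thread(target = consume, args = (q, results))
producer.start(); consumer.start()
producer.join(); consumer.join()
print(results)  # [0, 2, 4, 6, 8]
```

The bounded queue provides back-pressure: a fast producer cannot outrun a slow consumer by more than the queue capacity.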
- Enable any backend to be aware of XLA/eager execution (i.e., the `executing_eagerly` function)
- Enable `graph_compile` to support all backends compilation
    - `tensorflow` backend (`tf.function`)
    - `torch` backend (`torch.compile`)
    - `jax` backend (`jax.jit`)
- Auto-detect `static_argnames` for the `jax.jit` compilation
- Allow `tf.function` with `graph_compile` regardless of the `keras` backend
- Add GPU features for all backends
    - `tensorflow` backend
    - `torch` backend
    - `jax` backend
- Extract audio from videos
- Enable audio playback without the `IPython.display` autoplay feature
- Implement specific `Mel spectrogram` implementations
- Run the `read_audio` function in a `tf.data` pipeline
- Support audio formats :
    - `wav`
    - `mp3`
    - any `librosa`-supported format
    - any `ffmpeg`-supported format (video support)
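For intuition about what the mel-spectrogram items above involve, the core of such a pipeline is a windowed Short-Time Fourier Transform; a minimal NumPy sketch (a simplified illustration, not the repo's `stft.py` implementation):

```python
import numpy as np

def stft_magnitude(signal, n_fft = 512, hop = 128):
    """Magnitude STFT via a sliding Hann window and a real FFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([
        signal[i * hop : i * hop + n_fft] * window
        for i in range(n_frames)
    ])
    # shape : (n_frames, n_fft // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis = -1))

# 1 second of a 440 Hz sine wave sampled at 16 kHz
t = np.linspace(0., 1., 16000, endpoint = False)
mag = stft_magnitude(np.sin(2. * np.pi * 440. * t))
print(mag.shape)  # (122, 257)
```

A mel spectrogram is then obtained by multiplying each frame by a mel filter-bank matrix; the variants in `stft.py` mainly differ in windowing, normalization, and filter-bank construction.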
- Add image loading / writing support
- Add video loading / writing support
- Add support for rotated bounding boxes
- Implement a keras 3 Non-Maximum Suppression (NMS)
- Implement the Locality-Aware NMS (LaNMS)
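As a reference for what the NMS items above involve, here is a minimal NumPy sketch of greedy IoU-based non-maximum suppression (illustrative only, not the repo's keras-3 implementation in `bounding_box/non_max_suppression.py`):

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0., None) * np.clip(y2 - y1, 0., None)
    area  = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, threshold = 0.5):
    """Greedy NMS : keep the highest-scoring boxes, drop overlapping ones."""
    order, keep = np.argsort(scores)[::-1], []
    while len(order):
        best, order = order[0], order[1:]
        keep.append(int(best))
        # discard remaining boxes that overlap the kept box too much
        order = order[iou(boxes[best], boxes[order]) < threshold]
    return keep

boxes  = np.array([[0., 0., 10., 10.], [1., 1., 10., 10.], [20., 20., 30., 30.]])
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

The Locality-Aware variant (LaNMS) additionally merges nearby boxes row by row before the greedy suppression, which makes it better suited to dense text detection.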
- Support text encoding in a `tf.data` pipeline
- Implement text cleaning
    - Abbreviation extensions
    - Time / dollar / number extensions
    - Unicode conversion
- Support token-splitting instead of word-splitting in `TextEncoder`
- Support `transformers` tokenizers conversion
- Support `sentencepiece` encoders
- Extract text from documents
    - `.txt`
    - `.md`
    - `.pdf`
    - `.docx`
    - `.html`
    - `.epub`
- Implement token-based logits masking
- Implement batch text encoding
- Add custom tokens to `Tokenizer`
- Implement CTC-decoding in keras 3 (*already implemented in `keras 3.3`*)
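The cleaning steps listed above (abbreviation expansion, number expansion, unicode conversion) chain simple text transforms; a minimal stdlib sketch with a hypothetical abbreviation table (for illustration only, not the repo's `cleaners.py`):

```python
import re, unicodedata

# hypothetical abbreviation table, for illustration only
_ABBREVIATIONS = {'dr' : 'doctor', 'mr' : 'mister', 'st' : 'saint'}
_UNITS = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']

def expand_abbreviations(text):
    pattern = re.compile(r'\b({})\.'.format('|'.join(_ABBREVIATIONS)), re.IGNORECASE)
    return pattern.sub(lambda m: _ABBREVIATIONS[m.group(1).lower()], text)

def expand_digits(text):
    # naive digit-by-digit expansion; real cleaners handle full numbers
    return re.sub(r'\d', lambda m: _UNITS[int(m.group(0))] + ' ', text).strip()

def to_ascii(text):
    # drop accents / non-ascii characters via unicode decomposition
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode()

def clean_text(text):
    for fn in (to_ascii, expand_abbreviations, expand_digits):
        text = fn(text)
    return text.lower()

print(clean_text('Dr. Müller lives at 5'))  # 'doctor muller lives at five'
```

Chaining small, single-purpose transforms like this keeps each cleaner testable in isolation, which is the general idea behind a cleaners pipeline.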
- Make subplots easier to use via `args` and `kwargs`
- Make custom plot functions usable with `plot_multiple`
- Add 3D plot / subplot support
- Implement custom plotting functions
    - Spectrogram / attention weights
    - Audio waveform
    - Embeddings (d-dimensional vectors projected in 2D space)
    - 3D volumes
    - Classification result
    - Confusion matrix (or any matrix)
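The subplot conveniences above wrap standard matplotlib calls; a minimal sketch of a `plot_multiple`-style helper (the signature is hypothetical, not the repo's actual `plot_utils.py` API):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_multiple(datas, ncols = 2, ** kwargs):
    """Plot each (title, 1D array) pair in its own subplot of a shared figure."""
    nrows = (len(datas) + ncols - 1) // ncols
    fig, axes = plt.subplots(nrows, ncols, figsize = (4 * ncols, 3 * nrows))
    for ax, (title, y) in zip(np.ravel(axes), datas):
        ax.plot(y, ** kwargs)
        ax.set_title(title)
    return fig

x = np.linspace(0., 2. * np.pi, 100)
fig = plot_multiple([('sin', np.sin(x)), ('cos', np.cos(x))])
print(len(fig.axes))  # 2
```

Forwarding `** kwargs` to each `ax.plot` call is what makes such a helper composable with arbitrary line-style arguments, which is the spirit of the `args` / `kwargs` item above.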
Contacts :
- Mail :
[email protected]
- Discord : yui0732
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). See the LICENSE file for details.
This license allows you to use, modify, and distribute the code, as long as you include the original copyright and license notice in any copy of the software/source. Additionally, if you modify the code and distribute it, or run it on a server as a service, you must make your modified version available under the same license.
For more information about the AGPL-3.0 license, please visit the official website
- The text cleaning module (`text.cleaners`) is inspired by the NVIDIA tacotron2 repository. Their implementation of the Short-Time Fourier Transform (STFT) is also available in `audio/stft.py`, adapted to `keras 3`.
- The provided embeddings in `example_data/embeddings/embeddings_256_voxforge.csv` have been generated from samples of the VoxForge dataset, embedded with an AudioSiamese model (`audio_siamese_256_mel_lstm`).
Tutorials :
- The Keras 3 API, which has been (partially) adapted in the `keras_utils/ops` module to enable the `numpy` backend and `tf.data` compatibility
- The `tf.function` guide
If you find this project useful in your work, please add this citation to give it more visibility!

@misc{yui-mhcp,
    author  = {yui},
    title   = {A Deep Learning projects centralization},
    year    = {2021},
    publisher   = {GitHub},
    howpublished    = {\url{https://github.com/yui-mhcp}}
}