diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 890897803b4..f2e9f180ea1 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -31,7 +31,7 @@ To send us a pull request, please:
 
 1. Fork the repository.
 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
-3. Look at the [contributor documentation](https://github.com/awslabs/djl/tree/master/docs/development/README.md), especially the docs in bold, for help setting up your development environment and information about various conventions.
+3. Look at the [contributor documentation](docs/development/README.md), especially the docs in bold, for help setting up your development environment and information about various conventions.
 4. Ensure local tests pass.
 5. Commit to your fork using clear commit messages.
 6. Send us a pull request, answering any default questions in the pull request interface.
diff --git a/docs/development/cache_management.md b/docs/development/cache_management.md
index 28efbfdbc09..a38cc61941d 100644
--- a/docs/development/cache_management.md
+++ b/docs/development/cache_management.md
@@ -1,4 +1,4 @@
-# DJL Cache Management
+# Cache Management
 
 DJL uses cache directories to store downloaded models and engine-specific native files.
 By default, cache directories are located in the current user's home directory:
diff --git a/docs/development/configure_logging.md b/docs/development/configure_logging.md
index 5cdf7b2ccea..5c843db8f32 100644
--- a/docs/development/configure_logging.md
+++ b/docs/development/configure_logging.md
@@ -1,4 +1,4 @@
-# DJL logging configuration
+# Logging configuration
 
 DJL uses [slf4j-api](http://www.slf4j.org/) to print logs. The DJL library itself does not define a logging framework.
 Instead, users have to choose their own logging framework at deployment time.
diff --git a/docs/development/memory_management.md b/docs/development/memory_management.md
index e9689aac88c..a3c332dcfdf 100644
--- a/docs/development/memory_management.md
+++ b/docs/development/memory_management.md
@@ -26,7 +26,6 @@ Here are the rule of thumb:
 ## Inference case
 
 For the majority of inference cases, you would be working on processInput and processOutput. Make sure all temporary NDArrays are attached to the NDManager in the TranslatorContext.
-Here is the reference code [processInput](https://github.com/awslabs/djl/blob/72dbfa895329df77f980e3d59c98f27ff1c9b3a3/api/src/main/java/ai/djl/modality/cv/translator/BaseImageTranslator.java#L59).
 Note that if you don't specify an NDManager in an NDArray operation, it uses the NDManager from the input NDArray.
 
 ## Training
 
@@ -37,7 +36,7 @@ The intermediate NDArrays involving in training case are usually
 In general, all the parameters in the model should be associated with the Model-level NDManager.
 All of the input and output NDArrays should be associated with an NDManager one level below the Model's NDManager.
-Please check that you call [batch.close()](https://github.com/awslabs/djl/blob/468ce0d686758c46b3a62f6c18a084e80846bd8d/api/src/main/java/ai/djl/training/EasyTrain.java#L41)
+Please check that you call [batch.close()](https://javadoc.io/static/ai.djl/api/0.6.0/ai/djl/training/dataset/Batch.html#close--)
 to release each batch of the dataset at the end of the batch.
 If you still see memory grow as training progresses, it is most likely that intermediate NDArrays are attached at the Model (Block) parameter level.
 As a result, those NDArrays would not be closed until training is finished.
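For reference, a minimal sketch of the batch-release pattern described in the memory_management.md hunk above, assuming the DJL 0.6.0 training API (`EasyTrain.trainBatch`, `Trainer.iterateDataset`); the `fitOneEpoch` helper and its arguments are hypothetical, not part of DJL:

```java
import ai.djl.training.EasyTrain;
import ai.djl.training.Trainer;
import ai.djl.training.dataset.Batch;
import ai.djl.training.dataset.Dataset;
import ai.djl.translate.TranslateException;

import java.io.IOException;

final class TrainingLoopSketch {

    /** Hypothetical helper: run one epoch over {@code dataset} with an already-configured trainer. */
    static void fitOneEpoch(Trainer trainer, Dataset dataset) throws IOException, TranslateException {
        for (Batch batch : trainer.iterateDataset(dataset)) {
            try {
                EasyTrain.trainBatch(trainer, batch); // forward and backward pass on this batch
                trainer.step();                       // apply the parameter updates
            } finally {
                batch.close(); // release the batch's NDArrays so memory does not accumulate
            }
        }
    }
}
```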
diff --git a/docs/development/troubleshooting.md b/docs/development/troubleshooting.md
index 10bd75f9e05..efd4fa83994 100644
--- a/docs/development/troubleshooting.md
+++ b/docs/development/troubleshooting.md
@@ -15,7 +15,7 @@
 ai.djl.engine.EngineException: No deep learning engine found.
 	at ai.djl.examples.training.TrainPikachu.main(TrainPikachu.java:72) [main/:?]
 ```
 
-#### 1. Engine dependency is missing
+### 1. Engine dependency is missing
 
 DJL currently supports four engines: MXNet, PyTorch, TensorFlow (experimental) and FastText.
 Please include at least one of these engines and its native library as dependencies.
 For example, adding MXNet engine dependencies:
 
@@ -47,7 +47,7 @@ Maven:
 
 ```
 
-#### 2. Intellij Issue
+### 2. IntelliJ Issue
 
 The error may appear after running the `./gradlew clean` command:
 This issue is caused by a mismatch between IntelliJ and the Gradle runner. To fix this, navigate to:
 `Preferences -> Build, Execution, Deployment -> Build Tools -> Gradle`. Then, change the `Build and run using:` option to `Gradle`.
@@ -62,7 +62,7 @@ Then, right click the resources folder and select `Rebuild`.
 
 ![FAQ1](https://djl-ai.s3.amazonaws.com/resources/images/FAQ_engine_not_found.png)
 
-#### 3. UnsatisfiedLinkError issue
+### 3. UnsatisfiedLinkError issue
 
 You might see this error when DJL tries to load the native library for an engine but some shared libraries are missing.
 Let's take the PyTorch engine as an example. DJL loads libtorch.dylib when creating the Engine instance.
@@ -81,7 +81,7 @@ libtorch.dylib:
 
 It shows that `libtorch.dylib` depends on `libiomp5.dylib` and `libc10.dylib`. If one of them is missing, it throws an `UnsatisfiedLinkError` exception.
 If you are using `ai.djl.{engine}:{engine}-native-auto`, please create an issue at `https://github.com/awslabs/djl`.
 
-#### 4. Failed to extract native file issue
+### 4. Failed to extract native file issue
 
 Sometimes you may only have read-only access on the machine.
 This will cause a failure during engine loading because the cache attempts to write to the home directory.
 For more information, please refer to [DJL Cache Management](cache_management.md).
diff --git a/docs/faq.md b/docs/faq.md
index 00e0745d3bd..94c7edeee1d 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -1,6 +1,6 @@
 # FAQ
 
-##### 1. Why Deep Java Library (DJL)?
+### 1. Why Deep Java Library (DJL)?
 
 - Prioritizes the Java developer's experience
 - Makes it easy for new machine learning developers to get started
@@ -10,7 +10,7 @@
 
 - Allows developers to write code once and run it on any deep learning engine
 - Allows developers to use engine-specific features
 
-##### 2. Which DL engines can I run with DJL?
+### 2. Which DL engines can I run with DJL?
 
 While DJL is designed to be engine-agnostic and to run with any engine, we currently
 support the following engines:
@@ -19,10 +19,10 @@ support the following engines:
 
 - TensorFlow (Experimental - inference only)
 - fastText
 
-##### 3. Does DJL support inference on GPU?
+### 3. Does DJL support inference on GPU?
 
 Yes. DJL supports inference on GPU. If GPUs are available, DJL automatically detects a GPU and runs inference on a single GPU by default.
 
-##### 4. Does DJL support training on GPU?
+### 4. Does DJL support training on GPU?
 
 Yes. DJL offers multi-GPU support. DJL can automatically detect whether GPUs are available. If GPUs are available, it will run on a single GPU by default, unless the user specifies otherwise.
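To make the single-GPU default and device selection from FAQ 4 concrete, here is a hedged sketch of restricting training to specific devices. It assumes the `DefaultTrainingConfig.optDevices` API from the 0.6.x era referenced in this patch; the loss function is an arbitrary placeholder:

```java
import ai.djl.Device;
import ai.djl.training.DefaultTrainingConfig;
import ai.djl.training.TrainingConfig;
import ai.djl.training.loss.Loss;

final class DeviceConfigSketch {

    /** Restrict training to the first two GPUs instead of DJL's single-GPU default. */
    static TrainingConfig twoGpuConfig() {
        return new DefaultTrainingConfig(Loss.softmaxCrossEntropyLoss()) // placeholder loss
                .optDevices(new Device[] {Device.gpu(0), Device.gpu(1)});
    }
}
```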
@@ -39,19 +39,19 @@ setting the devices. For example, if you have 7 GPUs available, and you want the
 All of the examples in the example folder can be run on multiple GPUs with the appropriate arguments. Follow the steps in the example to [train a ResNet50 model on the CIFAR-10 dataset](https://github.com/awslabs/djl/blob/master/examples/docs/train_cifar10_resnet.md#train-using-multiple-gpus) on a GPU.
 
-##### 5. Does DJL support inference on multiple threads?
+### 5. Does DJL support inference on multiple threads?
 
 Yes. DJL offers multi-threaded inference. If using the MXNet engine for a multi-threaded inference case, you need to set the 'MXNET_ENGINE_TYPE' environment variable to 'NaiveEngine'. For more information, see the
-[Multi-threaded inference example](https://github.com/awslabs/djl/blob/master/examples/docs/multithread_inference.md).
+[Multi-threaded inference example](../examples/docs/multithread_inference.md).
 
-##### 6. Does DJL support distributed training?
+### 6. Does DJL support distributed training?
 
 DJL does not currently support distributed training.
 
-##### 7. Can I run DJL on other versions of MxNet?
-This is not officially supported by DJL, but you can follow the steps outlined in the [troubleshooting document](https://github.com/awslabs/djl/blob/master/docs/development/troubleshooting.md#3-how-to-run-djl-using-other-versions-of-mxnet)
+### 7. Can I run DJL on other versions of MXNet?
+This is not officially supported by DJL, but you can follow the steps outlined in the [troubleshooting document](development/troubleshooting.md#4-how-to-run-djl-using-other-versions-of-mxnet)
 to use other versions of MXNet or build your own customized version.
 
-##### 8. I have a model trained and saved by another DL engine. Can I load that model on to DJL?
+### 8. I have a model trained and saved by another DL engine. Can I load that model onto DJL?
 
 While DJL is designed to be engine-agnostic, here is a list of the DJL engines and the formats they support:
 
 - MXNet
@@ -62,6 +62,8 @@ While DJL is designed to be engine-agnostic, here is a list of the DJL engines and the formats they support:
 - TensorFlow
   - .pb format
   - Keras model - DJL only supports the [SavedModel API](https://www.tensorflow.org/guide/keras/save_and_serialize). The .h5 format is currently not supported.
+- ONNX Model
+  - .onnx format
 - fastText
   - .bin format
   - .ftz format
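As an illustration of loading a model the way FAQ 8 describes, here is a hedged sketch using the 0.6.x `Criteria`/`ModelZoo` API. The artifact id, input/output types, and the `classify` helper are illustrative assumptions, not a prescribed usage:

```java
import ai.djl.inference.Predictor;
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ModelZoo;
import ai.djl.repository.zoo.ZooModel;

final class ZooLoadSketch {

    /** Hypothetical helper: classify one image with a zoo-provided ResNet. */
    static Classifications classify(Image img) throws Exception {
        Criteria<Image, Classifications> criteria =
                Criteria.builder()
                        .setTypes(Image.class, Classifications.class) // input/output types
                        .optArtifactId("resnet")                      // illustrative artifact id
                        .build();
        // try-with-resources closes the model and predictor, releasing their NDManagers
        try (ZooModel<Image, Classifications> model = ModelZoo.loadModel(criteria);
             Predictor<Image, Classifications> predictor = model.newPredictor()) {
            return predictor.predict(img);
        }
    }
}
```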
diff --git a/docs/how_to_collect_metrics.md b/docs/how_to_collect_metrics.md
index 23571df15c1..d9fde6509d1 100644
--- a/docs/how_to_collect_metrics.md
+++ b/docs/how_to_collect_metrics.md
@@ -82,7 +82,7 @@ metrics.addMetric("end_to_end_latency", (end-begin) / 1_000_000f, "ms");
 
 For more examples of metrics use, as well as convenient utilities provided by DJL, see:
 
-- [MemoryTrainingListener](https://github.com/awslabs/djl/blob/master/api/src/main/java/ai/djl/training/listener/MemoryTrainingListener.java) for memory consumption metrics
-- [Trainer](https://github.com/awslabs/djl/blob/master/api/src/main/java/ai/djl/training/Trainer.java) for metrics during training
-- [Predictor](https://github.com/awslabs/djl/blob/master/api/src/main/java/ai/djl/inference/Predictor.java) for metrics during inference
+- [MemoryTrainingListener](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/training/listener/MemoryTrainingListener.html) for memory consumption metrics
+- [Trainer](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/training/Trainer.html) for metrics during training
+- [Predictor](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/inference/Predictor.html) for metrics during inference
diff --git a/docs/model-zoo.md b/docs/model-zoo.md
index d117801a43b..a72cc016744 100644
--- a/docs/model-zoo.md
+++ b/docs/model-zoo.md
@@ -2,7 +2,7 @@
 Deep Java Library's (DJL) Model Zoo is more than a collection of pre-trained models. It's a bridge between a model vendor and a consumer. It provides a framework for developers to create and publish their own models.
 
-A ZooModel has the following characteristics:
+A [ZooModel](https://javadoc.io/doc/ai.djl/api/latest/ai/djl/repository/zoo/ZooModel.html) has the following characteristics:
 
 - Globally unique: similar to Java Maven packages, a model has its own group ID and artifact ID that uniquely identify it.
 - Versioned: the model version scheme allows developers to continuously update their model without causing backward compatibility issues.
diff --git a/examples/README.md b/examples/README.md
index bc905e9c179..5ef147164cd 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -1,4 +1,4 @@
-# DJL - examples
+# Examples
 
 This module contains examples to demonstrate use of the Deep Java Library (DJL).
diff --git a/model-zoo/README.md b/model-zoo/README.md
index 1d4d2193252..8ef74899304 100644
--- a/model-zoo/README.md
+++ b/model-zoo/README.md
@@ -1,4 +1,4 @@
-# DJL - model zoo
+# Model Zoo
 
 ## Introduction
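To complement the how_to_collect_metrics.md hunk above, here is a hedged sketch of reading back inference metrics. It assumes `Predictor.setMetrics` and a built-in metric named "Inference" recorded in nanoseconds, per the Predictor javadoc linked above; the generic `meanInferenceNanos` helper is hypothetical:

```java
import ai.djl.inference.Predictor;
import ai.djl.metric.Metrics;

final class MetricsSketch {

    /** Attach a collector, run one prediction, then read the aggregated latency. */
    static <I, O> double meanInferenceNanos(Predictor<I, O> predictor, I input) throws Exception {
        Metrics metrics = new Metrics();
        predictor.setMetrics(metrics);    // predictor records per-call timing metrics
        predictor.predict(input);
        return metrics.mean("Inference"); // assumed: mean inference latency in nanoseconds
    }
}
```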