In recent years, technology has been subdivided into ever more fields, each of which has witnessed a proliferation of remarkable achievements. However, limited time and resources often allow individuals to focus on only a few domains, or even on specific branches within a particular field.
To ease the impact of rising barriers in various domains on beginners, and to enable them to quickly experiment and set up application environments, we have developed the SJTU-TES (Shanghai Jiao Tong University Technology Engagement Square) platform.
Through this platform, users can gain insights into cutting-edge research across different domains and conveniently establish development or experimental environments using our interactive space, reproducible repositories, and testing datasets.
🔥 We mark work contributed by SJTU-TES with ⭐.
🔥 We have provided a demonstration video of the sjtu_tes space here.
🔥 We provide Chinese Requirement, Design, Testing, and Deployment Documents.
🔥 We primarily use the following icons to indicate the organization of each repository.
The corresponding published paper of the work, where "xxxx" refers to the name of the conference or journal in which it was published, and "arXiv" denotes the preprint version.
The corresponding GitHub link of the work.
The storage location of the pre-trained files for this repository (usually hosted on Hugging Face or Google Drive).
The webpage address for this work.
The dataset included with the work itself, as well as the datasets provided by the SJTU-TES team that are relevant to this work.
The reproduction of certain CPU-based work using the free space service provided by Hugging Face. You can visit the corresponding space to experience some practical applications of this work.
Some repositories can only be run on GPUs (taking several hours or even days on CPUs), which makes the free space service provided by Hugging Face impractical for them. We therefore provide reproducible repositories (including instructive README.md files) to address this limitation.
Stable Diffusion
, a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. stable-diffusion-v1-4 is resumed from stable-diffusion-v1-2: 225,000 steps at resolution 512x512 on laion-aesthetics v2 5+ with 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
The stable-diffusion-v1-5 checkpoint was initialized with the weights of the stable-diffusion-v1-2 checkpoint and subsequently fine-tuned for 595k steps at resolution 512x512 on laion-aesthetics v2 5+ with 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
Click to view examples we have implemented
- Scarlett, nature, (((beauty))), (((smooth))),white,Highest quality
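For readers who want to reproduce this locally, here is a minimal sketch using the Hugging Face diffusers library; the hub id, fp16 dtype, and CUDA device are assumptions about your setup, not part of the original work.

```python
import torch
from diffusers import StableDiffusionPipeline

# load the v1-5 checkpoint from the Hugging Face Hub (hub id may vary by mirror)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "Scarlett, nature, (((beauty))), (((smooth))), white, Highest quality"
image = pipe(prompt).images[0]  # one 512x512 image by default
image.save("scarlett.png")
```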
Latte
, a novel latent diffusion transformer for video generation, utilizes spatio-temporal tokens extracted from input videos and employs a series of Transformer blocks to model the distribution of videos in the latent space. Latte achieves state-of-the-art performance on four standard video generation datasets: FaceForensics, SkyTimelapse, UCF101, and Taichi-HD.
Click to view examples we have implemented
- Yellow and black tropical fish dart through the sea.
- An epic tornado attacking above a glowing city at night.
- Slow pan upward of blazing oak fire in an indoor fireplace.
- A cat wearing sunglasses and working as a lifeguard at pool.
- Sunset over the sea.
- A dog in astronaut suit and sunglasses floating in space.
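To make the notion of spatio-temporal tokens concrete, here is a minimal PyTorch sketch of patchifying a latent video into tokens; the tensor shapes and patch size are illustrative assumptions, not Latte's exact configuration.

```python
import torch

def video_to_tokens(latents: torch.Tensor, p: int = 2) -> torch.Tensor:
    """Split a latent video (batch, frames, channels, height, width)
    into one token per non-overlapping p x p spatio-temporal patch."""
    b, f, c, h, w = latents.shape
    x = latents.reshape(b, f, c, h // p, p, w // p, p)
    x = x.permute(0, 1, 3, 5, 2, 4, 6)  # (b, f, h/p, w/p, c, p, p)
    return x.reshape(b, f * (h // p) * (w // p), c * p * p)

latents = torch.randn(1, 16, 4, 32, 32)  # toy clip: 16 latent frames
print(video_to_tokens(latents).shape)    # torch.Size([1, 4096, 16])
```

Transformer blocks then attend over this token sequence to model the video distribution in latent space.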
BLIP-2
, Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. BLIP-2 beats Flamingo on zero-shot VQAv2 (65.0 vs 56.3) and establishes a new state of the art in zero-shot captioning (121.6 CIDEr on NoCaps vs the previous best of 113.2). Equipped with powerful LLMs (e.g., OPT, FlanT5), BLIP-2 also unlocks new zero-shot instructed vision-to-language generation capabilities for various interesting applications!
Click to view examples we have implemented
- "Question: what is the main elements in the picture? "
- "Answer: the eiffel tower"
Stable Diffusion v2
, high-resolution image synthesis with latent diffusion models. The stable-diffusion-2 model is resumed from stable-diffusion-2-base (512-base-ema.ckpt) and trained for 150k steps using a v-objective on the same dataset.
Click to view examples we have implemented
- ((two)) ((dogs)) in the picture, ((nature)), (((beauty))), (((smooth))),white,Highest quality
FaceSwap
, a tool that utilizes deep learning to recognize and swap faces in pictures and videos. FaceSwap supports various operating systems (Windows, Linux, macOS) and offers powerful face-swapping capabilities, performing best on a modern GPU with CUDA support. With FaceSwap, users can gather photos and videos, extract faces from them, train a model based on the extracted faces, and then seamlessly swap faces in their sources using the trained model.
Roop
, a tool that takes a video and replaces the face in it with a face of the user's choice. Users only need one image of the desired face; no dataset, no training.
UniversalFakeDetect
, proposes to perform real-vs-fake classification without learning; i.e., using a feature space not explicitly trained to distinguish real from fake images. The authors use nearest neighbor and linear probing as instantiations of this idea. When given access to the feature space of a large pretrained vision-language model, the very simple baseline of nearest neighbor classification has surprisingly good generalization ability in detecting fake images from a wide variety of generative models.
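The nearest-neighbor baseline is simple enough to sketch in a few lines; the feature banks below are hypothetical stand-ins for features from a frozen pretrained encoder such as CLIP.

```python
import torch
import torch.nn.functional as F

def nn_is_fake(query, bank_real, bank_fake):
    """Flag an image as fake if its feature is closer (in cosine similarity)
    to the bank of known-fake features than to the bank of known-real ones."""
    query = F.normalize(query, dim=0)
    return (bank_fake @ query).max() > (bank_real @ query).max()

# toy example with random 512-d unit features standing in for CLIP features
bank_real = F.normalize(torch.randn(100, 512), dim=1)
bank_fake = F.normalize(torch.randn(100, 512), dim=1)
print(nn_is_fake(torch.randn(512), bank_real, bank_fake))
```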
pygmtools
, Python Graph Matching Tools, provides graph matching solvers in Python. To make researchers' lives easier, pygmtools supports various solvers (linear, quadratic, multi-graph, neural) and various backends (numpy, pytorch, jittor, paddle, tensorflow, mindspore). pygmtools is also deep-learning friendly: its operations are designed to best preserve gradients during computation, and batched operations are supported for the best performance.
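As a taste of the API, here is a minimal linear-assignment example; the similarity matrix is a toy input.

```python
import numpy as np
import pygmtools as pygm

pygm.set_backend('numpy')  # numpy is the default; older releases used pygm.BACKEND = 'numpy'

# toy 3x3 node-similarity matrix between two graphs
sim = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.8, 0.1],
                [0.1, 0.0, 0.7]])

# Hungarian solver for the linear assignment problem; returns a 0/1 matching matrix
print(pygm.hungarian(sim))
```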
GENN-A*
, Graph Edit Neural Network (GENN), aims to accelerate the A* solver for the graph edit distance problem using a graph neural network. The GENN-aided A* algorithm replaces the heuristic prediction module of A* with a GNN; since the accuracy of the heuristic prediction is crucial for the performance of A*, this approach significantly improves its efficiency.
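To show where the GNN plugs in, here is a minimal tree-search A* sketch with a pluggable heuristic; `expand`, `is_goal`, and `heuristic` are hypothetical callbacks (in GENN-A*, `heuristic` would be the GNN predictor replacing a hand-crafted lower bound).

```python
import heapq

def a_star(start, is_goal, expand, heuristic):
    """Tree-search A*: expand(state) yields (next_state, step_cost) pairs;
    heuristic(state) estimates the remaining cost (e.g. a GNN prediction)."""
    frontier = [(heuristic(start), 0.0, 0, start)]  # (f = g + h, g, tiebreak, state)
    tie = 1
    while frontier:
        _, g, _, state = heapq.heappop(frontier)
        if is_goal(state):
            return g  # cost of the cheapest edit path found
        for nxt, cost in expand(state):
            heapq.heappush(frontier, (g + cost + heuristic(nxt), g + cost, tie, nxt))
            tie += 1
    return None
```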
T2T
, Training to Testing. The T2TCO framework first leverages generative modeling to estimate a high-quality solution distribution for each instance during training, and then conducts a gradient-based search within the solution space during testing.
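A minimal sketch of the two-stage idea; `model` (the trained generative model) and `objective` (a differentiable relaxation of the instance objective) are hypothetical stand-ins.

```python
import torch

def test_time_search(model, instance, objective, steps=50, lr=0.1):
    """Sample a solution distribution from the trained model, then refine it
    by gradient descent on a relaxed objective at test time."""
    probs = model(instance)                    # learned per-variable probabilities
    x = torch.logit(probs.clamp(1e-4, 1 - 1e-4)).clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):                     # gradient-based search
        loss = objective(torch.sigmoid(x), instance)
        opt.zero_grad(); loss.backward(); opt.step()
    return (torch.sigmoid(x) > 0.5).float()    # round to a discrete solution
```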
GNetChat
, General Networking Chat Website designed by the SJTUGN Group, where students can easily form study groups, create posts, make friends, share essential resources, and collaborate on projects in real time.
VidFetch
, an open-source dataset download tool to obtain copyright-free videos from various free video websites.
web-cpp
, an online platform that enables users to write and execute C++ code directly within their browsers.
Transmomo
, Invariance-Driven Unsupervised Video Motion Retargeting, a lightweight video motion retargeting approach capable of transferring motion in spite of structural and view-angle disparities between the source and the target.
EverybodyDanceNow
, a simple method for "do as I do" motion transfer: given a source video of a person dancing, the performance can be transferred to a novel (amateur) target after only a few minutes of the target subject performing standard moves.
Openpose
, a real-time multi-person keypoint detection library for 2D pose estimation. We provide a PyTorch implementation of OpenPose, including body and hand pose estimation.
RVM
, Robust High-Resolution Video Matting with Temporal Guidance. RVM is specifically designed for robust human video matting. Unlike existing neural models that process frames as independent images, RVM uses a recurrent neural network to process videos with temporal memory.
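RVM's recurrent inference loop is short enough to show here; this sketch follows the usage documented in the RobustVideoMatting repository, with a toy random clip standing in for real video frames.

```python
import torch

# load RVM via torch.hub, as documented in the repository
model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3").eval()

frames = torch.rand(10, 1, 3, 288, 512)  # toy clip: 10 RGB frames in [0, 1]
rec = [None] * 4  # four recurrent states carry temporal memory across frames
with torch.no_grad():
    for src in frames:  # src: (B, C, H, W)
        fgr, pha, *rec = model(src, *rec, downsample_ratio=0.25)
        # fgr: predicted foreground, pha: alpha matte for this frame
```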
DLSec
, a deep learning model security evaluation platform. Taking attack and defense paradigms such as adversarial examples, data poisoning, and backdoor attacks as examples, it studies and implements mainstream offensive and defensive algorithms for deep learning models, and builds a comprehensive, effective evaluation system for them from both white-box and black-box perspectives.
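As one concrete instance of the attacks such a platform evaluates, here is a minimal FGSM (fast gradient sign method) sketch; `model` and `loss_fn` are hypothetical placeholders for the model under test and its loss.

```python
import torch

def fgsm(model, loss_fn, x, y, eps=8 / 255):
    """One-step FGSM: perturb x along the sign of the input gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    # single signed-gradient step, clamped back to the valid pixel range
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```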
WDAD
, Adversarial sample detection based on weak dark textures
UAP
, Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations, a novel and practical mechanism that enables the service provider to verify whether a suspect model was stolen from the victim model via model extraction attacks.
WAV2COM
, Your Microphone Array Retains Your Identity: A Robust Voice Liveness Detection System for Smart Speakers
Sandbox
, helps you compile and run your C++ code within an isolated Docker container. Using Docker ensures that your code runs consistently and predictably in any Docker-enabled environment, making it convenient to develop and test your C++ project across different systems.