Change Log
ruochiz edited this page Dec 10, 2022 · 48 revisions
- Add Tian et al. bioRxiv to the gallery
- Add conda support for Fast-Higashi with noarch build.
- Add more tutorials on Fast-Higashi
- Adjust Fast-Higashi API for more user-friendly usage.
- Automatically adjust the number of CPU threads for Fast-Higashi
- Add the function to use ENCODE blacklist to filter out contacts
- Major update
- Fast-Higashi now supports batch effects correction by:
- normalizing the l1 sum of each diagonal per batch to be consistent with the bulk data (motivated by BandNorm/HiCCompare/MultiHiCCompare)
- normalizing coverage of each bin per batch to be consistent with the bulk data
- Updated Fast_process.py to enable fast processing of scHi-C data for Fast-Higashi.
- Fast-Higashi memory consumption is optimized by using int16/int32 instead of long when appropriate, cutting memory usage by at least half.
- Conda install now supports all platforms (Note on Nov 27: this did not work; still looking into options)
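The diagonal normalization listed under batch effects correction above can be illustrated with a small sketch. This is not the Fast-Higashi implementation: the function names `diagonal_l1_sums` and `match_diagonals_to_bulk` are hypothetical, the matrices are plain nested lists rather than the sparse structures a real pipeline would use, and it assumes symmetric contact maps at a single resolution. It only shows the idea of rescaling each diagonal of a batch-level map so that its L1 sum agrees with the bulk map.

```python
def diagonal_l1_sums(mat):
    """Return the L1 sum of each upper diagonal k = 0..n-1 of a square matrix."""
    n = len(mat)
    return [sum(abs(mat[i][i + k]) for i in range(n - k)) for k in range(n)]


def match_diagonals_to_bulk(batch, bulk):
    """Rescale each diagonal of `batch` so its L1 sum equals that of `bulk`.

    Both inputs are assumed (for this sketch) to be symmetric square matrices
    given as lists of lists. Diagonals that are empty in `batch` are left as-is.
    """
    n = len(batch)
    batch_sums = diagonal_l1_sums(batch)
    bulk_sums = diagonal_l1_sums(bulk)
    out = [[float(v) for v in row] for row in batch]
    for k in range(n):
        if batch_sums[k] == 0:
            continue  # nothing to rescale on this diagonal
        scale = bulk_sums[k] / batch_sums[k]
        for i in range(n - k):
            out[i][i + k] *= scale
            if k > 0:
                out[i + k][i] *= scale  # mirror the scaling below the main diagonal
    return out


# Toy example: a pooled contact map for one batch and a bulk reference.
batch = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
bulk = [[4, 4, 2], [4, 4, 4], [2, 4, 4]]
normalized = match_diagonals_to_bulk(batch, bulk)
```

After rescaling, every non-empty diagonal of the batch map carries the same total signal as the corresponding bulk diagonal, which removes batch-specific differences in the distance decay profile.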
- Roadmap
- Complete the API of all the CLI functions
- Major update
- Speed improvement for model training enabled by:
- remove implicit csr_matrix generation
- new dataloader scheme
- move some of the data processing to Process.py
- using multiprocessing cpu to deal with sparse coo matrix generation
- The Code dir is renamed to higashi for future builds of conda packages
- Higashi is now on conda: conda install -c ruochiz higashi
- Add tutorials for 4DN sci-Hi-C (Kim et al.) and Ramani et al.
- Feature update
- automatic batch size selection, which improves performance on large datasets (large number of cells or high resolutions)
- new Higashi_Wrapper.py, which allows running Higashi in Jupyter notebooks or custom scripts.
- Bug fix
- fix Inf numbers in sqrt_norm() function
- Major update
- Add support for the list of contact pairs format (consistent with scHiCluster).
- Major update
- Higashi now supports the ZINB (zero-inflated negative binomial) regression loss mode. It is recommended to use zinb instead of the ranking mode. Classification mode still works well on low-coverage datasets.
- File structure is redesigned, with far fewer temporary files generated
- Runtime optimization: training is faster through parallelized graph construction
- Memory optimization: memory usage is reduced by using dynamic graph construction
- Improved imputation accuracy, especially for bins with no captured contacts in the original scHi-C contact maps
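For readers unfamiliar with the ZINB loss mentioned above, the following is a textbook sketch of the zero-inflated negative binomial negative log-likelihood for a single count. It is not Higashi's code: Higashi is built on PyTorch and its parameterization may differ. The names `nb_log_pmf` and `zinb_nll`, and the mean/dispersion/zero-inflation parameterization (`mu`, `theta`, `pi`), are assumptions chosen for illustration.

```python
import math


def nb_log_pmf(y, mu, theta):
    """Log-probability of count y under a negative binomial distribution
    with mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_nll(y, mu, theta, pi):
    """Negative log-likelihood of count y under a zero-inflated NB.

    pi in [0, 1) is the probability of an extra (structural) zero.
    """
    if y == 0:
        # A zero is either a structural zero or an NB draw that equals zero.
        p_zero = pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta))
        return -math.log(p_zero)
    # A positive count can only come from the NB component.
    return -(math.log(1.0 - pi) + nb_log_pmf(y, mu, theta))


# Toy example: loss for an observed zero and an observed count of 3.
loss_zero = zinb_nll(0, mu=2.0, theta=1.5, pi=0.3)
loss_three = zinb_nll(3, mu=2.0, theta=1.5, pi=0.3)
```

The zero-inflation term lets the model explain the many empty entries of a sparse scHi-C contact map without forcing the mean of the count component toward zero.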
- Feature update
- Add the default behavior of Merge2Cool.py (merge all cells when no list is given as input)
- Add --output options for scTAD.py and scCompartment.py
Thanks to @zengguangjie for identifying bugs.
- Bug fix
- Higashi now supports the latest pytorch version.
- The program no longer throws an exception when given one cell at a time.
- The description and the behavior of the neighbor_num parameter are now consistent with the hyperparameter k described in the paper
- Feature update
- Higashi2Scool.py is now functioning properly. Will update the corresponding document soon.
Thanks to @tarak77 for beta testing some of the new features and identifying bugs.
- Feature update
- We now support selecting groups of cells and saving the merged imputation results in .cool format. (Merge2Cool.py)
- Remove the requirement of cell_name in data.txt
- Higashi-vis now supports displaying cell names stored in label_info.pickle under the reserved key cell_name_higashi
- Use pyplot's imshow instead of seaborn.heatmap for faster rendering
- More reasonable multiprocessing for scCompartment and scTAD (1 to 3 processes for I/O-intensive jobs and about 20 processes for computationally intensive jobs)
- The output of scCompartment.py now also includes compartment_zscore and compartment_raw, which correspond to z-score normalized and unnormalized scA/B compartment values, respectively.
- Improve the post-processing steps by merging multiple I/O intensive jobs to one process.
- Improve the documentation of the code usage.
- Bug fix
- Fix the scCompartment.py for chromosomes with only one arm (non-human species).
- Higashi2SCool.py is not functioning correctly. (Will be fixed in the next version)
- Feature update
- Much faster imputation with pytorch sparse operations
- Further improve the imputation results and reduce potential batch effects (corresponding options added to the configuration file)
- Bug fix
- Fix the parser for scA/B compartment calling
- Fix the parser for scTAD calling
- Feature update
- Runtime and memory optimization for processing structured dataframe (with multiprocessing support).
- Add options for not imputing (deprecated after 2021-04-01 update)
- Add options for customizable epoch numbers and automatically loading previous models
- Improve the imputation results
- Bug fix
- The previous version had an error: when data.txt included a chromosome that is not in chrom_list, the interactions of that chromosome would be randomly included.
- The previous version used the wrong version of the code for batch effects removal.
- Feature update
- Post processing of the Higashi-main results
- Merge hdf5 results from multiple processes
- Match the distribution of contact map values between the output and the population Hi-C
- Higashi-vis update
- Include read_count / kernel density estimation / kernel density estimation local as color scheme for Higashi-vis
- Include local neighborhood selection function for Higashi-vis
- Include more colormap options for Higashi-vis
- Include compartment calling options for Higashi-vis
- Higashi-analysis update
- Include A/B sign calibration function and the corresponding script
- Bug fix
- Fix the calculation of weights of the neighborhood information
- Feature update
- Adding single cell TAD calling code
- Adding single cell compartment calling code
- We now use fbpca to handle PCA of extremely large feature matrices
- Beta version of removing batch effects of scHi-C (by including batch_id as part of the input)
- Memory usage optimization (The memory usage is now 20% of the previous version on the sn-m3c-seq dataset)
- Remove the optional smoothing and quantile normalization options for computational efficiency
- Allow customizable UMAP/TSNE parameters for Higashi-vis
- Include linear-conv+rwr imputation results for visualization