readfish align
Newer versions of ReadFish will remove this functionality. It was intended as a demonstration of how this experimental approach might be applied and how Run Until features might be used. Newer code (such as BOSS_RUNS) provides more sophisticated approaches to this type of analysis.
The purpose of readfish align is to give uniform coverage across a range of sequences in a sample being run on a flow cell. As an example, we have used it to reach 40x coverage on a mixed bacterial/eukaryotic sample available from ZymoBIOMICS.
The underlying assumption of readfish align is that the user knows the makeup of the sample being sequenced. If you wish to experiment with samples of unknown composition, investigate readfish centrifuge instead.
```
readfish align -h
usage: readfish align [-h] [--host HOST] [--port PORT] --device DEVICE
                      --experiment-name EXPERIMENT-NAME [--workers WORKERS]
                      [--channels CHANNELS CHANNELS] [--run-time RUN-TIME]
                      [--unblock-duration UNBLOCK-DURATION]
                      [--cache-size CACHE-SIZE] [--batch-size BATCH-SIZE]
                      [--throttle THROTTLE] [--dry-run]
                      [--log-level LOG-LEVEL] [--log-format LOG-FORMAT]
                      [--log-file LOG-FILE] --toml TOML [--paf-log PAF_LOG]
                      [--chunk-log CHUNK_LOG] [--watch FOLDER] [--depth DEPTH]
                      [--threads THREADS]

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           MinKNOW server host (default: 127.0.0.1)
  --port PORT           MinKNOW server port (default: 9501)
  --device DEVICE       Name of the sequencing position e.g. MS29042 or X1
                        etc.
  --experiment-name EXPERIMENT-NAME
                        Describe the experiment being run, enclose in quotes
  --workers WORKERS     Number of worker threads (default: 1)
  --channels CHANNELS CHANNELS
                        Channel range to use as a sequence, expects two
                        integers separated by a space (default: [1, 512])
  --run-time RUN-TIME   Period (seconds) to run the analysis (default:
                        172,800)
  --unblock-duration UNBLOCK-DURATION
                        Time, in seconds, to apply unblock voltage (default:
                        0.1)
  --cache-size CACHE-SIZE
                        The size of the read cache in the ReadUntilClient
                        (default: 512)
  --batch-size BATCH-SIZE
                        The maximum number of reads to pull from the read
                        cache (default: 512)
  --throttle THROTTLE   Time interval, in seconds, between requests to the
                        ReadUntilClient (default: 0.4)
  --dry-run             Run the ReadFish Until experiment without sending
                        unblock commands
  --log-level LOG-LEVEL
                        One of: debug, info, warning, error or critical
  --log-format LOG-FORMAT
                        A standard Python logging format string (default:
                        '%(asctime)s %(name)s %(message)s')
  --log-file LOG-FILE   A filename to write logs to, or None to write to the
                        standard stream (default: None)
  --toml TOML           TOML file specifying experimental parameters
  --paf-log PAF_LOG     PAF log
  --chunk-log CHUNK_LOG
                        Chunk log
  --watch FOLDER        Top Level Folder containing fastq reads.
  --depth DEPTH         Desired coverage depth (default 30)
  --threads THREADS     Set the number of default threads to use for threaded
                        tasks (default 2)
```
The minimal command for running readfish align is (--exp is accepted as an abbreviation of --experiment-name):
```
readfish align --device <DEVICE_ID> --toml <your_toml_file.toml> --depth <target_depth_e.g._30> --exp <"Free text describing the experiment.">
```
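For scripted or repeated runs, the same command can be assembled programmatically. The Python sketch below uses a hypothetical helper, `build_align_command` (not part of readfish), that only uses flags listed in the `readfish align -h` output. Setting `dry_run=True` adds `--dry-run`, so a first test run sends no unblock commands.

```python
import shlex
from typing import List, Optional


def build_align_command(
    device: str,
    toml_path: str,
    depth: int,
    experiment_name: str,
    watch_folder: Optional[str] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble a readfish align command line (hypothetical helper).

    Only flags shown in `readfish align -h` are used.
    """
    cmd = [
        "readfish", "align",
        "--device", device,
        "--toml", toml_path,
        "--depth", str(depth),
        "--experiment-name", experiment_name,
    ]
    if watch_folder is not None:
        # Top-level folder containing the fastq reads to map.
        cmd += ["--watch", watch_folder]
    if dry_run:
        # Run the experiment without sending unblock commands.
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    cmd = build_align_command(
        "MS29042", "my_run.toml", 30, "Zymo mock community, 30x", dry_run=True
    )
    # shlex.join quotes the free-text experiment name for display.
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run` on the machine running MinKNOW. Keeping the arguments as a list avoids shell-quoting problems with the free-text experiment name.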
The TOML file should be configured as follows:
```toml
[caller_settings]
host = "127.0.0.1"
port = 5555
config_name = "dna_r9.4.1_450bps_fast"
[conditions]
reference = "<path to your reference>"
[conditions.0]
name = "Gradual Rejection"
targets = [ ]
control = false
max_chunks = inf
min_chunks = 0
multi_on = "unblock"
single_on = "unblock"
no_map = "proceed"
no_seq = "proceed"
multi_off = "stop_receiving"
single_off = "stop_receiving"
```
This TOML file rejects (unblocks) any read that maps to a sequence listed in targets. The starting TOML file does not need to contain any targets, in which case all reads are sequenced. The sequenced reads are mapped back to the reference provided. Once the coverage of a sequence in the reference exceeds the threshold set with --depth, readfish automatically adds that sequence's name to targets, so further reads from it are rejected. To do this, readfish creates a new TOML file with a .toml_live extension. For example, after two sequences have reached the threshold, the live file looks like this:
```toml
[caller_settings]
host = "127.0.0.1"
port = 5555
config_name = "dna_r9.4.1_450bps_fast"
[conditions]
reference = "<path to your reference>"
[conditions.0]
name = "Gradual Rejection"
targets = [ "Saccharomyces_cerevisiae_V", "Listeria_monocytogenes_complete_genome",]
control = false
max_chunks = inf
min_chunks = 0
multi_on = "unblock"
single_on = "unblock"
no_map = "proceed"
no_seq = "proceed"
multi_off = "stop_receiving"
single_off = "stop_receiving"
```
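The gradual-rejection loop described above can be sketched in Python. This is an illustration under stated assumptions, not readfish's own code. Mean depth of a reference sequence is taken as aligned reference bases divided by sequence length. Alignments are read from standard 12-column PAF records (readfish's own --paf-log layout may differ). The function names are hypothetical. The sketch only returns the updated target list. Readfish writes that list to the .toml_live file, a step omitted here because Python's standard library can read TOML but not write it.

```python
from collections import defaultdict


def coverage_from_paf(paf_lines):
    """Mean depth per reference sequence from standard PAF records.

    depth = sum of aligned reference bases / reference length, using
    PAF columns 6 (target name), 7 (target length), 8 and 9 (target start/end).
    """
    aligned = defaultdict(int)
    lengths = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or truncated records
        name, length = fields[5], int(fields[6])
        start, end = int(fields[7]), int(fields[8])
        aligned[name] += end - start
        lengths[name] = length
    return {name: aligned[name] / lengths[name] for name in lengths}


def update_targets(targets, depths, min_depth):
    """Return targets extended with every sequence at or above min_depth."""
    new_targets = list(targets)
    for name in sorted(depths):
        if depths[name] >= min_depth and name not in new_targets:
            new_targets.append(name)
    return new_targets


if __name__ == "__main__":
    paf = [
        "read1\t5000\t0\t5000\t+\tchrA\t10000\t0\t5000\t4900\t5000\t60",
        "read2\t6000\t0\t6000\t+\tchrA\t10000\t4000\t10000\t5900\t6000\t60",
        "read3\t3000\t0\t3000\t+\tchrB\t20000\t100\t3100\t2950\t3000\t60",
    ]
    depths = coverage_from_paf(paf)  # chrA: 11000/10000 = 1.1, chrB: 3000/20000 = 0.15
    print(update_targets([], depths, min_depth=1))  # ['chrA']
```

Because targets are only ever appended, a sequence that has reached the threshold stays rejected for the rest of the run, which matches the growing list in the live file.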
Once every sequence in the reference has been added to targets, each sequence has reached at least the mean coverage depth specified with --depth in the readfish align command, and readfish stops the run.
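The stopping rule is a subset check: the run is complete when every reference sequence name appears in targets. In this sketch the names are taken from FASTA header lines. That is an assumption, since the reference may instead be a minimap2 index. Both helper names are hypothetical.

```python
def reference_names(fasta_lines):
    """Sequence names from FASTA header lines (first word after '>')."""
    return [
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].split()
    ]


def run_complete(names, targets):
    """True once every reference sequence has been added to targets."""
    return set(names) <= set(targets)


if __name__ == "__main__":
    fasta = [">chrA description", "ACGT", ">chrB", "GGCC"]
    names = reference_names(fasta)
    print(run_complete(names, ["chrA"]))          # False
    print(run_complete(names, ["chrA", "chrB"]))  # True
```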
readfish align logs a series of messages to the MinKNOW interface during the run to keep the user informed about events. At startup, ReadFish reports what it is doing and confirms that it really is going to act on reads. Over time, it reports each sequence as it is added to the targets to be rejected, and it logs the proportion of reads being accepted at any given time. Finally, ReadFish reports when the job is done and stops the sequencing run.