VolSampler

Adrian Quintana edited this page Dec 11, 2017 · 1 revision

convert_vol2pseudo

Purpose

convert_vol2pseudo is used to create a set of pseudoatoms representing the density of an EM volume. This is useful for the vector quantization process needed in problems like docking, approximation of structures by Alpha Shapes, Normal Mode Analysis, etc.

The program can do the conversion using two different algorithms:

  • In the first one, the volume is approximated by Gaussians of a desired size. The user can specify whether the Gaussians may have different intensities, as well as the precision with which the input EM volume should be approximated.
  • In the second one, the volume is sampled as if it were a probability density function: the more intense the volume is at a point, the more samples are taken from that location.

Method 1: Volume approximation


$ convert_vol2pseudo ...


Parameters

  • `` The input volume file (in Spider format)
  • `` The program writes the following output files: [rootname].pdb (PDB file with the pseudoatoms), [rootname].vol (the approximation volume), [rootname].hist (histogram of the Gaussian intensities), [rootname]_rawDiff.vol (difference between the input volume and its approximation), [rootname]_relativeDiff.vol (the raw difference divided by the input volume at that location; this gives an idea of how large the error is relative to the input).
  • -sigma [sigma=1.5] = Standard deviation (in Angstroms) of the Gaussians
  • -initialSeeds [N=300] = Number of pseudoatoms at the beginning
  • -growSeeds [percentage=30] = At each iteration, the smallest percentage/2 pseudoatoms are removed and percentage new pseudoatoms are created.
  • -stop [stop=0.001] = At each iteration, the current set of Gaussians is optimized until the average error no longer decreases by at least this amount relative to the previous iteration.
  • -targetError [e=0.02] = Target approximation error; pseudoatoms are added until the error falls below this value (0.02 corresponds to 2%).
  • `` Don't allow optimization by moving atoms
  • `` Don't allow optimization by changing the intensity of the atoms
  • -intensityColumn [col=occupancy] = Column of the PDB file in which the Gaussian intensities are written (when the Gaussians are allowed to have different intensities). By default they are written in the "occupancy" column, but the "Bfactor" column can be used instead. Valid values are therefore occupancy and Bfactor (case-sensitive).
  • -Nclosest [N=3] = The distance histogram is built with the Nclosest atoms
  • -minDistance [d=0.001] = Prevent atoms from being closer to each other than this minimum distance
  • -sampling_rate [Ts=1] = Sampling rate of the volume. It is used to generate the PDB in Angstroms.
  • `` Don't scale atom weight in the output PDB file.
  • -thr [N=1] = Number of threads available
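
As a rough illustration of what the approximation volume and the two difference volumes listed above contain, here is a minimal Python sketch. It is not the program's implementation: the function names, the dictionary-based volume layout, the use of unnormalized Gaussians in voxel units, the sign of the raw difference (input minus approximation), and the handling of zero-valued voxels are all assumptions made for the example.

```python
import math
from itertools import product


def render_pseudoatoms(shape, atoms, sigma):
    """Sum unnormalized isotropic Gaussians on a voxel grid.

    shape: (nx, ny, nz); atoms: list of (x, y, z, intensity) in voxel units;
    sigma: standard deviation in voxels. This layout is hypothetical.
    """
    volume = {}
    for voxel in product(range(shape[0]), range(shape[1]), range(shape[2])):
        value = 0.0
        for ax, ay, az, intensity in atoms:
            d2 = (voxel[0] - ax) ** 2 + (voxel[1] - ay) ** 2 + (voxel[2] - az) ** 2
            value += intensity * math.exp(-d2 / (2.0 * sigma ** 2))
        volume[voxel] = value
    return volume


def difference_volumes(target, approx, eps=1e-12):
    """Return (raw, relative) differences between two volumes.

    raw = target - approx (sign is an assumption);
    relative = raw / target, set to 0 where the target is ~0 (assumption).
    """
    raw = {v: target[v] - approx[v] for v in target}
    relative = {
        v: raw[v] / target[v] if abs(target[v]) > eps else 0.0 for v in target
    }
    return raw, relative


# Toy data: the "input" is one Gaussian, the approximation has half its intensity.
target = render_pseudoatoms((5, 5, 5), [(2, 2, 2, 1.0)], sigma=1.0)
approx = render_pseudoatoms((5, 5, 5), [(2, 2, 2, 0.5)], sigma=1.0)
raw, relative = difference_volumes(target, approx)
print(raw[(2, 2, 2)], relative[(0, 0, 0)])
```

In this toy case the relative difference is the same at every voxel, which is why a relative-difference volume is easier to interpret than the raw one in low-density regions.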

Method 2: Volume sampling


$ convert_vol2pseudo ...


Parameters

  • `` The input volume file (in Spider format)
  • `` The binary mask file name (in Spider format). Default: mask.spi
  • `` The file name for the masked original volume (in Spider format). Default: vol_mask.spi
  • `` The output file name (data file). Default: out.dat
  • `` Set this flag if no mask is going to be used
  • `` The density threshold T (voxels with density values <= T are discarded). Default: 2
  • `` If used, a statistical sampling method is applied. If omitted, an exhaustive method is applied.
  • `` If statistical sampling is used, the number of points to generate. Default: 100000
  • `` The minimum number of points generated per voxel to simulate "density". It only takes effect if the exhaustive method is used. Default: 0
  • `` The maximum number of points generated per voxel to simulate "density". It only takes effect if the exhaustive method is used. Default: 10
  • `` If used, 4-dimensional vectors are generated: the first three components are the voxel coordinates and the fourth component is the density.
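
The mask and threshold parameters above amount to a voxel filter applied before any points are generated. The following Python sketch shows one way to express that filter; the function name, the dictionary-based volume and mask layout, and the order of the two checks are assumptions for illustration, not the program's actual code.

```python
def select_voxels(volume, mask=None, threshold=2.0):
    """Keep voxels that lie inside the mask and whose density exceeds the threshold.

    volume: dict {(x, y, z): density}; mask: dict {(x, y, z): 0 or 1}, or None
    when no mask is used. Both layouts are hypothetical.
    """
    kept = {}
    for coord, density in volume.items():
        # Voxels outside the binary mask are ignored.
        if mask is not None and not mask.get(coord, 0):
            continue
        # Density values <= threshold are eliminated, as in the parameter description.
        if density <= threshold:
            continue
        kept[coord] = density
    return kept


volume = {(0, 0, 0): 1.0, (1, 0, 0): 2.0, (2, 0, 0): 3.5, (3, 0, 0): 7.0}
mask = {(0, 0, 0): 1, (1, 0, 0): 1, (2, 0, 0): 1, (3, 0, 0): 0}
print(select_voxels(volume, mask))        # with a mask
print(select_voxels(volume, mask=None))   # without a mask
```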

The output file will have the following format:


3 32768
12 34 54
-12 45 76
...
32 45 76


The first line indicates the dimension of the vectors (in this case 3) and the number of vectors (in this case 32768). Each subsequent row gives the coordinates of one voxel. If the -4 parameter is used, the output file will have the following format:



4 32768
12 34 54 0.87
-12 45 76 0.02
...
32 45 76 0.76


Here, the first three components of each row are the voxel coordinates and the fourth component is the density value of that voxel.
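
For downstream processing, a data file in either of the two layouts described above can be read with a few lines of Python. This is a minimal sketch based only on the format as described here (a header with dimension and count, then one whitespace-separated vector per row); the function name and the returned structure are choices made for the example.

```python
def parse_vectors(text):
    """Parse the text of a data file into (dimension, [(coords, density), ...]).

    density is None for 3-dimensional files.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    dim, count = (int(token) for token in lines[0].split())
    if dim not in (3, 4):
        raise ValueError("expected 3- or 4-dimensional vectors")
    vectors = []
    for line in lines[1:]:
        fields = line.split()  # tolerant of repeated spaces between columns
        if len(fields) != dim:
            raise ValueError("row does not match the declared dimension: %r" % line)
        coords = tuple(int(value) for value in fields[:3])
        density = float(fields[3]) if dim == 4 else None
        vectors.append((coords, density))
    if len(vectors) != count:
        raise ValueError("header declares %d vectors, found %d" % (count, len(vectors)))
    return dim, vectors


sample = "4 2\n12  34 54  0.87\n-12  45 76 0.02\n"
dim, vectors = parse_vectors(sample)
print(dim, vectors)
```

Checking the row count against the header catches truncated files early.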

If the -sampling flag is used, a statistical sampling method selects the voxels. This method "selects" high-density voxels with higher probability than low-density voxels. If the flag is omitted, voxel coordinates are generated exhaustively. In this case, the density is "simulated" by repeating the same coordinate several times. The number of times a voxel's coordinates are repeated depends linearly on minCoord and maxCoord.
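
The difference between the two modes can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's code: the function names are hypothetical, the statistical mode is assumed to draw with replacement with probability proportional to density, and the exhaustive mode is assumed to map the lowest density to minCoord repetitions and the highest to maxCoord repetitions.

```python
import random


def statistical_sample(densities, npoints, seed=None):
    """Draw npoints voxel coordinates with probability proportional to density.

    densities: dict {(x, y, z): density}, with positive densities (assumption).
    """
    rng = random.Random(seed)
    coords = list(densities)
    weights = [densities[c] for c in coords]
    return rng.choices(coords, weights=weights, k=npoints)


def exhaustive_sample(densities, min_coord=0, max_coord=10):
    """Repeat each voxel a number of times that grows linearly with its density."""
    lowest = min(densities.values())
    highest = max(densities.values())
    span = highest - lowest
    points = []
    for coord, density in densities.items():
        # Linear map: lowest density -> min_coord, highest -> max_coord (assumption).
        fraction = (density - lowest) / span if span > 0 else 1.0
        repeats = int(round(min_coord + fraction * (max_coord - min_coord)))
        points.extend([coord] * repeats)
    return points


densities = {(0, 0, 0): 1.0, (1, 0, 0): 3.0, (2, 0, 0): 5.0}
sampled = statistical_sample(densities, npoints=1000, seed=0)
repeated = exhaustive_sample(densities, min_coord=0, max_coord=10)
print(len(sampled), len(repeated))
```

The statistical mode always yields exactly the requested number of points, while the size of the exhaustive output depends on the number of voxels and on the repetition range.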

Examples and notes

With method 1


$ convert_vol2pseudo -i myvolume.vol


In the previous example, pseudoatoms with a standard deviation of 1.5 Angstroms are added until the approximation error is smaller than 2%. The pseudoatoms are allowed to move and to change their intensities.
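
To get a feeling for how the number of pseudoatoms evolves with the default -initialSeeds and -growSeeds values, the following Python sketch applies the grow rule once. It rests on two assumptions that the parameter description does not settle: that "smallest" means lowest intensity, and that both percentages are taken relative to the current number of pseudoatoms. The function name is hypothetical.

```python
def grow_step(intensities, percentage=30.0):
    """Apply one grow iteration to a list of pseudoatom intensities.

    Removes the percentage/2 percent of atoms with the lowest intensity and
    reports how many new atoms would be created (percentage percent).
    Returns (kept_intensities, number_of_new_atoms).
    """
    n = len(intensities)
    n_remove = int(n * (percentage / 2.0) / 100.0)
    n_new = int(n * percentage / 100.0)
    kept = sorted(intensities)[n_remove:]
    return kept, n_new


# 300 initial seeds with made-up intensities 0, 1, ..., 299.
kept, n_new = grow_step([float(i) for i in range(300)], percentage=30.0)
print(len(kept), n_new, len(kept) + n_new)
```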

With method 2


$ convert_vol2pseudo -vname vol.spi -bmname mask.spi -fname data.dat -sampling -npoints 300000


In this example, a Spider volume is sampled and a set of 300000 high-density voxels is selected. The output file holding the voxels' coordinates is saved as data.dat.

--Main.AlfredoSolano - 26 Jan 2007
