Defining parameters

Defining groups and calculating the average precision (AP) values:

Parameters:

We will consider a small sample dataset to understand how parameters can be defined to calculate the AP values. The following dataset has features from two multi-well plates with associated metadata.

Metadata_perturbation	Metadata_plate	Metadata_Well	Metadata_Sample_type	Feature 1	Feature 2
Treatment1	P1	A1	Treated	1000	300
Treatment2	P1	A2	Treated	300	100
NA	P1	A3	Control	10	500
NA	P1	B1	Control	15	438
Treatment1	P1	B2	Treated	700	400
Treatment2	P1	B3	Treated	250	75
Treatment1	P2	A1	Treated	750	250
Treatment2	P2	A2	Treated	250	150
NA	P2	A3	Control	20	450
NA	P2	B1	Control	17	525
Treatment1	P2	B2	Treated	800	325
Treatment2	P2	B3	Treated	250	87

Suppose a user wants to compute the AP of the treated samples against the controls. Two profiles of the same perturbation are considered a positive pair, and a control profile paired with a perturbed profile is considered a negative pair. Let’s see how to define the parameters for this particular case:

The following two parameters define the positive pairs:

pos_sameby - takes a list as input. Two profiles can form a positive pair only if they share the same value in the metadata given here. In the example above, replicates of the same perturbation are the positive pairs, so the metadata that identifies the perturbation is provided here. In this case, it is the column ‘Metadata_perturbation’. e.g. pos_sameby = ['Metadata_perturbation']
pos_diffby - takes a list as input. This parameter lists the metadata in which the two profiles of a positive pair must differ, so that unwanted pairs are left out while computing the metrics. For example, to avoid pairing replicates of a treated/perturbed sample from the same plate or well position, the corresponding metadata can be provided here. In this case, we use ‘Metadata_plate’ as the input, so positive pairs are formed only across plates. e.g. pos_diffby = ['Metadata_plate']
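The effect of these two parameters on the example dataset can be sketched by listing the pairs by hand. This is only an illustration of the rule (same perturbation, different plate), not the copairs implementation; the function name `positive_pairs` is hypothetical, and skipping the control rows (which have no perturbation label) is an assumption of this sketch.

```python
from itertools import combinations

# Metadata copied from the example table: (perturbation, plate, well, sample type).
# Controls carry no perturbation label (NA in the table), written here as None.
rows = [
    ("Treatment1", "P1", "A1", "Treated"),
    ("Treatment2", "P1", "A2", "Treated"),
    (None, "P1", "A3", "Control"),
    (None, "P1", "B1", "Control"),
    ("Treatment1", "P1", "B2", "Treated"),
    ("Treatment2", "P1", "B3", "Treated"),
    ("Treatment1", "P2", "A1", "Treated"),
    ("Treatment2", "P2", "A2", "Treated"),
    (None, "P2", "A3", "Control"),
    (None, "P2", "B1", "Control"),
    ("Treatment1", "P2", "B2", "Treated"),
    ("Treatment2", "P2", "B3", "Treated"),
]


def positive_pairs(rows):
    """Return index pairs that share a perturbation but come from different plates."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        # Assumption: rows without a perturbation label never form a positive pair.
        same_perturbation = a[0] is not None and a[0] == b[0]
        different_plate = a[1] != b[1]
        if same_perturbation and different_plate:
            pairs.append((i, j))
    return pairs


pos = positive_pairs(rows)
print(len(pos), "positive pairs")
```

Each treatment has two replicates on each plate, so every replicate on P1 is paired with both replicates on P2, while the two replicates on the same plate are not paired with each other.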

The following two parameters define the negative pairs:

neg_sameby - takes a list as input. This restricts the negative pairs selected through ‘neg_diffby’ to profiles that share the same value in the metadata given here. For example, to take the negative samples only from the same plate as the perturbed samples, ‘Metadata_plate’ can be given here. This ensures that control profiles from other plates are excluded from the calculation. e.g. neg_sameby = ['Metadata_plate']
neg_diffby - takes a list as input. It defines what the perturbed samples are compared against (i.e. the controls or other perturbed samples). In this specific example, since we intend to distinguish the perturbed samples from the controls, ‘Metadata_Sample_type’ serves as the input. e.g. neg_diffby = ['Metadata_Sample_type']
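The negative-pair rule for the example (same plate, different sample type) can be sketched the same way. Again, this only illustrates the rule on the example table and is not the copairs implementation; the function name `negative_pairs` is hypothetical.

```python
from itertools import combinations

# (plate, well, sample type) for the twelve profiles, in the order of the example table.
rows = [
    ("P1", "A1", "Treated"), ("P1", "A2", "Treated"), ("P1", "A3", "Control"),
    ("P1", "B1", "Control"), ("P1", "B2", "Treated"), ("P1", "B3", "Treated"),
    ("P2", "A1", "Treated"), ("P2", "A2", "Treated"), ("P2", "A3", "Control"),
    ("P2", "B1", "Control"), ("P2", "B2", "Treated"), ("P2", "B3", "Treated"),
]


def negative_pairs(rows):
    """Return index pairs from the same plate whose sample types differ."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        same_plate = a[0] == b[0]
        different_type = a[2] != b[2]
        if same_plate and different_type:
            pairs.append((i, j))
    return pairs


neg = negative_pairs(rows)
print(len(neg), "negative pairs")
```

Each plate holds four treated and two control profiles, so every treated profile is paired with the two controls on its own plate and with none of the controls on the other plate.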

Other parameters:

meta - takes a dataframe as input. The dataframe should contain only the metadata associated with the profiles.
features - takes a NumPy array as input. The array should contain all the feature values, without any NaNs.
batch_size - takes an integer as input. This is the number of pairs processed at a time (per batch) while computing the AP values.
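As a rough sketch of how a profile table splits into the `meta` and `features` inputs, the snippet below separates the columns of a three-row excerpt of the example table by their `Metadata_` prefix. It uses only the standard library and plain lists to stay dependency-free; for copairs itself the two parts would be held in a dataframe and a NumPy array, as described above. Splitting on the column-name prefix is an assumption based on the naming in the example table.

```python
import csv
import io
import math

# Three rows excerpted from the example table (tab-separated).
table = (
    "Metadata_perturbation\tMetadata_plate\tMetadata_Well\tMetadata_Sample_type\tFeature 1\tFeature 2\n"
    "Treatment1\tP1\tA1\tTreated\t1000\t300\n"
    "Treatment2\tP1\tA2\tTreated\t300\t100\n"
    "NA\tP1\tA3\tControl\t10\t500\n"
)

reader = csv.DictReader(io.StringIO(table), delimiter="\t")

# Assumption: metadata columns are exactly those whose name starts with "Metadata_".
meta_cols = [c for c in reader.fieldnames if c.startswith("Metadata_")]
feature_cols = [c for c in reader.fieldnames if not c.startswith("Metadata_")]

meta, features = [], []
for row in reader:
    meta.append({c: row[c] for c in meta_cols})
    features.append([float(row[c]) for c in feature_cols])

# The feature values must not contain NaNs.
has_nan = any(math.isnan(value) for profile in features for value in profile)
print(meta_cols, feature_cols, has_nan)
```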

Note: Pass the parameters in the order in which they appear in the function call.

Once the above parameters are defined, the following command can be used to calculate the average precision:

import copairs.map
result = copairs.map.average_precision(meta, features, pos_sameby, pos_diffby, neg_sameby, neg_diffby, batch_size)

The output of the above step is a dataframe containing the AP value of every sample, along with the number of positive and negative pairs that were used for the calculation.
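To make the AP value concrete, the sketch below works through one query profile by hand: Treatment1 on plate P1, well A1. Following the parameter choices above, its positive candidates are the Treatment1 replicates on plate P2 and its negative candidates are the controls on plate P1. The candidates are ranked by similarity to the query, and AP is the mean of the precision at each rank that holds a positive. Cosine similarity and the helper names `cosine` and `ap_from_ranking` are assumptions of this sketch, not a description of how copairs computes the values internally.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def ap_from_ranking(labels_ranked):
    """Mean of precision@k over the ranks k that hold a positive (label 1)."""
    hits = 0
    precisions = []
    for k, label in enumerate(labels_ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Query: Treatment1, plate P1, well A1 (feature values from the example table).
query = (1000, 300)

# (features, label): 1 = positive candidate, 0 = negative candidate.
candidates = [
    ((750, 250), 1),  # Treatment1, P2, A1
    ((800, 325), 1),  # Treatment1, P2, B2
    ((10, 500), 0),   # Control, P1, A3
    ((15, 438), 0),   # Control, P1, B1
]

# Rank candidates from most to least similar to the query.
ranked = sorted(candidates, key=lambda c: cosine(query, c[0]), reverse=True)
ap = ap_from_ranking([label for _, label in ranked])
print(ap)
```

Both Treatment1 replicates are more similar to the query than either control, so they occupy the top two ranks and the AP for this query is 1.0. A ranking such as positive, negative, positive would instead give (1/1 + 2/3) / 2.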

Calculating the mAP values:

This step groups the profiles based on the ‘sameby’ value provided by the user and calculates the mean of the AP values for each unique group. A single false-discovery rate corrected p-value is also calculated for each of the unique groups.

Parameters:

result - takes the output of the previous step (the AP values).
sameby - takes a list as input. In the example that we are discussing, since the perturbed samples are considered positive pairs, the metadata that we used to define pos_sameby can be used here as well (i.e. ‘Metadata_perturbation’).
null_size - takes an integer as input. It defines the number of points in the null distribution.
threshold - takes a number as input. It defines the p-value threshold below which a calculated mAP value is considered significant.
seed - takes an integer as input. It seeds the random number generator so that the result is reproducible.

mAP = copairs.map.mean_average_precision(result, sameby, null_size, threshold, seed)
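The grouping part of this step can be sketched as follows: the AP values are grouped by the ‘sameby’ value and averaged per group. The AP values below are hypothetical and serve only to illustrate the arithmetic; the helper name `mean_ap_by_group` is also hypothetical, and the p-value calculation against the null distribution and its false-discovery rate correction are left out of this sketch.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (perturbation, AP) values, one per treated profile.
# They are illustrative only and were not computed from the example table.
ap_rows = [
    ("Treatment1", 1.0),
    ("Treatment1", 0.5),
    ("Treatment1", 1.0),
    ("Treatment1", 0.5),
    ("Treatment2", 0.5),
    ("Treatment2", 0.25),
    ("Treatment2", 0.5),
    ("Treatment2", 0.25),
]


def mean_ap_by_group(ap_rows):
    """Group AP values by their label and return the mean AP of each group."""
    groups = defaultdict(list)
    for group, ap in ap_rows:
        groups[group].append(ap)
    return {group: mean(values) for group, values in groups.items()}


map_by_group = mean_ap_by_group(ap_rows)
print(map_by_group)
```

Each unique perturbation ends up with a single mAP value, which is the quantity that is then tested for significance against the threshold.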
