How to select the right inflation parameter for clustering? #5
-
Hi, good question.
-
Hi, thanks for the suggestion. As you mentioned, I have tried clustering with different inflation parameters (1.4, 2 and 4). Next, I wanted to find out whether there is a way to tell which inflation parameter gives a better clustering than the others. While reading the mcl manual, I found that we can use cluster validation tools such as clm dist, clm info and clm meet. I used clm info to compare my clusterings and got the output shown in the table below:
Can someone suggest how I can choose the right inflation parameter from the above table?
-
Hi, thanks for the explanation. As you mentioned in the previous comment, 'efficiency' tends to favour higher inflation values, and that was also the case in the previous table. I wanted to see how efficiency behaves over a larger range of inflation values, so I ran the clustering at inflation parameters from 1.5 up to 7 in steps of 0.5 (1.5, 2, 2.5, ...). The results are quite interesting to me. Below is the clm info output for the different inflation parameters:
Looking at the above table, I can see that 'efficiency' increases up to a certain inflation parameter (3.5 in this case) and decreases after that point. Does that mean that 3.5 is a better inflation parameter than the others? Also, I came across this, where 'modularity' is used as a measure to pick the best inflation value from a range of inflation values. I think it would be useful for my analysis too, once it is available in the forthcoming release.
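The sweep described above (inflation 1.5 to 7 in steps of 0.5) can be scripted. The following is a minimal sketch, not the exact commands used in this thread: it only builds and prints the command lines. The file name `data.mci` and the output naming scheme are assumptions for illustration, and the graph is assumed to already be in mcl matrix format (for example, produced with mcxload).

```python
# Minimal sketch: build the mcl / clm command lines for an inflation sweep.
# GRAPH and the output naming scheme are hypothetical; adjust to your own data.
GRAPH = "data.mci"  # graph in mcl matrix format (assumed to exist already)


def inflation_values(start=1.5, stop=7.0, step=0.5):
    """Return the inflation values from start to stop, both inclusive."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(count)]


def output_name(inflation):
    """Name one clustering file per inflation value, e.g. 1.5 -> out.data.mci.I15."""
    return f"out.{GRAPH}.I{int(round(inflation * 10)):02d}"


def sweep_commands():
    """Return one mcl command per inflation value, then clm info and clm dist."""
    values = inflation_values()
    outputs = [output_name(v) for v in values]
    commands = [
        ["mcl", GRAPH, "-I", str(v), "-o", out]
        for v, out in zip(values, outputs)
    ]
    # Compare all clusterings against the graph, then against each other.
    commands.append(["clm", "info", GRAPH, *outputs])
    commands.append(["clm", "dist", "--chain", *outputs])
    return commands


if __name__ == "__main__":
    for command in sweep_commands():
        print(" ".join(command))
```

Each printed line can be checked by hand before it is run, or passed to `subprocess.run` once the file names match your setup.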
-
Hi, thanks a lot for the good insight into the data. I agree with the idea of considering the stability of clusterings across a range of inflation parameter values. In one of your previous comments you mentioned that we can also use 'clm dist' to calculate the distance between different clusterings, which gives an idea of how close or different the clusterings are. I think this will further help to determine whether we have a stable clustering in our data. Below is the table of distances between the clusterings at various inflation parameters, generated using clm dist:
I looked at the clm dist manual but could not work out how to interpret these distances. It would be great if you could comment on the distances in the above table.
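As a rough aid for reading such distances, here is a small sketch of the split/join distance between two clusterings. It assumes that clm dist is reporting split/join distances (check the clm dist manual for the mode actually used), and the toy clusterings below are made up for illustration, not taken from the table above.

```python
def split_join_distance(a, b):
    """Return (d1, d2) for two clusterings given as lists of sets of node ids.

    d1 counts the nodes that must be moved out of clusters of `a` to reach
    the common subclustering of `a` and `b`; d2 is the same for `b`.
    The split/join distance is d1 + d2.
    """
    def one_way(src, dst):
        # For each src cluster, nodes lying outside its best-overlapping dst cluster.
        return sum(len(c) - max(len(c & d) for d in dst) for c in src)

    return one_way(a, b), one_way(b, a)


def normalized_distance(a, b):
    """Split/join distance divided by 2 * (number of nodes), giving a value in [0, 1)."""
    d1, d2 = split_join_distance(a, b)
    node_count = sum(len(c) for c in a)
    return (d1 + d2) / (2 * node_count)


if __name__ == "__main__":
    # Toy example: `fine` splits the first cluster of `coarse` in two.
    coarse = [{1, 2, 3, 4}, {5, 6}]
    fine = [{1, 2}, {3, 4}, {5, 6}]
    print(split_join_distance(coarse, fine))  # (2, 0)
    print(normalized_distance(coarse, fine))
```

A zero in one direction, as in the toy example, means one clustering is a strict subclustering of the other. Dividing by twice the node count puts distances from networks of different sizes on the same scale; small values between neighbouring inflation settings suggest the clustering is stable over that range.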
-
Below is the table as you requested:
Also, the total number of nodes in the network is 110369.
-
Hi everyone,
I am doing a pan-genome analysis for my study using Roary, which uses mcl to cluster the protein sequences. I read that changing the value of the inflation parameter changes the granularity of the clusters. It would be great if someone could suggest how to pick the right inflation parameter for clustering.