
Index measurement ideas


Micah Elliott

  • Module and package naming conventions. (PEP-8 describes module naming, but in practice I see it broken more often than followed. It is silent on package names, yet the tutorial uses capitalized names.) Some consistency here would be nice.
  • Existence of standard files. ESR goes into detail on this in his "Art of UNIX Programming" book (p. 452).
  • Existence of standard directories (those I mentioned before).
  • Output of checkee "--help" should satisfy some standards. I presently check my own tools by running "help2man", which forces me to set up optparse to follow a strict format (a sketch of such a setup follows this list). I have some active RFEs on optik (optparse) to address this.
  • Use of distutils. Maybe just a check for setup.py?
  • Consistency of module length. Not sure about this one, but you might lower the score if some package modules are 10 lines while others are 10KLOC.
  • Number of modules per package. Maybe 4..20 is a good amount?
  • Extra points for existence of something like "api.html", which indicates that epydoc/pydoc generated API info.
  • Extra points for .svn/CVS/RCS directories indicating that version control is in place. Maybe even glarking of version numbers where high numbers indicate that code is checked in frequently.
  • Use of ReST in documentation, or even in docstrings.
  • Count of unit tests. Do module names map to test_modulename in the test directory? How many testXXX functions exist? (See the test-counting sketch after this list.)
  • A summary calculation of pylint/pychecker scores for each module. (A pylint-summary sketch also follows the list.)
  • Point deduction (or fail!) if any .doc/.xls, etc. files are included in the distribution.
  • Extra points for use of modules that indicate extra usability was incorporated, such as: gettext (multi-language), optparse (clean UI), configparser (fine control), etc.
  • A PEP describing the conventions (though some will argue that PEPs should be enforceable by the compiler, so maybe just a "Cheesecake Convention" document).
  • And of course anything that CPANTS offers :-)
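
To illustrate the kind of "--help" discipline help2man expects, here is a minimal sketch of an optparse setup; the program name, options, and version string are hypothetical, and help2man mainly needs consistent "--help" and "--version" output:

    # Hypothetical sketch: an optparse setup whose --help/--version output
    # is regular enough for help2man to turn into a man page.
    from optparse import OptionParser

    def build_parser():
        parser = OptionParser(
            usage="%prog [options] FILE...",
            version="%prog 0.1",          # enables --version, which help2man needs
            description="One-line summary of what the tool does.",
        )
        parser.add_option("-v", "--verbose", action="store_true",
                          help="print progress messages")
        parser.add_option("-o", "--output", metavar="PATH",
                          help="write results to PATH instead of stdout")
        return parser

    if __name__ == "__main__":
        options, args = build_parser().parse_args()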
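
A rough sketch of the test-count check, assuming the common (but not universal) layout where package/foo.py is covered by test/test_foo.py and test functions start with "test":

    import os, re

    def count_tests(package_dir, test_dir):
        """Count covered modules and test functions (assumed layout, see above)."""
        modules = [f[:-3] for f in os.listdir(package_dir)
                   if f.endswith(".py") and f != "__init__.py"]
        covered = 0
        test_functions = 0
        func_re = re.compile(r"^\s*def\s+(test\w*)\s*\(", re.MULTILINE)
        for name in modules:
            test_path = os.path.join(test_dir, "test_%s.py" % name)
            if os.path.exists(test_path):
                covered += 1
                test_functions += len(func_re.findall(open(test_path).read()))
        return covered, len(modules), test_functions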
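
And a sketch of summarising pylint scores per module by running pylint as a subprocess and scraping its "rated at N/10" line; the exact wording of that line may vary between pylint versions, so treat the regex as an assumption:

    import re, subprocess

    RATING_RE = re.compile(r"rated at ([-\d.]+)/10")

    def pylint_score(path):
        """Return pylint's 0-10 rating for one module, or None if unparsable."""
        output = subprocess.run(["pylint", path], capture_output=True,
                                text=True).stdout
        match = RATING_RE.search(output)
        return float(match.group(1)) if match else None

    def summary_score(module_paths):
        """Average the per-module ratings into one summary number."""
        scores = [s for s in (pylint_score(p) for p in module_paths)
                  if s is not None]
        return sum(scores) / len(scores) if scores else 0.0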

Magnus Lie Hetland

  • I think some sort of "vitality"/viability measure would be useful. How long has the project existed? When was the last release? When was the last release relative to the general frequency of releases? (A sketch of such a score follows this list.)
  • Perhaps some form of "impact factor" (similar to that used in academia), measuring how many other projects are based on this one, for example. Similarly, one could have a form of "brittleness", measuring how many other projects this project is based on. I.e., if this project is based on another rather flaky project, this project inherits the flakiness.
  • I think some of the style stuff may be a bit dubious. At least be sure not to impose limits that aren't universally accepted. (For example, not everyone is a big fan of ReST...)
  • Perhaps, if you want to do a ranking/"shootout", the user could check the boxes for the items he/she would like to have measured, giving the items various weights (as in the programming language shootout)? A weighted-score sketch also follows this list.
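
One way to turn the "vitality" idea into a number, as a sketch: compare the time since the last release with the project's typical gap between releases. The release dates would have to come from PyPI, the changelog, or similar; that lookup is not shown here.

    import datetime

    def vitality(release_dates, today=None):
        """Return a 0-1 vitality score; 1.0 means 'released as recently as usual'.

        release_dates: list of datetime.date objects, assumed already known.
        """
        today = today or datetime.date.today()
        dates = sorted(release_dates)
        if len(dates) < 2:
            return 0.0  # not enough history to judge
        gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
        typical_gap = sum(gaps) / float(len(gaps))
        since_last = (today - dates[-1]).days
        # Score decays once the project is "overdue" relative to its own rhythm.
        return min(1.0, typical_gap / max(since_last, 1))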
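
The user-weighted shootout could be as simple as a weighted average over normalised per-metric scores. A minimal sketch, with made-up metric names and weights:

    def weighted_score(scores, weights):
        """scores and weights: dicts keyed by metric name, scores already 0-1."""
        total_weight = sum(weights.get(name, 0) for name in scores)
        if not total_weight:
            return 0.0
        return sum(scores[name] * weights.get(name, 0)
                   for name in scores) / total_weight

    # Example: a user who cares mostly about tests and documentation.
    print(weighted_score({"tests": 0.8, "docs": 0.5, "style": 0.9},
                         {"tests": 3, "docs": 2, "style": 1}))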

Michael Bernstein

Regarding 'vitality', SourceForge has an 'activity' index, but I'm not sure how you would extract similar information unless you're hosting the project. For that matter, I'm not sure what counts as 'activity' on SF. If this can be done in a general way (i.e., not just for SF projects) it might help. Here are some possibilities:

  • find whether there are mailing lists for the project/package
  • check to see whether the mailing list archives have recent activity (a recency-scoring sketch follows this list)
  • measure mentions of the project/package in: Google, comp.lang.python & comp.lang.python.announce (via groups.google.com), GMANE.org
  • find whether there is a wiki for the project/package
  • check to see whether the wiki has recent activity
  • check wiki pagerank (this should help guard against wikispam inflated 'activity')
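
However the raw data is obtained (mailing list archives, wiki history, newsgroup mentions), the "recent activity" checks could share one scoring helper. A hedged sketch that only assumes you already have a list of post or edit dates:

    import datetime

    def activity_score(event_dates, window_days=90, today=None):
        """Activity score where recent events count more than old ones."""
        today = today or datetime.date.today()
        score = 0.0
        for date in event_dates:
            age = (today - date).days
            if 0 <= age <= window_days:
                score += 1.0 - float(age) / window_days  # linear decay over the window
        return score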

Cameron Lee : Unittests / doctests etc

  • Number of tests
  • Number of tests that pass/fail. (This will be affected by dependencies! Although so would a pylint score. If you are using eggs, it is a great way to check that the 'requires' is up to date.)
  • Would love to see lines of code tested (in addition to number of functions/methods), especially for all those branches of 'if' statements. (See the coverage sketch below.)
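
Line coverage could come from coverage.py. A rough sketch that shells out to it and scrapes the TOTAL percentage from the report; the report layout may differ between versions, and running the suite via pytest is only an example, not a requirement:

    import re, subprocess

    def line_coverage(package_dir):
        """Run the test suite under coverage.py and return the total percentage."""
        subprocess.run(["coverage", "run", "-m", "pytest"], cwd=package_dir)
        report = subprocess.run(["coverage", "report"], cwd=package_dir,
                                capture_output=True, text=True).stdout
        match = re.search(r"^TOTAL.*?(\d+)%", report, re.MULTILINE)
        return int(match.group(1)) if match else None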

Will Guaraldi : required files

  • For the required files and directories, it'd be nice to have Cheesecake output some documentation as to where to find documentation on such things. For example, what content should a README contain, and why shouldn't I put the acknowledgements in the README? I don't know whether this is covered in the Art of Unix Programming (mentioned above), as I don't have a copy of that book. Clearly we're creating standards here, so those standards should have some documentation.
  • Would it make sense to use a 3/4 rule for non-critical required files? If a project has 3/4 of the non-critical required files, it gets x points; otherwise it gets 0 points. This would be instead of the 5 points per non-critical required file. (A scoring sketch follows this list.)
  • I think that LICENSE should be a critical file; all projects should have a clearly denoted license.
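
The 3/4 rule could look like the sketch below; the point value and the split between critical and non-critical files are placeholders, not settled Cheesecake policy:

    def required_files_score(present, critical, non_critical, points=30):
        """All-or-nothing on critical files, 3/4 rule on the non-critical ones."""
        if not all(name in present for name in critical):
            return 0  # e.g. a missing LICENSE fails the check outright
        found = sum(1 for name in non_critical if name in present)
        return points if found >= 0.75 * len(non_critical) else 0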

Michael Bernstein : RE: required files

According to TAOUP page 452 (http://www.faqs.org/docs/artu/ch19s02.html#distpractice, section 19.2.4.3), the standard file naming conventions are as follows:

Here are some standard top-level file names and what they mean. Not every distribution needs all of these.

  • README - The roadmap file, to be read first.
  • INSTALL - Configuration, build, and installation instructions.
  • AUTHORS - List of project contributors (GNU convention).
  • NEWS - Recent project news.
  • HISTORY - Project history.
  • CHANGES - Log of significant changes between revisions.
  • COPYING - Project license terms (GNU convention).
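
A sketch of checking a distribution against that list; LICENSE is added per the comment above, and which names count as critical is Cheesecake's call, not TAOUP's:

    import os

    STANDARD_FILES = ["README", "INSTALL", "AUTHORS", "NEWS",
                      "HISTORY", "CHANGES", "COPYING", "LICENSE"]

    def standard_files_present(dist_root):
        """Return the subset of standard top-level files found in dist_root.

        Accepts common variants such as README.txt or README.rst.
        """
        entries = os.listdir(dist_root)
        found = []
        for name in STANDARD_FILES:
            if any(e == name or e.startswith(name + ".") for e in entries):
                found.append(name)
        return found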