
Minutes for 2007-11-09

In Attendance: Greg, Nicholas, Nitin, Ross, Scott

Note taker for today: Ross

Nitin's summary of features

November2007Features

Discussion about GeneticsPed problem

GeneticsPed didn't make it into the BioC 2.1 release last month: an example error appeared, probably after the changeover to R 2.5 as the development environment.

Nitin's list of undocumented features

See November2007Features

No need to document unexported functions.

Links on rgenetics.org front page for developers

Need a third link to the BioC development SVN for things ready to go into the next BioC release. SF svn remains in use for unreleased packages in development.

More efficient representation:

BOOST library looks promising.

Efficiency:

Nicholas points out that profiling shows GeneticsBase accesses columns by name rather than by index, which causes a bad slowdown. Fixes are being worked on. Hashing or some other sophisticated matching is needed for repeated column and row access: read the column names, get the indexes at the start, and use those. The problem will remain even with more efficient underlying data structures, so precompute the accessors and put them into a hash table.
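A minimal sketch of the "get indexes at start" idea, written in Python purely for illustration. The helper name build_index and the sample marker names are hypothetical and are not part of GeneticsBase; the sketch assumes that, when a name is repeated, the first position wins.

```python
def build_index(names):
    """Map each name to its first position, computed once up front."""
    index = {}
    for pos, name in enumerate(names):
        index.setdefault(name, pos)  # keep the first position for repeated names
    return index


# Hypothetical data: four marker columns, three samples.
columns = ["rs100", "rs200", "rs300", "rs400"]
rows = [
    [0, 1, 2, 1],
    [2, 2, 0, 1],
    [1, 0, 0, 2],
]

# Build the hash table once ...
col_index = build_index(columns)

# ... then resolve the names to positions a single time,
wanted = ["rs300", "rs100"]
positions = [col_index[name] for name in wanted]

# and do all repeated access by integer position only.
subset = [[row[p] for p in positions] for row in rows]
```

The point of the design is that the cost of matching names is paid once per batch of lookups instead of once per cell access.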

Scott reminded us that banging on is what we need to get things up to speed!

Considerations for efficient GeneSet

Discussion about storing the GeneSet row and column names in a hashed cache so lookups can be quicker. They are not currently cached; this needs to be done as part of the new representation. The matrix dim names should be private so they stay synchronized with the accessors. There is a worry about developers cheating to get directly at values, but we can't stop that from happening.

Greg pointed out that the allele summary functions were designed for small candidate gene data sets, so they are slow because they look up names repeatedly. This could be changed easily but won't fix the general problem. Discussion about automagically building a hash if none exists and using it for lookup; any accessor that changes a hashed index would need to update the hash.
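One way the lazily built hash and the "accessors must update the hash" rule could fit together is sketched below in Python, for illustration only. The class NamedAxis and its methods are hypothetical, not an existing GeneSet interface, and the sketch assumes names are unique. The leading underscore marks the names as private by convention, which, as noted above, cannot stop a developer from reaching in directly.

```python
class NamedAxis:
    """Row or column names with a name -> position hash built on demand."""

    def __init__(self, names):
        self._names = list(names)  # private by convention only
        self._hash = None          # not built until the first lookup

    def _ensure_hash(self):
        # "Automagic" hashing: build the table the first time it is needed.
        if self._hash is None:
            self._hash = {name: i for i, name in enumerate(self._names)}
        return self._hash

    @property
    def names(self):
        # Hand out an immutable copy so callers cannot desynchronize the hash.
        return tuple(self._names)

    def index_of(self, name):
        return self._ensure_hash()[name]

    def rename(self, old, new):
        # Any accessor that changes a hashed name must update the hash too.
        table = self._ensure_hash()
        if new != old and new in table:
            raise ValueError("duplicate name: " + new)
        pos = table.pop(old)
        self._names[pos] = new
        table[new] = pos


axis = NamedAxis(["rs100", "rs200", "rs300"])
before = axis.index_of("rs200")
axis.rename("rs200", "rs250")
after = axis.index_of("rs250")
```

After the rename, the old name is no longer found and the new name resolves to the same position, so the names and the hash stay in step.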

For subsetting, Greg proposed a mapping object - as an index set for an existing object to avoid R's innate copying behaviour in creating subsets. Apparently not always needed for S4 classes. Scott proposed lists of vectors to subset rows and columns as a "cheap" way to create subsets.
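A rough sketch of the mapping-object idea, combined with Scott's lists of row and column indexes, in Python for illustration only. SubsetView and its methods are hypothetical names; the sketch assumes the underlying data is a list of rows and simply shows a subset that stores index lists and a shared reference instead of a copy.

```python
class SubsetView:
    """A subset described by row and column index lists over shared data."""

    def __init__(self, data, rows, cols):
        self._data = data        # shared reference, nothing is copied
        self._rows = list(rows)  # positions in the parent data
        self._cols = list(cols)

    @property
    def shape(self):
        return (len(self._rows), len(self._cols))

    def get(self, i, j):
        # Translate view coordinates to parent coordinates on access.
        return self._data[self._rows[i]][self._cols[j]]

    def subset(self, rows, cols):
        # A subset of a subset only composes the index lists.
        return SubsetView(
            self._data,
            [self._rows[i] for i in rows],
            [self._cols[j] for j in cols],
        )


data = [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
]
view = SubsetView(data, rows=[2, 0], cols=[1, 2])
nested = view.subset(rows=[1], cols=[0])
```

Creating a view costs only the size of the index lists, at the price of one extra indirection on every access.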


 
