edu.cmu.cs.sb.stem
Class STEM_DataSet

java.lang.Object
  extended by edu.cmu.cs.sb.core.DataSetCore
      extended by edu.cmu.cs.sb.stem.STEM_DataSet

public class STEM_DataSet
extends DataSetCore

Class implementing the clustering methods used in STEM


Nested Class Summary
static class STEM_DataSet.ProfileRec
          Class implements a record for a profile, storing its ID, the number of genes assigned, the number of genes expected, and the uncorrected p-value of the number of genes assigned versus expected
static class STEM_DataSet.Profilerecdistcomparator
          Compares two profile records based on their dprofiledist variable
 
Field Summary
static int ALLPERMSTHRESH
           
static double DSAME
           
static double FLOATERROR
           
 double[][] profileavg
          The first index is the profile, the second index is the time point.
 double[][] profilestd
          The first index is the profile, the second index is the time point.
 double[][] profilesumsq
          The first index is the profile, the second index is the time point.
 
Fields inherited from class edu.cmu.cs.sb.core.DataSetCore
badd0, bfullrepeat, bmaxminval, bspotincluded, btakelog, data, dmincorrelation, dsamplemins, dthresholdvalue, genenames, generepeatspottimedata, generepeatspottimepma, genespottimedata, genespottimepma, htFiltered, nmaxmissing, numcols, numrows, otherInputFiles, pmavalues, probenames, sortedcorrvals, szGeneHeader, szInputFile, szProbeHeader
 
Constructor Summary
STEM_DataSet(DataSetCore theDataSetCore, STEM_DataSet theSTEM_DataSet)
          Constructor that simply copies all fields in theDataSetCore and theSTEM_DataSet
STEM_DataSet(java.lang.String szInputFile, int nmaxmissing, double dminclustdist, double dthresholdvalue, double dmincorrelation, double dalphaval, double dpercentileclust, int nmaxchange, int nmaxprofiles, double dmaxcorrmodel, int nsamplesgene, long nsamplesmodel, double[][] modelprofiles, int nfdr, boolean btakelog, boolean bspotincluded, boolean brepeatset, boolean badd0, boolean bmaxminval, boolean ballpermuteval, boolean bfullrepeat)
          Constructor for clustering with the STEM method
STEM_DataSet(java.lang.String szInputFile, int nmaxmissing, double dthresholdvalue, double dmincorrelation, int nmaxprofiles, int nmaxchange, boolean btakelog, boolean bspotincluded, boolean brepeatset, boolean badd0, boolean bmaxminval, boolean bfullrepeat)
          Constructor for k-means clustering
 
Method Summary
 void assignall0()
          Sets all genes to be assigned to profile index 0
 boolean closeToAllNeighbors(STEM_DataSet.ProfileRec insertProfile, java.util.TreeSet tsNeighborhood)
          Returns true iff the correlation of insertProfile with every profile in tsNeighborhood is greater than or equal to dminclustdist
 void clusterprofiles(boolean[] significant, java.util.ArrayList clustersofprofiles, boolean bnormal)
          Method for clustering profiles
 void compactprofiles2(double dmaxcorrelationprofiles, int nmaxprofiles)
          Implements the greedy algorithm to select a subset of the candidate model profiles based on dmaxcorrelationprofiles and nmaxprofiles
 void computeaveragetally()
          Computes the expected number of genes assigned to a profile based on a permutation test of time points
 void computeprofilestats()
          Computes the average and standard deviation of the expression values of genes assigned to the same profile
 void computePvaluesAssignments()
          Computes the p-value for genes assigned to the same profile
 void findbestgroupassignments()
          Finds the best-matching profile for each gene
 void kmeans()
          Implements the k-means clustering method
 void modelprofilestats(double[] dsumy, double[] dsumysq, double[] dsqrty)
          Computes sum and variance statistics on model profiles
 void tallyassignments()
          Counts the number of genes assigned to each profile
 
Methods inherited from class edu.cmu.cs.sb.core.DataSetCore
addExtraToFilter, averageAndFilterDuplicates, dataSetReader, filterdistprofiles, filterDuplicates, filtergenesgeneral, filtergenesthreshold1point, filtergenesthreshold2, filterMissing, filterMissing1point, logratio2, mergeDataSets
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DSAME

public static double DSAME

FLOATERROR

public static double FLOATERROR

ALLPERMSTHRESH

public static int ALLPERMSTHRESH

profileavg

public double[][] profileavg
The first index is the profile, the second index is the time point. profileavg[i][j] is the average value of all genes assigned to profile i at the (j+1)^th time point. The first time point, which is always 0, is not included.


profilestd

public double[][] profilestd
The first index is the profile, the second index is the time point. profilestd[i][j] is the standard deviation of the values of all genes assigned to profile i at the (j+1)^th time point. The first time point, which is always 0, is not included.


profilesumsq

public double[][] profilesumsq
The first index is the profile, the second index is the time point. profilesumsq[i][j] is the sum of the squared values of all genes assigned to profile i at the (j+1)^th time point. The first time point, which is always 0, is not included.

Constructor Detail

STEM_DataSet

public STEM_DataSet(DataSetCore theDataSetCore,
                    STEM_DataSet theSTEM_DataSet)
Constructor that simply copies all fields in theDataSetCore and theSTEM_DataSet


STEM_DataSet

public STEM_DataSet(java.lang.String szInputFile,
                    int nmaxmissing,
                    double dthresholdvalue,
                    double dmincorrelation,
                    int nmaxprofiles,
                    int nmaxchange,
                    boolean btakelog,
                    boolean bspotincluded,
                    boolean brepeatset,
                    boolean badd0,
                    boolean bmaxminval,
                    boolean bfullrepeat)
             throws java.io.IOException,
                    java.io.FileNotFoundException,
                    java.lang.IllegalArgumentException
Constructor for k-means clustering

Throws:
java.io.IOException
java.io.FileNotFoundException
java.lang.IllegalArgumentException

STEM_DataSet

public STEM_DataSet(java.lang.String szInputFile,
                    int nmaxmissing,
                    double dminclustdist,
                    double dthresholdvalue,
                    double dmincorrelation,
                    double dalphaval,
                    double dpercentileclust,
                    int nmaxchange,
                    int nmaxprofiles,
                    double dmaxcorrmodel,
                    int nsamplesgene,
                    long nsamplesmodel,
                    double[][] modelprofiles,
                    int nfdr,
                    boolean btakelog,
                    boolean bspotincluded,
                    boolean brepeatset,
                    boolean badd0,
                    boolean bmaxminval,
                    boolean ballpermuteval,
                    boolean bfullrepeat)
             throws java.io.IOException,
                    java.io.FileNotFoundException,
                    java.lang.IllegalArgumentException
Constructor for clustering with the STEM method

Throws:
java.io.IOException
java.io.FileNotFoundException
java.lang.IllegalArgumentException

Method Detail

tallyassignments

public void tallyassignments()
Counts the number of genes assigned to each profile
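
To illustrate the tallying step, here is a minimal sketch. It assumes a hypothetical int array that holds one profile index per gene. The class TallySketch and its method are illustrative names and are not part of STEM.

```java
/** Illustrative only: not the STEM implementation. */
class TallySketch {
    /**
     * Counts how many genes map to each profile.
     * assignment[g] is the (hypothetical) profile index of gene g.
     */
    static int[] tally(int[] assignment, int numProfiles) {
        int[] counts = new int[numProfiles];
        for (int profile : assignment) {
            counts[profile]++;
        }
        return counts;
    }
}
```

A profile that receives no genes keeps a count of 0, so the returned array always has one entry per profile.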


computeprofilestats

public void computeprofilestats()
Computes the average and standard deviation of the expression values of genes assigned to the same profile
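
The per-profile statistics can be sketched as follows. This is an assumption-laden illustration rather than the STEM code: the input layout (values[g][t], with the always-zero first time point already dropped), the class name ProfileStatsSketch, and the use of the sample standard deviation (divisor n - 1) are all choices made for the example.

```java
/** Illustrative only: per-profile mean, standard deviation and sum of squares. */
class ProfileStatsSketch {
    final double[][] avg;    // avg[p][t]: mean over genes assigned to profile p
    final double[][] std;    // std[p][t]: sample standard deviation (assumed divisor n - 1)
    final double[][] sumSq;  // sumSq[p][t]: sum of squared values

    ProfileStatsSketch(double[][] values, int[] assignment, int numProfiles) {
        int numTimes = values[0].length;
        avg = new double[numProfiles][numTimes];
        std = new double[numProfiles][numTimes];
        sumSq = new double[numProfiles][numTimes];
        double[][] sum = new double[numProfiles][numTimes];
        int[] n = new int[numProfiles];

        // Accumulate sums and sums of squares per profile and time point.
        for (int g = 0; g < values.length; g++) {
            int p = assignment[g];
            n[p]++;
            for (int t = 0; t < numTimes; t++) {
                double v = values[g][t];
                sum[p][t] += v;
                sumSq[p][t] += v * v;
            }
        }

        // Turn the accumulated sums into means and standard deviations.
        for (int p = 0; p < numProfiles; p++) {
            if (n[p] == 0) {
                continue; // empty profile: leave the statistics at 0
            }
            for (int t = 0; t < numTimes; t++) {
                avg[p][t] = sum[p][t] / n[p];
                if (n[p] > 1) {
                    double var = (sumSq[p][t] - n[p] * avg[p][t] * avg[p][t]) / (n[p] - 1);
                    std[p][t] = Math.sqrt(Math.max(0.0, var)); // clamp rounding noise
                }
            }
        }
    }
}
```

Keeping the sum of squares alongside the mean lets the variance be derived in a single pass over the genes.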


computePvaluesAssignments

public void computePvaluesAssignments()
Computes the p-value for genes assigned to the same profile
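
This page does not state which null distribution the p-value is based on. Purely as an illustration, the sketch below assumes a binomial model: with n genes in total and an expected count e for a profile, the p-value of observing k or more assigned genes is the upper tail of Binomial(n, e/n). The class and method names are hypothetical.

```java
/** Illustrative only: upper-tail binomial probability, an assumed null model. */
class BinomialTailSketch {
    /** Returns P(X >= k) for X ~ Binomial(n, p). */
    static double upperTail(int n, int k, double p) {
        if (k <= 0) {
            return 1.0;
        }
        if (k > n || p <= 0.0) {
            return 0.0;
        }
        if (p >= 1.0) {
            return 1.0;
        }
        double logP = Math.log(p);
        double logQ = Math.log1p(-p);

        // log C(n, k), built up term by term to avoid overflow.
        double logChoose = 0.0;
        for (int i = 1; i <= k; i++) {
            logChoose += Math.log(n - k + i) - Math.log(i);
        }

        double total = 0.0;
        for (int i = k; i <= n; i++) {
            total += Math.exp(logChoose + i * logP + (n - i) * logQ);
            if (i < n) {
                // C(n, i + 1) = C(n, i) * (n - i) / (i + 1)
                logChoose += Math.log(n - i) - Math.log(i + 1);
            }
        }
        return Math.min(1.0, total);
    }
}
```

Under this assumption, a profile with k assigned genes and e expected genes out of n would be scored with upperTail(n, k, e / n). The result is uncorrected for multiple testing.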


kmeans

public void kmeans()
Implements the k-means clustering method
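
For readers unfamiliar with k-means, the following is a generic Lloyd-style sketch. It is not taken from STEM: the Euclidean distance, the seeding with the first k points, and the name KMeansSketch are assumptions made for the example, and this page does not say which distance or seeding STEM uses.

```java
import java.util.Arrays;

/** Illustrative Lloyd-style k-means; not the STEM implementation. */
class KMeansSketch {
    /**
     * Clusters points into k groups and returns one cluster index per point.
     * Assumes 1 <= k <= points.length and that all rows have equal length.
     */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int n = points.length;
        int dim = points[0].length;

        // Assumption: seed the centroids with the first k points.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: move each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double dist = 0.0;
                    for (int j = 0; j < dim; j++) {
                        double diff = points[i][j] - centroids[c][j];
                        dist += diff * diff;
                    }
                    if (dist < bestDist) {
                        bestDist = dist;
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched cluster
            }

            // Update step: recompute each centroid as the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[assignment[i]][j] += points[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // keep the old centroid for an empty cluster
                }
                for (int j = 0; j < dim; j++) {
                    centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return assignment;
    }
}
```

The loop stops as soon as an assignment pass changes nothing, so maxIterations only acts as a safety bound.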


closeToAllNeighbors

public boolean closeToAllNeighbors(STEM_DataSet.ProfileRec insertProfile,
                                   java.util.TreeSet tsNeighborhood)
Returns true iff the correlation of insertProfile with every profile in tsNeighborhood is greater than or equal to dminclustdist
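
The check described above can be sketched as follows. This is an illustration under stated assumptions: profiles are represented as plain double arrays rather than STEM_DataSet.ProfileRec objects, the correlation is assumed to be the Pearson correlation, and the class and method names are hypothetical.

```java
import java.util.List;

/** Illustrative only: "close to every member" test using Pearson correlation. */
class NeighborhoodSketch {
    /** Pearson correlation of two equal-length vectors; NaN if either is constant. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            syy += y[i] * y[i];
            sxy += x[i] * y[i];
        }
        double cov = sxy - sx * sy / n;
        double varX = sxx - sx * sx / n;
        double varY = syy - sy * sy / n;
        return cov / Math.sqrt(varX * varY);
    }

    /** True iff candidate correlates at least minCorrelation with every member. */
    static boolean closeToAll(double[] candidate, List<double[]> neighborhood,
                              double minCorrelation) {
        for (double[] member : neighborhood) {
            double r = pearson(candidate, member);
            // Written as !(r >= ...) so that an undefined (NaN) correlation fails the test.
            if (!(r >= minCorrelation)) {
                return false;
            }
        }
        return true; // also true for an empty neighborhood
    }
}
```

Returning at the first failing member avoids computing the remaining correlations.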


clusterprofiles

public void clusterprofiles(boolean[] significant,
                            java.util.ArrayList clustersofprofiles,
                            boolean bnormal)
                     throws java.lang.Exception
Method for clustering profiles

Throws:
java.lang.Exception

computeaveragetally

public void computeaveragetally()
Computes the expected number of genes assigned to a profile based on a permutation test of time points
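
A permutation test of this kind can be sketched as below. The sketch is not the STEM code: it enumerates every permutation of the time points (feasible only for a handful of time points), takes the gene-to-profile assignment rule as a caller-supplied function, and uses hypothetical names throughout.

```java
import java.util.function.ToIntFunction;

/** Illustrative only: expected per-profile counts under time-point permutation. */
class PermutationTallySketch {
    /**
     * Averages, over all permutations of the time points, the number of genes
     * that the given rule assigns to each profile.
     * The rule must not keep a reference to the array it receives.
     */
    static double[] expectedCounts(double[][] genes, int numProfiles,
                                   ToIntFunction<double[]> assign) {
        int numTimes = genes[0].length;
        int[] order = new int[numTimes];
        for (int t = 0; t < numTimes; t++) {
            order[t] = t;
        }
        double[] totals = new double[numProfiles];
        long[] numPerms = {0};
        permute(order, 0, genes, assign, totals, numPerms);
        for (int p = 0; p < numProfiles; p++) {
            totals[p] /= numPerms[0];
        }
        return totals;
    }

    /** Recursively generates permutations of order[start..] by swapping. */
    private static void permute(int[] order, int start, double[][] genes,
                                ToIntFunction<double[]> assign,
                                double[] totals, long[] numPerms) {
        if (start == order.length) {
            numPerms[0]++;
            double[] permuted = new double[order.length];
            for (double[] gene : genes) {
                for (int t = 0; t < order.length; t++) {
                    permuted[t] = gene[order[t]];
                }
                totals[assign.applyAsInt(permuted)]++;
            }
            return;
        }
        for (int i = start; i < order.length; i++) {
            swap(order, start, i);
            permute(order, start + 1, genes, assign, totals, numPerms);
            swap(order, start, i); // restore before trying the next choice
        }
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Because every gene is assigned exactly once per permutation, the expected counts sum to the number of genes.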


assignall0

public void assignall0()
Sets all genes to be assigned to profile index 0


modelprofilestats

public void modelprofilestats(double[] dsumy,
                              double[] dsumysq,
                              double[] dsqrty)
Computes sum and variance statistics on model profiles


compactprofiles2

public void compactprofiles2(double dmaxcorrelationprofiles,
                             int nmaxprofiles)
Implements the greedy algorithm to select a subset of the candidate model profiles based on dmaxcorrelationprofiles and nmaxprofiles
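
One simple way to read "greedy selection under a correlation cap and a size cap" is the first-fit sketch below. It is an assumption for illustration, not the algorithm STEM implements: the order in which candidates are visited, the precomputed correlation matrix corr, and the name GreedySelectSketch are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: first-fit greedy selection of mutually distinct profiles. */
class GreedySelectSketch {
    /**
     * Visits candidates in index order and keeps a candidate when its correlation
     * with every already-kept candidate is at most maxCorrelation.
     * Stops once maxProfiles candidates have been kept.
     * corr[i][j] is the (precomputed) correlation between candidates i and j.
     */
    static List<Integer> select(double[][] corr, double maxCorrelation, int maxProfiles) {
        List<Integer> chosen = new ArrayList<>();
        for (int c = 0; c < corr.length && chosen.size() < maxProfiles; c++) {
            boolean distinct = true;
            for (int s : chosen) {
                if (corr[c][s] > maxCorrelation) {
                    distinct = false;
                    break;
                }
            }
            if (distinct) {
                chosen.add(c);
            }
        }
        return chosen;
    }
}
```

A first-fit pass depends on candidate order. A variant that always picks the candidate farthest from the current selection would be less order-sensitive at the cost of more comparisons.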


findbestgroupassignments

public void findbestgroupassignments()
Finds the best-matching profile for each gene
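
As an illustration of best-match assignment, the sketch below assigns each gene to the profile with the highest Pearson correlation. The similarity measure, the tie-breaking rule (lowest index wins), and the names used are assumptions for the example, not a description of the STEM code.

```java
/** Illustrative only: assign each gene to its most correlated profile. */
class BestProfileSketch {
    /** Returns a copy of v shifted to zero mean and scaled to unit length. */
    static double[] standardize(double[] v) {
        double mean = 0.0;
        for (double x : v) {
            mean += x;
        }
        mean /= v.length;

        double[] out = new double[v.length];
        double norm = 0.0;
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] - mean;
            norm += out[i] * out[i];
        }
        norm = Math.sqrt(norm);
        for (int i = 0; i < v.length; i++) {
            out[i] = (norm == 0.0) ? 0.0 : out[i] / norm; // constant vector -> all zeros
        }
        return out;
    }

    /** Returns, for each gene, the index of the profile with the highest correlation. */
    static int[] assign(double[][] genes, double[][] profiles) {
        // Standardize the profiles once; correlation then reduces to a dot product.
        double[][] unitProfiles = new double[profiles.length][];
        for (int p = 0; p < profiles.length; p++) {
            unitProfiles[p] = standardize(profiles[p]);
        }

        int[] best = new int[genes.length];
        for (int g = 0; g < genes.length; g++) {
            double[] unitGene = standardize(genes[g]);
            double bestR = Double.NEGATIVE_INFINITY;
            for (int p = 0; p < unitProfiles.length; p++) {
                double r = 0.0;
                for (int t = 0; t < unitGene.length; t++) {
                    r += unitGene[t] * unitProfiles[p][t];
                }
                if (r > bestR) { // strict comparison: ties keep the lower index
                    bestR = r;
                    best[g] = p;
                }
            }
        }
        return best;
    }
}
```

Standardizing the profiles up front means each gene-profile comparison costs one dot product instead of a full correlation computation.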