Welcome to Loot.co.za!
Sign in / Register |Wishlists & Gift Vouchers |Help | Advanced search
|
Your cart is empty |
|||
Showing 1 - 3 of 3 matches in All Departments
This book shows how to decompose high-dimensional microarrays into small subspaces (Small Matryoshkas, SMs), statistically analyze them, and perform cancer gene diagnosis. The information is useful for genetic experts, anyone who analyzes genetic data, and students to use as practical textbooks.Discriminant analysis is the best approach for microarray consisting of normal and cancer classes. Microarrays are linearly separable data (LSD, Fact 3). However, because most linear discriminant function (LDF) cannot discriminate LSD theoretically and error rates are high, no one had discovered Fact 3 until now. Hard-margin SVM (H-SVM) and Revised IP-OLDF (RIP) can find Fact3 easily. LSD has the Matryoshka structure and is easily decomposed into many SMs (Fact 4). Because all SMs are small samples and LSD, statistical methods analyze SMs easily. However, useful results cannot be obtained. On the other hand, H-SVM and RIP can discriminate two classes in SM entirely. RatioSV is the ratio of SV distance and discriminant range. The maximum RatioSVs of six microarrays is over 11.67%. This fact shows that SV separates two classes by window width (11.67%). Such easy discrimination has been unresolved since 1970. The reason is revealed by facts presented here, so this book can be read and enjoyed like a mystery novel. Many studies point out that it is difficult to separate signal and noise in a high-dimensional gene space. However, the definition of the signal is not clear. Convincing evidence is presented that LSD is a signal. Statistical analysis of the genes contained in the SM cannot provide useful information, but it shows that the discriminant score (DS) discriminated by RIP or H-SVM is easily LSD. For example, the Alon microarray has 2,000 genes which can be divided into 66 SMs. If 66 DSs are used as variables, the result is a 66-dimensional data. These signal data can be analyzed to find malignancy indicators by principal component analysis and cluster analysis.
This is the first book to compare eight LDFs by different types of datasets, such as Fisher's iris data, medical data with collinearities, Swiss banknote data that is a linearly separable data (LSD), student pass/fail determination using student attributes, 18 pass/fail determinations using exam scores, Japanese automobile data, and six microarray datasets (the datasets) that are LSD. We developed the 100-fold cross-validation for the small sample method (Method 1) instead of the LOO method. We proposed a simple model selection procedure to choose the best model having minimum M2 and Revised IP-OLDF based on MNM criterion was found to be better than other M2s in the above datasets. We compared two statistical LDFs and six MP-based LDFs. Those were Fisher's LDF, logistic regression, three SVMs, Revised IP-OLDF, and another two OLDFs. Only a hard-margin SVM (H-SVM) and Revised IP-OLDF could discriminate LSD theoretically (Problem 2). We solved the defect of the generalized inverse matrices (Problem 3). For more than 10 years, many researchers have struggled to analyze the microarray dataset that is LSD (Problem 5). If we call the linearly separable model "Matroska," the dataset consists of numerous smaller Matroskas in it. We develop the Matroska feature selection method (Method 2). It finds the surprising structure of the dataset that is the disjoint union of several small Matroskas. Our theory and methods reveal new facts of gene analysis.
This is the first book to compare eight LDFs by different types of datasets, such as Fisher's iris data, medical data with collinearities, Swiss banknote data that is a linearly separable data (LSD), student pass/fail determination using student attributes, 18 pass/fail determinations using exam scores, Japanese automobile data, and six microarray datasets (the datasets) that are LSD. We developed the 100-fold cross-validation for the small sample method (Method 1) instead of the LOO method. We proposed a simple model selection procedure to choose the best model having minimum M2 and Revised IP-OLDF based on MNM criterion was found to be better than other M2s in the above datasets. We compared two statistical LDFs and six MP-based LDFs. Those were Fisher's LDF, logistic regression, three SVMs, Revised IP-OLDF, and another two OLDFs. Only a hard-margin SVM (H-SVM) and Revised IP-OLDF could discriminate LSD theoretically (Problem 2). We solved the defect of the generalized inverse matrices (Problem 3). For more than 10 years, many researchers have struggled to analyze the microarray dataset that is LSD (Problem 5). If we call the linearly separable model "Matroska," the dataset consists of numerous smaller Matroskas in it. We develop the Matroska feature selection method (Method 2). It finds the surprising structure of the dataset that is the disjoint union of several small Matroskas. Our theory and methods reveal new facts of gene analysis.
|
You may like...
|