## Monday, December 11, 2017

### Digital Image Processing with Matlab Book - Classification Learner Example Datasets

Here are some datasets that may be useful for experimenting with many classification techniques, bundled in a single file: https://www.mathworks.com/content/dam/mathworks/mathworks-dot-com/products/statistics/classification-learner/files/ClassificationLearner_Example_Datasets.mat
It contains the following datasets:

## Fisher Iris

The Fisher Iris dataset consists of samples from three species of iris (Iris setosa, Iris virginica, and Iris versicolor). Features such as the length and the width of the sepals and petals were measured in centimeters.

Number of predictors: 4
Number of observations: 150
Number of classes: 3

## Credit Rating

The credit rating dataset contains financial ratios and industry-sector information for a list of corporate customers. The response variable consists of credit ratings (AAA, AA, A, BBB, BB, B, CCC) assigned by a rating agency.

Number of predictors: 6
Number of observations: 3932
Number of classes: 7

## Ovarian

The ovarian cancer dataset consists of high-resolution ovarian cancer data generated using the WCX2 protein array. The sample set includes 95 controls and 121 ovarian cancers.

Number of predictors: 100
Number of observations: 216
Number of classes: 2

## Don’s Easter Egg

Don’s Easter Egg is a simulated dataset with only 2 variables and 2 classes. Because the boundary separating the two classes is highly nonlinear, it makes for a challenging exploratory problem.

Number of predictors: 2
Number of observations: 1873
Number of classes: 2

## Arrhythmia

The heart arrhythmia dataset consists of patient information and a response variable that indicates the presence or absence of cardiac arrhythmia in the patient. In this medical diagnostic application, misclassifying a patient as "normal" may have more severe consequences than misclassifying a patient as "has arrhythmia."

Number of predictors: 279
Number of observations: 452
Number of classes: 16
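The point about unequal misclassification costs can be made concrete with a minimum-expected-cost decision rule. The Python sketch below simplifies the problem to two classes (normal vs. arrhythmia) and uses a hypothetical cost matrix in which missing an arrhythmia is assumed to be five times worse than a false alarm. The cost values and helper names are illustrative assumptions, not taken from the dataset, and the class probabilities would come from whatever classifier you train.

```python
# Hypothetical cost matrix: COST[(true_class, predicted_class)].
# These values are illustrative assumptions, not part of the dataset.
COST = {
    ("normal", "normal"): 0.0,
    ("normal", "arrhythmia"): 1.0,      # false alarm: e.g. extra tests
    ("arrhythmia", "normal"): 5.0,      # missed diagnosis: assumed 5x worse
    ("arrhythmia", "arrhythmia"): 0.0,
}
CLASSES = ("normal", "arrhythmia")


def expected_cost(predicted, posterior):
    """Expected cost of predicting `predicted`, given class probabilities."""
    return sum(posterior[true] * COST[(true, predicted)] for true in CLASSES)


def min_cost_decision(posterior):
    """Pick the class whose prediction has the lowest expected cost."""
    return min(CLASSES, key=lambda c: expected_cost(c, posterior))


if __name__ == "__main__":
    # The model thinks this patient is probably normal (25% arrhythmia).
    posterior = {"normal": 0.75, "arrhythmia": 0.25}
    print(min_cost_decision(posterior))  # arrhythmia
```

With these assumed costs, a patient with only a 25% predicted probability of arrhythmia is still flagged (expected cost 0.75 vs. 1.25), whereas simply picking the most probable class would label them normal. In general, this rule flags arrhythmia whenever its probability exceeds 1/(1+5) ≈ 0.17.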

## Ionosphere

The ionosphere dataset comprises preprocessed signals obtained from a phased array of 16 high-frequency antennas. "Good" radar returns are those showing evidence of some type of structure in the ionosphere; "bad" returns are those whose signals pass through the ionosphere.

Number of predictors: 34
Number of observations: 351
Number of classes: 2

### Learning Machine Learning with Matlab Book - Image Dataset

When working with machine learning, we inevitably need a variety of datasets. Here is a link to an image dataset covering many categories: http://www.vision.caltech.edu/Image_Datasets/Caltech101/. It contains pictures of objects belonging to 101 categories, with about 40 to 800 images per category; most categories have about 50 images. It was collected in September 2003 by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato. Each image is roughly 300 x 200 pixels.
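To check the per-category image counts on a local copy, a short Python sketch like the following may help. It assumes the archive has been extracted into a folder with one sub-folder per category (the default folder name `101_ObjectCategories` is an assumption; pass your own path as the first argument) and that images use common extensions such as `.jpg`. The archive may also contain a background folder, which this sketch counts like any other category.

```python
import os
import statistics
import sys
from collections import Counter

# Assumed image extensions; adjust if your copy uses other formats.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def count_images_per_category(root):
    """Return a Counter mapping each sub-folder (category) to its image count."""
    counts = Counter()
    for category in sorted(os.listdir(root)):
        folder = os.path.join(root, category)
        if not os.path.isdir(folder):
            continue  # skip stray files at the top level
        counts[category] = sum(
            1
            for name in os.listdir(folder)
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS
        )
    return counts


if __name__ == "__main__":
    # Assumed default folder name; override with a command-line argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "101_ObjectCategories"
    if not os.path.isdir(root):
        print(f"Folder not found: {root}")
    else:
        counts = count_images_per_category(root)
        if counts:
            values = list(counts.values())
            print(f"categories: {len(counts)}")
            print(f"min / median / max images: "
                  f"{min(values)} / {statistics.median(values)} / {max(values)}")
```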

## Sunday, December 3, 2017

### Move - Sequence Matcher in Python

Sometimes we need string matching when processing data, for example to handle spelling, since users may make spelling mistakes. I usually use SequenceMatcher for this. Suppose there is a village named ‘Menjalin’ and a user types ‘Mnjalin’: what is the match ratio between the two?
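Python's standard library provides this in `difflib.SequenceMatcher`. Its `ratio()` method returns 2*M/T, where M is the number of matched characters and T is the total length of both strings. For ‘Menjalin’ (8 characters) and ‘Mnjalin’ (7 characters), all 7 characters of ‘Mnjalin’ are matched, so the ratio is 2*7/15 ≈ 0.933. The minimal sketch below also uses `difflib.get_close_matches` to suggest a correction from a list of valid names; that list and the helper names `match_ratio` and `suggest` are hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical list of valid place names used as a spelling reference.
KNOWN_NAMES = ("Menjalin", "Mandor", "Sengah Temila")


def match_ratio(a, b):
    """Similarity in [0, 1], computed by difflib as 2*M / (len(a) + len(b))."""
    return SequenceMatcher(None, a, b).ratio()


def suggest(word, names=KNOWN_NAMES, cutoff=0.6):
    """Return up to 3 known names whose ratio with `word` is >= cutoff, best first."""
    return get_close_matches(word, names, n=3, cutoff=cutoff)


if __name__ == "__main__":
    print(round(match_ratio("Menjalin", "Mnjalin"), 4))  # 0.9333
    print(suggest("Mnjalin"))  # ['Menjalin']
```

The cutoff of 0.6 is `get_close_matches`' own default; raising it makes suggestions stricter, lowering it tolerates more typos at the risk of unrelated matches.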