Thursday, October 3, 2019

Solution Exercise Essay Example for Free

... AM by submitting on Blackboard

Guideline

This homework is an "individual" data mining experiment. Plagiarism is definitely not allowed. If a classmate or anyone else helps you with this homework, you need to specify who helped and with which portion. Your credit for that portion will be given to the helper (it is fair, right?). The helper should also mention who received their assistance on this homework. Zero points will be given if your homework is found to be the same as someone else's without any such mention.

You are required to use a programming language (C, C++, or Java) or software (MATLAB or Weka) to do the data mining experiment and analysis. Other software or languages are allowed only with the instructor's approval. You need to specify which software or program you are using for this homework. If you use another person's program or any program downloaded from the internet, you need to state where you got it and who the author is. If you decide to write your own program, please submit your source code. Extra credit will be given if you write your own program for any portion of this homework.

No matter which method you choose for this homework, you need to adjust its parameters carefully, if it has any. Please run an experiment on how to obtain better parameters and write up your analysis in this homework.

You need to submit your homework as an MS Word document through the Blackboard system. The homework should not exceed the 10-page limit (source code should go in the appendix). No late homework is allowed!

I. Congressional Voting Records (50%)

http://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records

Go to the UCI Machine Learning Repository to download the "Congressional Voting Records Data Set", or download the house-votes-84.csv file from Blackboard. Then choose at least two different classification methods (decision tree, rule-based, Bayesian, ANN, SVM, ensemble) to predict party affiliation (democrat or republican).
You can use any kind of statistical software (such as Minitab) or Excel to show the data exploration. Please PLOT it!

How do you handle the missing values?

The reasons for choosing the classification methods

Classification method implementation or software usage
Specify how you do the experiment. Which software package are you using? Or did you write your own program? Also, you need to specify all the parameters you are using for the chosen methods, and explain how you adjusted them.

Result of 10-fold cross-validation for each method
Show your best result!

Model comparison

II. Wisconsin Diagnostic Breast Cancer (WDBC) (50%)

http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)

Go to the UCI Machine Learning Repository to download the "Wisconsin Diagnostic Breast Cancer (WDBC)" dataset, or download the wdbc-data.csv file from Blackboard. Please make sure you download wdbc.data, not wpbc.data. Then choose at least two different classification methods (decision tree, rule-based, Bayesian, ANN, SVM, ensemble) to predict the diagnostic result (malignant or benign). Your homework should contain the following sections.

1. Data exploration
You can use any kind of statistical software (such as Minitab) or Excel to show the data exploration. Please PLOT it!

2. The reasons for choosing the classification methods

3. Apply one dimension reduction technique to the dataset

4. Classification method implementation or software usage
Specify how you do the experiment. Which software package are you using? Or did you write your own program? Also, you need to specify all the parameters you are using for the chosen methods, and explain how you adjusted them.

5. Result of 10-fold cross-validation for each method
Show your best result!

6. Model comparison

III. Extra credit (20%)

Review some classification papers (at least one paper for each dataset) that use these two datasets in their experiments. Compare your results with theirs. Summarize what you found.
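The 10-fold cross-validation asked for in Sections I and II can be sketched roughly as follows. This is only an illustration under several assumptions: it is written in Python for brevity (the guideline lists C, C++, Java, MATLAB, or Weka, so another language would need the instructor's approval), the helper names stratified_folds, cross_validate, majority_fit, and majority_predict are hypothetical, the majority-class "classifier" is a stand-in for whichever real method is chosen, and the toy rows and labels are made up and are not taken from either dataset.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split row indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def majority_fit(rows, labels):
    """Placeholder model (an assumption): remember the most common label."""
    return Counter(labels).most_common(1)[0][0]


def majority_predict(model, row):
    """Placeholder prediction: always return the remembered label."""
    return model


def cross_validate(rows, labels, fit, predict, k=10, seed=0):
    """Return one accuracy per fold; each fold is held out exactly once."""
    accuracies = []
    for test_indices in stratified_folds(labels, k, seed):
        if not test_indices:
            continue  # Skip empty folds when there are fewer rows than k.
        held_out = set(test_indices)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        model = fit(train_rows, train_labels)
        correct = sum(predict(model, rows[i]) == labels[i] for i in test_indices)
        accuracies.append(correct / len(test_indices))
    return accuracies


# Made-up toy data, only to show the call pattern.
labels = ["democrat"] * 12 + ["republican"] * 8
rows = [[i % 2] for i in range(len(labels))]

folds = stratified_folds(labels, k=10)
accuracies = cross_validate(rows, labels, majority_fit, majority_predict, k=10)
print("mean accuracy:", sum(accuracies) / len(accuracies))
```

Replacing majority_fit and majority_predict with a real classifier keeps the same evaluation loop. The majority-class result is also a handy baseline for the model comparison section, since any chosen method should beat it.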
