the process of finding correlations or patterns among the data
- facilitates data exploration
- extracts useful knowledge hidden in the data
data mining
Using patient data for any purpose beyond providing care for the individual patient raises tricky issues of privacy and of keeping the information from falling into the wrong hands. There are significant legal issues related to the use of patient data in data mining efforts, specifically related to the de-identification, aggregation, and storage of the data. Failing to take the appropriate steps when using personal health data as a tool for population health could lead to serious consequences.
HIPAA in relation to data mining
- performs induction on the current data in order to make predictions.
Predictive Data Mining
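- the ability of a device, machine, etc. to take in numerous types of data and learn from it in order to produce knowledge; often summarized as "learning to learn," e.g., a meta-learner that learns how best to combine the outputs of other learning algorithms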
Meta-learning
- investigates how computers can learn based on data
- automatically learn to recognize complex patterns and make intelligent decisions on their own based on the data
Machine Learning
refers to the process of reducing the inputs for processing and analysis, or finding the most meaningful inputs.
Feature Selection
- can be applied to obtain a compressed representation of the data set that is much smaller in volume, yet maintains the integrity of the original data.
- used when the selected data is too large or complex
Data reduction
to request or seek out additional, more detailed information on a specific subject, e.g., moving from yearly totals down to monthly figures.
drill down
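Drilling down can be pictured as moving from an aggregated view to the finer-grained rows behind one of its totals. This sketch assumes hypothetical monthly admission counts and two illustrative helpers (`totals_by_year`, `drill_down`); real tools (OLAP cubes, BI dashboards) do the same thing over much larger hierarchies.

```python
from collections import defaultdict

# Hypothetical admission counts: (year, month, admissions)
records = [
    (2022, 1, 120), (2022, 2, 95), (2022, 3, 130),
    (2023, 1, 140), (2023, 2, 110), (2023, 3, 125),
]


def totals_by_year(rows):
    """Summary level: one total per year."""
    totals = defaultdict(int)
    for year, _month, count in rows:
        totals[year] += count
    return dict(totals)


def drill_down(rows, year):
    """Detail level: monthly totals within one selected year."""
    months = defaultdict(int)
    for y, month, count in rows:
        if y == year:
            months[month] += count
    return dict(months)


print(totals_by_year(records))    # {2022: 345, 2023: 375}
print(drill_down(records, 2023))  # {1: 140, 2: 110, 3: 125}
```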
- is an ensemble of models combined in two levels: base learners are trained first, then a meta-learner is trained on their outputs
- can be used to classify data
- the meta-learner takes the base learners' predictions as its input and learns how to combine them into the final prediction
Stacking
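A minimal two-level stacking sketch. Assumptions for brevity: the base learners are one-feature threshold rules ("stumps"), and the meta-learner is a simple lookup table from each combination of base predictions to the most common true label on held-out data (real systems often use logistic regression or another model at level 1). All data and helper names are hypothetical.

```python
from collections import Counter, defaultdict


def train_stump(xs, ys, feature):
    """Base learner: predict 1 when x[feature] >= threshold, choosing the
    threshold that makes the fewest training errors."""
    best_errors, best_t = None, None
    for t in sorted({x[feature] for x in xs}):
        errors = sum(int(x[feature] >= t) != y for x, y in zip(xs, ys))
        if best_errors is None or errors < best_errors:
            best_errors, best_t = errors, t
    return lambda x: int(x[feature] >= best_t)


def train_meta_learner(base_models, xs, ys):
    """Meta-learner: table from a tuple of base predictions to the most
    common true label seen for that tuple on held-out data."""
    seen = defaultdict(Counter)
    for x, y in zip(xs, ys):
        seen[tuple(m(x) for m in base_models)][y] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in seen.items()}

    def predict(x):
        key = tuple(m(x) for m in base_models)
        if key in table:
            return table[key]
        return Counter(key).most_common(1)[0][0]  # unseen combination: plain majority vote
    return predict


# Hypothetical data: the label is 1 only when BOTH features are high
train_x = [(1, 1), (2, 8), (8, 2), (9, 9), (3, 3), (7, 7), (6, 1), (1, 6)]
train_y = [0, 0, 0, 1, 0, 1, 0, 0]
holdout_x = [(8, 8), (9, 1), (1, 9), (2, 2), (7, 8), (8, 7)]
holdout_y = [1, 0, 0, 0, 1, 1]

# Level 0: base learners trained on the training set
base = [train_stump(train_x, train_y, feature=0), train_stump(train_x, train_y, feature=1)]
# Level 1: meta-learner trained on the base learners' held-out predictions
stacked = train_meta_learner(base, holdout_x, holdout_y)

print(base[0]((9, 2)), stacked((9, 2)))  # 1 0
```

The first stump alone calls `(9, 2)` positive; the meta-learner has learned that the stumps must agree, which corrects it. Training the meta-learner on held-out data (rather than the base learners' own training data) keeps it from trusting overfit base predictions.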
- each training example is weighted, and each model's vote is weighted by its accuracy.
- models are built one after another; after each round, the weights of the examples the current model misclassified are increased, so the next model concentrates on the cases the earlier ones got wrong
Boosting
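The reweighting idea above is what AdaBoost does. Below is a small AdaBoost sketch, assuming 1-D numeric inputs, labels coded as +1/-1, and threshold "stumps" as the weak learner; the data and function names are hypothetical, and production libraries add many refinements.

```python
import math


def stump_predict(t, s, x):
    """Weak learner output: polarity s when x >= t, otherwise -s."""
    return s if x >= t else -s


def fit_weighted_stump(xs, ys, w):
    """Pick the threshold t and polarity s with the lowest *weighted* error."""
    best = None
    for t in sorted(set(xs)) + [max(xs) + 1]:
        for s in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w) if stump_predict(t, s, x) != y)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best


def adaboost(xs, ys, rounds):
    """Labels must be +1/-1. Returns a list of (alpha, t, s) weighted stumps."""
    w = [1.0 / len(xs)] * len(xs)              # start with equal example weights
    ensemble = []
    for _ in range(rounds):
        err, t, s = fit_weighted_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) / division by 0
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate model -> bigger vote
        ensemble.append((alpha, t, s))
        # Misclassified examples (y * h = -1) gain weight; correct ones lose weight
        w = [wi * math.exp(-alpha * y * stump_predict(t, s, x)) for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def boosted_predict(ensemble, x):
    score = sum(alpha * stump_predict(t, s, x) for alpha, t, s in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical 1-D data: no single threshold separates the classes
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]

model = adaboost(xs, ys, rounds=3)
print([boosted_predict(model, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

One stump cannot fit this pattern, but three accuracy-weighted stumps, each trained after the previous round's mistakes were upweighted, classify every training point correctly.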
- method used to increase accuracy in data mining
- majority vote: the class predicted by the most models is chosen; the more models agree, the more reliable the prediction.
- algorithm creates an ensemble of models, each trained on a bootstrap sample (drawn with replacement) of the training data, and each model gives an equally weighted prediction
Bagging (Bootstrap Aggregating)
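A minimal bagging sketch, assuming 1-D labelled data and a threshold "stump" as the base learner (both hypothetical choices for illustration). Each model is trained on its own bootstrap sample, and every model gets exactly one vote.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) examples uniformly *with replacement*."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Base learner on 1-D data: predict 1 when x >= t, picking the t with the
    fewest errors on this bootstrap sample."""
    values = sorted({x for x, _ in sample})
    best_t, best_err = None, None
    for t in values + [values[-1] + 1]:  # last candidate predicts 0 everywhere
        err = sum(int(x >= t) != y for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x: int(x >= best_t)


def bagging(data, n_models=25, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagged_predict(models, x):
    """Every model casts one equally weighted vote; the majority wins."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical labelled 1-D data: (value, class)
data = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 1), (8, 1)]
models = bagging(data)
print(bagged_predict(models, 1), bagged_predict(models, 8))
```

Using an odd number of models with two classes avoids tied votes.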
DMAIC steps: define, measure, analyze, improve, and control
- can explain why data behaves a certain way
- not necessarily a data mining technique, but a model used to give more of an answer to "why" and "how" in regard to the data
- adds steps to the mining process that yield better results
Six Sigma
is a term that describes the large volume of data - both structured and unstructured - that inundates a business on a day-to-day basis. But it's not the amount of data that's important. It's what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.
Big Data
how we make sense of the data by converting them from their raw form to a more informative one
- sometimes known as model building or pattern identification
- yields a highly predictive, consistent pattern-identifying model
- pattern discovery is a complex phase of data mining
Exploratory data analysis (EDA)
due to a need for standardized data mining techniques, this concept and tool was developed.
Sample - selecting the data
Explore - looking for the relationship between variables in data
Modify - methods to select, create, and transform variables in preparation for data modeling
Model - applying various modeling techniques to gain the desired outcome
Assess - evaluating the model's reliability and usefulness
SEMMA
Cross Industry Standard Process for Data Mining
six steps: business understanding, data understanding, data preparation, modeling, evaluation, and deployment
most projects move back and forth between steps as necessary
CRISP-DM
"this data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals."
big data
producing a solution that generates useful forecasts:
1. problem identification
2. exploration of the data
3. pattern discovery
4. knowledge deployment - applying the model to new data to make forecasts
4 phases of data mining
transform the repositories of big data into comprehensible knowledge that is useful for guiding practice and facilitating interdisciplinary research
Knowledge Discovery and Research
- data mining method for analyzing outcomes and service use
- used to classify and predict an outcome
Classification and Regression Trees (CART)
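CART grows a tree by repeatedly choosing the binary split that makes the child nodes most "pure." The sketch below shows only the classification case with Gini impurity on a single numeric feature (CART regression trees use a squared-error criterion instead); the data, the `max_depth` default, and all function names are hypothetical, and pruning is omitted.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, larger when classes are mixed."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels)) if n else 0.0


def best_split(xs, ys):
    """Try every threshold t (x <= t goes left) and return
    (weighted child impurity, t) for the lowest-impurity binary split."""
    n = len(ys)
    best = (float("inf"), None)
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[0]:
            best = (score, t)
    return best


def build_tree(xs, ys, depth=0, max_depth=3):
    """Recursive binary partitioning; a leaf holds the majority class of its node."""
    majority = Counter(ys).most_common(1)[0][0]
    if depth >= max_depth or gini(ys) == 0.0:
        return majority
    _score, t = best_split(xs, ys)
    if t is None:  # all x values identical: cannot split further
        return majority
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t,
            build_tree([x for x, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([x for x, _ in right], [y for _, y in right], depth + 1, max_depth))


def predict(tree, x):
    while isinstance(tree, tuple):  # internal node: (threshold, left, right)
        t, left, right = tree
        tree = left if x <= t else right
    return tree


# Hypothetical data: length of stay (days) vs. readmission outcome
stay = [1, 2, 3, 4, 5, 6, 7, 8]
readmitted = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"]

tree = build_tree(stay, readmitted)
print(tree)               # (3, 'no', 'yes')
print(predict(tree, 6))   # yes
```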
1. enhance business aspects
2. help to improve patient care
Benefits of KDD
1. dependent on the use of private health information
2. ensure data is de-identified and confidentiality is maintained
3. follow changes in, and specific requirements for, compliance with HIPAA laws
ethics of data mining
thoughtful, planned activity that expands or refines knowledge. The purpose of research is to create generalized knowledge.
research
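experimental research has both of the following; quasi-experimental research manipulates the treatment but lacks random assignment:
1. manipulation of the treatment
2. random assignment to groups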
difference between quasi-experimental research and experimental research
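Random assignment, the feature that separates a true experiment from a quasi-experiment, can be sketched as shuffling the participant list and splitting it. This is a minimal illustration assuming two equal-sized groups and hypothetical participant IDs; real trials often use blocked or stratified randomization instead.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle, then split into two groups, so every participant has the
    same chance of ending up in either group."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


ids = [f"P{i:02d}" for i in range(1, 11)]  # hypothetical IDs P01..P10
groups = randomly_assign(ids, seed=42)
print(len(groups["treatment"]), len(groups["control"]))  # 5 5
```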
- the statistical analysis of a large collection of results from individual studies for the purpose of integrating findings
- the integrative analysis of findings from many studies that examined the same question
meta-analysis
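One standard way to integrate findings statistically is fixed-effect, inverse-variance pooling: each study's effect estimate is weighted by 1/SE², so more precise studies count for more. The sketch assumes hypothetical study results and a normal-approximation 95% interval; when studies are clearly heterogeneous, a random-effects model is usually preferred.

```python
import math


def fixed_effect_meta(effects, standard_errors, z=1.96):
    """Inverse-variance weighted pooling of study effect estimates.

    Each study i gets weight w_i = 1 / SE_i**2. Returns
    (pooled effect, pooled SE, approximate 95% confidence interval).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)


# Hypothetical results from three studies of the same question
effects = [0.30, 0.10, 0.25]
ses = [0.10, 0.20, 0.05]
pooled, se, ci = fixed_effect_meta(effects, ses)
print(f"pooled = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The most precise study (SE 0.05) dominates, pulling the pooled estimate toward 0.25.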
- set of connected input/output units in which each connection has a weight associated with it
AKA connectionist learning - named for the weighted connections between units
neural networks
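The smallest possible example of "units with weighted connections" is a single perceptron: one unit whose two input connections each carry a weight, plus a bias. The sketch below trains it on the logical AND function with the classic perceptron rule (learning rate 1 and integer weights, assumed here so the arithmetic stays exact); multi-layer networks trained by backpropagation follow the same idea at much larger scale.

```python
def step(z):
    """Threshold activation: the unit fires (1) when its weighted input is >= 0."""
    return 1 if z >= 0 else 0


def train_perceptron(samples, epochs=20):
    """Single unit with two weighted input connections plus a bias.

    Perceptron rule: after each example, add (target - output) * input to each
    connection weight, so weights change only when the output is wrong.
    """
    w = [0, 0]
    b = 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - step(w[0] * x1 + w[1] * x2 + b)
            w[0] += error * x1
            w[1] += error * x2
            b += error
    return w, b


# Logical AND: output 1 only when both inputs are 1
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_samples)
print([step(w[0] * x1 + w[1] * x2 + b) for (x1, x2), _ in and_samples])  # [0, 0, 0, 1]
```

A single unit can only learn linearly separable patterns such as AND; functions like XOR need hidden units.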
a flowchart-like structure and a decision support tool that uses a model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
Consists of three types of nodes: decision nodes, chance nodes, end nodes
decision trees
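A decision tree used for decision support is evaluated by "rolling back": end nodes give payoffs, chance nodes take a probability-weighted average, and decision nodes pick the option with the best value after subtracting its cost. The tree, probabilities, costs, and payoffs below are hypothetical, and the dictionary layout is just one convenient encoding.

```python
def expected_value(node):
    """Roll back a decision tree.

    end node      -> its payoff
    chance node   -> probability-weighted average of its branches
    decision node -> best option after subtracting that option's cost
    """
    kind = node["type"]
    if kind == "end":
        return node["value"]
    if kind == "chance":
        return sum(p * expected_value(child) for p, child in node["branches"])
    if kind == "decision":
        return max(expected_value(child) - cost for _label, cost, child in node["options"])
    raise ValueError(f"unknown node type: {kind!r}")


def best_option(decision_node):
    """Label of the option with the highest expected value net of its cost."""
    return max(decision_node["options"], key=lambda o: expected_value(o[2]) - o[1])[0]


# Hypothetical choice: fund an outreach program (cost 40) or do nothing
tree = {
    "type": "decision",
    "options": [
        ("fund program", 40, {"type": "chance", "branches": [
            (0.6, {"type": "end", "value": 150}),  # program succeeds
            (0.4, {"type": "end", "value": 20}),   # program has little effect
        ]}),
        ("do nothing", 0, {"type": "end", "value": 50}),
    ],
}

print(best_option(tree), expected_value(tree))  # fund program, EV of about 58
```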
identifies patterns in the data and expresses them as if/then rules. Statistical significance tests are used on the data.
Rule Induction
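A very simple rule-induction algorithm is 1R ("one rule"): for each attribute, make one if/then rule per attribute value predicting that value's majority class, then keep the attribute whose rules make the fewest errors. The sketch assumes hypothetical categorical records; note that 1R itself applies no significance testing, whereas more elaborate rule inducers may use such tests to accept or prune rules.

```python
from collections import Counter, defaultdict


def one_r(rows, target):
    """1R rule induction: returns (attribute, {value: predicted class}, errors)."""
    best = None
    for attr in (a for a in rows[0] if a != target):
        counts = defaultdict(Counter)  # attribute value -> class counts
        for row in rows:
            counts[row[attr]][row[target]] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rules[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


# Hypothetical records
rows = [
    {"smoker": "yes", "age_group": "old", "risk": "high"},
    {"smoker": "yes", "age_group": "young", "risk": "high"},
    {"smoker": "no", "age_group": "old", "risk": "low"},
    {"smoker": "no", "age_group": "young", "risk": "low"},
    {"smoker": "yes", "age_group": "old", "risk": "high"},
    {"smoker": "no", "age_group": "old", "risk": "high"},
]

attr, rules, errors = one_r(rows, target="risk")
for value, cls in rules.items():
    print(f"IF {attr} = {value} THEN risk = {cls}")
# IF smoker = yes THEN risk = high
# IF smoker = no THEN risk = low
```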
a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.
Algorithm