What is data science?
What does it do?
What does it work with?
Demand?
Joins statistics and programming in applied settings
(1) Inclusive analysis
- Can work with a wide variety of data
Take unstructured data and find order and value in it
High demand for data science
- both specialists and generalists
Data science Venn Diagram?
[Hacking]:
- Gather and prepare data
- Creativity needed
[Math and Stats]:
- Choosing procedures to answer questions
- Diagnose problems
- Develop and improve procedures
[Substantive Expertise]:
- What does the field you work in value? goals?
- Constraints?
[Machine learning]:
- "Black box" predictive models
- Put things in, get value out
>> no understanding of what happened
[Traditional research]:
- Data set continuity and structure
[Danger Zone?]:
- Unlikely to happen
Describe the data science pipeline
[1. Planning]:
- Define goals
- Organize resources
- Coordinate people
- Schedule project
[2. Data Preparation]:
- Get data
- Clean data (make it fit well into program)
>> check for errors
- Explore data
- Refine data
>> choose cases to include, choose variables, create features
[3. Modeling]:
- Create model(s)
- Validate model(s)
- Evaluate model
>> accuracy
- Refine model
>> may make tweaks
[4. Follow up]:
- Present model
- Deploy model
- Revisit model
- Archive assets
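The data preparation and modeling stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are synthetic, the field names (hours, score) and helper names (clean, split, fit_line, mean_abs_error) are made up for the example, and mean absolute error stands in for the "accuracy" check because the toy model predicts a number rather than a class.

```python
import random
import statistics

# 2. Data preparation: get data (synthetic records, invented for this sketch)
random.seed(0)
raw = [{"hours": h, "score": 50 + 5 * h + random.gauss(0, 2)} for h in range(1, 21)]
raw.append({"hours": None, "score": 70.0})  # a deliberately broken record


def clean(records):
    """Check for errors: drop records with missing values."""
    return [r for r in records if r["hours"] is not None and r["score"] is not None]


def split(records, test_every=4):
    """Refine data: hold out every fourth record for validation."""
    train = [r for i, r in enumerate(records) if i % test_every != 0]
    test = [r for i, r in enumerate(records) if i % test_every == 0]
    return train, test


# 3. Modeling: create a model (one-variable least-squares line)
def fit_line(records):
    xs = [r["hours"] for r in records]
    ys = [r["score"] for r in records]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(model, records):
    """Evaluate model: average absolute prediction error on the given records."""
    slope, intercept = model
    errors = [abs(r["score"] - (slope * r["hours"] + intercept)) for r in records]
    return statistics.fmean(errors)


data = clean(raw)
train, test = split(data)
model = fit_line(train)          # create
error = mean_abs_error(model, test)  # validate on held-out records
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, held-out error={error:.2f}")
```

Planning and follow-up are mostly about people and communication, so they are left out of the sketch.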
=========================================
DS is not just technical
Contextual skills critical
Data science fosters diversity
Describe roles in data science
Engineer/Developer:
- Database administrators/developers
Big Data Specialist:
- Comp sci / mathematicians
Researcher:
- Statistical expertise
- Less back-end
Analysts:
- Web analytics
- SQL
- Data visualization
Business Exec:
- Manage projects
- Frame questions and solutions
- Speak data
Entrepreneur:
- Data-based startups
- Planning and execution
Full-stack data scientist:
- Can do all elements of data science
- Unicorns (so rare)
Describe data science teams
Recruit people based on needed skills
Can combine people so their skills are additive, especially when one is weak in areas where the other is strong; they can complement and learn from each other
Can create unicorns
Data science vs big data
Venn diagram?
BD w/o DS?
DS w/o BD?
BDS?
Both fields overlap as big data science
But are unique fields
[Big data w/o data science]:
- Machine learning
- Word counts
^^ No real math or stats
^^ Many V's but little coding, stats etc.
[Data science w/o big data]:
- Genetics data
- Streaming sensor data
- Facial recognition
^^ Each is missing one of the V's (volume, velocity or variety)
[Big data science]:
- All three V's
- Also need coding, statistics and domain expertise
=============
Any data with variety needs data science
But if only volume and velocity, may not be data science, just big data
Describe programming
Giving a computer instructions to accomplish tasks
Popular programming languages differ from software used for data science
Many shared languages though
Data science has statistics though, so requires different tools
Describe statistics
Data science requires statistics
- But is not a subset of statistics
Both use data
Different motivations and goals and backgrounds/contexts
(1) Most data scientists are not trained as statisticians
- Often computer science/engineering
(2) Techniques
- Machine learning
- Big data
(3) Environment
- Work in various settings
Ecologically distinct, though many statisticians can do data science
Describe ethical issues
Security:
- Confidentiality
Anonymity:
- Remove identifiers
Copyright:
- Getting online data is risky
>> may violate copyright
(1) Potential bias
- Algorithms only as neutral as data given
(2) Overconfidence
- Computers oversimplify
>> need humans in the loop too
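The "remove identifiers" step under Anonymity can be sketched as below. The field names and the IDENTIFIERS set are hypothetical; which fields count as identifiers depends on the data set, and dropping direct identifiers alone may not be enough, since the remaining fields can sometimes be combined to re-identify a person.

```python
# Hypothetical identifier fields, chosen only for this example
IDENTIFIERS = {"name", "email", "phone"}


def remove_identifiers(record, identifiers=IDENTIFIERS):
    """Return a copy of the record without the direct-identifier fields."""
    return {key: value for key, value in record.items() if key not in identifiers}


# Made-up records for illustration
records = [
    {"name": "A. Example", "email": "a@example.com", "age": 34, "visits": 3},
    {"name": "B. Example", "phone": "555-0100", "age": 51, "visits": 7},
]

anonymized = [remove_identifiers(r) for r in records]
print(anonymized)
```

The originals are left unchanged, so the identified version can be kept under tighter security while the stripped copy is used for analysis.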
Describe metrics
Data science is goal directed
- Goal is specific and clear