What can data do?
- Describe the current state of an organization or process
- Detect anomalous events
- Diagnose the causes of events and behaviors
- Predict future events
Why is data science popular?
We are collecting more data than ever before
What are the 4 steps of the data workflow?
- Data collection & Storage
- Data prep
- Exploration and visualization
- Experimentation and prediction
What do we need for machine learning?
- A well-defined question
- A set of example data
- A new set of data to apply our algorithm to
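The three ingredients above can be sketched in a few lines. This is only an illustration, not a real machine learning library: the question, the example data, and the helper names `train` and `predict` are all hypothetical.

```python
# Question (hypothetical): is a purchase amount "small" or "large"?

# Example data: labeled purchase amounts (made up for illustration).
examples = [(5.0, "small"), (8.0, "small"), (95.0, "large"), (120.0, "large")]


def train(labeled):
    """Learn the average amount for each label."""
    totals = {}
    for amount, label in labeled:
        total, count = totals.get(label, (0.0, 0))
        totals[label] = (total + amount, count + 1)
    return {label: total / count for label, (total, count) in totals.items()}


def predict(model, amount):
    """Pick the label whose average amount is closest to the new value."""
    return min(model, key=lambda label: abs(model[label] - amount))


model = train(examples)

# New data the algorithm has not seen before.
new_amounts = [7.5, 110.0]
predictions = [predict(model, amount) for amount in new_amounts]
print(predictions)
```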
What is the Internet of Things (IoT)?
Refers to internet-connected gadgets that aren't standard computers
Ex. smart watches, internet-connected home security systems
Describe deep learning
- Many neurons working together
- Requires much more training data
- Used to solve data-intensive problems (ex. image classification or language understanding)
Text summarization and image classification are generally what type of problem?
Deep learning
Data engineers focus on what stage of the data workflow?
Data collection & storage
Data engineers do what?
- Build data pipelines and storage solutions
- Maintain data access
Data analysts focus on what stage of the data workflow?
Data prep and exploration/visualization
Data analysts do what?
- Perform simpler analyses that describe data
- Create reports and dashboards to summarize data
- Clean data for analysis
What tools do data engineers use?
SQL to store and organize data
Java, Scala, or Python as programming languages to process data
Shell on the command line to automate and run tasks
What tools do data analysts use?
SQL to retrieve and aggregate data
Spreadsheets to perform simple analyses
BI tools to build dashboards and visualizations
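As a rough illustration of "retrieve and aggregate", the sketch below runs a SQL query from Python using the standard-library `sqlite3` module. The `orders` table and its values are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 10.0), ("north", 15.0), ("south", 20.0)],
)

# Retrieve and aggregate: total order amount per region.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(rows)
```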
Describe data scientists
- Versed in statistical methods
- Run experiments and analyses for insights
- Use traditional machine learning
Data scientists focus on what part of the data workflow?
Data prep, exploration/visualization, and experimentation/prediction
What tools do data scientists use?
SQL
Python and/or R
Machine learning scientists are similar to data scientists except they have
A machine learning specialization
What do machine learning scientists do?
- Predictions and extrapolations
- Classification
- Deep learning
Machine learning scientists focus on what part of the data workflow?
Data prep and exploration/visualization, with a strong focus on experimentation/prediction
What tools do machine learning scientists use?
Python and/or R
Spreadsheet tools are important for data analysts because...
They allow analysts to share their results with less data-savvy coworkers
Data engineers need to know SQL and Java
T or F
T
Data engineers need to know Python for prediction and modeling
T or F
F. They use Python for data cleaning.
Database tasks are best accomplished by who?
Data engineers
Data analysts specialize in visualization
T or F
T
A data scientist can perform what type of modeling?
Predictive
What are sources of data?
- Company Data
- Open Data
Describe Company Data
- Collected by companies
- Helps make data-driven decisions
Describe Open Data
- Free, open data sources
- Can be used, shared, and built on by anyone
Name sources of Company Data
- Web events
- Survey Data
- Customer Data
- Logistics Data
- Financial Transactions
What is captured in web data?
- Events
- Timestamps
- User info
Name 2 sources of Open Data
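One way to picture a captured web event is as a small record with those three pieces. The class name `WebEvent`, its field names, and the sample values below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class WebEvent:
    event: str           # what happened, e.g. a click or page view
    timestamp: datetime  # when it happened
    user_id: str         # who did it (hypothetical identifier)


click = WebEvent(
    event="click_signup_button",
    timestamp=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    user_id="user_123",
)
print(click)
```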
- Data APIs
- Public Records
What does API stand for and what does it do?
Application Programming Interface
- Lets you request data over the internet
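Requesting data from an API usually means building a URL with query parameters and decoding the response, which is often JSON. The sketch below assumes a made-up endpoint (`api.example.com`) and a made-up response body so it runs offline. The helper names `build_request_url` and `parse_response` are hypothetical.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.com/v1/records"


def build_request_url(base_url, params):
    """Attach query parameters to the endpoint URL."""
    return f"{base_url}?{urlencode(params)}"


def parse_response(body):
    """Decode a JSON response body into Python objects."""
    return json.loads(body)


url = build_request_url(BASE_URL, {"city": "Boston", "year": 2020})

# A real call would fetch `url` (for example with urllib.request.urlopen).
# A made-up response body stands in here so the sketch runs offline.
sample_body = '{"results": [{"city": "Boston", "year": 2020, "count": 3}]}'
data = parse_response(sample_body)
print(url)
print(data["results"])
```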
Why do we care about data types?
- Storing data
- Visualization/storytelling
Explain quantitative and qualitative data
Quan: Deals with numbers and can be measured
Qual: Deals with descriptions; can be observed but not measured
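A rough way to see the difference in code is to label each field of a record by whether its value is numeric. This is a simplification: the function `data_kind` and the sample record are hypothetical, and a number used as a label (such as an ID or zip code) would still be qualitative in practice.

```python
def data_kind(value):
    """Label a value as quantitative (numeric) or qualitative (descriptive)."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "quantitative"
    return "qualitative"


# Made-up record mixing measured values and descriptions.
record = {"height_cm": 172.5, "visits": 4, "eye_color": "brown", "mood": "calm"}
kinds = {field: data_kind(value) for field, value in record.items()}
print(kinds)
```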
What to consider when storing data?
- Location
- Data type
- Retrieval
Name the types of data storage, with examples and where each is typically stored
- Unstructured
Ex. email, text, social media
Stored in a document database
- Tabular
Ex. tables
Stored in a relational database
What query languages do document and relational databases use?
- Doc: NoSQL
- Rel: SQL
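The contrast can be sketched in Python. The relational half uses the standard-library `sqlite3` module with a hypothetical `customers` table. The document half uses plain dictionaries as stand-ins for documents, since real document databases each have their own NoSQL query interface that is not shown here.

```python
import sqlite3

# Relational: tabular rows with a fixed set of columns, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ana", "Lima"), ("Ben", "Oslo")],
)
sql_names = [
    row[0]
    for row in conn.execute("SELECT name FROM customers WHERE city = ?", ("Oslo",))
]
conn.close()

# Document: loosely structured records whose fields can differ per document.
# Plain dicts stand in for documents in this sketch.
documents = [
    {"from": "ana@example.com", "subject": "Hello", "tags": ["personal"]},
    {"from": "ben@example.com", "subject": "Invoice", "attachments": 2},
]
doc_subjects = [doc["subject"] for doc in documents if "attachments" in doc]

print(sql_names, doc_subjects)
```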