In data science, what is meant by a "unicorn"?
A full-stack data scientist who can do everything at the professional level
In a data science team, who is the person who generally frames the business-relevant questions and solutions?
The manager
The second group of steps in the data science pathway is known as "wrangling." What does wrangling mean in this context?
Getting data, cleaning data, exploring data, and refining data
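The four wrangling steps can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a prescribed workflow: the CSV text, its columns, and the function names (`get_data`, `clean_data`, `explore_data`, `refine_data`) are hypothetical. Real projects often use packages such as pandas for the same work.

```python
import csv
import io
import statistics

# Hypothetical raw export standing in for "getting data"; in practice this
# might come from a file, a database, or an API.
RAW = """name,age,city
 Ana ,34,Lima
Ben,,Oslo
Cara,29, oslo
Dev,41,Lima
"""

def get_data(text):
    """Getting: read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_data(rows):
    """Cleaning: trim whitespace, normalize case, drop rows missing an age."""
    cleaned = []
    for row in rows:
        if not row["age"].strip():
            continue  # drop incomplete records
        cleaned.append({
            "name": row["name"].strip(),
            "age": int(row["age"]),
            "city": row["city"].strip().title(),
        })
    return cleaned

def explore_data(rows):
    """Exploring: a quick numeric summary of the cleaned data."""
    ages = [r["age"] for r in rows]
    return {"n": len(rows), "mean_age": statistics.mean(ages)}

def refine_data(rows):
    """Refining: derive a new variable for later analysis."""
    return [dict(r, over_30=r["age"] > 30) for r in rows]

if __name__ == "__main__":
    rows = clean_data(get_data(RAW))
    print(explore_data(rows))
    print(refine_data(rows))
```

Keeping each step in its own function mirrors the idea that wrangling is a sequence of distinct activities, and makes it easy to revisit one step (for example, cleaning) without redoing the others.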
The first step in the data science pathway is "define goals." Why is this the best place to start a data science project?
Clarifying your project's goals up front helps at every step of the project pathway: framing questions, choosing data and algorithms, and interpreting and applying your results
Why is substantive/domain expertise important for data scientists?
It helps them know what constitutes value in their field and how to implement their insights
What is one of the rare qualities that creates such a high demand for data scientists?
The ability to find order, meaning, and value in unstructured data
Data science methods can contribute to business intelligence by which tasks?
Finding trends and anomalies in the data, collecting and cleaning data, and modeling outcomes (all of these answers)
According to the video, data visualization can be considered an example of what?
Data science without big data
The European Union's General Data Protection Regulation (GDPR) impacts the use of neural networks in what way?
The GDPR includes a "right to explanation" that may be difficult to meet given the frequently opaque functioning of neural networks
According to the video, when processes are conducted by computers and shared directly with other machines, as with the Internet-of-Things, then the decisions can be described in what way?
Machine-centric
True or False? Data science companies can be fined for violating the European Union's General Data Protection Regulation (GDPR) even if they complied with their own country's privacy laws.
True
Self-generated data refers to what practice?
Programming computers to engage with themselves to create their own training data
While it is possible to gather vast amounts of data through passive collection, researchers still need to be concerned about representativeness. Why does this matter?
Without representative data from a wide range of respondents in diverse situations, the results will not generalize well
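A small, deterministic Python sketch can make the representativeness problem concrete. The population, its two groups, and the commute times are entirely hypothetical; the point is only that a large sample drawn mostly from one group can estimate the overall average worse than a smaller sample that keeps the groups in proportion.

```python
import statistics

# Hypothetical population: daily commute minutes for two groups.
urban = [20, 25, 30, 22, 28] * 200   # 1,000 urban residents (mean 25)
rural = [45, 50, 55, 60, 40] * 100   # 500 rural residents (mean 50)
population = urban + rural

# Passively collected data that happens to reach mostly urban people,
# e.g. from an app used mainly in cities (an illustrative assumption).
passive_sample = urban[:300] + rural[:10]

# A smaller sample that keeps the 2:1 urban-to-rural ratio of the population.
representative_sample = urban[:100] + rural[:50]

if __name__ == "__main__":
    print("population mean:     ", statistics.mean(population))
    print("passive sample mean: ", statistics.mean(passive_sample))
    print("representative mean: ", statistics.mean(representative_sample))
```

The passive sample has twice as many records as the representative one, yet its mean lands near the urban value rather than the population value, which is why sheer volume of data does not guarantee results that generalize.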
What is the principle of informed consent in research?
Potential research participants have to be given enough information about the goals, methods, and application of the research project so they can decide whether they want to participate
True or False? If a table or chart is publicly available, then it is also ethical to scrape the data for use in your data science project.
False
What practice does "data scraping" refer to?
Data scraping refers to the process of extracting data from formats that were not specifically designed for data sharing
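As an illustration of extracting data from a format meant for reading rather than sharing, here is a minimal Python sketch that pulls an HTML table into rows using the standard library's `html.parser`. The page fragment, the `TableScraper` class, and `scrape_table` are hypothetical; dedicated libraries such as Beautiful Soup are common in practice, and, per the previous question, scraping a public page is not automatically ethical, so check the site's terms and the data's intended use first.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the table is laid out for people to read,
# not offered as a downloadable dataset.
PAGE = """
<html><body>
  <h1>Quarterly sales</h1>
  <table>
    <tr><th>Quarter</th><th>Sales</th></tr>
    <tr><td>Q1</td><td>120</td></tr>
    <tr><td>Q2</td><td>135</td></tr>
  </table>
</body></html>
"""

class TableScraper(HTMLParser):
    """Collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()

def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]

if __name__ == "__main__":
    print(scrape_table(PAGE))
```

Note that every scraped value is text; turning "120" into a number is part of the cleaning that usually follows scraping.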
What kind of data can be accessed with APIs?
Both proprietary and open data
True or False? If data is available on the Internet, then it must be open data
False
What are the defining characteristics of open data?
Data that is free to use with no cost and no restrictions
What is potentially a major advantage of using in-house data?
You may be able to talk with the people who created the datasets
Which is a key characteristic of "tidy data"?
Each column represents a variable
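A short Python sketch of reshaping a "wide" table into tidy form, where each column is one variable and each row is one observation. The country codes, years, and counts are hypothetical, and `to_tidy` is an illustrative helper; in practice pandas' `DataFrame.melt` or R's `tidyr::pivot_longer` do the same job.

```python
# Hypothetical wide table: the variable "year" is hidden in the column
# headers, so the table is not tidy.
wide = [
    {"country": "A", "2020": 10, "2021": 12},
    {"country": "B", "2020": 7, "2021": 9},
]

def to_tidy(rows, id_col, var_name, value_name):
    """Reshape wide rows so each column holds one variable and each row
    holds one observation (one country in one year)."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue
            # The old column header becomes a value of the new variable.
            tidy.append({id_col: row[id_col], var_name: key, value_name: value})
    return tidy

tidy = to_tidy(wide, "country", "year", "cases")

if __name__ == "__main__":
    for record in tidy:
        print(record)
```

After reshaping, "year" is an ordinary column that can be filtered, grouped, or plotted like any other variable.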
What is the purpose of a "package" in a programming language like Python or R?
Packages are collections of code that give additional functionality to programming languages and simplify many common tasks
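To show how ready-made code simplifies a common task, this Python sketch compares a hand-written median with a single call to the standard library's `statistics` module. The standard library is used here only as a stand-in for add-on packages (such as pandas in Python, or packages installed with `install.packages()` in R); the data values and `median_by_hand` are hypothetical.

```python
import statistics

data = [3, 1, 4, 1, 5, 9, 2, 6]

def median_by_hand(values):
    """What you would write yourself without library support."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]  # odd count: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

if __name__ == "__main__":
    # The library call replaces the whole function above.
    print(median_by_hand(data), statistics.median(data))
```

Beyond saving typing, well-used library code has already been tested against edge cases (such as even-length lists) that hand-written versions can get wrong.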
Statistical applications like SPSS or jamovi are useful to data projects in what way?
Their point-and-click interfaces make common analyses easier for non-specialists to conduct
According to the video, why are spreadsheets so important to data science?
They are the "universal data container"
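One practical reason spreadsheets work as a common container is that nearly every tool can read and write plain CSV, which spreadsheet programs open directly. This Python sketch round-trips hypothetical analysis results through CSV text with the standard `csv` module; the column names, values, and the `to_csv`/`from_csv` helpers are illustrative assumptions.

```python
import csv
import io

# Hypothetical results from an analysis.
results = [
    {"region": "North", "customers": 120, "churn_rate": 0.05},
    {"region": "South", "customers": 95, "churn_rate": 0.08},
]

def to_csv(rows):
    """Write rows as CSV text that spreadsheet programs can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def from_csv(text):
    """Read CSV text back; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))

text = to_csv(results)

if __name__ == "__main__":
    print(text)
    print(from_csv(text))
```

The round trip also shows a limitation of the format: CSV stores no types, so numbers must be converted again after reading.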