True or false: sources of data are becoming larger and more diverse. - True. There are billions or even trillions of data sources.
What is the goal of data processing? - To extract data that is useful
Why is the volume of data that is available so large? - Increasing number of data sources (social media, wearable tech, sensors, cameras, etc.), formats, and data points
How much data is possibly generated in a day? - A petabyte (1 million GB)
What is scalable data processing? - Allows database processing systems to cope with the volume, velocity, and variety aspects that big data brings into the system
What are the different types of data processing systems? - Relational DBMS, NoSQL stores (graph, document, key-value), and Hadoop/Spark
What are the characteristics of a Relational DBMS? - Operational workload, presents entities and objects in the world using tables and relations between tables
What are the characteristics of NoSQL graph, document, and key-value stores? - Unstructured data, highly available systems. They run queries that extract knowledge from the data.
What are the characteristics of Hadoop/Spark? - Not operational, for analytics over massive scale data
What is a database? - A very large, integrated collection of data that models real world enterprises using entities and relationships between those entities
What are the three goals of a DBMS? - To store, retrieve, and manage data
What are the benefits of using a DBMS? - Data independence (don't need to know how data is organized), efficient data access (indexes), data administration (one location where data is stored), concurrent access + crash recovery, data integrity + security, reduced app development time (don't have to worry about scalability of database or database performance)
What is a data model? - A collection of concepts for describing data. In the relational model, the central concept is the relation, which has a precise mathematical definition.
What is a schema? - A description of a particular collection of data using a given data model. In the relational model, the schema describes the data as tables.
What are the three levels of abstraction? - External Schema (Views), Conceptual Schema, and Physical Schema
What is the external schema (views)? - Describes how users see the data. There can be multiple views on top of one conceptual schema.
What is the conceptual schema? - Defines the logical structure of the relations: the relations and attributes that make up the database. There is only one conceptual schema per database.
What is the physical schema? - Describes the files and indexes used. How the data is sorted, stored, and indexed.
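The three levels can be sketched with Python's built-in sqlite3 module. This is only an illustration: the Students table, the StudentNames view, and the index name are assumptions made up for the example.

```python
import sqlite3

# Hypothetical example; table, view, and index names are assumptions.
conn = sqlite3.connect(":memory:")

# Conceptual schema: the logical structure of the relation.
conn.execute("CREATE TABLE Students (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL)")

# Physical schema: an index changes how data is accessed, not what it means.
conn.execute("CREATE INDEX idx_students_gpa ON Students (gpa)")

# External schema: a view that exposes only part of the data to some users.
conn.execute("CREATE VIEW StudentNames AS SELECT sid, name FROM Students")

conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [(1, "Ana", 3.8), (2, "Ben", 3.1)],
)

# A user of the view never sees the gpa column.
view_rows = conn.execute("SELECT sid, name FROM StudentNames ORDER BY sid").fetchall()
print(view_rows)
```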
What is logical data independence? - Protection from changes in logical structure.
What is physical data independence? - Protection from changes in the physical structure of data.
Why are logical and physical data independence so beneficial? - The application that is accessing the database is not affected by changes in the logical structure or physical structure in the database thanks to the DBMS
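A minimal sketch of both kinds of independence, assuming the application reads only through a view. The names Emp, EmpName, EmpDept, and EmpInfo are hypothetical. The same application query keeps returning the same rows after the table is split (a logical change) and after an index is added (a physical change).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def app_query(c):
    # The application only ever reads from the view EmpInfo (hypothetical name).
    return c.execute("SELECT ssn, name, dept FROM EmpInfo ORDER BY ssn").fetchall()

# Original logical structure: one table, exposed through a view.
conn.execute("CREATE TABLE Emp (ssn TEXT PRIMARY KEY, name TEXT, dept TEXT)")
conn.execute("INSERT INTO Emp VALUES ('111', 'Ana', 'Sales')")
conn.execute("CREATE VIEW EmpInfo AS SELECT ssn, name, dept FROM Emp")
before = app_query(conn)

# Logical change: split Emp into two tables and redefine the view as a join.
conn.execute("DROP VIEW EmpInfo")
conn.execute("CREATE TABLE EmpName (ssn TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE EmpDept (ssn TEXT PRIMARY KEY, dept TEXT)")
conn.execute("INSERT INTO EmpName SELECT ssn, name FROM Emp")
conn.execute("INSERT INTO EmpDept SELECT ssn, dept FROM Emp")
conn.execute("DROP TABLE Emp")
conn.execute(
    "CREATE VIEW EmpInfo AS "
    "SELECT n.ssn, n.name, d.dept FROM EmpName n JOIN EmpDept d ON n.ssn = d.ssn"
)
after_split = app_query(conn)

# Physical change: adding an index does not change query results either.
conn.execute("CREATE INDEX idx_empdept_dept ON EmpDept (dept)")
after_index = app_query(conn)

print(before == after_split == after_index)
```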
What are the three phases of database design? - 1: Requirements analysis (what users expect)
2: Conceptual database design (build an entity-relationship (ER) diagram)
3: Logical database design (convert the ER design into a relational database schema)
What questions should be answered when defining the ER diagram? - What are the entities + relationships in the enterprise? What info about these entities and relationships should we store? What are the integrity constraints or business rules that hold?
What are entities? - Real world objects. Described using a set of attributes. Attributes are shown as ovals in an ER diagram.
What is an entity set? - Collection of similar entities. Each entity set has a key. Each attribute has a domain. All entities in a set have the same set of attributes. Shown as a rectangle in an ER diagram.
What is a relationship? - An association between entities that is uniquely identified by its entities. Shown as a diamond on an ER diagram.
What is a relationship set? - A collection of similar relationships.
What is a key constraint? - A constraint on a relationship set that limits how many relationships an entity can participate in: each entity appears in at most one relationship. Shown with an arrow from the constrained entity set to the relationship in an ER diagram (underlining marks key attributes, not key constraints). Depending on which sides carry a key constraint, a relationship is one to one, one to many, or (with no key constraint) many to many.
What is a participation constraint? - The entity must participate at least once in a relationship. This is called total participation. Partial participation is the default. For example, an employee must work in a department. Represented as a bold line in an ER diagram.
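One common way to express these two constraints in a relational schema, sketched with sqlite3. The table and column names are assumptions: a single did column per employee gives "at most one department", and NOT NULL gives total participation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT)")
# Key constraint: one did column per employee row means at most one department.
# Total participation: NOT NULL means every employee has a department.
conn.execute(
    "CREATE TABLE Employees ("
    " ssn TEXT PRIMARY KEY,"
    " name TEXT,"
    " did INTEGER NOT NULL REFERENCES Departments(did))"
)

conn.execute("INSERT INTO Departments VALUES (1, 'Sales')")
conn.execute("INSERT INTO Employees VALUES ('111', 'Ana', 1)")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Employees VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

no_department = try_insert(("222", "Ben", None))     # violates NOT NULL
unknown_department = try_insert(("333", "Cy", 99))   # violates the foreign key
valid = try_insert(("444", "Di", 1))                 # satisfies both constraints
print(no_department, unknown_department, valid)
```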
What is a weak entity? - An entity that can be identified uniquely only by considering the primary key of an owner entity together with its own partial key. The owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities). The weak entity set must have total participation in this identifying relationship set. Shown in an ER diagram with a bold outline and a bold arrow whose tail is at the weak entity and whose head points to the identifying relationship. Example: insurance policy dependents, where each dependent belongs to one employee and is keyed by the employee's SSN and the dependent's name.
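A sketch of the dependents example as tables, assuming one employee can own several dependents. Table and column names are hypothetical. The weak entity's key combines the owner's ssn with the dependent's name, and deleting the owner removes its dependents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite

conn.execute("CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT)")
# Weak entity set: identified by its partial key (pname) plus the owner's key (ssn).
conn.execute(
    "CREATE TABLE Dependents ("
    " pname TEXT,"
    " ssn TEXT NOT NULL,"
    " PRIMARY KEY (pname, ssn),"
    " FOREIGN KEY (ssn) REFERENCES Employees(ssn) ON DELETE CASCADE)"
)

conn.execute("INSERT INTO Employees VALUES ('111', 'Ana')")
conn.execute("INSERT INTO Employees VALUES ('222', 'Ben')")
# The same dependent name may appear under different owners.
conn.executemany(
    "INSERT INTO Dependents VALUES (?, ?)",
    [("Sam", "111"), ("Joe", "111"), ("Sam", "222")],
)

count_before = conn.execute("SELECT COUNT(*) FROM Dependents").fetchone()[0]

# Deleting the owner deletes the weak entities that depend on it.
conn.execute("DELETE FROM Employees WHERE ssn = '111'")
remaining = conn.execute("SELECT pname, ssn FROM Dependents").fetchall()
print(count_before, remaining)
```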
What are class hierarchies? - Class inheritance and generalization relationships between entities. Subclasses inherit attributes from the superclass, and can also have unique attributes. Represented by a triangle in an ER diagram.
What is a covering constraint? - Determines whether the entities in the subclasses collectively include all superclass entities.
What is aggregation? - A relationship involving entity sets and a relationship set that allows us to treat a relationship set as an entity set for the purposes of participation in other relationships. Not used very often in database design.
What is a ternary relationship? - A relationship set that involves three entity sets. The third entity set adds information and another identifying component to each relationship. Example: an employee works in a department, with duration as the third entity set.
What is a relation made up of? - An instance (a table with rows and columns) and a schema (the name of the relation, name and type of each column)
What is the cardinality of a relation? - The number of rows in the table.
What is the degree of a relation? - The number of fields in the relation.
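As a quick illustration, with a made-up relation held in plain Python structures, cardinality counts the rows of the instance and degree counts the fields of the schema:

```python
# Hypothetical relation: the schema lists the fields, the instance lists the rows.
schema = ("sid", "name", "gpa")
instance = [
    (1, "Ana", 3.8),
    (2, "Ben", 3.1),
    (3, "Cy", 3.5),
    (4, "Di", 2.9),
]

cardinality = len(instance)  # number of rows
degree = len(schema)         # number of fields
print(cardinality, degree)   # 4 3
```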
What is an integrity constraint? - A condition that must be true for any instance of the database. It is specified when the schema is defined, checked when relations are modified. A legal instance of a relation meets all integrity constraints. DBMS should not allow illegal instances.
What is a primary key constraint? - No two distinct tuples can have the same values in all key fields. If there is more than one key for a relation, one of the keys is chosen by the DB admin to be the primary key. A key can also be a combination of more than one field.
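A small sqlite3 sketch of a primary key made of two fields. The Enrolled table and its columns are assumptions for the example. The DBMS rejects the insert that would create an illegal instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite primary key: no two rows may agree on both sid and cid.
conn.execute(
    "CREATE TABLE Enrolled (sid INTEGER, cid TEXT, grade TEXT, PRIMARY KEY (sid, cid))"
)
conn.execute("INSERT INTO Enrolled VALUES (1, 'DB101', 'A')")
conn.execute("INSERT INTO Enrolled VALUES (1, 'OS201', 'B')")  # same sid, different cid: legal

try:
    conn.execute("INSERT INTO Enrolled VALUES (1, 'DB101', 'C')")  # duplicate key
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]
print(duplicate_rejected, row_count)
```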
What is a super key? - A set of attributes within a table whose values can be used to uniquely identify a tuple
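A rough way to test the idea on sample data, using a hypothetical helper is_superkey and made-up rows. Note the limitation: a check on one instance can only rule a candidate out. Whether a set of attributes really is a super key is a property of the schema, not of one table's current contents.

```python
def is_superkey(rows, attrs):
    """Return True if no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False  # two rows share these values, so attrs cannot identify a tuple
        seen.add(key)
    return True

# Hypothetical instance of a Students relation.
rows = [
    {"sid": 1, "name": "Ana", "gpa": 3.8},
    {"sid": 2, "name": "Ana", "gpa": 3.1},
    {"sid": 3, "name": "Ben", "gpa": 3.8},
]

sid_ok = is_superkey(rows, ["sid"])               # unique on its own
sid_name_ok = is_superkey(rows, ["sid", "name"])  # any superset of a key is still a super key
name_ok = is_superkey(rows, ["name"])             # "Ana" appears twice
print(sid_ok, sid_name_ok, name_ok)
```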