What is Snowflake? (key features)
Analytic data warehouse
SaaS offering
- No hardware/installation/patching
- No ongoing maintenance/tuning
- Can't
... [Show More] run privately (on-prem/hosted)
Runs completely in the Cloud (AWS, Azure, GCP)
- Has its own VPC
Decoupled compute and storage (scaled compute does not need scaled storage)
Snowflake Pricing (what determines unit cost, pricing model, editions features)
Unit costs for Credits and data storage determined by Region (not Cloud platform)
Pricing model = on-demand or discounted upfront
Editions:
- Enterprise (or greater) => 90 days time-travel (default is 1 day) + materialized view + multi-cluster warehouse
- Business Critical => more security (HIPAA, SOC 1&2, PCI DSS)
- VPS Edition => own cloud service layer (not shared with accounts)
Snowflake Regions - Is multi-region account supported?
No multi-region accounts supported
- Each account can only be in one region
Multiple regions supported for each cloud provider
- AWS: 11 regions (US, EU, Asia-Pacific)
- Azure: 8 regions (US, EU, Asia-Pacific)
- GCP: 3 regions (US, EU)
Options to Connect to Snowflake
Web-based UI
Command-Line client (SnowSQL)
ODBC, JDBC (must download driver)
Native connectors (Python, Spark, Kafka)
Third-Party connectors (e.g., Matillion)
Other (Node.js, .Net)
Connect URL Pattern
https://.[.provider].snowflakecomputing.com
example for AWS:
https://pp12345.ap-southeast-2.snowflakecomputing.com
Snowflake Architecture Overview
Hybrid of shared-disk db and shared-nothing db
- Uses central data repo for persisted storage - all compute nodes have data access
- Uses MPP clusters to process queries - each node stores a portion of the data locally
Snowflake Architecture - 3 main Layers
Storage Layer
Query Processing
Cloud Services
Snowflake Architecture - Storage Layer
All data stored as internal, optimized, compressed columnar format (micro-partitions represents logical structure of table)
Can't access data directly, only through Snowflake
Snowflake Architecture - Query Processing
Where processing of query happens
Uses virtual warehouses - MPP compute cluster
- Each cluster is independent (doesn't share resources, with, doesn't impact other Virtual WHs)
Snowflake Architecture - Cloud Services
Command center of Snowflake
- Coordinates and ties all activities together within Snowflake
Gets provisioned by Snowflake and within cloud provider (AWS, Azure, GCP)
- Customer can't access directly or modify it
- Shared with other customers unless using VPS Edition
What does the Cloud Services Layer handle?
Authentication
Infrastructure Management
Metadata Management
Query parsing and optimization
Access control
What are caches used for?
Improve query performance and reduce cost
Types of Caches
Metadata Cache
- Improves compile time for queries against commonly used tables
Result Cache - Cloud Services Layer
- Holds query results
- Used when customer runs exact same query within 24 hours (does not require warehouse to be active)
Local Disk Cache or Warehouse Cache - Storage Layer
- Caches data used by SQL query in its local SSD and memory
- Improves query performance if same data was used (less time to fetch remotely)
- Cache is deleted when warehouse is suspended
Data Loading - File Locations Supported
Local
Cloud
- AWS (directly from S3 to Snowflake)
- Azure (from Blob Storage)
- GCP (from GCS)
Data Loading - File Types Supported
Structured
- Delimited files (CSV, TSV)
Semi-Structured
- JSON
- ORC
- Avro
- Parquet (can auto-detect if Snappy compressed) [Show Less]