Aug. 10, 2021
4 – 8:45 p.m. Coordinated Universal Time
This event has ended.
Storing, processing, and moving data in the cloud efficiently and cost-effectively is a must for working with today’s enormous datasets. Data lakes answered the problem of silos found in many data warehouses. But as the pendulum swings back, there’s a growing need for a solution that combines the strengths of both models, a need that’s led to the emergence of the data lakehouse. With so many data storage systems available, it can be hard to figure out which option is right for you.
In this event, you’ll gain insights on how to increase the scalability, speed, and availability of your data, along with best practices for utilizing your data warehouse, data lake, or data lakehouse. Join in to learn how to make the right decisions for your particular use case.
About the Strata Data Superstream Series: This four-part series of half-day online events gives you an overarching perspective on key topics that will help your organization maximize the business impact of its data.
What you’ll learn and how you can apply it
- Get an overview of the latest technologies for storing and managing your data
- Find out how Databricks is empowering data professionals with the open data lakehouse
- Learn how to design scalable, performant, and secure data lakes
- Discover the full analytic capabilities of a cloud data warehouse
- Find out how the lakehouse model can improve your data storage and analytics
- Learn how TigerGraph is delivering smarter AI with graph databases, data lakes, and data warehouses
This live event is for you because…
- You need to know the latest trends in storing, processing, and managing data.
- You want to improve the scalability, speed, and availability of your data.
- You work with a variety of data sources that need to be pulled together and analyzed.
- You want to better understand the systems that you already use and learn how to take full advantage of their capabilities.
The timeframes are only estimates and may vary according to how the event is progressing.
Alistair Croll: Introduction (5 minutes) – 9:00am PT | 12:00pm ET | 4:00pm UTC/GMT
- Alistair Croll welcomes you to the Strata Data Superstream.
Chris Messina: Keynote—Lakes, Streams, Tags, and Emergence (20 minutes) – 9:05am PT | 12:05pm ET | 4:05pm UTC/GMT
- Humans build tools to fix their problems. As we’re awash in data, we’re changing how we think about information. The roots of today’s tech platforms go back to early organisms—IRC, Usenet—but regardless of how much we’ve branched out, the underlying patterns are the same. We’re constantly adding context, updating, and collaborating, and as we do so, fixed knowledge is replaced by a kind of strategic forgetting. In his keynote address, Chris Messina talks about why the hashtag is a tiny facet of a massive upheaval in how humans relate to information. From the end of hierarchies to the emergent properties of information platforms to thinking of data lakes as huge state machines, Chris will challenge you to see the data platforms you’re building in a broad new light.
- Chris Messina is best known for inventing the hashtag back in 2007, but he’s been living on—and defining—the future of social media and social technology ever since. He’s served on both developer platform and product design teams at Google and Uber, consulted for startups, and cofounded a Y Combinator-backed conversational AI company.
Michael Armbrust: Keynote—Emergence of Open Data Lakehouse Architecture for Analytics and ML (Sponsored by Databricks) (10 minutes) – 9:25am PT | 12:25pm ET | 4:25pm UTC/GMT
- In his keynote address, Michael Armbrust discusses the emergence of the open data lakehouse, an architecture built on top of data lakes that natively supports data warehousing and machine learning. You’ll learn about lakehouse architectural best practices using Delta Lake, an open source project that enables you to build a lakehouse architecture on top of existing cloud storage systems.
- Michael Armbrust is a Distinguished Engineer at Databricks, a committer and PMC member of Apache Spark, and the original creator of Spark SQL and Delta Lake. He leads the team at Databricks that designed and built Structured Streaming and Delta Lake. He holds a PhD from UC Berkeley, where he was advised by Michael Franklin, David Patterson, and Armando Fox. His thesis focused on building systems that allow developers to rapidly build scalable interactive applications and specifically defined the notion of scale independence. His interests broadly include distributed systems, large-scale structured storage, and query optimization.
- This session will be followed by a 30-minute Q&A in a breakout room. Stop by if you have more questions for Michael.
Rukmani Gopalan: Designing Scalable, Performant, and Secure Data Lakes for Your Enterprise (30 minutes) – 9:35am PT | 12:35pm ET | 4:35pm UTC/GMT
- With the rapid proliferation of data all around us, enterprises continue to rely on large volumes of data to gain critical insights to help inform and transform their business. A robust, scalable data lake strategy and infrastructure are key to achieving these valuable insights. Rukmani Gopalan takes you through the dos and don’ts of building enterprise data lakes. You’ll explore commonly used patterns, how to design for scale and performance, and various options to organize and secure your data lakes, providing the necessary framework to set you up for success.
- Rukmani Gopalan is a principal program manager on the Azure Data Lake Storage team at Microsoft, where she helps customers build their big data analytics solutions on Azure. Rukmani is an experienced product manager with deep technical skills and over 15 years of experience shipping a variety of products ranging from mobile and desktop client applications to large-scale enterprise-grade services and platforms.
Joyce Avila: Analyzing Data at Scale with a Cloud Data Warehouse (40 minutes) – 10:05am PT | 1:05pm ET | 5:05pm UTC/GMT
- Today’s organizations need to analyze massive-scale data and continuous and real-time data, including structured and semistructured data from multiple sources. Unlike traditional databases, cloud data warehouses can address these challenges through their ability to handle both current and historical data and their suitability for ad hoc and exploratory analysis (as well as custom reporting). Join Joyce Avila to learn how to effectively use a cloud data warehouse to analyze data at scale.
- Joyce Avila is a principal consultant for SpringML. She’s worked in software development, architecting and building custom software applications for a major financial institution, and has also conducted research at the UTA Robotics Institute. Joyce is a Snowflake Data Super Hero and produces a running series of YouTube videos to help viewers prepare for the Snowflake SnowPro Certification. She’s also a Certified Public Accountant in Texas. Joyce is currently authoring Snowflake: The Definitive Guide for O’Reilly.
- Break (5 minutes)
Harshida Patel: Utilizing the Lake House for Data Storage and Analytics (30 minutes) – 10:50am PT | 1:50pm ET | 5:50pm UTC/GMT
- As more companies adopt technical and strategic plans to be analytics driven, there’s a need to increase access to data and insights across the entire organization. Bringing together relevant data of all structures and types and from all sources fosters improved analytics and richer insights. In order to accomplish this, organizations must be able to easily move data between their data lakes and their purpose-built stores, such as data warehouses. But as data in these systems continues to grow, the task becomes harder. Enter the lake house—an approach that enables you to easily move data around to enable peak analytics. Harshida Patel covers the ins and outs of the lake house model and shows how to employ it to address your data storage and analytics challenges.
- Harshida Patel is a senior analytics specialist solution architect at Amazon Web Services, where she helps customers build scalable data lake and data warehousing applications using AWS analytical services. Harshida has over 15 years of experience architecting and building end-to-end data pipelines in the data management space. Previously, she worked in the insurance and telecommunication industries. She holds a master’s degree in electrical and telecommunication engineering.
Victor Lee: Delivering Smarter AI with Analytical Graph Databases, Data Lakes, and Data Warehouses (Sponsored by TigerGraph) (30 minutes) – 11:20am PT | 2:20pm ET | 6:20pm UTC/GMT
- Today’s analytical graph databases are taking organizations to another level by connecting all their data across data lakes and data warehouses, representing knowledge better, and obtaining answers to questions in real time. These benefits also extend to the world of machine learning and AI. Victor Lee demonstrates how graph databases and graph analytics, in conjunction with data lakes and warehouses, can deliver smarter AI, including unsupervised ML with graph algorithms, in-database ML techniques for graphs, and ML feature extraction and enrichment with graph patterns.
- Victor Lee is head of product strategy and developer relations at TigerGraph. He brings a strong academic background, decades of industry experience, and a commitment to quality and service. Victor was a circuit designer and technology transfer manager at Rambus before returning to school for his computer science PhD, where he focused on graph data mining. Before TigerGraph, he was a visiting professor at John Carroll University.
- This session will be followed by a 30-minute Q&A in a breakout room. Stop by if you have more questions for Victor.
- Break (5 minutes)
Barr Moses and Ryan Kearns: Data Observability—How to Build More Reliable Data Warehouses & Lakes – 11:55am PT | 2:55pm ET | 6:55pm UTC/GMT
- From null values and duplicate rows to modeling errors and schema changes, data can break for millions of reasons. To combat this, data teams are increasingly adopting best practices from DevOps and software engineering to identify, resolve, and even prevent this “data downtime.” The solution? Data observability. Join Barr Moses and Ryan Kearns as they pull back the curtain on this emerging approach and show you how to “travel through time” to build your own data observability monitors for your data environment.
- Barr Moses is the CEO and co-founder of Monte Carlo, a data reliability company. In her decade-long career in data, Barr has served as commander of a data intelligence unit in the Israeli Air Force, a consultant at Bain & Company, and VP of Operations at Gainsight, where she built and led their data and analytics team. Barr has worked with hundreds of data teams struggling with data reliability problems and is building a product dedicated to identifying, resolving, and preventing what she calls “data downtime,” periods of time when data is missing, erroneous, or otherwise inaccurate.
- Ryan Kearns is a founding data scientist at Monte Carlo, where he develops machine learning algorithms for the company’s data observability platform. Together with Barr Moses, he taught the first-ever course on data observability and authored Monte Carlo’s Data Observability in Practice blog post series, the first tutorial on the subject using out-of-the-box SQL. He’s also studying computer science and philosophy at Stanford University. When not coding or philosophizing, you can usually find him on a run or planning his next road trip.
Paul Lacey: The Lakehouse—Connecting Data and Teams to Bridge the Information Gap (Sponsored by Matillion) (30 minutes) – 12:35pm PT | 3:35pm ET | 7:35pm UTC/GMT
- To compete and seize opportunities, businesses need to leverage both structured and unstructured data at scale. Data teams require a common, streamlined method to access data, something that a separate cloud data warehouse and data lake aren’t optimized for. A data lakehouse, however, offers the best of both worlds, enabling data and teams to connect and deliver more business value faster. Join Paul Lacey to learn what the information gap is and how to bridge it, how the lakehouse helps future-proof your data-driven business, how the right data integration platform can simplify and accelerate data project creation, and what a successful lakehouse environment looks like.
- Paul Lacey is a senior director of product marketing at Matillion, where he helps customers make their data useful by leveraging best-in-class emerging technologies in the cloud. He has extensive experience with big data processing, analytics, and OCR data extraction technologies, having led both product marketing and engineering teams across several SaaS technology companies.
- This session will be followed by a 30-minute Q&A in a breakout room. Stop by if you have more questions for Paul.
- Break (5 minutes)
Alicia Moniz: Extending Data Pipelines – Strategies for Getting Data to the Cloud Quickly (30 minutes) – 1:10pm PT | 4:10pm ET | 8:10pm UTC/GMT
- Does it really have to take so long to move your data? Whether you’re moving your data warehouse to the cloud, bridging your data between multicloud data lakes, or staging your data for application development and analytics, there are tools that can help you reduce the overhead of extending your organization’s data pipelines to the cloud. Alicia Moniz walks you through hybrid design patterns for on-premises to on-cloud and multicloud architectures.
- Alicia Moniz is a Microsoft AI MVP and cloud solutions architect at Confluent. She’s active in the Microsoft User Group community and enjoys speaking about AI, SQL Server, and Kafka, particularly on Azure topics. Alicia serves on the leadership board for the Global AI Community and is an organizer for Global AI Bootcamp – Houston Edition.
Alistair Croll: Closing Remarks (5 minutes) – 1:40pm PT | 4:40pm ET | 8:40pm UTC/GMT
- Alistair Croll closes out today’s event.
Upcoming Strata Data Superstream events:
- Business Analysis – November 9, 2021
Alistair Croll is a best-selling author specializing in technology and business strategy. He cofounded Coradiant (acquired by BMC in 2011) and helped launch Rednod, CloudOps, Bitcurrent, Year One Labs, and other early-stage companies. He’s founded or chaired Cloud Connect, Bitnorth, the International Startup Festival, the O’Reilly Strata Data & AI Conference, and more. Alistair tries to mitigate chronic ADD by writing about far too many things at Solve for Interesting.