AWS Announces Amazon DataZone

At AWS re:Invent, Amazon Web Services, Inc. (AWS) announced Amazon DataZone, a new data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on-premises, and third-party sources. With Amazon DataZone, administrators and data stewards who oversee an organization’s data assets can manage and govern access to data using fine-grained controls to ensure it is accessed with the right level of privileges and in the right context. Amazon DataZone makes it easy for engineers, data scientists, product managers, analysts, and business users to access data throughout an organization so they can discover, use, and collaborate with data to derive insights. To learn more, visit aws.amazon.com/datazone.

Organizations today collect petabytes, and even exabytes, of data spread across multiple departments, services, on-premises databases, and third-party sources (e.g., partner solutions and public datasets). Before organizations can unlock the full value of this data, administrators and data stewards (i.e., data producers) who generate and manage data need to make it accessible, while maintaining control and governance to ensure it can only be accessed by the right person and in the right context. Simultaneously, employees across the company (i.e., data consumers) want to discover and analyze information from data producers to drive their decision-making. Organizations must balance the need for control, to ensure data remains secure, with the need for access, to drive new insights, but it is challenging to implement governance policies that take into account the variety of data, departments, and use cases across an organization. Some businesses build catalogs to curate their information, but these systems are time-consuming to maintain, require data producers to manually label each dataset with additional context (e.g., origin and description) to make it discoverable, and lack built-in access controls to make governance simple. Organizations also struggle to enforce a consistent data taxonomy, and individual data producers must keep their own information in sync, which makes it hard to search for data across an organization and can lead to information becoming stale. Even if a data consumer finds the information they need, they do not have a simple way to request access from the owner directly from the catalog, load the data into analytics services, or collaborate with others. As a result, decision-makers cannot get the information they need in a timely manner, or they may make poor decisions based on incomplete or outdated data.

Amazon DataZone is a new data management service that makes it easier for data producers to manage and govern access to data and enables data consumers to discover, use, and collaborate on data to drive business insights. Data producers use Amazon DataZone’s web portal to set up their own business data catalog by defining their data taxonomy, configuring governance policies, and connecting to a range of AWS services (e.g., Amazon S3 and Amazon Redshift), partner solutions (e.g., Salesforce and ServiceNow), and on-premises systems. Amazon DataZone removes the heavy lifting of maintaining a catalog by using machine learning to collect and suggest metadata (e.g., origin and data type) for each dataset and by training on a customer’s taxonomy and preferences to improve over time. After the catalog is set up, data consumers can use the Amazon DataZone web portal to search and discover data assets, examine metadata for context, and request access to datasets. When a data consumer is ready to start analyzing data, they create an Amazon DataZone Data Project, a shared space in the web portal where users can pull in different datasets, share access with colleagues, and collaborate on analysis. Amazon DataZone is integrated with AWS analytics services, such as Amazon Redshift, Amazon Athena, and Amazon QuickSight, which enables data consumers to access these services in the context of their data project, so they do not need to manage separate login credentials and their data is automatically available in these services. Amazon DataZone also provides application programming interfaces (APIs) to integrate with custom solutions or partners like Databricks, Snowflake, and Tableau, so customers can easily publish, search, and work with all their data assets.
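
The producer and consumer workflow described above (publish a dataset with metadata, search the catalog, request access, and have the owner approve it) can be illustrated with a small in-memory sketch. This is a toy model written for illustration only: the names DataAsset, Catalog, publish, search, request_access, approve, and can_read are hypothetical and are not the Amazon DataZone API or portal.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """A dataset entry in the catalog (hypothetical model, not a DataZone type)."""
    name: str
    owner: str                      # the data producer who governs access
    metadata: dict[str, str]        # context such as origin and description
    approved_consumers: set[str] = field(default_factory=set)


@dataclass
class Catalog:
    """Toy business data catalog with owner-approved access requests."""
    assets: dict[str, DataAsset] = field(default_factory=dict)
    pending: list[tuple[str, str]] = field(default_factory=list)

    def publish(self, asset: DataAsset) -> None:
        # A producer adds a dataset so that consumers can discover it.
        self.assets[asset.name] = asset

    def search(self, term: str) -> list[str]:
        # Match the term against asset names and metadata values.
        term = term.lower()
        return sorted(
            asset.name
            for asset in self.assets.values()
            if term in asset.name.lower()
            or any(term in value.lower() for value in asset.metadata.values())
        )

    def request_access(self, consumer: str, asset_name: str) -> None:
        # A consumer asks the owner for access directly from the catalog.
        if asset_name not in self.assets:
            raise KeyError(f"unknown asset: {asset_name}")
        self.pending.append((consumer, asset_name))

    def approve(self, owner: str, consumer: str, asset_name: str) -> None:
        # Only the asset's owner may grant a pending request.
        asset = self.assets[asset_name]
        if asset.owner != owner:
            raise PermissionError(f"{owner} does not own {asset_name}")
        if (consumer, asset_name) not in self.pending:
            raise ValueError("no pending request to approve")
        self.pending.remove((consumer, asset_name))
        asset.approved_consumers.add(consumer)

    def can_read(self, consumer: str, asset_name: str) -> bool:
        # Owners can always read; consumers need an approved request.
        asset = self.assets[asset_name]
        return consumer == asset.owner or consumer in asset.approved_consumers
```

In this sketch, access stays closed by default: a consumer can find a dataset through search, but can_read returns False until the owner approves the request, which mirrors the balance between discoverability and control that the service is described as targeting.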

“Good governance is the foundation that makes data accessible to the entire organization, but we often hear from customers that it is difficult to strike the right balance between making data discoverable and maintaining control,” said Swami Sivasubramanian, vice president of Databases, Analytics, and Machine Learning at AWS. “With Amazon DataZone, customers can use a single service that balances strong governance controls with streamlined access to make it easy to find, organize, and collaborate with data. Amazon DataZone sets data free across the organization, so every employee can help drive new insights to maximize its value.”

ENGIE is a global energy company with a focus on renewable energy and low-carbon distributed energy infrastructures to help its clients achieve their decarbonization targets. “At ENGIE, our key priorities are unifying data across our businesses and allowing data sharing to improve our performance and create value at scale. To address this, we first built a Common Data Hub (CDH) internally to solve this challenge to a great extent,” said Gregory Wolowiec, chief technology officer at Data@ENGIE. “Rather than building and maintaining a platform to support our data sharing and governance needs, over the last six months we have been working with the Amazon DataZone team, as a beta customer, providing input into creating an AWS native service and are looking forward to using Amazon DataZone to disseminate data throughout the organization and gain simplified access to AWS analytics services and governance tooling. This will empower our analysts and line-of-business leaders to create innovative projects and make data-driven decisions. We are excited to integrate Amazon DataZone into our business operations to take advantage of its robust capabilities to enable data sharing and value creation with data at scale.”

Fox Corporation is a leading producer and distributor of content through its sports, news and entertainment brands. “At FOX, unifying data across our businesses and creating a trusted ability to securely discover, publish, access, and share the data at scale is critical. We want to enable business teams to be able to discover and share data securely and without needing to do deep technical work,” said Alex Tverdohleb, vice president of Data Infrastructure at Fox Corporation. “Amazon DataZone will help streamline and automate our data discovery and sharing—with the right governance—so we can ensure it is accessed at the right time and with the right tools.”

Itaú is a global financial services firm and the largest private sector financial institution in Latin America. “Being data-driven is one of our key corporate goals, but we have to constantly balance access to data with our governance and compliance policies across our use of AWS analytics services, which makes it hard for teams to move quickly,” said Roberto Figueira, head of Data and Analytics Engineering Platform at Itaú Unibanco. “We are excited to test Amazon DataZone because it will simplify data governance and make data access across business units much easier. With Amazon DataZone, we will be able to quickly and easily set up fine-grained access for teams of analysts, engineers, and data scientists to experiment with data hypotheses across various business use cases.”