Apache Sqoop was created to efficiently transfer bulk data between Hadoop and external structured datastores, which Hadoop cannot easily access on its own. To extend its reach, Sqoop must support more data integration use cases and become easier to manage and operate. In this session, we will discuss how the next generation of Sqoop, known as Sqoop 2, addresses these needs.
Apache Sqoop was created to efficiently transfer bulk data between Hadoop-related systems (such as HDFS, Hive, and HBase) and structured data stores (such as relational databases, data warehouses, and NoSQL systems). Sqoop's popularity in enterprise deployments confirms that it handles bulk transfer admirably. At the same time, we have encountered many new challenges that have outgrown the abilities of the current infrastructure. To support more data integration use cases and become easier to manage and operate, the next generation of Sqoop, known as Sqoop 2, addresses several key areas, including ease of use, ease of extension, and security. This session will cover Sqoop 2 from both the development and operations perspectives.