Import BLOB and CLOB columns from Oracle into HDFS using Sqoop

Importing data into HDFS (Hadoop Distributed File System) from the various supported RDBMSs (relational database management systems) using Sqoop was one of the first ingestion patterns the tech community adopted, and it remains a critical tool for data ingestion pipelines today. Sqoop (SQL <=> Hadoop) is an extremely stable and reliable tool for data import as well as export, with almost all organizations in the big data space using it in one way or another. And, as happens with all technologies, there are certain complex use cases of Sqoop ingestion for which guidance is scarce. On top of this, changing versions of the involved tech stack sometimes break backward compatibility, which means previously working code might not work after upgrading. One such issue I faced was importing BLOB and CLOB columns from Oracle into HDFS using Sqoop.

Let me discuss in detail the options I tried, the errors they reported, and the final solution that helped me successfully import BLOB and CLOB columns from Oracle into HDFS using Sqoop. Assume the table schema in the Oracle database is:
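For illustration, assume a table like the following (the table and column names here are hypothetical, standing in for the original schema):

```sql
-- Hypothetical Oracle table with one BLOB and one CLOB column
CREATE TABLE EMP_DOCS (
  ID         NUMBER(10) PRIMARY KEY,
  NAME       VARCHAR2(100),
  PHOTO      BLOB,   -- binary large object
  RESUME_TXT CLOB    -- character large object
);
```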

If you want to check for yourself what the corresponding Hive data types are for the columns in the Oracle table, let the import create the table in Hive for us. The first option I tried was to explicitly map the BLOB and CLOB columns to the Java String type and let Java handle them; the sqoop query in this case looked like:
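A sketch of that first attempt; the connection string, credentials, paths, and column names are placeholders matching the hypothetical schema above:

```shell
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table EMP_DOCS \
  --map-column-java PHOTO=String,RESUME_TXT=String \
  --target-dir /user/hive/warehouse/emp_docs \
  -m 1
```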

And strangely, contrary to certain blog posts, it didn't work, failing with an error stating that the getString method is not available for the BLOB accessor class in the JDBC driver:
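The failure looked similar to the following (the exact accessor class name can vary with the Oracle JDBC driver version):

```
java.sql.SQLException: Invalid column type: getString not implemented
    for class oracle.jdbc.driver.T4CBlobAccessor
```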


Although Hive is itself written in Java, I thought of giving Hive's string type a try as well by changing --map-column-java to --map-column-hive. Also, as you might be aware, if you are using any of the below arguments in the sqoop query, then you must specify --hive-import in the sqoop query along with a handful of other options:
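Per the Sqoop user guide, the Hive arguments that require --hive-import include (among others):

```
--hive-overwrite
--create-hive-table
--hive-table
--hive-partition-key
--hive-partition-value
--map-column-hive
```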


The sqoop import query looks like:
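A sketch of the Hive-import variant, again with placeholder connection details and the hypothetical column names:

```shell
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table EMP_DOCS \
  --hive-import \
  --hive-table emp_docs \
  --map-column-hive PHOTO=string,RESUME_TXT=string \
  -m 1
```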


And without any doubt it worked like a charm, which means Hive handled the BLOB and CLOB data types. And our issue is resolved!!! Great. But wait a minute – do you see any problem with this approach? There are two, in fact:

  1. We have imported a binary-formatted column as a string in Hive, which is not a correct representation of the data.
  2. We have imported the CLOB as plain text, which might contain the field and line delimiters/terminators as part of the data/value itself.


So we need to fix those issues. One way is to import the data as-is and overlay a schema on top of it. In this case the sqoop import would be as simple as:
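A sketch of that plain import, with no explicit column mappings (placeholders as before):

```shell
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table EMP_DOCS \
  --target-dir /user/hive/warehouse/emp_docs \
  -m 1
```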

And if you are worried about the auto-mapped data types of BLOB and CLOB, you can refer to the Java file generated by the sqoop import, which looks like:
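In the generated ORM class you would see the LOB columns typed as BlobRef and ClobRef, roughly like this trimmed excerpt (field names follow the hypothetical schema above):

```java
// Excerpt from the class Sqoop codegen produces for the table (trimmed)
import com.cloudera.sqoop.lib.BlobRef;
import com.cloudera.sqoop.lib.ClobRef;

public class EMP_DOCS extends SqoopRecord {
  private BlobRef PHOTO;        // BLOB column auto-mapped to BlobRef
  private ClobRef RESUME_TXT;   // CLOB column auto-mapped to ClobRef
  // ... getters, setters, readFields(), write() ...
}
```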

You will see com.cloudera.sqoop.lib.* on CDH platforms; otherwise you will see org.apache.sqoop.lib.*.

You can then map com.cloudera.sqoop.lib.BlobRef and com.cloudera.sqoop.lib.ClobRef to Hive's binary type in the table schema.


A simpler and better option is to import the data in a format like Parquet, which avoids data corruption due to field/record delimiters within the values, and to explicitly define the data types of these LOBs. The sqoop query to be used is:
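A sketch of that final query under the same placeholder assumptions: the BLOB column is mapped to Hive's binary type and the CLOB column to the Java String type, with the data written as Parquet:

```shell
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table EMP_DOCS \
  --hive-import \
  --hive-table emp_docs \
  --as-parquetfile \
  --map-column-hive PHOTO=binary \
  --map-column-java RESUME_TXT=String \
  -m 1
```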

If you look carefully, the BLOB has been mapped to the Hive binary data type and the CLOB column has been mapped to the Java String type.

You might encounter several different types of errors during the sqoop import process; I have highlighted a couple of those in my next blog post.

I hope this gives you some pointers for resolving issues related to BLOB or CLOB column imports from an RDBMS. Do let me know if you find it useful or have any concerns.




