Sqoop Errors

Every big data engineer at some point has to work with Apache Sqoop as one of the bridges for data connectivity between an RDBMS and the Hadoop environment. There are a number of reasons why Sqoop errors can occur during import or export. Based on my working experience, I am listing a few Sqoop errors along with their possible root causes and corresponding resolutions.

OutOfMemoryError 

The error typically surfaces in the mapper logs as a java.lang.OutOfMemoryError (Java heap space).

To resolve this exception, you might have to tweak the following settings:

  1. mapreduce.map.memory.mb: The upper limit of the memory allocated by YARN to the mapper container.
  2. mapreduce.map.java.opts: The maximum heap to be allocated to the JVM process inside the mapper container. This memory is a subset of the one allocated to the mapper (defined by mapreduce.map.memory.mb), so it should be a little less than the overall upper limit for the container. You have to tweak both of these settings together, increasing them gradually until the issue is resolved (see the sample command after this list).
  3. --fetch-size: In smaller clusters there is generally much less liberty to give more memory to the mappers, so the best bet is to control the fetch size from the RDBMS and lower it gradually in decrements of about 10%. If you have a wide table (200 or more columns) or very large column values, it is difficult to fit a large number of records in memory as part of a single batch. Reducing the fetch-size value shrinks the batch, meaning fewer records are sent to Sqoop by the RDBMS in one go. You still need to strike a balance between resource consumption and performance, which takes a little trial and error with these values (as well as the number of mappers).
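
A rough sketch of what such a tuned import could look like is below. This is only an illustration, not a prescription: the JDBC URL, credentials, table name and the actual numbers are placeholders, and the -Xmx heap is kept a few hundred MB below the container size so the JVM fits inside the YARN limit.

    # Container size and JVM heap raised together; the heap (-Xmx) stays below
    # the container limit. Fetch size lowered for a wide table.
    sqoop import \
        -D mapreduce.map.memory.mb=4096 \
        -D mapreduce.map.java.opts=-Xmx3584m \
        --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
        --username etl_user -P \
        --table WIDE_TABLE \
        --target-dir /data/raw/wide_table \
        --fetch-size 500 \
        --num-mappers 4

Note that the -D options are generic Hadoop arguments and must appear immediately after the tool name (import), before any Sqoop-specific options.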

 

Unknown Format Error

This error comes when you try to load data in one particular format (Avro/Parquet) that is different from the format of the table which has already been created. The simplest scenario is where you first import the data in plain text format without explicitly defining any file format (--as-parquetfile or --as-avrodatafile), and the next time you try to impose a new format on the same table, Sqoop complains about the (unknown) format of the already created table. The best way out is to drop the table before executing the Sqoop import; this should resolve such errors.
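
As a rough sketch (the database, table and warehouse path here are made-up placeholders), the clean-up plus re-import could look like this:

    # Drop the table created with the old format and clear its files so the
    # new import starts from a clean slate.
    hive -e "DROP TABLE IF EXISTS staging_db.customers"
    hadoop fs -rm -r -f /user/hive/warehouse/staging_db.db/customers

    # Re-import, this time stating the file format explicitly.
    sqoop import \
        --connect jdbc:mysql://db-host:3306/sales \
        --username etl_user -P \
        --table customers \
        --hive-import \
        --hive-table staging_db.customers \
        --as-parquetfile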

 

Hive Doesn’t support SQL Type

This error comes after the data has been fetched into HDFS when, as part of the --hive-import construct, the data and its columns are mapped to the Hive table, but Hive is not able to work out the correct type for BLOB data types in the RDBMS. The correct way to resolve this error is to explicitly map the incoming BLOB data type to the Hive Binary data type using --map-column-hive PG_BINARY=Binary. For more details on the correct way to do this, refer to my earlier blog here.
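
A minimal sketch of such an import, assuming a PostgreSQL source table with a binary column called PG_BINARY (the connection details and table name are placeholders):

    # Explicitly map the binary/BLOB column to the Hive Binary type.
    sqoop import \
        --connect jdbc:postgresql://db-host:5432/appdb \
        --username etl_user -P \
        --table documents \
        --hive-import \
        --hive-table staging_db.documents \
        --map-column-hive PG_BINARY=Binary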

 

Cannot convert SQL Type 

This Sqoop error comes when a file format has been specified without correctly casting the source data types to their corresponding data types in the target system (e.g. Hive). A simple example is the presence of a CLOB column in the incoming data that is to be stored in Parquet format. Always remember to cast BLOBs and CLOBs to their corresponding types before using any file format: BLOBs have to be mapped to the Binary data type via --map-column-hive, and CLOBs have to be mapped to the String data type via --map-column-java. For more information, refer to my previous blog here.
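
For example, an import into Parquet from a table that carries both a CLOB and a BLOB column could be written roughly as below; CLOB_COL and BLOB_COL are hypothetical column names and the connection details are placeholders.

    # CLOB cast to a Java String and BLOB cast to Hive Binary before the
    # Parquet format is applied.
    sqoop import \
        --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
        --username etl_user -P \
        --table DOCUMENT_STORE \
        --map-column-java CLOB_COL=String \
        --map-column-hive BLOB_COL=Binary \
        --hive-import \
        --hive-table staging_db.document_store \
        --as-parquetfile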

 

No ResultSet method for Java type

This is again an issue with incorrect mapping of the source columns to their corresponding data types in the target system. A BLOB column mapped to Java Binary instead of Hive Binary throws this error; use --map-column-hive BLOB_COLUMN=Binary to correctly import BLOB data types.
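
In short, the difference is which flag carries the mapping; a before/after sketch with a hypothetical BLOB_COLUMN:

    # Incorrect: Binary is not a recognised Java type mapping, so Sqoop cannot
    # find a matching ResultSet getter for it.
    #   sqoop import ... --map-column-java BLOB_COLUMN=Binary
    #
    # Correct: map the column to the Hive Binary type instead.
    #   sqoop import ... --map-column-hive BLOB_COLUMN=Binary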

 

I hope this helps you save some time while troubleshooting. Happy Sqooping!

 

 
