The technicalities 10 min read 1 May 2017

Sqoop and support for custom types

Author
TantusData

Company

Share
Share:
Postgress data

In this post you will become familiar with some more advanced Sqoop options. We will be discussing a very specific use case of Postgres database and a schema containing UUID and WKB types. These options are generic enough so stay tuned and keep reading! You might find them useful even if you are working with different database or some other data types. In our scenario we have a table definition:

Create table postgres_table ( 
    __id UUID,
    wkb geography(LINESTRING, 4326)
);

Nothing fancy here but let’s se what happens when we want to take data from HDFS and export it to Postgres. We run this sqoop command from inside Oozie:

<command>
    export --connect jdbc:postgresql://[postgres_host]:5432/testdb --table postgres_table -m 1 --export-dir /user/mszymaniuk/input --username mszymaniuk
</command>

We use a minimal set of parameters just to specify that we want to export data:

  • from /user/mszymaniuk/input directory in HDFS
  • to postgres_table in testdb database which is on <postgres_host> machine

Seems reasonable, right? But we do end up with these errors:

 6651 [uber-SubtaskRunner] ERROR org.apache.sqoop.orm.ClassWriter - Cannot resolve SQL type 1111
 6651 [uber-SubtaskRunner] ERROR org.apache.sqoop.orm.ClassWriter - Cannot resolve SQL type 1111
 6651 [uber-SubtaskRunner] ERROR org.apache.sqoop.orm.ClassWriter - No Java type for SQL type 1111 for column __id
 6651 [uber-SubtaskRunner] ERROR org.apache.sqoop.orm.ClassWriter - No Java type for SQL type 1111 for column wkb

So basically Sqoop is not aware of datatypes like UUID or WKB. Let’s just tell Sqoop to treat all data as Strings. The command below is slighly modified version of the original command. Here we are telling sqoop to interpret __id column and wkb column as Strings.

<command>
     export --verbose --connect jdbc:postgresql://[postgres_host]:6601/testdb --table postgres_table -m 1 --map-column-java __id=String,wkb=String --export-dir /user/mszymaniuk/input --username mszymaniuk
</command>

What happens when we run it? It looks promising:

INFO [uber-SubtaskRunner] org.apache.sqoop.orm.ClassWriter: Overriding type of column __id to String
INFO [uber-SubtaskRunner] org.apache.sqoop.orm.ClassWriter: Overriding type of column wkb to String

… but eventually it fails. With a Postgres exception this time:

Error: java.io.IOException: org.postgresql.util.PSQLException: ERROR: column "_id" is of type uuid but expression is of type character varying Hint: You will need to rewrite or cast the expression

Simply speaking, Postgres is complaining about a data type mismatch. Let Postgres decide about datatypes. We need to add one more thing: We have to tell the JDBC driver to send all strings untyped so Postgres can decide on the right type. We do that by passing:

?stringtype=unspecified

… to the Postgres connection string. The final version of the config would look like this:

<command>
     export --verbose --connect jdbc:postgresql://<postgres_host>:6601/testdb?stringtype=unspecified--table postgres_table -m 1 --map-column-java __id=String,wkb=String --export-dir /user/mszymaniuk/input --username mszymaniuk
 </command>

This technique enables you to export data types which are not supported by Sqoop out of the box. I just want to stress that even if you are using a database other than Postgres and you use data types other than mentioned above, it’s very likely that this technique can enable Sqoop to export data for you.

Share
Share: