Sqoop and support for custom types

In this post you will become familiar with some more advanced Sqoop options. We will be discussing a very specific usecase of Postgres database and a schema containing UUID and WKB types.

These options are generic enough so stay tuned and keep reading!

You might find them useful even if you are working with different database or some other data types.

In our scenario we have a table definition:

Nothing fancy here but let’s se what happens when we want to take data from HDFS and export it to Postgres. We run this sqoop command from inside Oozie:

We use a minimal set of parameters just to specify that we want to export data:

  • from /user/mszymaniuk/input directory in HDFS
  • to postgres_table in testdb database which is on <postgres_host> machine

Seems reasonable, right? But we end up with these errors:

So basically Sqoop is not aware of datatypes like UUID or WKB.

Let’s just tell Sqoop to treat all data as Strings

The command below is slighly modified version of the original command. Here we are telling sqoop to interpret __id column and wkb column as Strings.

What happens when we run it? It looks promising…:

…But eventually it fails. With a Postgres exception this time:

Simply speaking, postgres is complaining about a data type mismatch.

Let Postgres decide about datatypes.

We need to add one more thing: We have to tell the JDBC driver to send all strings untyped so Postgres can decide on the right type. We do that by passing

to the Postgres connection string. The final version of the config would look like this:

This technique enables you to export data types which are not supported by Sqoop out of the box. I just want to stress that even if you are using a database other than postgres and you use data types other than mentioned above, it’s very likely that this technique can enable Sqoop to export data for you.