Getting TaskUnzip to work with SQL Server Integration Services (SSIS) 2012

Background

It is a very common requirement in SSIS packages to extract contents of a compressed ZIP file, and then process that file in a data flow. From what I can see, it appears that TaskUnzip is quite a popular way to achieve this. So as part of the BI Aviation project I decided to use this for extracting the data from the ZIP files available in those datasets.

Using TaskUnzip

In my testing I used build 1.3.0.1 available for SQL 2008 (SSIS_TaskUnZip_1.3.0.1_SQL_2008.zip). I suspected it may not work in SQL 2012 as-is, but decided to give it a try. The setup instructions are included in the ZIP file, so I ran the Install_SQL_2008.cmd to install the library. That failed, because the references are to SQL 2008. So here is the corrected installation script:

net stop "SQL Server Integration Services 11.0"
gacutil.exe /u "ICSharpCode.SharpZipLib"
gacutil.exe /u TaskUnZip
copy "ICSharpCode.SharpZipLib.dll" "%ProgramFiles%Microsoft SQL Server110DTSTasks" /Y
copy "TaskUnZip.*" "%ProgramFiles%Microsoft SQL Server110DTSTasks" /Y
gacutil.exe /i "%ProgramFiles%Microsoft SQL Server110DTSTasksICSharpCode.SharpZipLib.dll"
gacutil.exe /i "%ProgramFiles%Microsoft SQL Server110DTSTasksTaskUnZip.dll"
net start "SQL Server Integration Services 11.0"
pause

The fixes are to change the references of 10.0 to 11.0 and 100 to 110, to suit the SQL 2012 version. You have to do similar corrections in the x86 version of the CMD script, because designers like the VS2010 based SSIS package editor are still 32-bit:

net stop "SQL Server Integration Services 11.0"
gacutil.exe /u "ICSharpCode.SharpZipLib"
gacutil.exe /u TaskUnZip
copy "ICSharpCode.SharpZipLib.dll" "%ProgramFiles(x86)%Microsoft SQL Server110DTSTasks" /Y
copy "TaskUnZip.*" "%ProgramFiles(x86)%Microsoft SQL Server110DTSTasks" /Y
gacutil.exe /i "%ProgramFiles(x86)%Microsoft SQL Server110DTSTasksICSharpCode.SharpZipLib.dll"
gacutil.exe /i "%ProgramFiles(x86)%Microsoft SQL Server110DTSTasksTaskUnZip.dll"
net start "SQL Server Integration Services 11.0"
pause

At this stage, you would expect the TaskUnzip task should show up in the SSIS Toolbox, but if you have a fresh installation of SQL 2012, it will not show up!

Rebuilding TaskUnzip for SQL 2012

Start by downloading the sources (I used this link and clicked the ‘download’ link on the top) and opening it in Visual Studio 2010 SP1. When you rebuild this project, you get the following errors:

The type or namespace name ‘Dts’ does not exist in the namespace ‘Microsoft.SqlServer’ (are you missing an assembly reference?)
The type or namespace name ‘Task’ could not be found (are you missing a using directive or an assembly reference?)

(errors truncated for brevity). This is due to the namespace changes in DTS in SQL 2012. You will also find that the references to the older DLLs are broken:

image

We also need to change the .NET Framework version to 4.0 runtime (read more about this here):

image

Then remove the old references and add Microsoft.SqlServer.Dts.Design and Microsoft.SQLServer.ManagedDTS (though the namespace names are the same, the DLLs they reference are different, hence you need to add them again.)

After this is done, the DLL build should succeed.

Deploying the recompiled DLL

After this is done, if you uninstall and reinstall the newer DLL you might get the following error:

C:2008>gacutil.exe /i "C:Program FilesMicrosoft SQL Server110DTSTasksTaskUnZip.dll"
Microsoft (R) .NET Global Assembly Cache Utility.  Version 3.5.21022.8
Copyright (c) Microsoft Corporation.  All rights reserved.

Failure adding assembly to the cache:   This assembly is built by a runtime newer than the currently loaded runtime and cannot be loaded.

This is because the gacutil.exe which ships with the TaskUnzip library ZIP file is actually for the framework version 3.5. To fix this you need a gacutil for v4.0. You can find an updated gacutil at “c:Program Files (x86)Microsoft SDKsWindowsv7.0ABinNETFX 4.0 Tools” (this path is on your VS.NET box). Copy these to your deployment server and overwrite the version which shipped with the TaskUnzip version. More details on this step are here.

Conclusion

That’s it! By recompiling the DLL you can add it to the toolbox and then into your Control Flows in SSIS. We will be using this component to import our BI Aviation data, so stay tuned for more details on how that is done!

And in closing, please take a minute to leave your questions or comments in the space provided at the end of this post. I do appreciate your feedback!


Disclaimer: the mention of 3rd party or community products in these blog posts in no way constitutes a recommendation or advice. These are my personal opinion and do not reflect any of my employer’s opinions.

Aviation BI Project – Part 1: Requirements

Background

One of my hobbies is following the commercial aviation scene. My job requires me to fly around a lot, and that only sparks my curiosity further around how the commercial aviation business works. What better then, that to use this hobby as a motivation to test new features in SQL Server 2012? This hybrid interest, which I call Aviation BI, is the driver behind a series of blog posts, where I will take you through a typical lifecycle of a BI project:

  • Defining requirements
  • Solution design and technology selection
  • Cleansing and importing data into the data warehouse
  • Querying the data warehouse
  • Using a multidimensional / tabular model to obtain slice / dice insights
  • Visualizing those insights in eye-catching fashion

In the first blog post of this series, I will take you through what I (as an end user) see as possible requirements. We will also review the datasets available for consumption.

Requirements

There are some categories of requirements that we have, depending on the perspective we are talking about. A typical retail consumer would want to check some statistics:

  1. Given a particular sector, which airlines serve that sector?
  2. How is the on-time performance of each airline on the sector?
  3. Which airline has been providing the cheapest tickets on the sector?
  4. What is the average arrival delay for a particular airport? Is there a pattern based on time of day?
  5. What is the average load factor (how full is the aircraft) on the sector?

As an airline executive, I might want to look at:

  1. Which are the routes with the highest traffic?
  2. Which aircraft am I routinely having service difficulties with?

As an airport administrator, I might want to look at:

  1. For my airport, is the ramp time (the time taken to taxi from gate to runway) very high compared to other airports?
  2. What are the most common locations from which international passengers arrive to my airport?
  3. Which is the busiest time for arrivals and departures for my airport?

As an aviation analyst I may have the following queries:

  1. Which type of aircraft is used to make trans-Atlantic journeys?
  2. Which operators are still using DC-10 aircraft (DC-10s were largely phased out by major operators in the late 1990s)
  3. Did the Antonov 225 make any appearances in the US recently?
  4. What are the longest non-stop flights to the USA?

And so on… these perspectives are imaginary, but I suppose they are very typical of what people in those roles would be asking to see.

Data sets available

There are some amazing datasets available in the public domain. For our purposes, we will be using the following:

Data set Data Points Row count
BTS On-time Performance Origin, Destination airports
Departure delay
Arrival delay
Taxi times
Flight distance
Flight time

Data available: 1987 onwards
Granularity: flight

146,090,545
Airline Origin and Destination Survey (DB1B) Market Number of coupons for this market1
Origin, destination airports
Ticketing and operating carrier         
Market Fare         
Number of passengers
         
Data available: 1993 onwards
Granularity: individual ‘market’
346,513,372
Airline Origin and Destination Survey (DB1B) Ticket Number of coupons in itinerary
Origin airport
Round trip indicator
Miles flown
Ticket fare per person
Reporting carrier

Data available: 1993 onwards
Granularity: individual ticket

190,783,532
Airline Origin and Destination Survey (DB1B) Coupon Origin, destination airports
Number of passengers included
Fare class

Data available: 1993 onwards
Granularity: individual coupon

584,874,480
T-100 Segment Unique carrier code
Origin, destination airport
Aircraft type
Departures scheduled and performed
Load factor
Seats vs. passengers
Freight and mail
Distance
Ramp to ramp time
Air time

Data available: 1990 onwards
Granularity: aggregated flights (monthly)

7,347,563
T-100 Market 2 Unique carrier code
Origin, destination airport
Service class
Domestic / International indicator
Passengers
Freight and mail
Distance

Data available: 1990 onwards
Granularity:

5,994,306
FAA Service Difficulty Reporting (SDR) Data available: 1994 onwards
Granularity: each service difficulty report
 

1 A ‘market’ in DB1B terminology is a break in journey other than for changing planes.

2 The difference between ‘market’ and ‘segment’ in T-100 is explained at this link. The T-100 reporting guide is also useful to understand this dataset.

   

Next Steps

Over a series of blog posts, I will take you through how we:

  1. Cleanse and import this data into a data warehouse.
  2. Build some analysis services databases on top of the relational data. We will explore both multidimensional and tabular mode databases for this step and see where each fits in.
  3. We will then consume this data – through Reporting Services, Excel and PowerView clients.
  4. We will finally do an incremental loading of ‘new’ data into the warehouse, and follow the incremental processing of the Analysis Services databases and confirm if the reporting pieces refresh the new data.

All of this, towards answering the ‘customer’ requirements which we laid out earlier!

References

For those interested in reading further about airline statistics, here are some references:

So if you have some comments on the scope of this ‘project’, I would LOVE to hear from you! Do leave a comment below if you liked this idea and would like to see more.