Monday, July 14, 2014

Import a Flat file with Header and Detail Rows in SSIS

In this post, we will learn how to import a flat file which contains both Header and Detail Records with different formats in the database header & detail tables respectively.

Prerequisite
  • An input file (Input.txt) with header and detail records
  • Header & Detail Tables in the database
Content of Flat file (Input.txt)
H1100
D1050P1
D1050P2
H2200
D2100P3
D2100P4

Flat file Header/Detail Record Specs

Script for Header & Detail Tables
CREATE TABLE [dbo].[Header](
    [id] [varchar](50) NOT NULL,
    [amount] [int] NOT NULL
)
CREATE TABLE [dbo].[Detail](
    [id] [varchar](50) NOT NULL,
    [amount] [int] NOT NULL,
    [product] [char](100) NOT NULL
)

Step 1: Add a Data Flow Task to Control Flow.


Step 2: Add a Flat File Source to the Data Flow.


Step 3: Configure Flat File source.
  • Select file name (Input.txt)
  • Set Format as "Ragged Right"

Step 4: Add Script Component from Data Flow Transformations and connect Flat File Source to it


Step 5: Double click on Script component and Configure the Input Columns like below


Step 6: Configure Output under Inputs and Outputs tab like below as per Header & Details specifications.


Step 7: Click on Edit Script and write following code in the script window under Input0_ProcessInputRow method

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    if (Row.Line.Substring(0, 1) == "H")
    {
        HeaderBuffer.AddRow();        
        HeaderBuffer.id = Row.Line.Substring(1, 1);
        HeaderBuffer.amount = Convert.ToInt16(Row.Line.Substring(2, 3));
    }
    else if (Row.Line.Substring(0, 1) == "D")
    {
        DetailBuffer.AddRow();
        DetailBuffer.id = Row.Line.Substring(1, 1);
        DetailBuffer.amount = Convert.ToInt16(Row.Line.Substring(2, 3));
        DetailBuffer.product = Row.Line.Substring(5, 2);
    }
}

Step 8: Add two OLE DB destination for header and detail and connect Header and Detail output of Script Component to each destination



Step 9: Configure OLE DB destination for Header & Detail like below




Step 10: Now let's execute the package



Step 11: Package is executed successfully. Now let's check for the data in the tables.

select * from Header
select * from Detail
OUTPUT

Thursday, March 20, 2014

Mutex Multithreading

In this post we will learn how to implement mutex multithreading. 

Let's take a case study to understand how and where to implement mutex and multi-threading at the same time.

Scenario
  • Schedule an application which would run very frequently (say 1 min).
  • Application has to execute two methods which take 2.5 mins and 0.5 min respectively.
  • Both methods are independent of each other.
Normal code without mutex and multithreading
using System;
using System.Threading;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Method1();
            Method2();
        }

        static void Method1()
        {
            for (int i = 0; i < 10; i++)
            {
                Thread.Sleep(1000);
                Console.WriteLine("Method 1 : " + i.ToString());
            }
        }

        static void Method2()
        {
            for (int i = 0; i < 20; i++)
            {
                Thread.Sleep(1000);
                Console.WriteLine("Method 2 : " + i.ToString());
            }
        }
    }
}
OUTPUT

First Problem: Method-2 takes only 0.5 min to execute but it is called only once Method-1 is executed i.e. after 2.5 mins
Solution: Implement multi-threading so that methods will be executed parallely

Code with multi-threading

static void Main(string[] args)
{
    Thread t1 = new Thread(new ThreadStart(Method1));
    t1.Start();
    Thread t2 = new Thread(new ThreadStart(Method2));
    t2.Start();
}
OUTPUT

Method-2 getting executed parallely without waiting for the first one to complete, thus, solving our first problem.

Next Problem: Application is taking 3 mins (2.5 + 0.5) to execute, but it is scheduled to run after every one minute, this might cause multiple instances of the application at the same time.

Solution: To prevent running multiple instances we can do either of the following 
  • While scheduling the application, select the option "Do not start a new instance"
  • Implement mutex in the application at application level
But only this would not solve our problem. Doing this way an application instance will run in every 3 mins even if it scheduled for 1 min causing delay in the execution of Method-2 which can be invoked every min as it takes less than a minute. So we have to implement mutex at method level so that application instance will run after 1 minute but will invoke only method(s) whose previous instance is not running.

Code with mutex and multi-threading
using System;
using System.Threading;

namespace ConsoleApplication1
{
    class Program
    {
        static Mutex M1;
        static Mutex M2;

        static void Main(string[] args)
        {
            bool b1, b2;
            M1 = new Mutex(false, "Method1", out b1);
            M2 = new Mutex(false, "Method2", out b2);
            if (b1)
            {
                Thread t1 = new Thread(new ThreadStart(Method1));
                t1.Start();
            }
            if (b2)
            {
                Thread t2 = new Thread(new ThreadStart(Method2));
                t2.Start();
            }
        }

        static void Method1()
        {
            M1.WaitOne();
            for (int i = 0; i < 150; i++)
            {
                Thread.Sleep(1000);
                Console.WriteLine("Method 1 : " + i.ToString());
            }
            M1.ReleaseMutex();
            M1.Dispose();
        }

        static void Method2()
        {
            M2.WaitOne();
            for (int i = 0; i < 30; i++)
            {
                Thread.Sleep(1000);
                Console.WriteLine("Method 2 : " + i.ToString());
            }
            M2.ReleaseMutex();
            M2.Dispose();
        }
    }
}

Let's schedule the application in Task Scheduler and check the outputs

Output on the execution of first instance after scheduling
Both methods are running parallely initially and after 30 secs only Method-1 is executing as Method-2 execution is finished

Second output on the execution of second instance (after 1 min)
First instance is still running and executing Method-1. Second instance executing only Method-2 as Method-1 is still executing in other instance as mutex prevented the execution of Method-1

Third output on the completion of second instance (after 1.5 mins)
Second Instance lasts for only 30 secs as it invoked only Method-2. First instance is still running and executing Method-1.

Saturday, November 9, 2013

Except and Intersect in sql

EXCEPT and INTERSECT operators were introduced in SQL Server 2005. Both operator works on two resultsets.

Basic rules for combining two resultsets
  • No. of columns in both resultset should be equal
  • The order of columns should be same
  • Corresponding data type of each column in each select should be compatible
Syntax
SELECT col1, col2 FROM tab1
EXCEPT
SELECT col1, col2 FROM tab2

EXCEPT returns distinct rows from the first resultset which not exists in second resultset.

INTERSECT returns rows which exists in both resultset

Example
declare @tab1 table (col1 int, col2 varchar(10))
insert into @tab1
select 1, 'val1' union all
select 1, 'val2' union all
select 2, 'val1'

declare @tab2 table (col1 int, col2 varchar(10))
insert into @tab2
select 1, 'val1' union all
select 1, 'val3' union all
select 3, 'val1'

select col1, col2 from @tab1
except
select col1, col2 from @tab2

select col1, col2 from @tab1
intersect
select col1, col2 from @tab2
OUTPUT 

Look at the output. 
  • First output using EXCEPT where we are getting 2 rows which exist in table 1 but not in table 2
  • Second output using INTERSECT where we are getting 1 row which exists in both tables.

Monday, October 14, 2013

row_number in sql

Row_Number feature was introduced in SQL Server 2005.

Row_Number generates a run-time column in the result set which generates sequential number to each row according to the column used in order by clause.

Syntax
Row_Number() over (order by col1, col2, ....n partition by col1, col2, ....n)
  • Order By is mandatory. Row Number is assigned in the result set based on the column provided in Order By clause.
  • Partition By is optional which groups the result set based on the column provided in Partition By clause wherein each group, the sequence starts with 1.
Let's create an Employee table on which we would see how it works.

create table tblEmployee(
    EmpId char(7)
    , FirstName varchar(50)
    , LastName varchar(50)
    , Gender char(1)
)

insert into tblEmployee
select 'EMP0001', 'Sandeep', 'Mittal', 'M' union all
select 'EMP0003', 'Abhay', 'Kumar', 'M' union all
select 'EMP0005', 'Priya', 'Gupta', 'F' union all
select 'EMP0002', 'Reema', 'Gupta', 'F' union all
select 'EMP0004', 'Ritesh', 'Kumar', 'M'

Now, we would generate a sequential number for all the employees using Row_Number

select  EmpId, FirstName, LastName, Gender
        , row_number() over (order by EmpId) as RowNum
from    tblEmployee
OUTPUT

Check out the result set where
RowNum is generated based on EmpId as provided in Order By

Multiple columns can be used in the Order By clause
Let's take an another example where we would use multiple columns in Order By clause 

select  EmpId, FirstName, LastName, Gender
        , row_number() over (order by FirstName, LastName) as RowNum
from    tblEmployee
OUTPUT

In the next example, we will see how
Partition By clause works 

select  EmpId, FirstName, LastName, Gender
        , row_number() over (partition by Gender order by EmpId) as RowNum
from    tblEmployee
OUTPUT

Check out the result set where
RowNum is generated starting from 1 for each group i.e. Gender which is provided in Partition By clause

In Partition By clause also, we can use multiple columns as in Order By clause