Tuesday, 05 December 2006

Hi

A guy on the microsoft.public.biztalk.something newsgroup is having problems splitting an incoming flat file into several files. The data format he has is this:

HH06SGLN00084CR
31102006~06~Miss~ABD~DEF GHI ~ZZZZZ~F~31111111~~BD46A~233435~NUTRITION & FOOD SCIENCE~1~88888888~ AN ADDRESS SOMEWHERE~AB18ZZ~AAAA33333333T~04444.00~01200.00~00000.00~00000.00~F~Y~P~Y~N~N~N~
01112006~06~Miss~ASODIFN~DSFJ ~BSODIFJSDF~F~55555555~~Q190E~444444~ENGLISH & POPULAR CULTURE~1~66666666~ ANOTHER ADDRESS ~CF51NT~EEEE94857463H~01200.00~01200.00~00000.00~00000.00~F~Y~F~Y~Y~N~N~
TT000024

The first line is a header, with identifier "HH". The last line is the trailer with identifier "TT". In between are many body lines, one per line.
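To make the layout concrete, here is a minimal Python sketch of the structure just described. It is only an illustration, not BizTalk code: the function names are made up for this example, and it assumes that every field in a body line is terminated by "~", as in the sample above.

```python
def split_flat_file(text):
    """Split a flat file into (header, body_lines, trailer).

    Assumes the first non-empty line is the "HH" header and the last
    non-empty line is the "TT" trailer, as in the sample above.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected at least a header and a trailer line")
    header, *body, trailer = lines
    if not header.startswith("HH"):
        raise ValueError("first line is not an HH header")
    if not trailer.startswith("TT"):
        raise ValueError("last line is not a TT trailer")
    return header, body, trailer


def parse_body_line(line):
    """Split one body line into its '~'-terminated fields."""
    fields = line.split("~")
    # The line ends with '~', so split() leaves one empty piece at the end.
    if fields and fields[-1] == "":
        fields.pop()
    return fields
```

Note that only the header and trailer can be recognised by a prefix. The body lines have nothing comparable, which is the root of the problem described next.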

He would like to create schemas for the header, the body and the trailer and use them in a custom receive pipeline, using the flat file disassembler to split it up, so he gets one XML document per body line.

Unfortunately, this doesn't seem to be doable. What seems to happen is that the header schema is used first when parsing the document. It describes a single line, so the first line of the incoming file is parsed with it. Then the body schema is used for parsing the rest of the incoming document. Since the body lines don't have tag identifiers, BizTalk has no way of telling a body line from the trailer, so it keeps parsing body lines, and this includes the last line, which is the trailer. BizTalk doesn't know when to stop parsing for body lines. Therefore, this error appears in the event log:

-- BEGIN ERROR
Source: "Flat file disassembler" Receive Port: "ReceiveFlatFile" URI: "C:\Projects\BTS 2006\NewsgroupHelp\BodyWithoutTagIdentifierFlatFile\Instances\In\*.txt" Reason: Unexpected data found while looking for:
'~'
The current definition being parsed is BodyRoot. The stream offset where the error occured is 404. The line number where the error occured is 4. The column where the error occured is 0.
-- END ERROR

Basically, the flat file disassembler can't find the ~ character on the last line. It isn't supposed to be there, since this line is the trailer. So BizTalk gives up and fails.
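The behaviour can be modelled with a few lines of Python. This is only a toy model of what I think is going on, not how the flat file disassembler is actually implemented; the function name and the error text are my own.

```python
def parse_without_tag_identifiers(lines):
    """Toy model: line 1 is the header, every later line is a body line.

    Nothing marks where the body ends, so the trailer is treated as one
    more body line and fails because it contains no '~' delimiter.
    """
    header = lines[0]
    bodies = []
    # Line numbers are 1-based and the header is line 1.
    for number, line in enumerate(lines[1:], start=2):
        if "~" not in line:
            raise ValueError(
                f"Unexpected data found while looking for '~' on line {number}"
            )
        bodies.append(line)
    return header, bodies
```

With a header, two body lines and a trailer, the model fails on line 4, which is the line number reported in the error above.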

What I have come up with isn't actually pretty, but it does seem to work :-)

I have created a schema for the entire flat file. And I have created a schema for the entire flat file without the trailer. Then, I created a schema for the header, and a schema for the body. I have made heavy use of the flat file schema wizard, because there are many elements in the body lines :-)

Then, I created a map between the two main schemas, effectively removing the trailer from the input.

I also created three custom pipelines:

  1. A pipeline for receiving the complete flat file
  2. A pipeline for sending out the flat file without its trailer
  3. A pipeline for splitting the flat file without trailer into several body documents, using the header and body schemas.

So the solution is:

Let BizTalk read in the complete flat file, and use a map on the receive (or send) port to convert it into the same structure without the trailer. Then output it to a file. Let another receive location pick the new file up, and use the pipeline with the header and body schemas to split it up into several documents.
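Outside BizTalk, the two stages boil down to something like the Python sketch below. It is a simplified stand-in for the map and the splitting pipeline, not the real thing; the function names are invented for the example, and it assumes the trailer is the last line and starts with "TT".

```python
def remove_trailer(text):
    """Stage 1 (the map): return the flat file without its final TT line."""
    lines = [line for line in text.splitlines() if line.strip()]
    if lines and lines[-1].startswith("TT"):
        lines = lines[:-1]
    return "\n".join(lines) + "\n"


def split_bodies(text_without_trailer):
    """Stage 2 (the splitting pipeline): header first, then one document per line.

    Assumes the input contains at least the header line.
    """
    lines = [line for line in text_without_trailer.splitlines() if line.strip()]
    header, *bodies = lines
    return header, bodies
```

Once the trailer is gone, treating every line after the header as a body line is safe, which is why the second stage works.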

Pitfalls: Remember to use a different combination of root node and target namespace for each schema. After copying a schema, it is easy to forget to change them. Also, change the .NET type name of the schema after copying it. The compiler will remind you of that if you forget it, though :-)

I really wanted to use XML instead of a flat file for the intermediate step, but I couldn't get it to work. I would need one schema for the output and another schema for the input, the latter being an envelope with an "Any" element inside it, and these two schemas would need to have the same root node and target namespace. So I dropped it and stayed with the flat file, although I hated it :-)

My .NET project can be found here: BodyWithoutTagIdentifierFlatFile.zip (99,28 KB)

I hope this has been of some help. The conclusion basically is that splitting an incoming flat file using header, body and trailer schemas can't be done if the body doesn't have a tag identifier.

UPDATE on 5th December 2006: Greg Forsythe has another solution to the problem, which he has written in the newsgroup. I quote:

-- BEGIN QUOTE

There is another way of doing this:
Create a document schema with an optional trailer record.
This will debatch each record, with the last document having a trailer
record. This can be removed/ignored in the first map

-- END QUOTE

I haven't tried it, but it makes sense, and why didn't I think of that?
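As far as I understand the idea, it could be modelled in Python roughly as below. This is my reading of the quote, not tested BizTalk behaviour: the function names and the dictionary layout are made up, and the trailer is recognised by its "TT" prefix.

```python
def debatch_with_optional_trailer(lines):
    """One document per body line; the last one also carries the trailer."""
    header, *rest = lines
    documents = []
    for line in rest:
        if line.startswith("TT") and documents:
            # The optional trailer record attaches to the preceding document.
            documents[-1]["trailer"] = line
        else:
            documents.append({"body": line, "trailer": None})
    return header, documents


def drop_trailer(document):
    """Stand-in for the first map: keep the body and ignore the trailer."""
    return {"body": document["body"]}
```

Every document except the last has no trailer, and after the map step all documents have the same shape.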

--
eliasen

Tuesday, 05 December 2006 00:14:35 (Romance Standard Time, UTC+01:00)
