This is documentation for Semarchy xDI 2024.2, which is no longer actively maintained.

For more information, see our Global Support and Maintenance Policy.

Removing a UTF BOM header

Files saved in some UTF encodings begin with a special sequence of bytes called a "BOM" (Byte Order Mark).
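
As a rough illustration, the sketch below lists the BOM byte sequences of the common Unicode encodings and checks which one a file starts with. It is plain JavaScript and does not depend on Semarchy xDI; the `BOM_SIGNATURES` table and the `detectBom` function are hypothetical names chosen for this example.

```javascript
// BOM byte sequences for common Unicode encodings.
// Longer signatures come first because the UTF-32LE BOM starts with
// the same two bytes as the UTF-16LE BOM.
var BOM_SIGNATURES = [
    { encoding: "UTF-32BE", bytes: [0x00, 0x00, 0xFE, 0xFF] },
    { encoding: "UTF-32LE", bytes: [0xFF, 0xFE, 0x00, 0x00] },
    { encoding: "UTF-8",    bytes: [0xEF, 0xBB, 0xBF] },
    { encoding: "UTF-16BE", bytes: [0xFE, 0xFF] },
    { encoding: "UTF-16LE", bytes: [0xFF, 0xFE] }
];

// Hypothetical helper: returns the matching signature entry, or null if
// the given bytes do not start with any known BOM.
function detectBom(fileBytes) {
    for (var s = 0; s < BOM_SIGNATURES.length; s++) {
        var sig = BOM_SIGNATURES[s].bytes;
        var matches = fileBytes.length >= sig.length;
        for (var k = 0; matches && k < sig.length; k++) {
            if (fileBytes[k] !== sig[k]) { matches = false; }
        }
        if (matches) { return BOM_SIGNATURES[s]; }
    }
    return null;
}
```

The rest of this page deals with the UTF-8 case only, where the BOM is three bytes long.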

Example of a UTF-8 file with BOM viewed in an ISO-8859-1 editor:

ï»¿John;Doe
Jane;Jackson

When such files are read, these first characters may be treated as part of the data.
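
The effect can be reproduced with a small sketch in plain JavaScript, outside Semarchy xDI. It assumes a file whose first line is `John;Doe`, preceded by the UTF-8 BOM; `decodeLatin1` is a hypothetical helper written for this example.

```javascript
// Bytes of a UTF-8 file with BOM whose first line is "John;Doe"
var fileBytes = [0xEF, 0xBB, 0xBF, 0x4A, 0x6F, 0x68, 0x6E, 0x3B, 0x44, 0x6F, 0x65];

// ISO-8859-1 maps each byte to the Unicode code point of the same value
function decodeLatin1(bytes) {
    var text = "";
    for (var k = 0; k < bytes.length; k++) {
        text += String.fromCharCode(bytes[k]);
    }
    return text;
}

var firstLine = decodeLatin1(fileBytes);
var firstField = firstLine.split(";")[0];
// firstField is 7 characters long: the three BOM characters followed by "John"
```

A comparison such as `firstField === "John"` is therefore false until the BOM has been removed.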

To ignore these characters, a simple trick in Semarchy xDI is to add a transformation script to the File metadata, such as:

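/* i holds the value just read from the input, j counts the values read so far */
/* note: the first three values are dropped unconditionally, so use this only for files that start with a BOM */
i=0;
j=0;
do
{
    i=__in__.read();
    j=j+1;
    /* return data only after third character (the UTF-8 BOM is three bytes: EF BB BF) */
    if(j>3)
    {
        /* a negative value from read() marks the end of the input */
        if (i>-1)
        {
            ch=String.fromCharCode(i);
            i=ch.charCodeAt();
            __out__.write(i);
        }
    }
}while(i>-1);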
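
A more defensive variant would first check that the first three values really are the UTF-8 BOM before dropping them. The following is only a sketch of that check in plain JavaScript, working on an array of byte values. The `stripUtf8Bom` function is a hypothetical name, it does not use `__in__` or `__out__`, and it has not been tested as an xDI transformation script.

```javascript
// Hypothetical helper: drops the first three bytes only when they are the
// UTF-8 BOM (EF BB BF); otherwise returns an unchanged copy of the input.
function stripUtf8Bom(bytes) {
    var hasBom = bytes.length >= 3 &&
        bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF;
    return hasBom ? bytes.slice(3) : bytes.slice(0);
}

// Usage: the BOM is removed when present, and other input is left alone
var withBom = stripUtf8Bom([0xEF, 0xBB, 0xBF, 0x4A, 0x6F]);
var withoutBom = stripUtf8Bom([0x4A, 0x6F]);
```

Both calls yield the same two bytes, so a file without a BOM no longer loses its first three characters.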