How to Prepare a Config File

  1. Create an ASCII data format file
  2. Create a config file base framework
  3. Edit the base framework to create a config file
  4. Syntax check the config file

  1. Create an ASCII data format file

    Starting from a suitable sample data file, create an ASCII data format file from which a base config file framework will be built. For example, a sample data file might look like this:
      # Last Name, First Name, Address Line 1, Address Line 2, City, State, Zip,... 
      Jones,Fred,Box 18,512 Maine St.,Hamburg,PA,16000,...
      Baker,Betty,1733 State Dr.,,Philmore,CT,06516,...  
    
    or like this:
      # Fruit Barrel Data file #1
      # 20020515
      # separator |
      # fields
      # ID|Date|Company|State|...|Tag|Item|Price|Unit
      097632|20020501|FruitCo|CA|...|FC|apples|3.95|pound
      064513|20020502|FruitCo|CA|...|FC|pears|.95|each
    

    Important: if you copy and paste these samples, take care with tabs and spaces, because tabs paste as spaces!

    ASCII data format file

    An ASCII data format file consists of the following parts:
    1. An optional comment block. All comment lines must begin with #.
      DO NOT intersperse comments with the formatting information below!
      Once the file format section starts, all # comments are considered
      descriptive of the current family!
    2. Information for one or more families; each family section looks like
            ==> example_filename <==
            # descriptive commentary
            fieldname1,fieldname2,fieldname3...
      
    Blank lines will be ignored.

    If you are lucky, you can generate this file by running head on your sample data files. The ==>...<== format was, in fact, chosen because it is what head prints before each file when given more than one file.
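
    To make the layout concrete, here is a minimal Python sketch that reproduces what head prints for several files. It is only an illustration: the function name head_style and the file names contacts0515 and fruit0515 are hypothetical, and the sketch works on in-memory text rather than real files.

```python
def head_style(files, count=3):
    """Render (name, text) pairs the way `head` renders several files."""
    blocks = []
    for name, text in files:
        first_lines = text.splitlines()[:count]
        blocks.append("\n".join(["==> %s <==" % name] + first_lines))
    # head separates consecutive files with one blank line.
    return "\n\n".join(blocks) + "\n"


# Hypothetical sample data, cut down from the samples shown earlier.
contacts = "# Last Name, First Name, City\nJones,Fred,Hamburg\nBaker,Betty,Philmore\n"
fruit = "# ID|Date|Company\n097632|20020501|FruitCo\n"

print(head_style([("contacts0515", contacts), ("fruit0515", fruit)], count=1))
```

    In the samples shown earlier the field names sit on a # line, so presumably the leading # would have to be removed by hand before that line is read as field names rather than as commentary.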

    If you are less lucky, you'll need to create the file in a text editor, using the required format. Paste in the field names from the online documentation (if available), from a sample file, or from whatever other source you have at your disposal.

    Field names may break onto multiple lines; however, do not break in the middle of a "word". Don't forget to include the field separator between the field names! Do not start a line with the field separator; ending a line with a separator is optional.

    Example

    ==> sample0515 <==
    # Sample Contact Data
    
    First Name, Last Name, Address Line 1, Address Line 2, City, State, Zip, Phone, Fax, Mobile, Email, Company
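
    The format described above can be read mechanically. The following is a minimal Python sketch of such a reader, not the logic of write_config.pl itself; the function name parse_format_file and the dictionary layout it returns are hypothetical, and a comma separator is assumed by default.

```python
import re

# Matches a family header line such as "==> sample0515 <==".
FAMILY_HEADER = re.compile(r"^==>\s*(.+?)\s*<==$")


def parse_format_file(text, sep=","):
    """Split data format text into a list of family dictionaries."""
    families = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        match = FAMILY_HEADER.match(line)
        if match:
            current = {"filename": match.group(1), "comments": [], "fields": []}
            families.append(current)
        elif current is None:
            continue  # skip everything before the first family header
        elif line.startswith("#"):
            # Inside a family, # lines describe that family.
            current["comments"].append(line.lstrip("#").strip())
        else:
            # Field names may continue over several lines; a trailing
            # separator leaves an empty piece, which is dropped.
            pieces = [p.strip() for p in line.split(sep)]
            current["fields"].extend(p for p in pieces if p)
    return families


sample = """\
# optional leading comment block
==> sample0515 <==
# Sample Contact Data

First Name, Last Name, Address Line 1,
Address Line 2, City, State, Zip
"""

families = parse_format_file(sample)
print(families[0]["filename"], families[0]["fields"])
```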
    

  2. Create a config file base framework

    Once you have created the ASCII data format file, use write_config.pl to create a base framework for a config file.
    write_config.pl -S, -f dataformat_file > file-config.xml
    

    write_config.pl extracts as much information as possible from the data format file, including file name (to be converted to file name format), descriptive comments, and field headers (what each field is). Defaults are used as appropriate when setting up the framework. The framework is in "OML" ("Ostensible Markup Language") format. OML is essentially XML format; we'll use OML and XML interchangeably.

    Note that the resulting config framework file is not ready to hand to a parser; it must first be reviewed and edited by someone familiar with the family or families it describes.

  3. Edit the base framework to create a config file

    Some fields in the generated framework file need to be created by hand. These are flagged by a mnemonic word bracketed by # characters, e.g. #STRING#:
     <family seq="0" family_id="#STRING#" > 
     ...
           <comment_char>#CHAR#</comment_char>
           ...
           <skip type="#TYPE#lines|until">#VALUE#</skip>
    
    In addition, the information for each generated field must be checked and confirmed. Be sure to refer to the Tag definitions and descriptions document when editing a new configuration file.
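
    Before handing an edited file on, it can help to confirm that no placeholder was left behind. The following Python sketch is one way to do that; it assumes every placeholder is an upper-case word bracketed by # characters, as in the fragment above, and the function name find_placeholders is hypothetical.

```python
import re

# A placeholder is an upper-case word bracketed by '#', e.g. #STRING#.
PLACEHOLDER = re.compile(r"#[A-Z]+#")


def find_placeholders(config_text):
    """Return (line_number, placeholder) pairs still present in the text."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((number, match.group(0)))
    return hits


# Fragment modelled on the generated framework shown above.
framework = """\
<family seq="0" family_id="#STRING#" >
  <comment_char>#CHAR#</comment_char>
  <skip type="#TYPE#lines|until">#VALUE#</skip>
</family>
"""

for number, placeholder in find_placeholders(framework):
    print("line %d: %s" % (number, placeholder))
```

    A lone # (for example as the value of comment_char) is not reported, because the pattern requires at least one letter between the two # characters.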

    There are three sections you must edit carefully or the Parser, IDMapper, and Loader routines will not be able to properly process your data.

    <parser> - Fields to be parsed

    In the parser section, one or more fields must be marked for extraction during parsing (set parse="1" to extract a field). Be sure to set tag="value" (where value is both unique and mnemonic) at the same time. The value of tag will be used as the variable name in which this field's data will be stored when parsed.

    <parser> Example

    <field pos="1"   header="Last Name"
        parse="1" tag="last"     />
    <field pos="2"   header="First Name"
        parse="1" tag="first"     />
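
    The two rules for parsed fields (at least one field marked, each with a unique name) can be checked mechanically. The Python sketch below assumes, as in the example above, that the name is carried by the tag attribute and that the <field> elements sit inside a <parser> element; the function name check_parsed_fields is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_parsed_fields(parser_xml):
    """Return a list of problems found among <field> elements."""
    problems = []
    seen = {}  # tag value -> pos of the field that first used it
    for field in ET.fromstring(parser_xml).iter("field"):
        if field.get("parse") != "1":
            continue  # field is not extracted, nothing to check
        pos = field.get("pos")
        tag = field.get("tag")
        if not tag:
            problems.append("field %s is parsed but has no tag" % pos)
        elif tag in seen:
            problems.append("tag %r used by fields %s and %s" % (tag, seen[tag], pos))
        else:
            seen[tag] = pos
    if not seen and not problems:
        problems.append('no field is marked parse="1"')
    return problems


good = """<parser>
  <field pos="1" header="Last Name" parse="1" tag="last" />
  <field pos="2" header="First Name" parse="1" tag="first" />
</parser>"""
# Deliberately reuse a tag to show a reported problem.
bad = good.replace('tag="first"', 'tag="last"')

print(check_parsed_fields(good))
print(check_parsed_fields(bad))
```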
    

    <loader> - Database Table Loading information

    In the loader section, create a set of configuration blocks to tell the Loader which database tables (and fields) should be loaded with the parsed data. Each block must start with a surrounding <table...> element with a name parameter; this specifies the database table to be loaded, e.g.
    <table name="t_asset_family">
    
    Each block must also contain one or more <load...</load> elements describing which fields will be loaded (and with what data), e.g.
    <load type="string" global_tag="equity_dt">from_dt</load>
    
    Be sure to include a <load...</load> element for every field in the table (omit only fields which are not required to have data). There are no defaults in loading; all items must be specified. The content portion of the load element specifies the field to load; the parameters specify where the Loader will find the necessary data in the incoming data stream and how that data is to be formatted.

    <loader> Example

    The following example shows part of the loader section of a sample configuration file. Be sure to refer to the loader section of the Tag definitions and descriptions document when editing a new configuration file.
    <loader>
       <table name="contacts">
          <load type="numeric" record_tag="contact">contact</load>
          <load type="string" global_tag="todays_date">todays_date</load>
          <load type="string" global_tag="family_id">family_id</load>
          <load type="string" field_tag="last">f_last</load>
          <load type="string" field_tag="first">f_first</load>
          ...
       </table>
    </loader>
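
    One consistency check that may be worth automating is whether every field_tag used in the loader section names a field that the parser section actually extracts. The Python sketch below rests on assumptions that this document does not confirm: that field_tag refers to the tag of a parsed <field>, and that both sections share one root element (here a hypothetical <config>). The function name unknown_field_tags is also hypothetical.

```python
import xml.etree.ElementTree as ET


def unknown_field_tags(config_xml):
    """Return (table, field_tag) pairs with no matching parsed field."""
    root = ET.fromstring(config_xml)
    # Assumption: only fields with parse="1" are available to the Loader.
    parsed = {f.get("tag") for f in root.iter("field") if f.get("parse") == "1"}
    missing = []
    for table in root.iter("table"):
        for load in table.iter("load"):
            field_tag = load.get("field_tag")
            if field_tag is not None and field_tag not in parsed:
                missing.append((table.get("name"), field_tag))
    return missing


# Hypothetical config: "first" is defined but not marked for parsing.
config = """<config>
  <parser>
    <field pos="1" header="Last Name" parse="1" tag="last" />
    <field pos="2" header="First Name" parse="0" tag="first" />
  </parser>
  <loader>
    <table name="contacts">
      <load type="string" global_tag="family_id">family_id</load>
      <load type="string" field_tag="last">f_last</load>
      <load type="string" field_tag="first">f_first</load>
    </table>
  </loader>
</config>"""

print(unknown_field_tags(config))
```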
    

  4. Syntax check the config file

    The Parser has a syntax-checking mode. If run with the -c flag, the Parser only checks the syntax of the named config file. In this mode, you can optionally specify a family (with -U); by default, the Parser checks the syntax of all families it finds. Run
        Parser.pl -c -v -f configfile.xml
    
    Replace configfile.xml with the name of the config file to check.
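
    Since OML is described above as essentially XML, a generic XML parser can catch plain well-formedness mistakes (such as an unclosed element) before the Parser is run. The Python sketch below is such a pre-check under that assumption; it does not replace Parser.pl -c, which knows the config rules, and the function name well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def well_formed(config_text):
    """Return (True, "") or (False, message) for the given XML text."""
    try:
        ET.fromstring(config_text)
    except ET.ParseError as error:
        return False, str(error)
    return True, ""


# A closed <table> element parses cleanly.
ok, message = well_formed('<loader><table name="contacts"></table></loader>')
print(ok, message)

# A missing </table> is reported as a parse error.
ok, message = well_formed('<loader><table name="contacts"></loader>')
print(ok, message)
```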