Data Acquisition
The Learning Wizard can acquire data for learning from two different
sources: plain text files and databases.
The plain text files must conform to a certain standard:
- Each line in the file must consist of a list of values
- The values must be separated by a separator, with the default
separator being ",".
- All lines must contain the same number of values
- The following symbols are accepted as missing values: "N/A",
"null", "<null>"
A sample data file could be:
field1,field2,field3
true,true,false
N/A,true,true
false,false,true
false,true,false
false,N/A,false
Using this data file results in a model with three nodes named
field1, field2, and field3. Each node has the states true and
false.
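The format rules and the sample above can be sketched in a few lines of Python. This is only an illustration of the rules as stated here, not the Learning Wizard's own parser; the function name read_states and the constants are hypothetical.

```python
# Illustrative sketch only: names below are assumptions, not part of the tool.
MISSING = {"N/A", "null", "<null>"}  # missing-value symbols listed above

SAMPLE = """field1,field2,field3
true,true,false
N/A,true,true
false,false,true
false,true,false
false,N/A,false
"""


def read_states(text, separator=","):
    """Return the observed states per field, ignoring missing values."""
    rows = [line.split(separator) for line in text.splitlines() if line]
    names, records = rows[0], rows[1:]

    # Rule: all lines must contain the same number of values.
    for number, record in enumerate(records, start=2):
        if len(record) != len(names):
            raise ValueError(
                f"line {number}: expected {len(names)} values, got {len(record)}"
            )

    states = {name: set() for name in names}
    for record in records:
        for name, value in zip(names, record):
            if value not in MISSING:
                states[name].add(value)
    return {name: sorted(values) for name, values in states.items()}


print(read_states(SAMPLE))
```

For the sample file, every field ends up with the two states false and true, because the N/A entries are skipped rather than treated as a third state.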
The specification of the text file is entered using the fields in the
text file pane.
The path to the data file can either be typed manually in the text
field, or the "Browse" button can be used to locate the data file.
If the data file has an encoding other than the default "UTF-8", the correct
encoding should be selected in the encoding drop-down box.
If the data file does not contain the names of the fields, the "Read
node names from file" checkbox should be unchecked. This will cause
the Learning Wizard to treat the first line of the file in the same
way as any other line. Furthermore, the nodes will get generic names
(i.e., Field_n, where n = 0, ..., m).
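As a rough sketch of the behaviour described above, the following Python function either takes the node names from the first line or generates generic Field_n names and keeps the first line as data. The function name split_header and the read_names parameter are hypothetical stand-ins for the checkbox, not part of the Learning Wizard.

```python
def split_header(lines, separator=",", read_names=True):
    """Return (node_names, data_lines) for a list of text lines.

    read_names mirrors the "Read node names from file" checkbox
    (an assumed mapping, for illustration only).
    """
    if read_names:
        # First line holds the field names; the rest is data.
        return lines[0].split(separator), lines[1:]
    # First line is ordinary data; generate Field_0, Field_1, ...
    width = len(lines[0].split(separator))
    return [f"Field_{n}" for n in range(width)], lines


names, data = split_header(["true,true,false", "N/A,true,true"], read_names=False)
print(names)  # generic names, one per column
```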
If the fields in the text file are not separated by one of the
standard separators (or if one or more of the standard separators are
used in the field values), the user should specify a separator.
This may actually be a set of separators (each character typed in the
separator specification field will be treated as a separator), but
this is not recommended.
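To illustrate why a set of separators can be risky, here is a minimal Python sketch in which every character of the separator specification splits the line. It assumes the behaviour described above and uses the hypothetical name split_values; it is not the Learning Wizard's implementation.

```python
import re


def split_values(line, separators=","):
    """Split a line on any single character found in `separators`."""
    if not separators:
        raise ValueError("at least one separator character is required")
    # re.escape keeps characters such as "|" or "-" literal inside the class.
    pattern = "[" + re.escape(separators) + "]"
    return re.split(pattern, line)


print(split_values("a;b|c", ";|"))
print(split_values("1,5;2,5", ";,"))
```

In the second call, the decimal commas inside the values are also treated as separators, so two intended values become four. This is the kind of surprise that makes a multi-character separator specification easy to get wrong.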
Currently, the Learning Wizard can access data from the following
databases: Oracle 8i, ODBC, and JDBC.
To access an Oracle 8i database, the Learning Wizard needs five pieces
of information:
- The host name of the host running the database server
- The port used for connecting to the database server (defaults to
1521)
- The SID (i.e., name) of the specific database
- The username for the user who wishes to connect to the database
- The password for the user
Once this information has been given, the "Next" button will be
enabled, and the Learning Wizard is ready to configure the tables which
are to be used for acquiring the data.
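The five pieces of information and the condition for enabling "Next" can be modelled as a small record. This is a hypothetical sketch in Python: the class OracleSettings and its is_complete method are illustrative names, and the completeness check is an assumption about what "has been given" means.

```python
from dataclasses import dataclass


@dataclass
class OracleSettings:
    """The five connection values listed above (illustrative model only)."""
    host: str = ""
    sid: str = ""
    username: str = ""
    password: str = ""
    port: int = 1521  # default port stated in the text

    def is_complete(self):
        """True when every value is filled in and the port is a valid TCP port."""
        text_fields = (self.host, self.sid, self.username, self.password)
        return all(text_fields) and 0 < self.port < 65536


settings = OracleSettings(host="dbhost", sid="ORCL", username="scott", password="tiger")
print(settings.is_complete())
```

The host, SID, and credentials in the usage line are placeholder values.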
ODBC (Open Database Connectivity) is an application programming
interface that allows transparent access to a wide range of databases.
For the Learning Wizard to access an ODBC database, the database must
be registered with the ODBC manager. Once this is done, the only piece
of information needed to access the database is the name of the
database.
To access any JDBC database, the Learning Wizard needs the following
information:
- The URL of the database, in the form
"jdbc:DATABASE VENDOR://HOST:PORT NUM/DATABASE NAME" (e.g.,
"jdbc:derby://localhost:1527/sampleDB")
- The username for the user who wishes to connect to the database
- The password for the user
- The path to the directory where the database's drivers are
located. This can be set by pressing the "Set Driver" button or in
Menu - Options - Preferences - Database Driver Directory.
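The URL pattern given above can be assembled and checked with a short Python sketch. It covers only the vendor://host:port/name form described here; real JDBC URLs vary by vendor, and the function names jdbc_url and parse_jdbc_url are hypothetical.

```python
import re

# Matches only the form described above: jdbc:VENDOR://HOST:PORT/DATABASE
JDBC_PATTERN = re.compile(
    r"^jdbc:(?P<vendor>[^:/]+)://(?P<host>[^:/]+):(?P<port>\d+)/(?P<database>.+)$"
)


def jdbc_url(vendor, host, port, database):
    """Build a URL from its four parts."""
    return f"jdbc:{vendor}://{host}:{port}/{database}"


def parse_jdbc_url(url):
    """Split a URL of the above form into its parts, or raise ValueError."""
    match = JDBC_PATTERN.match(url)
    if match is None:
        raise ValueError(f"not in jdbc:VENDOR://HOST:PORT/DATABASE form: {url!r}")
    parts = match.groupdict()
    parts["port"] = int(parts["port"])
    return parts


print(jdbc_url("derby", "localhost", 1527, "sampleDB"))
print(parse_jdbc_url("jdbc:derby://localhost:1527/sampleDB"))
```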