15 Distributed Approach to Neuroinformatic Data Interchange

Šimon Řeřucha, Václav Přenosil
Masarykova Univerzita, Brno

15.1 INTRODUCTION

Despite the swift development in engineering and artificial intelligence, machine control remains solely a human task. Unlike the machine, the human is far more influenced by the environment: for instance, he or she is prone to fatigue and to accidental loss of concentration. Because of such factors, the human operator remains the weakest link in human-machine interaction.

A typical example of an operator in a human-machine system is the driver of a ground vehicle. Statistics have shown that a significant share of traffic accidents is caused by the driver paying insufficient attention to the traffic situation. One of the most common causes of decreased attention is fatigue, and probably its most serious consequence is a micro-sleep.

Vigilance, attention and sleep have been a focus of interest for neurologists for several decades. One of the most promising approaches is the search for relations between a person's biological state and his or her vigilance. A considerable amount of output from various research projects in this field has been collected so far, but the research faces some practical issues. The main data sources for such research are the outputs of experiments with human subjects, which are often time-consuming and require complex technical equipment. Acquiring enough data to perform a relevant statistical analysis is therefore often very difficult in comparison with the contribution of the experiment.

The straightforward solution is to define an appropriate technique for storing the input data for further use. In this contribution we introduce the concept of a new neuroinformatics database (called NIDB from now on) that will provide such functionality.
This contribution focuses mainly on the specification of the functional requirements placed on such a system, but a preliminary model is also presented.

The rest of this contribution is organized as follows. Section ‘Requirements’ describes the main aspects and specifies the functional requirements from several points of view. Section ‘Preliminary model’ briefly introduces the proposed architecture of NIDB. The final section mentions other interesting contributions of NIDB and outlines future work on its development.

15.2 REQUIREMENTS

NIDB is intended to become a set of tools (also referred to as a platform) for the collection, management and interchange of neuroinformatic data. The basic properties of this data are that it is heterogeneous, voluminous and sensitive. Individual data items (records) are “bulky” in comparison with the capacity of current storage and the transfer speeds of contemporary computer networks. Sensitivity means that the data is treated as personal; it is therefore protected by law, and special care must be taken when handling it. The reason for collecting and organizing the data is to share it among several research groups and to enable further research on it.

15.3 General requirements

From the functional point of view, the core of NIDB will be a specialized Database Management System (DBMS). A DBMS is an application layer between the physical storage of data and the client applications that need to work with that data. For our purposes, a generic DBMS can be decomposed into the following three functional layers:

· methods of data storage and data management,
· the DBMS functionality performed on top of the stored data (called core functionality in this section),
· an interface that allows clients to use the DBMS functionality.
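The three-layer decomposition above can be illustrated with a minimal Python sketch. All names here (`Storage`, `InMemoryStorage`, `Core`, `handle_request`) and the dictionary-based request format are assumptions made for illustration only; the text does not prescribe any of them.

```python
from typing import Protocol


class Storage(Protocol):
    """Layer 1: data storage and data management (hypothetical interface)."""

    def put(self, key: str, value: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStorage:
    """Toy storage back-end used only for this illustration."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value

    def get(self, key: str) -> bytes:
        return self._items[key]


class Core:
    """Layer 2: functionality performed on top of the stored data."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def store(self, key: str, value: bytes) -> int:
        """Store a value and return its size in bytes."""
        self._storage.put(key, value)
        return len(value)

    def retrieve(self, key: str) -> bytes:
        return self._storage.get(key)


def handle_request(core: Core, request: dict) -> dict:
    """Layer 3: the interface through which clients reach the core."""
    if request["op"] == "store":
        size = core.store(request["key"], request["value"])
        return {"status": "ok", "size": size}
    if request["op"] == "retrieve":
        return {"status": "ok", "value": core.retrieve(request["key"])}
    return {"status": "error", "reason": "unknown operation"}
```

Because the core depends only on the `Storage` interface, a different back-end could be substituted without changing the core or the client-facing layer.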
The system must incorporate additional functionality, especially because of the specific nature of the data (volume, sensitivity):

· the ability to work in a distributed environment,
· access control.

These individual components and features are discussed in the following separate sections.

Finally, NIDB must include a set of accompanying applications. This could include:

· user interfaces suited to the needs of particular users,
· conversion tools to match the “standards” used by hardware manufacturers,
· interconnections with the applications and software environments used for research on the managed data.

15.4 Core functionality

The core functionality is a defined set of operations that the NIDB core is able to perform. Requests are received from a client through a defined communication interface using a defined protocol. The core executes a sequence of operations on the managed data and finally passes a response back to the client.

The required set of operations is similar to that performed by traditional (R)DBMSs. The basic tasks remain the same; we need functionality for:

· data definition and manipulation,
· data retrieval.

Since managed data of the same domain are expected to be stored with different internal structures (e.g. the hardware vendors of EEG measuring equipment use different techniques and formats to store data), such formats must be recognized by the core, and the core must be able to provide a data item in the format that a client application requests. The most straightforward and convenient way is to design a modular architecture with a standardized interface to additional modules that provide the same functionality using different underlying protocols.

Regarding search operations, the main requirement is to provide as wide a range of search criteria as possible. Raw performance and efficiency are not a primary issue.
However, the specific requirements for query operations depend on a deeper analysis of the data items and the relations among them, which goes beyond the scope of this text.

Besides the operations required for application processing, there is another class of operations, usually called “maintenance procedures”. Within the NIDB core we need the following:

· configuration routines,
· data integrity and consistency checks,
· operational statistics and logging.

These procedures are not crucial to the function of the NIDB core, but they are useful during its operation. Finally, the aspects of access control and distributed working environments are analyzed later in separate subsections.

15.5 Data management

The data that NIDB will manage fall into the following categories:

· descriptive data (meta-data),
· primary (measured) data,
· secondary (derived) data.

The set of “primary data” consists of the data acquired from experiments (e.g. an EEG record, a reaction time). The secondary data are the results of analyses of the primary data (e.g. the correlation of the reaction time and the EEG spectra). The meta-data are data about data in the sense of DBMS terminology. This terminology has been established to avoid ambiguity in documents concerning NIDB and neurological experiments.

Two basic problems need to be solved within data management:

· data definition and manipulation (e.g. storage, importing, describing),
· data extraction (searching, querying).

The fundamental difference is that the former must be robust and reliable so that the database contains relevant data, whereas the latter must be fast and accurate in order to be of use.

Regarding data manipulation, NIDB will take care of two distinct planes:

· the basic data (both primary and secondary),
· the relations among them.

So far, the basic data are usually stored within the data storage of individual workplaces.
Considering how difficult it is to manipulate large amounts of data, we want NIDB to respect the present (file-based) structure of the users' data storage; in fact, we want to minimize data transfers and movements. Another method of primary data storage is export upon request from existing information systems aimed at neurology and similar fields. Consequently, both approaches require methods to track changes in the data and to check its integrity. The demand for modularity, and for the possibility to choose the proper method depending on local conditions, follows directly. It also implies the design of a suitable interface between the “storage” modules and the main body of the NIDB core.

Individual data items need to be supplied with information that is not included within the data item itself. Each data item must have an “envelope” that holds information such as the origin of the item, the data format used, etc. It must be ensured that the “envelopes” cannot easily be interchanged.

From the inner point of view, the items are individual records of some quantities in time (e.g. several tracks of an EEG record). For simpler manipulation we need the possibility to append an arbitrary piece of information to an individual track, a time marker or a period of time. The piece of information could be, for example, a textual note or a reference to another item in NIDB. To achieve this, information about the internal structure of the item must be present in the “envelope”.

As a consequence of this approach we can omit the basic data from the discussion of data handling, because we expect them to be left in their current state. We only need to “describe” them, so the data-handling problem reduces to the handling of meta-data.

A relation can exist between two data items, between a data item and a particular piece of information within another item, or between two particular pieces of information.
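The envelope, its attached pieces of information and the three kinds of relation listed above might be represented along the following lines. This is only a sketch under stated assumptions: the class and field names (`Envelope`, `Annotation`, `Ref`, `Relation`, `describe`) are hypothetical, and SHA-256 is chosen here merely as one example of a hash that lets an envelope detect changes in the data item it describes.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Annotation:
    """A piece of information attached to a track, a time marker or a period."""
    note: str
    track: Optional[str] = None      # None means the annotation covers all tracks
    start_s: Optional[float] = None  # time marker, or start of a period
    end_s: Optional[float] = None    # set only when a period is annotated


@dataclass
class Envelope:
    """Hypothetical envelope describing a data item that stays where it is."""
    item_id: str
    origin: str
    data_format: str
    tracks: list[str]
    sha256: str                      # assumed choice of hash function
    annotations: list[Annotation] = field(default_factory=list)

    def matches(self, payload: bytes) -> bool:
        """Return True if the payload is unchanged since it was described."""
        return hashlib.sha256(payload).hexdigest() == self.sha256


def describe(item_id: str, origin: str, data_format: str,
             tracks: list[str], payload: bytes) -> Envelope:
    """Create an envelope for a payload without copying the payload itself."""
    digest = hashlib.sha256(payload).hexdigest()
    return Envelope(item_id, origin, data_format, tracks, digest)


@dataclass(frozen=True)
class Ref:
    """Points to a whole item, or to one annotation inside an item."""
    item_id: str
    annotation_index: Optional[int] = None


@dataclass(frozen=True)
class Relation:
    """Covers item-item, item-annotation and annotation-annotation relations."""
    kind: str
    source: Ref
    target: Ref
```

Binding the hash of the payload into the envelope is also one possible way to make envelopes harder to swap between items, since an envelope attached to the wrong item would fail the `matches` check.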
We need to define a generic method that can handle all these types of relation. Such a “relation” can itself be related to another item, piece of information or relation. A typical example is the relation between a primary data set and a derived secondary set.

The organization of the meta-data for efficient searching is a superstructure over the meta-data structure established for data manipulation. It will probably require some data redundancy, and its structure depends on the chosen search algorithm. This issue is subject to further research.

15.6 Interfaces

Several classes of interfaces within NIDB have been mentioned so far:

· user interfaces,
· interfaces between modules within the NIDB core,
· the interface between the NIDB core and client applications.

The NIDB core will not provide any user interface; users will operate a client application that communicates with the NIDB core. Client applications are discussed later in section “Client applications”.

The design of the interfaces between modules within the NIDB core is a matter for functional analysis. The only requirement so far could be called “platform independence” or “interoperability”: it must be possible to develop modules in an environment different from that of the operational part of the NIDB core.

The interface between the NIDB core and client applications is required to be implemented using one of the current “standards” for data exchange (e.g. XML, SQL). The reason is evident: since the “standard” facilities are supported in many software development environments, this will simplify the implementation of client applications.

15.7 NIDB in distributed environment

The previous sections indicate that NIDB is expected to serve several (geographically distant) workplaces. The first aspect to resolve is whether to design NIDB as a centralized system or as a specialized network of stand-alone nodes (a distributed system, an overlay network).
The first approach is simpler to design and implement, but it is clearly inconvenient because of the large volume of managed data. It would probably bring technical problems during implementation and operation (e.g. establishing and managing large data storage and connectivity) and would also limit scalability.

The latter approach poses more problems during design and implementation, but it has significant advantages. The most important is that the data will be stored within the node of their origin and will be transferred only if another node requests them. There will be no need for central storage, and the data will not be stored more than once unless requested. Since the amount of data is expected to be large, minimizing data transfers will be a substantial benefit. There are further advantages:

· the distributed approach does not limit scalability as much as the centralized model,
· it allows customization to local conditions (on individual nodes),
· it avoids a single point of failure.

We have already specified some functionality that we require from the NIDB core. If we assume that an instance of the NIDB core acts as a node within a collaborative network, we need to extend the functionality of the NIDB core appropriately. This means solving the following problems concerning communication among individual nodes:

· the topology,
· the communication protocol,
· the functional requirements placed on the interconnection.

15.8 Access control

NIDB must incorporate support for the following three essential aspects of access control:

· Authentication (and Identification),
· Authorization,
· Accounting.

User authentication is required to identify the user and verify his or her identity for the purposes of authorization and accounting. Because it is expected to be used in different environments, NIDB should provide a user-based identification and authentication mechanism.
The aim of the authorization mechanism is to define which users are permitted to do what with which data. Finally, accounting has to ensure that every modification of the managed data is clearly linked with the user who performed the operation.

15.9 Client applications

The fundamental requirement placed on the user interfaces is to allow the user to make effective use of the functionality provided by the system. The functional and behavioral requirements depend on the purpose of the particular tool. At this moment we can only define a few classes of applications that will be of use within NIDB:

· user interfaces that allow the data to be managed, imported and modified,
· visualization tools,
· conversion tools,
· libraries that provide an interface for other development environments.

15.10 PRELIMINARY MODEL AND CONCEPTS

This section briefly describes a preliminary model of NIDB that complies with the analysis and the requirements summarized in the previous sections.

The whole NIDB system will consist of separate collaborating nodes. These nodes will form a virtual network with a peer-to-peer topology. Each node (called an NIDB node) will implement the core functionality specified in section Requirements.

Fig. 1: The preliminary NIDB node model overview

The NIDB core will consist of several modules, each providing a part of the functionality; the proposed structure is shown in Figure 1. The central point will be the core module, which will control the operation of the entire NIDB node. Its main tasks will be launching and managing the other modules and routing messages among them.

The AAA module will take care of authentication, authorization and accounting. It will load the necessary initial data from a selected configuration module and log via one of the logging modules.

The set of interface modules will provide the standardized communication interfaces for client applications and other NIDB nodes.
Such a module will ask the AAA module for user verification and route clients' requests to the core, which will invoke the proper storage modules.

The set of storage modules will perform operations on the physically stored data using various back-ends (e.g. flat files, an XML database, a relational DBMS).

The configuration and logging modules will enable NIDB node operators to set up and manage the node itself. Both types will be able to load and/or store configuration parameters and log records in several ways (they may use simplified storage modules).

The nodes are expected to communicate over an IP network using a specialized protocol based on the interchange of XML messages (for example XMPP). Particular data items will be available either on-line or off-line (a library-like approach).

NIDB will use several cryptographic methods for several purposes. A cryptographic hash function will be used to track changes in the primary and secondary data and to verify their integrity, and a digital signature scheme will be used to identify the origin of the data. User authentication will use a public-key cryptosystem to verify the client's identity.

15.11 CONCLUSION

This contribution presents a preliminary model of NIDB, a new tool for research in the field of neuroinformatics. The text focuses on a discussion of the fundamental aspects of such a system, and its conclusions are used to draft the architecture of the NIDB system. The model of the NIDB architecture is described at the level of functional blocks and basic data flows.

The development of NIDB will continue within the scope of CNNN research activities, currently supported by project ME 949 "The analysis of negative influences on driver drowsiness" in cooperation with other CNNN participants.

References

[1] Bouchner P.: Driving simulators for HMI research (PhD thesis). Faculty of Transportation Sciences, CTU, Prague, 2007.
[2]
Novak M., Faber J., Votruba Z.: Problem of Reliability in Interactions between Human Subject and Artificial Systems (First Book on Micro-Sleeps). Neural Network World – monographs edition. CTU & ICS AS CR, Prague, 2004. ISBN 80-903298-1-0.
[3] Novak M. (ed.): Neurodynamic and Neuroinformatics Studies (Second Book on Micro-Sleeps). Neural Network World – monographs edition. CTU & ICS AS CR, Prague, 2005. ISBN 80-903298-3-7.
[4] Novak M., Faber J., Tichy T., Kolda T.: Project of Micro-Sleep Base. Research Report No. LSS 112/01, CTU, Prague, 2001.