RAPIDS Manual

Table of contents

Introduction

Monitoring

How to Install RAPIDS
RAPIDS Configuration Files
RAPIDS Client Interface
RAPIDS Displays
Events and Metrics

How to Replay an Experiment

Troubleshooting

The RAPIDS API

Connecting to the Real-Time Stream of RAPIDS Events

Reporting bugs

Introduction

The Center for Collaborative Adaptive Sensing of the Atmosphere (CASA) develops networks of short-range radars that overcome the effects of the Earth's curvature, which prevent long-range radars from scanning the lower parts of the atmosphere. The networks are built using a new approach called Distributed Collaborative Adaptive Sensing (DCAS): a large number of small radars operate collaboratively within a dynamic information technology infrastructure, adapting to changing atmospheric conditions in a manner that meets competing end user needs.

The DCAS system is therefore a "closed loop": the cycle starts at the radar nodes, where data is collected in real time and sent to the System Operations Control Center (SOCC). At the SOCC, the data is processed by a suite of meteorological algorithms that look for hazardous weather features. Based on the detected features, a set of radar tasks is generated. The optimization module then decides which tasks should be accomplished during the next radar scan and generates the corresponding radar scanning parameters. The cycle ends when the radars receive the parameters for the next scan.

The RAPIDS suite of software for monitoring, testing and validating real-time distributed systems has been developed to serve as a common platform for carrying out tasks that are crucial to the success of DCAS system development:
  • Monitoring of the DCAS data flow and resource utilization

Figure 1: RAPIDS collects information about the data flow in the DCAS system: every time a radar product, meteorological feature, task or control signal is sent or received, RAPIDS is notified. DCAS applications may also report errors, send debug messages, notify RAPIDS when algorithms enter different stages, etc. Finally, RAPIDS collects data about the utilization of various system resources, such as CPU, memory and disk space.

RAPIDS: Main Features


Monitoring

RAPIDS consists of three components:
  1. A number of RAPIDS clients. Each client provides a GUI for system monitoring.
  2. A number of Local Monitoring Modules (LMMs). One LMM runs on each of the emulator machines employed in an experiment. The LMMs collect various events and metrics and send them to the RAPIDS server.
  3. The RAPIDS server. It manages the LMMs and forwards the collected events and metrics to the clients. When started, the server must be given a configuration file that describes the experiment; the LMMs are set up according to this file. The server is also responsible for shutting down all LMMs when the experiment is stopped.

How to install RAPIDS

  1. Configure SSH, so that you can log in to all the Emulator machines without having to type in a password. This is needed for starting Local Monitoring Modules (LMMs) remotely, as well as for launching processes under ldm and WDSSii logins. Here is what to do:
    1. Log in to emmy0 as yourself.
    2. Make sure that your home directory is not group-writeable.
    3. Go to the .ssh directory.
    4. Generate your public/private key pair for SSH protocol:

      ssh-keygen -t rsa

      Use the default locations for the files; press Return when asked for a passphrase.

    5. Check if you have the authorized_keys file. If you don't, create it as follows:

      cat id_rsa.pub > authorized_keys

      If you do have the file, add the public key to it:

      cat id_rsa.pub >> authorized_keys

      This should enable you to log in to all the Emulator machines under your own login without providing a password.

    6. Finally, add your id_rsa.pub to the authorized_keys in ldm and WDSSii accounts on emmy5 and emmy1. This will enable you to start processes under these logins remotely without typing in passwords.
  2. Modify your .cshrc file so that it contains the following lines:

    setenv RAPIDS_HOME /usr/share/rapids
    setenv JAVA_HOME /usr/java/jdk1.5.0_01/
    setenv PATH .:${RAPIDS_HOME}/bin:${RAPIDS_HOME}/utils/bin:${JAVA_HOME}/bin:${PATH}
    setenv CLASSPATH ${HOME}/rapids_2.0/Server:${HOME}/rapids_2.0/Server/StreamAPI:${HOME}/rapids_2.0/Client:${HOME}/rapids_2.0/Client/sgt.jar:.
    setenv LD_LIBRARY_PATH ${HOME}/rapids_2.0/Server/server:${LD_LIBRARY_PATH}

    Source the edited .cshrc:

    source .cshrc

  3. Configure RAPIDS settings.
    1. In your home directory create a subdirectory .rapids.
    2. In the .rapids directory, create a file named config. The RAPIDS server uses this file to determine where to store the log files. Here is an example of a config file:

      LOG_PATH = /nfs/erc/users2/abekkerm/rapidsLogs

    3. In the .rapids directory, create a file named personal_settings. The RAPIDS visualization client uses this file to determine a) the address of the RAPIDS server and b) where to read the log files from. Here is an example of a personal_settings file:

      SERVER_ADDR = emmy0.casa.umass.edu
      LOG_PATH = /nfs/erc/users2/abekkerm/rapidsLogs

  4. Download and compile the RAPIDS project.

    Note: The RAPIDS project is stored in the CASA SVN repository, so you will need an account there in order to download the code.

    1. Log in to emmy2.
    2. In your home directory create a subdirectory rapids_2.0.
    3. Make the initial check out of the RAPIDS project as follows:

      svn checkout http://server.casa.umass.edu:/repos/rapids/trunk/rapids_2.0

      The following files/directories will be created:

      • Server – RAPIDS server.
      • Client – RAPIDS visualization client.
      • Node – LMM.
      • EventsGenerator – test application that generates random LDM and WDSS events.
      • makefile

      If you need help working with SVN, refer to the SVN manual:

      http://svnbook.red-bean.com/nightly/en/svn-book.html

    4. Log in to emmy0. Go to the rapids_2.0 directory.
    5. Compile the project (this will take a while):

      make all

  5. Run the RAPIDS server and visualization client.
    1. Copy an example RAPIDS configuration file:

      cp /usr/share/rapids/java/config/test_generator.rcf .

    2. According to this configuration file, RAPIDS will start two event generators, one on emmy1 and one on emmy2. CPU and memory utilization on these nodes will also be monitored. Using your editor, adjust the paths to the event generator application in the configuration file so that your copy of the generator is called when the RAPIDS server is started.
    3. Start the RAPIDS server:

      cd rapids_2.0/Server
      ./proxy.pl <path to the configuration file>

    4. In another terminal start the visualization client:

      cd rapids_2.0/Client
      ./run_gui.pl

      When the main window opens, go to the Monitoring menu item and choose Start monitoring.

    5. To stop the server, press Ctrl-C in the terminal where it is running.
    6. Optionally, go to the Log menu item in the main RAPIDS GUI window. There are a number of actions you can perform on the generated log files: replay the experiment, calculate resource utilization statistics or simply convert the file to ASCII format.
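
Some of the prerequisites from the installation steps above can be checked with a short script. The following is a minimal sketch in Python and is not part of RAPIDS. It assumes that the settings files hold one KEY = VALUE pair per line, as in the examples above; the function names (is_group_writable, parse_settings, preflight) are hypothetical. It reports a group-writable home directory and missing settings files or keys.

```python
import os
import stat
from pathlib import Path


def is_group_writable(path):
    """Return True if the group-write permission bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IWGRP)


def parse_settings(text):
    """Parse 'KEY = VALUE' lines into a dict, skipping lines without '='."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def preflight(home):
    """Return a list of human-readable problems found under home."""
    home = Path(home)
    problems = []
    if is_group_writable(home):
        problems.append(f"{home} is group-writable")
    # Keys taken from the example files shown in step 3.
    required = {
        "config": ["LOG_PATH"],
        "personal_settings": ["SERVER_ADDR", "LOG_PATH"],
    }
    for name, keys in required.items():
        path = home / ".rapids" / name
        if not path.is_file():
            problems.append(f"missing {path}")
            continue
        settings = parse_settings(path.read_text())
        for key in keys:
            if key not in settings:
                problems.append(f"{path} lacks {key}")
    return problems
```

Calling preflight(Path.home()) returns an empty list when none of these problems is found. The sketch does not test the SSH keys or the environment variables set in .cshrc.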

RAPIDS Configuration Files

RAPIDS configuration files describe experiment setups. The path to a configuration file must be given to the RAPIDS server so that it can set up all nodes used in the experiment, start all applications, etc. All shut-down procedures are described in the configuration file as well. An example of a configuration file with some explanations can be found here.

RAPIDS Client Interface

The main menu contains the following items:

RAPIDS Displays

  • Message diagram. Shows Messaging and Application events as well as messages formed by these events in real time.

    The X-axis of the Message diagram shows the absolute time, while the Y-axis shows processes running on different system nodes. Messaging and Application events generated by a process P appear on the line that corresponds to P on the diagram. As time passes the diagram scrolls to show the current time frame.

    Message diagram

    Figure 2: RAPIDS message diagram showing a 30-second system cycle.

    • 19:21:48 Task generation module (featureRepo) sends a set of tasks to the optimization module (Optimization). These are tasks that should be completed during the upcoming cycle.
    • 19:21:48 - 19:21:51 Optimization module generates control signals and sends them to the radars.
    • 19:22:05, 19:22:07, 19:22:08 Radars (streamer_KRSP, streamer_KSAO etc.) send data to the SOCC. Note that streamer_KRSP, streamer_KLWE and streamer_KCYR have incomplete commands from the previous cycle at the time they receive new ones. For example, streamer_KLWE sends data at 19:21:53 finishing a scan according to the command received at the previous cycle.
    • Radar data is received at the SOCC and inserted into a linear buffer (lb_cat). Detection algorithms get the data out of the linear buffer (for example, lowRefCYR at 19:22:08) and process it. Detected meteorological features are sent to the task generation module (for example, at 19:22:09 lowRefCYR sends detected low reflectivity areas).
    • 19:22:17 Task generation module sends a new set of tasks generated based on the meteorological features detected during the cycle.

    Note that in the example above the detection algorithm generates application events (rather than messaging events) every time it sends detected features to the task generation module. Different types of application events are depicted on the diagram using different shapes: in our example, detection algorithms throw simple debug events, which are depicted as vertical bars.

    Initially the diagram shows a 10-minute time interval, but you can change it to a 3-minute or 1-minute interval. To do so, choose the Time axis menu item on the diagram and then choose the desired resolution. Note that the time axis ticks correspond to minutes for the 10-minute and 3-minute intervals, and to seconds for the 1-minute interval.

  • Event Details. Shows various attributes of the events on the Message Diagram, such as a specific message associated with the event, the PID of the process that generated the event, the exact time when the event was generated, etc. To obtain additional information about an event, right-click the dot that represents it on the diagram.

    Details of more than one event may appear when you click on the diagram. This means that several events were generated at almost the same time and RAPIDS could not determine which of them you selected. Changing the resolution of the diagram may give more precise results.

    Event Details

    Figure 3: Examples of attributes associated with various events:

    • An Application event generated by the detection algorithm lowrefCYR on 10/26/2006 at 15:22:54. The event has the following message attached to it: "Features sent".
    • Two LDM Messaging events generated by the streamer streamer_KCYR on 10/26/2006 at 15:23:07. Both events have the pqinsert type. The messages associated with the events clarify that Reflectivity and Velocity products have been sent.

  • RAPIDS Info.

    Shows various diagnostic messages as well as statuses of network connections (see Figure 4).

    RAPIDS Info

    Figure 4: This window is opened along with the main RAPIDS display; the first two messages notify the user that the RAPIDS client has successfully connected to the proxy server. The third message is printed when the active user starts an experiment. The rest of the messages show the statuses of network connections between system nodes.

  • System Metrics.

    Shows current values of various system metrics.

    System Metrics

    Figure 5: CPU and memory utilization on system nodes.

  • Windows that show color codes of monitored processes. These windows also show process metrics collected for the processes. The number of these windows is equal to the number of machines involved in the experiment.
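
A small sketch can illustrate why a click on the Message diagram may select several events, and why a finer time resolution helps. It is an illustration only, not the RAPIDS implementation: the diagram width (600 pixels), the click radius (3 pixels) and the function names are assumptions made for the example.

```python
def seconds_per_pixel(interval_s, width_px):
    """Time span covered by one horizontal pixel of the diagram."""
    return interval_s / width_px


def events_under_click(event_times, click_time, interval_s,
                       width_px=600, radius_px=3):
    """Return the event times that fall within the click radius.

    event_times and click_time are in seconds; interval_s is the length
    of the time interval currently shown (600, 180 or 60 seconds).
    width_px and radius_px are assumed values, not RAPIDS constants.
    """
    tolerance_s = radius_px * seconds_per_pixel(interval_s, width_px)
    return [t for t in event_times if abs(t - click_time) <= tolerance_s]


if __name__ == "__main__":
    events = [100.0, 101.5]  # two events 1.5 seconds apart
    for interval_s in (600, 180, 60):
        hits = events_under_click(events, 100.0, interval_s)
        print(interval_s, hits)
```

Under these assumptions, both events are selected on the 10-minute axis, while only the first one is selected on the 3-minute and 1-minute axes.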

Events and Metrics

RAPIDS collects information about radar data flow, utilization of the system resources, errors occurring in the system etc. All this information can be considered as a stream of events. An event can have one of the following types:
  • Messaging. Events of this type occur when data is being sent/received through one of the system's communication channels. Table 1 shows which communication channels are used for transferring different kinds of data.
     Data Type                    Communication Channel
     Radar scans (NetCDF files)   LDM product queue
     Meteorological features      WDSS-II linear buffers
     Radar tasks                  TCP connection
     Radar control signals        TCP connection
    Table 1. Communication channels used for transferring different types of DCAS data.

    When two Messaging events relate to the transfer of the same item between processes P1 and P2, they form a message from P1 to P2.

    Messaging events are depicted on the message diagram as rhombuses. Messages are depicted as arrows going from the sending event to the receiving event. The color of an arrow shows which communication channel was used for the transfer: red corresponds to LDM product queues, green to WDSS-II linear buffers, and blue to TCP connections.

  • Application. Events of this type are generated by software modules running in the system. In contrast to Messaging events, which RAPIDS collects transparently to the software modules, Application events require changes in the software's code in order to be seen by RAPIDS. The RAPIDS API that can be used to generate Application events is described here. All Application events are shown on the message diagram. The following Application events can be generated:
    • Error. Different levels of importance (warning, error etc.) can be used to describe various faulty situations that might occur in the software. Events of this type are depicted on the message diagram as crosses.
    • Value of a variable. These events are generated when a current value of a specific variable in an application should be reported. Events of this type are depicted on the message diagram as five-pointed stars.
    • Algorithm stage. Events of this type should be generated to report various stages in an application's execution, such as entrance to a subroutine. Currently only two subtypes of this event are available:
      • Begin. Events of this type are depicted on the message diagram as small arrows pointing up.
      • End. Events of this type are depicted on the message diagram as small arrows pointing down.
    • Debug message. Events of this type can be generated to report any information that does not fall into the above categories. For example, some meteorological applications generate events of this type to report that detected features have been sent to the task generation module (see Figure 2). Events of this type are depicted on the message diagram as vertical bars.
  • Metric. RAPIDS allows real-time monitoring of utilization of various system resources, such as CPU, memory etc. In an experiment setup a user specifies which system resources should be monitored; RAPIDS periodically calculates utilization of these resources (according to the refresh rate specified by the user) and displays it on the system metrics display (see Figure 5). Every time RAPIDS calculates utilization of a specific system resource an event of Metric type occurs.

    Metric events are classified into System and Process events.

    • System. CPU, memory, workload over the last 1, 5 and 15 minutes: shown on the system metrics display (see Figure 5). Status of the network connections between the nodes: shown on the RAPIDS info display (see Figure 4).

      Note that in order to monitor the network status, RAPIDS starts small software probes on each system node at the beginning of an experiment. These probes periodically send "I'm alive" signals to one of the nodes, say S, which forwards them to RAPIDS. If such a signal stops coming from one of the nodes, say C, RAPIDS reports that the connection between S and C has been broken. This monitoring method can produce a false alarm: if the software probe on C crashes and stops sending signals to S, RAPIDS will still report a broken connection between S and C.

    • Process.
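
The network status check described above can be sketched as a simple timeout test. This is a minimal illustration in Python, not the RAPIDS code: the function name silent_nodes, the timeout value and the bookkeeping of last-received times are assumptions.

```python
def silent_nodes(last_seen, now, timeout_s):
    """Return the nodes whose latest "I'm alive" signal is too old.

    last_seen maps a node name to the time (in seconds) at which the
    collecting node S last received a signal from that node.
    """
    return sorted(node for node, seen in last_seen.items()
                  if now - seen > timeout_s)


if __name__ == "__main__":
    # Hypothetical state at time 100 s: emmy2 has been silent for 40 s.
    last_seen = {"emmy1": 98.0, "emmy2": 60.0}
    for node in silent_nodes(last_seen, now=100.0, timeout_s=10.0):
        # A crashed probe and a broken link look identical from here.
        print("connection between S and", node, "reported as broken")
```

The sketch makes the false-alarm case visible: the only input is the time of the last received signal, so a crashed probe cannot be told apart from a broken connection.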



How to replay an experiment

All experiments are logged and log files can be used for replaying the experiments off-line. In order to replay an experiment do the following:
  1. Start the RAPIDS Client.

    >run_gui.pl

  2. Select the log file of the experiment you would like to replay. When the main RAPIDS window is opened choose the Log item from the menu and then the Replay... item. Choose the log file of the experiment you would like to replay. Press the Replay button.

    Note: The name of a log file starts with the prefix log followed by the date and time when this log file was created. For example, the following log file:

    log_02242005_172230

    was created on February 24th, 2005 at 17:22:30.

  3. Replay the experiment. In addition to the standard RAPIDS windows described in Section “RAPIDS Displays”, another window titled Replay functions opens. Replay functions include the following:

    • Play. Replay the experiment at real-time speed.
    • Pause. When the experiment is paused, an additional time resolution (milliseconds) becomes available on the message diagram. See the Magnify function for more details.
    • Stop.
    • Fast forward. Replay the experiment at five times real-time speed.
    • Magnify. This function becomes available only when the experiment is paused. It allows you to view the message diagram at millisecond resolution. In this time mode you can scroll the diagram forward or backward using the corresponding buttons on the tool bar at the bottom of the diagram (see Figure 6).

      1 second resolution

      Figure 6: Message diagram that shows a one-second time frame. The diagram can be scrolled forward or backward using the corresponding buttons on the tool bar at the bottom of the diagram.

      An Event Details display is opened along with the diagram. Figure 7 shows an example of such a display with details of the events that appear on the message diagram in Figure 6.

      Event details

      Figure 7: Event Details display showing detailed descriptions of the events that appear on the Message Diagram in Figure 6. For example, a Messaging event generated by streamer_KCYR: radar control signals received on 10/26/2006 at 15:23:19. In the line providing details of this event, a number appears in brackets after the date and time at which the event occurred. This is the same date and time in another format: the number of milliseconds since the standard base time known as "the epoch", namely January 1, 1970, 00:00:00 GMT. This number shows that the event occurred at the 993rd millisecond of the 19th second, which corresponds to the very first blue arrow on the diagram in Figure 6. There are also several Application events generated by streamer_KCYR at the same time. They provide details of the received control signals, such as which sector and how many elevations should be scanned, how long the scan will take, etc.

  4. Stop replaying the experiment. Press the Stop button in the “Replay functions” window.
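
The bracketed number described in the caption of Figure 7 can be decoded with a few lines of Python. This is a minimal sketch: the function name split_epoch_millis and the sample value are hypothetical, and the result is expressed in UTC because the time zone used by the Event Details display is not stated here.

```python
from datetime import datetime, timezone


def split_epoch_millis(epoch_ms):
    """Split milliseconds since the epoch into (UTC datetime, millisecond)."""
    seconds, millis = divmod(epoch_ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), millis


if __name__ == "__main__":
    # Hypothetical value, not copied from Figure 7.
    when, millis = split_epoch_millis(1161890599993)
    print(when.isoformat(), millis)
```

The second element of the result is the millisecond within the second, which is the value used to place an event on the one-second message diagram.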
If you prefer to view a log file in plain ASCII format, first convert it from the binary format in which it is stored. To do this, start the RAPIDS Client, go to the Log item in the main menu and choose the Convert to ASCII... item. Choose the log file you would like to convert and press the Convert button. Note that every log file begins with a description of the setup of the logged experiment.
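
The log file naming convention noted in step 2 above can be parsed programmatically, for example to sort or filter log files by creation time. The sketch below assumes the pattern log_MMDDYYYY_HHMMSS, inferred from the single example log_02242005_172230; the function name parse_log_name is hypothetical.

```python
from datetime import datetime


def parse_log_name(name):
    """Return the creation time encoded in a RAPIDS log file name."""
    return datetime.strptime(name, "log_%m%d%Y_%H%M%S")


if __name__ == "__main__":
    print(parse_log_name("log_02242005_172230"))  # prints 2005-02-24 17:22:30
```

A name that does not match the assumed pattern raises ValueError, which makes it easy to skip unrelated files in the log directory.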


Troubleshooting

Normally, when stopped, the RAPIDS server removes all LMMs, which in turn take care of all necessary cleanup. However, an experiment may not have been stopped properly. This often happens when the commands used to stop applications (provided by the user in the experiment configuration file) did not work properly. In this case, stop the experiment manually as follows:
  1. Log in to all machines used in the experiment and stop all DCAS applications started during the experiment (streamers, detection algorithms, task generation and optimization module etc.)
  2. Log in to all machines used in the experiment as rapids and stop LMMs:

    >/usr/share/rapids/utils/bin/lmm_remover

  3. While logged in as rapids, remove the RAPIDS message queue, a data structure used for collecting events:

    >/usr/share/rapids/utils/bin/rmq_remover

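
Steps 2 and 3 of the manual clean-up above can be prepared for several machines at once. The following Python sketch only builds the ssh command lines and prints them (a dry run). It assumes that you can log in to the rapids account on each machine over SSH and that the message queue is removed on every machine; the host names and the function name cleanup_commands are examples. Stopping the DCAS applications (step 1) is not covered.

```python
LMM_REMOVER = "/usr/share/rapids/utils/bin/lmm_remover"
RMQ_REMOVER = "/usr/share/rapids/utils/bin/rmq_remover"


def cleanup_commands(hosts, user="rapids"):
    """Build one ssh command per host and per clean-up utility."""
    commands = []
    for host in hosts:
        # Stop the LMM first, then remove the message queue.
        for tool in (LMM_REMOVER, RMQ_REMOVER):
            commands.append(["ssh", f"{user}@{host}", tool])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in cleanup_commands(["emmy1", "emmy2"]):
        print(" ".join(command))
```

To execute the commands instead of printing them, each list can be passed to subprocess.run from the Python standard library.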

The RAPIDS API

Provided with the RAPIDS framework is a set of API functions that users can use in their applications to report different kinds of messages/events to the RAPIDS monitoring system. Below are links to the RAPIDS C, C++ and Java API.


Connecting to the Real-Time Stream of RAPIDS Events

RAPIDS users can develop their own applications that process or visualize the data collected by RAPIDS. The RAPIDS framework provides a rapidsStream package that can be used to connect to the real-time stream of RAPIDS events. The RAPIDS visualization client and the fault detector client are two applications that use this package. See the RAPIDS installation instructions for an explanation of how to download the code for these clients as well as the package. The documentation on the rapidsStream package can be found here.


Reporting Bugs

Please send an e-mail to Anna (abekkerm@ecs.umass.edu) if RAPIDS has crashed, you have found a bug, or there is anything else you would like to tell us :)
