RAPIDS Manual
Table of contents
Introduction
Monitoring
- How to Install RAPIDS
- RAPIDS Configuration Files
- RAPIDS Client Interface
- RAPIDS Displays
- Events and Metrics
How to Replay an Experiment
Troubleshooting
The RAPIDS API
Connecting to the Real-Time Stream of RAPIDS Events
Reporting bugs
Introduction
The Center for the Collaborative Adaptive Sensing of the Atmosphere (CASA) develops networks of short-range
radars designed to overcome the effects of the Earth's curvature, which prevent long-range radars
from scanning the lower parts of the atmosphere.
The networks are built using a new approach, called Distributed Collaborative Adaptive Sensing (DCAS):
a large number of small radars operate collaboratively within a dynamic information
technology infrastructure, adapting to changing atmospheric conditions in a manner that
meets competing end user needs.
It follows that the DCAS system is a "closed loop": the cycle starts at the radar nodes, where data
is collected in real-time and sent to the System Operations Control Center (SOCC).
At the SOCC the data is processed by a suite of meteorological algorithms in order to find hazardous
weather features.
Based on the features found, a set of radar tasks is generated.
Then the optimization module decides which tasks should be accomplished during the next radar scan and
generates the corresponding radar scanning parameters.
The cycle ends when radars receive the parameters for the next scan.
The RAPIDS suite of software for monitoring, testing and validating real-time distributed systems
has been developed to serve as a common platform for carrying out tasks that are crucial to
the success of DCAS system development:
- Monitoring. Ability to monitor an extensive set of the system events (see Fig. 1).
A comprehensive graphical interface allows users to monitor the behavior and health of the real system.
- Integration and test. Software modules developed by various thrusts (primarily distributing,
prediction and end-user thrusts) can be integrated and tested for their interoperability.
- Verification/validation. A first-level validation of system goals can be carried out, before the
system is deployed in the field.
- Experimentation. Various “what-if” scenarios can be evaluated to help answer important forward-looking
questions:
- How will the system scale up?
- How will the system react to workload increases or node failures?
Figure 1: RAPIDS collects information about the data flow in the DCAS system: every time
a radar product, meteorological feature, task or control signal is sent or received, RAPIDS is
notified. Also, DCAS applications may report errors, send debug messages, notify when algorithms
enter different stages etc. Finally, RAPIDS collects data about utilization of various system
resources, such as CPU, memory, disk space and so on.
RAPIDS: Main Features
- Detailed monitoring of various events occurring in the system as well as system performance.
- Dynamic control over the following monitoring parameters:
- Set of collected system events and metrics.
- Frequency of collection of the events and metrics.
- Change in the value of a metric that would trigger reporting this new value to the user.
For example, if the user sets this parameter for the "CPU Utilization" metric to 10%, the value of the
metric will be updated only when it drops or rises by 10% or more.
All intermediate values of this metric will not be reported to the user.
- Comprehensive visualization of collected events and metrics:
- Message diagram showing flow of the radar data and control signals.
- Execution status and timing information for various algorithms.
- Errors occurring in the system.
- Utilization of various system resources (CPU, memory, disk space, network).
- Remote operation. Users can connect remotely to the emulator to run/watch experiments.
- Complete logging and play-back using RAPIDS GUI.
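The reporting threshold described in the feature list above can be pictured with a short sketch. This is not RAPIDS code: the function name filter_by_change is hypothetical, and the sketch assumes that the change is measured against the last value reported to the user, which the manual does not state explicitly.

```python
def filter_by_change(samples, threshold):
    """Return the values that would be reported for a stream of metric samples.

    Assumption: a value is reported when it differs from the last reported
    value by at least `threshold` (in the metric's own units).
    """
    reported = []
    last_reported = None
    for value in samples:
        # The first sample is always reported; later ones only on a big enough change.
        if last_reported is None or abs(value - last_reported) >= threshold:
            reported.append(value)
            last_reported = value
    return reported


# CPU utilization samples in percent, with the threshold set to 10.
cpu_samples = [20, 25, 31, 35, 22, 18]
print(filter_by_change(cpu_samples, 10))  # [20, 31, 18]
```

With these samples the intermediate values 25, 35 and 22 are dropped because each is within 10 points of the value reported before it.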
Monitoring
RAPIDS consists of three components:
- A number of RAPIDS clients. Each client provides a GUI for system monitoring.
- A number of Local Monitoring Modules (LMMs). One LMM runs on each of the emulator machines employed in an experiment.
The LMMs collect various events and metrics and send them to the RAPIDS server.
- The RAPIDS server. It manages the LMMs and forwards collected events and metrics to the clients.
When started, the server must be given a configuration file that describes the experiment: the LMMs
will be set up according to the configuration file.
The server is also responsible for shutting down all LMMs when the experiment is stopped.
How to install RAPIDS
- Configure SSH, so that you can log in to all the Emulator machines without having to type in a password.
This is needed for starting Local Monitoring Modules (LMMs) remotely, as well as for launching processes
under ldm and WDSSii logins. Here is what to do:
- Log in to emmy0 as yourself.
- Make sure that your home directory is not group-writeable.
- Go to the .ssh directory.
- Generate your public/private key pair for SSH protocol:
ssh-keygen -t rsa
Use the default locations for the files; press Return when asked for a passphrase.
- Check if you have the authorized_keys file. If you don't, create it as follows:
cat id_rsa.pub > authorized_keys
If you do have the file, add the public key to it:
cat id_rsa.pub >> authorized_keys
This should enable you to log in to all the Emulator machines under your own login without providing a password.
- Finally, add your id_rsa.pub to the authorized_keys file in the
ldm and WDSSii accounts on emmy5 and emmy1.
This will enable you to start processes under these logins remotely without typing in passwords.
- Modify your .cshrc file so that it contains the following lines:
setenv RAPIDS_HOME /usr/share/rapids
setenv JAVA_HOME /usr/java/jdk1.5.0_01/
setenv PATH .:${RAPIDS_HOME}/bin:${RAPIDS_HOME}/utils/bin:${JAVA_HOME}/bin:${PATH}
setenv CLASSPATH ${HOME}/rapids_2.0/Server:${HOME}/rapids_2.0/Server/StreamAPI:${HOME}/rapids_2.0/Client:${HOME}/rapids_2.0/Client/sgt.jar:.
setenv LD_LIBRARY_PATH ${HOME}/rapids_2.0/Server/server:${LD_LIBRARY_PATH}
Source the edited .cshrc:
source .cshrc
- Configure RAPIDS settings.
- In your home directory, create a subdirectory named .rapids.
- In the .rapids directory, create a config file.
This file will be used by the RAPIDS server to determine where to store the log files.
Here’s an example of a config file:
LOG_PATH = /nfs/erc/users2/abekkerm/rapidsLogs
- In the .rapids directory, create a personal_settings file.
This file will be used by the RAPIDS visualization client to determine a) the address of the RAPIDS server;
b) where to read the log files from.
Here’s an example of a personal_settings file:
SERVER_ADDR = emmy0.casa.umass.edu
LOG_PATH = /nfs/erc/users2/abekkerm/rapidsLogs
- Download and compile the RAPIDS project.
Note: The RAPIDS project is stored in the CASA SVN repository, so you will need an account there
in order to download the code.
- Log in to emmy2.
- In your home directory create a subdirectory rapids_2.0.
- Make the initial check out of the RAPIDS project as follows:
svn checkout http://server.casa.umass.edu:/repos/rapids/trunk/rapids_2.0
The following files/directories will be created:
- Server – RAPIDS server.
- Client – RAPIDS visualization client.
- Node – LMM.
- EventsGenerator – test application that generates random LDM and WDSS events.
- makefile
If you need help working with SVN, please refer to the SVN manual:
http://svnbook.red-bean.com/nightly/en/svn-book.html
- Log in to emmy0.
Go to the rapids_2.0 directory.
- Compile the project (it will take a while):
make all
- Run the RAPIDS server and visualization client.
- Copy an example of RAPIDS configuration file:
cp /usr/share/rapids/java/config/test_generator.rcf .
- According to this configuration file, RAPIDS will start two event generators – on emmy1
and emmy2.
Also, CPU and memory utilization on these nodes will be monitored.
Using your editor, adjust paths to the event generator application in the configuration file,
so that your copy of the generator will be called when the RAPIDS server is started.
- Start RAPIDS server:
cd rapids_2.0/Server
./proxy.pl <path to the configuration file>
- In another terminal start the visualization client:
cd rapids_2.0/Client
./run_gui.pl
When the main window opens, go to the Monitoring menu item and choose Start monitoring.
- In order to stop the server, press Ctrl-C in the terminal where it’s running.
- Optionally, go to the Log menu item on the main RAPIDS GUI window.
You can perform a number of actions on the generated log files: use them to replay the experiment,
calculate resource utilization statistics or simply convert a file to ASCII format.
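The config and personal_settings files shown in the installation steps consist of simple KEY = VALUE lines. As an illustration only, here is a minimal sketch of how such a file could be read. The function name parse_settings is hypothetical, and skipping blank or malformed lines is an assumption; the manual does not describe how RAPIDS itself parses these files.

```python
def parse_settings(text):
    """Parse KEY = VALUE lines into a dictionary.

    Assumption: lines without an '=' sign and empty lines are ignored.
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


example = """\
SERVER_ADDR = emmy0.casa.umass.edu
LOG_PATH = /nfs/erc/users2/abekkerm/rapidsLogs
"""
settings = parse_settings(example)
print(settings["SERVER_ADDR"])  # emmy0.casa.umass.edu
print(settings["LOG_PATH"])     # /nfs/erc/users2/abekkerm/rapidsLogs
```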
RAPIDS Configuration Files
RAPIDS configuration files describe experiment setups.
A path to a configuration file should be given to the RAPIDS server, so that it can set up all nodes used in
the experiment, start all applications etc.
All shut-down procedures are described in the configuration file as well.
An example of a configuration file with some explanations can be found here.
RAPIDS Client Interface
The main menu contains the following items:
- Monitoring
- Start monitoring
Connect to the RAPIDS server and start receiving the stream of events.
At this point a number of RAPIDS displays will be opened.
- View current settings
Sometimes it is important to view the monitoring setup that is currently in use.
If you choose this item, a tabbed panel showing the current monitoring setup will be opened.
Note that while viewing the setup you are not allowed to change it.
Go to the "Change Current Settings" menu item in order to modify the
current setup.
The tabbed panel that shows the currently used monitoring setup includes the following sections:
In order to view the commands used to start processes on a node,
say SOCC2, choose this node from the drop-down list at the top of the pane titled
Processes running on a node and then press the Show start commands button.
A window containing the list of commands will be opened (the nickname of the node, that is, SOCC2,
will appear in the title of the window).
Similarly, to see the commands used to stop processes on the node SOCC2,
choose SOCC2 and press the Show stop commands button.
Note:
1. Currently, there is only one way to execute commands on different nodes in a certain order.
For example, the Rotation algorithm should be started only after the feature repository and
optimization algorithms have been started.
This can be achieved by inserting sleeps where necessary:
in the example above, before starting the Rotation algorithm.
2. A script called killer.pl is used for stopping processes.
This script receives the name of the process which should be killed, finds PIDs of
all processes that have this name and kills them.
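The behavior described for killer.pl (look up every PID with a given process name, then kill those processes) can be sketched as follows. This is not the actual script: the function names are hypothetical, matching on the exact command name is an assumption, and so is the use of SIGTERM, since the manual does not say which signal killer.pl sends.

```python
import os
import signal
import subprocess


def pids_by_name(ps_lines, name):
    """Return the PIDs from "PID COMMAND" lines whose command equals `name`."""
    pids = []
    for line in ps_lines:
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0].isdigit() and fields[1].strip() == name:
            pids.append(int(fields[0]))
    return pids


def kill_by_name(name):
    """Send SIGTERM (an assumption) to every process called `name`."""
    # `ps -e -o pid= -o comm=` prints one "PID COMMAND" pair per line, no header.
    listing = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    for pid in pids_by_name(listing, name):
        os.kill(pid, signal.SIGTERM)


# The lookup step on its own, using made-up PIDs:
sample = ["  101 lowRefCYR", "  202 featureRepo", "  303 lowRefCYR"]
print(pids_by_name(sample, "lowRefCYR"))  # [101, 303]
```

Because every process with a matching name is killed, two unrelated programs that share a name on the same node would both be stopped.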
Network.
Settings for the network simulation: packet delays and/or drop rates can be simulated on network
connections between Sensor nodes and the SOCC.
Monitoring.
Metrics and processes that will be monitored.
In order to view the system metrics (such as CPU or memory utilization) that are
collected on a node, for example SOCC2, choose this node from the drop-down list
at the top of the Node settings pane and then press the System... button.
Similarly, to view the process metrics collected on SOCC2 and the processes monitored on this
node, choose SOCC2 and press the Processes... button.
When you press the Processes... button, a window containing a list of all monitored processes
on the chosen node will be opened.
All these processes will appear in the Message diagram.
If you choose a process and press the Show details... button, you will see various settings for this
process (such as the color code of its events, the metrics collected for the process etc.)
On the Node settings pane you will also find a Refresh rate spinner.
It shows how frequently the monitoring information is being updated.
For more details on events and metrics available for monitoring see the
"Events and Metrics" section.
Note: In the experiment configuration file it is possible to specify processes that you would like to monitor
but that should not appear on the Message diagram.
These processes and the metrics collected for them will appear on
displays showing process metrics.
However, the current version of the GUI shows only processes
that appear on the Message diagram.
General.
Creation and modification dates for the currently viewed configuration, nickname of the configuration etc.
Change Current Settings.
This feature has not been implemented in the current version of RAPIDS.
Log.
- Replay...
Lets you view log files using the RAPIDS displays.
More information on the replay feature can be found in the "How to Replay an Experiment" section.
- Convert to ASCII...
RAPIDS log files are stored in a binary format.
In some cases, however, a conversion into the plain ASCII format is required (for example, if statistical
processing of collected events and metrics should be performed).
This feature converts a chosen log file into ASCII format.
The text file will have the same name as the source log file, with the "txt" extension.
It will be placed in the directory where all log files are located.
- Resource utilization...
- CPU utilization.
Mean and standard deviation will be calculated.
Histogram data will be printed showing how much time (as a percentage of the total recorded time) a
specific percentage of the CPU has been utilized.
Also, high activity periods will be printed: periods of time when CPU utilization was higher
than 40%.
Finally, data files that can be used by gnuplot to create CPU utilization graphs will be
generated.
- Memory utilization.
Data files that can be used by gnuplot to create RAM and swap space utilization graphs will be
generated.
- LDM transfer times.
Data files that can be used by gnuplot to create LDM transfer time histograms will be
generated.
The histograms will show how many LDM messages took a specific amount of time to transfer.
All delayed messages will be printed separately.
A message is considered delayed if its transfer took more than one minute.
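The statistics listed above can be sketched in a few lines. This is an illustration, not the code behind the Resource utilization... item: all function names are hypothetical, the 10% histogram bin width is an assumption, and the population standard deviation is used because the manual does not say which variant RAPIDS calculates. Only the 40% and one-minute limits come from the manual.

```python
from statistics import mean, pstdev


def cpu_summary(values, bin_width=10):
    """Mean, standard deviation and histogram (percent of samples per bin)."""
    histogram = {}
    for value in values:
        # Bin by lower edge; a reading of exactly 100% goes into the top bin.
        low = min(int(value // bin_width) * bin_width, 100 - bin_width)
        histogram[low] = histogram.get(low, 0) + 1
    shares = {low: 100.0 * count / len(values)
              for low, count in sorted(histogram.items())}
    return mean(values), pstdev(values), shares


def high_activity_periods(samples, threshold=40.0):
    """Return (start, end) timestamps of runs where utilization exceeds threshold."""
    periods = []
    start = end = None
    for timestamp, value in samples:
        if value > threshold:
            if start is None:
                start = timestamp
            end = timestamp
        elif start is not None:
            periods.append((start, end))
            start = None
    if start is not None:
        periods.append((start, end))
    return periods


def delayed_transfers(transfer_times_s, limit_s=60.0):
    """Return the transfer times that exceed the one-minute limit."""
    return [t for t in transfer_times_s if t > limit_s]


samples = [(0, 10), (1, 45), (2, 60), (3, 30), (4, 41)]
print(high_activity_periods(samples))               # [(1, 2), (4, 4)]
print(delayed_transfers([12.0, 75.5, 60.0, 61.0]))  # [75.5, 61.0]
```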
Window.
A checkbox list of all currently available RAPIDS displays.
Sometimes a user might want to close RAPIDS displays; for example, if the user is not currently interested in
the information being shown on a display, or just would like to avoid cluttering the screen.
When the user closes a display, the corresponding item in the list of all RAPIDS displays will be unchecked.
The closed display can be restored by checking it again in the list.
Exit.
Exit RAPIDS client (alternatively press Ctrl-X).
Note that exiting the client does not mean that the experiment will be stopped (if there is one that is currently running).
If you would like to stop the experiment, you should stop the RAPIDS server.
Help.
Show this manual.
This feature has not been implemented in the current version of RAPIDS.
RAPIDS Displays
Message diagram.
Shows Messaging and Application events as well as messages formed by these events in
real time.
The X-axis of the Message diagram shows the absolute time, while the Y-axis shows processes running
on different system nodes.
Messaging and Application events generated by a process P appear on the line that corresponds to
P on the diagram.
As time passes the diagram scrolls to show the current time frame.
Figure 2: RAPIDS message diagram showing a 30-second system cycle.
- 19:21:48 Task generation module (featureRepo) sends a set of tasks to the optimization
module (Optimization).
These are tasks that should be completed during the upcoming cycle.
- 19:21:48 - 19:21:51 Optimization module generates control signals and sends them to the radars.
- 19:22:05, 19:22:07, 19:22:08 Radars (streamer_KRSP, streamer_KSAO etc.) send data to the SOCC.
Note that streamer_KRSP, streamer_KLWE and streamer_KCYR have incomplete commands
from the previous cycle at the time they receive new ones.
For example, streamer_KLWE sends data at 19:21:53 finishing a scan according to the command
received at the previous cycle.
- Radar data is received at the SOCC and inserted into a linear buffer (lb_cat).
Detection algorithms get the data out of the linear buffer (for example, lowRefCYR at 19:22:08)
and process it.
Detected meteorological features are sent to the task generation module (for example,
at 19:22:09 lowRefCYR sends detected low reflectivity areas).
- 19:22:17 Task generation module sends a new set of tasks generated based on the meteorological
features detected during the cycle.
Note that in the example above the detection algorithm generates application events (rather than
messaging events) every time it sends detected features to the task generation module.
Different types of application events are depicted on the diagram using different shapes: in our example
detection algorithms generate simple debug events, which are depicted as vertical bars.
Initially a 10-minute time interval is shown on the diagram, but you can change it to a 3-minute or
1-minute interval.
To do so, choose the Time axis menu item on the diagram and then choose the desired resolution.
Note that time axis ticks for the 10-minute and 3-minute intervals correspond to minutes, while for the 1-minute
interval they correspond to seconds.
Event Details.
Shows various attributes of the events on the Message Diagram, such as a specific message associated with
the event, PID of the process that generated the event, the exact time when the event was generated etc.
In order to obtain additional information about an event, right-click
the dot that represents this event on the diagram.
You may see that details of more than one event appear when you click on the diagram.
This means that more than one event was generated at almost the same time, and RAPIDS was unable to
determine which of them you chose.
You may want to change the resolution of the diagram in order to obtain more precise results.
Figure 3: Examples of attributes associated with various events:
- An Application event generated by the detection algorithm lowrefCYR on 10/26/2006
at 15:22:54.
The event has the following message attached to it: "Features sent".
- Two LDM Messaging events generated by the streamer streamer_KCYR on 10/26/2006
at 15:23:07.
Both events have the pqinsert type.
The messages associated with the events clarify that Reflectivity and Velocity products have been
sent.
- RAPIDS Info.
Shows various diagnostic messages as well as statuses of network connections (see Figure 4).
Figure 4: This window is opened along with the main RAPIDS display; the first two messages
notify the user that the RAPIDS client has successfully connected to the proxy server.
The third message is printed when the active user starts an experiment.
The rest of the messages show the statuses of network connections between system nodes.
- System Metrics.
Shows current values of various system metrics.
Figure 5: CPU and memory utilization on system nodes.
- Windows that show color codes of monitored processes.
These windows also show process metrics collected for the processes.
The number of these windows is equal to the number of machines involved in the experiment.
Events and Metrics
RAPIDS collects information about radar data flow, utilization of the system resources, errors occurring in the
system etc.
All this information can be considered as a stream of events.
An event can have one of the following types:
- Messaging.
Events of this type occur when data is being sent/received through one of the system's communication
channels.
Table 1 shows which communication channels are used for transferring different kinds of data.
| Data Type | Communication Channel |
| Radar scans (NetCDF files) | LDM product queue |
| Meteorological features | WDSS-II linear buffers |
| Radar tasks | TCP connection |
| Radar control signals | TCP connection |
Table 1. Communication channels used for transferring different types of DCAS data.
When two Messaging events relate to the transfer of the same item between processes P1 and
P2, they form a message from P1 to P2.
Messaging events are depicted on the message diagram as rhombuses.
Messages are depicted as arrows going from the sending event to the receiving event.
The color of an arrow shows which communication channel was used for the transfer: red corresponds to
LDM product queues, green to WDSS-II linear buffers, and blue to TCP connections.
- Application.
Events of this type are generated by software modules running in the system.
In contrast to Messaging events, which RAPIDS collects transparently to the software modules, Application
events require changes in the software's code in order to be seen by RAPIDS.
The RAPIDS API that can be used to generate Application events is described in "The RAPIDS API" section.
All Application events are shown on the message diagram.
Here is the list of various Application events that can be generated:
- Error.
Different levels of importance (warning, error etc.) can be used to describe various faulty situations
that might occur in the software.
Events of this type are depicted on the message diagram as crosses.
- Value of a variable.
These events are generated when a current value of a specific variable in an application
should be reported.
Events of this type are depicted on the message diagram as five-pointed stars.
- Algorithm stage.
Events of this type should be generated to report various stages in an application's execution,
such as entrance to a subroutine.
Currently only two subtypes of this event are available:
- Begin.
Events of this type are depicted on the message diagram as small arrows pointing up.
- End.
Events of this type are depicted on the message diagram as small arrows pointing down.
- Debug message.
Events of this type can be generated to report any information that does not fall into the above categories.
For example, some meteorological applications generate events of this type to report that detected features
have been sent to the task generation module (see Figure 2).
Events of this type are depicted on the message diagram as vertical bars.
- Metric.
RAPIDS allows real-time monitoring of utilization of various system resources, such as CPU, memory etc.
In an experiment setup a user specifies which system resources should be monitored; RAPIDS periodically calculates
utilization of these resources (according to the refresh rate specified by the user) and displays it on the
system metrics display (see Figure 5).
Every time RAPIDS calculates utilization of a specific system resource an event of Metric type occurs.
Metric events are classified into System and Process events.
- System.
CPU, memory, workload over the last 1, 5 and 15 minutes: shown on the system metrics display
(see Figure 5).
Status of the network connections between the nodes: shown on the RAPIDS info display (see
Figure 4).
Note that in order to monitor the network status, RAPIDS starts small software probes on
each system node at the beginning of an experiment.
These probes periodically send "I'm alive" signals to one of the nodes, say S, which forwards
them to RAPIDS.
If such a signal stops coming from one of the nodes, say C, RAPIDS reports that the connection
between S and C has been broken.
This monitoring method can produce a false alarm: if the software probe
on C crashes and stops sending signals to S, RAPIDS will still report
a broken connection between S and C even though the connection itself may be intact.
- Process.
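The "I'm alive" scheme described under System metrics above can be sketched as a small timeout check. This is an illustration only: the class name HeartbeatMonitor, the 10-second timeout and the timestamps are assumptions, not details of the RAPIDS probes.

```python
class HeartbeatMonitor:
    """Tracks the last "I'm alive" signal received from each node."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}

    def record(self, node, timestamp_s):
        """Remember when `node` last reported in."""
        self.last_seen[node] = timestamp_s

    def silent_nodes(self, now_s):
        """Return nodes whose last signal is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now_s - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=10.0)
monitor.record("C", 100.0)
monitor.record("D", 100.0)
monitor.record("D", 112.0)  # D keeps reporting; C has gone quiet
print(monitor.silent_nodes(now_s=115.0))  # ['C']
```

The check sees only that C has gone silent. A crashed probe and a broken link look identical to it, which is exactly the false-alarm case noted above.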
How to replay an experiment
All experiments are logged and log files can be used for replaying the experiments off-line.
In order to replay an experiment do the following:
- Start the RAPIDS Client.
>run_gui.pl
- Select the log file of the experiment you would like to replay.
When the main RAPIDS window opens, choose the Log item from the menu and then the Replay... item.
Choose the log file of the experiment you would like to replay.
Press the Replay button.
Note: The name of a log file starts with the prefix log
followed by the date and time when this log file was created.
For example, the following log file:
log_02242005_172230
was created on February 24th, 2005 at 17:22:30.
- Replay the experiment.
In addition to standard RAPIDS windows described in Section “RAPIDS Displays”
another window titled Replay functions will be opened.
Replay functions include the following:
- Play.
Replay the experiment at real-time speed.
- Pause.
When the experiment is paused, an additional time resolution (milliseconds) becomes available on the
message diagram.
See the Magnify function for more details.
- Stop.
- Fast forward.
Replay the experiment at 5 times the real-time speed.
- Magnify.
This function becomes available only when the experiment is paused.
It lets you view the message diagram at millisecond resolution.
While in this time mode you can scroll the diagram forward or backward using the corresponding
buttons on the tool bar at the bottom of the diagram (see Figure 6).
Figure 6: Message diagram that shows a one second long time frame.
The diagram can be scrolled forward or backward using corresponding buttons on the tool
bar at the bottom of the diagram.
An Event Details display will be opened along with the diagram.
Figure 7 shows an example of such a display with details of events that appear on the
message diagram from Figure 6.
Figure 7: Event Details display that shows detailed descriptions of events
that appear on the Message Diagram from Figure 6.
For example, a Messaging event generated by streamer_KCYR: radar control signals received on 10/26/2006
at 15:23:19.
In the line providing details of this event there is a number that appears in brackets
after the date/time when the event occurred.
This is the same date/time of the event shown in another format: as the
number of milliseconds since the standard base time known as "the epoch", namely
January 1, 1970, 00:00:00 GMT.
This number shows that the event occurred at the 993rd millisecond of the 19th second,
which corresponds to the very first blue arrow on the diagram from Figure 6.
Also, there are several Application events that have been generated by streamer_KCYR at the
same time.
They provide details for the received control signals, such as which sector and how many elevations
should be scanned, how much time it will take to perform the scan etc.
- Stop replaying the experiment.
Press Stop button in the “Replay functions” window.
If you prefer to view a log file in plain ASCII format, you should first convert the file from the binary format in which it is
stored.
To do this, start the RAPIDS Client, go to the Log item in the main
menu and choose the Convert to ASCII... item.
Choose the log file you would like to convert and press the Convert button.
Note that at the beginning of every log file there is a description of the setup of the logged experiment.
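Two time formats appear in this section: the date and time embedded in log file names, and the epoch-millisecond value shown in brackets in the Event Details display. The sketch below decodes both. The function names are hypothetical, the name pattern is inferred from the single example log_02242005_172230, and the epoch value used here is an illustrative number interpreted in UTC, not one read from a real RAPIDS log.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_log_name(name):
    """Decode log_MMDDYYYY_HHMMSS into a datetime (time zone unknown, so naive)."""
    return datetime.strptime(name, "log_%m%d%Y_%H%M%S")


def from_epoch_millis(millis):
    """Convert milliseconds since the epoch into a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


def millis_within_second(millis):
    """Return the millisecond offset inside the second (0 to 999)."""
    return millis % 1000


print(parse_log_name("log_02242005_172230"))           # 2005-02-24 17:22:30
print(from_epoch_millis(1161876199993).isoformat())    # 2006-10-26T15:23:19.993000+00:00
print(millis_within_second(1161876199993))             # 993
```

The last three digits of the bracketed number give the position of the event inside its second, which is what the millisecond-resolution diagram displays.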
Troubleshooting
Normally, when stopped, the RAPIDS server removes all LMMs, which in turn take care of all necessary
cleanup.
However, an experiment may sometimes not be stopped properly.
Often this happens when the commands used to stop applications (provided by a user in the
experiment configuration file) did not work properly.
In this case you should stop the experiment manually as follows:
- Log in to all machines used in the experiment and stop all DCAS applications started
during the experiment (streamers, detection algorithms, task generation and optimization
module etc.)
- Log in to all machines used in the experiment as rapids and stop LMMs:
>/usr/share/rapids/utils/bin/lmm_remover
- While logged in as rapids, remove the RAPIDS message queue, a data structure used for
collecting events:
>/usr/share/rapids/utils/bin/rmq_remover
The RAPIDS API
Provided with the RAPIDS framework is a set of API functions that
users can use in their applications to report different kinds of messages/events to the RAPIDS monitoring system.
Below are links to the RAPIDS C, C++ and Java API.
Connecting to the Real-Time Stream of RAPIDS Events
RAPIDS users can develop their own applications that process or visualize the data collected by RAPIDS.
Provided with the RAPIDS framework is a rapidsStream package that can be used to
connect to the real-time stream of RAPIDS events.
The RAPIDS visualization client and the fault detector client are two applications that use this package.
See the RAPIDS installation instructions for an explanation of how to download the code for these clients
as well as the package.
The documentation on the rapidsStream package can be found here.
Reporting Bugs
Please send an e-mail to Anna (abekkerm@ecs.umass.edu)
if RAPIDS has crashed, you have found a bug, or there is anything
else you would like to tell us :)