EXPLORE lets you interactively investigate and manipulate data sets. You can read in almost any ASCII data set as long as it contains only numerical data and observations are stored line by line. Several plot options exist and you can easily produce postscript files from your plots. Data manipulation can be done using the internal calculator tool or by calling external routines (i.e. almost any IDL procedure). You can bin your data into groups and compute basic statistics for each group or as a total. Just read through the following if you want to know more!
System requirements: EXPLORE was developed on a Unix platform, but it should run on Windows and MacOS platforms as well. If you run into trouble here, see the trouble shooting section below.
Any feedback on EXPLORE is strongly encouraged and may determine the development effort for future upgrades.
Copy the file "explore.tar.Z" into the directory of your choice, issue the uncompress command (under Unix), and then the command tar -xf explore.tar. This will extract all the files you need, including a few example files.
Before you can start using EXPLORE, you will possibly have to adjust
a few environment variables and modify your IDL startup file
(or create one if you don't have one):
The environment variables that I use (C Shell) are:
setenv IDL_DEVICE X
setenv IDL_STARTUP ~/IDL/idl_startup.pro
There are two entries in the idl_startup file that are of importance:
!path='~/IDL/EXPLORE:'+!path
This expands the default IDL path so that IDL finds all the EXPLORE subroutines even if you start it from another directory.
; load default colortable with first 16 colors defined
; as drawing colors
myct,27 ; EOS-B colortable loaded starting with index 17
This command keeps you from ending up with a black line on a black screen. You can load any other color table with myct if you specify its number as an argument, e.g. myct,14. You can also shrink the color table loaded on top of the drawing colors and extract a portion of it by specifying the NCOLORS and range keywords.
After you are done with all this, try ".compile explore" to see whether you can compile the main program and some subroutines.
Simply type explore and you should see the main window and an empty plot window (there is no need to issue a compile or .run statement before execution).
Alternatively, you can give EXPLORE a filename argument, and it will automatically try to load this file. In addition to the filename you may have to specify the delim, skp1, skp2, and/or autoskip keywords (see description in File menu below).
Example:
explore,'test.dat',delim=' ',skp1=3
Another option is to pass a pre-loaded data set and variable names list (called header) to EXPLORE. Note that the data array must be 2-dimensional and have variables as columns and observations as rows (i.e. data=fltarr(VARS,OBS)).
Example:
explore,data=data,header=header [,comments=comments]
(The comments keyword allows you to pass a string array which will be used as a file header when writing your data to a file.)
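As a rough illustration of the array layout, the following sketch builds a small synthetic data set at the IDL prompt and hands it to EXPLORE. The variable names and values are invented for this example; only the explore call with the data and header keywords is taken from the description above.

```idl
; Sketch only: synthetic data, variable names are made up for illustration.
nvars = 3                                ; number of variables (columns)
nobs  = 100                              ; number of observations (rows)
data  = fltarr(nvars, nobs)              ; layout is (variables, observations)

data[0, *] = findgen(nobs)               ; hypothetical variable TIME
data[1, *] = sin(findgen(nobs) / 10.)    ; hypothetical variable SIGNAL
data[2, *] = randomu(seed, nobs)         ; hypothetical variable NOISE

header = ['TIME', 'SIGNAL', 'NOISE']     ; one name per variable

explore, data=data, header=header
```

The header array should have as many entries as the first dimension of data, so that every column gets a name.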
Starting EXPLORE will always give you one copy of the main window. This is your major "operating console": all actions of EXPLORE are driven by selecting commands from the menus described below. Note that you can have more than one main window open at any time, each with its own data set and associated plot window (see the Window menu documentation for details).
On the left of the main window you see the main variable list. You can select a variable from this list and copy it to either the "X" or "Y" list. The "X" and "Y" lists contain all variables that will be plotted if you click on one of the options in the "PLOT" drop down menu (see below). Three buttons manage these lists: one copies a variable from the main list into the "X" or "Y" list, one removes the selected variable from the "X" or "Y" list, and one removes all variables from the respective list.
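The buttons labeled "File", "Window", "Data", "Plot", "Region", and "Info" are drop down menus - they are described in detail below. Finally you see a couple of option buttons (checkboxes), among them "generate postscript" and "overlay fit", which are explained in the Plot menu description.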
File
This menu contains the following file manipulation routines:
Read Data Read a new ASCII data set into the current main window (all other main windows remain unaffected). You will see a dialog box which allows you to enter a file name and specify the header delimiter (i.e. one ASCII character that can be used to separate variable names) and the position of the variable names line: skp1 is the number of lines to skip before the variable names line, and skp2 the number of lines to skip after the names line to the beginning of the (numerical) data.
Autoskip requires a special header format: the first line of the file must contain the total number of header lines, and the last header line must have the variable names (e.g. the NASA GTE format).
readdata routine which is available in my IDL library].
pickfile dialog which allows you to browse your file system and choose a file with your mouse. After you select a file you will see a preview window showing the first 20 lines of the data file, and you will return to the read data dialog box where you can enter the appropriate values for Delimiter and the skip options.
Edit comments command described below.
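To make the meaning of skp1, skp2 and the delimiter in the Read Data dialog more concrete, here is a minimal sketch of how a variable names line could be located in such a file. This is not EXPLORE's own reader: the procedure name read_names_sketch is hypothetical, and the code assumes an IDL version that provides STRSPLIT.

```idl
; Hypothetical helper, not part of EXPLORE: fetch the variable names line
; of an ASCII file using the skp1 / skp2 / delim conventions.
pro read_names_sketch, filename, names, delim=delim, skp1=skp1, skp2=skp2
   if n_elements(delim) eq 0 then delim = ' '
   if n_elements(skp1) eq 0 then skp1 = 0
   if n_elements(skp2) eq 0 then skp2 = 0

   line = ''
   openr, lun, filename, /get_lun
   for i = 0, skp1-1 do readf, lun, line    ; lines before the names line
   readf, lun, line                         ; the variable names line itself
   names = strsplit(line, delim, /extract)  ; split at the delimiter character
   for i = 0, skp2-1 do readf, lun, line    ; lines between names and data
   ; the file pointer now sits on the first line of numerical data
   free_lun, lun
end
```

For the file from the earlier example, a call such as read_names_sketch,'test.dat',names,delim=' ',skp1=3 followed by print,names would show which names you can expect in the main variable list.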
Save Data A similar dialog box appears and you can save the data from the current main window. This will store all data, including any hidden or non-visible data (see the Region description for the concept of visible and valid data). You can again specify a filename (default is your old name with an appended .new), and you can specify the variable names delimiter and whether or not to save comments (i.e. the complete header information). You can also browse through your file system and select the filename with the mouse.
You can also edit the header information before saving the file by invoking Edit comments from the save data dialog. This will display a simple text editor pre-loaded with the current header information. The actual variable names are replaced by a template [%VARIABLE NAMES%]. Note that, unlike editing the header information directly from the File menu, this will not change your header permanently.
Save Selected Data This command operates similarly to Save Data, but you will only save the variables that are currently selected in the "X" and "Y" windows and - if you made a data selection - only the selected data. If no variables and no data are selected, the complete data set will be saved.
Edit Comments This will display a simple text editor pre-loaded with the current header information. The actual variable names are replaced by a template [%VARIABLE NAMES%]. The header information is changed permanently. If you delete the [%VARIABLE NAMES%] template, no variable names are written out, but then EXPLORE will not be able to read your data again.
Done Close all windows and return to the IDL prompt. [Although IDL 5 supports an active command line while running a widget application, this is not implemented in EXPLORE. You can however access many IDL programs with the Call external procedure command in the Data menu.]
Window
Copy Window copies the active data window and reproduces the last plot you made. You get an independent copy of all the data from your old main window, so you can delete data or variables in one of them but keep them in the other one. You can also read in a new data file in one window while the other one still contains data from the first file. You will even keep your "X" and "Y" selection lists, as long as the variable names you selected are also found in the new file.
Close Window closes the active data window and its associated plotting window. Unlike with File-Done, EXPLORE remains active and you don't return to the IDL prompt unless you are closing the last window on the screen.
Data
Sort allows a hierarchical sort by up to three variables, each of them in ascending or descending (reverse) order. If more than 25 variables are defined in the main list, the appearance of the dialog box will change from drop down lists to simple lists with scroll bars.
Calculate This dialog allows computation of new variables or manipulation of existing ones. The formula on the top line is meant to give some help on the precedence of operators. For more help, check out the examples section.
You can give your variable any name (in the "Y" field), but be careful not to include the character that you are planning to use as delimiter for your header line when saving your work. If you type in the name of an existing variable, EXPLORE will ask you whether you really want to overwrite the existing variable before it performs the calculation. Note that there is one variable name with a special meaning: GROUP (case insensitive). If EXPLORE finds a variable with this name, it will change its plotting behaviour (for details see description below).
The fields labeled "A" and "B" allow specification of a scaling factor and an offset that will be applied to the expression within parentheses. "f" lets you select from a variety of functions (id [i.e. identity], 1/, ln, log, exp, qqnorm, trunc, round), "X1" is the first variable (or index in the data set, i.e. observation number), "OP" is an arithmetic operator ( +, -, *, / ), and "X2" is the second variable (or can be left blank).
In the second line you see a "where" clause that allows you to restrict the calculation to a subset of your data. Again you can select a variable or the data index and a logical operator (ge, le, gt, lt, ne, eq), and you can type in a threshold value. The calculation will only be performed for data that satisfy the criterion you specify here. If you create a new variable, the remaining values will be set to missing (-999.99); if you perform a calculation on an existing variable, these values will remain what they are.
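The effect of a Calculator entry with a where clause can be mimicked at the IDL prompt. The sketch below assumes that the dialog evaluates Y = A * f( X1 OP X2 ) + B, which is my reading of the field descriptions and not a statement about EXPLORE's internal code; all numbers are invented.

```idl
; Sketch only, not EXPLORE code. Assumed Calculator form:
;    Y = A * f( X1 OP X2 ) + B   where  W ge threshold
a  = 2.0                          ; scaling factor "A"
b  = 1.0                          ; offset "B"
x1 = [1., 2., 3., 4.]             ; invented values for "X1"
x2 = [4., 3., 2., 1.]             ; invented values for "X2"
w  = x1                           ; variable tested in the where clause

y = replicate(-999.99, n_elements(x1))   ; a new variable starts out as missing
ind = where(w ge 2.5, count)             ; observations that satisfy the criterion
if count gt 0 then y[ind] = a * alog(x1[ind] * x2[ind]) + b   ; f = ln, OP = *
print, y
```

Observations that fail the criterion keep the missing value code, which matches the behaviour described for newly created variables.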
Rename Variable select a variable in the main list, and you can give it a new name.
Delete Variable select a variable in the main list, and it will be deleted from the data set in memory (permanently and without any warning !)
Call external procedure allows you to execute self-written routines for data manipulation, statistics, etc. A dialog with two input lines allows you to provide a procedure name and the parameters and keywords to the routine. The default for parameters is data,header which will pass the actual data array and the variable names into your routine. Here is a summary of the variables that can be passed to an external routine:
data: 2-dimensional float array containing the current data (or a single value of 0 if no data is loaded). The data is arranged as (variables,observations).
header: string array with variable names
hidden: integer array indexing the hidden observations, or a value of -1 if no observations are hidden
select: integer array indexing the currently selected observations, or a value of -1 if no observations are selected
filename: the current filename associated with the data
comments: a string array containing the current file header information (variable names are replaced by the [%VARIABLE NAMES%] template)
delim, skp1, skp2, autoskip: parameters that determine the file header structure (see File-Read Data).
The routine is called with the IDL execute command, and the dialog box entries for procedure name and parameters are simply concatenated with a comma in between. Although this option is designed to work with procedures, you can also call functions by typing the complete calling sequence in the procedure name field (e.g. r=my_function(data)). Note however that the result value of the function is not evaluated by EXPLORE.
Call external procedure can also be very useful to perform simple numeric operations, e.g. to lump several constants into one in order to save repeated calls of the calculator. This is accomplished by calling print with your numerical operation as a parameter. You can also call help in order to get information on the data array, etc.
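A self-written routine for Call external procedure could look like the following sketch. The name col_means is hypothetical; the routine only assumes the data and header variables in the layout listed above, and it does not filter missing value codes.

```idl
; Hypothetical user routine, e.g. saved as col_means.pro in a directory on !path.
; Prints the plain mean of every variable (missing value codes are NOT removed).
pro col_means, data, header
   s = size(data)
   if s[0] ne 2 then begin                ; EXPLORE passes a scalar 0 without data
      print, 'No 2-dimensional data array available.'
      return
   endif
   for i = 0, s[1]-1 do begin             ; first dimension = variables
      x = reform(data[i, *])              ; all observations of variable i
      print, header[i], total(x) / n_elements(x)
   endfor
end
```

With col_means in the procedure name field and data,header in the parameters field, the two entries are joined with a comma, i.e. the call becomes col_means,data,header. In the same way, print in the name field and 2.69e19/1013.25 in the parameters field lumps two constants into one.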
Plot
This is the menu which allows you to actually produce one or more diagrams on your graphics display (and a postscript file as well if you checked the generate postscript box in the main window). The appearance of the plot will depend on the existence of a variable named GROUP (case insensitive). Observations that are currently selected (see Region menu) will be highlighted in all plots. This makes it easy to explore features of individual data points in many variables.
The plot menu consists of the following commands:
Matrix plots all variables currently in the "X" selection vs. all variables that are in the "Y" selection of the main window.
Pairs plots each variable in the "X" selection vs the corresponding variable in the "Y" selection. You must have the same number of variables in both lists. However, if you have only one variable selected in either "X" or "Y", this will be used in all the plots.
Single plots only one graph from the selected variables. If a variable is selected (i.e. highlighted) in either the "X" or "Y" list, this variable will be plotted on the corresponding axis. If no variable is selected, EXPLORE will pick the last one in both lists.
Select Fit you can choose from a variety of fit functions that will be displayed when you activate the "overlay fit" checkbox from the main window. [The fit portion of EXPLORE is still in a crude stage, suggestions are welcome!]
Postscript options shows a dialog box where you can tune the postscript output a little. A postscript file will be produced from the next plot command (i.e. matrix, pair, or single) that is issued after selection of the "generate postscript" checkbox on the main window. In fact, the next plot command will produce identical plots on the screen and in the postscript file (although not WYSIWYG).
Region
This menu gives you control over which observations to display and allows you to highlight ("select") individual observations. A region is a rectangular area in a single plot. However, you can select data from irregularly shaped regions using the Add to Selection command.
The following terms are important to understand how EXPLORE works:
Region-hide command
Region-select command.
Region menu:
Select First, a single plot will be generated with the variables selected according to the rules for Plot-single. Use the mouse to drag a selection rectangle (hold the left mouse button while dragging). After you release the mouse button, the original plot (matrix, pair or single) will be re-generated and you will see the selected points highlighted in all graphs. The actual appearance of the highlighted points depends on the existence of a variable named GROUP (see description below). Generally, non-highlighted points will be displayed in fainter colors and the highlighted points are increased in size. There is no interactive control over the appearance of points, but you can adjust the global_init routine in explore.pro to your needs (see point 4 of the trouble shooting section below).
If you invoke Select, any previous selection will be cancelled. In order to de-select your data, simply call Select and drag the selection rectangle in an empty space.
Add to Selection Same as Select, but selections made before will not be cancelled, so you can select different groups of data (e.g. along a diagonal line).
Invert All marked points will be unmarked and vice versa. Note: this will also select observations which are not visible in some of the plots because they contain missing data. This is important if you want to make a hierarchical selection (e.g. all data points in a certain geographical region with tracer concentrations greater than XYZ pptv). Here, you will have to select the geographical region first (since it will probably not contain missing values), then invert the selection and delete or hide the region data. If you select tracer and latitude first, you may throw out several data points that are in your geographical region but have missing values for tracer.
Hide The marked points will disappear from the plots and they will not show up in the statistics (see Info-Statistics below).
Unhide Lets you retrieve the hidden data. It will automatically appear as selected (highlighted) data, thereby cancelling any other selection.
Delete Region Data The marked data will be deleted from the data set. Unlike with Hide, there is no way to retrieve these data except by re-loading the ASCII file (and redoing all the calculations if you haven't saved them!).
Info
This menu contains commands to display the data in tabular format
or compute statistics:
Table displays a table widget with the currently selected variables and observations. If no observations are selected (highlighted), all observations will be shown.
Table with all variables Displays a table with selected observations but all variables.
Statistics produces a statistical information window with min, max, median, mean, standard deviation and number of valid observations for each variable. The display is broken down into the following categories:
Region-Select command
.mean and .median. Or you can "print" the statistical information into a text file, where you will have to pick a name with the pickfile dialog. The format of this file will be identical to the output in the statistical information window, except for an additional header line that contains the system date and time.
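If you want to cross-check a line of the statistics window by hand, the quantities listed above can be computed at the IDL prompt. The sketch uses invented values without missing data and assumes the sample standard deviation (division by N-1); I have not verified which definition EXPLORE itself uses.

```idl
; Sketch only: invented values, no missing data.
x = [2., 4., 4., 4., 5., 5., 7., 9.]
n = n_elements(x)                              ; number of valid observations
xmean = total(x) / n
xsdev = sqrt(total((x - xmean)^2) / (n - 1))   ; sample standard deviation (assumed)

; MEDIAN without the /EVEN keyword returns an element of x rather than
; the average of the two middle values when n is even.
print, min(x), max(x), median(x), xmean, xsdev, n
```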
Treatment of missing values
Missing values are often coded as specific numbers, and there is a great variety of codes in use. EXPLORE recognizes the following values as missing data: -999., -888., -777., -666., -555., -999.99, -999.9, -99.99, -9.99, -9.999. If you need to make changes to this selection, you will have to edit the global_init procedure in explore.pro.
Whenever EXPLORE itself generates missing values (as a result of a calculation made with the Calculator), it will produce a value of -999.99 . Note that the value of a missing data point is not changed by EXPLORE, so that you will have the same coding when you save your data again (except for those missing values produced by EXPLORE). This can be important if codes are also used to denote values outside the detection limits of an instrument, etc. These values will be treated as missing by EXPLORE but not altered.
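When you process the data outside of EXPLORE (for instance in a routine used with Call external procedure), the same codes have to be screened out by hand. The following sketch shows one way to do this; the procedure name missing_sketch and the sample values are invented, and the exact comparison EXPLORE performs internally may differ.

```idl
; Hypothetical demo procedure: mask the missing value codes listed above
; before computing a mean. The codes are left unchanged in the data.
pro missing_sketch
   missing = [-999., -888., -777., -666., -555., $
              -999.99, -999.9, -99.99, -9.99, -9.999]
   x = [1.5, -999.99, 2.5, -888., 3.0]          ; invented sample data

   valid = replicate(1B, n_elements(x))         ; 1 = valid, 0 = missing
   for i = 0, n_elements(missing)-1 do begin
      ind = where(x eq missing[i], count)
      if count gt 0 then valid[ind] = 0B
   endfor

   good = where(valid, ngood)
   print, 'valid observations: ', ngood
   if ngood gt 0 then print, 'mean of valid data: ', total(x[good]) / ngood
end
```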
The GROUP mechanism
With version 2 of EXPLORE, a feature has been added to distinguish between more than two groups of data (i.e. more than selected and not-selected). This mechanism is based on the existence of a variable named GROUP. Note that, while the variable name is case insensitive with respect to being found and used as grouping identifier, the case does matter when you re-define groups with the Calculator.
It should be noted that the GROUP mechanism is a "hidden" feature of EXPLORE, i.e. no menu will give you any indication that it exists.
A GROUP variable can be created like any other variable with the Calculator, or it may already be read from the file. It will also be saved as a normal variable when you save your data. GROUP may have up to 30 different values. [This limit actually only applies to plotting data. When you are only interested in a statistical table, you can have even more group values.] It is the group index, not its value, that determines the plot symbol and color.
A typical example would be to group your data in latitude bins: Use the Calculator twice to calculate GROUP as
GROUP = 0.05 * LATITUDE
GROUP = 20 * ROUND(GROUP)
This will give you latitude bands of 20 degrees centered around -80, -60, -40, ... Because of the multiplication with 20, your GROUP value will reflect the center latitude, which simplifies identification of data in a plot or statistical table.
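The binning into 20 degree latitude bands labelled by their center latitude can be tried out at the IDL prompt before you enter it in the Calculator. The latitude values below are invented for illustration.

```idl
; Sketch only: invented latitudes, binned into 20 degree bands.
latitude = [-83.2, -61.0, -38.5, 4.9, 12.0, 47.3]

group = 0.05 * latitude        ; scale so that one band spans one unit
group = 20 * round(group)      ; round to the band index, then scale back

print, group                   ; band centers: -80, -60, -40, 0, 20, 40
```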
While normal operation of EXPLORE is more or less black and white oriented, the introduction of GROUP opens the world of colors. The plot symbols and colors used for each group are defined in the global_init procedure in explore.pro (see trouble shooting, point 4).
A legend is added to the plot in the lower right corner if the number of groups does not exceed 9 (EXPLORE uses a slightly modified version of Ray Stern's leg routine).
Although rather simple, the GROUP feature is quite powerful. Many features can be dreamed up to facilitate the use of groups (e.g. display a fit for each group, select groups to be plotted, name groups, etc.). However, it will probably take a while before I get back to work on those, and many things can be done already if you know how EXPLORE works.
Examples for working with the Calculator
Using the built-in Calculator requires some thought about how to break your calculations into parts that can be entered in the dialog box. This section gives you some hints about how to proceed.
To set a variable to a constant value, set the factor A to zero and the offset B to the constant value (the example sets a variable named "const" to missing values).
LON = 360-LON where LON lt 0
(note the entries in the where section)
ratep = 1.e9 * 86400. * ratem / M
where M is the number of molecules per cm3. We will need several steps (several calls of the Calculator) in order to do this:
M = 273.15 * ( 1/ T )        ; T must be in K !
M = 2.65482e16 * ( M * P )   ; constant is 2.69e19/1013.25
                             ; TIP: use call external with
                             ; "print" as routine name.
Then, it's easy:
RATEP = 8.64e13 * ( RATEM / M )
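The three Calculator steps and the two lumped constants can be checked at the IDL prompt. The input values in this sketch are invented, and the units are assumptions on my side: T in K as stated above, P in hPa (suggested by the division by 1013.25), and ratem in molecules per cm3 per second.

```idl
; Sketch only: invented inputs, assumed units (T in K, P in hPa).
t     = 288.15                   ; temperature
p     = 1013.25                  ; pressure
ratem = 1.e5                     ; hypothetical rate, molecules per cm3 per second

m = 273.15 * (1. / t)            ; first Calculator step
m = 2.65482e16 * (m * p)         ; second step: molecules per cm3
ratep = 8.64e13 * (ratem / m)    ; third step

print, 2.69e19 / 1013.25         ; lumped constant of the second step
print, 1.e9 * 86400.             ; lumped constant of the third step
print, m, ratep
```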