EXPLORE - a widget based interactive data analyzing tool for IDL

VERSION 2.00 beta

EXPLORE lets you interactively investigate and manipulate data sets. You can read in almost any ASCII data set as long as it contains only numerical data and observations are stored line by line. Several plot options exist and you can easily produce postscript files from your plots. Data manipulation can be done using the internal calculator tool or by calling external routines (i.e. almost any IDL procedure). You can bin your data into groups and compute basic statistics for each group or as a total. Just read through the following if you want to know more!

System requirements: EXPLORE was developed on a Unix platform, but it should run on Windows and MacOS platforms as well (if you run into trouble there, see the Trouble Shooting section below).

Any feedback on EXPLORE is strongly encouraged and may determine the development effort for future upgrades.

[Installation] [Starting EXPLORE] [The main window]
[File Menu] [Window Menu] [Data Menu] [Plot Menu] [Region Menu] [Info Menu]
[Missing Data] [The Group mechanism] [Calculator examples] [Trouble Shooting]


Installation

Copy the file "explore.tar.Z" into the directory of your choice, uncompress it (under Unix: uncompress explore.tar.Z), and then unpack it with tar -xf explore.tar. This will extract all the files you need, including a few example files. Before you can start using EXPLORE, you may have to adjust a few environment variables and modify your IDL startup file (or create one if you don't have one):

The environment variables that I use (C Shell) are:

     setenv IDL_DEVICE X
     setenv IDL_STARTUP ~/IDL/idl_startup.pro

There are two entries in the idl_startup file that are of importance:

     !path='~/IDL/EXPLORE:'+!path

This expands the default IDL Path so that IDL finds all the EXPLORE subroutines even if you start it from another directory.

     ; load default colortable with first 16 colors defined 
     ; as drawing colors
     myct,27      ; EOS-B colortable loaded starting with index 17

This ensures that the drawing colors are defined, so you don't end up drawing a black line on a black background. You can load any other color table with myct if you specify its number as an argument, e.g. myct,14. You can also shrink the color table loaded on top of the drawing colors and extract a portion of it by specifying the NCOLORS and range keywords.

After you are done with all this, try ".compile explore" to see whether you can compile the main program and some subroutines.

Starting EXPLORE

Simply type explore and you should see the main window and an empty plot window (there is no need to issue a compile or .run statement before execution).

Alternatively, you can give EXPLORE a filename argument, and it will automatically try to load this file. In addition to the filename you may have to specify the delim, skp1, skp2, and/or autoskip keywords (see description in File menu below). Example:

          explore,'test.dat',delim=' ',skp1=3

Another option is to pass a pre-loaded data set and a list of variable names (called header) to EXPLORE. Note that the data array must be 2-dimensional, with variables as columns and observations as rows (i.e. data=fltarr(VARS,OBS)). Example:

          explore,data=data,header=header [,comments=comments]

(the comments keyword allows you to pass a string array which will be used as a file header when writing your data to a file)
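As a minimal sketch of the data=/header= route, relying only on the layout described above, the hypothetical function make_demo_data (not part of EXPLORE) builds a small synthetic data set with variables along the first dimension and observations along the second:

```idl
; Hypothetical helper, not part of EXPLORE: builds a synthetic data set
; in the (variables, observations) layout that EXPLORE expects.
function make_demo_data, nobs, header=header
  header = ['INDEX', 'X', 'Y']             ; variable names
  data = fltarr(n_elements(header), nobs)  ; data[variable, observation]
  data[0,*] = findgen(nobs)                ; observation number
  data[1,*] = randomu(seed, nobs)          ; uniform random values
  data[2,*] = 2.0 * data[1,*] + 1.0        ; Y depends linearly on X
  return, data
end
```

A call such as data = make_demo_data(100, header=header) followed by explore,data=data,header=header would then open EXPLORE on the synthetic set.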

The main window

Starting EXPLORE will always give you one copy of the main window. This is your main "operating console": all actions of EXPLORE are driven by selecting commands from the menus described below.

Note that you can have more than one main window open at any time, each with its own data set and associated plot window (see the Window menu documentation for details).

On the left of the main window you see the main variable list. You can select a variable from this list and copy it to either the "X" or "Y" list. The "X" and "Y" lists contain all variables that will be plotted if you click on one of the options in the "Plot" drop-down menu (see below). The buttons next to the lists copy a variable from the main list into the "X" or "Y" list, remove the selected variable from the "X" or "Y" list, or remove all variables from the respective list.

The buttons labeled "File", "Window", "Data", "Plot", "Region", and "Info" are drop-down menus; they are described in detail below. Finally, there are a few option buttons (checkboxes):

"overlay fit": compute a fit through the data and show it on all graphs. Default is a linear least squares fit, others can be selected from "Plot-Select Fit". The fit will always be performed on all valid and visible data points (see description of region commands below). [I am not too happy with the current fit implementation, but so far I rarely felt the need to improve on this - suggestions welcome!]
"generate postscript": The next plotting command issued will also produce your graph(s) as a postscript file. Default filename is "explore.ps", orientation is landscape. You can select a number of postscript options from the Plot-Postscript options menu. This button will be deselected automatically as soon as the postscript file is produced.
"x log" and "y log": display the data logarithmically on either x, y or both axes.
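The default "overlay fit" is described only as a linear least squares fit over all valid and visible points. A minimal sketch of that idea in plain IDL, assuming a precomputed per-observation flag usable (1 = valid and visible) and using the standard LINFIT function, could look like this; the name fit_visible is hypothetical and EXPLORE's own fit code may differ:

```idl
; Hypothetical sketch, not EXPLORE's code: straight-line least squares
; fit using only observations flagged as valid and visible.
;   usable : per-observation flag, 1 = valid and visible, 0 = otherwise
function fit_visible, x, y, usable
  idx = where(usable, cnt)
  if cnt lt 2 then return, [!values.f_nan, !values.f_nan]
  return, linfit(x[idx], y[idx])     ; [intercept, slope] of y = a + b*x
end
```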

File

This menu contains the following file manipulation routines:

Read Data Read a new ASCII data set into the current main window (all other main windows remain unaffected). You will see a dialog box which allows you to enter a file name and specify the header delimiter (i.e. one ASCII character that separates the variable names) and the position of the variable names line:

skp1 is the number of lines to skip before the variable names line, and
skp2 the number of lines to skip after the names line to the beginning of the (numerical) data.
Autoskip requires a special header format: the first line of the file must contain the total number of header lines, and the last header line must have the variable names (e.g. the NASA GTE format).

[These are keywords to the readdata routine which is available in my IDL library].
If you are not sure about the file name or header format of the file to read, you can click the file browsing button in the dialog. This will lead you to a pickfile dialog which allows you to browse your file system and choose a file with your mouse. After you select a file you will see a preview window showing the first 20 lines of the data file, and you will return to the read data dialog box where you can enter the appropriate values for Delimiter and the skip options.
Reading the data file will also retrieve the complete header information which can then be edited with the Edit comments command described below.

Save Data a similar dialog box appears and you can save the data from the current main window. This will store all data, including any hidden or non-visible data (see region description for the concept of visible and valid data).
You can again specify a filename (default is your old name with an appended .new) and you can specify the variable names delimiter and whether or not to save comments (i.e. the complete header information). You can also browse through your file system and select the filename per mouse.
You can also edit the header information before saving the file by clicking the Edit comments button. This will display a simple text editor pre-loaded with the current header information. The actual variable names are replaced by a template [%VARIABLE NAMES%]. Note that, unlike editing the header information directly from the File menu, invoking Edit comments from the save data dialog does not change your header permanently.

Save Selected Data This command operates similar to Save Data, but you will only save the variables that are currently selected in the "X" and "Y" windows and - if you made a data selection - only the selected data. If no variables and no data are selected, the complete data set will be saved.

Edit Comments This will display a simple text editor pre-loaded with the current header information. The actual variable names are replaced by a template [%VARIABLE NAMES%]. The header information is changed permanently. If you delete the [%VARIABLE NAMES%] template, no variable names are written out, but then EXPLORE will not be able to read your data again.

Done Close all windows and return to the IDL prompt. [Although IDL 5 supports an active command line while running a widget application, this is not implemented in EXPLORE. You can however access many IDL programs with the Call external procedure command in the Data menu.]

Window

Copy Window copies the active data window and reproduces the last plot you made. You get an independent copy of all the data from your old main window, so you can delete data or variables in one of them but keep them in the other one. You can also read in a new data file in one window while the other one still contains data from the first file. You will even keep your "X" and "Y" selection lists, as long as the variable names you selected are also found in the new file.

Close Window closes the active data window and its associated plotting window. Unlike File-Done, EXPLORE remains active, and you don't return to the IDL prompt unless you close the last window on the screen.

Data

Sort allows a hierarchical sort on up to three variables, each one in ascending or descending (reverse) order.
If more than 25 variables are defined in the main list, the appearance of the dialog box will change from drop down lists to simple lists with scroll bars.
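One way to get such a hierarchical sort in IDL, sketched here under the assumption that every key has one value per observation, is to replace each key by its dense rank and combine the ranks into a single mixed-radix key, so that one call to SORT orders by the first variable, then the second, then the third. The functions dense_rank and hsort_order are hypothetical, not EXPLORE's implementation; the detour through ranks is there because IDL does not guarantee that SORT keeps the order of identical elements.

```idl
; Hypothetical helper: dense rank 0..k-1 of each value (ties share a rank)
function dense_rank, key, descending
  s = sort(key)
  u = key[s[uniq(key[s])]]          ; distinct values, ascending
  r = value_locate(u, key)          ; position of each value in u
  if descending then r = n_elements(u) - 1 - r
  return, long64(r)
end

; Hypothetical helper: observation order for a sort on up to three keys.
;   desc : three-element flag array, 1 = descending for that key
function hsort_order, k1, k2, k3, desc=desc
  if n_elements(desc) eq 0 then desc = [0, 0, 0]
  n = n_elements(k1)
  key = dense_rank(k1, desc[0])
  ; every rank is < n, so key*n + rank keeps the earlier keys dominant
  if n_elements(k2) gt 0 then key = key * n + dense_rank(k2, desc[1])
  if n_elements(k3) gt 0 then key = key * n + dense_rank(k3, desc[2])
  return, sort(key)
end
```

With hypothetical column vectors lat, lon and alt, data = data[*, hsort_order(lat, lon, alt, desc=[0,1,0])] would reorder all observations accordingly.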

Calculate This dialog allows computation of new variables or manipulation of existing ones. The formula on the top line is meant to give some help on the precedence of operators. For more help, check out the examples section.
You can give your variable any name (in the "Y" field), but be careful not to include the character that you are planning to use as delimiter for your header line when saving your work. If you type in the name of an existing variable, EXPLORE will ask you whether you really want to overwrite the existing variable before it performs the calculation. Note that there is one variable name with a special meaning: GROUP (case insensitive). If EXPLORE finds a variable with this name, it will change its plotting behaviour (for details see the description below).
The fields labeled "A" and "B" allow specification of a scaling factor and an offset that will be applied to the expression within parentheses. "f" lets you select from a variety of functions (id [i.e. identity], 1/, ln, log, exp, qqnorm, trunc, rround), "X1" is the first variable (or index in the data set, i.e. observation number), "OP" is an arithmetic operator ( +, -, *, / ), and "X2" is the second variable (or can be left blank).
In the second line you see a "where" clause that allows you to restrict the calculation to only a subset of your data. Again you can select a variable or the data index, a logical operator (ge, le, gt, lt, ne, eq) and you can type in a threshold value. The calculation will only be performed for data that satisfies the criterion you specify here. If you create a new variable, the remaining values will be set to missing (-999.99), if you perform a calculation on an existing variable, these values will remain what they are.
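The where-clause rules above can be sketched as follows. This is a hypothetical helper, not EXPLORE code; it fixes f = id and OP = + for brevity, and it does not model how EXPLORE treats missing values in X1 or X2, which is not described here.

```idl
; Hypothetical sketch of the Calculator's "where" handling, not EXPLORE
; code. Computes y = a * (x1 + x2) + b only where mask is true.
;   old : values of an existing target variable; omit it for a new one
function calc_where, x1, x2, a, b, mask, old=old
  ; existing variable: start from its values; new variable: all missing
  if n_elements(old) gt 0 then y = old else y = replicate(-999.99, n_elements(x1))
  idx = where(mask, cnt)
  if cnt gt 0 then y[idx] = a * (x1[idx] + x2[idx]) + b
  return, y
end
```

For example, calc_where(x1, x2, 2., 1., x1 ge 2) mimics a new variable restricted by the clause X1 ge 2.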

Rename Variable select a variable in the main list, and you can give it a new name.

Delete Variable select a variable in the main list, and it will be deleted from the data set in memory (permanently and without any warning !)

Call external procedure allows you to execute self-written routines for data manipulation, statistics, etc. A dialog with two input lines allows you to provide a procedure name and the parameters and keywords to the routine. Default for parameters is set to data,header which will pass the actual data array and the variable names into your routine. Here is a summary of variables that are available to be passed outside:

data: 2-dimensional float array containing the current data (or a single value of 0 if no data is loaded). The data is arranged as (variables,observations).
header: string array with variable names
hidden: integer array indexing the hidden observations, or a value of -1 if no observations are hidden
select: integer array indexing the currently selected observations, or a value of -1 if no observations are selected
filename: the current filename associated with the data
comments: a string array containing the current file header information (variable names are replaced by the [%VARIABLE NAMES%] template)
delim, skp1, skp2, autoskip: parameters that determine the file header structure (see File-Read Data).
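As an illustration of a self-written routine, here is a hypothetical procedure my_colstats (not part of EXPLORE) that prints the number of usable values and the mean of each variable, optionally restricted to the selected observations. Entering my_colstats as the procedure name and data,header,select=select as parameters would make EXPLORE run my_colstats,data,header,select=select. For brevity it only skips the missing code -999.99 that EXPLORE itself generates, not the full list of missing codes.

```idl
; Hypothetical user routine for "Call external procedure", not EXPLORE code.
;   select : selected observation indices, or -1 if none are selected
;   means  : optional output, mean of each variable (-999.99 if no data)
pro my_colstats, data, header, select=select, means=means
  nvars = n_elements(header)
  means = fltarr(nvars)
  for i = 0, nvars-1 do begin
    col = reform(data[i,*])                    ; one variable, all observations
    if n_elements(select) gt 0 then begin
      if select[0] ge 0 then col = col[select] ; -1 means nothing selected
    endif
    ok = where(abs(col + 999.99) gt 1e-3, nok) ; skip EXPLORE's missing code
    if nok gt 0 then means[i] = mean(col[ok]) else means[i] = -999.99
    print, header[i], nok, means[i], format='(A12, I8, G14.6)'
  endfor
end
```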

The call to the external routine is realized via the execute command, and the dialog box entries for procedure name and parameters are simply concatenated with a comma in between. Although this option is designed to work with procedures, you can also call functions by typing the complete calling sequence in the procedure name field (e.g. r=my_function(data)). Note however that the result value of the function is not evaluated by EXPLORE.
Call external procedure can also be very useful to perform simple numeric operations, e.g. to lump several constants into one in order to save repeated calls of the calculator. This is accomplished by calling print with your numerical operation as a parameter. You can also call help in order to get information on the data array, etc.
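The concatenation can be sketched with a hypothetical helper build_call. That no comma is appended when the parameter field is empty is an assumption, since the text does not say what EXPLORE does in that case.

```idl
; Hypothetical sketch of how the two dialog entries become one statement.
; Assumption: no comma is appended when the parameter field is empty.
function build_call, proc, params
  if strtrim(params, 2) eq '' then return, proc
  return, proc + ',' + params
end
```

With print as the procedure name and 2.69e19/1013.25 as the parameter, execute(build_call('print', '2.69e19/1013.25')) runs print,2.69e19/1013.25, which is the "print" tip used in the Calculator examples below.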

Plot

This is the menu which allows you to actually produce one or more diagrams on your graphics display (and a postscript file as well if you checked the generate postscript box in the main window). The appearance of the plot will depend on the existence of a variable named GROUP (case insensitive). Observations that are currently selected (see Region menu) will be highlighted in all plots. This makes it easy to explore features of individual data points in many variables.
The plot menu consists of the following commands:

Matrix plots all variables currently in the "X" selection vs. all variables that are in the "Y" selection of the main window.

Pairs plots each variable in the "X" selection vs the corresponding variable in the "Y" selection. You must have the same number of variables in both lists. However, if you have only one variable selected in either "X" or "Y", this will be used in all the plots.

Single plots only one graph from the selected variables. If a variable is selected (i.e. highlighted) in either the "X" or "Y" list, this variable will be plotted on the corresponding axis. If no variable is selected, EXPLORE will pick the last one in both lists.

Select Fit you can choose from a variety of fit functions that will be displayed when you activate the "overlay fit" checkbox from the main window. [The fit portion of EXPLORE is still in a crude stage, suggestions are welcome!]

Postscript options shows a dialog box where you can tune the postscript output a little. A postscript file will be produced from the next plot command (i.e. Matrix, Pairs, or Single) that is issued after selection of the "generate postscript" checkbox on the main window. In fact, the next plot command will produce identical plots on the screen and in the postscript file (although not WYSIWYG).
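As a generic illustration of the underlying IDL mechanism (not EXPLORE's code), the hypothetical procedure ps_plot_demo switches to the PS device with the standard SET_PLOT and DEVICE routines, writes one plot in landscape orientation, and switches back to the previous device:

```idl
; Hypothetical sketch of writing a plot to a PostScript file in plain IDL.
pro ps_plot_demo, filename=filename
  if n_elements(filename) eq 0 then filename = 'explore_demo.ps'
  old_device = !d.name                 ; remember the current graphics device
  set_plot, 'ps'
  device, filename=filename, /landscape
  plot, findgen(10), psym=4, title='PostScript test'
  device, /close                       ; flush and close the file
  set_plot, old_device                 ; back to the previous device
end
```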

Region

This menu gives you control over which observations to display and allows you to highlight ("select") individual observations. A region is a rectangular area in a single plot. However, you can select data from irregularly shaped regions using the Add to Selection command.
The following terms are important to understand how EXPLORE works:

valid data: data that is not coded as missing (see description of missing values below)
visible data: data that are not hidden by the Region-hide command
(together, valid and visible data denote all observations that will be displayed in a plot of the respective variable(s) and that will be used to compute Statistics)
selected data: observations that have been highlighted with the Region-select command.
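The interplay of valid and visible data can be sketched with a hypothetical helper, not EXPLORE code. It assumes a per-observation missing flag has already been computed; for an x-y plot, that flag would be the OR of the flags of both variables.

```idl
; Hypothetical helper: indices of observations that are both valid
; (not coded as missing) and visible (not hidden).
;   missing : per-observation flag, 1 = coded as missing
;   hidden  : indices of hidden observations, or -1 if none are hidden
function valid_visible, missing, hidden
  usable = missing eq 0                    ; byte array, 1 = valid
  if hidden[0] ge 0 then usable[hidden] = 0b
  return, where(usable)                    ; -1 if nothing is left
end
```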

Here are the commands available in the Region menu:

Select First, a single plot will be generated with the variables selected according to the rules for Plot-single. Use the mouse to drag a selection rectangle (hold the left mouse button while dragging). After you release the mouse button, the original plot (matrix, pair or single) will be re-generated and you will see the selected points highlighted in all graphs. The actual appearance of the highlighted points depends on the existence of a variable named GROUP (see description below). Generally, non-highlighted points will be displayed in fainter colors and the highlighted points are increased in size. There is no interactive control over the appearance of points, but you can adjust the global_init routine in explore.pro to your needs (see point 4 of the trouble shooting section below).
If you invoke Select, any previous selection will be cancelled. In order to de-select your data, simply call Select and drag the selection rectangle in an empty space.

Add to Selection Same as Select, but selections made before will not be cancelled, so you can select different groups of data (e.g. along a diagonal line).

Invert All marked points will be unmarked and vice versa. Note: this will also select observations which are not visible in some of the plots because they contain missing data. This is important if you want to make a hierarchical selection (e.g. all data points in a certain geographical region with tracer concentrations greater than XYZ pptv). Here, you will have to select the geographical region first (since it will probably not contain missing values), then invert the selection and delete or hide the region data. If you select tracer and latitude first, you may throw out several data points that are in your geographical region but have missing values for tracer.

Hide The marked points will disappear from the plots and they will not show up in the statistics (see Info-Statistics below).

Unhide Lets you retrieve the hidden data. It will automatically appear as selected (highlighted) data, thereby cancelling any other selection.

Delete Region Data The marked data will be deleted from the data set. Unlike with Hide there is no way to retrieve these data, except re-loading the ASCII file (and redo all the calculations if you haven't saved them !).
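The selection commands can be pictured as operations on index arrays. The following hypothetical helpers (not EXPLORE code) sketch Select as a rectangle test, Add to Selection as a set union, and Invert as the complement over all observations; they assume that x and y hold only plotted values, i.e. that missing and hidden points have already been excluded, as they are in EXPLORE plots.

```idl
; Select: indices of points inside the rectangle (-1 if none)
function rect_select, x, y, x0, x1, y0, y1
  return, where(x ge x0 and x le x1 and y ge y0 and y le y1)
end

; Add to Selection: union of two index arrays (-1 means empty)
function add_selection, old, new
  if old[0] lt 0 then return, new
  if new[0] lt 0 then return, old
  all = [old, new]
  return, all[uniq(all, sort(all))]    ; sorted, without duplicates
end

; Invert: complement over all nobs observations, including those with
; missing values (-1 if everything was selected)
function invert_selection, sel, nobs
  flag = bytarr(nobs)
  if sel[0] ge 0 then flag[sel] = 1b
  return, where(flag eq 0)
end
```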

Info

This menu contains commands to display the data in tabular format or compute statistics:

Table displays a table widget with the currently selected variables and observations. If no observations are selected (highlighted), all observations will be shown.

Table with all variables Displays a table with selected observations but all variables.

Statistics produces a statistical information window with min, max, median, mean, standard deviation and number of valid observations for each variable. The display is broken down into the following categories:

Valid and visible observations: all observations that are not missing and would appear on a plot of that variable
Selected observations: all observations that are not a missing value and are highlighted with the Region-Select command
Observations in group [groupname]: one block of lines is displayed with statistics for valid and visible observations in each group.

Upon exiting the dialog you can optionally save mean and median values into data files which are EXPLORE compatible in format. These files will be named automatically using the current filename with appended suffixes .mean and .median . Or you can "print" the statistical information into a text file, where you will have to pick a name with the pickfile dialog. The format of this file will be identical to the output in the statistical information window, except for an additional header line that contains the system date and time.
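The per-variable numbers can be sketched with a hypothetical helper col_stats (not EXPLORE code), assuming it receives at least one valid and visible value. It uses MEDIAN with /EVEN, which averages the two middle values for an even count; whether EXPLORE does the same is not stated here.

```idl
; Hypothetical sketch: [min, max, median, mean, stddev, n] for the valid
; and visible values v of one variable (v must not be empty).
function col_stats, v
  n = n_elements(v)
  if n ge 2 then sd = stddev(v) else sd = !values.f_nan
  return, [min(v), max(v), median(v, /even), mean(v), sd, float(n)]
end
```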

Treatment of missing values

Missing values are often coded as specific numbers, and there is a great variety of codes in use. EXPLORE recognizes the following values as missing data:

     -999., -888., -777., -666., -555.,
     -999.99, -999.9, -99.99, -9.99, -9.999

If you need to make changes to this selection, you will have to edit the global_init procedure in explore.pro.
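A test against this list might look like the hypothetical function is_missing below (not the code in global_init). It compares with a small tolerance, an assumption made so that codes read as double precision still match the float literals; 1e-3 is well below the smallest gap between two codes (-9.99 and -9.999).

```idl
; Hypothetical sketch: flag (1b) every element of x that equals one of
; the missing-value codes listed above, within a tolerance of 1e-3.
function is_missing, x
  codes = [-999., -888., -777., -666., -555., $
           -999.99, -999.9, -99.99, -9.99, -9.999]
  flag = abs(x - codes[0]) lt 1e-3           ; same shape as x
  for i = 1, n_elements(codes)-1 do $
    flag = flag or (abs(x - codes[i]) lt 1e-3)
  return, flag
end
```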

Whenever EXPLORE itself generates missing values (as a result of a calculation made with the Calculator), it will produce a value of -999.99. Note that the value of a missing data point is not changed by EXPLORE, so that you will have the same coding when you save your data again (except for those missing values produced by EXPLORE). This can be important if codes are also used to denote values outside the detection limits of an instrument, etc. These values will be treated as missing by EXPLORE but not altered.

The GROUP mechanism

With version 2 of EXPLORE, a feature has been added to distinguish between more than two groups of data (i.e. more than selected and not selected). This mechanism is based on the existence of a variable named GROUP. Note that while the variable name is case insensitive with respect to being found and used as grouping identifier, the case does matter when you re-define groups with the Calculator.

It should be noted that the GROUP mechanism is a "hidden" feature of EXPLORE, i.e. no menu will give you any indication that it exists.

A GROUP variable can be created like any other variable with the Calculator, or it may already be read from the file. It will also be saved as a normal variable when you save your data. GROUP may have up to 30 different values. [This limit actually only applies to plotting data. When you are only interested in a statistical table, you can have even more group values.] It is the group index, not its value, that determines the plot symbol and color.

A typical example would be to group your data in latitude bins:
Use the Calculator twice to calculate GROUP as

     GROUP = 0.05 * LATITUDE
     GROUP = 20 * ROUND(GROUP)

This will give you latitude bands of 20 degrees centered around -80, -60, -40, ... Because of the multiplication with 20, your GROUP value will reflect the center latitude which simplifies identification of data in a plot or statistical table.
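The two Calculator steps, plus the step from group values to group indices, can be sketched with a hypothetical function lat_groups (not EXPLORE code). Numbering the groups in ascending order of their values is an assumption; the text only says that the index, not the value, selects symbol and color.

```idl
; Hypothetical sketch: 20-degree latitude bands and their group indices.
;   values : distinct group values, ascending (output)
;   index  : group index 0, 1, 2, ... of every observation (output)
function lat_groups, lat, values=values, index=index
  group = 20.0 * round(0.05 * lat)          ; both Calculator steps at once
  values = group[uniq(group, sort(group))]  ; distinct band centers
  index = lonarr(n_elements(group))
  for i = 0, n_elements(values)-1 do begin
    w = where(group eq values[i])           ; never -1: values come from group
    index[w] = i
  endfor
  return, group
end
```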

While normal operation of EXPLORE is more or less black and white oriented, the introduction of GROUP opens the world of colors. The plot symbols and colors used for each group are defined in the global_init procedure in explore.pro (see trouble shooting, point 4). A legend is added to the plot in the lower right corner if the number of groups does not exceed 9 (EXPLORE uses a slightly modified version of Ray Stern's leg routine).

Although rather simple, the GROUP feature is quite powerful. Many extensions could be imagined to facilitate the use of groups (e.g. display a fit for each group, select groups to be plotted, name groups, etc.). However, it will probably take a while before I get back to work on those, and many things can already be done if you know how EXPLORE works.

Examples for working with the Calculator

Using the built-in Calculator requires some thought about how to break your calculations into parts that can be entered in the dialog box. This section shall give you some hints about how to proceed.

Generate a new variable with a constant value: Set factor A to zero and offset B to the constant value (for example, setting B to -999.99 fills a new variable named "const" with missing values).
Scale observations with a linear calibration: set A to the calibration slope and B to the offset, with f = id and no second variable. (With more than 25 variables, the variable lists appear as lists with scroll bars instead of drop-down lists.)
Multiply two variables: for example, multiply a rate constant by an average OH concentration to get the decay rate of the respective hydrocarbon. If you want to convert the result from s^-1 to days^-1, multiply by 86400.
Convert negative longitudes to positive values: The formula for this is LON = 360 + LON where LON lt 0 (i.e. A = 1, X1 = LON, B = 360, and the where clause LON lt 0).
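Outside EXPLORE, the longitude recipe corresponds to the following hypothetical function lon_positive. Note that, like the plain Calculator recipe, it does not skip missing codes: a value of -999.99 would also satisfy LON lt 0 and be shifted.

```idl
; Hypothetical sketch of LON = 360 + LON where LON lt 0
function lon_positive, lon
  out = lon                        ; leave the input untouched
  idx = where(out lt 0, cnt)       ; the "where LON lt 0" clause
  if cnt gt 0 then out[idx] = 360. + out[idx]
  return, out
end
```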

Convert a rate in molecules/cm³/s to ppbv/day: The formula for this is ratep = 1.e9 * 86400. * ratem / M, where M is the number of air molecules per cm³. We will need several steps (several calls of the Calculator) in order to do this:
First, calculate M ( = 2.69e19 * (273.15/T) * (P/1013.25), with T in K and P in hPa ):

         M = 273.15 * ( 1/  T )           ; T must be in K !
         M = 2.65482e16 * ( M  *  P )     ; constant is 2.69e19/1013.25
                                          ; TIP: use call external with
                                          ; "print" as routine name.

Then, it's easy:

         RATEP = 8.64e13 * ( RATEM / M )
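The three Calculator steps collapse into one expression. As a check of the constants, a hypothetical function rate_ppbv_day (not part of EXPLORE) in double precision, assuming T in K and P in hPa as above:

```idl
; Hypothetical sketch: rate in molecules/cm^3/s -> ppbv/day
function rate_ppbv_day, ratem, t, p
  m = 2.65482d16 * (273.15d / t) * p   ; air density in molecules/cm^3
  return, 8.64d13 * ratem / m          ; 1e9 (ppbv) * 86400 (s per day)
end
```

At 273.15 K and 1013.25 hPa, M is about 2.69e19, so a rate of 2.69e10 molecules/cm³/s comes out at roughly 86400 ppbv/day, i.e. 1e-9 of the air per second.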

Trouble Shooting

I have a Windows or Macintosh system. What do I have to do to run EXPLORE ?
As far as I can see, you should be able to run it right away. But since I had no way of testing EXPLORE on different platforms, I would be very interested if you report problems.
I am running IDL 4.0 and EXPLORE won't operate!
Might well be true! Although I did not explicitly try to use version 5 syntax (e.g. square brackets for array indices), there may be a couple of statements that are not compatible with version 4.0. If someone comes up with a fix, I'd be happy to post a version 4.0 compatible EXPLORE version, but I will assume no responsibility and do no development on that version!
Now I got IDL 5.x and EXPLORE still won't come up with the main window!
Hmm! One possibility could be that you have old versions of some of my library programs sitting around in a directory that is searched prior to the EXPLORE directory. Try including the local path ('.') before any others in the !PATH variable and try running EXPLORE from your EXPLORE directory. If it still causes trouble, send me mail. Maybe we can figure it out together.
OK: I installed EXPLORE and it seems to run fine, but my colors are all messy. What did I do wrong?
The color handling of EXPLORE is based upon the routine MYCT.PRO which is distributed along with EXPLORE but *not called*. This has to be done prior to starting EXPLORE (or you can devise your own color table if you like). There is a side-issue related to this one: if you don't like the color assignment for the GROUP feature, you can go into the code (EXPLORE.PRO) and change the entries in PRO GLOBAL_INIT: gcols and gscols. The same holds for the symbols. At some point in the future, these settings may be read from a file or even become user-editable.


This page was last modified 04/09/1998

Martin Schultz / Harvard University / mgs@io.harvard.edu