Quick-Start Introduction to PDS Archiving
Suggestions for Checking Intelligibility
Intelligibility is usually easy to check, though sometimes checking
requires a bit of simple programming. While many formats, both common and
arcane, are ultimately intelligible, reviewers should keep in mind that
PDS requires these files to remain readable on a timescale of decades.
Thus, the preferred formats are those that are essentially simple and
logically consistent with the type of data being archived.
ASCII Data
Most often ASCII data take the form of a table. Reviewers have repeatedly
demonstrated a preference for ASCII format for tabular data, even when the
data were delivered in a binary format, because it is so easy
to visually inspect the data and determine that they are intact.
Thus, human-readability is usually a large factor in determining the
intelligibility of ASCII data sets.
The best test of intelligibility
is to print out a section of the file and examine it.
- If the data are intended to be used primarily via visual inspection
(a cross-identification list or small catalog, for example), is the
record size small enough that the file may be easily printed or
viewed with an editor?
- Was the file printable? Displayable on a computer screen?
Note: in cases of very long records, it may be convenient or
necessary to split the records vertically (i.e., the first 80 bytes
in one file, the second 80 bytes in the next, etc.) prior to printing
or displaying.
- Are data values appropriately aligned (i.e., decimal points aligned,
character values left-justified, etc.)?
- Is there a blank column separating fields?
- Are the data values appropriately formatted, especially fields
with exponent values?
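Several of the checks above can be partly automated before a printout is examined by eye. The following is a minimal Python sketch, not a PDS tool: the function names, the 80-byte default width, and the sample records are all hypothetical, and it assumes fixed-width records of printable ASCII.

```python
def check_ascii_table(lines, max_width=80):
    """Return simple intelligibility indicators for fixed-width ASCII records.

    `lines` is any iterable of text records; `max_width` is an assumed
    printable width, not a PDS requirement.
    """
    records = [ln.rstrip("\r\n") for ln in lines if ln.strip()]
    lengths = {len(r) for r in records}
    # Printable here means the ASCII range from space (32) to tilde (126).
    printable = all(32 <= ord(c) <= 126 for r in records for c in r)
    width = min(lengths) if lengths else 0
    # Columns that are blank in every record are candidate field separators.
    blank_cols = [i for i in range(width) if all(r[i] == " " for r in records)]
    return {
        "uniform_length": len(lengths) == 1,
        "printable": printable,
        "fits_width": bool(lengths) and max(lengths) <= max_width,
        "blank_columns": blank_cols,
    }


def split_vertically(record, width=80):
    """Split one long record into consecutive chunks of `width` bytes."""
    return [record[i:i + width] for i in range(0, len(record), width)]


if __name__ == "__main__":
    # Hypothetical two-record table with decimal points aligned.
    sample = ["  1  12.50  1.0E+03\n", " 10   3.25  2.5E-01\n"]
    print(check_ascii_table(sample))
    print([len(chunk) for chunk in split_vertically("x" * 200)])
```

A report of this kind only narrows down where to look. Whether values are sensibly aligned and formatted still has to be judged by examining the file itself.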
Binary Data
There are essentially two types of binary data: image and tabular.
Image data are two-dimensional and should be displayable with suitable software;
tabular data are either a simple vector, which can be plotted, or an
inhomogeneous array, which can be rendered as an equivalent ASCII table of values.
Given that ASCII is so often preferred for tables,
some nodes routinely convert inhomogeneous binary tables into ASCII tables
prior to ingest.
However, some data sets are so large or so clearly intended to be used as
input to a display or reduction routine that they are left as binary tables
in order to conserve disk space.
Either way, checking the intelligibility of binary data will almost always
involve some programming.
- For binary tables: is it possible to generate an ASCII equivalent?
- For linear (vector) data: is it possible to produce a graph or plot of the
primary datum?
- For image data: is it possible to display the image?
- In all cases, do the data values look real, or do they look like
noise?
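For a binary table, the first check above usually comes down to unpacking fixed-length records and writing them out as text. The sketch below uses Python's standard `struct` module. The record layout (a big-endian 4-byte integer followed by an 8-byte float), the column formats, and the function name are assumptions made for illustration; the real layout must come from the label or documentation delivered with the data.

```python
import struct


def binary_table_to_ascii(raw, record_format, column_formats):
    """Unpack fixed-length binary records and format each as one ASCII line.

    `record_format` is a struct format string describing one record;
    `column_formats` holds one str.format spec per field. Both are
    supplied by the caller from the data set's own documentation.
    """
    rec = struct.Struct(record_format)
    if len(raw) % rec.size != 0:
        # A partial trailing record suggests a wrong layout or a damaged file.
        raise ValueError(
            f"file length {len(raw)} is not a multiple of record size {rec.size}"
        )
    lines = []
    for values in rec.iter_unpack(raw):
        fields = [fmt.format(v) for fmt, v in zip(column_formats, values)]
        lines.append(" ".join(fields))  # one blank column between fields
    return lines


if __name__ == "__main__":
    # Hypothetical layout: big-endian int32 followed by float64 (12 bytes).
    layout = ">id"
    raw = struct.pack(layout, 1, 12.5) + struct.pack(layout, 10, 3.25)
    for line in binary_table_to_ascii(raw, layout, ["{:6d}", "{:12.4f}"]):
        print(line)
```

If the file length is not a whole number of records, or the unpacked values look like noise, the assumed layout or byte order is the first thing to question.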
Problems?
Problems encountered reading the data should be relayed to the discipline node
as soon as possible so that they can be resolved promptly.
(Clearly, an inability to read the data precludes the possibility of
determining their fitness for archiving.)