README to comspari and cdf2ascii, 20041118
==========================================

NAME

    comspari  - COMparison of SPectral And Retention Information.
                A program for the visual comparison of datasets.

    cdf2ascii - a program to convert mass spectrometric data files
                from netCDF (Andi-MS) format to ASCII format.

SYNOPSIS

    comspari [-g path/to/gnuplot] datafile [datafile2]

    cdf2ascii [-h] [OPTIONS] datafile.cdf [datafile2.cdf ...]

DESCRIPTION

    Comspari is software written to facilitate the analysis of
    "paired" samples, i.e. samples that are almost identical yet
    present some qualitative difference. The software was originally
    written [1] for the analysis of data files produced by GC/MS and
    LC/MS, but can be used for other techniques, too.

    Since fall 2003, the software has been fully operational, and it
    is being developed further. It has a command-line interface and a
    simple (yet very effective!) graphic display based on gnuplot.

    If you are not working with mass spectrometry, if you are neither
    chemist nor physicist, then this program is probably not for
    you ;-)

OBTAINING THE SOFTWARE

    The latest version can be obtained via http from:

        http://www.biomechanic.org/comspari/

LICENSE

    This program package and the documentation are copyrighted:

        cdf2msg is (C) 2001, 2003-2004 Joerg Hau
                and (C) 2002-2004 Jonathan Katz.

        comspari is (C) 2002-2004 Jonathan Katz
                and (C) 2003-2004 Joerg Hau.

    This program is free software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License as
    published by the Free Software Foundation; either version 2 of
    the License, or (at your option) any later version. See the file
    LICENSE for details.

    If you use this program (or *any* part of it) in another
    application, note that the resulting application also becomes
    GPL. In other words, GPL is a "contaminating" license.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    General Public License (file LICENSE) for more details.

FILES

    The distribution package usually contains the following files:

        README          The file you are reading.
        LICENSE         The license (GPL).
        cdf2ascii.c     The C source code for cdf2ascii.
        cdf2ascii       The stripped binary (executable) for Linux.
        cdf2ascii.exe   The stripped binary (executable) for
                        MS Windows (32-bit).
        comspari.c      The C source code for comspari.
        comspari        The stripped binary (executable) for Linux.
        comspari.exe    The stripped binary (executable) for
                        MS Windows (32-bit).

    To install, just copy the executables to any location you desire.

(RE-)COMPILING

    ... is normally not necessary, as you have all the executables
    at hand ... 95% of all users will now skip to the next
    section ;-)

    However, if you want or need to recompile comspari, it's easy.
    Use something along the lines of:

        gcc -Wall -O3 comspari.c -o comspari

    This works with gcc under Linux and under MinGW/Win32.

    To recompile cdf2ascii, you need a few more things:

    - The netCDF library. The current release of netCDF is available
      from Unidata at ftp.unidata.ucar.edu, in the /pub/netcdf
      directory. A number of pre-built libraries for different
      operating systems are located on the same server in the
      /pub/netcdf/contrib tree.

      For MS Windows, pre-compiled netCDF 3.4 libraries are kindly
      provided by Arlindo da Silva (NASA/GSFC 1998),
      ftp://niteroi.gsfc.nasa.gov/pub/djgpp/netcdf/.
      Read the README file!

    - The public part of the original ANDI-MS (netCDF) distribution,
      which is still available via anonymous FTP from
      ftp.sjo.appliedbiosystems.com. It contains the documents for
      the ANDI/MS specification, C source code for reading/writing
      ANDI/MS files, and test data sets from a number of
      manufacturers. According to David Stranz, this code dates from
      1994 and has not been updated since.

    First, build (if necessary) and install the netCDF libraries,
    e.g. in /usr/local. Put ms10aux.c, ms10enum.c and ms10io.c
    together with cdf2ascii.c into one directory. Then, compile the
    auxiliary files to object code:

        gcc -c -O3 ms10aux.c ms10enum.c ms10io.c

    Compile and link with cdf2ascii (put this all on one line):

        gcc -O3 -Wall -o cdf2ascii cdf2ascii.c ms10aux.o ms10enum.o
            ms10io.o /usr/local/lib/libnetcdf.a

RUNNING THE PROGRAM

    The program is invoked by typing its name and that of the
    filesets to be read on the command line:

        comspari /path/to/my/data1 /path/to/my/data2

    Note: Use only the "base filename" (e.g. "out/ko", not
    "out/ko_0224.chro").

    There are some command-line options:

        -h      Shows a short help screen and exits.

        -g /path/to/gnuplot
                If gnuplot is not in your PATH, you can specify its
                location using the "-g" switch. Note that under
                MS Windows you have to specify the "pgnuplot.exe"
                file (not "gnuplot.exe"), as MS Windows is not a
                POSIX compliant environment.

    A typical MS Windows batch file to visualise ONE datafile would
    look like this:

        comspari.exe -g c:\apps\gnuplot\pgnuplot.exe %1

    You simply drag-and-drop a datafile onto this batch file.

QUICKSTART

    0. Get some data files in netCDF format. For a start, either use
       the sample files provided, or get CDF files from your data
       acquisition package:

       - Micromass MassLynx: Start > Programs > MassLynx >
         DataBridge.
       - Finnigan Xcalibur: "Roadmap View" > Tools > File Converter.
       - Finnigan ICIS2: ICIS Executive > File > Convert >
         "DAT to ANDI-MS".
       - Agilent ChemStation: File > "Export MS Data to AIA"; there
         is also a batch mode from the same menu.
       - Sciex Analyst: Reportedly there is a macro available upon
         request from Sciex.
       - (others? Please contribute if you can ...)

    1. Convert the sample files for use by COMSPARI:

           cdf2ascii wt.CDF ko.CDF

       By default, the converted data are written into a directory
       "out", located under the current working directory.

    2.
       Start COMSPARI and give it the two data file "prefixes"
       (these are the sample file names WITHOUT their extension, but
       WITH the directory name):

           comspari out/wt out/ko

       The program opens the data files and enters a "dialog mode".
       Type "q" to quit, "h" for help, and follow the instructions
       on the screen ;-)

    3. The program makes use of gnuplot to display data. In the
       display window you can:

       - zoom in with the right mouse button;
       - while the cursor is on the display window, type "p" to
         unzoom to the previous zoom level, "n" to re-zoom, "u" to
         fully unzoom, "h" for help. See the gnuplot help file (!)
         for details;
       - annotate the current cursor position using the middle mouse
         button (to erase an annotation, use Ctrl-MB2).

    [ FIXME: much much more to be added ;-) ]

WORKING WITH MASS LISTS

    As of 20040813, comspari supports the use of external mass lists.
    This can be used to reduce the list of "interesting" mass traces,
    a feature that is particularly useful when you already have
    preprocessed data.

    One of the authors (J.H.) uses this to work with CODA-processed
    datasets [2] from MSGraph [3]. Here, the list contains one pair
    of m/z and MCQ values per line. The MCQ determination is
    performed with MSGraph, then the two MCQ files are combined:

        sed 1,4d ~/data/msgraph/ko.mcq | sed 's/.$//' | sort -n -k 1,1 > ko.mcq
        sed 1,4d ~/data/msgraph/wt.mcq | sed 's/.$//' | sort -n -k 1,1 > wt.mcq
        join wt.mcq ko.mcq > wt_ko.mcq

    (Note: The first "sed" command drops the first four lines of
    each file. MSGraph is a DOS program, so the files contain CR-LF
    at the end of the lines; the second "sed" command strips the
    trailing CR, which changes these CR-LF to LF.)

    The resulting file 'wt_ko.mcq' is then read into comspari, using
    the 'r' command. If the file contains MCQ data, you can use 'Q'
    to change the MCQ threshold. Pressing 'ENTER'/'b' will then bring
    you to the next/previous mass trace that is above this MCQ
    threshold.

    Clear the mass list by 'w' (without a parameter).

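    Where sed, sort and join are not at hand (e.g. on MS Windows),
    the same combination step can be sketched in Python. This is only
    an illustration and not part of comspari: the function names are
    made up for this example, and it assumes, as the commands above
    do, that each MCQ file has four leading lines to skip and then
    one whitespace-separated "m/z MCQ" pair per line.

```python
def read_mcq(lines, skip=4):
    """Parse MCQ lines into a dict {m/z (as text): MCQ value}.

    Assumption: the first `skip` lines are not data (the sed command
    above deletes lines 1-4). str.split() also discards the trailing
    CR-LF of DOS files.
    """
    table = {}
    for raw in list(lines)[skip:]:
        fields = raw.split()
        if len(fields) >= 2:
            table[fields[0]] = float(fields[1])
    return table


def join_mcq(first, second):
    """Keep only m/z values present in both tables, like join(1).

    Returns (m/z, MCQ in first, MCQ in second) tuples, ordered
    numerically by m/z.
    """
    common = sorted(set(first) & set(second), key=float)
    return [(mz, first[mz], second[mz]) for mz in common]


def above_threshold(rows, threshold=0.9):
    """List the m/z values where at least one MCQ exceeds threshold."""
    return [mz for mz, q1, q2 in rows if q1 > threshold or q2 > threshold]


if __name__ == "__main__":
    # Tiny in-memory stand-ins for wt.mcq and ko.mcq (hypothetical data).
    header = ["header line\r\n"] * 4
    wt = header + ["57 0.20\r\n", "100 0.95\r\n"]
    ko = header + ["57 0.91\r\n", "100 0.10\r\n"]
    for mz, q_wt, q_ko in join_mcq(read_mcq(wt), read_mcq(ko)):
        print(mz, q_wt, q_ko)
```

    To process real files, pass open file objects to read_mcq() and
    write the joined rows, one per line, to 'wt_ko.mcq'.
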
    Footnote: If you want a list of the m/z values where at least one
    of the two m/z traces has an MCQ > 0.9, just run an awk script
    like this:

        awk '{ x=0.9; if (($2 > x) || ($3 > x)) print $1 }' wt_ko.mcq

ACKNOWLEDGEMENTS

    - David Stranz (for pointing me to the initial ANDI-MS
      distribution and code)
    - Herbert Kratky (for useful hints on compiling)
    - Arlindo da Silva (for making his pre-compiled libraries
      publicly available)
    - Unidata (for making all the nice netCDF stuff available)
    - Ethan Merritt (for valuable gnuplot hints)
    - Steve Cepa (for various manufacturers' CDF conversions)

HISTORY

    [ to be completed ]

    2004-03-25: Updated documentation about obtaining netCDF files
                (JHa).
    2004-08-13: Added possibility to read list of "interesting" mass
                traces. Supports import of files with MCQ values
                (JHa).
    2004-11-15: Added Win32 cross-compilation options.
    2004-11-16: Updated this documentation.
    2004-11-18: cdf2msg now supports multiple files on the command
                line.

BUGS

    If you find any bugs or if you want to contribute, please contact
    the authors by e-mail (see below). We try to maintain the program
    as far as time and work allow.

REFERENCES

    [1] J. E. Katz, J. Hau, D. S. Dumlao, and S. Clarke. "A New
        Technique (COMSPARI) to Facilitate the Identification of
        Minor Compounds in Complex Mixtures by GC/MS and LC/MS:
        Tools for the Visualisation of Matched Datasets".
        J. Amer. Soc. Mass Spectrom. 15 (2004), 580-584.

    [2] W. Windig, J. M. Phalp, and A. W. Payne. "A Noise and
        Background Reduction Method for Component Detection in
        Liquid Chromatography/Mass Spectrometry". Analytical
        Chemistry 68 (1996), 3602-3606.

    [3] J. Hau and M. Linscheid. "MSGraph: A Program for the Display
        of LC/MS Data". Spectrochimica Acta 48B (1993), E1047-E1051.
        Updated version available at .

AUTHORS

    Joerg Hau, joerg.hau (at) dplanet.ch
    Jonathan Katz, jonathan (at) ucla.edu

    Thank you for your interest, and ... have fun!

--eof--