The earliest inklings
I am a bibliophile - someone with a deep and abiding interest in books (and journals). See my biography, the early years. Even as an undergraduate in physics at MIT, and especially as a PhD student there, I had personal subscriptions to some major physics and chemical physics journals. All this grew out of my desire to understand my potential career: what did physicists actually do in their daily work and research? Physics Today, a popular but specialist magazine, was my primary guide, along with the good books I found in the (open) Reserve stacks.
Awakening - A summer at the Marine Biological Laboratory
Fueled by a biology research grant from the NSF, my family and I were able to spend two full summers, 1980 and 1981, at the Marine Biological Laboratory (MBL) in Woods Hole, MA. In the second summer, someone had arranged a speaker series in which luminaries addressed issues in biological research. The most notable presence was Eugene Garfield, who developed the famous Science Citation Index. I had some good conversations with him.
A major epiphany occurred during that second summer at the MBL. After attending a small symposium there on the future of digital libraries, I realized that an excellent goal for research would be to use computers not for lab data acquisition, but to mine the huge body of knowledge already residing in the public literature. As a biologist, I knew that the text and the figures in the literature, taken together, were what needed to be analyzed. So my work on this started more than ten years before the first widely used browser, Mosaic, was developed.
The awakening came when I reflected on the use of computers in biology, something I had been heavily involved in since 1972 at the University of Colorado, and had continued in my lab at Illinois from 1975 to 1985. At that time, the primary use of computers was to gather and process data in the laboratory. Yet a huge amount of "data" was lying there, begging for the application of computers: the content of the entire biological literature, summarizing literally millions of experiments done over the years. Because of my earlier interest in linguistics, I could see two streams coming together: the literature itself, and the application of computers to extract knowledge from it. Because I approached the problem from linguistics, and because I was a working scientist, a biologist, I thought of this problem in the large, rather than in terms of extracting specific items such as genes, proteins, and their interactions, which are very much the focus of current work in BioNLP.
I have always been interested in graphics and design. My first major professional effort in this domain was the Galatea system at the University of Chicago, 1973-1975. It overlaid movie film images with interactive computer graphics. The system was used to study microcinematography images of the cellular slime mold Dictyostelium discoideum. The centers of the cells could be tracked to follow their overall motion, and even their shapes could be outlined, on a frame-by-frame basis. Probably the most important result of all this work came later, in my 1982 paper: Futrelle, R. P., Traut, J., & McKee, G. W. (1982). Cell Behavior in Dictyostelium discoideum: Aggregation responses to localized cyclic AMP pulses. J. Cell Biology, 92, 807-821. That paper described work done in my lab at the University of Illinois at Urbana-Champaign.
To pursue all this, I shut down my biology lab at Illinois and joined the College of Computer Science at Northeastern University in early 1986, where I still am. In 1989 I obtained a large research grant from the NSF, which established the Biological Knowledge Laboratory (BKL), which I still head. There was a lot of work to do, because full-text papers and their figures were not readily available in electronic form. So we scanned figures and had an assistant trace over them, producing electronic versions that we could then analyze.