A concise introduction to fundamental methods for finding and extracting relevant information from the ever-increasing amount of available biomedical text.
The introduction of high-throughput methods has transformed biology into a data-rich science. Knowledge about biological entities and processes has traditionally been acquired by thousands of scientists through decades of experimentation and analysis. The current abundance of biomedical data is accompanied by the rapid creation and dissemination of new information. Much of this information and knowledge, however, is represented only in text form: in the biomedical literature, lab notebooks, Web pages, and other sources. Researchers' need to find relevant information in these vast amounts of text has created a surge of interest in automated text analysis.
In this book, Hagit Shatkay and Mark Craven offer a concise and accessible introduction to key ideas in biomedical text mining. The chapters cover such topics as the relevant sources of biomedical text; text-analysis methods in natural language processing; the tasks of information extraction, information retrieval, and text categorization; and methods for empirically assessing text-mining systems. Finally, the authors describe several applications that recognize entities in text and link them to other entities and data resources, support the curation of structured databases, and make use of text to enable further prediction and discovery.