Cheminformatics

Chemists in the Field

Rajarshi Guha

Research Scientist,
National Institutes of Health

Kevin Theisen

President, iChemLabs, LLC.

Overview

Cheminformatics (sometimes referred to as chemical informatics or chemoinformatics) focuses on storing, indexing, searching, retrieving, and applying information about chemical compounds. Because he used pattern recognition and data visualization, techniques central to cheminformatics, to create the periodic table, Dmitri Mendeleev (1834–1907) is often credited as one of the earliest cheminformatics scientists. In addition to chemical names and formulas, cheminformatics specialists search for and retrieve information about physical properties, three-dimensional molecular and crystal structures, spectroscopic signatures, chemical reaction pathways, molecular functional groups and docking sites, and other parameters, some of which require advanced information storage and retrieval technologies.

In addition to real-life compounds and structures, chemical databases and libraries can contain hypothetical compounds and structures. These are useful for guiding exploratory research or for suggesting routes toward desired functionalities that do not yet exist. Virtual libraries can contain information on likely synthesis methods and the predicted stability of the reaction products. Virtual screening uses chemical and physical principles to identify and evaluate the best candidates for a particular property or reaction from large libraries of real and virtual molecules. The most promising candidates can then be verified in laboratory studies.
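One common way to carry out virtual screening is similarity searching: each molecule is encoded as a fingerprint (a set of structural features), and library entries are ranked by how many features they share with a known active compound. The Python sketch below assumes the fingerprints have already been computed; the integer feature IDs and compound names are invented for illustration, and the Tanimoto coefficient is only one of several similarity measures in use. A real workflow would generate fingerprints from structures with a cheminformatics toolkit.

```python
"""Similarity-based virtual screening sketch (hypothetical fingerprints)."""


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient: shared bits / bits set in either fingerprint."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def screen(query: set[int], library: dict[str, set[int]], top_n: int = 3) -> list[tuple[str, float]]:
    """Score every library entry against the query and return the top_n, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Each set holds the "on" bits of a made-up structural fingerprint.
    query = {1, 4, 7, 9, 12}
    library = {
        "cand_A": {1, 4, 7, 9, 13},
        "cand_B": {2, 5, 8},
        "cand_C": {1, 4, 7, 9, 12, 15},
        "virtual_D": {1, 4, 20, 21},  # a hypothetical (virtual) molecule
    }
    for name, score in screen(query, library):
        print(f"{name}: {score:.2f}")
```

Run as a script, this prints cand_C: 0.83, cand_A: 0.67, and virtual_D: 0.29, showing how real and virtual molecules can be ranked side by side before the best candidates go to the laboratory.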

Cheminformatics programmers solve problems like defining data archival protocols that enable searching and comparison of whole spectroscopic profiles rather than numerical lists of peak positions. They have developed and standardized methods for representing three-dimensional molecular structures that enable searches for compounds having specific features.
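To see why whole spectroscopic profiles carry more information than lists of peak positions, consider two spectra whose peaks sit at the same positions but with different relative intensities. The sketch below assumes each spectrum arrives as (position, intensity) pairs, bins it into a fixed-length vector over an IR-like 400 to 4000 range with an arbitrary 10-unit bin width, and scores the pair with cosine similarity. The spectra, range, and bin width are hypothetical; production spectral libraries use their own storage formats and matching scores.

```python
"""Compare whole spectra as binned intensity vectors (illustrative sketch)."""
import math


def bin_spectrum(points, start, stop, width):
    """Sum (position, intensity) pairs into fixed-width bins covering [start, stop)."""
    n_bins = int(math.ceil((stop - start) / width))
    bins = [0.0] * n_bins
    for position, intensity in points:
        if start <= position < stop:
            index = min(int((position - start) // width), n_bins - 1)  # guard float edge cases
            bins[index] += intensity
    return bins


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical spectra: identical peak positions, different relative intensities.
    query = [(1050.0, 0.9), (1720.0, 0.1), (2950.0, 0.5)]
    stored = [(1050.0, 0.1), (1720.0, 0.9), (2950.0, 0.5)]
    peaks_match = {p for p, _ in query} == {p for p, _ in stored}
    score = cosine(bin_spectrum(query, 400, 4000, 10), bin_spectrum(stored, 400, 4000, 10))
    print(peaks_match, round(score, 3))
```

The script prints True 0.402: comparing peak positions alone reports a match, while the whole-profile score flags that the two spectra differ.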

Much work remains to be done on integrating chemical information from multiple sources and analytical techniques, extracting and mining information from journal articles, and applying this information to understanding and predicting how chemicals affect entire systems (including human bodies).

Polymer scientists, for example, would like to extract short lists of candidates with the right combination of tensile strength, melting point, toughness, molecular weight, and sustainable synthesis processes from a database containing tens of thousands of real and hypothetical compounds and molecular structures. Pharmaceutical chemists can screen large combinatorial databases for the candidate molecules most likely to provide a specific functionality or therapeutic effect.
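Extracting such a short list amounts to applying several property windows at once. The sketch below is a hypothetical illustration: the field names, threshold values, and candidate records are made up, and toughness and synthesis sustainability are left out for brevity. In practice, property values for virtual compounds would come from predictive models rather than measurements.

```python
"""Shortlist polymer candidates that satisfy several property windows at once."""
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    tensile_strength_mpa: float
    melting_point_c: float
    mol_weight_kda: float
    hypothetical: bool  # True for predicted (virtual) structures


# Property windows as (min, max); illustrative values, not real design targets.
CRITERIA = {
    "tensile_strength_mpa": (40.0, float("inf")),
    "melting_point_c": (150.0, 260.0),
    "mol_weight_kda": (50.0, 500.0),
}


def shortlist(candidates, criteria=CRITERIA):
    """Return names of candidates whose every listed property lies inside its window."""
    keep = []
    for c in candidates:
        if all(lo <= getattr(c, prop) <= hi for prop, (lo, hi) in criteria.items()):
            keep.append(c.name)
    return keep


if __name__ == "__main__":
    pool = [
        Candidate("P1", 55.0, 220.0, 120.0, False),
        Candidate("P2", 30.0, 200.0, 90.0, False),  # fails the strength window
        Candidate("P3", 70.0, 300.0, 150.0, True),  # fails the melting-point window
        Candidate("P4", 48.0, 180.0, 75.0, True),
    ]
    print(shortlist(pool))
```

With these made-up records the script prints ['P1', 'P4']; real and hypothetical candidates pass through the same filter.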

Typical work duties include the following:

  • Develop methods for data mining and performing statistical analysis of large datasets
  • Develop methods for archiving and retrieving data on molecular structures, reaction pathways, molecular interactions, or other phenomena
  • Collaborate with laboratory researchers to solve problems using data searching and retrieval
  • Collaborate with researchers in various fields to integrate information from a variety of disciplines and sources
  • Identify chemical property classifications and trends from large databases using mathematical techniques
  • Teach courses and train students

 

Education

Cheminformatics specialists sometimes get their start in other specializations, including organic chemistry, biochemistry, pharmaceuticals, and computational chemistry. However, most cheminformatics scientists work in this area full-time rather than as a sideline to another specialty.

The ACS Division of Computers in Chemistry (COMP) has published an annual demographic survey since 2009. In December 2012 (the most recent survey conducted), the most common degree discipline for this division's 2,276 members was physical chemistry (244 members), followed by organic chemistry (101) and medicinal/pharmaceutical chemistry (67).

Of the one-third of respondents who reported their highest academic degree, Ph.D.s outnumbered those with master's degrees by more than five to one and those with bachelor's degrees by more than six to one.

Some institutions offer formal degree programs in cheminformatics, including graduate certificates, bachelor's and master's degrees, and Ph.D.s. Cheminformatics specialists should gain a strong background in chemical structures and principles, as well as computer science and mathematical techniques including statistical analysis and graph theory.
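Graph theory matters because a molecule can be treated as a graph, with atoms as nodes and bonds as edges. As one small example, the number of independent rings in a structure (its cyclomatic number) equals the number of bonds minus the number of atoms plus the number of connected fragments. The sketch below assumes a hydrogen-suppressed skeleton given as an atom count and a list of bonds, each bond listed once and bond order ignored; the function name is illustrative.

```python
"""Treat a molecule as a graph and count its independent rings."""
from collections import deque


def ring_count(n_atoms, bonds):
    """Cyclomatic number: bonds - atoms + connected components (fragments)."""
    adjacency = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), 0
    for start in range(n_atoms):
        if start in seen:
            continue
        components += 1  # found a new fragment; explore it breadth-first
        queue = deque([start])
        seen.add(start)
        while queue:
            atom = queue.popleft()
            for nbr in adjacency[atom]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return len(bonds) - n_atoms + components


if __name__ == "__main__":
    benzene = [(i, (i + 1) % 6) for i in range(6)]  # six-membered carbon ring
    hexane = [(i, i + 1) for i in range(5)]  # open chain, no rings
    naphthalene = benzene + [(4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]  # fused second ring
    print(ring_count(6, benzene), ring_count(6, hexane), ring_count(10, naphthalene))
```

The script prints 1 0 2. This count is also the size of the smallest set of smallest rings (SSSR) that many cheminformatics tools report.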

Useful computer skills include clustering methods, database design and use, and knowledge of UNIX and Structured Query Language (SQL), which is used to query and maintain relational databases. Proficiency in the high-level programming language Python is important for developing and applying image manipulation techniques and for extracting useful information from large databases.
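Much everyday SQL work amounts to retrieving structured records that match given criteria. The sketch below uses Python's built-in sqlite3 module with a toy, hypothetical compound table to retrieve compounds in a molecular-weight range; the table and function names are illustrative, and the molecular weights are approximate. Real chemical databases also provide structure-aware searches (such as substructure or similarity search) that plain SQL does not offer on its own.

```python
"""Store compound records in SQLite and retrieve them with SQL (toy schema)."""
import sqlite3


def build_db(rows):
    """Create an in-memory compound table and load (name, formula, mol_weight) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compound ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " formula TEXT NOT NULL,"
        " mol_weight REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO compound (name, formula, mol_weight) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def by_weight(conn, low, high):
    """Return (name, mol_weight) for compounds with low <= weight <= high, lightest first."""
    cur = conn.execute(
        "SELECT name, mol_weight FROM compound"
        " WHERE mol_weight BETWEEN ? AND ? ORDER BY mol_weight",
        (low, high),
    )
    return cur.fetchall()


if __name__ == "__main__":
    data = [
        ("benzene", "C6H6", 78.11),
        ("ethanol", "C2H6O", 46.07),
        ("caffeine", "C8H10N4O2", 194.19),
        ("aspirin", "C9H8O4", 180.16),
    ]
    conn = build_db(data)
    print(by_weight(conn, 50, 190))
```

Run as a script, it prints [('benzene', 78.11), ('aspirin', 180.16)]. The ? placeholders let sqlite3 bind the values safely instead of pasting them into the query string.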

A Ph.D. is necessary for careers that involve doing science using cheminformatics technology. Postdoctoral appointments can be useful in developing advanced skills or in making the transition from another chemical specialty. Persons with bachelor's or master's degrees can focus on the IT side, developing and maintaining software, performing computations, and supporting facility users and customers.

 

Licenses

Licenses are not generally required in cheminformatics. Certificates demonstrating proficiency in specific software packages and programming languages may be advantageous.

 

Workspace

Cheminformatics work is a full-time profession. Often, a laboratory chemist will have some expertise in using informatics techniques, but will work with a specialist on designing calculations and interpreting the results.

Cheminformatics specialists may be required to train others in data mining and analysis methods, software packages, and computer visualization capabilities. They may teach courses or provide individualized instruction on programming and the use of commercial or proprietary software tools. They may also make presentations at conferences or conduct workshops.

In addition to helping users make the best use of existing resources, cheminformatics specialists may drive the field forward by developing software, archival standards, visualization capabilities, and new ways to apply and analyze results. They may work with computer scientists, who develop advanced hardware and software capabilities for working on especially large or complex problems. They may participate in consortia to develop and apply new capabilities and establish standards for reliability and accuracy to bring a new software tool to a broader user community.

Company mergers often create a need for persons skilled in merging databases having different formats and data entry and retrieval criteria. Being able to navigate the human issues involved (building consensus around a standard format and training users on a new interface) is an important skill. Many cheminformatics job ads emphasize teamwork skills.

 

 

Technical Skills

Analytical
  • Familiarity with computer modeling and statistical analysis methods
  • Various levels of programming, code development, and software architecture skills
  • Problem-solving skills and an interest in solving basic and applied research problems
  • Critical thinking and analytical skills to design and validate calculations and searches and to analyze and interpret results
  • Ability to work with and extract information from large datasets
  • Skills in adapting and integrating computer software to solve new categories of problems

Communication
  • Written and oral communication skills to explain findings and share results with scientists and nonscientists
  • Ability to work on research teams, integrating input from colleagues from various fields
  • Ability to create effective visual representations of models and datasets

Background knowledge
  • Understanding of theoretical principles, including chemical bonding and reaction pathways
  • Understanding of biological, medical, or materials principles
  • Ability to apply mathematical methods to solving science problems
  • Proficiency in commonly used programming languages and software products

Career Path

Graduates with bachelor's or master's degrees can sometimes find employment as IT specialists or in user or customer support roles; however, such positions are few and offer limited opportunities for advancement. Students or recent graduates with an interest in research may complete one or more internships to help them select an area of specialization for a graduate degree.

Research and supervisory positions generally require a doctoral degree, often with several years of postgraduate experience. Postdoctoral fellowships are one way to gain this experience, although this is not an absolute requirement.

Cheminformatics specialists may pursue a teaching and/or research career in academia, work in industry, or work for a government agency or national laboratory. They may also support and train facility users, students, or customers or develop new capabilities for collecting and analyzing data.

Because the field is so new and small, few current managers or program administrators have come from a cheminformatics background. However, this career path is a possibility for those with an interest and the required interpersonal and management skills.

 

Future Employment Trends

Job opportunities in industry include companies in the polymer and chemical industries. Some of the demand for cheminformatics specialists comes from pharmaceutical companies dealing with the high volume of data generated by high-throughput laboratory and computational modeling techniques. Some large chemical companies use cheminformatics techniques as well, but they often contract the services of specialists working outside the company.

Government jobs are available at the national laboratories and various government agencies, including those doing research on toxicology and biomedicine. Additional opportunities are available at universities and research laboratories in other countries, especially in Europe. Government and academic jobs are affected by federal spending priorities and the availability of grant money.

One of the major influences on job demand is genomics, including genome sequencing projects and the use of personal genomics to treat diseases. This work requires linking cellular pathways and genomic mutations with chemical drug information, such as DrugBank, and with toxicity information, such as the Comparative Toxicogenomics Database. Google and other web search providers, which must index and search chemical information, also influence hiring trends.

According to the ACS COMP Division's annual demographic survey, membership in this division ranged between 2,200 and 2,300 from 2009 to 2012, with 2,276 members in December 2012. Of these, 526 list computational/computers/informatics as their field of research (up from 327 in 2010 and 494 in 2011).

Slightly over half (53%) of COMP division members worked at academic institutions in 2012, up from 47% in 2010. The next most common employers in 2012 were pharmaceutical companies (9%), "other" manufacturers (6%), and government entities (5%). The most common job titles reported were professor/instructor/administrator, chemist/scientist, and graduate student.

 

Is This Career a Good Fit for You?

Cheminformatics specialists work at the intersection of several scientific fields. They must be technically skilled but must also help their colleagues extract useful knowledge that advances their research. They should have a solid grasp of chemical principles (and possibly biology, pharmaceuticals, or polymers), be skilled in using and developing software and designing databases, and know how to apply statistical analysis and other mathematical methods. However, they must also be good communicators and listeners to ensure that the information they extract is useful and relevant. Often, they must advise colleagues who are not familiar with the capabilities available to them and consult with them in designing an effective approach.

 

 

Opportunities

Some demand for cheminformatics specialists comes from pharmaceutical companies dealing with a high volume of data. Large companies often outsource their cheminformatics work. Many job opportunities exist with companies that design software specifically for computational chemistry and build chemistry databases for efficiently storing and searching chemical structures and for indexing chemical patents, structures, and formulas.

Government jobs are available at the national laboratories and various government agencies, including those doing research on toxicology and biomedicine. Additional opportunities are available at universities and research laboratories in other countries, especially in Europe.

Education

A strong background in chemical structures and principles is necessary, as well as computer science and mathematical techniques including statistical analysis and graph theory. Proficiency in computer programming and applying commercial software is advantageous, as is the ability to manage and extract information from large databases.

A Ph.D., and often several years of postdoctoral experience, is necessary for research careers. Persons with bachelor's or master's degrees can focus on the IT side, developing and maintaining software, performing computations, and supporting research scientists, facility users, and customers.

Salaries

SalaryList.com listed the following mean annual salaries in March 2015:

  • Informatics fellows (postdocs): $40,000
  • Analysts, specialists, and (staff) scientists: $62,000 to $68,000
  • Research scientists: $83,000 to $96,000
  • Managers in informatics: $115,000