Rajarshi Guha, Research Scientist
- National Institutes of Health
- Ph.D., Computational Chemistry, Pennsylvania State University, State College, PA
Rajarshi Guha compiles and analyzes data generated by his experimental colleagues to help discover new drugs and drug combinations for treating rare diseases and cancers. He has worked at the National Institutes of Health for the past six years. "Working at the NIH provides a collaborative & multidisciplinary environment with direct access to lots of ‘fresh’ chemogenomic data," he says.
His graduate studies focused on quantitative structure–activity relationships (QSAR), which prepared him for his current work. QSAR is an area of computational chemistry that develops statistical models of biological activities and physical properties based on chemical structure information, in contrast to theoretical chemistry approaches such as quantum mechanical methods.
During graduate school, Guha started a blog about cheminformatics, programming, and related data science issues. The blog caught the attention of researchers at Indiana University, who contacted him about a postdoctoral position there that would let him continue his research. He accepted, and the postdoc led to a position as a visiting assistant professor at Indiana University.
Although Guha enjoyed his academic research, he wanted to be part of a larger, collaborative environment and to work closely with experimentalists, who could provide a broader range of data on which to base his models and computational methods. He now works at the National Center for Advancing Translational Sciences (NCATS), a relatively new institute within the NIH. There he develops novel methods to analyze structure–activity data and collaborates on projects involving high-throughput screening of small molecules, RNAi, and drug combinations. His contributions to these projects include modeling assay activity, developing novel methods to identify true hits, and visualizing large collections of structures and their activities across different biological systems. His research helps his experimental colleagues identify promising structure–activity trends and focus on the subsets of molecules that merit further, more in-depth testing.
I provide cheminformatics and data science support to high-throughput screening (HTS) programs. This includes designing computational filters to select compounds for secondary follow-up (cherry picks), suggesting structural modifications to improve the potency and reduce the undesirable side effects of promising compounds (lead optimization), calculating molecular properties, modeling structure–activity relationships, and so on.
I also develop computational infrastructure (databases, application programming interfaces, user interfaces) to support new screening paradigms, as well as new algorithms and predictive models for small-molecule data analysis. This work includes modeling activity cliffs (pairs of chemically similar compounds with very different activities), characterizing chemical spaces (high-dimensional abstract spaces that “contain” all compounds sharing a specified set of properties), and integrating structure data with other molecular and genomic data.
I use a Mac running OS X, on which I work in Emacs (a text editor popular with programmers), the R statistical programming environment, and the Python and Java programming languages.
I work from home and travel to my office in Rockville, MD, at least once a month. In that sense, my work environment is set up exactly the way I want it. I attend two or three conferences a year, generally because I'm presenting my research.
The NCATS lab environment is very open and collegial, with very little hierarchy and few imposed schedules. It's definitely easygoing: as long as the work gets done, there are no issues. Once in a while we have deadlines and have to put in a few extra hours, but in general there is no extraneous pressure.
The range of biological systems and conditions (rare diseases, infectious diseases, etc.) that I get to work on through collaborations is unparalleled. This provides many learning opportunities and exposure to a lot of cutting-edge science. Importantly, I have the freedom to talk about my own work, which includes blogging about it, presenting it at conferences, and releasing source code and data. I also have the flexibility to pursue ideas that may not be immediately applicable to ongoing projects, and once they do become relevant, I can collaborate with experimental colleagues to run validation experiments. The environment is open enough that I can chat with chemists and biologists to expand my knowledge of non-computational topics.
I am also active in the open-source software community. I have been involved with the Chemistry Development Kit (CDK), an open-source Java library, for about ten years, and I am a co-founder of the Blue Obelisk, an informal group of chemists who promote open data, open source, and open standards.
Attention to detail, and the ability to translate a problem statement into a computational or mathematical form.
Learn how to code well: so well that, when the time comes, you think about the problem you are solving rather than about writing the code. Also, pay attention to statistics. And, it goes without saying, know your chemistry.
Being involved with the ACS Division of Chemical Information has been very useful: it has helped me build a network of colleagues, given me opportunities to take part in leadership activities, and let me give back to the cheminformatics community.