Bioinformatics

Bioinformatics is an interdisciplinary field that develops and improves upon methods for storing, retrieving, organizing and analyzing biological data. A major activity in bioinformatics is to develop software tools to generate useful biological knowledge. Bioinformatics has become an important part of many areas of biology (it is sometimes called ‘the New Biology’). In experimental molecular biology, bioinformatics techniques such as image and signal processing allow extraction of useful results from large amounts of raw data. In genetics and genomics, it aids in sequencing and annotating genomes and their observed mutations. It plays a role in the text mining of biological literature, in the development of biological and gene ontologies to organize and query biological data, and in the analysis of gene and protein expression and regulation. Bioinformatics tools aid in the comparison of genetic and genomic data and, more generally, in the understanding of evolutionary aspects of molecular biology. At a more integrative level, it helps analyze and catalogue the biological pathways and networks that are an important part of systems biology. In structural biology, it aids in the simulation and modeling of DNA, RNA, and protein structures as well as molecular interactions.

Bioinformatics draws on many areas of computer science, mathematics and engineering to process biological data. High-performance machines, parallelization and distributed computing make it possible to read in and process biological data at a much faster rate than before. Databases and information systems are used to store and organize biological data. Analyzing biological data may involve algorithms from artificial intelligence, soft computing, data mining, image processing, and simulation.

Question: What is Bioinformatics and why should it be one of the Multidisciplinary Research Partnerships at Ghent University?

Answer: Biology is currently undergoing a revolution, mainly due to recent improvements in the technologies used to obtain biological data, such as sequence data. To give but one example: 10 years ago, sequencing a human genome took about 10 years of work by hundreds of scientists and cost more than 3 billion dollars. Today, we can sequence a human genome for less than 5000 dollars, in less than one day, with one technician. This means that we now have data available in quantities we could only dream of a few years ago. It also means that we can now study things we could not study before. However, in order to cope with such amounts of data, we need to apply the latest techniques from computer science, running on the fastest computers.

Question: Bioinformatics is a truly interdisciplinary science. Why is this?

Answer: Currently there is a tsunami of data generation. All this information needs to be compiled, stored, and analyzed in novel ways. Novel algorithms and software tools need to be developed, new statistics need to be applied, and new biological questions can be asked, all of which requires an interdisciplinary group of people working closely together. For instance, in our Multidisciplinary Research Partnership (MRP), we have representatives from 9 different departments and 5 different faculties, all of whom bring their particular expertise to the MRP.

Question: How can bioinformatics change the world?

Answer: Bioinformatics has numerous application domains. It will, for instance, revolutionize the medical field, because it will make personalized medicine possible. Given that the genome (the blueprint of life) of everyone can (and will) get sequenced, we can start investigating why some people are susceptible to certain diseases while others are not, and why a certain treatment works for some and not for others. In agriculture, we can for instance study why certain plants are more resistant to drought than others, an important issue in light of global warming and climate change. More fundamental questions in evolutionary biology, such as whether Neanderthals and the ancestors of modern humans ever had sex (the answer is yes), can only be addressed with bioinformatics or computational biology.

Question: Where do you see the contributions of a statistician to bioinformatics or why does bioinformatics need statisticians?

Answer: In genomics applications, we measure millions of features on each individual and we want to learn whether they are linked to interesting biological processes, e.g. disease, crop yield, or resistance to drought stress. The genetic code, however, varies from individual to individual, because these sequences are copied from generation to generation and random mutations and rearrangements arise during the copying process. Most of this variation will not be linked to the biological process under study; it can be nonfunctional or linked to other phenotypes (eye color, height, …). On top of that, the measuring method also introduces technical variability, so when we reanalyze the same sample we will obtain slightly different results. If we ignore the biological and technical variability, we are bound to find many genomic patterns in the data just by chance. By using statistics, we can learn from data while quantifying, controlling and communicating uncertainty. This allows us to attach a degree of belief to the claim that an observed genomic pattern is linked to a biological process, and to balance the number of false positive and false negative results. Unfortunately, standard statistical methods cannot deal with the scale and complexity of today’s genomic data. Hence, novel statistical tools have to be developed (by N2N partner Lieven Clement) and implemented as efficient computational algorithms (by N2N partner Jan Fostier), and their results have to be integrated with biological knowledge (by N2N partners Kathleen Marchal and Yves Van de Peer).
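
The risk of finding patterns just by chance can be made concrete with a small simulation. The sketch below is an illustration only, not the method used by the N2N partners. It assumes that every feature is unrelated to the phenotype, so each p-value is drawn uniformly between 0 and 1, and it uses the Benjamini-Hochberg procedure as one common way to control the false discovery rate. The function names and the 100,000-feature count are hypothetical.

```python
import random


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of features declared significant at FDR level alpha."""
    m = len(p_values)
    # Rank features from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every feature ranked at or below the cutoff is declared significant.
    return sorted(order[:cutoff_rank])


def simulate_null_p_values(n_features, seed=0):
    """Assumption: no feature has a real effect, so p-values are uniform on [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_features)]


p_null = simulate_null_p_values(100_000)

# Naive approach: call every feature with p < 0.05 a discovery.
naive_hits = sum(p < 0.05 for p in p_null)

# Corrected approach: control the false discovery rate across all features.
bh_hits = benjamini_hochberg(p_null, alpha=0.05)

print(f"naive discoveries (all false): {naive_hits}")
print(f"discoveries after Benjamini-Hochberg: {len(bh_hits)}")
```

With no real signal present, roughly 5% of the features still pass the naive p < 0.05 cutoff, whereas the corrected procedure is designed to declare few or none of them significant.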

Question: What is the contribution of computer scientists to bioinformatics or why does bioinformatics need them?

Answer: The recent revolution in technology enables us to produce molecular data at a rate that we could only dream of 10 years ago. All this raw data needs to be stored, moved and processed, which leads to some non-trivial engineering problems, given the vast volumes of data that we’re talking about. Just to give you an idea: the raw sequence data for a single human genome amounts to several hundred gigabytes. It is no longer possible to analyze such amounts of data on a personal laptop. Therefore, we make use of supercomputers, which are powerful enough to do this job. This is easier said than done, because special software needs to be developed that can actually make use of such supercomputers in an efficient way.
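
A back-of-the-envelope calculation gives a feel for these volumes. The sketch below is a rough estimate under stated assumptions, none of which come from the interview: a genome of about 3.1 billion bases, 30-fold sequencing coverage, 150-base reads stored as uncompressed FASTQ text, and a hypothetical 60-byte header per read. The function names are illustrative.

```python
def raw_fastq_bytes(genome_bases, coverage, read_length, header_bytes):
    """Estimate the size in bytes of uncompressed FASTQ output for one genome."""
    n_reads = genome_bases * coverage // read_length
    # Each FASTQ record has four lines, each ending in a newline:
    # header, sequence, a '+' separator, and one quality character per base.
    bytes_per_read = (header_bytes + 1) + (read_length + 1) + 2 + (read_length + 1)
    return n_reads * bytes_per_read


def transfer_hours(n_bytes, megabits_per_second):
    """Time in hours to move n_bytes over a link of the given speed."""
    bytes_per_second = megabits_per_second * 1_000_000 / 8
    return n_bytes / bytes_per_second / 3600


# All four parameter values below are assumptions chosen for illustration.
size = raw_fastq_bytes(
    genome_bases=3_100_000_000,
    coverage=30,
    read_length=150,
    header_bytes=60,
)

print(f"estimated raw data: {size / 1e9:.0f} GB")
print(f"transfer time at 1 Gbit/s: {transfer_hours(size, 1000):.2f} hours")
```

Under these assumptions a single genome already takes up a few hundred gigabytes before any analysis starts, and the figure scales linearly with coverage and with the number of individuals sequenced.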

Question: What are the most important skills a bioinformatician needs to have?

Answer: First of all, you need to be a generalist rather than a specialist. You need to know a bit of everything, but nothing in too much detail. To give an example: wet lab scientists typically have a very detailed view of biology. Biological systems have evolved randomly into complex systems that cannot be captured in a few rules, and there are more exceptions than fixed rules in biology. Engineers, on the other hand, model systems, and these models depend on predefined rules. As a bioinformatician you need to keep both parties happy: a good formalization of a biological question should reduce the problem to a model that is mathematically tractable but that still captures the intricacies the biologist is interested in. Finding the right assumptions and simplifications builds on this generic knowledge. This generic knowledge is also key to the scientific intuition you need to have as a bioinformatician. With bioinformatics we can solve research questions that could not be addressed before. There is so much data out there that when you integrate it all, you can tackle research questions that go far beyond what was accessible to, or could be dreamt of by, a single person or even a single lab. The difficulty often lies in defining these novel research questions or hypotheses that no one has ever thought of before. This again requires very good interdisciplinary knowledge of how the data was generated, what type of information it contains, how it can be integrated, etc.

Question: If Bioinformatics becomes so prominent and is referred to as ‘the new biology’, how will this affect the more classical wet lab science?

Answer: Of course, without data there is no bioinformatics. But there is indeed a tendency for data generation to become increasingly robotized or outsourced. As a consequence, wet lab scientists have more time left to spend on the design of their experiments and will be confronted at a much earlier stage with the analysis of their data, and the problems related to this. What do you hope to get out of your data, how will you synthesize all these data, what is the hypothesis you want to formulate, and so on? So rather than focusing on a single gene, they will need to start thinking more globally and look at the bigger picture, and that is what the term ‘new biology’ refers to. This is now often considered the problem of the bioinformatician, but obviously, the wet lab scientist of the future will have to adopt at least some of those skills. So the distinction between a bioinformatician and a wet lab scientist (systems biologist) will become fuzzier, and in the coming decades we expect that about one third of the people in the life sciences will be bioinformaticians or will at least use some sort of bioinformatics in their research. However, although genome hackers and number crunchers can learn a lot from the loads of data generated, wet lab work will always be necessary. Bioinformatics is also often about making predictions, but of course these will still need to be validated in the lab. On the other hand, for some specific fields such as evolutionary research, bioinformatics is often sufficient or even the only way to obtain results.

Question: Suppose I would like to become a bioinformatician, what do I need to study? Do I need to have master’s degrees in Biology and Mathematics and Computer Science?

Answer: Well, as a matter of fact, none of us studied “bioinformatics”. We all started in one field, let’s say Biology, and subsequently became more and more acquainted with statistics, informatics et cetera throughout our research. This is not an ideal scenario, as our backgrounds are still fairly limited. Therefore we started a master’s program in Bioinformatics at Ghent University, which will allow Bachelor students from the different fields to obtain a very broad background without losing their own specialty. So there is not one type of “bioinformatician”, but a whole range of people, each looking at bioinformatics from a different angle.

Question: You’re a young professor at the start of a hopefully long career in a booming field of science. In your wildest dreams, where do you think bioinformatics will be at the end of your career?

Answer: That’s of course quite hard to say, but if I look at medicine, I think we’ll not only have our personal genome sequencer but also our genome app. This should allow us to screen our body for disease each week and immediately suggest the best therapy. As all these profiles will be combined online, novel knowledge will be generated almost automatically through bioinformatics. In ecology, all individuals of endangered species can be sequenced on the spot, enabling us to regenerate the species if they were to go extinct after all. Breeding of crop species will be revolutionized by the instant availability and interpretation of the blueprint of each interesting plant...