Teach for Bioinformatics

Since I have started writing about bioinformatics, I should first explain what bioinformatics actually is! Bioinformatics is an interdisciplinary field of science that helps us understand and analyze biological data. Being interdisciplinary, it combines biology, computer science, information technology, biochemistry, biophysics, mathematics and statistics. Bioinformaticians develop methods and software tools to analyze and interpret data. Databases are just as important for analysis: raw data comes from various sources, e.g. sequences and molecular structures, and may be empirical or predicted. Because the amount of data is enormous, specialized databases are maintained to store and analyze it, each serving a particular task.

The term “Bioinformatics”: The meaning of the term has changed since it was first coined. In 1970, Paulien Hogeweg and Ben Hesper coined it to refer to the study of information processes in biotic systems. The director of NCBI (National Center for Biotechnology Information) has called Margaret Oakley Dayhoff the mother and father of bioinformatics.

Importance of Bioinformatics: Bioinformatics is an important part of present-day biology. Bioinformatics techniques have become necessary in everyday experiments in biology. Extracting data from anything, whether a simple image or a complex protein structure, requires bioinformatics tools. The field experienced explosive growth starting in the mid-1990s, driven largely by the Human Genome Project and by rapid advances in DNA sequencing technology. Major areas of work in the field include data mining, machine learning, gene prediction (gene finding), genome assembly, drug design, drug discovery, sequence alignment, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, phylogenetic analysis, etc.

Coming to the programming side of bioinformatics: first, the biological problem is analysed using problem-solving techniques. Algorithms are then developed and implemented as code ready for execution. Tasks that were previously done manually take less time when done with these programs, and the in-silico results are also precise. The algorithms draw on graph theory, artificial intelligence, soft computing, data mining, image processing, and computer simulation. Software packages developed for bioinformatics include BioPerl, BioJava, EMBOSS, Biopython, .NET Bio, Bioclipse, etc.

Sequence analysis is an important part of bioinformatics, in which sequences are analysed to extract meaningful information such as protein-coding genes, RNA genes, regulatory genes, etc. Sequences can be compared to measure the similarity between them. Comparing two or more sequences can reveal similarities and relationships between species, so it also underlies phylogenetic analysis. Sequence assembly, genome annotation, different sequencing techniques, comparative genomics, analysis of mutations, etc. are important aspects of sequence analysis.
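To make the idea of comparing sequences a little more concrete, here is a minimal sketch of a global alignment score in the style of the Needleman-Wunsch algorithm. The function name and the scoring values (match +1, mismatch -1, gap -2) are my own assumptions for illustration; real tools use more refined scoring schemes and also report the alignment itself, not just the score.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b.

    Dynamic programming over a (len(a)+1) x (len(b)+1) grid, keeping only
    one row in memory at a time. The scoring values are illustrative
    assumptions, not a standard.
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Either align a[i-1] with b[j-1] (match or mismatch) ...
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # ... or place a gap in one of the two sequences.
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTAGA"))
```

A higher score means the two sequences are more similar under the chosen scoring scheme, which is the kind of number that similarity searches and phylogenetic comparisons build on.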

Structural bioinformatics is also an emerging field of bioinformatics, and structure prediction is one of its important applications. The structure of a protein plays a vital role in determining its function. Protein structure can be analysed in three ways: homology-based, de novo, and virtual screening. Homology modelling is based on a previously known structure and function; if two proteins are homologous, one may share the other's function too. De novo means completely new, i.e. from scratch. Virtual screening relies on models such as the quantitative structure-activity relationship (QSAR) model, e.g. in simulations of ligand binding and in-silico mutagenesis studies.

Network analysis and systems biology are two related fields. Network analysis is done on biological networks such as protein-protein interaction networks. It can be a structural analysis of the network, a functional analysis, or both. Systems biology deals with cellular subsystems: it aims to analyse and visualise the complex connections between cellular processes, including metabolism, signal transduction pathways and gene regulatory networks.

Using bioinformatics tools, it is possible to store (encode) digital data in a nucleotide sequence. The amazing feature of this approach is its density: approximately 5.5 petabytes of data can be stored in just 1 gram of nucleotide sequence. First, the digital data is converted to a nucleotide sequence. That sequence is then synthesized as DNA, and NGS (Next Generation Sequencing) is used to read it back. The problem arises when decoding the sequence data back to digital data: to decode a single file, the whole DNA pool has to be sequenced and decoded. So, to save time and money, something new had to be introduced.
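The conversion step can be pictured with a very simple mapping of two bits to one base. This is only a sketch under my own assumptions (the mapping 00→A, 01→C, 10→G, 11→T and the function names are hypothetical), not the encoding used in any particular study; practical schemes also add redundancy and error correction.

```python
BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, 4 bases per byte."""
    out = []
    for byte in data:
        # Take the byte two bits at a time, most significant pair first.
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(seq: str) -> bytes:
    """Turn a nucleotide string produced by encode() back into bytes."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            # BASES.index raises ValueError on any character outside ACGT.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    dna = encode(b"Hi")
    print(dna)          # CAGACGGC
    print(decode(dna))  # b'Hi'
```

Each byte becomes four bases, so a file of n bytes turns into a sequence of 4n nucleotides in this toy scheme.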

So the authors of this paper (DOI: https://doi.org/10.1038/nbt.4079) attempted something new. They used PCR (Polymerase Chain Reaction) to randomly access data within the whole DNA pool. Each file gets its own pair of PCR primers (forward and reverse), and every strand belonging to that file carries the same pair. So whichever file we need, its pair of primers amplifies just that file's nucleotide sequences, and we can easily decode them back to digital data.
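The primer idea can be imitated in a few lines of code. This is a toy model under my own assumptions, not the authors' method: the primer sequences and function names are hypothetical, and the reverse primer site is treated as a plain suffix, whereas real PCR works with complementary strands and amplifies the selected strands rather than filtering them.

```python
def tag_strands(payloads, fwd, rev):
    """Attach one file's primer pair to each of its payload strands."""
    return [fwd + payload + rev for payload in payloads]


def pcr_select(pool, fwd, rev):
    """Return the payloads of strands flanked by the given primer pair.

    Stands in for PCR amplification: only strands that start with the
    forward primer and end with the reverse primer site are recovered.
    """
    selected = []
    for strand in pool:
        if len(strand) >= len(fwd) + len(rev) \
                and strand.startswith(fwd) and strand.endswith(rev):
            # Strip the primers to get back the stored payload.
            selected.append(strand[len(fwd):len(strand) - len(rev)])
    return selected


if __name__ == "__main__":
    # Two files mixed in one pool, each with its own (made-up) primer pair.
    pool = (tag_strands(["CAGA", "CGGC"], "ACGTAC", "TGCATG")
            + tag_strands(["TTTT"], "GGATCC", "CCTAGG"))
    print(pcr_select(pool, "ACGTAC", "TGCATG"))  # ['CAGA', 'CGGC']
```

The point of the sketch is that only the strands of the requested file come out, so the rest of the pool never has to be sequenced or decoded.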

P.S.: Digital data means any type of file, such as a document, image, audio, video, etc.

Friday, 18 January 2019

What is BIOINFORMATICS!

Sunday, 23 September 2018

Random access in large scale DNA data storage.
