#CHIBADAI Story

Open Databases, More Freedom: From Failed Experiments to Data Science

Associate Professor, Institute for Advanced Academic Research / Graduate School of Medicine, Chiba University
Tazro OHTA

#AI #Data Science
2026.01.20

“My job is to collect and clean data,” says Associate Professor Tazro Ohta of Chiba University’s Institute for Advanced Academic Research, Graduate School of Medicine, and Data Science Core (DSC). As human genetic information has become easier to obtain as digitized data, analyzing it now requires advanced expertise and high-performance computing. By making his research products available as freely accessible open-source software and databases, Dr. Ohta is contributing to the development of data science. “I believe that data science needs a place where people can interact,” he told us, speaking about the importance and excitement of making databases open to the public.

Organizing data so researchers can analyze it easily

What kind of research are you doing in data science?

My job is to organize databases in the life sciences, such as those used in genomics, and make them easy for other researchers to use.

Data science is often described as having three pillars: computational resources, data structures, and algorithms. I mainly focus on the first two. I work on the computing environments and software needed to organize and use biomedical data. I also develop databases and tools that make it easier for researchers to analyze their data. In a sense, I help build the foundations of data science.

Before coming to Chiba University, I worked at the Database Center for Life Science (DBCLS), part of the Research Organization of Information and Systems. There, I helped organize genome data and develop systems that balance the protection of personal information, including human genome data, with the need to make data usable for research. My field is called ‘bioinformatics*,’ which combines life sciences with information science and develops methods to handle data obtained from biological experiments.

*A field that combines life sciences with information science. Using computers, we can analyze DNA sequences and protein structures, uncover how life works, and even support drug development.

Has your research style changed since you took up your post at Chiba University?

Yes, I now work much more directly with medical data. For example, in a collaborative project involving the Department of Clinical Psychiatry at Chiba University Hospital, NTT Precision Medicine Corporation, and our AI Medicine Group, we are trying to uncover how differences in patients’ genomes influence the effectiveness of certain drugs. Throughout this process, I think carefully about how to handle and analyze sensitive data properly. Managing medical data requires not only analytical expertise but also a solid understanding of the ethical guidelines and information security requirements for clinical research. I’m heavily involved in data management, making sure the research progresses smoothly while the data remain robustly protected.

Another major change is that I now have a much clearer sense of who is actually using the data and how. In my previous job, I mainly worked behind the scenes developing databases, so I rarely had the chance to see how they were used in real clinical or research settings. I had long wanted more opportunities to meet the people using the data and understand what they really needed. I joined Chiba University with that goal in mind, and now I work directly alongside professors who use the data, gaining firsthand insight into the issues they face. As someone who builds databases, I find it very rewarding to see the data being used effectively in the field.

In large-scale data analysis, there is also growing recognition that projects cannot move forward without informatics expertise. Instead of playing a purely supporting role, I now collaborate as an equal partner in the research. I’ve realized that I thrive in team-based environments where people with diverse expertise work toward a common goal, rather than working in isolation.

The fun of open science lies in its unexpected applications

Dr. Ohta, together with Professors Zhaonan Zou and Shinya Oki of Kumamoto University, developed the epigenome-integrated database “ChIP-Atlas,” which underwent a major update in 2024. What was included in this update?

The epigenome refers to mechanisms that regulate gene function without altering the DNA base sequence. When researchers around the world conduct epigenomic studies, the resulting data are made publicly available once their papers are published, allowing other researchers to reuse them freely. In practice, however, handling the published data isn’t so simple. It requires advanced analytical techniques and significant computing resources, which not everyone has access to.

Another challenge is the inconsistency of metadata, the information describing the background of the data. For example, even something as simple as ‘male’ can appear in different forms, such as ‘male,’ ‘m,’ or ‘M,’ depending on the dataset. If you search only for ‘male,’ you may unintentionally miss relevant datasets.
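The ‘male’ example can be made concrete with a minimal Python sketch of metadata normalization. The synonym table `SEX_SYNONYMS`, the helpers `normalize_sex` and `search_by_sex`, and the sample records are all hypothetical; this illustrates the general idea only, not the actual curation pipeline behind ChIP-Atlas.

```python
# Hypothetical synonym table: raw values seen in different datasets are
# mapped onto one controlled term. Not ChIP-Atlas's actual vocabulary.
SEX_SYNONYMS = {
    "male": "male",
    "m": "male",
    "man": "male",
    "female": "female",
    "f": "female",
    "woman": "female",
}


def normalize_sex(raw):
    """Map a free-text sex annotation to a controlled term, or None if unknown."""
    if raw is None:
        return None
    key = raw.strip().lower()  # 'M', ' m ' and 'Male' collapse to 'm' or 'male'
    return SEX_SYNONYMS.get(key)


def search_by_sex(records, term):
    """Return the IDs of records whose normalized sex matches the requested term."""
    return [r["id"] for r in records if normalize_sex(r.get("sex")) == term]


if __name__ == "__main__":
    # Made-up sample records with inconsistent annotations.
    records = [
        {"id": "sample-1", "sex": "male"},
        {"id": "sample-2", "sex": "m"},
        {"id": "sample-3", "sex": "M"},
        {"id": "sample-4", "sex": "female"},
        {"id": "sample-5", "sex": None},
    ]
    # Naive exact-match search on the raw values.
    print([r["id"] for r in records if r["sex"] == "male"])
    # Search on normalized values.
    print(search_by_sex(records, "male"))
```

Running it prints `['sample-1']` for the naive exact-match search and `['sample-1', 'sample-2', 'sample-3']` for the normalized one: the two datasets annotated as ‘m’ and ‘M’ are only found once the metadata are standardized.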

ChIP-Atlas collects publicly available epigenomic data, standardizes the metadata, and makes the integrated dataset accessible through a web browser. In this latest update, we added an Annotation Track*, which displays information on genomic regulatory regions, chromosome structure, and mutations linked to diseases and disease predispositions. We also introduced Diff Analysis*, which allows users to easily compare data between two groups. With these new features, users can simply select the conditions they want to compare in their browser and immediately visualize the results without requiring specialized knowledge or complex procedures.

*Annotation Track: A feature that integrates data from external databases with data within ChIP-Atlas to visually display important genomic regions and chromosome structures.
*Diff Analysis: An analytical method that compares differences between data from different conditions or groups.
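To illustrate what comparing data between two groups can look like, here is a toy Python sketch that computes a log2 fold change of the average signal per genomic region. The region names, signal values, pseudocount and the threshold of 1.0 (a twofold change) are all assumptions made up for illustration; this is not the statistical method that ChIP-Atlas’s Diff Analysis actually uses.

```python
import math
from statistics import mean

# Hypothetical per-region signal values (e.g. read counts) for two groups
# of samples, three replicates each. All numbers are made up.
group_a = {
    "region1": [10.0, 12.0, 11.0],
    "region2": [50.0, 48.0, 52.0],
    "region3": [5.0, 6.0, 4.0],
}
group_b = {
    "region1": [11.0, 10.0, 12.0],
    "region2": [12.0, 13.0, 11.0],
    "region3": [20.0, 22.0, 18.0],
}


def log2_fold_change(a, b, pseudocount=1.0):
    """log2 of (mean(b) + pc) / (mean(a) + pc); the pseudocount avoids division by zero."""
    return math.log2((mean(b) + pseudocount) / (mean(a) + pseudocount))


def differential_regions(ga, gb, threshold=1.0):
    """Return {region: log2FC} for regions present in both groups with |log2FC| >= threshold."""
    result = {}
    for region in ga.keys() & gb.keys():
        lfc = log2_fold_change(ga[region], gb[region])
        if abs(lfc) >= threshold:
            result[region] = round(lfc, 2)
    return result


if __name__ == "__main__":
    print(sorted(differential_regions(group_a, group_b).items()))
```

With these made-up values, region2 (lower in group B, log2 fold change of about -1.97) and region3 (higher in group B, about 1.81) pass the threshold, while region1 does not change. A fold change on its own ignores variability between replicates, so real differential analyses add a statistical test on top of it.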

It seems like it could be used to elucidate diseases and develop new drugs.

I don’t have any specific expectations for how our dataset will be used. I always love to see people use it in ways I never imagined.

When ChIP-Atlas was first released to the public in 2015, I thought it would mainly be used in basic research fields, such as embryology, which is Professor Oki’s specialty. At the time, I was working on other projects, so I created it as a side project over one summer to make it easy for Professor Oki to use. I never imagined that it would eventually become so widely used, including for analyzing patient genomic data in clinical research.

I hope people will continue to use it for all kinds of purposes, without limits.

Data integration will presumably include data from overseas as well. What challenges does that involve?

Open science has taken root around the world, especially in the field of genome science, but regulations regarding ethics and the handling of personal information vary from country to country. What’s important is respecting each country’s contributions and values. We need open discussions within international communities about data sharing that balances accessibility with ethical and legal requirements.

In Europe and the United States, many communities are actively discussing these issues. Until recently, however, there were few such discussions in Asia. To address this, we brought together stakeholders from national projects across Asia to discuss the topic at an event called MedHackathon Asia. We’re now working to publish a statement based on those discussions.

The most important thing for data science is having a community for open discussion

Dr. Ohta, how did you end up in your current research field?

As an undergraduate, I majored in plant pathology and studied fungi and viruses that infect plants. But to be honest, I was very bad at experimental work! Things became even worse when I entered graduate school, because my experiments rarely went as planned.

Around that time, I met Professor Hidemasa Bono, now at Hiroshima University, who was looking for student research assistants at the DBCLS. He recruited me, and that was my first real encounter with bioinformatics. I was amazed to find that, although I struggled with the reproducibility of wet lab experiments, I could repeat a data analysis on a computer and obtain consistent results. Of course, some settings, such as different software versions, can lead to different outputs, but compared with the difficulties of my wet lab experiments, being able to rerun analyses easily and at no cost felt remarkably empowering.

The DBCLS also had students from information science, and it turned out to be an exceptional environment. I enjoyed talking with them about life sciences, and they taught me programming. After completing my master’s degree, I joined the center as a technician and continued my research and development there.

I heard that you are involved in organizing a variety of events. What are they?

One long-running event I help organize is the DBCLS BioHackathon. Although it’s called a hackathon, it isn’t a competition. Instead, it’s an intensive week-long workshop where people involved in life science data analysis come together to discuss research topics and develop software.

I believe that creating a community for intensive and continuous interaction is essential in data science, especially with people from other geographical areas or from different research fields with whom you wouldn’t usually work. Research moves much faster when you know the people you need. And when such an open community exists, anyone can join, make friends, and grow their network.

At the Data Science Core (DSC) at Chiba University, where I serve as deputy director, we also hosted a four-day data science hackathon. Connecting researchers across disciplines brings tremendous benefits to research, and having a community where that can happen is a real asset for the university. In the long run, creating and maintaining such spaces is part of doing science, and I feel that having this community helps advance my own work as well.

Build a network of people and expand your research possibilities

Finally, do you have a message for students and early-stage researchers?

My own career path has been anything but conventional, so I often tell my students not to feel constrained by predefined career paths when thinking about their future. I encourage them to stay open-minded and follow the path that feels right to them.

For early-stage researchers, I also recommend getting involved in a variety of communities. Research styles differ from person to person, but meeting many people, building networks, and working collaboratively can greatly expand the scope of their research.

● ● Off Topic ● ●

Have you ever had the experience of analyzing particularly unusual data?

Yes, there was a fun one. In a DNA-sequencing research community I’m part of, we once had everyone collect bacteria from cherry-blossom petals all across Japan and analyze them together. We jokingly called it our “Flowering Party Metagenomics” project.

That’s such a charming name!

Right? We’ve looked at all sorts of quirky things, even gut bacteria. The fun part of bioinformatics is that nothing stays in one box. There’s always something unexpected to dive into, for better or worse.