GDA Extraction

This repository contains the source code to train and test Biomedical Relation Extraction (BioRE) models on the TBGA dataset. TBGA is a large-scale, semi-automatically annotated dataset for Gene-Disease Association (GDA) extraction. The repository also contains scripts to compute dataset statistics and to convert other BioRE datasets into the required format.

The code makes it possible to gather, clean, and ingest the Gene-Disease Association data used to train and test Relation Extraction methods.

The library is available at its GitHub repository.

TBGA is available at its Zenodo page.