Data sources for data-driven life science

This is a list of data sources available for data-driven life science in Sweden. Use the Search function to find sources by name/keywords (often more granular than other filters). A resource type of Repository indicates that at least some users can both submit and access data, whilst Data Source indicates that data can only be accessed, not submitted. It is also possible to filter data sources according to Data Type and Scientific Area.

Get in touch about adding further data sources.

AI4Life BioImage Model Zoo
A collaborative effort to bring artificial intelligence models to the bioimaging community.
Type: Repository
AIDA Dataset Register
Datasets shared on the AIDA Data Hub. The data are extensively annotated, which makes them well suited for use as training data.
Type: Data source
AlphaFold Protein Structure Database
Contains over 200 million predictions of 3D protein structures based on amino acid sequence. The predictions were generated using AlphaFold, an AI system developed by DeepMind.
Type: Data source
ArrayExpress
ArrayExpress is an archive of functional genomics data, such as gene expression or DNA methylation profiling data. The data is open for reuse by the research community.
Type: Repository
Available code/data for COVID-19 from Sweden
Data and code shared in papers on COVID-19 that include at least one author affiliated with a Swedish research organisation.
Type: Data source
BacDive
The largest database for standardised bacterial phenotypic data. It includes >600 data fields, e.g. taxonomy, molecular data, and morphology. It integrates data from many sources and makes them freely accessible.
Type: Data source
BBMRI-ERIC
A biobanking infrastructure that promotes biomedical research. Samples can be accessed with appropriate permissions. Services, software, tools, and support with ELSI are also available.
Type: Data source
BioImage Archive
A free, publicly available online resource that stores and distributes biological images. It accepts submissions of data from any imaging modality.
Type: Repository
BioModels
A repository of mathematical models related to biological and biomedical systems. It contains many literature-based physiologically and pharmaceutically relevant mechanistic models.
Type: Repository
BioSamples
BioSamples stores and supplies metadata for a large number of biological samples used in research and development. The samples are either 'reference' samples or samples used in an assay database.
Type: Repository
BioStudies
Holds descriptions of biological studies, links to data from those studies in other databases, and data that do not fit in structured archives. Accepts a wide range of studies and supplementary information.
Type: Repository
BRENDA
A database containing information about enzymes and enzyme ligands from all taxonomic groups. Data is extracted from primary literature and integrated with external data and prediction algorithms.
Type: Data source
Cancer Cell Line Encyclopedia
Genetic characterisation of a large panel of human cancer cell lines. It provides access to analyses and visualisations of DNA copy number, mRNA expression, mutation data, and more.
Type: Data source
Cancer Genome Atlas
Has >2.5 petabytes of genomic, epigenomic, transcriptomic, and proteomic data from >10k patients across 33 cancer types. Raw data is under controlled access, but derived data is openly accessible.
Type: Data source
Cancer Imaging Archive
Collections of data that are typically patient cohorts related by a common disease, image modality, or type. Supporting data for the images are also available, e.g. patient outcomes & treatment details.
Type: Repository
CATH/Gene3D
A database containing information on evolutionary relationships between protein domains. It classifies protein domains from Protein Data Bank in a hierarchical manner.
Type: Data source
cBioPortal
An open-access resource enabling the interactive exploration of multidimensional cancer genomics datasets from more than 5,000 tumour samples.
Type: Data source
Cellosaurus
A knowledge resource aiming to describe all cell lines used in biomedical research. Users can browse the cell lines by group.
Type: Data source
ChEMBL
A manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity, and genomic data to enable the translation of information into effective new drugs.
Type: Data source
Chemical Entities of Biological Interest
A dictionary of molecular entities (i.e. constitutionally or isotopically distinct atoms, molecules, ions, ion pairs, radicals, etc.). It is focused in particular on small chemical compounds.
Type: Data source
ClinVar
The primary site for the deposition and retrieval of variant data and annotations from individual submitters. It helps with explorations of how genomic variation is related to human health.
Type: Repository
COViMAPP
A meta-analysis resource showing changes in plasma proteins that occur during COVID-19.
Type: Data source
DECIPHER
An interactive database that includes a suite of tools for interpreting genomic variants. It contains data from around 50k patients who have given consent for broad data sharing.
Type: Repository
Dryad Digital Repository
A general purpose repository for data underlying scientific and medical publications. Over 100,000 datasets have been made available.
Type: Repository
ECDC Avian Influenza
Contains information related to avian influenza, including a distribution map of confirmed human cases, as well as guidelines for surveillance.
Type: Data source
ECDC Tuberculosis
Contains open datasets related to tuberculosis, including an atlas showing the distribution of reported cases across Europe.
Type: Data source
Electron Microscopy Data Bank
A public repository for cryo-EM volumes and representative tomograms of macromolecular complexes and subcellular structures. Multiple different techniques are covered.
Type: Repository
Electron Microscopy Public Image Archive
A public resource for raw electron microscopy images. Includes images underpinning 3D cryo-EM maps and tomograms, as well as 3D datasets generated using volume EM techniques and X-ray tomography.
Type: Repository
Ensembl
A browser for vertebrate genomes. Supports research in evolution, transcriptional regulation, and sequence variation. It is part of the Ensembl project, which also includes Ensembl Genomes.
Type: Data source
Ensembl Genomes
A browser for non-vertebrate genomes that enables comparative analysis, data mining, and visualisation. Information is available for viruses, metazoa, protists, plants, fungi, and bacteria.
Type: Data source
European Chemical Biology Database
The central hub for data generated within the EU-OPENSCREEN network. The network collaborates with external users to develop novel molecular tool compounds and early therapeutic candidate molecules.
Type: Data source
European Genome-phenome Archive
A service for permanently archiving and sharing personally identifiable genetic, phenotypic, and clinical data generated for biomedical research or for research-focused healthcare systems.
Type: Repository
European Nucleotide Archive
An open, global data resource for nucleotide sequences, including raw sequence data, alignments, and assemblies. The data is shared openly.
Type: Repository
FEGA Sweden
Secure archiving and sharing of genetic and phenotypic data resulting from Swedish biomedical research projects. Access to the data will only be granted after a formal application procedure.
Type: Repository
Gemma
Gemma provides data, experimental design annotations, and differential expression analysis results for microarray and RNA-seq experiments. Gemma contains data from thousands of public studies.
Type: Data source
Gene Expression Omnibus
An international public repository that archives and freely distributes microarray, next-generation sequencing, and other high-throughput functional genomics data submitted by the research community.
Type: Repository
GENEVESTIGATOR
Enables the simultaneous analysis of thousands of studies to find genes specifically related to a drug or disease. Free to use for those working in academia.
Type: Data source
Genomic Data Commons
A unified repository and cancer knowledge base that enables data sharing across cancer genomic studies in support of precision medicine.
Type: Repository
GISAID
A sequencing data repository on emerging infectious diseases, including influenza and SARS-CoV-2. Requires registration to access the data, and there are multiple conditions regarding data sharing.
Type: Repository
Global Biodiversity Information Facility
An international network and data infrastructure aimed at providing open access to data about all types of life on Earth. It supports the publication of four classes of datasets using widely accepted biodiversity data standards.
Type: Data source
GWAS Catalog
The GWAS Catalog is a curated collection of human genome-wide association studies. It results from a collaboration between EMBL-EBI and NHGRI.
Type: Repository
HUGO Gene Nomenclature Committee
A resource for approved human gene nomenclature. It is responsible for approving unique symbols and names for human loci. It contains thousands of gene symbols and names, and many gene families and sets.
Type: Data source
Human Developmental Cell Atlas
A molecular atlas of human prenatal development, describing every cell type in detail and showing their spatial and temporal distributions in three dimensions.
Type: Data source
Human Protein Atlas
The Human Protein Atlas aims to map all the human proteins in cells, tissues, and organs using an integration of various omics and imaging technologies.
Type: Data source
Infectious Disease Data Observatory
Brings together multiple types of infectious disease data to improve the diagnosis and treatment of patients in the clinic. Access to the data held in the IDDO data repository has to be requested.
Type: Data source
IntAct Molecular Interaction Database
An open source database system and set of analysis tools for molecular interaction data. Interactions are derived from literature curation or direct user submission.
Type: Repository
InterPro
A collation of protein data from multiple sources. It enables the functional analysis of proteins by classifying them into families, and also allows for the prediction of domains and important sites.
Type: Data source
LIPID MAPS
An open, systematic, and standardised resource for lipidomics research. It leads the field of lipid curation, classification, and nomenclature. It provides open access to multiple tools and data.
Type: Repository
MassIVE
A community resource developed by the NIH-funded Center for Computational Mass Spectrometry to promote the global, free exchange of mass spectrometry data.
Type: Repository
Mendeley Data
A general purpose, cloud-based repository intended for research data. Data can be shared privately or publicly, and are assigned unique identifiers.
Type: Repository
Metabolic Atlas
Metabolic Atlas is a web platform integrating open-source genome scale metabolic models (GEMs) for easy browsing and analysis.
Type: Data source
MetaboLights
Contains data from metabolomics experiments, and information derived from them. It contains multiple types of data from many species and techniques. It is the recommended repository for many journals.
Type: Repository
MGnify
Facilitates the assembly, analysis, and archiving of microbiome-derived nucleic acid sequences. Provides taxonomic assignments and functional annotations covering multiple data types from many environments.
Type: Data source
ModelArchive
Includes theoretical models of macromolecular structure, together with information about model coordinates, and details about assumptions, parameters, and constraints applied to simulations.
Type: Repository
Molecular Interaction Database
A public, open source database detailing protein-protein interactions that are experimentally verified. The interactions have been mined from scientific literature by expert curators.
Type: Data source
National Biodiversity Network Atlas
The NBN Atlas is the UK’s largest repository of publicly available biodiversity data. It also provides a platform to engage, educate and inform people about the natural world.
Type: Repository
National Board of Health and Welfare Statistical Databases
A set of databases related to health and welfare from the National Board of Health and Welfare. This Swedish governmental agency is responsible for ensuring high-quality health and social care.
Type: Data source
Orphadata Science
Provides access to high-quality datasets related to rare diseases and orphan drugs. The data is available in a reusable and computable format.
Type: Data source
PomBase
A knowledge base for the model organism Schizosaccharomyces pombe, which is a fission yeast. It enables exploration of genetic, phenotypic, and molecular data from multiple sources.
Type: Data source
Protein Data Bank
Includes over 170k 3D protein structures, nucleic acids, and complex assemblies available for download. Allows users to explore computational structural models alongside data from AlphaFold DB and ModelArchive.
Type: Repository
Protein Data Bank Europe
A repository in which data on biological macromolecular structures can be deposited. It is a founding member of the worldwide Protein Data Bank (PDB), and works to collate, maintain, and provide access to the PDB.
Type: Data source
ProteomeXchange
A single point of submission for mass spectrometry proteomics data. It includes a globally coordinated data submission and dissemination pipeline that involves the main proteomics repositories.
Type: Repository
Proteomic Data Commons
The PDC represents the NCI's largest public repository of comprehensive proteogenomic tumor datasets. It is essentially a Proteogenomic Cancer Atlas.
Type: Repository
PRoteomics IDEntifications Archive database
A repository for mass spectrometry (MS) proteomics data. Includes protein and peptide identifications, with corresponding expression values, post-translational modifications, and supporting MS evidence.
Type: Repository
Public Health Agency of Sweden
The Public Health Agency of Sweden (Folkhälsomyndigheten, FoHM) is responsible for issues related to public health. It collates and collects data on diseases that pose a threat to public health.
Type: Data source
REACTOME
A graphical map of known biological processes and pathways. Provides tools to visualise, interpret, and analyse pathways to support research. Used for genome analysis, modelling, and systems biology.
Type: Data source
Registers in Sweden
A list of Swedish registry-based data sources, maintained by the Swedish Research Council. Users can link data from multiple registers, including healthcare records, biobanks, and research data.
Type: Data source
Resistance bank
The resistance bank is an open access repository for surveys and maps of antimicrobial resistance (AMR) in animals. It focuses particularly on low and middle-income countries.
Type: Repository
Rhea
A knowledge base of chemical and transport reactions of biological interest curated by experts. It is the standard for enzyme and transporter annotation in UniProtKB.
Type: Data source
SCAPIS database
Data from over 30k patients from across Sweden to study cardiovascular disease (CVD) and chronic obstructive pulmonary disease (COPD). Biobanked blood and DNA analysed in collaboration with SciLifeLab.
Type: Data source
SciLifeLab Data Repository
SciLifeLab’s institutional instance of FigShare. It is an open, general purpose repository that includes many data types. Data can be submitted by those working in Swedish life science research.
Type: Repository
Sequence Read Archive
A bioinformatics database that provides a public repository for DNA sequencing data, in particular short reads (typically less than 1,000 base pairs long) generated using high-throughput sequencing.
Type: Repository
SILVA
A comprehensive online resource for quality-checked ribosomal RNA (rRNA) data. Includes aligned small and large subunits of rRNA from all three domains of life (archaea, bacteria, eukarya).
Type: Data source
Skin Science Foundation Bioinformatics Hub
The hub includes a web platform through which users can gain open access to curated transcriptomic data from multiple inflammatory skin diseases.
Type: Data source
Statistics Sweden
Statistics Sweden has a large amount of data available for everyone to use free of charge under open licences. It includes geodata, public data, and data from different sectors.
Type: Data source
STRING
Database of known and predicted protein-protein interactions, including both direct and indirect associations. Data is aggregated from multiple sources, e.g. existing databases and computer modelling.
Type: Data source
SubCellBarCode
A collection of resources centred around the subcellular localisation of proteins.
Type: Data source
Sveriges Dataportal
Organisations can share and download data related to data-driven development and innovation. It was created by Myndigheten för digital förvaltning (DIGG). Multiple types of data are available.
Type: Data source
Swedish Biodiversity Data Infrastructure
The Swedish access point for open biodiversity data. It includes many powerful tools and services that allow you to search, download, analyse, and visualise the data.
Type: Repository
Swedish COVID-19 Sample Collection Database
The Swedish COVID-19 Sample Collection Database contains information about COVID-19 samples in biobanks across Sweden and how to access them, along with metadata about the samples.
Type: Data source
Swedish National Data Service
A service for Swedish research in humanities, social sciences, and medicine. It includes multiple collections of research data from Sweden, as well as links to international data.
Type: Repository
SweFreq
A website developed to make genomic datasets more findable and accessible.
Type: Data source
SWISS-MODEL
An automated protein structure homology-modelling server. Protein sequences for 13 core species are modelled weekly. The aim of this database is to make protein modelling accessible to researchers worldwide.
Type: Data source
UK Biobank
A large-scale biomedical database including biological and clinical data from 500K participants in the UK. Applications to access the data are welcomed from researchers anywhere in the world.
Type: Data source
UniProt
A comprehensive resource for protein sequence and annotation data. UniProt is a collaboration between EMBL-EBI, the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR).
Type: Repository
VEuPathDB
Aggregation of data from multiple genomic and other large datasets related to eukaryotic pathogens and invertebrate disease vectors. Includes data on many emerging and re-emerging infectious diseases.
Type: Data source