Where to Get FAST5 Files

2025-05-19

FAST5 files encapsulate raw signal data generated during Oxford Nanopore Technologies (ONT) sequencing, enabling researchers to perform basecalling, detect methylation patterns, and develop custom bioinformatics workflows.

FAST5 Files

Before diving into where to find FAST5 files, it’s important to understand what they are. FAST5 is an HDF5-based file format developed by ONT. Each file stores signal-level data produced as DNA or RNA molecules pass through nanopores during sequencing. Unlike FASTQ or BAM files, which store processed reads and alignments, FAST5 files contain raw signal traces and metadata, providing access to the most fundamental level of sequencing data.
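
To make "raw signal" concrete: the samples are stored as integer digitiser values, and the calibration needed to convert them to current in picoamperes (pA) is kept as metadata, typically as the `digitisation`, `offset`, `range` and `sampling_rate` attributes of each read's `channel_id` group. The sketch below shows only that conversion in plain Python, assuming those attribute names. Reading the values out of the HDF5 file (for example with h5py) is left out, and the numbers in the usage lines are illustrative, not taken from a real run.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ChannelCalibration:
    """Calibration values, named after the channel_id attributes (assumed names)."""
    digitisation: float
    offset: float
    range: float          # range in pA
    sampling_rate: float  # samples per second


def raw_to_picoamps(raw: Iterable[int], cal: ChannelCalibration) -> List[float]:
    """Convert raw digitiser values to pA: (raw + offset) * range / digitisation."""
    scale = cal.range / cal.digitisation
    return [(value + cal.offset) * scale for value in raw]


def duration_seconds(n_samples: int, cal: ChannelCalibration) -> float:
    """Length of a signal trace in seconds."""
    return n_samples / cal.sampling_rate


if __name__ == "__main__":
    # Illustrative values only, not from a real sequencing run.
    cal = ChannelCalibration(digitisation=8192.0, offset=10.0, range=1400.0, sampling_rate=4000.0)
    raw = [502, 498, 610]
    print(raw_to_picoamps(raw, cal))
    print(duration_seconds(len(raw), cal), "seconds")
```

The calibration is kept as a separate object because it belongs to the channel, not the read, so one instance can be reused for every read recorded on that channel.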

There are two primary formats: single-FAST5 and multi-FAST5. In single-FAST5 format, each read is stored in a separate file. In multi-FAST5, multiple reads are bundled into a single file, improving efficiency in storage and data handling.
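
The two layouts can be told apart from their internal HDF5 paths. Commonly, multi-read files hold one top-level `read_<read_id>` group per read with the samples at `read_<read_id>/Raw/Signal`, while single-read files keep them at `Raw/Reads/Read_<n>/Signal`. Treat these patterns as assumptions and check them against your own data. This sketch classifies a list of object paths, such as the names produced by h5py's `File.visit`; the file name `example.fast5` in the comment is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed signal locations for each layout (paths as listed by h5py's File.visit).
MULTI_SIGNAL = re.compile(r"^/?read_([^/]+)/Raw/Signal$")
SINGLE_SIGNAL = re.compile(r"^/?Raw/Reads/(Read_\d+)/Signal$")


def classify_fast5(paths: Iterable[str]) -> Tuple[str, List[str]]:
    """Return ('multi' | 'single' | 'unknown', read identifiers found)."""
    multi: List[str] = []
    single: List[str] = []
    for path in paths:
        match = MULTI_SIGNAL.match(path)
        if match:
            multi.append(match.group(1))
            continue
        match = SINGLE_SIGNAL.match(path)
        if match:
            single.append(match.group(1))
    if multi:
        return "multi", multi
    if single:
        return "single", single
    return "unknown", []


if __name__ == "__main__":
    # With h5py installed, the paths could come from a real file:
    #   names = []
    #   with h5py.File("example.fast5", "r") as f:
    #       f.visit(names.append)
    names = ["read_abc/Raw/Signal", "read_abc/channel_id", "read_def/Raw/Signal"]
    print(classify_fast5(names))
```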

1. Oxford Nanopore Technologies (ONT) Official Channels

If you’re looking for authentic and comprehensive FAST5 datasets, ONT is the primary and most reliable source. They offer access through the following means:

a. MinION and Other ONT Devices

The most direct way to obtain raw signal files is to run a sequencing experiment on an ONT platform such as MinION, GridION, or PromethION. The ONT MinKNOW software handles data acquisition and writes the sequencing summary and raw data files to the output directories you specify. Older MinKNOW releases wrote raw data as FAST5 by default; recent releases default to POD5, ONT's newer raw-signal format. If a tool specifically needs FAST5, check which output formats your MinKNOW version offers, or convert POD5 files with ONT's pod5 tools.
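
Output folder names have changed between MinKNOW versions. Older runs typically contain `fast5_pass` and `fast5_fail` folders, while newer runs may contain POD5 files (ONT's successor format) instead. A robust way to find the raw data is therefore to search the run directory recursively. A minimal sketch, assuming only that raw files carry the `.fast5` or `.pod5` extension:

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

RAW_SUFFIXES = (".fast5", ".pod5")


def find_raw_files(run_dir: str) -> Dict[str, List[Path]]:
    """Recursively collect raw-signal files under run_dir, grouped by extension."""
    found: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(Path(run_dir).rglob("*")):
        suffix = path.suffix.lower()
        # Folders such as fast5_pass have no suffix, so only files match here.
        if suffix in RAW_SUFFIXES and path.is_file():
            found[suffix].append(path)
    return dict(found)
```

Calling `find_raw_files` on a run's output directory returns sorted lists of paths keyed by `.fast5` and/or `.pod5`, which makes it easy to see at a glance which format a given run produced.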

b. EPI2ME Labs and ONT Developer Resources

ONT offers training datasets and interactive tutorials via EPI2ME Labs, which are ideal for beginners. These datasets include FAST5 files along with other formats and come with guides for analysis. While not as extensive as a full experimental run, these datasets are curated to introduce users to typical challenges and analysis workflows in nanopore sequencing.

c. ONT Community Forums

The ONT Community (https://community.nanoporetech.com/) often hosts shared datasets by users or developers participating in beta tests or educational initiatives. While access might be limited to members, it’s worth joining if you’re serious about nanopore sequencing.

2. Public Repositories and Data Archives

For users who do not have access to an ONT device, public repositories are a goldmine. These repositories offer real-world sequencing datasets that include FAST5 files.

a. European Nucleotide Archive (ENA)

ENA is one of the largest genomic data archives and includes datasets from ONT sequencers. It supports downloads of associated FAST5 files if the data submitter has made them available. Users can search by accession numbers, organism, or sequencing platform.
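
ENA searches can also be scripted through its Portal API. The sketch below assumes the `https://www.ebi.ac.uk/ena/portal/api/search` endpoint, the `instrument_platform` and `tax_eq` query terms, and the `submitted_ftp` field; check them against ENA's current API documentation before relying on it. It builds a query URL for nanopore runs of one organism, then keeps submitted files whose names mention `.fast5`. Archives named differently will be missed.

```python
import csv
import io
from typing import List
from urllib.parse import urlencode

ENA_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"


def ena_query_url(tax_id: int, limit: int = 100) -> str:
    """Build an ENA Portal API search URL for Oxford Nanopore runs of one taxon."""
    params = {
        "result": "read_run",
        "query": f'instrument_platform="OXFORD_NANOPORE" AND tax_eq({tax_id})',
        "fields": "run_accession,scientific_name,submitted_ftp",
        "format": "tsv",
        "limit": str(limit),
    }
    return f"{ENA_SEARCH}?{urlencode(params)}"


def fast5_links(tsv_text: str) -> List[str]:
    """From a TSV response, keep submitted files whose names suggest FAST5 content."""
    links: List[str] = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # submitted_ftp may list several files separated by ';', or be empty.
        for item in (row.get("submitted_ftp") or "").split(";"):
            if ".fast5" in item.lower():
                links.append(item)
    return links
```

For example, `urllib.request.urlopen(ena_query_url(9606)).read().decode()` would fetch human (taxon 9606) nanopore runs as TSV text, which can then be passed to `fast5_links`.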

Example:

The human genome dataset ONT released includes thousands of FAST5 files hosted on Amazon Web Services (AWS) rather than in ENA.

Use the AWS CLI or an S3 browser to download these datasets directly.
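
If you prefer not to install the AWS CLI, a public bucket can also be listed over plain HTTPS with the S3 ListObjectsV2 request. This standard-library sketch assumes the bucket allows anonymous listing. The bucket name `ont-open-data` in the usage note is an assumption to verify against ONT's own documentation, and the prefix is up to you. Large listings are paged, so the parser also returns the continuation token for the next request.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple, Union
from urllib.parse import urlencode

S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_url(bucket: str, prefix: str, token: Optional[str] = None) -> str:
    """Anonymous ListObjectsV2 request URL for a public bucket."""
    params = {"list-type": "2", "prefix": prefix}
    if token:
        params["continuation-token"] = token
    return f"https://{bucket}.s3.amazonaws.com/?{urlencode(params)}"


def parse_listing(xml_text: Union[str, bytes]) -> Tuple[List[Tuple[str, int]], Optional[str]]:
    """Return (key, size) pairs for .fast5 objects plus the next page token, if any."""
    root = ET.fromstring(xml_text)
    objects: List[Tuple[str, int]] = []
    for item in root.findall("s3:Contents", S3_NS):
        key = item.findtext("s3:Key", default="", namespaces=S3_NS)
        size = int(item.findtext("s3:Size", default="0", namespaces=S3_NS))
        if key.endswith(".fast5"):
            objects.append((key, size))
    truncated = root.findtext("s3:IsTruncated", default="false", namespaces=S3_NS) == "true"
    token = root.findtext("s3:NextContinuationToken", namespaces=S3_NS) if truncated else None
    return objects, token
```

A typical loop calls `urllib.request.urlopen(list_url("ont-open-data", prefix, token))`, passes the response bytes to `parse_listing`, and repeats with the returned token until it comes back as `None`. Summing the returned sizes before downloading is a cheap way to check whether a dataset fits on your disk.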

3. GitHub Repositories and Bioinformatics Projects

Although not as common as official archives, GitHub is another useful source for example FAST5 files, especially from bioinformatics researchers developing open-source tools. These datasets are usually small (for testing purposes) but can be extremely useful when validating software.

Some projects that have included FAST5 datasets:

nanopolish

bonito

megalodon

deepnano-blitz

Be cautious: these are not suitable for full-scale analyses but can help you get a feel for FAST5 file structures.

4. Academic Institutions and Research Labs

Many universities that use nanopore technology for research make datasets available via institutional repositories or data-sharing portals. These datasets are often linked in the “Data Availability” section of research articles.

If a paper mentions FAST5 usage:

Check supplementary materials or contact the authors

Visit associated lab websites (e.g., genomics labs at MIT, Oxford, UCSC)

Explore institutional data repositories such as Harvard Dataverse or Stanford Digital Repository

Some universities also host courses in genomics that offer hands-on datasets, including FAST5 files, for registered students.

5. Cloud-Based Analysis Platforms

Platforms such as DNAnexus, Terra.bio, and Galaxy occasionally host nanopore datasets as part of training modules or shared user workspaces. While you may need to create an account or request access, these platforms often provide access to sample FAST5 data preloaded in cloud environments.

Benefits of cloud-hosted FAST5 data:

No local storage needed

Integrated analysis tools (e.g., Guppy, Minimap2, Nanopolish)

Ideal for testing workflows or teaching bioinformatics

6. Workshops, Webinars, and Conferences

FAST5 datasets are frequently shared during nanopore-focused workshops and webinars. Oxford Nanopore itself hosts events like Nanopore Community Meetings, where participants receive curated datasets for exercises. These can include full sequencing runs or subsets designed for teaching specific concepts such as basecalling or methylation calling.

Search platforms:

ONT events page

Academic conference websites

YouTube channels from university seminars

Often, datasets linked to these events are only temporarily available, so it’s best to download them promptly if offered.

7. Educational Courses and MOOCs

Massive Open Online Courses (MOOCs) focused on genomics, offered by platforms like Coursera, edX, or FutureLearn, may include access to nanopore datasets. Courses affiliated with ONT or large research projects may provide access to sample FAST5 files for lab sessions.

Benefits include:

Structured learning around the data

Access to tutors or community forums

Exposure to real-world sequencing challenges

8. Third-Party Blogs and Community Portals

Experienced users and independent researchers sometimes share test FAST5 files on blogs or discussion platforms such as Reddit, Bioinformatics Stack Exchange, or Biostars.

Examples:

A blog post comparing Guppy vs. Bonito might include download links

GitHub Gists with small FAST5 bundles for testing purposes

Bioinformatics Q&A threads where users request and share test files

While convenient, it’s important to verify that these files are ethically sourced and comply with any licensing or privacy restrictions.
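
Beyond licensing, files passed around informally are sometimes truncated, or turn out to be an HTML error page saved with a .fast5 extension. Because FAST5 is HDF5-based, a cheap first check is the 8-byte HDF5 signature, which the HDF5 format places at byte 0, or at 512, 1024, 2048, ... bytes when a user block precedes it. This minimal standard-library sketch confirms only that a file is HDF5, not that it is a complete or valid FAST5 file.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path: str, max_offset: int = 1 << 20) -> bool:
    """Check for the HDF5 signature at byte 0, 512, 1024, 2048, ... up to max_offset."""
    with open(path, "rb") as handle:
        offset = 0
        while offset <= max_offset:
            handle.seek(offset)
            if handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Candidate superblock locations: 0, then successive powers of two from 512.
            offset = 512 if offset == 0 else offset * 2
    return False
```

A `False` result is a good reason to re-download or contact the source before spending time on analysis.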

9. Simulated FAST5 Files

In cases where you can’t obtain real FAST5 data, simulated data can fill the gap. DeepSimulator models the raw current signal itself, which makes it useful for testing signal-level algorithms or training AI models. NanoSim, by contrast, simulates reads at the sequence level (FASTA/FASTQ) rather than raw signal, so it suits testing downstream tools such as aligners and assemblers.

Advantages:

Customizable parameters (error rate, genome, signal noise)

No ethical concerns about patient data

Great for reproducibility in scientific software development
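
To show what "signal-level" simulation means, here is a deliberately toy sketch. It is not how DeepSimulator works: real simulators model the current for each k-mer context and realistic dwell-time distributions, whereas the per-base current levels here are made-up values. It does illustrate two of the points above: the noise level is a parameter, and a fixed random seed makes the output reproducible.

```python
import random
from typing import Dict, List

# Made-up mean current (pA) per base; real simulators model k-mer context instead.
TOY_LEVELS: Dict[str, float] = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 70.0}


def simulate_signal(sequence: str, noise_sd: float = 2.0, mean_dwell: float = 10.0,
                    seed: int = 0) -> List[float]:
    """Return a step-like trace: each base holds its level for a random number of samples."""
    rng = random.Random(seed)  # fixed seed -> identical output on every run
    signal: List[float] = []
    for base in sequence.upper():
        level = TOY_LEVELS[base]  # raises KeyError for non-ACGT characters
        dwell = max(1, round(rng.expovariate(1.0 / mean_dwell)))
        signal.extend(rng.gauss(level, noise_sd) for _ in range(dwell))
    return signal


if __name__ == "__main__":
    trace = simulate_signal("ACGTAC", noise_sd=1.5, seed=42)
    print(len(trace), [round(x, 1) for x in trace[:5]])
```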

10. Direct Collaboration with Labs or Institutions

If your work requires access to specific types of FAST5 files (e.g., from a rare organism or particular tissue), reaching out directly to research labs or data owners is often fruitful. Researchers are usually open to collaboration, especially if you’re contributing to open science or academic research.

Tips for success:

Be clear about your purpose and intended use

Offer to cite or acknowledge the data provider

Use institutional email or official channels

Many datasets are not publicly shared simply due to size constraints or lack of hosting options—not because of unwillingness.

Conclusion

FAST5 files are essential for anyone working with Oxford Nanopore sequencing data, whether your goal is to explore signal-level analysis, optimize basecalling, or develop machine learning models in genomics. Fortunately, there is a wide array of sources—from official ONT platforms to public archives and academic repositories—that offer access to real or simulated FAST5 files.
