https://en.wikipedia.org/wiki/List_of_file_formats#Biology
http://www.nationalarchives.gov.uk/pronom/
https://oclc-research.github.io/infoURI-Frozen/info-uri.info/ListRecords.html
https://curlie.org/Computers/Data_Formats/
Data type | Input type | File patterns | Description |
---|---|---|---|
Protein | “File” | ["*.pdb"] | 3D protein structure. |
Small molecule | “File” | ["*.sdf"] or ["*.sdf", "*.mol2"] | 3D small molecule structures. We generally recommend using .sdf files. |
Small molecule SMILES | “File” | ["*.smi"] | SMILES (Simplified Molecular Input Line Entry Specification) describes the structure of molecules using short ASCII strings. |
Peptide sequence (e.g. amino acid chains such as proteins) | “File” | ["*.fasta"] | Common text-based format for biological sequences. |
Nucleotide sequence (e.g. DNA, RNA) | “File” | ["*.fasta"] | As above. |
Sequencing raw data | “File” | ["*.fastq"] | FASTQ is an extension of FASTA. Each record stores a biological sequence together with its per-base quality scores. This data often comes from second-generation sequencing machines such as those made by Illumina. |
Nanopore sequencing raw data | “File” | ["*.fast5"] | The standard sequencing output for Oxford Nanopore sequencers such as the MinION. Based on the HDF5 standard. Unlike .fasta and .fastq, .fast5 is binary. |
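
The patterns in the table are ordinary glob patterns, so a file name can be checked against them with Python's standard library. The sketch below is illustrative only: the dictionary keys and both function names are hypothetical, and the FASTQ reader assumes the common four-line record layout with Phred+33 quality encoding, which may not hold for every file.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from data type to accepted glob patterns,
# copied from the table; the key names are made up for this example.
ACCEPTED_PATTERNS = {
    "protein": ["*.pdb"],
    "small_molecule": ["*.sdf", "*.mol2"],
    "small_molecule_smiles": ["*.smi"],
    "sequence": ["*.fasta"],
    "sequencing_raw": ["*.fastq"],
    "nanopore_raw": ["*.fast5"],
}


def matching_types(filename):
    """Return the data types whose patterns match the file name."""
    # Lowercase first so that "LIGAND.SDF" matches "*.sdf" on every platform.
    name = filename.lower()
    return [
        data_type
        for data_type, patterns in ACCEPTED_PATTERNS.items()
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


def read_fastq(lines):
    """Yield (identifier, sequence, quality_scores) for each FASTQ record.

    Assumes four lines per record: '@identifier', sequence, '+', qualities.
    Quality characters are decoded as Phred+33 (an assumption).
    """
    iterator = iter(lines)
    for header in iterator:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        sequence = next(iterator).rstrip("\n")
        next(iterator)  # the '+' separator line carries no data we need
        quality = next(iterator).rstrip("\n")
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores
```

For example, `matching_types("ligand.SDF")` returns `["small_molecule"]`, and passing an open text file to `read_fastq` yields one tuple per read. A binary .fast5 file cannot be read this way; it needs an HDF5 reader.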