# FlowIO
Parse Flow Cytometry Standard (FCS) files v2.0–3.1 and extract events/metadata for preprocessing workflows (e.g., when you need NumPy arrays, channel info, or CSV/DataFrame export from cytometry files).
## When to Use
- You need to read FCS v2.0/3.0/3.1 files and extract event matrices for downstream preprocessing.
- You want to inspect or validate FCS metadata (TEXT segment) without loading event data (memory-efficient parsing).
- You need channel definitions (PnN/PnS), ranges (PnR), and automatic identification of scatter/fluorescence/time channels.
- You need to handle problematic FCS files with offset inconsistencies or multi-dataset content.
- You want to export cytometry events to CSV/Pandas DataFrame or write new/modified FCS files.
## Key Features
- FCS parsing (v2.0–3.1): Reads HEADER/TEXT/DATA and optional ANALYSIS segments.
- Event extraction to NumPy: Returns event data as an `ndarray` with shape `(events, channels)`.
- Optional preprocessing: Applies standard FCS transformations (gain/log/time scaling) when enabled.
- Metadata access: Exposes TEXT keywords and common instrument/acquisition fields.
- Channel utilities: Provides PnN/PnS labels, ranges, and indices for scatter/fluorescence/time channels.
- Robust parsing options: Flags for offset discrepancy handling and null-channel exclusion.
- Multi-dataset support: Detects and reads files containing multiple datasets.
- FCS writing: Create new FCS files from arrays and optionally preserve/override metadata.
## Dependencies

- `python >= 3.9`
- `flowio` (install via pip/uv; version depends on your environment)
- Example-only: `numpy >= 1.20`, `pandas >= 1.5`
## Example Usage
"""
End-to-end example:
1) Read an FCS file (metadata + events)
2) Convert to a Pandas DataFrame and export CSV
3) Filter events and write a new FCS file
4) Handle multi-dataset files
"""
from pathlib import Path
import numpy as np
import pandas as pd
from flowio import (
FlowData,
create_fcs,
read_multiple_data_sets,
MultipleDataSetsError,
FCSParsingError,
DataOffsetDiscrepancyError,
)
FCS_PATH = "sample.fcs"
def read_fcs_safely(path: str) -> FlowData:
try:
return FlowData(path)
except DataOffsetDiscrepancyError:
# Common workaround for files with inconsistent offsets
return FlowData(path, ignore_offset_discrepancy=True)
except FCSParsingError:
# Looser mode if the file is malformed
return FlowData(path, ignore_offset_error=True)
def main() -> None:
# --- 1) Read file (single dataset) ---
try:
flow = read_fcs_safely(FCS_PATH)
except MultipleDataSetsError:
# --- 4) Multi-dataset handling ---
datasets = read_multiple_data_sets(FCS_PATH)
flow = datasets[0] # pick the first dataset for this demo
print("File:", getattr(flow, "name", Path(FCS_PATH).name))
print("FCS version:", flow.version)
print("Events:", flow.event_count)
print("Channels:", flow.channel_count)
print("PnN labels:", flow.pnn_labels)
# Metadata (TEXT segment)
print("Instrument ($CYT):", flow.text.get("$CYT", "N/A"))
print("Acquisition date ($DATE):", flow.text.get("$DATE", "N/A"))
# --- 2) Events -> NumPy -> DataFrame -> CSV ---
events = flow.as_array(preprocess=True) # default preprocessing behavior
df = pd.DataFrame(events, columns=flow.pnn_labels)
df.to_csv("events.csv", index=False)
print("Wrote CSV:", "events.csv")
# --- 3) Filter and write a new FCS ---
# Example: threshold on first scatter channel if available, else channel 0
fsc_idx = flow.scatter_indices[0] if getattr(flow, "scatter_indices", []) else 0
threshold = np.percentile(events[:, fsc_idx], 50) # median threshold
mask = events[:, fsc_idx] > threshold
filtered = events[mask]
create_fcs(
"filtered.fcs",
filtered,
flow.pnn_labels,
opt_channel_names=flow.pns_labels,
metadata={**flow.text, "$SRC": "Filtered via FlowIO example"},
)
print("Wrote FCS:", "filtered.fcs")
# --- Metadata-only read (memory efficient) ---
meta_only = FlowData(FCS_PATH, only_text=True)
print("Metadata-only read: $DATE =", meta_only.text.get("$DATE", "N/A"))
if __name__ == "__main__":
main()
## Implementation Details

### Data Model and Segments

An FCS file is organized into segments:

- HEADER: FCS version and byte offsets for the other segments.
- TEXT: Keyword/value metadata (e.g., `$DATE`, `$CYT`, `$PnN`, `$PnS`, `$PnR`, `$PnG`, `$PnE`).
- DATA: Event matrix encoded as integer/float/double/ASCII depending on file keywords (per `$DATATYPE`).
- ANALYSIS (optional): Post-processing results, if present.
In FlowIO, these are exposed via `FlowData` attributes such as:

- `flow.header` (HEADER info)
- `flow.text` (TEXT keyword dictionary; FlowIO lowercases keywords and strips the leading `$`, so `$CYT` becomes `flow.text['cyt']`)
- `flow.analysis` (ANALYSIS keyword dictionary, if present)
- `flow.as_array(...)` (decoded event matrix)
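For intuition about the HEADER layout, here is a minimal hand-rolled parser for the fixed-width HEADER segment. This is an illustration only (FlowIO does this for you via `flow.header`); the field names and the fake header bytes are assumptions for the demo, following the standard layout of a 6-byte version string, 4 spaces, then six right-justified 8-byte ASCII offsets:

```python
def parse_fcs_header(raw: bytes) -> dict:
    """Parse the fixed-width FCS HEADER: version string followed by
    six 8-byte ASCII byte offsets (TEXT/DATA/ANALYSIS begin + end)."""
    version = raw[0:6].decode("ascii").strip()
    fields = ["text_start", "text_stop", "data_start", "data_stop",
              "analysis_start", "analysis_stop"]
    offsets = {}
    for i, name in enumerate(fields):
        chunk = raw[10 + 8 * i: 18 + 8 * i].decode("ascii").strip()
        # Empty fields (e.g., no ANALYSIS segment) are treated as 0
        offsets[name] = int(chunk) if chunk else 0
    return {"version": version, **offsets}


# Build a fake 58-byte header for demonstration purposes
fake = b"FCS3.1    " + b"".join(
    str(n).rjust(8).encode("ascii") for n in [58, 1024, 1025, 9999, 0, 0]
)
info = parse_fcs_header(fake)
print(info["version"], info["text_start"], info["data_stop"])
# FCS3.1 58 9999
```

In real 3.x files, very large offsets may be 0 in the HEADER, with the true values held in the TEXT keywords (`$BEGINDATA`/`$ENDDATA`), which is one source of the offset discrepancies discussed below.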
### Preprocessing (`as_array(preprocess=True)`)
When preprocessing is enabled, FlowIO applies common FCS transformations:
- Gain scaling (`$PnG`): For linear parameters, raw values are divided by the per-parameter gain, per the FCS spec.
- Log/exponential transform (`$PnE`): For `$PnE = "f1,f2"` (decades and offset), a raw channel value `x` with range `R` (`$PnR`) is mapped to `value = f2 * 10 ** (f1 * x / R)`; an offset `f2` of 0 is treated as 1, as the spec prescribes.
- Time scaling: If a time channel is detected, its values are scaled by `$TIMESTEP` when present.
To disable all transformations and obtain raw decoded values, call `flow.as_array(preprocess=False)`.
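As an illustration of the log transform (not FlowIO's internal code), the `$PnE` mapping can be sketched in NumPy; the decades, offset, and range values below are example numbers for a typical 4-decade, 10-bit acquisition:

```python
import numpy as np


def pne_transform(raw, decades, offset, pnr):
    """Map raw log-amplified channel values to linear scale values,
    following the FCS $PnE convention: value = f2 * 10^(f1 * x / R).
    An offset (f2) of 0 is treated as 1, as the spec prescribes."""
    offset = offset if offset != 0 else 1.0
    return offset * np.power(10.0, decades * np.asarray(raw, dtype=float) / pnr)


# Example: 4-decade log amplifier, 10-bit range (R = 1024)
raw = np.array([0, 256, 512, 1024])
print(pne_transform(raw, decades=4.0, offset=1.0, pnr=1024))
# x = 0 -> 1, x = 256 -> 10, x = 512 -> 100, x = 1024 -> 10**4
```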
### Channel Identification

FlowIO provides convenience indices for common channel types:

- `flow.scatter_indices` (e.g., FSC/SSC)
- `flow.fluoro_indices` (fluorescence channels)
- `flow.time_index` (time channel index, or `None` if absent)

These indices can be used to slice the event matrix:

- `events[:, flow.scatter_indices]`
- `events[:, flow.fluoro_indices]`
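A minimal slicing sketch with a synthetic event matrix; the hard-coded index lists stand in for the attributes above:

```python
import numpy as np

# Synthetic (events, channels) matrix: FSC, SSC, FL1, FL2, Time
events = np.arange(20, dtype=float).reshape(4, 5)
scatter_indices = [0, 1]   # stand-in for flow.scatter_indices
fluoro_indices = [2, 3]    # stand-in for flow.fluoro_indices
time_index = 4             # stand-in for flow.time_index

scatter = events[:, scatter_indices]  # shape (4, 2)
fluoro = events[:, fluoro_indices]    # shape (4, 2)
time_col = events[:, time_index]      # shape (4,)
print(scatter.shape, fluoro.shape, time_col.shape)
# (4, 2) (4, 2) (4,)
```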
### Handling Problematic Files (Offsets and Null Channels)

Some files contain inconsistent offsets between HEADER and TEXT. `FlowData` accepts flags to cope:

- `ignore_offset_discrepancy=True` to tolerate a HEADER/TEXT offset mismatch.
- `use_header_offsets=True` to prefer the HEADER offsets.
- `ignore_offset_error=True` to bypass offset-related failures more aggressively.

To exclude known null/empty channels during parsing, pass `FlowData(path, null_channel_list=[...])`.
### Multi-Dataset Files

If a file contains multiple datasets, constructing `FlowData(path)` may raise `MultipleDataSetsError`. Use:

- `read_multiple_data_sets(path)` to load all datasets, or
- `FlowData(path, nextdata_offset=...)` to load a specific dataset via its `$NEXTDATA` offset.
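The `$NEXTDATA` chain behaves like a linked list of offsets: each dataset's TEXT segment stores the distance to the next dataset, and 0 marks the last one. A sketch of the bookkeeping involved, where `nextdata_values` is a hypothetical list of `$NEXTDATA` values read from successive TEXT segments (offsets are commonly interpreted as relative to the start of the current dataset):

```python
def dataset_offsets(nextdata_values):
    """Accumulate absolute byte offsets from a chain of $NEXTDATA values.
    Each value is taken as relative to the start of the current dataset;
    a value of 0 ends the chain."""
    offsets = [0]          # first dataset always starts at byte 0
    for nd in nextdata_values:
        if nd == 0:        # $NEXTDATA == 0 marks the last dataset
            break
        offsets.append(offsets[-1] + nd)
    return offsets


# Three datasets: second starts 4096 bytes in, third 4096 + 8192 bytes in
print(dataset_offsets([4096, 8192, 0]))  # [0, 4096, 12288]
```

Helpers like `read_multiple_data_sets` perform this walk internally, so you rarely need to compute offsets yourself.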
### Writing FCS

Two common patterns:

- Metadata-only changes: `flow.write_fcs("out.fcs", metadata={...})`.
- Modified event data: extract the array, modify it, then call `create_fcs(...)` to generate a new file (FlowIO does not modify event data in place). Note that `create_fcs` takes an open binary file handle rather than a path.
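For intuition about what a writer emits, the TEXT segment is just delimiter-separated keyword/value pairs. A simplified sketch of that serialization (FlowIO's actual writer also handles delimiter escaping and fills in required keywords such as `$PAR` and `$TOT` for you):

```python
def serialize_text_segment(keywords, delimiter="/"):
    """Join keyword/value pairs into an FCS TEXT segment string:
    /KEY1/value1/KEY2/value2/ ... The leading and trailing delimiters
    are part of the format. (Simplified: no delimiter escaping.)"""
    parts = []
    for key, value in keywords.items():
        parts.append(key)
        parts.append(str(value))
    return delimiter + delimiter.join(parts) + delimiter


text = serialize_text_segment({"$DATATYPE": "F", "$PAR": 3})
print(text)  # /$DATATYPE/F/$PAR/3/
```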