# Citation Network

By AIPOCH

Build and visualize a citation network from a source/target CSV to identify key papers, communities, and emerging hotspots; use when you have citation pairs and need fast literature review or trend analysis.
## When to Use
- You have a citation relationship table (who cites whom) and want to quickly turn it into a directed network for analysis.
- You are conducting a literature review and need to identify influential papers (high in-degree / centrality) and core clusters.
- You want to detect community structures (research subfields) and compare them across time or datasets.
- You need an interactive, shareable visualization (HTML) or a Gephi-importable graph file (GEXF).
- You are positioning a new project and want evidence of research hotspots and bridging papers between communities.
## Key Features

- Builds a directed citation graph from a minimal CSV containing `source` and `target`.
- De-duplicates nodes by identifier (DOI recommended; otherwise unique titles).
- Exports:
  - `citation_network.gexf` for Gephi and other graph tools
  - `network_metrics.json` for basic network statistics
  - `citation_network.html` for interactive browser viewing (auto-generated by the build script)
- Run-directory workflow to keep each execution reproducible and isolated under `outputs/runs/<timestamp>/`.
- Optional input encoding control to avoid garbled characters (e.g., UTF-8 / UTF-8-SIG).
## Dependencies
- Python 3.10+
- pandas >= 2.0
- networkx >= 3.0
- (Optional, for HTML visualization) pyvis >= 0.3
## Example Usage
1) Initialize a run directory
python scripts/init_run.py
This creates a new run folder:
outputs/runs/<timestamp>/
  config.json
  data/
  outputs/
2) Prepare the citation CSV (minimal)
Create citations.csv and place it into:
outputs/runs/<timestamp>/data/citations.csv
Minimal CSV format:
source,target
Paper A,Paper B
Paper A,Paper C
Recommended DOI-based identifiers:
source,target
10.1234/abcd.1,10.1234/abcd.2
10.1234/abcd.1,10.1234/abcd.3
3) Confirm configuration
Open:
outputs/runs/<timestamp>/config.json
Ensure the configured input filename and column names match your CSV (at minimum `source` and `target`). If you see garbled characters, set an explicit encoding (e.g., `utf-8` or `utf-8-sig`) via an `input_encoding` field if supported by the config.
4) Build the citation network
python scripts/build_citation_network.py
The build script will also generate the HTML automatically (you do not need to run scripts/export_gexf_html.py manually).
5) Inspect outputs
Expected outputs under the same run directory:
- `citation_network.gexf` (import into Gephi)
- `network_metrics.json` (node/edge counts, density, etc.)
- `citation_network.html` (open in a browser)
## Implementation Details

### Data Model

- Nodes: papers, identified by the value in `source`/`target` (DOI preferred; otherwise a unique, consistent title string).
- Edges: directed citations `source -> target`.
### Input Requirements and Constraints

- The network builder reads only the `source` and `target` columns.
- Additional columns (e.g., author/year/venue) are ignored by the current scripts.
- If you need metadata, maintain a separate table for downstream joining/annotation (not consumed by the builder), for example:
id,title,authors,year,doi
10.1234/abcd.1,Paper A,"Zhang, Wei; Li, Ming",2021,10.1234/abcd.1
10.1234/abcd.2,Paper B,"Wang, Fang",2019,10.1234/abcd.2
### Run Directory Standard

- Always run `python scripts/init_run.py` before an execution to create a new run directory.
- All inputs, configs, and outputs must remain inside `outputs/runs/<timestamp>/`.
- By default, scripts operate on the latest run directory under `outputs/runs/`.
### Metrics and Analysis (Conceptual)

- Basic network statistics are exported to `network_metrics.json` (e.g., node/edge counts, density).
- Typical downstream analyses include:
  - centrality (degree, betweenness)
  - community detection (e.g., Louvain), if enabled/implemented in the pipeline
### Common Failure Modes

- Garbled characters: ensure the CSV is UTF-8/UTF-8-SIG; set `input_encoding` in `config.json` if available.
- Duplicate nodes: identical identifiers are treated as the same node; prefer DOIs or enforce unique titles.
- Empty or missing output: verify the CSV header names match the configured `source`/`target` columns.
## Related References

- Data cleaning checklist: `references/data-cleaning-checklist.md`
- Network metrics notes: `references/network-metrics-notes.md`
- Additional documentation: `references/README.md`