Citation Network

AIPOCH

Build and visualize a citation network from a source/target CSV to identify key papers, communities, and emerging hotspots; use when you have citation pairs and need fast literature review or trend analysis.

FILES

citation-network/
  skill.md
  scripts/
    build_citation_network.py
    export_gexf_html.py
    init_run.py
  references/
    data-cleaning-checklist.md
    network-metrics-notes.md
    README.md

92/100 Total Score

  • Core Capability: 87 / 100
  • Functional Suitability: 11 / 12
  • Reliability: 10 / 12
  • Performance & Context: 8 / 8
  • Agent Usability: 14 / 16
  • Human Usability: 8 / 8
  • Security: 9 / 12
  • Maintainability: 10 / 12
  • Agent-Specific: 17 / 20
  • Medical Task: 20 / 20 (Passed)

Scenario scores (each 4/4):

  • 100: You have a citation relationship table (who cites whom) and want to quickly turn it into a directed network for analysis
  • 97: You are conducting a literature review and need to identify influential papers (high in-degree / centrality) and core clusters
  • 95: Builds a directed citation graph from a minimal CSV containing source and target
  • 94: De-duplicates nodes by identifier (DOI recommended; otherwise unique titles)
  • 94: End-to-end case for "Builds a directed citation graph from a minimal CSV containing source and target"

SKILL.md

When to Use

  • You have a citation relationship table (who cites whom) and want to quickly turn it into a directed network for analysis.
  • You are conducting a literature review and need to identify influential papers (high in-degree / centrality) and core clusters.
  • You want to detect community structures (research subfields) and compare them across time or datasets.
  • You need an interactive, shareable visualization (HTML) or a Gephi-importable graph file (GEXF).
  • You are positioning a new project and want evidence of research hotspots and bridging papers between communities.

Key Features

  • Builds a directed citation graph from a minimal CSV containing source and target.
  • De-duplicates nodes by identifier (DOI recommended; otherwise unique titles).
  • Exports:
    • citation_network.gexf for Gephi and other graph tools
    • network_metrics.json for basic network statistics
    • citation_network.html for interactive browser viewing (auto-generated by the build script)
  • Run-directory workflow to keep each execution reproducible and isolated under outputs/runs/<timestamp>/.
  • Optional input encoding control to avoid garbled characters (e.g., UTF-8 / UTF-8-SIG).

Dependencies

  • Python 3.10+
  • pandas >= 2.0
  • networkx >= 3.0
  • (Optional, for HTML visualization) pyvis >= 0.3

Example Usage

1) Initialize a run directory

python scripts/init_run.py

This creates a new run folder:

outputs/runs/<timestamp>/
  config.json
  data/
  outputs/
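
Conceptually, the initializer just creates a timestamped directory tree and writes a default config. A minimal sketch of what scripts/init_run.py might do; the actual script and its config field names (input_file, source_column, target_column, input_encoding here) are assumptions, so check the generated config.json for the real schema:

```python
import json
from datetime import datetime
from pathlib import Path

# Timestamped run directory keeps each execution isolated and reproducible.
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
run_dir = Path("outputs/runs") / timestamp
(run_dir / "data").mkdir(parents=True, exist_ok=True)
(run_dir / "outputs").mkdir(exist_ok=True)

# Assumed config fields -- the shipped script may use different names.
config = {
    "input_file": "citations.csv",
    "source_column": "source",
    "target_column": "target",
    "input_encoding": "utf-8",
}
(run_dir / "config.json").write_text(json.dumps(config, indent=2))
```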

2) Prepare the citation CSV (minimal)

Create citations.csv and place it into:

outputs/runs/<timestamp>/data/citations.csv

Minimal CSV format:

source,target
Paper A,Paper B
Paper A,Paper C

Recommended DOI-based identifiers:

source,target
10.1234/abcd.1,10.1234/abcd.2
10.1234/abcd.1,10.1234/abcd.3
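
The core build step likely amounts to loading these two columns and handing them to networkx. A minimal sketch under that assumption (it writes the example CSV inline so it is self-contained; the real build_citation_network.py may differ):

```python
from pathlib import Path

import networkx as nx
import pandas as pd

# Write the minimal example CSV from above, then read it back.
Path("citations.csv").write_text("source,target\nPaper A,Paper B\nPaper A,Paper C\n")

# utf-8-sig also swallows a leading BOM, a common cause of garbled headers.
df = pd.read_csv("citations.csv", encoding="utf-8-sig")
df = df.dropna(subset=["source", "target"]).drop_duplicates()

# Identical identifiers collapse to a single node, so consistent DOIs
# de-duplicate naturally.
G = nx.from_pandas_edgelist(df, source="source", target="target",
                            create_using=nx.DiGraph)
```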

3) Confirm configuration

Open:

outputs/runs/<timestamp>/config.json

Ensure the configured input filename and column names match your CSV (at minimum source and target). If you see garbled characters, set an explicit encoding (e.g., utf-8 or utf-8-sig) via an input_encoding field if supported by the config.
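
A header mismatch is cheap to catch before building. A sketch of such a pre-flight check; the config keys (source_column, target_column, input_encoding) are illustrative assumptions, and in practice the config would be loaded from the run's config.json rather than defined inline:

```python
import csv
from pathlib import Path

# Illustrative config; load the run's config.json in real use.
config = {
    "input_file": "citations.csv",
    "source_column": "source",
    "target_column": "target",
    "input_encoding": "utf-8-sig",
}

Path(config["input_file"]).write_text("source,target\nPaper A,Paper B\n")

# Read only the header row with the configured encoding.
with open(config["input_file"], newline="",
          encoding=config.get("input_encoding", "utf-8")) as f:
    header = next(csv.reader(f))

missing = {config["source_column"], config["target_column"]} - set(header)
if missing:
    raise SystemExit(f"CSV is missing configured columns: {missing}")
```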

4) Build the citation network

python scripts/build_citation_network.py

The build script will also generate the HTML automatically (you do not need to run scripts/export_gexf_html.py manually).

5) Inspect outputs

Expected outputs under the same run directory:

  • citation_network.gexf (import into Gephi)
  • network_metrics.json (node/edge counts, density, etc.)
  • citation_network.html (open in a browser)
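
The metrics file presumably holds statistics along these lines. A sketch of how they could be computed and the GEXF written with networkx; the exact JSON keys are assumptions, not the file's actual schema:

```python
import json

import networkx as nx

G = nx.DiGraph([("Paper A", "Paper B"), ("Paper A", "Paper C")])

# Assumed metric keys; the real network_metrics.json may name them differently.
metrics = {
    "nodes": G.number_of_nodes(),
    "edges": G.number_of_edges(),
    "density": nx.density(G),
}
with open("network_metrics.json", "w") as f:
    json.dump(metrics, f, indent=2)

nx.write_gexf(G, "citation_network.gexf")  # importable into Gephi
```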

Implementation Details

Data Model

  • Nodes: papers, identified by the value in source/target (DOI preferred; otherwise a unique, consistent title string).
  • Edges: directed citations source -> target.

Input Requirements and Constraints

  • The network builder reads only the source and target columns.
  • Additional columns (e.g., author/year/venue) are ignored by the current scripts.
  • If you need metadata, maintain a separate table for downstream joining/annotation (not consumed by the builder), for example:

id,title,authors,year,doi
10.1234/abcd.1,Paper A,"Zhang, Wei; Li, Ming",2021,10.1234/abcd.1
10.1234/abcd.2,Paper B,"Wang, Fang",2019,10.1234/abcd.2
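
Such a table can be joined onto the graph's node identifiers downstream. A sketch with pandas; this is an illustration of the join, not something the builder does:

```python
import pandas as pd

# Node identifiers as they appear in the graph (DOIs here).
nodes = pd.DataFrame({"id": ["10.1234/abcd.1", "10.1234/abcd.2", "10.1234/abcd.3"]})

# Separate metadata table, maintained outside the builder.
meta = pd.DataFrame({
    "id": ["10.1234/abcd.1", "10.1234/abcd.2"],
    "title": ["Paper A", "Paper B"],
    "year": [2021, 2019],
})

# Left join keeps every node; papers without metadata get NaN.
annotated = nodes.merge(meta, on="id", how="left")
```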

Run Directory Standard

  • Always run python scripts/init_run.py before an execution to create a new run directory.
  • All inputs, configs, and outputs must remain inside outputs/runs/<timestamp>/.
  • By default, scripts operate on the latest run directory under outputs/runs/.
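
Because the timestamped directory names sort chronologically, "latest run" can be resolved with a plain sort. A hypothetical helper, not part of the shipped scripts (a separate demo directory, runs_demo, is used here to keep the example self-contained):

```python
from pathlib import Path

def latest_run(base="outputs/runs"):
    """Return the newest run directory, assuming timestamped names sort in order."""
    runs = sorted(p for p in Path(base).iterdir() if p.is_dir())
    if not runs:
        raise FileNotFoundError("no run directories; run scripts/init_run.py first")
    return runs[-1]

# Example: two runs; the later timestamp wins.
for ts in ("20240101-120000", "20240102-093000"):
    (Path("runs_demo") / ts).mkdir(parents=True, exist_ok=True)
```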

Metrics and Analysis (Conceptual)

  • Basic network statistics are exported to network_metrics.json (e.g., node/edge counts, density).
  • Typical downstream analyses include:
    • centrality (degree, betweenness)
    • community detection (e.g., Louvain), if enabled/implemented in the pipeline
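
Those downstream analyses can be sketched directly with networkx (the Louvain implementation is built in from recent networkx releases; it operates on undirected graphs, so the directed citation graph is converted first). The toy graph and node names are illustrative:

```python
import networkx as nx

# Toy citation graph: C is cited by both A and B, and itself cites D.
G = nx.DiGraph([("A", "C"), ("B", "C"), ("C", "D")])

in_degree = dict(G.in_degree())             # citation count: a simple influence proxy
betweenness = nx.betweenness_centrality(G)  # highlights bridging papers

# Louvain community detection on an undirected view of the graph.
communities = nx.community.louvain_communities(G.to_undirected(), seed=42)
```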

Common Failure Modes

  • Garbled characters: ensure CSV is UTF-8/UTF-8-SIG; set input_encoding in config.json if available.
  • Duplicate nodes: identical identifiers are treated as the same node; prefer DOIs or enforce unique titles.
  • Empty or missing output: verify the CSV header names match the configured source/target columns.

References

  • Data cleaning checklist: references/data-cleaning-checklist.md
  • Network metrics notes: references/network-metrics-notes.md
  • Additional documentation: references/README.md