API Documentation

Automate dataset submission to the Protein Diffraction Image Server

Overview

The REST API lets you upload crystallographic raw datasets and track their processing status programmatically – without using the web interface. All requests go to:

https://proteindiffraction.org/api/v1/

Supported archive formats:

  • tar.gz / tgz
  • tar.bz2 / tbz2
  • tar
  • zip
  • Any of the above compressed with zstd: *.tar.zst, *.tar.gz.zst, etc. (the server decompresses transparently)
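
A client can check a file name against the list above before starting a long upload. The following is a minimal sketch: the suffixes come from the list, but the function name and the matching logic are assumptions, and the server may well validate by file content rather than by name.

```python
# Suffixes copied from the list of supported formats; matching by file
# name only is an assumption made for this sketch.
BASE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".tbz2", ".tar", ".zip")


def is_supported_archive(filename: str) -> bool:
    """Return True if the name ends in a listed suffix, optionally followed by .zst."""
    name = filename.lower()
    if name.endswith(".zst"):
        # Strip the outer zstd layer and test the inner archive suffix.
        name = name[: -len(".zst")]
    return name.endswith(BASE_SUFFIXES)


if __name__ == "__main__":
    for candidate in ("dataset_1ABC.tar.zst", "data.tar.gz.zst", "data.rar"):
        print(candidate, is_supported_archive(candidate))
```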

Authentication

Every request (except POST /api/v1/auth/token/) requires an API token in the Authorization header:

Authorization: Token your40chartoken...

Logged-in users: your personal API token is available on your Profile page. You can also regenerate it there if it has been compromised.
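
In Python, the header can be attached with the standard library alone. This sketch only builds the request object and sends nothing. The helper name authorized_request is hypothetical, and the token shown is a placeholder.

```python
import urllib.request

API_ROOT = "https://proteindiffraction.org/api/v1/"


def authorized_request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request that carries the Authorization header."""
    return urllib.request.Request(
        API_ROOT + path.lstrip("/"),
        headers={"Authorization": f"Token {token}"},
        method=method,
    )


# Placeholder token; substitute the one from your Profile page.
req = authorized_request("submissions/", "your40chartoken")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing the resulting object to urllib.request.urlopen would perform the call.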

Getting a token via the API

Exchange your email + password for a token once and store it safely.

POST /api/v1/auth/token/

Request (form-data or JSON):

curl -X POST https://proteindiffraction.org/api/v1/auth/token/ \
     -d "username=you@lab.edu&password=secret"

Response 200:

{
  "token":   "643df60b0f9156255757c5f641d5ea95bb043198",
  "user_id": 42,
  "email":   "you@lab.edu"
}

The token does not expire automatically. Regenerate it from your Profile page if needed.
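
The same exchange can be sketched in Python with the standard library. The function names are hypothetical. The field names and the response shape follow the example above, and the request is sent form-encoded, like the curl call.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://proteindiffraction.org/api/v1/auth/token/"


def build_token_request(email: str, password: str) -> urllib.request.Request:
    """Build the form-encoded POST; the e-mail goes into the 'username' field."""
    body = urllib.parse.urlencode({"username": email, "password": password}).encode()
    return urllib.request.Request(TOKEN_URL, data=body, method="POST")


def parse_token_response(raw: bytes) -> str:
    """Extract the token from a 200 response body."""
    return json.loads(raw)["token"]


def fetch_token(email: str, password: str) -> str:
    """Perform the exchange over the network (not called in this sketch)."""
    with urllib.request.urlopen(build_token_request(email, password)) as resp:
        return parse_token_response(resp.read())
```

Since the token does not expire, it is usually enough to call fetch_token once and keep the result in a protected location such as an environment variable.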

Submission workflow

A complete submission requires two API calls:

  1. Upload: POST /api/v1/upload/
    Send the compressed archive (zstd recommended). The server validates the file format, checks for duplicates, and tries to auto-detect the PDB ID from headers inside the archive. Returns an upload token.
  2. Submit: POST /api/v1/submit/
    Confirm the upload and supply metadata (PDB code, title, authors, equipment). Triggers the background processing pipeline (archive repacking → metadata harvesting → publication).

Shell example (full flow)

# ① Get a token once
TOKEN=$(curl -s -X POST https://proteindiffraction.org/api/v1/auth/token/ \
  -d "username=you@lab.edu&password=secret" \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['token'])")

# ② Compress raw data with zstd (fast level-3)
tar -I zstd -cf dataset_1ABC.tar.zst ./raw_diffraction_data/

# ③ Upload – server decompresses, validates and detects PDB ID
UPLOAD=$(curl -s -X POST https://proteindiffraction.org/api/v1/upload/ \
  -H "Authorization: Token $TOKEN" \
  -F "dataset_file=@dataset_1ABC.tar.zst")
echo "$UPLOAD"
# {"token":"abc123…","pdbid":"1ABC","md5hash":"…","size":12345678}

FILE_TOKEN=$(echo "$UPLOAD" | python3 -c "import sys,json; print(json.load(sys.stdin)['token'])")

# ④ Submit with metadata
curl -s -X POST https://proteindiffraction.org/api/v1/submit/ \
  -H "Authorization: Token $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "token":           "'"$FILE_TOKEN"'",
    "pdb_code":        "1ABC",
    "project_title":   "High-resolution structure of protein X",
    "project_authors": "Smith J., Doe A.",
    "equipment_other": "APS 24-ID-C beamline"
  }'
# {"status":"submitted","token":"abc123…"}
# "project_title", "project_authors" and "equipment_other" are optional if the "pdb_code" is given.
# In this case this data will be fetched from RCSB.


# ⑤ Poll for processing status
curl -H "Authorization: Token $TOKEN" https://proteindiffraction.org/api/v1/submissions/$FILE_TOKEN/

zstd compression

We recommend compressing uploads with zstd rather than gzip: it reaches a similar compression ratio at much higher speed. tar.gz, tar.bz2 and zip are also supported.

Method              Create archive                        Decompress locally
zstd (recommended)  tar -I zstd -cf data.tar.zst ./data/  tar -I zstd -xf data.tar.zst
gzip                tar -czf data.tar.gz ./data/          tar -xzf data.tar.gz
bzip2               tar -cjf data.tar.bz2 ./data/         tar -xjf data.tar.bz2

Install zstd: apt install zstd / brew install zstd
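
The archive can also be created from Python. The sketch below uses the third-party zstandard package when it is installed and falls back to tar.bz2 from the standard library otherwise. The function name compress_directory is hypothetical and is not the helper shipped with the Python client. Level 3 is chosen to match the zstd default.

```python
import tarfile
from pathlib import Path

try:
    import zstandard  # third-party and optional: pip install zstandard
except ImportError:
    zstandard = None


def compress_directory(src: str, stem: str) -> Path:
    """Pack src into <stem>.tar.zst, or <stem>.tar.bz2 if zstandard is missing."""
    source = Path(src)
    if zstandard is None:
        target = Path(stem + ".tar.bz2")
        with tarfile.open(target, "w:bz2") as tar:
            tar.add(source, arcname=source.name)
        return target

    target = Path(stem + ".tar.zst")
    compressor = zstandard.ZstdCompressor(level=3)
    # Stream the tar data through the zstd writer so that no uncompressed
    # intermediate file is written to disk.
    with open(target, "wb") as raw, compressor.stream_writer(raw) as zst:
        with tarfile.open(fileobj=zst, mode="w|") as tar:
            tar.add(source, arcname=source.name)
    return target
```

For example, compress_directory("./data", "data") would produce data.tar.zst or data.tar.bz2 in the current directory.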

Endpoint reference

Obtain auth token

POST /api/v1/auth/token/ (no authentication required)

Field                Type    Description
username (required)  string  Your registered e-mail address
password (required)  string  Account password

Returns 200 { token, user_id, email }

Upload dataset

POST /api/v1/upload/ (multipart/form-data)

Field                    Type    Description
dataset_file (required)  file    Archive file: .tar, .tar.gz, .tar.bz2, .zip, or any of those compressed with zstd (.tar.zst, .tar.gz.zst, etc.). The server decompresses zstd transparently and stores the inner archive.
dataset_type (optional)  string  "pdb" (default) or "other" for non-PDB depositions

Returns 201:

{
  "token":         "<40-char upload token>",  // save this!
  "md5hash":       "d41d8cd98f00b204e9800998ecf8427e",
  "filename":      "dataset.tar",
  "size":          1234567,
  "pdbid":         "1ABC",    // null if not detected
  "is_pdb_exists": false
}

The upload token is required for the POST /submit/ call. If is_pdb_exists is true, this PDB ID already has a dataset in the repository.
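
A client typically needs only a few of these fields before moving on to the submit step. The sketch below picks them out of a 201 response body. The function name and the keys of the returned dictionary are hypothetical.

```python
import json


def summarize_upload(raw: str) -> dict:
    """Pick out the parts of the 201 response that the submit step depends on."""
    data = json.loads(raw)
    pdb_code = data.get("pdbid")  # None when the server could not detect a PDB ID
    return {
        "upload_token": data["token"],
        "pdb_code": pdb_code,
        "needs_pdb_code": pdb_code is None,
        "already_deposited": bool(data.get("is_pdb_exists")),
    }


if __name__ == "__main__":
    sample = '{"token": "abc123", "pdbid": null, "is_pdb_exists": false}'
    print(summarize_upload(sample))
```

When needs_pdb_code is true, the caller has to supply pdb_code (or the full metadata) in the submit call. When already_deposited is true, it may be worth checking the repository before continuing.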

Submit dataset

POST /api/v1/submit/ (application/json)

Field                      Type     Description
token (required)           string   Upload token returned by /upload/
pdb_code                   string   4-character PDB ID (required unless dataset_type=other)
dataset_type (optional)    string   "pdb" (default) or "other"
project_title              string   Descriptive title (min. 10 chars). Required only when pdb_code is absent; when a PDB code is given, the title is fetched from RCSB.
project_authors            string   Author list, e.g. "Smith J., Doe A." Required only when pdb_code is absent.
synchrotron_id (optional)  integer  Synchrotron ID from GET /equipment/
beamline_id (optional)     integer  Beamline ID (pair with synchrotron_id)
equipment_other            string   Free-text equipment description. When pdb_code is absent, either this field or synchrotron_id + beamline_id is required.

Returns 200: { "status": "submitted", "token": "…" }
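
The conditional requirements in the field table can be checked on the client before the request is sent. The following sketch encodes one reading of those rules. The function name is hypothetical, and the server remains the authority on what it accepts.

```python
def build_submit_payload(token, pdb_code=None, dataset_type="pdb",
                         project_title=None, project_authors=None,
                         synchrotron_id=None, beamline_id=None,
                         equipment_other=None):
    """Return the JSON body for POST /submit/, or raise ValueError."""
    if not token:
        raise ValueError("token is required")
    if dataset_type != "other" and not pdb_code:
        raise ValueError("pdb_code is required unless dataset_type is 'other'")
    if project_title is not None and len(project_title) < 10:
        raise ValueError("project_title needs at least 10 characters")
    if not pdb_code:
        # Without a PDB code nothing can be fetched from RCSB, so the
        # descriptive fields have to be supplied by the caller.
        if not project_title:
            raise ValueError("project_title is required without pdb_code")
        if not project_authors:
            raise ValueError("project_authors is required without pdb_code")
        has_ids = synchrotron_id is not None and beamline_id is not None
        if not (equipment_other or has_ids):
            raise ValueError("equipment_other or synchrotron_id + beamline_id "
                             "is required without pdb_code")
    fields = {
        "token": token,
        "pdb_code": pdb_code,
        "dataset_type": dataset_type,
        "project_title": project_title,
        "project_authors": project_authors,
        "synchrotron_id": synchrotron_id,
        "beamline_id": beamline_id,
        "equipment_other": equipment_other,
    }
    # Leave out anything the caller did not provide.
    return {key: value for key, value in fields.items() if value is not None}
```

The returned dictionary can be serialized with json.dumps and sent with Content-Type: application/json.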

List submissions

GET /api/v1/submissions/

Returns an array of all submissions belonging to the authenticated user, ordered newest-first.

Returns 200 – array of submission objects (see status fields below).

Get / Delete a submission

GET /api/v1/submissions/<upload-token>/

Returns 200 – full submission object:

{
  "token":            "abc123…",
  "original_filename":"dataset.tar",
  "md5hash":          "d41d8…",
  "pdbid":            "1ABC",
  "dataset_type":     "pdb",
  "project_title":    "High-resolution structure…",
  "project_authors":  "Smith J., Doe A.",
  "synchrotron":      { "id": 25, "short": "APS" },
  "beamline":         { "id": 42, "short": "24-ID-C" },
  "equipment_other":  null,
  "status":           "validated",
  "status_details":   null,
  "inserted":         "2026-06-02T10:15:30Z",
  "updated":          "2026-06-02T10:18:45Z"
}

DELETE /api/v1/submissions/<upload-token>/

Delete a submission. Only allowed when the current status is deletable (e.g. uploaded, validated, error_no_data).

Returns 204 No Content on success.
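
A client may want to check the status before attempting a deletion. The sketch below builds the DELETE request only for the statuses named above. Because the text lists them as examples, the set is an assumption and the server may allow others. The function name is hypothetical.

```python
import urllib.request

API_ROOT = "https://proteindiffraction.org/api/v1/"

# Statuses named as deletable in the text; the list there is introduced
# with "e.g.", so this set is an assumption rather than a full list.
KNOWN_DELETABLE = {"uploaded", "validated", "error_no_data"}


def build_delete_request(api_token: str, submission: dict):
    """Return a DELETE request, or None if the status is not known to be deletable."""
    if submission.get("status") not in KNOWN_DELETABLE:
        return None
    return urllib.request.Request(
        f"{API_ROOT}submissions/{submission['token']}/",
        headers={"Authorization": f"Token {api_token}"},
        method="DELETE",
    )
```

The request is not sent here. Passing it to urllib.request.urlopen would perform the call, and a 204 response indicates success.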

List equipment

GET /api/v1/equipment/

Returns the list of synchrotrons and their beamlines. Use id values with the synchrotron_id and beamline_id parameters of POST /submit/.

[
  {
    "id": 25, "short": "APS", "fullname": "Advanced Photon Source",
    "url": "http://www.aps.anl.gov",
    "beamlines": [
      { "id": 42, "short": "24-ID-C", "fullname": null, "url": null },
      …
    ]
  }, …
]
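
Given the parsed equipment list, the two IDs for the submit call can be looked up by short name. This is a minimal sketch: the function name is hypothetical, and the case-insensitive comparison is a convenience chosen here, not documented API behavior.

```python
def find_equipment_ids(equipment: list, synchrotron_short: str, beamline_short: str):
    """Return {'synchrotron_id': ..., 'beamline_id': ...} or None if no match."""
    for synchrotron in equipment:
        if synchrotron["short"].lower() != synchrotron_short.lower():
            continue
        for beamline in synchrotron.get("beamlines", []):
            if beamline["short"].lower() == beamline_short.lower():
                return {
                    "synchrotron_id": synchrotron["id"],
                    "beamline_id": beamline["id"],
                }
    return None


if __name__ == "__main__":
    # Same shape as the example response above, reduced to one beamline.
    equipment = [{
        "id": 25, "short": "APS", "fullname": "Advanced Photon Source",
        "url": "http://www.aps.anl.gov",
        "beamlines": [{"id": 42, "short": "24-ID-C", "fullname": None, "url": None}],
    }]
    print(find_equipment_ids(equipment, "APS", "24-ID-C"))
```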

Submission statuses

Status                 Meaning                                                Next action
uploaded               File received, waiting for metadata confirmation       Call POST /submit/
submitted              Metadata confirmed, queued for validation              Wait
pending                Validation in progress                                 Wait
validated              Archive structure OK, metadata harvesting in progress  Wait
published              Data publicly accessible                               Done ✓
error_no_data          No diffraction images found in the archive             Delete and re-upload a corrected archive
inconsistent_datasets  Multiple non-matching datasets detected                Contact support or resubmit
header_not_readable    Image headers could not be parsed                      Check image format; contact support
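
The table translates directly into a polling loop: keep waiting while the status is one marked "Wait", and stop on anything else. The sketch below takes the status lookup as a callable so that it stays independent of any HTTP library. The function name, the 30-second interval and the poll limit are assumptions, not documented values.

```python
import time

# Statuses whose next action in the table is "Wait".
WAITING = {"submitted", "pending", "validated"}

NEXT_ACTION = {
    "uploaded": "Call POST /submit/",
    "published": "Done",
    "error_no_data": "Delete and re-upload a corrected archive",
    "inconsistent_datasets": "Contact support or resubmit",
    "header_not_readable": "Check image format; contact support",
}


def poll_until_settled(fetch_status, interval=30.0, max_polls=120, sleep=time.sleep):
    """Call fetch_status() until it returns a status that is not in WAITING."""
    for _ in range(max_polls):
        status = fetch_status()
        if status not in WAITING:
            return status
        sleep(interval)
    raise TimeoutError(f"still processing after {max_polls} polls")


if __name__ == "__main__":
    # Simulated sequence of statuses instead of real API calls.
    fake = iter(["submitted", "pending", "validated", "published"])
    final = poll_until_settled(lambda: next(fake), sleep=lambda seconds: None)
    print(final, "->", NEXT_ACTION.get(final, "Unknown status"))
```

In real use, fetch_status would issue GET /api/v1/submissions/<upload-token>/ and return the "status" field of the response.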

Python client

The script api_upload_client.py, available at https://github.com/prubach/proteindiffraction_api, wraps the full workflow into a single command.

Installation

pip install requests zstandard

zstandard is optional. If it is not installed, the client compresses with tar.bz2 instead of zstd.

Submit a dataset from a local directory

python api_upload_client.py \
    --server https://proteindiffraction.org/ap \
    --email you@lab.edu --password secret \
    --path ./raw_data/ \
    --pdb 1ABC \
    --title "High-resolution structure of protein X" \
    --authors "Smith J., Doe A." \
    --synchrotron-id 25 --beamline-id 42 \
    submit

"--title", "--authors" and "--synchrotron-id" are optional if the PDB code is given. In this case this data will be fetched from RCSB.

Other commands

# List available synchrotrons + beamlines
python api_upload_client.py --server … --email … --password … equipment

# List your submissions
python api_upload_client.py --server … --email … --password … list

# Check / poll a specific submission
python api_upload_client.py --token abc123… --poll \
    --server … --email … --password … status

# Delete a pending submission
python api_upload_client.py --token abc123… \
    --server … --email … --password … delete

Using from Python code

from api_upload_client import ProteinDiffractionClient, compress_directory_zstd

client = ProteinDiffractionClient("https://proteindiffraction.org/ap")
client.login("you@lab.edu", "secret")

# compress and upload
archive = compress_directory_zstd("./raw_data")
upload  = client.upload(archive)

# submit with metadata
client.submit(
    token          = upload["token"],
    pdb_code       = "1ABC",
    project_title  = "High-resolution structure of protein X",
    project_authors= "Smith J., Doe A.",
    equipment_other= "APS 24-ID-C beamline",
)

# wait for processing
final = client.poll_until_done(upload["token"])
print(final["status"])

Error responses

All errors return JSON with an "error" field:

{ "error": "Description of what went wrong." }

Status  Meaning
400     Bad request – missing/invalid field or wrong file format
401     Authentication credentials missing or invalid
403     Forbidden – submission belongs to another user
404     Submission token not found
409     Conflict – duplicate file hash, or submission already processed
500     Internal server error – contact support
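
Because every error carries the same JSON shape, a client can funnel all failures through one helper. The sketch below turns a status code and response body into an exception. The class and function names are hypothetical, and the fallback texts are shortened from the table above.

```python
import json


class ApiError(Exception):
    """Raised for any response with a 4xx or 5xx status code."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


# Fallback texts, used when the body has no usable "error" field.
FALLBACK = {
    400: "Bad request",
    401: "Authentication credentials missing or invalid",
    403: "Forbidden",
    404: "Submission token not found",
    409: "Conflict",
    500: "Internal server error",
}


def raise_for_api_error(status: int, body: bytes) -> None:
    """Do nothing for status codes below 400; otherwise raise ApiError."""
    if status < 400:
        return
    try:
        message = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        # Body was not valid JSON, or not a JSON object.
        message = ""
    raise ApiError(status, message or FALLBACK.get(status, "Unexpected error"))
```

A caller could, for example, catch ApiError and treat status 409 on upload as "this file is already on the server" rather than as a fatal failure.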