API Documentation – Protein Diffraction

Overview

The REST API lets you upload crystallographic raw datasets and track their processing status programmatically – without using the web interface. All requests go to:

https://proteindiffraction.org/api/v1/

Supported archive formats:

tar.gz / tgz
tar.bz2 / tbz2
tar
zip
Any of the above compressed with zstd — *.tar.zst, *.tar.gz.zst, etc. (the server decompresses transparently)
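Before starting a large upload it can be useful to check the file name locally. The following is a minimal sketch of such a pre-check based only on the list above; the function name and the case-insensitive matching are assumptions, and the server's own validation remains authoritative.

```python
# Suffixes taken from the list of supported archive formats.
BASE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".tbz2", ".tar", ".zip")


def is_supported_archive(filename: str) -> bool:
    """Return True if the name ends in a listed suffix, optionally wrapped in .zst."""
    name = filename.lower()  # assumption: suffix matching is case-insensitive
    if name.endswith(".zst"):
        name = name[: -len(".zst")]  # zstd may wrap any of the listed archives
    return name.endswith(BASE_SUFFIXES)
```

For example, `is_supported_archive("dataset_1ABC.tar.zst")` is true, while a bare `data.zst` with no inner archive suffix is rejected.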

Authentication

Every request (except POST /api/v1/auth/token/) requires an API token in the Authorization header:

Authorization: Token your40chartoken...

Logged-in users: your personal API token is available on your Profile page. You can also regenerate it there if it has been compromised.

Getting a token via the API

Exchange your e-mail address and password for a token once, then store the token securely.

POST /api/v1/auth/token/

Request (form-data or JSON):

curl -X POST https://proteindiffraction.org/api/v1/auth/token/ \
     -d "username=you@lab.edu&password=secret"

Response 200:

{
  "token":   "643df60b0f9156255757c5f641d5ea95bb043198",
  "user_id": 42,
  "email":   "you@lab.edu"
}

The token does not expire automatically. Regenerate it from your Profile page if needed.
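The same exchange can be sketched in Python with only the standard library. The helper names below are illustrative and not part of the API; the sketch builds the request and the resulting Authorization header but leaves the actual sending to the caller.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://proteindiffraction.org/api/v1/"


def build_token_request(email: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the form-encoded POST for auth/token/."""
    body = urllib.parse.urlencode({"username": email, "password": password})
    return urllib.request.Request(
        BASE_URL + "auth/token/", data=body.encode("utf-8"), method="POST"
    )


def auth_header(token_response_body: str) -> dict:
    """Turn the JSON body of a 200 response into the Authorization header."""
    token = json.loads(token_response_body)["token"]
    return {"Authorization": f"Token {token}"}


# Sending needs network access and real credentials, for example:
#   with urllib.request.urlopen(build_token_request(email, password)) as resp:
#       headers = auth_header(resp.read().decode("utf-8"))
```

Since the token does not expire, the result of `auth_header` can be cached and reused for all later calls.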

Submission workflow

A complete submission requires two API calls:

Upload – POST /api/v1/upload/
Send the compressed archive (zstd recommended). The server validates the file format, checks for duplicates, and tries to auto-detect the PDB ID from headers inside the archive. Returns an upload token.
Submit – POST /api/v1/submit/
Confirm the upload and supply metadata (PDB code, title, authors, equipment). Triggers the background processing pipeline (archive repacking → metadata harvesting → publication).

Shell example (full flow)

# ① Get a token once
TOKEN=$(curl -s -X POST https://proteindiffraction.org/api/v1/auth/token/ \
  -d "username=you@lab.edu&password=secret" \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['token'])")

# ② Compress raw data with zstd (default level 3, which is fast)
tar -I zstd -cf dataset_1ABC.tar.zst ./raw_diffraction_data/

# ③ Upload – server decompresses, validates and detects PDB ID
UPLOAD=$(curl -s -X POST https://proteindiffraction.org/api/v1/upload/ \
  -H "Authorization: Token $TOKEN" \
  -F "dataset_file=@dataset_1ABC.tar.zst")
echo "$UPLOAD"
# {"token":"abc123…","pdbid":"1ABC","md5hash":"…","size":12345678}

FILE_TOKEN=$(echo "$UPLOAD" | python3 -c "import sys,json; print(json.load(sys.stdin)['token'])")

# ④ Submit with metadata
curl -s -X POST https://proteindiffraction.org/api/v1/submit/ \
  -H "Authorization: Token $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "token":           "'"$FILE_TOKEN"'",
    "pdb_code":        "1ABC",
    "project_title":   "High-resolution structure of protein X",
    "project_authors": "Smith J., Doe A.",
    "equipment_other": "APS 24-ID-C beamline"
  }'
# {"status":"submitted","token":"abc123…"}
# "project_title", "project_authors" and "equipment_other" are optional when "pdb_code" is given;
# in that case the values are fetched from RCSB.


# ⑤ Poll for processing status
curl -H "Authorization: Token $TOKEN" https://proteindiffraction.org/api/v1/submissions/$FILE_TOKEN/
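Step ⑤ is usually repeated until processing finishes. A possible polling loop is sketched below; it takes any callable that returns the submission object, so it does not depend on a particular HTTP library. The function name, the default interval and the choice of final statuses (those in the status table whose next action is not "Wait" or "Call POST /submit/") are assumptions.

```python
import time
from typing import Callable

# Assumption: these statuses are final because the status table lists no "Wait" for them.
FINAL_STATUSES = {"published", "error_no_data", "inconsistent_datasets", "header_not_readable"}


def poll_until_final(fetch: Callable[[], dict], interval: float = 30.0,
                     max_attempts: int = 120,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() until the submission reaches a final status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        submission = fetch()
        if submission["status"] in FINAL_STATUSES:
            return submission
        if attempt < max_attempts - 1:
            sleep(interval)  # no sleep after the last attempt
    raise TimeoutError(f"still '{submission['status']}' after {max_attempts} attempts")
```

Here `fetch` would wrap the GET request from step ⑤ and return the decoded JSON body.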

zstd compression

We recommend zstd over gzip because it achieves similar compression ratios at much higher speed. tar.gz, tar.bz2, plain tar and zip are also supported.

Method	Create archive	Decompress locally
zstd (recommended)	`tar -I zstd -cf data.tar.zst ./data/`	`tar -I zstd -xf data.tar.zst`
gzip	`tar -czf data.tar.gz ./data/`	`tar -xzf data.tar.gz`
bzip2	`tar -cjf data.tar.bz2 ./data/`	`tar -xjf data.tar.bz2`

Install zstd: apt install zstd / brew install zstd
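If zstd cannot be installed, an accepted archive can also be produced from Python without external tools. The sketch below uses the standard `tarfile` module to write a tar.bz2; the function name is illustrative, and it is not taken from the project's own client.

```python
import tarfile
from pathlib import Path


def make_tar_bz2(source_dir: str, archive_path: str) -> str:
    """Pack source_dir into a bzip2-compressed tar archive and return its path."""
    source = Path(source_dir).resolve()
    if not source.is_dir():
        raise NotADirectoryError(source_dir)
    with tarfile.open(archive_path, "w:bz2") as archive:
        # arcname keeps only the directory name, not the full local path
        archive.add(source, arcname=source.name)
    return archive_path
```

Calling `make_tar_bz2("./data/", "data.tar.bz2")` corresponds to the bzip2 row of the table above.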

Endpoint reference

Obtain auth token

POST /api/v1/auth/token/ No authentication required

Field	Type	Description
`username` required	string	Your registered e-mail address
`password` required	string	Account password

Returns 200 { token, user_id, email }

Upload dataset

POST /api/v1/upload/ multipart/form-data

Field	Type	Description
`dataset_file` required	file	Archive file: `.tar`, `.tar.gz`, `.tar.bz2`, `.zip`, or any of those compressed with zstd (`.tar.zst`, `.tar.gz.zst`, etc.). The server decompresses zstd transparently — the inner archive is stored.
`dataset_type` optional	string	`"pdb"` (default) or `"other"` for non-PDB depositions

Returns 201:

{
  "token":         "<40-char upload token>",  // save this!
  "md5hash":       "d41d8cd98f00b204e9800998ecf8427e",
  "filename":      "dataset.tar",
  "size":          1234567,
  "pdbid":         "1ABC",    // null if not detected
  "is_pdb_exists": false
}

The upload token is required for the POST /submit/ call. If is_pdb_exists is true, this PDB ID already has a dataset in the repository.
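The md5hash field can be compared with a locally computed digest. The helper below is a minimal sketch with an illustrative name. Note one open point: this documentation does not say whether the hash covers the uploaded file or the inner archive stored after zstd decompression, so a mismatch on a .zst upload does not necessarily indicate a corrupted transfer.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The sample value shown in the response above, d41d8cd98f00b204e9800998ecf8427e, is the MD5 digest of empty input, so a real upload will return a different hash.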

Submit dataset

POST /api/v1/submit/ application/json

Field	Type	Description
`token` required	string	Upload token returned by `/upload/`
`pdb_code`	string	4-character PDB ID, e.g. `1ABC` (required unless `dataset_type=other`)
`dataset_type` optional	string	`"pdb"` (default) or `"other"`
`project_title`	string	Descriptive title (min. 10 chars). Required only when `pdb_code` is absent — optional when a PDB code is given (title is fetched from RCSB).
`project_authors`	string	Author list, e.g. `"Smith J., Doe A."` Required only when `pdb_code` is absent.
`synchrotron_id` optional	integer	Synchrotron ID from `GET /equipment/`
`beamline_id` optional	integer	Beamline ID (pair with `synchrotron_id`)
`equipment_other`	string	Free-text equipment description. When `pdb_code` is absent, either this field or the `synchrotron_id` + `beamline_id` pair is required.

Returns 200: { "status": "submitted", "token": "…" }
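The conditional requirements in the table can be checked on the client before calling the endpoint. The following sketch encodes one reading of those rules; the function name and the messages are illustrative, and the server may apply further checks that are not documented here.

```python
def check_submit_payload(payload: dict) -> list:
    """Return a list of problems found; an empty list means the payload looks complete."""
    problems = []
    if not payload.get("token"):
        problems.append("token is required")

    pdb_code = payload.get("pdb_code")
    if pdb_code:
        if len(pdb_code) != 4:
            problems.append("pdb_code must be 4 characters")
        return problems  # remaining metadata is optional when a PDB code is given

    if payload.get("dataset_type", "pdb") != "other":
        problems.append("pdb_code is required unless dataset_type is 'other'")
    if len(payload.get("project_title") or "") < 10:
        problems.append("project_title needs at least 10 characters")
    if not payload.get("project_authors"):
        problems.append("project_authors is required")
    has_ids = (payload.get("synchrotron_id") is not None
               and payload.get("beamline_id") is not None)
    if not (payload.get("equipment_other") or has_ids):
        problems.append("give equipment_other or synchrotron_id + beamline_id")
    return problems
```

A payload with only `token` and `pdb_code` passes, which matches the note that the remaining metadata is fetched from RCSB in that case.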

List submissions

GET /api/v1/submissions/

Returns an array of all submissions belonging to the authenticated user, ordered newest-first.

Returns 200 – array of submission objects (see status fields below).

Get / Delete a submission

GET /api/v1/submissions/<upload-token>/

Returns 200 – full submission object:

{
  "token":            "abc123…",
  "original_filename":"dataset.tar",
  "md5hash":          "d41d8…",
  "pdbid":            "1ABC",
  "dataset_type":     "pdb",
  "project_title":    "High-resolution structure…",
  "project_authors":  "Smith J., Doe A.",
  "synchrotron":      { "id": 25, "short": "APS" },
  "beamline":         { "id": 42, "short": "24-ID-C" },
  "equipment_other":  null,
  "status":           "validated",
  "status_details":   null,
  "inserted":         "2026-06-02T10:15:30Z",
  "updated":          "2026-06-02T10:18:45Z"
}
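The inserted and updated fields are ISO 8601 timestamps in UTC. As a small sketch, they can be parsed with the standard library to see how long a submission has been in the pipeline. The helper names are illustrative, and reading "updated" as the time of the last status change is an assumption.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2026-06-02T10:15:30Z'."""
    # fromisoformat() accepts a trailing 'Z' only from Python 3.11 on,
    # so it is rewritten as an explicit UTC offset first
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_between(submission: dict) -> float:
    """Seconds elapsed between the 'inserted' and 'updated' fields."""
    delta = parse_timestamp(submission["updated"]) - parse_timestamp(submission["inserted"])
    return delta.total_seconds()
```

For the sample object above this gives 195 seconds (10:15:30 to 10:18:45).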

DELETE /api/v1/submissions/<upload-token>/

Deletes a submission. Deletion is allowed only while the submission is in a deletable status (e.g. uploaded, validated, error_no_data).

Returns 204 No Content on success.

List equipment

GET /api/v1/equipment/

Returns the list of synchrotrons and their beamlines. Use id values with the synchrotron_id and beamline_id parameters of POST /submit/.

[
  {
    "id": 25, "short": "APS", "fullname": "Advanced Photon Source",
    "url": "http://www.aps.anl.gov",
    "beamlines": [
      { "id": 42, "short": "24-ID-C", "fullname": null, "url": null },
      …
    ]
  }, …
]
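Since POST /submit/ expects numeric IDs, a client typically has to translate short names into IDs first. A minimal lookup over the decoded response might look like this; the function name is illustrative, and exact matching on the "short" fields is an assumption.

```python
def find_equipment_ids(equipment: list, synchrotron: str, beamline: str) -> tuple:
    """Return (synchrotron_id, beamline_id) for the given short names."""
    for facility in equipment:
        if facility["short"] != synchrotron:
            continue
        for line in facility.get("beamlines", []):
            if line["short"] == beamline:
                return facility["id"], line["id"]
    raise LookupError(f"no beamline {beamline!r} at synchrotron {synchrotron!r}")
```

With the sample response above, `find_equipment_ids(equipment, "APS", "24-ID-C")` returns `(25, 42)`.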

Submission statuses

Status	Meaning	Next action
`uploaded`	File received, waiting for metadata confirmation	Call `POST /submit/`
`submitted`	Metadata confirmed, queued for validation	Wait
`pending`	Validation in progress	Wait
`validated`	Archive structure OK, metadata harvesting in progress	Wait
`published`	Data publicly accessible	Done ✓
`error_no_data`	No diffraction images found in the archive	Delete and re-upload a corrected archive
`inconsistent_datasets`	Multiple non-matching datasets detected	Contact support or resubmit
`header_not_readable`	Image headers could not be parsed	Check image format; contact support
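In a script, the table above can be carried as a simple lookup so that each status is turned into a hint for the user. This is only a transcription of the table into Python; the names and the fallback text for unlisted statuses are assumptions.

```python
# Status -> next action, transcribed from the status table.
NEXT_ACTION = {
    "uploaded": "Call POST /submit/",
    "submitted": "Wait",
    "pending": "Wait",
    "validated": "Wait",
    "published": "Done",
    "error_no_data": "Delete and re-upload a corrected archive",
    "inconsistent_datasets": "Contact support or resubmit",
    "header_not_readable": "Check image format; contact support",
}


def next_action(status: str) -> str:
    """Return the suggested next action, or a fallback for statuses not in the table."""
    return NEXT_ACTION.get(status, "Unknown status")


def is_waiting(status: str) -> bool:
    """True while the pipeline is still working and no user action is needed."""
    return NEXT_ACTION.get(status) == "Wait"
```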

Python client

The script api_upload_client.py, available at https://github.com/prubach/proteindiffraction_api, wraps the full workflow into a single command.

Installation

pip install requests zstandard

zstandard is optional. If it is not available, tar.bz2 is used for compression instead of zstd.

Submit a dataset from a local directory

python api_upload_client.py \
    --server https://proteindiffraction.org/ap \
    --email you@lab.edu --password secret \
    --path ./raw_data/ \
    --pdb 1ABC \
    --title "High-resolution structure of protein X" \
    --authors "Smith J., Doe A." \
    --synchrotron-id 25 --beamline-id 42 \
    submit

"--title", "--authors" and "--synchrotron-id" are optional if the PDB code is given; in that case the values are fetched from RCSB.

Other commands

# List available synchrotrons + beamlines
python api_upload_client.py --server … --email … --password … equipment

# List your submissions
python api_upload_client.py --server … --email … --password … list

# Check / poll a specific submission
python api_upload_client.py --token abc123… --poll \
    --server … --email … --password … status

# Delete a submission (only possible while its status is deletable)
python api_upload_client.py --token abc123… \
    --server … --email … --password … delete

Using from Python code

from api_upload_client import ProteinDiffractionClient, compress_directory_zstd

client = ProteinDiffractionClient("https://proteindiffraction.org/ap")
client.login("you@lab.edu", "secret")

# compress and upload
archive = compress_directory_zstd("./raw_data")
upload  = client.upload(archive)

# submit with metadata
client.submit(
    token          = upload["token"],
    pdb_code       = "1ABC",
    project_title  = "High-resolution structure of protein X",
    project_authors= "Smith J., Doe A.",
    equipment_other= "APS 24-ID-C beamline",
)

# wait for processing
final = client.poll_until_done(upload["token"])
print(final["status"])

Error responses

All errors return JSON with an "error" field:

{ "error": "Description of what went wrong." }

Status	Meaning
400	Bad request – missing/invalid field or wrong file format
401	Authentication credentials missing or invalid
403	Forbidden – submission belongs to another user
404	Submission token not found
409	Conflict – duplicate file hash, or submission already processed
500	Internal server error – contact support
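Because every error carries the same JSON shape, client code can funnel all failures through one helper. The sketch below is one way to do that; `ApiError` and `raise_for_error` are hypothetical names, and the fallback for non-JSON bodies is an assumption for cases such as a proxy error page.

```python
import json


class ApiError(Exception):
    """Raised for any 4xx/5xx response; name and shape are illustrative."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def raise_for_error(status: int, body: str) -> None:
    """Raise ApiError for 4xx/5xx responses, using the JSON "error" field if present."""
    if status < 400:
        return
    try:
        message = json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        # body was not JSON, or not an object with an "error" key
        message = body.strip() or "no error description"
    raise ApiError(status, message)
```

Callers can then branch on `ApiError.status`, for example treating 409 on upload as "this file is already in the repository" rather than as a hard failure.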