UAV Forensic Toolkit

Desktop forensic toolkit for recovering, decoding, validating, and visualizing UAV telemetry logs with chain-of-custody logging, SHA-256 hashing, PDF reports, and machine-readable JSON/CSV artifacts.

1) Installation

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 -m uav_forensic_toolkit
Forensic principle: this toolkit is designed to operate on forensic copies. If your lab process requires strict write-blocking, ensure the evidence is protected at the OS/hardware level before using any software tool.

2) High-level workflow

  1. Recover: copy telemetry files from the SD card (or its forensic copy) into a case directory; compute SHA-256.
  2. Decode: convert DJI logs into a unified CSV schema.
  3. Analyze: compute physics/timestamp/precision checks + ML anomaly scoring; generate reports.
  4. Visualize: create an integrity-aware flight map and charts.
How to run: launch the GUI, initialize a case directory, then run each tab in order (Recover → Decode → Analyze → Visualize).
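The recovery step hashes every copied file with SHA-256. A minimal sketch of streamed hashing (the helper name `sha256_file` is illustrative, not the toolkit's actual function):

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large logs never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Streaming in fixed-size chunks keeps memory flat even for multi-gigabyte forensic images.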

2.1) Complete System Flow

┌─────────────────────────────────────────────────────────────────┐
│                    UAV FORENSIC TOOLKIT FLOW                     │
└─────────────────────────────────────────────────────────────────┘

1. INITIALIZATION
   └─> User enters Operator ID + Case Directory
       └─> Click "Initialize Case" button
           └─> Creates: case_metadata.json, chain_of_custody.jsonl
           └─> Code: uav_forensic_toolkit/gui/main_window.py:86-105

2. RECOVER TAB
   └─> User selects Evidence Source Directory
       └─> (Optional) Enable deep recovery + select forensic image
       └─> Click "Run Recovery" button
           └─> Function: recover_logs_from_directory()
           └─> Code: uav_forensic_toolkit/core/recover.py
           └─> Outputs:
               • RecoveredLogs/ (copied files)
               • CarvedLogs/ (if deep recovery enabled)
               • metadata.json, metadata.csv
               • hash_verification.csv
               • recovery_report.pdf

3. DECODE TAB
   └─> User selects Input Log File (.txt/.dat/.csv)
       └─> (Optional) Provide dji-log Path
       └─> Click "Decode to Unified CSV" button
           └─> Function: cli_decode subprocess
           └─> Code: uav_forensic_toolkit/cli_decode.py
           └─> Outputs:
               • Decoded/decoded_flightlog.csv (unified schema)
               • Decoded/decoder_metadata.json
               • Decoded/decoder_report.pdf

4. ANALYZE TAB
   └─> User selects Decoded CSV
       └─> (Optional) Provide Labels CSV + Label column name
       └─> Click "Run Tampering Analysis" button
           └─> Function: analyze_tampering()
           └─> Code: uav_forensic_toolkit/core/analyze.py:44-330
           └─> Process:
               a) Load points from CSV (lines 62-94)
               b) Physics checks (lines 98-167):
                  • Speed limit: 20.0 m/s
                  • Acceleration limit: 5.0 m/s²
                  • Time gap threshold: 5.0 seconds
               c) GPS precision check (lines 169-179)
               d) ML anomaly detection (lines 181-236):
                  • IsolationForest model
                  • Features: dt_s, distance_m, speed_mps, accel_mps2,
                    alt_rate_mps, delta_lat, delta_lon, bearing_deg,
                    bearing_change_deg
                   • Threshold: 70th percentile (minimum 0.12)
               e) Evaluation metrics (if labels provided) (lines 243-282)
           └─> Outputs:
               • Analysis/anomaly_scores.csv
               • Analysis/tampering_report.json
               • Analysis/tampering_report.pdf

5. VISUALIZE TAB
   └─> User selects Decoded CSV + Tampering Report JSON
       └─> Click "Generate Flight Visualization" button
           └─> Function: build_visualization_artifacts()
           └─> Code: uav_forensic_toolkit/core/visualize.py:72-234
           └─> Process:
               a) Load points and tampering report (lines 91-96)
               b) Build integrity segments (lines 99-118):
                  • Green = authenticated
                  • Red = tampered
                  • Orange = missing data
               c) Generate KML (lines 125-126)
               d) Generate interactive map with folium (lines 128-146)
               e) Generate charts with matplotlib (lines 169-191):
                  • altitude_chart.png
                  • speed_chart.png
           └─> Outputs:
               • Visualization/flight_map.html (interactive map)
               • Visualization/flight_map.kml
               • Visualization/altitude_chart.png
               • Visualization/speed_chart.png
               • Visualization/flight_report.pdf
               • Visualization/flight_summary.json
    

2.2) Every Button Explained

Top Panel Buttons

  • Browse (Case Directory)
    Opens file dialog to select case directory
    Code: main_window.py:81-84
  • Initialize Case
    Creates case folder structure, writes metadata and custody log
    Code: main_window.py:86-105
    Creates: case_metadata.json, chain_of_custody.jsonl

Recover Tab Buttons

  • Browse (Evidence Source)
    Selects SD card root or Flight Records folder
    Code: recover_tab.py:78-81
  • Browse (Forensic Image)
    Selects forensic image file (.img/.dd/.raw/.dmg)
    Code: recover_tab.py:83-86
  • Run Recovery
    Calls recover_logs_from_directory()
    Code: recover_tab.py:88-131
    Copies files, computes SHA-256, generates reports

Decode Tab Buttons

  • Browse (Input Log File)
    Selects log file (.txt/.dat/.csv/.log/.bin)
    Code: decode_tab.py:72-75
  • Browse (dji-log Path)
    Selects dji-log executable (optional)
    Code: decode_tab.py:77-80
  • Decode to Unified CSV
    Runs CLI decode subprocess
    Code: decode_tab.py:85-128
    Converts DJI logs to unified CSV format

Analyze Tab Buttons

  • Browse (Decoded CSV)
    Selects decoded_flightlog.csv
    Code: analyze_tab.py:63-66
  • Browse (Labels CSV)
    Selects optional labels CSV file
    Code: analyze_tab.py:68-71
  • Run Tampering Analysis
    Calls analyze_tampering()
    Code: analyze_tab.py:73-111
    Runs physics checks + ML anomaly detection

Visualize Tab Buttons

  • Browse (Decoded CSV)
    Selects decoded_flightlog.csv
    Code: visualize_tab.py:89-92
  • Browse (Tampering Report JSON)
    Selects tampering_report.json
    Code: visualize_tab.py:94-97
  • Generate Flight Visualization
    Calls build_visualization_artifacts()
    Code: visualize_tab.py:99-132
    Generates maps, charts, and reports
  • Open Map in Browser
    Opens flight_map.html in default browser
    Code: visualize_tab.py:142-145

2.3) Code Locations for Study

Core Analysis Engine

  • Main Analysis Function:
    uav_forensic_toolkit/core/analyze.py
    • Function: analyze_tampering() (line 44)
    • Physics checks: lines 98-167
    • ML model: lines 181-236
    • Threshold calculation: lines 219-226
  • Visualization Engine:
    uav_forensic_toolkit/core/visualize.py
    • Function: build_visualization_artifacts() (line 72)
    • Map generation: lines 128-146
    • Chart generation: lines 169-191

GUI Components

  • Main Window:
    uav_forensic_toolkit/gui/main_window.py
    • Window setup: lines 27-75
    • Button handlers: lines 81-164
  • Tab Components:
    • Recover: gui/tabs/recover_tab.py
    • Decode: gui/tabs/decode_tab.py
    • Analyze: gui/tabs/analyze_tab.py
    • Visualize: gui/tabs/visualize_tab.py

Model Training

  • Isolation Forest:
    uav_forensic_toolkit/core/train_if.py
    • Function: train_isolation_forest_on_dataset() (line 117)
    • Graph generation: lines 228-354
  • Supervised Models:
    uav_forensic_toolkit/core/train_supervised.py
    • Function: train_supervised_on_dataset() (line 117)
    • Supports: RandomForest, HistGradientBoosting, ExtraTrees
    • Graph generation: lines 280-349

Utility Functions

  • Geographic Calculations:
    uav_forensic_toolkit/core/geo.py
    • Haversine distance, bearing calculations
  • Chain of Custody:
    uav_forensic_toolkit/core/custody.py
    • OperatorContext, append_custody_event()
  • PDF Generation:
    uav_forensic_toolkit/core/pdf.py
    • write_simple_pdf() function
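The Haversine distance and bearing used throughout the feature extractor can be sketched as follows (the Earth-radius constant and function names are assumptions; see core/geo.py for the toolkit's own implementation):

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; assumption, geo.py may use a different constant

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS-84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial bearing from point 1 to point 2, 0-360 degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
```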

2.4) All Threshold Values

Physics-Based Thresholds

  • Speed Limit: 20.0 m/s (72 km/h)
    Location: analyze.py:99
    Flag: "impossible_speed"
    Meaning: speeds above this limit are flagged as physically implausible for the UAV
  • Acceleration Limit: 5.0 m/s²
    Location: analyze.py:98
    Flag: "excessive_acceleration"
    Meaning: accelerations above this limit are flagged as physically implausible
  • Time Gap Threshold: 5.0 seconds
    Location: analyze.py:100
    Flag: "suspicious_gap"
    Meaning: Gaps larger than this indicate missing data

ML Model Thresholds

  • IsolationForest Threshold:
    Default: 0.8 (line 182)
    Auto-calculated: 70th percentile (line 224)
    Minimum: 0.12 (line 224)
    Location: analyze.py:181-226
    Meaning: Anomaly scores ≥ threshold = tampered
  • Contamination Rate:
    Range: 0.08 to 0.18
    Formula: min(0.18, max(0.08, len(X) / 100000.0))
    Location: analyze.py:205
    Meaning: Expected proportion of anomalies
  • Threshold Percentile Target: 0.70 (70th percentile)
    Location: analyze.py:183
    Meaning: Uses 70th percentile of scores as threshold
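The contamination and threshold rules above can be sketched in a few lines (the percentile indexing convention here is an assumption; analyze.py may use numpy's interpolation instead):

```python
def contamination_rate(n_segments: int) -> float:
    # Clamp the expected anomaly fraction between 8% and 18% (cf. analyze.py:205).
    return min(0.18, max(0.08, n_segments / 100000.0))

def anomaly_threshold(scores: list, percentile: float = 0.70, floor: float = 0.12) -> float:
    # 70th percentile of the normalized scores, never below the 0.12 floor.
    ordered = sorted(scores)
    idx = min(len(ordered) - 1, int(percentile * len(ordered)))
    return max(ordered[idx], floor)
```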

GPS Precision Thresholds

  • Decimal Place Deviation: ≥ 2 places
    Location: analyze.py:175
    Flag: "gps_precision_inconsistency"
    Meaning: GPS coordinates with precision deviating by 2+ decimal places from mode
  • Mode Calculation:
    Location: analyze.py:171-172
    Meaning: Most common decimal precision for lat/lon

Supervised Model Thresholds

  • Default Probability Threshold: 0.5
    Location: train_supervised.py:123
    Meaning: Probability ≥ 0.5 = tampered (binary classification)
  • Test Size: 0.2 (20%)
    Location: train_supervised.py:121
    Meaning: 20% of data used for testing
  • Max Segments: 800,000
    Location: train_supervised.py:120
    Meaning: Maximum segments to use for training

2.5) Machine Learning Models Explained

IsolationForest (Used in Analysis)

  • Location: analyze.py:184-211
  • Purpose: Unsupervised anomaly detection
  • Training: Trained on current decoded CSV
  • Features (9 total):
    1. dt_s - Time delta between points
    2. distance_m - Haversine distance
    3. speed_mps - Calculated speed
    4. accel_mps2 - Acceleration
    5. alt_rate_mps - Altitude change rate
    6. delta_lat - Latitude change
    7. delta_lon - Longitude change
    8. bearing_deg - Direction of travel
    9. bearing_change_deg - Change in direction
  • Output: Anomaly score (0.0 to 1.0)
  • Normalization: Min-max normalization (line 209)
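A minimal sketch of the scoring step, assuming scikit-learn's IsolationForest and the min-max normalization described above (all parameters other than contamination are illustrative, not the toolkit's actual choices):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def score_segments(X: np.ndarray, contamination: float = 0.1, seed: int = 42) -> np.ndarray:
    """Fit an IsolationForest on the segment feature matrix and return
    min-max normalized anomaly scores in [0, 1] (higher = more anomalous)."""
    model = IsolationForest(contamination=contamination, random_state=seed)
    model.fit(X)
    # score_samples returns higher values for inliers, so negate before normalizing.
    raw = -model.score_samples(X)
    lo, hi = raw.min(), raw.max()
    return (raw - lo) / (hi - lo) if hi > lo else np.zeros_like(raw)
```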

Supervised Models (Training Only)

  • Location: train_supervised.py:117-398
  • Supported Models:
    • RandomForest: 500 trees, balanced_subsample
    • HistGradientBoosting: learning_rate=0.08, max_depth=6
    • ExtraTrees: 500 trees, balanced_subsample
  • Features: Same 9 features as IsolationForest
  • Training Data: Requires labeled dataset CSV
  • Output: Probability score (0.0 to 1.0)
  • Evaluation: Accuracy, Precision, Recall, F1

Model Training (IsolationForest)

  • Location: train_if.py:117-411
  • Purpose: Train IsolationForest on labeled dataset
  • Contamination: 0.02 (2% expected anomalies)
  • Threshold Percentile: 0.98 (98th percentile)
  • Training: Uses only normal (label=0) segments
  • Output: Trained model saved as .joblib file

2.6) Labels Format and Meaning

Labels CSV Format

row_idx,label
12,1
13,1
14,0
15,0
  • row_idx: Row index from decoded CSV (0-based)
  • label: Ground truth label
    • 1 or "1" or "true" or "tampered" or "anomaly" or "yes" = Tampered
    • 0 or "0" or "false" or "normal" or "ok" or "no" = Normal

Location: analyze.py:244-259

Label Column Name

  • Default: "label"
  • Customizable: User can specify different column name
  • Location: analyze_tab.py:49-51

Dataset Format (for Training)

case_id,row_idx,label,timestamp,latitude,longitude,altitude
1,0,0,2024-01-01T10:00:00+00:00,37.7749,-122.4194,100.0
1,1,1,2024-01-01T10:00:01+00:00,37.7750,-122.4195,100.0
  • case_id: Unique case identifier
  • row_idx: Row index within case
  • label: 0 = normal, 1 = tampered
  • timestamp, latitude, longitude, altitude: Flight data

Location: train_supervised.py:52-66

Evaluation Metrics (if labels provided)

  • True Positive (TP): Correctly identified tampered rows
  • True Negative (TN): Correctly identified normal rows
  • False Positive (FP): Normal rows flagged as tampered
  • False Negative (FN): Tampered rows missed
  • Accuracy: (TP + TN) / (TP + TN + FP + FN)
  • Precision: TP / (TP + FP)
  • Recall: TP / (TP + FN)
  • F1 Score: 2 × (Precision × Recall) / (Precision + Recall)

Location: analyze.py:266-274
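The metric definitions above translate directly into code; this sketch is illustrative and independent of the toolkit's own implementation:

```python
def confusion_metrics(y_true: list, y_pred: list) -> dict:
    """Compute the confusion matrix and derived metrics for binary labels (1 = tampered)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```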

2.7) Graph Generation Locations

Visualization Charts

  • Altitude Chart:
    Location: visualize.py:166-183
    Library: matplotlib
    Output: altitude_chart.png
    Data: Timestamp vs Altitude
  • Speed Chart:
    Location: visualize.py:185-190
    Library: matplotlib
    Output: speed_chart.png
    Data: Timestamp vs Speed (m/s)

Interactive Map

  • Flight Map (HTML):
    Location: visualize.py:128-146
    Library: folium (Leaflet.js)
    Output: flight_map.html
    Features:
    • Green lines = authenticated segments
    • Red dashed lines = tampered segments
    • Orange dashed lines = missing data
    • Start/End markers

Training Graphs (IsolationForest)

  • Metrics Curve: train_if.py:250-264
    Shows: Accuracy, Precision, Recall, F1 vs threshold
  • Accuracy Curve: train_if.py:266-277
  • Precision Curve: train_if.py:279-290
  • Recall Curve: train_if.py:292-303
  • F1 Curve: train_if.py:305-316
  • Confusion Matrix: train_if.py:318-328
  • Precision-Recall Curve: train_if.py:330-354

Training Graphs (Supervised)

  • Metrics Curve: train_supervised.py:292-306
    Shows: Accuracy, Precision, Recall, F1 vs probability threshold
  • Accuracy Curve: train_supervised.py:322
  • Precision Curve: train_supervised.py:323
  • Recall Curve: train_supervised.py:324
  • F1 Curve: train_supervised.py:325
  • Precision-Recall Curve: train_supervised.py:327-337
  • Confusion Matrix: train_supervised.py:339-349
Graph Libraries Used:
  • matplotlib: For static charts (altitude, speed, training curves)
  • folium: For interactive maps (uses Leaflet.js under the hood)
  • Backend: matplotlib uses "Agg" backend (non-interactive) for server-side rendering

3) Case setup (top panel)

Operator ID (textbox)

Who is operating the tool. This value is recorded for chain-of-custody traceability.

Case Directory (textbox + Browse)

Where outputs are written. The app creates subfolders for each component.

Initialize Case (button)

Creates the case folder structure and writes:

  • case_metadata.json
  • chain_of_custody.jsonl (one JSON event per line)

Each custody event includes timestamp_utc, operator_id, tool_version, action, and a details object.
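A minimal sketch of appending such an event; the field names match the list above, but the exact signature of the toolkit's append_custody_event() in core/custody.py may differ:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def append_custody_event(log_path: Path, operator_id: str, tool_version: str,
                         action: str, details: dict) -> None:
    """Append one JSON event per line to the chain-of-custody log."""
    event = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "operator_id": operator_id,
        "tool_version": tool_version,
        "action": action,
        "details": details,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
```

Append-only JSONL keeps each event independently parseable, so a truncated last line never corrupts earlier history.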

Activity Log (panel)

Shows progress updates from each operation.

The main window opens at 1280×720 by default for easier review.

4) Recover tab (Component 1)

Goal: copy telemetry-related files into the case without modifying originals.

Evidence Source Directory (Browse)

Select the SD root folder (or a forensic copy). The recovery code automatically searches for FlightRecord / Flight Records directories and prefers them if found.

Run Recovery (button)

Copies supported extensions into RecoveredLogs/ and computes SHA-256 hashes.

Deep recovery (carving) (optional)

If the SD card contains deleted or corrupted logs that are not visible as normal files, you can enable deep recovery.

Recovery outputs

  • RecoveredLogs/ (copied files) and CarvedLogs/ (if deep recovery was enabled)
  • metadata.json, metadata.csv
  • hash_verification.csv
  • recovery_report.pdf

What “hash_verification.csv” means: it lists each recovered artifact and its SHA-256 hash so you can later verify that files did not change after recovery.
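A hedged sketch of re-verifying those hashes later (the column names "path" and "sha256" are assumptions; check the actual header of your hash_verification.csv):

```python
import csv
import hashlib
from pathlib import Path

def verify_hashes(csv_path: Path, path_col: str = "path", hash_col: str = "sha256") -> list:
    """Re-hash each artifact listed in the CSV and return the paths whose
    digests no longer match. Column names are assumptions, not the toolkit's schema."""
    mismatches = []
    base = csv_path.parent
    with csv_path.open(newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            artifact = base / row[path_col]
            digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
            if digest != row[hash_col].strip().lower():
                mismatches.append(row[path_col])
    return mismatches
```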

5) Decode tab (Component 2)

Goal: convert DJI logs into a unified CSV schema used by the rest of the pipeline.

Input Log File (Browse)

Select a recovered log file (for DJI, typically DJIFlightRecord_YYYY-MM-DD_[HH-MM-SS].txt).

dji-log Path (optional)

If binary DJI FlightRecord decoding is needed, the toolkit calls the external dji-log tool (from dji-log-parser). Provide the executable path here if it is not in your PATH.

Decode to Unified CSV (button)

Runs decoding in a separate subprocess and writes:

  • Decoded/decoded_flightlog.csv (unified schema)
  • Decoded/decoder_metadata.json
  • Decoded/decoder_report.pdf

Unified CSV schema

timestamp,latitude,longitude,altitude,speed,heading,source

source records the origin/decoder path (example: DJI_FLIGHTRECORD_BIN_DJI_LOG).

Note: .DAT decoding is not implemented in this build.

6) Analyze tab (Component 3)

Goal: detect suspicious segments using physics checks + timestamp checks + GPS precision + ML anomaly scoring.

Complete Analysis Process (Step-by-Step)

  1. Load Data (lines 62-94)
    • Reads decoded CSV row by row
    • Parses timestamp, latitude, longitude, altitude, speed, heading
    • Stores raw GPS strings for precision analysis
    • Creates Point objects with index, timestamp, coordinates
  2. Feature Extraction (lines 106-132)
    • For each consecutive point pair (i-1, i):
      • dt_s = time difference in seconds
      • distance_m = Haversine distance
      • speed_mps = distance / dt (m/s)
      • accel_mps2 = (speed - prev_speed) / dt
      • bearing_deg = direction of travel (0-360°)
      • bearing_change_deg = change in direction
      • alt_rate_mps = altitude change rate
      • delta_lat = latitude change
      • delta_lon = longitude change
  3. Physics-Based Checks (lines 134-167)
    • Checks if dt_s ≤ 0 → "non_monotonic_timestamp"
    • Checks if dt_s ≥ 5.0 → "suspicious_gap"
    • Checks if speed_mps > 20.0 → "impossible_speed"
    • Checks if |accel_mps2| > 5.0 → "excessive_acceleration"
    • Marks both rows as "tampered" if any flag is set
  4. GPS Precision Check (lines 169-179)
    • Counts decimal places in latitude/longitude strings
    • Finds mode (most common) precision
    • Flags rows with precision deviating by ≥2 decimal places
    • Adds "gps_precision_inconsistency" flag
  5. ML Anomaly Detection (lines 181-236)
    • Builds feature matrix X from all feature_rows (9 features)
    • Trains IsolationForest on current data
    • Calculates anomaly scores for each segment
    • Normalizes scores to 0.0-1.0 range
    • Calculates threshold as 70th percentile (min 0.12)
    • Flags rows with score ≥ threshold as "ml_anomaly"
  6. Evaluation Metrics (lines 243-282, if labels provided)
    • Loads ground truth labels from CSV
    • Compares predictions vs labels
    • Calculates TP, TN, FP, FN
    • Computes Accuracy, Precision, Recall, F1
  7. Report Generation (lines 284-330)
    • Creates JSON report with all findings
    • Writes anomaly_scores.csv
    • Generates PDF report
    • Logs to chain of custody
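The physics checks in step 3 can be sketched as a pure function over one segment (threshold constants copied from the values above; the function itself is illustrative):

```python
SPEED_LIMIT_MPS = 20.0    # "impossible_speed"
ACCEL_LIMIT_MPS2 = 5.0    # "excessive_acceleration"
GAP_THRESHOLD_S = 5.0     # "suspicious_gap"

def physics_flags(dt_s: float, speed_mps: float, accel_mps2: float) -> list:
    """Return the rule-based flags raised by one consecutive-point segment."""
    flags = []
    if dt_s <= 0:
        flags.append("non_monotonic_timestamp")
    if dt_s >= GAP_THRESHOLD_S:
        flags.append("suspicious_gap")
    if speed_mps > SPEED_LIMIT_MPS:
        flags.append("impossible_speed")
    if abs(accel_mps2) > ACCEL_LIMIT_MPS2:
        flags.append("excessive_acceleration")
    return flags
```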

Decoded CSV (Browse)

Select Decoded/decoded_flightlog.csv.

Labels CSV (optional) + Label column

If you have ground truth labels, provide them to compute metrics (accuracy/precision/recall/F1).

row_idx,label
12,1
13,1
14,0

Run Tampering Analysis (button)

Writes:

  • Analysis/anomaly_scores.csv
  • Analysis/tampering_report.json
  • Analysis/tampering_report.pdf

How the ML works

An IsolationForest is trained on the nine motion features derived from the current decoded CSV; scores are min-max normalized, and rows scoring at or above the 70th-percentile threshold (minimum 0.12) are flagged as "ml_anomaly".

How to interpret F1/accuracy: these metrics only appear if you provide labels. Without labels, the tool reports anomaly scores and rule-based flags, but supervised-style accuracy metrics are not meaningful.

7) Visualize tab (Component 4)

Goal: build an integrity-aware flight path and outputs.

Decoded CSV + Tampering Report JSON (Browse)

Select Decoded/decoded_flightlog.csv and Analysis/tampering_report.json.

Generate Flight Visualization (button)

Writes:

  • Visualization/flight_map.html (interactive map)
  • Visualization/flight_map.kml
  • Visualization/altitude_chart.png
  • Visualization/speed_chart.png
  • Visualization/flight_report.pdf
  • Visualization/flight_summary.json

Visualization Process (Step-by-Step)

  1. Load Data (lines 91-96)
    • Loads points from decoded CSV
    • Loads tampering report JSON
    • Extracts row_status dictionary
  2. Build Integrity Segments (lines 99-118)
    • Groups consecutive points by status (authenticated/tampered/missing)
    • Detects gaps ≥ 5 seconds → "missing" status
    • Creates segments with color coding:
      • Green: authenticated (solid line)
      • Red: tampered (dashed line, pattern "10, 10")
      • Orange: missing data (dashed line, pattern "2, 8")
  3. Generate KML (lines 125-126)
    • Creates KML file for Google Earth
    • Format: longitude,latitude,altitude
  4. Generate Interactive Map (lines 128-146)
    • Uses folium library (Python wrapper for Leaflet.js)
    • Creates map centered on first point
    • Adds colored polylines for each segment
    • Adds start/end markers
    • Saves as HTML file
  5. Generate Charts (lines 169-191)
    • Altitude Chart:
      • X-axis: Timestamp (Unix epoch)
      • Y-axis: Altitude (meters)
      • Library: matplotlib
      • Output: altitude_chart.png (10×3 inches)
    • Speed Chart:
      • X-axis: Timestamp
      • Y-axis: Speed (m/s)
      • Library: matplotlib
      • Output: speed_chart.png (10×3 inches)
  6. Calculate Statistics (lines 153-164)
    • Total distance: Sum of all Haversine distances
    • Speed series: Calculated for each segment
  7. Generate Reports (lines 195-225)
    • Creates flight_summary.json with metadata
    • Generates flight_report.pdf
    • Logs to chain of custody
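Step 2's grouping logic can be sketched in pure Python (the point and status representations here are assumptions; visualize.py uses its own data structures):

```python
def build_segments(points: list, row_status: dict, gap_threshold_s: float = 5.0) -> list:
    """Group consecutive points into colored integrity segments.
    points: list of (timestamp_s, lat, lon); row_status: row index -> status string.
    A time gap >= gap_threshold_s forces a "missing" segment."""
    segments = []
    for i in range(1, len(points)):
        ts0, *p0 = points[i - 1]
        ts1, *p1 = points[i]
        if ts1 - ts0 >= gap_threshold_s:
            status = "missing"
        elif "tampered" in (row_status.get(i), row_status.get(i - 1)):
            status = "tampered"
        else:
            status = "authenticated"
        if segments and segments[-1]["status"] == status:
            segments[-1]["coords"].append(tuple(p1))  # extend the current run
        else:
            segments.append({"status": status, "coords": [tuple(p0), tuple(p1)]})
    return segments
```

Each resulting segment maps straight onto one folium polyline: green solid for "authenticated", red dashed for "tampered", orange dashed for "missing".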

Map Color Coding Explained

Green Lines (Authenticated)

  • Color: Green
  • Style: Solid line (no dashes)
  • Weight: 4 pixels
  • Meaning: Trusted flight path segments
  • Code: visualize.py:134

Red Lines (Tampered)

  • Color: Red
  • Style: Dashed line (pattern "10, 10")
  • Weight: 4 pixels
  • Meaning: Suspicious/tampered segments
  • Code: visualize.py:135

Orange Lines (Missing)

  • Color: Orange
  • Style: Dashed line (pattern "2, 8")
  • Weight: 4 pixels
  • Meaning: Missing data gaps (≥5 seconds)
  • Code: visualize.py:135

Embedded map view

The app embeds flight_map.html if Qt WebEngine is available; otherwise use the "Open Map in Browser" button.

Code: visualize_tab.py:73-83 - Uses QWebEngineView if available

8) Testing the analysis (CLI)

You can run analysis without the GUI:

source .venv/bin/activate
python3 -c "from pathlib import Path; from uav_forensic_toolkit.core.analyze import analyze_tampering; \
from uav_forensic_toolkit.core.custody import OperatorContext; \
print(analyze_tampering(Path('output/Decoded/decoded_flightlog.csv'), Path('output/Analysis'), OperatorContext('op','0.1.0'), Path('output/chain_of_custody.jsonl'), print))"

9) Testing using curl (local HTTP API)

This project includes an optional local HTTP API to run analysis and return JSON. It is intended for local testing only.

source .venv/bin/activate
python3 -m uav_forensic_toolkit.api_server

Then in another terminal:

curl -s http://127.0.0.1:8000/health
curl -s -X POST http://127.0.0.1:8000/analyze \
  -F "decoded_csv=@output/Decoded/decoded_flightlog.csv" \
  -F "operator_id=test_op" \
  -o analysis_result.json

The response is a JSON object containing paths to generated artifacts and key summary fields.

Note: If you run the API, outputs are still written to a local directory; it is not intended for network exposure.

10) Model training (Random Forest)

The toolkit can train a supervised Random Forest model on a labeled CSV dataset and generate training reports and graphs.

Expected dataset format

The training script expects a CSV with at least these columns:

case_id,row_idx,label,timestamp,latitude,longitude,altitude

Extra columns are allowed and will be ignored by the trainer.

Train Random Forest (CLI)

source .venv/bin/activate
python3 -m uav_forensic_toolkit.cli_train_supervised \
  --model random_forest \
  --dataset-csv "path/to/labeled_dataset.csv" \
  --out-dir "output/ModelTraining_supervised_rf" \
  --max-segments 800000 \
  --test-size 0.2 \
  --seed 42 \
  --threshold 0.5

Training outputs

  • Evaluation graphs (metrics/accuracy/precision/recall/F1 curves, precision-recall curve, confusion matrix)
  • A training report summarizing the chosen model, threshold, and metrics

11) Reports and artifacts (PDF/JSON/CSV)

This toolkit produces both human-readable PDFs and machine-readable JSON/CSV artifacts.

Recover

  • metadata.json, metadata.csv
  • hash_verification.csv
  • recovery_report.pdf

Decode

  • Decoded/decoded_flightlog.csv
  • Decoded/decoder_metadata.json
  • Decoded/decoder_report.pdf

Analyze

  • Analysis/anomaly_scores.csv
  • Analysis/tampering_report.json
  • Analysis/tampering_report.pdf

Visualize

  • Visualization/flight_summary.json, flight_map.html, flight_map.kml
  • Visualization/altitude_chart.png, speed_chart.png
  • Visualization/flight_report.pdf

Chain of custody

  • chain_of_custody.jsonl (one JSON event per line)

12) Where to find outputs

Everything from the GUI is stored under your Case Directory (example: output/).

13) Troubleshooting