All notable changes to the Ona Platform will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Added
- Controller/Renderer Architecture Migration (GitHub Issue #180):
- Complete separation of concerns between business logic (Controllers) and DOM rendering (Renderers)
- New Controller modules for all sections:
- sections/IssuesController.js - Manages subscriptions, timers, and business logic for Issues section
- sections/DashboardController.js - Manages subscriptions, timers, WebSocket simulation, and maintenance plan approvals
- sections/PerformanceController.js - Manages subscriptions, timers, and performance data updates
- sections/MaintenanceController.js - Manages subscriptions, BOM operations, and maintenance plan approvals
- New Renderer modules for all sections:
- sections/IssuesRenderer.js - Handles all DOM mutations and formatting for Issues section
- sections/DashboardRenderer.js - Handles all DOM mutations and formatting for Dashboard section
- sections/PerformanceRenderer.js - Handles all DOM mutations and formatting for Performance section
- sections/MaintenanceRenderer.js - Handles all DOM mutations and formatting for Maintenance section
- UIRenderQueue integration: All sections now use centralized render queue for coordinated DOM updates
- Event-driven architecture: Controllers subscribe to DataStore domains, Renderers handle DOM mutations atomically
- Backward compatibility: Legacy section modules still loaded for gradual migration
Changed
- UI Architecture (ui/admin-gpu-panel.html, ui/admin-gpu-panel.js):
- Updated to load new Controller and Renderer modules before legacy modules
- Section loading now uses Controller.load() pattern with async/await
- Event handlers updated to use Controller methods (e.g., handleApprovePlan, handleAddToBOM)
- All sections follow consistent Controller/Renderer pattern
- Section Modules (ui/sections/):
- All sections migrated to Controller/Renderer architecture
- Controllers handle: subscriptions, timers, business logic, event delegation
- Renderers handle: DOM mutations, formatting, event listener attachment
- No inline onclick handlers - all use data-action attributes with event delegation
Fixed
- JS Safety Checker: Fixed undefined function call issue (renamed unsub parameter to unsubscribe)
[1.2.0] - 2025-12-13
Added
- Asoba Internal Ops (Data Admin) System:
- Backend infrastructure with SAM template at infrastructure/data-admin/
- Lambda functions: NewsroomFunction (S3 indexing), AssetsFunction (CRUD)
- DynamoDB table: ona-platform-internal-assets with GSIs (AssetTypeIndex, AssignedToIndex)
- Newsroom Intel: S3-indexed article browsing (~8,700 articles from news-collection-website bucket)
- Asset Tracking: Full CRUD for internal assets (laptops, devices) with user assignment
- Data Directory: Static HTML links to internal resources
- Frontend UI: data-admin.html, data-admin.js, DataAdminService.js
- Role-based access: Super Admin and Admin only
- Server-side pagination (50/100/200 items per page)
- 90-day default date range with filtering by geography, topic, search
- Dual view mode: iframe view and filterable table view
- User assignment validation with enrichment from ona-platform-users table
- Deployment Scripts:
- scripts/28-create-data-admin-tables.sh - Creates internal assets DynamoDB table
- scripts/29-create-data-admin-lambda.sh - Deploys Data Admin Lambda functions via SAM
- LLM Benchmarking System:
- Comprehensive benchmarking tests for EnergyAnalyst RAG service
- Automated test suite with 12 core tests for faster CI/CD execution
- Performance metrics and validation
Changed
- UI Deployment (ui/deploy-edge.sh):
- Added data-admin.html and data-admin.js to deployment
- Updated required files check and deployment output
- Application Selection (ui/application-select.html):
- Added “Asoba Internal Ops” application card (visible to Super Admin and Admin only)
- Integrated data-admin into APPLICATION_MAP
- Role-based visibility logic for data-admin card
- DataAdminService.js:
- Fixed authentication pattern to match AdminService (uses authService singleton, ona_auth_token)
- Added cache-busting parameters for GET requests (CloudFront 404 cache prevention)
- Config.js (ui/config.js):
- Added DATA_ADMIN_API_ENDPOINT configuration
- data-admin.js:
- Fixed authentication references: authService instead of window.AuthService
- Fixed method names: isAuthenticated(), getUser(), logout()
- Fixed skin service: window.skinService.init() instead of window.SkinService.loadSkin()
Fixed
- EnergyAnalyst RAG Response Handling:
- Preserved newlines in clean_response function
- Removed “Answer:” duplication in responses
- Removed aggressive response truncation
- Added proper .railwayignore to exclude test venvs
- CI/CD Workflows:
- Fixed variable expansion in GitHub Actions
- Fixed type annotations in benchmark tests (any -> Any)
- Simplified sync-edge-ui-pr workflow
- Data Admin Authentication:
- Fixed immediate logout issue by correcting service references
- Ensured proper AuthService singleton usage
- Fixed skin service initialization
Documentation
- UI README (ui/README.md):
- Removed “Coming Soon” tags from Data Admin sections
- Marked Data Admin as deployed (2025-12-13)
- Updated Data Admin API base URL with actual endpoint
- System Admin Guide (docs/SYSTEM ADMIN.md):
- Added complete Data Admin deployment documentation
- Added Data Admin API endpoints documentation
- Updated DynamoDB tables section (7 -> 13 tables)
- Added documentation for user management tables (users, roles, groups, customers, skins)
- Added documentation for data admin table (internal-assets)
- Added SSM parameters for Data Admin
- Added troubleshooting guide for Data Admin
- AWS Infrastructure Diagram:
- Updated with current service catalog
Infrastructure
- AWS SAM Template (infrastructure/data-admin/template.yaml):
- Serverless API with 2 Lambda functions (NewsroomFunction, AssetsFunction)
- 30-second timeout and 512MB memory for S3 scanning performance
- CORS configuration for cross-origin requests
- Environment variables for S3 bucket, DynamoDB tables, JWT secret
- Cross-table access: AssetsFunction reads from ona-platform-users for enrichment
- DynamoDB Table: ona-platform-internal-assets - Internal asset tracking with user assignment
- GSIs: AssetTypeIndex (query by asset type), AssignedToIndex (query by assigned user)
- Attributes: asset_type, serial_number, assigned_to, purchase_date, notes, status
- SSM Parameters:
- /ona-platform/prod/data-admin/api-endpoint - API Gateway endpoint URL
- /ona-platform/prod/data-admin/assets-table - DynamoDB table name
- API Endpoint:
- Production: https://pj1ud6q3uf.execute-api.af-south-1.amazonaws.com/prod
- Stack: ona-data-admin-prod (CloudFormation)
[1.1.0] - 2025-11-29
Added
- User Management & Role-Based Access Control (RBAC) System:
- Complete user management infrastructure with AWS Lambda, API Gateway, and DynamoDB
- JWT-based authentication with secure password hashing (bcrypt)
- Role-based access control with 5 default roles: Super Admin, Admin, Operator, Viewer, Customer Admin
- Application-level permissions for fine-grained access control
- Customer association for multi-tenant access control
- User & Role Administration UI (ui/user-admin.html) with full CRUD operations
- AdminService.js for frontend API interactions
- AuthService.js for authentication and authorization checks
- Config.js for centralized API endpoint configuration
- Deployment Scripts:
- scripts/22-create-user-management-tables.sh - Creates DynamoDB tables for users, roles, and customers
- scripts/23-create-user-management-lambda.sh - Deploys Lambda functions and API Gateway using SAM
- scripts/24-initialize-user-management.sh - Initializes default roles in DynamoDB
- scripts/25-create-super-admin-user.sh - Creates initial super admin user
- scripts/26-update-user-password.sh - Utility script for password updates
- scripts/get-user-management-endpoint.sh - Helper script to retrieve API endpoint from SSM/CloudFormation
- Documentation:
- docs/USER_MANAGEMENT_IMPLEMENTATION.md - Comprehensive user management documentation
- Updated docs/SYSTEM ADMIN.md with user management section
- Added user management references and links
Changed
- UI Deployment (ui/deploy-edge.sh):
- Added user-admin.html and user-admin.js to deployment
- Added config.js to deployment
- Updated to include all admin page files
- Application Selection (ui/application-select.html):
- Added dynamic application filtering based on user role and permissions
- Added “User & Role Administration” application card (visible to Super Admin and Admin only)
- Implemented APPLICATION_MAP for centralized application configuration
- Login Page (ui/index.html):
- Integrated with AuthService.js for API-based authentication
- Replaced simple sessionStorage authentication with JWT token-based system
- Added config.js loading before AuthService
Fixed
- API Gateway Path Parsing:
- Fixed path parsing in Lambda functions to handle API Gateway stage names (/prod/api/users)
- Added validation to distinguish between resource names (users, roles) and IDs (user_xxx, role_xxx)
- Prevents treating resource names as IDs, which caused 404 errors
- Error Handling:
- Improved AdminService.js error handling to distinguish Lambda errors from API Gateway errors
- Added cache-busting query parameters to avoid CloudFront 404 cache issues
- Enhanced error messages with detailed debugging information
- DynamoDB Region Configuration:
- Fixed DynamoDB client initialization to use correct region (af-south-1)
- Added debug logging for troubleshooting user/role lookups
- CloudFront Caching:
- Added cache-busting parameters to GET requests
- Changed API Gateway endpoint type to REGIONAL to avoid CloudFront caching issues
- Build Artifacts:
- Added .aws-sam/ to .gitignore to prevent committing build artifacts
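The API Gateway path-parsing fix listed above can be pictured with a minimal sketch. This is not the Lambda source: the function name parse_path, the STAGE_NAMES set, and the ID_PREFIXES mapping are hypothetical, and the assumption that IDs carry a user_ or role_ prefix is inferred only from the user_xxx and role_xxx examples in the entry.

```python
from typing import Optional, Tuple

# Hypothetical values for illustration, inferred from the changelog examples.
STAGE_NAMES = {"prod"}
ID_PREFIXES = {"users": "user_", "roles": "role_"}


def parse_path(path: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a request path into (resource, resource_id).

    Handles an optional leading stage segment, e.g. /prod/api/users.
    """
    segments = [s for s in path.split("/") if s]

    # Drop the API Gateway stage name if it leaked into the path.
    if segments and segments[0] in STAGE_NAMES:
        segments = segments[1:]
    # Drop the "api" prefix if present.
    if segments and segments[0] == "api":
        segments = segments[1:]

    if not segments or segments[0] not in ID_PREFIXES:
        return None, None

    resource = segments[0]
    if len(segments) < 2:
        return resource, None

    candidate = segments[1]
    # Only treat the next segment as an ID if it looks like one; a bare
    # resource name such as "users" must never be used as an ID.
    if candidate in ID_PREFIXES or not candidate.startswith(ID_PREFIXES[resource]):
        return resource, None
    return resource, candidate
```

With this shape, /prod/api/users resolves to the users collection rather than to a lookup for a user whose ID is "users", which is the kind of mix-up the entry says produced 404 errors.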
Infrastructure
- AWS SAM Template (infrastructure/user-management/template.yaml):
- Serverless API with 4 Lambda functions: AuthFunction, UsersFunction, RolesFunction, PermissionsFunction
- REGIONAL endpoint configuration to avoid CloudFront caching
- CORS configuration for cross-origin requests
- Environment variables for DynamoDB table names and JWT secret
- DynamoDB Tables:
- ona-platform-users - User accounts with username index
- ona-platform-roles - Role definitions with application permissions
- ona-platform-customers - Customer data for multi-tenant access
- SSM Parameters:
- /ona-platform/prod/user-management/api-endpoint - API Gateway endpoint URL
- Role IDs stored in SSM for easy reference
[1.0.0] - 2025-11-28
[2025-11-28]
Added
- Document Title Extraction and Citation Support (services/energyAnalystRag/):
- Document title extraction from first page of PDFs (first substantial line, 50-200 chars)
- Document title extraction from text documents in /add_documents endpoint
- Citation field added to QueryResponse model
- Title extraction prioritizes metadata document_title, falls back to text extraction
- All documents now include document_title in metadata for proper citation
- Handler Improvements (services/energyAnalystRag/handler_fixed.py):
- Handler now extracts only generated text (removes input prompt from response)
- Uses token slicing to return only new tokens after input
- Prevents full prompt from appearing in response
- Manual ECR Build Guide (docs/MANUAL_ECR_BUILD.md):
- Complete guide for manually building and pushing ECR images
- Instructions for building specific services
- Troubleshooting for common ECR build issues
- Service-Specific ECR Build Script (scripts/build-energyanalystrag-ecr.sh):
- Dedicated script for building and pushing energyAnalystRag service to ECR
- Supports both mutable and immutable tags
- Includes verification and logging
- Railway Redeploy Automation (scripts/redeploy-railway-energyanalystrag.sh):
- Automated Railway redeployment script
- Integrated into ECR build process for automatic redeployment after image push
- Includes Railway CLI detection and login validation
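The title rule in the Document Title Extraction entry above (metadata document_title first, otherwise the first substantial line of 50-200 characters) could look roughly like the sketch below. The function names and constants are hypothetical, and the service's real fallback chain may include steps not shown here.

```python
from typing import Mapping, Optional

# Bounds taken from the changelog entry ("first substantial line, 50-200 chars").
MIN_TITLE_CHARS = 50
MAX_TITLE_CHARS = 200


def extract_title_from_text(text: str) -> Optional[str]:
    """Return the first line whose stripped length is within the bounds."""
    for line in text.splitlines():
        candidate = line.strip()
        if MIN_TITLE_CHARS <= len(candidate) <= MAX_TITLE_CHARS:
            return candidate
    return None


def resolve_document_title(metadata: Mapping[str, str], text: str = "") -> Optional[str]:
    """Prefer metadata['document_title']; fall back to text extraction."""
    title = (metadata.get("document_title") or "").strip()
    if title:
        return title
    return extract_title_from_text(text)
```

Short lines such as page headers are skipped by the length check, so a document without metadata still tends to get a descriptive title rather than a fragment.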
Changed
- EnergyAnalyst RAG Service (services/energyAnalystRag/main.py):
- Updated Inference Endpoint URL to new endpoint: sfg89dy7nzesdkl7.us-east-1.aws.endpoints.huggingface.cloud
- Improved error handling and logging for Inference Endpoint responses
- Added detailed logging for response extraction steps
- Simplified citation extraction logic (metadata-first approach)
- Enhanced startup validation with better error messages
- Query Response Format:
- Added citation field to QueryResponse model
- Citation extracted from document_title metadata with fallback chain
- Removed complex text extraction logic in favor of metadata-based approach
Fixed
- Inference Endpoint Response Handling:
- Fixed KeyError('generated_text') by using details=True in text_generation() calls
- Added proper response object extraction with fallback handling
- Improved error messages to identify exact failure points
- Handler Response Format:
- Handler now returns only generated text instead of full sequence
- Prevents prompt repetition in responses
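The handler fix above relies on token slicing: a causal language model typically returns the prompt tokens followed by the continuation, so dropping the first len(input_ids) tokens leaves only the new text. The sketch below uses plain integer lists to show the idea; slice_new_tokens is a hypothetical name, and the real handler presumably works on model tensors instead.

```python
from typing import List, Sequence


def slice_new_tokens(input_ids: Sequence[int], output_ids: Sequence[int]) -> List[int]:
    """Return only the tokens generated after the prompt."""
    prompt_len = len(input_ids)
    # If the output does not start with the prompt, leave it untouched
    # rather than cutting off generated tokens by mistake.
    if list(output_ids[:prompt_len]) != list(input_ids):
        return list(output_ids)
    return list(output_ids[prompt_len:])
```

Decoding only the sliced tokens is what keeps the prompt from being echoed back in the response.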
Documentation
- EnergyAnalyst RAG Troubleshooting Guide (docs/ENERGYANALYST_RAG_TROUBLESHOOTING.md):
- Comprehensive troubleshooting guide for HuggingFace Inference Endpoint issues
- Railway deployment troubleshooting
- Model loading and inference issues
- Vector database and authentication solutions
- Production deployment checklist
- Service Documentation Updates:
- Updated README with two-tier deployment architecture documentation
- Added HuggingFace Inference Endpoint setup guide
- Documented handler requirements and dependencies
- Added production endpoint URLs and configuration examples
[2025-11-27]
Changed
- HuggingFace API Integration (services/energyAnalystRag/):
- Updated to use HuggingFace Inference Endpoints (dedicated endpoints) instead of router API
- Migrated from deprecated api-inference.huggingface.co endpoint
- Updated huggingface-hub to >=0.28.0,<0.32.0 with [hf_xet] extra
- Updated environment variable handling to prioritize HUGGING_FACE_HUB_TOKEN and HF_TOKEN
- Kept HUGGINGFACE_API_TOKEN as fallback for backward compatibility
- Railway Deployment:
- Removed Procfile to force Railway to use Dockerfile for builds
- Updated deployment configuration for better compatibility
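The token lookup order described in the HuggingFace API Integration entry above might be implemented along these lines. The helper name resolve_hf_token is hypothetical, and the relative order of HUGGING_FACE_HUB_TOKEN and HF_TOKEN is an assumption based on the order in which the entry lists them.

```python
import os
from typing import Mapping, Optional

# Highest priority first; HUGGINGFACE_API_TOKEN is the legacy fallback.
TOKEN_ENV_VARS = ("HUGGING_FACE_HUB_TOKEN", "HF_TOKEN", "HUGGINGFACE_API_TOKEN")


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty token found, or None if none is set."""
    source = os.environ if env is None else env
    for name in TOKEN_ENV_VARS:
        value = source.get(name, "").strip()
        if value:
            return value
    return None
```

Passing a mapping instead of reading os.environ directly makes the priority order easy to unit test.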
Fixed
- InferenceClient Configuration:
- Fixed issues with base_url and provider parameter usage
- Corrected InferenceClient initialization for Inference Endpoints
- Fixed authentication issues causing 410 errors
- Error Handling:
- Added GatedRepoError handling for better startup validation
- Improved error messages for model access issues
[2025-11-26]
Added
- EnergyAnalyst RAG LLM Service (services/energyAnalystRag/):
- FastAPI-based RAG service using EnergyAnalyst-v0.1 model (Mistral-7B-v0.3 fine-tuned)
- ChromaDB vector database for document storage and semantic search
- Sentence-transformers for document embeddings
- HuggingFace Inference API integration
- Endpoints: /query, /add_documents, /health, /collection/info, /collection/clear
- Specialized capabilities:
- Regulatory compliance requirement identification
- Energy policy gap detection and analysis
- Arbitrage opportunity spotting in regulations
- Actionable compliance checklist generation
- Model training: 3-stage pipeline (SFT on Dolly-15k, pre-training on 50k policy docs, fine-tuning on 7k Q&A pairs)
- Containerized with Docker for ECR deployment
- Test Script (services/energyAnalystRag/test_hf_connection.py):
- Local testing script for HuggingFace Inference Endpoint connection
- Rapid troubleshooting tool for endpoint validation
Fixed
- ChromaDB Telemetry:
- Suppressed PostHog telemetry errors causing log spam
- Set telemetry logger to CRITICAL level only
- Dockerfile:
- Fixed package check to verify huggingface-hub instead of openai
- Corrected dependency validation in build process
- Added cache-busting mechanism using requirements.txt hash
- InferenceClient Configuration:
- Fixed to use InferenceClient for text generation models (not OpenAI SDK)
- Corrected endpoint configuration for HuggingFace router API
- Added model access validation on startup
- Fixed multiple iterations of endpoint configuration (router API, base_url, provider parameters)
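The ChromaDB telemetry fix in this list raises a logger's threshold so that only CRITICAL records get through. A minimal sketch with the standard logging module follows; the logger name chromadb.telemetry is an assumption, since the entry does not say which logger was adjusted.

```python
import logging

# Assumed logger name; the changelog does not state the exact one used.
TELEMETRY_LOGGER_NAME = "chromadb.telemetry"


def quiet_telemetry_logger(name: str = TELEMETRY_LOGGER_NAME) -> logging.Logger:
    """Raise the logger threshold so only CRITICAL records are emitted."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.CRITICAL)
    return logger


quiet_telemetry_logger()
```

Child loggers that have no level of their own inherit the threshold from this parent, so ERROR-level noise below it is dropped as well.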
Changed
- CI/CD Pipeline (.github/workflows/build-and-push.yml):
- Added ona-energyanalystrag to ECR repository creation list
- New build step for EnergyAnalyst RAG service Docker image
- Pushes three image tags: prod, prod-{gitsha}, latest
- Platform: linux/amd64
- Deployment Script (ui/deploy-edge.sh):
- Added energy-analyst.html to required files check
- Included Energy Analyst in deployment copy operations
- Added direct link in deployment output
Documentation
- Railway ECR Deployment Guide (services/energyAnalystRag/RAILWAY_ECR_DEPLOYMENT.md):
- Complete guide for deploying RAG service to Railway using ECR images
- IAM setup instructions for ECR access
- Railway configuration (dashboard and CLI methods)
- Automatic deployment with webhooks
- Monitoring, troubleshooting, and rollback procedures
- Cost optimization strategies
- Security best practices
- Service Documentation (services/energyAnalystRag/README.md):
- Service overview and architecture
- API endpoint documentation with examples
- Model details and limitations
- Deployment options (Railway ECR vs direct)
- Local development setup
- Environment variables reference
- Deployment Tools:
- railway.json: Railway service configuration
- setup-railway-ecr.sh: Automated deployment script
- test_api.py: API validation script
- .env.example: Environment variable template
- Release Notes (docs/RELEASE_NOTES.md):
- Comprehensive release notes for EnergyAnalyst RAG service
- Includes features, improvements, bug fixes, and migration notes
[2025-11-23]
Added
- Device-Level Training (globalTrainingService):
- New device-level training pipeline that trains individual LSTM models per device (serial_number)
- Device discovery scans S3 for device datasets under total/{client_id}/{site_id}/{region}/{location}/{manufacturer}/{device_id}/
- Device quality filtering with lower thresholds (500 records, 3 months, 80% completeness)
- Site registry management storing device mappings at site_registry/{site_id}/devices.json
- Device-specific feature engineering with device/site/manufacturer statistics
- Device validation strategy: trains on other devices, validates on target device
- Device model artifacts stored at device_models/{site_id}/{device_id}/models/
- Uses smaller SageMaker instances (ml.g4dn.xlarge) optimized for single-device training
- Device-Level Forecasting (forecastingApi):
- New endpoint for single device forecasts: {"site_id": "...", "device_id": "...", "forecast_hours": 24}
- New site aggregate endpoint: {"site_id": "...", "forecast_hours": 24, "include_device_breakdown": true}
- Site forecasts aggregate device forecasts by summing predictions
- Optional device breakdown shows per-device contributions to site total
- Backwards Compatibility: Legacy customer_id (site-level) training and forecasting APIs remain fully supported
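The site aggregate described under Device-Level Forecasting above sums per-device predictions hour by hour and can optionally return the per-device series. The sketch below illustrates that arithmetic only; the function name and the response keys site_forecast and device_breakdown are hypothetical, not the forecastingApi schema.

```python
from typing import Dict, List


def aggregate_site_forecast(
    device_forecasts: Dict[str, List[float]],
    include_device_breakdown: bool = False,
) -> dict:
    """Sum device forecasts per hour to produce a site-level forecast."""
    if not device_forecasts:
        return {"site_forecast": []}

    # Every device must cover the same horizon for an hour-by-hour sum.
    horizons = {len(series) for series in device_forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all device forecasts must cover the same number of hours")

    site_total = [sum(hour) for hour in zip(*device_forecasts.values())]
    result: dict = {"site_forecast": site_total}
    if include_device_breakdown:
        # Copy the series so callers cannot mutate the inputs via the result.
        result["device_breakdown"] = {
            device_id: list(series) for device_id, series in device_forecasts.items()
        }
    return result
```

For two devices forecasting [1.0, 2.0] and [3.0, 4.0], the site forecast is [4.0, 6.0], and the breakdown shows each device's contribution to that total.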
[2025-11-18]
Fixed
- UI & Charting: Prevented legacy chart rendering when the new Performance module is active.
[2025-11-17]
Changed
- UI Architecture: Modularized the UI into a component-based architecture with components/, sections/, services/, and utils/ directories. This improves code organization, reusability, and maintainability.
- Customer Selection: Switched to using localStorage for customer selection instead of hardcoding, allowing user preferences to persist across sessions.
- API Calls: Converted parallel API calls to sequential loading to prevent Lambda throttling issues.
- Deployment Scripts: Added set -euo pipefail to all shell scripts to ensure safer and more robust execution.
Fixed
- UI & Charting:
- Fixed an issue where orphaned Chart.js instances were not being destroyed before creating new charts.
- Corrected time ranges for the Performance chart and added a missing temperature chart.
- Ensured the Performance module correctly uses DataService functions and data structures.
- Fixed access to forecast comparison series in Performance.js.
- Enabled interactive time range selection for charts in the Performance section.
- Data & API:
- Corrected a syntax error resulting from an async/await conversion.
- Fixed incorrect Issues module function references in the handleDataRefresh logic.
- Added missing terminal endpoints to the API Gateway configuration.
- Changed the default customer from “Sibaya” to “demo-customer” to align with available API data.
- Fixed an issue causing the site listing not to refresh when the customer selector was changed.
- Deployment & CI/CD:
- Updated deployment scripts to correctly deploy the new modular JS directories (components, sections, services, utils) to S3.
- Updated the Docker cache in the CI/CD workflow.
- Tooling & Safety:
- Updated the JavaScript safety checker to correctly recognize callback parameters and browser APIs.
Documentation
- UI README:
- Added a comprehensive module API reference to the UI README.
- Updated the UI README structure section to reflect the new modular architecture.
- Added a testing quick reference in the ui/ directory.
- System Admin Docs:
- Referenced the ui/README.md API documentation in SYSTEM ADMIN.md for better discoverability.
[2025-11-15]
Added
- Test suite for terminalApi: Added an endpoint test suite for terminalApi validation.
Changed
- Testing: Moved and renamed the main test script to the tests/ directory for better organization.
[0.2.0] - 2025-10-17
Added
- Terminal Environment Configuration (config/terminal-environment.sh)
- Standalone configuration extending platform config
- 4 terminal services defined (terminalApi, terminalOoda, terminalAssets, terminalBom)
- Terminal-specific Lambda memory/timeout configurations
- 8 helper functions for resource management
- Tag inheritance with Component=terminal extension
- Terminal SSM Parameters (26 parameters via scripts/14-create-terminal-parameters.sh)
- OODA configuration: detection threshold, loss weights, severity levels, fault categories
- Alert configuration: SNS topic, email, enabled flag
- API configuration: rate limiting, timeout, debug mode
- Integration endpoints: parts API, weather API, maintenance system
- Operational parameters: crew count, work hours, maintenance windows, priorities
- Data retention policies: asset history, schedules, orders, tracking
- Feature flags: OODA, auto-schedule, auto-order, AI diagnostics
- Terminal API Service with 7 OODA workflow endpoints (/terminal/*)
- /terminal/assets - Asset management operations
- /terminal/detect - Fault detection
- /terminal/diagnose - AI diagnostics
- /terminal/schedule - Maintenance scheduling
- /terminal/bom - Bill of materials generation
- /terminal/order - Work order creation
- /terminal/track - Job tracking
- Comprehensive test suite for Global Training Service (261 lines, 4 test scenarios)
- AI Coding Guidelines documentation (.claude/rules/ai-coding-guidelines.md)
- Parallel processing for deployment scripts
- Docker-optional Lambda deployment support
- Global Training Service README with detailed LSTM architecture documentation
Changed
- Separated terminal configuration from platform config (config/environment.sh → config/terminal-environment.sh)
- Removed terminal tables, services array, and helper functions from main config
- Updated 5 scripts to source terminal config: 03, 05, 08, 10, 17
- Clean separation of concerns for maintainability
- Optimized IAM role creation with parallel processing (70% faster: 35-56s → 10-15s)
- Optimized Lambda deployment with parallel updates (75% faster: 7 minutes → 1.5 minutes)
- Optimized API Gateway endpoint creation with parallel execution (70% faster: 30-50s → 8-12s)
- Reduced redundant Lambda wait operations (6 waits → 4 waits per function)
- Updated MLflow to version 3.4.0 (from 2.6.0)
- Fixed IAM policy variable expansion (removed redundant string substitution in lines 169-175)
- Improved API Gateway idempotency for nested terminal endpoints
Fixed
- DynamoDB Decimal serialization in Terminal API (added DecimalEncoder class)
- API Gateway method creation now properly checks for existing methods
- CloudWatch logging and error handling improvements
- ECR login handling when Docker is not available (graceful fallback)
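The DecimalEncoder fix above addresses a common issue: DynamoDB numbers come back from boto3 as decimal.Decimal, which json.dumps cannot serialize by default. The class below is a sketch under that assumption and not the Terminal API source; returning whole numbers as int and everything else as float is an illustrative choice.

```python
import json
from decimal import Decimal


class DecimalEncoder(json.JSONEncoder):
    """JSON encoder that converts Decimal values to int or float."""

    def default(self, o):
        if isinstance(o, Decimal):
            # Whole numbers become int, everything else becomes float.
            if o == o.to_integral_value():
                return int(o)
            return float(o)
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


# Example item shaped like a DynamoDB record (values are made up).
item = {"asset_id": "inv-001", "fault_count": Decimal("3"), "loss_weight": Decimal("0.5")}
body = json.dumps(item, cls=DecimalEncoder)
```

Converting to float can lose precision for very large or very precise values, so an encoder that emits strings may suit monetary fields better.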
Performance
- Total deployment time reduced by 77% (8-9 minutes → 1.8-2 minutes)
- IAM creation: 35-56s → 10-15s (70% improvement)
- Lambda deployment: 420s (7m) → 90s (1.5m) (75% improvement)
- API Gateway setup: 30-50s → 8-12s (70% improvement)
Security
- All deployment scripts maintain idempotency for safe re-execution
- Proper error tracking and reporting in parallel processes
- Thread-safe CloudWatch logging
[0.1.0] - 2025-10-13
Added
- Initial platform implementation
- Core services:
- dataIngestion - Real-time SCADA/inverter data ingestion
- weatherCache - Weather data integration (15-minute intervals)
- interpolationService - Data enrichment and ML interpolation
- globalTrainingService - LSTM model training orchestration
- forecastingApi - 30+ day forecasting capabilities
- DynamoDB tables:
- ona-platform-locations - Location and customer data
- ona-platform-weather-cache - Weather data cache
- Terminal tables (assets, schedules, BOMs, orders, tracking)
- S3-based data pipeline
- sa-api-client-input - Input data bucket
- sa-api-client-output - Output data and models bucket
- API Gateway integration with custom domain support (api.asoba.co)
- Automated weather data collection via Visual Crossing API
- ML-based data interpolation
- LSTM forecasting capabilities (placeholder)
- Deployment automation scripts (12 scripts)
- DNS infrastructure setup with SSL/TLS certificates
- CloudWatch logging and monitoring
Infrastructure
- AWS Lambda functions (containerized with Docker)
- API Gateway REST API
- S3 storage buckets
- DynamoDB tables
- EventBridge scheduling
- Route53 DNS management
- ACM SSL/TLS certificates
- ECR Docker registries