Next Gen Cloud Diameter Routing Agent and Load Balancer
Cloud Diameter Routing Agent (DRA) provides Diameter message-based routing, including load balancing. Any part of a Diameter message can be used for routing.


NextGen DRA, initially conceived as a startup, has developed a product that is now ready for evaluation, demonstration, and proof-of-concept testing in any environment. If you are interested in evaluating the product or would like to propose additional features, please contact alex.al.borisov@yandex.ru

NextGen DRA supports Diameter as well as HTTP/2, which is used in 5G networks. It serves as a comprehensive Layer 7 load balancer for Diameter and HTTP/2. The platform offers a flexible framework that can be used to build load balancers for other well-known telecommunications protocols, such as SIP.

The following is a list of currently supported features:

Diameter based features
  • Centralized message routing: NextGen DRA can route traffic based on any AVP.
  • Diameter load balancer: Supports various load balancing algorithms, including round-robin and latency-based.
  • Session replication: Ensures session state is replicated across all NextGen DRAs in a cluster using Redis as centralized storage.
  • Session-based routing and non-session-based routing: Flexible routing options for Diameter messages.
  • Support for Diameter over TCP and Diameter over TLS, including routing between TCP and TLS.
  • Diameter message manipulations: Capabilities to add, remove, or replace AVPs in Diameter messages.
  • Diameter AVP-based firewall: Granular control over Diameter traffic based on AVPs.
  • Different forwarding rules for requests and responses: Configurable subsets of rules for incoming and outgoing messages.
  • AVP-based routing: Routing decisions can be made based on specific AVP values.
  • Transparent Diameter proxy functionality: Operates as a proxy for Diameter traffic.
  • Packet injection: Ability to inject packets into existing Diameter sessions.
  • Unlimited number of connections: No restrictions on the number of concurrent connections.
  • CE message customization: Different Diameter backends can receive customized CEs based on IP addresses.
  • Real-time metrics: Provides real-time metrics via JMX exporter.
  • Active/standby cluster support: Manages active and standby nodes, with the option to route traffic to a fallback DRA or other Diameter backend.
  • Endpoint disablement: Temporarily disables endpoints for maintenance purposes.
  • Docker container support: Can be deployed within a Docker container.
  • Kubernetes integration: Full support for Kubernetes deployment using Helm.
  • Control center implementation: Provides real-time monitoring and configuration management.
  • Configuration history logging: Logs all changes to the configuration.
NextGen DRA in the cloud
NextGen DRA helps in many situations. Many companies migrate Diameter applications to the cloud to use autoscaling and reduce the cost of the solution. They assume that the cloud provides good traffic load balancing and suitable features for L7 load balancing. That is true for protocols based on HTTP or HTTP/2: cloud providers offer ingress via nginx, and those solutions work well for HTTP-based applications.
What can happen with Diameter? Here are some cases.

Case 1
A Diameter application runs in the cloud and accepts a limited number of connections from external DRAs. The assumption is that the cloud provider will act as a Diameter load balancer, but the cloud can only load balance HTTP and HTTP/2. In this case a single replica of the application must handle heavy traffic from a single connection, and the application itself becomes responsible for load balancing. Most applications do not have an internal load balancer, and an internal load balancer contradicts the cloud approach anyway. NextGen DRA handles this case: every replica gets an appropriate share of the traffic.
Case 2
A Diameter application runs in the cloud and needs a software update. A software update means restarting the application replicas. Each restart drops connections, and the customer DRA must reestablish its connection to another replica. In most cases the link is reestablished to a replica that still runs the old version, so the link is dropped again during the next update. NextGen DRA handles this case: every replica gets an appropriate share of the traffic, and the connection between NextGen DRA and the customer DRA is not dropped.
Case 3
A Diameter application runs in the cloud and one replica of a component fails because of a traffic spike. The connection is dropped and reestablished with another replica, but service stability is affected until the spike passes. NextGen DRA handles this kind of unpredictable replica restart: all other replicas get an appropriate share of the traffic, and the connection between NextGen DRA and the customer DRA is not dropped.

These and many other cases can be solved by running NextGen DRA in the cloud.

Next Gen inside cloud
The next step in the evolution of DRA is integration with the 5G network as an SCP, delivered as a single product.
  • The first version of the HTTP/2 load balancer has been created.
  • A lot of work has been done, and a lot of testing work remains.
  • The SCP still needs to be implemented as an application that controls the load balancer's traffic forwarding engine.
Next Gen DRA and SCP
The next step in the evolution of DRA is integration with the 5G network as an SCP, delivered as a single product.
  • The current Next Gen DRA architecture allows fast parsing and fast forwarding of HTTP/2 messages to an endpoint.
  • The current Next Gen DRA architecture provides low latency.
  • Next Gen DRA now handles HTTP traffic and protects services from a huge number of connections.
  • The number of connections to an HTTP endpoint is configurable.
Next Gen DRA and SCP
For demo/trials/poc please contact alex.al.borisov@yandex.ru
Kubernetes resources
The next step in the evolution of DRA is integration with the 5G network as an SCP, delivered as a single product.
  • The current Next Gen DRA architecture allows fast parsing and fast forwarding of HTTP/2 messages to an endpoint.
  • The current Next Gen DRA architecture provides low latency.
  • This is the first announcement about SCP integration.
Next Gen DRA and SCP
Next Gen DRA integrates with external systems to manage subscribers.
  • Next Gen DRA has an API for integration with external systems such as an OCS.
  • The API imports a list of subscribers, which enables routing based on Subscription-Id (AVP code 443) carrying an E.164 number (MSISDN).
  • The API imports a list of subscribers, which enables routing based on Subscription-Id (AVP code 443) carrying an IMSI.
  • Next Gen DRA can segregate traffic based on the Subscription-Id AVP.
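The subscriber-based routing described above can be pictured with a small Python sketch. It is only an illustration: the class `SubscriberRouter`, its methods, and the target names are hypothetical and are not the Next Gen DRA API. The Subscription-Id-Type values 0 (END_USER_E164) and 1 (END_USER_IMSI) come from the Diameter Credit-Control application.

```python
from typing import Dict, Iterable, Tuple

# Subscription-Id-Type values from the Diameter Credit-Control application
END_USER_E164 = 0  # MSISDN
END_USER_IMSI = 1

class SubscriberRouter:
    """Hypothetical subscriber table: (id type, id value) -> routing target."""

    def __init__(self, default_target: str) -> None:
        self.default_target = default_target
        self._table: Dict[Tuple[int, str], str] = {}

    def import_subscribers(self, rows: Iterable[Tuple[int, str, str]]) -> int:
        """Load (id_type, id_value, target) rows, e.g. received from an OCS."""
        count = 0
        for id_type, id_value, target in rows:
            self._table[(id_type, id_value)] = target
            count += 1
        return count

    def route(self, id_type: int, id_value: str) -> str:
        """Return the target for a known subscriber, else the default target."""
        return self._table.get((id_type, id_value), self.default_target)

router = SubscriberRouter(default_target="ocs-default")
router.import_subscribers([
    (END_USER_E164, "79001234567", "ocs-group-a"),
    (END_USER_IMSI, "250011234567890", "ocs-group-b"),
])
print(router.route(END_USER_E164, "79001234567"))
```

Unknown subscribers fall through to the default target here; a real deployment might reject them instead.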
Next Gen DRA in Active Active
Next Gen DRA is integrated with Prometheus and Grafana.
  • Integration with Prometheus uses the Prometheus JMX exporter. All metrics are exported.
  • A Grafana dashboard has been created. All metrics are available now.
Kubernetes resources
Resource consumption inside a Kubernetes cluster, per instance
HA mode
Minimal (up to 2000 TPS)
  • Next Gen DRA as L7 load balancer: 2 vCPU, 16 GB of RAM
  • Redis: 2 vCPU, 16 GB of RAM
Medium (up to 5000 TPS)
  • Next Gen DRA as L7 load balancer: 4 vCPU, 24 GB of RAM
  • Redis: 4 vCPU, 32 GB of RAM
Standard (up to 10000 TPS)
  • Next Gen DRA as L7 load balancer: 6 vCPU, 32 GB of RAM
  • Redis: 4 vCPU, 32 GB of RAM
Huge (up to 20000 TPS)
  • Next Gen DRA as L7 load balancer: 10 vCPU, 64 GB of RAM
  • Redis: 4 vCPU, 32 GB of RAM

Kubernetes resources
Peering
Peering is useful when Next Gen DRA acts as an L7 load balancer.
If connections between the left and right sides are not stable, responses may be sent to a different instance of Next Gen DRA. In this case Next Gen DRA can forward the responses to the correct originator.
Peering also helps when connections are imbalanced between Next Gen DRAs, because it balances the connections and uses the Next Gen DRAs much more efficiently.
Peering works only if Next Gen DRA is configured as a cluster.
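A rough sketch of the response forwarding idea, assuming the cluster shares a map from hop-by-hop identifier to the instance that sent the request. The class `PeeredDra` and the shared dictionary are hypothetical stand-ins; how Next Gen DRA actually tracks originators is not described here.

```python
from typing import Dict, List

class PeeredDra:
    """Hypothetical DRA instance that hands misdelivered answers to a peer."""

    def __init__(self, name: str, pending: Dict[int, str]) -> None:
        self.name = name
        self.pending = pending            # shared: hop-by-hop id -> originator
        self.peers: Dict[str, "PeeredDra"] = {}
        self.delivered: List[int] = []    # answers returned to this node's clients

    def send_request(self, hop_by_hop_id: int) -> None:
        # Remember which instance originated the request.
        self.pending[hop_by_hop_id] = self.name

    def on_answer(self, hop_by_hop_id: int) -> str:
        owner = self.pending.get(hop_by_hop_id)
        if owner is None:
            return "dropped"              # unknown request, nothing to match
        if owner == self.name:
            del self.pending[hop_by_hop_id]
            self.delivered.append(hop_by_hop_id)
            return "delivered"
        # The answer arrived on the wrong instance: pass it over the peer link.
        self.peers[owner].on_answer(hop_by_hop_id)
        return "forwarded to " + owner

shared_pending: Dict[int, str] = {}
peer_1 = PeeredDra("dra-1", shared_pending)
peer_2 = PeeredDra("dra-2", shared_pending)
peer_1.peers["dra-2"] = peer_2
peer_2.peers["dra-1"] = peer_1

peer_1.send_request(42)
result = peer_2.on_answer(42)   # the answer comes back on the other instance
print(result)
```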

Peering case
How peering can work
DRA at Docker Hub
DRA Docker images are available at nextgen DRA
The package will contain
  1. Helm chart (will be available soon)
  2. Docker image for Kubernetes
  3. Docker image for a plain docker run (will be available soon)
  4. Next Gen DRA control center
Supported features in the latest release:
  • Centralised message router (Gy, Sy tested)
  • Diameter Load balancer (Gy, Sy tested)
  • Round robin load balancing algorithm
  • Latency based load balancing algorithm
  • Session replication between all Next Gen DRAs in a cluster, using Redis as centralised storage
  • Session based routing
  • Non session based routing
  • Support Diameter over TCP and Diameter over TLS and routing between TCP and TLS
  • Diameter messages manipulations (all types of manipulations related to add/remove/replace any AVPs)
  • Diameter AVP based firewall.
  • Different subset of forwarding rules can be configured for requests and responses.
  • Diameter AVP based routing (routing can be done by any AVP/AVPs)
  • Next Gen DRA can act as transparent diameter proxy
  • Packets injection in any Diameter Sessions
  • No limitations on number of connections.
  • CE message customization based on IP address. Different Diameter backends can receive different CEs.
  • Various metrics are provided in real time using the JMX exporter
  • Helps to organize an active/standby cluster. Traffic can be forwarded to a fallback DRA or another Diameter backend
  • Next Gen DRA can temporarily disable an endpoint, which helps during maintenance of a specific server.
  • Can be run in docker container
  • Full support of Kubernetes using helm
  • Next Gen DRA control center implemented.
  • The control center provides real-time monitoring data
  • The control center supports changing the DRA configuration
  • Configuration history. Next Gen DRA logs all configuration changes

Planned features
  • Diameter over SCTP
Major update happened
Now, DRA supports the Latency based load balancing algorithm. The algorithm is based on calculating response times collected from different hosts. As a result, the load will be distributed more evenly compared to the Round Robin algorithm.
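To show the idea, here is a minimal sketch of a latency-based picker. It assumes an exponentially weighted moving average of response times per host, which is my choice for the illustration and not necessarily the calculation DRA uses. The class name, the `alpha` parameter, and the host names are hypothetical.

```python
from typing import Dict, List, Optional

class LatencyBalancer:
    """Pick the host with the lowest smoothed response time (assumed EWMA)."""

    def __init__(self, hosts: List[str], alpha: float = 0.2) -> None:
        self.alpha = alpha
        self.avg_ms: Dict[str, Optional[float]] = {host: None for host in hosts}

    def record(self, host: str, response_ms: float) -> None:
        """Fold a measured response time into the host's moving average."""
        old = self.avg_ms[host]
        if old is None:
            self.avg_ms[host] = response_ms
        else:
            self.avg_ms[host] = (1 - self.alpha) * old + self.alpha * response_ms

    def pick(self) -> str:
        """Probe hosts without samples first, then prefer the fastest host."""
        for host, avg in self.avg_ms.items():
            if avg is None:
                return host
        return min(self.avg_ms, key=lambda host: self.avg_ms[host])

balancer = LatencyBalancer(["ocs-1", "ocs-2"])
balancer.record("ocs-1", 40.0)
balancer.record("ocs-2", 10.0)
first_choice = balancer.pick()      # ocs-2 is faster at this point

balancer.record("ocs-2", 200.0)     # ocs-2 slows down sharply
second_choice = balancer.pick()     # traffic shifts to ocs-1
print(first_choice, second_choice)
```

Round robin would keep sending half of the traffic to the slow host; a latency-aware picker moves load away from it until it recovers.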
Supported features in the latest release:
  • Centralised message router (Gy, Sy tested)
  • Support sessions replication between all Next Gen DRAs in cluster
  • Session based routing
  • Non session based routing
  • Support Diameter over TCP and Diameter over TLS and routing between TCP and TLS
  • Diameter messages manipulations (all types of manipulations related to add/remove/replace any AVPs)
  • Diameter AVP based firewall.
  • Different subset of forwarding rules can be configured for requests and responses.
  • Diameter AVP based routing (routing can be done by any AVP/AVPs)
  • Next Gen DRA can act as transparent diameter proxy
  • Packets injection in any Diameter Sessions
  • No limitations on number of connections.
  • CE message customization based on IP address. Different Diameter backends can receive different CEs.
  • Various metrics are provided in real time
  • Helps to organize an active/standby cluster. Traffic can be forwarded to a fallback DRA or another Diameter backend
  • Next Gen DRA can temporarily disable an endpoint, which helps during maintenance of a specific server.
  • Can be run in docker container
  • Full support of Kubernetes using helm
  • Next Gen DRA control center implemented.

Planned features
  • Diameter over SCTP
New Next Gen DRA controller introduced.
Supported features:
  • Visualise configuration
  • Modify the configuration, then store or download it.
  • Connect to a Next Gen DRA cluster or single instance and configure it in real time
  • Inject configuration to Next Gen DRA cluster (partially implemented)
  • Read Next Gen DRA metrics (partially implemented)
  • Install as helm chart (not implemented)
New Next Gen DRA controller introduced.
Supported features:
  • Visualise configuration
  • Modify the configuration, then store or download it.
  • Connect to a Next Gen DRA cluster or single instance and configure it in real time (not implemented)
First version of DRA able to work in cloud.
Supported features:
  • Deployment via helm charts
  • Access works through a ClusterIP service. DRA can learn remote party IPs from the Kubernetes service
  • Redis is supported. Tested with Bitnami Redis. Thank you for the great Helm charts
  • Autoscaling works as expected
  • Extremely fast startup: up to 30 seconds
Supported features in the latest release:
  • Centralised message router (Gy, Sy tested)
  • Support sessions replication between all Next Gen DRAs in cluster
  • Session based routing
  • Support Diameter over TCP and Diameter over TLS and routing between TCP and TLS
  • Diameter messages manipulations (all types of manipulations related to add/remove/replace any AVPs)
  • Diameter AVP based firewall
  • Diameter AVP based routing ( routing can be done by any AVP/AVPs)
  • Next Gen DRA can act as transparent diameter proxy
  • Packets injection in any Diameter Sessions
  • No limitations on number of connections.
  • CE message customization based by IP address
  • Different metrics can be provided at real time
  • Can be run in docker container

Planned features
  • Diameter over SCTP
  • Full Kubernetes support
Nearest future plans:
  1. Implement a user-friendly configuration tool. The tool is partially ready and allows you to configure the most useful cases.
  2. Implement full Kubernetes support, such as deployment in a Kubernetes cluster, autoscaling, and so on.


I decided to share DRA, its documentation, and a lot of useful guides. The software is available here

DRA package

Content of package
  1. nextgen-dra.jar - executable jar file
  2. nextgen-dra.gz - Docker image with DRA
  3. run.sh - shell script to run DRA
  4. start-in-docker.sh - shell script to run DRA in a container
  5. config-templated - configuration templates for DRA
  6. docs - documentation and how to use DRA


Looking for mobile operators or companies who can help test the solution with real or close-to-real traffic.


Core features
  1. Can be run inside docker or virtual machine
  2. Unlimited number of connections
  3. Redundancy and HA
  4. Diameter reassembling of any AVPs, including grouped AVPs (create, delete, rewrite operations)
  5. Support Diameter over TCP/TLS
  6. CER/CEA messages can be customized for each host
  7. Black/white list support based on any AVP
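As an illustration of black/white listing by AVP, here is a small sketch. The matching semantics (reject on any blacklisted value, require an allowed value for every whitelisted AVP) are my assumption, and the function name `is_allowed` and all host and realm names are hypothetical.

```python
from typing import Mapping, Set

def is_allowed(
    avps: Mapping[str, str],
    blacklist: Mapping[str, Set[str]],
    whitelist: Mapping[str, Set[str]],
) -> bool:
    """Return True if a message with these AVP values may pass."""
    # Reject when any blacklisted AVP carries a blocked value.
    for name, blocked_values in blacklist.items():
        if avps.get(name) in blocked_values:
            return False
    # When a whitelist exists for an AVP, its value must be listed.
    for name, allowed_values in whitelist.items():
        if avps.get(name) not in allowed_values:
            return False
    return True

blacklist = {"Origin-Host": {"rogue.example.net"}}
whitelist = {"Destination-Realm": {"ocs.example.net"}}

good = {"Origin-Host": "pgw1.example.net", "Destination-Realm": "ocs.example.net"}
rogue = {"Origin-Host": "rogue.example.net", "Destination-Realm": "ocs.example.net"}
other_realm = {"Origin-Host": "pgw1.example.net", "Destination-Realm": "other.example.net"}

print(is_allowed(good, blacklist, whitelist))
```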

DRA capabilities


1 instance (non-redundant mode), which can process 10,000 messages per second
8 vCPU
8-16 GB of RAM
1 Diameter server
< 10 reassembling rules for Diameter messages
About 1 million Diameter sessions
< 100 OCSes as backends
Latency about 10 ms

2 instances (cluster mode), each of which can process 10,000 messages per second
DRA instance
8 vCPU
8-16 GB of RAM
< 10 reassembling rules for Diameter messages
1 Diameter server
About 10 million Diameter sessions per instance
< 100 OCSes as backends
Latency about 10 ms

Redis instance as persistent storage
4 vCPU
16 GB of RAM

Sample configurations introduced
small: for small systems. Can process about 2,000 messages per second. 1 server preconfigured. Supports an unlimited number of backends and reassembling rules. Good for systems with up to 1 vCPU and 8 GB of RAM. Latency 20-500 ms at 1,500 messages per second.

medium: for medium systems. Can process about 5,000 messages per second. 1 server preconfigured. Supports an unlimited number of backends and reassembling rules. Good for systems with up to 2 vCPU and 12 GB of RAM. Latency 20 ms at 5,000 messages per second.

large: for large systems. Can process about 20,000 messages per second. 1 server preconfigured. Supports an unlimited number of backends and reassembling rules. Good for systems with up to 8 vCPU and 16 GB of RAM. Latency 10 ms at 15,000 messages per second.

DRA can be configured in redundant mode.
Case TODO: Implement DR side support and track message delivery.
If the ACTIVE OCS is not working or not responding, the message is routed or retransmitted to the next available OCS (DR).
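The active/DR behaviour can be sketched as trying targets in priority order. This is only an illustration: `send_with_fallback`, `fake_send`, and the target names are hypothetical, and the failure of the active OCS is simulated with a timeout.

```python
from typing import Callable, Optional, Sequence, Tuple

def send_with_fallback(
    message: str,
    targets: Sequence[str],
    send: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each target in order; return (target used, answer)."""
    last_error: Optional[Exception] = None
    for target in targets:
        try:
            return target, send(target, message)
        except (ConnectionError, TimeoutError) as error:
            last_error = error          # remember why this target failed
    raise RuntimeError("no OCS available") from last_error

def fake_send(target: str, message: str) -> str:
    """Simulated transport: the active OCS never answers."""
    if target == "ocs-active":
        raise TimeoutError("no answer from " + target)
    return "answer to " + message + " from " + target

used_target, answer = send_with_fallback("CCR-1", ["ocs-active", "ocs-dr"], fake_send)
print(used_target, answer)
```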

DRA active/dr handling
Case: DRA in cluster mode with session sharing
DRA can be configured in HA mode. Redis is used as session storage. A Diameter session can start on DRA1 and finish on DRA2, and vice versa. If the first DRA fails, sessions continue to work through the second DRA.
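A minimal sketch of session sharing between two nodes. A plain dictionary stands in for Redis so the example runs on its own; the class `DraNode`, the request type strings, and the backend names are hypothetical.

```python
from typing import Dict, List

class DraNode:
    """Hypothetical DRA node that keeps session bindings in shared storage."""

    def __init__(self, name: str, store: Dict[str, str], backends: List[str]) -> None:
        self.name = name
        self.store = store          # stand-in for Redis: session id -> backend
        self.backends = backends
        self._next = 0

    def handle(self, session_id: str, request_type: str) -> str:
        """Return the backend for a request, binding new sessions on first use."""
        backend = self.store.get(session_id)
        if backend is None:
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
            self.store[session_id] = backend
        if request_type == "TERMINATE":
            self.store.pop(session_id, None)   # session is over, drop the binding
        return backend

shared_store: Dict[str, str] = {}
ocs_backends = ["ocs-1", "ocs-2"]
node_a = DraNode("DRA1", shared_store, ocs_backends)
node_b = DraNode("DRA2", shared_store, ocs_backends)

started_on = node_a.handle("sess-1", "INITIAL")      # session starts on DRA1
continued_on = node_b.handle("sess-1", "UPDATE")     # DRA1 is gone, DRA2 continues
finished_on = node_b.handle("sess-1", "TERMINATE")
print(started_on, continued_on, finished_on)
```

Because the binding lives outside the nodes, the second node reaches the same backend without ever having seen the session start.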


DRA for cross sites access
Case: Diameter traffic across 2 sites using TLS
DRA can be configured to send traffic between 2 sites over a Diameter over TLS channel. Connectivity to a local OCS is also possible.


DRA for cross sites access
Case: Diameter traffic segregation by AVP and load balancing inside group.
This case explains how DRA can segregate traffic. There are several PGWs that send different Destination-Host AVP values; DRA segregates the traffic into 2 groups and performs load balancing inside each group. Diameter over TCP or Diameter over TLS can be used.
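The segregation case can be sketched as a lookup from Destination-Host to a group, followed by round robin inside that group. The class `GroupRouter` and all host and group names are hypothetical.

```python
from itertools import cycle
from typing import Dict, Iterator, List

class GroupRouter:
    """Map Destination-Host to a group, then round robin inside the group."""

    def __init__(self, groups: Dict[str, List[str]], host_to_group: Dict[str, str]) -> None:
        self._host_to_group = host_to_group
        # One independent round-robin iterator per group.
        self._cycles: Dict[str, Iterator[str]] = {
            name: cycle(members) for name, members in groups.items()
        }

    def route(self, destination_host: str) -> str:
        """Return the next backend of the group that owns this Destination-Host."""
        group = self._host_to_group[destination_host]   # KeyError if unknown
        return next(self._cycles[group])

group_router = GroupRouter(
    groups={"A": ["ocs-a1", "ocs-a2"], "B": ["ocs-b1", "ocs-b2"]},
    host_to_group={"ocs-a.example.net": "A", "ocs-b.example.net": "B"},
)

picks = [
    group_router.route("ocs-a.example.net"),
    group_router.route("ocs-b.example.net"),
    group_router.route("ocs-a.example.net"),
    group_router.route("ocs-a.example.net"),
]
print(picks)
```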


DRA as load balancer group based case
Case: Diameter load balancing
This is a very common case for Diameter signaling. Several PGWs must be connected to an OCS cluster. DRA balances the load equally across the OCS cluster. In this case PGWs and OCSes can be connected using Diameter over TCP or Diameter over TLS.

DRA as load balancer
Why DRA and L7 load balancers help in multiple cases
Most software components will be migrated to the cloud soon. Most cloud platforms, such as OpenShift and AWS, assume that we deploy HTTP/HTTPS-based applications. The cloud has no idea how to work with Diameter, so Diameter-based services can have real issues in the cloud.
Here is an example of what can happen if a standard Kubernetes service is used for Diameter.
During cloud deployment we expect the Kubernetes service to distribute load across the OCSes, but that does not happen because the service balances TCP sessions, not Diameter messages. The expectation is that traffic is spread across all OCSes. The reality is that OCS1 takes the load, and the OCS software must take care of internal load balancing.
With DRA, DRA connects directly to the PGW via a Kubernetes service and does the load balancing itself, so all OCSes are equally loaded.
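The difference between balancing TCP connections and balancing Diameter messages is easy to see in a toy simulation. The two functions and the backend names below are hypothetical; the numbers are made up for the illustration.

```python
from typing import Dict, List

def balance_per_connection(connections: List[int], backends: List[str]) -> Dict[str, int]:
    """Connection-level balancing: every message follows its TCP connection."""
    load = {backend: 0 for backend in backends}
    for index, message_count in enumerate(connections):
        load[backends[index % len(backends)]] += message_count
    return load

def balance_per_message(connections: List[int], backends: List[str]) -> Dict[str, int]:
    """Message-level (L7) balancing: each message is placed independently."""
    load = {backend: 0 for backend in backends}
    sent = 0
    for message_count in connections:
        for _ in range(message_count):
            load[backends[sent % len(backends)]] += 1
            sent += 1
    return load

ocs_cluster = ["ocs-1", "ocs-2", "ocs-3"]
pgw_connections = [9000]   # one PGW connection carrying 9000 messages

per_connection = balance_per_connection(pgw_connections, ocs_cluster)
per_message = balance_per_message(pgw_connections, ocs_cluster)
print(per_connection)
print(per_message)
```

With a single long-lived connection, connection-level balancing sends everything to one OCS, while message-level balancing spreads the same traffic evenly.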

DRA cloud deployment
Why DRA is useful
Here is a classical multisite deployment for OCS infrastructure. To achieve good redundancy, the network operator needs a good interconnect between sites. In most cases the operator must create secured connections to forward traffic between LBs and OCSes, and IPsec tunnels must be used. In general this solution works if you have enough resources, an F5 load balancer with L7 support, and a guarantee from the network operator that there is no NA(P)T between sites. Adding IPsec and extra LBs adds cost to a multisite solution.

Redundancy with interconnect and LB
How DRA can help
With DRA the picture is a bit different. The network operator does not need to think about tunnels and interconnects. DRA detects an OCS outage and routes traffic to the next site, depending on the Routing Instance configuration on DRA. During failover DRA sends Diameter packets to the DRA on site 2 via TLS, and that DRA routes them to the OCS on site 2. The response is then sent back to site 1. The public internet can be used in this case because TLS encrypts the traffic. After restoration DRA switches traffic back, as described in the normal scenario. This solution is simpler than LBs with IPsec tunnels.

Redundancy with DRA
Here is the internal structure of DRA.
It consists of several modules
  1. AVP-based router - decides the destination for each Diameter packet. The router sends data to a specific routing instance
  2. Packet reassembler - reassembles packets based on reassembling rules.
  3. Routing instance - a subset of Diameter hosts, e.g. OCSes.
  4. A routing instance is used to load balance between the hosts that belong to it.
  5. Equal load balancing is supported.
  6. DRA can contain multiple TCP/TLS servers. Servers are fully independent

19 February 2023
DRA internal architecture
Features
Redundancy and scalability
  1. DRA supports session handover. A session started on DRA1 can continue on DRA2.
  2. A Redis cluster is used as storage. Usage of Redis is configurable
  3. DRA message reassembling features:
Diameter message based routing
  1. DRA can have multiple Diameter servers, all of them independent
  2. Routing by any AVP, including grouped AVPs
  3. Logical expressions are supported, for example: if AVP1 and AVP2 have the specified values, route to the specified routing target
  4. A routing target contains any number of Diameter servers
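The logical expression idea (AVP1 and AVP2 both match, then route to a target) can be sketched as an ordered rule list. The rule format, the function `select_target`, and the AVP values are hypothetical; this is not the DRA configuration syntax.

```python
from typing import Dict, List, Mapping, Tuple

# A rule is (conditions that must all match, routing target).
Rule = Tuple[Dict[str, str], str]

def select_target(avps: Mapping[str, str], rules: List[Rule], default: str) -> str:
    """Return the target of the first rule whose conditions all match."""
    for conditions, target in rules:
        if all(avps.get(name) == value for name, value in conditions.items()):
            return target
    return default

routing_rules: List[Rule] = [
    ({"Destination-Realm": "ocs.example.net", "Service-Context-Id": "32251@3gpp.org"},
     "target-data"),
    ({"Destination-Realm": "ocs.example.net"}, "target-other"),
]

data_request = {"Destination-Realm": "ocs.example.net", "Service-Context-Id": "32251@3gpp.org"}
voice_request = {"Destination-Realm": "ocs.example.net", "Service-Context-Id": "32260@3gpp.org"}
foreign_request = {"Destination-Realm": "elsewhere.example.net"}

print(select_target(data_request, routing_rules, "target-default"))
```

Rule order matters in this sketch: the more specific two-AVP rule is listed before the single-AVP rule so that it wins.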
Diameter message reassembling
  1. DRA can insert any AVP into any message, including AVPs inside grouped AVPs
  2. DRA can delete any AVP from any message, including AVPs inside grouped AVPs
  3. DRA can modify any AVP in any message, including AVPs inside grouped AVPs
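The three operations can be illustrated on a simplified message model. The sketch assumes a message is a nested dictionary where a grouped AVP is a dictionary of its members; real Diameter messages allow repeated AVPs, which this model ignores. The function names and AVP values are hypothetical.

```python
from typing import Any, Dict, Sequence

Avps = Dict[str, Any]

def _parent(message: Avps, path: Sequence[str], create: bool) -> Avps:
    """Walk to the grouped AVP that holds the last element of the path."""
    node = message
    for name in path[:-1]:
        if name not in node:
            if not create:
                raise KeyError(name)
            node[name] = {}            # create the missing grouped AVP
        node = node[name]
    return node

def insert_avp(message: Avps, path: Sequence[str], value: Any) -> None:
    _parent(message, path, create=True)[path[-1]] = value

def delete_avp(message: Avps, path: Sequence[str]) -> None:
    _parent(message, path, create=False).pop(path[-1], None)

def modify_avp(message: Avps, path: Sequence[str], value: Any) -> None:
    parent = _parent(message, path, create=False)
    if path[-1] not in parent:
        raise KeyError(path[-1])       # modify only touches existing AVPs
    parent[path[-1]] = value

ccr: Avps = {
    "Origin-Host": "pgw1.example.net",
    "Origin-State-Id": 7,
    "Subscription-Id": {"Subscription-Id-Type": 0, "Subscription-Id-Data": "79001234567"},
}

insert_avp(ccr, ("Destination-Realm",), "ocs.example.net")
modify_avp(ccr, ("Subscription-Id", "Subscription-Id-Data"), "79007654321")
delete_avp(ccr, ("Origin-State-Id",))
print(ccr)
```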

DRA performance
All tests were done with the following configuration
1 Diameter server
2 routing targets with 2 Diameter servers
3 message reassembling rules
1 insert rule
1 update rule
1 delete rule

Diameter over TCP
  1. Redundant mode: up to 5000 CPS per connection. Limited by Redis cluster performance
  2. Non-redundant mode: up to 10000 CPS per connection. Limited by hardware only
  3. 85% of latency measurements were in the 0-50 ms range
Diameter over TLS
  1. Redundant mode: up to 5000 CPS per connection. Limited by Redis cluster performance
  2. Non-redundant mode: up to 8000 CPS per connection. Limited by hardware only
  3. 85% of latency measurements were in the 0-50 ms range

I would appreciate help with real tests using real Diameter traffic. DRA is ready for tests.
14 February 2023
TLS Beta version
Supported configurations:
Diameter over TCP -> DRA -> Diameter over TLS
Diameter over TLS -> DRA -> Diameter over TCP
Diameter over TLS -> DRA -> Diameter over TLS

TLS works and was tested at 1000 CPS. Test scenario:

Seagull -> DRA(TCP->TLS) -> DRA(TLS->TCP) -> Seagull

Latency was 0-50 ms, which is really good for such a scenario.

Diameter over TLS helps to connect components that do not support TLS. It also allows fallback routes that connect to another site without IPsec tunnels.

8 February 2023
TLS alpha
Testing the following case: Seagull -> DRA(TCP->TLS) -> DRA(TLS->TCP) -> Seagull
The test ran inside a single DRA process. DRA works, but some issues were found. It can process only 100 CPS.
1 July 2022
DRA big release
Biggest DRA release
Features
Redundancy and scalability
  1. DRA supports session handover. A session started on DRA1 can continue on DRA2.
  2. A Redis cluster is used as storage. Usage of Redis is configurable
  3. DRA message reassembling features:
Diameter message based routing
  1. DRA can have multiple Diameter servers, all of them independent
  2. Routing by any AVP, including grouped AVPs
  3. Logical expressions are supported, for example: if AVP1 and AVP2 have the specified values, route to the specified routing target
  4. A routing target contains any number of Diameter servers
Diameter message reassembling
  1. DRA can insert any AVP into any message, including AVPs inside grouped AVPs
  2. DRA can delete any AVP from any message, including AVPs inside grouped AVPs
  3. DRA can modify any AVP in any message, including AVPs inside grouped AVPs

DRA performance
All tests were done with the following configuration
1 Diameter server
2 routing targets with 2 Diameter servers
3 message reassembling rules
1 insert rule
1 update rule
1 delete rule

  1. Redundant mode: up to 5000 CPS per connection. Limited by Redis cluster performance
  2. Non-redundant mode: up to 10000 CPS per connection. Limited by hardware only
  3. 85% of latency measurements were in the 0-50 ms range

Limitations
  1. Configuration update without DRA restart - partially works
  2. TLS support
  3. Improve redundant mode to exceed 5000 CPS
1 May 2022
DRA first draft
DRA works after 1 year of prototyping
Features and NFRs
  1. Startup time: 2 seconds
  2. Performance: 10000 PPS based on Seagull measurements
  3. At 8000 PPS, latency is less than 100 ms
  4. The DRA Diameter message reassembler works
  5. DRA can work as an L7 Diameter load balancer
  6. Up to 100000 sessions supported
Plans
  1. DRA redundancy
  2. TLS
  3. DWR/DWA and CER/CEA
  4. Memory and performance optimizations
  5. Metrics that show what happens inside DRA