Next Gen Cloud Diameter Routing Agent and Load Balancer
Cloud Diameter Routing Agent (DRA) provides Diameter message-based routing, including load balancing. Any part of a Diameter message can be used for routing.


Next Gen DRA is integrated with Prometheus and Grafana.
  • Integration with Prometheus is done using the Prometheus JMX exporter. All metrics are exported.
  • A Grafana dashboard has been created. All metrics are now available in it.

Kubernetes resources
Resource consumption inside a Kubernetes cluster, per instance
HA mode
Minimal (up to 2000 TPS)
  • Next Gen DRA as L7 load balancer: 2 vCPU, 16 GB of RAM
  • Redis: 2 vCPU, 16 GB of RAM
Medium (up to 5000 TPS)
  • Next Gen DRA as L7 load balancer: 4 vCPU, 24 GB of RAM
  • Redis: 4 vCPU, 32 GB of RAM
Standard (up to 10000 TPS)
  • Next Gen DRA as L7 load balancer: 6 vCPU, 32 GB of RAM
  • Redis: 4 vCPU, 32 GB of RAM
Huge (up to 20000 TPS)
  • Next Gen DRA as L7 load balancer: 10 vCPU, 64 GB of RAM
  • Redis: 4 vCPU, 32 GB of RAM
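The sizing tiers can also be expressed as a small lookup, which may help when scripting capacity planning. This is only a sketch: the `Tier` type and the `pick_tier` helper are hypothetical names, and the numbers are copied from the HA-mode list.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    max_tps: int
    dra_vcpu: int
    dra_ram_gb: int
    redis_vcpu: int
    redis_ram_gb: int


# Values copied from the HA-mode sizing list (per instance).
TIERS = [
    Tier("Minimal", 2000, 2, 16, 2, 16),
    Tier("Medium", 5000, 4, 24, 4, 32),
    Tier("Standard", 10000, 6, 32, 4, 32),
    Tier("Huge", 20000, 10, 64, 4, 32),
]


def pick_tier(expected_tps: int) -> Tier:
    """Return the smallest tier whose TPS ceiling covers expected_tps."""
    for tier in TIERS:
        if expected_tps <= tier.max_tps:
            return tier
    raise ValueError(f"{expected_tps} TPS exceeds the largest documented tier")
```

For example, `pick_tier(4500)` selects the Medium tier, while anything above 20000 TPS raises an error because no tier is documented for it.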


Kubernetes resources
Peering
Peering is useful when Next Gen DRA acts as an L7 load balancer.
If connections between the left and right sides are not stable, responses may be sent to a different Next Gen DRA instance. In this case Next Gen DRA can forward the responses to the correct originator.
Peering also helps when connections are unevenly distributed between Next Gen DRAs, because it allows connections to be balanced and the Next Gen DRAs to be utilised much more efficiently.
Peering works only if Next Gen DRA is configured as a cluster.
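The response-forwarding part of peering can be pictured with a small model. This is a sketch under assumptions, not the actual implementation: the `DraInstance` class, the shared `owners` table, and the request ids are all hypothetical, and a plain dict stands in for whatever the cluster really uses to track which instance holds the originator's connection.

```python
class DraInstance:
    """Toy model of one DRA in a peered cluster (all names are hypothetical)."""

    def __init__(self, name, owners):
        self.name = name
        self.owners = owners      # shared table: request id -> owning instance name
        self.peers = {}           # peer name -> DraInstance
        self.delivered = []       # answers handed back to the originator here

    def add_peer(self, other):
        self.peers[other.name] = other

    def on_request(self, request_id):
        # Remember that the originator is connected to this instance.
        self.owners[request_id] = self.name

    def on_answer(self, request_id):
        """Deliver locally, or forward to the peer that owns the originator."""
        owner = self.owners.get(request_id)
        if owner == self.name:
            self.delivered.append(request_id)
            return self.name
        if owner in self.peers:
            return self.peers[owner].on_answer(request_id)
        raise LookupError(f"no originator known for request {request_id}")
```

If a request enters through one instance and its answer arrives at another, the second instance hands the answer to the first over the peering link instead of dropping it.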

Peering case
How peering can work
DRA at Docker Hub
DRA Docker images are available at nextgen DRA
The package will contain:
  1. Helm chart (will be available soon)
  2. Docker image for Kubernetes
  3. Docker image for plain docker run (will be available soon)
  4. Nextgen DRA control center
Supported features in the latest release:
  • Centralised message router (tested with Gy and Sy)
  • Diameter load balancer (tested with Gy and Sy)
  • Round robin load balancing algorithm
  • Latency-based load balancing algorithm
  • Session replication between all Next Gen DRAs in a cluster, using Redis as centralised storage
  • Session-based routing
  • Non-session-based routing
  • Diameter over TCP and Diameter over TLS, and routing between TCP and TLS
  • Diameter message manipulation (all types of manipulation: adding, removing, or replacing any AVPs)
  • Diameter AVP-based firewall
  • Different subsets of forwarding rules can be configured for requests and responses
  • Diameter AVP-based routing (routing can be done by any AVP or set of AVPs)
  • Next Gen DRA can act as a transparent Diameter proxy
  • Packet injection into any Diameter session
  • No limit on the number of connections
  • CE message customization based on IP address. Different Diameter backends can receive different CEs.
  • Metrics are provided in real time using the JMX exporter
  • Can help to organize an active/standby cluster. Traffic can be forwarded to a fallback DRA or another Diameter backend.
  • Next Gen DRA can temporarily disable an endpoint, which helps during maintenance of a specific server
  • Can be run in a Docker container
  • Full Kubernetes support using Helm
  • Next Gen DRA control center implemented
  • Next Gen DRA control center provides real-time monitoring data
  • Next Gen DRA control center supports changing the DRA configuration
  • Configuration history. Next Gen DRA logs any changes in configuration.

Planned features
  • Diameter over SCTP
Major update happened
DRA now supports a latency-based load balancing algorithm. The algorithm is based on response times collected from the different hosts. As a result, load is distributed more evenly than with the round robin algorithm.
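One common way to build a balancer on response times is to keep a moving average per host and pick the host with the lowest value. The sketch below assumes an exponentially weighted moving average; the source does not say which formula DRA uses, and the `LatencyBalancer` class, the `alpha` parameter, and the host names are hypothetical.

```python
class LatencyBalancer:
    """Pick the host with the lowest smoothed response time (illustrative only)."""

    def __init__(self, hosts, alpha=0.2):
        self.alpha = alpha                        # weight of the newest sample
        self.avg_ms = {host: None for host in hosts}

    def record(self, host, response_ms):
        """Fold one measured response time into the host's moving average."""
        previous = self.avg_ms[host]
        if previous is None:
            self.avg_ms[host] = float(response_ms)
        else:
            self.avg_ms[host] = (1 - self.alpha) * previous + self.alpha * response_ms

    def pick(self):
        """Return the host to use next; unmeasured hosts are tried first."""
        def score(host):
            value = self.avg_ms[host]
            return -1.0 if value is None else value
        return min(self.avg_ms, key=score)
```

A host that starts answering slowly sees its average rise and stops being selected until other hosts become slower, which is how this kind of scheme avoids piling load onto a struggling backend.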
Supported features in the latest release:
  • Centralised message router (tested with Gy and Sy)
  • Session replication between all Next Gen DRAs in a cluster
  • Session-based routing
  • Non-session-based routing
  • Diameter over TCP and Diameter over TLS, and routing between TCP and TLS
  • Diameter message manipulation (all types of manipulation: adding, removing, or replacing any AVPs)
  • Diameter AVP-based firewall
  • Different subsets of forwarding rules can be configured for requests and responses
  • Diameter AVP-based routing (routing can be done by any AVP or set of AVPs)
  • Next Gen DRA can act as a transparent Diameter proxy
  • Packet injection into any Diameter session
  • No limit on the number of connections
  • CE message customization based on IP address. Different Diameter backends can receive different CEs.
  • Metrics are provided in real time
  • Can help to organize an active/standby cluster. Traffic can be forwarded to a fallback DRA or another Diameter backend.
  • Next Gen DRA can temporarily disable an endpoint, which helps during maintenance of a specific server
  • Can be run in a Docker container
  • Full Kubernetes support using Helm
  • Next Gen DRA control center implemented

Planned features
  • Diameter over SCTP
New Next Gen DRA controller introduced.
Supported features:
  • Visualise configuration
  • Modify configuration and store or download it
  • Connect to a Next Gen DRA cluster or single instance and configure it in real time
  • Inject configuration to Next Gen DRA cluster (partially implemented)
  • Read Next Gen DRA metrics (partially implemented)
  • Install as helm chart (not implemented)
New Next Gen DRA controller introduced.
Supported features:
  • Visualise configuration
  • Modify configuration and store or download it
  • Connect to a Next Gen DRA cluster or single instance and configure it in real time (not implemented)
First version of DRA able to work in cloud.
Supported features:
  • Deployment via Helm charts
  • Access works using a ClusterIP service. DRA can learn remote party IPs from the Kubernetes service.
  • Redis is supported. Tested with Bitnami Redis. Thank you for the great Helm charts.
  • Autoscaling works as expected
  • Extremely fast startup: up to 30 seconds
Supported features in the latest release:
  • Centralised message router (tested with Gy and Sy)
  • Session replication between all Next Gen DRAs in a cluster
  • Session-based routing
  • Diameter over TCP and Diameter over TLS, and routing between TCP and TLS
  • Diameter message manipulation (all types of manipulation: adding, removing, or replacing any AVPs)
  • Diameter AVP-based firewall
  • Diameter AVP-based routing (routing can be done by any AVP or set of AVPs)
  • Next Gen DRA can act as a transparent Diameter proxy
  • Packet injection into any Diameter session
  • No limit on the number of connections
  • CE message customization based on IP address
  • Metrics are provided in real time
  • Can be run in a Docker container

Planned features
  • Diameter over SCTP
  • Full Kubernetes support
Nearest future plans:
  1. Implement a user-friendly configuration tool. The tool is partially ready and allows you to configure the most useful cases.
  2. Implement full Kubernetes support, such as deployment in a Kubernetes cluster, autoscaling, and so on.


I decided to share the DRA, its documentation, and a lot of useful guides. The software is available here

DRA package

Content of package
  1. nextgen-dra.jar - executable jar file
  2. nextgen-dra.gz - Docker image with DRA
  3. run.sh - shell script to run DRA
  4. start-in-docker.sh - shell script to run DRA in a container
  5. config-templated - configuration templates for DRA
  6. docs - documentation and DRA usage guides


I am looking for mobile operators or companies that can help test the solution with real or close-to-real traffic.


Core features
  1. Can be run inside Docker or a virtual machine
  2. Unlimited number of connections
  3. Redundancy and HA
  4. Diameter reassembling of any AVP, including grouped AVPs (create, delete, and rewrite operations)
  5. Diameter over TCP/TLS
  6. CER/CEA messages can be customized for each host
  7. Blacklist/whitelist support based on any AVP
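List-based filtering on an AVP can be sketched as follows. This is an assumption-laden illustration, not DRA's rule format: the `AvpListRule` class is hypothetical, a message is modelled as a flat dict of AVP name to value, and the host names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class AvpListRule:
    """Allow or reject a message based on the value of one AVP (hypothetical)."""

    avp: str
    whitelist: set = field(default_factory=set)  # if non-empty, only these values pass
    blacklist: set = field(default_factory=set)  # these values are always rejected

    def allows(self, message: dict) -> bool:
        value = message.get(self.avp)
        if value in self.blacklist:
            return False
        if self.whitelist and value not in self.whitelist:
            return False
        return True
```

With a blacklist on `Origin-Host`, messages from a listed host are rejected and everything else passes; with a whitelist, only listed values pass and a message missing the AVP is rejected.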

DRA capabilities


1 instance (non-redundant mode) that can process 10,000 messages per second
  • 8 vCPU
  • 8-16 GB of RAM
  • 1 Diameter server
  • < 10 reassembling rules for Diameter messages
  • About 1 million Diameter sessions
  • < 100 OCSes as backends
  • Latency about 10 ms

2 instances (cluster mode) that can process 10,000 messages per second per instance
DRA instance
  • 8 vCPU
  • 8-16 GB of RAM
  • < 10 reassembling rules for Diameter messages
  • 1 Diameter server
  • About 10 million Diameter sessions per instance
  • < 100 OCSes as backends
  • Latency about 10 ms

Redis instance as persistent storage
  • 4 vCPU
  • 16 GB of RAM

Sample configurations introduced
small: good for small systems. Can process about 2,000 messages per second. 1 server preconfigured. Supports an unlimited number of backends and reassembling rules. Good for systems with up to 1 vCPU and 8 GB of RAM. Latency 20-500 ms at 1,500 messages per second.

medium: good for medium systems. Can process about 5,000 messages per second. 1 server preconfigured. Supports an unlimited number of backends and reassembling rules. Good for systems with up to 2 vCPU and 12 GB of RAM. Latency 20 ms at 5,000 messages per second.

large: good for large systems. Can process about 20,000 messages per second. 1 server preconfigured. Supports an unlimited number of backends and reassembling rules. Good for systems with up to 8 vCPU and 16 GB of RAM. Latency 10 ms at 15,000 messages per second.

DRA can be configured in redundant mode.
Case TODO: implement DR-side support and track message delivery.
If the ACTIVE OCS is not working or not responding, the message will be routed or retransmitted to the next available OCS (DR).
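The intended active/DR behaviour amounts to trying targets in priority order. The following is a minimal sketch of that idea only: `route_with_failover` and the `send` callback are hypothetical, and a `ConnectionError` stands in for "OCS not working or not responding".

```python
def route_with_failover(message, targets, send):
    """Send message to the first target that accepts it.

    targets is ordered by priority (ACTIVE first, then DR).
    send(target, message) is a caller-supplied function that raises
    ConnectionError when the target is down or does not respond.
    Returns the name of the target that handled the message.
    """
    for target in targets:
        try:
            send(target, message)
            return target
        except ConnectionError:
            continue  # try the next available OCS
    raise ConnectionError("no OCS available for this message")
```

Tracking message delivery, which the TODO also mentions, is not covered by this sketch.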

DRA active/dr handling
Case: DRA in cluster mode with session sharing
DRA can be configured in HA mode. Redis is used as session storage. A Diameter session can start on DRA1 and finish on DRA2, and vice versa. If the first DRA fails, its sessions continue to work through the second DRA.
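Session sharing can be illustrated with two DRA objects that consult the same store before choosing a backend. This is a sketch under assumptions: `SessionStore` is an in-memory stand-in for Redis, and the `Dra` class, its `route` method, and the backend names are hypothetical.

```python
import itertools


class SessionStore:
    """In-memory stand-in for the shared Redis session storage."""

    def __init__(self):
        self._backend_by_session = {}

    def get(self, session_id):
        return self._backend_by_session.get(session_id)

    def put(self, session_id, backend):
        self._backend_by_session[session_id] = backend


class Dra:
    """Toy DRA that keeps session-to-backend bindings in a shared store."""

    def __init__(self, name, store, backends):
        self.name = name
        self.store = store
        self._next_backend = itertools.cycle(backends)

    def route(self, session_id):
        """Reuse the stored backend for a known session, else pick a new one."""
        backend = self.store.get(session_id)
        if backend is None:
            backend = next(self._next_backend)
            self.store.put(session_id, backend)
        return backend
```

Because the binding lives in the shared store rather than inside one process, a session first routed by DRA1 reaches the same backend when a later message arrives at DRA2.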


DRA for cross sites access
Case: Diameter traffic across 2 sites using TLS
DRA can be configured to send traffic between 2 sites over a Diameter over TLS channel. It is also possible to keep connectivity to a local OCS.


DRA for cross sites access
Case: Diameter traffic segregation by AVP and load balancing inside a group.
This case explains how traffic can be segregated by DRA. We have several PGWs with different Destination-Host AVPs. DRA segregates the traffic into 2 groups and performs load balancing inside each group. Diameter over TCP or Diameter over TLS can be used.
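A rough model of this case: the Destination-Host value selects a group, and round robin runs independently inside each group. The `GroupRouter` class, the dict-based message, and the host names are hypothetical and only illustrate the idea.

```python
import itertools


class GroupRouter:
    """Segregate by Destination-Host, then round robin inside the group."""

    def __init__(self, groups):
        # groups: Destination-Host value -> list of backends in that group
        self._cycles = {
            destination: itertools.cycle(backends)
            for destination, backends in groups.items()
        }

    def route(self, message):
        """Return the backend for this message; KeyError if no group matches."""
        destination = message["Destination-Host"]
        return next(self._cycles[destination])
```

Each group keeps its own rotation, so heavy traffic for one Destination-Host does not disturb the distribution inside the other group.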


DRA as load balancer group based case
Case: Diameter load balancing
This is a very common case for Diameter signaling processing. We have several PGWs that must be connected to an OCS cluster. DRA provides equal load balancing across the OCS cluster. In this case PGWs and OCSes can be connected using Diameter over TCP or Diameter over TLS.

DRA as load balancer
Why DRA and L7 load balancers help in multiple cases
Most software components will be migrated to the cloud soon. Most cloud platforms, such as OpenShift and AWS, assume that we are deploying HTTP/HTTPS-based applications. The cloud has no idea how to work with Diameter, so Diameter-based services can have real issues in the cloud.
Here is an example of what can happen if a standard Kubernetes service is used for Diameter.
During cloud deployment we expect the Kubernetes service to distribute load across the OCSes, but this does not happen because the service balances TCP sessions, not Diameter messages. The expectation is that traffic is spread evenly across all OCSes. In reality OCS1 is loaded and the OCS software must take care of internal load balancing.
With DRA, we connect directly to the PGW via a Kubernetes service and DRA does the load balancing. As a result all OCSes are equally loaded.
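The difference between balancing TCP sessions and balancing Diameter messages can be shown with a tiny simulation. It is only an illustration: both functions are hypothetical, the message counts are made up, and real Kubernetes services may pick a backend per connection in other ways than the simple rotation used here.

```python
def balance_per_connection(messages_per_connection, backends):
    """Pin every TCP connection to one backend, as a connection-level service does."""
    load = {backend: 0 for backend in backends}
    for index, count in enumerate(messages_per_connection):
        load[backends[index % len(backends)]] += count
    return load


def balance_per_message(messages_per_connection, backends):
    """Spread individual messages across backends, as an L7 balancer does."""
    load = {backend: 0 for backend in backends}
    for index in range(sum(messages_per_connection)):
        load[backends[index % len(backends)]] += 1
    return load
```

With one long-lived PGW connection carrying 9000 messages and three OCSes, the per-connection variant puts all 9000 messages on the first OCS, while the per-message variant gives each OCS 3000.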

DRA cloud deployment
Why DRA is useful
Here is a classical multisite deployment for OCS infrastructure. To achieve good redundancy, the network operator should have a good interconnect between sites. In most cases the network operator has to create secured connections to forward traffic between LBs and OCSes, and IPsec tunnels must be used. In general this solution works if you have enough resources and an F5 load balancer with L7 support, and if the network operator can guarantee that there is no NA(P)T between sites. Adding IPsec and extra LBs adds cost to a multisite solution.

Redundancy with interconnect and LB
How DRA can help
With DRA the picture is a bit different. The network operator does not need to think about tunnels and interconnects. DRA is able to detect an OCS outage and route traffic to the next site, depending on the Routing Instance configuration on DRA. During failover DRA sends Diameter packets to the DRA on site 2 via TLS, and that DRA routes them to the OCS on site 2. The response is then sent back to site 1. The public internet can be used in this case because TLS provides traffic encryption. After restoration DRA switches traffic back, as described in the normal scenario. This solution is simpler than one with LBs and IPsec tunnels.

Redundancy with DRA
Here is the internal structure of DRA.
It consists of several modules:
  1. AVP-based router - responsible for deciding the destination of each Diameter packet. The router sends data to a specific routing instance.
  2. Packet reassembler - responsible for packet reassembling based on reassembling rules.
  3. Routing instance - a subset of Diameter hosts, e.g. OCSes.
  4. A routing instance is used to load balance between the hosts that belong to it.
  5. Equal load balancing is supported.
  6. DRA can contain multiple TCP/TLS servers. The servers are fully independent.
19 February 2023
DRA internal architecture
Features
Redundancy and scalability
  1. DRA supports session handover. A session started on DRA1 can continue on DRA2.
  2. A Redis cluster is used as storage. Usage of Redis is configurable.
  3. DRA message reassembling features:
Diameter message based routing
  1. DRA can have multiple Diameter servers; all of them are independent
  2. Routing by any AVP, including grouped AVPs
  3. Logical expressions are supported, e.g. if AVP1 and AVP2 have the specified values, route to the specified routing target
  4. A routing target contains any number of Diameter servers
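A rule of the form "if AVP1 and AVP2 have the specified values, route to a target" can be sketched as a first-match rule table. The rule format, the `select_target` helper, and the realm and target names are assumptions made for illustration; DRA's real configuration syntax is not shown in this text.

```python
def matches(message, conditions):
    """True when every AVP listed in conditions has the required value."""
    return all(message.get(avp) == value for avp, value in conditions.items())


def select_target(message, rules, default=None):
    """Return the routing target of the first rule whose conditions all match."""
    for conditions, target in rules:
        if matches(message, conditions):
            return target
    return default


# Hypothetical rule table: more specific rules come first.
RULES = [
    ({"Destination-Realm": "prepaid.example.org",
      "Service-Context-Id": "32251@3gpp.org"}, "target-data"),
    ({"Destination-Realm": "prepaid.example.org"}, "target-default"),
]
```

A message carrying both AVP values goes to `target-data`, a message with only the realm goes to `target-default`, and anything else falls through to the default.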
Diameter message reassembling
  1. DRA can insert any AVP into any message, including AVPs inside grouped AVPs
  2. DRA can delete any AVP from any message, including AVPs inside grouped AVPs
  3. DRA can modify any AVP in any message, including AVPs inside grouped AVPs
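The three reassembling operations can be pictured on a simplified message model. In this sketch a message is a nested dict in which a grouped AVP is a dict of its members, and a path is a tuple of AVP names; the helper names and the `Example-AVP` name are hypothetical. Real Diameter messages can repeat an AVP, which a dict cannot represent.

```python
import copy


def _parent(message, path):
    """Walk down grouped AVPs and return the dict holding the last path element."""
    node = message
    for name in path[:-1]:
        node = node[name]
    return node


def insert_avp(message, path, value):
    """Return a copy of message with the AVP at path added (or overwritten)."""
    result = copy.deepcopy(message)
    _parent(result, path)[path[-1]] = value
    return result


def delete_avp(message, path):
    """Return a copy of message without the AVP at path."""
    result = copy.deepcopy(message)
    _parent(result, path).pop(path[-1], None)
    return result


def modify_avp(message, path, value):
    """Return a copy of message with an existing AVP set to a new value."""
    result = copy.deepcopy(message)
    parent = _parent(result, path)
    if path[-1] not in parent:
        raise KeyError(path)
    parent[path[-1]] = value
    return result
```

Each helper returns a new message and leaves the input untouched, which keeps a chain of rules easy to reason about.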

DRA performance
All tests were done with the following configuration:
1 Diameter server
2 routing targets with 2 Diameter servers
3 message reassembling rules:
  • 1 insert rule
  • 1 update rule
  • 1 delete rule

Diameter over TCP
  1. Redundant mode: up to 5000 CPS per connection. Limited by Redis cluster performance.
  2. Non-redundant mode: up to 10000 CPS per connection. Limited by hardware only.
  3. Latency was in the 0-50 ms range for 85% of messages
Diameter over TLS
  1. Redundant mode: up to 5000 CPS per connection. Limited by Redis cluster performance.
  2. Non-redundant mode: up to 8000 CPS per connection. Limited by hardware only.
  3. Latency was in the 0-50 ms range for 85% of messages

I would appreciate help with tests using real Diameter traffic. DRA is ready for tests.
14 February 2023
TLS Beta version
Supported configurations:
Diameter over TCP -> DRA -> Diameter over TLS
Diameter over TLS -> DRA -> Diameter over TCP
Diameter over TLS -> DRA -> Diameter over TLS

TLS works and was tested at 1000 CPS. Test scenario:

Seagull -> DRA(TCP->TLS) -> DRA(TLS->TCP) -> Seagull

Latency is 0-50 ms, which is really good for such a scenario.

Diameter over TLS helps to connect components where TLS is not supported. It also allows fallback routes, which help to connect to another site without IPsec tunnels.

8 February 2023
TLS alpha
Testing the following case: Seagull -> DRA(TCP->TLS) -> DRA(TLS->TCP) -> Seagull
It was started inside the same DRA process. DRA works, but some issues were found. It can process only 100 CPS.
1 July 2022
DRA big release
Biggest DRA release
Features
Redundancy and scalability
  1. DRA supports session handover. A session started on DRA1 can continue on DRA2.
  2. A Redis cluster is used as storage. Usage of Redis is configurable.
  3. DRA message reassembling features:
Diameter message based routing
  1. DRA can have multiple Diameter servers; all of them are independent
  2. Routing by any AVP, including grouped AVPs
  3. Logical expressions are supported, e.g. if AVP1 and AVP2 have the specified values, route to the specified routing target
  4. A routing target contains any number of Diameter servers
Diameter message reassembling
  1. DRA can insert any AVP into any message, including AVPs inside grouped AVPs
  2. DRA can delete any AVP from any message, including AVPs inside grouped AVPs
  3. DRA can modify any AVP in any message, including AVPs inside grouped AVPs

DRA performance
All tests were done with the following configuration:
1 Diameter server
2 routing targets with 2 Diameter servers
3 message reassembling rules:
  • 1 insert rule
  • 1 update rule
  • 1 delete rule

  1. Redundant mode: up to 5000 CPS per connection. Limited by Redis cluster performance.
  2. Non-redundant mode: up to 10000 CPS per connection. Limited by hardware only.
  3. Latency was in the 0-50 ms range for 85% of messages

Limitations
  1. Updating configuration without a DRA restart - partially works
  2. TLS support
  3. Improve redundant mode to reach >5000 CPS
1 May 2022
DRA first draft
DRA works after 1 year of prototyping
Features and NFRs
  1. Startup - 2 seconds
  2. Performance: 10000 PPS based on Seagull measurements
  3. 8000 PPS with latency below 100 ms
  4. DRA Diameter message reassembler works
  5. DRA can work as an L7 Diameter load balancer
  6. Up to 100000 sessions supported
Plans
  1. DRA redundancy
  2. TLS
  3. DWR/DWA and CER/CEA
  4. Memory and performance optimizations
  5. Metrics should show what happens inside DRA
Photo credits: Nicola Albertini
