Development of a robust technique for real-time synchronized data transmission from a Magnetic Observatory to an INTERMAGNET GIN
Since internet availability at CPL is very limited owing to its remote location, we first approached BSNL (Bharat Sanchar Nigam Limited), which is maintained by the Government of India, for a reliable permanent fiber-optic connection. However, this was too expensive to set up and maintain, so we used the facilities of a local service provider, with a maximum bandwidth of 20 Mbps, to initiate the data transfer technique.
Online data transfer from CPL to HYB observatory started with cross-platform data transmission, as ISP (internet service provider) resources were not available. The service provider could not resolve some TCP/IP network issues affecting data transmission from one Linux machine to another remote Linux machine. We therefore adopted a cross-platform (Linux to Windows) transmission process, which also suited our workflow because the final data had to be processed with Windows-based Matlab codes.
Initially, we implemented shell scripts, cron jobs and the rsync protocol to transfer data from the Magrec-4B data logger to an intermediate Linux machine (CentOS) deployed at CPL. The data was transferred from Magrec-4B to the Linux machine (backup storage) in the CPL control room with a latency of 5 min, and then to a Windows machine (client) at HYB Observatory using codes and scripts developed by us together with third-party tools (Fig. 2). Because the bandwidth was low, we decided to transfer the data from the Linux machine to the Windows PC at HYB-NGRI with a time lapse of 1 min.
On the client side (Windows PC), we installed a batch file, run with the “Abort” option and with confirmations set to “Off”, to check the health of the connection; it is repeated with a default delay of 120 s. Each session begins by verifying the host ID, the username and the pre-entered password, authenticated with the RSA (Rivest, Shamir and Adleman) key via SFTP (SSH File Transfer Protocol). The terms ‘Comparison’ and ‘Synchronization’ in the figure show the details of data transmission from the host to the client machine at regular intervals of 120 s.
From Magrec-4B, we selected 9 data parameters, as shown in Fig. 2, to transmit real-time data to the client machine. The figure also gives each data parameter, its file size, and the speed at which data is transmitted from the host to the client machine. The percentages in column 5 of Fig. 2 show the progress of transmitting and updating data on the client machine. A transfer reaches 100% only when the records of the last 120 s have been copied. Additionally, the client machine double-checks the data by synchronizing the previous records of the current day. An example of this continuous transmission and update process is shown in line 9 of Fig. 2: the file there is 23% transmitted, and it reaches 100% at the end of the task, once the latest records have been copied and the previously recorded data have been synchronized. The files of these nine parameters grow every 120 s as new data is recorded on the host machine. The whole process is repeated in cycles of 120 s until the day is over.
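The 120 s compare-and-synchronize cycle described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the percentage is the ratio of client file size to host file size and that the daily files only grow by appending; the function names `percent_synced` and `sync_appended` are hypothetical and are not part of the tools used at the observatories.

```python
import os


def percent_synced(host_size: int, client_size: int) -> float:
    """Share of the host file already present on the client, in percent."""
    if host_size <= 0:
        return 100.0
    return min(100.0, 100.0 * client_size / host_size)


def sync_appended(host_path: str, client_path: str) -> tuple[float, float]:
    """Copy only the bytes the client is missing; return (before, after) in percent."""
    host_size = os.path.getsize(host_path)
    client_size = os.path.getsize(client_path) if os.path.exists(client_path) else 0
    if client_size > host_size:
        # Assumption: a client copy longer than the host copy is invalid, so restart.
        client_size = 0
    before = percent_synced(host_size, client_size)
    # "r+b" keeps the bytes already received; "wb" starts a fresh copy.
    mode = "r+b" if client_size else "wb"
    with open(host_path, "rb") as src, open(client_path, mode) as dst:
        src.seek(client_size)
        dst.seek(client_size)
        dst.write(src.read())
    after = percent_synced(host_size, os.path.getsize(client_path))
    return before, after
```

Calling `sync_appended` once per 120 s cycle on each of the nine daily files would reproduce the pattern in which a partially transferred file ends the cycle at 100%.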
As a large amount of data from the two observatories needs to be transferred, and dedicated storage is required to back up the data on a daily basis, we have set up a server at the HYB observatory. In addition, the internet service at CPL has recently been upgraded to a bandwidth of 50 Mbps (the maximum available today), which allowed us to configure an automated, robust technique for data transmission to GIN; the details are discussed below.
Since our main objective was to achieve automated transmission of data within 1 minute from the HYB and CPL observatories to GIN, we had to make additional R&D efforts to develop a robust configuration of both the hardware (i.e., high-end workstation, firewall and router configuration) and the software. Thus, Python code, shell scripts and cron jobs using the rsync protocol have been developed to support data transmission without data loss. If internet services are disconnected, then once they are restored the Python code rechecks the data starting from the last successfully transmitted file.
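The recovery behaviour described above (resuming from the last successfully transmitted file after an outage) can be sketched as follows. This is an illustrative sketch and not the observatory code: it assumes file names that sort chronologically, and `files_to_send`, `transmit` and the `send` callback are hypothetical names.

```python
from typing import Callable, Iterable, Optional


def files_to_send(available: Iterable[str], last_sent: Optional[str]) -> list[str]:
    """Files to (re)check, starting at the last successfully sent file."""
    ordered = sorted(available)
    if last_sent is None:
        return ordered
    # The last sent file is included so that it is rechecked after an outage.
    return [name for name in ordered if name >= last_sent]


def transmit(
    available: Iterable[str],
    last_sent: Optional[str],
    send: Callable[[str], bool],
) -> Optional[str]:
    """Send files in order; stop at the first failure and return the new checkpoint."""
    checkpoint = last_sent
    for name in files_to_send(available, last_sent):
        if not send(name):
            break  # e.g. the link is down; retry from the checkpoint next cycle
        checkpoint = name
    return checkpoint
```

Persisting the returned checkpoint between runs (for example in a small text file) is what would let the next run continue without data loss.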
The transfer of data from CPL and HYB to the central server located at the HYB observatory relies on SSH authentication with RSA keys, which is in itself very secure. We have designed a system that transfers the data in a secure, encrypted form using SSH keys and saves the same data set on the local CSIR-NGRI server. RSA (Rivest–Shamir–Adleman) is a widely used public-key cryptosystem for secure data transmission. Running ssh-keygen on the source machine (MAGREC-DAS) creates two files, “id_rsa” and “id_rsa.pub”, in the .ssh directory; the public key (id_rsa.pub) is then shared/copied to the destination machine (CentOS). This establishes a reliable handshake between the source machine and the destination machine for data transfer. The configuration remains valid as long as the network stays the same, which is why we assigned a static IP address. In addition to the SSH keys, a code was written to transfer the data using the ‘rsync’ tool, and it was installed in the ‘crontab’ so that the transfer repeats with a 10 s delay. The same technique was also used at HYB observatory, from the CentOS machine to the server, for secure and successful data transmission.
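A minimal sketch of how such a key-based rsync transfer might be assembled and repeated every 10 s is given below. The paths, user and host names are placeholders, and the helper names `build_rsync_command` and `repeat` are hypothetical. Because standard cron schedules jobs at one-minute granularity, the sketch assumes that the 10 s repetition is obtained by looping inside a script that cron starts; the source does not state how this was done.

```python
import time
from typing import Callable


def build_rsync_command(
    key_path: str, source_dir: str, user: str, host: str, dest_dir: str
) -> list[str]:
    """Argument list for an rsync push over SSH using a private key."""
    return [
        "rsync",
        "-az",          # archive mode, compress data in transit
        "--partial",    # keep partially transferred files for later resumption
        "-e", f"ssh -i {key_path}",
        source_dir.rstrip("/") + "/",   # trailing slash: copy directory contents
        f"{user}@{host}:{dest_dir}",
    ]


def repeat(
    action: Callable[[], None],
    interval_s: float,
    count: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run `action` `count` times, waiting `interval_s` seconds between runs."""
    for i in range(count):
        action()
        if i < count - 1:
            sleep(interval_s)
```

With a cron entry that fires once per minute, `repeat(run_transfer, 10, 6)` would cover the minute in 10 s steps, where `run_transfer` is a hypothetical function that passes the command list to `subprocess.run`.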
After successfully transmitting data from both observatories to a dedicated high-end Linux server with a 24 TB RAID-5 configuration at HYB Observatory, we created individual user accounts on the server, i.e. IMO-CPL and IMO-HYB, to store the data received from the respective observatories. The developed Python code transfers several types of data from the DAS and stores them daily in the respective user accounts (Fig. 3). Scripts developed on each Linux PC filter the data according to the requirements of the destination directory (i.e., GIN). The sorted data of each directory is then transmitted to the INTERMAGNET GIN with a latency period of 300 s.
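The filtering step described above can be sketched as a simple pattern match on file names. The file names and the pattern in the usage note are hypothetical examples; the actual selection rules used by the observatory scripts are not given in the text.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def select_for_destination(filenames: Iterable[str], patterns: Iterable[str]) -> list[str]:
    """Return, in sorted order, the files that match any pattern for a destination."""
    wanted = list(patterns)
    return sorted(
        name for name in filenames
        if any(fnmatchcase(name, pattern) for pattern in wanted)
    )
```

For example, `select_for_destination(files, ["*vmin.min"])` would keep only files whose names end in `vmin.min` from a mixed directory listing.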
After successfully transmitting data from both observatories to the GIN, we encountered a few minor issues; these issues and their resolution are discussed in detail below:
Number 1: Initially, the Python code was executed using the “rsync synchronization protocol” with a minimum latency period of 60 s to transfer real-time data from the two observatories. As noted by GIN experts, with this latency period the same data was repeatedly sent to the receiving web service (http://app.geomag.bgs.ac.uk/GINFileUpload/UploadForm.html) (Fig. 4a), so that the GIN storage/cache memory received huge volumes of data from both observatories. This caused problems for their entire web service: the log files filled up very quickly, and the web service data cache became difficult to use because it took up a lot of disk space (Fig. 4b).
The solution: To solve the above problem, we created background daemons to replace the “rsync synchronization protocol”, and the data recheck interval was changed from 60 s to 300 s. The background daemons execute the Python code every 300 s, giving smooth real-time data transmission without any duplication (as shown in Fig. 3).
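One way to avoid resending identical data in each 300 s cycle is to remember a digest of what was last uploaded and skip files whose content has not changed. The sketch below illustrates that idea only; the text does not state how duplication is avoided in the actual daemons, and the names `changed_files` and `mark_sent` are hypothetical.

```python
import hashlib

CYCLE_SECONDS = 300  # recheck interval stated in the text


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a file's content."""
    return hashlib.sha256(data).hexdigest()


def changed_files(contents: dict[str, bytes], sent: dict[str, str]) -> list[str]:
    """Names whose current content differs from what was last uploaded."""
    return [
        name for name in sorted(contents)
        if sent.get(name) != digest(contents[name])
    ]


def mark_sent(name: str, data: bytes, sent: dict[str, str]) -> None:
    """Record a successful upload so the same content is not sent again."""
    sent[name] = digest(data)
```

A daemon loop would call `changed_files` once per cycle, upload only the returned files, call `mark_sent` after each successful upload, and then sleep for `CYCLE_SECONDS`.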
Number 2: After successful transmission of data from both observatories, on a few occasions the data did not appear in the data plotting services at INTERMAGNET even though our hardware and software were intact. We cross-checked the logs on our end and found that the data had been successfully uploaded to GIN. Although the uploads were recorded as successful, the reason why the data was not plotted on the INTERMAGNET website was unknown.
The solution: The above issue was resolved after BGS experts suggested a link (http://app.geomag.bgs.ac.uk/GINFileUpload/UploadForm.html) through which a one-day file can be uploaded to check whether the upload succeeds. As suggested by BGS, if this upload is not successful and returns errors (Fig. 4), the problem lies with the INTERMAGNET server. This verification allowed us to confirm that the code we are running works correctly (Fig. 5).