However, reliable optical data transmission requires strict link alignment, which waves and currents readily disturb in harsh marine environments. Several research efforts have addressed beam misalignment and pointing errors. Luo et al. \cite{mss-node-uav} proposed a beam pointing adjustment algorithm based on a soft actor-critic approach to mitigate the effect of waves on system performance. Shin et al. \cite{shin-ioagent-2023} proposed a two-stage DRL algorithm for controlling data transmission, in which the beam divergence angle and transmission power are jointly optimized to maintain seamless connectivity between ocean surface vehicles and underwater sensors. Weng et al. \cite{Weng2024AUV} proposed a reinforcement learning-based alignment method that controls an AUV to establish an optical link and maintain alignment, using multiple sensors in conjunction with particle filters. However, these methods only steer the beam vertically upward or keep it horizontally aligned, and do not account for more flexible pointing relationships. Shin et al. \cite{shin-bobd-2024} proposed a two-step, two-agent deep reinforcement learning algorithm that enables an underwater sensor installed on the seabed to sequentially determine the beam orientation (BO) and beam divergence (BD) angles for transmitting its sensing data to a USV that may shake irregularly at the sea surface. Although this approach uses BO and BD to improve communication reliability, its two agents act in two sequential steps, so it still faces the problem of dynamic environmental changes when deployed for actual control.