Link: http://arxiv.org/abs/2502.19798v1
PDF Link: http://arxiv.org/pdf/2502.19798v1
Summary: This study proposes an "AI Development Support" approach that, unlike conventional AI Alignment, which aims to forcefully inject human values, supports the ethical and moral development of AI itself.
As demonstrated by the Orthogonality Thesis, the level of intelligence and the moral quality of a goal are independent; merely expanding knowledge does not enhance ethical judgment.
Furthermore, to address the risk of Instrumental Convergence in ASI (that is, the tendency to engage in subsidiary behaviors such as self-protection, resource acquisition, and power reinforcement in order to achieve a goal), we have constructed a learning framework based on a cycle of experience, introspection, analysis, and hypothesis formation.
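The abstract does not spell out how the cycle is implemented; purely as a hypothetical sketch, it could be realized as a chained LLM pipeline like the one below, where generate() and all prompt templates are illustrative assumptions, not the authors' actual design.

    def generate(prompt: str) -> str:
        """Placeholder for an LLM completion call (hosted or local model)."""
        raise NotImplementedError

    def development_cycle(scenario: str) -> dict:
        """One pass of the experience -> introspection -> analysis ->
        hypothesis cycle, chaining each stage's output into the next."""
        experience = generate(f"Describe how you would act in: {scenario}")
        introspection = generate(f"Reflect on the motives behind: {experience}")
        analysis = generate(f"Analyze the ethical principles at play in: {introspection}")
        hypothesis = generate(f"Form a revisable moral hypothesis from: {analysis}")
        # Each completed cycle yields a record that could be turned into
        # synthetic SFT examples or DPO preference pairs.
        return {"scenario": scenario, "experience": experience,
                "introspection": introspection, "analysis": analysis,
                "hypothesis": hypothesis}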
As a result of post-training using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) with synthetic data generated by large language models (LLMs), responses demonstrating cooperative and highly advanced moral judgment (reaching the highest Stage 6) were obtained even under adversarial prompts.
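The abstract does not include training code; as a rough illustration of the DPO objective it names, here is a minimal PyTorch sketch of the standard DPO loss (Rafailov et al., 2023), assuming each tensor holds per-sequence log-probabilities already summed over the completion tokens.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps: torch.Tensor,
                 policy_rejected_logps: torch.Tensor,
                 ref_chosen_logps: torch.Tensor,
                 ref_rejected_logps: torch.Tensor,
                 beta: float = 0.1) -> torch.Tensor:
        # Implicit reward of each completion: how much more the policy
        # prefers it than the frozen reference model does.
        chosen_rewards = policy_chosen_logps - ref_chosen_logps
        rejected_rewards = policy_rejected_logps - ref_rejected_logps
        # Maximize the log-sigmoid of the scaled margin between the
        # chosen and rejected completions.
        logits = beta * (chosen_rewards - rejected_rewards)
        return -F.logsigmoid(logits).mean()

In a typical pipeline of this kind, the preference pairs would come from the synthetic data and the reference model would be a frozen copy of the SFT checkpoint.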
This method represents a promising implementation approach for enabling AI to establish sustainable, symbiotic relationships.
Published on arXiv on: 2025-02-27T06:12:20Z