Preprocessing problems and improvements #7

Open
Tracked by #1
Unagi2 opened this issue Aug 21, 2022 · 0 comments

Unagi2 commented Aug 21, 2022

Preprocessing improvements

The data I shared the other day (train_generated_cwea_repeat1_7words_gramarcheck.csv) also contains the problems described below.
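These problems have since been resolved. In addition, when fine-tuning the generation model, the loss score improved (-0.0245), and vocabulary and sentence construction have become more accurate.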

Changes in the training data

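  • Sentences were not split correctly, so words ran together and the text did not work as English. After the fix, the text is split with periods so that words and sentences are separated, which is thought to have reduced the negative effect on the learning model and the generation model.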

[Before]
SegmentationMeasurementOmnichannelResource Allocation and OptimizationExperimental design (test/control) analysesCollaborate on cross-matrix teams as well as influence across varying levels of leadership, by demonstrating subject matter leadership with other teams/functions to drive efficiencies and seamless decision making.

[After]
Segmentation.Measurement.Omnichannel.Resource Allocation and Optimization.Experimental design (test/control) analyses.Collaborate on cross-matrix teams as well as influence across varying levels of leadership, by demonstrating subject matter leadership with other teams/functions to drive efficiencies and seamless decision making.
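The issue does not say how the split was implemented. As a minimal sketch, assuming the fused items can be detected as a lowercase letter immediately followed by an uppercase letter, a regular expression reproduces the [After] text from the [Before] text above. The function name `split_fused_items` is hypothetical.

```python
import re

# Zero-width boundary between a lowercase letter and an uppercase letter
# with nothing in between (e.g. "analysesCollaborate").
# Assumption: this heuristic stands in for the actual fix, which the issue
# does not describe.
FUSED_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def split_fused_items(text: str) -> str:
    """Insert a period wherever two items were joined without a separator."""
    return FUSED_BOUNDARY.sub(".", text)


if __name__ == "__main__":
    before = (
        "SegmentationMeasurementOmnichannelResource Allocation and "
        "OptimizationExperimental design (test/control) analysesCollaborate "
        "on cross-matrix teams"
    )
    print(split_fused_items(before))
```

Note that this heuristic also splits legitimate camel-case terms such as product names, so a real pipeline would probably need an exception list or would need to insert separators earlier, while the original markup is still available.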

Changes in the generated data

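  • The source input data was not preprocessed correctly and still contained HTML tags. This carried over into the generated data, so similar sentences could not be generated.
  • Periods were missing and semicolons were present. After the fix, periods are in place and unnecessary symbols have been removed.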

[Input Data] Source input data for the generation model
<span style="font-family:"Calibri",sans-serif">Conceptualize and design innovative models to assess viability of new ongoing initiatives and program offerings.<span style="font-family:"Calibri",sans-serif">Build out enterprise-wide framework for reporting.<span style="font-family:"Calibri",sans-serif">Perform ad-hoc analysis for internal and external stakeholders.<span style="font-family:"Calibri",sans-serif">Other duties as assigned.

[Before] Generated data
Large lt span style sans font family calibri lt sans Lt span style font family Calibri sans-serif Typescript, Helvetica Mockingbird (Scattletoad, HTML, PDF, TypeScript) Translate web features into HTML Change control functionality with CSS / CSS, CSS Interfaces (HTML, XML) for enhanced layouts Interpreting complex user interactions, and providing a clear and concise and effective interface Effectively manage the deployment of web applications Generate and maintain user-facing notifications, notifications and notifications to alert the user

[After] Generated data
Conceptualize alternatives and eventually design innovative simulation models to assess the effectiveness of existing technology solutions. Provide support and mentorship to internal groups on business performance analysis, design tools, and solutions using industry trends, trends in research and development, business practices, industry practices and practices. Work with and communicate effectively with cross-functional partners. Participate in the development and delivery of research initiatives that are aimed at solving business problems for the company. Responsible for delivering outstanding research products to audiences around the world.
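The cleanup of the input data described above could look roughly like the following. This is a sketch under stated assumptions, not the code actually used: it assumes the tags may arrive HTML-escaped (the "lt span style" fragments in the [Before] output suggest `&lt;`), that a simple regular expression is enough to drop tags, and that semicolons can be discarded outright. The function name `clean_input` is hypothetical.

```python
import html
import re

# Matches anything that looks like an HTML tag. A regex is used instead of
# an HTML parser because the sample input has malformed attribute quoting.
TAG_RE = re.compile(r"<[^>]+>")


def clean_input(raw: str) -> str:
    """Strip HTML tags and stray symbols, and end the text with a period."""
    text = html.unescape(raw)            # "&lt;span&gt;" becomes "<span>"
    text = TAG_RE.sub(" ", text)         # replace each tag with a space
    text = text.replace(";", " ")        # assumption: semicolons are noise
    text = re.sub(r"\s+", " ", text).strip()
    if text and text[-1] not in ".!?":   # add the missing final period
        text += "."
    return text


if __name__ == "__main__":
    raw = (
        '<span style="font-family:"Calibri",sans-serif">Build out '
        "enterprise-wide framework for reporting."
        '<span style="font-family:"Calibri",sans-serif">Other duties as assigned'
    )
    print(clean_input(raw))
```

Replacing tags with a space rather than an empty string keeps adjacent sentences from fusing together, which is the same failure mode seen in the training data.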
