Stratified Data Split
Split Method
- Stratified split: the data is split so that each split preserves the distribution of industry labels
- design_idx grouping: all records of the same design stay in a single split, so no design is divided across splits
- Ratio: Train 70% / Val 10% / Test 20%
- Random Seed: 42
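The split method above can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/stratified_split.py: the function name, the dict-based record layout, and the rule that a design's stratum is the industry of its first record are all assumptions made for the example.

```python
import random
from collections import defaultdict


def grouped_stratified_split(records, ratios=(0.7, 0.1, 0.2), seed=42):
    """Split records into train/val/test by design_idx, stratified by industry.

    Assumptions (hypothetical, for illustration only): each record is a dict
    with "design_idx" and "industry" keys, and a design's stratum is the
    industry of its first record.
    """
    # Collect all records of a design so the design moves as one unit.
    by_design = defaultdict(list)
    for rec in records:
        by_design[rec["design_idx"]].append(rec)

    # Bucket designs by their (assumed) representative industry label.
    strata = defaultdict(list)
    for design_idx, recs in by_design.items():
        strata[recs[0]["industry"]].append(design_idx)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(strata):
        designs = sorted(strata[label])
        rng.shuffle(designs)  # seeded shuffle keeps the split reproducible
        n_train = round(len(designs) * ratios[0])
        n_val = round(len(designs) * ratios[1])
        parts = {
            "train": designs[:n_train],
            "val": designs[n_train:n_train + n_val],
            "test": designs[n_train + n_val:],  # remainder goes to test
        }
        for name, ids in parts.items():
            for design_idx in ids:
                splits[name].extend(by_design[design_idx])
    return splits
```

Because whole designs are assigned rather than individual records, the record-level ratios can drift slightly from 70/10/20 when designs have different numbers of records.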
Statistics
Number of designs
- Train: 44,022 designs
- Val: 6,228 designs
- Test: 12,736 designs
- Total: 62,986 designs
Number of records
- Train: 70,109 records
- Val: 9,981 records
- Test: 20,340 records
- Total: 100,430 records
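As a quick sanity check on the counts above, the realized share of each split can be computed at both the design and the record level. The snippet only re-uses the numbers listed in this section; the helper name `shares` is made up for the example.

```python
# Counts copied from the statistics above.
designs = {"train": 44022, "val": 6228, "test": 12736}
records = {"train": 70109, "val": 9981, "test": 20340}


def shares(counts):
    """Return each split's percentage of the total, rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


design_shares = shares(designs)
record_shares = shares(records)

print("designs:", sum(designs.values()), design_shares)
print("records:", sum(records.values()), record_shares)
```

Both levels land close to the 70/10/20 target. They do not match it exactly, which is expected when whole designs with varying record counts are assigned to splits.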
Industry distribution (top 10 in Train)
- Corporate/Business/Professional Services > Manufacturing/Heavy Industry/Machinery/Metals: 3,645 (5.20%)
- IT/Tech > IT/Web/Data: 2,602 (3.71%)
- Real Estate/Architecture/Environment > Architecture > Architectural Design/Interior Construction: 2,127 (3.03%)
- Cross-Industry > Plans/Reports/Proposals: 2,000 (2.85%)
- Cross-Industry > Facility Guidance/Office Management: 1,928 (2.75%)
- Medical/Health > Hospitals/Clinics/Medical Institutions: 1,640 (2.34%)
- Public/Institutions > Government/Public Agencies > Central Government/Local Governments: 1,628 (2.32%)
- Education/Career > Academies/Online Education/Other > General Learning Academies: 1,574 (2.25%)
- Food & Beverage/Dining > Ingredients/Food Retail > Agricultural Products/Produce/Seafood: 1,282 (1.83%)
- Real Estate/Architecture/Environment > Environment/Energy/ESG > Environmental Remediation/Waste: 1,228 (1.75%)
Verification
The industry distribution of each split stays close to the distribution of the full dataset.
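One way to check this claim, together with the design_idx grouping guarantee, is sketched below. The helper names and the dict-based record layout ("industry" and "design_idx" keys) are assumptions for illustration; the repository's own verification code is not shown here.

```python
from collections import Counter


def label_shares(records):
    """Fraction of records carrying each industry label."""
    counts = Counter(rec["industry"] for rec in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def max_share_gap(split_records, all_records):
    """Largest absolute difference in label share between a split and the full data."""
    overall = label_shares(all_records)
    split = label_shares(split_records)
    return max(abs(split.get(label, 0.0) - share) for label, share in overall.items())


def leaked_designs(splits):
    """design_idx values that appear in more than one split."""
    seen = Counter()
    for records in splits.values():
        for design_idx in {rec["design_idx"] for rec in records}:
            seen[design_idx] += 1
    return {d for d, n in seen.items() if n > 1}
```

A small `max_share_gap` for every split and an empty `leaked_designs` result would be consistent with the two properties described in this card.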
How to Reproduce
cd opensource
python scripts/stratified_split.py
Created: 2026-03-10. Method: Stratified sampling by industry labels with design_idx grouping