Paper information
Key message of this paper
- (Summarize in one or two sentences)
What problem does the paper address?
- For SAE-based feature selection, identifying which features relate to the model's input and which influence its output
Why is this problem important?
- Sparse autoencoders (SAEs) are an effective way to select features to intervene on
- However, how to select features that are actually effective for intervention is still an open problem
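What is the key idea for solving the problem?
- Classify features into the following two types, and propose a score for each type
  - Input features: features that recognize patterns in the model's input
  - Output features: features that influence the tokens the model generates
- Both analyses use the Logit Lens
  - The Logit Lens projects model parameters into vocabulary space and analyzes them by inspecting the resulting distribution over tokens
- The input-feature score is computed over an arbitrary set of texts
  - The score is the rate of agreement between the tokens in that text set that activate the SAE feature most strongly and the tokens predicted by the Logit Lens
- The output-feature score uses the score, rank, and probability of the token predicted by the Logit Lens
  - The score is the difference between the model's output distribution when the feature is intervened on and the distribution before the intervention
  - The pre-intervention output distribution appears to be computed from the Logit Lens predictions, but I did not fully understand this part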
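
The Logit Lens projection and the input-feature score described above can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the function names `logit_lens_top_k` and `input_score`, the toy two-dimensional unembedding table, and the list of top-activating tokens are all made-up assumptions, and the exact scoring formula in the paper may differ.

```python
from typing import Dict, List, Sequence


def logit_lens_top_k(
    feature_direction: Sequence[float],
    unembedding: Dict[str, Sequence[float]],
    k: int,
) -> List[str]:
    """Project a feature direction into vocabulary space and return the top-k tokens.

    Each token's logit is the dot product of the feature direction with that
    token's unembedding vector (toy stand-in for a real unembedding matrix).
    """
    logits = {
        token: sum(d * w for d, w in zip(feature_direction, weights))
        for token, weights in unembedding.items()
    }
    # Highest logit first; ties are broken alphabetically for determinism.
    ranked = sorted(logits, key=lambda token: (-logits[token], token))
    return ranked[:k]


def input_score(top_activating_tokens: Sequence[str], lens_tokens: Sequence[str]) -> float:
    """Fraction of top-activating tokens that also appear among the Logit Lens tokens."""
    if not top_activating_tokens:
        return 0.0
    lens_set = set(lens_tokens)
    hits = sum(1 for token in top_activating_tokens if token in lens_set)
    return hits / len(top_activating_tokens)


# Toy data (hypothetical): a 4-token vocabulary with 2-dimensional vectors.
UNEMBEDDING = {
    "cat": [1.0, 0.0],
    "dog": [0.8, 0.2],
    "car": [0.0, 1.0],
    "the": [0.1, 0.1],
}
FEATURE_DIRECTION = [1.0, 0.0]  # assumed decoder direction of one SAE feature
TOP_ACTIVATING = ["cat", "cat", "dog", "the"]  # assumed strongest activations in a text set

lens_tokens = logit_lens_top_k(FEATURE_DIRECTION, UNEMBEDDING, k=2)
score = input_score(TOP_ACTIVATING, lens_tokens)
print(lens_tokens, score)
```

In this toy setup the feature direction points at "cat" and "dog", and three of the four top-activating tokens fall in that set, so the feature would look like an input feature under this simplified score.
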
What are the new findings?
- When the scores above were applied to Gemma and Llama, Gemma showed high input-feature scores in layers near the input and high output-feature scores in layers near the output.
  - This trend did not hold for the other models
- The paper measured how the output text changes when intervening on features with a high output-feature score
  - In the experiments, the features to intervene on are selected by thresholding the score
  - Evaluation uses Generation Success@K.
    - It computes the rate of agreement between the top-k tokens predicted by the Logit Lens and the tokens in the generated text.
  - Raising the threshold was found to increase Generation Success@K as well
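
The threshold-based selection and the Generation Success@K check can be sketched as below. This is a hedged illustration only: the names `select_features`, `generation_success_at_k`, and `success_rate` are hypothetical, whitespace splitting stands in for a real tokenizer, counting a generation as a success when any top-k Logit Lens token appears in it is an assumed reading of the metric, and all scores and texts are made-up toy values.

```python
from typing import Dict, List, Sequence


def select_features(output_scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep the features whose (hypothetical) output score reaches the threshold."""
    return sorted(name for name, score in output_scores.items() if score >= threshold)


def generation_success_at_k(text: str, lens_tokens: Sequence[str], k: int) -> bool:
    """True if any of the first k Logit Lens tokens occurs in the generated text.

    Whitespace splitting is a stand-in for the model's tokenizer (an assumption).
    """
    words = set(text.lower().split())
    return any(token.lower() in words for token in lens_tokens[:k])


def success_rate(
    features: Sequence[str],
    generations: Dict[str, str],
    lens_tokens: Dict[str, Sequence[str]],
    k: int,
) -> float:
    """Fraction of the given features whose steered generation counts as a success."""
    if not features:
        return 0.0
    hits = sum(
        generation_success_at_k(generations[name], lens_tokens[name], k)
        for name in features
    )
    return hits / len(features)


# Toy values (all hypothetical): output scores, Logit Lens tokens, steered outputs.
OUTPUT_SCORES = {"f_cat": 0.9, "f_car": 0.4, "f_the": 0.05}
LENS_TOKENS = {"f_cat": ["cat", "kitten"], "f_car": ["car", "truck"], "f_the": ["the", "a"]}
GENERATIONS = {
    "f_cat": "my cat sleeps all day",
    "f_car": "we walked to the park",
    "f_the": "nothing changed here",
}

for threshold in (0.3, 0.5):
    selected = select_features(OUTPUT_SCORES, threshold)
    rate = success_rate(selected, GENERATIONS, LENS_TOKENS, k=2)
    print(threshold, selected, rate)
```

The toy numbers only show the mechanics of filtering by score and then scoring the generations; they say nothing about the values reported in the paper.
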
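What questions remain?
- It bothers me that Gemma is the only model for which the scores give a clean result
  - The intervention results do show a similar trend
- In the end, are features with a high output-feature score the good ones?
- I did not really understand the intervention method
  - I would like to know how the direction is chosen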