
Introduction

Reverse Dictionary
Unlike an ordinary dictionary, which tells you the meaning when you enter a word, a reverse dictionary tells you the words corresponding to a meaning when you enter a sentence describing it.

I used μš°λ¦¬λ§μƒ˜ dataset, which consists of a lot of information such as word, word meanings, word types, synonymsm, and example sentences. Then, only the words and their meaning were separated to fit the model input structure.

Model training process

Because I worked in a Colab environment, I used Unsloth, a fine-tuning optimization library that is useful when GPU resources are limited.

I used the gemma-2-9b-bnb-4bit model among the models supported by Unsloth. This model is 4-bit quantized, and I trained it after adjusting the parameters. However, evaluation could not be performed during training because it ran out of memory, so the entire dataset was used for training.
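The setup above can be summarized as a configuration sketch. Every number below is an assumed placeholder, not this model's actual hyperparameters; in the real run such values would be passed to Unsloth's `FastLanguageModel.from_pretrained` / `get_peft_model` and TRL's `SFTTrainer`.

```python
# Illustrative fine-tuning configuration (all values are assumptions).
config = {
    "model_name": "unsloth/gemma-2-9b-bnb-4bit",  # 4-bit quantized base model
    "load_in_4bit": True,         # quantization keeps the 9B model inside Colab VRAM
    "max_seq_length": 1024,       # definitions are short, so a modest limit suffices
    "lora_r": 16,                 # LoRA rank: train small adapters, not full weights
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "eval_strategy": "no",        # in-training evaluation hit out-of-memory
}

def effective_batch_size(cfg):
    """Gradient accumulation multiplies the small per-device batch,
    trading wall-clock time for memory."""
    return cfg["per_device_train_batch_size"] * cfg["gradient_accumulation_steps"]
```

Disabling in-training evaluation (`eval_strategy: "no"`) is one common workaround when the evaluation pass, not the training pass, is what exhausts GPU memory.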

You can find the detailed code in the GitHub repository linked in the references below.

Result

An example inference is as follows:

[Image: example inference output]

First, I tested 10 simple words:

λΉ„ν–‰κΈ° (airplane) - An aircraft that rises into the air and flies using the lift generated by the force of an engine-driven propeller or of expelled combustion gas.

κ°€λ°© (bag) - An implement made so that things can be put inside and carried in the hand or over the shoulder.

고양이 (cat) - A member of the cat family. Originally domesticated from the African wildcat, it has especially well-developed jaws and canine teeth and is mainly carnivorous. Its claws can be freely retracted or extended, and its eyes see well even in the dark. It has also been bred as a pet, and there are many breeds.

μ˜ν™” (movie) - A composite art that films moving subjects carrying a certain meaning and reproduces them on a screen with a projector.

μžλ™μ°¨ (automobile) - A vehicle fitted with an engine, made to move over the ground on wheels driven by that power, without relying on rails or installed lines. There are passenger cars, vans and buses, trucks, special-purpose vehicles, and two-wheeled vehicles.

λ°”λ‚˜λ‚˜ (banana) - An evergreen perennial herb of the family Musaceae. It grows 3 to 10 meters tall; shoots shaped like bamboo sprouts emerge from an underground bulb, 8 to 10 long oval green leaves grow in a cluster, and the long leaf sheaths overlap to form a pseudostem. In early summer a large flower stalk emerges and pale yellow flowers bloom in spikes; the fruit is edible. It is native to the tropics and is grown in greenhouses in Korea.

컴퓨터 (computer) - A high-speed automatic calculator that uses electronic circuits. It is widely used for numerical calculation, automatic control, data processing, office administration, and the processing of language and image information.

사과 (apple) - The fruit of the apple tree.

μ±… (book) - An object made by binding many sheets of paper together.

학ꡐ (school) - An institution that continuously provides education to students according to a fixed purpose, curriculum, facilities, system, and regulations.

The model guessed 7 of the 10 words correctly, and for 2 more it output similar words.
For evaluation, 10% of the dataset was used as a test set.
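The inference step can be sketched as building a definition-only prompt and parsing the model's completion back to a single word. The template and the hard-coded completion below are assumptions for illustration; the actual generation call (e.g. `model.generate`) is omitted.

```python
# Hypothetical inference-side prompt: definition in, word out.
INFER_PROMPT = (
    "Below is a definition. Answer with the word it describes.\n"
    "### Definition:\n{definition}\n"
    "### Word:\n"
)

def build_prompt(definition):
    return INFER_PROMPT.format(definition=definition)

def parse_word(generated, prompt):
    """Causal LMs typically echo the prompt; strip it and keep
    only the first line of the completion."""
    completion = generated[len(prompt):] if generated.startswith(prompt) else generated
    return completion.strip().splitlines()[0]

prompt = build_prompt("The fruit of the apple tree.")
# Stand-in for model output; a real run would call model.generate here.
generated = prompt + "사과\n"
word = parse_word(generated, prompt)
```

Keeping only the first line of the completion guards against the model continuing past the answer with extra text.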

References

https://github.com/teddylee777/langchain-kr/tree/main/18-FineTuning




Model tree for hj08/Gemma2-9b-reverse-dictionary
