Найти Ρ‚Π΅ΠΌΡƒ
10,2 тыс подписчиков

πŸ’¬ ΠŸΠΎΠ»Π΅Π·Π½Ρ‹Π΅ NLP инструмСнты: Π‘ΠΈΠ±Π»ΠΈΠΎΡ‚Π΅ΠΊΠ° fastText


fastText - это Π±ΠΈΠ±Π»ΠΈΠΎΡ‚Π΅ΠΊΠ° для Π°Π½Π°Π»ΠΈΠ·Π° ΠΈ классификации тСкста.

Π’ΠΎΡ‚ ΠΊΠ°ΠΊ Π·Π°Π³Ρ€ΡƒΠ·ΠΈΡ‚ΡŒ ΠΈ ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚ΡŒ ΠΏΡ€Π΅Π΄Π²Π°Ρ€ΠΈΡ‚Π΅Π»ΡŒΠ½ΠΎ ΠΎΠ±ΡƒΡ‡Π΅Π½Π½Ρ‹Π΅ ΠΌΠΎΠ΄Π΅Π»ΠΈ:

import fasttext
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(repo_id="facebook/fasttext-en-vectors", filename="model.bin")
model = fasttext.load_model(model_path)
model.words

['the', 'of', 'and', 'to', 'in', 'a', 'that', 'is', ...]

len(model.words)

145940

model['bread']

array([ 4.89417791e-01, 1.60882145e-01, -2.25947708e-01, -2.94273376e-01,
-1.04577184e-01, 1.17962055e-01, 1.34821936e-01, -2.41778508e-01, ...])

Π’ ΡΠ»Π΅Π΄ΡƒΡŽΡ‰Π΅ΠΌ ΠΏΡ€ΠΈΠΌΠ΅Ρ€Ρ‹ ΠΌΡ‹ Π±ΡƒΠ΄Π΅ΠΌ ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚ΡŒ ΠΌΠ΅Ρ‚ΠΎΠ΄ Π±Π»ΠΈΠΆΠ°ΠΉΡˆΠΈΡ… сосСдСй:

import fasttext
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(repo_id="facebook/fasttext-en-nearest-neighbors", filename="model.bin")
model = fasttext.load_model(model_path)
model.get_nearest_neighbors("bread", k=5)

[(0.5641006231307983, 'butter'),
(0.48875734210014343, 'loaf'),
(0.4491206705570221, 'eat'),
(0.42444291710853577, 'food'),
(0.4229326844215393, 'cheese')]

Π’ΠΎΡ‚ ΠΊΠ°ΠΊ ΠΈΡΠΏΠΎΠ»ΡŒΠ·ΠΎΠ²Π°Ρ‚ΡŒ эту модСль для опрСдСлСния языка ΠΈΠ· Π²Π²Π΅Π΄Π΅Π½Π½ΠΎΠ³ΠΎ тСкста:

import fasttext
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(repo_id="facebook/fasttext-language-identification", filename="model.bin")
model = fasttext.load_model(model_path)
model.predict("Hello, world!")

(('__label__eng_Latn',), array([0.81148803]))

model.predict("Hello, world!", k=5)

(('__label__eng_Latn', '__label__vie_Latn', '__label__nld_Latn', '__label__pol_Latn', '__label__deu_Latn'),
array([0.61224753, 0.21323682, 0.09696738, 0.01359863, 0.01319415]))

β–ͺGithub

1 ΠΌΠΈΠ½ΡƒΡ‚Π°
196 Ρ‡ΠΈΡ‚Π°Π»ΠΈ