

<h1>MKLLM-7B is the first Macedonian open-source large language model</h1>

<p>Published: 2024-07-02</p>

<p><a href="https://huggingface.co/spaces/trajkovnikola/MKLLM-7B-Instruct" data-type="link" data-id="https://huggingface.co/spaces/trajkovnikola/MKLLM-7B-Instruct">MKLLM-7B</a> is the first open-source large language model for Macedonian, created by a group of Macedonian IT professionals and enthusiasts by adapting Mistral-7B.
It was trained with limited resources and, according to its creators, it has an excellent command of the Macedonian language and is significantly better than other models of similar size.</p>

<p>MKLLM-7B is an open-source large language model for the Macedonian language. The model is built on top of the <a href="https://huggingface.co/mistralai/Mistral-7B-v0.1">Mistral-7B-v0.1</a> model by continued pretraining on a mix of Macedonian and English text. A corpus of around 300M tokens, repeated over 2 epochs, was used for training; while this may be considered small compared to other similar projects, the resulting model is very capable at understanding and processing Macedonian.</p>

<p>This is the instruction-tuned version of MKLLM-7B. It was trained by taking the MKLLM-7B base model and performing full instruction training with <a href="https://github.com/OpenAccess-AI-Collective/axolotl">axolotl</a>, using the chatml format for conversations.</p>

<p>We tested the model against Meta's Llama3-8B-Instruct and Mistral's Mistral-7B-Instruct-v0.3 on a set of benchmarks we translated into Macedonian, and the model performs better than both leading models in its category.
Additionally, these benchmarks focus primarily on understanding and do not measure generation capability and fluency; in those categories we believe the difference is even larger, as MKLLM-7B-Instruct writes much more coherent Macedonian. The benchmarking was done with: <a href="https://github.com/N13T/mk-llm-eval">https://github.com/N13T/mk-llm-eval</a></p>

<p>In order to leverage the instruction training, your prompt should follow the chatml format:</p>

<pre class="wp-block-code"><code>&lt;|im_start|&gt;system
Разговор помеѓу љубопитен корисник и асистент со вештачка интелигенција. Асистентот дава корисни, детални и љубезни одговори на прашањата на корисникот.&lt;|im_end|&gt;
&lt;|im_start|&gt;user
Која планета е позната како 'Црвената Планета'?&lt;|im_end|&gt;
&lt;|im_start|&gt;assistant
Марс&lt;|im_end|&gt;
</code></pre>

<p>This prompt is available as a <a href="https://huggingface.co/docs/transformers/main/chat_templating">chat template</a>, which means you can format messages using the <code>tokenizer.apply_chat_template()</code> method:</p>

<pre class="wp-block-code"><code>import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the instruction-tuned model and its tokenizer.
model_id = "trajkovnikola/MKLLM-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")

messages = [
    {"role": "system", "content": "Разговор помеѓу љубопитен корисник и асистент со вештачка интелигенција. Асистентот дава корисни, детални и љубезни одговори на прашањата на корисникот."},
    {"role": "user", "content": "Која планета е позната како 'Црвената Планета'?"},
]
gen_input = tokenizer.apply_chat_template(messages,
                                          tokenize=True,
                                          return_dict=True,
                                          return_tensors="pt",
                                          add_generation_prompt=True).to("cuda")
with torch.no_grad():
    generated_ids = model.generate(**gen_input,
                                   max_new_tokens=150,
                                   do_sample=True,
                                   temperature=0.1,
                                   repetition_penalty=1.1)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(generated_ids[0][gen_input["input_ids"].shape[1]:],
                       skip_special_tokens=False))
</code></pre>

<p><strong>Notes</strong></p>

<ul class="wp-block-list">
<li>MKLLM-7B-Instruct can hallucinate and produce factually incorrect output.
This is especially pronounced when discussing Macedonian topics due to the smaller training dataset.</li>
</ul>
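To see exactly what text the chat template feeds the model, the chatml layout shown above can be reproduced in a few lines of plain Python. This is a minimal stand-alone sketch for illustration only; `to_chatml` is a hypothetical helper name, and in practice the tokenizer's built-in `apply_chat_template()` is the authoritative formatter:

```python
def to_chatml(messages):
    """Render a list of {'role', 'content'} dicts in the chatml layout
    used by MKLLM-7B-Instruct. Illustrative sketch only; prefer the
    tokenizer's built-in chat template in real code."""
    rendered = ""
    for m in messages:
        rendered += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # The trailing assistant header cues the model to begin its reply,
    # mirroring add_generation_prompt=True in apply_chat_template().
    rendered += "<|im_start|>assistant\n"
    return rendered

messages = [
    {"role": "system", "content": "Разговор помеѓу љубопитен корисник и асистент со вештачка интелигенција."},
    {"role": "user", "content": "Која планета е позната како 'Црвената Планета'?"},
]
print(to_chatml(messages))
```

Passing this string to the model directly (with tokenization) should be equivalent to the `apply_chat_template()` call in the snippet above, as long as the template on the Hub has not changed.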