
About the results of the Actinomock Example #5

Open
WayneWu01 opened this issue Jun 30, 2022 · 20 comments

Comments

@WayneWu01

I have a question about how to evaluate the results of the example. I read your paper, but I don't know how to find those FN values. Could you elaborate for me?

@Lizhen0909
Owner

You may find examples in the 'notebook' folder. (The data link in the notebook may not exist anymore, but the data can be downloaded from https://www.amazon.com/clouddrive/share/eTIKYVLckXUCMnMQSpO8TCqZOwekmBrx23ZhMa3XO8d.)

Also, a Python wrapper for the pretrained models can be found at https://github.com/Lizhen0909/pyLSHVec.

@WayneWu01
Author

WayneWu01 commented Jul 1, 2022 via email

@Lizhen0909
Owner

There are two tasks: embedding and classification.
For the actinomock example data, the second column looks like '47914-2616644829-Gammaproteobacteria-Proteobacteria'.
Here any of Proteobacteria, Gammaproteobacteria, or 2616644829 can be the label of this row (depending on which level of the taxonomy hierarchy you want).
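As an illustration, one could split that column and pick the label at a chosen level like this (a minimal sketch; the field order id-taxid-class-phylum is assumed from the example value above, and `pick_label` is a hypothetical helper, not part of LSHVec):

```python
# Sketch: extract a label at a chosen taxonomy level from the second
# column of the actinomock example data. The field order assumed here
# (read id, tax id, class, phylum) follows the example value above.
def pick_label(column_value: str, level: str = "phylum") -> str:
    read_id, tax_id, class_name, phylum = column_value.split("-")
    return {"taxid": tax_id, "class": class_name, "phylum": phylum}[level]

value = "47914-2616644829-Gammaproteobacteria-Proteobacteria"
print(pick_label(value, "phylum"))  # Proteobacteria
print(pick_label(value, "class"))   # Gammaproteobacteria
print(pick_label(value, "taxid"))   # 2616644829
```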

########
Embedding is an unsupervised model that produces embedding vectors for kmers/sequences.
Since it is unsupervised, there are no numerical metrics.
Also, because the dimension is high (e.g. 100), we have to use tools like t-SNE or UMAP to visualize the vectors in a low dimension (e.g. 2).
Then, with prior knowledge of the sequences (a.k.a. sequence labels, which are not used in training), we can judge whether the vectors are good or not (vectors with the same label are expected to cluster together).
(You may find visualization examples of language embeddings at https://projector.tensorflow.org/)
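The visualization step can be sketched like this (random vectors stand in for real kmer/sequence embeddings, and the t-SNE parameters are illustrative, not prescribed by LSHVec):

```python
# Sketch: project high-dimensional embedding vectors down to 2-D with
# t-SNE so they can be scatter-plotted and judged against known labels.
# `vectors` and `labels` are random placeholders for real embeddings
# (e.g. dimension 100) and held-out taxonomy labels.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
vectors = rng.normal(size=(200, 100))   # 200 sequences, 100-d embeddings
labels = rng.integers(0, 3, size=200)   # prior-knowledge labels, unused in training

coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(vectors)
print(coords.shape)  # (200, 2) -- ready to scatter-plot, colored by `labels`
```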

###########
Classification is exactly the same as in machine learning.
Basically, the data is split into training and test datasets, the model is trained on the training dataset, and metrics are reported on the test dataset.
All metrics that apply to multiclass classification problems (accuracy, precision, recall, F1, AUC) can be used. I did not understand what FN meant earlier; I think you mean false negatives, which are part of the definition of precision and recall.
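To make the connection concrete, here is a small sketch of the usual multiclass metrics, including where the false negatives (FN) come from (the toy labels are made up for illustration):

```python
# Sketch: standard multiclass metrics on a held-out test set, plus the
# per-class false negatives (FN) that precision/recall are built from.
# `y_true`/`y_pred` are toy labels standing in for real test results.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

y_true = ["Proteobacteria", "Firmicutes", "Proteobacteria", "Firmicutes", "Proteobacteria"]
y_pred = ["Proteobacteria", "Proteobacteria", "Proteobacteria", "Firmicutes", "Firmicutes"]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)

# FN for a class = row sum of the confusion matrix minus its diagonal entry
cm = confusion_matrix(y_true, y_pred, labels=["Firmicutes", "Proteobacteria"])
fn = cm.sum(axis=1) - cm.diagonal()
print(acc, fn)  # 0.6 [1 1]
```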

@WayneWu01
Author

WayneWu01 commented Jul 1, 2022 via email

@Lizhen0909
Owner

Use the command line:
lshvec predict <model.bin> <test_data_file>
or
lshvec predict-prob <model.bin> <test_data_file>

@WayneWu01
Author

WayneWu01 commented Jul 3, 2022 via email

@Lizhen0909
Owner

Hashed data.
It should be the same as the training data.

@WayneWu01
Author

WayneWu01 commented Jul 7, 2022 via email

@WayneWu01
Author

WayneWu01 commented Jul 8, 2022 via email

@Lizhen0909
Owner

It should be something like this:

.../lshvec predict .../model.bin .../test.hash

Here predict or predict-prob is a subcommand; you can find them in src/main_fastseq.cc.
If you run the command without arguments, it should print the usage.
For example:

lshvec # prints all subcommands
lshvec predict # prints usage for the predict subcommand

@WayneWu01
Author

WayneWu01 commented Jul 9, 2022 via email

@WayneWu01
Author

WayneWu01 commented Jul 11, 2022 via email

@Lizhen0909
Owner

It seems that your model was trained for embedding but not for classification.
You may refer to https://github.com/Lizhen0909/LSHVec/blob/master/notebook/lsa_spike_fnv_classfication.ipynb, which is a bit older.
Notice cell 35, which uses the subcommand "supervised".

@WayneWu01
Author

WayneWu01 commented Jul 12, 2022 via email

@Lizhen0909
Owner

From the 'predict' subcommand you get predictions. Then it is up to you to evaluate the performance; there is no standard way.
For example, compare the predictions to the ground-truth labels to get accuracy, precision, and recall.
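That comparison can also be done "by hand" without any library, which makes the definitions explicit (a minimal sketch; the toy labels below are placeholders, and how you obtain `y_pred` from the predict output depends on its actual format):

```python
# Sketch: compare predicted labels against ground truth to get accuracy
# plus per-class precision and recall computed from first principles.
from collections import Counter

y_true = ["A", "B", "A", "A", "B"]  # ground-truth labels (placeholder)
y_pred = ["A", "B", "B", "A", "B"]  # labels from the predict step (placeholder)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)   # true positives per class
precision = {c: tp[c] / n for c, n in Counter(y_pred).items()}
recall = {c: tp[c] / n for c, n in Counter(y_true).items()}
print(accuracy, precision, recall)
```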

@WayneWu01
Author

WayneWu01 commented Jul 14, 2022 via email

@WayneWu01
Author

WayneWu01 commented Oct 11, 2022 via email

@WayneWu01
Author

WayneWu01 commented Oct 11, 2022 via email

@WayneWu01
Author

WayneWu01 commented Oct 11, 2022 via email
