
topic proposal: auto-completion #4

Open
asteroidb612 opened this issue Mar 7, 2024 · 24 comments
Assignees
Labels
assigned Topic has been assigned in-content Is the issue in lesson content?
Milestone

Comments

@asteroidb612

asteroidb612 commented Mar 7, 2024

Original proposal:

I've been working on a Roc project recreating this toy neural network engine https://github.com/karpathy/micrograd, and I think it might be an interesting chapter.

Revised proposal (see thread below): show how autocompletion works.

@asteroidb612
Author

One reason this could be a bad chapter idea is that it requires some calculus thinking! But if this is for schools, maybe there's a lot of calculus going on there anyway.

@isaacvando
Contributor

I'd love to read that chapter!

@gvwilson gvwilson added discuss An issue or PR currently being discussed in-content Is the issue in lesson content? propose-addition A suggestion for an addition to content or infrastructure labels Mar 8, 2024
@gvwilson gvwilson added this to the topic-outline milestone Mar 8, 2024
@gvwilson
Collaborator

gvwilson commented Mar 8, 2024

My concern isn't with the math requirements, but whether programmers use neural networks when they're programming: most of the other tools are things like editors and linters that crop up regularly when building and deploying code.

@asteroidb612
Author

asteroidb612 commented Mar 8, 2024

A year ago, I would have agreed immediately - I never used any machine learning tools while learning to code. But I'm starting to see them used more, and I believe that ChatGPT was used as a low-reliability but occasionally very helpful tool in making Roc.

Maybe those cases were as obscure as making parsers! But maybe they're commonly useful? I am finding myself using ChatGPT as a faster-than-documentation search for how to use various libraries or frameworks.

@Anton-4
Collaborator

Anton-4 commented Mar 8, 2024

I use ChatGPT almost every day when working on Roc :) Neural nets also power many autocomplete tools. It's also possible to make the chapter about a tool that connects to e.g. the ChatGPT API to avoid getting into the math too much.

@gvwilson gvwilson changed the title Neural Network Chapter topic proposal: neural network Mar 8, 2024
@gvwilson
Collaborator

gvwilson commented Mar 8, 2024

I think that implementing a neural network would be a lot safer than using an external API - the latter are changing so rapidly right now that the chapter could be out of date as soon as it appears. Does Roc have something like NumPy that you could build the NN computations on? If not, could that be the first chapter, and the NN the second? (The JS and Py versions of the book build row-wise and column-wise dataframes in order to illustrate ideas about interface vs. implementation and using benchmarking to pick which implementation is best—could that work here?)

@Anton-4
Collaborator

Anton-4 commented Mar 8, 2024

the latter are changing so rapidly right now that the chapter could be out of date as soon as it appears.

Good point; another option would be to download a pre-trained neural network model from a stable URL and run it locally.

Does Roc have something like NumPy that you could build the NN computations on?

Not yet; someone on the Roc Zulip has been experimenting with matrices, but I have not looked at it closely yet.

I think explaining the inner workings of neural nets in depth is not feasible given the one-hour time limit. Andrej Karpathy, an excellent teacher, spends about 2h30m on it [1], [2]. That is also for a "vanilla" neural net, not the more complicated transformer models people actually use for coding assistance.

Making a tool that uses a downloaded neural net seems to have the best trade-offs.

@isaacvando
Contributor

I would be more interested in reading a chapter that implemented a neural net than one that used a preexisting one. I also don't think it is necessary to fully understand the topic after reading a chapter, and I suspect that a worthwhile treatment could still be done in an hour.

@Anton-4
Collaborator

Anton-4 commented Mar 9, 2024

That's reasonable, we can draft the chapter like that and see how we feel about it then :)

@asteroidb612
Author

asteroidb612 commented Mar 9, 2024

Andrej Karpathy's approach in that micrograd video is exactly what I'd like to present. I would crib his perspective, where we ignore optimizations like linear algebra. I would implement backpropagation on simple networks, like in the video you linked, @Anton-4. I think we could get it down to an hour if we remove some of the dotlang and Python operator-override content.
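To give a sense of what "backpropagation on simple networks, ignoring linear algebra" looks like, here is a minimal micrograd-style sketch in Python (the thread's Roc version is still hypothetical, and this is my own simplification of Karpathy's `Value` class, not code from the project): each value remembers how it was computed, and `backward` replays the chain rule in reverse.

```python
class Value:
    """A scalar that records its computation graph for backpropagation."""

    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None  # filled in by the op that produced us

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # Product rule: each input's gradient is scaled by the other input.
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    visit(c)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# d(a*b + a)/da = b + 1 = 4 when b = 3
a, b = Value(2.0), Value(3.0)
y = a * b + a
y.backward()
print(a.grad)  # 4.0
```

The calculus the chapter needs is only the chain rule plus the local derivative of `+` and `*`, which is why the one-hour budget might be reachable.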

Ideally we could have something useful at the end:

  • A network that can do word2vec
  • Identify a programming language given a file
  • etc.

I think once the backpropagation algorithm is understood, it's easy for us to say "Add lots more data / training time / clever network structure / $$$ and you have ChatGPT."
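As a hedged illustration of that closing point (my own Python sketch, not from the thread): the entire "learning" step is just a loop that nudges parameters against their gradients. Everything past this, loosely speaking, is more parameters and more data.

```python
# Fit a single neuron y = w*x + b by plain gradient descent on squared loss.
# All names here are illustrative; a chapter version would derive the
# gradients via the backpropagation machinery rather than by hand.

def train(samples, epochs=200, lr=0.1):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            pred = w * x + b
            err = pred - target   # gradient of 0.5*(pred - target)**2 w.r.t. pred
            w -= lr * err * x     # chain rule: dLoss/dw = err * x
            b -= lr * err         # chain rule: dLoss/db = err
    return w, b

# Learn y = 2x + 1 from three points.
w, b = train([(0, 1), (1, 3), (2, 5)])
print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```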

@asteroidb612
Author

I think that viewing machine learning through a functional programming lens is enlightening. Your neural network is just a function, and we can even write its type signature! But it's a function that we train instead of writing.

I have a hunch that Roc will actually be nice for this kind of thing! My progress was stalled by a lambda set error, but that has just been unblocked.

@asteroidb612
Author

If someone were building a dataframe chapter, it would be interesting to base this off that. Or maybe we make a third chapter combining the two basic chapters?

@gvwilson
Collaborator

gvwilson commented Mar 9, 2024

I still think that neural networks don't fit the "tools programmers use to program" theme, but I realize I might just be showing my age :-). I am more certain that there are two chapters here if we want to respect the "teachable in one hour" restriction per chapter:

  1. NumPy-in-Roc (NumRoc?), i.e., a linear algebra package. This could be pure Roc or a Roc wrapper around Polars.
  2. A neural network built on top of that linalg package.

If y'all agree, let's create a separate ticket for the linear algebra package and see who wants to take it on.
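For a sense of scale on chapter 1: the neural-net chapter mostly needs a matrix type and a handful of operations, with matrix multiplication as the workhorse. A Python sketch of the core (the "NumRoc" name and any Roc API are still hypothetical):

```python
def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(r) == inner for r in a), "shape mismatch"
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

The interesting chapter content would be exactly the interface-vs-implementation question raised above: row-major vs column-major storage, and benchmarking the two.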

@Anton-4
Collaborator

Anton-4 commented Mar 11, 2024

NumPy-in-Roc definitely sounds good!

I still think that neural networks don't fit the "tools programmers use to program" theme

I do agree; a more fitting possibility would be neural-net-based autocomplete, but that seems too large in scope.

@gvwilson
Collaborator

What about a more traditional autocomplete whose completion tree is updated incrementally based on what's currently in scope? I think most programmers rely on that in their editor - is that big enough/interesting enough for a chapter?

@Anton-4
Collaborator

Anton-4 commented Mar 11, 2024

is that big enough/interesting enough for a chapter?

I think so.

I see two possible approaches:

  • Use Roc as the language to be autocompleted and show how to fetch possible completions using the Roc language server. Language servers are definitely a commonly used and important tool.
  • Use English as the language to be autocompleted and have a much more self-contained example. So for example, given the text I typed in this comment, if I now were to type pos, it would suggest possible.
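The second option really is small. A Python sketch of the idea (function name and regex are my own, not from the thread): collect the words already typed, then complete a prefix against them, with more frequent words ranked first.

```python
import re
from collections import Counter

def completions(text, prefix):
    """Suggest words from `text` that extend `prefix`, most frequent first."""
    words = Counter(re.findall(r"[a-zA-Z']+", text.lower()))
    return [w for w, _ in words.most_common()
            if w.startswith(prefix.lower()) and w != prefix.lower()]

text = "given the text I typed in this comment, it would suggest possible"
print(completions(text, "pos"))  # ['possible']
```

The incremental version gvwilson asks about would update the word counts as the document changes instead of rescanning the whole text.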

@gvwilson
Collaborator

Can you do the latter first to show learners how incremental autocomplete works from the ground up? I think that building a small language server would be a great second chapter, but as a learner, I'd want to know what the magic is before relying on an external service to do it for me. (Cool idea, by the way...)

@Anton-4
Collaborator

Anton-4 commented Mar 11, 2024

Yeah that could work :)

I personally already have a lot to do with other Roc things, but any available motivated person could probably take on the first chapter of this. Are you interested in working on the second chapter about a tiny language server @faldor20?

@gvwilson gvwilson changed the title topic proposal: neural network topic proposal: auto-completion Mar 11, 2024
@gvwilson gvwilson added the help-wanted A request for assistance label Mar 11, 2024
@faldor20
Contributor

Yeah, I'd be interested in taking that on. I was actually thinking it would be cool to try building a language server framework in Roc on top of tower-lsp, so maybe we could have two examples: one showing Roc wrapped around an existing Rust framework, and one showing a pure Roc implementation that just talks over stdio?

The pure Roc one is a much bigger task, so I'd probably try to only show the most basic part: reading and writing JSON-RPC over stdio, handling some basic updates, and responding to one or two language server requests.
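For reference, the "most basic part" is the LSP base-protocol framing: a `Content-Length` header, a blank line, then that many bytes of JSON. A hedged Python sketch (helper names are my own; a Roc version would look different):

```python
import io
import json

def write_message(payload):
    """Frame a JSON-RPC payload as an LSP base-protocol message."""
    body = json.dumps(payload).encode("utf-8")
    return b"Content-Length: %d\r\n\r\n" % len(body) + body

def read_message(stream):
    """Read headers until the blank line, then exactly Content-Length bytes."""
    length = None
    while True:
        line = stream.readline().strip()
        if not line:
            break
        name, _, value = line.partition(b":")
        if name.lower() == b"content-length":
            length = int(value)
    return json.loads(stream.read(length))

framed = write_message({"jsonrpc": "2.0", "id": 1, "method": "initialize"})
print(read_message(io.BytesIO(framed))["method"])  # initialize
```

A stdio server is then just a loop of `read_message`, dispatch on `method`, `write_message`.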

I was imagining that I could either base everything off the Roc compiler and just kind of hand-wave how the actual calls work, or do something like @Anton-4 suggested and just turn every word in the text into a "symbol" and pretend it's the output of a compiler.

How in-depth would we like to go here? Well-made language servers tend to have a lot of pretty complex state management: they do a lot of caching, incremental updating, and recompilation. How far into the weeds do we want to get? Or should I just keep it as simple as "this is a naive implementation; here is where you could improve it in the real world"?

@gvwilson
Collaborator

I think it would be a lot more approachable to do the simple version first (here's a vocabulary, autocomplete from it) and then build the language server as a separate chapter - I don't believe both will fit into our one-hour-per-lesson limit, and I think the latter will be more comprehensible after people have seen the former. @faldor20 are you interested in doing the first part?

@faldor20
Contributor

I'm honestly unsure what you imagine the first part to look like?

I'm not sure it makes sense to implement any kind of autocomplete system with no foundation to actually use it in. I would argue the only really hard part of autocomplete for plain text is dealing with document updates and sending info out of the language server.

Implementing autocomplete as I imagine it is basically just a fuzzy search algorithm and a super simple parser that finds all the words in a document. In fact, in roc-ls we don't even have fuzzy autocomplete yet 😅
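To make that claim concrete (a Python sketch of my own, not code from roc-ls): the simplest "fuzzy" rule most editors start from is subsequence matching, where the query's characters must appear in order inside the candidate word.

```python
def fuzzy_match(query, word):
    """True if the characters of query appear, in order, inside word."""
    it = iter(word)
    # `ch in it` advances the iterator past each match, enforcing order.
    return all(ch in it for ch in query)

words = ["language", "server", "autocomplete", "algorithm"]
print([w for w in words if fuzzy_match("agr", w)])  # ['algorithm']
```

Real implementations add scoring (consecutive runs, word-boundary bonuses), but the core is this loop.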

But I think maybe I'm misunderstanding what you were imagining.

@faldor20
Contributor

faldor20 commented Mar 12, 2024

Oh, and I tried a quick mock-up of JSON-RPC parsing and realised Roc is unfortunately currently unable to parse JSON that contains unions (types like id: number | string), which makes implementing LSP in Roc impossible right now :(
(see this Zulip thread on JSON null handling)

@faldor20
Contributor

Okay, I went off and worked on my knowledge of abilities and decoders, and it is actually possible. I take it all back; I'll get my implementation done soon.

@gvwilson gvwilson added assigned Topic has been assigned and removed help-wanted A request for assistance discuss An issue or PR currently being discussed propose-addition A suggestion for an addition to content or infrastructure labels Mar 12, 2024
@gvwilson
Collaborator

Thanks @faldor20 - can you please create a subdirectory under the project root called completion and put your work there, along with an index.md file with notes to yourself? Cheers - Greg
