it looks ok to me on a brief browse of the codebase... i have written tree walkers before but not full graphs, graphs are more complicated because of the loops
i'd need you to specify the packages and types and functions that you want me to review specifically, i mean, it all looks fairly routine worker loops with channel listener, context done and so forth
the documentation comments should not have () after the function/method name, but you can use an article (a, an, the) in front if appropriate to make a more concise or natural sentence.
also the Database interface is huge, i think it should be broken into several smaller specific interfaces, very often they come as pairs but can be several associated methods that interface with a specific type aspect of the data, you probably know of io.Writer and io.ReadWriterCloser and such, these are composited so you define one interface and then you embed (just put the interface name instead of a method) and it pulls those methods into the interface set
one of the biggest errors that you see in a lot of people's especially early code is mixing what should be two things into one, creating a tight coupling between things that should not be tied together, it makes refactoring and bugfixing harder
anyhow, as i said, i've not written full graph structures before... but a general rule with Go is to avoid recursion unless you know it won't go deep, it has no optimization for recursion, best to stick with iterators instead, but i think that's how you wrote it looking at the Walk stuff - i might only say that it can offen be convenient to write these so they pass a function that lets you put the work nearby the type definitions they relate to, and can also be a neat way to enable you to fan out to multiple threads without having to concern yourself with the concurrency on the "outside" of the type (like how http and filepath libraries let you define handlers, one for responding to requests, the other for doing tree walks)
Login to reply
Replies (2)
First of all, thanks for taking the time to look into it :)
> i'd need you to specify the packages and types and functions that you want me to review specifically, i mean, it all looks fairly routine worker loops with channel listener, context done and so forth
I am glad it looks fairly routine, as that was my goal since I've started using Go just 3 months ago.
> also the Database interface is huge, i think it should be broken into several smaller specific interfaces.
This is exactly the high-level feedback I am more interested in, thanks! Yes, the Database and RandomWalkStore interfaces are huge. First of all, do u think they should be interfaces at all? I've taken this approach for separating testing from performance optimization.
I first test that the algorithm and the data structures work together (e.g. the walks have the correct statistical distribution, and the algorithm uses that distribution correctly). For this purpose I found useful and quicker to use in-memory database mocks. The thing is that I need to tests on several graphs, and produce a big number of walks. Doing that on Redis without having it optimized will take a lot of time per test, possibly minutes.
By using mocks to test I reduced that to seconds. Once it works, I can test it with redis and optimize the perfomance.
The tradeoff is that I am testing on something that is not the production code but a mock, which also requires work to be maintained and so on.
Secondly, how would you split the interface?
The first idea that comes to mind is to have a DatabaseReader that has NodeByID, NodeByKey, NodeIDs, Pubkeys, Follows, Followers methods.
a DatabaseWriter that has AddNode and Update
a DatabaseScanner that has ScanNodes and AllNodes
Do u think this would be better?
with the interface decomposition, establish the common contexts that overlap between the methods and if you recognise they are independent then they can be made into their own, like for example if a set of methods mainly relate to filtering, then they are filtering methods, if a set of methods are about counting, they are counting methods, if they are iterators (or walkers) they are this type
it can be good to make an interface for anything that could have an entirely differently constructed API behind it, thus necessitating a shim or adapter that translates from one API to another (and this is why interfaces are an indispensible construct that is neglected by all but Java programmers other than Go)
and, regarding the last parts about the interface splitting there, i mentioned above from the first how to figure it out, it could be 2 or 3, just from what i saw, but i wasn't familiar enough with their semantics to catch that
creating a neutral interface for the database makes a lot of sense too, because you have used redis there, which is fine, but by using an interface instead of directly coupling to the other API you retain the flexibility to use a more optimal store, or more distributed, or byzantine fault tolerant even, you can also then too use "gateways" that cache access to other data and so on and so on
and i think i mentioned this, but making "iterator/walker function types" that the iterator calls with the data from the current node, you can simplify isolating your per-node scope but also enable concurrent safe access to a shared state, which is essential in order to parallelise the processing, again, you may not need it now but if your API calls methods that contain the node state data they can handle locking the data for teh case that concurrently that might be part of a pool of data objects, so not just for enabling synchronisation, but you can also enable bespoke memory and scheduling independant from the primary calculation work, in order to change the way the data flows through