** summary **
- reboot the project
- current status
Rebooting the project
A technical analyzer is one of those projects I keep returning to. I had already laid a solid foundation with TaLib4J for the technical calculations, so it is usually the glue code that gets cleaned up and changed with each new iteration. This time around I’ve taken a new approach: building a team around the project to push me to turn it into a product.
I’ve created a multi-module Spring project with the following modules: API, Databus, Engine and corelib. The API is the gateway for clients to access analysis data, and it also takes care of user management. The Databus module is an API that the API module uses to access the data. The Engine is a command-line application that does the actual calculations, and corelib contains the common classes shared by the other modules.
After initial testing it turned out that running the Databus as a separate service comes with severe performance penalties due to JSON data conversions and the like, so I integrated the Databus module into corelib. You want to go service-oriented until you realize that only 2 other services want to consume it, and then you decide to integrate it back. Keeping the Engine separate made sense, though, as it runs as a batch processor.
Current status
As of today there are 33 unique scanners, and I also run every 2-element combination of these scanners, for a total of C(33, 2) = 33 × 32 / 2 = 528 scanner combinations. This produces around 6M scan results for 20 months’ worth of stock EOD (end-of-day) data. I want to increase the combination count, but I’m not sure whether the server I have in mind for production deployment can handle the volume. That’s on my to-try list, and I will report how it goes.
I have most of the basic user management features completed in the API, except for the payment and transactional email integrations.
I’m also adding performance evaluation for the scanners: starting from the date a signal was generated, a test runs forward and checks whether the price went up or down by X ATRs, confirming the signal. This calculation is affected by the number of scan results, so increasing the combination count will have an impact on the performance calculation. I’ve asked a question on the Mathematics Stack Exchange about calculating the conditional probability of 2 technical indicators but have yet to receive a satisfying answer. Being able to calculate a reasonably accurate approximation (~1% error, maybe?) would mean that I do not have to actually run all the combinations to get their scores. I’m also using an error rate calculation based on Z-tables to give a confidence interval on the scanner score.
A nice optimization was to keep all the stock OHLCV data in cache and use a binary search to query the data between given dates instead of hitting the DB each time. In the earlier version I kept the raw OHLCV data in files, but reading them into the cache takes longer than reading them from a DB, and a DB also opens up different ways of querying, which I will need in the future.
Yesterday I noticed that one of the scanners used 4 conditions to check for a signal: ma5 > ma26, ma26 > ma50, ma50 > ma200, stoch < 20.
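Going back to the cache lookup for a moment: the date-range query over cached OHLCV data could look roughly like the sketch below. It is not the project's actual code. The `OhlcvCache` and `Bar` names and the field layout are made up for illustration, and it assumes the bars for one symbol are already loaded and sorted by date in ascending order.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory cache for one symbol's end-of-day bars.
class OhlcvCache {
    // One EOD bar; the field names are illustrative assumptions.
    static final class Bar {
        final LocalDate date;
        final double open, high, low, close;
        final long volume;

        Bar(LocalDate date, double open, double high, double low, double close, long volume) {
            this.date = date;
            this.open = open;
            this.high = high;
            this.low = low;
            this.close = close;
            this.volume = volume;
        }
    }

    private final List<Bar> bars; // must be sorted ascending by date

    OhlcvCache(List<Bar> sortedBars) {
        this.bars = new ArrayList<>(sortedBars);
    }

    // Binary search: index of the first bar whose date is >= target,
    // or bars.size() if every bar is earlier than target.
    private int lowerBound(LocalDate target) {
        int lo = 0;
        int hi = bars.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (bars.get(mid).date.isBefore(target)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // All bars with from <= date <= to (both ends inclusive).
    // Returns a view of the cached list, so no bars are copied.
    List<Bar> between(LocalDate from, LocalDate to) {
        if (from.isAfter(to)) {
            return new ArrayList<>();
        }
        int start = lowerBound(from);
        int end = lowerBound(to.plusDays(1)); // first bar strictly after 'to'
        return bars.subList(start, end);
    }
}
```

Two binary searches find the start and end of the range in O(log n) each, and the slice itself costs nothing extra, which is why this beats a DB query per lookup once the data is already in memory.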
Seeing those four conditions bundled into one scanner led me to the idea that I should make each condition a scanner on its own and brute-force my way through all of the combinations, calculating the score to find the ideal setup for each symbol. Maybe the best results for a stock come when ma5 < ma26, ma26 > ma50, … because a short-term fall in the price was needed for the stoch to reach a low, etc.
I also implemented an optimization for the score calculator yesterday. Before the optimization I was looping over each symbol, fetching the scan results for that symbol and calculating the scanner scores from those results. This made too many DB round trips. I changed it to loop over each scanner combination and store the scan results in a map keyed by symbol. This is one fewer loop and fewer DB requests.
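To make the reworked score loop concrete, here is a minimal sketch of the idea: one fetch per scanner combination, then an in-memory grouping by symbol. Everything named here is an assumption for illustration. `ScoreCalculator`, `ScanResult`, the `confirmed` flag and the `fetchResults` function stand in for the real classes and DB access, and the hit rate is only a placeholder for the actual score formula.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical score calculator: one fetch per scanner combination.
class ScoreCalculator {
    // One scan hit. 'confirmed' is an assumed flag saying whether the
    // later price move confirmed the signal.
    static final class ScanResult {
        final String symbol;
        final boolean confirmed;

        ScanResult(String symbol, boolean confirmed) {
            this.symbol = symbol;
            this.confirmed = confirmed;
        }
    }

    // Groups the results of one combination by symbol in a single pass.
    static Map<String, List<ScanResult>> groupBySymbol(List<ScanResult> results) {
        Map<String, List<ScanResult>> bySymbol = new HashMap<>();
        for (ScanResult r : results) {
            bySymbol.computeIfAbsent(r.symbol, k -> new ArrayList<>()).add(r);
        }
        return bySymbol;
    }

    // Placeholder score: share of confirmed signals (0.0 for no results).
    static double hitRate(List<ScanResult> results) {
        if (results.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (ScanResult r : results) {
            if (r.confirmed) {
                hits++;
            }
        }
        return (double) hits / results.size();
    }

    // Returns combination -> symbol -> score.
    // fetchResults is called exactly once per combination (the DB round trip).
    static Map<String, Map<String, Double>> scoreAll(
            List<String> combinations,
            Function<String, List<ScanResult>> fetchResults) {
        Map<String, Map<String, Double>> scores = new HashMap<>();
        for (String combination : combinations) {
            Map<String, List<ScanResult>> bySymbol = groupBySymbol(fetchResults.apply(combination));
            Map<String, Double> perSymbol = new HashMap<>();
            for (Map.Entry<String, List<ScanResult>> e : bySymbol.entrySet()) {
                perSymbol.put(e.getKey(), hitRate(e.getValue()));
            }
            scores.put(combination, perSymbol);
        }
        return scores;
    }
}
```

With this shape the number of DB requests grows with the number of combinations only, instead of with symbols times combinations. The trade-off is that all results for one combination sit in memory at once.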