Scalable, cost-effective, and flexible data system for storing, processing and analyzing medical records from clinics in North America
The client is an international corporation that develops, manufactures and distributes medical products and services.
PROBLEM AND OBJECTIVES
The company has accumulated a huge amount of raw data from clinics and laboratories in North America. Specialized programs collected and recorded all information about patient visits, test results, purchases, etc. But due to the sheer volume of the data and its lack of classification, it could not be used for reporting, forecasting, and similar tasks.
The aim of the project was to develop a scalable, cost-effective and flexible system for storing, processing and analyzing big data from multiple clinics.
Our development team managed to annotate the unstructured data and bring it into a single structure for future use.
• Big data storage was implemented based on Apache HBase.
• For big data processing, our team first manually annotated a small sample of the data (approx. 100,000 records). We then created a decision tree (a list of specific questions asked of each object) to produce correctly classified and structured data.
• We implemented model building using the h2o.ai machine learning framework, training its built-in models on the annotated samples. These models can then be applied to the remaining data and extend the classification to newly arriving data as well.
• We developed a solution based on Apache and RabbitMQ to ensure that new data constantly arriving from external clients is ingested reliably.
• As a result, we obtained big data suitable for analysis and forecasting. It is possible to predict, for instance, when a client will return to the clinic based on information about their previous visits.
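The annotation step above, where a decision tree asks a fixed list of questions about each object, can be sketched as a small rule-based classifier. The field names and categories below are illustrative assumptions, not the client's actual clinical schema.

```python
# Minimal sketch of a decision-tree-style classifier: a fixed list of
# questions asked of each raw record. Field names and categories are
# illustrative assumptions, not the actual clinical schema.

def classify_record(record: dict) -> str:
    """Walk a small decision tree and return a record category."""
    # Question 1: does the record describe a purchase (a charge, no lab test)?
    if record.get("amount_charged", 0) > 0 and "test_code" not in record:
        return "purchase"
    # Question 2: does it carry a laboratory test code?
    if "test_code" in record:
        return "lab_result"
    # Question 3: does it reference a visit date?
    if "visit_date" in record:
        return "patient_visit"
    return "unclassified"

records = [
    {"patient_id": 1, "visit_date": "2020-01-15"},
    {"patient_id": 1, "test_code": "CBC", "visit_date": "2020-01-15"},
    {"patient_id": 2, "amount_charged": 120.0},
]
print([classify_record(r) for r in records])
# ['patient_visit', 'lab_result', 'purchase']
```

In the real project, labels produced this way served as training data for the h2o.ai models, which generalized the classification to the records that were never annotated by hand.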
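As a toy illustration of the return-visit forecast mentioned above, one naive baseline is to predict the next visit as the last visit plus the patient's mean interval between past visits. This is a simplified stand-in for illustration only; the production forecasts came from models trained with h2o.ai.

```python
# Naive return-visit forecast: last visit + mean gap between past visits.
# An illustrative baseline, not the client's production model.
from datetime import date, timedelta

def predict_next_visit(visits: list[date]) -> date:
    """Forecast the next visit date from at least two past visits."""
    visits = sorted(visits)
    gaps = [(b - a).days for a, b in zip(visits, visits[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return visits[-1] + timedelta(days=round(mean_gap))

history = [date(2020, 1, 10), date(2020, 4, 10), date(2020, 7, 12)]
print(predict_next_visit(history))
```

Even this simple baseline shows how structured visit histories, once extracted from the raw records, become directly usable for forecasting.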
All in all, the client received an easily scalable system that can process large amounts of data from clinics across the North America region effectively and at a reasonable price.
Need a technological solution?