User data collection system for an online shop

A Python-based solution that automates user data collection and improves user analytics.

Customer

Industry: eCommerce
Region: USA
Client since: 2019

Our client is a major US online shop. Its main range is clothing from a variety of brands for men, women, and children.

Detailed information about the client cannot be disclosed under the provisions of the NDA.

Challenge

We have been working with an online platform that provides the opportunity to purchase clothes and accessories from various brands.

Our client brought us the architecture of a recommendation system, still under active development, for analytics and for collecting data on user activity.

Solution

Our goal was to create a data collection and processing system that gives shoppers recommendations for relevant goods and gives the client more pertinent information on shopper activity, all in one place.

Data user analytics
The platform was not developed from scratch; we modified it within the scope of the client’s tasks. The solution is built on cloud technologies, which reduces DevOps costs because the cloud provider supplies the services the system needs. Data is collected from what customers buy or add to the cart, their clicks, mouse movements, and so on. The system then builds models that suggest goods shoppers are likely to want. We were responsible for arranging accurate data collection.

OPTIMIZATION OF QUERIES FOR UPLOADING METRICS INTO SNOWFLAKE

We were provided with a huge file of a couple of thousand rows containing various SQL queries. The client collected data from different tables and calculated various metrics. Much of the code was repetitive, so we built a query generator that fills a handful of code templates with different input values and executes the results, instead of maintaining many near-identical queries. The result is a convenient, flexible, and scalable tool for quickly and dynamically adding queries that calculate new metrics.
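The template-based approach described above can be sketched as follows. This is a minimal illustration, not the client's generator: the template text, the metric definitions, and the function names `generate_queries` and `run_all` are all hypothetical, and only Python's standard `string.Template` is used.

```python
from string import Template

# Hypothetical template: one parameterized query replaces many near-identical copies.
METRIC_TEMPLATE = Template(
    "SELECT $group_by, $aggregate($column) AS $metric_name\n"
    "FROM $table\n"
    "GROUP BY $group_by"
)

# Hypothetical metric definitions; adding a metric means adding one entry here.
METRICS = [
    {"metric_name": "total_revenue", "aggregate": "SUM", "column": "amount",
     "table": "orders", "group_by": "user_id"},
    {"metric_name": "cart_adds", "aggregate": "COUNT", "column": "event_id",
     "table": "cart_events", "group_by": "user_id"},
]


def generate_queries(template, metrics):
    """Render one SQL string per metric definition."""
    # substitute() raises KeyError when a definition lacks a placeholder,
    # so incomplete entries fail early instead of producing broken SQL.
    return [template.substitute(metric) for metric in metrics]


def run_all(queries, execute):
    """Pass each rendered query to an execute callable, e.g. a database cursor's execute."""
    for query in queries:
        execute(query)
```

Because table and column names cannot be bound as ordinary query parameters, a sketch like this assumes the metric definitions come from trusted, version-controlled configuration rather than from user input.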

DATA MANAGEMENT AUTOMATION

AWS is Amazon’s cloud platform, which lets app providers, ISVs, and vendors quickly and securely host their solutions, whether an existing app or a new SaaS-based app. AWS Systems Manager Parameter Store provides secure storage for configuration data and passwords. Our task was to automate adding new configuration values, passwords, and other sensitive data, and updating outdated ones, so that a user doesn’t have to do it manually through the graphical interface.
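One way to automate such updates is sketched below, under stated assumptions. The function name `sync_parameters`, its arguments, and the default `SecureString` type are illustrative choices, not the project's actual code. The client object is assumed to behave like `boto3.client("ssm")`, whose `get_parameter` and `put_parameter` calls are the only ones used.

```python
def sync_parameters(ssm_client, desired, parameter_type="SecureString"):
    """Create missing parameters and overwrite outdated ones.

    ssm_client is expected to behave like boto3.client("ssm").
    desired maps parameter names to the values they should hold.
    Returns the names that were created or updated.
    """
    changed = []
    for name, value in desired.items():
        try:
            response = ssm_client.get_parameter(Name=name, WithDecryption=True)
            current = response["Parameter"]["Value"]
        except ssm_client.exceptions.ParameterNotFound:
            current = None  # the parameter does not exist yet

        if current == value:
            continue  # already up to date, nothing to write

        ssm_client.put_parameter(
            Name=name,
            Value=value,
            Type=parameter_type,
            Overwrite=True,  # required to replace an existing value
        )
        changed.append(name)
    return changed
```

With real credentials this would be called as `sync_parameters(boto3.client("ssm"), {...})`. Reading the current value first keeps the script from creating a new parameter version when nothing has changed.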

AIRFLOW SETUP

In Airflow, workflows are expressed as DAGs (directed acyclic graphs), where each step is defined as a task. Airflow is built on the idea that data extraction, transformation, loading, and manipulation processes are best expressed as code, which makes it possible to iterate on workflows quickly and efficiently. Because Airflow is highly effective at organizing and scheduling data pipeline workflows, we use it to run pre-scheduled events. A DAG can run hourly or, for example, every 3 hours 30 minutes. If all the tasks in a DAG complete successfully, the DAG run is considered successful. This is convenient because DAGs run on schedule with no manual action needed.
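The success rule for a DAG run can be illustrated in plain Python. The sketch below is not Airflow code and uses none of its API; in Airflow the scheduler does this work. The function name `run_dag` and the state labels are assumptions chosen to mirror the description above, and the ordering comes from the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter


def run_dag(dependencies, tasks):
    """Run tasks in dependency order and return (dag_state, task_states).

    dependencies maps a task name to the set of tasks it depends on.
    tasks maps a task name to a zero-argument callable.
    """
    states = {}
    # static_order() yields each task only after all of its upstream tasks.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, ())
        if any(states[up] != "success" for up in upstream):
            # Not run: at least one upstream task did not succeed.
            states[name] = "upstream_failed"
            continue
        try:
            tasks[name]()
            states[name] = "success"
        except Exception:
            states[name] = "failed"
    # The run succeeds only when every task succeeded.
    dag_state = "success" if all(s == "success" for s in states.values()) else "failed"
    return dag_state, states
```

For an extract, transform, load chain, a failure in the transform step leaves the load step unexecuted and marks the whole run as failed, which is the behaviour the paragraph above describes.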

DATABRICKS MANAGEMENT

We created new jobs that read data from the client’s S3 bucket, process it, and upload the results directly to DynamoDB. These jobs were added to the Airflow DAGs as tasks to automate the process.

CI/CD IMPLEMENTATION

While working on the project, we set up CI/CD, a DevOps practice that lets developers deploy software changes more often and more reliably, minimize errors, increase development speed, and improve the quality of the final product. We enabled it between GitHub and Databricks, so a change in GitHub is automatically reflected in Databricks. As a result, the client obtains a higher-quality solution with a minimal number of bugs.

Technologies & tools

Main programming languages: Python, Scala, Java, SQL
Data analysis: Scala, Python, Tableau
Cloud services: AWS (EC2, MWAA, Lambda, S3, SSM, CloudWatch, IAM, CloudFormation, CodeBuild, EMR), DataDog
Databases: Snowflake, Databricks, Kafka, DynamoDB
Frameworks: Hadoop, Spark

Process

Taking into account all the client’s requirements and the specifics of the project, we proposed Scrum as the software development life cycle methodology, supported by Jira and Confluence. The customer suggested Microsoft Teams as the communication tool.

Based on our rich experience in developing various web applications and data management systems, our team proposed the most suitable technology stack.

Throughout the project, we hold daily and weekly meetings, technical reviews, sprint reviews, retrospectives, and planning sessions, as well as regular one-on-one meetings with the team lead on any questions or concerns.

Thanks to the well-planned workflow and timely and transparent communication processes, we are able to deliver results faster and more efficiently.

Team

4 Data Engineers
6 Data Analysts
1 Project Manager
1 Product Manager
1 QA Engineer

Results

After the project’s active phase, in which the data analytics and recommendation system were updated, the online shopping platform gained better performance, stability, and usability, which increased its marketing opportunities and sales.

The project team was recognized for its extensive technical background and strong communication skills. Because cooperation with the client went well during the active phase, our IT experts have continued working with the client, providing long-term support for the solution.

Project duration
  • Since 2022
  • The project is still ongoing; at this stage we support the platform and implement new functionality
