Classification on Bank Marketing dataset

The Bank Marketing dataset was used in Wisaeng, K. (2013). A comparison of different classification techniques for bank direct marketing. International Journal of Soft Computing and Engineering (IJSCE), 3(4), 116-119.

The data relate to direct marketing campaigns of a Portuguese banking institution. The campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed (‘yes’) or not (‘no’).

There are four datasets:
1) bank-additional-full.csv with all examples (41188) and 20 inputs, ordered by date (from May 2008 to November 2010), very close to the data analyzed in [Moro et al., 2014].
2) bank-additional.csv with 10% of the examples (4119), randomly selected from 1), and 20 inputs.
3) bank-full.csv with all examples and 17 inputs, ordered by date (older version of this dataset with fewer inputs).
4) bank.csv with 10% of the examples and 17 inputs, randomly selected from 3) (older version of this dataset with fewer inputs).
The smallest datasets are provided to test more computationally demanding machine learning algorithms (e.g., SVM).

The classification goal is to predict whether the client will subscribe (yes/no) to a term deposit (variable y).

The columns in this dataset are:

  • age
  • job
  • marital
  • education
  • default
  • housing
  • loan
  • contact
  • month
  • day_of_week
  • duration
  • campaign
  • pdays
  • previous
  • poutcome
  • emp.var.rate
  • cons.price.idx
  • cons.conf.idx
  • euribor3m
  • nr.employed
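Before fitting any classifier on these columns, the target has to be turned into numbers and the categorical inputs encoded. The following is a minimal sketch of that preparation, not the method used in the cited paper. The sample rows are made up for illustration, only a few of the 20 inputs are shown, and the semicolon delimiter is an assumption about how the CSV files are stored.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the same shape as the dataset (only a few inputs shown).
# The ';' delimiter is an assumption about the file format.
SAMPLE = """age;job;marital;duration;y
56;housemaid;married;261;no
37;services;married;226;no
41;blue-collar;married;1575;yes
25;services;single;50;no
"""


def load_rows(text, delimiter=";"):
    """Parse delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def encode_target(rows, column="y"):
    """Map the 'yes'/'no' target to 1/0."""
    return [1 if row[column] == "yes" else 0 for row in rows]


def one_hot(rows, column):
    """One-hot encode a categorical column; returns (categories, vectors)."""
    categories = sorted({row[column] for row in rows})
    vectors = [[1 if row[column] == c else 0 for c in categories] for row in rows]
    return categories, vectors


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


rows = load_rows(SAMPLE)
labels = encode_target(rows)
job_categories, job_vectors = one_hot(rows, "job")
baseline = majority_baseline_accuracy(labels)
print(labels, job_categories, baseline)
```

The majority-class baseline is worth computing first because any classifier trained on this data should be compared against simply predicting the most common answer.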

 

Data Mining Syllabus – PyMathCamp

Demand for data science talent is exploding. McKinsey estimates that by 2018, a 500,000-strong workforce of data scientists will be needed in the US alone. The resulting talent gap must be filled by a new generation of data scientists. The term data scientist is quite ambiguous. The Center for Data Science at New York University describes data science as:

the study of the generalizable extraction of knowledge from data [using] mathematics, machine learning, artificial intelligence, statistics, databases and optimization, along with a deep understanding of the craft of problem formulation to engineer effective solutions

Data science.


As you can see, a data scientist is a professional with a multidisciplinary profile. Optimizing the value of data is dependent on the skills of the data scientists who process the data.

Intellij.my is offering these essentials with PyMathCamp. This course is your stepping stone to becoming a data scientist. Key concepts in data acquisition, preparation, exploration and visualization, along with examples of how to build interactive data science solutions, are presented using IPython notebooks.
You will learn to write Python code and apply data science techniques to many fields of interest, for example finance, robotics, marketing, gaming, computer vision, speech recognition and many more. By the end of this course, you will know how to build machine learning models and derive insights from data.

The course is organized into 11 chapters. The major components of PyMathCamp are:

1) Data management (extract, transform, load, storing, cleaning and transformation)

We begin by studying data warehousing and OLAP, data cube technology and multidimensional databases. (Chapters 2, 3 and 4)

2) Data Mining (machine learning technology, math and statistics)

Descriptive statistics are applied for data exploration. We then cover mining frequent patterns, associations and correlations. We will also learn more about the different types of machine learning methodology through Python programming. (Chapter 5)

3) Data Analysis/Prescription (classification, regression, clustering, visualization)

At this stage, we are ready to dive into data modelling with different types of machine learning methods. PyMathCamp includes many different machine learning techniques to analyse and mine data, including linear regression, logistic regression, support vector machines, ensembling and clustering, among numerous others. Model construction and validation are studied. This rigorous data modelling process is further enhanced with graphical visualisation. The end result is insight for intelligent decision making. (Chapters 6 and 7)

Source: Pethuru (2014)


Encapsulating data science intelligence and investing in modelling is vital for any organization to be successful.

Hence, we will use the data mining knowledge gained from the above chapters to analyse, extract and mine different types of data for value: more specifically, spatial and spatiotemporal data, object, multimedia, text, time series and web data. (Chapters 8, 9 and 10)

After spending a few months learning and programming with PyMathCamp, we will end the course by updating you with the latest applications and trends of data mining. (Chapter 11)

In conclusion, PyMathCamp is the perfect course for students who might not have the rigorous technical and programming background required to do data science on their own.

Credit to: Joe Choong

“Future belongs to those who figure out how to collect and use data successfully.” 

Muhammad Nurdin, CEO of IntelliJ.


Dimensionality Reduction on Github Event using PCA approach

This case study uses the GitHub Events dataset and focuses on developers in Malaysia.

This is a read-only API to the GitHub events. These events power the various activity streams on the site.


The columns in this dataset are:

  1. a_login
  2. e_CommitCommentEvent
  3. e_CreateEvent
  4. e_DeleteEvent
  5. e_DeploymentEvent
  6. e_DeploymentStatusEvent
  7. e_DownloadEvent
  8. e_FollowEvent
  9. e_ForkEvent
  10. e_ForkApplyEvent
  11. e_GistEvent
  12. e_GollumEvent
  13. e_IssueCommentEvent
  14. e_IssuesEvent
  15. e_MemberEvent
  16. e_MembershipEvent
  17. e_PageBuildEvent
  18. e_PublicEvent
  19. e_PullRequestEvent
  20. e_PullRequestReviewCommentEvent
  21. e_PushEvent
  22. e_ReleaseEvent
  23. e_RepositoryEvent
  24. e_StatusEvent
  25. e_TeamAddEvent
  26. e_WatchEvent

Sample Github Event data.


Lower dimension representation of our data frame.


Explained variance ratio.
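The post does not show the code behind these figures, so the following is only a from-scratch sketch of how a lower-dimensional representation and an explained variance ratio can be computed with PCA. It uses power iteration with deflation on the covariance matrix instead of a library routine, and the event counts in `events` are hypothetical stand-ins for the per-user `e_*` columns.

```python
import math
import random


def center(rows):
    """Subtract each column's mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=500, seed=0):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix (power iteration)."""
    d = len(matrix)
    rng = random.Random(seed)
    vec = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break  # nothing left to extract
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue estimate for the unit vector.
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d))
    return value, vec


def pca(rows, n_components=2):
    """Return (projected rows, explained variance ratio per component)."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    if total_variance == 0:
        raise ValueError("data has no variance")
    components, ratios = [], []
    for _ in range(n_components):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        ratios.append(value / total_variance)
        # Deflation: remove the component just found before looking for the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    projected = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                 for row in centered]
    return projected, ratios


# Hypothetical counts: one row per a_login, one column per event type.
events = [
    [120, 30, 5],
    [80, 20, 2],
    [10, 2, 1],
    [200, 55, 9],
    [40, 12, 0],
]
projected, ratios = pca(events, n_components=2)
print(ratios)
```

In practice a numerical library would normally be used for this step; the sketch only makes the two quantities named in the captions concrete.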


Plot on the data frame.


Re-scaled mean per a_login across all the events.


Bubble plot chart (a_login mean).


Bubble plot chart (a_login sum).


PyMathCamp aims to produce modern innovators through data science & mathematics

Innovative thinking and the necessary skill set are crucial to solving real-world problems. As we approach the future, problems will become more complex. Malaysia is in dire need of modern innovators to develop state-of-the-art solutions to them, and innovative thinking alone is not enough to develop a solution.

With a shortage of data science and mathematics talent, Malaysia will have a tough time finding local intellectual resources to solve local problems.

Yes, it is true that Malaysia can outsource to foreign experts, but it is not right to depend on them all the time. Even with that dependency, the supply is still insufficient. First, technology transfer can be very expensive; second, foreign workers need time to adapt to local structures before they can develop a suitable solution. The more time taken, the more money spent.

Malaysia lacks innovators.


“Malaysia may not have enough engineers, architects, and other professionals, to achieve Vision 2020 based on the low level of interest by our students in science, technology, engineering, and mathematics (STEM). If the situation goes on, Malaysia may have to depend on foreign workers to attain developed status, warn expert.” Star Sunday.

Wawasan 2020 is getting nearer, yet we are still unable to show that we can ‘supply’ the vision.

Here we are, wanting to provide high-impact education focused on data science and mathematics to ALL Malaysians for FREE, so that the whole nation can change millions of lives for the better.

Introducing PyMathCamp.

PyMathCamp will be an online learning platform that teaches data science and mathematics using programming languages such as Python, C++ or R, in order to produce future Malaysian innovators who can act on problems.

The online learning platform will help them learn how to code and further their careers in science, technology, engineering and mathematics (STEM). How?

How can data science and mathematics produce innovators?

Data science and mathematics are not “subjects in the class, stay in the class”. They are basic necessities for all kinds of businesses: health, agriculture, finance, social sciences, maritime sciences, planetary sciences, meteorology, geography, and many more. You name it. STEM is WIDE.

Data science, put simply, is the study of how to gather interesting data. How interesting the data is depends on the person looking for it. Data is an ocean. Whatever subject someone wants to look into, he or she must learn the science of pulling it from the ocean (of data), cleaning it, grooming it and presenting it informatively.

Mathematics, on the other hand, is what makes life measurable, down to basic things like genomics. Mathematics demands wisdom, judgment and maturity. We can make errors on the way to a solution, alter our methods or start all over. In life, reality mostly does not allow us to redo anything, but under ‘measurable conditions’ we are allowed to attempt to change things.

By defining their importance in state-of-the-art programming, we can see how both subjects are keys to economic prosperity. Without such talent, we will have difficulty obtaining interesting parameters. To obtain them, data science and mathematics must be learnt.

Modern students of PyMathCamp should expect the following:

Students will be able to create emphatic solutions. They will be able to build advanced innovations through data science and mathematics and deliver healing value to others.

A variety of topics such as data exploration, visualization, feature engineering, predictive analytics, predictive modeling, clustering, big data pipelines, metrics and many more should be expected.

All trainers and mentors are expert, highly trained and experienced Malaysians. They specialize in data science, computer vision, big data, machine learning, artificial intelligence and more.

Students are also expected to find their own solutions by making use of our programming community portal and discussion group (chit chat). For open source development, PyMathCamp will be integrated with GitHub.

We have an evidence-based method for improving every user’s learning curve all the way to the finish line.

Note that PyMathCamp is committed only to two specific fields: data science and mathematics.

There will be no age limit.

PyMathCamp will focus on Python, C++ or R because they are beginner-friendly (easy to use and understand), well supported for mathematics, and the mother tongue of artificial intelligence. These are truly in-demand skills.

And it is free. Yup. No charges.

Carpe diem.

Seize the day.

We want to build a smart society to build smart structures.

We want to produce an intelligent society. Malaysia needs a smart society to help the nation grow together, to achieve Wawasan 2020 and beyond.

Beyond filling job vacancies, we aim for students to be able to invent advanced solutions and create intelligent startups that solve society’s problems. This is actually our deepest aim. We want students to be modern innovators.

Put simply, PyMathCamp is preparing Malaysians for the amazing (automated) future.

Join PyMathCamp.

IntelliJ is a deeply value-oriented company.

We want to educate Malaysians and bring the Malaysian mind to an advanced level, starting small and FOR FREE, which is the essence of changing Malaysia into an economically prosperous place.

We want to produce marketable Malaysians, in this self-serving economy, with high-impact education as the first defense.

We pray that every mission of ours enriches all lives.

“Future belongs to those who figure out how to collect and use data successfully.”

Muhammad Nurdin, CEO of IntelliJ.


EagleEye – Malaysia’s first drone company aims to alert you in real time if your neighborhood is under threat.

A transplanted human kidney costs MYR 5,000 to MYR 9,000. That was the price tag recorded in 2012, four years ago. About 60,000 transplants take place worldwide each year, and 1 in 10 is done illegally.

Scary enough?

Daily Mail reported that:

“Wealthy patients are paying up to £128,500 (roughly $191,028 today) for a kidney to gangs, often in China, India and Pakistan, who harvest the organs from desperate people for as little as £3,200 (roughly $4,756 today)”

For your information, organ trading and trafficking has been a “very profitable and impatient industry” since the new millennium. And it is still going on, at a staggering US$1 billion a year! That number is for China alone. P.S. I wonder if they are hiring.

Organ harvesting drama.

 

Enough about kidneys. How about other crimes such as kidnapping, robbery, rape and murder?

Crimes are taking place everywhere and citizens’ security is still a gamble. We do not have military-level security to watch over a huge perimeter and take care of everyone’s safety.

The Solution – EagleEye

An intelligent drone that flies for only one reason: to save lives.

 

Cypher-UAV


 

#1 How?

EagleEye patrols your neighborhood by flying autonomously day and night, 24/7. Using a high-definition camera, EagleEye detects suspicious activity and analyses it using artificial intelligence.

EagleEye captures visual evidence such as video and images in real time using OpenCV, a computer vision library.

Whenever EagleEye detects suspicious activity, it instantly triggers an alert to authorities such as the police and security agencies so they can act at once. Snapshots are taken for human verification, making sure that crimes are handled almost instantly.

#2 EagleEye is intelligent enough to analyse crime events

 

Cypher


We built in artificial intelligence so that it can learn what crime looks like and differentiate between serious and non-serious activities. After all, we do not want to send an officer just because two guys gave each other a paw.

This intelligence does not only detect events that have occurred; it also uses the captured data to predict upcoming criminal activity. For example, busy but wide roads have a high potential for theft involving bikes.

EagleEye is also ready to provide its feeds, 100% accurate, to the parties involved.

#3 EagleEye craves top safety

 

Imagine a security guard “flying” 10 metres away from your house who never gets drowsy, never needs a lunch break and never even takes five. He keeps roaming around until he needs “sleep”. While he is off to sleep, another security guard comes and continues scouting. In other words, it is 24/7 security with no pause.

Nahhh.. I have CCTVs?

Good for you, sir. However, CCTV is static. 

Meanwhile, EagleEye flies autonomously, detects suspicious activity intelligently, alerts the authorities involved and, at the same time, helps the police carry out safety and security operations that are well planned and thorough.

#4 Shut up and take my money!!

Hang on there, buddy. EagleEye is waiting for a new regulation on the use of drones in Malaysia; the Department of Civil Aviation (DCA) is anticipated to renew Aeronautical Information Circular (AIC) 4/2008 this year, 2016.

Let’s pray together that it happens ASAP.

Conclusion

Drones will very soon be a reliable part of daily life.

We are not here to take anyone’s job. But drones will be the ultimate solution for ensuring the highest level of security around us. The implementation will be in stages, so we still need people to cooperate with.

And EagleEye will never be a dictator’s army. It has the communication ability to hear what you want it to hear. For instance, say you see a guy being brutally attacked at a corner. You can tell the drone to go and check it out instead of risking your own safety. The drone will take care of it according to its procedures.

Some words from the IntelliJian – “we are too excited to get this project into the field. We cannot wait to help more people with our AI solutions”.

Notice

To achieve better neighborhood security, we are inviting skillful engineers and innovative designers to work with us, so that all Malaysian citizens can live happier for longer. Please do contact us.

Many thanks.

Images credited to KONAMI: https://us.konami.com/mgs/

Machine Learning for startup development

Machine learning is not new to the computing landscape. This branch of artificial intelligence is growing in popularity as more parties become aware of the need to manage digital data and to automate work previously done manually by humans.

We already feel the effects of machine learning in use, often without realizing it. Put simply, machine learning is a family of computer algorithms that learn from data to recognize patterns and build a model from historical data. That model is then used to classify or predict new data, which helps us make or support decisions.

An Analogy for the Machine Learning Concept

Suppose we want to hold an event in which many startups present their products and their potential in the Malaysian market. The committee has successfully gathered 5 startups: GrabCar, Traveloka, Sallyfashion, Tiket and Foodpanda.

To make the event run smoothly, the committee decides to split the presentation sessions by startup category. Suppose your team is out of the office doing field work, so the event team has to assign each startup to a category on its own. Since the event team has run the same event many times and has met various startups, it has several cues for judging which category a startup belongs to.

Startup Name


The name of a food startup will usually contain words related to food or restaurants. In the same way, the name of a travel startup will usually contain words related to travel, the name of a transportation startup will usually relate to transportation, and the name of a fashion startup will usually contain words related to fashion and clothes.

Startup Logo

A food startup will use a logo showing food or kitchen equipment. A travel startup will have a logo about travel. A transportation startup will have a logo related to roads. A fashion startup will have a logo related to fashion and clothes.

Based on those two details, the event team can classify the 5 startups by product category:

Based on startup name

  • Food: Foodpanda (contains the word “food”).
  • Travel: Traveloka (contains the word “travel”), Tiket (contains the word “tiket”).
  • Transportation: GrabCar (contains the word “car”).
  • Fashion: Sallyfashion (contains the word “fashion”).

Based on Startup Logo

The classification works in much the same way as for the startup name above.
In machine learning terms, the name and the logo are features, and the counts of how often each feature value appears in each category form a frequency distribution. That is the learning process of a machine.
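The name-based part of this analogy can be sketched in a few lines of Python. The keyword lists in `CATEGORY_KEYWORDS` are hypothetical and chosen only to illustrate the idea; a real system would learn such associations from labeled examples instead of hard-coding them.

```python
# Hypothetical keyword lists per category; the text only describes the idea.
CATEGORY_KEYWORDS = {
    "food": ["food", "resto", "eat"],
    "travel": ["travel", "tiket", "trip"],
    "transportation": ["car", "ride", "taxi"],
    "fashion": ["fashion", "cloth", "wear"],
}


def keyword_counts(name):
    """Count, per category, how many of its keywords occur in the name."""
    lowered = name.lower()
    return {
        category: sum(keyword in lowered for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }


def classify(name):
    """Pick the category with the most keyword hits, or 'unknown' if none match."""
    counts = keyword_counts(name)
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"


startups = ["GrabCar", "Traveloka", "Sallyfashion", "Tiket", "Foodpanda"]
result = {name: classify(name) for name in startups}
print(result)
```

The per-category counts returned by `keyword_counts` play the role of the frequency distribution mentioned above: the category with the highest count wins.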

Sometimes an object does not have a usable value for a given feature. In the example above, the Sallyfashion logo is just a word. However, the previous feature (the name) clearly identifies it as a fashion startup. Such cases can be handled by supplying more features and a more detailed frequency distribution. That is how a machine learning algorithm is arranged so that a computer can learn.

Developing machine-learning-based solutions for tech startups

Before talking in detail about what a tech startup can do with machine learning, we should look at the challenges of implementing the technology.

There are many challenges to implementing machine learning in Malaysia: low labour costs, which make it hard to argue for budget efficiency; a poor understanding of how the technology is used; and the fear that the workforce will be replaced.

However, machine learning today is excellent and makes it possible to build a system that learns by itself from complex, large-scale data with minimal human intervention. An implementation is considered successful if the result of the automation process comes close to the quality of human work at an affordable price. This kind of use is believed to counter the issues above.

I am optimistic that machine learning will be adopted in Indonesia by many companies that specialize in technology development.

One simple but important rule serves as a reference here: even though a technology concept seems complicated, its implementation is not always for big and complicated problems.

The simplest implementation of machine learning is the identification of spam or junk mail. The technique learns from data that has already been labeled (spam or not spam) by extracting features, which are then used as input parameters for a classification algorithm. For automation, a model is built that captures the learning result together with the algorithm used. That model is then used to classify or predict new data.
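As an illustration of this train-then-predict loop, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. Naive Bayes is one common choice for spam filtering, not necessarily the algorithm the text has in mind, and the four training messages are invented for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Feature extraction: lowercase words split on whitespace."""
    return text.lower().split()


def train(messages):
    """Fit word counts per label from a list of (text, label) pairs."""
    word_counts = {}          # label -> Counter of words seen with that label
    label_counts = Counter()  # label -> number of messages
    for text, label in messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return {
        "word_counts": word_counts,
        "label_counts": label_counts,
        "vocabulary": vocabulary,
    }


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    total_messages = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    scores = {}
    for label, n_label in model["label_counts"].items():
        counts = model["word_counts"][label]
        n_words = sum(counts.values())
        score = math.log(n_label / total_messages)  # class prior
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (n_words + vocab_size))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented labeled data, for illustration only.
TRAINING = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(TRAINING)
print(predict(model, "claim your free money"))
print(predict(model, "team meeting on monday"))
```

The `model` dictionary is the stored learning result: once trained, it can classify new messages without revisiting the training data.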

Another example commonly found in public-facing online services is content recommendation. It ranges from recommending articles related to the one being read on an online media site, to products related to the one being viewed on an e-commerce site, to videos related to the one being watched on a video site.
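One simple way to find "related" content is to compare the words two items share. The sketch below ranks articles by cosine similarity of their word counts; this is only one of many recommendation approaches, and the article titles in `ARTICLES` are made up for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a text as a bag of lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(current_id, articles, top_n=2):
    """Return the ids of the top_n articles most similar to the current one."""
    current = vectorize(articles[current_id])
    scored = [
        (cosine(current, vectorize(text)), other_id)
        for other_id, text in articles.items()
        if other_id != current_id
    ]
    # Highest similarity first; ties are broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _, other_id in scored[:top_n]]


# Made-up article titles, for illustration only.
ARTICLES = {
    "a1": "machine learning for spam detection",
    "a2": "deep learning and machine learning basics",
    "a3": "best travel destinations in malaysia",
    "a4": "spam detection with naive bayes",
}
print(recommend("a1", ARTICLES))
```

Production recommenders usually also draw on user behaviour (clicks, views, purchases), which this content-only sketch ignores.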

Machine learning is already used in Malaysia in specific industries that are not directly public-facing. Examples include identifying attack patterns (from hackers, rootkits, viruses and so on) aimed at a network and blocking them automatically, automatic ad bidding (autobid), profiling users based on their online activity, predicting which events or figures will become the centre of the news, and generating text content automatically.

The products mentioned above are the growth area for tech startups working with machine learning. This is because cloud computing makes it easy to pay for capacity only as needed (on demand). That holds even for machine learning back-end technology: several cloud computing providers offer it ready to use.

Market demand for technology solutions based on machine learning platforms

The author sees that Malaysia today needs experts in machine learning development. The shortage may be linked to an industry that focuses more on marketing than on technology investment. As a result, the industry prefers ready-to-use implementations (PaaS and SaaS) developed by overseas parties. As awareness grows of the efficiency that more custom technology can deliver, this kind of technology is sure to be prioritized.

So this is a good chance to quickly prepare ourselves by learning. Ikhwan recommends several learning references:

  • “Introduction to Artificial Intelligence” by Sebastian Thrun and Peter Norvig. Sebastian Thrun is known for building self-driving cars at Google, and Peter Norvig is one of the pioneers of artificial intelligence, now Director of Research at Google.
  • “Machine Learning” by Andrew Ng, a Stanford professor who later became Chief Scientist at Baidu Research.
  • “Neural Networks for Machine Learning” by Geoffrey Hinton, who is known for his research on neural networks. He works as a Distinguished Researcher at Google and is also a Distinguished Emeritus Professor at the University of Toronto.

Besides the three references mentioned above, many other sources can be found with a search engine. This kind of automation will become a new way of delivering services, including in Malaysia, as its benefits can be enjoyed by many.