2014-11-01
[Technology] Work, web automation, Firefox and Selenium
Posted by Richard at 17:23
Labels: #Technology, automation, firefox, open source, Selenium, testing, web, web development
The following doesn't have a point; it's just me rambling about a software testing tool.
There is a lot I'd like to write about, but for now I'll just put down some notes.
At work, an unusual situation let me spend a lot of time on Selenium. Selenium is an open source tool set that lets you automate activities in a web browser. It comes in two flavours: one is a browser extension, Selenium IDE, that lets you record your actions and then replay them. The other is Selenium Web Driver, which provides a plugin or driver for your browser that you can communicate with from many programming languages, using supporting libraries.
Selenium IDE is nice because it's fairly accessible to non-programmers, though you definitely benefit from programming experience. You can record a set of actions and save them as a test case that you can re-use, or that you can integrate into a collection of test cases, forming a test suite. Whoa. Then you can run the entire suite and sit back and watch while the browser simulates your activity, and see whether each test succeeds or fails. Brill. Selenium IDE has a variety of built-in commands, such as entering text into a field, clicking an element, and comparing text on the page, and it even has a concept of variables, letting you store text at one point to re-use later. One thing it lacks, though, is flow control: no looping or conditional branching by default. (Though a go-to implementation exists for it...)
Selenium Web Driver is a bit nicer if you're comfortable with programming, because you have the benefits of file I/O, conditional logic, looping, building re-usable functions, rich exception handling, etc. You can even build your own GUI around your Selenium test cases to cater to your specific needs.
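To give a flavour of what driving a browser from code looks like, here is a minimal sketch using the Python bindings; the URL and element IDs are made-up placeholders, not anything from a real site:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()  # launches a real Firefox instance
try:
    driver.get("https://example.com/login")
    driver.find_element(By.ID, "username").send_keys("richard")
    driver.find_element(By.ID, "password").send_keys("hunter2")
    driver.find_element(By.ID, "submit").click()
    # From here on it's ordinary Python: loops, file I/O, functions, exceptions.
finally:
    driver.quit()  # always close the browser, even if something above failed

The same idea works from Java, C#, Ruby, and so on through their own bindings.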
batch administration through automation
We ended up actually using Selenium to automate a lot of administrative tasks for third-party web software whose back-end we didn't control, and whose interface was ... tedious. To that end, I got to use the Web Driver to build a few classes of re-usable functions that trivialised a lot of the repetitious tasks the administrators had to handle. (They once hired a co-op student to manually go over 120 pages and make the same multi-step configuration change every time something new came up.) Using Selenium, I helped reduce the effort involved from hours of repetition, with the attendant risk of human error, to 10 minutes of writing a <10 line script, starting it, and letting it run in the background in a separate instance of Firefox. :) They thought it was magic. I thought it was ridiculous to have software that we have to configure in such a round-about way; seriously, having access to the back-end or database would obviate the need for software to simulate a human on the client-side. Ugh.
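A sketch of the shape such a script can take (the admin URL pattern, form field names, and the plain-text list of item IDs are all hypothetical):

from selenium import webdriver
from selenium.webdriver.common.by import By

# Read the list of items to update; one ID per line.
with open("item_ids.txt") as f:
    item_ids = [line.strip() for line in f if line.strip()]

driver = webdriver.Firefox()
try:
    for item_id in item_ids:
        # The URL pattern and element names are invented for illustration.
        driver.get("https://intranet.example/admin/items/%s/settings" % item_id)
        driver.find_element(By.NAME, "notifications").click()
        driver.find_element(By.NAME, "save").click()
        print("updated", item_id)
finally:
    driver.quit()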
So that's interesting. Openness and control. Who owns the software that a company uses? Ideally, if you're selling proprietary software, you're trying to provide an interface that is adequate for your users, to make tasks easy. Perhaps we just have unrealistic requirements. This is why I ultimately prefer working with and using open source software. No obstacles imposed by others. Any problem just requires my attention and my time (which is not bountiful these days).
accessing data on the web
There are a lot of websites I use on a daily basis that I would love to have more direct access to my data on: Facebook, GMail, Google Calendar. And a lot of these websites offer APIs. They're not always convenient APIs. They're rarely standardised. I miss the days when Facebook allowed RSS feeds, for example. It is honestly sometimes easier to use a client-side automation tool like Selenium to achieve larger-scale operations.
Some more examples are shopping sites, where they list product information in a free-form way, often incomplete. Sometimes, for electronics, they'll have a 'specifications' section, but the specifications will be in different formats for different devices. Why are they not interested in standardising/normalising their data so their customers can make better decisions? Perhaps it's because making it easier to compare would change consumer habits, making the best choice more trivial, so one supplier would receive a lot more customer attention than others, ruining some businesses. I do believe a lot of businesses survive on the basis of consumer ignorance; if people could only see how awful a product/deal they were getting on some things (e.g. horrible smartphones), I'm sure some suppliers would have to go out of business (as smaller ones especially might find it hard to compete with the price points of the largest distributors).
That said, I would still love to be able to quickly pull all the information on hard drives at FutureShop, Staples, Canada Computers, and NCIX into a single spreadsheet so I could easily compare the characteristics that matter most to me, and also filter for my esoteric hard drive height (7 mm, 2.5 mm shorter than the industry average). Or see table schemas and run standard SQL queries on their data sets to obtain the information I want.
In lieu of those, I can use tools like wget and curl to automatically scrape websites for information, but sometimes the HTML you pull down does not reflect the page as you would actually see and interact with it. That's where web automation tools with a programming component (like Selenium Web Driver) can really help. They can see the live DOM and let you output what you find to files.
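For example, something along these lines reads values out of the rendered DOM (after any JavaScript has run) and writes them to a CSV; the shop URL and CSS selectors are hypothetical:

import csv
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
try:
    driver.get("https://shop.example/hard-drives")
    rows = []
    for item in driver.find_elements(By.CSS_SELECTOR, ".product"):
        name = item.find_element(By.CSS_SELECTOR, ".title").text
        price = item.find_element(By.CSS_SELECTOR, ".price").text
        rows.append((name, price))
finally:
    driver.quit()

with open("drives.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)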
Sadly, this obviously requires a bit more effort than if web sites provided sane, easy access to their data. Ah well. Open Data for the future.
Performance
Regardless, a lot of what I like to use tools like Selenium and JMeter (from Apache) for is actual testing for quality assurance. Websites grow, and as they become more dynamic, it becomes increasingly difficult to verify their correctness and performance by hand. I have a private little web-app called My Daily which I use to track my daily routine. It has helped me rise out of a few slumps, by making me more accountable to myself. (I know I'm in dire straits when I start ignoring myself, though, and that's when I can make the most drastic changes.) However, for a while, it's been getting more sluggish, and sometimes as I've added features, I've unwittingly broken existing features.
I enjoy using Selenium and JMeter to measure performance. I can get average timing information. That's heavily influenced by external factors, yes, but it gives a good idea of how things are going. I can also do things to help compensate for variable network performance: I can have a standard, simple page that I access a number of times to get a baseline measurement for the network as a whole, and then consider other numbers against that. Is it taking me 9 seconds to load my 20-item task list for today because the whole network is slow, or is the network in general still fast and I've introduced a slow-down somewhere? Even if it is the former, is there anything I can do with caching, batching information, or reducing overall data size to help alleviate that problem?
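Roughly, the baseline trick might look like this (a sketch; the URLs are placeholders for a known-simple static page and the page under test):

import time
from selenium import webdriver

def average_load_time(driver, url, samples=5):
    # driver.get() blocks until the page's load event by default.
    total = 0.0
    for _ in range(samples):
        start = time.monotonic()
        driver.get(url)
        total += time.monotonic() - start
    return total / samples

driver = webdriver.Firefox()
try:
    baseline = average_load_time(driver, "https://mydaily.example/baseline.html")
    tasks = average_load_time(driver, "https://mydaily.example/today")
    print("baseline %.2fs, task list %.2fs, ratio %.1fx"
          % (baseline, tasks, tasks / baseline))
finally:
    driver.quit()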
I can also measure the size of the pages that are constructed and their complexity (e.g. how many DOM nodes), so I can try to keep things simpler for computers with memory constraints (like mobile devices). KISS.
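Counting nodes is nearly a one-liner once a driver is open; a sketch (placeholder URL again):

from selenium import webdriver

driver = webdriver.Firefox()
try:
    driver.get("https://mydaily.example/today")
    # How many element nodes the live DOM contains, as the browser sees it.
    node_count = driver.execute_script(
        "return document.getElementsByTagName('*').length;")
    # Rough size of the serialised page.
    page_bytes = len(driver.page_source)
    print(node_count, "DOM nodes,", page_bytes, "bytes of markup")
finally:
    driver.quit()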
Correctness, verification of verification
I also just enjoy verifying correctness. Selenium has VerifyText/AssertText-type commands built into its IDE, and the Web Driver offers comparable features through language-specific testing frameworks. Whenever you're letting something operate automatically, it's so important to verify your context, especially before performing actions that meaningfully affect data. So you have a test that creates a few records, modifies them, then deletes them? Let's be certain that the record you're about to delete is indeed the test one you created.
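For instance, in a sketch like the one below (the selectors and the "selenium-test-" naming convention are invented), the delete step refuses to run unless the selected row really is the test record created earlier:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
try:
    driver.get("https://app.example/records")
    row = driver.find_element(By.CSS_SELECTOR, "#records tr.selected")
    title = row.find_element(By.CSS_SELECTOR, ".title").text

    # Assert before you click: only ever delete records the test itself created.
    assert title.startswith("selenium-test-"), \
        "refusing to delete unexpected record: %r" % title

    row.find_element(By.CSS_SELECTOR, ".delete").click()
finally:
    driver.quit()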
Look before you leap. Metsuke before kiritsuke. Assert before you click. All good advice.
So, it's useful to try to create a test for each feature you invent (oh wow, what a time commitment comprehensive testing can be (and oh the perceived diminishing returns)), but it's hard to verify that the tests themselves work. That's why you make tests for your te- oh, oh my. Actually, I sometimes do. Sometimes I do negative testing: I'll intentionally test broken situations and see what happens, not just verifying that the system handles them correctly, but that some of the more important tests catch them. "This test better actually fail when I give it- oh nope, it just passes everything :(" This situation actually arose with an auto-marker I wrote for a course I was TA'ing the other year. As I was reviewing the marks, I thought it strange that everyone passed a certain test; it turned out the test itself was broken and passed everything. The class was not happy to see their collective grade drop once it was fixed. :D
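In code, that kind of negative test can be as small as this sketch (check_submission is a stand-in for whatever the auto-marker actually checks, not the real thing):

import pytest

def check_submission(output):
    # The real checker was more involved; this stands in for it.
    assert "42" in output, "expected the answer 42 in the output"

def test_checker_rejects_garbage():
    # If this test passes, we know the checker isn't waving everything through.
    with pytest.raises(AssertionError):
        check_submission("complete nonsense")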
Reusability
One of the lovely things about Selenium Web Driver (and the IDE) is of course the ability to re-use parts. The IDE isn't so great, because I tend to need to copy the test into multiple suites. The Web Driver is a bit nicer as I have a library of common parts that I recycle. I try to treat testing code not just as scaffolding or something quick-and-dirty at the end, but as a complete software package in itself. (While trying to avoid Testception.) Frameworks like JUnit can help with that, too.
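A library of common parts like that doesn't have to be fancy; a sketch of the kind of helper I mean (the login flow, field names, and dashboard ID are made up):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def make_driver():
    return webdriver.Firefox()

def log_in(driver, base_url, user, password):
    driver.get(base_url + "/login")
    driver.find_element(By.NAME, "user").send_keys(user)
    driver.find_element(By.NAME, "password").send_keys(password)
    driver.find_element(By.NAME, "submit").click()
    # Don't return until the post-login page has actually appeared.
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "dashboard")))

Every test script can then start with the same couple of calls instead of re-implementing that dance.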
Knowledge Requirements
It's a bit of a downer that rich web development tends to involve so many different languages, each with a different style. Web development is one of the most desirable areas of development for a lot of common folk, but in some ways it is also one of the least accessible. Do you understand your CSS, your JavaScript, your (X)HTML, and how they interact? How about we throw in a server-side language, its standard libraries, and extra libraries for this and that? Do you understand your server configuration? Are you secure (oh dear Lord)? Speaking of security, Andy Wingo recently wrote a nice blog post on trying to set up HTTPS access for your website and how it's a gigantic mess.
Selenium IDE is nice in that you can record and play back with minimal knowledge. That can be helpful to people who are building their website with a tool like WordPress, for example. It would be nice if the common folk could rely on simple, straightforward interfaces to accomplish this. Sadly, even in Selenium IDE, it helps to understand XPath, the DOM, and CSS, at the very least for target disambiguation. This ties in with my earlier topic of verification, actually. It's not hard to write a test that ends up identifying an element by a numeric offset. "Hmm, this has the id 'box9', let's rely on that!" When in reality your Tweet Box is not always box9; sometimes it's box8! (Yeah, it sure would pay to have given it a more descriptive name than "box9", but let's pretend the page elements are dynamically generated and you might have a variable number of Tweet Boxes because your page shows as many as are trending on the topics of various cheeses, one for each trending cheese. Go Daiya!)
If you don't understand the target in use, then it's hard to understand the potential consequences of what Selenium has (somewhat intelligently) picked for you.
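The usual fix is to pick a target that describes what the element is rather than where it happens to sit; a quick sketch against hypothetical markup:

from selenium.webdriver.common.by import By

# Fragile: depends on how many boxes happen to precede it today.
fragile = (By.ID, "box9")

# More robust, assuming the page exposes something semantic to hang on to:
by_css = (By.CSS_SELECTOR, "div.tweet-box[data-topic='daiya']")
by_xpath = (By.XPATH, "//div[contains(@class, 'tweet-box')][@data-topic='daiya']")

# element = driver.find_element(*by_css)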
Closing Comments
This is just me rambling with a head full of thoughts after using Selenium for a few months. There are a lot more technical considerations I've encountered and dwelled on, and I am sure I am not using it completely optimally (though I do find its documentation straightforward and helpful). I just like to write unimportant rambling sometimes.
I don't have a point.